apple/ml-codeswitching-translations

German and Spanish Targets for Fisher and Miami Codeswitching Datasets

This dataset accompanies the research paper, Towards Real-World Streaming Speech Translation for Code-Switched Speech.

In the paper, we investigate the translation of code-switched speech into a third language (i.e., a language not included in the source). To this end, we extend the Fisher and Miami test and validation datasets, which contain English-Spanish code-switched speech, with new targets in monolingual Spanish and German.

This dataset extends a codeswitching-focused dataset split accompanying an earlier paper, End-to-End Speech Translation for Code Switched Speech, which can be found here.

Instructions:

  • Please follow the instructions found here.
  • The file names in this dataset indicate which parallel data each file belongs to. Please note that a small portion of the translations are marked as <removed>; these should not be included in evaluation.
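
As an illustration of the last point, here is a minimal sketch of how the <removed> entries could be dropped before scoring. It assumes that references and system outputs are line-aligned lists of strings and that a removed entry consists of exactly the <removed> marker; the function name `filter_removed` and the toy sentences are hypothetical and not part of this dataset or its tooling.

```python
# Marker named in this README; treating it as the full line content is an assumption.
REMOVED_MARKER = "<removed>"


def filter_removed(references, hypotheses):
    """Drop every (reference, hypothesis) pair whose reference is marked as removed."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be line-aligned")
    kept = [
        (ref, hyp)
        for ref, hyp in zip(references, hypotheses)
        if ref.strip() != REMOVED_MARKER
    ]
    kept_refs = [ref for ref, _ in kept]
    kept_hyps = [hyp for _, hyp in kept]
    return kept_refs, kept_hyps


# Toy data for illustration only; not taken from the dataset.
refs = ["Guten Morgen.", "<removed>", "Wie geht es dir?"]
hyps = ["Guten Morgen.", "Hallo.", "Wie geht's?"]
kept_refs, kept_hyps = filter_removed(refs, hyps)
```

Filtering both sides together keeps the remaining references and hypotheses aligned, so they can be passed to whichever evaluation metric is in use.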

License:

The Fisher and Miami datasets are licensed differently. Please refer to the LICENSE files in the respective subdirectories.

Citation:

If you use this dataset, please cite our paper as follows:

Belen Alastruey, Matthias Sperber, Christian Gollan, Dominic Telaar, Tim Ng, Aashish Agarwal (2023). Towards Real-World Streaming Speech Translation for Code-Switched Speech. EMNLP 2023 Workshop on Computational Approaches to Linguistic Code-Switching (CALCS).
