Hunalign dictionaries generated from Facebook MUSE dictionaries

coezbek/hunalign-dict-muse

hunalign-dict-muse

Dictionaries for hunalign generated from Facebook MUSE dictionaries.

Last updated: 11 Jan 2022

How to manually update the files

In bash, run update.sh to trigger a fresh download and conversion of the Facebook MUSE dictionaries.

What happens during conversion:

  • Removes the test and train subset files.
  • Renames *.txt to *.dict.
  • Puts the required delimiter <space>@<space> into the files.
  • Reverses the order of the words, because hunalign expects dictionary entries in the order <target language> @ <source language> (the reverse of the order that the API and command-line utility expect).

NOTE: The filename is <source language>-<target language>.dict, even though each entry inside the file is ordered <target language> @ <source language>.
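
The per-line part of the conversion can be sketched in Python as follows. This is only an illustration, not the contents of update.sh: the function names are hypothetical, and it assumes each input line holds exactly one source word and one target word separated by whitespace (a space or a tab).

```python
from pathlib import PurePath


def convert_line(line: str) -> str:
    """Turn one 'source target' pair into the 'target @ source' form."""
    # Assumption: exactly two whitespace-separated words per line.
    # Any other shape raises ValueError instead of being silently mangled.
    source, target = line.split()
    return f"{target} @ {source}"


def convert_lines(lines):
    """Convert every non-blank line and skip empty ones."""
    return [convert_line(line) for line in lines if line.strip()]


def dict_name(txt_name: str) -> str:
    """Map a name such as 'en-de.txt' to 'en-de.dict'."""
    return str(PurePath(txt_name).with_suffix(".dict"))


if __name__ == "__main__":
    # Made-up sample pairs for a hypothetical en-de.txt input file.
    for entry in convert_lines(["cat Katze\n", "dog\tHund\n"]):
        print(entry)
    print(dict_name("en-de.txt"))
```

The language order in the filename is left unchanged by the rename; only the word order inside each entry is swapped.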

License

The Facebook MUSE dictionaries are licensed under Creative Commons Attribution-NonCommercial 4.0 International. This repository only reformats that data and therefore has no license terms of its own.

Please reference the original authors as follows:

[1] A. Conneau, G. Lample, L. Denoyer, MA. Ranzato, H. Jégou, Word Translation Without Parallel Data

@article{conneau2017word,
  title={Word Translation Without Parallel Data},
  author={Conneau, Alexis and Lample, Guillaume and Ranzato, Marc'Aurelio and Denoyer, Ludovic and J{\'e}gou, Herv{\'e}},
  journal={arXiv preprint arXiv:1710.04087},
  year={2017}
}

Related

Other hunalign dictionaries:

  • hunapertium - Hunalign dictionaries generated from Apertium (GPL 3.0)
  • LF Aligner - LF Aligner generates dictionaries from 32 EU languages via hunalign itself
