This script extracts a bilingual lexicon for a pair of languages from the Panlex database released by Panlex.
Pre-extracted lexicons for 38 languages are also available for download.
The code is written in Python 2.7.
Download the required Panlex language information file and put it in the 'data' folder.
Download the preprocessed SQLite file of the Panlex database, uncompress it, and put it in the 'data' folder.
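To check that the uncompressed SQLite file is in place and readable before running the extractor, a minimal sketch using Python's standard sqlite3 module can list the tables it contains. The helper name list_tables is hypothetical, and the table names returned depend on the preprocessed file, which is not documented here.

```python
import os
import sqlite3


def list_tables(db_path):
    """Return the sorted table names in an existing SQLite database file."""
    # sqlite3.connect() silently creates a new, empty database if the path
    # does not exist, so check first to catch a missing or misplaced download.
    if not os.path.isfile(db_path):
        raise IOError('SQLite file not found: %s' % db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

For example, print(list_tables('data/panlex.db')) prints the table names, where 'data/panlex.db' is a placeholder for the actual name of the uncompressed file. The code avoids Python 3-only syntax so it runs under both Python 2.7 and Python 3.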
The script accepts three-letter ISO 639-3 language codes. For example, to extract a Spanish (spa) to English (eng) lexicon:
python panlex_bilingual_extract.py --source_language=spa \
    --target_language=eng \
    --output_directory=data/lexicons
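To extract lexicons for one source language and several target languages in a single run, the command above can be driven from Python. This is a minimal sketch that assumes the script and its three flags behave exactly as shown above. The helper names build_command and extract_all and the example codes are illustrative. The check only verifies that a code has the shape of an ISO 639-3 code (three lowercase letters). It does not verify that the code exists or that Panlex has data for it.

```python
import re
import subprocess

# Shape check only: ISO 639-3 codes are exactly three lowercase letters.
# \Z (rather than $) rejects a trailing newline such as 'eng\n'.
CODE_RE = re.compile(r'[a-z]{3}\Z')


def build_command(source, target, output_dir='data/lexicons',
                  script='panlex_bilingual_extract.py', python='python'):
    """Build the argument list for one source/target language pair."""
    for code in (source, target):
        if not CODE_RE.match(code):
            raise ValueError('not a three-letter ISO 639-3 code: %r' % code)
    return [python, script,
            '--source_language=' + source,
            '--target_language=' + target,
            '--output_directory=' + output_dir]


def extract_all(source, targets, output_dir='data/lexicons'):
    """Run the extractor once per target language.

    subprocess.check_call raises CalledProcessError on a non-zero exit
    status, so the loop stops at the first failing language pair.
    """
    for target in targets:
        subprocess.check_call(build_command(source, target, output_dir))
```

For example, extract_all('spa', ['eng', 'fra']) would run the command once for Spanish-English and once for Spanish-French. Passing an argument list instead of a single shell string avoids shell quoting and line-continuation problems.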
If you find the lexicon extractor useful, please cite the following paper: Embracing non-traditional linguistic resources for low-resource language name tagging
@inproceedings{zhang2017embracing,
title={Embracing non-traditional linguistic resources for low-resource language name tagging},
author={Zhang, Boliang and Lu, Di and Pan, Xiaoman and Lin, Ying and Abudukelimu, Halidanmu and Ji, Heng and Knight, Kevin},
booktitle={Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
volume={1},
pages={362--372},
year={2017}
}
Contact: Di Lu, lud2@rpi.edu