A library to extract bilingual lexicons from Panlex Database

dylandilu/Panlex-Lexicon-Extractor

Panlex-Lexicon-Extractor

This script extracts a bilingual lexicon for a pair of languages from the Panlex database released by Panlex.

Extracted Lexicons (English-{target})

Download extracted lexicons for 38 languages.

Usage

The code is written in Python 2.7.

Download the required Panlex language information file and put it in the 'data' folder.

Download the preprocessed SQLite file of the Panlex database, uncompress it, and put the uncompressed file in the 'data' folder.
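After uncompressing, it can be useful to confirm that the SQLite file opens and contains tables before running the extractor. The sketch below is a generic check and not part of this repository: `list_tables` is a hypothetical helper, and the file name in the usage comment is an assumption, so substitute the name of the file you actually placed in 'data'. It makes no assumption about the Panlex schema and only reads SQLite's built-in `sqlite_master` catalog.

```python
import sqlite3


def list_tables(db_path):
    """Return the sorted names of all tables in the SQLite file at db_path.

    Note: sqlite3.connect creates an empty database if db_path does not
    exist, so an empty result usually means the path is wrong.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Example call (hypothetical file name, adjust to your download):
#   print(list_tables("data/panlex.sqlite"))
```

An empty list suggests the file was not uncompressed correctly or the path points somewhere else.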

The script accepts three-letter ISO 639-3 language codes.

python panlex_bilingual_extract.py --source_language=spa \
    --target_language=eng \
    --output_directory=data/lexicons
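To extract lexicons for several language pairs, the command above can be assembled programmatically. The following is a minimal sketch, not part of this repository: `build_command` is a hypothetical helper that uses only the three flags shown above. Its regular expression checks that a code has the shape of three lowercase letters, not that it is a registered ISO 639-3 code. It also assumes that `python` on your PATH is a Python 2.7 interpreter.

```python
import re

# Shape check only: three lowercase ASCII letters.
ISO_639_3_SHAPE = re.compile(r"[a-z]{3}\Z")


def build_command(source, target, output_directory="data/lexicons"):
    """Build the argument list for one run of panlex_bilingual_extract.py."""
    for code in (source, target):
        if not ISO_639_3_SHAPE.match(code):
            raise ValueError("not a three-letter language code: %r" % (code,))
    return [
        "python",  # assumed to resolve to Python 2.7
        "panlex_bilingual_extract.py",
        "--source_language=" + source,
        "--target_language=" + target,
        "--output_directory=" + output_directory,
    ]


# Example: run the extractor for two source languages against English.
#   import subprocess
#   for src in ("spa", "fra"):
#       subprocess.check_call(build_command(src, "eng"))
```

Passing the arguments as a list avoids shell quoting issues, and `subprocess.check_call` raises an error if a run exits with a non-zero status.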

Citation

If you find the lexicon extractor useful, please cite the following paper: Embracing non-traditional linguistic resources for low-resource language name tagging

@inproceedings{zhang2017embracing,
  title={Embracing non-traditional linguistic resources for low-resource language name tagging},
  author={Zhang, Boliang and Lu, Di and Pan, Xiaoman and Lin, Ying and Abudukelimu, Halidanmu and Ji, Heng and Knight, Kevin},
  booktitle={Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
  volume={1},
  pages={362--372},
  year={2017}
}

Contact: Di Lu, lud2@rpi.edu
