dylandilu/Panlex-Lexicon-Extractor
Panlex-Lexicon-Extractor

This script extracts a bilingual lexicon for a pair of languages from the Panlex Database released by Panlex.

Extracted Lexicons (English-{target})

Download extracted lexicons for 38 languages.

Usage

The code is written in Python 2.7.

Download the required Panlex language information file and put it in the 'data' folder.

Download the preprocessed SQLite file of the Panlex database, uncompress it, and put the resulting file in the 'data' folder.
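Before running the extractor, it can help to confirm that the uncompressed database file is in place and opens as SQLite. The following is a minimal sketch using only the standard library. The file name `panlex.sqlite` is a placeholder assumption, not the actual name of the download, and the helper `list_tables` is hypothetical and not part of this repository.

```python
import os
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at `db_path`.

    Raises IOError if the file is missing. sqlite3.connect would otherwise
    silently create a new empty database at that path.
    """
    if not os.path.isfile(db_path):
        raise IOError("No such file: %s" % db_path)
    connection = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in schema catalogue.
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Placeholder path: substitute the real name of the uncompressed file.
    path = os.path.join("data", "panlex.sqlite")
    if os.path.isfile(path):
        print(list_tables(path))
    else:
        print("Database file not found at %s" % path)
```

If the file is present but is not a valid SQLite database (for example, it is still compressed), the query raises `sqlite3.DatabaseError`.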

The script accepts three-letter ISO 639-3 language codes.
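ISO 639-3 identifiers consist of three lowercase letters, such as `spa` or `eng`. As a rough illustration, the sketch below checks that a string has that shape before it is passed to the script. The helper `looks_like_iso639_3` is hypothetical and not part of this repository. It checks form only, and does not confirm that a code is assigned in ISO 639-3 or covered by Panlex.

```python
import re

# Three lowercase ASCII letters, anchored at both ends of the string.
_ISO_639_3_SHAPE = re.compile(r"[a-z]{3}\Z")


def looks_like_iso639_3(code):
    """Return True if `code` has the shape of an ISO 639-3 identifier."""
    return bool(_ISO_639_3_SHAPE.match(code))


if __name__ == "__main__":
    # Two-letter ISO 639-1 codes such as "es" are rejected by this check.
    for candidate in ["spa", "eng", "es", "SPA", "span"]:
        print("%s -> %s" % (candidate, looks_like_iso639_3(candidate)))
```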

python panlex_bilingual_extract.py --source_language=spa \
                                   --target_language=eng \
                                   --output_directory=data/lexicons
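To extract lexicons for several language pairs in one go, the command above can be wrapped in a small driver. This is a sketch only: the script name and the three flags are taken from the usage line above, while `build_command`, `extract_pairs`, and the `fra` pair are illustrative assumptions. By default it prints the commands instead of running them.

```python
import subprocess

SCRIPT = "panlex_bilingual_extract.py"


def build_command(source, target, output_directory, interpreter="python"):
    """Build the argument list for one run of the extractor."""
    return [
        interpreter,
        SCRIPT,
        "--source_language=%s" % source,
        "--target_language=%s" % target,
        "--output_directory=%s" % output_directory,
    ]


def extract_pairs(pairs, output_directory, dry_run=True):
    """Run (or, with dry_run=True, only print) the extractor for each pair."""
    commands = [build_command(s, t, output_directory) for s, t in pairs]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check_call raises CalledProcessError on a non-zero exit status.
            subprocess.check_call(command)
    return commands


if __name__ == "__main__":
    # Illustrative pairs; "fra" is just an example ISO 639-3 code.
    extract_pairs([("spa", "eng"), ("fra", "eng")], "data/lexicons")
```

Passing `dry_run=False` executes each command in turn from the repository root, where the script and the 'data' folder are expected to live.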

Citation

If you find the lexicon extractor useful, please cite the following paper: Embracing non-traditional linguistic resources for low-resource language name tagging

@inproceedings{zhang2017embracing,
  title={Embracing non-traditional linguistic resources for low-resource language name tagging},
  author={Zhang, Boliang and Lu, Di and Pan, Xiaoman and Lin, Ying and Abudukelimu, Halidanmu and Ji, Heng and Knight, Kevin},
  booktitle={Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
  volume={1},
  pages={362--372},
  year={2017}
}

Contact: Di Lu, lud2@rpi.edu
