Xlit-Crowd: Hindi-English Transliteration Corpus

The corpus contains Hindi-English transliteration pairs. The pairs were obtained through crowdsourcing: workers on Amazon Mechanical Turk were asked to transliterate Hindi words into the Roman script, which yielded a total of 14,919 pairs. The repository contains a single data file, crowd_transliterations.hi-en.txt.
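
As a starting point for working with the data, the following is a minimal Python sketch for reading pairs from the repository's data file, crowd_transliterations.hi-en.txt. The file layout is not documented here, so the sketch assumes one pair per line with the Hindi word and its Roman transliteration separated by a tab; check the file and adjust the `sep` argument if that assumption does not hold. The function names `parse_pairs` and `load_pairs` are hypothetical helpers, and the sample lines are illustrative only, not taken from the corpus.

```python
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str], sep: str = "\t") -> List[Tuple[str, str]]:
    """Parse (Hindi, Roman) pairs from an iterable of text lines.

    ASSUMPTION: each line holds two columns separated by `sep`
    (Hindi word first, Roman transliteration second).
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(sep)
        if len(fields) < 2:
            continue  # skip lines that do not match the assumed layout
        pairs.append((fields[0].strip(), fields[1].strip()))
    return pairs


def load_pairs(path: str, sep: str = "\t") -> List[Tuple[str, str]]:
    """Read pairs from a UTF-8 text file using the assumed layout."""
    with open(path, encoding="utf-8") as handle:
        return parse_pairs(handle, sep)


# Illustrative lines only; these are NOT taken from the corpus.
sample = ["नमस्ते\tnamaste\n", "\n", "भारत\tbharat\n"]
print(parse_pairs(sample))
```

With the file downloaded locally, `load_pairs("crowd_transliterations.hi-en.txt")` would return the pairs as a list of tuples if the assumed layout is correct; comparing the length of that list against the 14,919 pairs reported above is a quick sanity check.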

The dataset is described in detail in the following paper. Please cite it if you use this dataset for research:

Mitesh M. Khapra, Ananthakrishnan Ramanathan, Anoop Kunchukuttan, Karthik Visweswariah, Pushpak Bhattacharyya. When Transliteration Met Crowdsourcing: An Empirical Study of Transliteration via Crowdsourcing using Efficient, Non-redundant and Fair Quality Control. Language Resources and Evaluation Conference (LREC 2014). 2014.

License

Xlit-Crowd: Hindi-English Transliteration Corpus by Mitesh Khapra is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.