NTU-MC

This is a legacy repository for the STB subcorpora of the Nanyang Technological University - Multilingual Corpus (NTU-MC) project. New editions of NTU-MC are maintained by the NTU Computational Linguistics Lab.

Spin-offs

NTU-MC Toolkit: An annotation toolkit for multilingual text (supports Arabic, Chinese, Japanese, Korean, Indonesian, Vietnamese and English).
GaChalign: A Python implementation of the Gale-Church sentence-level aligner with variable parameters.
Mini-segmenter: A dictionary-based Chinese segmenter.
Indotag: An implementation of the Pisceldo et al. (2010) Bahasa Indonesian part-of-speech tagger, using the 1M-word corpus from the Pan Asia Networking Localization Project.

Changelog

NTU-MC v5.1 (26.08.14): Added NTU-MC Toolkit
NTU-MC v5.0 (29.04.13): Better cleaning with titles
NTU-MC v4.1 (08.04.13): Scheduled release.
NTU-MC v4.0 (27.01.13): Re-cleaned and retagged from scratch.
NTU-MC v3.0 (01.05.12): Scheduled release for IJALP
NTU-MC v2.0 (20.08.11): Cleaned and sentence aligned.
NTU-MC v1.0 (01.05.11): Foundation text.

References

Please cite the following when using the data/scripts from the NTU-MC:

@inproceedings{ntumc2011,
  author    = {Liling Tan and
               Francis Bond},
  title     = {Building and Annotating the Linguistically Diverse NTU-MC
               (NTU-Multilingual Corpus)},
  booktitle = {PACLIC},
  year      = {2011},
  pages     = {362-371},
  ee        = {http://www.aclweb.org/anthology/Y11-1038},
}

Liling Tan. 2011. Building the foundation text for Nanyang Technological University - Multilingual Corpus (NTU-MC). Bachelor Final Year Project. Nanyang Technological University: Singapore.
Liling Tan and Francis Bond. 2012. Building and annotating the linguistically diverse NTU-MC (NTU-multilingual corpus). International Journal of Asian Language Processing, 22(4):161–174.
Liling Tan and Francis Bond. 2014. NTU-MC Toolkit: Annotating a Linguistically Diverse Corpus. In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014). Dublin, Ireland.

Other References:
