NTU-MC

This is a legacy repository for the STB subcorpora of the Nanyang Technological University - Multilingual Corpus (NTU-MC) project. New editions of NTU-MC are maintained by the NTU Computational Linguistics Lab.

Spin-offs

  • NTU-MC Toolkit: An annotation toolkit for multilingual text (supports Arabic, Chinese, Japanese, Korean, Indonesian, Vietnamese and English)
  • GaChalign: A Python implementation of the Gale-Church sentence-level aligner with variable parameters
  • Mini-segmenter: A dictionary-based Chinese segmenter
  • Indotag: An implementation of the Pisceldo et al. (2010) Bahasa Indonesia part-of-speech tagger, using the 1M-word corpus from the Pan Asia Networking Localization Project

Changelog

  • NTU-MC v5.1 (26.08.14): Added NTU-MC Toolkit
  • NTU-MC v5.0 (29.04.13): Better cleaning with titles
  • NTU-MC v4.1 (08.04.13): Scheduled release
  • NTU-MC v4.0 (27.01.13): Re-cleaned and retagged from scratch
  • NTU-MC v3.0 (01.05.12): Scheduled release for IJALP
  • NTU-MC v2.0 (20.08.11): Cleaned and sentence-aligned
  • NTU-MC v1.0 (01.05.11): Foundation text

References

Please cite the following when using the data/scripts from the NTU-MC:

@inproceedings{ntumc2011,
  author    = {Liling Tan and
               Francis Bond},
  title     = {Building and Annotating the Linguistically Diverse NTU-MC
               (NTU-Multilingual Corpus)},
  booktitle = {PACLIC},
  year      = {2011},
  pages     = {362--371},
  ee        = {http://www.aclweb.org/anthology/Y11-1038},
}

Other References:
