
This is the distribution point for the NUS SMS Corpus, as described and updated from … This is a corpus of SMS (Short Message Service) messages collected for research at the Department of Computer Science at the National University of Singapore. This dataset consists of 67,093 SMS messages taken from the corpus on Mar 9, 2015. The messages largely …


WING-NUS/nus-sms-corpus

 
 


NUS SMS Corpus

Due to some technical problems, the NUS SMS Corpus website http://wing.comp.nus.edu.sg/SMSCorpus is temporarily unavailable. For your convenience, we have uploaded the most recent release (Mar 9, 2015) of the corpus here.

Please cite the following paper if you use our corpus. Thanks!

Tao Chen and Min-Yen Kan (2013). Creating a Live, Public Short Message Service Corpus: The NUS SMS Corpus. Language Resources and Evaluation, 47(2), pages 299-355.

Please do us a favor and send a quick message to Tao Chen (chentaokite @ gmail dot com) if you download this corpus and plan on using it. It will only take a minute of your time and will help us get a better idea of what such a corpus might be used for.

| Language | File Format | Size | Number of Messages |
|---|---|---|---|
| English | SQL | 2,045K | 55,835 |
| English | XML | 2,359K | 55,835 |
| English | JSON | 2,740K | 55,835 |
| Chinese | SQL | 979K | 31,465 |
| Chinese | XML | 1,182K | 31,465 |
| Chinese | JSON | 1,700K | 31,465 |
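The table gives the expected number of messages for each file, so one quick sanity check after downloading is to load a file and count its records. The Python sketch below does this for the JSON exports. It rests on an assumption: that the messages are stored as a single list of JSON objects somewhere inside the file. Because the exact JSON layout is not described here, the sketch searches for the longest such list instead of hard-coding key names. The helper names (`find_largest_record_list`, `count_messages`, `check_download`) and the file path in the usage line are hypothetical.

```python
import json
from typing import Any

# Expected message counts per language, copied from the table above.
EXPECTED_COUNTS = {"English": 55_835, "Chinese": 31_465}


def find_largest_record_list(node: Any) -> list:
    """Return the longest list of JSON objects found anywhere in `node`.

    ASSUMPTION: the messages are stored as one list of objects somewhere in
    the file. The key names are not documented here, so we search for that
    list with an iterative depth-first walk instead of hard-coding a path.
    """
    best: list = []
    stack = [node]
    while stack:
        current = stack.pop()
        if isinstance(current, dict):
            # Descend into every value of a JSON object.
            stack.extend(current.values())
        elif isinstance(current, list):
            # Keep this list if it is the longest list made only of objects so far.
            if len(current) > len(best) and all(isinstance(x, dict) for x in current):
                best = current
            stack.extend(current)
    return best


def count_messages(path: str) -> int:
    """Load a JSON export and count the records in its largest object list."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    return len(find_largest_record_list(data))


def check_download(path: str, language: str) -> bool:
    """Compare the counted records against the expected count for `language`."""
    return count_messages(path) == EXPECTED_COUNTS[language]
```

For example, `check_download("path/to/english_export.json", "English")` (hypothetical path) returns `True` only if the file holds exactly 55,835 records under the assumption above. If it returns `False`, inspect the file's structure before concluding the download is incomplete, since the layout may differ from what the sketch assumes.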

Our dataset has been added to Kaggle! Please consider participating in a competition!

Group Members
