NLP for Chinese
README.md
dictClassicalModernCharacters.csv
listKaomoji.csv
listPopularExpressions.csv
normTextExtractFeatures.R
segmentTagWeiboPosts.R

NLP for Chinese

===

These scripts and resources were used for the Weibo age-profiling task reported at the Language Resources & Evaluation Conference 2016 (Zhang, Caines, Alikaniotis & Buttery, 'Predicting author age from Weibo microblog posts').

R scripts

normTextExtractFeatures.R
  • normalises Weibo posts, extracting linguistic and non-linguistic features in the process;
  • requires previously obtained Weibo data: ours were Excel files with one row per user and one column per post;
  • requires the resources listed below;
  • look for 'CHECK PATHS' comments, which mark the filepaths you should adapt to your own filesystem.
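
To give a feel for what "normalising while extracting features" can look like, here is a minimal base-R sketch. It is not the code in normTextExtractFeatures.R: the function names `count_matches` and `normalise_post`, the regular expressions, and the choice of features (URLs, @mentions, #topic# hashtags, kaomoji) are all assumptions made for illustration.

```r
# Hypothetical helper: count how often a pattern occurs in one string.
count_matches <- function(pattern, text, fixed = FALSE) {
  hits <- gregexpr(pattern, text, perl = !fixed, fixed = fixed)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

# Hypothetical sketch: strip non-linguistic material from a single post
# and record how much of each kind was found along the way.
normalise_post <- function(post, kaomoji = character(0)) {
  features <- list()

  # URLs: count them, then remove them (assumed pattern).
  url_pattern <- "https?://[A-Za-z0-9./?=_%&#-]+"
  features$n_urls <- count_matches(url_pattern, post)
  post <- gsub(url_pattern, "", post, perl = TRUE)

  # @mentions: count them, then remove them with any trailing colon.
  mention_pattern <- "@[^[:space:]@:]+:?"
  features$n_mentions <- count_matches(mention_pattern, post)
  post <- gsub(mention_pattern, "", post, perl = TRUE)

  # Weibo hashtags are wrapped in a pair of hashes (#topic#):
  # count them and keep only the topic text.
  hashtag_pattern <- "#([^#]+)#"
  features$n_hashtags <- count_matches(hashtag_pattern, post)
  post <- gsub(hashtag_pattern, "\\1", post, perl = TRUE)

  # Kaomoji: literal matches against a supplied list
  # (for instance, one read from listKaomoji.csv).
  n_kaomoji <- 0L
  for (k in kaomoji) {
    n_kaomoji <- n_kaomoji + count_matches(k, post, fixed = TRUE)
    post <- gsub(k, "", post, fixed = TRUE)
  }
  features$n_kaomoji <- n_kaomoji

  # Collapse the whitespace left behind by the removals.
  post <- trimws(gsub("[[:space:]]+", " ", post))

  list(text = post, features = features)
}

result <- normalise_post(
  "@alice: check this #topic# http://t.cn/abc ^_^ hello",
  kaomoji = c("^_^")
)
```

Returning the cleaned text and the feature counts together keeps the two in step, so each removal is counted exactly once.
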
segmentTagWeiboPosts.R
  • passes the normalised texts to the Stanford NLP word segmenter and part-of-speech tagger;
  • requires the (free) Stanford NLP segmenter and POS tagger, which can be downloaded from the Stanford NLP Group;
  • look for 'CHECK PATHS' comments, which mark the filepaths you should adapt to your own filesystem.
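
As a rough illustration of handing a file of normalised posts to the segmenter from R, the sketch below calls the `segment.sh` wrapper script that ships with the Stanford segmenter download. It is not the code in segmentTagWeiboPosts.R. The function names are hypothetical, `segmenter_dir` is a placeholder for wherever you unpacked the download, and the argument order (model, file, encoding, k-best size) is assumed from the usage notes bundled with the segmenter, so check it against your version.

```r
# Hypothetical helper: build the arguments for segment.sh.
# Assumed order: model ("ctb" or "pku"), input file, encoding, k-best size.
build_segmenter_args <- function(input_file, model = "ctb",
                                 encoding = "UTF-8", n_best = 0) {
  c(model, input_file, encoding, as.character(n_best))
}

# Hypothetical wrapper: run the segmenter on one file and write the
# segmented text to output_file. segmenter_dir is a placeholder path.
segment_file <- function(input_file, segmenter_dir, output_file) {
  script <- file.path(segmenter_dir, "segment.sh")
  if (!file.exists(script)) {
    stop("segment.sh not found under ", segmenter_dir)
  }
  args <- build_segmenter_args(shQuote(input_file))
  # stdout = <file name> redirects the segmenter's output into that file.
  system2(script, args = args, stdout = output_file)
}

segmenter_args <- build_segmenter_args("posts.txt")
```

Building the argument vector separately from the `system2()` call makes it easy to inspect the command before running it on a large batch of posts.
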

Resources