Discovering spelling variants on Urban Dictionary
Source code for the paper "How to Evaluate Word Representations of Informal Domain?"
- Scraping data from webpage:
+ scrapy crawl UD
- Scraping data via API:
+ scrapy crawl UD_API
Self-training-based CRF tagging
Embedding pretraining with Tweets
Train Word2Vec, FastText, and GloVe embeddings on tweet data. See `trainEmbedding/`.
Twitter hashtag prediction task using pretrained embedding
Use Twitter hashtag prediction as the downstream task for extrinsic evaluation of the pretrained informal word vectors above.
Use Mean Average Precision (MAP) as the intrinsic evaluation metric on the word analogy task. Compare the correlation between the intrinsic and extrinsic task results.
Informal word pair search tool, written in Flask: