Urban Dict spelling variant dataset. Source code of How to Evaluate Word Representations of Informal Domain?


Discovering spelling variants on Urban Dictionary

Source code for the paper "How to Evaluate Word Representations of Informal Domain?"

Scraping data from Urban Dictionary 🎍

  • Scraping data from the web pages:
+ scrapy crawl UD
  • Scraping data via the API:
+ scrapy crawl UD_API

Bootstrapping algorithms

UD_Extractor/

Self-training-based CRF tagging

SeqLabeling/
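
As a rough illustration of what self-training means in general, the sketch below shows a generic self-training loop. It is not taken from `SeqLabeling/`: the function name `self_train`, the `fit` and `predict` callables, the confidence `threshold`, and `max_rounds` are all hypothetical, and a real setup would plug in a CRF tagger that reports a sequence-level confidence.

```python
from typing import Any, Callable, List, Sequence, Tuple

Tokens = Sequence[str]
Tagged = Tuple[Tokens, List[str]]  # (tokens, one tag per token)


def self_train(
    labeled: Sequence[Tagged],
    unlabeled: Sequence[Tokens],
    fit: Callable[[List[Tagged]], Any],
    predict: Callable[[Any, Tokens], Tuple[List[str], float]],
    threshold: float = 0.9,
    max_rounds: int = 5,
) -> Tuple[Any, List[Tagged]]:
    """Generic self-training loop (hypothetical sketch, not the repo's code).

    `fit` trains a model on tagged sequences; `predict` returns the predicted
    tags and a confidence score in [0, 1] for one token sequence.
    """
    train_set = list(labeled)
    pool = list(unlabeled)
    model = fit(train_set)

    for _ in range(max_rounds):
        confident: List[Tagged] = []
        remaining: List[Tokens] = []
        for tokens in pool:
            tags, confidence = predict(model, tokens)
            if confidence >= threshold:
                # Treat a confident prediction as a pseudo-label.
                confident.append((tokens, tags))
            else:
                remaining.append(tokens)

        if not confident:
            break  # nothing new to learn from; stop early

        train_set.extend(confident)
        pool = remaining
        model = fit(train_set)  # retrain on gold plus pseudo-labeled data

    return model, train_set
```

The threshold trades coverage for noise: a lower value adds more pseudo-labeled sequences per round but makes it more likely that tagging errors are fed back into training.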

Embedding pretraining with Tweets

Train Word2Vec, FastText, and GloVe embeddings on tweet data: `trainEmbedding/`

Twitter hashtag prediction task using pretrained embedding

Use Twitter hashtag prediction as the downstream task for extrinsic evaluation of the pretrained informal word vectors above: HashtagPrediction/

Analysis

Use Mean Average Precision (MAP) as the intrinsic evaluation metric on the word analogy task, then compare the correlations between the intrinsic and extrinsic tasks: calcSim/
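
For reference, the standard definition of MAP can be sketched as below. This is a minimal illustration, not the code in `calcSim`: the function names, the dictionary layout, and the toy word pairs are assumptions, and each ranked list is assumed to contain no duplicate candidates.

```python
from typing import Dict, List, Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked candidate list against the gold set."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[str, List[str]], gold: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all gold queries."""
    if not gold:
        return 0.0
    total = sum(
        average_precision(rankings.get(query, []), answers)
        for query, answers in gold.items()
    )
    return total / len(gold)


# Toy data (hypothetical): candidates ranked by embedding similarity.
rankings = {
    "u": ["you", "your", "ewe"],
    "2moro": ["today", "tomorrow"],
}
gold = {"u": {"you"}, "2moro": {"tomorrow"}}

score = mean_average_precision(rankings, gold)
```

In this toy case the first query finds its answer at rank 1 and the second at rank 2, so the per-query values are 1.0 and 0.5. Queries missing from `rankings` count as zero rather than being skipped, which keeps scores comparable across embeddings with different vocabularies.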

Web interface

Informal word pair search tool, written in Flask: demo/
