
antonioribeiro/urban-dictionary-entry-collector

 
 


Urban Dictionary Data Collector

Script used to download the entire Urban Dictionary dataset. The actual dataset is pretty large, so I've split it into four Google Fusion Tables.

Downloading the Data Yourself

If you want to collect your own sample from Urban Dictionary, this repo includes a few scripts that can help you do just that.

download.js

Main entry downloader. Requires a word list to download entries for. Try grabbing the one from here.

$ npm install

# Pass in a word list file
$ node download.js data/a.txt

This will attempt to download the first 10 definitions for each word in the list into a file data/a.txt. Data is stored in NeDB databases, but you should be able to easily update download.js to output whatever format you need.
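
As a rough illustration of the flow described above (read a word list, attempt each word, keep at most the first 10 definitions), here is a minimal Python sketch. It is not the code in download.js: `read_word_list`, `collect_entries`, and the `fetch_definitions` callable are hypothetical names, and the output is written as JSON lines rather than through NeDB.

```python
import json


def read_word_list(path):
    """Return the non-empty, stripped lines of a word list file."""
    with open(path, encoding="utf-8") as word_file:
        return [line.strip() for line in word_file if line.strip()]


def collect_entries(words, fetch_definitions, out_path, limit=10):
    """Append up to `limit` definitions per word to `out_path` as JSON lines.

    `fetch_definitions` is a hypothetical callable (assumption): it takes a
    word and returns a list of dicts. Words whose fetch raises are skipped,
    which mirrors the "attempt to download" wording above.
    Returns (number of entries written, list of words that failed).
    """
    written = 0
    failed = []
    with open(out_path, "a", encoding="utf-8") as out:
        for word in words:
            try:
                definitions = fetch_definitions(word)
            except Exception:
                failed.append(word)
                continue
            for entry in definitions[:limit]:
                out.write(json.dumps(entry) + "\n")
                written += 1
    return written, failed
```

Passing the fetch function in as a parameter keeps the sketch independent of any particular endpoint or HTTP library, and makes it easy to try out with a stub before pointing it at a real source.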

gen_csv.py

Simple Python script that converts the NeDB dataset produced by download.js into CSV:

$ python3 gen_csv.py data.db out.csv
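
For readers who want to adapt the conversion, the sketch below shows one way such a step could work. It is not the contents of gen_csv.py. It relies on NeDB's persistent datafile being append-only with one JSON document per line, where a later line with the same `_id` supersedes an earlier one and `$$deleted` / `$$indexCreated` lines are bookkeeping. The function name `nedb_to_csv` and the entry field names in the usage note are assumptions.

```python
import csv
import json


def nedb_to_csv(db_path, csv_path):
    """Convert a NeDB datafile (one JSON document per line) to CSV.

    Returns the number of documents written.
    """
    docs = {}
    with open(db_path, encoding="utf-8") as db_file:
        for line in db_file:
            line = line.strip()
            if not line:
                continue
            doc = json.loads(line)
            # Index bookkeeping lines are not documents.
            if "$$indexCreated" in doc or "$$indexRemoved" in doc:
                continue
            doc_id = doc.get("_id")
            if doc.get("$$deleted"):
                docs.pop(doc_id, None)  # tombstone: drop the earlier version
            else:
                docs[doc_id] = doc  # later lines override earlier ones

    # Use the union of field names, in first-seen order, as CSV columns.
    fields = []
    for doc in docs.values():
        for key in doc:
            if key not in fields:
                fields.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=fields)
        writer.writeheader()
        writer.writerows(docs.values())
    return len(docs)
```

Calling `nedb_to_csv("data.db", "out.csv")` would write one row per live document. Documents missing a column get an empty cell, and nested values are written using their Python string form.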

gen_md.js

Simple JavaScript script that generates Markdown for entries. Used for character-level machine learning on Urban Dictionary entries.

$ node gen_md.js data.db urban.md
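
The exact Markdown layout that gen_md.js emits is not described here, so the following Python sketch only illustrates the general idea of rendering one entry as a Markdown fragment. The function name `entry_to_markdown`, the field names `word`, `definition`, and `example`, and the heading-plus-blockquote layout are all assumptions.

```python
def entry_to_markdown(entry):
    """Render one entry dict as a Markdown fragment (hypothetical layout).

    Assumed fields: "word", "definition", and an optional "example".
    """
    word = entry.get("word", "").strip()
    definition = entry.get("definition", "").strip()
    example = entry.get("example", "").strip()

    parts = ["# " + word, "", definition]
    if example:
        # Quote every line of the example so multi-line examples stay together.
        parts += ["", "> " + example.replace("\n", "\n> ")]
    return "\n".join(parts) + "\n"
```

Joining the fragments for every entry into a single file would give one continuous text stream, which is the kind of input a character-level model typically trains on.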

Notes

This is for research purposes. I'm not affiliated with Urban Dictionary.
