
TwitterNER

Twitter named entity extraction for the WNUT 2016 shared task (http://noisy-text.github.io/2016/ner-shared-task.html) and the corresponding workshop paper at WNUT, COLING 2016: "Semi-supervised Named Entity Recognition in noisy-text" by Shubhanshu Mishra and Jana Diesner.

Please cite as:

@inproceedings{mishra-diesner-2016-semi,
    title = "Semi-supervised Named Entity Recognition in noisy-text",
    author = "Mishra, Shubhanshu  and
      Diesner, Jana",
    booktitle = "Proceedings of the 2nd Workshop on Noisy User-generated Text ({WNUT})",
    month = dec,
    year = "2016",
    address = "Osaka, Japan",
    publisher = "The COLING 2016 Organizing Committee",
    url = "https://aclanthology.org/W16-3927",
    pages = "203--212",
}

[Model architecture diagram]

Installation

pip install -r requirements.txt
cd data
wget http://nlp.stanford.edu/data/glove.twitter.27B.zip
unzip glove.twitter.27B.zip
cd ..
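
The archive unpacks into several plain-text embedding files (e.g. glove.twitter.27B.50d.txt). If you want to inspect the vectors outside this repo's own loaders, here is a minimal sketch for reading one file into a dict, assuming the standard GloVe text format of one token followed by its vector components per line:

import numpy as np

def load_glove(path):
    # Each line: <token> <v1> <v2> ... <vN>, space-separated.
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

vectors = load_glove("data/glove.twitter.27B.50d.txt")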

Usage

$ cd NoisyNLP
$ python
>>> from run_ner import TwitterNER
>>> from twokenize import tokenizeRawTweetText
>>> ner = TwitterNER()
>>> tweet = "Beautiful day in Chicago! Nice to get away from the Florida heat."
>>> tokens = tokenizeRawTweetText(tweet)
>>> ner.get_entities(tokens)
[(3, 4, 'LOCATION'), (11, 12, 'LOCATION')]
>>> " ".join(tokens[3:4])
'Chicago'
>>> " ".join(tokens[11:12])
'Florida'
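
The spans returned by get_entities are (start, end, label) token ranges with an exclusive end, as the slices above show. A small helper (hypothetical, not part of the package) can map them to surface strings:

>>> def extract_entities(tokens, spans):
...     # spans are (start, end, label) tuples; end is exclusive
...     return [(" ".join(tokens[start:end]), label) for start, end, label in spans]
...
>>> extract_entities(tokens, ner.get_entities(tokens))
[('Chicago', 'LOCATION'), ('Florida', 'LOCATION')]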

Data download

The dataset used in this repository can be downloaded from https://github.com/aritter/twitter_nlp/tree/master/data/annotated/wnut16
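
One way to fetch the annotated files is to clone that repository and copy the wnut16 directory into data/ (the data/ target is an assumption based on the installation step above; adjust it to wherever the notebooks expect the files):

git clone https://github.com/aritter/twitter_nlp.git
cp -r twitter_nlp/data/annotated/wnut16 data/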

Submitted Solution [ST]

See Word2Vec.ipynb for details on the original submitted solution for the task.

Improved model

See Run Experiments.ipynb for details on the improved system, and Run Experiment.ipynb for details on the improved system evaluated on the test data.

Using the API

The final system is packaged as an API in the NoisyNLP folder. More updates will be made to the API in the coming days. See Run Experiment.ipynb for API usage.

Downloading Gazetteers

See Updated Gazetteers.ipynb, Extra Gazetteers.ipynb, and Download Wikidata.ipynb

Generating word clusters

See Gen new clusters.ipynb

Data Pre-processing

See Data preprocessing.ipynb

Preliminary comparison with RNN models

See KerasCharRNN.ipynb and KerasWordRNN.ipynb

Acknowledgements

  • George Cooper - making the model available as a Python library.