Quinn: Complex Word Identification using Neural Networks (CWI-NN)

aayux/quinn

Information on the task is available on the CWI Shared Task 2018 website.

About the Model

A bidirectional RNN (GRU) with "masked" soft attention, written in TensorFlow. The attention mask is derived from the annotator-specified context (see the dataset).
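The masked attention step can be illustrated with a small pure-Python sketch. It is not taken from the repository, which uses TensorFlow. It assumes dot-product scoring against a single learned vector `score_vec`, and it uses a hypothetical helper, `mask_from_offsets`, that keeps the tokens overlapping a character span `[start, end)`, such as a target span given by character offsets. Each row of `hidden` stands in for one BiGRU output, i.e. the concatenated forward and backward states at one time step.

```python
import math
from typing import List, Sequence, Tuple


def mask_from_offsets(token_spans: Sequence[Tuple[int, int]],
                      start: int, end: int) -> List[int]:
    """Hypothetical helper: 1 for tokens overlapping the character span [start, end), else 0."""
    return [1 if tok_start < end and tok_end > start else 0
            for tok_start, tok_end in token_spans]


def masked_soft_attention(hidden: Sequence[Sequence[float]],
                          score_vec: Sequence[float],
                          mask: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) computed over unmasked time steps only."""
    if len(hidden) != len(mask):
        raise ValueError("hidden and mask must have the same length")
    if not any(mask):
        raise ValueError("mask must keep at least one time step")
    # Unnormalised score per time step (assumed dot-product scoring): e_t = w . h_t
    scores = [sum(w * h for w, h in zip(score_vec, h_t)) for h_t in hidden]
    # Masked-out steps get -inf, so math.exp maps them to exactly 0.0
    masked = [s if m else -math.inf for s, m in zip(scores, mask)]
    top = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in masked]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the hidden states
    dim = len(hidden[0])
    context = [sum(a * h_t[j] for a, h_t in zip(weights, hidden)) for j in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Toy sentence "the cat sat" with the character span of each token
    spans = [(0, 3), (4, 7), (8, 11)]
    mask = mask_from_offsets(spans, 4, 11)  # span covers "cat sat"
    hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-ins for BiGRU outputs
    weights, context = masked_soft_attention(hidden, [1.0, 1.0], mask)
    print(mask, weights, context)
```

Masked positions receive a score of -inf, so after the softmax they get exactly zero weight, while the weights of the remaining positions still sum to one.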

Steps

  • Download the Shared Task dataset from the website.

  • Download the GloVe vectors from the GloVe project page and copy them into the respective directories (the command below expects ./data/embeddings/glove.6B.300d.txt).

  • Generate the embedding matrix and vocabulary with:

python utils/generate_embeddings.py -d ./data/embeddings/glove.6B.300d.txt --npy_output ./data/dumps/embeddings.npy --dict_output ./data/dumps/vocab.pckl --dict_whitelist ./data/embeddings/vocab.txt

  • Train with python train.py

  • Test with python test.py --ckptdir=<checkpoint directory> --tsvfile=<test file>. For example: python test.py --ckptdir=1528996355 --dataset=WikiNews_Test
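The embedding-generation step can be sketched as follows. This is a hypothetical reconstruction, not the contents of utils/generate_embeddings.py. It reads GloVe's text format (a word followed by its space-separated vector components), keeps only whitelisted words, and reserves row 0 as an all-zero padding/unknown vector; that reservation is an assumption. Judging by the flag names, the real script saves the matrix as a NumPy .npy file (--npy_output) and the vocabulary as a pickle (--dict_output). This sketch returns a plain dict and nested lists instead.

```python
import io
from typing import Dict, Iterable, List, Optional, Set, Tuple


def load_glove_subset(lines: Iterable[str],
                      whitelist: Optional[Set[str]] = None
                      ) -> Tuple[Dict[str, int], List[List[float]]]:
    """Build a word -> row-index vocabulary and an embedding matrix from GloVe text lines.

    Assumption: row 0 is reserved as an all-zero padding/unknown vector.
    """
    vocab: Dict[str, int] = {}
    matrix: List[List[float]] = []
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        word, values = parts[0], parts[1:]
        # Skip empty lines and, if a whitelist is given, words outside it
        if not values or (whitelist is not None and word not in whitelist):
            continue
        if not matrix:
            matrix.append([0.0] * len(values))  # reserved row 0
        elif len(values) != len(matrix[0]):
            raise ValueError(f"inconsistent vector size for {word!r}")
        vocab[word] = len(matrix)  # index of the row about to be appended
        matrix.append([float(v) for v in values])
    return vocab, matrix


if __name__ == "__main__":
    # A three-word, 2-d stand-in for glove.6B.300d.txt
    sample = io.StringIO("the 0.1 0.2\ncat 0.3 0.4\nzebra 0.5 0.6\n")
    vocab, matrix = load_glove_subset(sample, whitelist={"the", "cat"})
    print(vocab, len(matrix))
```

Filtering against a whitelist while reading keeps only the rows the model can use, which matters because the full GloVe file is large while the task vocabulary is comparatively small.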

References

J. Pennington, R. Socher and C. D. Manning, GloVe: Global Vectors for Word Representation, 2014.

N. S. Hartmann and L. B. dos Santos, NILC at CWI 2018: Exploring Feature Engineering and Feature Learning, 2018.

N. Gillin, Sensible at SemEval-2016 Task 11: Neural Nonsense Mangled in Ensemble Mess, 2016.

Sources

Embeddings helper: rampage644/qrnn


License: MIT