This project is an adaptation of Guillaume's model for NER. I adapted the model to expose more configurable options and to solve additional tasks; the extra configurations were added to investigate how they impact the model's performance. Documentation, which includes a theoretical description of the model and a comparison of the different model configurations, can be found here.
- Added multitasking: this version solves the POS, chunking, and NER tasks
- Added word2vec: choose between Google's pre-trained word2vec embeddings or Stanford's GloVe embeddings, and compare the resulting accuracy
- Added CNN character embedding (a serious bug persists that needs fixing)
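Multitasking works because each token in the CoNLL-2003 data carries a POS tag, a chunk tag, and an NER tag at once. A minimal sketch of splitting one such line into per-task pairs (the helper name is illustrative, not from the repo):

```python
def parse_conll_line(line):
    """Split a CoNLL-2003 line "word POS chunk NER" into per-task (word, tag) pairs."""
    word, pos, chunk, ner = line.split()
    return {"POS": (word, pos), "CHUNK": (word, chunk), "NER": (word, ner)}

# One token from the CoNLL-2003 shared-task layout.
tags = parse_conll_line("Germany NNP I-NP I-LOC")
```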
- In the terminal, run
$ make wembedding
to download the Stanford GloVe file, and
$ ./word2vec_download_google_model.sh
to download the Google News word2vec file
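Both downloads use the same plain-text layout: one token per line followed by its vector components. A minimal sketch of parsing that layout (function name is hypothetical; the repo's own loader may differ):

```python
import io
import numpy as np

def load_embeddings(lines):
    """Parse GloVe/word2vec-style text lines of the form "<token> v1 v2 ... vN"."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

# Tiny two-word stand-in for a real embedding file.
sample = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
emb = load_embeddings(sample)
```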
- Run build_data.py to generate the files required for training.
This generates .txt word-embedding files and trimmed versions of them. Note: when written, the word2vec .txt file is over 10 GB in size, so the process may take a long time depending on your machine.
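"Trimming" here means keeping only the embedding rows for words that actually occur in the task vocabulary, so training never has to reload the multi-gigabyte file. A minimal sketch of the idea, assuming a word-to-index vocabulary dict (names are illustrative, not the repo's exact functions):

```python
import numpy as np

def trim_embeddings(vocab, embeddings, dim):
    """Build a dense matrix with one row per vocab word; out-of-vocabulary rows stay zero."""
    matrix = np.zeros((len(vocab), dim), dtype=np.float32)
    for word, idx in vocab.items():
        if word in embeddings:
            matrix[idx] = embeddings[word]
    return matrix

vocab = {"the": 0, "cat": 1, "sat": 2}
embeddings = {"the": np.array([0.1, 0.2]), "cat": np.array([0.3, 0.4])}
trimmed = trim_embeddings(vocab, embeddings, dim=2)
# A trimmed matrix like this can then be stored compactly, e.g. with
# np.savez_compressed, instead of re-reading the full .txt file each run.
```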
- Configure the model in config.py.
Here you can set the model to perform one of the POS, CHUNK, or NER tasks; specify whether to use word embeddings and which type; enable or disable the CRF layer, character embeddings, etc.; and tune a number of hyperparameters. Note: running build_data.py generates a tag-specific file, so to perform NER, for example, you must re-run build_data.py with 'task' in config.py set to NER in order to generate the required tag file.
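To illustrate the kinds of switches described above, here is a hypothetical excerpt of config.py; the option names follow the description but the actual names in the repo may differ:

```python
# Hypothetical config.py excerpt -- option names are illustrative.
task = "NER"              # one of "POS", "CHUNK", "NER"; also controls which tag file build_data.py emits
use_pretrained = True     # whether to use pre-trained word embeddings at all
embedding_type = "glove"  # "glove" or "word2vec"
use_crf = True            # CRF output layer instead of per-token softmax
use_chars = True          # character-level (CNN) embeddings
hidden_size_lstm = 300    # example hyperparameter
learning_rate = 0.001     # example hyperparameter
```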
- Run train.py to train the model.