GitHub - WpnSta/CAS_Mod3_NER: Project for module 3 of CAS NLP

Project for CAS NLP module 3 - Machine learning

Content

The goal of this project is to train a neural network, explore its architecture and parameters, and test their influence on model performance. Specifically, I trained two separate BiLSTM models for Named Entity Recognition (NER) on the relatively small LitBank dataset (literary texts, see https://github.com/dbamman/litbank). The two models differ only in their embeddings: randomly initialised vs. pre-trained FastText embeddings (300d). Both use the same architecture (layers, hidden dimension) and the same training parameters (dropout, learning rate) to ensure a fair comparison.
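The difference between the two setups can be illustrated with a minimal, framework-free sketch. It is not the code from the notebooks: the function names `load_vec` and `build_embedding_table`, the toy vocabulary, the initialisation range and the 3-dimensional toy vectors are all assumptions made for illustration (the real FastText vectors are 300-dimensional). The sketch only shows that both models can share an embedding table of the same shape and differ in how the rows are filled.

```python
import io
import random


def load_vec(handle):
    """Parse FastText's text (.vec) format: a 'count dim' header line,
    then one 'word v1 ... vd' line per word."""
    _count, dim = map(int, handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim


def build_embedding_table(vocab, dim, pretrained=None, seed=0):
    """Return one vector per vocabulary word.

    With pretrained=None every row is random (first setup).
    Otherwise known words copy their pre-trained vector and unknown
    words fall back to random values (second setup).
    The range (-0.1, 0.1) is an arbitrary choice for this sketch.
    """
    rng = random.Random(seed)
    table = []
    for word in vocab:
        if pretrained is not None and word in pretrained:
            table.append(list(pretrained[word]))
        else:
            table.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return table


# Hypothetical toy data standing in for a real FastText .vec file.
toy_vec_file = io.StringIO("2 3\nwhale 0.1 0.2 0.3\nship 0.4 0.5 0.6\n")
toy_vectors, toy_dim = load_vec(toy_vec_file)
toy_vocab = ["<pad>", "whale", "Ahab"]

random_table = build_embedding_table(toy_vocab, toy_dim)
fasttext_table = build_embedding_table(toy_vocab, toy_dim, pretrained=toy_vectors)
```

Because both tables have the same shape, the rest of the network (layers, hidden dimension, dropout, learning rate) can stay unchanged between the two runs, which is what makes the comparison meaningful.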

Three notebooks:

data_analysis.ipynb: data loading and preprocessing
litbank_ner_fasttext.ipynb: model initialisation, training and evaluation
ner_newtext.ipynb: inference testing using custom texts

Expected outcome: Given the small size of the dataset, the overall performance of both models will likely remain modest. Nevertheless, the model using FastText embeddings is expected to perform slightly better, since it can benefit from the semantic "knowledge" encoded in the pre-trained embeddings.

