The objective of this project is to fully understand the structured perceptron algorithm as applied to Named Entity Recognition (NER). NER is useful in many contexts, from information retrieval to question answering systems. The goal is not to achieve the best possible results, but to understand every detail of a simple solution.
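For readers new to the algorithm, the following is a minimal sketch of a structured perceptron tagger. It is not the code from this repository: the feature templates, the toy sentences, the tag set, and the function names (`features`, `viterbi`, `train`) are all assumptions made for illustration, and the weights are left unaveraged to keep the example short.

```python
from collections import defaultdict


def features(words, i, prev_tag, tag):
    """Hypothetical feature templates: (word, tag) and (previous tag, tag)."""
    return [("word", words[i].lower(), tag), ("trans", prev_tag, tag)]


def viterbi(words, tags, w):
    """Return the highest-scoring tag sequence for a non-empty sentence."""
    # best[i][t] = (score of the best path ending in tag t at position i, previous tag)
    best = [{t: (sum(w[f] for f in features(words, 0, "<s>", t)), None) for t in tags}]
    for i in range(1, len(words)):
        row = {}
        for t in tags:
            row[t] = max(
                (best[i - 1][p][0] + sum(w[f] for f in features(words, i, p, t)), p)
                for p in tags
            )
        best.append(row)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def train(data, tags, epochs=5):
    """Perceptron update: reward gold features, penalize predicted features."""
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, tags, w)
            if pred == gold:
                continue  # no update when the prediction is already correct
            prev_g = prev_p = "<s>"
            for i in range(len(words)):
                for f in features(words, i, prev_g, gold[i]):
                    w[f] += 1.0
                for f in features(words, i, prev_p, pred[i]):
                    w[f] -= 1.0
                prev_g, prev_p = gold[i], pred[i]
    return w


# Toy data, invented for this example only.
TAGS = ["O", "B-PER", "B-LOC"]
data = [
    (["Alice", "visited", "Paris"], ["B-PER", "O", "B-LOC"]),
    (["Bob", "lives", "in", "Berlin"], ["B-PER", "O", "O", "B-LOC"]),
]
weights = train(data, TAGS)
print(viterbi(["Alice", "lives", "in", "Paris"], TAGS, weights))
```

The key idea is that decoding (`viterbi`) and learning (`train`) share the same feature function: whenever the decoded sequence differs from the gold one, the weights move toward the gold features and away from the predicted ones.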
The solution was developed by the following collaborators:
The repository is organized as follows:
- train_models.ipynb: a Jupyter notebook containing all the code required to train the models and store them in fitted_models.
- reproduce_results.ipynb: a Jupyter notebook that loads the data, loads the fitted models from disk, and evaluates the models.
- utils.py: a Python module that contains all the helper functions for the two previous notebooks.
To reproduce the results obtained by our models, follow these steps:
- Clone this repository:
git clone https://github.com/sarabase/named-entity-recognition.git
- Create a conda environment and install the necessary requirements. Activate the environment:
conda create --name ner_env --file requirements.txt
conda activate ner_env
- If you want to see how the models are trained, open the train_models.ipynb notebook in Jupyter and execute the cells. This step is optional, since the fitted models are already stored in the repository.
- If you want to see how the models are evaluated, open the reproduce_results.ipynb notebook in Jupyter and execute the cells. This step is optional, since the results are already stored in the repository.