GitHub - NatashaKSS/part-of-speech-tagger-viterbi: CS4248 NLP Assignment 2 - Implementation of a POS tagger using the Viterbi Algorithm

Natural Language Processing Exploration

Implementation of a Part-of-Speech tagger using the Viterbi algorithm with the Penn Treebank tag set.

Instructions

Running the Viterbi part-of-speech tagger

python build_tagger.py sents.train sents.devt model_file
python run_tagger.py sents.test model_file sents.out

# For 10-fold cross validation
# -- BEWARE this might take some time
python cross_validator.py sents.train
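
To illustrate what the 10-fold cross-validation run above has to do, here is a minimal sketch of splitting a list of sentences into ten train/held-out pairs and scoring token-level accuracy. The names `k_fold_splits` and `accuracy` are hypothetical and are not taken from `cross_validator.py`, which may partition the data differently.

```python
def k_fold_splits(sentences, k=10):
    """Yield (train, held_out) pairs; fold i holds out every k-th sentence starting at index i."""
    for i in range(k):
        held_out = sentences[i::k]
        train = [s for j, s in enumerate(sentences) if j % k != i]
        yield train, held_out


def accuracy(gold_tags, predicted_tags):
    """Fraction of positions where the predicted tag matches the gold tag."""
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)
```

Each fold trains a fresh model on `train` and tags `held_out`; the reported figure would then be the mean accuracy over the ten folds. Training ten models is the reason the run can take some time.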

File Structure

.
├── /build_tagger.py         # Runs the training phase of the tagger on sents.train
├── /run_tagger.py           # Runs the Viterbi tagger on sents.test
├── /HMMProbGenerator.py     # Generates the model by computing the P(w_i | t_i) and P(t_i | t_{i-1}) probabilities
├── /PennTreebankPOSTags.py  # Store of all POS tags used
├── /POSTagger.py            # Runs the Viterbi algorithm with backpointers to recover the best POS tag sequence
├── /POSTagModelTrainer.py   # Loads the training data and runs HMMProbGenerator to generate the model
├── /Tokenizer.py            # Tokenizes the training set, the test set and the dataset used in cross_validator.py
├── /cross_validator.py      # Computes the 10-fold cross-validation accuracy of the trained model
└── README.md
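
The files above describe a bigram HMM tagger: emission probabilities P(w_i | t_i), transition probabilities P(t_i | t_{i-1}), and Viterbi decoding with backpointers. The following is a minimal sketch of that pipeline, not the repository's code. The names `train`, `log_prob` and `viterbi`, the `<s>` start pseudo-tag, and the fixed probability floor for unseen events are all assumptions made for illustration; the actual model may use a different smoothing scheme.

```python
import math
from collections import defaultdict

START = "<s>"  # hypothetical sentence-start pseudo-tag


def train(tagged_sents):
    """Collect the counts behind P(w_i | t_i) and P(t_i | t_{i-1})."""
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    for sent in tagged_sents:
        prev = START
        for word, tag in sent:
            emit[tag][word] += 1
            trans[prev][tag] += 1
            prev = tag
    return emit, trans


def log_prob(table, given, event, floor=1e-6):
    """log P(event | given); unseen events get a crude floor (an assumption)."""
    total = sum(table[given].values())
    count = table[given].get(event, 0)
    return math.log(count / total) if count else math.log(floor)


def viterbi(words, tags, emit, trans):
    """Return the most probable tag sequence for words under the bigram HMM."""
    if not words:
        return []
    # best[i][t]: log-probability of the best tag sequence ending in tag t at position i
    best = [{t: log_prob(trans, START, t) + log_prob(emit, t, words[0]) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best sequence
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + log_prob(trans, p, t))
            best[i][t] = (best[i - 1][prev] + log_prob(trans, prev, t)
                          + log_prob(emit, t, words[i]))
            back[i][t] = prev
    # Follow the backpointers from the best final tag to the start.
    tag = max(tags, key=lambda t: best[-1][t])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]
```

Working in log space avoids numerical underflow on long sentences, and the backpointer table lets the best sequence be read off in a single backward pass once the final column is filled.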

Thank you!

- END OF README -
