
Part-of-Speech Tagger

For my code and more details, please see my notebook here.

Data

The CoNLL dataset consists of roughly 10,000 sentences, with a part-of-speech (POS) tag assigned to each word, like the following:

Token      POS
When       WRB
bank       NN
financing  NN
for        IN
the        DT
buy-out    NN
collapsed  VBD
last       JJ
week       NN
,          ,
so         RB
did        VBD
UAL        NNP
's         POS
stock      NN
.          .

A complete look-up table for each part-of-speech tag can be found here.
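As a rough illustration, data in this token/tag format might be parsed along the following lines. This is a sketch, not the notebook's code; it assumes whitespace-separated token/tag columns with blank lines between sentences:

def read_conll(path):
    """Read CoNLL-style 'token tag' lines into a list of sentences."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                      # blank line = sentence boundary
                if current:
                    sentences.append(current)
                    current = []
            else:
                token, tag = line.split()[:2]  # first two columns: token, POS
                current.append((token, tag))
    if current:                                # flush the last sentence
        sentences.append(current)
    return sentences

# e.g. read_conll("train.txt")[0] -> [("When", "WRB"), ("bank", "NN"), ...]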

Model Architecture

According to Wikipedia, "words that are assigned to the same part of speech generally display similar syntactic behavior (they play similar roles within the grammatical structure of sentences)". This means that the POS of a word depends on its role in the current sentence. Consider the word "right" in the following two sentences: "This is the right (JJ) answer" vs. "You have the right (NN) to remain silent". By itself, the word cannot be assigned the correct part of speech; only the rest of the sentence makes that possible. Hence, we want to use whole sequences as the model's input, not just individual words. I decided to go for a very simple approach and use a vanilla RNN.

An overview of the complete architecture I used can be seen here:

[Figure: model architecture]

One might wonder what a convolutional layer is doing in this architecture. If you look at the output of the RNN, you see that its size is (sequence_length, hidden_size). However, the final output should be of size (sequence_length, num_classes), so that I end up with a probability for each step in the sequence and for each POS tag. This computation cannot be done by a standard dense layer, since it would squeeze the sequence-length dimension. In Keras, there is what is called a Time-Distributed Dense (TDD) layer, which is a dense layer applied across time steps. Unfortunately, there is no implementation of it in PyTorch. However, a possible substitute for it is a convolutional layer. One important characteristic of a Time-Distributed Dense layer is that it applies the same weights at each time step, just like a convolutional layer does with the help of its kernel. Below is a more detailed visualization of what the convolutional layer does.

[Figure: how the convolutional layer acts as a time-distributed dense layer]
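To make the idea concrete, here is a minimal PyTorch sketch of an RNN tagger that uses a convolution with kernel size 1 in place of a Time-Distributed Dense layer. This is not the author's actual code; the embedding size, hidden size, and number of classes are illustrative assumptions:

import torch
import torch.nn as nn

class PosTagger(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_size=128, num_classes=46):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_size, batch_first=True)
        # A Conv1d with kernel_size=1 applies the same hidden_size -> num_classes
        # weights at every time step, mimicking Keras' TimeDistributed(Dense).
        self.conv = nn.Conv1d(hidden_size, num_classes, kernel_size=1)

    def forward(self, x):                      # x: (batch, seq_len)
        out, _ = self.rnn(self.embed(x))       # (batch, seq_len, hidden_size)
        out = out.transpose(1, 2)              # (batch, hidden_size, seq_len)
        logits = self.conv(out)                # (batch, num_classes, seq_len)
        return logits.transpose(1, 2)          # (batch, seq_len, num_classes)

model = PosTagger(vocab_size=20000)
tokens = torch.randint(0, 20000, (4, 17))      # batch of 4 sentences, length 17
print(model(tokens).shape)                     # torch.Size([4, 17, 46])

Because the kernel covers a single time step, the convolution preserves the sequence-length dimension while projecting hidden_size down to num_classes, which is exactly the behavior described above.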

Results

After training for around 40 epochs, the model achieved top-1 accuracies of 0.8890 and 0.8667 on the validation and test sets, respectively. The top-1 accuracy for each class looks like this:

[Figure: per-class top-1 accuracy]
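For reference, per-class top-1 accuracy can be computed along these lines. This is a sketch with assumed tensor shapes, not the notebook's evaluation code:

import torch

def per_class_accuracy(logits, targets, num_classes):
    """logits: (N, num_classes), targets: (N,) of class indices."""
    preds = logits.argmax(dim=-1)
    acc = {}
    for c in range(num_classes):
        mask = targets == c                    # positions with true class c
        if mask.any():
            acc[c] = (preds[mask] == c).float().mean().item()
    return acc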

An in-depth written analysis, as well as further ideas for improvement, can be found in the final sections of my notebook.
