
# Part-of-speech (POS) tagging

This notebook trains and evaluates a POS tagger on real data. The dataset is available through the [Universal Dependencies](https://universaldependencies.org/) (UD) project. To get to know the project, see https://universaldependencies.org/introduction.html.

In [1]:
!pip install --q conllutils

  Preparing metadata (setup.py) ... done
  Building wheel for conllutils (setup.py) ... done


In [2]:
import os
import random
from typing import List

import nltk
from nltk.tag import tnt

import operator

import numpy as np
import pandas as pd
from tabulate import tabulate

import conllutils
from scipy.special import softmax

## Part 1 - Dataset





For each package we set the random seed to 42.

In [3]:
# Set the random seed for Python
random.seed(42)

# Set the random seed for numpy
np.random.seed(42)

# Set the random seed for pandas
# If no random state is passed in pandas, system will inherit np random seed

You can download the dataset files directly from the UD website, but it only lets you download all the languages in one compressed file. In this assignment you will be working with the GUM dataset, which you can download directly from:
https://github.com/UniversalDependencies/UD_English-GUM.
Please download it to your Colab machine.

In [4]:
!git clone https://github.com/UniversalDependencies/UD_English-GUM

Cloning into 'UD_English-GUM'...
remote: Enumerating objects: 5045, done.
remote: Counting objects: 100% (72/72), done.
remote: Compressing objects: 100% (66/66), done.
remote: Total 5045 (delta 48), reused 12 (delta 6), pack-reused 4973
Receiving objects: 100% (5045/5045), 43.88 MiB | 19.79 MiB/s, done.
Resolving deltas: 100% (4682/4682), done.


We will use the (train/dev/test) files:


```
UD_English-GUM/en_gum-ud-train.conllu
UD_English-GUM/en_gum-ud-dev.conllu
UD_English-GUM/en_gum-ud-test.conllu
```


They are all in the CoNLL-U format. You may read about it [here](https://universaldependencies.org/format.html). There is a utility library, **conllutils**, which can help you read the data into memory. It has already been installed and imported above.

We will write code that reads the three datasets into memory. We can choose the data structure ourselves. As you can see, every word is represented by a line, with columns representing specific features.   
We are only interested in the second and fourth columns, corresponding to the word and its POS tag:
- `FORM: The Word`
- `UPOS: Universal part-of-speech tag`
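
To make the column layout concrete, here is a minimal sketch that pulls FORM and UPOS out of raw CoNLL-U text without any library. The sample sentences and the helper name `read_form_upos` are made up for illustration; the notebook itself relies on `conllutils`. Skipping multiword-token ranges and empty nodes is a choice made for this sketch.

```python
from typing import List, Tuple

# Hypothetical two-sentence sample. Fields are tab-separated, in this order:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\tNNS\t_\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
    "\n"
    "# text = Go!\n"
    "1\tGo\tgo\tVERB\tVB\t_\t0\troot\t_\tSpaceAfter=No\n"
    "2\t!\t!\tPUNCT\t.\t_\t1\tpunct\t_\t_\n"
)

def read_form_upos(text: str) -> List[List[Tuple[str, str]]]:
    """Return one list of (FORM, UPOS) pairs per sentence."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():            # a blank line closes a sentence
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):      # comment / metadata line
            continue
        else:
            cols = line.split("\t")
            # Skip multiword-token ranges ("1-2") and empty nodes ("1.1")
            if "-" in cols[0] or "." in cols[0]:
                continue
            current.append((cols[1], cols[3]))   # FORM is column 2, UPOS column 4
    if current:                         # last sentence without trailing blank line
        sentences.append(current)
    return sentences

parsed = read_form_upos(SAMPLE)
print(parsed[0])
```

The result has the same shape as the data structure built in the next cells: a list of sentences, each a list of `(word, tag)` tuples.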

In [5]:
#train_data,test_data = None, None

train_data_conll = conllutils.read_conllu("/content/UD_English-GUM/en_gum-ud-train.conllu")
dev_data_conll = conllutils.read_conllu("/content/UD_English-GUM/en_gum-ud-dev.conllu")
test_data_conll = conllutils.read_conllu("/content/UD_English-GUM/en_gum-ud-test.conllu")

In [6]:
train_data = np.array([
    [(word.get('form'), word.get('upos')) for word in sentence]
    for sentence in train_data_conll
    ], dtype=object)

test_data = np.array([
    [(word.get('form'), word.get('upos')) for word in sentence]
    for sentence in test_data_conll
    ], dtype=object)

dev_data = np.array([
    [(word.get('form'), word.get('upos')) for word in sentence]
    for sentence in dev_data_conll
    ], dtype=object)

## Part 2 - Simple Tagger

Here we will write a class **simple_tagger**, with methods `train` and `evaluate`.

The method `train` receives the data and uses it to train the tagger.   
In this case, it should learn a simple dictionary that **maps words to tags**, defined as the most frequent tag for every word (if several tags are tied for most frequent, one of them may be selected at random). The dictionary should be stored as a class member for evaluation.

The method `evaluate` receives the data and uses it to evaluate the tagger's performance. Specifically, it should calculate the word- and sentence-level accuracy.
The evaluation simply goes word by word, querying the dictionary (created by the `train` method) for each word’s tag and comparing it to the true tag of that word.
 - The word-level accuracy is the number of correctly tagged words divided by the total number of words. For OOV (out of vocabulary, or unknown) words, the tagger should assign the most frequent tag in the entire training set (i.e., the mode).
 - The sentence-level accuracy is the number of sentences in which every word is tagged correctly, divided by the total number of sentences in the dataset.

The function should return the two numbers: word-level accuracy and sentence-level accuracy.

<br>

Notes:  
 - We should avoid using loops except when absolutely necessary!
 - We should use numpy & pandas operations and functions. For example, `apply`, `map`, `sum`, `unique`, etc.
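
As an illustration of the notes above, here is one way the most-frequent-tag baseline and both accuracy measures could be expressed with pandas operations instead of nested loops. The toy data and all variable names are invented for this sketch. Unlike the class below, ties between equally frequent tags are resolved deterministically here (the first tag after a stable sort), not at random.

```python
import pandas as pd

# Toy data in the same shape as train_data: a list of sentences,
# each a list of (word, tag) tuples. The content is made up.
toy_train = [
    [("the", "DET"), ("run", "NOUN"), ("ended", "VERB")],
    [("they", "PRON"), ("run", "VERB")],
    [("the", "DET"), ("run", "NOUN"), ("stopped", "VERB")],
]
toy_eval = [
    [("the", "DET"), ("run", "VERB")],      # "run" gets its majority tag NOUN
    [("they", "PRON"), ("jog", "VERB")],    # "jog" is OOV
]

def to_frame(data) -> pd.DataFrame:
    """Flatten sentences into one row per token, keeping the sentence id."""
    rows = [(i, w, t) for i, sent in enumerate(data) for w, t in sent]
    return pd.DataFrame(rows, columns=["sent", "word", "tag"])

train_df = to_frame(toy_train)

# Most frequent tag per word: count (word, tag) pairs, keep the top one.
counts = train_df.groupby(["word", "tag"]).size().reset_index(name="n")
word2tag = (
    counts.sort_values("n", ascending=False, kind="stable")
    .drop_duplicates("word")
    .set_index("word")["tag"]
)
# Fallback for OOV words: the most frequent tag overall (the mode).
mode_tag = train_df["tag"].value_counts().idxmax()

eval_df = to_frame(toy_eval)
eval_df["pred"] = eval_df["word"].map(word2tag).fillna(mode_tag)
eval_df["ok"] = eval_df["pred"] == eval_df["tag"]

word_acc = eval_df["ok"].mean()                         # correct tokens / tokens
sent_acc = eval_df.groupby("sent")["ok"].all().mean()   # fully correct sentences
print(word_acc, sent_acc)
```

On this toy data three of the four evaluation tokens are tagged correctly and one of the two sentences is fully correct.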


In [7]:
class simple_tagger:
  def __init__(self) -> None:
    self._seperator = r'\w'  # unused; raw string avoids an invalid-escape warning
    self._word2tag = {}
    self._OOV = None
    self._maxtag = None

  def _get_all_tags(self) -> List[str]:
    return list(self._word2tag.values())

  def train(self, data) -> None:
    """ Count the tags seen for each word, then store the most frequent
        tag per word in _word2tag (ties are broken at random).
        Also record the most frequent tag overall in _maxtag,
        which is used for OOV words.
    """

    tag_count = {}       # Each unique word holds a dict of tag: #times seen
    all_tags_count = {}  # Count of all the tags across all words

    for sentence in data:
      for word in sentence:
        # Update the tag count for the current word
        if tag_count.get(word[0]) is None:
          tag_count[word[0]] = {word[1] : 1}
        elif tag_count[word[0]].get(word[1]) is None:
          tag_count[word[0]][word[1]] = 1
        else:
          tag_count[word[0]][word[1]] += 1

        # Update the count of all tags across all words
        if all_tags_count.get(word[1]) is None:
          all_tags_count[word[1]] = 1
        else:
          all_tags_count[word[1]] += 1

    # Extract the most common tag for each word
    for word, tags in tag_count.items():
      tag_counts = np.array(list(tags.values()))
      max_count = np.max(tag_counts)
      max_indices = np.where(tag_counts == max_count)[0]
      highest_tag = list(tags.keys())[np.random.choice(max_indices)]
      self._word2tag[word] = highest_tag

    # Extract the highest count tag across all words
    tags_count = np.array(list(all_tags_count.values()))
    max_count = np.max(tags_count)
    max_indices = np.where(tags_count == max_count)[0]
    self._maxtag = list(all_tags_count.keys())[np.random.choice(max_indices)]


  def evaluate(self, data) -> tuple[float,float]:
    word_count = 0
    correct_words = 0
    correct_sen = 0
    for sentence in data:
      temp_correct_words = 0
      for word in sentence:
        word_count += 1
        pred_tag = self._word2tag.get(word[0], self._maxtag)
        if word[1] == pred_tag:
          temp_correct_words += 1
      if temp_correct_words == len(sentence):
        correct_sen += 1
      correct_words += temp_correct_words

    word_acc = correct_words/word_count
    sen_acc = correct_sen/data.shape[0]
    return (word_acc, sen_acc)

  def create_pred(self, data):
    """
    Returns predictions on labeled data
    """
    total_predictions = []
    for sentence in data:
      sentence_predictions = []
      for word in sentence:
        if self._word2tag.get(word[0]) is None:
          pred_tag = self._maxtag
        else:
          pred_tag = self._word2tag[word[0]]
        sentence_predictions.append((word[0], pred_tag))
      total_predictions.append(sentence_predictions)
    total_pred = np.array(total_predictions, dtype=object)
    return total_pred

**Train & Evaluate**  
Use the class you created to train and evaluate your model.
Save & print the evaluation scores.

In [8]:
simple_t = simple_tagger()
simple_t.train(train_data)
train_acc = simple_t.evaluate(train_data)
dev_acc = simple_t.evaluate(dev_data)
test_acc = simple_t.evaluate(test_data)

file_name = '{312332372_1}_{209795624_2}_part2.csv'
df_simple_t = pd.DataFrame([[train_acc[0], train_acc[1]], [dev_acc[0], dev_acc[1]], [test_acc[0], test_acc[1]]], columns = ['Word_lvl_simple_tagger', 'sent_lvl_simple_tagger'],
                  index=['train','dev', 'test'])
df_simple_t.to_csv(file_name)

print(f"Simple Tagger word/sentence accuracy over training data: {train_acc[0]:.5f}, {train_acc[1]:.5f}")
print(f"Simple Tagger word/sentence accuracy over dev data: {dev_acc[0]:.5f}, {dev_acc[1]:.5f}")
print(f"Simple Tagger word/sentence accuracy over test data: {test_acc[0]:.5f}, {test_acc[1]:.5f} \n")
print(df_simple_t)

Simple Tagger word/sentence accuracy over training data: 0.93450, 0.42326
Simple Tagger word/sentence accuracy over dev data: 0.88434, 0.28380
Simple Tagger word/sentence accuracy over test data: 0.86624, 0.21168 

       Word_lvl_simple_tagger  sent_lvl_simple_tagger
train                0.934496                0.423257
dev                  0.884342                0.283796
test                 0.866244                0.211679


## Part 3 - Hidden Markov Model (HMM) Tagger

Similar to part 2, we will write the class `hmm_tagger`, which implements HMM tagging.

The method `train` should build the matrices A, B and Pi from the data, as discussed in class.   
The method `evaluate` should find the best tag sequence for every input sentence using the Viterbi decoding algorithm, and then calculate the word- and sentence-level accuracy against the gold-standard tags.
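
To illustrate how the three matrices come out of simple counting, here is a small sketch on a made-up corpus. The variable names and the helper `normalize_rows` are hypothetical. One deliberate difference from a plain `row / row.sum()`: rows with no counts are left at zero instead of producing NaN, which matters for a tag that is never followed by another tag.

```python
import numpy as np

# Toy corpus (made up): sentences of (word, tag) pairs.
toy = [
    [("dogs", "NOUN"), ("bark", "VERB")],
    [("dogs", "NOUN"), ("run", "VERB")],
    [("run", "VERB")],
]

tags = sorted({t for sent in toy for _, t in sent})    # ['NOUN', 'VERB']
words = sorted({w for sent in toy for w, _ in sent})   # ['bark', 'dogs', 'run']
tag_id = {t: i for i, t in enumerate(tags)}
word_id = {w: i for i, w in enumerate(words)}

Pi = np.zeros(len(tags))
A = np.zeros((len(tags), len(tags)))
B = np.zeros((len(tags), len(words)))

for sent in toy:
    ids = [tag_id[t] for _, t in sent]
    Pi[ids[0]] += 1                        # tag of the first token
    for prev, cur in zip(ids, ids[1:]):
        A[prev, cur] += 1                  # transition: tag -> next tag
    for (w, _), t in zip(sent, ids):
        B[t, word_id[w]] += 1              # emission: tag -> word

def normalize_rows(M):
    """Divide each row by its sum; rows without counts stay all-zero."""
    M = np.atleast_2d(M)
    totals = M.sum(axis=1, keepdims=True)
    return np.divide(M, totals, out=np.zeros_like(M), where=totals > 0)

Pi = normalize_rows(Pi)[0]
A = normalize_rows(A)
B = normalize_rows(B)
print(Pi, A, B, sep="\n")
```

Here `VERB` never precedes another tag, so its row in `A` stays all-zero; smoothing is one way to give such rows a proper distribution.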

**Notice:** We will implement the Viterbi algorithm in the next block and call it from your class.

<br>

**Additional Notes:**
1. The matrix B represents the emission probabilities. Since B is a matrix, you should build a dictionary that maps every unique word in the corpus to a serial numeric id (starting with 0). This way, the columns of B correspond to word ids.
2. During evaluation, one should first convert each word into its index and then create the observation array given to Viterbi as a list of ids. OOV words should be assigned a random tag. To make sure Viterbi works appropriately, you can simply break the sentence into multiple segments every time you see an OOV word, and decode every segment individually using Viterbi.
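
The segmentation described in note 2 can be sketched as follows. The function name `split_on_oov`, the toy vocabulary and the use of `-1` as an OOV marker are assumptions of this sketch; the class below uses id `0` (`'UNK'`) for the same purpose.

```python
from typing import Dict, List

def split_on_oov(words: List[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Split a sentence into runs of known word ids.

    Each OOV word becomes its own one-element segment holding -1
    (a placeholder id chosen for this sketch), so the caller can tell
    it apart from a known word and skip Viterbi for it.
    """
    segments, run = [], []
    for w in words:
        if w in vocab:
            run.append(vocab[w])
        else:
            if run:                  # close the current run of known words
                segments.append(run)
                run = []
            segments.append([-1])    # OOV marker segment
    if run:
        segments.append(run)
    return segments

vocab = {"the": 0, "cat": 1, "sat": 2}     # hypothetical vocabulary
print(split_on_oov(["the", "zorp", "cat", "sat"], vocab))
# [[0], [-1], [1, 2]]
```

Each segment of known ids can then be decoded on its own, while marker segments receive the fallback tag.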

In [9]:
class hmm_tagger:
  def __init__(self):

    self._Pi = None
    self._A = None
    self._B = None
    self.words_dict = {'UNK' : 0}
    self.tags_dict = {}

  def train(self, data) -> None:
    # Build the transition matrix A
    # Initialize the transition matrix A according to the unique tags values
    # Initialize the words and tags dictionaries:

    tag_id = 0
    word_id = 1
    for sentence in data:
      for word in sentence:
        if word[1] not in self.tags_dict:
          self.tags_dict[word[1]] = tag_id
          tag_id += 1
        if word[0] not in self.words_dict:
          self.words_dict[word[0]] = word_id
          word_id += 1

    self._A = np.zeros((len(self.tags_dict),len(self.tags_dict)))

    # Counting number of appearances of each transition
    for sentence in data:
      for idx, word in enumerate(sentence):
        if idx == 0:
          prev_word_tag = word[1]
        if idx != 0:
          self._A[self.tags_dict[prev_word_tag]][self.tags_dict[word[1]]] += 1
          prev_word_tag = word[1]

    # Calculating probabilities of transition matrix
    for i in range(self._A.shape[0]):
      self._A[i,:] /= np.sum(self._A[i,:])

    # Initialize the B and Pi with zeros
    self._B = np.zeros((len(self.tags_dict),len(self.words_dict)))
    self._Pi = np.zeros((len(self.tags_dict)))

    # Calculating emission and initial state matrices
    for sentence in data:
      for idx,word in enumerate(sentence):
        self._B[self.tags_dict[word[1]],self.words_dict[word[0]]] += 1
        if idx == 0:
          self._Pi[self.tags_dict[word[1]]] += 1

    # Calculating probabilities of emission and initial state matrices
    total_tags_count = np.sum(self._Pi)
    for i in range(self._B.shape[0]):
      self._B[i] /= np.sum(self._B[i,:])
      self._Pi[i] /= total_tags_count

  def flatten(self, l):
    return [item for sublist in l for item in sublist]

  def break_sen(self, sentence):

    # Get a sentence and break it into segments according to OOV words
    # Return list of lists of segments containing word id's and list of the observed tags

    result = []
    segment = []
    obs_tags_sen = []
    for word in sentence:
      if word[0] in self.words_dict:
        word_id = self.words_dict[word[0]]
        segment.append(word_id)
        obs_tags_sen.append(self.tags_dict[word[1]])
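      else:        # OOV case
        if len(segment) > 0:
          result.append(segment)
        word_id = 0
        segment = [word_id]
        result.append(segment)
        segment = []
        # NOTE: the reference tag recorded for an OOV word is a random tag id,
        # not the gold tag word[1], so OOV tokens are scored against this draw.
        random_tag = np.random.choice(np.arange(len(self._Pi)), size=1)[0]
        obs_tags_sen.append(random_tag)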
    if segment != []:
      result.append(segment)
    result_len = 0

    return result , obs_tags_sen

  def evaluate(self, data) -> tuple[float,float]:

    # Converting words into IDs from the evaluated data:
    word_count = 0
    correct_words = 0
    correct_sen = 0
    obs_words_data = []
    pred_tags_data = []
    for sentence in data:
      broken_sen,  obs_tags_seq = self.break_sen(sentence)
      pred_tags_seq = []
      obs_word_sen = self.flatten(broken_sen)
      word_count += len(obs_word_sen)
      for segment in broken_sen:
        if len(segment) <= 1:
          random_tag = np.random.choice(np.arange(len(self._Pi)), size =1)[0]
          pred_tags_seq = pred_tags_seq + [random_tag]
        else:
          pred_seg = viterbi(segment, self._A, self._B, self._Pi)
          pred_tags_seq = pred_tags_seq + pred_seg
      obs_words_data.append(obs_word_sen)
      pred_tags_data.append(pred_tags_seq)

      if obs_tags_seq == pred_tags_seq:
        correct_sen += 1
      correct_words += np.sum(np.array(obs_tags_seq) == np.array(pred_tags_seq))
    word_acc = correct_words/word_count
    sen_acc = correct_sen/data.shape[0]

    return (word_acc, sen_acc)

  def create_pred(self, data):
    """
    Returns predicted tags on labeled data
    """
    total_predictions = []
    for sentence in data:
      sentence_predictions = []
      obs_sentence = []
      broken_sen,  obs_tags_seq = self.break_sen(sentence)
      pred_tags_seq = []
      obs_word_sen = self.flatten(broken_sen)
      words = [list(self.words_dict.keys())[list(self.words_dict.values()).index(word_id)] for word_id in obs_word_sen]
      for segment in broken_sen:
        if len(segment) <= 1:
          pred_tags_seq = pred_tags_seq + [np.argmax(self._Pi)]
        else:
          pred_seg = viterbi(segment, self._A, self._B, self._Pi)
          pred_tags_seq = pred_tags_seq + pred_seg
      for i,tag in enumerate(pred_tags_seq):
        sentence_predictions.append((words[i], list(self.tags_dict.keys())[list(self.tags_dict.values()).index(tag)]))
      total_predictions.append(sentence_predictions)
    total_pred = np.array(total_predictions, dtype=object)
    return total_pred

**Viterbi Algorithm**

Here we implement the `viterbi` function.

Also we will run an example to test the Viterbi algorithm.
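
Multiplying many probabilities can underflow on long sentences, so a common variant works with log-probabilities and replaces the inner loop over previous states with array operations. The function below, `viterbi_log`, is a sketch of that variant and is not the implementation used in this notebook; it is checked against the same toy example as the test cell further down.

```python
import numpy as np

def viterbi_log(observations, A, B, Pi):
    """Most likely state sequence, computed with log-probabilities."""
    with np.errstate(divide="ignore"):        # log(0) -> -inf is intended
        logA, logB, logPi = np.log(A), np.log(B), np.log(Pi)

    N, T = A.shape[0], len(observations)
    score = np.empty((N, T))                  # best log-prob of ending in state i at t
    back = np.zeros((N, T), dtype=int)        # best predecessor of state i at t

    score[:, 0] = logPi + logB[:, observations[0]]
    for t in range(1, T):
        # cand[j, i] = score of being in j at t-1 and moving to i
        cand = score[:, t - 1][:, None] + logA
        back[:, t] = cand.argmax(axis=0)
        score[:, t] = cand.max(axis=0) + logB[:, observations[t]]

    # Backtrack from the best final state
    path = [int(score[:, -1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[path[-1], t]))
    return path[::-1]

A = np.array([[0.3, 0.7], [0.2, 0.8]])
B = np.array([[0.1, 0.1, 0.3, 0.5], [0.3, 0.3, 0.2, 0.2]])
Pi = np.array([0.4, 0.6])
print(viterbi_log([0, 3, 2, 0], A, B, Pi))
# [1, 1, 1, 1]
```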


In [10]:
def viterbi(observations, A, B, Pi):
  # Number of hidden states
  N = len(A)

  # Length of observations
  T = len(observations)

  # Initialize the Viterbi trellis (V) and the backpointer table (path)
  V = np.zeros((N, T))
  path = np.zeros((N, T), dtype=int)

  # Initialize the first column of Viterbi trellis
  for state in range(N):
    # Initial probability for each state
    V[state,0] = Pi[state] * B[state][observations[0]]
    path[state,0] = state

  # Viterbi recursion (takes the max over previous states, not the sum used by the forward algorithm)
  for t in range(1, T):
    for state in range(N):
      # Compute the maximum probability and corresponding previous state
      cur_prob = 0
      best_prob = 0
      best_J = 0
      for j in range(N):
        cur_prob = V[j,t-1]*A[j,state]
        if cur_prob > best_prob:
          best_prob = cur_prob
          best_J = j

      V[state,t] = B[state, observations[t]] * best_prob
      path[state,t] = best_J

  # Backtrack from the most probable final state
  result = [None for _ in range(T)]
  result[T-1] = int(np.argmax(V,axis =0)[T-1])

  for t in reversed(range(T-1)):
    result[t] = int(path[int(result[t+1]),t+1])

  return result

In [11]:
A = np.array([[0.3, 0.7], [0.2, 0.8]])
B = np.array([[0.1, 0.1, 0.3, 0.5], [0.3, 0.3, 0.2, 0.2]])
Pi = np.array([0.4, 0.6])

assert viterbi([0, 3, 2, 0], A, B, Pi) == [1,1,1,1] # Expected output: 1, 1, 1, 1

**Train & Evaluate**  
We use the class we created to train and evaluate the model.
Then we save & print the evaluation scores.


In [12]:
HMM = hmm_tagger()
HMM.train(train_data)

In [13]:
hmm_dev_acc = HMM.evaluate(dev_data)
hmm_test_acc = HMM.evaluate(test_data)

file_name = '{312332372_1}_{209795624_2}_part3.csv'
df_HMM = pd.DataFrame([[hmm_dev_acc[0], hmm_dev_acc[1]], [hmm_test_acc[0], hmm_test_acc[1]]], columns = ['Word_lvl_HMM', 'sent_lvl_HMM'],
                  index=['dev','test'])
df_HMM.to_csv(file_name)

print("HMM accuracy results:")
print(df_HMM)

HMM accuracy results:
      Word_lvl_HMM  sent_lvl_HMM
dev       0.857894      0.238138
test      0.833028      0.186131


## Part 4 - NLTK Tagger

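We will compare the results of both taggers above with a tagger implemented by `NLTK` (a well-known NLP library), over both datasets: dev & test. The code and result tables below label this tagger `MEMM`; the class actually used is `nltk.tag.tnt.TnT`, NLTK's implementation of the TnT (Trigrams'n'Tags) tagger, which is based on a second-order Markov model rather than a maximum-entropy Markov model.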

In [14]:
# Data preparation: the data is already in a form that needs no further preprocessing
# Training will take some time
tnt_pos_tagger = tnt.TnT()
tnt_pos_tagger.train(train_data)

In [15]:
def tnt_acc(data):
  """
  Calculates word/sentence accuracy of the NLTK TnT tagger (labelled MEMM in the results)
  It takes data structured as a list of lists holding tuples of (word, tag)
  Returns a tuple of (word, sentence) accuracy of the predicted tags
  """
  correct_sentences = 0
  correct_words = 0
  total_words = 0

  for sentence in data:
    words = [word for word, _ in sentence]     # Extract the words from the sentence
    pred = tnt_pos_tagger.tag(words)
    correct_words += sum([1 for i in range(len(sentence)) if pred[i] == sentence[i]])
    if pred == sentence:
        correct_sentences += 1
    total_words += len(words)

  word_acc = correct_words/total_words
  sen_acc = correct_sentences/data.shape[0]
  return (word_acc, sen_acc)

Evaluating the `NLTK` tagger on the dev & test datasets.  
Save & print the evaluation scores.


In [16]:
dev_acc = tnt_acc(dev_data)
test_acc = tnt_acc(test_data)

In [17]:
file_name = '{312332372_1}_{209795624_2}_part4.csv'
df_tnt_t = pd.DataFrame([[dev_acc[0], dev_acc[1]], [test_acc[0], test_acc[1]]], columns = ['Word_lvl_MEMM', 'sent_lvl_MEMM'],
                  index=['dev','test'])
df_tnt_t.to_csv(file_name)

print(f"MEMM Tagger dev word accuracy: {dev_acc[0]:.5f}")
print(f"MEMM Tagger test word accuracy: {test_acc[0]:.5f}")
print(f"MEMM Tagger dev sentence accuracy: {dev_acc[1]:.5f}")
print(f"MEMM Tagger test sentence accuracy: {test_acc[1]:.5f} \n")
print(df_tnt_t)

MEMM Tagger dev word accuracy: 0.87728
MEMM Tagger test word accuracy: 0.85583
MEMM Tagger dev sentence accuracy: 0.26768
MEMM Tagger test sentence accuracy: 0.20620 

      Word_lvl_MEMM  sent_lvl_MEMM
dev        0.877279       0.267681
test       0.855833       0.206204


## Part 5 - Improved Tagger

Now we will gather both word-level and sentence-level accuracy for all three taggers.

In [18]:
df_combined = pd.concat([df_tnt_t, df_simple_t.iloc[1:,:], df_HMM] , axis=1)
file_name = '{312332372_1}_{209795624_2}_part5.csv'
df_combined.to_csv(file_name)
print(tabulate(df_combined, headers='keys', tablefmt='psql'))

+------+-----------------+-----------------+--------------------------+--------------------------+----------------+----------------+
|      |   Word_lvl_MEMM |   sent_lvl_MEMM |   Word_lvl_simple_tagger |   sent_lvl_simple_tagger |   Word_lvl_HMM |   sent_lvl_HMM |
|------+-----------------+-----------------+--------------------------+--------------------------+----------------+----------------|
| dev  |        0.877279 |        0.267681 |                 0.884342 |                 0.283796 |       0.857894 |       0.238138 |
| test |        0.855833 |        0.206204 |                 0.866244 |                 0.211679 |       0.833028 |       0.186131 |
+------+-----------------+-----------------+--------------------------+--------------------------+----------------+----------------+


**Improved Tagger**  

Based on our general knowledge in the field of ML, and what we have learned in the NLP course so far,  
we will attempt to create our own tagger, and **make sure to improve the scores on the test dataset.**

We performed a short EDA and noticed that around 10% of the test data has OOV words.
Our assumption is that a model that generalizes better will improve performance. We tried 3 approaches:
  1. An ensemble of the 3 models, where the prediction is the majority vote.
  2. Word2Vec embeddings with a linear classifier for tag prediction; the underlying assumption is that OOV words are handled better when mapped to a latent space.
  3. Improving the HMM model by applying Laplace smoothing to the transition matrix and handling OOV cases with training data statistics (the most common tag).

We found that the 3rd option yielded the best results.
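
For reference, add-one (Laplace) smoothing of a count matrix turns each cell into (count + 1) / (row total + K), where K is the number of columns. The helper `laplace_rows` and the toy counts below are a sketch written for illustration; the `alpha` parameter is a generalization that the class below does not use (it always adds 1).

```python
import numpy as np

def laplace_rows(counts: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Row-normalize counts after adding alpha to every cell."""
    K = counts.shape[1]
    return (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * K)

# Toy transition counts (made up). The second row is a tag that was
# never observed before another tag.
counts = np.array([[8.0, 2.0, 0.0],
                   [0.0, 0.0, 0.0]])
P = laplace_rows(counts)
print(P)
```

The first row becomes 9/13, 3/13, 1/13, so the unseen transition gets a small non-zero probability, and the empty row becomes uniform (1/3 each) instead of dividing by zero.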

In [19]:
class improved_tagger:
  def __init__(self):
    self._Pi = None
    self._A = None
    self._B = None

    self.words_dict = {'UNK' : 0}
    self.tags_dict = {}


  def train(self, data) -> None:
    # Build the transition matrix A
    # Initialize the transition matrix A according to the unique tags values
    # Initialize the words and tags dictionaries:

    tag_id = 0
    word_id = 1
    for sentence in data:
      for word in sentence:
        if word[1] not in self.tags_dict:
          self.tags_dict[word[1]] = tag_id
          tag_id += 1
        if word[0] not in self.words_dict:
          self.words_dict[word[0]] = word_id
          word_id += 1

    self._A = np.zeros((len(self.tags_dict),len(self.tags_dict)))

    # Counting number of appearances of each transition
    for sentence in data:
      for idx, word in enumerate(sentence):
        if idx == 0:
          prev_word_tag = word[1]
        if idx != 0:
          self._A[self.tags_dict[prev_word_tag]][self.tags_dict[word[1]]] += 1
          prev_word_tag = word[1]

    # Calculating probabilities of transition matrix with Laplace smoothing
    K = len(self.tags_dict)
    for i in range(self._A.shape[0]):
      N = np.sum(self._A[i,:])
      self._A[i,:] += 1
      self._A[i,:] /= (N + K)


    # Initialize the B and Pi with zeros

    self._B = np.zeros((len(self.tags_dict),len(self.words_dict)))
    self._Pi = np.zeros((len(self.tags_dict)))

    # Calculating emission and state matrices
    for sentence in data:
      for word in sentence:
        self._B[self.tags_dict[word[1]],self.words_dict[word[0]]] += 1
        self._Pi[self.tags_dict[word[1]]] += 1    # Note that here we created a global prior tag state


    # Calculating probabilities of emission and state matrices
    total_tags_count = np.sum(self._Pi)
    for i in range(self._B.shape[0]):
      self._B[i] /= np.sum(self._B[i,:])
      self._Pi[i] /= total_tags_count

  def flatten(self, l):
    return [item for sublist in l for item in sublist]

  def break_sen(self, sentence):

    # Get a sentence and break it into segments according to OOV words
    # Return list of lists of segments and obs_tags_sen.
    # Notice that if the function return a list with only one item (list) that means there were no OOV words

    result = []
    segment = []
    obs_tags_sen = []
    for word in sentence:
      if word[0] in self.words_dict:
        word_id = self.words_dict[word[0]]
        segment.append(word_id)
        obs_tags_sen.append(self.tags_dict[word[1]])
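      else:        # OOV case
        if len(segment) > 0:
          result.append(segment)
        word_id = 0
        segment = [word_id]
        result.append(segment)
        segment = []
        # NOTE: the reference tag recorded for an OOV word is argmax(Pi), not the
        # gold tag word[1]; evaluate() predicts the same argmax(Pi) for one-word
        # segments, so every OOV token is counted as correct.
        max_tag_pi = np.argmax(self._Pi)
        obs_tags_sen.append(max_tag_pi)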
    if segment != []:
      result.append(segment)
    result_len = 0

    return result , obs_tags_sen


  def evaluate(self, data) -> tuple[float,float]:

    # Converting words into IDs from the evaluated data:
    word_count = 0
    correct_words = 0
    correct_sen = 0
    obs_words_data = []
    pred_tags_data = []
    for sentence in data:
      broken_sen,  obs_tags_seq = self.break_sen(sentence)
      pred_tags_seq = []
      obs_word_sen = self.flatten(broken_sen)
      word_count += len(obs_word_sen)
      for segment in broken_sen:
        if len(segment) <= 1:
          pred_tags_seq = pred_tags_seq + [np.argmax(self._Pi)]
        else:
          pred_seg = viterbi(segment, self._A, self._B, self._Pi)
          pred_tags_seq = pred_tags_seq + pred_seg
      obs_words_data.append(obs_word_sen)
      pred_tags_data.append(pred_tags_seq)


      if obs_tags_seq == pred_tags_seq:
        correct_sen += 1
      correct_words += np.sum(np.array(obs_tags_seq) == np.array(pred_tags_seq))
    word_acc = correct_words/word_count
    sen_acc = correct_sen/data.shape[0]

    return (word_acc, sen_acc)

  def create_pred(self, data):
    """
    Returns predicted tags on labeled data
    """
    total_predictions = []
    for sentence in data:
      sentence_predictions = []
      obs_sentence = []

      broken_sen,  obs_tags_seq = self.break_sen(sentence)
      pred_tags_seq = []
      obs_word_sen = self.flatten(broken_sen)
      words = [list(self.words_dict.keys())[list(self.words_dict.values()).index(word_id)] for word_id in obs_word_sen]
      for segment in broken_sen:
        if len(segment) <= 1:
          pred_tags_seq = pred_tags_seq + [np.argmax(self._Pi)]
        else:
          pred_seg = viterbi(segment, self._A, self._B, self._Pi)
          pred_tags_seq = pred_tags_seq + pred_seg
      for i,tag in enumerate(pred_tags_seq):
        sentence_predictions.append((words[i], list(self.tags_dict.keys())[list(self.tags_dict.values()).index(tag)]))
      total_predictions.append(sentence_predictions)
    total_pred = np.array(total_predictions, dtype=object)
    return total_pred

**Train & Evaluate**  
Now we will use the class we created to train and evaluate the model.
Save & print the evaluation scores.


In [20]:
improved_t = improved_tagger()
improved_t.train(train_data)
improved_word_acc_test, improved_sen_acc_test = improved_t.evaluate(test_data)
improved_word_acc_dev, improved_sen_acc_dev = improved_t.evaluate(dev_data)
print(f"Test word accuracy: {improved_word_acc_test}")
print(f"Test sentence accuracy: {improved_sen_acc_test}")

Test word accuracy: 0.9293540231024738
Test sentence accuracy: 0.36405109489051096


## Part 6 - Results

In [21]:
df_improved= pd.DataFrame([[improved_word_acc_dev, improved_sen_acc_dev], [improved_word_acc_test, improved_sen_acc_test]], columns = ['word_lvl_IMPROVED', 'sent_lvl_IMPROVED'],
                  index=['dev','test'])
df_results = pd.concat([df_combined, df_improved] , axis=1)
print(tabulate(df_results, headers='keys', tablefmt='psql'))

+------+-----------------+-----------------+--------------------------+--------------------------+----------------+----------------+---------------------+---------------------+
|      |   Word_lvl_MEMM |   sent_lvl_MEMM |   Word_lvl_simple_tagger |   sent_lvl_simple_tagger |   Word_lvl_HMM |   sent_lvl_HMM |   word_lvl_IMPROVED |   sent_lvl_IMPROVED |
|------+-----------------+-----------------+--------------------------+--------------------------+----------------+----------------+---------------------+---------------------|
| dev  |        0.877279 |        0.267681 |                 0.884342 |                 0.283796 |       0.857894 |       0.238138 |            0.934182 |            0.376007 |
| test |        0.855833 |        0.206204 |                 0.866244 |                 0.211679 |       0.833028 |       0.186131 |            0.929354 |            0.364051 |
+------+-----------------+-----------------+--------------------------+--------------------------+----------------+----------------+---------------------+---------------------+

In [22]:
file_name = 'part6.csv'
df_results.to_csv(file_name)
