## Name: Blake Pritchard
## Date: 2024-05-25


# Building and Evaluating a Hidden Markov Model and a Viterbi Algorithm in NLP

## Overview
This exercise aims to guide you through the process of building and evaluating a Hidden Markov Model (HMM) with a Viterbi algorithm in the field of Natural Language Processing (NLP). We will use the Brown corpus from the NLTK library, focusing on the categories 'news', 'editorial', and 'reviews' with a 'universal' tagset. The purpose is to provide practical experience in implementing these fundamental concepts in NLP and to understand their applications and limitations.



## Preparing the Environment

To set up our NLP environment, we'll first import the necessary libraries. We use NLTK for accessing linguistic data and algorithms, including the Brown corpus, and `train_test_split` from `sklearn.model_selection` for splitting data. The Brown Corpus was the first million-word electronic corpus of English, created in 1961 at Brown University.

[Brown Corpus](https://en.wikipedia.org/wiki/Brown_Corpus)

This corpus contains text from 500 sources, and the sources have been categorized by genre, such as news, editorial, and so on. We need to download the 'brown' corpus and the 'universal_tagset' using the `nltk.download()` command. Use `nltk.download('brown')` to get the corpus and `nltk.download('universal_tagset')` to obtain a simplified version of the part-of-speech tags.

This step ensures we have all necessary components for building and evaluating our models.



In [None]:
#!pip install dill

Collecting dill
  Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Installing collected packages: dill
Successfully installed dill-0.3.8


In [None]:
import nltk
import sklearn
import dill

In [None]:
nltk.download('brown')
nltk.download('universal_tagset')

[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Package brown is already up-to-date!
[nltk_data] Downloading package universal_tagset to /root/nltk_data...
[nltk_data]   Package universal_tagset is already up-to-date!


True

In [None]:
from nltk.corpus import brown

from sklearn.model_selection import train_test_split
from nltk.tag import HiddenMarkovModelTrainer


## Loading and Exploring the Data

Load the 'Brown' corpus, focusing on specific categories: 'news', 'editorial', and 'reviews'. We'll use the 'universal' tagset for a more generalizable analysis. Utilize `brown.tagged_sents(categories=['news', 'editorial', 'reviews'], tagset='universal')` to load the data.

The **universal** tagset is a simplified schema developed to facilitate the comparison of grammatical categories across different languages. This tagset includes categories like:

    NOUN (noun)
    VERB (verb)
    ADJ (adjective)
    ADV (adverb)
    PRON (pronoun)
    DET (determiner, includes articles and quantifiers)
    ADP (adposition, includes prepositions and postpositions)
    NUM (numeral)
    CONJ (conjunction)
    PRT (particle, includes small function words like 'to' that are not clearly categorized under the above)
    . (punctuation)
    X (other category, including undefined and erroneous cases)

`tagged_sents()` returns a list of sentences, where each sentence is itself a list of (word, tag) pairs. Each pair consists of a word from the sentence and its corresponding part-of-speech tag.

Once loaded, we encourage you to perform an exploratory analysis of the dataset to understand its structure and the nature of the tagged sentences.
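One possible starting point for the exploratory analysis is sketched below. It uses only the standard library and a hypothetical two-sentence sample (`toy_tagged_sents`) in the same shape that `tagged_sents()` returns; the helper name `summarize` is an assumption for illustration, not part of NLTK. Passing `brown_tagged_sents` instead of the toy sample should give the corresponding statistics for the real data.

```python
from collections import Counter

# Hypothetical toy sample with the same structure as brown.tagged_sents(...):
# a list of sentences, each a list of (word, tag) pairs
toy_tagged_sents = [
    [("The", "DET"), ("jury", "NOUN"), ("said", "VERB"), (".", ".")],
    [("It", "PRON"), ("was", "VERB"), ("a", "DET"), ("fair", "ADJ"),
     ("election", "NOUN"), (".", ".")],
]

def summarize(tagged_sents):
    """Return basic statistics for a list of tagged sentences."""
    # Count how often each tag occurs across all sentences
    tag_counts = Counter(tag for sent in tagged_sents for _, tag in sent)
    n_tokens = sum(tag_counts.values())
    return {
        "sentences": len(tagged_sents),
        "tokens": n_tokens,
        "avg_sentence_length": n_tokens / len(tagged_sents),
        "tag_counts": tag_counts,
    }

stats = summarize(toy_tagged_sents)
print(stats["sentences"], stats["tokens"], stats["avg_sentence_length"])
print(stats["tag_counts"].most_common(3))
```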

In [None]:
brown_tagged_sents = brown.tagged_sents(categories=['news', 'editorial', 'reviews'], tagset='universal')


## Data Preprocessing

For our Hidden Markov Model, it's essential to preprocess the data to ensure consistency and effectiveness. A key preprocessing step is lowercasing the words in our dataset. This step will help in reducing the complexity of the model by treating words with different cases as the same word. Apply lowercasing to each word in the (word, tag) tuples in our dataset.

Be sure to keep **the same data structure** after lowercasing.



In [None]:
brown_lower = [[(word.lower(), tag) for word, tag in line] for line in brown_tagged_sents]

## Split Train and Test

Using `train_test_split` from sklearn, split the lowercased dataset from the previous step into train and test sets. Choose the train/test size and decide whether to set `random_state` so that the split is reproducible.



In [None]:
train_data, test_data = train_test_split(brown_lower, test_size=0.2, random_state=42)


## Training the Hidden Markov Model with Viterbi

Construct a dictionary of words, in the form of a Python list, that includes every unique word found in the training set. Follow the same process for the tag set. Then, consider this important question: Why is it not advisable to build the dictionary/tag set using all words from both the train and test datasets? Keep in mind the concept of [Data Leakage](https://towardsdatascience.com/data-leakage-in-machine-learning-how-it-can-be-detected-and-minimize-the-risk-8ef4e3a97562) while contemplating this.
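As a small aid for thinking about this question, the sketch below measures how many test tokens are missing from a vocabulary built only on the training set. The split (`toy_train`, `toy_test`) is a hypothetical toy example, not the Brown data, and the variable names are assumptions chosen for illustration.

```python
# Hypothetical toy split with the same shape as train_data / test_data
toy_train = [[("the", "DET"), ("jury", "NOUN"), ("said", "VERB")]]
toy_test = [[("the", "DET"), ("panel", "NOUN"), ("said", "VERB")]]

# Vocabulary built from the training set only
train_vocab = {word for sent in toy_train for word, _ in sent}

# Test tokens the model has never seen (out-of-vocabulary)
test_tokens = [word for sent in toy_test for word, _ in sent]
oov_tokens = [w for w in test_tokens if w not in train_vocab]
oov_rate = len(oov_tokens) / len(test_tokens)

print(oov_tokens, round(oov_rate, 2))
```

Running the same measurement on `train_data` and `test_data` shows how often the models will face words they have no emission counts for.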


In [None]:
# Create a list of words in the training set
words = []
for observation in train_data:
    for word, tag in observation:
        words.append(word)
words = list(set(words))

# Create a list of tags in the training set
tags = []
for state in train_data:
    for word, tag in state:
        tags.append(tag)
tags = list(set(tags))

For building our Hidden Markov Model (HMM), we will utilize the [`HiddenMarkovModelTrainer`](https://tedboy.github.io/nlps/generated/generated/nltk.HiddenMarkovModelTrainer.html) class from NLTK.  

Calling `train_supervised` on this trainer returns a `HiddenMarkovModelTagger`, which provides [**both the HMM and the Viterbi algorithm**](https://www.nltk.org/api/nltk.tag.hmm.html). The Viterbi algorithm is used here to determine the most likely sequence of tags (states) for a given sequence of words (observations), based on the probabilities learned by the HMM. `HiddenMarkovModelTrainer` must be imported from `nltk.tag` for this purpose.
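To make the Viterbi step concrete, here is a minimal pure-Python sketch of the algorithm in log-space. It is not NLTK's implementation; the function name `viterbi` and all probabilities in the toy model are hypothetical values chosen for illustration.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for `observations`."""
    def log(p):
        # log(0) is treated as -inf so impossible paths never win
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: best log-probability of any path ending in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s].get(observations[0], 0))
             for s in states}]
    back = [{}]  # back[t][s]: previous state on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s].get(observations[t], 0))
            back[t][s] = prev

    # Trace back from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical toy HMM: three tags and a three-word vocabulary
toy_states = ["DET", "NOUN", "VERB"]
toy_start = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
toy_trans = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
toy_emit = {
    "DET": {"the": 1.0},
    "NOUN": {"dog": 0.6, "barks": 0.4},
    "VERB": {"barks": 0.7, "dog": 0.3},
}

toy_path = viterbi(["the", "dog", "barks"], toy_states, toy_start, toy_trans, toy_emit)
print(toy_path)  # ['DET', 'NOUN', 'VERB']
```

Working in log-space avoids numerical underflow on long sentences, since products of many small probabilities become sums.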

Create an object from the `HiddenMarkovModelTrainer` using the tag set list and the dictionary list created before. This object will be used to train our HMM models.

For this assignment, we will train five (yes, five!) different HMM models:

- A 'pure' HMM without smoothing (For more about Smoothing, read chapter three of this [thesis](https://digitalscholarship.unlv.edu/cgi/viewcontent.cgi?article=2008&context=thesesdissertations#:~:text=Smoothing%20techniques%20in%20HMM%20will,to%20produce%20more%20accurate%20probabilities.))

    Train a model using the HiddenMarkovModelTrainer object (train_supervised method), passing only the 'train' dataset
    
    
- HMM with [LidstoneProbDist](https://en.wikipedia.org/wiki/Additive_smoothing) smoothing and gamma = 0.01

     `LidstoneProbDist` implements Lidstone (additive) smoothing, a generalization of Laplace (add-one) smoothing. The gamma parameter is the smoothing factor: it is added to the count of every outcome, so it is a non-negative number and larger values smooth more aggressively. A gamma of 1 corresponds to Laplace smoothing (add-one), while values between 0 and 1 apply lighter smoothing.
     
     Train a model using the HiddenMarkovModelTrainer object (train_supervised method), passing the 'train' dataset and the supplied function `lidstone_prob_dist_001`
     
     
- HMM with [LidstoneProbDist](https://en.wikipedia.org/wiki/Additive_smoothing) smoothing and gamma = 0.1
    
    Same as above, but with different gamma value.

    Train a model using the HiddenMarkovModelTrainer object (train_supervised method), passing the 'train' dataset and the supplied function `lidstone_prob_dist_01`
    
    
- HMM with [MLEProbDist (Maximum Likelihood Estimation)](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation)

    The basic idea of MLE is to choose the parameters of a model in such a way that the likelihood (probability) of the observed data is maximized. In other words, MLE seeks the parameter values that make the observed data most probable.
     
    Train a model using the HiddenMarkovModelTrainer object (train_supervised method), passing the 'train' dataset and the supplied function `MLE_ProbDist`
    
     
- HMM with [ELEProbDist (Expected Likelihood Estimation)](https://machinelearningmastery.com/what-is-maximum-likelihood-estimation-in-machine-learning/)

    This method is a form of statistical smoothing in the same additive family as LidstoneProbDist and LaplaceProbDist: it adds 0.5 to the count of every outcome, which makes it equivalent to Lidstone smoothing with gamma = 0.5.

    The idea behind ELE smoothing is to adjust probabilities in a way that balances accuracy in modeling frequently occurring events with the capability to handle rare or unobserved events. In simple terms, ELE smoothing attempts to estimate the probability of future events based on observed frequency, making adjustments to ensure that unobserved events are not given a probability of zero.
     
    Train a model using the HiddenMarkovModelTrainer object (train_supervised method), passing the 'train' dataset and the supplied function `ELE_ProbDist`


Feel free to try any other approach besides those.
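The estimators listed above can be compared with the additive-smoothing formula (count + gamma) / (N + bins * gamma). The sketch below is a plain-Python illustration of that formula, not NLTK's code; the helper `lidstone_prob`, the counts, and the `bins` value are hypothetical. With gamma = 0 the formula reduces to the maximum likelihood estimate, with gamma = 0.5 to the expected likelihood estimate, and with gamma = 1 to Laplace smoothing.

```python
from collections import Counter

def lidstone_prob(counts, sample, gamma, bins):
    """Additive-smoothing estimate: (count + gamma) / (N + bins * gamma)."""
    total = sum(counts.values())
    return (counts[sample] + gamma) / (total + bins * gamma)

# Hypothetical emission counts for a single tag
counts = Counter({"dog": 3, "cat": 1})
bins = 4  # assumed number of possible outcomes: two seen words plus two unseen

estimators = [
    ("MLE", 0.0),
    ("Lidstone 0.01", 0.01),
    ("Lidstone 0.1", 0.1),
    ("ELE", 0.5),
    ("Laplace", 1.0),
]
for name, gamma in estimators:
    seen = lidstone_prob(counts, "dog", gamma, bins)
    unseen = lidstone_prob(counts, "fish", gamma, bins)
    print(f"{name:<14s} P(dog)={seen:.4f}  P(fish)={unseen:.4f}")
```

Note how the unseen word gets probability zero only under MLE; every positive gamma moves some probability mass from seen to unseen outcomes.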

In [None]:
from nltk.probability import LidstoneProbDist, MLEProbDist, ELEProbDist

def lidstone_prob_dist_001(fd, bins):
    return LidstoneProbDist(fd, 0.01)

def lidstone_prob_dist_01(fd, bins):
    return LidstoneProbDist(fd, 0.1)

def MLE_ProbDist(fd, bins):
    return MLEProbDist(fd)

def ELE_ProbDist(fd, bins):
    return ELEProbDist(fd)

In [None]:
# Importing necessary modules from NLTK for HMM training and probability distributions
from nltk.tag import HiddenMarkovModelTrainer
from nltk.probability import LidstoneProbDist, MLEProbDist, ELEProbDist

# Create a HiddenMarkovModelTrainer object
hmm_trainer = HiddenMarkovModelTrainer(states=tags, symbols=words)

In [None]:
# Train the model using only the training set
pure_hmm = hmm_trainer.train_supervised(train_data)

In [None]:
hmm_001 = hmm_trainer.train_supervised(train_data, estimator=lidstone_prob_dist_001)

In [None]:
hmm_01 = hmm_trainer.train_supervised(train_data, estimator=lidstone_prob_dist_01)

In [None]:
hmm_mle = hmm_trainer.train_supervised(train_data, estimator=MLE_ProbDist)

In [None]:
hmm_ele = hmm_trainer.train_supervised(train_data, estimator=ELE_ProbDist)

## Applying the HMM with Viterbi Algorithm

Predict the tags from the 'test' dataset using **each of the models created before**.

Use the `best_path` (docs [here](https://www.nltk.org/api/nltk.tag.hmm.html#nltk.tag.hmm.HiddenMarkovModelTagger.best_path)) function from the model. This function is used to predict the most likely sequence of tags for the given sequence of words. The `best_path` takes an unlabelled (without tags) sentence and returns a sequence of predicted tags.

Make sure to 'break' the 'test' dataset and use only the sentence (without tags) part.

In [None]:
test_sentences = [[word for word, tag in sentence] for sentence in test_data]
test_correct_tags = [[tag for word, tag in sentence] for sentence in test_data]

In [None]:
predicted_tags_pure_hmm = [pure_hmm.best_path(test_sentence) for test_sentence in test_sentences]

  O[i, k] = self._output_logprob(si, self._symbols[k])
  X[i, j] = self._transitions[si].logprob(self._states[j])


In [None]:
predicted_tags_hmm_001 = [hmm_001.best_path(test_sentence) for test_sentence in test_sentences]

In [None]:
predicted_tags_hmm_01 = [hmm_01.best_path(test_sentence) for test_sentence in test_sentences]

In [None]:
predicted_tags_hmm_mle = [hmm_mle.best_path(test_sentence) for test_sentence in test_sentences]

  O[i, k] = self._output_logprob(si, self._symbols[k])
  X[i, j] = self._transitions[si].logprob(self._states[j])

In [None]:
predicted_tags_hmm_ele = [hmm_ele.best_path(test_sentence) for test_sentence in test_sentences]


## Model Evaluation

It's essential to evaluate the performance of our HMM equipped with the Viterbi algorithm to gauge how effectively it handles unseen data. This involves comparing the tags predicted by our model on the test dataset against the actual tags.

* Utilize the provided `printConlleval` function to compute Precision, Recall, and F1-score for each tag and the overall model.

* Utilize the provided `printConfusionMatrix` function to print a confusion matrix, which shows the types of errors the model makes and helps in evaluating the accuracy of its predictions.

**IMPORTANT:** Both functions require a list of sentences, where each sentence is itself a list of (word, tag) pairs. Example:

    'labels_predicted' and 'labels_correct' format:
    
    [[('conservation', 'NOUN'), ('plan', 'NOUN')],
     [('pirate', 'NOUN'),
      ('manager', 'NOUN'),
      ('danny', 'NOUN'),
      ('murtaugh', 'NOUN'),
      ('said', 'VERB'),
      ('he', 'PRON'),
      ("hadn't", 'VERB'),
      ('decided', 'VERB'),
      .
      .
      .

After running the evaluations, print the conlleval results and Confusion Matrix **for each model** and address the following questions:

- Which model showed the best performance?
- Was there a noticeable difference between using LidstoneProbDist with gamma set to 0.01 and 0.1?
- How did the 'pure' HMM fare in terms of performance?
- Was there any significant difference between the 'pure' HMM and the MLE-based model? What can you infer by comparing these two models?
- Which of the models performed better with the 'X' tag?
- Which other models have you trained? How did they perform?

Initiate a conversation with your peers and GAs on the Brightspace forum about these topics, while ensuring you don't give away answers or code.
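For readers who want to see what per-tag Precision, Recall, and F1-score mean on this input format, the following standard-library sketch computes them on a hypothetical three-token example. It is not the provided `printConlleval` function; the helper name `per_tag_scores` is an assumption, and unlike the provided function (which passes `zero_division=1` to scikit-learn) it reports 0.0 when a denominator is zero.

```python
from collections import Counter

def per_tag_scores(labels_predicted, labels_correct):
    """Return {tag: (precision, recall, f1)} from two lists of tagged sentences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for pred_sent, true_sent in zip(labels_predicted, labels_correct):
        for (_, p_tag), (_, t_tag) in zip(pred_sent, true_sent):
            if p_tag == t_tag:
                tp[t_tag] += 1
            else:
                fp[p_tag] += 1  # p_tag was predicted but is wrong
                fn[t_tag] += 1  # the true tag t_tag was missed
    scores = {}
    for tag in set(tp) | set(fp) | set(fn):
        predicted_total = tp[tag] + fp[tag]
        actual_total = tp[tag] + fn[tag]
        precision = tp[tag] / predicted_total if predicted_total else 0.0
        recall = tp[tag] / actual_total if actual_total else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[tag] = (precision, recall, f1)
    return scores

# Hypothetical example: the last token is mis-tagged as NOUN
toy_correct = [[("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")]]
toy_predicted = [[("the", "DET"), ("dog", "NOUN"), ("barks", "NOUN")]]

toy_scores = per_tag_scores(toy_predicted, toy_correct)
for tag in sorted(toy_scores):
    p, r, f = toy_scores[tag]
    print(f"{tag:<6s} {p:.2f} {r:.2f} {f:.2f}")
```

In this example NOUN has perfect recall but only 0.50 precision, because one of its two predictions is wrong; the same pattern (low precision with high recall) is worth looking for in the reports below.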


In [None]:
from nltk.metrics import ConfusionMatrix
import itertools

def printConfusionMatrix(labels_predicted, labels_correct):
    actual_tags = list(itertools.chain(*[[tag for word, tag in sent] for sent in labels_correct]))
    predicted_tags = list(itertools.chain(*[[tag for word, tag in sent] for sent in labels_predicted]))
    conf_matrix = ConfusionMatrix(actual_tags, predicted_tags)
    print(conf_matrix)


In [None]:
# Importing necessary libraries for model evaluation and metrics calculation
from sklearn.metrics import classification_report, precision_score, recall_score, f1_score
from sklearn.preprocessing import LabelBinarizer
from itertools import chain
import numpy as np

# Defining the conlleval function for evaluating NLP models
def printConlleval(labels_predicted, labels_correct):
    lb = LabelBinarizer() # Initializing the LabelBinarizer for handling label encoding

    # Flattening the list of labels for correct and predicted
    labels_correct_flattened = [(word, tag) for sent in labels_correct for word, tag in sent]
    labels_predicted_flattened = [(word, tag) for sent in labels_predicted for word, tag in list(sent)]

    # Transforming the labels into a binary format for evaluation
    y_true_combined = lb.fit_transform([tag for _, tag in labels_correct_flattened])
    y_pred_combined = lb.transform([tag for _, tag in labels_predicted_flattened])

    tagset = set(lb.classes_)
    tagset = sorted(tagset, key=lambda tag: tag.split('-', 1)[::-1])
    class_indices = {cls: idx for idx, cls in enumerate(lb.classes_)}

    num_sentences = len(labels_predicted)
    total_tokens = sum(len(s) for s in labels_predicted)

    num_correct_sentences, total_correct_tokens = 0, 0
    for pred, true in zip(labels_predicted, labels_correct):
        if len(pred) == len(true):
            correct_tokens = sum(p == t for p, t in zip(pred, true))
            total_correct_tokens += correct_tokens
            if correct_tokens == len(pred):
                num_correct_sentences += 1

    correct_sentences_percentage = num_correct_sentences / num_sentences * 100
    total_correct_tokens_percentage = total_correct_tokens / total_tokens * 100

    classification_report_dict = classification_report(
        y_true_combined,
        y_pred_combined,
        labels=[class_indices[cls] for cls in tagset],
        target_names=tagset,
        output_dict=True,
        zero_division=1
    )


    classification_report_dict.pop('macro avg', None)
    classification_report_dict.pop('weighted avg', None)
    classification_report_dict.pop('samples avg', None)
    classification_report_dict.pop('micro avg', None)

    total_precision = precision_score(y_true_combined, y_pred_combined, average='weighted', zero_division=1)
    total_recall = recall_score(y_true_combined, y_pred_combined, average='weighted', zero_division=1)
    total_f1 = f1_score(y_true_combined, y_pred_combined, average='weighted', zero_division=1)
    total_line = f"{'Total':<15s} {total_precision:<10.2f} {total_recall:<10.2f} {total_f1:<10.2f}"

    report_lines = [f"{k:<15s} {classification_report_dict[k]['precision']:<10.2f} {classification_report_dict[k]['recall']:<10.2f} {classification_report_dict[k]['f1-score']:<10.2f}" for k in classification_report_dict if isinstance(classification_report_dict[k], dict)]
    report_lines.insert(0, "\n")
    report_lines.insert(1, f"{'TAG':<15s} {'Precision':<10s} {'Recall':<10s} {'F1-score':<10s}\n")
    report_lines.insert(2, total_line)
    report_lines.insert(3, '-'*50 + '\n')
    classification_report_str = "\n".join(report_lines)

    additional_info_str = ''
    additional_info_str += f'Total tokens: {total_tokens}\n'
    additional_info_str += f'Total correct tokens: {total_correct_tokens} ({total_correct_tokens_percentage:.2f}%)\n'
    additional_info_str += f'Processed sentences: {num_sentences}\n'
    additional_info_str += f'Completely correct sentences: {num_correct_sentences} ({correct_sentences_percentage:.2f}%)\n'

    print(additional_info_str + classification_report_str)

In [None]:
labels_predicted_pure_hmm = [[(sample, label) for sample, label in zip(sample_sentence, label_sentence)] for sample_sentence, label_sentence in zip(test_sentences, predicted_tags_pure_hmm)]

printConlleval(labels_predicted_pure_hmm, test_data)
printConfusionMatrix(labels_predicted_pure_hmm, test_data)

Total tokens: 40434
Total correct tokens: 21330 (52.75%)
Processed sentences: 1875
Completely correct sentences: 489 (26.08%)


TAG             Precision  Recall     F1-score  

Total           0.94       0.53       0.65      
--------------------------------------------------

.               1.00       0.45       0.62      
ADJ             0.93       0.47       0.62      
ADP             0.97       0.51       0.67      
ADV             0.89       0.54       0.67      
CONJ            0.06       1.00       0.11      
DET             1.00       0.59       0.74      
NOUN            0.97       0.47       0.64      
NUM             0.98       0.53       0.69      
PRON            0.96       0.67       0.79      
PRT             0.89       0.53       0.67      
VERB            0.98       0.55       0.71      
X               1.00       0.28       0.43      
[confusion matrix output truncated]

In [None]:
labels_predicted_hmm_001 = [[(sample, label) for sample, label in zip(sample_sentence, label_sentence)] for sample_sentence, label_sentence in zip(test_sentences, predicted_tags_hmm_001)]

printConlleval(labels_predicted_hmm_001, test_data)
printConfusionMatrix(labels_predicted_hmm_001, test_data)

Total tokens: 40434
Total correct tokens: 37268 (92.17%)
Processed sentences: 1875
Completely correct sentences: 585 (31.20%)


TAG             Precision  Recall     F1-score  

Total           0.94       0.92       0.93      
--------------------------------------------------

.               0.96       0.99       0.98      
ADJ             0.88       0.85       0.87      
ADP             0.95       0.97       0.96      
ADV             0.81       0.86       0.84      
CONJ            0.88       1.00       0.93      
DET             0.96       0.99       0.97      
NOUN            0.97       0.87       0.92      
NUM             0.78       0.89       0.83      
PRON            0.91       0.98       0.95      
PRT             0.85       0.90       0.87      
VERB            0.97       0.90       0.93      
X               0.03       0.83       0.06      
[confusion matrix output truncated]

In [None]:
labels_predicted_hmm_01 = [[(sample, label) for sample, label in zip(sample_sentence, label_sentence)] for sample_sentence, label_sentence in zip(test_sentences, predicted_tags_hmm_01)]

printConlleval(labels_predicted_hmm_01, test_data)
printConfusionMatrix(labels_predicted_hmm_01, test_data)

Total tokens: 40434
Total correct tokens: 34957 (86.45%)
Processed sentences: 1875
Completely correct sentences: 516 (27.52%)


TAG             Precision  Recall     F1-score  

Total           0.95       0.86       0.90      
--------------------------------------------------

.               0.98       0.97       0.98      
ADJ             0.89       0.77       0.83      
ADP             0.95       0.94       0.95      
ADV             0.82       0.81       0.82      
CONJ            0.94       0.95       0.94      
DET             0.97       0.98       0.97      
NOUN            0.97       0.76       0.85      
NUM             0.79       0.82       0.80      
PRON            0.91       0.96       0.94      
PRT             0.86       0.87       0.87      
VERB            0.97       0.85       0.91      
X               0.01       0.93       0.02      
[confusion matrix output truncated]

In [None]:
labels_predicted_hmm_mle = [[(sample, label) for sample, label in zip(sample_sentence, label_sentence)] for sample_sentence, label_sentence in zip(test_sentences, predicted_tags_hmm_mle)]

printConlleval(labels_predicted_hmm_mle, test_data)
printConfusionMatrix(labels_predicted_hmm_mle, test_data)

Total tokens: 40434
Total correct tokens: 21330 (52.75%)
Processed sentences: 1875
Completely correct sentences: 489 (26.08%)


TAG             Precision  Recall     F1-score  

Total           0.94       0.53       0.65      
--------------------------------------------------

.               1.00       0.45       0.62      
ADJ             0.93       0.47       0.62      
ADP             0.97       0.51       0.67      
ADV             0.89       0.54       0.67      
CONJ            0.06       1.00       0.11      
DET             1.00       0.59       0.74      
NOUN            0.97       0.47       0.64      
NUM             0.98       0.53       0.69      
PRON            0.96       0.67       0.79      
PRT             0.89       0.53       0.67      
VERB            0.98       0.55       0.71      
X               1.00       0.28       0.43      
[confusion matrix output truncated]

In [None]:
labels_predicted_hmm_ele = [[(sample, label) for sample, label in zip(sample_sentence, label_sentence)] for sample_sentence, label_sentence in zip(test_sentences, predicted_tags_hmm_ele)]

printConlleval(labels_predicted_hmm_ele, test_data)
printConfusionMatrix(labels_predicted_hmm_ele, test_data)

Total tokens: 40434
Total correct tokens: 26037 (64.39%)
Processed sentences: 1875
Completely correct sentences: 262 (13.97%)


TAG             Precision  Recall     F1-score  

Total           0.95       0.64       0.75      
--------------------------------------------------

.               1.00       0.89       0.94      
ADJ             0.89       0.45       0.60      
ADP             0.95       0.74       0.83      
ADV             0.83       0.58       0.68      
CONJ            0.96       0.75       0.84      
DET             0.98       0.89       0.93      
NOUN            0.98       0.45       0.61      
NUM             0.80       0.48       0.60      
PRON            0.87       0.83       0.85      
PRT             0.88       0.72       0.79      
VERB            0.98       0.60       0.74      
X               0.00       0.97       0.00      
[confusion matrix output truncated]

## Choose your best model

Choose your best model and export it using the 'dill' library. You can locate the exported file in the same folder as your notebook. Make sure to submit it to Codegrade. And remember, don't change the final file name of the model - it should remain 'mybestmodel.dill'.

In [None]:
#from google.colab import drive
#drive.mount('/content/drive')
#drive_path = "/content/drive/MyDrive/School/Eastern University/DTSC/DTSC_685/Module_4/Assignment_3/Hidden_Markov_Model_Viterbi_NLP/"

Mounted at /content/drive


In [None]:
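# Serialize the chosen model with dill (imported at the top of the notebook)
mybestmodel = hmm_001

# drive_path exists only if the Colab cell above was uncommented and run;
# otherwise the file is saved in the same folder as the notebook
output_dir = globals().get('drive_path', '')
with open(output_dir + 'mybestmodel.dill', 'wb') as file:
    dill.dump(mybestmodel, file)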

This material is for enrolled students' academic use only and protected under U.S. Copyright Laws. This content must not be shared outside the confines of this course, in line with Eastern University's academic integrity policies. Unauthorized reproduction, distribution, or transmission of this material, including but not limited to posting on third-party platforms like GitHub, is strictly prohibited and may lead to disciplinary action. You may not alter or remove any copyright or other notice from copies of any content taken from BrightSpace or Eastern University’s website.

© Copyright Notice 2024, Eastern University - All Rights Reserved