# NLP Classification

In this assignment we look at several ways of classifying texts:
- Naive Bayes
- Logistic Regression
- Multinomial Regression

We also look at binary label classification problems (e.g., sentiment analysis) and multinomial classification problems (e.g., topic analysis).

We will use two datasets:
- [IMDb movie review sentiment](http://ai.stanford.edu/~amaas/data/sentiment/)
- [AG News topics](https://huggingface.co/datasets/ag_news)

**Tips:**
- Read all the code. We don't ask you to write the training loops, or evaluation loops, but it is often instructive to see how the models are trained and evaluated.
- If you have a model that is learning (loss is decreasing), but you want to increase accuracy, try using ``nn.Dropout`` layers just before the final linear layer to force the model to handle missing or unfamiliar data.

In [1]:
# start time - notebook execution
import time
start_nb = time.time()

# Set up

Import packages.

In [2]:
import nltk
import numpy as np
import os
import pandas as pd
import re
import torch
import torch.nn as nn
import torch.optim as optim

# Initialize the Autograder

In [3]:
# import the autograder tests
import hw2_tests as ag

# Functions for cleaning up raw texts and tokenizing the corpus

We perform text preprocessing that includes: removing HTML tags, making text lower case, stemming, and disposing of stopwords.
In the end, we will split the entire dataset into training, validation and test sets.
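
As a rough illustration of the cleaning steps described above, the sketch below strips HTML tags and punctuation, lower-cases the text, and drops stopwords. It skips stemming so that it needs only the standard library, and both `clean_and_split` and `demo_stopwords` are hypothetical names invented for this example.

```python
import re

def clean_and_split(line):
    """Strip HTML tags and punctuation, lower-case, and split on whitespace."""
    line = re.sub(r"<.*?>", "", line).strip()   # remove HTML tags such as <br />
    line = re.sub(r"[^a-zA-Z0-9]", " ", line)   # replace punctuation with spaces
    return line.lower().split()

# Tiny illustrative stopword set (hypothetical; the notebook uses a longer list)
demo_stopwords = {"this", "is", "a", "the"}

tokens = clean_and_split("This movie is <br />GREAT, truly a classic!")
kept = [t for t in tokens if t not in demo_stopwords]

print(tokens)  # ['this', 'movie', 'is', 'great', 'truly', 'a', 'classic']
print(kept)    # ['movie', 'great', 'truly', 'classic']
```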

In [4]:
# Stemming the text
def simple_stemmer(text):
    ps=nltk.porter.PorterStemmer()
    text= [ps.stem(word) for word in text]
    return text

In [5]:
stopwords_english = ["i", "me", "my", "myself", "we", "our", "ours", "ourselves", "you", "your", "yours", "yourself", "yourselves", "he", "him", "his", "himself", "she", "her", "hers", "herself", "it", "its", "itself", "they", "them", "their", "theirs", "themselves", "what", "which", "who", "whom", "this", "that", "these", "those", "am", "is", "are", "was", "were", "be", "been", "being", "have", "has", "had", "having", "do", "does", "did", "doing", "a", "an", "the", "and", "but", "if", "or", "because", "as", "until", "while", "of", "at", "by", "for", "with", "about", "against", "between", "into", "through", "during", "before", "after", "above", "below", "to", "from", "up", "down", "in", "out", "on", "off", "over", "under", "again", "further", "then", "once", "here", "there", "when", "where", "why", "how", "all", "any", "both", "each", "few", "more", "most", "other", "some", "such", "no", "nor", "not", "only", "own", "same", "so", "than", "too", "very", "s", "t", "can", "will", "just", "don", "should", "now"]
print(stopwords_english)

#removing the stopwords
def remove_stopwords(text, stopword_list):
    tokens = [token.strip() for token in text]
    filtered_tokens = [token for token in tokens if token.lower() not in stopword_list]
    return filtered_tokens


In [6]:
def tokenize_and_clean(line, stem_and_remove_stop_words = True):

    line = re.sub(r"<.*?>", "", line).strip() # remove all HTML tags
    line = re.sub(r'[^a-zA-Z0-9]', ' ', line) # remove punc
    line = line.lower().split()  # lower case
    if stem_and_remove_stop_words:
        line = remove_stopwords(line, stopwords_english)
        line = simple_stemmer(line)

    return line

# Download and unpack the sentiment data



We use the IMDb dataset for binary sentiment classification. It provides 25K highly polar reviews for training and 25K for testing
(each set contains an equal number of positive and negative examples).

Dataset folder structure is as follows:

dataset/ \
├── test/ \
│     ├── pos/ \
│     ├── neg/ \
├── train/ \
      ├── pos/ \
      └── neg/

In [7]:
# check if dataset is downloaded
if not os.path.isfile('aclImdb_v1.tar'):
    print("Downloading dataset...")
    !wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
    !gunzip aclImdb_v1.tar.gz
    !tar -xvf aclImdb_v1.tar

Load in the text from the folders.

In [8]:
def load_text_from_folders(path, file_list, dataset, samples = 2000, stem_and_remove_stop_words = True):
    """Read set of files from given directory and save returned lines to list.

    Parameters
    ----------
    path : str
        Absolute or relative path to given file (or set of files).
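    file_list: list
        List of file names to read.
    dataset: list
        List that the tokenized contents of each file are appended to.
    samples: int
        Maximum number of files to read.
    stem_and_remove_stop_words: bool
        Whether to remove stopwords and stem the remaining tokens.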
    """
    for i, file in enumerate(file_list):
        if i >= samples:
            break
        with open(os.path.join(path, file), 'r', encoding='utf8') as text:
            contents = text.read()
            contents_tokenized = tokenize_and_clean(contents, stem_and_remove_stop_words=stem_and_remove_stop_words)
            dataset.append(contents_tokenized)

# Creating training and test sets

This creates four arrays:


*   ```train_pos``` -- instances in the training set with positive sentiment labels
*   ```train_neg``` -- instances in the training set with negative sentiment labels
*   ```test_pos``` -- instances in the testing set with positive sentiment labels
*   ```test_neg``` -- instances in the testing set with negative sentiment labels





In [None]:
# Path to dataset location
path = 'aclImdb/'

# Create lists that will contain read lines
train_pos, train_neg, test_pos, test_neg = [], [], [], []

# Create a dictionary of paths and lists that store lines (key: value = path: list)
sets_dict = {'train/pos/': train_pos, 'train/neg/': train_neg,
             'test/pos/': test_pos, 'test/neg/': test_neg}

# Load the data
for dataset in sets_dict:
  file_list = [f for f in sorted(os.listdir(os.path.join(path, dataset))) if f.endswith('.txt')]
  load_text_from_folders(os.path.join(path, dataset), file_list, sets_dict[dataset])

Convert into Pandas dataframes. Pandas is a virtual spreadsheet with a programmatic API. A ```DataFrame``` is a spreadsheet. We will make a spreadsheet of training data and one for testing data and one with everything together.

In [None]:
# Concatenate training and testing examples into one dataset
TRAIN = pd.concat([pd.DataFrame({'review': train_pos, 'label':1}),
                     pd.DataFrame({'review': train_neg, 'label':0})],
                     axis=0, ignore_index=True)

TEST = pd.concat([pd.DataFrame({'review': test_pos, 'label':1}),
                    pd.DataFrame({'review': test_neg, 'label':0})],
                    axis=0, ignore_index=True)

ALL = pd.concat([TRAIN, TEST])

Look at the data.

This is a summary of the data. We see that it is balanced between the two labels.

In [None]:
TRAIN.label.value_counts()

This is the first few rows of the training set:

In [None]:
TRAIN.head()

# Creating a vocabulary file

Next, we have to build a vocabulary. This is effectively a look-up table where every unique word in your data set has a corresponding index (an integer).
We do this as our machine learning model cannot operate on strings, but only numbers. Each index is used to construct a one-hot vector for each word.
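
A minimal sketch of the idea, using only the standard library: count word frequencies, keep the most frequent words, assign each an integer index, and build a one-hot vector from that index. The toy `reviews`, `top_n`, and the `one_hot` helper are assumptions made for illustration and are not part of the `Vocab` class.

```python
from collections import Counter

# Two toy tokenized reviews (hypothetical data)
reviews = [["great", "movi", "great"], ["bad", "movi"]]

# Count how often each word occurs across all reviews
counts = Counter(word for review in reviews for word in review)

# Keep only the top_n most frequent words and give each an integer index
top_n = 2
word2index = {word: i for i, (word, _) in enumerate(counts.most_common(top_n))}

def one_hot(word):
    """Return a vector with a single 1 at the word's index."""
    vec = [0] * len(word2index)
    vec[word2index[word]] = 1
    return vec

print(word2index)       # {'great': 0, 'movi': 1}
print(one_hot("movi"))  # [0, 1]
```

Words outside the top `top_n` (here, `bad`) have no index, which is why the bag-of-words code has to skip unknown words.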

In [None]:
class Vocab:
    def __init__(self, name):
        self.name = name
        self._word2index = {}
        self._word2count = {}
        self._index2word = {}
        self._n_words = 0

    def get_words(self):
      return list(self._word2count.keys())

    def num_words(self):
      return self._n_words

    def word2index(self, word):
      return self._word2index[word]

    def index2word(self, word):
      return self._index2word[word]

    def word2count(self, word):
      return self._word2count[word]

    def add_sentence(self, sentence):
        for word in sentence.split(' '):
            self.add_word(word)

    def add_word(self, word):
        if word not in self._word2index:
            self._word2index[word] = self._n_words
            self._word2count[word] = 1
            self._index2word[self._n_words] = word
            self._n_words += 1
        else:
            self._word2count[word] += 1

Make a vocab object.

In [None]:
VOCAB = Vocab("imdb")
VOCAB_SIZE = 1000
NUM_LABELS = 2

Load the ```n``` most frequent words into the vocabulary. Do this by sorting by frequency and then truncating.

In [None]:
# Get word frequency counts
word_freq_dict = {}   # key = word, value = frequency
for review in ALL['review']:
  for word in review:
    if word in word_freq_dict:
      word_freq_dict[word] += 1
    else:
      word_freq_dict[word] = 1

# Get a list of (word, freq) tuples sorted by frequency
kv_list = []  # list of word-freq tuples so can sort
for (k,v) in word_freq_dict.items():
  kv_list.append((k,v))
sorted_kv_list = sorted(kv_list, key=lambda x: x[1], reverse=True)

# Load top n words in to vocab object
for word, freq in sorted_kv_list[:VOCAB_SIZE]:
  VOCAB.add_word(word)

# Naive Bayes
The Naive Bayes algorithm is based on Bayes' rule, which describes the probability of an event
based on prior knowledge of conditions that might be related to the event.

According to Bayes' theorem:


```Posterior = likelihood * prior / evidence```

or

```P(A|B) = P(B|A) * P(A)/P(B)```
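
To make the formula concrete, here is a small worked example with made-up numbers: suppose a word appears in 40% of positive reviews and 10% of negative reviews, and the two labels are equally likely. All probabilities below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Hypothetical probabilities for a single word
p_word_given_pos = 0.4   # P(word | positive)
p_word_given_neg = 0.1   # P(word | negative)
p_pos = 0.5              # P(positive)
p_neg = 0.5              # P(negative)

# Evidence: total probability of seeing the word under either label
p_word = p_word_given_pos * p_pos + p_word_given_neg * p_neg

p_pos_given_word = posterior(p_word_given_pos, p_pos, p_word)
print(round(p_word, 4))            # 0.25
print(round(p_pos_given_word, 4))  # 0.8
```

Seeing the word raises the probability of a positive label from 0.5 to 0.8 in this toy setting.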


Using word presence as features, create an array of features for each review. Each review will thus be an array of size ```len(vocab)``` where each index in the array is a token number and the value in that position is whether the token is present in the review. There will be ```num_rows``` arrays, making a ```num_rows x len(vocab)``` 2D array.

This function creates a bag of words. It returns a vector where each element is a count of the words in the sentence corresponding to the word index.
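
The difference between word counts and word presence can be sketched with plain Python lists. The `bow_counts` helper and the three-word `word2index` mapping are hypothetical stand-ins for the notebook's `make_bow` and `VOCAB`.

```python
# Hypothetical three-word vocabulary
word2index = {"great": 0, "movi": 1, "bad": 2}

def bow_counts(tokens, word2index):
    """Count how often each vocabulary word occurs; ignore unknown words."""
    vec = [0] * len(word2index)
    for token in tokens:
        idx = word2index.get(token)
        if idx is not None:
            vec[idx] += 1
    return vec

counts = bow_counts(["great", "great", "movi", "unseen"], word2index)
presence = [int(c > 0) for c in counts]   # 1 if the word occurs at all

print(counts)    # [2, 1, 0]
print(presence)  # [1, 1, 0]
```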

In [None]:
def make_bow(sentence):
    vec = torch.zeros(VOCAB_SIZE, dtype=torch.float64)
    for word in sentence:
        if word not in VOCAB.get_words():
            continue
        vec[VOCAB.word2index(word)] += 1
    return vec.view(1, -1)

Prepare the data. ```X_TRAIN``` is a 2D array of size ```num_reviews x vocab_size``` that contains the training data. Each row is a bag of words, except that each index contains a 1 or 0 based on word presence in the example rather than a count. Each row is a vector of features $\phi_1 ... \phi_{|V|}$ assumed to be independent, where $|V|$ is the size of the vocabulary. We don't need to know what the features are, only whether they are present in each example in the training set.

```X_TEST``` is the same as above but containing testing data.



In [None]:
# Vectorize text reviews to numbers
# Make empty vectors
X_TRAIN = np.zeros((len(TRAIN), VOCAB_SIZE))
X_TEST = np.zeros((len(TEST), VOCAB_SIZE))

# Load in frequency counts
for i, row in TRAIN.iterrows():
    X_TRAIN[i] = np.array(make_bow(row['review'])) > 0 # The > 0 converts to presence instead of counts

for i, row in TEST.iterrows():
    X_TEST[i] = np.array(make_bow(row['review'])) > 0 # The > 0 converts to presence instead of counts

# The labels
Y_TRAIN = np.array(TRAIN['label'])
Y_TEST = np.array(TEST['label'])

What you want to do is to compute probabilities over the training data and then apply those probabilities to the testing examples. Use the Bayes formula to compute $P_{\rm test}(L_{+}|\phi_{0:|V|})$ and $P_{\rm test}(L_{-}|\phi_{0:|V|})$ for each review. Classify examples based on whether one probability is higher than another. That is, $sign(P_{\rm test}(L_{+}|\phi_{0:|V|}) - P_{\rm test}(L_{-}|\phi_{0:|V|}))$ indicates a positive review when greater than 0 and a negative review when less than 0.

**Hint:** You do not need to implement any loops. Numpy indexing and slicing operations, along with built in functions like `.mean()`, `.sum()`, etc. will allow all operations to be performed on each row of the data in parallel.
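
The sketch below shows what loop-free NumPy code of this kind can look like on a toy presence matrix. It is one common formulation (Laplace-smoothed per-word probabilities plus a single log prior per class), not necessarily the one the autograder expects; the data and the `class_log_likelihoods` helper are invented for illustration.

```python
import numpy as np

# Toy presence matrix: 4 reviews x 3 vocabulary words (hypothetical data)
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 0],
              [0, 1, 1]], dtype=float)
y = np.array([1, 1, 0, 0])

def class_log_likelihoods(X, y, label):
    """Log of the smoothed probability that each word is present, given the label."""
    rows = X[y == label]                 # boolean mask selects one class, no loop
    # Laplace smoothing: add 1 to each count and 2 to the number of examples
    return np.log((rows.sum(axis=0) + 1) / (len(rows) + 2))

log_prior_pos = np.log((y == 1).mean())
log_prior_neg = np.log((y == 0).mean())
ll_pos = class_log_likelihoods(X, y, 1)
ll_neg = class_log_likelihoods(X, y, 0)

# Score a new review that contains only word 0
x_new = np.array([1.0, 0.0, 0.0])
score_pos = log_prior_pos + x_new @ ll_pos
score_neg = log_prior_neg + x_new @ ll_neg
prediction = int(score_pos > score_neg)
print(prediction)  # 1
```

Word 0 appears in both positive training reviews and in no negative ones, so the positive score wins.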

Step 1: Compute the positive label condition:
$P(L_{+}|\phi_{0:|V|}) = P(\phi_{0:|V|}|L_{+})P(L_{+}) / P(\phi_{0:|V|})$

In [None]:
def prob_pos_given_features(x_train, y_train):
  log_probs = np.array([0] * x_train.shape[1])
  ### BEGIN SOLUTION
  # Calculate the prior probability P(L_+)
  P_pos = np.sum(y_train) / len(y_train)

  # Calculate the likelihood P(phi|L_+)
  pos_counts = np.sum(x_train[y_train == 1], axis=0)
  total_pos = np.sum(y_train == 1)

  # Apply Laplace smoothing
  log_probs = np.log((pos_counts + 1) / (total_pos + 2))

  # Combine to get the log probability
  log_probs += np.log(P_pos)
  ### END SOLUTION
  return log_probs

Step 2: Compute the negative label condition:
$P(L_{-}|\phi_{0:|V|}) = P(\phi_{0:|V|}|L_{-})P(L_{-}) / P(\phi_{0:|V|})$

In [None]:
def prob_neg_given_features(x_train, y_train):
  log_probs = np.array([0] * x_train.shape[1])
  ### BEGIN SOLUTION
  # Calculate the prior probability P(L_-)
  P_neg = np.sum(y_train == 0) / len(y_train)

  # Calculate the likelihood P(phi|L_-)
  neg_counts = np.sum(x_train[y_train == 0], axis=0)
  total_neg = np.sum(y_train == 0)

  # Apply Laplace smoothing
  log_probs = np.log((neg_counts + 1) / (total_neg + 2))

  # Combine to get the log probability
  log_probs += np.log(P_neg)
  ### END SOLUTION
  return log_probs

In [None]:
pos_probs = prob_pos_given_features(X_TRAIN, Y_TRAIN)
neg_probs = prob_neg_given_features(X_TRAIN, Y_TRAIN)

Step 3: Make a label prediction. Subtract (in log scale) the negative score from the positive score. If the result is greater than zero, we predict the `+` label. If the result is less than zero, we predict the `-` label.
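
Working in log scale matters because multiplying many small probabilities underflows to zero in floating point, while summing their logarithms stays finite. The numbers below are arbitrary and serve only to demonstrate the effect.

```python
import math

# 100 hypothetical per-word probabilities, each quite small
probs = [1e-5] * 100

# Direct product: the true value (1e-500) is too small for a 64-bit float
product = 1.0
for p in probs:
    product *= p

# Sum of logs: same information, but numerically stable
log_sum = sum(math.log(p) for p in probs)

print(product)                 # 0.0
print(math.isfinite(log_sum))  # True
```

Because the logarithm is monotonic, comparing two log scores gives the same decision as comparing the original probabilities.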

In [None]:
def naive_bayes(x, pos_probs, neg_probs):
  label = 0
  ### BEGIN SOLUTION
  # Calculate log probabilities for positive and negative
  log_pos = np.sum(pos_probs * x)  # P(phi|L_+) for the features
  log_neg = np.sum(neg_probs * x)  # P(phi|L_-) for the features

  # Compare the log probabilities
  label = 1 if log_pos > log_neg else 0
  ### END SOLUTION
  return label

# Naive Bayes Test (20 Points)

In [None]:
# student check - accuracies >= 78% will receive full credit (no credit for less than 78%)
ag.test_naive_bayes(X_TRAIN, Y_TRAIN, X_TEST, Y_TEST)

# Logistic Regression - Part 1

We will be using a neural network to perform logistic regression. We will use word counts as the input feature vector.


Reload the data, but use word counts instead of word presence.

In [None]:
# Randomize the data
TRAIN = TRAIN.sample(frac=1).reset_index(drop=True)
TEST = TEST.sample(frac=1).reset_index(drop=True)

# Vectorize text reviews to numbers
X_TRAIN = np.zeros((len(TRAIN), VOCAB_SIZE))
X_TEST = np.zeros((len(TEST), VOCAB_SIZE))

for i, row in TRAIN.iterrows():
  X_TRAIN[i] = np.array(make_bow(row['review']))

for i, row in TEST.iterrows():
  X_TEST[i] = np.array(make_bow(row['review']))

Y_TRAIN = np.array(TRAIN['label'])
Y_TEST = np.array(TEST['label'])

Make a logistic classifier torch neural network.

Complete the constructor and forward function. The net can have an arbitrary number of outputs, but binary logistic regression needs only one: the single output neuron takes a value between 0 and 1, with 0 meaning negative sentiment and 1 meaning positive sentiment. There should only be as many parameters as ```num_features x (num_labels-1)``` in binary logistic regression and ```num_features x num_labels``` in multinomial logistic regression.

The input will be a bag-of-words count vector of size `vocab_size`.
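
Conceptually, binary logistic regression is a weighted sum of the input features passed through a sigmoid. The plain-Python sketch below illustrates this with made-up weights; `sigmoid` and `predict_proba` are hypothetical helpers, and the exercise itself is meant to be solved with `torch.nn` layers rather than this code.

```python
import math

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Weighted sum of the features followed by a sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical weights for a three-word vocabulary: one weight per feature
weights = [0.8, -1.2, 0.1]
bias = 0.0
features = [2, 0, 1]        # word counts for one review

p = predict_proba(weights, bias, features)
label = int(p >= 0.5)       # 1 = positive sentiment, 0 = negative
print(label)                # 1
```

With a single output there is one weight per vocabulary word, which matches the ```num_features x (num_labels-1)``` count for two labels.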

In [None]:
# Defining neural network structure
class BoWClassifier(nn.Module):  # inheriting from nn.Module!

  def __init__(self, num_labels, vocab_size):
    super(BoWClassifier, self).__init__()

    # BEGIN SOLUTION
    self.fc = nn.Linear(vocab_size, num_labels)  # Fully connected layer
    # END SOLUTION

  def forward(self, bow_vec):
    # BEGIN SOLUTION
    out = self.fc(bow_vec)  # Pass the input through the fully connected layer
    out = torch.sigmoid(out)
    # END SOLUTION
    return out

In [None]:
# Initialize the model
# Use one label because the head can signify a 1 or 0 because of the sigmoid.
bow_nn_model = BoWClassifier(NUM_LABELS-1, VOCAB_SIZE)

This function should return two tensors. The first, containing training data, should be of size ```batch_size x vocab_size``` for the ```i```th batch. The second should be a tensor of labels of size ```batch_size```. Both tensors should be of type ```dtype=torch.float```.
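
The indexing behind batching can be sketched with an ordinary list: batch ```i``` covers positions ```i * batch_size``` up to, but not including, ```(i + 1) * batch_size```. The `batch_bounds` helper and the toy data are assumptions for illustration only.

```python
def batch_bounds(i, batch_size):
    """Return the (start, stop) slice positions of the i-th batch."""
    return i * batch_size, (i + 1) * batch_size

data = list(range(10))      # stand-in for 10 training examples
batch_size = 4

start, stop = batch_bounds(1, batch_size)
print(data[start:stop])     # [4, 5, 6, 7]

# Floor division gives the number of full batches; leftovers are skipped
n_batches = len(data) // batch_size
print(n_batches)            # 2
```

The same slice applied to the feature array and to the label array keeps examples and labels aligned.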

In [None]:
def get_batch(i, batch_size, x_data, y_data):
  # Make some empty tensors
  x = None
  y = None
  ### BEGIN SOLUTION
  x = torch.tensor(x_data[i * batch_size:(i + 1) * batch_size], dtype=torch.float)  # Training data
  y = torch.tensor(y_data[i * batch_size:(i + 1) * batch_size], dtype=torch.float)  # Labels
  ### END SOLUTION
  return x, y

# Logistic Regression - Part 1 Test (20 Points)

In [None]:
# student check
ag.test_batch_output_shape(get_batch, X_TRAIN, Y_TRAIN, VOCAB_SIZE)

In [None]:
# student check - your model must have the expected number of layers to receive full credit, no credit otherwise
ag.check_bow_architecture(bow_nn_model)

In [None]:
# student check
ag.test_forward_pass_shape(X_TRAIN, Y_TRAIN, bow_nn_model)

# Logistic Regression - Part 2

Create a dataset as an array of (X_train, label).

Complete ```get_batch(i)``` and set ```batch_size``` and ```num_epochs```.

The training loop will call ```get_batch()``` with the iteration number and handle everything else.


In [None]:
# Train the model
def train(model, train_data, test_data, epochs, batch_size):
  # Note: `train_data` holds the features and `test_data` holds the matching labels.
  print(f"X_TRAIN shape: {train_data.shape}")
  print(f"Y_TRAIN shape: {test_data.shape}")
  n_iter = len(train_data) // batch_size
  print(n_iter, 'batches per epoch')
  # Loss Function
  loss_function = nn.BCELoss()
  # Optimizer initialization (use the model passed in, not the global one)
  optimizer = optim.SGD(model.parameters(), lr=0.1)

  for epoch in range(epochs):
    print(f"Starting epoch {epoch + 1}/{epochs}")
    # Make BOW vector for input features and target label
    for i in range(n_iter):
      x, y = get_batch(i, batch_size, train_data, test_data)

      # Step 3. Run the forward pass.
      y_hat = model(x)
      y_hat = y_hat.reshape(-1)

      # Step 4. Compute the loss and gradients, then update the parameters with optimizer.step()
      loss = loss_function(y_hat,y)
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()

      if (epoch+1)%10 == 0 and (i+1) == n_iter:
        print('epoch:', epoch+1,',loss =',loss.item(), ', training accuracy =',(torch.round(y_hat)==y).float().mean())
  return model

In [None]:
# It's ok to modify this cell.
BATCH_SIZE = 5
N_EPOCHS = 5

In [None]:
try:
    bow_nn_model = train(bow_nn_model, X_TRAIN, Y_TRAIN, N_EPOCHS, BATCH_SIZE)
except Exception as err:  # report the error instead of hiding it
    print("Training failed. Please check your code.", err)

# Logistic Regression - Part 2 Test (20 Points)

In [None]:
# student check - accuracies >= 78% will receive full credit (no credit for less than 78%)
ag.test_model_accuracy_lr(TEST, bow_nn_model)

# Multinomial Regression

Load data.

In [None]:
!pip install datasets

In [None]:
from datasets import load_dataset

Unlike earlier, we will use a pre-trained set of embeddings called [GloVe](https://nlp.stanford.edu/projects/glove/). The version we use replaces every word with a 100-dimensional vector of floating point values. The advantage of this is that words with similar semantic meanings will have similar vectors. This is important because the vocabulary size of the embeddings we will use is 400,000, which would make one-hot vectors impractically large.
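
One common way to measure whether two word vectors are "similar" is cosine similarity. The three-dimensional vectors below are invented for illustration and are not real GloVe values; the `cosine` helper is likewise a hypothetical name.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up 3-dimensional "embeddings" (real GloVe vectors here have 100 dimensions)
toy_vectors = {
    "good":  [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "table": [0.0, 0.1, 0.9],
}

sim_related = cosine(toy_vectors["good"], toy_vectors["great"])
sim_unrelated = cosine(toy_vectors["good"], toy_vectors["table"])
print(sim_related > sim_unrelated)  # True
```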

For the assignment, instead of getting a one-hot vector for each word, the neural network will get a `batch_size x num_words x 100` tensor containing floating point values.
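
To get a fixed `num_words` dimension, each text has to be truncated or padded to the same length. The sketch below shows the idea with nested lists and two-dimensional toy vectors; `pad_sequence` is a hypothetical helper and differs in detail from the notebook's own embedding function.

```python
def pad_sequence(vectors, max_len, dim):
    """Truncate to max_len vectors, or pad with zero vectors up to max_len."""
    seq = [[0.0] * dim for _ in range(max_len)]
    for i, vec in enumerate(vectors[:max_len]):
        seq[i] = list(vec)
    return seq

# A two-word text with made-up 2-dimensional word vectors
short_text = [[0.1, 0.2], [0.3, 0.4]]
padded = pad_sequence(short_text, max_len=4, dim=2)
print(padded)  # [[0.1, 0.2], [0.3, 0.4], [0.0, 0.0], [0.0, 0.0]]

# A five-word text is cut down to the first four words
long_text = [[1.0, 1.0]] * 5
truncated = pad_sequence(long_text, max_len=4, dim=2)
print(len(truncated))  # 4
```

Stacking such fixed-size sequences for a whole batch gives the `batch_size x num_words x embedding_dim` shape described above.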

Download the GloVe embedding vectors.

In [None]:
import gensim.downloader

In [None]:
glove_vectors = gensim.downloader.load('glove-wiki-gigaword-100')
VOCAB_SIZE = len(glove_vectors.vectors)
EMBEDDING_DIM = 100

This function will embed the dataset into sequences of 100-dimension vectors.

In [None]:
# pad dataset to a maximum review length in words
MAX_LEN = 50

def get_glove_seq(review, max_len):
  seq = np.zeros((max_len, 100))
  for i, word in enumerate(review):
    if i < max_len and word in glove_vectors:
      seq[i] = glove_vectors[word]
  return seq

In [None]:
news_data_train = load_dataset("ag_news", split="train").shuffle()
news_data_test = load_dataset("ag_news", split="test").shuffle()
NEWS_TRAIN = pd.DataFrame(news_data_train)[:5000]
NEWS_TEST = pd.DataFrame(news_data_test)[:5000]
NUM_LABELS = 4

In [None]:
NEWS_TEST.head()

Train/Test Sets using GloVe embeddings.

In [None]:
# Vectorize text reviews to numbers
X_NEWS_TRAIN = np.zeros((len(NEWS_TRAIN), MAX_LEN, 100))
X_NEWS_TEST = np.zeros((len(NEWS_TEST), MAX_LEN, 100))

for i, row in NEWS_TRAIN.iterrows():
  X_NEWS_TRAIN[i] = get_glove_seq(tokenize_and_clean(row['text'], stem_and_remove_stop_words=False), MAX_LEN)

for i, row in NEWS_TEST.iterrows():
  X_NEWS_TEST[i] = get_glove_seq(tokenize_and_clean(row['text'], stem_and_remove_stop_words=False), MAX_LEN)

Y_NEWS_TRAIN = np.array(NEWS_TRAIN['label'])
Y_NEWS_TEST = np.array(NEWS_TEST['label'])
NUM_LABELS = 4

In [None]:
# Defining neural network structure
class MultinomialBoWClassifier(nn.Module):  # inheriting from nn.Module!
  def __init__(self, max_word_len, embedding_dim, num_labels):
    super(MultinomialBoWClassifier, self).__init__()
    self.max_word_len = max_word_len
    self.embedding_dim = embedding_dim
    self.num_labels = num_labels
    ### BEGIN SOLUTION
    # The inputs are already GloVe vectors, so no embedding layer is needed here.
    self.fc = nn.Linear(embedding_dim, num_labels)  # Fully connected layer
    ### END SOLUTION

  def forward(self, x):
    out = None
    ### BEGIN SOLUTION
    pooled = x.mean(dim=1)  # Average word vectors: (batch, max_len, dim) -> (batch, dim)
    out = self.fc(pooled)  # Unnormalized scores (logits), one per label
    ### END SOLUTION
    return out

In [None]:
multibow_model = MultinomialBoWClassifier(max_word_len=MAX_LEN, embedding_dim=EMBEDDING_DIM, num_labels=NUM_LABELS)
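
A multinomial classifier produces one score (logit) per label. Softmax turns those scores into probabilities, and cross-entropy measures how much probability was assigned to the correct label; `nn.CrossEntropyLoss`, used in the training loop, combines both steps. The plain-Python sketch below illustrates the arithmetic with made-up logits; `softmax` and `cross_entropy` are hypothetical helpers, not the PyTorch implementation.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                              # subtract the max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log probability assigned to the correct label."""
    return -math.log(softmax(logits)[target])

# Hypothetical scores for the four news topics
logits = [2.0, 0.5, -1.0, 0.1]
probs = softmax(logits)
predicted = probs.index(max(probs))

print(predicted)                                               # 0
print(cross_entropy(logits, 0) < cross_entropy(logits, 2))     # True
```

The loss is small when the highest score belongs to the correct label and grows as the correct label receives less probability.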

In [None]:
# Train the model
def train(model, x_train_data, y_train_data, epochs, batch_size, lr, weight_decay):
  print('Training Started!')
  optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
  criterion = nn.CrossEntropyLoss()
  n_iter = len(x_train_data) // batch_size
  print(n_iter, 'batches per epoch')

  for epoch in range(epochs):
    num_correct = 0
    total_loss = 0.0
    model.train()

    for i in range(n_iter):
      x, y = get_batch(i, batch_size, x_train_data, y_train_data)
      y = y.long()  # CrossEntropyLoss expects integer class labels

      y_hat = model(x)
      loss = criterion(y_hat, y)
      total_loss += loss.item()  # accumulate a Python float, not a graph-attached tensor
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()

      if (epoch+1)%10 == 0 and (i+1) == n_iter:
        print('epoch:', epoch+1,',loss =',loss.item(), ', training accuracy =',(y_hat.argmax(dim=1)==y).float().mean().item())

In [None]:
# It's ok to modify this cell.
BATCH_SIZE = 10
N_EPOCHS = 100
LEARNING_RATE = 2e-3
WEIGHT_DECAY = 1e-2

In [None]:
try:
    train(multibow_model, X_NEWS_TRAIN, Y_NEWS_TRAIN, N_EPOCHS, BATCH_SIZE, lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY)
except Exception as err:  # report the error instead of hiding it
    print("Training failed. Please check your code.", err)

# Multinomial Regression - Test (40 Points)

In [None]:
# student check - accuracies >= 80% will receive full credit (no credit for less than 80%)
ag.test_model_accuracy_mr(X_NEWS_TEST, Y_NEWS_TEST, multibow_model)

# Grading

Please submit this .ipynb file to Gradescope for grading.

## Final Grade

In [None]:
# student check
ag.FINAL_GRADE()

## Notebook Runtime

In [None]:
# end time - notebook execution
end_nb = time.time()
# print notebook execution time in minutes
print("Notebook execution time in minutes =", (end_nb - start_nb)/60)
# warn student if notebook execution time is greater than 30 minutes
if (end_nb - start_nb)/60 > 30:
  print("WARNING: Notebook execution time is greater than 30 minutes. Your submission may not complete auto-grading on Gradescope. Please optimize your code to reduce the notebook execution time.")