---


The following required libraries are __not__ pre-installed in the Skills Network Labs environment. __You will need to run the following cell__ to install them:


In [None]:
!pip install --user -U torch==2.2.0 torchtext==0.17.0 torchdata==0.7.1 portalocker==2.8.2
!pip install -U pandas==2.2.2
!pip install -U matplotlib==3.8.4
!pip install --user -U scikit-learn==1.4.2
!pip install --user -U plotly==5.22.0
!pip install -U numpy==1.26.4

After the installation of libraries is completed, restart your kernel. You can do this by running the code below.


In [None]:
import os
os._exit(0)

If, for some reason, the above code did not work, you can restart the kernel by clicking the **Restart the kernel** icon.


### Importing Required Libraries

Run the following code to import the required libraries.


In [None]:
from tqdm import tqdm
import numpy as np
import pandas as pd
from itertools import accumulate
import matplotlib.pyplot as plt
import math

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torch.utils.data.dataset import random_split
from torchtext.vocab import build_vocab_from_iterator, GloVe, Vectors
from torchtext.datasets import AG_NEWS
from torchtext.data.functional import to_map_style_dataset

from sklearn.manifold import TSNE
import plotly.graph_objs as go
from IPython.display import Markdown as md

# You can also use this section to suppress warnings generated by your code:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')

import pickle

### Defining Helper Functions

The helper functions below improve the readability of the code that follows. They handle graph plotting and file operations such as saving and loading data. These helpers are not the focus of this lab, so you don't need to spend much time on them. Run the following cells to define them.


In [None]:
def plot(COST,ACC):
    fig, ax1 = plt.subplots()
    color = 'tab:red'
    ax1.plot(COST, color=color)
    ax1.set_xlabel('epoch', color=color)
    ax1.set_ylabel('total loss', color=color)
    ax1.tick_params(axis='y', color=color)

    ax2 = ax1.twinx()
    color = 'tab:blue'
    ax2.set_ylabel('accuracy', color=color)  # we already handled the x-label with ax1
    ax2.plot(ACC, color=color)
    ax2.tick_params(axis='y', color=color)
    fig.tight_layout()  # otherwise the right y-label is slightly clipped

    plt.show()

In [None]:
def plot_embdings(my_embdings,name,vocab):
  """Scatter-plot 3D embeddings and label each point with its token from `name`."""
  fig = plt.figure()
  ax = fig.add_subplot(111, projection='3d')

  # Plot the data points
  ax.scatter(my_embdings[:,0], my_embdings[:,1], my_embdings[:,2])

  # Label the points
  for j, label in enumerate(name):
      # Vocabulary index of the label (not used for plotting; raises KeyError for unknown tokens)
      i=vocab.get_stoi()[label]
      ax.text(my_embdings[j,0], my_embdings[j,1], my_embdings[j,2], label)

  # Set axis labels
  ax.set_xlabel('X Label')
  ax.set_ylabel('Y Label')
  ax.set_zlabel('Z Label')

  # Show the plot
  plt.show()

In [None]:
def save_list_to_file(lst, filename):
    """
    Save a list to a file using pickle serialization.

    Parameters:
        lst (list): The list to be saved.
        filename (str): The name of the file to save the list to.

    Returns:
        None
    """
    with open(filename, 'wb') as file:
        pickle.dump(lst, file)

def load_list_from_file(filename):
    """
    Load a list from a file using pickle deserialization.

    Parameters:
        filename (str): The name of the file to load the list from.

    Returns:
        list: The loaded list.
    """
    with open(filename, 'rb') as file:
        loaded_list = pickle.load(file)
    return loaded_list

---


# Data Pipeline


### Tokenizer


A tokenizer takes a document as input and breaks it up into individual tokens. But what is a token?
The following analogy may help.

Imagine a token as a puzzle piece of a jigsaw puzzle. Each word, number, or small part of a word is a token. When we tokenize a document, we break it into these puzzle pieces so that a computer can understand and work with the text more easily, just like how you solve a puzzle by arranging its pieces.


First, import the **```get_tokenizer```** function from **```torchtext.data.utils```**:


In [None]:
from torchtext.data.utils import get_tokenizer

Next, we create the tokenizer by passing "basic_english" to **`get_tokenizer`**. This returns a tokenizer that lower-cases basic English text and splits it into individual tokens based on spaces and punctuation marks. For additional details, refer to the [`torchtext` documentation](https://pytorch.org/text/stable/data_utils.html#get-tokenizer).


In [None]:
tokenizer = get_tokenizer("basic_english")

To get an understanding of how the "basic_english" tokenizer works, run the following example:


In [None]:
tokens = tokenizer("You can now install TorchText using pip!")
tokens
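
To build intuition for what such a tokenizer does, here is a rough pure-Python approximation. It is not torchtext's implementation: `simple_tokenize` is a hypothetical helper that lower-cases the text and separates words from punctuation with a regular expression, and it may differ from "basic_english" on edge cases such as apostrophes or colons.

```python
import re

def simple_tokenize(text):
    """Hypothetical stand-in for a basic English tokenizer (not torchtext's code)."""
    # Lower-case, then capture runs of word characters or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())

print(simple_tokenize("You can now install TorchText using pip!"))
```

The call returns a flat list of lower-cased tokens in which the trailing exclamation mark becomes a token of its own.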

## Text Classification
Let's build a text classification model using PyTorch and torchtext to classify news articles into one of four categories: World, Sports, Business, and Sci/Tech.


### Introduction to the dataset

The following introduces the dataset and provides context for some of the code that follows.

Let's load the training split of the **`AG_NEWS`** dataset from **`torchtext`** and illustrate how we can work with a dataset in this format:


In [None]:
train_iter= AG_NEWS(split="train")

The **`AG_NEWS`** dataset in **`torchtext`** does not support direct indexing like a list or tuple. It is an iterable-style dataset rather than a random-access one, so its samples must be read through an iterator. Samples are produced on demand, which avoids loading the whole corpus into memory at once.


The following code demonstrates how to work with an iterator. First, the iterator is created by wrapping `train_iter` inside the **`iter`** function. To retrieve the values from the iterator, you can use the **```next()```** function, which will yield the label index value and the text:


In [None]:
y,text= next(iter(train_iter))
print(y,text)
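
The difference between an iterable-style dataset and an indexable one can be sketched without torchtext. `TinyNews` below is a hypothetical toy class with made-up samples. It assumes, as this toy does, that every call to `iter` restarts from the first sample, which is why `next(iter(...))` returns the first item each time it is called.

```python
class TinyNews:
    """Hypothetical iterable-style dataset: supports iteration but not indexing."""
    def __init__(self, samples):
        self._samples = samples

    def __iter__(self):
        # Each call returns a fresh iterator that starts from the first sample.
        return iter(self._samples)

data = TinyNews([(3, "Stocks rally"), (2, "Team wins final")])

try:
    data[0]                       # no __getitem__, so indexing fails
except TypeError as err:
    print("indexing failed:", err)

first = next(iter(data))          # first sample
again = next(iter(data))          # a new iterator, so the first sample again
it = iter(data)
one, two = next(it), next(it)     # reuse a single iterator to advance
print(first, again, two)
```

To walk through more than one sample, keep a reference to a single iterator (or use a `for` loop) instead of calling `iter` repeatedly.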

We can find the true label from the label index value using a dictionary:


In [None]:
ag_news_label = {1: "World", 2: "Sports", 3: "Business", 4: "Sci/Tech"}
ag_news_label[y]

We can also count the total number of classes that appear in `train_iter` to confirm that there is at least one instance of each class in the training data:


In [None]:
num_class = len(set([label for (label, text) in train_iter ]))
num_class

Because our dataset is an iterable, we will create a generator function called **```yield_tokens```** that applies the **```tokenizer```** to the text items in the dataset. Instead of processing the entire dataset and returning all the tokenized texts in one go, **```yield_tokens```** yields one tokenized text at a time, as it is requested. Tokenization is therefore performed lazily: the next tokenized text is generated only when needed, which saves memory and computation.


In [None]:
def yield_tokens(data_iter):
    for  _,text in data_iter:
        yield tokenizer(text)
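
The lazy behaviour described above can be observed directly. The sketch below uses a hypothetical whitespace tokenizer (`toy_tokenizer`) and two made-up samples, and it records each text at the moment it is tokenized so that we can see when the work actually happens.

```python
def toy_tokenizer(text):
    # Hypothetical whitespace tokenizer used only for this illustration.
    return text.lower().split()

calls = []  # records which texts have been tokenized so far

def yield_tokens_demo(data_iter):
    for _, text in data_iter:
        calls.append(text)
        yield toy_tokenizer(text)

samples = [(1, "World leaders meet"), (2, "Home team wins")]
gen = yield_tokens_demo(samples)   # creating the generator tokenizes nothing
print(len(calls))
first = next(gen)                  # tokenizes only the first text
print(first, len(calls))
```

Nothing is tokenized when the generator is created; each `next` call processes exactly one more text.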

### Token Indices


We would like to represent tokens as numbers because machine learning models operate on numeric inputs rather than on raw text. We will generate a mapping between the tokens in our vocabulary and numbers using the **```build_vocab_from_iterator```** function. The numbers to which tokens are mapped are typically referred to as 'token indices' or simply 'indices'. These indices are the numeric representations of the tokens in the vocabulary.

The **```build_vocab_from_iterator```** function, when applied to an iterator that yields lists of tokens, assigns a unique index to each token based on its position in the vocabulary. These indices represent the tokens in a numerical format that can be easily processed by machine learning models.

For example, given a vocabulary with tokens ["apple", "banana", "orange"], the corresponding indices might be [0, 1, 2], where "apple" is represented by index 0, "banana" by index 1, and "orange" by index 2.
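
The idea of a token-to-index mapping can be sketched in a few lines of plain Python. This is not how torchtext implements it: `build_toy_vocab` and `lookup` are hypothetical helpers, and the ordering convention (special tokens first, then tokens by descending frequency) is an assumption made for this illustration.

```python
from collections import Counter

def build_toy_vocab(token_lists, specials=("<unk>",)):
    """Hypothetical sketch: specials first, then tokens by descending frequency."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    # Sort by frequency (descending), breaking ties alphabetically.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: idx for idx, tok in enumerate(list(specials) + ordered)}

corpus = [["apple", "banana"], ["apple", "orange"], ["apple", "banana"]]
stoi = build_toy_vocab(corpus)

def lookup(tokens):
    # Unknown tokens fall back to the index of "<unk>", like a default index.
    return [stoi.get(tok, stoi["<unk>"]) for tok in tokens]

print(stoi)
print(lookup(["banana", "kiwi"]))
```

Here "kiwi" never appears in the corpus, so it maps to the index of `<unk>`.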


The following code builds our vocabulary by iterating over each (y, sentence) tuple in the corpus, tokenizing each sentence as we iterate through the dataset using **`yield_tokens`**. The tokenized sentences are then passed to **```build_vocab_from_iterator```**, which builds the vocabulary and assigns the token indices.


In [None]:
vocabulary = build_vocab_from_iterator(yield_tokens(train_iter), specials=["<unk>"])
vocabulary.set_default_index(vocabulary["<unk>"])

The `vocabulary` object contains all the tokens in our vocabulary along with their numeric representations. For instance, the following prints the first 5 tokens in our vocabulary along with their token indices:


In [None]:
for key in list(vocabulary.vocab.get_stoi())[:5]:
    print(key +': ' + str(vocabulary.vocab.get_stoi()[key]))

Note that the vocabulary includes some non-obvious words, like "zzz" or the common Polish first name "Zygmunt" lower-cased. We can get specific token indices by passing in a list of tokens:


In [None]:
vocabulary(["zzz","zygmunt"])

Although the above exercise showed how you can construct a vocabulary of tokens and generate associated token indices from a corpus of text, in what follows we will use GloVe word embeddings, which necessitate the use of the vocabulary used to train GloVe. As such, the vocabulary we will actually use is the following:


In [None]:
# Note that GloVe embeddings are typically downloaded using:
# glove_embedding = GloVe(name="6B", dim=100)
# However, the GloVe server is often unavailable. The class below offers a workaround.


class GloVe_override(Vectors):
    url = {
        "6B": "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/tQdezXocAJMBMPfUJx_iUg/glove-6B.zip",
    }

    def __init__(self, name="6B", dim=100, **kwargs) -> None:
        url = self.url[name]
        name = "glove.{}.{}d.txt".format(name, str(dim))
        super(GloVe_override, self).__init__(name, url=url, **kwargs)

glove_embedding = GloVe_override(name="6B", dim=100)


In [None]:
# Setup GloVe vocabulary object
from torchtext.vocab import vocab
vocabulary = vocab(glove_embedding.stoi, 0, specials=('<unk>', '<pad>'))
vocabulary.set_default_index(vocabulary["<unk>"])

Print the first 5 tokens in the GloVe vocabulary along with their token indices:


In [None]:
for key in list(vocabulary.vocab.get_stoi())[:5]:
    print(key +': ' + str(vocabulary.vocab.get_stoi()[key]))

Get specific indices from the GloVe vocabulary by passing a list of tokens:


In [None]:
vocabulary(["zzz","zygmunt"])

### Split the dataset


We can convert the dataset into map-style datasets and then perform a random split to create separate training and validation datasets. The training dataset will contain 95% of the samples, while the validation dataset will contain the remaining 5%. These datasets can be used for training and evaluating a machine learning model for text classification on the `AG_NEWS` dataset.


In [None]:
# Load the training and testing iterators.
train_iter, test_iter = AG_NEWS()

# Convert the training and testing iterators to map-style datasets.
train_dataset = to_map_style_dataset(train_iter)
test_dataset = to_map_style_dataset(test_iter)

# Determine the number of samples to be used for training and validation (5% for validation).
num_train = int(len(train_dataset) * 0.95)

# Randomly split the training dataset into training and validation datasets using `random_split`.
# The training dataset will contain 95% of the samples, and the validation dataset will contain the remaining 5%.
split_train_, split_valid_ = random_split(train_dataset, [num_train, len(train_dataset) - num_train])
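
A small sketch of the same splitting logic on a toy dataset is shown below. The list of integers stands in for a map-style dataset, and the seeded `torch.Generator` is an addition not used in the lab itself: passing one makes the split reproducible across runs. The seed value is arbitrary.

```python
import torch
from torch.utils.data import random_split

toy_dataset = list(range(100))            # stand-in for a map-style dataset
n_train = int(len(toy_dataset) * 0.95)
n_valid = len(toy_dataset) - n_train      # the remainder, so the sizes always sum correctly

generator = torch.Generator().manual_seed(0)   # arbitrary seed, chosen for illustration
train_part, valid_part = random_split(toy_dataset, [n_train, n_valid], generator=generator)

print(len(train_part), len(valid_part))
# The two subsets are disjoint and together cover every sample exactly once.
print(sorted(list(train_part) + list(valid_part)) == toy_dataset)
```

Computing the validation size as the remainder, as the lab does, avoids rounding mismatches between the two lengths.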

### Check if a compatible GPU is available


The following code checks if a CUDA-compatible GPU is available in the system using PyTorch, a popular deep learning framework. If a GPU is available, it assigns the device variable to "cuda" (which stands for CUDA, the parallel computing platform and application programming interface model developed by NVIDIA). If a GPU is not available, it assigns the device variable to "cpu" (which means the code will run on the CPU instead).


In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

### Data Loader


In this section we will define the data loader.


We will begin by defining two functions for text processing pipelines. The **`text_pipeline`** function tokenizes the input text and subsequently produces the token indices from **`vocabulary`**. The **`label_pipeline`** function merely guarantees that the labels commence at zero (it's worth noting that the raw labels in the **`AG_NEWS`** dataset start at 1). These text and label pipelines will be used to process the raw data strings derived from the dataset iterators.


In [None]:
def text_pipeline(x):
  return vocabulary(tokenizer(x))

def label_pipeline(x):
   return int(x) - 1

In PyTorch, the **`collate_fn`** argument of a data loader customizes how individual samples are combined into a batch. The `collate_batch` function below plays this role. It applies `label_pipeline` and `text_pipeline` to each label and text in the batch, converts the results into PyTorch tensors, and pads the token-index sequences to the length of the longest sequence in the batch using `pad_sequence`. It returns a tuple containing the label tensor and the padded text tensor, both moved to the specified device (e.g., GPU) for efficient computation.


In [None]:
from torch.nn.utils.rnn import pad_sequence

def collate_batch(batch):
    label_list, text_list = [], []
    for _label, _text in batch:
        label_list.append(label_pipeline(_label))
        text_list.append(torch.tensor(text_pipeline(_text), dtype=torch.int64))


    label_list = torch.tensor(label_list, dtype=torch.int64)
    text_list = pad_sequence(text_list, batch_first=True)


    return label_list.to(device), text_list.to(device)
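
To see what `pad_sequence` does to sequences of different lengths, consider the following minimal sketch with made-up token indices. Note that `pad_sequence` pads with 0 unless `padding_value` is given. In the GloVe vocabulary built earlier with `specials=('<unk>', '<pad>')`, index 0 should correspond to `<unk>` and index 1 to `<pad>`, assuming the specials are placed first.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Two token-index sequences of different lengths (values are made up).
seqs = [torch.tensor([7, 8, 9]), torch.tensor([4, 5])]

padded = pad_sequence(seqs, batch_first=True)   # padding_value defaults to 0
print(padded.shape)    # (batch size, longest sequence length)
print(padded)
```

The shorter sequence is extended with zeros on the right so that both rows have the same length and can be stacked into a single tensor.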

We convert the dataset objects to a data loader by applying the collate function.


In [None]:
BATCH_SIZE = 64

train_dataloader = DataLoader(
    split_train_, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate_batch
)
valid_dataloader = DataLoader(
    split_valid_, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate_batch
)
test_dataloader = DataLoader(
    test_dataset, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate_batch
)

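Let's retrieve one batch from the validation data loader. Each batch consists of a label tensor and a padded tensor of token indices. We then pass the token indices through an embedding layer initialized with the GloVe vectors and average the resulting word embeddings over the sequence dimension: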


In [None]:
label,seqence=next(iter(valid_dataloader ))


In [None]:
embedding = nn.Embedding.from_pretrained(glove_embedding.vectors,freeze=False).to(device)

In [None]:
word_embedding=embedding(seqence)

In [None]:
word_embedding.mean(dim=1).shape
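
The shapes involved in this mean-pooling step can be checked on a toy example. The vectors below are random placeholders rather than GloVe vectors, and the sizes (10 tokens, 4 dimensions) are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
toy_vectors = torch.randn(10, 4)             # 10 made-up tokens, 4-dimensional vectors
toy_embedding = nn.Embedding.from_pretrained(toy_vectors, freeze=False)

batch = torch.tensor([[1, 2, 3], [4, 5, 0]])  # (batch=2, sequence length=3)
embedded = toy_embedding(batch)               # (2, 3, 4): one vector per token
pooled = embedded.mean(dim=1)                 # (2, 4): average over the sequence dimension

print(embedded.shape, pooled.shape)
```

Averaging over `dim=1` collapses the sequence dimension, so every document is represented by a single vector regardless of its length. Padded positions are included in this plain average.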

### Neural Network


The following defines a neural network for text classification. Its `Embedding` layer is initialized with pre-trained GloVe word vectors, the word embeddings are averaged over each sequence, and the result is passed through a hidden linear layer with a ReLU activation followed by a linear output layer. The constructor takes the number of target classes and a `freeze` flag that controls whether the embedding weights are kept fixed during training.


In [None]:
from torch import nn

class TextClassifier(nn.Module):
    def __init__(self, num_classes,freeze=False):
        super(TextClassifier, self).__init__()
        self.embedding = nn.Embedding.from_pretrained(glove_embedding.vectors,freeze=freeze)
        # An example of adding additional layer: a linear layer and a ReLU activation
        self.fc1 = nn.Linear(in_features=100, out_features=128)
        self.relu1 = nn.ReLU()
        # The output layer that produces one logit (unnormalized score) per class
        self.fc2 = nn.Linear(in_features=128, out_features=num_classes)

    def forward(self, x):
        # Pass the input through the embedding layer
        x = self.embedding(x)
        # Here we use a simple mean pooling

        x = torch.mean(x, dim=1)
        # Pass the pooled embeddings through the additional layers
        x = self.fc1(x)
        x = self.relu1(x)
        return self.fc2(x)
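
The effect of the `freeze` argument can be inspected through the `requires_grad` attribute of the embedding weights. The sketch below uses placeholder vectors instead of GloVe.

```python
import torch
import torch.nn as nn

toy_vectors = torch.zeros(5, 3)   # placeholder "pre-trained" vectors

frozen = nn.Embedding.from_pretrained(toy_vectors, freeze=True)
trainable = nn.Embedding.from_pretrained(toy_vectors, freeze=False)

print(frozen.weight.requires_grad)      # no gradient is computed, so the weights stay fixed
print(trainable.weight.requires_grad)   # gradients flow into the embeddings during training
```

With `freeze=True`, back-propagation skips the embedding matrix, so only the layers above it are updated.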

###  Not freezing pre-trained weights

We begin our analysis by leaving the pre-trained weights in the embedding layer unfrozen, so that they are updated during training:


In [None]:
model=TextClassifier(num_classes=4,freeze=False)
model.to(device)

We can obtain the model's raw outputs (logits) for a batch by calling `model` on the padded token indices, for example `predicted_label=model(seqence)`:


In [None]:
model.eval()
predicted_label=model(seqence)

In [None]:
predicted_label.shape

We verify the output shape of our model. In this case, the mini-batch contains 64 samples, and the output layer produces 4 logits for each sample, one per class, so the output has shape `(64, 4)`. Next, we define a function that predicts the class of a single piece of text, followed by a function that computes the accuracy on a dataset.


In [None]:
def predict(text, model, text_pipeline):
    with torch.no_grad():
        text = torch.unsqueeze(torch.tensor(text_pipeline(text)),0).to(device)

        output = model(text)
        return ag_news_label[output.argmax(1).item() + 1]

In [None]:
predict("I like sports and stuff", model, text_pipeline)
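
The step from logits to a class name inside `predict` can be traced on a hand-written example. The logits below are made up, and the `labels` dictionary is a local copy of the label mapping used for illustration only.

```python
import torch

labels = {1: "World", 2: "Sports", 3: "Business", 4: "Sci/Tech"}

logits = torch.tensor([[0.2, 2.5, -1.0, 0.3]])   # made-up scores for one sample
probs = torch.softmax(logits, dim=1)             # optional: convert logits to probabilities
class_index = logits.argmax(1).item()            # zero-based index of the largest logit

print(labels[class_index + 1])                   # shift back to the 1-based AG_NEWS labels
```

Because softmax preserves the ordering of the logits, taking the `argmax` of the logits or of the probabilities selects the same class.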

In [None]:
def evaluate(dataloader, model, device):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for label, text in dataloader:
            label, text = label.to(device), text.to(device)
            outputs = model(text)
            _, predicted = torch.max(outputs.data, 1)
            total += label.size(0)
            correct += (predicted == label).sum().item()
    accuracy = 100 * correct / total
    return accuracy

The following evaluates the performance of the model on the test data. At this point the model has essentially no predictive power: with four roughly balanced classes, we expect an accuracy near the 25% chance level. This is expected because the model has not been trained yet.


In [None]:
evaluate(test_dataloader, model, device)

Let's now train our model.


In [None]:
def train_model(model, optimizer, criterion, train_dataloader, valid_dataloader, epochs=100, model_name="my_modeldrop"):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print("device: ",device)
    model = model.to(device)
    
    cum_loss_list = []
    acc_epoch = []
    best_acc = 0
    file_name = model_name

    for epoch in tqdm(range(1, epochs + 1)):
        model.train()
        cum_loss = 0
        for _, (label, text) in enumerate(train_dataloader):
            label, text = label.to(device), text.to(device)
            
            optimizer.zero_grad()
            predicted_label = model(text)
            loss = criterion(predicted_label, label)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)
            optimizer.step()
            cum_loss += loss.item()
        print("Loss:", cum_loss)
        
        cum_loss_list.append(cum_loss)
        acc_val = evaluate(valid_dataloader, model, device)
        acc_epoch.append(acc_val)
        
        if acc_val > best_acc:
            best_acc = acc_val
            print(f"New best accuracy: {acc_val:.4f}")
            #torch.save(model.state_dict(), f"{model_name}.pth")

    #save_list_to_file(cum_loss_list, f"{model_name}_loss.pkl")
    #save_list_to_file(acc_epoch, f"{model_name}_acc.pkl")
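
The training loop clips the gradient norm to 0.1 before each optimizer step. The following sketch, on a small made-up linear layer, illustrates what `clip_grad_norm_` does: it rescales all gradients in place so that their combined norm does not exceed the limit, and it returns the norm measured before clipping.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 2)
loss = layer(torch.randn(8, 4)).pow(2).sum()   # arbitrary loss to produce gradients
loss.backward()

# clip_grad_norm_ rescales gradients in place and returns the norm measured before clipping.
norm_before = torch.nn.utils.clip_grad_norm_(layer.parameters(), 0.1)
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in layer.parameters()))

print(float(norm_before), float(norm_after))
```

Clipping changes the length of the overall gradient vector but not its direction, which limits the size of each update when the learning rate is large.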


---


In [None]:
LR=1

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=LR)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 1.0, gamma=0.1)
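
A `StepLR` scheduler only changes the learning rate when its `step` method is called. The `train_model` function defined above never calls `scheduler.step()`, so in this lab the learning rate stays at `LR` for the whole run. The sketch below, using a single dummy parameter, shows how the learning rate would decay if the scheduler were stepped once per epoch.

```python
import torch

param = torch.nn.Parameter(torch.zeros(1))     # dummy parameter for illustration
opt = torch.optim.SGD([param], lr=1.0)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=1, gamma=0.1)

lrs = []
for epoch in range(3):
    lrs.append(sched.get_last_lr()[0])
    opt.step()      # would normally follow a backward pass
    sched.step()    # multiplies the learning rate by gamma every step_size epochs

print(lrs)
```

With `step_size=1` and `gamma=0.1`, the learning rate shrinks by a factor of ten after every epoch.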


For your convenience, we’ve already trained the model for 100 epochs and saved it. This process typically takes around 25 minutes using a V100 GPU. To give you a taste of the training process, we’ve configured the model to train for just 1 epoch below; on a CPU this takes around 8 minutes. However, please note that our conclusions about the model’s performance will be based on the previously saved 100-epoch model, not this brief 1-epoch run:


In [None]:
model_name="my_model_freeze_false"
train_model(model, 
            optimizer, 
            criterion, 
            train_dataloader, 
            valid_dataloader, 
            epochs=1, 
            model_name=model_name)


We can load the cumulative loss and validation accuracy that were recorded at each epoch while training the saved model, and plot them. After 100 epochs, the model achieves an accuracy of over 80% on the validation data. You can increase the number of epochs to see whether it improves further.


In [None]:
from urllib.request import urlopen

cum_loss_list=pickle.load(urlopen('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/Zc8G0LmpcVbwkhG68GYxVg/my-model-freeze-false-loss.pkl'))
acc_epoch=pickle.load(urlopen('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/DC_i68jjqbGxOmnJzQfTiw/my-model-freeze-false-acc.pkl'))

plot(cum_loss_list,acc_epoch)

The following loads the fine-tuned model:


In [None]:
import io

model=TextClassifier(num_classes=4,freeze=False)
model.to(device)

urlopened = urlopen('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/uGC04Pom651hQs1XrZ0NsQ/my-model-freeze-false.pth')
stream = io.BytesIO(urlopened.read())  # implements seek()
state_dict = torch.load(stream, map_location=device)
model.load_state_dict(state_dict)
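
The loading pattern above (read the bytes into an in-memory buffer, then call `torch.load` and `load_state_dict`) can be tried offline with a local round trip. The sketch below saves the parameters of a small made-up layer to a `BytesIO` buffer instead of downloading them.

```python
import io
import torch
import torch.nn as nn

source = nn.Linear(3, 2)
buffer = io.BytesIO()
torch.save(source.state_dict(), buffer)   # serialize only the parameters
buffer.seek(0)                            # rewind before reading

target = nn.Linear(3, 2)                  # must have the same architecture
target.load_state_dict(torch.load(buffer, map_location="cpu"))

print(torch.equal(source.weight, target.weight))
```

A state dict stores tensors by parameter name, so the receiving model must define layers with matching names and shapes.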

Let's evaluate the model on the test dataset:


In [None]:
evaluate(test_dataloader , model, device)

The model with unfrozen embedding weights achieves an accuracy of above 80% on the test data.


### Freezing pre-trained weights
The following model freezes the pre-trained weights, focusing training efforts exclusively on the layers situated above the embeddings. This approach leverages the stability of pre-existing knowledge while adapting the higher layers to our specific dataset.


In [None]:
model_t=TextClassifier(num_classes=4,freeze=True)
model_t.to(device)

In [None]:
LR=1
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model_t.parameters(), lr=LR)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 1.0, gamma=0.1)

The following trains the model. Note that because the embedding layer is frozen, the training takes about 15 minutes to run 100 epochs on a V100 GPU. This is significantly shorter than the 25 minutes it takes to train the model with the embedding layer unfrozen. To give you a taste of how the training works, the following trains the model for just 1 epoch:


In [None]:
model_name="my_model_freeze_true"
train_model(model_t, 
            optimizer, 
            criterion, 
            train_dataloader, 
            valid_dataloader, 
            epochs=1, 
            model_name=model_name)

We can plot the cost and accuracy for each epoch:


In [None]:
cum_loss_list=pickle.load(urlopen('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/7pehjCrIU7uW9eviumecqQ/my-model-freeze-true-loss.pkl'))
acc_epoch=pickle.load(urlopen('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/iaJp4DYtS9unbwR6zvBjNA/my-model-freeze-true-acc.pkl'))

plot(cum_loss_list,acc_epoch)

The following loads the fine-tuned model:


In [None]:
model_t=TextClassifier(num_classes=4,freeze=True)
model_t.to(device)

urlopened = urlopen('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/pYe-aNUEhAWGmKXcOBxTFw/my-model-freeze-true.pth')
stream = io.BytesIO(urlopened.read())  # implements seek()
state_dict = torch.load(stream, map_location=device)
model_t.load_state_dict(state_dict)

And the following evaluates the model's performance on test data:


In [None]:
evaluate(test_dataloader , model_t, device)

As you can see, the model with embeddings frozen performs significantly worse, achieving an accuracy of between 50% and 60% after 100 epochs.

This result is somewhat predictable given the fact that we have only one hidden layer with a ReLU activation. Perhaps the model was too simple, and there weren't enough parameters to tune? We can test this hypothesis by training a substantially more complicated model. We do this in the next section.


### Freezing pre-trained weights with a complicated model


The following describes a neural network that has significantly more layers and weights compared to the `TextClassifier` model. The selection of the number of hidden layers and the quantity of features within these layers was determined through experimentation. The goal was to ensure that the training duration of this model would be approximately the same as the original model (with its embedding layer unfrozen), which is around 25 minutes.


In [None]:
class TextClassifier2(nn.Module):
    def __init__(self, num_classes,freeze=False):
        super(TextClassifier2, self).__init__()
        self.embedding = nn.Embedding.from_pretrained(glove_embedding.vectors,freeze=freeze)
        # An example of adding additional layers: linear layers and ReLU activations
        self.fc1 = nn.Linear(in_features=100, out_features=4096)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(in_features=4096, out_features=4096)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(in_features=4096, out_features=4096)
        self.relu3 = nn.ReLU()
        self.fc4 = nn.Linear(in_features=4096, out_features=4096)
        self.relu4 = nn.ReLU()
        self.fc5 = nn.Linear(in_features=4096, out_features=4096)
        self.relu5 = nn.ReLU()
        self.fc6 = nn.Linear(in_features=4096, out_features=4096)
        self.relu6 = nn.ReLU()
        # The output layer that produces one logit (unnormalized score) per class
        self.fc7 = nn.Linear(in_features=4096, out_features=num_classes)

    def forward(self, x):
        # Pass the input through the embedding layer
        x = self.embedding(x)
        # Here we use a simple mean pooling
        x = torch.mean(x, dim=1)
        # Pass the pooled embeddings through the additional layers
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        x = self.relu3(x)
        x = self.fc4(x)
        x = self.relu4(x)
        x = self.fc5(x)
        x = self.relu5(x)
        x = self.fc6(x)
        x = self.relu6(x)
        return self.fc7(x)
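
To quantify how much larger this model is, we can count the trainable parameters in the fully connected layers of both classifiers. `mlp_param_count` is a hypothetical helper written for this comparison. It counts weights and biases only and ignores the embedding matrix, which is identical in both models.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

small = mlp_param_count([100, 128, 4])              # layers above the embedding in TextClassifier
large = mlp_param_count([100] + [4096] * 6 + [4])   # layers above the embedding in TextClassifier2

print(small, large)
```

The layers above the embedding go from roughly 13 thousand parameters to roughly 84 million, an increase of several thousand times.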

In [None]:
model_t2=TextClassifier2(num_classes=4,freeze=True)
model_t2.to(device)

In [None]:
LR=1
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model_t2.parameters(), lr=LR)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 1.0, gamma=0.1)

The following trains the model, which we will not do here because it would take a long time to run on CPU:


In [None]:
model_name="my_model_freeze_true2"
'''
train_model(model_t2, 
            optimizer, 
            criterion, 
            train_dataloader, 
            valid_dataloader, 
            epochs=1, 
            model_name=model_name)
'''

In [None]:
cum_loss_list=pickle.load(urlopen('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/cQ_HdoQ_cMPj7xstATwKBA/my-model-freeze-true2-loss.pkl'))
acc_epoch=pickle.load(urlopen('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/6ikmqnCk3OVVF3wutoBZwA/my-model-freeze-true2-acc.pkl'))

plot(cum_loss_list,acc_epoch)

The following loads the fine-tuned model:


In [None]:
model_t2=TextClassifier2(num_classes=4,freeze=True)
model_t2.to(device)

urlopened = urlopen('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/8G4X025HlxCkHruVEesasg/my-model-freeze-true2.pth')
stream = io.BytesIO(urlopened.read())  # implements seek()
state_dict = torch.load(stream, map_location=device)
model_t2.load_state_dict(state_dict)

Let's evaluate the model's performance on the test data:


In [None]:
evaluate(test_dataloader , model_t2, device)

In [None]:
article="""Canada navigated a stiff test against the Republic of Ireland on a rain soaked evening in Perth, coming from behind to claim a vital 2-1 victory at the Women’s World Cup.
Katie McCabe opened the scoring with an incredible Olimpico goal – scoring straight from a corner kick – as her corner flew straight over the despairing Canada goalkeeper Kailen Sheridan at Perth Rectangular Stadium in Australia.
Just when Ireland thought it had safely navigated itself to half time with a lead, Megan Connolly failed to get a clean connection on a clearance with the resulting contact squirming into her own net to level the score.
Minutes into the second half, Adriana Leon completed the turnaround for the Olympic champion, slotting home from the edge of the area to seal the three points."""

In [None]:
result = predict(article, model, text_pipeline)

markdown_content = f'''
<div style="background-color: lightgray; padding: 10px;">
    <h3>{article}</h3>
    <h4>The category of the news article: {result}</h4>
</div>
'''

md(markdown_content)