## Section 1: Fundamentals in RNNs

You need to **manually** implement a multi-timestep Recurrent Neural Network that can take an input as a 3D tensor `[batch_size, seq_len, input_size]` for a classification task.

<div style="text-align: right"><font color="red">[Total: 10 marks]</font></div>

In [None]:
import torch

We declare the relevant variables.

In [None]:
input_size = 5
seq_len = 4
batch_size = 8
hidden_size = 3
num_classes = 3

We create random inputs (i.e., `inputs`) and random labels (i.e., `random_labels`).

In [None]:
inputs = torch.randn(batch_size, seq_len, input_size)
random_labels = torch.randint(0, num_classes, (batch_size,))
print(inputs.shape)
print(random_labels)

torch.Size([8, 4, 5])
tensor([1, 1, 1, 2, 2, 1, 0, 0])


(1) In what follows, we need to declare the model parameters, which include the matrices $U$ (``[input_size, hidden_size]``), $W$ (``[hidden_size, hidden_size]``), $V$ (``[hidden_size, num_classes]``) and the biases $b$ and $c$ for the hidden states and logits respectively.

<div style="text-align: right"><font color="red">[2 marks]</font></div>

In [None]:
# Initialize model parameters
U = torch.randn(input_size, hidden_size, requires_grad=True)  # Input to hidden
W = torch.randn(hidden_size, hidden_size, requires_grad=True)  # Hidden to hidden
V = torch.randn(hidden_size, num_classes, requires_grad=True)  # Hidden to output
b = torch.zeros(hidden_size, requires_grad=True)  # Bias for hidden state
c = torch.zeros(num_classes, requires_grad=True)  # Bias for logits

(2) Next you need to write the code to compute `hiddens` which is a 3D tensor of the shape ``[batch_size, seq_len, hidden_size]`` using the formula of the simple/standard RNN cells. You can freely modify the code below.

<div style="text-align: right"><font color="red">[2 marks]</font></div>
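For reference, the simple RNN recurrence can be written in the row-vector convention that matches the shapes of $U$, $W$, $V$, $b$ and $c$ declared above. This is a sketch: the zero initial state $h_0$ is an assumption (a common default), $x_t$ denotes the slice of the input at time step $t$ with shape ``[batch_size, input_size]``, and $T$ is `seq_len`.

$$
h_t = \tanh\left(x_t U + h_{t-1} W + b\right), \qquad t = 1, \dots, T, \qquad h_0 = 0
$$

$$
\text{logits} = h_T V + c
$$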

In [None]:
#Initialize hiddens for update.
hiddens = torch.zeros(batch_size, seq_len, hidden_size)

#Insert your code here
h_t = torch.zeros(batch_size, hidden_size)
# Loop through each time step in the sequence
for t in range(seq_len):
    x_t = inputs[:, t, :]  # Input at time step t
    h_t = torch.tanh(x_t @ U + h_t @ W + b)  # Compute next hidden state
    hiddens[:, t, :] = h_t  # Store hidden state for this time step
print(hiddens)

tensor([[[ 0.8084,  0.9821,  0.8073],
         [ 0.9938, -0.9555,  0.7075],
         [ 0.4055, -0.9929,  0.9969],
         [ 0.9945, -0.9910,  0.8921]],

        [[ 0.0532,  0.9440,  0.2912],
         [-0.4986,  0.9506, -0.7748],
         [-0.7442, -0.9991, -0.9967],
         [-0.9792, -0.5459, -0.5721]],

        [[-0.8725,  0.3214, -0.7466],
         [-0.8302, -0.9683,  0.3266],
         [-0.5712,  0.8078,  0.8800],
         [ 0.9999, -0.9469,  0.9990]],

        [[ 1.0000, -0.1361,  0.9989],
         [ 1.0000, -0.2740,  0.9986],
         [-0.9996,  0.8255, -0.9945],
         [-0.4371, -1.0000,  0.9985]],

        [[ 0.9578, -0.3522,  0.4308],
         [-0.5824,  0.0759, -0.9854],
         [-0.9950, -0.3336, -0.9489],
         [-0.5506,  0.9973, -0.9959]],

        [[ 0.9999,  0.9941,  0.9936],
         [-0.9996, -0.5347, -0.9847],
         [ 0.9979, -0.8326,  0.8821],
         [ 0.9998,  0.9408,  0.8707]],

        [[-0.9778, -0.3619, -0.5486],
         [-0.9975, -0.9909, -0.9328],
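One way to sanity-check a hand-written recurrence like the one above is to compare it with `torch.nn.RNN` after copying the same weights into it. The sketch below is illustrative only: it uses its own fresh random tensors and a fixed seed rather than the notebook's variables, and it relies on `nn.RNN` storing its weight matrices transposed relative to $U$ and $W$ and keeping two bias vectors (here one is set to $b$ and the other to zero).

```python
import torch

torch.manual_seed(0)
batch_size, seq_len, input_size, hidden_size = 8, 4, 5, 3

inputs = torch.randn(batch_size, seq_len, input_size)
U = torch.randn(input_size, hidden_size)
W = torch.randn(hidden_size, hidden_size)
b = torch.zeros(hidden_size)

# Manual recurrence: h_t = tanh(x_t U + h_{t-1} W + b), starting from h_0 = 0
h_t = torch.zeros(batch_size, hidden_size)
steps = []
for t in range(seq_len):
    h_t = torch.tanh(inputs[:, t, :] @ U + h_t @ W + b)
    steps.append(h_t)
manual_hiddens = torch.stack(steps, dim=1)  # [batch_size, seq_len, hidden_size]

# Reference layer: weights are stored as [hidden_size, input_size] and
# [hidden_size, hidden_size], so we copy the transposes of U and W
rnn = torch.nn.RNN(input_size, hidden_size, nonlinearity="tanh", batch_first=True)
with torch.no_grad():
    rnn.weight_ih_l0.copy_(U.T)
    rnn.weight_hh_l0.copy_(W.T)
    rnn.bias_ih_l0.copy_(b)
    rnn.bias_hh_l0.zero_()
    reference_hiddens, _ = rnn(inputs)

shapes_match = manual_hiddens.shape == reference_hiddens.shape
values_match = torch.allclose(manual_hiddens, reference_hiddens, atol=1e-5)
print(shapes_match, values_match)
```

Collecting the per-step states in a list and stacking them is an alternative to writing into a preallocated `hiddens` tensor; both give a ``[batch_size, seq_len, hidden_size]`` result.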


(3) In what follows, you need to write the code to compute the logits based on the last hidden state (``[batch_size, hidden_size]``) of hiddens.

<div style="text-align: right"><font color="red">[1 mark]</font></div>

In [None]:
logits = hiddens[:, -1, :] @ V + c  # Using the last time step's hidden state
print(logits)

tensor([[-2.4187,  1.3505, -1.8277],
        [ 1.0086, -1.3806, -0.2787],
        [-2.6241,  1.4043, -1.6980],
        [-2.5619, -0.2007, -0.4485],
        [ 2.6146, -0.8975,  1.3598],
        [-1.4530,  1.5681,  1.0422],
        [-0.4685, -0.2738, -1.2464],
        [ 1.7942, -1.4959, -0.0793]], grad_fn=<AddBackward0>)


(4) Write the code to compute the cross-entropy loss by comparing the logits to the labels. You can use PyTorch's built-in loss function.

<div style="text-align: right"><font color="red">[1 mark]</font></div>

In [None]:
#Insert your code here
# Compute cross-entropy loss
criterion = torch.nn.CrossEntropyLoss()
loss = criterion(logits, random_labels)
print(loss)

tensor(0.8605, grad_fn=<NllLossBackward0>)
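To make the loss above less of a black box, here is a minimal NumPy sketch of mean cross-entropy computed from raw logits via a log-softmax. The function name `cross_entropy` and the toy logits and labels are illustrative assumptions, not the notebook's data.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels given unnormalised logits."""
    # Subtract the row-wise max before exponentiating for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class in each row and average
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy batch: two examples, three classes
example_logits = np.array([[2.0, 0.0, 0.0],
                           [0.0, 0.0, 0.0]])
example_labels = np.array([0, 2])
example_loss = cross_entropy(example_logits, example_labels)
print(float(example_loss))
```

For the second row all logits are equal, so its contribution is $\log 3$, the loss of a uniform guess over three classes.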


(5) Next, you need to do back-propagation to compute the gradients of the loss w.r.t. the model parameters. You can use PyTorch's built-in method to compute the gradients.

<div style="text-align: right"><font color="red">[2 marks]</font></div>

In [None]:
#Insert your code here
loss.backward()

(6) Finally, assuming a learning rate of $\eta = 0.1$, you need to write the code to **manually** update the model parameters using SGD.

<div style="text-align: right"><font color="red">[2 marks]</font></div>

In [None]:
#Insert your code here
# Learning rate
eta = 0.1

# Update parameters manually using gradients
with torch.no_grad():
    U -= eta * U.grad
    W -= eta * W.grad
    V -= eta * V.grad
    b -= eta * b.grad
    c -= eta * c.grad

    # Zero the gradients after updating
    U.grad.zero_()
    W.grad.zero_()
    V.grad.zero_()
    b.grad.zero_()
    c.grad.zero_()
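The update rule applied above is $\theta \leftarrow \theta - \eta \, \nabla_\theta L$ for each parameter. The following NumPy sketch shows one such step on a toy objective; the helper `sgd_step` and the objective $L(w) = \tfrac{1}{2}\lVert w \rVert^2$ are hypothetical and chosen only because the gradient is easy to verify by hand.

```python
import numpy as np

def sgd_step(params, grads, eta=0.1):
    """Return updated parameters: theta <- theta - eta * grad."""
    return {name: value - eta * grads[name] for name, value in params.items()}

# Toy objective L(w) = 0.5 * ||w||^2, whose gradient is simply w
params = {"w": np.array([1.0, -2.0])}
grads = {"w": params["w"].copy()}

updated = sgd_step(params, grads, eta=0.1)
loss_before = 0.5 * float(np.sum(params["w"] ** 2))
loss_after = 0.5 * float(np.sum(updated["w"] ** 2))
print(updated["w"], loss_before, loss_after)
```

In PyTorch, gradients accumulate in `.grad` across calls to `backward()`, which is why they are zeroed after each update.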


## Section 2: Deep Learning for Sequential Data

### <font color="#0b486b">Set random seeds</font>

We start with importing PyTorch and NumPy and setting random seeds for PyTorch and NumPy. You can use any seeds you prefer.

In [None]:
import os
import torch
import random
import requests
import pandas as pd
import numpy as np
import torch.nn as nn
from torch.utils.data import DataLoader
from torch.nn.utils.rnn import pad_sequence
from transformers import BertTokenizer
from six.moves.urllib.request import urlretrieve
from sklearn import preprocessing
import matplotlib.pyplot as plt
plt.style.use('ggplot')

In [None]:
def seed_all(seed=1029):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # if you are using multi-GPU.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
seed_all(seed=1234)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## <font color="#0b486b">Download and preprocess the data</font>


The dataset we use for this assignment is a question classification dataset for which the training set consists of $5,500$ questions belonging to 6 coarse question categories including:
- abbreviation (ABBR),
- entity (ENTY),
- description (DESC),
- human (HUM),
- location (LOC) and
- numeric (NUM).

In this assignment, we will utilize a subset of this dataset containing $2,000$ questions. We will use about 80% of those 2,000 questions for training, about 10% for testing, and the rest for validation.


Preprocessing data is a crucial initial step in any machine learning or deep learning project. The *DataManager* class simplifies the process by providing functionalities to download and preprocess data specifically designed for the subsequent questions in this assignment. It is highly recommended to gain a comprehensive understanding of the class's functionality by **carefully reading** the content provided in the *DataManager* class before proceeding to answer the questions.

In [None]:
class DataManager:
    """
    This class manages and preprocesses a simple text dataset for a sentence classification task.

    Attributes:
        verbose (bool): Controls verbosity for printing information during data processing.
        max_sentence_len (int): The maximum length of a sentence in the dataset.
        str_questions (list): A list to store the string representations of the questions in the dataset.
        str_labels (list): A list to store the string representations of the labels in the dataset.
        numeral_labels (list): A list to store the numerical representations of the labels in the dataset.
        numeral_data (list): A list to store the numerical representations of the questions in the dataset.
        random_state (int): Seed value for random number generation to ensure reproducibility.
            Set this value to a specific integer to reproduce the same random sequence every time. Defaults to 6789.
        random (np.random.RandomState): Random number generator object initialized with the given random_state.
            It is used for various random operations in the class.

    Methods:
        maybe_download(dir_name, file_name, url, verbose=True):
            Downloads a file from a given URL if it does not exist in the specified directory.
            The directory and file are created if they do not exist.

        read_data(dir_name, file_names):
            Reads data from files in a directory, preprocesses it, and computes the maximum sentence length.
            Each file is expected to contain rows in the format "<label>:<question>".
            The labels and questions are stored as string representations.

        manipulate_data():
            Performs data manipulation by tokenizing, numericalizing, and padding the text data.
            The questions are tokenized and converted into numerical sequences using a tokenizer.
            The sequences are padded or truncated to the maximum sequence length.

        train_valid_test_split(train_ratio=0.8, test_ratio=0.1):
            Splits the data into training, validation, and test sets based on the given ratios.
            The data is randomly shuffled; train_ratio and test_ratio determine the sizes of the
            training and test sets, and the remaining examples form the validation set.
            The string questions, numerical data, and numerical labels are split accordingly.
            PyTorch `DataLoader` objects are created for the training, validation, and test sets.


    """

    def __init__(self, verbose=True, random_state=6789):
        self.verbose = verbose
        self.max_sentence_len = 0
        self.str_questions = list()
        self.str_labels = list()
        self.numeral_labels = list()
        self.maxlen = None
        self.numeral_data = list()
        self.random_state = random_state
        self.random = np.random.RandomState(random_state)

    @staticmethod
    def maybe_download(dir_name, file_name, url, verbose=True):
        if not os.path.exists(dir_name):
            os.mkdir(dir_name)
        if not os.path.exists(os.path.join(dir_name, file_name)):
            urlretrieve(url + file_name, os.path.join(dir_name, file_name))
        if verbose:
            print("Downloaded successfully {}".format(file_name))

    def read_data(self, dir_name, file_names):
        self.str_questions = list()
        self.str_labels = list()
        for file_name in file_names:
            file_path= os.path.join(dir_name, file_name)
            with open(file_path, "r", encoding="latin-1") as f:
                for row in f:
                    row_str = row.split(":")
                    label, question = row_str[0], row_str[1]
                    question = question.lower()
                    self.str_labels.append(label)
                    self.str_questions.append(question[0:-1])
                    if self.max_sentence_len < len(self.str_questions[-1]):
                        self.max_sentence_len = len(self.str_questions[-1])

        # turns labels into numbers
        le = preprocessing.LabelEncoder()
        le.fit(self.str_labels)
        self.numeral_labels = np.array(le.transform(self.str_labels))
        self.str_classes = le.classes_
        self.num_classes = len(self.str_classes)
        if self.verbose:
            print("\nSample questions and corresponding labels... \n")
            print(self.str_questions[0:5])
            print(self.str_labels[0:5])

    def manipulate_data(self):
        self.tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
        vocab = self.tokenizer.get_vocab()
        self.word2idx = {w: i for i, w in enumerate(vocab)}
        self.idx2word = {i:w for w,i in self.word2idx.items()}
        self.vocab_size = len(self.word2idx)

        token_ids = []
        num_seqs = []
        for text in self.str_questions:  # iterate over the list of text
          text_seqs = self.tokenizer.tokenize(str(text))  # tokenize each text individually
          # Convert tokens to IDs
          token_ids = self.tokenizer.convert_tokens_to_ids(text_seqs)
          # Convert token IDs to a tensor of indices using your word2idx mapping
          seq_tensor = torch.LongTensor(token_ids)
          num_seqs.append(seq_tensor)  # append the tensor for each sequence

        # Pad the sequences and create a tensor
        if num_seqs:
          self.numeral_data = pad_sequence(num_seqs, batch_first=True)  # Pads to max length of the sequences
          self.num_sentences, self.maxlen = self.numeral_data.shape

    def train_valid_test_split(self, train_ratio=0.8, test_ratio = 0.1):
        train_size = int(self.num_sentences*train_ratio) +1
        test_size = int(self.num_sentences*test_ratio) +1
        valid_size = self.num_sentences - (train_size + test_size)
        data_indices = list(range(self.num_sentences))
        random.shuffle(data_indices)
        self.train_str_questions = [self.str_questions[i] for i in data_indices[:train_size]]
        self.train_numeral_labels = self.numeral_labels[data_indices[:train_size]]
        train_set_data = self.numeral_data[data_indices[:train_size]]
        train_set_labels = self.numeral_labels[data_indices[:train_size]]
        train_set_labels = torch.from_numpy(train_set_labels)
        train_set = torch.utils.data.TensorDataset(train_set_data, train_set_labels)
        self.test_str_questions = [self.str_questions[i] for i in data_indices[-test_size:]]
        self.test_numeral_labels = self.numeral_labels[data_indices[-test_size:]]
        test_set_data = self.numeral_data[data_indices[-test_size:]]
        test_set_labels = self.numeral_labels[data_indices[-test_size:]]
        test_set_labels = torch.from_numpy(test_set_labels)
        test_set = torch.utils.data.TensorDataset(test_set_data, test_set_labels)
        self.valid_str_questions = [self.str_questions[i] for i in data_indices[train_size:-test_size]]
        self.valid_numeral_labels = self.numeral_labels[data_indices[train_size:-test_size]]
        valid_set_data = self.numeral_data[data_indices[train_size:-test_size]]
        valid_set_labels = self.numeral_labels[data_indices[train_size:-test_size]]
        valid_set_labels = torch.from_numpy(valid_set_labels)
        valid_set = torch.utils.data.TensorDataset(valid_set_data, valid_set_labels)
        self.train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
        self.test_loader = DataLoader(test_set, batch_size=64, shuffle=False)
        self.valid_loader = DataLoader(valid_set, batch_size=64, shuffle=False)

In [None]:
print('Loading data...')
DataManager.maybe_download("data", "train_2000.label", "http://cogcomp.org/Data/QA/QC/")

dm = DataManager()
dm.read_data("data/", ["train_2000.label"])

Loading data...
Downloaded successfully train_2000.label

Sample questions and corresponding labels... 

['manner how did serfdom develop in and then leave russia ?', 'cremat what films featured the character popeye doyle ?', "manner how can i find a list of celebrities ' real names ?", 'animal what fowl grabs the spotlight after the chinese year of the monkey ?', 'exp what is the full form of .com ?']
['DESC', 'ENTY', 'DESC', 'ENTY', 'ABBR']


In [None]:
dm.manipulate_data()
dm.train_valid_test_split(train_ratio=0.8, test_ratio = 0.1)


In [None]:
for x, y in dm.train_loader:
    print(x.shape, y.shape)
    break

torch.Size([64, 36]) torch.Size([64])
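The split sizes follow directly from the arithmetic in `train_valid_test_split`. The sketch below reproduces it in plain Python, assuming the dataset holds exactly 2,000 questions (as stated in the description) and the batch size of 64 used by the loaders; the helper name `split_sizes` is introduced here for illustration.

```python
import math

def split_sizes(num_sentences, train_ratio=0.8, test_ratio=0.1):
    """Mirror the size computation used in train_valid_test_split."""
    train_size = int(num_sentences * train_ratio) + 1
    test_size = int(num_sentences * test_ratio) + 1
    valid_size = num_sentences - (train_size + test_size)
    return train_size, valid_size, test_size

train_size, valid_size, test_size = split_sizes(2000)
# Number of training batches per epoch with batch_size=64 (last batch is partial)
num_train_batches = math.ceil(train_size / 64)
print(train_size, valid_size, test_size, num_train_batches)  # 1601 198 201 26
```

Because of the `+ 1` in both size computations, the training and test sets each hold one example more than the nominal 80% and 10%, and the validation set is correspondingly smaller.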


## <font color="#0b486b">Part 1: Using Word2Vec to transform texts to vectors </font>

<div style="text-align: right"><font color="red">[Total marks for this part: 10 marks]</font></div>

In [None]:
import gensim.downloader as api
from gensim.models import Word2Vec
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score
import numpy as np

#### <font color="red">**Question 1.1**</font>
**Write code to download the pretrained model *glove-wiki-gigaword-100*. Note that this model transforms a word in its dictionary to a $100$ dimensional vector.**

**Write code for the function *get_word_vector(word, model)*, which transforms a word to a vector using the pretrained model *model*. Note that for a word not in the vocabulary of our *word2vect*, you need to return a zero vector with 100 dimensions.**

<div style="text-align: right"><font color="red">[2 marks]</font></div>

In [None]:
word2vect = api.load("glove-wiki-gigaword-100")



In [None]:
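def get_word_vector(word, model):
    try:
        vector = model[word]  # look up the pretrained embedding
    except KeyError:
        # out-of-vocabulary word: return a NumPy zero vector so the type matches model[word]
        vector = np.zeros(100, dtype=np.float32)
    return vector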

#### <font color="red">**Question 1.2**</font>

**Write the code for the function `get_sentence_vector(sentence, important_score=None, model=None)`. Note that this function transforms a sentence to a 100-dimensional vector using the pretrained model *model*. In addition, the list *important_score*, which has the same length as the *sentence*, specifies the importance scores of the words in the sentence. In your code, you first need to apply the *softmax* function over *important_score* to obtain the importance weights *important_weight*, which form a probability distribution over the words of the sentence. The final vector of the sentence is then the sum of the individual word vectors weighted by *important_weight*.**
- $important\_weight = softmax(important\_score)$.
- $final\_vector= important\_weight[1]\times v[1] + important\_weight[2]\times v[2] + ...+ important\_weight[T]\times v[T]$ where $T$ is the length of the sentence and $v[i]$ is the vector representation of the $i$-th word in this sentence.

**Note that when `important_score=None` (the default), your function should return the average of all word vectors, which corresponds to setting `important_score=[1,1,...,1]`.**

<div style="text-align: right"><font color="red">[2 marks]</font></div>
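The weighting scheme described above can be illustrated on a tiny example. This is a sketch under stated assumptions: the three 4-dimensional vectors are made-up stand-ins for real word embeddings, the geometric decay `1, 0.9, 0.81, ...` is just one possible choice of `important_score`, and the local `softmax` helper is written out by hand for clarity.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1D array of scores."""
    scores = np.asarray(scores, dtype=float)
    exps = np.exp(scores - scores.max())
    return exps / exps.sum()

# Three hypothetical 4-dimensional word vectors standing in for real embeddings
word_vectors = np.array([[1.0, 0.0, 0.0, 2.0],
                         [0.0, 1.0, 0.0, 2.0],
                         [0.0, 0.0, 1.0, 2.0]])

# Geometric decay: 1, 0.9, 0.81 (one possible choice of important_score)
decay_scores = 0.9 ** np.arange(len(word_vectors))
decay_weights = softmax(decay_scores)
weighted_vector = decay_weights @ word_vectors

# Equal scores give uniform weights, so the result is the plain average
uniform_vector = softmax(np.ones(len(word_vectors))) @ word_vectors
print(decay_weights, weighted_vector, uniform_vector)
```

Since softmax depends only on differences between scores, any constant score vector (all ones, all zeros) yields the same uniform weights.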

In [None]:
from scipy.special import softmax
import numpy as np

def get_sentence_vector(sentence, important_score=None, model=None):
    # Accept either a raw string or a list of words; iterating over a string
    # directly would yield characters rather than words
    words = sentence.split() if isinstance(sentence, str) else list(sentence)

    # Convert words to vectors
    word_vectors = np.array([get_word_vector(word, model) for word in words])

    # If no important_score is provided, assign uniform scores
    if important_score is None:
        important_score = np.ones(len(words))

    # Apply softmax to the importance scores to get importance weights
    important_weight = softmax(important_score)

    # Compute the weighted sum of the word vectors
    sentence_vector = np.dot(important_weight, word_vectors)

    return sentence_vector


#### <font color="red">**Question 1.3**</font>

**Write code to transform questions in *dm.train_str_questions* and *dm.valid_str_questions* to feature vectors. Note that after running the following cells, you must have $X\_train$ and $X\_valid$ which are two NumPy arrays of the feature vectors and $y\_train$ and $y\_valid$ which are two arrays of numeric labels (Hint: *dm.train_numeral_labels* and *dm.valid_numeral_labels*). You can add more lines to the following cells if necessary. In addition, you should decide the *important_score* by yourself. For example, the 1st score is 1, the 2nd score is decayed by 0.9, the 3rd is decayed by 0.9, and so on.**

<div style="text-align: right"><font color="red">[2 marks]</font></div>

In [None]:
print("Transform training set to feature vectors...")
X_train = np.array([get_sentence_vector(sentence, model=word2vect) for sentence in dm.train_str_questions])#Insert your code
y_train = dm.train_numeral_labels

Transform training set to feature vectors...


In [None]:
print("Transform validation set to feature vectors...")
X_valid = np.array([get_sentence_vector(sentence, model=word2vect) for sentence in dm.valid_str_questions])#Insert your code
y_valid = dm.valid_numeral_labels

Transform validation set to feature vectors...


#### <font color="red">**Question 1.4**</font>

**Now use *MinMaxScaler(feature_range=(-1,1))* from scikit-learn to scale both the training and validation sets to the range $(-1,1)$.**

<div style="text-align: right"><font color="red">[2 marks]</font></div>

In [None]:
#Insert your code
from sklearn.preprocessing import MinMaxScaler

# Initialize MinMaxScaler with feature range (-1,1)
scaler = MinMaxScaler(feature_range=(-1, 1))

# Scale the training and validation sets
X_train_scaled = scaler.fit_transform(X_train)
X_valid_scaled = scaler.transform(X_valid)

print(X_train_scaled.shape, X_valid_scaled.shape)

(1601, 100) (198, 100)
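The scaling step above fits the per-feature minimum and maximum on the training set only and reuses them for the validation set. A hand-written NumPy sketch of that idea follows; `fit_min_max` and the toy arrays are illustrative assumptions, not scikit-learn's implementation.

```python
import numpy as np

def fit_min_max(X_fit, feature_range=(-1.0, 1.0)):
    """Learn per-column min/max on X_fit and return a scaling function."""
    low, high = feature_range
    data_min = X_fit.min(axis=0)
    data_range = X_fit.max(axis=0) - data_min
    data_range[data_range == 0] = 1.0  # guard against constant columns

    def transform(X):
        return (X - data_min) / data_range * (high - low) + low

    return transform

toy_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
toy_valid = np.array([[12.0, 15.0]])

scale = fit_min_max(toy_train)
toy_train_scaled = scale(toy_train)
toy_valid_scaled = scale(toy_valid)
print(toy_train_scaled, toy_valid_scaled)
```

Note that validation values can fall outside $(-1,1)$ when they exceed the range seen during fitting (here the first validation feature maps to 1.4); this is expected and avoids leaking validation statistics into the scaler.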


#### <font color="red">**Question 1.5**</font>
**Train a Logistic Regression model on the training set and then evaluate on the validation set.** You can use any classification metrics in `sklearn` for evaluation.
<div style="text-align: right"><font color="red">[2 marks]</font></div>

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

# Initialize and train logistic regression model
logistic_model = LogisticRegression(max_iter=1000)
logistic_model.fit(X_train_scaled, y_train)

# Predict on validation set
y_pred = logistic_model.predict(X_valid_scaled)

# Evaluate the model using accuracy
accuracy = accuracy_score(y_valid, y_pred)
print(f'Validation Accuracy: {accuracy * 100:.2f}%')

# You can also print additional metrics
print(metrics.classification_report(y_valid, y_pred))


Validation Accuracy: 45.96%
              precision    recall  f1-score   support

           0       0.67      1.00      0.80         2
           1       0.47      0.42      0.44        36
           2       0.46      0.48      0.47        54
           3       0.46      0.54      0.50        52
           4       0.48      0.48      0.48        29
           5       0.35      0.24      0.29        25

    accuracy                           0.46       198
   macro avg       0.48      0.53      0.50       198
weighted avg       0.45      0.46      0.45       198



We now declare the `BaseTrainer` class, which will be used later to train the subsequent deep learning models for text data.

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class BaseTrainer:
    def __init__(self, model, criterion, optimizer, train_loader, val_loader):
        self.model = model
        self.criterion = criterion  #the loss function
        self.optimizer = optimizer  #the optimizer
        self.train_loader = train_loader  #the train loader
        self.val_loader = val_loader  #the valid loader

    #the function to train the model in many epochs
    def fit(self, num_epochs):
        self.num_batches = len(self.train_loader)

        for epoch in range(num_epochs):
            print(f'Epoch {epoch + 1}/{num_epochs}')
            train_loss, train_accuracy = self.train_one_epoch()
            val_loss, val_accuracy = self.validate_one_epoch()
            print(
                f'{self.num_batches}/{self.num_batches} - train_loss: {train_loss:.4f} - train_accuracy: {train_accuracy*100:.4f}% \
                - val_loss: {val_loss:.4f} - val_accuracy: {val_accuracy*100:.4f}%')

    #train in one epoch, return the train_acc, train_loss
    def train_one_epoch(self):
        self.model.train()
        running_loss, correct, total = 0.0, 0, 0
        for i, data in enumerate(self.train_loader):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)
            self.optimizer.zero_grad()
            outputs = self.model(inputs)
            loss = self.criterion(outputs, labels)
            loss.backward()
            self.optimizer.step()

            running_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
        train_accuracy = correct / total
        train_loss = running_loss / self.num_batches
        return train_loss, train_accuracy

    #evaluate on a loader and return the loss and accuracy
    def evaluate(self, loader):
        self.model.eval()
        running_loss, correct, total = 0.0, 0, 0
        with torch.no_grad():
            for data in loader:
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = self.model(inputs)
                #accumulate the batch loss in a separate variable so it is not overwritten
                running_loss += self.criterion(outputs, labels).item()
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        accuracy = correct / total
        #average over the batches of the loader that was actually evaluated
        loss = running_loss / len(loader)
        return loss, accuracy

    #return the val_acc, val_loss, be called at the end of each epoch
    def validate_one_epoch(self):
      val_loss, val_accuracy = self.evaluate(self.val_loader)
      return val_loss, val_accuracy

## <font color="#0b486b">Part 2: Text CNN for sequence modeling and neural embedding </font>

<div style="text-align: right"><font color="red">[Total marks for this part: 10 marks]</font></div>

**In what follows, you are required to complete the code for Text CNN for sentence classification. The paper of Text CNN can be found at this [link](https://www.aclweb.org/anthology/D14-1181.pdf). Here is the description of the Text CNN that you need to construct.**
- There are three attributes (properties or instance variables): *embed_size, state_size, data_manager*.
  - `embed_size`: the dimension of the vector space into which the words are embedded using the embedding matrix.
  - `state_size`: the number of filters used in *Conv1D* (reference [here](https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html)).
  - `data_manager`: the data manager to store information of the dataset.
- The detail of the computational process is as follows:
  - Given input $x$, we embed $x$ using the embedding matrix to obtain a 3D tensor $e$ of shape $[batch\_size, seq\_len, embed\_size]$.
  - We feed $e$ to three *Conv1D* layers, each of which has $state\_size$ filters, activation= $relu$, and $kernel\_size= 3, 5, 7$ respectively to obtain $h1, h2, h3$. Note that each of $h1, h2, h3$ is a 3D tensor with the shape $[batch\_size, state\_size, output\_size]$. Moreover, you need to apply *Conv1D* along the $seq\_len$ dimension.
  - We then apply *GlobalMaxPool1D()* (reference [here](https://pytorch.org/docs/stable/generated/torch.nn.functional.max_pool1d.html#torch.nn.functional.max_pool1d)) over $h1, h2, h3$ to obtain 2D tensors stored in $h1, h2, h3$ again.
  - We then concatenate three 2D tensors $h1, h2, h3$ to obtain $h$ with the shape $\left[batch\_size, 3\times state\_size\right]$. Note that you need to specify the axis to concatenate.
  - We finally build up one dense layer $\left[3\times state\_size, num\_classes\right]$  on the top of $h$ for classification.
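The $output\_size$ of each convolution in the steps above depends on the kernel size. The sketch below applies the standard output-length formula for a 1D convolution in plain Python; the helper name `conv1d_output_length` is introduced here, and `seq_len = 36` is assumed from the padded batch shape printed by the training loader earlier.

```python
def conv1d_output_length(seq_len, kernel_size, stride=1, padding=0, dilation=1):
    """Length of the output sequence of a 1D convolution."""
    return (seq_len + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

seq_len = 36  # assumed padded sentence length
output_lengths = {k: conv1d_output_length(seq_len, k) for k in (3, 5, 7)}
print(output_lengths)  # {3: 34, 5: 32, 7: 30}

# Global max pooling collapses each output length to 1, so after concatenation
# the feature size is 3 * state_size regardless of the kernel sizes
state_size = 16
concat_features = 3 * state_size
print(concat_features)  # 48
```

This is why global max pooling is needed before concatenation: the three branches produce sequences of different lengths, but each pools down to a ``[batch_size, state_size]`` tensor.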
  

In [None]:
import torch.nn as nn
import torch.nn.functional as F

#You can modify the code if you want but need to keep the skeleton
class TextCNN(torch.nn.Module):
    def __init__(self, embed_size= 128, state_size=16, data_manager=None):
        super().__init__()
        self.data_manager = data_manager
        self.embed_size = embed_size
        self.state_size = state_size
        #declare the necessary layers here
        self.embed = nn.Embedding(self.data_manager.vocab_size, self.embed_size)
        self.conv1d_1 = nn.Conv1d(in_channels=self.embed_size, out_channels=self.state_size, kernel_size=3)# Insert your code here
        self.conv1d_2 = nn.Conv1d(in_channels=self.embed_size, out_channels=self.state_size, kernel_size=5)# Insert your code here
        self.conv1d_3 = nn.Conv1d(in_channels=self.embed_size, out_channels=self.state_size, kernel_size=7)# Insert your code here
        self.fc = nn.Linear(state_size*3, self.data_manager.num_classes)

    def forward(self, x):
        e = self.embed(x)
        #permute e to [batch_size, embed_size, seq_len] before applying Conv1D
        e= e.permute(0,2,1)

        #applying Conv1D
        h1 = F.relu(self.conv1d_1(e))# Insert your code here
        h2 = F.relu(self.conv1d_2(e))# Insert your code here
        h3 = F.relu(self.conv1d_3(e))# Insert your code here

        #apply GlobalMaxPool
        h1 = F.max_pool1d(h1, kernel_size=h1.shape[2]).squeeze(2)# Insert your code here
        h2 = F.max_pool1d(h2, kernel_size=h2.shape[2]).squeeze(2)# Insert your code here
        h3 = F.max_pool1d(h3, kernel_size=h3.shape[2]).squeeze(2)# Insert your code here

        h =  torch.cat([h1, h2, h3], dim=1)# Insert your code here
        h = self.fc(h)
        return h

We declare `text_cnn` and train it for several epochs (e.g., `50 epochs`).


In [None]:
text_cnn = TextCNN(data_manager=dm).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(text_cnn.parameters(), lr=0.001)
trainer = BaseTrainer(model=text_cnn, criterion=criterion, optimizer=optimizer, train_loader=dm.train_loader, val_loader=dm.valid_loader)
trainer.fit(num_epochs=50)

Epoch 1/50
26/26 - train_loss: 1.4946 - train_accuracy: 41.0369%                 - val_loss: 0.6486 - val_accuracy: 66.1616%
Epoch 2/50
26/26 - train_loss: 0.9472 - train_accuracy: 79.0756%                 - val_loss: 0.3782 - val_accuracy: 81.8182%
Epoch 3/50
26/26 - train_loss: 0.5159 - train_accuracy: 87.9450%                 - val_loss: 0.2682 - val_accuracy: 89.3939%
Epoch 4/50
26/26 - train_loss: 0.2982 - train_accuracy: 95.0031%                 - val_loss: 0.1904 - val_accuracy: 92.9293%
Epoch 5/50
26/26 - train_loss: 0.1830 - train_accuracy: 97.3766%                 - val_loss: 0.1549 - val_accuracy: 94.9495%
Epoch 6/50
26/26 - train_loss: 0.1261 - train_accuracy: 99.0631%                 - val_loss: 0.1237 - val_accuracy: 94.4444%
Epoch 7/50
26/26 - train_loss: 0.0857 - train_accuracy: 99.3129%                 - val_loss: 0.0951 - val_accuracy: 95.9596%
Epoch 8/50
26/26 - train_loss: 0.0591 - train_accuracy: 99.8751%                 - val_loss: 0.0923 - val_accuracy: 96.4646%


We evaluate the trained model on the testing set.

In [None]:
test_loss, test_acc = trainer.evaluate(dm.test_loader)
print(f'test_loss: {test_loss:.4f} - test_accuracy: {test_acc*100:.4f}%')

test_loss: 0.1813 - test_accuracy: 93.5323%
