In [2]:
import numpy as np
import torch
import torch.utils.data
import scipy
import torch.nn.functional as F
import datasets
from sklearn.metrics import accuracy_score
from sklearn.feature_extraction.text import TfidfVectorizer
datasets.logging.set_verbosity_error()



# Exercise: Shallow neural networks

In this exercise session, you will be defining and training a multiclass logistic regression model (a.k.a. softmax regression or multinomial logit) as a simple neural net in PyTorch. First, we will set up the model. Then, we will load the `tweet_eval` dataset and turn it into a Torch dataset (the type of data input that PyTorch can work with). Next, we will train our network on the data using a variant of gradient descent called Adam. Finally, we will evaluate the model's performance.
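For reference, a multiclass logistic regression model for one example with feature vector $x$ can be written as below. The symbols are notation chosen here for illustration: $W$ (weight matrix), $b$ (bias vector), $z$ (one score per class), $K$ (number of classes) and $c$ (the true class).

$$
z = W x + b, \qquad \hat{y}_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}} \quad (k = 1, \dots, K), \qquad \mathcal{L} = -\log \hat{y}_c
$$

Training with the cross-entropy loss $\mathcal{L}$ pushes $\hat{y}_c$, the predicted probability of the true class, towards 1.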

We will provide you with code snippets to help you get started. Where you see a `FILLINTHEBLANK` in the code, or a line that ends in a `=`, that is where you complete the code to make it functional.

Note: for this notebook to display the figure in Section 5, you should also download the file `Net for DLI exercise.drawio.png` from Absalon and store it in the same folder as this notebook.

# 1. Define the PyTorch model

Finish the definition of this model.

1. In the `__init__` method of your model, you should define all the layers you are going to use. PyTorch provides a large number of commonly used layers that are very easy to use; see the [PyTorch documentation](https://pytorch.org/docs/stable/nn.html) for a complete list. Here, the `__init__` method should contain one layer that linearly combines the inputs, followed by a softmax activation function, which produces the outputs.
2. In the forward pass, the model should do the following:
 - compute the z values of the output nodes (linear combinations of the inputs, using the [Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) neural net building block). There should be as many outputs as classes (since the probability of each class is based on its own linear combination of the features)
 - pass them through the [Softmax](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html) activation function with `dim=1` to normalize them (squeeze the probabilities of all output classes so that they sum to one)
 - return the outputs
3. Try initializing and inspecting the model as a toy_model instance. It should have 4 features and 3 classes.
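To see what the two building blocks do on their own, here is a minimal sketch that applies `torch.nn.Linear` and `torch.nn.Softmax(dim=1)` to a random batch. The sizes (5 features, 2 classes, 3 examples) are arbitrary illustrative choices, not the ones asked for above.

```python
import torch

torch.manual_seed(0)  # makes the random weights and inputs reproducible

# Illustrative sizes only: 5 features, 2 classes, a batch of 3 examples
linear = torch.nn.Linear(in_features=5, out_features=2)
softmax = torch.nn.Softmax(dim=1)  # dim=1: normalize across classes, separately for each example

x = torch.randn(3, 5)  # shape (batch_size, n_features)
z = linear(x)          # shape (batch_size, n_classes): one linear combination per class
probs = softmax(z)     # same shape; each row is a probability distribution

print(z.shape, probs.shape)  # torch.Size([3, 2]) torch.Size([3, 2])
print(probs.sum(dim=1))      # each row sums to 1 (up to floating-point rounding)
```

Because `dim=1` normalizes across the class dimension, every row of `probs` sums to one. With `dim=0` the normalization would instead run across the examples in the batch, which is not what a classifier needs.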

In [None]:
# exercise template

class LogisticRegression(torch.nn.Module):

    # initializing the model with a certain number of input features
    # and output classes
    def __init__(self, n_features, n_classes):
        super().__init__()

        # creating a layer that applies a linear transformation
        # to the incoming data: z = wT * x
        self.linear = 

        # creating a softmax activation function
        self.activ = 
        
    # you have to define the forward() method which will specify the forward propagation:
    # how the input values get to the next layer(s)
    def forward(self, inputs):

        # compute the linear z-values for the layer
        z = self.linear(FILLINTHEBLANK)

        # pass them through the softmax activation function
        outputs = self.activ(FILLINTHEBLANK)

        return outputs

In [None]:
toy_model = LogisticRegression(FILLINTHEBLANK)
toy_model

# 2. Prepare the *Tweet_eval* dataset

This is a dataset of tweets that are hand-coded as negative, neutral, or positive (labels 0, 1, and 2, respectively).

## Load the data set

1. Have a look at the paper by Rosenthal et al. below. How were the tweets selected? How were they annotated?
2. Load the dataset (train and validation splits) with Hugging Face's `datasets` library.
3. Have a look at the dataset on [Hugging Face's website](https://huggingface.co/datasets/tweet_eval/viewer/sentiment/train) to get a sense of what's in it. Also count the number of training samples with each label, to see whether the data is balanced (all classes represented evenly) or unbalanced.
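Labels can be counted with `collections.Counter` or `np.bincount`. The sketch below uses a hypothetical toy list of labels. The commented lines show how the real training labels could be obtained, assuming the dataset is available on the Hugging Face Hub under the name `tweet_eval` with the `sentiment` configuration, as linked above.

```python
from collections import Counter

import numpy as np

# Hypothetical toy labels standing in for the real training labels.
# With the real data, something like
#   train = datasets.load_dataset("tweet_eval", "sentiment", split="train")
#   labels = train["label"]
# would give the list of integer labels (0 = negative, 1 = neutral, 2 = positive).
labels = [1, 2, 1, 0, 1, 2, 1, 1, 0, 2]

counts = Counter(labels)
print(sorted(counts.items()))              # [(0, 2), (1, 5), (2, 3)]
print(np.bincount(labels) / len(labels))   # share of each class: [0.2 0.5 0.3]
```

Here class 1 makes up half of the toy labels, so this toy set would count as unbalanced.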

The sentiment task is described in this paper:

> Sara Rosenthal, Noura Farra, and Preslav Nakov. 2017. SemEval-2017 Task 4: Sentiment Analysis in Twitter. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 502–518, Vancouver, Canada. Association for Computational Linguistics. [https://aclanthology.org/S17-2088/](https://aclanthology.org/S17-2088/)



## Turn it into Torch tensors

1. Vectorize the tweet texts with sklearn's `TfidfVectorizer`. Fit it on the training texts only, then use the fitted vectorizer to transform the validation texts, so that both splits share the same vocabulary (and thus the same number of features).
2. Convert this data to Torch tensors. Note that the output of the sklearn vectorizer is not a NumPy array but a sparse SciPy matrix, which can be converted to a dense NumPy array with its `toarray()` method. Next, you can convert that array to a torch tensor with [`torch.from_numpy`](https://pytorch.org/docs/stable/generated/torch.from_numpy.html). Finally, you will need to convert the feature tensors to float type with the `float()` method.

The labels come as Python lists, so they can be converted to NumPy arrays with `np.array(mylist)` and then to torch tensors in the same way.

If your computer is struggling with the conversion, simply reduce the amount of training data to a slice (e.g. first 10K examples).
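The vectorize-then-convert steps can be tried out on a few toy sentences first. This minimal sketch uses hypothetical texts and labels in place of the `tweet_eval` splits. It shows the fit/transform split and the chain sparse matrix → NumPy array → float tensor.

```python
import numpy as np
import torch
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy texts and labels standing in for the tweet_eval splits
train_texts = ["I love this phone", "worst service ever", "the bus is late again", "great game tonight"]
train_labels = [2, 0, 1, 2]
val_texts = ["I love this game", "service was late"]
val_labels = [2, 0]

vectorizer = TfidfVectorizer()
train_matrix = vectorizer.fit_transform(train_texts)  # learn the vocabulary on the training texts only
val_matrix = vectorizer.transform(val_texts)          # reuse it, so both splits get the same columns

# sparse SciPy matrix -> dense NumPy array (float64) -> torch tensor -> float32
x_train = torch.from_numpy(train_matrix.toarray()).float()
x_val = torch.from_numpy(val_matrix.toarray()).float()
y_train = torch.from_numpy(np.array(train_labels))
y_val = torch.from_numpy(np.array(val_labels))

print(x_train.shape, x_val.shape)  # same number of columns (the vocabulary size)
print(x_train.dtype)               # torch.float32
```

Words that appear only in the validation texts (here "was") are ignored by `transform`, because they are not in the vocabulary learned from the training texts.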

In [None]:
# skeleton code for step 2, converting the vectorizer data to Torch tensors

x_train = torch.from_numpy(FILLINTHEBLANK.toarray()).float()
y_train = torch.from_numpy(np.array(FILLINTHEBLANK))
x_test = torch.from_numpy(FILLINTHEBLANK.toarray()).float()
y_test = torch.from_numpy(np.array(FILLINTHEBLANK))

print(x_train.size(), x_test.size())
print(y_train.size(), y_test.size())

## Turn the data into Torch Datasets

We're still not done with the data preparation! The canonical way to handle data in PyTorch is with objects of the [Dataset](https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.Dataset) class. More specifically, we are going to define our own subclass `TweetEvalData` that is a special case of the `Dataset` class. There is an official PyTorch [Data Loading and Processing Tutorial](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html).

1. You are provided with the code for the class `TweetEvalData` (a subclass of the torch `Dataset` class), which can hold our `tweet_eval` data. You do not need to modify the class itself. Instead, complete the code that creates two instances of it, one for the training data and one for the validation data, by passing in the features and labels. These must be torch tensors. Luckily, you have just created them in the previous step!


In [None]:
# Making a subclass of the torch Datasets class
# You don't need to modify this code; just take a look at
# what it does and run it

class TweetEvalData(torch.utils.data.Dataset): # defining a class that represents our data, with three methods

    def __init__(self, X, y): # describes how the dataset is initialized; the arguments are the features and labels
        self.X = X # any instance of the TweetEvalData class has an attribute X that contains its features
        self.y = y # and an attribute y that contains its labels

    def __getitem__(self, index): # __getitem__ retrieves a data point from the dataset by its index
        X = self.X[index]
        y = self.y[index].unsqueeze(0) # the label is unsqueezed from shape () to shape (1,), so batches of labels have shape (batch_size, 1)
        return X, y # returns the features and label of the data point at this index as a pair of tensors

    # a helper to check and return the size of the dataset
    def __len__(self):
        return len(self.y) # returning the number of labels in the data

In [None]:
# Initializing datasets

dataset_class_train = TweetEvalData(FILLINTHEBLANK) # initializing an instance of the class that contains the training data
dataset_class_val = TweetEvalData(FILLINTHEBLANK) # initializing an instance of the class that contains the validation data

2. Finally, we need so-called [Dataloaders](https://pytorch.org/docs/stable/data.html) for each of these datasets. As you will see in the example code later on, a Dataloader allows us to easily iterate over samples of data and their corresponding labels while training our model. Create Dataloaders for the train and validation sets using `torch.utils.data.DataLoader`, with `batch_size` 64. The batch size is the number of samples processed before the model is updated during training; each step of our gradient descent is based on one batch of data.

In [None]:
# Initiating dataloaders 

train_loader = torch.utils.data.DataLoader(FILLINTHEBLANK)
val_loader = torch.utils.data.DataLoader(FILLINTHEBLANK)
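To see how batching works, here is a minimal sketch on hypothetical random data. `torch.utils.data.TensorDataset` stands in for `TweetEvalData`, and the labels are unsqueezed to mimic its `__getitem__`. `shuffle=True` is an added assumption (a common choice for training data) and is not required by the exercise.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical toy data: 150 examples with 10 features and labels in {0, 1, 2}
X = torch.randn(150, 10)
y = torch.randint(0, 3, (150,))

# unsqueezing gives each label shape (1,), as TweetEvalData.__getitem__ does
dataset = TensorDataset(X, y.unsqueeze(1))
loader = DataLoader(dataset, batch_size=64, shuffle=True)

batch_shapes = [(inputs.shape, targets.shape) for inputs, targets in loader]
for inputs_shape, targets_shape in batch_shapes:
    print(inputs_shape, targets_shape)
# 150 examples with batch_size=64 give 3 batches of 64, 64 and 22 examples,
# i.e. 3 optimizer steps per epoch; targets have shape (batch_size, 1),
# which is why the training loop flattens them
```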

# 3. Let's train our neural network!

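1. Create an instance of our `LogisticRegression` model. The number of input features should equal the size of the TF-IDF vectors (the number of columns of `x_train`). We still have 3-class classification.
2. Set up a loss function ([CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html)) and an optimizer, i.e. a variant of gradient descent ([Adam optimizer](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html) with learning rate 0.001). Note that `CrossEntropyLoss` applies log-softmax internally and expects unnormalized scores (logits), so passing it the softmax outputs of our model applies softmax twice. The model still learns in this setup, but in practice the explicit softmax layer is usually dropped when training with this loss.
3. Complete and run the provided code to train the model for 5 epochs. Is your loss going down?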
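The training steps above can be sketched on a small synthetic problem. This minimal sketch assumes toy data where the true class is simply the index of the largest of the first three features. To stay short, it uses full-batch updates instead of a DataLoader. The linear layer's raw scores (logits) go straight into `CrossEntropyLoss`, which is the input that loss expects. The target is a tensor of integer class indices.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy problem: 4 features, 3 classes; the true class is the index
# of the largest of the first three features, so the data is learnable
X = torch.randn(200, 4)
y = X[:, :3].argmax(dim=1)  # integer class labels (int64), as CrossEntropyLoss expects

model = torch.nn.Linear(4, 3)                 # outputs raw scores (logits), no softmax
loss_function = torch.nn.CrossEntropyLoss()   # applies log-softmax internally
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

losses = []
for step in range(500):
    optimizer.zero_grad()               # clear gradients from the previous step
    loss = loss_function(model(X), y)   # arguments: (logits, target class indices)
    loss.backward()                     # back-propagate
    optimizer.step()                    # update the weights
    losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

The order `zero_grad` → forward → loss → `backward` → `step` is the same as in the skeleton below. Only the data source differs.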

In [16]:
model = LogisticRegression(FILLINTHEBLANK)
model


In [17]:
loss_function = 
optimizer = 

In [None]:
# skeleton code for step (3)
num_epochs = 5
for epoch in range(num_epochs):
    losses = [] # storing the loss values for this epoch
    for batch_index, (inputs, targets) in enumerate(train_loader):

        # zero the gradients that are stored from the previous optimization step
        optimizer.zero_grad()
        
        # get the model outputs and the original target (true) labels
        outputs = # compute the outputs of the model
        targets = torch.flatten(targets) # converting target tensor from shape (batch_size,1) to (batch_size)
        targets = targets.type(torch.LongTensor) # Converting targets as required for loss function
        
        # compute the loss here
        loss = loss_function(FILLINTHEBLANK)

        # back-propagate
        loss.backward()

        # perform the optimization step
        optimizer.step()
        
        #add this batch's loss to the losses for this epoch
        losses.append(loss.item())
        
    print(f'Epoch {epoch}: loss {np.mean(losses)}')

# 4. Evaluating the trained model

1. Complete the following code to evaluate the model on the `tweet_eval` validation set. We are looping over batches (subsets) of the validation data using our validation loader, getting the most-probable category for each prediction, and adding them to a list. Then we compare them to the true categories.


In [None]:
# skeleton code

predictions = []

with torch.no_grad(): #this is evaluation, so we don't need to do backpropagation anymore
    for batch_index, (inputs, targets) in enumerate(val_loader):
        outputs = # compute model outputs
        # getting the indices of the output with the highest value, which corresponds to the predicted class (as labels 0, 1, 2)
        vals, indices = torch.max(outputs, 1)
        # accumulating the predictions
        predictions += indices.tolist()

# compute accuracy on the predicted and target values with sklearn accuracy_score.
# Use the original list of validation labels loaded from the tweet_eval dataset
acc =
print(f'Model accuracy: {acc}')

2. When we train a model on an unbalanced dataset, it sometimes does not learn enough to predict the less-represented categories. Instead, it (almost) always guesses one of the larger classes, as this is a "safer bet" in the absence of enough information. Check whether, in our case, the model predicts all three classes.
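The evaluation step and the check on predicted classes can be illustrated with hypothetical model outputs for six validation examples. `torch.max(outputs, 1)` returns both the maximum values and their indices along dimension 1, and the indices are the predicted class labels. `np.bincount` with `minlength=3` then shows how often each class was predicted, including classes that were never predicted (count 0).

```python
import numpy as np
import torch
from sklearn.metrics import accuracy_score

# Hypothetical model outputs for 6 validation examples (rows: examples, columns: classes 0, 1, 2)
outputs = torch.tensor([
    [0.2, 0.7, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
    [0.3, 0.4, 0.3],
])
true_labels = [1, 2, 0, 1, 2, 0]

# the predicted class is the column with the highest value in each row
_, indices = torch.max(outputs, 1)
predictions = indices.tolist()  # [1, 1, 0, 1, 2, 1]

acc = accuracy_score(true_labels, predictions)
print(f"Model accuracy: {acc:.3f}")  # 4 of 6 correct -> 0.667

# how often each class was predicted; a class with count 0 was never predicted
print(np.bincount(predictions, minlength=3))  # [1 4 1]
```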

# 5. Optional: a taste of deep neural nets

Next week in lecture, we will see that deep neural nets are simply neural nets with more than one layer. The layers are stacked on top of each other: the outputs of one layer become the inputs of the next, and only the last layer makes a prediction. Each layer first combines the outputs of the previous layer (a.k.a. its activations) linearly using weights, and then applies an activation function.
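For a net with one hidden layer, as in this exercise, the forward pass can be summarized as below. The notation is chosen here for illustration: $W_1, b_1$ are the hidden layer's weights and biases, $W_2, b_2$ those of the output layer, and $\sigma$ is the sigmoid function.

$$
z_1 = W_1 x + b_1, \qquad a_1 = \sigma(z_1), \qquad z_2 = W_2 a_1 + b_2, \qquad \hat{y} = \mathrm{softmax}(z_2), \qquad \sigma(t) = \frac{1}{1 + e^{-t}}
$$

If $x$ has `n_features` entries, then $W_1$ has shape `hidden_size` × `n_features` and $W_2$ has shape `n_classes` × `hidden_size`.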

The neural net we are implementing here looks like this:

<img src="Net for DLI exercise.drawio.png"> 

1. Finish the definition of this model. The `__init__` method should contain:
 - one hidden layer with sigmoid activation functions (this is the "hidden" layer of the model, between inputs and outputs; the number of nodes in it is a variable `hidden_size` that we should be able to set when we initialize the model)
 - one softmax output layer.
2. In the forward pass, the model should do the following:
 - compute the z values for the hidden layer (linear combinations of the inputs, using the [Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) neural net building block)
 - pass them through the [sigmoid](https://pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html) activation function
 - compute the z values for the output layer (linear combinations of the hidden layer outputs, a.k.a. its activations)
 - pass them through the Softmax activation function
3. Try initializing and inspecting the model as a toy_model instance. It should have 4 features, hidden_size 8, and 3 classes.
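As a cross-check, and not as a replacement for the `SimpleNN` class you are asked to write, the same architecture can be sketched with `torch.nn.Sequential`. It uses the toy sizes from step 3: 4 features, `hidden_size` 8, 3 classes. Counting the parameters is a quick way to confirm the layer shapes.

```python
import torch

torch.manual_seed(0)

# 4 features -> hidden layer of size 8 (sigmoid) -> 3 classes (softmax)
net = torch.nn.Sequential(
    torch.nn.Linear(4, 8),    # z1 = W1 x + b1
    torch.nn.Sigmoid(),       # a1 = sigmoid(z1)
    torch.nn.Linear(8, 3),    # z2 = W2 a1 + b2
    torch.nn.Softmax(dim=1),  # class probabilities
)

x = torch.randn(5, 4)  # a batch of 5 examples
probs = net(x)
print(net)
print(probs.shape)  # torch.Size([5, 3])

# trainable parameters: (4*8 + 8) + (8*3 + 3) = 40 + 27 = 67
n_params = sum(p.numel() for p in net.parameters())
print(n_params)  # 67
```

Your `SimpleNN` instance with the same sizes should have the same number of parameters.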

In [None]:
# exercise template
class SimpleNN(torch.nn.Module):

    # initializing the model with a certain number of input features
    # output classes, and size of hidden layer
    def __init__(self, n_features, hidden_size, n_classes):
        super().__init__()

        # creating one hidden layer h1
        # that first applies a linear transformation to the incoming data: z1 = w1T * x
        self.h1_linear =

        # creating the sigmoid activation function for the hidden layer
        self.h1_activ =

        # creating one output layer
        # that again applies a linear transformation to the incoming activations: z2 = w2T * a1
        self.output_linear =
        
        # creating a softmax activation function for the output layer
        self.output_activ =

    # you have to define the forward() method which will specify the forward propagation:
    # how the input values get to the next layer(s)
    def forward(self, inputs):

        # compute the z-values of the hidden layer
        z1 =

        # pass them through the sigmoid activation function
        a1 =

        #compute the z-values of the output layer
        z2 = 
        
        # get the final values
        outputs =

        return outputs

In [16]:
toy_model = SimpleNN(FILLINTHEBLANK)
toy_model
