In [1]:
import numpy as np
import torch
import torch.utils.data
import scipy
import torch.nn.functional as F
import datasets
from sklearn.metrics import accuracy_score
from sklearn.feature_extraction.text import TfidfVectorizer
datasets.logging.set_verbosity_error()

# Python refresher: classes, methods, attributes.

1. Create Python classes `professor` and `student`. The professor should have the attribute `courses_taught`, and the student the attribute `courses_enrolled` (both are lists of strings). Both classes should also have a `name` attribute (string), which is set when a class instance is initialized.
2. Implement the methods to update the list of courses for the student and the professor classes.
3. Initialize an instance of the student class named "Jane Doe", and an instance of the professor class named "Mary Smith".
4. Use the methods you implemented to get the student enrolled in 3 courses of your choosing, and the professor teaching 2 other courses.
5. Check whether the student is enrolled in any courses that the professor is teaching.

Python refresher: [classes](https://www.pythontutorial.net/python-oop/python-class/), [attributes](https://www.pythontutorial.net/python-oop/python-class-attributes/), [methods](https://www.tutorialspoint.com/difference-between-method-and-function-in-python), [`self` and `__init__`](https://micropyramid.com/blog/understand-self-and-__init__-method-in-python-class/)
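
The five steps above could be sketched roughly as follows. This is only one possible layout: the class names are capitalized here following the usual Python naming convention, and the method names `enroll` and `assign_course`, as well as the course titles, are made up for illustration.

```python
class Student:
    def __init__(self, name):
        self.name = name
        self.courses_enrolled = []  # list of course names (strings)

    def enroll(self, course):  # hypothetical method name
        self.courses_enrolled.append(course)


class Professor:
    def __init__(self, name):
        self.name = name
        self.courses_taught = []  # list of course names (strings)

    def assign_course(self, course):  # hypothetical method name
        self.courses_taught.append(course)


student = Student("Jane Doe")
professor = Professor("Mary Smith")

# made-up course titles: 3 for the student, 2 other ones for the professor
for course in ["Syntax", "Statistics", "Phonology"]:
    student.enroll(course)
for course in ["Machine Learning", "Semantics"]:
    professor.assign_course(course)

# set intersection gives the courses that appear in both lists
shared = set(student.courses_enrolled) & set(professor.courses_taught)
print(f"Shared courses: {shared}")
```

Since the professor teaches two *other* courses in this sketch, the intersection is empty; adding a common course to both lists would make it non-empty.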

# Basic operations with PyTorch
1. Create two random Torch tensors of the size 4 x 6 and 6 x 8. [Multiply](https://pytorch.org/docs/stable/generated/torch.matmul.html) these tensors.
2. Create 2 numpy arrays with shapes 4 x 3 and 7 x 3. Turn them into Torch tensors and [concatenate](https://pytorch.org/docs/stable/generated/torch.cat.html) them. Confirm that the shape is correct and turn the result back into a numpy array.
3. Convert the resulting numpy array back into a Torch tensor. Find which device it is on.

Basic tensor operations tutorial: [link](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html)
If you're not familiar with numpy, a very popular library for data science, consider catching up with this [tutorial](https://numpy.org/doc/stable/user/quickstart.html).
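
As a rough sketch of the three steps above (variable names are arbitrary, and concatenating along the first dimension is an assumption that follows from the shapes 4 x 3 and 7 x 3 sharing their second dimension):

```python
import numpy as np
import torch

# 1. two random tensors and their matrix product: (4 x 6) @ (6 x 8) gives (4 x 8)
a = torch.rand(4, 6)
b = torch.rand(6, 8)
product = torch.matmul(a, b)
print(product.shape)

# 2. numpy arrays turned into tensors and concatenated along dim 0 (4 + 7 rows)
x = torch.from_numpy(np.random.rand(4, 3))
y = torch.from_numpy(np.random.rand(7, 3))
stacked = torch.cat([x, y], dim=0)
print(stacked.shape)
stacked_np = stacked.numpy()  # back to a numpy array

# 3. numpy array turned into a tensor again; .device tells where it lives
restored = torch.from_numpy(stacked_np)
print(restored.device)
```

Tensors created from numpy arrays live on the CPU unless they are explicitly moved elsewhere.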

# Defining the PyTorch model

1. Finish the definition of this model. The `__init__` method should contain one fully connected linear layer with a ReLU activation function, and one output (aka "logits") layer.
2. In the forward pass, the model should do the following:
 - apply the fully connected layer to the input values
 - pass the result through the [ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html) activation function
 - compute and return the logits in the output layer
3. Try initializing the model as an instance named `toy_model` and inspecting it. It should have 4 input features, a `hidden_size` of 8, and 3 classes.

In [None]:
# exercise template
class SimpleNN(torch.nn.Module):

    # initializing the model with a certain number of input features
    # output classes, and size of hidden layer(s)
    def __init__(self, n_features, hidden_size, n_classes):
        super().__init__()

        # creating one fully connected layer fc1
        # that applies a linear transformation to the incoming data: y = xA^T + b
        self.fc1 = ...  # TODO

        # setting the ReLU activation function for the fully connected layer
        self.fc1_activ = ...  # TODO

        # setting up the layer that will return the final values for prediction
        # these are often called "logits", but this is not the statistical log-odds function
        self.fc_logits = ...  # TODO

    # you have to define the forward() method, which specifies the forward propagation:
    # how the input values get to the next layer(s)
    def forward(self, inputs):

        # apply the fully connected layer to the input values
        z1 = ...  # TODO

        # pass them through the activation function
        z1_active = ...  # TODO

        # get the final values
        logits = ...  # TODO

        return logits
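
For comparison once you have tried the template yourself, here is one way it might be completed. It is a sketch rather than the reference solution: the class is named `SimpleNNSketch` (a made-up name) so that it does not overwrite your own `SimpleNN`, and it assumes `torch.nn.Linear` and `torch.nn.ReLU` as the building blocks.

```python
import torch


class SimpleNNSketch(torch.nn.Module):  # hypothetical name, mirrors the template
    def __init__(self, n_features, hidden_size, n_classes):
        super().__init__()
        # fully connected layer: n_features -> hidden_size
        self.fc1 = torch.nn.Linear(n_features, hidden_size)
        # element-wise ReLU activation
        self.fc1_activ = torch.nn.ReLU()
        # output layer: hidden_size -> n_classes
        self.fc_logits = torch.nn.Linear(hidden_size, n_classes)

    def forward(self, inputs):
        z1 = self.fc1(inputs)
        z1_active = self.fc1_activ(z1)
        logits = self.fc_logits(z1_active)
        return logits


toy_model = SimpleNNSketch(n_features=4, hidden_size=8, n_classes=3)
print(toy_model)  # lists the layers and their sizes

# a batch of 2 random examples with 4 features each gives 2 rows of 3 logits
toy_batch = torch.rand(2, 4)
toy_logits = toy_model(toy_batch)
print(toy_logits.shape)
```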

# Computing the loss

1. Instantiate the [Mean Squared Error](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html) loss function.
2. Create two random tensors of the same shape: `dummy_target` and `dummy_prediction`. For this "dry run", let's pretend that these are the desired and actual outputs of our toy model. Since they stand in for model outputs, they need to be the size of the output layer.
3. Compute the loss on the dummy tensors using the MSE loss function. Inspect the result.
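
A minimal sketch of this dry run is shown below. The tensor length of 3 is an arbitrary choice made for the example; `MSELoss` itself only requires that both tensors have the same shape. The manual computation at the end is added purely as a sanity check of what the loss function does.

```python
import torch

loss_fn = torch.nn.MSELoss()

# random stand-ins for the desired and actual outputs (length 3 is arbitrary)
dummy_target = torch.rand(3)
dummy_prediction = torch.rand(3)

loss = loss_fn(dummy_prediction, dummy_target)
print(loss.item())

# MSE is the mean of the squared element-wise differences
manual_mse = ((dummy_prediction - dummy_target) ** 2).mean()
print(manual_mse.item())
```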


# Backward propagation (single step)

1. Instantiate the [Adam](https://pytorch.org/docs/stable/optim.html#torch.optim.Adam) optimizer with learning rate (`lr`) parameter set to 0.001.
2. Zero out the current gradients of the optimizer with the [zero_grad](https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html) method.
3. Compute the gradients based on the loss function value by calling the `.backward()` method on the loss.
4. Perform a single optimization step based on the computed gradients and inspect the loss.

In [None]:
# exercise template
optimizer = ...  # TODO: initialize the Adam optimizer here
# TODO: zero out its gradients here
# TODO: compute the gradients here
optimizer.step()
loss.item()
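
The sketch below shows one way a single optimization step might look. Note one assumption that goes beyond the steps above: `.backward()` only works if the loss was computed from tensors that track gradients, so here the prediction comes from a small stand-in model (built with `torch.nn.Sequential`, 4 inputs and 3 outputs) rather than from a plain random tensor. All variable names are illustrative.

```python
import torch

# stand-in for the toy model: 4 features -> 8 hidden units -> 3 outputs
toy_model = torch.nn.Sequential(
    torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 3)
)
loss_fn = torch.nn.MSELoss()

inputs = torch.rand(1, 4)
dummy_target = torch.rand(1, 3)

# the optimizer needs to know which parameters it is allowed to update
optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.001)
bias_before = toy_model[2].bias.detach().clone()

loss = loss_fn(toy_model(inputs), dummy_target)
optimizer.zero_grad()  # clear gradients left over from earlier steps
loss.backward()        # compute gradients of the loss w.r.t. the parameters
optimizer.step()       # update the parameters using those gradients

# recompute the loss with the updated parameters to inspect the effect
loss_after = loss_fn(toy_model(inputs), dummy_target)
print(loss.item(), loss_after.item())
```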

# Loading the Tweet_eval data and turning it into Torch tensors

1. Load the tweet_eval data as usual (we will only need the train and validation sets).
2. Vectorize the tweet texts with the TF-IDF vectorizer from sklearn (`TfidfVectorizer`).
3. Convert this data to Torch tensors. Note that sklearn returns the vectors as scipy sparse matrices rather than numpy arrays; they can be converted with the `toarray()` method. Labels are lists, so they can be converted with `np.array(mylist)`. You will also need to convert all the feature tensors to float type with the `float()` method.

If your computer is struggling with the conversion, simply reduce the amount of training data to a slice (e.g. first 10K examples).
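
The conversion chain in steps 2 and 3 can be sketched as follows. To keep the example self-contained, three made-up sentences and labels stand in for the tweet_eval data (they are not taken from the dataset); fitting the vectorizer on the training texts only and reusing it for the validation texts is an assumption about the intended setup.

```python
import numpy as np
import torch
from sklearn.feature_extraction.text import TfidfVectorizer

# hypothetical stand-ins for the tweet texts and their labels
train_texts = ["what a great day", "this is terrible", "nothing special today"]
train_labels = [2, 0, 1]
val_texts = ["a great game today", "terrible weather"]

vectorizer = TfidfVectorizer()
X_train_sparse = vectorizer.fit_transform(train_texts)  # scipy sparse matrix
X_val_sparse = vectorizer.transform(val_texts)          # same vocabulary as train

# sparse matrix -> dense numpy array -> float tensor
X_train = torch.from_numpy(X_train_sparse.toarray()).float()
X_val = torch.from_numpy(X_val_sparse.toarray()).float()

# list of labels -> numpy array -> tensor
y_train = torch.from_numpy(np.array(train_labels))

print(X_train.shape, X_train.dtype)
print(X_val.shape, y_train.shape)
```

Because the validation texts are transformed with the vectorizer fitted on the training texts, both feature tensors end up with the same number of columns, which is what the model's input layer will expect.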


# Turning the data into Torch Datasets

We're still not done with the data preparation! The canonical way to handle data in PyTorch is with the [Dataset](https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.Dataset) class, an abstract class representing any dataset used as input to a model. It is designed so that subclasses only have to override the `__len__` and `__getitem__` methods. Given an index, `__getitem__` returns the corresponding data point (here, a feature vector and its label). For more detail, see the official PyTorch [Data Loading and Processing Tutorial](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html).

You are provided with the skeleton code for the dataset class for our training data.

1. Fill in the parts that provide the torch tensors corresponding to the vector and label data. Luckily, you have just done that in the previous step!
2. Create the same kind of class for the validation data.
3. Instantiate both classes and load them using `torch.utils.data.DataLoader`, with `batch_size` 64.


In [None]:
# exercise template cell
class TweetEvalTrain(torch.utils.data.Dataset):
    # define how you're getting the data in your X and y attributes. They can be loaded from a csv file,
    # from some other resource, etc.
    def __init__(self):
        self.X = ...  # TODO: tensor corresponding to the tfidf vectors for the train data
        self.y = ...  # TODO: tensor corresponding to the labels for the train data

    # this method implements retrieval of a datapoint by index
    def __getitem__(self, index):
        X = self.X[index]
        y = self.y[index].unsqueeze(0)
        return X, y

    # a helper to check the size of the dataset
    def __len__(self):
        return len(self.y)

In [None]:
class TweetEvalVal(torch.utils.data.Dataset):
    pass #implement this class using the above as a template

In [None]:
data_train = ...  # TODO: instantiate your train data class
train_loader = ...  # TODO: load it using batch size 64

# do the same for validation data
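
To see how a `Dataset` and a `DataLoader` fit together, here is a small sketch on random data. It deviates from the template in one respect, which is an assumption made for brevity: the hypothetical class `TensorPairDataset` receives its tensors as constructor arguments, so the same class could serve both the training and the validation data. The random tensors stand in for the TF-IDF features and labels.

```python
import torch
import torch.utils.data


class TensorPairDataset(torch.utils.data.Dataset):  # hypothetical name
    def __init__(self, X, y):
        self.X = X  # feature tensor, one row per example
        self.y = y  # label tensor, one entry per example

    def __getitem__(self, index):
        # same convention as the template: the label gets an extra dimension
        return self.X[index], self.y[index].unsqueeze(0)

    def __len__(self):
        return len(self.y)


# random stand-ins: 150 examples, 10 features, labels in {0, 1, 2}
X_toy = torch.rand(150, 10)
y_toy = torch.randint(0, 3, (150,))

data_toy = TensorPairDataset(X_toy, y_toy)
toy_loader = torch.utils.data.DataLoader(data_toy, batch_size=64)

for inputs, targets in toy_loader:
    print(inputs.shape, targets.shape)
```

With 150 examples and a batch size of 64, the loader yields two full batches and one final batch with the remaining 22 examples.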

# Let's train our neural network!

1. Create an instance of our SimpleNN model using hidden_size 100. The input feature size should correspond to the size of tfidf vectors. We still have 3-class classification.
2. Like before, set up the loss function ([CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html)) and the [Adam optimizer](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html) with a learning rate of 0.001.
3. Complete and run the provided code for training the model across 5 epochs. Is your loss going down?

In [None]:
# skeleton code for step (3)
num_epochs = 5  # number of epochs given in step (3)
for epoch in range(num_epochs):
    losses = []  # storing the loss values
    for batch_index, (inputs, targets) in enumerate(train_loader):

        # zeroing the gradients that are stored from the previous optimization step
        optimizer.zero_grad()
        outputs = ...  # TODO: compute the outputs
        targets = torch.flatten(targets)
        # TODO: compute the loss here

        # TODO: back-propagate

        # TODO: perform the optimization step
        losses.append(loss.item())
    print(f'Epoch {epoch}: loss {np.mean(losses)}')
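
For reference, a completed loop might look like the sketch below. It runs on random stand-in data (so that it is self-contained) and uses a `torch.nn.Sequential` model and `TensorDataset` in place of your `SimpleNN` and dataset classes; those substitutions and all sizes except the hidden size, learning rate, and epoch count are assumptions of this example.

```python
import numpy as np
import torch
import torch.utils.data

# random stand-ins: 256 examples, 20 features, labels in {0, 1, 2} with shape (256, 1)
X_toy = torch.rand(256, 20)
y_toy = torch.randint(0, 3, (256, 1))
toy_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(X_toy, y_toy), batch_size=64
)

# stand-in model with hidden size 100 and 3 output classes
model = torch.nn.Sequential(
    torch.nn.Linear(20, 100), torch.nn.ReLU(), torch.nn.Linear(100, 3)
)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

num_epochs = 5
epoch_losses = []
for epoch in range(num_epochs):
    losses = []
    for batch_index, (inputs, targets) in enumerate(toy_loader):
        optimizer.zero_grad()
        outputs = model(inputs)            # forward pass: logits per class
        targets = torch.flatten(targets)   # (batch, 1) -> (batch,)
        loss = loss_fn(outputs, targets)   # cross-entropy on logits and class ids
        loss.backward()                    # back-propagate
        optimizer.step()                   # update the parameters
        losses.append(loss.item())
    epoch_losses.append(np.mean(losses))
    print(f'Epoch {epoch}: loss {epoch_losses[-1]}')
```

Because the labels here are random, the loss has little to learn from and may barely move; on the real TF-IDF features it should decrease more clearly.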

# Evaluating the trained model

Complete the following code to evaluate the model:


In [160]:
# skeleton code

predictions = []

with torch.no_grad():  # this is evaluation, so we don't need to do backpropagation anymore
    for batch_index, (inputs, targets) in enumerate(val_loader):
        outputs = ...  # TODO: compute model outputs
        # getting the index of the logit with the highest value, which corresponds to the predicted class (as labels 0, 1, 2)
        vals, indices = torch.max(outputs, 1)
        # accumulating the predictions
        predictions += indices.tolist()

# compute accuracy on the predicted and target values with sklearn accuracy_score.
# Use the original list of validation labels loaded from the tweet_eval dataset
acc = ...  # TODO
print(f'Model accuracy: {acc}')
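
The evaluation step could be completed along the following lines. As before, this is a self-contained sketch on random stand-in data with an untrained `torch.nn.Sequential` model, so the variable names with the `toy_` prefix and the resulting accuracy value are illustrative only.

```python
import torch
import torch.utils.data
from sklearn.metrics import accuracy_score

# random stand-ins for the validation features and the original label list
X_val_toy = torch.rand(100, 20)
toy_val_labels = torch.randint(0, 3, (100,)).tolist()
toy_val_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(
        X_val_toy, torch.tensor(toy_val_labels).unsqueeze(1)
    ),
    batch_size=64,  # shuffle is off by default, so the order matches the labels
)

# untrained stand-in model, used only to show the mechanics
toy_model = torch.nn.Sequential(
    torch.nn.Linear(20, 100), torch.nn.ReLU(), torch.nn.Linear(100, 3)
)
toy_model.eval()

predictions = []
with torch.no_grad():
    for batch_index, (inputs, targets) in enumerate(toy_val_loader):
        outputs = toy_model(inputs)
        # index of the largest logit in each row is the predicted class
        vals, indices = torch.max(outputs, 1)
        predictions += indices.tolist()

acc = accuracy_score(toy_val_labels, predictions)
print(f'Model accuracy: {acc}')
```

Comparing against the original label list only works if the loader does not shuffle the validation data; otherwise the predictions and labels would be in different orders.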


# Advanced, optional

If you're done with the above, try to write the same kind of simple neural net model and its training loop from scratch, without looking at the skeleton code.