## Seminar 2

### Intro to PyTorch

Based on the official [PyTorch Blitz Tutorial](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html).

To install PyTorch, please follow the instructions on the official [website](https://pytorch.org/get-started/locally/).

### What is PyTorch?

* It is a package for scientific computing, essentially a replacement for NumPy that supports GPUs.
* It is a deep learning research platform.

### Tensors

Tensors are similar to NumPy's ndarrays, except that they can also be used on GPUs.

In [None]:
import torch

To construct a 5x3 matrix, uninitialized:

In [None]:
x = torch.empty(5, 3)
print(x)

NB! An uninitialized matrix is declared, but does not contain definite known values before it is used. When an uninitialized matrix is created, whatever values were in the allocated memory at the time will appear as the initial values.

To construct a randomly initialized matrix:

In [None]:
x = torch.rand(5, 3)
print(x)

To construct a matrix, filled with zeros and data-type long:

In [None]:
x = torch.zeros(5, 3, dtype=torch.long)
print(x)

A tensor may be initialized directly from data:

In [None]:
x = torch.tensor([5.5, 3])
print(x)

A tensor may be created from an existing tensor. The new tensor inherits the properties of the one passed as a parameter (e.g. its dtype), except for those that are overridden explicitly:

In [None]:
x = x.new_ones(5, 3, dtype=torch.double)      # new_* methods take in sizes
print(x)

x = torch.randn_like(x, dtype=torch.float)    # override dtype!
print(x)   

To check the size of a tensor we use:

In [None]:
x.size()

NB! `torch.Size` is a subclass of `tuple`, so it supports all tuple operations.
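
A minimal sketch of what "supports all tuple operations" means in practice. The variable names `size`, `rows` and `cols` are chosen here for illustration only:

```python
import torch

x = torch.rand(5, 3)
size = x.size()

# torch.Size behaves like a tuple: unpacking, indexing and len() all work
rows, cols = size
print(isinstance(size, tuple))
print(rows, cols, len(size), size[0] * size[1])
```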

### Operations

PyTorch is so Pythonic that it offers several syntaxes for the same tensor operation, to match everyone's needs and tastes. Let us take a look at addition:

In [None]:
y = torch.rand(5, 3)
print(x + y)

In [None]:
print(torch.add(x, y))

If needed, you can pass an output tensor via the `out` parameter to functions such as `torch.add`:

In [None]:
result = torch.empty(5, 3)
torch.add(x, y, out=result)
print(result)

Tensor objects support all the operations as methods:

In [None]:
x.add(y)

In case you need to perform an operation in-place, you use the operation_ syntax:

In [None]:
x.add_(y)

The result of an in-place operation is stored in the left operand object, in this particular case in x

In [None]:
x
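
To make the difference between the method form and the in-place form concrete, here is a small sketch with fixed values (the tensors below are illustrative and not the `x` and `y` used above):

```python
import torch

x = torch.ones(2, 2)
y = torch.full((2, 2), 2.0)

z = x.add(y)      # out-of-place: returns a new tensor, x is untouched
print(x)          # still all ones

x.add_(y)         # in-place: x itself now holds the sum
print(torch.equal(x, z))
```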

NumPy-style indexing is also supported:

In [None]:
print(x[:, 1])

In case there is a need to resize (*reshape*) a tensor, the `view` method comes into action:

In [None]:
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from the other dimensions
print(x.size(), y.size(), z.size())
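
A short sketch of two properties of `view` that are easy to miss: the `-1` dimension is computed from the total number of elements, and the result shares memory with the original tensor. The concrete values are illustrative:

```python
import torch

x = torch.arange(16.0).view(4, 4)
z = x.view(-1, 8)          # 16 elements / 8 columns, so 2 rows are inferred
print(z.size())

# view does not copy data: writing through z also changes x
z[0, 0] = 100.0
print(x[0, 0].item())
```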

To get the value of a one-element tensor as a Python number, use `.item()`:

In [None]:
x = torch.randn(1)
print(x)
print(x.item())

In [None]:
y[1].item()

In case we need to check, if CUDA is available, we use:

In [None]:
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))

### Autograd

The next thing worth looking at is PyTorch's automatic gradient computation module, *torch.autograd*. This module does all the *magic* connected with gradient computation, using a sophisticated computation graph architecture that will be covered later. For now, we will only get to know its basic concepts.

To include a `Tensor` into the computation graph, its `.requires_grad` attribute should be set to `True`

In [None]:
x = torch.ones(2, 2, requires_grad=True)
print(x)

After any operation is applied (in this particular case - addition), a `Function` object is assigned to the `.grad_fn` attribute of the tensor `y` and added to the computation graph for backward propagation of the gradient.

In [None]:
y = x + 2
print(y)

In [None]:
print(y.grad_fn)

In [None]:
z = y * y * 3
out = z.mean()

print(z, out)

The `.requires_grad` attribute can be changed on the fly with `.requires_grad_()`. See the difference: if a tensor does not require gradients, it is not included in the computation graph, so it stores no backward function. Once `.requires_grad` is set to `True`, all subsequent operations on the tensor are tracked.

In [None]:
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

One of the most important things in the torch framework is the `.backward()` method. It computes the gradients of the tensor it is called on with respect to all the leaf tensors (e.g. neural net parameters) in the computation graph that require gradients and contributed to it.

NB! `.backward()` requires no arguments when called on a scalar (single-element) tensor; for any other tensor, a `gradient` argument of matching shape must be passed.

In [None]:
out.backward()

In [None]:
print(x.grad)
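
The printed gradient can be checked by hand. A short derivation, assuming `x` is the 2x2 tensor of ones defined above and writing $o$ for `out`:

$$
o = \frac{1}{4}\sum_i z_i, \qquad z_i = 3\,y_i^2 = 3\,(x_i + 2)^2
$$

$$
\frac{\partial o}{\partial x_i} = \frac{1}{4}\cdot 6\,(x_i + 2) = \frac{3}{2}(x_i + 2), \qquad \left.\frac{\partial o}{\partial x_i}\right|_{x_i = 1} = 4.5
$$

So every entry of `x.grad` should equal 4.5.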

In [None]:
x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)
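
Here `y` is no longer a scalar, so calling `.backward()` on it requires a `gradient` argument; autograd then computes a vector-Jacobian product. A minimal sketch, using a fixed `x` instead of a random one so that the result is reproducible, with an arbitrary illustrative vector `v`:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y = x * 2
while y.detach().norm() < 1000:
    y = y * 2

# y is x times a constant factor, so the Jacobian is that factor times the identity
v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)              # x.grad now holds the product of v with the Jacobian

scale = (y / x).detach()[0]
print(scale, x.grad)
```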

If you need to stop autograd from tracking history on tensors, you can use either the `torch.no_grad()` context manager:

In [None]:
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

or `.detach()` method:

In [None]:
print(x.requires_grad)
y = x.detach()
print(y.requires_grad)
print(x.eq(y).all())

## Logistic Regression Using PyTorch
### based on [this](https://blog.goodaudience.com/awesome-introduction-to-logistic-regression-in-pytorch-d13883ceaa90) blogpost

Basically, most PyTorch modeling can be broken down into these steps:
* loading the dataset
* making the dataset iterable
* instantiating the **model** class
* instantiating the **loss** class
* instantiating the **optimizer** class
* training the model
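
The steps above can be sketched end to end on a toy problem. This is only an illustration under stated assumptions: the data is synthetic (random features with a made-up labeling rule), and a bare `nn.Linear` stands in for the model class built later in this seminar.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Steps 1-2: load a (synthetic) dataset and make it iterable
features = torch.randn(200, 4)
labels = (features[:, 0] > 0).long()        # toy rule: class depends on feature 0
loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

# Steps 3-5: instantiate the model, the loss and the optimizer
model = nn.Linear(4, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Step 6: train
for epoch in range(20):
    for batch_features, batch_labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_features), batch_labels)
        loss.backward()
        optimizer.step()

# Evaluate on the training data, without tracking gradients
with torch.no_grad():
    accuracy = (model(features).argmax(dim=1) == labels).float().mean().item()
print(f"training accuracy: {accuracy:.2f}")
```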

#### Load Dataset

In [None]:
%pip install torchtext

In [None]:
from torchtext import data
from torch.nn import functional as F
import torch

In [None]:
if torch.cuda.is_available():
    DEVICE = torch.device("cuda")
else:
    DEVICE = torch.device("cpu")

In [None]:
SEED = 1234

torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

In [None]:
import nltk

In [None]:
nltk.download("movie_reviews")

In [None]:
import re
import os

In [None]:
POS = "pos"
NEG = "neg"

In [None]:
text_sentiments = (POS, NEG)

train_data_list = []
test_data_list = []

examples = []

for sentiment in text_sentiments:
    for filename in os.listdir(os.path.join(nltk.corpus.movie_reviews.root.path, sentiment)):
        with open(os.path.join(nltk.corpus.movie_reviews.root.path, sentiment, filename), "r", encoding="utf-8") as file:
            examples.append({"text": file.read().strip(),
                             "sentiment": int(sentiment == POS)})

In [None]:
%pip install pandas

In [None]:
import pandas as pd

In [None]:
examples_df = pd.DataFrame(examples)

In [None]:
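examples_df = examples_df.sample(frac=1, random_state=SEED)    # shuffle all reviews
train_df = examples_df.sample(frac=0.7, random_state=SEED)     # 70% for training
test_df = examples_df.drop(index=train_df.index)               # the remaining 30% for testing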
train_texts, train_labels = train_df.text.values, train_df.sentiment.values
test_texts, test_labels = test_df.text.values, test_df.sentiment.values

In [None]:
test_labels

In [None]:
len(test_df.text.values), len(test_df.sentiment.values), len(test_labels)

In [None]:
from typing import List, Dict, Any, Iterable
from collections import Counter, OrderedDict
import math
from itertools import islice
import torch.nn.functional as F

In [None]:
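class TfIdfVectorizer:
    """Builds a vocabulary with IDF weights in fit() and turns texts into a sparse TF-IDF matrix in transform()."""
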
    def __init__(self, lower=True, tokenizer_pattern=r"(?i)\b[a-z]{2,}\b"):
        self.lower = lower
        self.tokenizer_pattern = re.compile(tokenizer_pattern)
        self.vocab_df = OrderedDict()
        
    def __tokenize(self, text: str) -> List[str]:
        return self.tokenizer_pattern.findall(text.lower() if self.lower else text)
    
    def fit(self, texts: Iterable[str]):
        term_id = 0
        for doc_idx, doc in enumerate(texts):
            tokenized = self.__tokenize(doc)
            for term in tokenized:
                if term not in self.vocab_df:
                    self.vocab_df[term] = {}
                    self.vocab_df[term]["doc_ids"] = {doc_idx}
                    self.vocab_df[term]["doc_count"] = 1
                    self.vocab_df[term]["id"] = term_id
                    term_id += 1
                elif doc_idx not in self.vocab_df[term]["doc_ids"]:
                    self.vocab_df[term]["doc_ids"].add(doc_idx)
                    self.vocab_df[term]["doc_count"] += 1
        texts_len = len(texts)
        for term in self.vocab_df:
            self.vocab_df[term]["idf"] = math.log(texts_len / self.vocab_df[term]["doc_count"])
        
        
    def transform(self, texts: Iterable[str]) -> torch.Tensor:
        values = []
        doc_indices = []
        term_indices = []
        for doc_idx, raw_doc in enumerate(texts):
            term_counter = {}
            for token in self.__tokenize(raw_doc):
                if token in self.vocab_df:
                    term = self.vocab_df[token]
                    term_idx = term["id"]
                    term_idf = term["idf"]
                    if term_idx not in term_counter:
                        term_counter[term_idx] = term_idf
                    else:
                        term_counter[term_idx] += term_idf
            term_indices.extend(term_counter.keys())
            values.extend(term_counter.values())
            doc_indices.extend([doc_idx] * len(term_counter))
        # Indices must be integers; TF-IDF weights are fractional, so keep them as floats
        indices = torch.tensor([doc_indices, term_indices], dtype=torch.long, device=DEVICE)
        values_tensor = torch.tensor(values, dtype=torch.float, device=DEVICE)
        tf_idf = torch.sparse_coo_tensor(indices, values_tensor, torch.Size([len(texts), len(self.vocab_df)]))
        return tf_idf
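
To see what weights the vectorizer assigns, here is a hand-sized sketch of the same weighting scheme (raw term count times `log(N / document frequency)`) in plain Python. The three toy documents and the helper name `tf_idf` are made up for illustration:

```python
import math
from collections import Counter

docs = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]

# Document frequency: in how many documents each term occurs
doc_freq = Counter(term for doc in docs for term in set(doc))
idf = {term: math.log(len(docs) / df) for term, df in doc_freq.items()}

def tf_idf(doc):
    # Weight of a term in a document = raw count * idf
    return {term: count * idf[term] for term, count in Counter(doc).items()}

weights = tf_idf(docs[2])
print(weights)
```

A term that occurs in fewer documents ("plot") gets a larger IDF than one that occurs in more of them ("good"), and a term that occurs in every document would get a weight of zero.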

In [None]:
%%time
vectorizer = TfIdfVectorizer()
vectorizer.fit(train_texts)

In [None]:
%%time
train_data = vectorizer.transform(train_texts)
test_data = vectorizer.transform(test_texts)

#### Make the dataset iterable

In [None]:
from torch.utils.data import DataLoader, Dataset

In [None]:
train_data_loader = DataLoader(train_texts, batch_size=64)
test_data_loader = DataLoader(test_texts, batch_size=64)

In [None]:
def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]
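
A quick usage sketch of the `batch` helper on a plain list (the helper is repeated so the snippet runs on its own). Note that pairing it with a `DataLoader` via `zip` keeps texts and labels aligned only as long as the loader does not shuffle, which is the default:

```python
def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

# The last chunk is shorter when the length is not a multiple of n
chunks = list(batch(list(range(10)), n=4))
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```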

#### Build the model

In [None]:
from torch import nn
from torch.nn import functional as F

class LogisticRegressionModel(nn.Module):

    def __init__(self, input_dim, output_dim):
        super(LogisticRegressionModel, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax internally
        out = self.linear(x)
        return out

In [None]:
model = LogisticRegressionModel(len(vectorizer.vocab_df), 2).to(DEVICE)

In [None]:
criterion = nn.CrossEntropyLoss()
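
`nn.CrossEntropyLoss` expects raw, unnormalized scores (logits) and applies log-softmax itself. A small sketch with made-up logits and targets, showing that it matches log-softmax followed by the negative log-likelihood loss:

```python
import torch
from torch import nn
from torch.nn import functional as F

logits = torch.tensor([[2.0, -1.0], [0.5, 1.5]])   # illustrative values
targets = torch.tensor([0, 1])

loss = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(loss, manual))
```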

In [None]:
learning_rate = 0.001

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
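
For plain SGD without momentum or weight decay, `optimizer.step()` subtracts the learning rate times the gradient from each parameter. A sketch comparing it with a manual update on a toy parameter (`w`, `w_manual` and the quadratic objective are illustrative only):

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
w_manual = w.detach().clone().requires_grad_(True)
lr = 0.001

# Update performed by the optimizer
optimizer = torch.optim.SGD([w], lr=lr)
(w ** 2).sum().backward()
optimizer.step()

# The same update written by hand
(w_manual ** 2).sum().backward()
with torch.no_grad():
    w_manual -= lr * w_manual.grad

print(torch.allclose(w, w_manual))
```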

In [None]:
# Type of parameter object
print(model.parameters())

# Length of parameters
print(len(list(model.parameters())))

# Linear layer weights
print(list(model.parameters())[0].size())

# Linear layer bias
print(list(model.parameters())[1].size())

In [None]:
num_epochs = 5

In [None]:
iteration = 0
for epoch in range(num_epochs):
    print(f"Epoch #{epoch}")
    for i, (texts, labels) in enumerate(zip(train_data_loader, batch(train_labels, 64))):
        # Create the labels on the same device as the features
        labels = torch.as_tensor(labels, dtype=torch.long, device=DEVICE)
        # Sparse TF-IDF rows -> dense, L2-normalized float features
        texts = F.normalize(vectorizer.transform(texts).to(torch.float).to_dense())

        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()

        # Forward pass to get output/logits
        outputs = model(texts)

        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer.step()

        iteration += 1

        if iteration % 50 == 0:
            # Calculate accuracy on the test set
            correct = 0
            total = 0
            # Gradients are not needed for evaluation
            with torch.no_grad():
                # Iterate through the test dataset (test_batch avoids shadowing the global test_texts)
                for test_batch, test_labels_batch in zip(test_data_loader, batch(test_labels, 64)):
                    test_batch = F.normalize(vectorizer.transform(test_batch).to(torch.float).to_dense())
                    test_labels_batch = torch.as_tensor(test_labels_batch, dtype=torch.long, device=DEVICE)

                    # Forward pass only to get logits/output
                    outputs = model(test_batch)

                    # Predicted class = index of the maximum output
                    predicted = outputs.argmax(dim=1)

                    # Total number of labels
                    total += test_labels_batch.size(0)

                    # Total correct predictions
                    correct += (predicted == test_labels_batch).sum().item()

            accuracy = 100 * correct / total

            # Print loss and accuracy
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iteration, loss.item(), accuracy))