## Homework 1 - Supervised Learning II - MDS Computational Linguistics

### Assignment Topics
- Operations on tensors
- Linearities, non-linearities and loss functions 
- Very-short answer questions

### Software Requirements
- Python (>=3.6)
- PyTorch (>=1.2.0) 
- Jupyter (latest)

### Submission Info.
- Due Date: January 18, 2020, 18:00:00 (Vancouver time)

## Getting Started

In [1]:
# all necessary imports
import numpy as np
import random
import math
import matplotlib.pyplot as plt
import torch
import torch.nn as nn

# set the seed (allows reproducibility of the results)
manual_seed = 123
torch.manual_seed(manual_seed) # allows us to reproduce results when using random generation on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # uses the GPU if one is available on this system, otherwise the CPU
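
As a quick illustration of what the seed provides, the sketch below (not part of the assignment) reseeds the CPU generator and checks that `torch.rand` returns the same values both times. The variable names `first` and `second` are illustrative.

```python
import torch

# Reseeding the CPU generator with the same value replays the same random stream.
torch.manual_seed(123)
first = torch.rand(2, 2)

torch.manual_seed(123)
second = torch.rand(2, 2)

print(torch.equal(first, second))  # True
```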

## Tidy Submission

rubric={mechanics:1}

To get the marks for tidy submission:
- Submit the assignment by filling in this jupyter notebook with your answers embedded
- Be sure to follow the [general lab instructions](https://ubc-mds.github.io/resources_pages/general_lab_instructions)

## Exercise 1: Operations on Tensors

### 1.1 Write code that creates a tensor **X** of size $5 \times 5$ containing longs with values initialized to ones.
rubric={accuracy:1}

In [2]:
# your code goes here



### 1.2 Write code that takes the tensor **X** from question 1.1 and sets the values along its diagonal to two.
rubric={accuracy:1}

In [3]:
# your code goes here



### 1.3 Write code that takes the tensor **X** from question 1.2, squares all of its values, sums the squares, and prints the square root of this sum (the L2 norm).
rubric={accuracy:1}
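
For reference, the quantity asked for here is the L2 norm taken over all entries of **X** (for a matrix this is also known as the Frobenius norm). Writing $X_{ij}$ for the entry in row $i$ and column $j$, a notation assumed here for illustration:

$$\|X\|_2 = \sqrt{\sum_{i=1}^{5} \sum_{j=1}^{5} X_{ij}^2}$$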

In [4]:
# your code goes here



## 1.4 Given the following two tensors, **X** $\in \mathbb{R}^{4\times4}$ and **Y** $\in \mathbb{R}^{4\times4}$

In [5]:
X = torch.rand(4,4)
print(X)
Y = torch.rand(4,4)
print(Y)

tensor([[0.2961, 0.5166, 0.2517, 0.6886],
        [0.0740, 0.8665, 0.1366, 0.1025],
        [0.1841, 0.7264, 0.3153, 0.6871],
        [0.0756, 0.1966, 0.3164, 0.4017]])
tensor([[0.1186, 0.8274, 0.3821, 0.6605],
        [0.8536, 0.5932, 0.6367, 0.9826],
        [0.2745, 0.6584, 0.2775, 0.8573],
        [0.8993, 0.0390, 0.9268, 0.7388]])


### 1.4.1 Write code that multiplies **X** and **Y** using standard matrix multiplication, without changing their values, and prints the result.
rubric={accuracy:1}

In [6]:
# your code goes here



### 1.4.2 Write code that adds **X** and **Y** using standard matrix addition, without changing their values, and prints the result.
rubric={accuracy:1}

In [7]:
# your code goes here



### 1.4.3 Write code that subtracts matrix **Y** from **X** without changing their values and prints the result.
rubric={accuracy:1}

In [8]:
# your code goes here



### 1.4.4 Write code that multiplies **X** and **Y** using standard matrix multiplication, places the result directly in **X** (modifying **X**), and prints the result.
rubric={accuracy:1}
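
The difference between leaving a tensor untouched and modifying it can be seen on a small unrelated example. The sketch below uses addition rather than matrix multiplication, and the names `a`, `b`, and `c` are illustrative. In PyTorch, methods whose names end in an underscore (such as `add_`) modify the tensor in place.

```python
import torch

a = torch.ones(2, 2)
b = torch.full((2, 2), 3.0)

# Out-of-place: returns a new tensor, `a` keeps its values.
c = a.add(b)
unchanged = torch.equal(a, torch.ones(2, 2))

# In-place: the trailing underscore means `a` itself is overwritten.
a.add_(b)

print(unchanged)  # True
print(a)          # every entry is now 4.0
```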

In [9]:
# your code goes here



## 1.5 Given the following tensor, **X** $\in \mathbb{R}^{5\times3}$

In [10]:
X = torch.rand(5,3)

### 1.5.1 Write code to print all the elements in the last row of **X**.
rubric={accuracy:1}

In [11]:
# your code goes here



### 1.5.2 Write code to print all the elements in the middle column of **X**.
rubric={accuracy:1}

In [12]:
# your code goes here



### 1.5.3 Write code to create a 3D tensor of size $1 \times 5 \times 3$ using the $5 \times 3$ values from **X** (unsqueeze operation)
rubric={accuracy:1}

In [13]:
# your code goes here



### 1.5.4 Write code that converts the 3D tensor created in question 1.5.3 back into a 2D tensor of size $5 \times 3$ (squeeze operation).
rubric={accuracy:1}

In [14]:
# your code goes here



## Exercise 2: Linearities, Non-linearities, Loss functions

Sample question:

In [15]:
linear_layer = torch.nn.Linear(5, 1)
linear_layer.weight.data[0] = torch.tensor([1, 2, 3, 4, 5]) # sets the weight value
linear_layer.bias.data = torch.tensor([3]).float() # sets the bias value
model_out = linear_layer(torch.tensor([0, 10, 20, 15, 5]).float())
print(model_out)

tensor([168.], grad_fn=<AddBackward0>)


Compute the values in **model\_out** by hand. Show your work.

Sample answer: (Write it in markdown, not as code. If you don't like markdown, you can write the steps on a piece of paper, take a photo, and attach the image in the answer block.)

your answer goes here:

$model\_out = Ax + b = [1, 2, 3, 4, 5] \cdot [0, 10, 20, 15, 5] + 3 = (1 \cdot 0 + 2 \cdot 10 + 3 \cdot 20 + 4 \cdot 15 + 5 \cdot 5) + 3 = 165 + 3 = 168$
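
The hand computation can be cross-checked with plain tensor arithmetic. This is only a sanity check using the same weights, bias, and input as the sample question; the names `w`, `b`, `x`, and `manual_out` are illustrative.

```python
import torch

w = torch.tensor([1., 2., 3., 4., 5.])     # weight row from the sample question
b = torch.tensor(3.)                        # bias from the sample question
x = torch.tensor([0., 10., 20., 15., 5.])  # input from the sample question

# A linear layer with one output computes the dot product of w and x, plus the bias.
manual_out = torch.dot(w, x) + b
print(manual_out)  # tensor(168.)
```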


### 2.1

In [16]:
linear_layer = torch.nn.Linear(5, 1, bias=False)
linear_layer.weight.data[0] = torch.tensor([1, 2, 3, 4, 5]) # sets the weight value
model_out = linear_layer(torch.tensor([0, 10, 20, 15, 5]).float())
print(model_out)

tensor([165.], grad_fn=<SqueezeBackward3>)


### Compute the values in **model\_out** by hand. Show your work.
rubric={accuracy:2}

your answer goes here (double-click this block to edit):

### 2.2

In [17]:
linear_layer = torch.nn.Linear(5, 2)
linear_layer.weight.data[0] = torch.tensor([1, 2, 3, 4, 5]) # sets the weights of the first output unit
linear_layer.weight.data[1] = torch.tensor([1, 0, 0, 0, 1]) # sets the weights of the second output unit
linear_layer.bias.data = torch.tensor([1]).float() # sets the bias value (the single value is broadcast to both output units)
model_out = linear_layer(torch.tensor([0, 10, 20, 15, 1]).float())
sigmoid_out = torch.nn.Sigmoid()(model_out)
print(sigmoid_out)

tensor([1.0000, 0.8808], grad_fn=<SigmoidBackward>)


### Compute the values in **sigmoid\_out** by hand. Show your work.
rubric={accuracy:2}
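
For reference, the sigmoid function is applied to each element $z$ of **model\_out** independently:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$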

your answer goes here:

### 2.3

In [18]:
linear_layer = torch.nn.Linear(5, 2)
linear_layer.weight.data[0] = torch.tensor([1, 2, 3, 4, 5]) # sets the weights of the first output unit
linear_layer.weight.data[1] = torch.tensor([1, 3, 0, 0, 10]) # sets the weights of the second output unit
linear_layer.bias.data = torch.tensor([3]).float() # sets the bias value (the single value is broadcast to both output units)
model_out = linear_layer(torch.tensor([[100, 10, 20, 15, 1], [10, 5, 2, 1, 0]]).float())
softmax_out = torch.nn.Softmax(dim=1)(model_out)
print(softmax_out)

tensor([[1.0000, 0.0000],
        [0.9933, 0.0067]], grad_fn=<SoftmaxBackward>)


### Compute the values in **softmax\_out** by hand. Show your work.
rubric={accuracy:2}
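
As a reminder of how softmax works, the sketch below applies the definition to a small made-up tensor (the values in `scores` are not from this exercise) and compares the result with `torch.nn.Softmax`. With `dim=1`, each row is normalized separately so that it sums to one.

```python
import torch

scores = torch.tensor([[1., 2.], [0., 0.]])  # made-up values for illustration

# Definition: exponentiate each entry, then divide by the sum over its row.
exp_scores = torch.exp(scores)
manual = exp_scores / exp_scores.sum(dim=1, keepdim=True)

builtin = torch.nn.Softmax(dim=1)(scores)
print(torch.allclose(manual, builtin))  # True
print(manual[1])                         # tensor([0.5000, 0.5000])
```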

your answer goes here:

### 2.4

In [19]:
linear_layer = torch.nn.Linear(5, 2)
linear_layer.weight.data[0] = torch.tensor([1, 2, 3, 4, 5]) # sets the weights of the first output unit
linear_layer.weight.data[1] = torch.tensor([1, 3, 0, 0, 10]) # sets the weights of the second output unit
linear_layer.bias.data = torch.tensor([3]).float() # sets the bias value (the single value is broadcast to both output units)
model_out = linear_layer(torch.tensor([[100, 10, 20, 15, 1], [10, 5, 2, 1, 0]]).float())
criterion = torch.nn.MSELoss()
loss = criterion(model_out, torch.tensor([[245, 140], [30, 30]]).float())
print(loss)

tensor(7.7500, grad_fn=<MseLossBackward>)


### Compute the value of **loss** by hand. Show your work.
rubric={accuracy:2}
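
With its default settings, `torch.nn.MSELoss` averages the squared differences over all elements. The sketch below checks this on made-up tensors; `pred` and `target` are illustrative names and values, not the ones used in this exercise.

```python
import torch

pred = torch.tensor([[1., 2.], [3., 4.]])    # made-up predictions
target = torch.tensor([[0., 2.], [3., 6.]])  # made-up targets

# Mean of the squared element-wise differences: (1 + 0 + 0 + 4) / 4
manual = ((pred - target) ** 2).mean()

builtin = torch.nn.MSELoss()(pred, target)
print(manual)   # tensor(1.2500)
print(builtin)  # tensor(1.2500)
```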

your answer goes here:

## Exercise 3: Very-Short answer questions

(Double-click each question block and place your answer at the end of the question) 

### 3.1 What is NumPy? What are the differences between a PyTorch tensor and a NumPy array?
rubric={reasoning:2}

### 3.2 What is the key difference between ``torch.LongTensor`` and ``torch.cuda.LongTensor``?
rubric={reasoning:2}

### 3.3 What is the default data type of a PyTorch tensor?
rubric={accuracy:1}

### 3.4 What is ``autograd`` in PyTorch? How is it related to the computational graph?
rubric={reasoning:2}

### 3.5 What is SGD? What role does SGD play in building machine learning models?
rubric={reasoning:2}