# Assignment 2 - CNNs and PyTorch

### Name: Anirudh Swaminathan
### PID: A53316083
### Email ID: aswamina@ucsd.edu

#### Notebook created by Anirudh Swaminathan from ECE department majoring in Intelligent Systems, Robotics and Control for the course ECE285 Machine Learning for Image Processing for Fall 2019

## Getting Started

In [None]:
import numpy as np
import torch

## Tensors

#### Question 1

In [None]:
# Construct 5*3 tensor and print it
x = torch.Tensor(5, 3)
print(x)

# printing its type
print(type(x))

# printing its data type
print(x.dtype)

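$x$ was not explicitly initialized: $\textbf{torch.Tensor(5, 3)}$ allocates the memory without setting it, so the printed values are whatever was already stored there and only look random. $x$ is of type $\textbf{torch.Tensor}$ and its data is of type $\text{torch.float32}$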

#### Question 2

In [None]:
y = torch.rand(5, 3)
print(y)

# Finding the type of y
print(type(y))
print(y.dtype)

# Using randn() instead of rand()
y1 = torch.randn(5, 3)
print(y1)
print(type(y1))
print(y1.dtype)

$y$ is a $(5, 3)$ tensor with random values drawn from a uniform distribution on $[0, 1)$<br>
$y$ is of type $\textbf{torch.Tensor}$ and its data is of type torch.float32<br>
$y_1$ is a $(5, 3)$ tensor with random values distributed as a Gaussian with mean $0$ and variance $1$. So, $y_1$ is a tensor filled with random values from a standard normal distribution<br>
So, if we use $torch.randn()$ function instead of $torch.rand()$, we may get negative values for $torch.randn()$ but not for $torch.rand()$ function.
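
The difference in range can be checked empirically. The following is a minimal sketch, assuming a fixed seed and a sample size of $10000$ (both chosen here only for illustration):

```python
import torch

torch.manual_seed(0)    # arbitrary seed, only for repeatability

u = torch.rand(10000)   # uniform samples on [0, 1)
g = torch.randn(10000)  # standard normal samples

# rand() never leaves [0, 1); randn() is unbounded, with about half the samples negative
print(u.min().item() >= 0.0, u.max().item() < 1.0)
print((g < 0).float().mean().item())
print(g.mean().item(), g.std().item())
```

The sample mean and standard deviation of $g$ should come out close to $0$ and $1$, but not exactly equal to them.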

#### Question 3

In [None]:
x = x.double()
y = y.double()
print(x)
print(y)

When we print $x$ and $y$, the data type displayed for both is $torch.float64$

#### Question 4

In [None]:
# Initialize tensors with values directly
x = torch.Tensor([[-0.1859, 1.3970, 0.5236],
[ 2.3854, 0.0707, 2.1970],
[-0.3587, 1.2359, 1.8951],
[-0.1189, -0.1376, 0.4647],
[-1.8968, 2.0164, 0.1092]])

y = torch.Tensor([[ 0.4838, 0.5822, 0.2755],
[ 1.0982, 0.4932, -0.6680],
[ 0.7915, 0.6580, -0.5819],
[ 0.3825, -1.1822, 1.5217],
[ 0.6042, -0.2280, 1.3210]])

In [None]:
# Display the shapes of the two tensors x and y
print(x.shape, y.shape)

Shape of $x$ is $(5, 3)$. <br>
Shape of $y$ is $(5, 3)$.

#### Question 5

In [None]:
# Stack 2 tensors
z = torch.stack((x, y))

In [None]:
print(z, z.dtype, z.shape)

In [None]:
# Now, compare it with torch.cat()
z1 = torch.cat((x, y), 0)
z2 = torch.cat((x, y), 1)
print(z1, z1.shape)
print(z2, z2.shape)

The shape of the tensor $z$ is $(2, 5, 3)$. <br>
$torch.stack()$ stacks its arguments on a new dimension, i.e., on top of one another in this case. <br>

We also compared it to $torch.cat()$. <br>
In $torch.cat((x, y), 0)$, the tensor $y$ is concatenated to tensor $x$ along axis $0$, i.e., the rows of $y$ are appended below the rows of $x$. Combining the two tensors, each of shape $(5, 3)$, row-wise gives a tensor of shape $(10, 3)$. The output of this $torch.cat()$ still has the same number of dimensions $(2D)$. <br>

Similarly, in $torch.cat((x, y), 1)$, the tensor $y$ is concatenated to tensor $x$ along axis $1$, i.e., the columns of $y$ are appended to the right of the columns of $x$. Combining the two tensors, each of shape $(5, 3)$, column-wise gives a tensor of shape $(5, 6)$. The output of this $torch.cat()$ still has the same number of dimensions $(2D)$.
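
The three resulting shapes can be compared side by side. This is a small sketch using placeholder tensors $a$ and $b$ (names chosen here for illustration, not the $x$ and $y$ from above):

```python
import torch

a = torch.zeros(5, 3)
b = torch.ones(5, 3)

stacked = torch.stack((a, b))   # new leading dimension
rows = torch.cat((a, b), 0)     # more rows, same number of columns
cols = torch.cat((a, b), 1)     # more columns, same number of rows

print(stacked.shape, rows.shape, cols.shape)

# stack behaves like adding a size-1 dimension to each input and then concatenating
same = torch.equal(stacked, torch.cat((a.unsqueeze(0), b.unsqueeze(0)), 0))
print(same)
```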

#### Question 6

In [None]:
# Accessing the element of the 5th row and 3rd column in 2d tensor y
ele = y[4, 2]
print("The element of the 5th row and 3rd column in 2d tensor y:", ele.item())

# Accessing the same element in the 3D tensor z
ele_3d = z[1, 4, 2]
print("Accessing the same element in the 3d tensor z, we have", ele_3d.item())

Hence, we were able to access the element of the $5^{th}$ row and $3^{rd}$ column in the $2D$ tensor $y$. <br>
We were also able to access the same element from the $3D$ tensor $z$.

#### Question 7

In [None]:
# Print all elements corresponding to the 5th row and 3rd column in z
eles = z[:, 4, 2]
print("Printing all elements corresponding to the 5th row and 3rd column in z:", eles)
print(eles.shape)

There are $2$ elements in $z$ that correspond to the $5^{th}$ row and $3^{rd}$ column of the tensor $z$. <br>
This is because $z$ is the stacked tensor of $x$ and $y$. Hence, the $1^{st}$ returned element corresponds to the element at the $5^{th}$ row and $3^{rd}$ column of the tensor $x$, and the $2^{nd}$ returned element corresponds to the element at the $5^{th}$ row and $3^{rd}$ column of the tensor $y$.
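
To make the origin of each returned element easy to see, here is a sketch with hand-picked values (the tensors $a$, $b$ and $s$ are illustrative stand-ins, not the ones used above):

```python
import torch

a = torch.arange(15.0).view(5, 3)  # values 0..14, so a[4, 2] is 14
b = a + 100                        # b[4, 2] is 114
s = torch.stack((a, b))            # shape (2, 5, 3)

picked = s[:, 4, 2]                # one value from a, one from b
print(picked)
print(picked.shape)
```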

#### Question 8

In [None]:
print(x + y)
print(torch.add(x, y))
print(x.add(y))
torch.add(x, y, out=x)
print(x)

All $4$ methods of addition print the same output. <br>
The first $3$ methods are equivalent: they take in the 2 tensors $x$ and $y$, return a new tensor, and do NOT modify $x$ or $y$. <br>
The last statement computes the same sum but behaves differently: because $out=x$ was specified, $torch$ stored the result of the addition in the memory of $x$. <br>
Hence, after the last statement, $x$ is overwritten with $x + y$, while $y$ is unchanged.
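
One way to check which forms allocate a new tensor and which write into existing memory is to compare $data\_ptr()$, the address of the underlying storage, before and after. This is a minimal sketch with small placeholder tensors:

```python
import torch

x = torch.ones(2, 2)
y = torch.full((2, 2), 2.0)
ptr_before = x.data_ptr()   # address of x's underlying storage

r1 = x + y                  # new tensor, x unchanged
r2 = torch.add(x, y)        # new tensor, x unchanged
r3 = x.add(y)               # new tensor, x unchanged
x_unchanged = torch.equal(x, torch.ones(2, 2))

torch.add(x, y, out=x)      # result written into x's existing storage
print(x_unchanged, torch.equal(x, r1), x.data_ptr() == ptr_before)
```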

#### Question 9

In [None]:
# create a tensor whose values are sampled from a Normal Distribution with mean 0 and variance 1
x = torch.randn(4, 4)

# store a reshaped version of x in y such that it is a 1D tensor of size 16
y = x.view(16)

# store the reshaped version of x in z such that it is a 2D tensor of size (2, 8)
z = x.view(-1, 8)
print(x.size(), y.size(), z.size())
print(x)
print(y)
print(z)

The $1^{st}$ statement creates tensor $x$ of shape $(4, 4)$ using the $randn()$ function whose values are sampled from a Normal Distribution with $\mu = 0$ and $\sigma^2 = 1$. <br>
The $2^{nd}$ statement stores a reshaped version of $x$ in $y$ such that it is a $1D$ tensor of size $16$. <br>
The $3^{rd}$ statement stores a reshaped version of $x$ in $z$ such that it is a $2D$ tensor of size $(2, 8)$. <br>
The $-1$ in the argument for the $view()$ function states that the $1^{st}$ dimension of $z$ should be inferred by $torch$ directly given that the $2^{nd}$ dimension of $z$ is $8$. <br>
This conversion is just $4 * 4 = 16; \frac{16}{8} = 2$. Thus, the $1^{st}$ dimension of $z$ should be $2$.
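
Note that $view()$ does not copy data: the reshaped tensors share memory with the original. A small sketch of this, using a zero tensor so that the change is easy to spot:

```python
import torch

x = torch.zeros(4, 4)
y = x.view(16)
z = x.view(-1, 8)     # -1 is inferred as 16 / 8 = 2

y[0] = 7.0            # write through the 1D view
print(z.shape)
print(x[0, 0].item(), z[0, 0].item())
```

The write through $y$ is visible in both $x$ and $z$.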

#### Question 10

In [None]:
# Generate random x of dimension 10*10
x = torch.randn(10, 10)

# Generate random y of dimension 2*100
y = torch.randn(2, 100)
print(x.size(), y.size())

# reshape x to become a row vector and make it compatible for matrix multiplication
x = x.view(1, 100)

# reshape y to become a matrix compatible for matrix multiplication with x
y = y.view(100, 2)
print(x.size(), y.size())

# perform row vector by matrix multiplication
z = torch.mm(x, y)
print(z.size())
print(z)

We created a tensor $x$ of size $(10, 10)$ and tensor $y$ of size $(2, 100)$. <br>
We then reshaped the tensor $x$ to a row vector of size $(1, 100)$. <br>
Tensor $y$ was also reshaped to size $(100, 2)$ to make it conformable for matrix multiplication with $x$. <br>
Finally, the result of the matrix multiplication carried out by $torch.mm(x, y)$ is stored in the tensor $z$. <br>
Tensor $z$ is of size $(1, 2)$, since $(1, 100) \times (100, 2) \rightarrow (1, 2)$. <br>
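
It is worth noting that $view(100, 2)$ only regroups the elements in their existing order; it is not the same as transposing. The sketch below uses a tensor with known values (an illustrative choice, not the random $y$ above) to show the difference:

```python
import torch

y = torch.arange(200.0).view(2, 100)

reshaped = y.view(100, 2)   # same element order, regrouped into rows of 2
transposed = y.t()          # rows and columns swapped

print(reshaped.shape, transposed.shape)
print(reshaped[0], transposed[0])
print(torch.equal(reshaped, transposed))
```

Both have shape $(100, 2)$ and either is valid for $torch.mm()$, but they hold different matrices. For random data as in this question the distinction does not affect the shape of the result.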

## Numpy and PyTorch

#### Question 11

In [None]:
a = torch.ones(5)
print(a)
b = a.numpy()
print(b)

print(type(a), type(b))
print(a.dtype, b.dtype)
print(a.size(), b.shape)

Variable $a$ is a $1D$ tensor of size $(5)$ carrying data of type $torch.float32$. Variable $b$ is a $1D$ numpy array of shape $(5,)$ carrying data of type $float32$. <br>
$b$ is the $numpy$ version of tensor $a$ and both carry the same data.

#### Question 12

In [None]:
a[0] += 1
print(a)
print(b)

Tensor $a$ and numpy array $b$ both share the same underlying memory location. <br>
Modifying $a$ changes $b$ and modifying $b$ changes $a$ if the tensor $a$ is on the CPU, which is the case here.

#### Question 13

In [None]:
a.add_(1)
print(a)
print(b)

The $add\_(1)$ modifies $a$ in-place, thus modifying numpy array $b$ also.

In [None]:
a[:] += 1
print(a)
print(b)

This statement also modifies $a$ in-place, thus modifying numpy array $b$ as well.

In [None]:
a = a.add(1)
print(a)
print(b)

The statement $a.add(1)$ adds $1$ to $a$ and returns a new tensor. <br>
Since we store the result in $a$, the name $a$ now points to the new tensor, while the original memory location, which $b$ still refers to, is not modified. Thus, $b$ is not modified.
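
The contrast between the in-place and out-of-place versions can be condensed into a short sketch (using a fresh tensor of size $3$ for brevity):

```python
import torch

a = torch.ones(3)
b = a.numpy()        # shares memory with a (CPU tensor)

a.add_(1)            # in-place: b sees the change
after_inplace = b.tolist()

a = a.add(1)         # out-of-place: the name a now refers to a new tensor
print(after_inplace)
print(a.tolist(), b.tolist())
```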

#### Question 14

In [None]:
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

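Numpy array $a$ and tensor $b$ share the same underlying memory location. <br>
Modifying $a$ changes $b$ and vice-versa if the tensor is on the CPU. Since $np.ones(5)$ creates an array of type $float64$, tensor $b$ is of type $torch.float64$.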
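
For comparison, $torch.tensor()$ copies the data instead of sharing it. A minimal sketch of the two behaviours (the names $shared$ and $copied$ are chosen here for illustration):

```python
import numpy as np
import torch

a = np.ones(3)                 # NumPy defaults to float64
shared = torch.from_numpy(a)   # shares a's buffer, keeps float64
copied = torch.tensor(a)       # independent copy of the data

np.add(a, 1, out=a)            # in-place update of the NumPy array
print(shared.dtype)
print(shared.tolist(), copied.tolist())
```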

#### Question 15

In [None]:
# GPU experiments
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

# Create a tensor on CPU and then move it to GPU
x = torch.randn(5, 3).to(device)

# Create a tensor directly on the GPU
y = torch.randn(5, 3, device=device)
z = x + y
print(x.size(), x.dtype, x.device)
print(y.size(), y.dtype, y.device)
print(z.size(), z.dtype, z.device)
print(x)
print(y)
print(z)

Tensor $x$ is first created on the CPU and then transferred to the GPU using the $.to()$ command. <br>
Tensor $y$ is created on the GPU directly with the $device$ argument in the $randn()$ function. <br>
I feel that the allocation instruction for $y$ is more efficient than the one for $x$, as creating a tensor on the CPU and then transferring it to the GPU takes $2$ steps, with the additional overhead of copying the data between devices. <br>
Directly allocating the tensor on the GPU avoids these extra steps, and hence is comparatively more efficient.

#### Question 16

In [None]:
# This line runs fine
print(z.cpu().numpy())

# The following line produces an error
print(z.numpy())

In the $1^{st}$ line, the tensor $z$ is copied to CPU first using the $.cpu()$ function. Then, it is converted to a numpy array on the CPU. <br>
The $2^{nd}$ line throws a $TypeError$ when $z$ is on the GPU, as $torch$ can't convert a CUDA tensor to numpy directly. It means that the conversion has to be carried out on the CPU, and hence $z.cpu()$ has to be used first before the conversion. If no GPU is available and $z$ is already on the CPU, the $2^{nd}$ line runs without error.
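
A conversion that works regardless of the device can be wrapped in a small function. The helper $to\_numpy$ below is hypothetical (it is not part of $torch$) and is only a sketch; it also calls $.detach()$, since a tensor that requires gradients cannot be converted directly either:

```python
import torch

def to_numpy(t):
    """Hypothetical helper: return a NumPy array for a tensor on any device."""
    # detach() drops autograd tracking; cpu() returns the tensor itself if already on CPU
    return t.detach().cpu().numpy()

device = 'cuda' if torch.cuda.is_available() else 'cpu'
t = torch.ones(2, 3, device=device, requires_grad=True)
arr = to_numpy(t)
print(type(arr), arr.shape, arr.dtype)
```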

## Autograd: automatic differentiation

#### Question 17

In [None]:
x = torch.ones(2, 2, requires_grad=True)
print(x)
y = x + 2
print(y)

In [None]:
print(y.requires_grad)
print(x.grad)
print(y.grad)
print(x.grad_fn)
print(y.grad_fn)

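Since the $requires\_grad$ attribute of $x$ is $True$ and we are performing operations on $x$ to obtain $y$, $y$ will have its attribute $requires\_grad$ set to $True$ automatically. <br>
The $grad$ attributes of both the tensors $x$ and $y$ are $None$. This is because the gradient has not been computed yet for these tensors using the $.backward()$ function. <br>
The $grad\_fn$ attribute of $x$ is $None$ because $x$ was created directly by us and not as the result of an operation, whereas the $grad\_fn$ of $y$ is an $AddBackward0$ object that records the addition used to produce $y$.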
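
The distinction between a tensor created directly (a leaf) and one produced by an operation can be inspected with the $is\_leaf$ attribute. The sketch below also calls $.backward()$ on a simple sum, chosen here only for illustration, to show $x.grad$ being filled in:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2

print(x.is_leaf, y.is_leaf)
print(x.grad_fn is None, y.grad_fn is not None)

y.sum().backward()   # d(sum(x + 2)) / dx is 1 for every element
print(x.grad)
```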

#### Question 18

In [None]:
z = y * y * 3
f = z.mean()
print(z, f)