*Notebook conventions:*

* <font color="red">assignment problem</font>. Red marks a task that you are expected to complete.
* <font color="green">debugging</font>. Green shows the expected outcome. Its primary goal is to help you check that your answer is correct.
* <font color="blue">comments, hints</font>.

Assignment 1 (pytorch basics)
======================



![pytorch](https://upload.wikimedia.org/wikipedia/commons/9/96/Pytorch_logo.png)

#### Useful Links:

* pytorch official documentation
http://pytorch.org/docs/master/index.html

* pytorch discussion
https://discuss.pytorch.org/

* pytorch official tutorials
https://pytorch.org/tutorials/

* pytorch tutorials (a bit more advanced than the official ones)
https://github.com/yunjey/pytorch-tutorial

* TODO: more?


### Preliminaries

In [1]:
import numpy as np
import torch
import torchvision

In [2]:
# check versions
from platform import python_version
print("python version:".ljust(25) + python_version())
print("torch version:".ljust(25) + torch.__version__)
print("torchvision version:".ljust(25) + torchvision.__version__)

python version:          3.6.7
torch version:           1.1.0
torchvision version:     0.3.0


In [3]:
# TODO: not sure I need it here
from google.colab import files

ModuleNotFoundError: No module named 'google.colab'

In [3]:
# random seed settings
torch.manual_seed(42)
np.random.seed(42)

###  Tensors

The tensor is one of the main data types in pytorch.
We will start with the concept of a tensor and how it is used in pytorch.

![tensor](https://drive.google.com/uc?id=1LIthDTyj0tuz2VbewX7HGInseyFY-tpP)

#### Tensor Initialization

In [4]:
# 1d tensor of size 64 of type float (default)
# (torch.empty does not initialize the memory, so the values are arbitrary)
v = torch.empty(64)

print(" * the first 4 elements of 'v' are:")
print(v[:4]) # print the first four elements of the tensor

# initialize with array [0,1,...,63]
v = torch.arange(0,64)

print(" * the first 4 elements of 'v' are:")
print(v[:4]) # print the first four elements of the tensor

print(" * the size of the 'v' is ")
print(v.size())

 * the first 4 elements of 'v' are:
tensor([4.6463e-23, 4.5898e-41, 3.2176e-37, 0.0000e+00])
 * the first 4 elements of 'v' are:
tensor([0, 1, 2, 3])
 * the size of the 'v' is 
torch.Size([64])
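A few other constructors are worth knowing next to `torch.empty` and `torch.arange`. The sketch below is only an illustration; the variable names are made up for this example, and it assumes the default dtype has not been changed from float32.

```python
import torch

# torch.empty only allocates memory; its contents are arbitrary until overwritten.
e = torch.empty(3)

# Constructors that set every element explicitly.
z = torch.zeros(3)                    # all zeros, default dtype float32
r = torch.rand(2, 2)                  # uniform samples from [0, 1)
t = torch.tensor([[1, 2], [3, 4]])    # built from a nested Python list, dtype int64
s = torch.arange(0, 10, step=2)       # 0, 2, 4, 6, 8

print(z.dtype, t.dtype)
print(s)
```

Integer inputs give an integer tensor and float inputs give a float tensor, which matters later when tensors of different types are combined.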


In [5]:
# 2d tensor of size 8x8 of type long
x = torch.zeros(8, 8, dtype=torch.long)

print(" * the top-left 4x4 block of 'x' is:")
print(x[:4,:4]) # print the first four rows and columns of the tensor

# initialize with all ones, this time of type float
x = torch.ones(8, 8, dtype=torch.float)

print(" * the top-left 4x4 block of 'x' is:")
print(x[:4, :4]) # print the first four rows and columns of the tensor

print(" * the size of the 'x' is ")
print(x.size())

print(" * the size of 'x' can also be obtained with the 'shape' attribute, familiar from numpy")
print(x.shape)

 * the top-left 4x4 block of 'x' is:
tensor([[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]])
 * the top-left 4x4 block of 'x' is:
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
 * the size of the 'x' is 
torch.Size([8, 8])
 * the size of 'x' can also be obtained with the 'shape' attribute, familiar from numpy
torch.Size([8, 8])


-----

 <font color="red">**[PROBLEM I]:** </font> 

 <font color="red"> Initialize X </font>     
 <font color="red"> 3d Tensor of size (4,4,4) </font>   
 <font color="red"> of type int32 with all elements equal to 10 </font>   

-----

In [6]:
X = torch.full([4,4,4], 10, dtype=torch.int32)

In [7]:
X.shape

torch.Size([4, 4, 4])

#### Reshaping, broadcasting

Tensor reshaping is done with the 'view' command:

In [8]:
a = torch.tensor([[1,2], [3,4]])
a_reshaped = a.view(4) # reshape into one-dimensional tensor of size 4

print(a)
print(a_reshaped)

tensor([[1, 2],
        [3, 4]])
tensor([1, 2, 3, 4])


-----

 <font color="red"> **[PROBLEM II]:** </font> 

 <font color="red"> Use command 'view' to reshape v and X into 2d tensors --> v' and X'. </font>  
  <font color="red">Also convert all tensors to type double. </font>  
 <font color="red"> Perform addition of these reshaped tensors, namely calculate v' + X' + x. </font>  
 <font color="red"> Finally display the result. </font>

-----

In [9]:
v

tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
        36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
        54, 55, 56, 57, 58, 59, 60, 61, 62, 63])

In [10]:
X

tensor([[[10, 10, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10]],

        [[10, 10, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10]],

        [[10, 10, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10]],

        [[10, 10, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10]]], dtype=torch.int32)

In [11]:
v_ = v.view((8,8)).type(torch.double)
X_ = X.view((8,8)).type(torch.double)
torch.add(torch.add(v_, X_), x.type(torch.double))

tensor([[11., 12., 13., 14., 15., 16., 17., 18.],
        [19., 20., 21., 22., 23., 24., 25., 26.],
        [27., 28., 29., 30., 31., 32., 33., 34.],
        [35., 36., 37., 38., 39., 40., 41., 42.],
        [43., 44., 45., 46., 47., 48., 49., 50.],
        [51., 52., 53., 54., 55., 56., 57., 58.],
        [59., 60., 61., 62., 63., 64., 65., 66.],
        [67., 68., 69., 70., 71., 72., 73., 74.]], dtype=torch.float64)
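The addition above works element by element because all three tensors have the same shape (8, 8). When shapes differ, pytorch applies numpy-style broadcasting. The following is a minimal sketch with made-up tensors (`m`, `row`, `col` are not part of the assignment); it also shows that `view` accepts `-1` for one dimension that should be inferred.

```python
import torch

m = torch.arange(12)             # tensor([0, 1, ..., 11])
m2 = m.view(3, -1)               # -1 lets pytorch infer the missing size: shape (3, 4)

row = torch.tensor([10, 20, 30, 40])        # shape (4,)
col = torch.tensor([[100], [200], [300]])   # shape (3, 1)

# Broadcasting: 'row' is repeated over the 3 rows, 'col' over the 4 columns.
out = m2 + row + col
print(out.shape)   # torch.Size([3, 4])
print(out[0])      # tensor([110, 121, 132, 143])
```

No data is copied to make the shapes match; the smaller tensors are only treated as if they were expanded.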

### Operations on Tensors

relevant tutorial
https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#operations

Most operations can be written in several equivalent ways. Let us look at addition as an example.

In [12]:
x = x[:4,:4]
y = v.type(torch.float).view(8,8)[:4,:4]

In [13]:
print(torch.add(x, y))
print(x + y)
result = torch.empty_like(x)
torch.add(x, y, out=result)
print(result)

tensor([[ 1.,  2.,  3.,  4.],
        [ 9., 10., 11., 12.],
        [17., 18., 19., 20.],
        [25., 26., 27., 28.]])
tensor([[ 1.,  2.,  3.,  4.],
        [ 9., 10., 11., 12.],
        [17., 18., 19., 20.],
        [25., 26., 27., 28.]])
tensor([[ 1.,  2.,  3.,  4.],
        [ 9., 10., 11., 12.],
        [17., 18., 19., 20.],
        [25., 26., 27., 28.]])


In-place addition (methods with a trailing underscore, such as ``add_``, modify the tensor itself):

In [14]:
print(x)
x.add_(y)
print(x)

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
tensor([[ 1.,  2.,  3.,  4.],
        [ 9., 10., 11., 12.],
        [17., 18., 19., 20.],
        [25., 26., 27., 28.]])
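The difference between the two forms of addition is easiest to see when a second name refers to the same tensor. This is a small sketch with illustrative names (`p`, `q`, `alias`), not part of the assignment.

```python
import torch

p = torch.ones(2, 2)
q = torch.full((2, 2), 5.0)

r = p.add(q)       # out-of-place: returns a new tensor, p is unchanged
print(p)           # still all ones

alias = p          # a second name for the same tensor object
p.add_(q)          # in-place: modifies the memory that p (and alias) refer to
print(alias)       # shows the updated values as well
```

In-place operations save memory, but they overwrite values that other code may still be using, so they are best applied deliberately.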


### Numpy bridge

In [15]:
# create numpy array
a = np.array([[1,2], [3,4]])
# transform numpy array into torch.Tensor
b = torch.from_numpy(a)
# make operation on this Tensor (in this case transpose)
b = b.transpose(1,0)
# transform back to numpy
c = b.numpy()                

print(a, type(a))
print(b, type(b))
print(c, type(c))

[[1 2]
 [3 4]] <class 'numpy.ndarray'>
tensor([[1, 3],
        [2, 4]]) <class 'torch.Tensor'>
[[1 3]
 [2 4]] <class 'numpy.ndarray'>
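One detail of the bridge that is easy to miss: `torch.from_numpy` and `.numpy()` do not copy data, so the array and the tensor share the same memory. The sketch below uses throwaway names (`arr`, `t`, `t_copy`) to show this, together with `torch.tensor`, which does make a copy.

```python
import numpy as np
import torch

arr = np.zeros(3)
t = torch.from_numpy(arr)    # no copy: t and arr share one buffer

arr[0] = 7.0                 # write through the numpy array
t[1] = 9.0                   # write through the tensor

print(arr)                   # both writes are visible here
print(t)                     # and here

t_copy = torch.tensor(arr)   # torch.tensor copies the data instead
arr[2] = 1.0
print(t_copy)                # not affected by the later write to arr
```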


-----

 <font color="red"> **[PROBLEM III]:** </font> 

In [16]:
# using these two random matrices do the following:
x = np.random.randn(3, 10)
y = np.random.randn(4, 10)
y.shape

(4, 10)

<span style="color:red"> Do the following: </span>
* <span style="color:red">transform $\mathbf{x}$ and $\mathbf{y}$ to torch.Tensors</span>
* <span style="color:red">perform matrix multiplication $\mathbf{r1} = \mathbf{x} \cdot \mathbf{y^T} $</span>  
<span style="color:blue"> see the pytorch function http://pytorch.org/docs/master/torch.html#torch.mm </span>  
* <span style="color:red">perform element-wise matrix multiplication $\mathbf{r2} = \mathbf{r1} \odot \mathbf{r1} $</span>  
<span style="color:blue"> see the pytorch function http://pytorch.org/docs/master/torch.html#torch.mul </span> 
* <span style="color:red">perform scalar multiplication and scalar addition $\mathbf{r3} = 2 \cdot \mathbf{r2} + 3 $</span>  
* <span style="color:red">transform the result back to numpy </span>

-----

In [25]:
x = torch.from_numpy(x)
y = torch.from_numpy(y)
res1 = torch.mm(x, y.transpose(1,0))
res2 = torch.mul(res1, res1)
res3 = 2 * res2 + 3  # scalar multiplication, then scalar addition
res3_np = res3.numpy()
print(res3_np)

[[15.75251238  4.00731333  4.7705989  22.49291184]
 [ 4.29568683 76.39517546 44.09112074  5.5827972 ]
 [ 3.75285596 26.46251526  3.39959147  5.45615519]]
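The same steps can also be written with operators and checked against plain numpy. This is a sketch only: it draws its own random matrices from a local generator (an assumption made so the example is self-contained), so its numbers differ from the output above.

```python
import numpy as np
import torch

rng = np.random.RandomState(0)    # local generator, independent of the notebook seed
x_np = rng.randn(3, 10)
y_np = rng.randn(4, 10)

x_t = torch.from_numpy(x_np)
y_t = torch.from_numpy(y_np)

r1 = x_t @ y_t.t()     # matrix product, same as torch.mm(x_t, y_t.t()); shape (3, 4)
r2 = r1 * r1           # element-wise product, same as torch.mul(r1, r1)
r3 = 2 * r2 + 3        # scalars are applied to every element

# Reference computation in numpy for comparison.
expected = 2 * (x_np @ y_np.T) ** 2 + 3
print(np.allclose(r3.numpy(), expected))
```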


### CUDA stuff

Let us run on CUDA, if it is available.

We will use ``torch.device`` objects to move tensors to and from the GPU.

In [26]:
torch.cuda.is_available()

True

In [28]:
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!

tensor([[ 1.4967,  0.8617,  1.6477,  2.5230,  0.7658,  0.7659,  2.5792,  1.7674,
          0.5305,  1.5426],
        [ 0.5366,  0.5343,  1.2420, -0.9133, -0.7249,  0.4377, -0.0128,  1.3142,
          0.0920, -0.4123],
        [ 2.4656,  0.7742,  1.0675, -0.4247,  0.4556,  1.1109, -0.1510,  1.3757,
          0.3994,  0.7083]], device='cuda:0', dtype=torch.float64)
tensor([[ 1.4967,  0.8617,  1.6477,  2.5230,  0.7658,  0.7659,  2.5792,  1.7674,
          0.5305,  1.5426],
        [ 0.5366,  0.5343,  1.2420, -0.9133, -0.7249,  0.4377, -0.0128,  1.3142,
          0.0920, -0.4123],
        [ 2.4656,  0.7742,  1.0675, -0.4247,  0.4556,  1.1109, -0.1510,  1.3757,
          0.3994,  0.7083]], dtype=torch.float64)
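A common way to keep a notebook runnable on machines without a GPU is to pick the device once and pass it around. The following is a minimal sketch of that pattern; the tensor names are illustrative.

```python
import torch

# Fall back to the CPU when CUDA is not available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.randn(3, 10, device=device)   # created directly on the chosen device
b = torch.ones_like(a)                  # ones_like inherits dtype and device from 'a'
c = (a + b).cpu()                       # bring the result back to host memory

print(device, c.device)

# .numpy() works only on CPU tensors, hence the .cpu() call above.
c_np = c.numpy()
```

Both operands of an operation must live on the same device, which is why `b` is created from `a` rather than on its own.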


###  Autograd: automatic differentiation

relevant tutorial
https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#sphx-glr-beginner-blitz-autograd-tutorial-py


*torch.Tensor* is the central class of the package. If you set its attribute *.requires_grad* to True, autograd starts to track all operations on it. When you finish your computation, you can call *.backward()* and have all the gradients computed automatically. The gradient for this tensor is accumulated into its *.grad* attribute.
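As a first illustration of this mechanism, here is a minimal sketch on a made-up function $s = \sum_i t_i^2$, whose gradient is $2t$. The names `t`, `s` and `u` are chosen for this example only.

```python
import torch

t = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
s = (t ** 2).sum()       # s = t_0^2 + t_1^2 + t_2^2, recorded in the graph

s.backward()             # fills t.grad with ds/dt = 2 * t
print(t.grad)            # tensor([2., 4., 6.])

with torch.no_grad():    # operations inside this block are not tracked
    u = t * 2
print(u.requires_grad)   # False
```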

**use of autograd**

Let's start with a simple example.
Consider the following function:
$$f = (x + y) \cdot z$$

For concreteness, let's take $x=2$, $y=-7$, $z=3$. The 'forward' calculation is shown in <span style="color:green"> green </span> in the image below.

Automatic differentiation provides an elegant tool to calculate the derivatives of $f$ with respect to all variables by following the 'backward' path.

$$f = (x + y) \cdot z = u \cdot z, \quad u = x + y = -5 $$

$$ \frac{\partial f}{\partial u} = z = 3 $$

$$ \frac{\partial f}{\partial z} = u = -5 $$

$$ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial u} \cdot \frac{\partial u}{\partial x} = z = 3$$

$$ \frac{\partial f}{\partial y} = \frac{\partial f}{\partial u} \cdot \frac{\partial u}{\partial y} = z = 3$$

![comp_graph_1](https://drive.google.com/uc?id=1jTDQCsT5jDmiIfK2Ay1n1MlCDNS8swYf)

In [58]:
# Create tensors.
# ('requires_grad' is False by default)
x = torch.tensor([2.], requires_grad=True)
y = torch.tensor([-7.], requires_grad=True)
z = torch.tensor([3.], requires_grad=True)

# Build a computational graph.
f = (x + y) * z   

# Compute gradients.
f.backward()

# Print out the gradients.
print(x.grad)    
print(y.grad)    
print(z.grad) 

tensor([3.])
tensor([3.])
tensor([-5.])


 <font color="red">  **[PROBLEM IV]**: </font> 

 Next we will consider the computational graph of the following function 

$$f = \frac{1}{1 + e^{-(w_0 \cdot x_0 + w_1 \cdot x_1 + b )}} = \frac{1}{1 + e^{-(\mathbf{w} \cdot \mathbf{x} + b )}}$$


![comp_graph_2](https://drive.google.com/uc?id=1YcYQ5KEp7ZCiXR_Pzr9JYI-4t5wWV4ll)

 We are interested in computing partial derivatives: 

$$ \frac{\partial f}{\partial \mathbf{w}}  $$ 

$$ \frac{\partial f}{\partial b}  $$ 

$$ \frac{\partial f}{\partial \mathbf{x}}  $$ 

* define $\{x_0, x_1\}$ and $\{w_0, w_1\}$ as vector variables $\mathbf{x}$ and $\mathbf{w}$
* see the pytorch exponential function http://pytorch.org/docs/master/torch.html#torch.exp
* use matrix operations

You should get the same numbers as in the figure.

In [66]:
w = torch.tensor([3., 5.], requires_grad=True)
x = torch.tensor([-2., 1.], requires_grad=True)
b = torch.tensor([2.], requires_grad=True)

f = 1 / (1 + torch.exp(-(torch.dot(x,w) + b)))
f.backward()


# Print out the gradients.
print(w.grad)
print(x.grad)      
print(b.grad) 

tensor([-0.3932,  0.1966])
tensor([0.5898, 0.9831])
tensor([0.1966])
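These values can be cross-checked with the closed-form derivative of the sigmoid, $\frac{\partial f}{\partial s} = f (1 - f)$ with $s = \mathbf{w} \cdot \mathbf{x} + b$, which gives $\frac{\partial f}{\partial \mathbf{w}} = f(1-f)\,\mathbf{x}$, $\frac{\partial f}{\partial \mathbf{x}} = f(1-f)\,\mathbf{w}$ and $\frac{\partial f}{\partial b} = f(1-f)$. The sketch below repeats the computation with `torch.sigmoid` (used here in place of the explicit formula) and compares; `scale` is a helper name introduced for this check.

```python
import torch

w = torch.tensor([3., 5.], requires_grad=True)
x = torch.tensor([-2., 1.], requires_grad=True)
b = torch.tensor([2.], requires_grad=True)

f = torch.sigmoid(torch.dot(w, x) + b)   # same function as 1 / (1 + exp(-(w.x + b)))
f.backward()

# Closed-form derivative of the sigmoid with respect to its argument.
scale = (f * (1 - f)).detach()

print(torch.allclose(w.grad, scale * x.detach()))   # df/dw = f(1-f) * x
print(torch.allclose(x.grad, scale * w.detach()))   # df/dx = f(1-f) * w
print(torch.allclose(b.grad, scale))                # df/db = f(1-f)
```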


Gradients accumulate across calls to *.backward()*, so reset them to zero with the *.zero_()* command before the next pass:

In [67]:
w.grad.zero_()
print(w.grad)

tensor([0., 0.])
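Why the reset matters can be seen by calling backward twice without it. The following is a small sketch on a made-up function (the names `first`, `second`, `third` are only for illustration).

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()       # tensor([3., 3.])

(w * 3).sum().backward()     # second backward pass without resetting
second = w.grad.clone()      # gradients were added up: tensor([6., 6.])

w.grad.zero_()               # reset before the next pass
(w * 3).sum().backward()
third = w.grad.clone()       # back to tensor([3., 3.])

print(first, second, third)
```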
