*Colormap of the notebook:*

* <span style="color:red">assignment problem</span>. Red marks a task that you need to complete.
* <span style="color:green">debugging</span>. Green describes the expected outcome; its primary goal is to help you check that you get the correct answer.
* <span style="color:blue">comments, hints</span>.

Assignment 1 (pytorch basics)
======================


<img src="fig/pytorch-logo-dark.png" style="height:64px;" />

#### Useful Links:

* pytorch official documentation
http://pytorch.org/docs/master/index.html

* pytorch discussion
https://discuss.pytorch.org/

* pytorch tutorials
https://github.com/yunjey/pytorch-tutorial

* pytorch examples
https://github.com/jcjohnson/pytorch-examples


### Preliminaries

In [1]:
# for compatibility between Python 2 and Python 3
from __future__ import print_function
from __future__ import division

In [2]:
import numpy as np
import torch
from torch.autograd import Variable

In [3]:
# random seed settings
torch.manual_seed(42)
np.random.seed(42)

### Tensors

One of the main data types in PyTorch is the tensor.
We will start with the concept of a tensor and how it is used in PyTorch.

<img src="fig/tensors.jpg" style="height:512px;" />

#### Tensor Initialization

In [4]:
# 1d tensor of size 64 of type float (default)
# (this tensor is uninitialized: its elements hold whatever values happen to be in memory)
v = torch.Tensor(64)

print(" * the first 4 elements of 'v' are:")
print(v[:4]) # print the first four elements of the tensor

# initialize with array [0,1,...,63]
v = torch.arange(0,64)

print(" * the first 4 elements of 'v' are:")
print(v[:4]) # print the first four elements of the tensor

print(" * the size of the 'v' is ")
print(v.size())

 * the first 4 elements of 'v' are:

-5.9009e-22
 4.5618e-41
-5.9009e-22
 4.5618e-41
[torch.FloatTensor of size 4]

 * the first 4 elements of 'v' are:

 0
 1
 2
 3
[torch.FloatTensor of size 4]

 * the size of the 'v' is 
torch.Size([64])


In [5]:
# uninitialized 2d tensor of size 8x8 (64 elements) of type float
x = torch.Tensor(8, 8).type(torch.FloatTensor)

print(" * the top-left 4x4 block of 'x' is:")
print(x[:4,:4]) # print the first four rows and the first four columns

# initialize with all ones
x = torch.ones(8, 8).type(torch.FloatTensor)

print(" * the top-left 4x4 block of 'x' is:")
print(x[:4, :4]) # print the first four rows and the first four columns

print(" * the size of the 'x' is ")
print(x.size())

 * the top-left 4x4 block of 'x' is:

1.00000e-22 *
 -5.9010  0.0000 -5.9010  0.0000
  0.0000  0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  0.0000
[torch.FloatTensor of size 4x4]

 * the top-left 4x4 block of 'x' is:

 1  1  1  1
 1  1  1  1
 1  1  1  1
 1  1  1  1
[torch.FloatTensor of size 4x4]

 * the size of the 'x' is 
torch.Size([8, 8])


<span style="color:red"> **[PROBLEM I]**: </span>   

<span style="color:red"> Initialize X </span>  
<span style="color:red"> 3d Tensor of size (4,4,4) </span>  
<span style="color:red"> of type FloatTensor with all elements equal to zero </span>

<span style="color:blue"> consider using 'zeros' </span>

In [6]:
# YOUR CODE HERE
X = torch.zeros(4,4,4)

<span style="color:red"> **[PROBLEM II]**: </span> 

<span style="color:red"> Explain what you see when running the following cell. </span>

In [7]:
# watch out 
print("X + x + v : ")
print(X + x + v)
print("x + v + X : ")
print(x + v + X)
print("v + X + x : ")
print(v + X + x)

X + x + v : 

(0 ,.,.) = 
   1   2   3   4
   5   6   7   8
   9  10  11  12
  13  14  15  16

(1 ,.,.) = 
  17  18  19  20
  21  22  23  24
  25  26  27  28
  29  30  31  32

(2 ,.,.) = 
  33  34  35  36
  37  38  39  40
  41  42  43  44
  45  46  47  48

(3 ,.,.) = 
  49  50  51  52
  53  54  55  56
  57  58  59  60
  61  62  63  64
[torch.FloatTensor of size 4x4x4]

x + v + X : 

    1     2     3     4     5     6     7     8
    9    10    11    12    13    14    15    16
   17    18    19    20    21    22    23    24
   25    26    27    28    29    30    31    32
   33    34    35    36    37    38    39    40
   41    42    43    44    45    46    47    48
   49    50    51    52    53    54    55    56
   57    58    59    60    61    62    63    64
[torch.FloatTensor of size 8x8]

v + X + x : 

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
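What happens in Problem II: under the broadcasting rules shared by NumPy and current PyTorch, shapes are compared from the right, and each pair of sizes must either be equal or contain a 1. The shapes (4,4,4), (8,8) and (64,) fail this test, even though all three tensors have 64 elements. The older PyTorch release that produced the output above fell back to a deprecated behavior in that case: each operand was treated as 1-dimensional and, as the output shows, the result took the shape of the first operand. Recent PyTorch versions reject this addition instead. The sketch below uses NumPy only because it follows the same broadcasting rules; `broadcastable` is a hypothetical helper written for illustration. It checks the rule and shows how explicit reshapes reproduce each of the three results.

```python
import numpy as np

def broadcastable(*shapes):
    """Return True if all shapes are compatible under NumPy/PyTorch broadcasting."""
    ndim = max(len(s) for s in shapes)
    # Left-pad every shape with 1s so that all shapes have the same length.
    padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    for dims in zip(*padded):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:  # two different non-1 sizes in the same position
            return False
    return True

v = np.arange(64, dtype=np.float32)        # shape (64,)
x = np.ones((8, 8), dtype=np.float32)      # shape (8, 8)
X = np.zeros((4, 4, 4), dtype=np.float32)  # shape (4, 4, 4)

print(broadcastable(X.shape, x.shape, v.shape))  # False: 4 vs 8 in the same position
print(broadcastable((4, 1, 4), (8, 1)))          # True: result would be (4, 8, 4)

try:
    X + x + v
except ValueError as err:
    print("NumPy refuses:", err)

# Explicit reshapes reproduce the old 1-d element-wise behavior,
# with the result shape chosen by the caller instead of by operand order.
as_3d = X + x.reshape(4, 4, 4) + v.reshape(4, 4, 4)
as_2d = x + v.reshape(8, 8) + X.reshape(8, 8)
as_1d = v + X.reshape(-1) + x.reshape(-1)
print(as_3d.shape, as_2d.shape, as_1d.shape)  # (4, 4, 4) (8, 8) (64,)
```

All three results hold the same 64 values 1, 2, ..., 64 in row-major order; only the shape differs, which matches the three outputs above.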

#### Reshaping, broadcasting

Tensors are reshaped with the 'view' method:

In [8]:
a = torch.Tensor([[1,2], [3,4]])
a_reshaped = a.view(4) # reshape into one-dimensional tensor of size 4

print(a)
print(a_reshaped)


 1  2
 3  4
[torch.FloatTensor of size 2x2]


 1
 2
 3
 4
[torch.FloatTensor of size 4]
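`view` reshapes without copying: the result shares memory with the original tensor, elements are taken in row-major order, and one size may be given as -1 and inferred from the others. PyTorch's `view` also needs a compatible memory layout, so it can fail on non-contiguous tensors such as transposes. The sketch below uses NumPy's `reshape`, which returns a view for contiguous arrays, purely as an analogue of `view`; it is not PyTorch code.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]], dtype=np.float32)
a_flat = a.reshape(4)        # like a.view(4): row-major order -> [1, 2, 3, 4]
a_col = a.reshape(-1, 1)     # like a.view(-1, 1): the -1 is inferred as 4

print(a_flat)                # [1. 2. 3. 4.]
print(a_col.shape)           # (4, 1)

# The reshaped array is a view: writing through it changes the original.
a_flat[0] = 100.0
print(a[0, 0])               # 100.0
print(np.shares_memory(a, a_flat))  # True
```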



<span style="color:red"> **[PROBLEM III]**: </span> 

<span style="color:red"> Use the 'view' method to reshape v and X into 2d tensors. </span>  
<span style="color:red"> Add the reshaped tensors v and X to x. </span>  
<span style="color:red"> Finally, display the result. </span>

In [9]:
# YOUR CODE HERE
result_add = v.view(8,8) + x + X.view(8,8)
print(result_add)


    1     2     3     4     5     6     7     8
    9    10    11    12    13    14    15    16
   17    18    19    20    21    22    23    24
   25    26    27    28    29    30    31    32
   33    34    35    36    37    38    39    40
   41    42    43    44    45    46    47    48
   49    50    51    52    53    54    55    56
   57    58    59    60    61    62    63    64
[torch.FloatTensor of size 8x8]



#### Numpy bridge

In [10]:
# create numpy array
a = np.array([[1,2], [3,4]])
# transform numpy array into torch.Tensor (the tensor shares memory with 'a')
b = torch.from_numpy(a)
# apply an operation to this Tensor (here, a transpose)
b = b.transpose(1,0)
# transform back to numpy
c = b.numpy()

print(a, type(a))
print(b)
print(c, type(c))

[[1 2]
 [3 4]] <class 'numpy.ndarray'>

 1  3
 2  4
[torch.LongTensor of size 2x2]

[[1 3]
 [2 4]] <class 'numpy.ndarray'>


<span style="color:red"> **[PROBLEM IV]**: </span> 

In [11]:
# create two random matrices with numpy
x = np.random.randn(3, 10)
y = np.random.randn(10, 4)

<span style="color:red"> Do the following: </span>
* <span style="color:red">transform $\mathbf{x}$ and $\mathbf{y}$ to torch.Tensors</span>
* <span style="color:red">perform matrix multiplication $\mathbf{r1} = \mathbf{x} \cdot \mathbf{y} $</span>  
<span style="color:blue"> look up the PyTorch function http://pytorch.org/docs/master/torch.html#torch.mm </span>  
* <span style="color:red">perform element-wise matrix multiplication $\mathbf{r2} = \mathbf{r1} \odot \mathbf{r1} $</span>  
<span style="color:blue"> look up the PyTorch function http://pytorch.org/docs/master/torch.html#torch.mul </span> 
* <span style="color:red">perform scalar multiplication and scalar addition $\mathbf{r3} = 2 \cdot \mathbf{r2} + 3 $</span>  
<span style="color:blue"> look up the corresponding PyTorch functions on the webpage above </span> 
* <span style="color:red">transform the result back to numpy </span>

In [12]:
# YOUR CODE HERE
r1 = torch.from_numpy(x).mm(torch.from_numpy(y))
r2 = r1 * r1
r3 = 2 * r2 + 3
r3.numpy()

array([[  4.78871768,  33.8241555 ,  29.05517845,  16.39240525],
       [  3.65927203,  13.04205007,  12.29041035,  62.47416877],
       [  5.34368326,  15.85144284,  11.15831282,   3.76068097]])
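Every step of Problem IV has a direct NumPy counterpart, so the torch result can be cross-checked without PyTorch. A minimal sketch, assuming matrices of the same shapes (3×10 and 10×4); they are regenerated here with `np.random.seed(42)`, and the values are not claimed to match the output above.

```python
import numpy as np

np.random.seed(42)
x = np.random.randn(3, 10)
y = np.random.randn(10, 4)

r1 = x @ y          # matrix product, the counterpart of torch.mm / Tensor.mm -> shape (3, 4)
r2 = r1 * r1        # element-wise product, the counterpart of torch.mul
r3 = 2 * r2 + 3     # scalar multiplication and addition, applied element-wise

print(r3.shape)     # (3, 4)
```

Since `r2` is a square, every entry of `r3` is at least 3, as in the output above. Note also that `torch.from_numpy` keeps the array's dtype, so the float64 arrays produced by `np.random.randn` become 64-bit (double) tensors.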

### Autograd: making differentiation simple with PyTorch

Automatic differentiation (AD), also called algorithmic differentiation or computational differentiation, is a set of techniques to numerically evaluate the derivative of a function. [https://en.wikipedia.org/wiki/Automatic_differentiation]

<img src="fig/AutomaticDifferentiationNutshell.png" style="height:256px;" />

Automatic differentiation is used to automate the computation of backward passes in neural networks. The autograd package in PyTorch provides exactly this functionality. 

Let's start with a simple example.
Consider the following function:
$$f = (x + y) \cdot z$$

For concreteness, let's take $x=2$, $y=-7$, $z=3$. The 'forward' calculation is shown in <span style="color:green"> green </span> in the figure below.

Automatic differentiation provides an elegant way to calculate the derivatives of $f$ with respect to all variables in a single 'backward' pass, applying the chain rule through the intermediate variable $u = x + y$:

$$f = (x + y) \cdot z = u \cdot z, \qquad u = x + y = -5$$

$$ \frac{\partial f}{\partial u} = z = 3 $$

$$ \frac{\partial f}{\partial z} = u = -5 $$

$$ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial u} \cdot \frac{\partial u}{\partial x} = z \cdot 1 = 3$$

$$ \frac{\partial f}{\partial y} = \frac{\partial f}{\partial u} \cdot \frac{\partial u}{\partial y} = z \cdot 1 = 3$$

<img src="fig/comp_graph_1.png" style="height:256px;" />

In [13]:
# Create tensors.
x = Variable(torch.Tensor([2]), requires_grad=True)
y = Variable(torch.Tensor([-7]), requires_grad=True)
z = Variable(torch.Tensor([3]), requires_grad=True)

# Build a computational graph.
f = (x + y) * z   

# Compute gradients.
f.backward()

# Print out the gradients.
print(x.grad)    
print(y.grad)    
print(z.grad) 

Variable containing:
 3
[torch.FloatTensor of size 1]

Variable containing:
 3
[torch.FloatTensor of size 1]

Variable containing:
-5
[torch.FloatTensor of size 1]
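As a sanity check independent of autograd, the same gradients can be estimated with central finite differences, $\partial f/\partial x \approx (f(x+h) - f(x-h)) / 2h$. A plain-Python sketch; `numerical_grad` is a hypothetical helper and the step size `h` is an arbitrary choice.

```python
def f(x, y, z):
    return (x + y) * z

def numerical_grad(func, args, h=1e-5):
    """Central-difference estimate of the gradient of func at the point args."""
    grads = []
    for i in range(len(args)):
        plus = list(args)
        minus = list(args)
        plus[i] += h
        minus[i] -= h
        grads.append((func(*plus) - func(*minus)) / (2 * h))
    return grads

grads = numerical_grad(f, [2.0, -7.0, 3.0])
print([round(g, 6) for g in grads])  # [3.0, 3.0, -5.0]
```

The estimates agree with `x.grad`, `y.grad` and `z.grad` printed above.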



<span style="color:red"> **[PROBLEM V]**: </span> 

<span style="color:red"> Build the computational graph of the following function  </span>

$$f = \frac{1}{1 + \exp\left(-(w_0 \cdot x_0 + w_1 \cdot x_1 + b)\right)} = \frac{1}{1 + \exp\left(-(\mathbf{w} \cdot \mathbf{x} + b)\right)}$$

<img src="fig/comp_graph_2.png" style="height:320px;" />

<span style="color:red"> Compute partial derivatives:  </span>

<span style="color:red">$$ \frac{\partial f}{\partial \mathbf{w}}  $$ </span>

<span style="color:red">$$ \frac{\partial f}{\partial b}  $$ </span>

<span style="color:red">$$ \frac{\partial f}{\partial \mathbf{x}}  $$ </span>

<span style="color:blue">define $\{x_0, x_1\}$ and $\{w_0, w_1\}$ as vector variables $\mathbf{x}$ and $\mathbf{w}$ </span>  
<span style="color:blue"> look up the PyTorch exponential function http://pytorch.org/docs/master/torch.html#torch.exp </span>  
<span style="color:blue">use matrix operations</span>

<span style="color:green">You should get the same numbers as in the figure</span>

In [14]:
#YOUR CODE HERE
w = Variable(torch.Tensor([3, 5]), requires_grad=True)
x = Variable(torch.Tensor([-2, 1]), requires_grad=True)
b = Variable(torch.Tensor([2]), requires_grad=True)
f = 1 / (torch.exp(-w.view(-1,2).mm(x.view(2,-1)) - b) + 1)

# Compute gradients.
f.backward()

# Print out the gradients.
print(w.grad)
print(x.grad)      
print(b.grad) 

Variable containing:
-0.3932
 0.1966
[torch.FloatTensor of size 2]

Variable containing:
 0.5898
 0.9831
[torch.FloatTensor of size 2]

Variable containing:
 0.1966
[torch.FloatTensor of size 1]
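The values printed above can be checked by hand. Writing $s = \mathbf{w} \cdot \mathbf{x} + b$ and $\sigma(s) = 1/(1 + e^{-s})$, the sigmoid satisfies $\sigma'(s) = \sigma(s)\,(1 - \sigma(s))$, and the chain rule gives:

$$
\begin{aligned}
s &= w_0 x_0 + w_1 x_1 + b = 3 \cdot (-2) + 5 \cdot 1 + 2 = 1, \qquad f = \sigma(1) \approx 0.7311,\\
\frac{\partial f}{\partial s} &= \sigma(s)\,(1 - \sigma(s)) \approx 0.7311 \cdot 0.2689 \approx 0.1966,\\
\frac{\partial f}{\partial \mathbf{w}} &= \frac{\partial f}{\partial s}\,\mathbf{x} \approx 0.1966 \cdot (-2,\ 1) = (-0.3932,\ 0.1966),\\
\frac{\partial f}{\partial \mathbf{x}} &= \frac{\partial f}{\partial s}\,\mathbf{w} \approx 0.1966 \cdot (3,\ 5) = (0.5898,\ 0.9831),\\
\frac{\partial f}{\partial b} &= \frac{\partial f}{\partial s} \approx 0.1966.
\end{aligned}
$$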



In [15]:
#YOUR CODE HERE
w = Variable(torch.Tensor([3, 5]), requires_grad=True)
x = Variable(torch.Tensor([-2, 1]), requires_grad=True)
b = Variable(torch.Tensor([2]), requires_grad=True)
f = 1 / (torch.exp(-torch.dot(w,x) - b) + 1)

# Compute gradients.
f.backward()

# Print out the gradients.
print(w.grad)
print(x.grad)      
print(b.grad) 

Variable containing:
-0.3932
 0.1966
[torch.FloatTensor of size 2]

Variable containing:
 0.5898
 0.9831
[torch.FloatTensor of size 2]

Variable containing:
 0.1966
[torch.FloatTensor of size 1]
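Both cells above should agree with the closed-form sigmoid gradients $\sigma'(s)\,\mathbf{x}$, $\sigma'(s)\,\mathbf{w}$ and $\sigma'(s)$. A NumPy sketch that evaluates these formulas at the same point and compares them with central finite differences; the helper names (`sigmoid`, `central_diff`) are illustrative, not part of PyTorch.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def f(w, x, b):
    # f = sigmoid(w . x + b)
    return sigmoid(np.dot(w, x) + b)

w = np.array([3.0, 5.0])
x = np.array([-2.0, 1.0])
b = 2.0

s = np.dot(w, x) + b
ds = sigmoid(s) * (1.0 - sigmoid(s))   # sigma'(s) = sigma(s) * (1 - sigma(s))
grad_w, grad_x, grad_b = ds * x, ds * w, ds
print(np.round(grad_w, 4), np.round(grad_x, 4), round(grad_b, 4))

def central_diff(g, v, h=1e-6):
    """Central-difference gradient of the scalar function g at the vector v."""
    out = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = h
        out[i] = (g(v + e) - g(v - e)) / (2 * h)
    return out

num_w = central_diff(lambda v: f(v, x, b), w)
num_x = central_diff(lambda v: f(w, v, b), x)
num_b = (f(w, x, b + 1e-6) - f(w, x, b - 1e-6)) / 2e-6
print(np.allclose(grad_w, num_w), np.allclose(grad_x, num_x), np.isclose(grad_b, num_b))
```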



<span style="color:red"> **[PROBLEM VI]**: </span> 

Let's move on to a more complicated example.
We will consider a random input *x* and a random output *y*.  
Our *model* is 
$$y_{pred} = x \cdot w + b$$
and the *loss* is 
$$loss = \sum{(y_{pred} - y)^2}$$

<span style="color:red">Fill in the missing parts below</span> 
* <span style="color:red">Initialize the biases</span>  
* <span style="color:red">Implement $y_{pred}$ according to the formula above</span>  
<span style="color:blue">while computing, be careful about tensor sizes; to change a tensor's size, use the .expand function http://pytorch.org/docs/master/tensors.html#torch.Tensor.expand </span>
* <span style="color:red">Implement *loss* according to the formula above</span>
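For this linear model, the gradients that autograd computes also have a closed form. With the residual $r = y_{pred} - y$ (shape 10×4), $\partial\,loss/\partial w = 2\,x^\top r$ and $\partial\,loss/\partial b = 2\sum_i r_i$, where the sum runs over the 10 rows. The NumPy sketch below runs the same gradient-descent loop using these formulas in place of `backward()`. NumPy broadcasting adds the length-4 bias to every row, which is what `.expand(10, 4)` does explicitly in the cell below; recent PyTorch versions broadcast `x.mm(w) + b` the same way. The random data (seeded arbitrarily) differ from the torch run, so the printed losses will not match its output.

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed, chosen arbitrarily
x = rng.randn(10, 3)
y = rng.randn(10, 4)
w = rng.randn(3, 4)
b = rng.randn(4)

learning_rate = 1e-3
losses = []
for t in range(200):
    y_pred = x @ w + b               # b (shape (4,)) is broadcast to (10, 4)
    r = y_pred - y                   # residual, shape (10, 4)
    loss = np.sum(r ** 2)
    losses.append(loss)

    grad_w = 2 * x.T @ r             # d loss / d w, shape (3, 4)
    grad_b = 2 * r.sum(axis=0)       # d loss / d b, shape (4,)

    # Plain gradient-descent update, as in the torch loop.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

    if t % 20 == 0:
        print(t, loss)
```

With a learning rate this small the loss decreases at every step, the same qualitative behavior as the torch run.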

In [15]:
# Create random input and output data
x = Variable(torch.randn(10, 3))
y = Variable(torch.randn(10, 4))

# Randomly initialize weights
w = Variable(torch.randn(3, 4), requires_grad=True)

# Randomly initialize bias
#b = #YOUR CODE HERE
b = Variable(torch.randn(4), requires_grad=True)

In [16]:
learning_rate = 1e-3
for t in range(200):
    y_pred = x.mm(w) + b.expand(10, 4) #YOUR CODE HERE
  
    loss = (y_pred - y).pow(2).sum() #YOUR CODE HERE
    
    if t % 20 == 0:
        print(t, loss.data[0])
  
    loss.backward()
    w.data -= learning_rate * w.grad.data
    b.data -= learning_rate * b.grad.data
    
    w.grad.data.zero_()
    b.grad.data.zero_()


0 173.16668701171875
20 78.67835998535156
40 48.11701583862305
60 37.271697998046875
80 33.033660888671875
100 31.21847152709961
120 30.374984741210938
140 29.95508575439453
160 29.73402976989746
180 29.612422943115234


<span style="color:green">The output should be similar to the following:</span>