# Deep Learning in Medicine
### BMSC-GA 4493, BMIN-GA 3007 
### Lab 2: PyTorch Tutorial and Loss Functions


### Goal of this lab:
* Understand PyTorch Tensors, Variables, and autograd.
* Understand loss functions.

### What is PyTorch?
It's a Python-based scientific computing package targeted at two audiences:
* A replacement for numpy to use the power of GPUs
* A deep learning research platform that provides maximum flexibility and speed

### Tensor
It is similar to NumPy's ndarray:
<a href="https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html">https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html</a>


In [1]:
from __future__ import print_function
import torch

#### Tensor Initialization

In [2]:
x = torch.Tensor(6, 2)  # construct a 6x2 matrix, uninitialized


In [3]:
x, x.size()

(
  0.0000  0.0000
 -6.2870 -2.0005
  0.0000  0.0000
  0.0000  0.0000
  0.0000  0.0000
  0.0000  0.0000
 [torch.FloatTensor of size 6x2], torch.Size([6, 2]))

In [4]:
y = torch.rand(6, 2)  # construct a randomly initialized matrix


In [5]:
y, y.size()

(
  0.3452  0.1345
  0.7829  0.0130
  0.8530  0.6648
  0.3693  0.3538
  0.8928  0.7046
  0.0817  0.6020
 [torch.FloatTensor of size 6x2], torch.Size([6, 2]))

In [6]:
z = torch.ones(7) # construct a 1-D tensor (vector) of ones


In [7]:
z, z.size()

(
  1
  1
  1
  1
  1
  1
  1
 [torch.FloatTensor of size 7], torch.Size([7]))

#### Operation Example: Addition
Related reading and reference:
    
* PyTorch documentation:
<a href="http://pytorch.org/docs/0.3.0/"> http://pytorch.org/docs/0.3.0/ </a>


In [8]:
# addition: syntax 1
x + y


 0.3452  0.1345
-5.5041 -1.9875
 0.8530  0.6648
 0.3693  0.3538
 0.8928  0.7046
 0.0817  0.6020
[torch.FloatTensor of size 6x2]

In [9]:
# addition: syntax 2
torch.add(x, y)


 0.3452  0.1345
-5.5041 -1.9875
 0.8530  0.6648
 0.3693  0.3538
 0.8928  0.7046
 0.0817  0.6020
[torch.FloatTensor of size 6x2]

In [10]:
# addition: giving an output tensor
result = torch.Tensor(6, 2)
torch.add(x, y, out=result)


 0.3452  0.1345
-5.5041 -1.9875
 0.8530  0.6648
 0.3693  0.3538
 0.8928  0.7046
 0.0817  0.6020
[torch.FloatTensor of size 6x2]

In [11]:
# addition: in-place
y.add_(x) # adds x to y


 0.3452  0.1345
-5.5041 -1.9875
 0.8530  0.6648
 0.3693  0.3538
 0.8928  0.7046
 0.0817  0.6020
[torch.FloatTensor of size 6x2]

In [13]:
y.add_(x)


  0.3452   0.1345
-18.0781  -5.9884
  0.8530   0.6648
  0.3693   0.3538
  0.8928   0.7046
  0.0817   0.6020
[torch.FloatTensor of size 6x2]

In [14]:
y # the value of y is updated by the in-place addition


  0.3452   0.1345
-18.0781  -5.9884
  0.8530   0.6648
  0.3693   0.3538
  0.8928   0.7046
  0.0817   0.6020
[torch.FloatTensor of size 6x2]

In [15]:
x


 0.0000  0.0000
-6.2870 -2.0005
 0.0000  0.0000
 0.0000  0.0000
 0.0000  0.0000
 0.0000  0.0000
[torch.FloatTensor of size 6x2]

#### Numpy Bridge:
The torch Tensor and the NumPy array share their underlying memory, so changing one changes the other.

##### Convert Torch Tensor to Numpy

In [16]:
a = torch.ones(5)
a


 1
 1
 1
 1
 1
[torch.FloatTensor of size 5]

In [17]:
b = a.numpy()
b

array([ 1.,  1.,  1.,  1.,  1.], dtype=float32)

In [18]:
a.add_(1) # Remember, this is an in-place addition
print(a)
print(b) # see how the numpy array changed in value


 2
 2
 2
 2
 2
[torch.FloatTensor of size 5]

[ 2.  2.  2.  2.  2.]


In [19]:
c = torch.ones(5)

##### Converting Numpy Array to Torch Tensor

In [20]:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

[ 2.  2.  2.  2.  2.]

 2
 2
 2
 2
 2
[torch.DoubleTensor of size 5]



#### Use of CUDA

In [21]:
# let us run this cell only if CUDA is available
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    x + y

### Autograd: automatic differentiation
* The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.

### Variable
* autograd.Variable is the central class of the package. It wraps a Tensor and supports nearly all of the operations defined on it.
* .backward() computes all the gradients automatically; for a non-scalar Variable, it needs an extra argument: a tensor of matching shape that supplies the upstream gradient (grad_output).
* The .data attribute accesses the raw Tensor.
* The .grad attribute accesses the gradient.

### Function
* Variable and Function are interconnected and build up an acyclic graph that encodes the history of computation.
* Each Variable has a .grad_fn attribute that references the Function that created it, except for Variables created by the user (grad_fn = None).

Related Reading and Reference:
<a href="http://pytorch.org/docs/autograd"> http://pytorch.org/docs/autograd </a>
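
To make the graph idea above concrete, here is a minimal sketch of reverse-mode differentiation on scalars in plain Python. It is not how PyTorch is implemented: the `Node` class and its attributes are hypothetical names chosen for illustration, and only `+` and `*` are supported. It shows that user-created values have `grad_fn = None`, while computed values remember how they were produced.

```python
# Toy scalar autodiff for illustration only; `Node` is a hypothetical class,
# not part of PyTorch.

class Node:
    def __init__(self, value, parents=(), grad_fn=None):
        self.value = value        # result of the forward computation
        self.parents = parents    # nodes this one was computed from
        self.grad_fn = grad_fn    # None for user-created (leaf) nodes
        self.grad = 0.0           # accumulated d(output)/d(this node)

    def __add__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Node(self.value + other.value, (self, other),
                    lambda g: (g, g))

    def __mul__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        # d(a * b)/da = b and d(a * b)/db = a
        return Node(self.value * other.value, (self, other),
                    lambda g: (g * other.value, g * self.value))

    def backward(self):
        # Collect nodes in topological order, then walk it in reverse so a
        # node's gradient is complete before it is passed to its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node.grad_fn is not None:
                for parent, g in zip(node.parents, node.grad_fn(node.grad)):
                    parent.grad += g  # gradients accumulate


x = Node(1.0)      # leaf: created by the user, so grad_fn is None
y = x + 2          # non-leaf: remembers it came from an addition
z = y * y * 2      # z = 2 * (x + 2)^2
z.backward()
print(z.value, x.grad, x.grad_fn is None, y.grad_fn is None)
```

Here dz/dx = 4 * (x + 2), which is 12 at x = 1. Taking the mean over four such elements, as the notebook does further down, scales each entry's gradient by 1/4.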

In [22]:
import torch
from torch.autograd import Variable

In [23]:
x = Variable(torch.ones(2, 2), requires_grad=True)
print(x)

Variable containing:
 1  1
 1  1
[torch.FloatTensor of size 2x2]



In [24]:
x 

Variable containing:
 1  1
 1  1
[torch.FloatTensor of size 2x2]

In [25]:
y = x + 2
print(y)

Variable containing:
 3  3
 3  3
[torch.FloatTensor of size 2x2]



In [26]:
print(y.grad_fn)

<torch.autograd.function.AddConstantBackward object at 0x10a04f5e8>


In [27]:
z = y * y * 2
out = z.mean()
print(z, out)

Variable containing:
 18  18
 18  18
[torch.FloatTensor of size 2x2]
 Variable containing:
 18
[torch.FloatTensor of size 1]



In [28]:
# What's the gradient of X before backward() is performed?
print(x.grad)

None


In [29]:
# What's the correct gradient of X?

# .retain_grad() enables the .grad attribute of a non-leaf variable.
# Remember the graph of this computation: x is a leaf variable; y and z are non-leaf variables.
# .backward() computes and accumulates gradient values w.r.t. leaf variables

y.retain_grad() 
out.backward()

print(x.grad)
print(y.grad)

# you should zero out the gradient after gradient calculation each time
# as backward() computes and accumulates gradient values
x.grad.data.zero_()
y.grad.data.zero_()

# Question: How do we get these values?

Variable containing:
 3  3
 3  3
[torch.FloatTensor of size 2x2]

Variable containing:
 3  3
 3  3
[torch.FloatTensor of size 2x2]




 0  0
 0  0
[torch.FloatTensor of size 2x2]
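
One way to answer the question in the cell above: with y = x + 2 and out = mean(2 * y^2) over four elements, the derivative of out with respect to each y_i is 4 * y_i / 4 = y_i = 3, and since dy_i/dx_i = 1, the derivative with respect to each x_i is also 3. The sketch below checks this with a central finite difference in plain Python. The helper names `out_fn` and `numerical_grad` are made up for this illustration and are not PyTorch functions.

```python
def out_fn(x):
    # x holds the four entries of the 2x2 input as a flat list
    y = [xi + 2 for xi in x]
    z = [2 * yi * yi for yi in y]
    return sum(z) / len(z)          # mean over all elements


def numerical_grad(f, x, eps=1e-6):
    # Central difference: (f(x + eps) - f(x - eps)) / (2 * eps) per entry
    grads = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads


x = [1.0, 1.0, 1.0, 1.0]
analytic = [xi + 2 for xi in x]     # d(out)/dx_i = 4 * (x_i + 2) / 4
numeric = numerical_grad(out_fn, x)
print(analytic)
print(numeric)
```

Both lists should agree closely with the value 3 that autograd reports for every entry of x.grad.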

In [38]:
# Another way to calculate the gradient for more than one variable in the graph
from torch.autograd import grad

# set up the problem 
x = Variable(torch.ones(2, 2), requires_grad=True)
y = x + 2
z = y * y * 2
out = z.mean()

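# torch.autograd.grad computes gradients of the output variables (out in this case) w.r.t. input variables (x and y)
# Pass the inputs as a list: the returned gradients follow its order (a set has no defined order)
# Please refer to the PyTorch documentation
grad(out, [x, y])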

(Variable containing:
  3  3
  3  3
 [torch.FloatTensor of size 2x2], Variable containing:
  3  3
  3  3
 [torch.FloatTensor of size 2x2])

### Loss Functions

Related reference:
<a href="http://pytorch.org/docs/master/nn.html#loss-functions">http://pytorch.org/docs/master/nn.html#loss-functions </a>

#### Mean Squared Error
Question: What is mean squared error? What are the inputs? What's the output?

In [31]:
import torch.nn as nn
input = Variable(torch.randn(4, 5), requires_grad=True)
target = Variable(torch.randn(4, 5))

In [32]:
print(input)
print(target)

Variable containing:
 1.0067  1.3582 -1.6535  1.0207  0.7158
-0.1837 -0.6007  0.1071  1.8064  0.8879
 0.7336 -1.0479  2.6996 -0.3867 -0.1989
 0.8987 -0.9395 -0.6792  0.1445  0.4901
[torch.FloatTensor of size 4x5]

Variable containing:
-0.0775 -1.0968  1.2148  0.2029  0.2498
-0.8878 -0.5683  0.3297 -0.1101 -0.3222
 1.2251 -1.3004 -1.1549 -0.1843  0.4544
-1.0313 -1.4928 -0.1002 -0.8093  1.2386
[torch.FloatTensor of size 4x5]



In [33]:
loss = nn.MSELoss()
output = loss(input, target) # Note, in actual training, input here will be replaced with the predicted values
output.backward()

In [34]:
output, input

(Variable containing:
  2.1733
 [torch.FloatTensor of size 1], Variable containing:
  1.0067  1.3582 -1.6535  1.0207  0.7158
 -0.1837 -0.6007  0.1071  1.8064  0.8879
  0.7336 -1.0479  2.6996 -0.3867 -0.1989
  0.8987 -0.9395 -0.6792  0.1445  0.4901
 [torch.FloatTensor of size 4x5])
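
As a sanity check on what the loss computes, the sketch below recomputes the mean squared error in plain Python from the rounded values printed above. It assumes the default behaviour of nn.MSELoss, which averages the squared differences over all elements. The function name `mse` is ours, not a PyTorch API.

```python
# Values copied from the printed input and target above (rounded to 4 decimals)
inp = [
    [ 1.0067,  1.3582, -1.6535,  1.0207,  0.7158],
    [-0.1837, -0.6007,  0.1071,  1.8064,  0.8879],
    [ 0.7336, -1.0479,  2.6996, -0.3867, -0.1989],
    [ 0.8987, -0.9395, -0.6792,  0.1445,  0.4901],
]
target = [
    [-0.0775, -1.0968,  1.2148,  0.2029,  0.2498],
    [-0.8878, -0.5683,  0.3297, -0.1101, -0.3222],
    [ 1.2251, -1.3004, -1.1549, -0.1843,  0.4544],
    [-1.0313, -1.4928, -0.1002, -0.8093,  1.2386],
]


def mse(pred, true):
    # Squared difference for every element, then the mean over all of them
    squared = [(p - t) ** 2
               for pred_row, true_row in zip(pred, true)
               for p, t in zip(pred_row, true_row)]
    return sum(squared) / len(squared)


mse_value = mse(inp, target)
print(mse_value)
```

The result should be close to the 2.1733 reported by nn.MSELoss; small differences come from the rounding of the printed values.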

#### Cross Entropy Loss
Question: What is cross entropy loss? What are the inputs? What's the output?

In [35]:
input = Variable(torch.randn(4, 5), requires_grad=True)
target = Variable(torch.LongTensor(4).random_(5))
print(input)
print(target)

Variable containing:
 1.1191 -1.3392 -2.0688 -0.0497 -0.2072
-1.6031  1.0475 -0.5397  1.0828  2.4887
-1.1214  0.0846  1.0767 -0.4005  0.6867
-1.6015  0.2643  0.7007  1.8930  1.6718
[torch.FloatTensor of size 4x5]

Variable containing:
 4
 4
 1
 2
[torch.LongTensor of size 4]



In [37]:
# Fill in the code to calculate the cross-entropy loss
loss = nn.CrossEntropyLoss()
output = loss(input, target) # Note, in actual training, input here will be replaced with the predicted values
output

Variable containing:
 1.5490
[torch.FloatTensor of size 1]
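
The same value can be reproduced by hand. The sketch below assumes the default behaviour of nn.CrossEntropyLoss: each row of the input is treated as unnormalized class scores, log-softmax is applied, the negative log-probability of the target class is taken, and the result is averaged over the rows. The function name `cross_entropy` is ours, and the numbers are the rounded values printed above.

```python
import math

# Values copied from the printed input and target above (rounded to 4 decimals)
scores = [
    [ 1.1191, -1.3392, -2.0688, -0.0497, -0.2072],
    [-1.6031,  1.0475, -0.5397,  1.0828,  2.4887],
    [-1.1214,  0.0846,  1.0767, -0.4005,  0.6867],
    [-1.6015,  0.2643,  0.7007,  1.8930,  1.6718],
]
targets = [4, 4, 1, 2]   # one class index per row


def cross_entropy(scores, targets):
    losses = []
    for row, t in zip(scores, targets):
        m = max(row)  # subtract the max before exp() for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        # -log softmax(row)[t] = log_sum_exp - row[t]
        losses.append(log_sum_exp - row[t])
    return sum(losses) / len(losses)   # mean over the batch


ce_value = cross_entropy(scores, targets)
print(ce_value)
```

The result should be close to the 1.5490 shown above. Note that the inputs are raw scores, not probabilities, and the target holds class indices rather than one-hot vectors.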

### Reference:
* Deep Learning with PyTorch: A 60 Minute Blitz:
    <a href="http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html">http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html</a>
    
    
* PyTorch documentation:
<a href="http://pytorch.org/docs/0.3.0/"> http://pytorch.org/docs/0.3.0/ </a>