In [1]:
from __future__ import print_function
import torch

In [2]:
x = torch.rand(5,3)
print(x)


 0.6720  0.7877  0.7345
 0.1567  0.4794  0.3279
 0.1312  0.0936  0.4808
 0.7903  0.5292  0.6350
 0.9796  0.6047  0.4758
[torch.FloatTensor of size 5x3]



In [3]:
print(x.size())

torch.Size([5, 3])


Any operation that mutates a tensor in place is postfixed with an underscore. For example, 'x.copy_(y)' and 'x.t_()' will change x.

In [4]:
y = torch.zeros(5,3)

In [5]:
print(x)


 0.6720  0.7877  0.7345
 0.1567  0.4794  0.3279
 0.1312  0.0936  0.4808
 0.7903  0.5292  0.6350
 0.9796  0.6047  0.4758
[torch.FloatTensor of size 5x3]



In [6]:
x.copy_(y)
print(x)


 0  0  0
 0  0  0
 0  0  0
 0  0  0
 0  0  0
[torch.FloatTensor of size 5x3]



Standard NumPy-style indexing works.

In [7]:
print(x[:,1])


 0
 0
 0
 0
 0
[torch.FloatTensor of size 5]



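In fact, you can convert back and forth between a Torch tensor and a NumPy ndarray. For CPU tensors the two share the same memory, so an in-place change to one is visible in the other.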

In [8]:
b = x.numpy()
print(b)
type(b)

[[ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]


numpy.ndarray

In [9]:
import numpy as np
a = np.ones((3,5))
c = torch.from_numpy(a)
print(c)
type(c)


 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
[torch.DoubleTensor of size 3x5]



torch.DoubleTensor

CUDA is unavailable on this machine, but you can guard code so that it runs only when CUDA is present. Inside such a block, '.cuda()' moves a tensor onto the GPU.

In [10]:
if torch.cuda.is_available():
    print("CUDA is available")
    x = x.cuda()

### Autograd: automatic differentiation exercise

The autograd package provides automatic differentiation for all operations on Tensors.

autograd.Variable is the central class of this package; it wraps a Tensor.

In [11]:
from torch.autograd import Variable
x = Variable(torch.ones(2,2),requires_grad=True)
print(x)

Variable containing:
 1  1
 1  1
[torch.FloatTensor of size 2x2]



In [12]:
# Every Variable produced by an operation has a creator attribute pointing
# back to the Function that created it (leaf Variables such as x have none).
# Autograd follows these links backward to compute gradients.
y = x + 2
print(y)
print(y.creator)

Variable containing:
 3  3
 3  3
[torch.FloatTensor of size 2x2]

<torch.autograd._functions.basic_ops.AddConstant object at 0x7f6cf03d5ba8>


### The creator is always the last operation applied in the expression that produced the Variable.

In [13]:
z = y*y*3
print(z)

Variable containing:
 27  27
 27  27
[torch.FloatTensor of size 2x2]
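To make the creator idea concrete, here is a toy sketch in plain Python. It is not PyTorch's actual implementation. Each result records the name of the operation that produced it and the inputs it consumed. The class `Node`, the helper `creator_chain`, and the op names `Mul` and `MulConstant` are hypothetical, chosen to mirror the `AddConstant` name printed above. Python evaluates `y*y*3` left to right as `(y*y)*3`, so the creator of `z` is the constant multiplication applied last.

```python
class Node:
    """Toy value that remembers which (hypothetically named) op created it."""

    def __init__(self, value, creator=None, inputs=()):
        self.value = value      # the numeric value of this node
        self.creator = creator  # name of the producing op; None for leaves
        self.inputs = inputs    # the nodes that op consumed

    def __add__(self, other):
        if isinstance(other, Node):
            return Node(self.value + other.value, "Add", (self, other))
        return Node(self.value + other, "AddConstant", (self,))

    def __mul__(self, other):
        if isinstance(other, Node):
            return Node(self.value * other.value, "Mul", (self, other))
        return Node(self.value * other, "MulConstant", (self,))


def creator_chain(node):
    """Follow creator links back from `node`, returning op names newest-first."""
    chain = []
    while node.creator is not None:
        chain.append(node.creator)
        node = node.inputs[0]  # follow the first input back toward the leaf
    return chain


x = Node(1.0)   # leaf: created by the user, so creator is None
y = x + 2       # creator: AddConstant
z = y * y * 3   # evaluated as (y * y) * 3, so the last op is MulConstant
print(x.creator, y.creator, z.creator)  # None AddConstant MulConstant
print(creator_chain(z))                 # ['MulConstant', 'Mul', 'AddConstant']
```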



In [14]:
# You can always access the underlying torch tensor through the .data attribute.
type(y.data)

torch.FloatTensor

In [15]:
out = z.mean()
print(out)

Variable containing:
 27
[torch.FloatTensor of size 1]



In [16]:
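# out.backward() backpropagates from out through the graph and accumulates
# d(out)/dx into x.grad. retain_variables=True keeps the graph's buffers so
# backward can be called on it again (later PyTorch versions call this retain_graph).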
out.backward(retain_variables=True)
print(x.grad)

Variable containing:
 4.5000  4.5000
 4.5000  4.5000
[torch.FloatTensor of size 2x2]



If the 'out' variable is O, then $O={1\over 4}\sum_i z_i$ and $z_i=3(x_i + 2)^2$, where $i$ runs over the four entries of x.

So,

${\partial O \over \partial x_i} = {\partial O \over \partial z_i}\cdot {\partial z_i \over \partial x_i} = {1\over 4}\cdot 6(x_i + 2)$,

which, evaluated at $x_i = 1$, gives ${1 \over 4}\cdot 18 = 4.5$.
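A quick way to check the 4.5 without autograd is a central finite difference. This plain-Python sketch uses hypothetical helper names `objective` and `numeric_grad`. It flattens the 2x2 input into four entries, nudges each one by ±h, and measures how much O changes.

```python
def objective(xs):
    """O = (1/4) * sum_i 3 * (x_i + 2)**2, the same function as `out` above."""
    return sum(3.0 * (xi + 2.0) ** 2 for xi in xs) / len(xs)


def numeric_grad(f, xs, h=1e-5):
    """Approximate each partial derivative with (f(x + h e_i) - f(x - h e_i)) / 2h."""
    grads = []
    for i in range(len(xs)):
        plus = list(xs)
        plus[i] += h
        minus = list(xs)
        minus[i] -= h
        grads.append((f(plus) - f(minus)) / (2.0 * h))
    return grads


x = [1.0, 1.0, 1.0, 1.0]          # the 2x2 tensor of ones, flattened
print(objective(x))                # 27.0
print(numeric_grad(objective, x))  # each entry is approximately 4.5
```

O is quadratic in each $x_i$, so the central difference is exact up to floating-point rounding.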

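So x.grad holds the derivative of the variable that backward was called on (here 'out') with respect to x, a leaf deeper in the computation graph. Why isn't z.grad defined as well? It is still 'None' because autograd only stores .grad on leaf Variables created by the user with requires_grad=True. Gradients for intermediate results such as z are computed during the backward pass but are not kept.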

In [17]:
print(z.grad)

None
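Why z.grad stays None can be seen in a toy scalar reverse-mode differentiator. This sketch is plain Python, not how PyTorch is implemented, and `Var` is a hypothetical class. `backward` pushes the seed dO/dO = 1 back through each node's local derivatives. Only leaves store the result in `.grad`, mirroring autograd's default behaviour.

```python
class Var:
    """Toy scalar node: a value plus links to the nodes it was computed from."""

    def __init__(self, value, parents=(), is_leaf=True):
        self.value = value
        self.parents = parents  # (parent Var, local derivative d(self)/d(parent)) pairs
        self.is_leaf = is_leaf
        self.grad = None        # only leaves ever get a gradient stored here

    def __add__(self, other):
        if isinstance(other, Var):
            return Var(self.value + other.value, ((self, 1.0), (other, 1.0)), is_leaf=False)
        return Var(self.value + other, ((self, 1.0),), is_leaf=False)

    def __mul__(self, other):
        if isinstance(other, Var):
            return Var(self.value * other.value,
                       ((self, other.value), (other, self.value)), is_leaf=False)
        return Var(self.value * other, ((self, other),), is_leaf=False)

    def backward(self, seed=1.0):
        """Propagate d(out)/d(self) = seed to the leaves via the chain rule.

        This naive version re-walks shared subgraphs once per path, which is fine
        for a toy; real implementations visit nodes in reverse topological order.
        """
        if self.is_leaf:
            self.grad = seed if self.grad is None else self.grad + seed
            return
        for parent, local in self.parents:
            parent.backward(seed * local)


xs = [Var(1.0) for _ in range(4)]  # the four entries of the 2x2 input
ys = [x + 2 for x in xs]           # y = x + 2
zs = [y * y * 3 for y in ys]       # z = 3 * y**2
out = (zs[0] + zs[1] + zs[2] + zs[3]) * 0.25  # out = mean(z)

out.backward()                     # scalar output: implicit seed dO/dO = 1
print(out.value)                   # 27.0
print([x.grad for x in xs])        # [4.5, 4.5, 4.5, 4.5]
print(zs[0].grad)                  # None: z is an intermediate, not a leaf
```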


Another autograd example:

In [18]:
# initialize a random variable
x = torch.randn(3)
x = Variable(x, requires_grad=True)

# multiply it by 2 until its norm reaches at least 1000
y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)


Variable containing:
  853.5403
  327.2983
 1259.5571
[torch.FloatTensor of size 3]



# The meaning of the 'gradients' argument passed to 'backward'

Now take the derivative with respect to the initial 3-vector x, passing in the derivative at y: dZ/dy = (1,1,1).

Calling backward on y raises an error if you don't supply the 'gradients' argument, which must have the same shape as the output.

NTS: After reading up on automatic differentiation: at the end of the graph there is an implicit single output value Z. Backpropagation actually computes dZ/dx for every x in the graph. So if the graph has a multidimensional output (y1, y2, y3), you need to seed backpropagation with dZ/dy1, dZ/dy2, and dZ/dy3. That vector is exactly what the 'gradients' argument supplies.

Also NTS: there is an implicit dZ/dZ == 1 at the end of the graph. That is why, if your output has a single element, you can leave 'gradients' out of PyTorch's backward function. It treats the last node as Z and starts from dZ/dZ = 1.

In this example, y is a 3-tensor so you have to give it an explicit gradient.
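Written out, the seed vector enters through the chain rule. Let $v_j = \partial Z / \partial y_j$ be the 'gradients' argument and $J_{ji} = \partial y_j / \partial x_i$ be the Jacobian of y with respect to x. Backward then fills x.grad with a vector-Jacobian product:

$$
{\partial Z \over \partial x_i} = \sum_j {\partial Z \over \partial y_j}\,{\partial y_j \over \partial x_i} = \sum_j v_j J_{ji},
\qquad \text{i.e.}\quad \nabla_x Z = J^\top v .
$$

For a single-element output, $J$ has one row and the seed is $v = \partial Z/\partial Z = 1$. In the doubling loop, $y_j = 2^k x_j$ for some number of doublings $k$, so $J = 2^k I$ and ${\partial Z \over \partial x_i} = 2^k v_i$.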

In [19]:
# y is a 3-vector, so backward needs an explicit 'gradients' argument of the
# same shape: dZ/dy for the implicit scalar Z at the end of the graph.
# With dZ/dy = (1,1,1), backpropagation fills x.grad with dZ/dx.
gradients = torch.FloatTensor([1,1,1])
y.backward(gradients)
print(x.grad)

Variable containing:
 1024
 1024
 1024
[torch.FloatTensor of size 3]
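The 'gradients' argument can also be checked numerically. The sketch below is plain Python, and `jacobian`, `vjp`, and `scale_by_1024` are hypothetical helpers. It builds the Jacobian with central differences and returns $J^\top v$, which is what x.grad should hold after y.backward(v). Since x.grad came out as 1024 = 2**10 above, the stand-in function assumes the loop ended after ten doublings for that random x.

```python
def jacobian(f, xs, h=1e-6):
    """Numerical Jacobian J[j][i] = d f_j / d x_i via central differences."""
    m = len(f(xs))
    cols = []
    for i in range(len(xs)):
        plus = list(xs)
        plus[i] += h
        minus = list(xs)
        minus[i] -= h
        fp, fm = f(plus), f(minus)
        cols.append([(fp[j] - fm[j]) / (2.0 * h) for j in range(m)])
    # cols[i][j] holds d f_j / d x_i; transpose so J[j][i] matches the usual layout
    return [[cols[i][j] for i in range(len(xs))] for j in range(m)]


def vjp(f, xs, v):
    """Vector-Jacobian product (J^T v)_i = sum_j v_j * J[j][i], i.e. dZ/dx_i."""
    J = jacobian(f, xs)
    return [sum(v[j] * J[j][i] for j in range(len(v))) for i in range(len(xs))]


def scale_by_1024(xs):
    """Stand-in for the doubling loop when it stops after ten doublings (k = 10)."""
    return [x * 2 ** 10 for x in xs]


x = [0.3, -1.2, 0.8]  # arbitrary sample point; the map is linear, so the result does not depend on it
print(vjp(scale_by_1024, x, [1.0, 1.0, 1.0]))  # approximately [1024.0, 1024.0, 1024.0]
```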



This time, use a nontrivial input gradient so you can see that its effect on the backpropagated gradient is just multiplicative: each component of x.grad is scaled by the corresponding component of 'gradients'.

In [22]:
# initialize a random variable
x = torch.randn(3)
x = Variable(x, requires_grad=True)

# multiply it by 2 until its norm reaches at least 1000
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
gradients2 = torch.FloatTensor([10.0,1.0,.1])
y.backward(gradients2)
print(x.grad)


Variable containing:
 81920.0000
  8192.0000
   819.2000
[torch.FloatTensor of size 3]
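The multiplicative effect can be predicted without running backward at all. This sketch is plain Python, and `doublings` and `predicted_grad` are hypothetical helpers. It counts the factors of 2 the loop applies, mirroring the loop's stopping test, then returns $2^k v_i$. In the run above, 81920 = 10 * 2**13, so that particular random x needed k = 13.

```python
import math


def doublings(xs, threshold=1000.0):
    """Factors of 2 in y, mirroring: y = x * 2; while ||y|| < threshold: y = y * 2."""
    norm = math.sqrt(sum(v * v for v in xs))
    if norm == 0.0:
        raise ValueError("the notebook loop never terminates when x is all zeros")
    k = 1                             # y = x * 2 before the loop starts
    while norm * 2 ** k < threshold:  # ||x * 2**k|| == 2**k * ||x||
        k += 1
    return k


def predicted_grad(xs, v, threshold=1000.0):
    """y_j = 2**k * x_j, so the Jacobian is 2**k * I and x.grad = 2**k * v."""
    k = doublings(xs, threshold)
    return [2 ** k * vi for vi in v]


x = [0.1, 0.2, 0.3]                         # hypothetical input with ||x|| ~ 0.374
print(doublings(x))                         # 12
print(predicted_grad(x, [10.0, 1.0, 0.1]))  # [40960.0, 4096.0, 409.6]
```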

