## Walking through some of the "PyTorch Blitz" examples

In [1]:
import torch
import numpy as np

In [2]:
a = torch.ones(4)

In [3]:
c = a.new_ones(3,2)

In [5]:
b = a.numpy()

Tensors and their "bridged" numpy counterparts share the same array in memory, so an in-place change to one shows up in the other:

In [6]:
a +=1
print(b)

[2. 2. 2. 2.]
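A quick way to convince ourselves that the bridge really is shared memory, and that it works in both directions. This is only a sketch; the variable names `same_buffer` and `first` are mine, not part of the tutorial.

```python
import numpy as np
import torch

a = torch.ones(4)
b = a.numpy()  # view onto the tensor's CPU storage, no copy

# The tensor's data pointer and the array's buffer address coincide.
same_buffer = a.data_ptr() == b.__array_interface__["data"][0]
print(same_buffer)

# Writing through the numpy side is visible from the tensor side too.
b[0] = 10.0
first = a[0].item()
print(first)
```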


If we construct a tensor directly from a numpy array with `torch.Tensor(...)`, this is not the case:

In [9]:
c = np.ones(3)
ct = torch.Tensor(c)
print(ct)

tensor([1., 1., 1.])


In [10]:
c += 1
print(ct)

tensor([1., 1., 1.])
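One likely reason for the copy: the printed tensor above has no `dtype=torch.float64` tag, so the float64 array was converted to float32, and a dtype conversion cannot share the original buffer. A small check, assuming the default tensor type has not been changed from float32:

```python
import numpy as np
import torch

c = np.ones(3)        # numpy defaults to float64
ct = torch.Tensor(c)  # legacy constructor, produces the default dtype

print(c.dtype, ct.dtype)

c += 1
# The tensor still holds the original values.
unchanged = bool((ct == 1).all())
print(unchanged)
```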


But if we use from_numpy, the memory is shared:

In [11]:
d = np.ones(3)
dt = torch.from_numpy(d)
print(dt) 
d += 1
print(dt)

tensor([1., 1., 1.], dtype=torch.float64)
tensor([2., 2., 2.], dtype=torch.float64)


By contrast, `torch.tensor` always copies the data, so the array and the tensor stay independent:

In [47]:
d = np.ones(3)
dt = torch.tensor(d)
print(dt)
d += 1
print(d,dt)

tensor([1., 1., 1.], dtype=torch.float64)
[2. 2. 2.] tensor([1., 1., 1.], dtype=torch.float64)
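Putting the constructors side by side. I've added `torch.as_tensor`, which the cells above don't use: it avoids a copy when the input is already a CPU array and no dtype change is requested. A sketch; the names `shared`, `also_shared` and `copied` are just for illustration.

```python
import numpy as np
import torch

d = np.ones(3)

shared = torch.from_numpy(d)      # shares memory with d
also_shared = torch.as_tensor(d)  # shares memory here (same dtype, CPU)
copied = torch.tensor(d)          # always copies

d += 1

print(shared)
print(also_shared)
print(copied)
```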


## Moving tensors between devices

In [13]:
cpu = torch.device("cpu")
gpu = torch.device("cuda")

In [14]:
x = torch.ones(5, device=cpu)
y = torch.ones(5, device=gpu)

In [15]:
# raises RuntimeError: the operands live on different devices
z = x + y

RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor

In [16]:
# ok
z = x + y.to(cpu)
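The cells above assume a CUDA device is present. A common way to keep the same code runnable on a CPU-only machine is to pick the device once and move everything to it. A minimal sketch; the name `device` is my own choice.

```python
import torch

# Fall back to the CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(5)                 # created on the CPU
y = torch.ones(5, device=device)  # created on the chosen device

# Move x to the same device as y before combining them.
z = x.to(device) + y
print(z.device, z)
```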

What if we create a tensor from numpy, then move it to the GPU?

In [26]:
a = np.ones(4)
b = torch.from_numpy(a)

Moving the tensor to the GPU copies it into device memory, so the numpy bridge is lost. Moving it back to the CPU creates yet another copy rather than restoring the link:

In [27]:
print(b)
b = b.to(gpu)
a += 1
print(a)
print(b)
b = b.to(cpu)
print(b)

tensor([1., 1., 1., 1.], dtype=torch.float64)
[2. 2. 2. 2.]
tensor([1., 1., 1., 1.], device='cuda:0', dtype=torch.float64)
tensor([1., 1., 1., 1.], dtype=torch.float64)
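The bridge is only lost when `.to(...)` actually has to copy. If the tensor is already on the requested device with the requested dtype, `.to` returns the tensor itself. A sketch of both cases; on a machine without CUDA, `clone()` stands in for the GPU round trip purely so the example runs (that fallback is my assumption, not something the tutorial does).

```python
import numpy as np
import torch

a = np.ones(4)
b = torch.from_numpy(a)

# No-op move: same device, same dtype, so we get the very same object back.
same = b.to("cpu")
is_same_object = same is b

# A real device change (or the clone stand-in) produces an independent copy.
if torch.cuda.is_available():
    round_trip = b.to("cuda").to("cpu")
else:
    round_trip = b.clone()

a += 1
print(is_same_object)
print(b)           # still bridged to a
print(round_trip)  # independent copy
```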


## Autograd etc.

In [28]:
x = torch.ones(4, requires_grad = True)

In [29]:
y = x + 2

Every tensor computed from a tensor that requires grad has an associated `Function` (its `grad_fn`) for obtaining gradients:

In [31]:
f = y.grad_fn

In [36]:
f.name()

'AddBackward0'

User-created tensors don't:

In [38]:
print(x.grad_fn)

None


We can detach a tensor: `detach()` returns a new tensor with the same values that is cut off from the computational graph:

In [39]:
yd = y.detach()

In [40]:
print(yd)

tensor([3., 3., 3., 3.])


In [41]:
print(yd.grad_fn)

None


In [44]:
print(y.data)

tensor([3., 3., 3., 3.])


In [45]:
print(yd.data)

tensor([3., 3., 3., 3.])
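The printed values of `y`, `yd`, `y.data` and `yd.data` all look identical, which hides what is going on underneath. As far as I can tell, neither `detach()` nor `.data` copies anything; both give a graph-free tensor over the same storage. A short check (the `shares_*` names are mine):

```python
import torch

x = torch.ones(4, requires_grad=True)
y = x + 2

yd = y.detach()

# Both views point at the same underlying memory as y.
shares_detach = yd.data_ptr() == y.data_ptr()
shares_data = y.data.data_ptr() == y.data_ptr()

print(shares_detach, shares_data)
print(yd.requires_grad, yd.grad_fn)
```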


In [48]:
x = torch.ones(2,2, requires_grad=True)
y = x+2
z = y * y * 3
out = z.mean()
print(out)

tensor(27., grad_fn=<MeanBackward1>)


In [49]:
out.backward()

In [51]:
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
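Where does 4.5 come from? With four elements, `out = (1/4) * sum(3 * (x_i + 2)^2)`, so `d out / d x_i = (6/4) * (x_i + 2) = 1.5 * (x_i + 2)`, which is 4.5 at `x_i = 1`. A sketch that compares autograd against this hand-derived formula:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()
out.backward()

# Hand-derived gradient: 1.5 * (x + 2)
expected = 1.5 * (x.detach() + 2)
matches = torch.allclose(x.grad, expected)

print(out.item())
print(matches)
```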


In [52]:
x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2

In [53]:
print(y)

tensor([ 282.1167,  -25.0840, 1196.4836], grad_fn=<MulBackward0>)


Calling `backward()` on a non-scalar tensor requires feeding in an external ("upstream") gradient vector of the same shape; autograd then computes the vector-Jacobian product:

In [54]:
v = torch.tensor([.1, 1, 1], dtype=torch.float)


In [56]:
# raises RuntimeError
# y.backward()
# ok
y.backward(v)
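With random inputs it's hard to see what `y.backward(v)` actually put into `x.grad`. Here is a deterministic version, assuming a fixed scale factor of 8 instead of the doubling loop: the Jacobian of `y = 8 * x` is `8 * I`, so the result should be `8 * v`.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 8  # non-scalar output

# Without an upstream gradient, backward() refuses to run on a vector.
try:
    y.backward()
    raised = False
except RuntimeError:
    raised = True

v = torch.tensor([0.1, 1.0, 1.0])
y.backward(v)  # vector-Jacobian product: J^T v = 8 * v

print(raised)
print(x.grad)
```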

## What the heck is a data tensor?

In [62]:
a = torch.ones(4, requires_grad=True)

In [63]:
b = a.data

In [65]:
print(a.requires_grad, b.requires_grad)

True False


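By modifying the data tensor, I can alter the value that's stored in a particular tensor without touching the graph. (The output below starts at 5 rather than 1, presumably because this cell had already been run twice.)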

In [68]:
print(a)
b += 2
print(a)
print(a.grad_fn)
print(b.grad_fn)

tensor([5., 5., 5., 5.], requires_grad=True)
tensor([7., 7., 7., 7.], requires_grad=True)
None
None


Compare with the following, which adds graph elements:

In [70]:
a = torch.ones(4, requires_grad=True)
b = a.data
print(a,b)
a = a + 2
print(a,b)

tensor([1., 1., 1., 1.], requires_grad=True) tensor([1., 1., 1., 1.])
tensor([3., 3., 3., 3.], grad_fn=<AddBackward0>) tensor([1., 1., 1., 1.])


Note that the data tensor has no idea what's going on: `a = a + 2` rebinds the name `a` to a new tensor, while `b` still refers to the original storage and is stuck with the old value.
Also, note that *in-place ops on `a` raise a RuntimeError*:

In [71]:
a = torch.ones(4, requires_grad=True)
a.add_(2)

RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.

The problem is that `a` is a leaf tensor that requires grad:

In [75]:
print(a.is_leaf)

True
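In-place updates of a leaf are still possible when autograd is told not to record them, which is roughly what a hand-written gradient-descent step looks like. A minimal sketch; the tensor `w`, the toy loss and the step size 0.1 are made up for illustration.

```python
import torch

w = torch.ones(4, requires_grad=True)

loss = (w ** 2).sum()
loss.backward()  # w.grad = 2 * w

# Inside no_grad, the in-place op on the leaf is allowed and not recorded.
with torch.no_grad():
    w.sub_(0.1 * w.grad)

w.grad.zero_()  # reset the accumulated gradient for the next step

print(w)
print(w.is_leaf, w.requires_grad)
```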


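A tensor computed from `a` is not a leaf, so in-place ops on it are allowed and are recorded in the graph:

In [87]: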
a = torch.ones(1, requires_grad=True)
b = a + 2
print(b.is_leaf)

False


In [88]:
print(b, b.data)

tensor([3.], grad_fn=<AddBackward0>) tensor([3.])


In [89]:
b.add_(2)
print(a, b, b.data)

tensor([1.], requires_grad=True) tensor([5.], grad_fn=<AddBackward0>) tensor([5.])


In [90]:
b.backward()
print(a.grad)

tensor([1.])


In [94]:
a.grad.zero_()
a.backward()
print(a.grad)

tensor([1.])
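One caveat about `.data` that the cells above don't show: because changes made through `.data` are invisible to autograd, they can silently corrupt gradients. The same change made through `detach()` is noticed at backward time. This sketch follows the example I remember from the PyTorch 0.4 migration notes, using `sigmoid` because its backward pass reuses the forward output.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Case 1: modify the output through .data (untracked).
out = a.sigmoid()
out.data.zero_()
out.sum().backward()
wrong_grad = a.grad.clone()  # computed from the zeroed output

# Case 2: modify the output through detach() (tracked).
a.grad = None
out = a.sigmoid()
out.detach().zero_()
try:
    out.sum().backward()
    detected = False
except RuntimeError:
    detected = True

print(wrong_grad)
print(detected)
```

So `.data` is fine for peeking at or deliberately overwriting values, but `detach()` looks like the safer default when the tensor is still needed for a backward pass.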


## Picking out parts of the computational graph

In [95]:
x = torch.ones(3, requires_grad=True)
y = 2 * x
z = y * y

In [98]:
print(z.grad_fn)
print(z.grad_fn.next_functions)
print(y.grad_fn)
print(y.grad_fn.next_functions)

<MulBackward0 object at 0x7fba3c4427b8>
((<MulBackward0 object at 0x7fba3c442978>, 0), (<MulBackward0 object at 0x7fba3c442978>, 0))
<MulBackward0 object at 0x7fba3c442898>
((<AccumulateGrad object at 0x7fba3c442940>, 0), (None, 0))
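Following `next_functions` by hand gets tedious, so here is a small recursive walk over the graph. The helper `walk` is hypothetical (my own, not a PyTorch function); it records the class name of each node, indented by depth, and skips the `None` entries that stand for inputs not requiring grad.

```python
import torch

def walk(fn, depth=0, lines=None):
    """Collect the class names of grad_fn nodes, indented by depth."""
    if lines is None:
        lines = []
    if fn is None:  # input that does not require grad (e.g. the constant 2)
        return lines
    lines.append("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, lines)
    return lines

x = torch.ones(3, requires_grad=True)
y = 2 * x
z = y * y

lines = walk(z.grad_fn)
print("\n".join(lines))
```

Since `z = y * y` uses `y` twice, the node for `y` shows up twice in the walk, matching the two identical entries in `z.grad_fn.next_functions` above.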
