# Machine Learning with PyTorch

## Understanding PyTorch

* Tensors and NumPy interfaces
* Autograd
* Using GPUs with `torch.cuda`
* Parallelizing on clusters with `torch.distributed`
* Creating a neural network with `torch.nn`

## Tensors and NumPy interfaces

At first glance, PyTorch tensors are very similar to NumPy arrays. Both store multi-dimensional data efficiently, and much of the same "vectorized" style of operation applies to both; broadcasting and elementwise operations behave similarly. Moreover, many of the same functions and methods exist in both PyTorch and NumPy, and PyTorch makes conversion between the two formats straightforward.
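As a small illustration of that similarity, the sketch below converts a NumPy array to a tensor and back, then applies a broadcast addition. The variable names are invented for this example. On the CPU, `torch.from_numpy` and `Tensor.numpy` share memory with the source array rather than copying it, which the last lines check.

```python
import numpy as np
import torch

# A 2x3 NumPy array of float32 values 0..5
arr = np.arange(6, dtype=np.float32).reshape(2, 3)

# NumPy -> PyTorch: the tensor shares memory with arr (no copy)
t = torch.from_numpy(arr)

# PyTorch -> NumPy: again a view onto the same memory for CPU tensors
back = t.numpy()

# Broadcasting works as in NumPy: the 1-D row is added to every row of t
row = torch.tensor([10.0, 20.0, 30.0])
summed = t + row  # a new tensor, independent of arr

# Because memory is shared, a write to arr is visible through t and back
arr[0, 0] = 100.0
shares_memory = (t[0, 0].item() == 100.0) and (back[0, 0] == 100.0)
```

If an independent copy is wanted instead, `torch.tensor(arr)` copies the data.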

PyTorch tensors go beyond NumPy arrays, and become necessary for neural networks, in a couple of key areas. As a not-so-minor matter, tensors can work on GPUs as well as CPUs with largely the same code, and this can often vastly speed up operations. NumPy does not build in that capability, but a number of projects make GPU array computation available outside of PyTorch, in varying ways (see [PyCUDA](https://documen.tician.de/pycuda/array.html), [Numba](https://numba.pydata.org/numba-doc/dev/cuda/index.html), [CuPy](https://cupy.chainer.org/), [MXNet](https://mxnet.incubator.apache.org/versions/master/tutorials/basic/ndarray.html), and probably others).
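A minimal sketch of what working on either device looks like follows. It assumes nothing about the machine: if no CUDA device is available, it falls back to the CPU, and the same code runs unchanged. The tensor sizes and names are arbitrary choices for this example.

```python
import torch

# Pick the GPU when one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the operands directly on the chosen device
a = torch.randn(200, 300, device=device)
b = torch.randn(300, 100, device=device)

# The matrix product is computed on whichever device holds a and b
c = a @ b

# Bring the result back to the CPU, e.g. for conversion to NumPy
c_cpu = c.cpu()
```

Operands must live on the same device; mixing a CPU tensor with a GPU tensor in one operation raises an error.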

You *could* use PyTorch simply for array computation on GPUs, but what is more likely to bring you here is an equally essential capability that array-focused libraries such as PyCUDA, Numba, and CuPy leave out by design: autograd. By recording every operation performed on tensors that have autograd enabled, PyTorch provides reverse-mode automatic differentiation. That is to say, it builds a directed acyclic graph whose leaves are the input tensors and whose roots are the output tensors, and computes gradients by tracing that graph from the roots back to the leaves. We explain this more below.
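To make the idea concrete before the longer walk-through, here is a minimal sketch with a function whose derivative is easy to check by hand. The names are chosen for this example. For `y = sum(x ** 2)`, the gradient with respect to `x` is `2 * x`.

```python
import torch

# A leaf tensor with autograd enabled
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Each operation is recorded in the graph; y is the root
y = (x ** 2).sum()

# Walk the graph from the root back to the leaves, filling in x.grad
y.backward()

# d/dx of sum(x^2) is 2*x, so x.grad holds [2., 4., 6.]
grad = x.grad
```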

## Autograd

In [228]:
import torch

x = torch.randn(3, 3, requires_grad=True)
print("Random 2-D tensor")
print(x)

# Perform an operation on tensor
y = (x + 7) * 5
print("\nRandom 2-D shifted by 7, multiplied by 5")
print(y)

Random 2-D tensor
tensor([[ 0.5615, -1.1130, -1.0479],
        [-0.6518, -0.3695, -1.1789],
        [ 0.4406,  0.3704, -0.1163]], requires_grad=True)

Random 2-D shifted by 7, multiplied by 5
tensor([[37.8076, 29.4348, 29.7605],
        [31.7411, 33.1524, 29.1055],
        [37.2028, 36.8518, 34.4184]], grad_fn=<MulBackward0>)


In [232]:
v = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=torch.float)

# Calling grad_fn applies the backward step of the last operation
# (multiplication by 5) to v; the offset of 7 does not affect the slope
y.grad_fn(v)

(tensor([[ 5., 10., 15.],
         [20., 25., 30.],
         [35., 40., 45.]]), None)
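Calling `grad_fn` directly is handy for inspection, but the same vector-Jacobian product can be requested through the public `torch.autograd.grad` function. The sketch below rebuilds `x`, `y`, and `v` so that it runs on its own; `grad_x` is a name chosen for this example. Because `y = (x + 7) * 5`, the derivative of each element of `y` with respect to the matching element of `x` is 5, so the result should equal `5 * v`.

```python
import torch

x = torch.randn(3, 3, requires_grad=True)
y = (x + 7) * 5

v = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=torch.float)

# Propagate v backward through the whole graph from y to x.
# grad_outputs supplies the vector for the vector-Jacobian product.
(grad_x,) = torch.autograd.grad(y, x, grad_outputs=v)
```

Unlike `y.grad_fn(v)`, which applies only the final multiplication's backward step, this traverses every recorded operation down to the leaf `x`. Here both give the same numbers because the addition has slope 1.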

In [237]:
# Perform an additional vectorized operation, then a reduction
z = y * 3
out = z.mean()

print("Multiplied by 3")
print(z)
print("\nMean of values")
print(out)

Multiplied by 3
tensor([[113.4228,  88.3044,  89.2815],
        [ 95.2233,  99.4572,  87.3164],
        [111.6083, 110.5554, 103.2551]], grad_fn=<MulBackward0>)

Mean of values
tensor(99.8249, grad_fn=<MeanBackward1>)


In [258]:
grad = out.grad_fn
indent = 1
while True:
    print(" "*indent, "-->", grad)
    if not grad.next_functions:
        break
    grad = grad.next_functions[0][0]
    indent += 1

  --> <MeanBackward1 object at 0x11df8d0b8>
   --> <MulBackward0 object at 0x11d4d9a20>
    --> <MulBackward0 object at 0x11df82cf8>
     --> <AddBackward0 object at 0x11df8d240>
      --> <AccumulateGrad object at 0x11df8d358>
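The chain printed above is what `backward()` traverses. As a sketch, rebuilding the same computation and calling `out.backward()` fills in `x.grad`. Since `out` is the mean of nine values, each equal to `15 * x + 105`, every entry of the gradient should be `15 / 9`.

```python
import torch

x = torch.randn(3, 3, requires_grad=True)

# Same chain as above: add 7, multiply by 5, multiply by 3, take the mean
y = (x + 7) * 5
z = y * 3
out = z.mean()

# Traverse MeanBackward -> MulBackward -> MulBackward -> AddBackward
# and accumulate the result into the leaf tensor's .grad attribute
out.backward()

# Each of the 9 elements contributes (1/9) * 3 * 5 to the gradient
expected = torch.full((3, 3), 15 / 9)
```

The `AccumulateGrad` node at the end of the printed chain is the step that writes into `x.grad`.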


In [174]:
import torch

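# N is the batch size; D_in is the input dimension;
# H is the hidden dimension; D_out is the output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Random input and target data; no gradients are needed for these
x = torch.randn(N, D_in, dtype=torch.float, requires_grad=False)
y = torch.randn(N, D_out, dtype=torch.float, requires_grad=False)

# Randomly initialized weights; autograd tracks operations on these
w1 = torch.randn(D_in, H, dtype=torch.float, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float, requires_grad=True)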

learning_rate = 1e-6
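for t in range(500):
    # Forward pass: linear layer, ReLU (clamp at zero), linear layer
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Loss is the sum of squared errors
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Backward pass: populate w1.grad and w2.grad
    loss.backward()

    # Update the weights without recording the update in the graph
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Reset gradients, which otherwise accumulate across iterations
        w1.grad.zero_()
        w2.grad.zero_()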

0 30092430.0
1 25721254.0
2 22510530.0
3 18227932.0
4 13278302.0
5 8774677.0
6 5520158.0
7 3467485.5
8 2272035.25
9 1582319.0
10 1173364.375
11 915979.875
12 742447.75
13 617237.875
14 522056.9375
15 446764.28125
16 385746.03125
17 335264.25
18 293016.1875
19 257211.3125
20 226642.8125
21 200417.609375
22 177765.890625
23 158109.1875
24 141002.0625
25 126028.1953125
26 112872.28125
27 101297.484375
28 91083.59375
29 82051.4296875
30 74049.109375
31 66931.484375
32 60588.95703125
33 54926.48046875
34 49858.8203125
35 45316.52734375
36 41237.953125
37 37568.44140625
38 34263.0234375
39 31282.5859375
40 28595.404296875
41 26169.27734375
42 23972.61328125
43 21978.310546875
44 20168.7109375
45 18523.453125
46 17025.548828125
47 15660.7734375
48 14415.55859375
49 13279.970703125
50 12243.6318359375
51 11295.333984375
52 10427.42578125
53 9632.2724609375
54 8902.85546875
55 8233.1533203125
56 7617.99658203125
57 7053.326171875
58 6533.890625
59 6055.77978515625
60 5615.45263671875
61 5209.82
...
375 0.00019402771431487054
376 0.0001885614765342325
377 0.0001830369874369353
378 0.00017752006533555686
379 0.0001724722096696496
380 0.00016776005213614553
381 0.0001630315964575857
382 0.0001583377452334389
383 0.00015381410776171833
384 0.00014971845666877925
385 0.0001458066690247506
386 0.00014156801626086235
387 0.00013760363799519837
388 0.00013435866276267916
389 0.000130480169900693
390 0.00012713923933915794
391 0.00012387083552312106
392 0.00012071116361767054
393 0.00011767202522605658
394 0.0001146420108852908
395 0.00011160575377289206
396 0.00010900406050495803
397 0.00010604845738271251
398 0.00010361430031480268
399 0.00010104291141033173
400 9.868395864032209e-05
401 9.610589768271893e-05
402 9.365149162476882e-05
403 9.127856174018234e-05
404 8.924103167373687e-05
405 8.698945021023974e-05
406 8.472665649605915e-05
407 8.327760588144884e-05
408 8.126563625410199e-05
409 7.921575888758525e-05
410 7.762937457300723e-05
411 7.59475224185735e-05
412 7.40181640139781e-0
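The manual weight update and gradient reset in the loop above can also be delegated to an optimizer. The sketch below is one way to do that with `torch.optim.SGD`, not the approach used in the cell above. The fixed seed, the shorter run of 100 steps, and the `losses` list are assumptions added so the example is repeatable and easy to check.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

# Plain stochastic gradient descent over the two weight tensors
optimizer = torch.optim.SGD([w1, w2], lr=1e-6)

losses = []  # collected instead of printed, for this sketch only
for t in range(100):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    losses.append(loss.item())

    optimizer.zero_grad()  # clear gradients from the previous step
    loss.backward()        # compute new gradients
    optimizer.step()       # apply w -= lr * w.grad to each weight
```

With plain SGD and no momentum, `optimizer.step()` performs the same update as the hand-written `w -= learning_rate * w.grad`.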

## Using GPUs with `torch.cuda`

## Parallelizing on clusters with `torch.distributed`

## Creating a neural network with `torch.nn`