# PyTorch 101

#### Date : 2019.04.05
#### Jen-Huan Hu

##### The Tensor : 
- One type covers every case, whether or not a gradient is needed: there is no need for separate Variable, Input, or constant types.  
Gradient tracking is switched on with ```requires_grad=True```.
- Can be put either on CPU or on GPU
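
As a minimal sketch of the two points above (the names `w`, `c`, and `out` are illustrative and not from the original notebook), the same tensor type can act as a trainable parameter or as a constant. Only the `requires_grad` flag differs:

```python
import torch

# Assumption for portability: fall back to the CPU when no GPU is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Same type, different roles: w is tracked by autograd, c is a plain constant.
w = torch.ones(3, device=device, requires_grad=True)
c = torch.full((3,), 2.0, device=device)

# Any result that depends on w is tracked as well.
out = (w * c).sum()

print(w.requires_grad)    # True
print(c.requires_grad)    # False
print(out.requires_grad)  # True
print(out.item())         # 6.0
```

Tensors created without the flag default to `requires_grad=False`, so constants and inputs need no special handling.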

In [1]:
# Adapted from the example code in https://pytorch.org/tutorials/beginner/
import torch
dtype = torch.float
# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(10, 2, device=device, dtype=dtype)
y = torch.randn(10, 3, device=device, dtype=dtype)
print(x)
print(y)

tensor([[-0.3004,  0.9528],
        [-0.0706,  0.3140],
        [-0.1106, -0.7706],
        [-0.7284, -0.1796],
        [ 0.5580, -0.4967],
        [ 0.8949, -0.6340],
        [ 0.9921, -0.0614],
        [-0.2118, -0.3671],
        [-2.5234,  1.0110],
        [-2.1622,  1.8061]], device='cuda:0')
tensor([[ 0.9422, -0.9047, -0.2429],
        [-2.2105,  0.3179, -0.6145],
        [ 0.0822, -0.3188,  1.0322],
        [ 0.3550,  0.8601, -1.2159],
        [-1.4600, -0.3394,  0.6525],
        [ 1.2568,  0.8458, -1.6936],
        [ 0.9896,  0.5678, -0.4366],
        [ 2.4846,  2.3004, -0.8342],
        [ 1.8072, -0.6414,  0.9406],
        [ 2.0180, -0.6870, -1.1373]], device='cuda:0')


Each tensor contains ```.data``` and ```.grad```.
- ```.data``` is the current value. Unlike TensorFlow, which requires an *execution mode* to display a value, evaluating a PyTorch tensor shows its value directly.
- ```.grad``` is the gradient accumulated into the tensor by ```backward()```. It is ```None``` until a backward pass has filled it.

In [10]:
print(x.data)
print(x.grad)

tensor([[ 0.7978, -0.7820],
        [ 1.5388, -0.1608],
        [-0.2886,  0.3207],
        [-0.5638,  0.7177],
        [-1.6463,  1.5153],
        [-0.9010,  1.7310],
        [-0.1920,  0.7477],
        [ 1.6201,  0.4547],
        [-1.4789, -0.2135],
        [-1.4830,  1.1082]], device='cuda:0')
None
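
`x.grad` prints `None` here because `x` was created without `requires_grad=True` and no backward pass has run. A small sketch of how `.grad` gets filled in, using a hypothetical tensor `t` that is not part of the original notebook:

```python
import torch

t = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
grad_before = t.grad
print(grad_before)       # None: nothing has been back-propagated yet

s = (t ** 2).sum()       # s = t1^2 + t2^2 + t3^2
s.backward()             # computes ds/dt = 2 * t and stores it in t.grad
grad_after = t.grad.clone()
print(grad_after)        # tensor([2., 4., 6.])

# Gradients accumulate across backward() calls, so reset them between steps.
t.grad.zero_()
print(t.grad)            # tensor([0., 0., 0.])
```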


#### Demo of ***autograd*** :
Let  
$$
\begin{aligned}
y &= xA + b \\
\mathrm{loss} &= \sum (\bar{y} - y)^2
\end{aligned}
$$

with $y$ the output of the linear model and $\bar{y}$ the ground truth.

In [9]:
N = 10
dim = 5
learning_rate = 0.001
x = torch.randn(N, dim, device=device, dtype=dtype)
# Parameters to learn: requires_grad=True makes autograd track them
A = torch.randn(dim, dim, device=device, dtype=dtype, requires_grad=True)
b = torch.randn(dim, device=device, dtype=dtype, requires_grad=True)
# Ground truth: an exact fit is A = 20 * I and b = 5
bar_y = x * 20 + 5
print(x.shape)
print(A.shape)
print(b.shape)
print(bar_y.shape)

for i in range(3200):
    # Forward pass of the linear model
    y = torch.add(torch.mm(x, A), b)
    # Sum of squared errors
    loss = (bar_y - y).pow(2).sum()
    # Backward pass: fills A.grad and b.grad
    loss.backward()
    with torch.no_grad():
        # Gradient descent step, kept out of the autograd graph
        A -= learning_rate * A.grad
        b -= learning_rate * b.grad
        # Manually zero the gradients after updating weights
        A.grad.zero_()
        b.grad.zero_()
    if i % 100 == 0:
        print(loss.item())

print(A)
print(b)

torch.Size([10, 5])
torch.Size([5, 5])
torch.Size([5])
torch.Size([10, 5])
22203.67578125
840.3851318359375
323.9636535644531
181.81585693359375
118.34613037109375
81.90540313720703
58.22930908203125
41.95325469970703
30.454334259033203
22.208675384521484
16.242908477783203
11.902196884155273
8.732269287109375
6.411769390106201
4.710416793823242
3.4616994857788086
2.5445878505706787
1.870729684829712
1.3754652738571167
1.0113813877105713
0.7436903715133667
0.5468733906745911
0.4021511673927307
0.2957342267036438
0.2174730747938156
0.15992480516433716
0.1176069974899292
0.0864873081445694
0.06360244005918503
0.04677269235253334
0.03439600020647049
0.025294408202171326
tensor([[ 2.0000e+01, -9.1572e-04, -3.4387e-04, -2.0627e-03,  9.8785e-04],
        [-2.1653e-03,  1.9989e+01, -4.0026e-03, -2.6654e-02,  1.2648e-02],
        [-5.8809e-04, -2.9622e-03,  1.9999e+01, -6.9037e-03,  3.2779e-03],
        [-8.3797e-03, -4.4591e-02, -1.5680e-02,  1.9895e+01,  4.9695e-02],
        [ 4.3944e-03,  2

So the diagonal elements of ```A``` all converge to about 20, the off-diagonal elements stay near 0, and the bias is close to 5.  
This test shows how PyTorch ***autograd*** works and how a manual gradient descent loop is written.
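
To make the autograd step less of a black box, the gradients it produces can be compared with hand-derived ones. For the sum-of-squares loss used in the loop, with $y = xA + b$, the chain rule gives $\partial\,\mathrm{loss} / \partial A = 2\,x^{T}(y - \bar{y})$ and $\partial\,\mathrm{loss} / \partial b = 2 \sum_n (y_n - \bar{y}_n)$. The sketch below is not part of the original notebook. It assumes CPU tensors in `float64` (chosen only to make the comparison tight) and introduces the illustrative names `residual`, `grad_A_manual`, and `grad_b_manual`:

```python
import torch

torch.manual_seed(0)
N, dim = 10, 5
x = torch.randn(N, dim, dtype=torch.float64)
A = torch.randn(dim, dim, dtype=torch.float64, requires_grad=True)
b = torch.randn(dim, dtype=torch.float64, requires_grad=True)
bar_y = x * 20 + 5

# Same forward pass and loss as in the training loop
y = torch.mm(x, A) + b
loss = (bar_y - y).pow(2).sum()
loss.backward()

# Hand-derived gradients of the sum-of-squares loss
with torch.no_grad():
    residual = y - bar_y
    grad_A_manual = 2 * torch.mm(x.t(), residual)
    grad_b_manual = 2 * residual.sum(dim=0)

a_matches = torch.allclose(A.grad, grad_A_manual)
b_matches = torch.allclose(b.grad, grad_b_manual)
print(a_matches)  # True
print(b_matches)  # True
```

Both checks agreeing means that `A.grad` and `b.grad` are exactly the quantities a hand-written gradient descent would use.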

---

##### Tensor creation from *NumPy* :