## autograd
* When you finish your computation, you can call .backward() to have all the gradients computed automatically
* The gradient for each tensor is accumulated into its .grad attribute
* **To stop a tensor from tracking history**, you can call .detach()
* To prevent tracking history and using memory, you can also wrap the code block in 
    * **with torch.no_grad():**
* This is useful when evaluating a model: it may have trainable parameters with **requires_grad=True**,
    * but for which we don't need the gradients.
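
* A minimal sketch of what "accumulated into .grad" means in practice. The tensor name w and the function 3 * w are made up for illustration; calling .backward() twice adds the second gradient to the first, so .grad is usually reset between passes.

```python
import torch

# Hypothetical toy tensor, chosen only for illustration.
w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d/dw sum(3 * w) = 3 for every element.
(3 * w).sum().backward()
first = w.grad.clone()

# Second backward pass on a fresh graph: the new gradient is ADDED to .grad.
(3 * w).sum().backward()
second = w.grad.clone()

print(first)   # tensor([3., 3.])
print(second)  # tensor([6., 6.])

# Reset in place so later passes start from zero.
w.grad.zero_()
print(w.grad)  # tensor([0., 0.])
```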

## Function
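* There's one more class which is very important for the autograd implementation: **Function**
* Tensor and Function are interconnected and build up an acyclic graph that encodes the history of computation
    * each tensor has a **.grad_fn** attribute that references the Function that created it

* If you want to compute the derivatives,
    * you can call .backward() on a Tensor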
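
* A small sketch of how .grad_fn links tensors into a graph, using made-up tensors a, b and c. Tensors created directly by the user are leaves with no grad_fn, while results of operations carry the Function that produced them.

```python
import torch

a = torch.ones(2, requires_grad=True)  # created by the user: a leaf tensor
b = a * 2                              # created by a multiplication
c = b.sum()                            # created by a sum

print(a.grad_fn)                 # None
print(a.is_leaf, b.is_leaf)      # True False
print(type(b.grad_fn).__name__)  # MulBackward0
print(type(c.grad_fn).__name__)  # SumBackward0

# Each node records its inputs' nodes in next_functions,
# which is how the graph is walked backwards from c to a.
parent = c.grad_fn.next_functions[0][0]
print(type(parent).__name__)     # MulBackward0
```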

In [1]:
import torch

In [3]:
x = torch.ones(2,2, requires_grad = True)
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


In [4]:
y = x + 2
print(y)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


In [5]:
print(y.grad_fn)

<AddBackward0 object at 0x0000028DDBDBC208>


In [6]:
z = y * y * 3
out = z.mean()
print(z,out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)


* .requires_grad_(...) changes an existing Tensor's requires_grad flag in-place.
    * a tensor's requires_grad flag is False by default unless it is set at creation

In [7]:
a = torch.randn(2,2)
a = ((a*3)/(a-1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a*a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x0000028DDBDBCE88>


* Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).

In [8]:
out.backward()

In [9]:
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
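
* A short derivation of why every entry of x.grad is 4.5, writing $o$ for out and $z_i$ for the entries of z (symbols introduced here only for illustration):

$$ o = \frac{1}{4}\sum_i z_i, \qquad z_i = 3(x_i + 2)^2 $$
$$ \frac{\partial o}{\partial x_i} = \frac{1}{4} \cdot 6(x_i + 2) = \frac{3}{2}(x_i + 2) $$
$$ \left.\frac{\partial o}{\partial x_i}\right|_{x_i = 1} = \frac{9}{2} = 4.5 $$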


* torch.autograd is an engine for computing vector-Jacobian products: given the Jacobian $J$ of $y = f(x)$ and any vector $v$, it computes $v^T J$.
$$ v = (v_1, v_2, \cdots, v_m)^T $$
* Now let’s take a look at an example of a vector-Jacobian product:

In [10]:
x = torch.randn(3, requires_grad = True)

y = x * 2
while y.detach().norm() < 1000:
    y = y * 2
print(y)

tensor([  662.8099,   530.6400, -1470.2668], grad_fn=<MulBackward0>)


* In this case y is no longer a scalar. backward() does not compute the full Jacobian directly, but if we only want the vector-Jacobian product, we simply pass the vector to backward as an argument:

In [11]:
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)
print(x.grad)

tensor([1.0240e+02, 1.0240e+03, 1.0240e-01])
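
* A hedged check that backward(v) really returns $v^T J$. The function f below is a made-up stand-in for the doubling loop above (an elementwise square, so $J$ is diagonal), and torch.autograd.functional.jacobian, available in recent PyTorch releases, is used only to build the full Jacobian for comparison.

```python
import torch
from torch.autograd.functional import jacobian

def f(t):
    # Hypothetical example function: elementwise square, so J = diag(2 * t).
    return t ** 2

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001])

# Vector-Jacobian product via backward: x.grad now holds v^T J.
y = f(x)
y.backward(v)

# Full 3x3 Jacobian, computed separately for comparison.
J = jacobian(f, x.detach())

print(torch.allclose(x.grad, v @ J))  # True
```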


* You can stop autograd from tracking history on Tensors with requires_grad=True either by wrapping the code block in **with torch.no_grad():**

In [12]:
print(x.requires_grad)
print((x**2).requires_grad)

with torch.no_grad():
    print((x**2).requires_grad)

True
True
False
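
* The same idea applied to model evaluation, as a sketch: torch.nn.Linear stands in for an arbitrary model here, and the names model, inputs and preds are assumptions for illustration. The parameters still have requires_grad=True, but nothing computed inside the block is tracked.

```python
import torch

model = torch.nn.Linear(3, 1)  # hypothetical toy model
inputs = torch.randn(4, 3)     # hypothetical batch of 4 samples

print(model.weight.requires_grad)  # True

# Inside no_grad no graph is recorded, which saves memory during evaluation.
with torch.no_grad():
    preds = model(inputs)

print(preds.requires_grad)  # False
print(preds.grad_fn)        # None
```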


* Or by using .detach() to get a new Tensor with the same content but that does not require gradients:

In [13]:
print(x.requires_grad)
y = x.detach()
print(y.requires_grad)
print(x.eq(y).all())

True
False
tensor(True)
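
* One caveat worth a sketch: .detach() returns a tensor that shares storage with the original, so in-place changes to the detached tensor also show up in the original. The tensors below are made up for illustration; adding .clone() gives an independent copy.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x.detach()

# Both tensors point at the same underlying memory.
print(x.data_ptr() == y.data_ptr())  # True

# An in-place write through y is visible through x.
y[0] = 5.0
print(x)  # tensor([5., 1., 1.], requires_grad=True)

# detach().clone() copies the data, so x is left untouched.
z = x.detach().clone()
z[1] = 7.0
print(x[1].item())  # 1.0
```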


In [16]:
# print(help(x.eq(y)))
# Documentation for autograd.Function: https://pytorch.org/docs/stable/autograd.html#function