# 03 Autograd

In this part, we learn about Autograd and how to compute gradients with it.

# What is Autograd?


Autograd is a PyTorch package that automatically calculates gradients, which are crucial for optimizing models.

`grad` stands for gradient: the vector of partial derivatives of a function with respect to each parameter.
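As a small illustration, take the hypothetical function f(x0, x1) = x0**2 + 3*x1, which is not from the original. Its gradient is the vector of its two partial derivatives, (2*x0, 3). The sketch below checks that autograd produces exactly that vector at the point (2, 5):

```python
import torch

# Hypothetical example function: f(x0, x1) = x0**2 + 3*x1
x = torch.tensor([2.0, 5.0], requires_grad=True)
f = x[0] ** 2 + 3 * x[1]

# Backpropagate from the scalar f to fill x.grad
f.backward()

# The gradient collects both partial derivatives: [df/dx0, df/dx1] = [2*x0, 3]
print(x.grad)  # tensor([4., 3.])
```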




# Autograd package

Autograd provides automatic differentiation for all operations on Tensors.

To compute gradients with respect to a tensor, we have to set `requires_grad=True`.

- With this attribute set, all operations on the tensor are tracked in the computational graph.

Each tensor produced by such an operation has a `grad_fn` attribute that refers to the function that created it.
- In the program below, `y` is part of the computational graph and has a `grad_fn` attribute.
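The sketch below contrasts an untracked tensor with a tracked one. It also uses `is_leaf`, an attribute the original does not mention, to show that only tensors produced by an operation carry a `grad_fn`:

```python
import torch

# A tensor created without requires_grad is not tracked
plain = torch.randn(3)
print(plain.requires_grad)   # False
print((plain + 2).grad_fn)   # None: no graph is recorded

# A tensor created with requires_grad=True is a tracked leaf
x = torch.randn(3, requires_grad=True)
y = x + 2
print(x.is_leaf, x.grad_fn)  # True None: leaves are created by the user, not by an operation
print(y.is_leaf, y.grad_fn)  # y is produced by an operation, so it has a grad_fn
```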


[torch.randn](https://pytorch.org/docs/stable/generated/torch.randn.html): Returns a tensor filled with random numbers from a normal distribution with mean 0 and variance 1 (the standard normal distribution).
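A quick usage sketch of `torch.randn`. It adds `torch.manual_seed`, which is not used in the original, so the numbers are reproducible. It also checks empirically that a large sample has a mean near 0 and a variance near 1:

```python
import torch

torch.manual_seed(0)  # fix the random seed so the numbers are reproducible

# randn takes the shape as positional sizes; requires_grad is an optional keyword
a = torch.randn(2, 3)
b = torch.randn(3, requires_grad=True)
print(a.shape, b.requires_grad)  # torch.Size([2, 3]) True

# With many samples, the empirical mean is close to 0 and the variance close to 1
samples = torch.randn(100_000)
print(samples.mean().item(), samples.var().item())
```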


```python
import torch

# Track all operations on x so that gradients can be computed later
x = torch.randn(3, requires_grad=True)
y = x + 2

print(x)
print(y)
print(y.grad_fn)  # the backward function that created y

z = y * y * 3
print(z)
z = z.mean()      # reduce to a scalar so that backward() can be called without arguments
print(z)
```

```
tensor([-0.3206,  0.7642, -0.7820], requires_grad=True)
tensor([1.6794, 2.7642, 1.2180], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x7fcc013177c0>
tensor([ 8.4616, 22.9223,  4.4503], grad_fn=<MulBackward0>)
tensor(11.9447, grad_fn=<MeanBackward0>)
```


# Computational graph

Operations with tensors create a computational graph.

First, we do a forward pass to calculate the output `y`.
- If `requires_grad=True`, PyTorch automatically creates and stores a backward function for each operation. This function is then used in backpropagation.

In this program, `y` has a `grad_fn` attribute that points to this gradient function. The function is used during backpropagation to compute the gradients.

For `y = x + 2`, the function is `AddBackward0`. With it, autograd can calculate the gradient of `y` with respect to `x`.
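The graph itself can be inspected by following each node's `next_functions`, which holds `(node, input_index)` pairs pointing back toward the inputs. The sketch below is an illustration only: `graph_nodes` is a hypothetical helper, and a node shared by several operations (like `AddBackward0` in `y * y`) may be listed more than once. The walk ends at `AccumulateGrad`, the node that writes into `x.grad`:

```python
import torch

def graph_nodes(fn):
    """Hypothetical helper: list the names of all backward nodes reachable from fn."""
    names, stack = [], [fn]
    while stack:
        node = stack.pop()
        if node is None:  # None marks an input that does not require a gradient
            continue
        names.append(type(node).__name__)
        # next_functions holds (node, input_index) pairs pointing toward the inputs
        stack.extend(next_fn for next_fn, _ in node.next_functions)
    return names

x = torch.randn(3, requires_grad=True)
y = x + 2
z = (y * y * 3).mean()
print(graph_nodes(z.grad_fn))
```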




![Computational graph](https://drive.google.com/uc?id=1H-cON2ukow9bAvcSa_E7sQVTRclX9H8y)


# Compute the gradients with backpropagation

When we finish our computation, we can simply call `.backward()` to have all the gradients computed automatically. The gradient for each leaf tensor with `requires_grad=True` is accumulated into its `.grad` attribute. It is the partial derivative of the function with respect to that tensor.




```python
z.backward()
print(x.grad)  # dz/dx
```

```
tensor([3.3589, 5.5284, 2.4359])
```
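The printed gradient can be checked by hand. Since z = mean(3 * y_i**2) over three elements = sum(y_i**2) with y_i = x_i + 2, we get dz/dx_i = 2 * (x_i + 2). This is twice the `y` values printed earlier, up to rounding. A minimal sketch that repeats the computation with fresh random values and compares against this formula:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x + 2
z = (y * y * 3).mean()
z.backward()

# Analytic gradient: z = (1/3) * sum(3 * y_i**2) = sum(y_i**2), so dz/dx_i = 2 * (x_i + 2)
expected = 2 * (x.detach() + 2)
print(torch.allclose(x.grad, expected))  # True
```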


In general, `torch.autograd` computes a vector-Jacobian product: it evaluates the partial derivatives by applying the chain rule backward through the graph.
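To make "vector-Jacobian product" concrete, the sketch below builds the full Jacobian explicitly with `torch.autograd.functional.jacobian`, which is not used in the original. It then checks that `backward(v)` yields v^T J. The function `f` is a hypothetical example:

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Hypothetical non-scalar function used only for illustration
    return x ** 2 * 3

x = torch.randn(3, requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001])

# Explicit Jacobian J[i, j] = d f_i / d x_j (3x3 here, diagonal with entries 6 * x_j)
J = jacobian(f, x)

# backward(v) computes the vector-Jacobian product v^T J without building J
f(x).backward(v)
print(torch.allclose(x.grad, v @ J))  # True
```

In practice autograd never materializes `J`. Computing `v^T J` directly is what keeps backpropagation cheap for large models.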


```python
# Model with non-scalar output:
# If a tensor is non-scalar (more than one element), backward() needs a gradient vector argument
x = torch.randn(3, requires_grad=True)

y = x * 2
for _ in range(10):
    y = y * 2  # after the loop, y = 2**11 * x = 2048 * x

print(y)
print(y.shape)

# backward(v) computes the vector-Jacobian product v^T J, here 2048 * v
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float32)
y.backward(v)
print(x.grad)
```

```
tensor([-2066.9714, -2159.9631, -3006.5088], grad_fn=<MulBackward0>)
torch.Size([3])
tensor([2.0480e+02, 2.0480e+03, 2.0480e-01])
```
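To see why the argument is needed, here is a minimal sketch of what happens without it. Calling `backward()` on a non-scalar output raises a `RuntimeError`. Passing a vector of ones gives the same gradient as backpropagating from `y.sum()`:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2  # non-scalar output with 3 elements

# Calling backward() without a gradient argument on a non-scalar output fails
raised = False
try:
    y.backward()
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Passing a vector of ones gives the same result as backpropagating from y.sum()
y.backward(torch.ones_like(y))
print(x.grad)  # tensor([2., 2., 2.])
```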


# Stop a tensor from tracking history

For example, during a training loop, the operation that updates our weights should not be part of the gradient computation. There are three ways to stop gradient tracking.


# .requires_grad_(): changes an existing flag in-place


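`requires_grad_()` sets the `requires_grad` flag of an existing tensor in-place. This flag controls whether autograd tracks operations on the tensor.
Because it modifies the tensor itself, the change is visible through every variable that refers to the same tensor. For a leaf tensor `a` (one created directly rather than by an operation), calling `a.requires_grad_(False)` stops gradient tracking for `a`.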


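```python
a = torch.randn(2, 2)
print(a.requires_grad)    # False by default
b = ((a * 3) / (a - 1))
print(b.grad_fn)          # None: nothing was tracked
a.requires_grad_(True)    # switch tracking on in-place
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)          # now b has a backward function
```

```
False
None
True
<SumBackward0 object at 0x7fcc01317d90>
```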
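The sketch below illustrates the two caveats from the text. The flag change is visible through every reference to the same tensor. Switching the flag off is only allowed on leaf tensors, and PyTorch raises a `RuntimeError` for a non-leaf. The name `alias` is only for illustration:

```python
import torch

a = torch.randn(2, 2)
alias = a                    # a second name for the same tensor object
a.requires_grad_(True)       # in-place: changes the flag on the shared tensor
print(alias.requires_grad)   # True: the change is visible through every reference

b = a * 2                    # b is a non-leaf tensor produced by an operation
raised = False
try:
    b.requires_grad_(False)  # only leaf tensors may have the flag switched off
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

a.requires_grad_(False)      # allowed, because a is a leaf
print((a * 2).grad_fn)       # None: new operations on a are no longer tracked
```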


# .detach(): get a new Tensor with the same content but no gradient computation:

`.detach()` creates a new tensor with the same data as the original tensor but without gradient tracking. The new tensor shares memory with the original rather than copying it. If `b = a.detach()`, `b` has the values of `a`, but operations on `b` are not tracked.


```python
a = torch.randn(2, 2, requires_grad=True)
print(a.requires_grad)
b = a.detach()  # same data, no gradient tracking
print(b.requires_grad)
```

```
True
False
```
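A minimal sketch of the two properties described above. It uses `data_ptr()`, not used in the original, to confirm that the detached tensor shares storage with the original. It then shows that gradients flow only through the non-detached term:

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a.detach()

# b shares the underlying storage with a (no copy is made)
print(b.data_ptr() == a.data_ptr())  # True

# Gradients flow through `a * 2` but not through the detached term `b * 3`
c = (a * 2 + b * 3).sum()
c.backward()
print(a.grad)  # tensor([2., 2., 2.])
```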


# Wrap in `with torch.no_grad():`

`torch.no_grad()` temporarily disables gradient tracking for all computations inside the `with` block.

```python
a = torch.randn(2, 2, requires_grad=True)
print(a.requires_grad)
with torch.no_grad():
    print((a ** 2).requires_grad)  # operations inside the block are not tracked
```

```
True
False
```
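A slightly fuller sketch of `torch.no_grad()`. It uses `torch.is_grad_enabled()` to show that tracking is only switched off inside the block. It also shows the decorator form, which applies the same behavior to a whole function. The function `predict` is a hypothetical example:

```python
import torch

a = torch.randn(2, 2, requires_grad=True)

with torch.no_grad():
    print(torch.is_grad_enabled())  # False inside the block
    c = a * 2
print(torch.is_grad_enabled())      # True again after the block
print(c.requires_grad, c.grad_fn)   # False None

# torch.no_grad() can also be used as a decorator for a whole function
@torch.no_grad()
def predict(t):
    return t * 3

print(predict(a).requires_grad)     # False
```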


# Empty gradients
`backward()` accumulates the gradient of each leaf tensor into its `.grad` attribute, adding to whatever is already stored there.
So we need to call `.grad.zero_()` to empty the gradients before a new optimization step.
Otherwise, gradients from different iterations are summed together and training will not be correct.
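A minimal sketch of the accumulation. Two backward passes without zeroing double the gradient, and zeroing in place restores the true value. The tensor `w` is a hypothetical example:

```python
import torch

w = torch.ones(2, requires_grad=True)

# Two backward passes without zeroing: the gradients are summed
(w * 3).sum().backward()
(w * 3).sum().backward()
print(w.grad)  # tensor([6., 6.])

# Reset in place, then a single pass gives the true gradient again
w.grad.zero_()
(w * 3).sum().backward()
print(w.grad)  # tensor([3., 3.])
```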



```python
weights = torch.ones(4, requires_grad=True)

for epoch in range(3):
    # just a dummy example
    model_output = (weights * 3).sum()
    model_output.backward()

    print(weights.grad)

    # optimize model, i.e. adjust weights (inside no_grad so the update is not tracked)
    with torch.no_grad():
        weights -= 0.1 * weights.grad

    # this is important! It affects the final weights & output
    weights.grad.zero_()

print(weights)
print(model_output)
```

```
tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([0.1000, 0.1000, 0.1000, 0.1000], requires_grad=True)
tensor(4.8000, grad_fn=<SumBackward0>)
```
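In practice, the manual update and the zeroing are usually delegated to an optimizer. The sketch below assumes `torch.optim.SGD`, which the original does not use. With `lr=0.1` it performs the same `weights -= 0.1 * weights.grad` step outside the graph, and `optimizer.zero_grad()` takes the place of `weights.grad.zero_()`:

```python
import torch

weights = torch.ones(4, requires_grad=True)
optimizer = torch.optim.SGD([weights], lr=0.1)  # plain gradient descent, same step size as above

for epoch in range(3):
    model_output = (weights * 3).sum()  # same dummy "model" as above
    model_output.backward()
    optimizer.step()       # weights -= lr * weights.grad, without tracking the update
    optimizer.zero_grad()  # clear the accumulated gradients before the next iteration

print(weights)       # tensor([0.1000, 0.1000, 0.1000, 0.1000], requires_grad=True)
print(model_output)
```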
