# Automatic differentiation in PyTorch

## References
- [pytorch-tutorial > pytorch_basics](https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/01-basics/pytorch_basics/main.py)
- [Autograd mechanics](http://pytorch.org/docs/0.3.0/notes/autograd.html)

In [1]:
import numpy as np
import torch
import torch.nn as nn

In [7]:
# Create the tensors
# With requires_grad=False a tensor is excluded from differentiation and its .grad is None
x = torch.tensor(1.0, requires_grad=True)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

# Build the computational graph
# y = 2 * x + 3
y = w * x + b

# Compute the gradients
y.backward()

# Print the gradients
print(x.grad)  # dy/dx = w = 2
print(w.grad)  # dy/dw = x = 1
print(b.grad)  # dy/db = 1

tensor(2.)
tensor(1.)
tensor(1.)
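
The comment about `requires_grad` can be checked directly. The following is a minimal sketch, assuming the same scalar setup as above except that `x` is created without `requires_grad=True`. Autograd then does not track `x`, so `x.grad` is still `None` after `backward()`.

```python
import torch

# x is created without requires_grad=True (the default is False),
# so autograd does not track it
x = torch.tensor(1.0)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

y = w * x + b
y.backward()

print(x.grad)  # None: no gradient is stored for x
print(w.grad)  # dy/dw = x = 1
print(b.grad)  # dy/db = 1
```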


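## Assorted automatic differentiation examples

- The examples from [Theanoの使い方 (2) 自動微分](http://aidiary.hatenablog.com/entry/20150518/1431954329) (a Japanese post on automatic differentiation in Theano), redone in PyTorch

### Example 1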
$y = x^2$

$\frac{dy}{dx} = 2x$

In [8]:
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2
y.backward()
print(x.grad)

tensor(4.)


### Example 2

$y = \exp(x)$

$\frac{dy}{dx} = \exp(x)$

In [9]:
x = torch.tensor(2.0, requires_grad=True)
y = torch.exp(x)
y.backward()
print(x.grad)

tensor(7.3891)


### Example 3
$y = \sin(x)$

$\frac{dy}{dx} = \cos(x)$

In [10]:
x = torch.tensor(np.pi, requires_grad=True)
y = torch.sin(x)
y.backward()
print(x.grad)

tensor(-1.)


### Example 4
$y = (x - 4)(x^2 + 6)$

$\frac{dy}{dx} = 3x^2 - 8x + 6$

In [11]:
x = torch.tensor(0.0, requires_grad=True)
y = (x - 4) * (x ** 2 + 6)
y.backward()
print(x.grad)

tensor(6.)


### Example 5
$y = (\sqrt x + 1)^3$

$\frac{dy}{dx} = \frac{3 (\sqrt x + 1)^2}{2 \sqrt x}$

In [12]:
x = torch.tensor(2.0, requires_grad=True)
y = (torch.sqrt(x) + 1) ** 3
y.backward()
print(x.grad)

tensor(6.1820)
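
The printed value can be compared against the closed-form derivative given above. This is a small sketch, assuming the same `x = 2.0`; the name `analytic` is introduced here only for illustration.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = (torch.sqrt(x) + 1) ** 3
y.backward()

# Closed-form derivative 3 (sqrt(x) + 1)^2 / (2 sqrt(x)),
# evaluated under no_grad so that it is not added to the graph
with torch.no_grad():
    analytic = 3 * (torch.sqrt(x) + 1) ** 2 / (2 * torch.sqrt(x))

print(x.grad, analytic)
print(torch.allclose(x.grad, analytic))
```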


### Example 6
$z = (x + 2 y)^2$

$\frac{\partial z}{\partial x} = 2(x + 2y)$

$\frac{\partial z}{\partial y} = 4(x + 2y)$

In [13]:
x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
z = (x + 2 * y) ** 2
z.backward()
print(x.grad)  # dz/dx
print(y.grad)  # dz/dy

tensor(10.)
tensor(20.)
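
As an alternative to `backward()`, the same partial derivatives can be requested explicitly with `torch.autograd.grad`, which returns the gradients instead of storing them in `.grad`. A minimal sketch with the same inputs as above; the names `dz_dx` and `dz_dy` are chosen here for illustration.

```python
import torch

x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
z = (x + 2 * y) ** 2

# torch.autograd.grad returns one gradient per input, in the order given
dz_dx, dz_dy = torch.autograd.grad(z, (x, y))

print(dz_dx)   # dz/dx = 2 * (x + 2y)
print(dz_dy)   # dz/dy = 4 * (x + 2y)
print(x.grad)  # None: nothing is accumulated into .grad
```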


## In practice, backward is called on the loss

In [14]:
# Batch of 5 samples, 3 input features
x = torch.randn(5, 3)
# Batch of 5 samples, 2 output features
y = torch.randn(5, 2)

# Create a Linear layer
# 3 units => 2 units
linear = nn.Linear(3, 2)

# Parameters of the Linear layer
print('w:', linear.weight)
print('b:', linear.bias)

# Loss and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

# forward
pred = linear(x)

# loss = L
loss = criterion(pred, y)
print('loss:', loss)

# backpropagation
loss.backward()

# Print the gradients
print('dL/dw:', linear.weight.grad)
print('dL/db:', linear.bias.grad)

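# Compute the updated parameters by hand (param - lr * grad)
# sub() is not in-place, so the layer's parameters are not modified here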
print('*** by hand')
print(linear.weight.sub(0.01 * linear.weight.grad))
print(linear.bias.sub(0.01 * linear.bias.grad))

# Gradient descent step
optimizer.step()

# Print the parameters after one update step
# They match the values computed by hand above
print('*** by optimizer.step()')
print(linear.weight)
print(linear.bias)

w: Parameter containing:
tensor([[ 0.4176,  0.2302,  0.3942],
        [-0.3258,  0.0489, -0.3333]], requires_grad=True)
b: Parameter containing:
tensor([0.4269, 0.2872], requires_grad=True)
loss: tensor(1.3395, grad_fn=<MseLossBackward>)
dL/dw: tensor([[ 0.4404,  0.4512,  0.9893],
        [-0.6777, -0.2535, -0.5191]])
dL/db: tensor([0.6095, 0.6305])
*** by hand
tensor([[ 0.4132,  0.2257,  0.3843],
        [-0.3191,  0.0514, -0.3281]], grad_fn=<ThSubBackward>)
tensor([0.4208, 0.2809], grad_fn=<ThSubBackward>)
*** by optimizer.step()
Parameter containing:
tensor([[ 0.4132,  0.2257,  0.3843],
        [-0.3191,  0.0514, -0.3281]], requires_grad=True)
Parameter containing:
tensor([0.4208, 0.2809], requires_grad=True)
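
The cell above performs a single update. When the step is repeated, gradients have to be cleared each iteration, because `backward()` accumulates into `.grad`. The following is a hedged sketch of such a loop, assuming the same layer sizes and learning rate; the seed, the number of steps, and the names `losses`, `matches`, `expected_w` and `expected_b` are illustrative choices, not part of the original notebook. It also checks the by-hand update against `optimizer.step()` with `torch.allclose` instead of comparing printed values by eye.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

x = torch.randn(5, 3)
y = torch.randn(5, 2)
linear = nn.Linear(3, 2)
criterion = nn.MSELoss()
lr = 0.01
optimizer = torch.optim.SGD(linear.parameters(), lr=lr)

losses = []
matches = []
for step in range(3):
    # Clear gradients from the previous step; backward() accumulates into .grad
    optimizer.zero_grad()
    loss = criterion(linear(x), y)
    loss.backward()

    # Parameters expected after a plain SGD step, computed by hand
    with torch.no_grad():
        expected_w = linear.weight - lr * linear.weight.grad
        expected_b = linear.bias - lr * linear.bias.grad

    optimizer.step()

    # Compare the optimizer's result with the by-hand update
    matches.append(
        torch.allclose(linear.weight, expected_w)
        and torch.allclose(linear.bias, expected_b)
    )
    losses.append(loss.item())

print(losses)   # loss recorded before each update
print(matches)  # whether each step matched the by-hand update
```

Plain SGD without momentum or weight decay is assumed here; with those options enabled the update rule differs and the by-hand comparison would no longer hold.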
