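### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, and then use PyTorch's Autograd to do the same work automatically.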

### Implementing differentiation and backpropagation with PyTorch

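Here we implement a simple two-layer neural network for a regression problem. The loss function is the L2 loss, which the code in this notebook sums over every element of the batch: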

$$
L2\_loss = (y_{pred}-y)^2
$$
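
As a small sanity check, the summed squared error used in this notebook should match PyTorch's built-in `torch.nn.functional.mse_loss` when called with `reduction='sum'`. The tensor shapes and the seed below are illustrative assumptions, not values from the assignment.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # assumed seed, only so this sketch is reproducible

# Illustrative shapes (assumption): 4 samples, 3 output dimensions
y_pred = torch.randn(4, 3)
y = torch.randn(4, 3)

# L2 loss written out by hand: squared error summed over all elements
manual_loss = (y_pred - y).pow(2).sum()

# The same quantity through the built-in functional API
builtin_loss = F.mse_loss(y_pred, y, reduction='sum')

print(manual_loss.item(), builtin_loss.item())
print(torch.allclose(manual_loss, builtin_loss))
```

Writing the loss by hand keeps every step visible, which is convenient when the gradients are also derived by hand.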

The two-layer neural network is defined as follows:
$$
y_{pred} = ReLU(XW_1)W_2
$$
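
Applying the chain rule to this network, with the summed squared error as the loss, gives the gradients that the manual backward pass in the code computes. The intermediate symbols $H$ and $A$ and the indicator notation are introduced here only for illustration. The value of the ReLU derivative at exactly $H = 0$ is a convention; the code lets the gradient through there, so the indicator is written with $\ge$.

$$
\begin{aligned}
H &= XW_1, \quad A = ReLU(H), \quad y_{pred} = AW_2, \quad L = \sum (y_{pred}-y)^2 \\
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred}-y) \\
\frac{\partial L}{\partial W_2} &= A^{\top}\,\frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial A} &= \frac{\partial L}{\partial y_{pred}}\,W_2^{\top} \\
\frac{\partial L}{\partial H} &= \frac{\partial L}{\partial A}\odot \mathbb{1}[H \ge 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top}\,\frac{\partial L}{\partial H}
\end{aligned}
$$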

In [2]:
import torch
device = torch.device('cpu')

In [None]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights w1 and w2
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs (each one is a single full-batch gradient step)
for t in range(500):
    # Forward pass: compute y_pred
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute the loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Backward pass: compute the gradients of the loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update the parameters
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 41012208.0
1 41455604.0
2 40426592.0
3 32099776.0
4 19425206.0
5 9610482.0
6 4653895.5
7 2602401.75
8 1738951.375
9 1311661.25
10 1053339.625
11 871963.9375
12 733717.875
13 623637.0625
14 533976.5
15 459877.0625
16 398296.5625
17 346628.4375
18 302947.125
19 265851.875
20 234216.125
21 207061.71875
22 183648.8125
23 163395.140625
24 145782.21875
25 130419.609375
26 116946.671875
27 105060.890625
28 94580.703125
29 85307.1328125
30 77072.1875
31 69750.78125
32 63229.1015625
33 57411.3671875
34 52204.4140625
35 47530.4765625
36 43331.24609375
37 39551.6328125
38 36146.0390625
39 33069.3359375
40 30287.869140625
41 27768.982421875
42 25482.6328125
43 23407.19921875
44 21519.986328125
45 19802.5625
46 18236.45703125
47 16807.27734375
48 15501.2158203125
49 14307.0703125
50 13215.982421875
51 12215.443359375
52 11297.6728515625
53 10455.720703125
54 9682.275390625
55 8971.400390625
56 8317.248046875
57 7715.3173828125
58 7160.8759765625
59 6649.7197265625
60 6178.23583984375
61 5742.6206
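
One way to gain confidence in the manual backward pass above is to compare it with the gradients Autograd produces for the same forward pass. The following is a minimal sketch under stated assumptions: the small tensor sizes, the `float64` dtype, and the seed are chosen here for illustration and are not part of the assignment.

```python
import torch

torch.manual_seed(0)  # assumed seed, only so this sketch is reproducible

# Small illustrative sizes (assumption); float64 keeps rounding error tiny
N, D_in, H, D_out = 8, 20, 10, 3
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass (tracked by autograd because w1 and w2 require gradients)
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()

# Reference gradients from autograd, stored in w1.grad and w2.grad
loss.backward()

# Manual backward pass, using the same steps as the training loop above
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

# Both comparisons should report that the gradients agree
print(torch.allclose(grad_w1, w1.grad))
print(torch.allclose(grad_w2, w2.grad))
```

If either comparison fails after the manual code is modified, the mismatch usually points to the step where the hand-written derivative went wrong.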

### Using PyTorch's Autograd

In [5]:
import torch
device = torch.device('cpu')

In [6]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights w1 and w2; requires_grad=True tells autograd to track them
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs (each one is a single full-batch gradient step)
for t in range(500):
    # Forward pass: compute y_pred
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute the loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Backward pass: autograd computes the gradients of the loss with respect to w1 and w2
    loss.backward()

    # Parameter update: the update itself should not be tracked by autograd, so it runs inside torch.no_grad()
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after running the backward pass
        w1.grad.zero_()
        w2.grad.zero_()

0 29635326.0
1 24330520.0
2 22213194.0
3 19979736.0
4 16582840.0
5 12359974.0
6 8405246.0
7 5399070.0
8 3426994.5
9 2230063.0
10 1523664.25
11 1101696.0
12 839601.1875
13 667641.875
14 547939.6875
15 459844.8125
16 391945.125
17 337757.15625
18 293381.21875
19 256375.625
20 225122.28125
21 198464.71875
22 175592.328125
23 155832.890625
24 138691.46875
25 123758.46875
26 110689.390625
27 99231.9375
28 89154.5234375
29 80275.84375
30 72419.0625
31 65443.203125
32 59243.02734375
33 53723.875
34 48793.59765625
35 44379.94921875
36 40421.2265625
37 36865.5625
38 33668.2421875
39 30786.744140625
40 28184.62109375
41 25831.037109375
42 23699.0703125
43 21765.68359375
44 20010.71484375
45 18416.390625
46 16965.875
47 15643.044921875
48 14435.71875
49 13333.2197265625
50 12325.08203125
51 11401.763671875
52 10554.888671875
53 9778.146484375
54 9065.2900390625
55 8409.76171875
56 7806.892578125
57 7251.9638671875
58 6740.4091796875
59 6268.857421875
60 5833.310546875
61 5430.83544921875
62 5058.
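
Two details of the Autograd training loop are easy to miss: `backward()` adds to `.grad` rather than overwriting it, and an in-place update of a tracked leaf tensor is only allowed inside `torch.no_grad()`. The sketch below illustrates both on a single scalar parameter. The parameter `w` and its starting value are hypothetical, chosen only to keep the arithmetic simple.

```python
import torch

# A single scalar parameter (hypothetical, for illustration only)
w = torch.tensor(3.0, requires_grad=True)

# loss = w^2, so d(loss)/dw = 2 * w
(w ** 2).backward()
first_grad = w.grad.clone()

# A second backward pass ADDS to w.grad instead of overwriting it
(w ** 2).backward()
accumulated_grad = w.grad.clone()

# Resetting the gradient, as the training loop does after each update
w.grad.zero_()
reset_grad = w.grad.clone()

print(first_grad.item(), accumulated_grad.item(), reset_grad.item())

# Outside torch.no_grad(), autograd rejects an in-place update of a
# leaf tensor that requires grad
try:
    w -= 0.1 * w
    update_rejected = False
except RuntimeError:
    update_rejected = True

# Inside torch.no_grad(), the same kind of update is allowed and not tracked
with torch.no_grad():
    w -= 0.1 * w

print(update_rejected, w.item(), w.requires_grad)
```

This is why the loop zeroes `w1.grad` and `w2.grad` after every step: without the reset, each update would use the sum of all gradients computed so far.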