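### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, and then use PyTorch's Autograd to do the same thing automatically.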

### Implementing differentiation and backpropagation with PyTorch

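Here we implement a simple two-layer neural network for a regression problem. The loss function is the L2 loss, i.e. the squared error summed over all elements (as `.sum()` does in the code):

$$
L2\_loss = \sum (y_{pred}-y)^2
$$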
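
The manual backward pass starts from `y_pred_grad = 2. * (y_pred - y)`. A quick way to confirm that factor of 2 is to let autograd differentiate this loss with respect to `y_pred`. This is a minimal sketch, assuming small arbitrary shapes (4×3) and float64 tensors chosen only for this check:

```python
import torch

# Small random prediction/target pair (shapes are arbitrary for this check)
torch.manual_seed(0)
y = torch.randn(4, 3, dtype=torch.float64)
y_pred = torch.randn(4, 3, dtype=torch.float64, requires_grad=True)

# L2 loss summed over all elements, matching the notebook's loss
loss = torch.square(y_pred - y).sum()
loss.backward()

# Analytic gradient: dL/dy_pred = 2 * (y_pred - y)
analytic = 2.0 * (y_pred.detach() - y)
print(torch.allclose(y_pred.grad, analytic))  # True
```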

The two-layer neural network is defined as:
$$
y_{pred} = \mathrm{ReLU}(XW_1)W_2
$$
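
Applying the chain rule to this network gives the gradients that the manual backward pass computes. Writing $h = XW_1$ and $h_{relu} = \mathrm{ReLU}(h)$ (names borrowed from the code variables), $G$ for the gradient of the loss with respect to $y_{pred}$, $\odot$ for the elementwise product and $\mathbb{1}[h>0]$ for the elementwise indicator of positive entries:

$$
\begin{aligned}
G &= \frac{\partial L}{\partial y_{pred}} = 2\,(y_{pred}-y) \\
\frac{\partial L}{\partial W_2} &= h_{relu}^\top G \\
\frac{\partial L}{\partial h} &= (G W_2^\top) \odot \mathbb{1}[h>0] \\
\frac{\partial L}{\partial W_1} &= X^\top \frac{\partial L}{\partial h}
\end{aligned}
$$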

```python
import torch
device = torch.device('cpu')
```

```python
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate inputs x and targets y
x = torch.randn((N, D_in), device=device)
y = torch.randn((N, D_out), device=device)

# Initialize the weights W1, W2
W1 = torch.randn((D_in, H), device=device)
W2 = torch.randn((H, D_out), device=device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs (each iteration uses the whole batch)
for t in range(500):
    # Forward pass: compute y_pred
    h = torch.matmul(x, W1)
    h_relu = torch.relu(h)
    y_pred = torch.matmul(h_relu, W2)

    # Compute the loss (sum of squared errors)
    loss = torch.square(y_pred - y).sum()
    print(t, loss.item())

    # Backward pass: compute the gradients of the loss with respect to W1 and W2
    y_pred_grad = 2. * (y_pred - y)
    W2_grad = h_relu.T.mm(y_pred_grad)
    h_grad = y_pred_grad.mm(W2.T) * (h > 0.)
    W1_grad = x.T.mm(h_grad)

    # Parameter update (plain gradient descent)
    W1 -= learning_rate * W1_grad
    W2 -= learning_rate * W2_grad
```

```
0 38726248.0
1 40318640.0
2 42564536.0
3 37843348.0
4 25836382.0
5 13665246.0
6 6468407.0
7 3309400.25
8 2028546.125
9 1455197.5
10 1144233.75
11 942018.125
12 793187.625
13 676552.1875
14 581932.5625
15 503753.625
16 438328.5
17 383156.46875
18 336391.96875
19 296431.15625
20 262151.125
21 232644.609375
22 207145.0625
23 184997.5
24 165668.59375
25 148703.890625
26 133765.203125
27 120580.1796875
28 108899.9765625
29 98529.9921875
30 89302.0234375
31 81077.65625
32 73722.375
33 67134.1875
34 61214.06640625
35 55884.1484375
36 51080.4765625
37 46743.68359375
38 42818.890625
39 39267.4453125
40 36046.35546875
41 33122.4375
42 30463.01171875
43 28040.8671875
44 25832.923828125
45 23817.2734375
46 21974.125
47 20288.62890625
48 18745.185546875
49 17330.11328125
50 16033.814453125
51 14844.7529296875
52 13751.98046875
53 12746.6748046875
54 11821.431640625
55 10968.9208984375
56 10183.208984375
57 9458.5634765625
58 8789.2490234375
59 8171.1376953125
60 7599.4794921875
61 7071.02001953125
```
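
To check that the hand-written gradients are correct, one can compare them with the gradients autograd produces on the same forward pass. This is a minimal sketch, assuming much smaller sizes than the notebook (N=8, D_in=20, H=16, D_out=5), a fixed seed, and float64 so that `torch.allclose` is not tripped up by rounding:

```python
import torch

torch.manual_seed(0)
# Small sizes so the check is quick; float64 keeps rounding error low
N, D_in, H, D_out = 8, 20, 16, 5
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
W1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
W2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass, same structure as the notebook
h = x.mm(W1)
h_relu = torch.relu(h)
y_pred = h_relu.mm(W2)
loss = torch.square(y_pred - y).sum()

# Manual backward pass (the notebook's formulas), not tracked by autograd
with torch.no_grad():
    y_pred_grad = 2.0 * (y_pred - y)
    W2_grad = h_relu.T.mm(y_pred_grad)
    h_grad = y_pred_grad.mm(W2.T) * (h > 0)
    W1_grad = x.T.mm(h_grad)

# Autograd backward pass on the same graph, for comparison
loss.backward()

print(torch.allclose(W1_grad, W1.grad), torch.allclose(W2_grad, W2.grad))  # True True
```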


### Using PyTorch Autograd

```python
import torch
device = torch.device('cpu')
```

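```python
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate inputs x and targets y
x = torch.randn((N, D_in), device=device)
y = torch.randn((N, D_out), device=device)

# Initialize the weights W1, W2 directly on the device so they stay leaf tensors
# (calling .to(device) after requires_grad=True returns a non-leaf copy when the device changes)
W1 = torch.randn((D_in, H), device=device, requires_grad=True)
W2 = torch.randn((H, D_out), device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    y_pred = torch.matmul(torch.relu(torch.matmul(x, W1)), W2)

    # Compute the loss
    loss = torch.square(y_pred - y).sum()
    print(t, loss.item())

    # Backward pass: autograd computes the gradients of the loss with respect to W1 and W2
    loss.backward()

    # Parameter update: the update itself should not be recorded by autograd,
    # so it runs inside torch.no_grad()
    with torch.no_grad():
        # Update W1 and W2
        W1 -= learning_rate * W1.grad
        W2 -= learning_rate * W2.grad

        # Clear the stored gradients (the parameters have been updated);
        # otherwise the next backward() would add to them
        W1.grad.zero_()
        W2.grad.zero_()
```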

```
0 37071792.0
1 34518796.0
2 32226868.0
3 26123524.0
4 17636146.0
5 10165134.0
6 5512571.5
7 3095902.25
8 1919997.625
9 1323996.75
10 991731.8125
11 783262.9375
12 638769.9375
13 530930.4375
14 446680.0625
15 379092.8125
16 324008.5625
17 278476.90625
18 240480.1875
19 208577.90625
20 181639.59375
21 158762.375
22 139243.09375
23 122517.03125
24 108116.5859375
25 95690.4765625
26 84910.0625
27 75511.703125
28 67306.953125
29 60127.27734375
30 53824.5859375
31 48275.5234375
32 43379.515625
33 39047.71875
34 35206.4375
35 31795.30078125
36 28762.20703125
37 26055.689453125
38 23635.291015625
39 21465.90625
40 19518.837890625
41 17768.01953125
42 16191.96484375
43 14770.61328125
44 13487.412109375
45 12326.5791015625
46 11276.72265625
47 10325.568359375
48 9462.294921875
49 8677.490234375
50 7964.193359375
51 7314.8876953125
52 6722.88330078125
53 6182.94287109375
54 5689.822265625
55 5239.1337890625
56 4826.9462890625
57 4449.822265625
58 4104.22607421875
59 3787.42822265625
60 3496.73657
```
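
The Autograd training loop relies on two details: gradients are cleared with `zero_()` after each update, and the update runs inside `torch.no_grad()`. The sketch below illustrates both on a hypothetical two-element tensor `w` with a toy loss `(3 * w).sum()`, whose gradient is 3 for each element. Neither `w` nor the toy loss comes from the notebook.

```python
import torch

# Hypothetical parameter used only for this illustration
w = torch.tensor([1.0, 2.0], requires_grad=True)

# Two backward passes without clearing: the gradients are summed into w.grad
for _ in range(2):
    loss = (3.0 * w).sum()  # d loss / d w = [3, 3]
    loss.backward()
print(w.grad)  # tensor([6., 6.])

# Clearing the gradient, as the training loop does with W1.grad.zero_()
w.grad.zero_()
loss = (3.0 * w).sum()
loss.backward()
print(w.grad)  # tensor([3., 3.])

# Updating inside torch.no_grad() keeps w a leaf tensor with no recorded history
with torch.no_grad():
    w -= 0.1 * w.grad
print(w.is_leaf, w.grad_fn)  # True None
```

Without `zero_()`, each iteration would step along the sum of all past gradients instead of the current one.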