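### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation ourselves, and then use PyTorch's Autograd to do the same.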

### Implementing differentiation and backpropagation manually with PyTorch

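Here we implement a simple two-layer neural network for a regression problem. The loss function is the L2 loss, summed over all elements of the batch:

$$
L2\_loss = \sum (y_{pred}-y)^2
$$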
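As a quick sanity check of the formula (a small illustrative example with made-up values, not part of the original assignment), the derivative of this summed L2 loss with respect to $y_{pred}$ is $2(y_{pred}-y)$, and Autograd reproduces it:

```python
import torch

# Made-up predictions and targets for illustration
y_pred = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([1.5, 2.0, 2.0])

# L2 loss summed over all elements, as in the training loops
loss = torch.sum((y_pred - y) ** 2)  # 0.25 + 0 + 1 = 1.25
loss.backward()

# Analytic derivative dL/dy_pred = 2 * (y_pred - y)
analytic = 2 * (y_pred.detach() - y)  # values -1, 0, 2

print(loss.item())
print(y_pred.grad)
print(torch.allclose(y_pred.grad, analytic))
```

This factor of 2 is exactly the `diff = 2*(y_pred-y)` term used in the manual backward pass.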

The two-layer network is:
$$
y_{pred} = ReLU(XW_1)W_2
$$
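Applying the chain rule to the two formulas above gives the gradients that a manual backward pass needs. As a sketch of the derivation, write $L$ for $L2\_loss$, $Z = XW_1$ for the pre-activation (`hidden` in the code), $h = \mathrm{ReLU}(Z)$ (`h_relu`), and $G$ for the upstream gradient (`diff`); $Z$, $h$, and $G$ are notation introduced here, and $\odot$ denotes elementwise multiplication:

$$
\begin{aligned}
G &= \frac{\partial L}{\partial y_{pred}} = 2(y_{pred} - y) \\
\frac{\partial L}{\partial W_2} &= h^\top G \\
\frac{\partial L}{\partial h} &= G W_2^\top \\
\frac{\partial L}{\partial Z} &= \left(G W_2^\top\right) \odot \mathbb{1}[Z > 0] \\
\frac{\partial L}{\partial W_1} &= X^\top \left( \left(G W_2^\top\right) \odot \mathbb{1}[Z > 0] \right)
\end{aligned}
$$

The indicator $\mathbb{1}[Z > 0]$ is the derivative of ReLU (taken as 0 at $Z = 0$), which is what `(hidden > 0.)` computes in the code.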

In [2]:
import torch
device = torch.device('cpu')

In [59]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

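# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize weights W1, W2 (no requires_grad: the gradients are computed by hand below,
# so Autograd does not need to record a graph that would keep growing across iterations)
W1 = torch.randn(D_in, H, device=device)
W2 = torch.randn(H, D_out, device=device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs (full batch, so one update per epoch)
for t in range(500):
    # Forward pass: compute y_pred
    hidden = torch.mm(x, W1)
    h_relu = torch.relu(hidden)
    y_pred = torch.mm(h_relu, W2)

    # Compute the loss
    loss = torch.sum(torch.square(y_pred - y))
    print(t, loss.item())

    # Backward pass: compute the gradients of the loss with respect to W1 and W2
    diff = 2 * (y_pred - y)                            # dL/dy_pred
    W2_grad = torch.mm(h_relu.T, diff)                 # dL/dW2
    W1_grad = x.T.mm(diff.mm(W2.T) * (hidden > 0.))    # dL/dW1; ReLU passes gradient only where hidden > 0

    # Parameter update (gradient descent)
    W1 = W1 - learning_rate * W1_grad
    W2 = W2 - learning_rate * W2_grad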

0 32256536.0
1 29726774.0
2 29975672.0
3 28044424.0
4 22646810.0
5 15120106.0
6 8845249.0
7 4873042.0
8 2801194.5
9 1768210.375
10 1241616.875
11 946968.1875
12 762051.125
13 632822.0625
14 535346.4375
15 458085.125
16 394936.03125
17 342475.0625
18 298360.0625
19 260932.703125
20 228987.3125
21 201589.0
22 178009.078125
23 157650.84375
24 139976.234375
25 124572.0
26 111108.703125
27 99313.625
28 88948.875
29 79818.015625
30 71760.7265625
31 64644.99609375
32 58332.609375
33 52724.59375
34 47732.0078125
35 43269.6640625
36 39275.9765625
37 35695.12890625
38 32478.904296875
39 29588.646484375
40 26986.69140625
41 24639.234375
42 22519.458984375
43 20603.392578125
44 18869.251953125
45 17297.380859375
46 15869.525390625
47 14572.802734375
48 13392.6982421875
49 12318.0625
50 11337.6953125
51 10442.42578125
52 9625.0810546875
53 8877.6923828125
54 8193.625
55 7567.43896484375
56 6993.66650390625
57 6467.51220703125
58 5984.7109375
59 5541.193359375
60 5133.17822265625
61 4757.822265625
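One way to confirm that the hand-written backward pass is correct is to compare it with `torch.autograd.grad` on a small problem. The following is a minimal sketch; the reduced sizes and the random seed are arbitrary choices for a quick check, not values from the assignment:

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 8, 20, 16, 4  # small sizes chosen only for a quick check

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
W1 = torch.randn(D_in, H, requires_grad=True)
W2 = torch.randn(H, D_out, requires_grad=True)

# Forward pass, same structure as the training loop
hidden = x.mm(W1)
h_relu = torch.relu(hidden)
y_pred = h_relu.mm(W2)
loss = torch.sum((y_pred - y) ** 2)

# Manual gradients with the same formulas as the loop, computed without tracking
with torch.no_grad():
    diff = 2 * (y_pred - y)
    W2_grad_manual = h_relu.T.mm(diff)
    W1_grad_manual = x.T.mm(diff.mm(W2.T) * (hidden > 0))

# Gradients computed by Autograd from the same graph
W1_grad_auto, W2_grad_auto = torch.autograd.grad(loss, [W1, W2])

print(torch.allclose(W1_grad_manual, W1_grad_auto, rtol=1e-4, atol=1e-4))
print(torch.allclose(W2_grad_manual, W2_grad_auto, rtol=1e-4, atol=1e-4))
```

A small tolerance is used because the two paths may differ in floating-point rounding.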

### Using PyTorch's Autograd

In [47]:
import torch
device = torch.device('cpu')

In [57]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

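# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize weights W1, W2 directly on `device` so they stay leaf tensors
# (calling .to(device) after requires_grad=True returns a non-leaf tensor on a GPU, whose .grad stays None)
W1 = torch.randn(D_in, H, device=device, requires_grad=True)
W2 = torch.randn(H, D_out, device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs (full batch, so one update per epoch)
for t in range(500):
    # Forward pass: compute y_pred
    hidden = torch.mm(x, W1)
    h_relu = torch.relu(hidden)
    y_pred = torch.mm(h_relu, W2)

    # Compute the loss
    loss = torch.sum(torch.square(y_pred - y))
    print(t, loss.item())

    # Backward pass: Autograd computes the gradients of the loss with respect to W1 and W2
    loss.backward()

    # Parameter update: the update itself should not be recorded for differentiation,
    # so it runs inside torch.no_grad()
    with torch.no_grad():
        # Update W1 and W2 in place
        W1 -= learning_rate * W1.grad
        W2 -= learning_rate * W2.grad

        # Clear the stored gradients (the parameters have been updated)
        W1.grad.zero_()
        W2.grad.zero_()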

0 33394700.0
1 31035708.0
2 33403056.0
3 34463664.0
4 30248952.0
5 21196526.0
6 12089702.0
7 6116694.5
8 3129235.75
9 1775745.5
10 1156483.0
11 844212.3125
12 663873.375
13 544790.9375
14 457954.8125
15 390324.71875
16 335743.5
17 290758.875
18 253140.921875
19 221410.78125
20 194474.28125
21 171455.671875
22 151673.171875
23 134592.609375
24 119793.265625
25 106919.4296875
26 95679.34375
27 85823.5625
28 77167.78125
29 69553.6328125
30 62814.9296875
31 56839.59375
32 51529.453125
33 46800.7890625
34 42575.3203125
35 38792.30078125
36 35398.25390625
37 32347.5234375
38 29603.7265625
39 27128.6484375
40 24892.78515625
41 22868.69140625
42 21034.533203125
43 19368.958984375
44 17854.29296875
45 16474.6484375
46 15216.384765625
47 14067.5390625
48 13010.712890625
49 12044.4404296875
50 11159.033203125
51 10347.3857421875
52 9602.060546875
53 8917.08984375
54 8287.27734375
55 7707.2109375
56 7172.5498046875
57 6679.4560546875
58 6224.07177734375
59 5803.23291015625
60 5414.14404296875
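The two steps in the update (running it under `torch.no_grad()` and zeroing `.grad` afterwards) can be illustrated on a toy tensor. This is a small sketch with made-up values: it shows that `.grad` accumulates across `backward()` calls, that an in-place update of a leaf tensor that requires grad is rejected outside `torch.no_grad()`, and that results computed inside the block are not tracked:

```python
import torch

# Toy parameter (made-up values for illustration)
w = torch.tensor([1.0, 2.0], requires_grad=True)

# 1) Two backward passes without zeroing: .grad is summed across calls
for _ in range(2):
    loss = torch.sum(w ** 2)  # dloss/dw = 2 * w = [2, 4]
    loss.backward()
accumulated = w.grad.clone()
print("accumulated grad:", accumulated)  # twice [2, 4]

# 2) In-place update of a leaf that requires grad, outside no_grad: rejected
raised = False
try:
    w -= 0.1 * w.grad
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# 3) The same update inside torch.no_grad() works and is not recorded
with torch.no_grad():
    w -= 0.1 * w.grad  # w becomes [1 - 0.4, 2 - 0.8] = [0.6, 1.2]
    scaled = w * 3     # computed without building a graph
print(w.requires_grad, w.is_leaf, scaled.requires_grad)

# 4) Reset the gradient before the next backward pass
w.grad.zero_()
print("grad after zero_():", w.grad)
```

Without step 4, every later `backward()` would add onto the stale gradient, so each update would use the sum of all previous gradients instead of the current one.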