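### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, and then do the same with PyTorch's Autograd.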

### Implementing differentiation and backpropagation with PyTorch

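Here we implement a simple two-layer neural network for a regression problem. The loss function is the L2 loss, summed over all elements of the batch:

$$
L2\_loss = \sum (y_{pred}-y)^2
$$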

The two-layer neural network is defined as follows:
$$
y_{pred} = ReLU(XW_1)W_2
$$
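
The hand-written backward pass in the code that follows comes from the chain rule. As a sketch, write $h = XW_1$ and $a = ReLU(h)$ (intermediate symbols introduced here for illustration; they correspond to `h` and `h_relu` in the code), and let $L$ be the squared error summed over all elements, as the code computes it:

$$
\begin{aligned}
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred} - y) \\
\frac{\partial L}{\partial W_2} &= a^{T}\,\frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial a} &= \frac{\partial L}{\partial y_{pred}}\,W_2^{T} \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial a} \odot \mathbb{1}[h \ge 0] \\
\frac{\partial L}{\partial W_1} &= X^{T}\,\frac{\partial L}{\partial h}
\end{aligned}
$$

Here $\odot$ is element-wise multiplication and $\mathbb{1}[h \ge 0]$ is 1 where an entry of $h$ is non-negative and 0 elsewhere, which matches the `grad_h[h < 0] = 0` step in the code.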

In [1]:
import torch
device = torch.device('cpu')

In [3]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
###<your code>###
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights W1 and W2
###<your code>###
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    ###<your code>###
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute the loss
    ###<your code>###
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Backward pass: compute the gradients of the loss with respect to W1 and W2
    ###<your code>###
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update the parameters
    ###<your code>###
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 31007222.0
1 27779354.0
2 29016494.0
3 30109080.0
4 27795460.0
5 21392710.0
6 13734828.0
7 7749834.0
8 4235825.0
9 2446457.0
10 1569384.125
11 1119416.625
12 865673.375
13 704871.75
14 591667.3125
15 505610.90625
16 436907.5625
17 380429.125
18 333181.21875
19 293219.53125
20 259070.921875
21 229728.90625
22 204381.0625
23 182389.046875
24 163229.03125
25 146468.1875
26 131754.21875
27 118791.9296875
28 107339.9375
29 97192.828125
30 88203.09375
31 80207.328125
32 73063.3359375
33 66668.4609375
34 60942.12109375
35 55792.94921875
36 51149.671875
37 46952.5234375
38 43157.96484375
39 39716.6328125
40 36590.1015625
41 33746.046875
42 31155.87890625
43 28791.875
44 26633.068359375
45 24657.0625
46 22847.150390625
47 21186.197265625
48 19662.533203125
49 18261.3125
50 16971.5
51 15783.4921875
52 14687.583984375
53 13676.208984375
54 12741.68359375
55 11877.7490234375
56 11078.6796875
57 10338.81640625
58 9653.021484375
59 9016.8291015625
60 8426.8505859375
61 7879.1083984375
62 7369.8730

442 0.00023221851733978838
443 0.00022669858299195766
444 0.00022161859669722617
445 0.00021760356321465224
446 0.0002128762862412259
447 0.00020821359066758305
448 0.0002036992518696934
449 0.0002001529064727947
450 0.00019604644330684096
451 0.00019166266429238021
452 0.0001876369206001982
453 0.00018390745390206575
454 0.00017943252169061452
455 0.00017584774468559772
456 0.000172184212715365
457 0.0001688090997049585
458 0.00016566112753935158
459 0.00016225676517933607
460 0.00015947507927194238
461 0.00015638553304597735
462 0.00015315393102355301
463 0.00015033481759019196
464 0.0001471911818953231
465 0.00014501783880405128
466 0.00014209671644493937
467 0.00013944462989456952
468 0.00013653657515533268
469 0.00013434435823000968
470 0.0001316860580118373
471 0.0001292914676014334
472 0.00012679748761001974
473 0.00012481580779422075
474 0.00012295579654164612
475 0.00012084833724657074
476 0.0001191540650324896
477 0.00011666478530969471
478 0.00011495137005113065
479 0.000112

In [9]:
print(h)
print(grad_h)

tensor([[-56.5955,  28.4796, -41.3764,  ...,  -6.0458, -10.1326, -49.7235],
        [ -0.0862,  -3.4317, -12.6462,  ...,   9.4660,  -8.1950,  17.2813],
        [ -0.4179, -17.2937,  -6.1472,  ...,   6.9604, -30.2982, -61.6549],
        ...,
        [ 34.1610, -34.4656,  15.8393,  ...,  33.5640,  20.9921,  47.7632],
        [ 13.2743,   6.3752, -33.7450,  ..., -32.3400,  -0.3585,  -3.4289],
        [ 22.3167,  -4.1795,  18.0820,  ...,  -1.0317, -25.6461,  27.1560]])
tensor([[ 0.0000e+00,  3.1192e-04,  0.0000e+00,  ...,  0.0000e+00,
          0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.0002e-03,
          0.0000e+00, -9.2590e-04],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  2.3014e-03,
          0.0000e+00,  0.0000e+00],
        ...,
        [ 3.6051e-04,  0.0000e+00, -5.3349e-05,  ..., -1.1044e-04,
         -2.8362e-04, -3.6738e-04],
        [-4.2480e-04,  5.4007e-04,  0.0000e+00,  ...,  0.0000e+00,
          0.0000e+00,  0.0000e+00],
  

### Using PyTorch's Autograd
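
Before switching the training loop over to Autograd, it can help to confirm that `loss.backward()` produces the same gradients as the hand-written backward pass. The following is a minimal sketch of such a check. The small shapes, the fixed seed, the `float64` dtype, and the name `grads_match` are assumptions made for illustration and are not part of the assignment.

```python
import torch

torch.manual_seed(0)

# Small shapes chosen only for illustration (not the sizes used in the assignment).
N, D_in, H, D_out = 4, 6, 5, 3
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass; autograd records the operations that involve w1 and w2.
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()

# Autograd backward pass: fills w1.grad and w2.grad.
loss.backward()

# Manual backward pass, using the same formulas as the hand-written loop.
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

# Compare the two sets of gradients.
grads_match = torch.allclose(w1.grad, grad_w1) and torch.allclose(w2.grad, grad_w2)
print("manual and autograd gradients agree:", grads_match)
```

If the two disagree, the usual suspects are a missing transpose or a ReLU mask applied to the wrong tensor.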

In [None]:
import torch
device = torch.device('cpu')

In [10]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
###<your code>###
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights W1 and W2; requires_grad=True tells autograd to track them
###<your code>###
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    ###<your code>###
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute the loss
    ###<your code>###
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Backward pass: compute the gradients of the loss with respect to W1 and W2
    ###<your code>###
    loss.backward()

    # Parameter update: we do not want autograd to track the update itself,
    # so it is wrapped in torch.no_grad()
    with torch.no_grad():
        # Update W1 and W2
        ###<your code>###
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

    # Clear the accumulated gradients (the parameters have already been updated)
    w1.grad.zero_()
    w2.grad.zero_()

0 26792260.0
1 24179166.0
2 26580240.0
3 30587076.0
4 32149228.0
5 27947500.0
6 19206870.0
7 10683876.0
8 5345743.0
9 2733893.5
10 1580065.0
11 1056478.875
12 792017.625
13 636280.9375
14 530790.0
15 451807.375
16 389056.5625
17 337562.65625
18 294510.875
19 258150.25
20 227149.703125
21 200564.96875
22 177673.21875
23 157938.546875
24 140768.34375
25 125770.5859375
26 112661.7265625
27 101153.4140625
28 91004.234375
29 82031.2578125
30 74080.34375
31 67020.8671875
32 60737.1953125
33 55136.5390625
34 50125.25
35 45637.4453125
36 41605.97265625
37 37977.0078125
38 34707.09375
39 31756.0078125
40 29086.3125
41 26668.234375
42 24475.48828125
43 22484.923828125
44 20674.19140625
45 19026.74609375
46 17527.4765625
47 16159.8212890625
48 14910.6083984375
49 13767.75
50 12721.939453125
51 11763.5966796875
52 10885.2919921875
53 10079.6826171875
54 9339.2998046875
55 8657.05859375
56 8029.38818359375
57 7451.06298828125
58 6918.2568359375
59 6427.4072265625
60 5974.5263671875
61 5556.90478515
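
The `w1.grad.zero_()` and `w2.grad.zero_()` calls in the training loop above are needed because `backward()` adds to `.grad` instead of overwriting it. The sketch below shows this on a toy tensor. The tensor `w` and the names `first`, `accumulated`, and `after_zero` are hypothetical and used only for this illustration.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d/dw sum(w^2) = 2w.
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one.
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# After zeroing, a fresh backward pass gives the single-pass gradient again.
w.grad.zero_()
(w ** 2).sum().backward()
after_zero = w.grad.clone()

print(first, accumulated, after_zero)
```

Without the zeroing step, every update in the training loop would use the sum of all gradients computed so far, not the gradient of the current iteration.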