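### Assignment Goal: Differentiation and Backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, then repeat the same exercise using PyTorch's Autograd.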

### Implementing Differentiation and Backpropagation with PyTorch

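Here we implement a simple two-layer neural network for a regression problem. The loss function is the L2 loss, summed over every element of the batch:

$$
L2\_loss = \sum_{i,j} (y_{pred,ij}-y_{ij})^2
$$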
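
The training code in this section sums the squared error over every element of the batch. As a quick sanity check, the sketch below compares that hand-written sum with PyTorch's built-in `torch.nn.functional.mse_loss` using `reduction='sum'`. The tensor shapes and the seed are arbitrary choices made for illustration, not values taken from the assignment.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Hypothetical small tensors standing in for predictions and targets
y_pred = torch.randn(4, 3)
y = torch.randn(4, 3)

# Hand-written L2 loss: sum of squared errors over every element
manual_loss = (y_pred - y).pow(2).sum()

# Built-in equivalent; reduction='sum' disables the default averaging
builtin_loss = F.mse_loss(y_pred, y, reduction='sum')

print(torch.allclose(manual_loss, builtin_loss))
```

With the default `reduction='mean'` the built-in value would instead be divided by the number of elements, which rescales the gradients and therefore the effective learning rate.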

The two-layer neural network is defined as follows:
$$
y_{pred} = \mathrm{ReLU}(XW_1)W_2
$$
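
The manual backward pass in the training loop of this section follows from the chain rule. As a sketch, write $H = XW_1$ and $A = \mathrm{ReLU}(H)$ (these two symbols are introduced here for convenience and correspond to `h` and `h_relu` in the code), and let $L$ be the squared error summed over all elements, as the code computes it:

$$
\begin{aligned}
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred} - y) \\
\frac{\partial L}{\partial W_2} &= A^{\top}\,\frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial A} &= \frac{\partial L}{\partial y_{pred}}\,W_2^{\top} \\
\frac{\partial L}{\partial H} &= \frac{\partial L}{\partial A} \odot \mathbb{1}[H \ge 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top}\,\frac{\partial L}{\partial H}
\end{aligned}
$$

Here $\odot$ is the element-wise product. ReLU is not differentiable at exactly $H = 0$; the indicator $\mathbb{1}[H \ge 0]$ matches the code, which zeroes the gradient only where `h < 0`.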

In [1]:
import torch
device = torch.device('cpu')

In [4]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights W1 and W2
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
  # Forward pass: compute y_pred
  h = x.mm(w1)
  h_relu = h.clamp(min=0)
  y_pred = h_relu.mm(w2)

  # Compute the L2 loss (sum of squared errors)
  loss = (y_pred - y).pow(2).sum()
  print(t, loss.item())

  # Backward pass: compute the gradients of the loss with respect to W1 and W2
  grad_y_pred = 2.0 * (y_pred - y)
  grad_w2 = h_relu.t().mm(grad_y_pred)
  grad_h_relu = grad_y_pred.mm(w2.t())
  grad_h = grad_h_relu.clone()
  grad_h[h < 0] = 0  # ReLU passes no gradient where its input was negative
  grad_w1 = x.t().mm(grad_h)

  # Update the parameters
  w1 -= learning_rate * grad_w1
  w2 -= learning_rate * grad_w2

0 37603388.0
1 37807656.0
2 37165484.0
3 30391136.0
4 19813836.0
5 10566823.0
6 5363993.5
7 2940301.25
8 1876724.25
9 1363832.0
10 1076051.125
11 887320.875
12 748835.9375
13 640229.25
14 551991.75
15 478956.53125
16 417797.0
17 366116.8125
18 322088.9375
19 284328.5625
20 251812.25
21 223687.84375
22 199244.296875
23 177889.5
24 159182.078125
25 142752.8125
26 128286.6953125
27 115497.9140625
28 104174.5234375
29 94102.4921875
30 85129.8046875
31 77132.4453125
32 69977.09375
33 63563.2421875
34 57803.12109375
35 52623.63671875
36 47958.26171875
37 43748.16015625
38 39949.3125
39 36514.4453125
40 33404.984375
41 30584.384765625
42 28023.39453125
43 25699.2109375
44 23585.94140625
45 21662.697265625
46 19909.634765625
47 18311.13671875
48 16851.66015625
49 15518.2841796875
50 14298.109375
51 13182.3359375
52 12160.3515625
53 11223.62109375
54 10364.3828125
55 9576.1396484375
56 8852.369140625
57 8187.2119140625
58 7574.5537109375
59 7010.8408203125
60 6492.173828125
61 6014.322265625
...

463 4.021368295070715e-05
464 3.9556805859319866e-05
465 3.910127270501107e-05
466 3.857928822981194e-05
467 3.814227966358885e-05
468 3.775650475290604e-05
469 3.719079904840328e-05
470 3.673130049719475e-05
471 3.623211887315847e-05
472 3.582951467251405e-05
473 3.5451226722216234e-05
474 3.48285939253401e-05
475 3.446369373705238e-05
476 3.402420406928286e-05
477 3.35830518451985e-05
478 3.310669490019791e-05
479 3.2789583201520145e-05
480 3.23397180181928e-05
481 3.18111342494376e-05
482 3.163429209962487e-05
483 3.106907752226107e-05
484 3.071351966354996e-05
485 3.0626324587501585e-05
486 3.010312320839148e-05
487 2.96427242574282e-05
488 2.9297987566678785e-05
489 2.8864793421234936e-05
490 2.8479505999712273e-05
491 2.810761179716792e-05
492 2.76644295809092e-05
493 2.7440308258519508e-05
494 2.706743180169724e-05
495 2.6822208383237012e-05
496 2.652544753800612e-05
497 2.6217800041195005e-05
498 2.5962188374251127e-05
499 2.5743367586983368e-05
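
One way to gain confidence in the hand-derived gradients is to compare them against Autograd on the same data. The sketch below does this on a smaller, hypothetical problem size and in `float64`; both are assumptions chosen to keep the comparison fast and numerically tight, and they are not part of the original setup.

```python
import torch

torch.manual_seed(0)  # arbitrary seed for reproducibility

# Hypothetical small sizes; the assignment uses 64, 1000, 100, 10
N, D_in, H, D_out = 8, 20, 10, 3
dtype = torch.float64  # double precision keeps rounding error negligible

x = torch.randn(N, D_in, dtype=dtype)
y = torch.randn(N, D_out, dtype=dtype)
w1 = torch.randn(D_in, H, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=dtype, requires_grad=True)

# Forward pass, recorded by Autograd
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()
loss.backward()

# Manual backward pass, using the same formulas as the training loop
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

# Compare the hand-computed gradients with the ones Autograd stored
print(torch.allclose(grad_w1, w1.grad))
print(torch.allclose(grad_w2, w2.grad))
```

If the manual derivation is correct, both comparisons should report agreement; a mismatch would usually point to a transposed matrix or a wrong ReLU mask.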


### Using PyTorch's Autograd
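
Before the full network, a minimal scalar sketch may help show what Autograd does: a tensor created with `requires_grad=True` has the operations applied to it recorded, `backward()` runs the chain rule through that record, and the result is stored in `.grad`. The values below are arbitrary and chosen so that the derivative is easy to check by hand.

```python
import torch

# Scalar parameter tracked by Autograd (the value 3.0 is arbitrary)
w = torch.tensor(3.0, requires_grad=True)

# loss = (2w - 1)^2, so d(loss)/dw = 2 * (2w - 1) * 2 = 4 * (2w - 1)
loss = (2.0 * w - 1.0).pow(2)
loss.backward()

print(loss.item())    # 25.0
print(w.grad.item())  # 20.0
```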

In [9]:
import torch
device = torch.device('cpu')

In [11]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights W1 and W2; requires_grad=True tells Autograd to track them
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
  # Forward pass: compute y_pred
  y_pred = x.mm(w1).clamp(min=0).mm(w2)

  # Compute the L2 loss (sum of squared errors)
  loss = (y_pred - y).pow(2).sum()
  print(t, loss.item())

  # Backward pass: compute the gradients of the loss with respect to W1 and W2
  loss.backward()

  # Update the parameters. The update itself should not be recorded by
  # Autograd, so it is wrapped in torch.no_grad()
  with torch.no_grad():
    # Update W1 and W2
    w1 -= learning_rate * w1.grad
    w2 -= learning_rate * w2.grad

    # Reset the stored gradients (the parameters have already been updated)
    w1.grad.zero_()
    w2.grad.zero_()

0 33892836.0
1 30349074.0
2 28023534.0
3 23596050.0
4 17333580.0
5 11178979.0
6 6708922.0
7 4007101.0
8 2520722.5
9 1713096.25
10 1256663.25
11 979047.625
12 795201.5
13 663603.75
14 563584.75
15 484357.3125
16 419877.53125
17 366424.4375
18 321557.125
19 283449.46875
20 250858.921875
21 222856.90625
22 198646.015625
23 177655.96875
24 159328.109375
25 143271.65625
26 129155.03125
27 116688.84375
28 105648.203125
29 95842.125
30 87110.1796875
31 79314.3359375
32 72336.265625
33 66079.9921875
34 60456.43359375
35 55394.609375
36 50826.3671875
37 46694.76171875
38 42951.7578125
39 39558.69921875
40 36476.921875
41 33670.7421875
42 31112.7421875
43 28777.1484375
44 26643.138671875
45 24688.634765625
46 22897.71484375
47 21256.115234375
48 19746.61328125
49 18358.435546875
50 17080.185546875
51 15902.26953125
52 14816.138671875
53 13813.1982421875
54 12886.88671875
55 12030.3330078125
56 11237.6552734375
57 10503.4775390625
58 9822.869140625
59 9192.03515625
60 8605.75
61 8061.0087890625
...

480 0.0003998828469775617
481 0.00039166875649243593
482 0.00038292640238069
483 0.0003746700822375715
484 0.0003664284886326641
485 0.0003589073312468827
486 0.00035116710932925344
487 0.0003434799727983773
488 0.0003369661280885339
489 0.00032892770832404494
490 0.0003222243394702673
491 0.00031660482636652887
492 0.0003101319307461381
493 0.00030313621391542256
494 0.0002965655876323581
495 0.00029008358251303434
496 0.0002851157623808831
497 0.0002796238986775279
498 0.00027335365302860737
499 0.0002685135696083307
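
The two bookkeeping steps in the Autograd training loop, wrapping the update in `torch.no_grad()` and calling `grad.zero_()`, can be seen in isolation with a scalar. This is a minimal sketch with arbitrary values. It shows that `backward()` adds to `.grad` instead of overwriting it, and that an in-place update of a leaf tensor that requires grad raises a `RuntimeError` outside `torch.no_grad()`.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # arbitrary starting value

# Gradients accumulate: d(w^2)/dw = 2w = 4 is added on each call
(w * w).backward()
first = w.grad.item()
(w * w).backward()
second = w.grad.item()
print(first, second)  # 4.0 8.0

# Clearing the gradient gives the next step a clean slate
w.grad.zero_()
print(w.grad.item())  # 0.0

# An in-place update outside no_grad() is rejected by Autograd
try:
    w -= 0.1 * w
    raised = False
except RuntimeError:
    raised = True
print(raised)  # True

# Inside no_grad() an in-place update is allowed and is not tracked
with torch.no_grad():
    w -= 0.5
print(w.item(), w.requires_grad)  # 1.5 True
```

Skipping the `zero_()` call in a training loop would make each step use the sum of all previous gradients, which typically causes the loss to diverge.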
