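### Assignment Goal: Differentiation and Backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, and then do the same using PyTorch's Autograd.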

### Implementing Differentiation and Backpropagation with PyTorch

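Here we implement a simple two-layer neural network for a regression problem, using the L2 loss as the loss function. In the code, the squared error is summed over all elements of the batch.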

$$
L2\_loss = (y_{pred}-y)^2
$$

The two-layer neural network is defined as
$$
y_{pred} = ReLU(XW_1)W_2
$$
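
The manual backward pass in the code that follows applies the chain rule to this network. As a sketch, write $H = XW_1$ and $A = ReLU(H)$ (names introduced here for illustration; they correspond to `h` and `h_relu` in the code), and take $L$ to be the squared error summed over all elements. Under those assumptions the gradients are:

$$
\begin{aligned}
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred} - y) \\
\frac{\partial L}{\partial W_2} &= A^{T}\,\frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial H} &= \left(\frac{\partial L}{\partial y_{pred}}\,W_2^{T}\right) \odot \mathbb{1}[H > 0] \\
\frac{\partial L}{\partial W_1} &= X^{T}\,\frac{\partial L}{\partial H}
\end{aligned}
$$

Here $\odot$ is the element-wise product and $\mathbb{1}[H > 0]$ is the derivative of ReLU, which is 1 where the pre-activation is positive and 0 elsewhere.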

In [1]:
import torch
device = torch.device('cpu')

In [9]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
###<your code>###
x = torch.randn((N, D_in)).to(device)
y = torch.randn((N, D_out)).to(device)

# Initialize the weights W1 and W2
###<your code>###
W1 = torch.randn((D_in, H)).to(device)
W2 = torch.randn((H, D_out)).to(device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    ###<your code>###
    h = torch.matmul(x, W1)
    h_relu = torch.relu(h)
    y_pred = torch.matmul(h_relu, W2)
    
    # Compute the loss (sum of squared errors)
    ###<your code>###
    loss = torch.square(y_pred - y).sum()
    print(t, loss.item())
    
    # Backward pass: compute the gradients of the loss with respect to W1 and W2
    ###<your code>###
    y_pred_grad = 2. * (y_pred - y)
    W2_grad = h_relu.T.mm(y_pred_grad)
    h_grad = y_pred_grad.mm(W2.T) * (h > 0.)
    W1_grad = x.T.mm(h_grad)
    
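    # Parameter update (plain gradient descent; these tensors do not track gradients,
    # so an ordinary in-place update is enough)
    ###<your code>###
    W1 -= learning_rate * W1_grad
    W2 -= learning_rate * W2_grad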

0 30877814.0
1 25664636.0
2 23724748.0
3 21504712.0
4 17959650.0
5 13308940.0
6 8968765.0
7 5662622.0
8 3553947.5
9 2299704.5
10 1577733.25
11 1151679.5
12 889021.5
13 715838.0
14 594132.0
15 503397.0625
16 432501.78125
17 375219.84375
18 327848.3125
19 288076.375
20 254331.078125
21 225451.671875
22 200538.71875
23 178981.671875
24 160187.921875
25 143737.53125
26 129282.796875
27 116555.78125
28 105313.1484375
29 95347.0625
30 86483.9765625
31 78584.2734375
32 71531.765625
33 65227.546875
34 59524.4296875
35 54403.4765625
36 49794.20703125
37 45641.6484375
38 41887.55859375
39 38489.24609375
40 35407.30859375
41 32607.404296875
42 30060.048828125
43 27739.62890625
44 25624.55078125
45 23695.21484375
46 21930.302734375
47 20314.876953125
48 18835.12109375
49 17477.384765625
50 16230.4404296875
51 15084.1357421875
52 14028.91796875
53 13057.0224609375
54 12160.2568359375
55 11331.9580078125
56 10566.3818359375
57 9858.6884765625
58 9203.775390625
59 8597.158203125
60 8034.775390625
61 

457 0.00033793391776271164
458 0.00033066459582187235
459 0.0003224685206077993
460 0.00031484526698477566
461 0.0003075042914133519
462 0.000299794424790889
463 0.0002932200441136956
464 0.00028658186784014106
465 0.0002798507921397686
466 0.0002735923044383526
467 0.00026745651848614216
468 0.0002611943054944277
469 0.0002548719057813287
470 0.0002498402609489858
471 0.00024412726634182036
472 0.0002382778184255585
473 0.00023297552252188325
474 0.00022865315258968621
475 0.00022349655046127737
476 0.00021876487880945206
477 0.00021417617972474545
478 0.00020966822921764106
479 0.00020500669779721648
480 0.0002006172580877319
481 0.0001964632683666423
482 0.00019329518545418978
483 0.0001891322317533195
484 0.00018513677059672773
485 0.00018111859390046448
486 0.00017713646229822189
487 0.00017371811554767191
488 0.00017055297212209553
489 0.00016649722238071263
490 0.00016378147120121866
491 0.0001605614525033161
492 0.00015732862812001258
493 0.00015446508768945932
494 0.0001515182
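
One way to gain confidence in hand-written gradients like the ones above is to compare them with what autograd computes for the same forward pass. The following is a minimal sketch of such a check, not part of the original assignment: the small tensor sizes, the fixed seed, and the use of `float64` are assumptions chosen to keep the comparison fast and free of rounding noise.

```python
import torch

torch.manual_seed(0)  # fixed seed so the check is reproducible (assumption)

# Small sizes keep the check fast; float64 avoids float32 rounding noise.
N, D_in, H, D_out = 8, 20, 10, 3
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
W1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
W2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass, same form as the training loop.
h = x.mm(W1)
h_relu = torch.relu(h)
y_pred = h_relu.mm(W2)
loss = torch.square(y_pred - y).sum()

# Reference gradients from autograd.
loss.backward()

# Hand-derived gradients, using the same formulas as the manual loop.
with torch.no_grad():
    y_pred_grad = 2.0 * (y_pred - y)
    W2_grad = h_relu.T.mm(y_pred_grad)
    h_grad = y_pred_grad.mm(W2.T) * (h > 0)
    W1_grad = x.T.mm(h_grad)

w1_matches = torch.allclose(W1_grad, W1.grad)
w2_matches = torch.allclose(W2_grad, W2.grad)
print(w1_matches, w2_matches)
```

If either comparison reports `False`, the usual suspects are a missing transpose or a ReLU mask built from the activation output of a different layer.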

### Using PyTorch's Autograd
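
Autograd records the operations applied to tensors created with `requires_grad=True`. When `backward()` is called on a scalar result, it stores the gradient in each such tensor's `.grad` attribute. As a minimal sketch, the scalar function below is chosen purely for illustration and is not part of the assignment:

```python
import torch

# A scalar parameter that autograd should track.
w = torch.tensor(3.0, requires_grad=True)

# loss = (2w - 1)^2, so d(loss)/dw = 4 * (2w - 1).
loss = (2.0 * w - 1.0) ** 2
loss.backward()

# Analytic derivative evaluated at w = 3, for comparison with w.grad.
analytic = 4.0 * (2.0 * 3.0 - 1.0)
print(w.grad.item(), analytic)
```

The training loop in this section relies on the same mechanism, with matrices in place of the scalar.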

In [4]:
import torch
device = torch.device('cpu')

In [11]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

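# Randomly generate x and y
###<your code>###
x = torch.randn((N, D_in)).to(device)
y = torch.randn((N, D_out)).to(device)

# Initialize the weights W1 and W2
# Create them directly on the target device so they stay leaf tensors:
# calling .to(device) on a tensor that requires grad returns a non-leaf copy
# when the device differs, and the .grad of that copy is never populated.
###<your code>###
W1 = torch.randn((D_in, H), device=device, requires_grad=True)
W2 = torch.randn((H, D_out), device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    ###<your code>###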
    y_pred = torch.matmul(torch.relu(torch.matmul(x, W1)), W2)

    # Compute the loss (sum of squared errors)
    ###<your code>###
    loss = torch.square(y_pred - y).sum()
    print(t, loss.item())

    # Backward pass: compute the gradients of the loss with respect to W1 and W2
    ###<your code>###
    loss.backward()

    # Parameter update: the update itself should not be tracked by autograd,
    # so it is wrapped in torch.no_grad()
    with torch.no_grad():
        # Update the parameters W1 and W2
        ###<your code>###
        W1 -= learning_rate * W1.grad
        W2 -= learning_rate * W2.grad

        # Clear the stored gradients (the parameters have already been updated)
        W1.grad.zero_()
        W2.grad.zero_()

0 32655160.0
1 29188582.0
2 32827338.0
3 37466384.0
4 36880880.0
5 28126662.0
6 16215664.0
7 7621471.0
8 3465914.25
9 1772934.375
10 1089715.75
11 778285.625
12 608539.125
13 498645.46875
14 418796.0
15 356572.375
16 306209.65625
17 264707.65625
18 230073.265625
19 200908.234375
20 176198.171875
21 155127.484375
22 137028.265625
23 121403.5
24 107864.953125
25 96065.875
26 85750.4296875
27 76707.21875
28 68752.96875
29 61728.92578125
30 55516.32421875
31 50016.1015625
32 45129.43359375
33 40775.13671875
34 36883.6953125
35 33403.0078125
36 30286.08984375
37 27495.4140625
38 24985.65625
39 22728.03125
40 20693.947265625
41 18857.12109375
42 17196.701171875
43 15695.0390625
44 14334.181640625
45 13099.7548828125
46 11978.8427734375
47 10961.2890625
48 10036.427734375
49 9194.830078125
50 8429.4052734375
51 7730.5205078125
52 7092.95458984375
53 6511.310546875
54 5980.44091796875
55 5495.40185546875
56 5051.90625
57 4646.017578125
58 4274.8525390625
59 3934.46923828125
60 3622.60693359375

472 1.9071338101639412e-05
473 1.8934526451630518e-05
474 1.86634024430532e-05
475 1.834696377045475e-05
476 1.822057311073877e-05
477 1.812049413274508e-05
478 1.79757898877142e-05
479 1.7801872672862373e-05
480 1.767390131135471e-05
481 1.7509719327790663e-05
482 1.7342768842354417e-05
483 1.72478576132562e-05
484 1.7149586710729636e-05
485 1.698795131233055e-05
486 1.6671925550326705e-05
487 1.6570840671192855e-05
488 1.6498246623086743e-05
489 1.6347526980098337e-05
490 1.6269619663944468e-05
491 1.6147459973581135e-05
492 1.5913719835225493e-05
493 1.57997892529238e-05
494 1.5744830307085067e-05
495 1.5512550817220472e-05
496 1.5485096810152754e-05
497 1.5336323485826142e-05
498 1.52618158608675e-05
499 1.5153023014136124e-05
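
The loop above depends on two autograd behaviours: gradients accumulate across `backward()` calls unless they are cleared, and in-place updates inside `torch.no_grad()` are not recorded. The sketch below illustrates both on a toy tensor. The tensor `w`, the function `2w`, and the step size `0.1` are hypothetical values chosen for illustration only.

```python
import torch

w = torch.ones(3, requires_grad=True)

# First backward pass: d(sum(2w))/dw = 2 for every element.
(2.0 * w).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one.
(2.0 * w).sum().backward()
accumulated = w.grad.clone()

# An in-place update inside torch.no_grad() is not recorded by autograd,
# so w stays a leaf tensor and can be reused in the next iteration.
with torch.no_grad():
    w -= 0.1 * w.grad
    w.grad.zero_()

print(first)
print(accumulated)
print(w.is_leaf, w.grad)
```

Without the `zero_()` calls, each iteration of the training loop would step along the sum of all gradients seen so far instead of the current one.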
