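### Assignment Goal: Differentiation and Backpropagation with PyTorch
In this assignment we first implement differentiation and backpropagation by hand, and then use PyTorch's Autograd to compute the same gradients automatically.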

### Implementing Differentiation and Backpropagation Manually in PyTorch

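Here we implement a simple two-layer neural network for a regression problem. The loss function is the L2 loss, i.e. the squared error summed over all elements of the output (this is what `.sum()` computes in the code):

$$
L2\_loss = \sum (y_{pred}-y)^2
$$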
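To make the summation concrete, the sketch below (illustrative values, not taken from the assignment) evaluates the loss by hand on a 2×2 example and compares it with PyTorch's built-in `torch.nn.functional.mse_loss`. Passing `reduction="sum"` reproduces the summed loss used in this notebook, whereas the default `reduction="mean"` divides by the number of elements, which also scales the gradients down by that count.

```python
import torch
import torch.nn.functional as F

# Tiny hand-checkable example (values chosen for illustration only)
y_pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.tensor([[0.0, 2.0], [1.0, 1.0]])

# L2 loss as used in this notebook: squared errors summed over all elements
manual_loss = ((y_pred - y) ** 2).sum()  # 1 + 0 + 4 + 9

# Built-in equivalent: reduction="sum" matches the summed loss,
# while the default reduction="mean" divides by y.numel() == 4
builtin_sum = F.mse_loss(y_pred, y, reduction="sum")
builtin_mean = F.mse_loss(y_pred, y)

print(manual_loss.item(), builtin_sum.item(), builtin_mean.item())  # 14.0 14.0 3.5
```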

The two-layer neural network is:
$$
y_{pred} = ReLU(XW_1)W_2
$$
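The hand-written backward pass in the first code cell applies the chain rule to this network. Writing $h = XW_1$, $h_{relu} = ReLU(h)$ and $L$ for the L2 loss summed over all elements, the gradients work out as follows, where $\odot$ is element-wise multiplication and $\mathbb{1}[h>0]$ is the 0/1 mask of the positive entries of $h$ (the ReLU derivative at 0 is taken to be 0):

$$
\begin{aligned}
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred} - y) \\
\frac{\partial L}{\partial W_2} &= h_{relu}^{\top}\,\frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial h_{relu}} &= \frac{\partial L}{\partial y_{pred}}\,W_2^{\top} \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial h_{relu}} \odot \mathbb{1}[h > 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top}\,\frac{\partial L}{\partial h}
\end{aligned}
$$

With $X \in \mathbb{R}^{N \times D_{in}}$, $W_1 \in \mathbb{R}^{D_{in} \times H}$ and $W_2 \in \mathbb{R}^{H \times D_{out}}$, each gradient has the same shape as the tensor it is taken with respect to, e.g. $h_{relu}^{\top}$ is $H \times N$ and $\partial L / \partial y_{pred}$ is $N \times D_{out}$, so $\partial L / \partial W_2$ is $H \times D_{out}$.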

In [1]:
import torch
device = torch.device('cpu')


In [8]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x, y
###<your code>###
x = torch.randn((N, D_in)).to(device)
y = torch.randn((N, D_out)).to(device)

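# Initialize weights W1, W2
###<your code>###
# Gradients are computed by hand below, so autograd tracking (requires_grad) is not needed here
W1 = torch.randn(D_in, H, device=device)
W2 = torch.randn(H, D_out, device=device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs (full batch, so one update per epoch)
for t in range(500):
    # Forward pass: compute y_pred
    ###<your code>###
    # y_pred = ReLU(X W1) W2
    h = torch.matmul(x, W1)  # hidden pre-activation, shape (N, H)
    h_relu = torch.relu(h)   # hidden activation, shape (N, H)
    y_pred = h_relu.mm(W2)   # prediction, shape (N, D_out)

    # Compute the loss
    ###<your code>###
    loss = ((y_pred - y)**2).sum()    # L2 loss = sum((y_pred - y)^2)
    print(t, loss.item())

    # Backward pass: compute the gradients of the loss with respect to W1 and W2
    ###<your code>###
    y_pred_grad = 2. * (y_pred - y)        # dL/dy_pred
    W2_grad = h_relu.T.mm(y_pred_grad)     # dL/dW2 = h_relu^T @ dL/dy_pred
    
    h_relu_grad = y_pred_grad.mm(W2.T)     # dL/dh_relu = dL/dy_pred @ W2^T
    h_grad = h_relu_grad * (h > 0).to(h.dtype)  # ReLU passes the gradient only where h > 0
    W1_grad = x.T.mm(h_grad)               # dL/dW1 = X^T @ dL/dh
    
    # Parameter update (gradient descent step)
    ###<your code>###
    W1 -= learning_rate * W1_grad
    W2 -= learning_rate * W2_grad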

0 33839764.0
1 31205788.0
2 32969728.0
3 32215580.0
4 26178854.0
5 16646683.0
6 8876494.0
7 4398068.5
8 2341312.5
9 1427207.0
10 998654.625
11 767756.0
12 623547.3125
13 521939.4375
14 444397.53125
15 382444.21875
16 331626.75
17 289332.25
18 253696.859375
19 223444.40625
20 197606.109375
21 175448.140625
22 156296.625
23 139659.796875
24 125182.4921875
25 112549.28125
26 101453.609375
27 91684.3046875
28 83031.5859375
29 75343.265625
30 68494.9140625
31 62377.23828125
32 56898.7734375
33 51980.0703125
34 47553.91796875
35 43562.22265625
36 39958.1796875
37 36698.26953125
38 33742.25
39 31062.29296875
40 28626.740234375
41 26406.251953125
42 24381.021484375
43 22529.369140625
44 20834.982421875
45 19282.8125
46 17859.27734375
47 16551.751953125
48 15349.9921875
49 14245.232421875
50 13228.9453125
51 12291.9814453125
52 11427.984375
53 10630.908203125
54 9893.6962890625
55 9212.60546875
56 8582.1923828125
57 7999.1552734375
58 7459.46630859375
59 6959.11572265625
60 6495.1044921875
...

412 0.00030594898271374404
413 0.0002993274829350412
414 0.00029172596987336874
415 0.0002843031252268702
416 0.0002776019391603768
417 0.00027025595773011446
418 0.00026396819157525897
419 0.000257790059549734
420 0.0002522027643863112
421 0.0002457919472362846
422 0.00023980948026292026
423 0.00023410332505591214
424 0.00022871461987961084
425 0.0002244000497739762
426 0.00021903218294028193
427 0.0002139952703146264
428 0.00020883878460153937
429 0.00020531739573925734
430 0.00020031564054079354
431 0.0001961431698873639
432 0.0001919066853588447
433 0.0001881696516647935
434 0.00018412043573334813
435 0.00018020826973952353
436 0.0001760190207278356
437 0.0001726032205624506
438 0.0001692256482783705
439 0.00016537094779778272
440 0.00016193214105442166
441 0.0001584652200108394
442 0.00015534287376794964
443 0.00015238785999827087
444 0.0001494140160502866
445 0.00014647174975834787
446 0.00014364758681040257
447 0.00014093401841819286
448 0.00013809726806357503
449 0.000136001777
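Before handing differentiation over to autograd, it is worth checking that the hand-derived gradients are correct. The sketch below is not part of the original assignment: the small sizes, the seed and the float64 dtype are arbitrary choices for the check. It computes `W1_grad` and `W2_grad` with the same formulas as the manual loop and compares them against the gradients autograd produces for the same loss.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Small sizes chosen for a quick check (not the notebook's N, D_in, H, D_out)
N, D_in, H, D_out = 8, 5, 4, 3
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
W1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
W2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass, identical to the manual implementation
h = x.mm(W1)
h_relu = torch.relu(h)
y_pred = h_relu.mm(W2)
loss = ((y_pred - y) ** 2).sum()

# Manual gradients with the hand-derived formulas;
# detach() keeps these computations out of the autograd graph
y_pred_grad = 2.0 * (y_pred - y).detach()
W2_grad = h_relu.detach().T.mm(y_pred_grad)
h_grad = y_pred_grad.mm(W2.detach().T) * (h.detach() > 0).to(h.dtype)
W1_grad = x.T.mm(h_grad)

# Autograd gradients for the same loss, stored in W1.grad and W2.grad
loss.backward()

print(torch.allclose(W1_grad, W1.grad), torch.allclose(W2_grad, W2.grad))
```

Both comparisons should agree, because autograd applies the same chain-rule steps (including the ReLU mask) that the manual backward pass writes out explicitly.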

### Using PyTorch's Autograd

In [9]:
import torch
device = torch.device('cpu')

In [15]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x, y
###<your code>###
x = torch.randn((N, D_in)).to(device)
y = torch.randn((N, D_out)).to(device)

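# Initialize weights W1, W2
###<your code>###
# Create the weights directly on `device` with requires_grad=True so they are leaf tensors
# whose .grad autograd fills in (on a GPU, a later .to(device) would return a non-leaf copy)
W1 = torch.randn(D_in, H, device=device, requires_grad=True)
W2 = torch.randn(H, D_out, device=device, requires_grad=True)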

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs (full batch, so one update per epoch)
for t in range(500):
    # Forward pass: compute y_pred
    ###<your code>###
    y_pred = torch.mm(torch.relu(torch.mm(x, W1)), W2)   # y_pred = ReLU(X W1) W2
    
    # Compute the loss
    ###<your code>###
    loss = ((y_pred - y)**2).sum()    # L2 loss = sum((y_pred - y)^2)
    print(t, loss.item())

    # Backward pass: autograd computes the gradients of the loss w.r.t. W1 and W2 and stores them in W1.grad and W2.grad
    ###<your code>###
    loss.backward()
    
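    # Parameter update: the update itself should not be recorded by autograd, so it runs inside torch.no_grad()
    with torch.no_grad():
        # Update W1 and W2 in place (gradient descent step)
        ###<your code>###
        W1 -= learning_rate * W1.grad
        W2 -= learning_rate * W2.grad
        
        # Zero the stored gradients now that they have been used; otherwise the next backward() would add to them
        W1.grad.zero_()
        W2.grad.zero_()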

0 27579528.0


RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
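The `RuntimeError` above is what `backward()` raises when the tensor it is called on was not computed from any tensor with `requires_grad=True`, so autograd has no graph to differentiate. The sketch below (tensor values chosen only for illustration) reproduces that error, then shows the two behaviours the update step relies on: gradients accumulate in `.grad` across `backward()` calls until they are zeroed, and an in-place update inside `torch.no_grad()` is not recorded by autograd, so the weight stays a leaf tensor.

```python
import torch

# 1) backward() on a result with no requires_grad input raises RuntimeError
w_plain = torch.tensor([1.0, 2.0])  # requires_grad defaults to False
try:
    (w_plain ** 2).sum().backward()
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# 2) gradients accumulate in .grad until they are explicitly zeroed
w = torch.tensor([1.0, 2.0], requires_grad=True)
(w ** 2).sum().backward()       # d/dw sum(w^2) = 2w = [2, 4]
first = w.grad.clone()
(w ** 2).sum().backward()       # adds another [2, 4]
accumulated = w.grad.clone()    # [4, 8]
w.grad.zero_()                  # reset before the next iteration

# 3) an in-place update inside no_grad() is not tracked by autograd
with torch.no_grad():
    w -= 0.1 * first            # w becomes approximately [0.8, 1.6]
print(w, w.requires_grad, w.grad_fn)  # still a leaf: requires_grad=True, grad_fn=None
```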