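### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, and then use PyTorch's Autograd to do the same thing automatically.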

### Implementing differentiation and backpropagation with PyTorch

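Here we implement a simple two-layer neural network for a regression problem. The loss function is the L2 loss, i.e. the squared error summed over every element of the batch:

$$
L2\_loss = \sum (y_{pred}-y)^2
$$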
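A quick numerical check of the formula on small, made-up tensors (the values are illustrative, not from the assignment). Differentiating the loss gives $\partial L / \partial y_{pred} = 2(y_{pred} - y)$, which is exactly the `grad_y_pred` used later in the manual backward pass; the sketch compares it with Autograd.

```python
import torch

# Arbitrary illustrative values (not from the assignment)
y_pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
y = torch.tensor([[0.0, 2.0], [1.0, 5.0]])

# L2 loss as in the formula: squared errors summed over every element
loss = (y_pred - y).pow(2).sum()

# Analytic gradient dL/dy_pred = 2 * (y_pred - y)
grad_manual = 2.0 * (y_pred.detach() - y)

# Autograd gradient for comparison
loss.backward()

print(loss.item())  # 1^2 + 0^2 + 2^2 + (-1)^2 = 6.0
print(grad_manual)
print(torch.allclose(y_pred.grad, grad_manual))
```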

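The two-layer neural network is defined as follows, where $\mathrm{ReLU}(z) = \max(0, z)$ is applied element-wise:
$$
y_{pred} = \mathrm{ReLU}(XW_1)W_2
$$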
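Applying the chain rule to the loss and the network above gives the gradients that the manual backward pass below computes. Here $L$ is the L2 loss, $H = XW_1$, $H_{relu} = \mathrm{ReLU}(H)$, and $\odot$ is element-wise multiplication (notation introduced only for this derivation). The ReLU mask uses $H \ge 0$ to match the code, which zeroes the gradient only where $H < 0$.

$$
\begin{aligned}
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred} - y) \\
\frac{\partial L}{\partial W_2} &= H_{relu}^{\top}\,\frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial H_{relu}} &= \frac{\partial L}{\partial y_{pred}}\,W_2^{\top} \\
\frac{\partial L}{\partial H} &= \frac{\partial L}{\partial H_{relu}} \odot \mathbf{1}[H \ge 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top}\,\frac{\partial L}{\partial H}
\end{aligned}
$$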

```python
import torch
device = torch.device('cpu')
```

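The cells here run on the CPU. As a variation that is not part of the assignment, the same `device` variable can select a CUDA GPU when one is present; `torch.cuda.is_available()` and the `device=` argument are standard PyTorch, while the tensor shape below is just an example.

```python
import torch

# Variation (not part of the assignment): use a CUDA GPU when available,
# otherwise fall back to the CPU as the original cell does
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Tensors created with device=device are allocated on that device
x = torch.randn(64, 1000, device=device)
print(device, x.device)
```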
```python
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize weights W1 and W2
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs (each epoch is a single full-batch update)
for t in range(500):
    # Forward pass: compute y_pred
    h = x.mm(w1)             # mm(): matrix multiplication
    h_relu = h.clamp(min=0)  # clamp(): clip values to [min, max]; with only min=0 this is ReLU
    y_pred = h_relu.mm(w2)

    # Compute the loss (sum of squared errors)
    loss = (y_pred - y).pow(2).sum()  # pow(2): element-wise square
    print(t, loss.item())

    # Backward pass: compute the gradients of the loss w.r.t. W1 and W2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)  # .t(): matrix transpose
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()          # clone(): copy into new memory so grad_h_relu is not modified in place
    grad_h[h < 0] = 0                     # the ReLU gradient is zero where its input was negative
    grad_w1 = x.t().mm(grad_h)

    # Update the parameters
    w1 -= learning_rate * grad_w1  # c -= a is equivalent to c = c - a (done in place)
    w2 -= learning_rate * grad_w2
```

```
0 41847036.0
1 49798516.0
2 59951104.0
3 55661264.0
4 33398964.0
5 12890513.0
6 4435308.0
7 2112165.75
8 1419213.375
9 1108902.5
10 909978.0625
11 760886.5
12 643168.875
13 547784.8125
14 469845.0625
15 405638.75
16 352070.90625
17 307034.09375
18 268954.71875
19 236607.8125
20 208945.03125
21 185155.015625
22 164640.140625
23 146847.40625
24 131359.859375
25 117823.546875
26 105944.953125
27 95483.6796875
28 86244.609375
29 78062.40625
30 70794.828125
31 64321.55078125
32 58544.765625
33 53376.96875
34 48745.9921875
35 44583.14453125
36 40832.75
37 37447.99609375
38 34389.875
39 31620.43359375
40 29109.7578125
41 26829.0546875
42 24756.13671875
43 22867.03125
44 21143.52734375
45 19567.44140625
46 18125.078125
47 16802.849609375
48 15590.126953125
49 14475.78515625
50 13451.224609375
51 12508.53125
52 11640.19921875
53 10839.6083984375
54 10100.798828125
55 9418.0419921875
56 8786.626953125
57 8202.3330078125
58 7661.0869140625
59 7159.3212890625
60 6693.8251953125
61 6261.66943359375
...
385 0.00217788340523839
386 0.002111491747200489
387 0.002042860724031925
388 0.0019797368440777063
389 0.001918720081448555
390 0.0018585558282211423
391 0.001803046790882945
392 0.0017477860674262047
393 0.0016938543412834406
394 0.001642585382796824
395 0.0015923904720693827
396 0.0015451399376615882
397 0.0014973898651078343
398 0.0014555456582456827
399 0.0014108988689258695
400 0.0013706798199564219
401 0.001330900122411549
402 0.00129370903596282
403 0.0012549380771815777
404 0.001217250945046544
405 0.0011827104026451707
406 0.001149546355009079
407 0.0011184015311300755
408 0.001086980919353664
409 0.0010555199114605784
410 0.0010265680029988289
411 0.000998449744656682
412 0.000969636719673872
413 0.0009451324003748596
414 0.0009182497742585838
415 0.0008937395759858191
416 0.0008706483640708029
417 0.0008477579685859382
418 0.0008245648932643235
419 0.0008055306388996542
420 0.0007842566119506955
421 0.0007635051151737571
422 0.000744153163395822
423 0.0007255236268974841
42
```
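One way to confirm the hand-written backward pass is to compare it with Autograd on a small random problem. The sketch below reuses the same forward and backward formulas; the small sizes, the seed, and the `float64` dtype are illustrative choices, not part of the assignment.

```python
import torch

torch.manual_seed(0)            # arbitrary seed for reproducibility
N, D_in, H, D_out = 4, 5, 3, 2  # small illustrative sizes

x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass, identical to the manual version (recorded by Autograd)
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()

# Manual backward pass, run under no_grad so these operations are not recorded
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

# Autograd backward pass fills w1.grad and w2.grad
loss.backward()

print(torch.allclose(grad_w1, w1.grad), torch.allclose(grad_w2, w2.grad))
```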

### Using PyTorch's Autograd
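
A minimal sketch, on an arbitrary scalar, of the Autograd behaviours the training loop below relies on: `backward()` writes $\partial L/\partial w$ into `w.grad`; a second `backward()` *adds* to `w.grad` instead of overwriting it, which is why the loop zeroes the gradients after each update; and the in-place update of a leaf tensor that requires grad is done inside `torch.no_grad()` so it is not recorded.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

loss = w ** 2          # L = w^2, so dL/dw = 2w = 6 at w = 3
loss.backward()
first = w.grad.item()

loss = w ** 2          # a second backward on a fresh graph adds to w.grad: 6 + 6 = 12
loss.backward()
accumulated = w.grad.item()

with torch.no_grad():
    w -= 0.1 * w.grad  # in-place update of a leaf tensor; not recorded by Autograd
    w.grad.zero_()     # clear the accumulated gradient, as the training loop does

print(first, accumulated, w.item(), w.grad.item())
```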

```python
import torch
device = torch.device('cpu')
```

```python
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize weights W1 and W2; requires_grad=True tells Autograd to track them
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs (each epoch is a single full-batch update)
for t in range(500):
    # Forward pass: compute y_pred (matmul -> ReLU via clamp -> matmul)
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute the loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Backward pass: Autograd computes the gradients of the loss w.r.t. W1 and W2
    # and stores them in w1.grad and w2.grad
    loss.backward()

    # Update W1 and W2. The update itself should not be recorded for differentiation,
    # so it runs inside torch.no_grad()
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Clear the stored gradients (the parameters have been updated);
        # otherwise the next backward() would add to them
        w1.grad.zero_()
        w2.grad.zero_()
```

```
0 29153104.0
1 23403852.0
2 21268008.0
3 19614434.0
4 17145148.0
5 13709872.0
6 10018276.0
7 6825849.0
8 4482920.0
9 2937215.25
10 1972416.75
11 1381007.875
12 1014204.625
13 779284.75
14 621967.6875
15 511328.15625
16 429699.28125
17 366958.125
18 317127.09375
19 276459.875
20 242627.09375
21 214066.609375
22 189715.3125
23 168751.71875
24 150623.25
25 134841.890625
26 121007.4140625
27 108881.4140625
28 98206.328125
29 88776.4296875
30 80420.1015625
31 72988.6328125
32 66359.9921875
33 60429.71875
34 55122.94921875
35 50356.4609375
36 46066.98828125
37 42200.3203125
38 38706.06640625
39 35546.1484375
40 32684.59765625
41 30086.3671875
42 27724.5625
43 25574.455078125
44 23614.21484375
45 21824.322265625
46 20188.75
47 18690.658203125
48 17318.08984375
49 16058.8515625
50 14902.623046875
51 13839.517578125
52 12861.3271484375
53 11960.4033203125
54 11129.8603515625
55 10364.126953125
56 9657.005859375
57 9003.5302734375
58 8398.736328125
59 7838.9560546875
60 7320.0078125
61 6838.8896
...
379 0.0013485255185514688
380 0.0013014660216867924
381 0.0012550998944789171
382 0.0012122271582484245
383 0.0011707964586094022
384 0.0011288919486105442
385 0.001091916230507195
386 0.0010538350325077772
387 0.0010183446574956179
388 0.0009841069113463163
389 0.0009512481046840549
390 0.0009198152692988515
391 0.0008895525243133307
392 0.0008601028239354491
393 0.0008313251892104745
394 0.000805255607701838
395 0.0007794237462803721
396 0.0007543391548097134
397 0.0007307855412364006
398 0.000708081410266459
399 0.0006850435165688396
400 0.0006639365456067026
401 0.0006436138064600527
402 0.0006241531227715313
403 0.000605465960688889
404 0.0005870707682333887
405 0.0005697858287021518
406 0.0005530743510462344
407 0.0005371911684051156
408 0.0005212038522586226
409 0.0005058413371443748
410 0.0004928181297145784
411 0.0004786154313478619
412 0.00046462222235277295
413 0.0004513101011980325
414 0.0004392244154587388
415 0.00042771041626110673
416 0.0004146201245021075
417 0.00040318
```
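The manual update and the `zero_()` calls can also be delegated to an optimizer. The sketch below assumes the same sizes, learning rate, and number of epochs as above and uses `torch.optim.SGD` directly on the raw weight tensors: with no momentum or weight decay, `optimizer.step()` applies the same `w -= lr * w.grad` update, and `optimizer.zero_grad()` clears the gradients. The seed is an illustrative addition.

```python
import torch

torch.manual_seed(0)  # illustrative seed
device = torch.device('cpu')
N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

learning_rate = 1e-6
# Plain SGD (no momentum, no weight decay): w -= lr * w.grad for each parameter
optimizer = torch.optim.SGD([w1, w2], lr=learning_rate)

losses = []
for t in range(500):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    losses.append(loss.item())

    optimizer.zero_grad()  # clear gradients from the previous iteration
    loss.backward()        # compute w1.grad and w2.grad
    optimizer.step()       # apply the SGD update without recording it in the graph

print(losses[0], losses[-1])
```

Using an optimizer keeps the update rule in one place, so switching to another rule later means changing only the line that constructs `optimizer`.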