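### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, then repeat the exercise using PyTorch's Autograd.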

### Implementing differentiation and backpropagation with PyTorch

Here we implement a simple two-layer neural network for a regression problem, with the L2 loss as the loss function:

$$
\mathrm{L2\_loss} = (y_{pred} - y)^2
$$
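
As a small illustration of the formula above, the sketch below evaluates the L2 loss on hand-picked values. The tensors `y_pred` and `y` are made-up numbers, and summing the per-element squared errors into a single scalar is an assumption about how the loss is reduced over a batch; the formula itself only gives the per-element term.

```python
import torch

# Hypothetical predictions and targets, chosen only for illustration
y_pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.tensor([[0.0, 2.0], [1.0, 5.0]])

# Per-element squared error, (y_pred - y)^2
squared_error = (y_pred - y).pow(2)

# Assumed reduction: sum over every element to obtain one scalar loss
loss = squared_error.sum()

print(squared_error)
print(loss.item())
```

A scalar loss is what makes a single backward pass well defined, which is why a reduction such as `sum()` is typically applied before differentiating.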

The two-layer neural network is defined as follows:
$$
y_{pred} = \mathrm{ReLU}(XW_1)W_2
$$
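
For the backward pass, the gradients follow from the chain rule. The derivation below is a sketch that assumes the loss is summed over all elements, $L = \sum (y_{pred} - y)^2$. It introduces the intermediate symbols $h = XW_1$ and $h_{relu} = \mathrm{ReLU}(h)$, which are not part of the original formula; $\odot$ denotes the element-wise product and $\mathbb{1}[h > 0]$ is 1 where $h$ is positive and 0 elsewhere.

$$
\begin{aligned}
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred} - y) \\
\frac{\partial L}{\partial W_2} &= h_{relu}^{\top}\,\frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial h_{relu}} &= \frac{\partial L}{\partial y_{pred}}\,W_2^{\top} \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial h_{relu}} \odot \mathbb{1}[h > 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top}\,\frac{\partial L}{\partial h}
\end{aligned}
$$

Each line reuses the result of the previous one, which is the sense in which the error is propagated backward from the output toward the first layer.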

In [1]:
import torch
device = torch.device('cpu')

In [None]:
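# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# The exercise blanks (###<your code>###) are filled in below with one possible solution.

# Randomly generate x and y (plain data tensors; gradients are computed by hand here)
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights W1 and W2
W1 = torch.randn(D_in, H, device=device)
W2 = torch.randn(H, D_out, device=device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
  # Forward pass: compute y_pred = ReLU(x W1) W2
  h = x.mm(W1)
  h_relu = h.clamp(min=0)
  y_pred = h_relu.mm(W2)

  # Compute the loss (squared error summed over all elements)
  loss = (y_pred - y).pow(2).sum()
  print(t, loss.item())

  # Backward pass: compute the gradients of the loss with respect to W1 and W2
  grad_y_pred = 2.0 * (y_pred - y)
  grad_W2 = h_relu.t().mm(grad_y_pred)
  grad_h_relu = grad_y_pred.mm(W2.t())
  grad_h = grad_h_relu.clone()
  grad_h[h < 0] = 0  # ReLU blocks the gradient where its input was negative
  grad_W1 = x.t().mm(grad_h)

  # Update the parameters with gradient descent
  W1 -= learning_rate * grad_W1
  W2 -= learning_rate * grad_W2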

0 41012208.0
1 41455604.0
2 40426592.0
3 32099776.0
4 19425206.0
5 9610482.0
6 4653895.5
7 2602401.75
8 1738951.375
9 1311661.25
10 1053339.625
11 871963.9375
12 733717.875
13 623637.0625
14 533976.5
15 459877.0625
16 398296.5625
17 346628.4375
18 302947.125
19 265851.875
20 234216.125
21 207061.71875
22 183648.8125
23 163395.140625
24 145782.21875
25 130419.609375
26 116946.671875
27 105060.890625
28 94580.703125
29 85307.1328125
30 77072.1875
31 69750.78125
32 63229.1015625
33 57411.3671875
34 52204.4140625
35 47530.4765625
36 43331.24609375
37 39551.6328125
38 36146.0390625
39 33069.3359375
40 30287.869140625
41 27768.982421875
42 25482.6328125
43 23407.19921875
44 21519.986328125
45 19802.5625
46 18236.45703125
47 16807.27734375
48 15501.2158203125
49 14307.0703125
50 13215.982421875
51 12215.443359375
52 11297.6728515625
53 10455.720703125
54 9682.275390625
55 8971.400390625
56 8317.248046875
57 7715.3173828125
58 7160.8759765625
59 6649.7197265625
60 6178.23583984375
61 5742.6206

### Using PyTorch's Autograd
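
Autograd records the operations applied to tensors created with `requires_grad=True` and computes their gradients when `backward()` is called. The sketch below uses a deliberately tiny network to compare the gradients Autograd produces against gradients derived by hand with the chain rule. The sizes, the fixed seed, and names such as `manual_grad_W1` are assumptions chosen for illustration, not part of the assignment.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable

# Tiny, hypothetical sizes (much smaller than the exercise) for a quick check
N, D_in, H, D_out = 4, 5, 3, 2
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
W1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
W2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass; Autograd records each operation that involves W1 or W2
h = x.mm(W1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(W2)
loss = (y_pred - y).pow(2).sum()

# Autograd fills in W1.grad and W2.grad
loss.backward()

# Hand-derived gradients via the chain rule, computed without tracking
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    manual_grad_W2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(W2.t())
    grad_h[h <= 0] = 0  # ReLU passes gradient only where its input was positive
    manual_grad_W1 = x.t().mm(grad_h)

grads_match = (torch.allclose(W1.grad, manual_grad_W1)
               and torch.allclose(W2.grad, manual_grad_W2))
print("gradients match:", grads_match)
```

Note that `backward()` adds to `.grad` rather than overwriting it, so a training loop needs to reset the gradients after every parameter update.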

In [None]:
import torch
device = torch.device('cpu')

In [None]:
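# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# The exercise blanks (###<your code>###) are filled in below with one possible solution.

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights W1 and W2; requires_grad=True asks Autograd to track them
W1 = torch.randn(D_in, H, device=device, requires_grad=True)
W2 = torch.randn(H, D_out, device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
  # Forward pass: compute y_pred = ReLU(x W1) W2
  y_pred = x.mm(W1).clamp(min=0).mm(W2)

  # Compute the loss (squared error summed over all elements)
  loss = (y_pred - y).pow(2).sum()
  print(t, loss.item())

  # Backward pass: Autograd computes the gradients of the loss with respect to W1 and W2
  loss.backward()

  # Parameter update: the update itself should not be recorded by Autograd,
  # so it runs inside torch.no_grad()
  with torch.no_grad():
    # Update the parameters W1 and W2
    W1 -= learning_rate * W1.grad
    W2 -= learning_rate * W2.grad

    # Clear the recorded gradients (the parameters have already been updated)
    W1.grad.zero_()
    W2.grad.zero_()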

0 34531528.0
1 37563884.0
2 45605692.0
3 47743296.0
4 37549408.0
5 19983466.0
6 8218459.0
7 3358081.75
8 1769839.75
9 1200070.875
10 933849.8125
11 768861.875
12 647828.125
13 552127.8125
14 474321.3125
15 409854.5
16 355987.53125
17 310554.46875
18 271993.3125
19 239127.03125
20 210943.40625
21 186672.421875
22 165673.984375
23 147426.53125
24 131516.640625
25 117607.96875
26 105414.3125
27 94697.796875
28 85241.1640625
29 76882.578125
30 69470.8125
31 62886.90625
32 57027.1640625
33 51796.1875
34 47115.85546875
35 42919.08203125
36 39149.59375
37 35757.9609375
38 32700.904296875
39 29940.25390625
40 27445.466796875
41 25186.919921875
42 23138.40234375
43 21277.962890625
44 19586.13671875
45 18047.291015625
46 16644.80859375
47 15364.494140625
48 14194.1484375
49 13123.5673828125
50 12142.908203125
51 11244.23046875
52 10419.24609375
53 9661.384765625
54 8964.6005859375
55 8323.494140625
56 7732.97802734375
57 7188.859375
58 6686.65673828125
59 6223.0908203125
60 5794.66650390625
61 5

431 0.00015894130046945065
432 0.00015560245083179325
433 0.00015244056703522801
434 0.00014918063243385404
435 0.00014623792958445847
436 0.00014322339848149568
437 0.0001401646004524082
438 0.0001371508842566982
439 0.00013460413902066648
440 0.00013200324610807002
441 0.00012920792505610734
442 0.0001268077758140862
443 0.0001243362348759547
444 0.00012227214756421745
445 0.00011963630095124245
446 0.00011724793148459867
447 0.00011501400149427354
448 0.00011302570783300325
449 0.00011090389307355508
450 0.00010875487350858748
451 0.00010680056584533304
452 0.00010480251512490213
453 0.00010332858073525131
454 0.00010135072079719976
455 9.952658729162067e-05
456 9.77267773123458e-05
457 9.615623275749385e-05
458 9.459262946620584e-05
459 9.324532584287226e-05
460 9.124297503149137e-05
461 8.984180021798238e-05
462 8.840537338983268e-05
463 8.65921683725901e-05
464 8.517434616805986e-05
465 8.388777496293187e-05
466 8.267535304185003e-05
467 8.130612695822492e-05
468 8.00859561422839