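### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, then repeat the same training with PyTorch's Autograd.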

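### Implementing differentiation and backpropagation with PyTorch

Here we implement a simple two-layer neural network for a regression problem, using the L2 loss as the loss function. The squared error is summed over every element of the batch to give a scalar:

$$
L2\_loss = \sum (y_{pred}-y)^2
$$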
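
As a small illustration of this loss, the sketch below evaluates it on hand-picked 2×2 tensors (the values are made up for the example) and sums the squared errors over every element, which is how the training code in this notebook reduces the loss to a scalar.

```python
import torch

# Hypothetical toy values, chosen so the arithmetic is easy to follow.
y_pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.tensor([[1.0, 1.0], [1.0, 4.0]])

# Element-wise squared error, then a sum over all elements.
squared_error = torch.square(y_pred - y)   # [[0, 1], [4, 0]]
loss = squared_error.sum()

print(loss.item())  # 5.0
```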

The two-layer neural network is defined as
$$
y_{pred} = ReLU(XW_1)W_2
$$
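
For reference, applying the chain rule to this network gives the gradients used in the hand-written backward pass. This is a sketch of the standard derivation, assuming the loss $L$ is the squared error summed over all elements. The symbols $h$ and $a$ and the indicator $\mathbb{1}[h > 0]$ are introduced here for illustration, $\odot$ denotes the element-wise product, and the derivative of ReLU at exactly $0$ is taken to be $0$.

$$
\begin{aligned}
h &= XW_1, \qquad a = ReLU(h), \qquad y_{pred} = aW_2 \\
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred} - y) \\
\frac{\partial L}{\partial W_2} &= a^{T}\,\frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial h} &= \left(\frac{\partial L}{\partial y_{pred}}\,W_2^{T}\right) \odot \mathbb{1}[h > 0] \\
\frac{\partial L}{\partial W_1} &= X^{T}\,\frac{\partial L}{\partial h}
\end{aligned}
$$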

In [1]:
import torch
device = torch.device('cpu')

In [18]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate the inputs x and the targets y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights W1 and W2
W1 = torch.randn(D_in, H, device=device)
W2 = torch.randn(H, D_out, device=device)

# Set the learning rate
learning_rate = 1e-6

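# Train for 500 iterations (each one is a full-batch gradient step)
for t in range(500):
  # Forward pass: compute y_pred
  ###<your code>###
  h = torch.matmul(x, W1)
  h_relu = torch.relu(h)
  y_pred = torch.matmul(h_relu, W2)

  # Compute the loss (squared error summed over all elements)
  loss = torch.square(y_pred - y).sum()
  print(t, loss.item())

  # Backward pass: compute the gradients of the loss with respect to W1 and W2
  y_pred_grad = 2. * (y_pred - y)
  W2_gradient = h_relu.T.matmul(y_pred_grad)
  # ReLU passes the gradient through only where its input was positive
  h_gradient = y_pred_grad.mm(W2.T) * (h > 0.)
  W1_gradient = x.T.mm(h_gradient)

  # Update the parameters with one gradient-descent step
  ###<your code>###
  W1 -= learning_rate * W1_gradient
  W2 -= learning_rate * W2_gradient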

0 29931504.0
1 24328968.0
2 23049052.0
3 22647408.0
4 20780420.0
5 17184488.0
6 12428644.0
7 8168848.0
8 5046410.0
9 3117892.0
10 1997661.75
11 1365098.625
12 998012.0
13 774090.25
14 627804.4375
15 525428.625
16 449163.9375
17 389533.28125
18 341230.125
19 301103.0625
20 267139.90625
21 238044.78125
22 212885.671875
23 190995.71875
24 171837.390625
25 155014.25
26 140176.15625
27 127026.171875
28 115331.15625
29 104896.859375
30 95558.8359375
31 87178.5234375
32 79640.1640625
33 72848.78125
34 66719.140625
35 61192.6328125
36 56186.91015625
37 51641.58984375
38 47509.7421875
39 43747.6015625
40 40325.4765625
41 37201.171875
42 34344.359375
43 31731.744140625
44 29337.689453125
45 27141.669921875
46 25128.158203125
47 23287.494140625
48 21594.80078125
49 20038.150390625
50 18603.849609375
51 17280.5390625
52 16058.4345703125
53 14929.708984375
54 13886.9619140625
55 12924.8251953125
56 12034.1923828125
57 11209.3369140625
58 10445.1904296875
59 9736.560546875
60 9079.30078125
61 8469.3
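
One way to gain confidence in a hand-written backward pass is to compare it against Autograd on a small problem. The sketch below is illustrative and not part of the assignment: the tiny sizes, the fixed seed, and the `*_manual` / `*_auto` names are assumptions made for the example, and `float64` is used only to keep the comparison numerically tight.

```python
import torch

torch.manual_seed(0)

# Small, hypothetical sizes; float64 keeps the comparison numerically tight.
N, D_in, H, D_out = 4, 6, 5, 3
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
W1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
W2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass, with the same structure as the training loop.
h = x.matmul(W1)
h_relu = torch.relu(h)
y_pred = h_relu.matmul(W2)
loss = torch.square(y_pred - y).sum()

# Reference gradients computed by autograd.
W1_auto, W2_auto = torch.autograd.grad(loss, (W1, W2))

# Hand-written backward pass; no graph is needed for these formulas.
with torch.no_grad():
    y_pred_grad = 2.0 * (y_pred - y)
    W2_manual = h_relu.T.matmul(y_pred_grad)
    h_grad = y_pred_grad.matmul(W2.T) * (h > 0)
    W1_manual = x.T.matmul(h_grad)

# Both comparisons should agree up to floating-point rounding.
print(torch.allclose(W1_manual, W1_auto), torch.allclose(W2_manual, W2_auto))
```

If either comparison fails after a change to the backward pass, the mismatch usually points to a transposed operand or a missing ReLU mask.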

### Using PyTorch's Autograd

In [None]:
import torch
device = torch.device('cpu')

In [22]:
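# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate the inputs x and the targets y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights W1 and W2 directly on the device so they stay leaf
# tensors; calling .to(device) after requires_grad=True can return a non-leaf
# copy (e.g. on a GPU) whose .grad is never populated
W1 = torch.randn(D_in, H, device=device, requires_grad=True)
W2 = torch.randn(H, D_out, device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6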

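# Train for 500 iterations (each one is a full-batch gradient step)
for t in range(500):
  # Forward pass: compute y_pred
  h = torch.matmul(x, W1)
  h_relu = torch.relu(h)
  y_pred = torch.matmul(h_relu, W2)

  # Compute the loss
  ###<your code>###
  loss = torch.square(y_pred - y).sum()
  print(t, loss.item())

  # Backward pass: autograd computes the gradients of the loss with respect to W1 and W2
  loss.backward()

  # Parameter update: the update itself should not be recorded by autograd,
  # so it is wrapped in torch.no_grad()
  with torch.no_grad():
    # Update W1 and W2
    W1 -= learning_rate * W1.grad
    W2 -= learning_rate * W2.grad

    # Clear the stored gradients now that they have been applied,
    # because backward() accumulates into .grad
    W1.grad.zero_()
    W2.grad.zero_()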

0 27591426.0
1 24856636.0
2 25583904.0
3 26096426.0
4 24027528.0
5 18839464.0
6 12601480.0
7 7455209.5
8 4207797.0
9 2424422.25
10 1504133.875
11 1023126.0625
12 757310.875
13 596656.4375
14 490023.3125
15 412825.4375
16 353373.5
17 305648.9375
18 266388.9375
19 233442.515625
20 205469.1875
21 181534.53125
22 160947.625
23 143109.65625
24 127603.609375
25 114084.8671875
26 102242.9453125
27 91832.53125
28 82659.9453125
29 74541.40625
30 67345.0859375
31 60943.65234375
32 55241.4453125
33 50148.63671875
34 45594.16015625
35 41509.60546875
36 37839.89453125
37 34537.45703125
38 31559.880859375
39 28875.1640625
40 26447.876953125
41 24249.666015625
42 22258.61328125
43 20450.087890625
44 18805.078125
45 17307.234375
46 15942.4140625
47 14697.548828125
48 13560.8916015625
49 12522.2900390625
50 11571.642578125
51 10700.431640625
52 9901.5498046875
53 9168.044921875
54 8494.6103515625
55 7875.337890625
56 7305.5185546875
57 6780.60693359375
58 6297.18798828125
59 5852.24462890625
60 5441.41
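
The training loop clears `.grad` after every update because `backward()` adds to the stored gradient instead of overwriting it. The minimal sketch below shows that accumulation on a single scalar parameter; the parameter `w`, its starting value, and the step size `0.1` are hypothetical choices for the example, not values from the assignment.

```python
import torch

# A single scalar parameter, created directly as a leaf tensor.
w = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(w^2)/dw = 2w = 4.
(w * w).backward()
first = w.grad.item()

# Second backward pass without zeroing: the new gradient is added on top.
(w * w).backward()
accumulated = w.grad.item()

# In-place update under no_grad, then reset the stored gradient.
with torch.no_grad():
    w -= 0.1 * w.grad   # 2.0 - 0.1 * 8.0 = 1.2 (up to float32 rounding)
    w.grad.zero_()

print(first, accumulated, w.grad.item(), w.is_leaf)
```

Because the update runs inside `torch.no_grad()`, `w` stays a leaf tensor and the next `backward()` call starts from a zeroed gradient.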