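Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, then repeat the same training loop using PyTorch's Autograd.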

### Implementing differentiation and backpropagation with PyTorch

Here we implement a simple two-layer neural network for a regression problem, using the L2 loss as the loss function:

$$
L2\_loss = (y_{pred}-y)^2
$$

The two-layer neural network is defined as:
$$
y_{pred} = ReLU(XW_1)W_2
$$
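
The gradients used in the manual backward pass follow from the chain rule. The derivation below is a sketch under assumed shapes that the original does not state: $X$ is $N \times D_{in}$, $W_1$ is $D_{in} \times H$, $W_2$ is $H \times D_{out}$, and the loss $L$ is summed over all entries. The intermediate symbols $Z$ and $A$ are introduced here only for illustration.

$$
\begin{aligned}
Z &= XW_1, \qquad A = ReLU(Z), \qquad y_{pred} = AW_2, \qquad L = \sum (y_{pred}-y)^2 \\
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred}-y) \\
\frac{\partial L}{\partial W_2} &= A^{\top}\,\frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial A} &= \frac{\partial L}{\partial y_{pred}}\,W_2^{\top} \\
\frac{\partial L}{\partial Z} &= \frac{\partial L}{\partial A}\odot \mathbb{1}[Z>0] \\
\frac{\partial L}{\partial W_1} &= X^{\top}\,\frac{\partial L}{\partial Z}
\end{aligned}
$$

The code cells store every tensor transposed relative to this notation (`x` has shape `(D_in, N)`, `W1` has shape `(H, D_in)`, `W2` has shape `(D_out, H)`), so the same products appear there in reverse order, for example `y_pred = W2 @ ReLU(W1 @ x)`.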

In [1]:
import torch
device = torch.device('cpu')

In [19]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y (one sample per column,
# so the forward pass is y_pred = W2 @ ReLU(W1 @ x))
###<your code>###
torch.manual_seed(7)
x = torch.randn(D_in, N)
y = torch.randn(D_out, N)
# Initialize the weights W1, W2 (gradients are computed by hand below,
# so autograd tracking is not needed)
###<your code>###
W1 = torch.randn(H, D_in)
W2 = torch.randn(D_out, H)
# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
  # Forward pass: compute y_pred
  ###<your code>###
  h = torch.mm(W1, x)                   # pre-activation, shape (H, N)
  h_relu = torch.nn.functional.relu(h)  # hidden activation, shape (H, N)
  y_pred = torch.mm(W2, h_relu)         # prediction, shape (D_out, N)
  # Compute the loss
  ###<your code>###
  loss = torch.square(y_pred - y).sum()
  print(t, loss.item())

  # Backward pass: compute the gradients of the loss with respect to W1 and W2
  ###<your code>###
  y_pred_grad = 2. * (y_pred - y)
  W2_grad = torch.mm(y_pred_grad, h_relu.T)
  W1_grad = torch.mm(torch.mm(W2.T, y_pred_grad) * (h > 0), x.T)

  # Update the parameters
  ###<your code>###
  W1 -= learning_rate * W1_grad
  W2 -= learning_rate * W2_grad

0 25318296.0
1 17124468.0
2 13241769.0
3 11018170.0
4 9429306.0
5 8037655.0
6 6717095.0
7 5464585.0
8 4330923.0
9 3358394.0
10 2566293.25
11 1944279.125
12 1469754.875
13 1114483.25
14 851593.75
15 657872.125
16 514803.125
17 408489.71875
18 328830.75
19 268462.0625
20 222106.59375
21 185994.578125
22 157482.59375
23 134643.734375
24 116098.9296875
25 100855.65625
26 88169.4140625
27 77499.25
28 68443.359375
29 60699.5859375
30 54022.05078125
31 48227.53515625
32 43172.98828125
33 38738.484375
34 34832.51953125
35 31379.39453125
36 28318.28515625
37 25596.5234375
38 23170.681640625
39 21004.0234375
40 19067.376953125
41 17331.021484375
42 15770.96484375
43 14366.2080078125
44 13099.6025390625
45 11956.517578125
46 10922.94140625
47 9988.1005859375
48 9141.818359375
49 8373.931640625
50 7676.54541015625
51 7042.759765625
52 6466.1005859375
53 5941.10546875
54 5462.39990234375
55 5025.568359375
56 4626.94775390625
57 4262.1787109375
58 3928.6455078125
59 3623.4697265625
60 3343.781738281
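
One way to gain confidence in hand-written gradients is to compare them against Autograd on a small problem. The following is a minimal sketch, not part of the original assignment: the sizes are hypothetical, and `float64` is an assumption made here so the comparison is not dominated by rounding error.

```python
import torch

torch.manual_seed(0)
# Small hypothetical sizes so the check runs instantly
N, D_in, H, D_out = 8, 20, 6, 3

x = torch.randn(D_in, N, dtype=torch.float64)
y = torch.randn(D_out, N, dtype=torch.float64)
W1 = torch.randn(H, D_in, dtype=torch.float64, requires_grad=True)
W2 = torch.randn(D_out, H, dtype=torch.float64, requires_grad=True)

# Forward pass, same column convention as the cells in this notebook
h = W1 @ x
h_relu = torch.relu(h)
y_pred = W2 @ h_relu
loss = (y_pred - y).pow(2).sum()

# Gradients from autograd
loss.backward()

# Gradients by hand (no graph needed for this part)
with torch.no_grad():
    y_pred_grad = 2.0 * (y_pred - y)
    W2_grad = y_pred_grad @ h_relu.T
    W1_grad = ((W2.T @ y_pred_grad) * (h > 0)) @ x.T

w1_matches = torch.allclose(W1_grad, W1.grad)
w2_matches = torch.allclose(W2_grad, W2.grad)
print(w1_matches, w2_matches)
```

If either flag is `False`, the usual suspects are a missing transpose or a ReLU mask applied to the wrong tensor.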

In [4]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y (data tensors do not need gradients)
###<your code>###
torch.manual_seed(7)
x = torch.randn(D_in, N)
y = torch.randn(D_out, N)
# Initialize the weights W1, W2 (requires_grad=True so autograd tracks them)
###<your code>###
W1 = torch.randn(H, D_in, requires_grad=True)
W2 = torch.randn(D_out, H, requires_grad=True)
# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
  # Forward pass: compute y_pred
  ###<your code>###
  y_pred = torch.mm(W2, torch.nn.functional.relu(torch.mm(W1, x)))
  # Compute the loss
  ###<your code>###
  loss = torch.square(y_pred - y).sum()
  print(t, loss.item())

  # Backward pass: compute the gradients of the loss with respect to W1 and W2
  ###<your code>###
  loss.backward()
  # Update the parameters
  ###<your code>###
  with torch.no_grad():
    # Update W1 and W2
    ###<your code>###
    W1 -= learning_rate * W1.grad
    W2 -= learning_rate * W2.grad
    # Zero the accumulated gradients (the parameters have already been updated)
    W1.grad.zero_()
    W2.grad.zero_()

0 25318296.0
1 17124468.0
2 13241768.0
3 11018171.0
4 9429306.0
5 8037655.0
6 6717094.5
7 5464584.5
8 4330922.5
9 3358393.75
10 2566293.25
11 1944279.0
12 1469754.75
13 1114483.25
14 851593.875
15 657872.25
16 514803.25
17 408489.8125
18 328830.78125
19 268462.03125
20 222106.59375
21 185994.59375
22 157482.609375
23 134643.765625
24 116098.9609375
25 100855.671875
26 88169.421875
27 77499.265625
28 68443.359375
29 60699.578125
30 54022.0390625
31 48227.51953125
32 43172.98828125
33 38738.484375
34 34832.515625
35 31379.39453125
36 28318.298828125
37 25596.52734375
38 23170.681640625
39 21004.013671875
40 19067.373046875
41 17331.017578125
42 15770.9736328125
43 14366.208984375
44 13099.607421875
45 11956.51953125
46 10922.9404296875
47 9988.09765625
48 9141.8154296875
49 8373.931640625
50 7676.548828125
51 7042.7666015625
52 6466.1005859375
53 5941.109375
54 5462.40185546875
55 5025.57177734375
56 4626.94873046875
57 4262.17724609375
58 3928.641845703125
59 3623.471923828125
60 3343.7
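
The `zero_()` calls matter because `backward()` adds to `.grad` rather than overwriting it. The sketch below illustrates this on a hypothetical two-element tensor `w` that is not part of the assignment.

```python
import torch

# Hypothetical toy parameter; loss = sum(w^2), so d(loss)/dw = 2 * w
w = torch.tensor([1.0, 2.0], requires_grad=True)

(w ** 2).sum().backward()
first = w.grad.clone()        # gradient after one backward pass

(w ** 2).sum().backward()
accumulated = w.grad.clone()  # second pass was added on top of the first

w.grad.zero_()                # reset, as in the training loop
(w ** 2).sum().backward()
after_reset = w.grad.clone()  # back to a single-pass gradient

print(first, accumulated, after_reset)
```

Without the reset, every update step would use the sum of all gradients computed so far, not the gradient of the current loss.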

In [16]:
torch.nn.functional.relu(torch.randn(5))

tensor([1.2907, 0.0000, 0.2920, 0.0000, 0.0000])
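
ReLU zeroes the negative entries, and its derivative is the 0/1 mask that the manual backward pass writes as `(torch.mm(W1,x) > 0)`. The following sketch checks this against Autograd on a hand-picked tensor `z`, chosen here only for illustration.

```python
import torch

# Hand-picked values, including an exact zero
z = torch.tensor([-1.5, 0.0, 0.3, 2.0], requires_grad=True)

out = torch.nn.functional.relu(z)
out.sum().backward()               # d(sum(relu(z)))/dz is the ReLU derivative

mask = (z.detach() > 0).float()    # the mask used in the manual backward pass
same = torch.equal(z.grad, mask)
print(out.detach(), z.grad, same)
```

PyTorch assigns a gradient of 0 at exactly `z = 0`, which agrees with the strict `> 0` comparison.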

In [8]:
N, D_in, H, D_out = 64, 1000, 100, 10
torch.manual_seed(7)
# A single sample this time: x and y are column vectors
x = torch.randn(D_in, 1)
y = torch.randn(D_out, 1)
# Initialize the weights W1, W2
###<your code>###
W1 = torch.randn(H, D_in, requires_grad=True)
W2 = torch.randn(D_out, H, requires_grad=True)
y_pred = torch.mm(W2, torch.nn.functional.relu(torch.mm(W1, x)))
# Mean (rather than sum) of the squared errors
loss = ((y_pred - y)**2).mean()
print(loss)

tensor(48860.3984, grad_fn=<MeanBackward0>)
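
This last cell averages the squared errors with `.mean()`, whereas the training loops use `.sum()`. The two differ only by a constant factor (the number of elements), which also rescales the gradient and therefore the effective learning rate. A minimal sketch of that relationship, using hypothetical random tensors and `float64` (both assumptions made for this example):

```python
import torch

torch.manual_seed(0)
# Hypothetical prediction and target, 10 elements each
y = torch.randn(10, 1, dtype=torch.float64)
y_pred = torch.randn(10, 1, dtype=torch.float64, requires_grad=True)

sum_loss = ((y_pred - y) ** 2).sum()
(grad_sum,) = torch.autograd.grad(sum_loss, y_pred)

mean_loss = ((y_pred - y) ** 2).mean()
(grad_mean,) = torch.autograd.grad(mean_loss, y_pred)

scale = y_pred.numel()  # number of elements averaged over
loss_consistent = torch.allclose(sum_loss, mean_loss * scale)
grad_consistent = torch.allclose(grad_sum, grad_mean * scale)
print(loss_consistent, grad_consistent)
```

When switching between the two reductions, the learning rate generally needs to be rescaled by the same factor to get comparable update sizes.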
