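### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, and then use PyTorch's Autograd.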

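### Implementing differentiation and backpropagation with PyTorch

Here we implement a simple two-layer neural network for a regression problem, with the L2 loss as the loss function: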

$$
L2\_loss = \sum (y_{pred}-y)^2
$$

The two-layer network is:
$$
y_{pred} = ReLU(XW_1)W_2
$$
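
The gradients that the manual backward pass needs follow from the chain rule. The derivation below is a sketch that assumes the squared error is summed over all entries of the batch; the symbols $L$ (the loss), $Z$ (the pre-activation) and $A$ (the hidden activation) are labels introduced here for convenience and do not appear in the original.

$$
\begin{aligned}
Z &= XW_1, \qquad A = ReLU(Z), \qquad y_{pred} = AW_2, \qquad L = \sum (y_{pred}-y)^2 \\
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred}-y) \\
\frac{\partial L}{\partial W_2} &= A^{\top}\,\frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial A} &= \frac{\partial L}{\partial y_{pred}}\,W_2^{\top} \\
\frac{\partial L}{\partial Z} &= \frac{\partial L}{\partial A} \odot \mathbb{1}[Z > 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top}\,\frac{\partial L}{\partial Z}
\end{aligned}
$$

Each gradient has the same shape as the quantity it differentiates with respect to, which is why the transposes $A^{\top}$, $W_2^{\top}$ and $X^{\top}$ appear.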

In [2]:
import torch
device = torch.device('cpu')

In [3]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights W1 and W2
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    h = x.mm(w1)
    # clamp(min=0) sets the negative entries of h to 0, i.e. ReLU
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute the loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Backward pass: compute the gradients of the loss with respect to W1 and W2
    grad_y_pred = 2 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())

    # Zero the gradient where h < 0, since ReLU is flat there
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update the parameters
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 30895606.0
1 32158418.0
2 35924376.0
3 36301384.0
4 29768268.0
5 18872060.0
6 9794407.0
7 4751481.5
8 2520135.0
9 1573732.0
10 1137062.5
11 898872.5625
12 744783.5
13 631833.375
14 542919.625
15 470366.21875
16 409881.4375
17 358881.03125
18 315471.6875
19 278360.25
20 246374.78125
21 218695.875
22 194664.25
23 173743.578125
24 155443.21875
25 139411.875
26 125262.75
27 112747.671875
28 101647.4296875
29 91779.59375
30 82989.2578125
31 75153.0625
32 68143.3203125
33 61855.203125
34 56208.49609375
35 51134.2578125
36 46569.1328125
37 42451.6015625
38 38736.98828125
39 35377.54296875
40 32339.044921875
41 29590.931640625
42 27097.83984375
43 24834.36328125
44 22778.33203125
45 20907.248046875
46 19203.0390625
47 17650.451171875
48 16235.091796875
49 14943.8291015625
50 13762.962890625
51 12682.431640625
52 11693.1494140625
53 10786.615234375
54 9955.7041015625
55 9194.146484375
56 8494.5654296875
57 7852.0166015625
58 7259.9921875
59 6715.4462890625
60 6214.603515625
61 5753.4028320312
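
One way to gain some confidence in the hand-written backward pass is to compare it against the gradients that Autograd computes (Autograd is covered in the next section). The sketch below is not part of the original assignment: the small sizes, the fixed seed and the use of `float64` are assumptions chosen only to make the comparison fast and numerically tight.

```python
import torch

torch.manual_seed(0)

# Small, assumed sizes; float64 keeps rounding error negligible
N, D_in, H, D_out = 4, 6, 5, 3
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass, recorded by Autograd
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()

# Autograd gradients end up in w1.grad and w2.grad
loss.backward()

# Manual gradients, using the same formulas as the training loop
with torch.no_grad():
    grad_y_pred = 2 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

w1_match = torch.allclose(grad_w1, w1.grad)
w2_match = torch.allclose(grad_w2, w2.grad)
print(w1_match, w2_match)
```

If either comparison came out `False`, the manual formulas (or their transposes) would be the first place to look.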

### Using PyTorch's Autograd

In [None]:
import torch
device = torch.device('cpu')

In [8]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights W1 and W2; requires_grad=True tells Autograd to track them
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute the loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Backward pass: compute the gradients of the loss with respect to W1 and W2
    loss.backward()

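    # Update the parameters. The update itself should not be tracked by
    # Autograd, so it runs under torch.no_grad().
    with torch.no_grad():
        # Update W1 and W2 in place. Rebinding (w1 = w1 - ...) would create a
        # new tensor whose .grad is None, so w1.grad.zero_() would raise
        # AttributeError: 'NoneType' object has no attribute 'zero_'.
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        # Reset the gradients, since backward() accumulates into .grad
        w1.grad.zero_()
        w2.grad.zero_()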
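
To see why the parameter update has to be done in place, the sketch below contrasts rebinding a name with an in-place update inside `torch.no_grad()`. The tensor `w`, the name `rebound` and the step size `0.1` are illustrative assumptions, not part of the original cells.

```python
import torch

w = torch.randn(3, requires_grad=True)
loss = (w ** 2).sum()
loss.backward()  # fills w.grad

# Rebinding: the subtraction creates a brand-new tensor. Under no_grad it is
# not tracked, so it has requires_grad=False and no .grad.
with torch.no_grad():
    rebound = w - 0.1 * w.grad
print(rebound.requires_grad, rebound.grad)

# In-place update: w stays the same leaf tensor, so w.grad still exists
# and can be reset for the next iteration.
with torch.no_grad():
    w -= 0.1 * w.grad
    w.grad.zero_()
print(w.requires_grad, w.grad)
```

Calling `rebound.grad.zero_()` would fail with `AttributeError: 'NoneType' object has no attribute 'zero_'`, because `rebound.grad` is `None`.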

In [3]:
x = torch.randn(2, 3)
x

tensor([[-1.3351, -0.6220, -0.0067],
        [ 0.0886,  0.4465, -0.0856]])

In [4]:
x.t()

tensor([[-1.3351,  0.0886],
        [-0.6220,  0.4465],
        [-0.0067, -0.0856]])

In [5]:
x = torch.rand((2,2))
y = torch.rand((2,2))
z = torch.rand((2,2), requires_grad=True)
a = x + y
b = a + z
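
The last cell builds a small computation from `x`, `y` and `z`. As a sketch of what it is presumably meant to illustrate, the code below inspects how `requires_grad` propagates through those operations; the `print` calls and the final `backward()` are additions for illustration.

```python
import torch

x = torch.rand((2, 2))
y = torch.rand((2, 2))
z = torch.rand((2, 2), requires_grad=True)

a = x + y  # neither input requires grad, so a is not tracked
b = a + z  # z requires grad, so b is tracked

print(a.requires_grad, a.grad_fn)
print(b.requires_grad, b.grad_fn)

# Backpropagate from a scalar; d(sum(b))/dz is 1 for every entry of z
b.sum().backward()
print(z.grad)
```

A result is tracked as soon as any one of its inputs has `requires_grad=True`, which is why setting the flag only on the weights is enough in the training loop.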