### 作業目標: 使用Pytorch進行微分與倒傳遞
這份作業我們會實作微分與倒傳遞以及使用Pytorch的Autograd。

### 使用Pytorch實作微分與倒傳遞

這裡我們很簡單的實作兩層的神經網路進行回歸問題，其中loss function為L2 loss

$$
L2\_loss = (y_{pred}-y)^2
$$

兩層經網路如下所示
$$
y_{pred} = ReLU(XW_1)W_2
$$

In [3]:
import torch
device = torch.device('cuda')

In [150]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# 隨機生成x, y
x = torch.randn(N, D_in, requires_grad=True)
y = torch.randn(N, D_out)

# 初始化weight W1, W2
W1, W2 = torch.ones(D_in, H, requires_grad=True), torch.ones(H, D_out, requires_grad=True)

# 設置learning rate
learning_rate = 1e-6

# 訓練500個epoch
for t in range(500):
    # 向前傳遞: 計算y_pred
    layer_1 = x.mm(W1).relu()
    y_pred = layer_1.mm(W2)

    # 計算loss
    loss = (y_pred-y).pow(2).sum(dim=1).mean()
    print(t, loss.item())

    # 倒傳遞: 計算W1與W2對loss的微分(梯度)
    dy_hat = (y_pred-y)
    dW2 = layer_1.T.mm(dy_hat)
    dlayer_1 = dy_hat.mm(W2.T)
    
    dlayer_0 = dlayer_1.detach().where(dlayer_1>0, torch.zeros(dlayer_1.shape))
    dW1 = x.T.mm(dlayer_0)

    # 參數更新
    W2 = W2 - dW2*learning_rate
    W1 = W1 - dW1*learning_rate

0 51163964.0
1 4298668.0
2 784144.125
3 10.398697853088379
4 10.398697853088379
5 10.398697853088379
6 10.398697853088379
7 10.398697853088379
8 10.398697853088379
9 10.398697853088379
10 10.398697853088379
11 10.398697853088379
12 10.398697853088379
13 10.398697853088379
14 10.398697853088379
15 10.398697853088379
16 10.398697853088379
17 10.398697853088379
18 10.398697853088379
19 10.398697853088379
20 10.398697853088379
21 10.398697853088379
22 10.398697853088379
23 10.398697853088379
24 10.398697853088379
25 10.398697853088379
26 10.398697853088379
27 10.398697853088379
28 10.398697853088379
29 10.398697853088379
30 10.398697853088379
31 10.398697853088379
32 10.398697853088379
33 10.398697853088379
34 10.398697853088379
35 10.398697853088379
36 10.398697853088379
37 10.398697853088379
38 10.398697853088379
39 10.398697853088379
40 10.398697853088379
41 10.398697853088379
42 10.398697853088379
43 10.398697853088379
44 10.398697853088379
45 10.398697853088379
46 10.398697853088379
4

### 使用Pytorch的Autograd

In [None]:
import torch
device = torch.device('cuda')

In [199]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# 隨機生成x, y
x = torch.randn(N, D_in, requires_grad=True)
y = torch.randn(N, D_out)

# 初始化weight W1, W2
W1, W2 = torch.ones(D_in, H, requires_grad=True), torch.ones(H, D_out, requires_grad=True)

# 設置learning rate
learning_rate = 1e-6

# 訓練500個epoch
for t in range(500):
    # 向前傳遞: 計算y_pred
    layer_1 = x.mm(W1).relu()
    y_pred = layer_1.mm(W2)

    # 計算loss
    loss = (y_pred-y).pow(2).sum(dim=1).mean()
    print(t, loss.item())

    # 倒傳遞: 計算W1與W2對loss的微分(梯度)
    W1.retain_grad()
    W2.retain_grad()
    loss.backward()
    dW1 = W1.grad
    dW2 = W2.grad

    # 參數更新: 這裡再更新參數時，我們不希望更新參數的計算也被紀錄微分相關的資訊，因此使用torch.no_grad()
#     with torch.no_grad():
    # 更新參數W1 W2
    W1 = W1 - dW1*learning_rate
    W2 = W2 - dW2*learning_rate
    
    # 將紀錄的gradient清空(因為已經更新參數)
    if W1.grad:
        W1.grad.zero_()
    if W2.grad:
        W2.grad.zero_()

0 29117068.0
1 24199648.0
2 20424870.0
3 17459820.0
4 15085715.0
5 13153673.0
6 11559338.0
7 10227697.0
8 9103642.0
9 8145913.0
10 7323109.5
11 6610959.0
12 5990459.5
13 5446548.0
14 4967158.5
15 4542532.5
16 4164700.25
17 3827102.0
18 3524296.0
19 3251737.75
20 3005606.75
21 2782669.75
22 2580175.5
23 2395773.0
24 2227440.0
25 2073430.5
26 1932232.125
27 1802523.625
28 1683153.875
29 1573111.0
30 1471503.25
31 1377542.625
32 1290532.25
33 1209851.0
34 1134946.75
35 1065324.25
36 1000538.375
37 940194.0625
38 883929.75
39 831422.875
40 782379.875
41 736535.0625
42 693647.3125
43 653496.5
44 615882.9375
45 580622.875
46 547549.875
47 516510.0625
48 487362.53125
49 459978.09375
50 434237.75
51 410031.875
52 387258.5625
53 365825.03125
54 345644.28125
55 326635.96875
56 308726.5
57 291846.21875
58 275931.1875
59 260921.9375
60 246763.046875
61 233402.765625
62 220792.84375
63 208888.546875
64 197647.6875
65 187031.171875
66 177002.453125
67 167526.859375
68 158572.546875
69 150109.375
70 



113 14010.1123046875
114 13280.904296875
115 12589.765625
116 11934.701171875
117 11313.810546875
118 10725.3203125
119 10167.521484375
120 9638.8154296875
121 9137.669921875
122 8662.6513671875
123 8212.392578125
124 7785.59521484375
125 7381.0361328125
126 6997.5546875
127 6634.04736328125
128 6289.47509765625
129 5962.84814453125
130 5653.2255859375
131 5359.72412109375
132 5081.5009765625
133 4817.76416015625
134 4567.7529296875
135 4330.75390625
136 4106.0869140625
137 3893.111572265625
138 3691.21923828125
139 3499.829345703125
140 3318.398193359375
141 3146.404541015625
142 2983.358154296875
143 2828.794189453125
144 2682.265625
145 2543.359375
146 2411.67822265625
147 2286.84521484375
148 2168.504150390625
149 2056.316162109375
150 1949.9620361328125
151 1849.1380615234375
152 1753.5556640625
153 1662.9443359375
154 1577.0423583984375
155 1495.6072998046875
156 1418.4058837890625
157 1345.217529296875
158 1275.834228515625
159 1210.0577392578125
160 1147.699951171875
161 1088.5