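### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation, and use PyTorch's Autograd to compute the gradients.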

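### Implementing differentiation and backpropagation with PyTorch

Here we implement a simple two-layer neural network for a regression problem. The loss function is the L2 loss (squared error):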

$$
L_{2\,\text{loss}} = (y_{\text{pred}} - y)^2
$$
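
As a quick illustration of the formula above, the following sketch computes the squared error summed over all elements, which is how the training cells later in this notebook reduce it to a scalar. The tensor shapes and the seed are assumptions chosen only for the example; `F.mse_loss` with `reduction='sum'` is used as a cross-check, not as the notebook's own method.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical small shapes, chosen only for illustration
y_pred = torch.randn(4, 3)
y = torch.randn(4, 3)

# Sum of squared errors, written out by hand
loss_manual = torch.sum((y_pred - y).pow(2))

# The same quantity via the built-in MSE loss with reduction='sum'
loss_builtin = F.mse_loss(y_pred, y, reduction='sum')

print(loss_manual.item(), loss_builtin.item())
```

Both values should agree up to floating-point rounding, and both are scalar tensors, which is what `backward()` expects when called without arguments.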

The two-layer neural network is defined as
$$
y_{\text{pred}} = \mathrm{ReLU}(XW_1)\,W_2
$$
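
To make the backpropagation step concrete, the sketch below derives the gradients of the summed squared error for this network by hand using the chain rule, then compares them with what Autograd produces. The small layer sizes, the seed, and the use of `float64` are assumptions made for the example so that the comparison is tight; they are not taken from the notebook.

```python
import torch

torch.manual_seed(0)

# Small hypothetical sizes so the check runs instantly
N, D_in, H, D_out = 8, 5, 4, 3
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass: y_pred = ReLU(x w1) w2
h = x.mm(w1)
h_relu = torch.relu(h)
y_pred = h_relu.mm(w2)
loss = torch.sum((y_pred - y).pow(2))

# Autograd fills w1.grad and w2.grad
loss.backward()

# Manual backward pass, applying the chain rule layer by layer
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)                      # dL/dy_pred
    grad_w2 = h_relu.t().mm(grad_y_pred)                  # dL/dW2
    grad_h_relu = grad_y_pred.mm(w2.t())                  # dL/dReLU(h)
    grad_h = grad_h_relu * (h > 0).to(grad_h_relu.dtype)  # ReLU passes gradient only where h > 0
    grad_w1 = x.t().mm(grad_h)                            # dL/dW1

print(torch.allclose(grad_w1, w1.grad), torch.allclose(grad_w2, w2.grad))
```

If the derivation is right, the hand-computed gradients match `w1.grad` and `w2.grad`, which is a useful sanity check before relying on Autograd alone.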

In [1]:
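import torch
import numpy as np
import torch.utils.data as Data
# torch.autograd.Variable is deprecated since PyTorch 0.4 and is not needed here:
# tensors created with requires_grad=True are tracked by Autograd directly.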

### Using PyTorch's Autograd

In [2]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10
device = torch.device('cpu')

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights W1 and W2; requires_grad=True makes Autograd track them
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6

# Wrap the data in a DataLoader (note: the loop below uses x and y directly)
torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(
    dataset=torch_dataset,      # torch TensorDataset format
    batch_size=N,               # mini batch size
    shuffle=True,               # random shuffle for training
    num_workers=2,              # subprocesses for loading data
)


# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    hidden = x.mm(w1)
    hidden = torch.relu(hidden)
    y_pred = hidden.mm(w2)

    # Compute the loss
    loss = torch.sum((y_pred - y).pow(2))
    print(t, loss.item())

    # Backward pass: compute the gradients of the loss with respect to W1 and W2
    loss.backward()

    # Parameter update: going through .data keeps the update out of the Autograd graph
    w1.data.add_(- learning_rate * w1.grad.data)
    w2.data.add_(- learning_rate * w2.grad.data)

    # Reset the gradients, because backward() accumulates into .grad
    w1.grad.data.zero_()
    w2.grad.data.zero_()

0 37248148.0
1 40153540.0
2 46282496.0
3 45711860.0
4 33231308.0
5 17300958.0
6 7290821.5
7 3290431.25
8 1890482.625
9 1343146.625
10 1064332.25
11 883132.25
12 748378.375
13 641504.3125
14 554079.5625
15 481469.625
16 420621.9375
17 369290.65625
18 325604.875
19 288197.84375
20 256035.921875
21 228240.96875
22 204106.5
23 183039.34375
24 164573.3125
25 148343.109375
26 134020.109375
27 121350.8203125
28 110111.40625
29 100110.75
30 91192.609375
31 83220.3828125
32 76069.0625
33 69639.3046875
34 63847.1640625
35 58620.1953125
36 53896.98046875
37 49623.86328125
38 45747.34765625
39 42224.34375
40 39015.45703125
41 36087.63671875
42 33413.875
43 30968.76953125
44 28728.58203125
45 26675.00390625
46 24789.27734375
47 23057.19140625
48 21465.193359375
49 19997.619140625
50 18644.525390625
51 17394.37109375
52 16238.87890625
53 15169.7451171875
54 14180.236328125
55 13263.0751953125
56 12412.4462890625
57 11624.4599609375
58 10894.4150390625
59 10216.619140625
60 9585.91796875
61 8998.6123046875
62 845

402 0.0028045170474797487
403 0.0027079784777015448
404 0.00261653121560812
405 0.0025265992153435946
406 0.0024425017181783915
407 0.002359578851610422
408 0.0022812739480286837
409 0.0022066987585276365
410 0.002132140100002289
411 0.0020627998746931553
412 0.001993381418287754
413 0.0019284933805465698
414 0.0018681340152397752
415 0.0018062923336401582
416 0.0017490936443209648
417 0.0016965174581855536
418 0.0016421364853158593
419 0.0015901925507932901
420 0.001538395299576223
421 0.001492758048698306
422 0.0014462401159107685
423 0.0014016032218933105
424 0.0013586905552074313
425 0.0013176685897633433
426 0.0012767662992700934
427 0.0012398881372064352
428 0.001204164233058691
429 0.0011689934181049466
430 0.0011355041060596704
431 0.0011023309780284762
432 0.0010716997785493731
433 0.0010397059377282858
434 0.001009725732728839
435 0.0009823148138821125
436 0.0009534219861961901
437 0.0009280994418077171
438 0.0009034117683768272
439 0.0008768652915023267
440 0.000854179961606
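
The training loop above zeroes `w1.grad` and `w2.grad` after every update. The reason is that `backward()` adds new gradients to whatever is already stored in `.grad` rather than overwriting it. The following minimal sketch, using a hypothetical two-element tensor `w` that does not appear in the notebook, shows this accumulation.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: the gradient of sum(w^2) is 2w
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# Reset in place, as the training loop does
w.grad.zero_()

print(first, accumulated, w.grad)
```

Without the reset, each iteration would step along the sum of all gradients computed so far, which is not plain gradient descent.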

### Using PyTorch's no_grad
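
Before the full training cell, a small sketch of why the `torch.no_grad()` context is useful for weight updates. A leaf tensor with `requires_grad=True` cannot be modified in place while Autograd is recording, but the same update is accepted inside `torch.no_grad()`. The tensor `w` and the step size `0.1` are hypothetical values chosen for illustration.

```python
import torch

w = torch.ones(3, requires_grad=True)

# Outside no_grad, an in-place update of a leaf tensor that requires grad is rejected
try:
    w -= 0.1
    raised = False
except RuntimeError:
    raised = True

# Inside no_grad, the same update is allowed and is not recorded in the graph
with torch.no_grad():
    w -= 0.1

print(raised, w, w.requires_grad)
```

After the block, `w` still has `requires_grad=True`, so later forward passes continue to track it.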

In [3]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize the weights W1 and W2; requires_grad=True makes Autograd track them
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

# Set the learning rate (same value as in the previous cell)
learning_rate = 1e-6

# Wrap the data in a DataLoader (note: the loop below uses x and y directly)
torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(
    dataset=torch_dataset,      # torch TensorDataset format
    batch_size=N,               # mini batch size
    shuffle=True,               # random shuffle for training
    num_workers=2,              # subprocesses for loading data
)

# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    hidden = x.mm(w1)
    hidden = torch.relu(hidden)
    y_pred = hidden.mm(w2)

    # Compute the loss
    loss = torch.sum((y_pred - y).pow(2))
    print(t, loss.item())

    # Backward pass: compute the gradients of the loss with respect to W1 and W2
    loss.backward()

    with torch.no_grad():
        # Parameter update: inside no_grad the in-place update is not tracked by Autograd
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Reset the gradients, because backward() accumulates into .grad
        w1.grad.zero_()
        w2.grad.zero_()

0 29610502.0
1 28210390.0
2 30297010.0
3 31218164.0
4 27519496.0
5 19709290.0
6 11593254.0
7 6077392.0
8 3180805.5
9 1825289.25
10 1191697.375
11 871269.375
12 687484.125
13 567193.0
14 479647.90625
15 411486.71875
16 356161.8125
17 310376.4375
18 271929.1875
19 239229.984375
20 211290.875
21 187248.28125
22 166479.0625
23 148442.84375
24 132704.75
25 118904.3203125
26 106776.1484375
27 96081.265625
28 86620.578125
29 78234.171875
30 70773.53125
31 64121.078125
32 58178.25390625
33 52859.03515625
34 48083.6953125
35 43794.3984375
36 39935.4375
37 36455.2734375
38 33311.453125
39 30471.189453125
40 27896.998046875
41 25560.060546875
42 23441.708984375
43 21519.720703125
44 19775.095703125
45 18185.619140625
46 16737.5078125
47 15416.486328125
48 14208.60546875
49 13103.662109375
50 12092.0107421875
51 11165.03125
52 10315.1591796875
53 9535.951171875
54 8819.99609375
55 8162.171875
56 7557.419921875
57 7000.775390625
58 6488.18603515625
59 6015.8681640625
60 5580.90576171875
61 5179.604

471 3.91527428291738e-05
472 3.854540773318149e-05
473 3.799326077569276e-05
474 3.7632758903782815e-05
475 3.714419653988443e-05
476 3.687634307425469e-05
477 3.649127029348165e-05
478 3.602370270527899e-05
479 3.558491152944043e-05
480 3.5188662877772003e-05
481 3.4839060390368104e-05
482 3.4359440178377554e-05
483 3.3965654438361526e-05
484 3.3568558137631044e-05
485 3.2980606192722917e-05
486 3.2677064154995605e-05
487 3.235888289054856e-05
488 3.177786129526794e-05
489 3.145492519252002e-05
490 3.1052430131239817e-05
491 3.0848972528474405e-05
492 3.0539198633050546e-05
493 3.0025648811715655e-05
494 2.975244387926068e-05
495 2.9388704206212424e-05
496 2.8858730729552917e-05
497 2.8646254577324726e-05
498 2.829691584338434e-05
499 2.810234946082346e-05
