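### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, and then do the same with PyTorch's Autograd.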

### Implementing differentiation and backpropagation with PyTorch

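Here we implement a simple two-layer neural network for a regression problem. The loss function is the L2 loss, summed over every element of the batch and every output dimension:

$$
L2\_loss = \sum (y_{pred}-y)^2
$$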
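
As a small check of how this loss is computed in code, the sketch below evaluates the summed squared error by hand and compares it with `torch.nn.functional.mse_loss` using `reduction='sum'`. The tensor values are made up for illustration and are not part of the assignment.

```python
import torch
import torch.nn.functional as F

# Toy predictions and targets (illustrative values only)
y_pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.tensor([[0.0, 2.0], [1.0, 5.0]])

# Summed squared error, written out by hand
manual_loss = (y_pred - y).pow(2).sum()

# The same quantity via the built-in loss with reduction='sum'
builtin_loss = F.mse_loss(y_pred, y, reduction='sum')

print(manual_loss.item(), builtin_loss.item())  # 6.0 6.0
```

The default `reduction='mean'` would instead divide by the number of elements, which rescales the gradients by the same factor.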

The two-layer neural network is:
$$
y_{pred} = ReLU(XW_1)W_2
$$
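
Applying the chain rule to this network gives the gradients that a manual backward pass needs. This is a sketch that assumes the loss $L$ is the squared error summed over all elements; the intermediate names $h$ and $h_{relu}$ are chosen to match the variables in the code, $\odot$ denotes element-wise multiplication, and $\mathbb{1}[h > 0]$ is 1 where $h$ is positive and 0 elsewhere.

$$
\begin{aligned}
h &= XW_1, \quad h_{relu} = ReLU(h), \quad y_{pred} = h_{relu}W_2 \\
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred} - y) \\
\frac{\partial L}{\partial W_2} &= h_{relu}^{T}\,\frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial h_{relu}} &= \frac{\partial L}{\partial y_{pred}}\,W_2^{T} \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial h_{relu}} \odot \mathbb{1}[h > 0] \\
\frac{\partial L}{\partial W_1} &= X^{T}\,\frac{\partial L}{\partial h}
\end{aligned}
$$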

In [1]:
import torch
device = torch.device('cpu')

In [3]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension

# Network weight shapes
# (D_in, H) ---> (H, D_out)

N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
###<your code>###
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize weights W1, W2
# W1: (D_in, H)
# W2: (H, D_out)
###<your code>###
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

# Set the learning rate
learning_rate = 1e-6


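def relu(x):
    # Element-wise ReLU on a tensor; a Python `if` on a multi-element tensor would raise an error.
    # Equivalent to the h.clamp(min=0) call used in the training loop.
    return x.clamp(min=0)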

# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    ###<your code>###
    h = torch.matmul(x, w1)
    h_relu = h.clamp(min=0)
    y_pred = torch.matmul(h_relu, w2)


    # Compute the loss
    ###<your code>###
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Backward pass: compute the gradients of the loss with respect to W1 and W2
    ###<your code>###
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update the parameters
    ###<your code>###
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 28666620.0
1 24328416.0
2 24746308.0
3 25688580.0
4 24553776.0
5 20098540.0
6 14042011.0
7 8539967.0
8 4893785.5
9 2802629.5
10 1705515.25
11 1125734.25
12 807618.8125
13 619186.4375
14 498119.875
15 413772.625
16 351095.875
17 302163.03125
18 262602.5625
19 229869.71875
20 202341.75
21 178939.90625
22 158856.40625
23 141492.328125
24 126416.265625
25 113256.875
26 101728.25
27 91592.484375
28 82648.4296875
29 74733.3359375
30 67708.5546875
31 61453.90625
32 55871.7265625
33 50881.42578125
34 46406.9296875
35 42384.59765625
36 38763.0234375
37 35494.78125
38 32542.5
39 29873.814453125
40 27452.85546875
41 25255.2734375
42 23256.99609375
43 21436.611328125
44 19777.54296875
45 18262.59375
46 16877.423828125
47 15609.4326171875
48 14447.8173828125
49 13381.783203125
50 12403.3349609375
51 11503.421875
52 10675.2978515625
53 9913.353515625
54 9211.6748046875
55 8564.9345703125
56 7967.7412109375
57 7415.71923828125
58 6905.65625
59 6433.69287109375
60 5996.654296875
61 5591.7685546875
...

439 0.0002550444914959371
440 0.00024972244864329696
441 0.00024464866146445274
442 0.00023995799710974097
443 0.00023561634588986635
444 0.00023100370890460908
445 0.00022678333334624767
446 0.00022125472605694085
447 0.000217867418541573
448 0.00021361892868299037
449 0.00020957947708666325
450 0.00020543209393508732
451 0.00020233845862094313
452 0.00019849170348607004
453 0.00019511881691869348
454 0.00019141158554702997
455 0.00018783670384436846
456 0.00018466927576810122
457 0.0001812389527913183
458 0.00017800016212277114
459 0.0001747286005411297
460 0.00017135083908215165
461 0.00016875055734999478
462 0.00016602591495029628
463 0.00016298549599014223
464 0.0001602594566065818
465 0.0001575335772940889
466 0.000154904046212323
467 0.00015248454292304814
468 0.0001502505037933588
469 0.00014777523756492883
470 0.00014492974150925875
471 0.00014278260641731322
472 0.00014067578013055027
473 0.0001384415663778782
474 0.0001360745809506625
475 0.00013433053391054273
476 0.0001319
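
One way to gain confidence in a hand-written backward pass is to compare it against Autograd on a small problem. The sketch below repeats the same forward and backward steps as the training loop, using small illustrative sizes and `float64` (both are assumptions made for this check, not settings from the assignment), and tests whether the two sets of gradients agree.

```python
import torch

torch.manual_seed(0)  # fixed seed so the check is repeatable

# Small sizes chosen for illustration; float64 keeps rounding error low
N, D_in, H, D_out = 4, 6, 5, 3
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()

# Gradients computed by autograd
loss.backward()

# Hand-written gradients, following the same steps as the training loop
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

# Both comparisons should report agreement
print(torch.allclose(grad_w1, w1.grad), torch.allclose(grad_w2, w2.grad))
```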

### Using PyTorch's Autograd
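
Before the full network, a minimal sketch of what Autograd does on a single scalar may help. The function $y = x^2 + 2x$ and the input value are arbitrary choices for illustration.

```python
import torch

# A scalar input that autograd should track
x = torch.tensor(3.0, requires_grad=True)

# Build a small computation: y = x^2 + 2x
y = x ** 2 + 2 * x

# Backpropagate: fills x.grad with dy/dx = 2x + 2
y.backward()

print(x.grad)  # tensor(8.)
```

Setting `requires_grad=True` on the weights in the cell below plays the same role: it tells PyTorch to record the operations applied to them so that `loss.backward()` can fill in `w1.grad` and `w2.grad`.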

In [4]:
import torch
device = torch.device('cpu')

In [7]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
###<your code>###
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Initialize weights W1, W2; requires_grad=True tells autograd to track them
###<your code>###
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    ###<your code>###
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute the loss
    ###<your code>###
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Backward pass: compute the gradients of the loss with respect to W1 and W2
    ###<your code>###
    loss.backward()

    # Update the parameters: the update itself should not be recorded by autograd,
    # so it is wrapped in torch.no_grad()
    with torch.no_grad():
        # Update W1 and W2 in place (no .data needed inside no_grad)
        ###<your code>###
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

    # Zero the stored gradients after the update, because backward() accumulates into .grad
    w1.grad.zero_()
    w2.grad.zero_()

0 27494528.0
1 21165056.0
2 17654414.0
3 14780239.0
4 11911998.0
5 9110906.0
6 6662204.5
7 4735918.0
8 3346841.75
9 2391595.25
10 1752747.5
11 1325245.375
12 1035023.0
13 832169.125
14 685587.5625
15 575955.375
16 491242.6875
17 423869.40625
18 368994.9375
19 323472.3125
20 285160.125
21 252547.59375
22 224530.09375
23 200345.125
24 179331.125
25 160954.125
26 144833.328125
27 130677.8515625
28 118194.203125
29 107123.609375
30 97275.0390625
31 88487.078125
32 80627.296875
33 73578.25
34 67246.875
35 61549.37890625
36 56413.8203125
37 51772.00390625
38 47564.359375
39 43749.09375
40 40284.55078125
41 37135.546875
42 34272.625
43 31661.796875
44 29280.4140625
45 27105.0546875
46 25114.6328125
47 23290.234375
48 21614.22265625
49 20075.21875
50 18657.9609375
51 17353.59375
52 16148.533203125
53 15037.791015625
54 14013.53125
55 13067.7470703125
56 12193.630859375
57 11384.9912109375
58 10636.3583984375
59 9942.732421875
60 9299.5556640625
61 8702.6484375
62 8148.3037109375
63 7632.642578

391 0.0029159418772906065
392 0.0028092903085052967
393 0.0027115591801702976
394 0.0026132892817258835
395 0.0025227670557796955
396 0.00243637734092772
397 0.0023469962179660797
398 0.0022645967546850443
399 0.002186926081776619
400 0.0021065075416117907
401 0.002035108394920826
402 0.0019664112478494644
403 0.0018964933697134256
404 0.001832201611250639
405 0.0017694653943181038
406 0.0017090511973947287
407 0.0016514104790985584
408 0.0015972054097801447
409 0.0015428635524585843
410 0.0014936940278857946
411 0.0014461888931691647
412 0.0014012466417625546
413 0.00135462183970958
414 0.0013115470064803958
415 0.001267691026441753
416 0.001230177585966885
417 0.0011888820445165038
418 0.0011502074776217341
419 0.0011147805489599705
420 0.0010821128962561488
421 0.0010489225387573242
422 0.001017941627651453
423 0.0009870256762951612
424 0.0009563605417497456
425 0.0009277019998989999
426 0.0008998432895168662
427 0.0008734815055504441
428 0.0008475991198793054
429 0.0008219984592869
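
The two bookkeeping steps in the loop, zeroing the gradients and updating inside `torch.no_grad()`, can be seen in isolation on a toy parameter. The sketch below uses a hypothetical two-element tensor `w` and the function $\sum w^2$, both chosen only for illustration.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: the gradient of sum(w^2) is 2w
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# Zero the gradient, then take a step inside no_grad so the update is not tracked
w.grad.zero_()
with torch.no_grad():
    w -= 0.1 * first

print(first)        # tensor([2., 4.])
print(accumulated)  # tensor([4., 8.])
print(w.requires_grad)  # True: w is still a tracked leaf after the in-place update
```

Without the call to `zero_()`, each iteration of the training loop would step along the sum of all gradients computed so far rather than the current one.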