# Tensor

### Warm up: NumPy

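Wow... we are going to write a network directly in NumPy!<br>
NumPy is a general-purpose library for scientific computing (note to self: look up the difference between a library and a framework).<br>
With NumPy we can hand-code the forward and backward passes of a network.<br>
Let's try fitting a third-order polynomial to the sine function using NumPy.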

In [2]:
import numpy as np
import math

# Create input and output data
x = np.linspace(-math.pi, math.pi, 2000)  # returns evenly spaced numbers over the given interval; here, 2000 points between -pi and pi
y = np.sin(x)

# Randomly initialize the weights
a = np.random.randn()
b = np.random.randn()
c = np.random.randn()
d = np.random.randn()

learning_rate = 1e-6  # "e" is scientific notation: 1e-6 means 10 to the power of -6

for t in range(2000):
    # Forward pass: compute predicted y
    # y = a + b x + c x^2 + d x^3
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    
    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    if t % 100 == 99:
        print(f"step:{t}, loss:{loss}")
    
    # Backpropagation: compute gradients of the loss with respect to a, b, c, d
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred*x).sum()
    grad_c = (grad_y_pred*x**2).sum()
    grad_d = (grad_y_pred*x**3).sum()
    
    # Update weights
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d
print(f"Result: y = {a} + {b} x + {c} x^2 + {d} x^3")

step:99, loss:1161.7638298666643
step:199, loss:824.6932594122708
step:299, loss:586.1739011501846
step:399, loss:417.38517478880146
step:499, loss:297.94144738046776
step:599, loss:213.41680151205318
step:699, loss:153.60271616183064
step:799, loss:111.27511462948257
step:899, loss:81.32186566419584
step:999, loss:60.12535913983492
step:1099, loss:45.12558520848639
step:1199, loss:34.51094678063097
step:1299, loss:26.99946256051163
step:1399, loss:21.68393503027248
step:1499, loss:17.922383298020286
step:1599, loss:15.260507685948362
step:1699, loss:13.376821628445805
step:1799, loss:12.04382413171274
step:1899, loss:11.100523367937218
step:1999, loss:10.432992924043464
Result: y = -0.04254816928011967 + 0.8565911332504037 x + 0.0073402672054135145 x^2 + -0.09330909412190755 x^3
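
The gradients in the loop above were derived by hand, so it is worth checking them numerically. The following is a minimal sketch, not part of the original notebook: the helper names `loss_fn`, `analytic_grad`, and `numeric_grad` and the seed are assumptions made for illustration. It compares the hand-derived gradients against central finite differences.

```python
import math
import numpy as np

x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)

def loss_fn(w):
    """Sum of squared errors for weights w = (a, b, c, d)."""
    a, b, c, d = w
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    return np.square(y_pred - y).sum()

def analytic_grad(w):
    """Hand-derived gradients, same formulas as in the training loop."""
    a, b, c, d = w
    grad_y_pred = 2.0 * (a + b * x + c * x ** 2 + d * x ** 3 - y)
    return np.array([
        grad_y_pred.sum(),
        (grad_y_pred * x).sum(),
        (grad_y_pred * x ** 2).sum(),
        (grad_y_pred * x ** 3).sum(),
    ])

def numeric_grad(w, eps=1e-4):
    """Central finite differences, perturbing one weight at a time."""
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (loss_fn(w + step) - loss_fn(w - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)  # hypothetical seed, only for reproducibility
w = rng.standard_normal(4)
g_analytic = analytic_grad(w)
g_numeric = numeric_grad(w)
print("max abs difference:", np.abs(g_analytic - g_numeric).max())
```

Because the loss is quadratic in the weights, the central difference should agree with the analytic gradient up to floating-point round-off.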


## PyTorch: Tensor

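A Tensor is conceptually very similar to a NumPy array; the biggest difference is that Tensors can use a GPU to accelerate computation!<br>
PyTorch provides many functions that operate on Tensors. Behind the scenes, Tensors can also keep track of a computational graph and gradients, and they are equally useful as a general tool for scientific computing.<br>
Now let's use Tensors to implement the same network we just built with NumPy.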

In [4]:
import torch
import math
dtype = torch.float
device = torch.device("cuda:0")  # try running on the GPU!
#device = torch.device("cpu")  # use this line instead if no GPU is available

# Create input and output data
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Randomly initialize the weights
a = torch.randn((), device=device, dtype= dtype)
b = torch.randn((), device=device, dtype= dtype)
c = torch.randn((), device=device, dtype= dtype)
d = torch.randn((), device=device, dtype= dtype)

learning_rate = 1e-6

for t in range(2000):
    # Forward pass: compute predicted y, fitting a cubic polynomial to the sine values
    y_pred = a + b * x + c * x **2 + d * x ** 3
    
    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(f"step:{t}, loss:{loss}")
    
    # Backprop: compute gradients of the loss with respect to a, b, c, d
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred*x).sum()
    grad_c = (grad_y_pred*x ** 2).sum()
    grad_d = (grad_y_pred*x ** 3).sum()
    
    # Update weights using gradient descent
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d
print(f"Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3")    

step:99, loss:5608.47900390625
step:199, loss:3756.371826171875
step:299, loss:2518.57568359375
step:399, loss:1690.819091796875
step:499, loss:1136.9091796875
step:599, loss:765.9970703125
step:699, loss:517.4488525390625
step:799, loss:350.77313232421875
step:899, loss:238.9150390625
step:999, loss:163.785400390625
step:1099, loss:113.2826156616211
step:1199, loss:79.30513000488281
step:1299, loss:56.425228118896484
step:1399, loss:41.004188537597656
step:1499, loss:30.60055160522461
step:1599, loss:23.57510757446289
step:1699, loss:18.826112747192383
step:1799, loss:15.612659454345703
step:1899, loss:13.436006546020508
step:1999, loss:11.960025787353516
Result: y = 0.03892526775598526 + 0.8156076669692993 x + -0.006715255323797464 x^2 + -0.08747954666614532 x^3
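
The cell above hard-codes `cuda:0`, which raises an error on a machine without a CUDA device. A common pattern, sketched here as a suggestion rather than as part of the original notebook, is to fall back to the CPU when no GPU is visible. It also shows that a Tensor has to be moved to the CPU before it can be converted to a NumPy array.

```python
import math
import torch

# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
dtype = torch.float

x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)
print(f"running on: {x.device}")

# .cpu() is a no-op for CPU Tensors and a copy for GPU Tensors,
# so this conversion works on either device.
y_np = y.cpu().numpy()
print(type(y_np), y_np.shape)
```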


## Autograd
### PyTorch: Tensor & Autograd
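Autograd in PyTorch performs automatic differentiation, which automates the backward pass of a neural network.<br>
When using Autograd, the forward pass of the network defines a computational graph; nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors.<br>
Backpropagating through this graph then makes it easy to compute gradients.

Each Tensor represents a node in the graph.<br>
If x is a Tensor with x.requires_grad = True, then x.grad is another Tensor holding the gradient of some scalar value (typically the loss) with respect to x.<br>
Here we use Autograd to reimplement the previous example, so that the backward pass is computed automatically.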
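
Before the full example, here is a minimal sketch of the same mechanism on a single scalar. The function y = x^2 + 2x is chosen only for illustration and does not appear in the original notebook.

```python
import torch

# A scalar Tensor that autograd should track.
x = torch.tensor(3.0, requires_grad=True)

# The forward pass builds the graph: y depends on x through pow, mul and add.
y = x ** 2 + 2 * x

# The backward pass fills x.grad with dy/dx = 2x + 2, evaluated at x = 3.
y.backward()
print(x.grad)  # tensor(8.)
```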

In [6]:
import torch
import math
dtype = torch.float
device = torch.device("cuda:0")
#device = torch.device("cpu")

# Create Tensors to hold the input and the corresponding output.
# By default requires_grad = False, which indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Create random Tensors for weights. For a third order polynomial, we need
# 4 weights: y = a + b x + c x^2 + d x^3
# Setting requires_grad = True indicates that we want to compute gradients with respect to these Tensors during the backward pass!!

a = torch.randn((), device=device, dtype=dtype, requires_grad=True)
b = torch.randn((), device=device, dtype=dtype, requires_grad=True)
c = torch.randn((), device=device, dtype=dtype, requires_grad=True)
d = torch.randn((), device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6

for t in range(2000):
    # Forward pass: compute predicted y using operations on Tensors.
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    
    # loss is a 0-dimensional (scalar) Tensor of shape ().
    # loss.item() gets the scalar value held in the loss.
    loss = (y_pred - y).pow(2).sum()  # sum of squared errors
    if t % 100 == 99:
        print(f"step:{t}, loss:{loss.item()}")
    
    # Use autograd to compute the backward pass.
    # This call will compute the gradient of loss with respect to "all Tensors with requires_grad = True"!!
    # After this call a.grad, b.grad, c.grad and d.grad will be Tensors holding the gradient of the loss with respect to a, b, c, d respectively
    loss.backward()
    
    # Manually update the weights using gradient descent.
    # Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this in autograd.
    
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad
        
        # After updating the weights, remember to reset the accumulated gradients.
        a.grad=None
        b.grad=None
        c.grad=None
        d.grad=None
print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')

step:99, loss:868.8411865234375
step:199, loss:614.483154296875
step:299, loss:435.4798583984375
step:399, loss:309.4645690917969
step:499, loss:220.72369384765625
step:599, loss:158.2130126953125
step:699, loss:114.16705322265625
step:799, loss:83.12327575683594
step:899, loss:61.238075256347656
step:999, loss:45.80586242675781
step:1099, loss:34.9215087890625
step:1199, loss:27.243152618408203
step:1299, loss:21.825389862060547
step:1399, loss:18.00198745727539
step:1499, loss:15.30331039428711
step:1599, loss:13.398152351379395
step:1699, loss:12.052997589111328
step:1799, loss:11.103099822998047
step:1899, loss:10.432229042053223
step:1999, loss:9.958356857299805
Result: y = -0.03537272661924362 + 0.861544668674469 x + 0.006102385465055704 x^2 + -0.09401369094848633 x^3
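
The reason the gradients are reset after every update is that `backward()` adds to `.grad` instead of overwriting it. A minimal sketch of that behaviour, using a toy weight `w` that is not part of the original example:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# First backward pass: d(3w)/dw = 3.
(3 * w).backward()
first = w.grad.item()

# Second backward pass without resetting: the new gradient is added on top.
(3 * w).backward()
second = w.grad.item()

# Resetting .grad to None gives a fresh gradient on the next backward pass.
w.grad = None
(3 * w).backward()
third = w.grad.item()

print(first, second, third)  # 3.0 6.0 3.0
```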


## PyTorch: Defining New Autograd Functions

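Under the hood, each primitive Autograd operator is really two functions that operate on Tensors.<br>
The forward function computes output Tensors from input Tensors.<br>
The backward function receives the gradient of some scalar value with respect to the output Tensors, and computes the gradient of that same scalar value with respect to the input Tensors.

In PyTorch, we can easily define our own Autograd operator by **defining a subclass of torch.autograd.Function and implementing the forward and backward functions**.  
We can then use the new operator by calling its apply method like a function, passing Tensors that contain the input data.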

In [7]:
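# In this example the model is y = a + b * P3(c + d * x),
# where P3(x) = 1/2 * (5x^3 - 3x) is the Legendre polynomial of degree three.
# We write a custom Autograd function that computes the forward and backward passes of P3.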
import torch
import math

class LegendrePolynomial3(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing torch.autograd.Function
    and implementing the forward and backward passes which operate on Tensors.
    """
    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return a Tensor containing the output.
        ctx is a context object that can be used to stash information for backward computation.
        You can cache Tensors for use in the backward pass using the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)
    
    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss with respect to the output,
        and we need to compute the gradient of the loss with respect to the input.
        """
        input, = ctx.saved_tensors
        return grad_output * 1.5 * (5 * input ** 2 - 1)
    
dtype = torch.float
device = torch.device("cpu")

x = torch.linspace(-math.pi, math.pi, 2000, device = device, dtype = dtype)
y = torch.sin(x)

a = torch.full((), 0.0, device = device, dtype = dtype, requires_grad = True)
b = torch.full((), -1.0, device = device, dtype = dtype, requires_grad = True)
c = torch.full((), 0.0, device = device, dtype = dtype, requires_grad = True)
d = torch.full((), 0.3, device = device, dtype = dtype, requires_grad = True)

learning_rate = 5e-6

for t in range(2000):
    # To apply our Function, we use Function.apply method. We alias this as 'P3'.
    P3 = LegendrePolynomial3.apply  # alias for the apply method
    
    # Forward pass: compute predicted y using operations; we compute P3 using our custom autograd operation.
    y_pred = a + b * P3(c + d * x)
    
    # Compute loss
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(f"step: {t}, loss: {loss.item()}")
    
    # Use autograd to compute the backward pass.
    loss.backward()
    
    # Update the weights
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad
        
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()} * P3({c.item()} + {d.item()} x)')


step: 99, loss: 209.95834350585938
step: 199, loss: 144.66018676757812
step: 299, loss: 100.70249938964844
step: 399, loss: 71.03519439697266
step: 499, loss: 50.97850799560547
step: 599, loss: 37.403133392333984
step: 699, loss: 28.206867218017578
step: 799, loss: 21.973188400268555
step: 899, loss: 17.7457275390625
step: 999, loss: 14.877889633178711
step: 1099, loss: 12.931766510009766
step: 1199, loss: 11.610918045043945
step: 1299, loss: 10.714258193969727
step: 1399, loss: 10.10548210144043
step: 1499, loss: 9.692106246948242
step: 1599, loss: 9.411375045776367
step: 1699, loss: 9.220745086669922
step: 1799, loss: 9.091285705566406
step: 1899, loss: 9.003361701965332
step: 1999, loss: 8.943639755249023
Result: y = -5.423830273798558e-09 + -2.208526849746704 * P3(1.3320399228078372e-09 + 0.2554861009120941 x)
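
A hand-written `backward` is easy to get wrong, so it can help to compare it against numerical gradients. The sketch below uses `torch.autograd.gradcheck` for that; the input size, seed, and tolerances are assumptions chosen for illustration. `gradcheck` is intended for double-precision inputs, so the test input uses `torch.double` rather than the `torch.float` used in training.

```python
import torch

class LegendrePolynomial3(torch.autograd.Function):
    """Same custom Function as above: P3(x) = 1/2 * (5x^3 - 3x)."""

    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        # dP3/dx = 1.5 * (5x^2 - 1), chained with the incoming gradient.
        return grad_output * 1.5 * (5 * input ** 2 - 1)

torch.manual_seed(0)  # hypothetical seed, only for reproducibility
inp = torch.randn(8, dtype=torch.double, requires_grad=True)

# gradcheck returns True when analytic and numerical gradients agree
# within tolerance, and raises an error otherwise.
ok = torch.autograd.gradcheck(LegendrePolynomial3.apply, (inp,), eps=1e-6, atol=1e-4)
print(ok)
```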


## nn.Module
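When building large neural networks, raw Autograd can be too low-level.  
So when building a neural network we usually want to arrange the computation into layers, some of which have learnable parameters  
that are optimized during training.  

In TensorFlow, packages such as Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over the raw computational graph that are useful for building neural networks.

In PyTorch, the nn package serves the same purpose.  
The nn package defines a set of Modules, which are roughly equivalent to neural network layers.  
A Module receives input Tensors and computes output Tensors, and can also hold internal state, such as Tensors containing learnable parameters.  
The nn package also defines a set of useful loss functions that are commonly used when training neural networks.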
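
To make "internal state" and "loss functions" concrete, the sketch below inspects the parameters held by a single `Linear` Module and compares the two reductions of `MSELoss`. The toy `pred` and `target` values are made up for illustration.

```python
import torch

# A Linear Module with 3 inputs and 1 output holds a weight and a bias.
layer = torch.nn.Linear(3, 1)
for name, param in layer.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)

# MSELoss averages by default; reduction='sum' adds up the squared errors instead.
mse_mean = torch.nn.MSELoss()
mse_sum = torch.nn.MSELoss(reduction='sum')

pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([0.0, 0.0, 0.0])
mean_value = mse_mean(pred, target).item()  # (1 + 4 + 9) / 3
sum_value = mse_sum(pred, target).item()    # 1 + 4 + 9
print(mean_value, sum_value)
```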

In [8]:
# Build the polynomial model with the nn package
import torch
import math

x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# In this example, the output y is a linear function of (x, x^2, x^3),
# so we can treat it as a linear-layer neural network. First prepare the tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

# x.unsqueeze(-1) has shape (2000, 1), and p has shape(3, ), for this case, broadcasting semantics will apply to obtain a tensor of shape (2000, 3)

# Use the nn package to define our model as a sequence of layers.
# nn.Sequential is a Module which contains other Modules, and applies them in sequence to produce its output.
# The Linear Module computes output from input using a linear function, and holds internal Tensors for its weight and bias.
# The Flatten layer flattens the output of the linear layer to a 1D tensor, to match the shape of 'y'.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)

# The nn package also contains definitions of popular loss functions.
# In this case we use MSELoss; with reduction='sum' it returns the sum of squared errors rather than the mean.
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y by passing x to the model.
    # Module objects override the __call__ operator so you can call them like functions.
    # When doing so you pass a Tensor of input data to the Module and it produces a Tensor of output data.
    y_pred = model(xx)
    
    # Compute the loss.
    # We pass Tensors containing the predicted and true values of y, and the loss function returns a Tensor containing the loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(f"step: {t}, loss: {loss.item()}")
    
    # Zero the gradients before running the backward pass.
    model.zero_grad()
    
    # Backward pass: compute gradient of the loss with respect to all the learnable parameters of the model.
    # Internally, the parameters of each Module are stored in Tensors with requires_grad = True, so this call will compute gradients for all learnable parameters in the model.
    loss.backward()
    
    # Update the weights using gradient descent.
    # Each parameter is a Tensor, so we can access its gradient as we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
            
# You can access the first layer of 'model' like accessing the first item of a list
linear_layer = model[0]

# For linear layer, its parameters are stored as 'weight' and 'bias'.
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')

step: 99, loss: 654.8340454101562
step: 199, loss: 437.47344970703125
step: 299, loss: 293.3116455078125
step: 399, loss: 197.67909240722656
step: 499, loss: 134.2257080078125
step: 599, loss: 92.11399841308594
step: 699, loss: 64.15941619873047
step: 799, loss: 45.59776306152344
step: 899, loss: 33.26963806152344
step: 999, loss: 25.079345703125
step: 1099, loss: 19.63631820678711
step: 1199, loss: 16.01784896850586
step: 1299, loss: 13.611520767211914
step: 1399, loss: 12.010741233825684
step: 1499, loss: 10.945393562316895
step: 1599, loss: 10.236116409301758
step: 1699, loss: 9.763710021972656
step: 1799, loss: 9.4489164352417
step: 1899, loss: 9.23903751373291
step: 1999, loss: 9.099055290222168
Result: y = -0.007070383056998253 + 0.8417672514915466 x + 0.0012197582982480526 x^2 + -0.09120053052902222 x^3
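
The shape handling in this example (the `unsqueeze`, the broadcast in `pow`, and the final `Flatten`) is easy to lose track of. This short sketch traces the shapes step by step; the intermediate names `col`, `out`, and `flat` are introduced here only for illustration.

```python
import math
import torch

x = torch.linspace(-math.pi, math.pi, 2000)
p = torch.tensor([1, 2, 3])

col = x.unsqueeze(-1)  # shape (2000, 1)
xx = col.pow(p)        # broadcast against shape (3,) to get (2000, 3)
print(col.shape, xx.shape)

linear = torch.nn.Linear(3, 1)
flatten = torch.nn.Flatten(0, 1)

out = linear(xx)       # shape (2000, 1): one prediction per row
flat = flatten(out)    # shape (2000,): dims 0 and 1 merged, matching y
print(out.shape, flat.shape)
```

Without the `Flatten` step the prediction would have shape (2000, 1) while `y` has shape (2000,), and the loss would broadcast the two against each other instead of comparing them element by element.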


## PyTorch: optim
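So far we have updated the model weights by manually mutating the Tensors that hold the learnable parameters, inside torch.no_grad().  
For simple algorithms such as stochastic gradient descent, this is easy enough.  
In practice, though, neural networks are usually trained with more sophisticated optimizers (e.g. AdaGrad, RMSProp, Adam).

The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used ones.

This example uses the RMSprop algorithm provided by the optim package to optimize the model.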
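
To see what an optimizer abstracts away, the sketch below checks that one step of `torch.optim.SGD` (without momentum or weight decay) gives the same weights as the manual update used earlier. The random toy data and the second model `manual` are assumptions made for the comparison and are not part of the original notebook.

```python
import torch

torch.manual_seed(0)  # hypothetical seed, only for reproducibility

# Two copies of the same layer with identical starting weights.
model = torch.nn.Linear(3, 1)
manual = torch.nn.Linear(3, 1)
manual.load_state_dict(model.state_dict())

xx = torch.randn(16, 3)
y = torch.randn(16, 1)
loss_fn = torch.nn.MSELoss(reduction='sum')
lr = 1e-3

# Update 1: let the optimizer apply the gradient step.
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
optimizer.zero_grad()
loss_fn(model(xx), y).backward()
optimizer.step()

# Update 2: the manual update under torch.no_grad().
manual.zero_grad()
loss_fn(manual(xx), y).backward()
with torch.no_grad():
    for param in manual.parameters():
        param -= lr * param.grad

same = all(
    torch.allclose(p, q)
    for p, q in zip(model.parameters(), manual.parameters())
)
print(same)
```

Switching to another algorithm such as RMSprop then only changes the line that constructs the optimizer; the rest of the loop stays the same.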

In [9]:
import torch
import math

x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# The highlight: use the optim package to define an Optimizer that will update the weights of the model for us.
# Here we will use RMSprop
# the optim package contains many other optimization algorithms.
# The first argument to the RMSprop constructor tells the optimizer "which Tensors it should update".

learning_rate = 1e-3
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
for t in range(2000):
    # Forward pass: compute predicted y by passing x to the model
    y_pred = model(xx)
    
    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(f"step: {t}, loss: {loss.item()}")
    
    # Before the backward pass, use the optimizer object to zero all of the gradients for the variables it will update (which are the learnable weights of the model).
    # This is because by default, gradients are accumulated in buffers (i.e., not overwritten) whenever .backward() is called.
    # Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()
    
    # Backward pass: compute gradient of the loss with respect to model parameters
    loss.backward()
    
    # Calling the step function on an Optimizer makes an update to its parameters
    optimizer.step()
    
linear_layer = model[0]

print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')

step: 99, loss: 23213.193359375
step: 199, loss: 10697.0380859375
step: 299, loss: 5156.4345703125
step: 399, loss: 3104.404296875
step: 499, loss: 2487.3955078125
step: 599, loss: 2176.162841796875
step: 699, loss: 1886.525390625
step: 799, loss: 1611.3702392578125
step: 899, loss: 1362.6876220703125
step: 999, loss: 1142.302490234375
step: 1099, loss: 947.1959838867188
step: 1199, loss: 773.845947265625
step: 1299, loss: 620.3523559570312
step: 1399, loss: 485.9271240234375
step: 1499, loss: 370.0064392089844
step: 1599, loss: 272.1261291503906
step: 1699, loss: 191.9249267578125
step: 1799, loss: 127.76820373535156
step: 1899, loss: 79.59146118164062
step: 1999, loss: 45.88486862182617
Result: y = -0.00030105706537142396 + 0.6701569557189941 x + -0.0003019354771822691 x^2 + -0.0662439838051796 x^3
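
The RMSprop run above ends at a loss of about 45.9 after 2000 steps, so it has not fully converged. As a reference point, the best cubic in the least-squares sense can be computed directly. This is a sketch using `np.polyfit`, which is a swapped-in closed-form technique and not something the original notebook does.

```python
import math
import numpy as np

x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)

# np.polyfit returns coefficients from the highest degree down to the constant.
d, c, b, a = np.polyfit(x, y, 3)

y_fit = a + b * x + c * x ** 2 + d * x ** 3
best_loss = np.square(y_fit - y).sum()

print(f"Reference: y = {a} + {b} x + {c} x^2 + {d} x^3")
print(f"Reference loss: {best_loss}")
```

Any of the training loops in this notebook can be compared against this reference loss to judge how close it is to the optimum.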


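## PyTorch: Custom nn Modules
Sometimes a model is more complex than a sequence of existing Modules can express; in that case, you can define your own Module by subclassing nn.Module and defining a forward method<br>
that receives input Tensors and produces output Tensors using other Modules or other autograd operations on Tensors.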

In [None]:
# Custom Module subclass implementing the third-order polynomial
import torch
import math

class Polynomial3(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate four parameters and assign them as
        member parameters.
        """
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))
        
    def forward(self, x):
        """
        In the forward function
        """