## 1. NumPy
Build a simple two-layer MLP (one ReLU hidden layer between two weight matrices) and implement the forward pass, the loss computation, and the backward pass by hand with NumPy.

<img src="https://images.deepai.org/glossary-terms/49157de013394ab7a36022759a55b6aa/multipercep.jpg">

In [1]:
import numpy as np

### Initialize random data

In [2]:
n, input_dim, hidden_dim, output_dim = 64, 784, 100, 10

#create random input and output data
x = np.random.randn(n, input_dim)
y = np.random.randn(n, output_dim)

In [3]:
x.shape

(64, 784)

In [4]:
y.shape

(64, 10)

### Randomly initialize the weights

In [5]:
# Randomly initialize the weights
w1 = np.random.randn(input_dim, hidden_dim)
w2 = np.random.randn(hidden_dim, output_dim)

In [6]:
print(w1.shape)
print(w2.shape)

(784, 100)
(100, 10)
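
To see how these weight shapes fit together, here is a short sketch that traces the shapes through one forward pass. It uses the same sizes as the cells above but a fresh random draw, so only the shapes (not the values) match the notebook.

```python
import numpy as np

# Same sizes as above: batch size, input, hidden, and output dimensions
n, input_dim, hidden_dim, output_dim = 64, 784, 100, 10

x = np.random.randn(n, input_dim)
w1 = np.random.randn(input_dim, hidden_dim)
w2 = np.random.randn(hidden_dim, output_dim)

# Each matrix product contracts the shared inner dimension
h = x.dot(w1)              # (n, input_dim) @ (input_dim, hidden_dim) -> (n, hidden_dim)
h_relu = np.maximum(h, 0)  # ReLU is element-wise, so the shape is unchanged
y_pred = h_relu.dot(w2)    # (n, hidden_dim) @ (hidden_dim, output_dim) -> (n, output_dim)

print(h.shape, h_relu.shape, y_pred.shape)  # (64, 100) (64, 100) (64, 10)
```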


### Set the learning rate

In [7]:
learning_rate = 1e-6

### Forward pass, loss, and backward pass

In [8]:
#loop for 500 epochs
for i in range(500):
    #forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    
    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    if i % 100 == 99:
        print("Epoch {} loss = {}".format(i, loss))
    
    #Backward to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0*(y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h<0] = 0
    grad_w1 = x.T.dot(grad_h)
    
    #update weights
    w1 -= learning_rate*grad_w1
    w2 -= learning_rate*grad_w2

Epoch 99 loss = 3569.082604967826
Epoch 199 loss = 104.48886311840033
Epoch 299 loss = 4.861781392265676
Epoch 399 loss = 0.2794481522763724
Epoch 499 loss = 0.017795633906585888
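
The backward pass in the loop is the chain rule applied to each forward step. Writing $X$ for `x`, $W_1, W_2$ for `w1`, `w2`, and $\odot$ for the element-wise product, the lines `grad_y_pred`, `grad_w2`, `grad_h_relu`, `grad_h`, and `grad_w1` compute, in order:

$$
\begin{aligned}
H &= X W_1, \qquad H_{\text{relu}} = \max(H, 0), \qquad \hat{Y} = H_{\text{relu}} W_2, \qquad L = \sum_{i,j} \big(\hat{Y}_{ij} - Y_{ij}\big)^2 \\
\frac{\partial L}{\partial \hat{Y}} &= 2\,(\hat{Y} - Y) \\
\frac{\partial L}{\partial W_2} &= H_{\text{relu}}^\top \, \frac{\partial L}{\partial \hat{Y}} \\
\frac{\partial L}{\partial H_{\text{relu}}} &= \frac{\partial L}{\partial \hat{Y}} \, W_2^\top \\
\frac{\partial L}{\partial H} &= \frac{\partial L}{\partial H_{\text{relu}}} \odot \mathbb{1}[H \ge 0] \\
\frac{\partial L}{\partial W_1} &= X^\top \, \frac{\partial L}{\partial H}
\end{aligned}
$$

The mask $\mathbb{1}[H \ge 0]$ mirrors `grad_h[h<0] = 0`; ReLU is not differentiable at exactly 0, and with random floating-point inputs that case essentially never occurs. Each weight is then updated as $W \leftarrow W - \eta \, \partial L / \partial W$ with $\eta$ = `learning_rate`.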

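To check that the hand-written backward pass is correct, one can compare it against central finite differences, $\big(L(w + \varepsilon) - L(w - \varepsilon)\big) / 2\varepsilon$ for each weight entry. The sketch below is an illustration, not part of the original notebook: it uses small hypothetical sizes and a seeded generator (`np.random.default_rng`) so the loop over every entry stays cheap, and the helper names `loss_fn`, `analytic_grads`, and `numeric_grad` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Small hypothetical sizes so the finite-difference loop stays cheap
n, input_dim, hidden_dim, output_dim = 4, 5, 3, 2
x = rng.standard_normal((n, input_dim))
y = rng.standard_normal((n, output_dim))
w1 = rng.standard_normal((input_dim, hidden_dim))
w2 = rng.standard_normal((hidden_dim, output_dim))

def loss_fn(w1, w2):
    """Sum-of-squares loss of the two-layer ReLU network, as in the loop above."""
    y_pred = np.maximum(x.dot(w1), 0).dot(w2)
    return np.square(y_pred - y).sum()

def analytic_grads(w1, w2):
    """Backward pass copied from the training loop."""
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)
    return grad_w1, grad_w2

def numeric_grad(f, w, eps=1e-6):
    """Central differences: perturb one entry of w in place, then restore it."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        old = w[idx]
        w[idx] = old + eps
        plus = f()
        w[idx] = old - eps
        minus = f()
        w[idx] = old  # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

grad_w1, grad_w2 = analytic_grads(w1, w2)
num_w1 = numeric_grad(lambda: loss_fn(w1, w2), w1)
num_w2 = numeric_grad(lambda: loss_fn(w1, w2), w2)

# Maximum absolute disagreement; should be tiny if the backward pass is right
print(np.max(np.abs(grad_w1 - num_w1)), np.max(np.abs(grad_w2 - num_w2)))
```
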

## 2. PyTorch

In [9]:
import torch

### Initialize the data tensors

In [10]:
n, input_dim, hidden_dim, output_dim = 64, 784, 100, 10

#create random input and output data
x = torch.randn(n, input_dim)
y = torch.randn(n, output_dim)

### Initialize the weight tensors


In [11]:
w1 = torch.randn(input_dim, hidden_dim)
w2 = torch.randn(hidden_dim, output_dim)

### Training process

In [12]:
learning_rate = 1e-6

In [13]:
for t in range(500):
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)
    
    #Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    if t%100 == 99:
        print("Epoch: {} loss = {}".format(t, loss))
        
    #backward to compute gradient of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h<0] = 0
    grad_w1 = x.t().mm(grad_h)
    
    #update weight using gradient descent
    w1 -= learning_rate*grad_w1
    w2 -= learning_rate*grad_w2
    

Epoch: 99 loss = 1503.0023193359375
Epoch: 199 loss = 11.835199356079102
Epoch: 299 loss = 0.13097688555717468
Epoch: 399 loss = 0.0019004193600267172
Epoch: 499 loss = 0.00013900772319175303
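
Before switching to autograd, it can help to confirm that the manual formulas above produce the same gradients autograd would. This is a sketch with small hypothetical sizes and `torch.float64` (an assumption, chosen so the comparison is tight); it runs one forward pass, computes the manual gradients without graph tracking, then calls `loss.backward()` and compares.

```python
import torch

torch.manual_seed(0)
# Hypothetical small sizes; double precision keeps rounding differences negligible
n, input_dim, hidden_dim, output_dim = 8, 20, 10, 3
x = torch.randn(n, input_dim, dtype=torch.float64)
y = torch.randn(n, output_dim, dtype=torch.float64)
w1 = torch.randn(input_dim, hidden_dim, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(hidden_dim, output_dim, dtype=torch.float64, requires_grad=True)

# Forward pass, keeping the intermediates that the manual backward needs
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()

# Manual backward with the same formulas as the loop above
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

# Autograd fills w1.grad and w2.grad
loss.backward()

print(torch.allclose(grad_w1, w1.grad), torch.allclose(grad_w2, w2.grad))  # True True
```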


### Using autograd

In [16]:
w1 = torch.randn(input_dim, hidden_dim, requires_grad=True)
w2 = torch.randn(hidden_dim, output_dim, requires_grad=True)

for t in range(500):
    #forward
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    
    #Compute loss
    loss = (y_pred - y).pow(2).sum()
    
    if t % 100 == 99:
        print("Epoch {}: loss = {}".format(t, loss.item()))
    
    #backward
    loss.backward()
    
    #update weight
    with torch.no_grad():
        w1 -= learning_rate*w1.grad
        w2 -= learning_rate*w2.grad
        
        # Manually zero the gradients after updating the weights
        w1.grad.zero_()
        w2.grad.zero_()

Epoch 99: loss = 2206.612060546875
Epoch 199: loss = 29.66794776916504
Epoch 299: loss = 0.5643945932388306
Epoch 399: loss = 0.012877823784947395
Epoch 499: loss = 0.0005371177685447037
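
The manual `zero_()` calls are needed because `backward()` adds into `.grad` rather than overwriting it. A minimal sketch on a toy tensor (not from the notebook) makes the accumulation visible:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# The gradient of sum(w**2) is 2*w = [2., 4.]
(w ** 2).sum().backward()
after_one = w.grad.clone()    # tensor([2., 4.])

# A second backward without zeroing adds to the stored .grad
(w ** 2).sum().backward()
after_two = w.grad.clone()    # tensor([4., 8.])

# Zero in place, as the training loop does after each update
w.grad.zero_()

print(after_one, after_two, w.grad)  # tensor([2., 4.]) tensor([4., 8.]) tensor([0., 0.])
```

Without the zeroing, each update in the loop above would use the sum of all gradients computed so far instead of the current one.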


### Using the neural network module (`torch.nn`)

In [18]:
N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
        torch.nn.Linear(D_in, H),
        torch.nn.ReLU(),
        torch.nn.Linear(H, D_out)
)

In [19]:
model

Sequential(
  (0): Linear(in_features=1000, out_features=100, bias=True)
  (1): ReLU()
  (2): Linear(in_features=100, out_features=10, bias=True)
)
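
Unlike the manual version, each `nn.Linear` layer also has a bias (`bias=True` above), and it stores its weight as `(out_features, in_features)`, the transpose of `w1`/`w2`. A short sketch that rebuilds the same architecture and lists its parameters:

```python
import torch

D_in, H, D_out = 1000, 100, 10
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# ReLU has no parameters, so only layers 0 and 2 appear
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# 0.weight (100, 1000)
# 0.bias (100,)
# 2.weight (10, 100)
# 2.bias (10,)

total = sum(p.numel() for p in model.parameters())
print(total)  # 101110 = 1000*100 + 100 + 100*10 + 10
```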

In [20]:
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4

for t in range(500):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    
    if t%100 == 99:
        print(t, loss.item())
        
    # Zero the gradients before running the backward pass
    model.zero_grad()
    loss.backward()
    
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate*param.grad

99 1.8833938837051392
199 0.02831760235130787
299 0.000815054343547672
399 3.369090336491354e-05
499 1.833533701756096e-06
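
`MSELoss(reduction='sum')` is chosen so that this loss matches the hand-written `(y_pred - y).pow(2).sum()` from the earlier sections; the default `reduction='mean'` divides by the number of elements instead. A quick check on random tensors (`float64` is used here only to keep the comparison tight):

```python
import torch

torch.manual_seed(0)
y_pred = torch.randn(64, 10, dtype=torch.float64)
y = torch.randn(64, 10, dtype=torch.float64)

builtin = torch.nn.MSELoss(reduction='sum')(y_pred, y)
manual = (y_pred - y).pow(2).sum()

# 'mean' divides the summed squared error by the element count (64 * 10)
mean_loss = torch.nn.MSELoss(reduction='mean')(y_pred, y)

print(torch.allclose(builtin, manual), torch.allclose(mean_loss * y.numel(), builtin))  # True True
```

With `reduction='mean'` the gradients shrink by the same factor of 640, so the learning rate would have to grow accordingly to reach a similar update size.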


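### Using the `optim` module

This cell reuses `model`, `loss_fn`, and `learning_rate` from the previous cell, so Adam starts from weights that the manual loop has already fitted; that is why the loss is close to zero from the first printout.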

In [21]:
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for t in range(500):
    y_pred = model(x)
    
    loss = loss_fn(y_pred, y)
    
    if t % 100 == 99:
        print(t, loss.item())
        
    optimizer.zero_grad()
    
    loss.backward()
    
    optimizer.step()

99 1.075982936526998e-06
199 7.208559691207483e-05
299 0.00016871863044798374
399 0.00025269435718655586
499 0.00034803530434146523
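
Adam is a different update rule from the manual loop in the previous section. The optimizer that reproduces `param -= learning_rate * param.grad` exactly is plain `torch.optim.SGD` (no momentum by default). The sketch below, on a hypothetical single `nn.Linear` layer, takes one step both ways from identical weights and compares the results:

```python
import copy
import torch

torch.manual_seed(0)
x = torch.randn(16, 5)
y = torch.randn(16, 2)
model_a = torch.nn.Linear(5, 2)
model_b = copy.deepcopy(model_a)   # identical starting weights
loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-3

# One manual gradient-descent step, as in the previous section
model_a.zero_grad()
loss_fn(model_a(x), y).backward()
with torch.no_grad():
    for param in model_a.parameters():
        param -= learning_rate * param.grad

# The same step through torch.optim.SGD
optimizer = torch.optim.SGD(model_b.parameters(), lr=learning_rate)
optimizer.zero_grad()
loss_fn(model_b(x), y).backward()
optimizer.step()

same = all(torch.allclose(pa, pb) for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)  # True
```

Swapping `SGD` for `Adam` (or any other optimizer) changes only the line that constructs `optimizer`; the `zero_grad()` / `backward()` / `step()` loop stays the same.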


### Custom module with `nn.Module`

In [26]:
class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)
        
    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred

In [28]:
model = TwoLayerNet(D_in, H, D_out)

In [29]:
model

TwoLayerNet(
  (linear1): Linear(in_features=1000, out_features=100, bias=True)
  (linear2): Linear(in_features=100, out_features=10, bias=True)
)
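
The notebook defines the custom module but does not train it. A minimal sketch, assuming the same setup as the `torch.nn` section (sum-reduced MSE, learning rate `1e-4`) but with `torch.optim.SGD` doing the update; the class is redefined here so the block runs on its own:

```python
import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        h_relu = self.linear1(x).clamp(min=0)
        return self.linear2(h_relu)

torch.manual_seed(0)
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = TwoLayerNet(D_in, H, D_out)
loss_fn = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

losses = []
for t in range(500):
    y_pred = model(x)            # nn.Module.__call__ dispatches to forward()
    loss = loss_fn(y_pred, y)
    losses.append(loss.item())
    if t % 100 == 99:
        print(t, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because `TwoLayerNet` registers its layers as attributes, `model.parameters()` finds their weights and biases automatically, so the training loop is identical to the one used with `nn.Sequential`.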