<a href="https://colab.research.google.com/github/dbsxogh09/AI504/blob/main/ai504_03_pytorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Week 3: PyTorch, Logistic Regression and MLP

- We will cover the basic concepts of the PyTorch framework (tensor operations, GPU utilization, and autograd)
- We will implement simple logistic regression and multinomial logistic regression (softmax) with PyTorch
- We will use a simple linear model and a multi-layer perceptron (MLP) in this class

If you have any questions, feel free to ask.
- For additional questions, post them on Classum.

## Why PyTorch?

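- Intuitive and concise code
- Define-by-Run: the computation graph is built on the fly as the code executes (TensorFlow 1.x used Define-and-Run, where a static graph is built first and executed afterwards)
- High compatibility with NumPy (almost one-to-one mapping)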

![picture](https://drive.google.com/uc?id=1nAfTkF8Kp4YEI1pBeShs3L7NCPHx_iHQ)
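A small sketch of what Define-by-Run means in practice: ordinary Python control flow decides which operations run, and PyTorch records the graph while the code executes, so different inputs can produce different graphs. The function name `forward` is just an illustrative choice, and the sketch uses autograd (`requires_grad`, `.backward()`), which is covered in Section 3.

```python
import torch

def forward(x: torch.Tensor) -> torch.Tensor:
    # The branch is chosen at run time from the data itself;
    # the graph is recorded as these lines execute (define-by-run).
    if x.sum() > 0:
        return (x * 2).sum()
    return (x ** 2).sum()

x_pos = torch.tensor([1.0, 2.0], requires_grad=True)
forward(x_pos).backward()   # sum > 0 -> graph for (x * 2).sum()
print(x_pos.grad)           # tensor([2., 2.])

x_neg = torch.tensor([-1.0, -2.0], requires_grad=True)
forward(x_neg).backward()   # sum < 0 -> graph for (x ** 2).sum()
print(x_neg.grad)           # tensor([-2., -4.])
```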

## 0. Prelim: Load packages & GPU setup

In [1]:
# show the current GPU usage on your server
!nvidia-smi

Thu Oct  5 13:10:25 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   62C    P8    10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
!pip install gpustat
!gpustat

Collecting gpustat
  Downloading gpustat-1.1.1.tar.gz (98 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.1/98.1 kB 3.0 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting nvidia-ml-py>=11.450.129 (from gpustat)
  Downloading nvidia_ml_py-12.535.108-py3-none-any.whl (36 kB)
Collecting blessed>=1.17.1 (from gpustat)
  Downloading blessed-1.20.0-py2.py3-none-any.whl (58 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.4/58.4 kB 8.0 MB/s eta 0:00:00
Building wheels for collected packages: gpustat
  Building wheel for gpustat (pyproject.toml) ... done
  Created wheel for gpustat: filename=gpustat-1.1.1-py3-none-any.whl size=26534 sha256=468267f58618de5da930a6d195640b781d241050ff4d8b20ffd3f2056aaeebed
  Stored in directory: /root/.cache/pip/

In [3]:
# select which GPU this process can see (must be set before CUDA is initialized)
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # expose only GPU 0

In [4]:
# load packages
!pip install torch
!pip install numpy
import torch
import numpy as np



In [5]:
# print the version of PyTorch
print(torch.__version__)

2.0.1+cu118


## 1. PyTorch Tensors and Numpy

PyTorch uses the **tensor** as its basic data structure.\
**Tensor: an n-dimensional array that also supports GPU computation**\
**Almost the same as a NumPy array**

![picture](https://drive.google.com/uc?id=1z2v05mGyhP_FpEa3Z4JsNpgbtEnkg0bo)
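Since tensors and NumPy arrays map almost one-to-one, converting between them is cheap. A minimal sketch: `torch.from_numpy` shares memory with the source array (so in-place changes show up on both sides), `torch.tensor` always copies, and `.numpy()` goes back the other way (only for CPU tensors that do not require grad).

```python
import numpy as np
import torch

a = np.array([1.0, 2.0, 3.0])

t_shared = torch.from_numpy(a)  # shares memory with `a` (no copy)
t_copy = torch.tensor(a)        # always copies the data

a[0] = 100.0                    # modify the NumPy array in place
print(t_shared)                 # first element is now 100 (shared memory)
print(t_copy)                   # still 1, 2, 3 (independent copy)

back = t_copy.numpy()           # CPU tensor -> NumPy array (shares memory with t_copy)
print(type(back), back.dtype)
```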

### PyTorch tensors and NumPy arrays share almost identical syntax


**We will show some examples of:**
- Same operation with identical syntax
- Same operation with different syntax
- Different operations with the same syntax

**We will not cover every example in this class :(**
- For more examples, see the following reference: https://github.com/wkentaro/pytorch-for-numpy-users

**First, define NumPy arrays and PyTorch tensors**

In [6]:
np_array_1 = np.array([1, 2, 3, 4])
np_array_2 = np.array([5, 6, 7, 8])
torch_tensor_1 = torch.tensor([1, 2, 3, 4])
torch_tensor_2 = torch.tensor([5, 6, 7, 8])

print (np_array_1)
print (np_array_2)
print (torch_tensor_1)
print (torch_tensor_2)

[1 2 3 4]
[5 6 7 8]
tensor([1, 2, 3, 4])
tensor([5, 6, 7, 8])


**1) Same operation with identical syntax**

Example) Get the shape of a tensor

In [None]:
# numpy
print (np_array_1.shape)

# torch
print (torch_tensor_1.shape)
print (torch_tensor_1.size()) # .size() and .shape return the same torch.Size in PyTorch

(4,)
torch.Size([4])
torch.Size([4])


**2) Same operation with different syntax**

Example 1) Concatenate two tensors
- NumPy uses `np.concatenate`
- PyTorch uses `torch.cat`
- IMPORTANT: `axis` (NumPy) and `dim` (PyTorch) mean the same thing

In [None]:
# numpy
np_concate = np.concatenate([np_array_1, np_array_2], axis=0)
print ('----numpy----')
print (np_concate)

# torch
torch_concate = torch.cat([torch_tensor_1, torch_tensor_2], dim=0)
print ('----torch----')
print (torch_concate)

----numpy----
[1 2 3 4 5 6 7 8]
----torch----
tensor([1, 2, 3, 4, 5, 6, 7, 8])


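Example 2) Reshape a tensor
- NumPy uses `X.reshape`
- PyTorch uses `X.view` (PyTorch also provides `X.reshape`; `view` only works when the new shape is compatible with the tensor's memory layout, e.g. a contiguous tensor, while `reshape` copies the data when necessary)
- The target shape is passed the same way in both: `reshape(4, 2)` vs. `view(4, 2)`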

In [None]:
# numpy
np_reshaped = np_concate.reshape(4, 2)
print ('----numpy----')
print (np_reshaped)
print (np_reshaped.shape)

# torch
torch_reshaped = torch_concate.view(4, 2)
print ('----torch----')
print (torch_reshaped)
print (torch_reshaped.shape)

----numpy----
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
(4, 2)
----torch----
tensor([[1, 2],
        [3, 4],
        [5, 6],
        [7, 8]])
torch.Size([4, 2])
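`view` and `reshape` give the same result here because `torch_concate` is contiguous. A sketch of where they differ: after a transpose the tensor is no longer contiguous, so `view` raises a `RuntimeError`, while `reshape` (or `.contiguous().view(...)`) still works.

```python
import torch

t = torch.arange(6).view(2, 3)   # contiguous 2x3 tensor: [[0, 1, 2], [3, 4, 5]]
tt = t.t()                       # transpose: same storage, non-contiguous layout
print(tt.is_contiguous())        # False

try:
    tt.view(6)                   # view needs a compatible memory layout
except RuntimeError as err:
    print("view failed:", err)

flat = tt.reshape(6)             # reshape falls back to copying
print(flat)                      # tensor([0, 3, 1, 4, 2, 5])
print(tt.contiguous().view(6))   # same result after making a contiguous copy
```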


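**3) Different operations with the same syntax (confusing operations)**

Example) Manipulating tensors
- `repeat` has the same name in both libraries but behaves differently: NumPy's `repeat` repeats each element, while PyTorch's `repeat` tiles the whole tensor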

In [16]:
x = np.array([1, 2, 3])
x_repeat = x.repeat(2)

print ('----numpy----')
print (x)
print (x_repeat)

x = torch.tensor([1, 2, 3])
x_repeat = x.repeat(2)

print ('----torch----')
print (x)
print (x_repeat)

# To reproduce np.repeat's result (explanation skipped: you should be comfortable with reshaping operations)
print('----obtain the same result-----')
x_repeat = x.view(3, 1)
print (x_repeat)

x_repeat = x_repeat.repeat(1, 2)
print (x_repeat)

x_repeat = x_repeat.view(-1)
print (x_repeat)

----numpy----
[1 2 3]
[1 1 2 2 3 3]
----torch----
tensor([1, 2, 3])
tensor([1, 2, 3, 1, 2, 3])
----obtain the same result-----
tensor([[1],
        [2],
        [3]])
tensor([[1, 1],
        [2, 2],
        [3, 3]])
tensor([1, 1, 2, 2, 3, 3])
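The reshape trick above works, but PyTorch also has direct counterparts. For the 1-D case shown here, `torch.repeat_interleave` matches `np.repeat` (element-wise repetition), and `np.tile` matches PyTorch's `repeat` (whole-array tiling):

```python
import numpy as np
import torch

x_np = np.array([1, 2, 3])
x_t = torch.tensor([1, 2, 3])

# Element-wise repetition: np.repeat <-> torch.repeat_interleave
print(np.repeat(x_np, 2))               # [1 1 2 2 3 3]
print(torch.repeat_interleave(x_t, 2))  # tensor([1, 1, 2, 2, 3, 3])

# Whole-array tiling: np.tile <-> Tensor.repeat
print(np.tile(x_np, 2))                 # [1 2 3 1 2 3]
print(x_t.repeat(2))                    # tensor([1, 2, 3, 1, 2, 3])
```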


In [17]:
# similar manipulation operation: stack & repeat
x = torch.tensor([1, 2, 3])
x_repeat = x.repeat(4)
x_stack = torch.stack([x, x, x, x])

print (x_repeat)
print (x_stack)
print (x_repeat.view(4, 3)) # reshape the repeated tensor into 4 rows: same as x_stack
print(x_repeat.device)  # tensors are created on the CPU by default

tensor([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])
tensor([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])
tensor([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])
cpu


## 2. Tensor operations on the GPU

Deep learning frameworks use GPUs to accelerate computation.

In this section, we will learn **how to use the GPU** in PyTorch

In [18]:
print(torch.cuda.is_available())  # Is GPU accessible?

True


In [19]:
a = torch.ones(3)
b = torch.randn(100, 50, 3)

In [20]:
print(a.device)
print(b.device)

cpu
cpu


In [21]:
c = a + b

In [22]:
print(c.device)

cpu


In [23]:
# move a and b to the GPU (.to() returns a new tensor, so reassign the result)
a = a.to('cuda')
b = b.to('cuda')

In [24]:
print(a.device)
print(b.device)

cuda:0
cuda:0


In [25]:
c = a + b

In [26]:
print(c.device)

cuda:0


In [27]:
c = c.to('cpu')

In [28]:
print(c.device)

cpu
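Hard-coding `'cuda'` fails on a machine without a GPU. A common pattern, sketched below, is to pick the device once and reuse it. Note that the operands of one operation must live on the same device (mixing a CUDA tensor with a CPU tensor generally raises a `RuntimeError`), and `.numpy()` needs a CPU tensor.

```python
import torch

# Pick the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones(3, device=device)        # create directly on the chosen device
b = torch.randn(100, 50, 3).to(device)  # or move an existing tensor
c = a + b                               # both operands live on the same device
print(c.device, c.shape)

# Tensor.to() returns a new tensor; the original stays where it was
cpu_tensor = torch.zeros(2)
moved = cpu_tensor.to(device)
print(cpu_tensor.device, moved.device)

# Move back to the CPU before converting to NumPy
c_np = c.cpu().numpy()
print(type(c_np))
```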


## 3. Autograd

Central to all neural networks in PyTorch is the `autograd` package.

The `autograd` package provides automatic differentiation for all operations on Tensors.

`torch.Tensor` is the central class of the package. If you set its attribute `.requires_grad` to `True`, it starts to track all operations on it. When you finish your computation, you can call `.backward()` to compute all the gradients automatically. The gradient for this tensor is accumulated into its `.grad` attribute. By default, `.grad` is populated only for leaf tensors (tensors you created directly, such as `x` below); for intermediate results, call `.retain_grad()` before `.backward()`.

To stop a tensor from tracking history, call `.detach()`: it returns a new tensor that shares the same data but is detached from the computation history, so future computation on it is not tracked.
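Two details from the paragraphs above, sketched on a scalar: `.grad` accumulates across `.backward()` calls (which is why the training loop later calls `optimizer.zero_grad()` every step), and `.detach()` returns a tensor that autograd no longer tracks.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # leaf tensor

(w ** 2).backward()   # d(w^2)/dw = 2w = 4
print(w.grad)         # tensor(4.)

(w ** 2).backward()   # a second backward pass ADDS to the stored gradient
print(w.grad)         # tensor(8.)

w.grad.zero_()        # reset the accumulated gradient by hand
print(w.grad)         # tensor(0.)

v = w.detach()        # same data, no history
print(v.requires_grad)             # False

u = w * 3             # result of an operation: tracked, but not a leaf
print(u.requires_grad, u.is_leaf)  # True False
```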

### Example

In [36]:
x = torch.ones(2, 2, requires_grad=True)
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


In [37]:
y = x + 2
print(y)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


In [38]:
z = y * y * 3
print(z)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)


In [39]:
out = z.mean()
print(out)

tensor(27., grad_fn=<MeanBackward0>)


In [40]:
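y.retain_grad()  # y and z are non-leaf tensors: keep their .grad for inspection
z.retain_grad()
out.backward()   # backward pass: fills .grad of x (leaf) and of y, z (retained)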


![picture](https://drive.google.com/uc?id=1JyMWTbaU6ktJAHx2XqiU7s4tId-cxiLF)
![picture](https://drive.google.com/uc?id=17j-aNqj1yjZfVPCKZJRt6YVZ-7usf5PH)

In [41]:
print(z.grad)

tensor([[0.2500, 0.2500],
        [0.2500, 0.2500]])


![picture](https://drive.google.com/uc?id=1jPfdq6piSkkwZ21nX7kIBa-xGJE6uPBu)
![picture](https://drive.google.com/uc?id=1NN0kpdvRRP9NwguXJHnU3u8VikMFUKw2)

In [42]:
print(y.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


![picture](https://drive.google.com/uc?id=1HllHu2CxuNFX8mc6QdQEEtnXJ3Rvo6TE)
![picture](https://drive.google.com/uc?id=1jWJPOXVLG6mdUyDSklocNWPVa9Rg62K3)

In [43]:
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
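The three printed gradients follow from the chain rule. With $\text{out} = \frac{1}{4}\sum_{i=1}^{4} z_i$, $z_i = 3y_i^2$, $y_i = x_i + 2$, and every $x_i = 1$ (so $y_i = 3$):

$$
\frac{\partial\,\text{out}}{\partial z_i} = \frac{1}{4} = 0.25, \qquad
\frac{\partial\,\text{out}}{\partial y_i} = \frac{1}{4}\cdot 6y_i = \frac{3}{2}\cdot 3 = 4.5, \qquad
\frac{\partial\,\text{out}}{\partial x_i} = \frac{\partial\,\text{out}}{\partial y_i}\cdot\frac{\partial y_i}{\partial x_i} = 4.5 \cdot 1 = 4.5
$$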


### Efficient inference (testing) with torch.no_grad()

To prevent tracking history (and using memory), you can also wrap the code block in `with torch.no_grad():`.

Situation: **gradient calculation is not required**, e.g., during inference\
Solution: use `torch.no_grad()`. PyTorch then does not build the computational graph for backpropagation, which saves memory and is usually **faster**

In [44]:
with torch.no_grad():
    x = torch.ones(2, 2, requires_grad=True)
    y = x + 2
    z = y * y * 3
    out = z.mean()

In [45]:
out

tensor(27.)

In [46]:
out.backward() ## ERROR!!!!: out was computed under torch.no_grad(), so it has no grad_fn

RuntimeError: ignored
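Why does `out.backward()` fail? A short sketch: results computed inside `torch.no_grad()` have `requires_grad=False` and no `grad_fn`, so there is no graph to backpropagate through, even though the leaf `x` itself still requires grad.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

with torch.no_grad():
    y = x * 2            # computed, but not recorded in the graph

print(x.requires_grad)   # True  (the leaf itself is unchanged)
print(y.requires_grad)   # False
print(y.grad_fn)         # None -> nothing to backpropagate through

z = x * 2                # outside the block, tracking resumes
print(z.requires_grad)   # True
```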

## 4. nn.Module

![picture](https://drive.google.com/uc?id=1Vu3oRATA-EWDycO2zVWkBdzndU-8C5cB)

### Using pre-defined modules (subset of models) in PyTorch

In [47]:
import torch.nn as nn

X = torch.tensor([[1., 2., 3.], [4., 5., 6.]])

print (X)
print (X.shape)

tensor([[1., 2., 3.],
        [4., 5., 6.]])
torch.Size([2, 3])


In [48]:
# input dim 3, output dim 1
linear_fn = nn.Linear(3, 1)

In [49]:
linear_fn  # computes X @ W.T + b

Linear(in_features=3, out_features=1, bias=True)

In [50]:
Y = linear_fn(X)
print(Y)
print(Y.shape)

tensor([[-0.6415],
        [-0.0831]], grad_fn=<AddmmBackward0>)
torch.Size([2, 1])


In [None]:
Y = Y.sum()
print(Y)

tensor(-2.3458, grad_fn=<SumBackward0>)
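`nn.Linear(3, 1)` stores a weight of shape `(out_features, in_features)` and a bias, and computes $XW^\top + b$. A sketch checking this by hand; the parameters are randomly initialized (which is also why the printed numbers change from run to run), and the seed here is only for repeatability.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fix the random initialization for repeatability
X = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
linear_fn = nn.Linear(3, 1)

print(linear_fn.weight.shape)  # torch.Size([1, 3]) -> (out_features, in_features)
print(linear_fn.bias.shape)    # torch.Size([1])

manual = X @ linear_fn.weight.T + linear_fn.bias
print(torch.allclose(manual, linear_fn(X)))  # True
```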


PyTorch provides many other built-in `nn.Module` layers, for example:

In [51]:
nn.Conv2d
nn.RNNCell
nn.LSTMCell
nn.GRUCell
nn.Transformer;

### How can we design a customized model (neural network)?

In [52]:
class Model(nn.Module):
    def __init__(self, input_dim, output_dim, hidden_dim):
        super(Model, self).__init__()
        self.linear_1 = nn.Linear(input_dim, hidden_dim)
        self.linear_2 = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()
    def forward(self, x):
        x = self.linear_1(x)
        x = self.relu(x) # Activation function
        x = self.linear_2(x)
        return x
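A usage sketch for the `Model` class above (the sizes 3, 4, and 1 are arbitrary picks for illustration): calling the module runs `forward`, and `model.parameters()` yields every weight and bias registered by the `nn.Linear` submodules.

```python
import torch
import torch.nn as nn

class Model(nn.Module):  # same definition as above
    def __init__(self, input_dim, output_dim, hidden_dim):
        super(Model, self).__init__()
        self.linear_1 = nn.Linear(input_dim, hidden_dim)
        self.linear_2 = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.linear_1(x)
        x = self.relu(x)
        x = self.linear_2(x)
        return x

model = Model(input_dim=3, output_dim=1, hidden_dim=4)
X = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
Y = model(X)               # calling the module runs forward()
print(Y.shape)             # torch.Size([2, 1])

n_params = sum(p.numel() for p in model.parameters())
print(n_params)            # (3*4 + 4) + (4*1 + 1) = 21
```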

**What is an activation function?**
- It introduces non-linearity into a deep neural network (without it, stacked linear layers collapse into a single linear layer)
- Therefore, deep neural networks can approximate complex functions
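A quick check of the first point: two `nn.Linear` layers with nothing in between are equivalent to a single linear layer with $W = W_2W_1$ and $b = W_2b_1 + b_2$, so stacking them adds no expressive power. The layer sizes below are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
f1 = nn.Linear(3, 4)
f2 = nn.Linear(4, 2)

# Compose the two affine maps into one: W = W2 W1, b = W2 b1 + b2
W = f2.weight @ f1.weight
b = f2.weight @ f1.bias + f2.bias

x = torch.randn(5, 3)
two_layers = f2(f1(x))          # no activation in between
one_layer = x @ W.T + b         # a single equivalent linear layer
print(torch.allclose(two_layers, one_layer, atol=1e-6))  # True
```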

In [53]:
nn.Sigmoid
nn.ReLU
nn.LeakyReLU
nn.Tanh;

## 5. MNIST classification with PyTorch (Logistic regression & MLP)

### What is MNIST & How to do multi-class classification?

The MNIST database of **handwritten digits from 0 to 9** (28×28 grayscale images) has a training set of 60,000 examples and a test set of 10,000 examples.

Since we have 10 classes (0~9), the current problem can be interpreted as **multinomial logistic regression** (**multi-class classification**).

Therefore, we use the **softmax** function to turn the 10 outputs into class probabilities, together with the **cross-entropy** loss. In PyTorch, `nn.CrossEntropyLoss` applies the (log-)softmax internally, so the model itself outputs raw scores (logits).

![picture](https://drive.google.com/uc?id=1v-QvM2MEMku6wWMb_8f8NIqIDzby7wJP)
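For logits $s \in \mathbb{R}^{10}$ and true class $y$, the cross-entropy loss is $-\log \frac{e^{s_y}}{\sum_j e^{s_j}}$, averaged over the mini-batch. A sketch with random logits and made-up labels, confirming that `nn.CrossEntropyLoss` on raw logits matches softmax followed by the negative log-likelihood:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)               # 4 samples, 10 classes (raw scores, no softmax)
labels = torch.tensor([3, 0, 9, 1])       # illustrative ground-truth classes

probs = F.softmax(logits, dim=1)          # each row sums to 1
manual = -torch.log(probs[torch.arange(4), labels]).mean()

builtin = nn.CrossEntropyLoss()(logits, labels)  # applies log-softmax internally
print(torch.allclose(manual, builtin))    # True
```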

### Load packages

In [54]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from torch.utils.data import DataLoader

import torchvision
import torchvision.transforms as transforms

### Load datasets for training & testing

In [55]:
# Usually we split the data into train & test sets ourselves, but MNIST already provides the split
# MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='./', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = torchvision.datasets.MNIST(root='./', train=False, transform=transforms.ToTensor())

# Data loader
# mini batch size
train_loader = DataLoader(dataset=train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=128, shuffle=False)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./MNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 9912422/9912422 [00:00<00:00, 122209256.42it/s]
Extracting ./MNIST/raw/train-images-idx3-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./MNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 28881/28881 [00:00<00:00, 114171247.71it/s]
Extracting ./MNIST/raw/train-labels-idx1-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./MNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 1648877/1648877 [00:00<00:00, 52865299.89it/s]
Extracting ./MNIST/raw/t10k-images-idx3-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 4542/4542 [00:00<00:00, 21143761.12it/s]
Extracting ./MNIST/raw/t10k-labels-idx1-ubyte.gz to ./MNIST/raw
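What does one mini-batch look like? A sketch using random stand-in data shaped like MNIST samples (`ToTensor()` yields `1 x 28 x 28` float images), so it runs without the download. Each batch is `(128, 1, 28, 28)`, which is why the training loop flattens images with `reshape(-1, 28*28)`. The number of batches per epoch is $\lceil N / 128 \rceil$, i.e. 469 for the 60,000 training images, matching the `Step [.../469]` in the training logs.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for MNIST (assumption for illustration): 300 random 1x28x28 "images"
fake_images = torch.rand(300, 1, 28, 28)
fake_labels = torch.randint(0, 10, (300,))
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=128, shuffle=True)

images, labels = next(iter(loader))
print(images.shape)   # torch.Size([128, 1, 28, 28])
print(labels.shape)   # torch.Size([128])

# Flatten each image into a 784-dim vector, as done before feeding the linear model
flat = images.reshape(-1, 28 * 28)
print(flat.shape)     # torch.Size([128, 784])

# Batches per epoch = ceil(N / batch_size) (the last batch may be smaller)
print(len(loader), math.ceil(300 / 128))  # 3 3
print(math.ceil(60000 / 128))             # 469
```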



### Define model (we will use a one-layer classifier first)

In [56]:
# Define model class
# This model has a single linear layer (no hidden layer)
class Multinomial_logistic_regression(nn.Module):
    def __init__(self, input_size, output_size):
        super(Multinomial_logistic_regression, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, x):
        out = self.fc(x)
        return out

In [57]:
# Generate model
model = Multinomial_logistic_regression(784, 10)  # init(784, 10)
# input dim: 784 (= 28*28 pixels)  / output dim: 10 (classes)

In [59]:
model

Multinomial_logistic_regression(
  (fc): Linear(in_features=784, out_features=10, bias=True)
)

In [58]:
# Upload model to GPU
model = model.to('cuda')

### Define optimizer

Optimization is about finding the best solution (model parameters) that fits the given dataset!

A PyTorch optimizer determines **which optimization method is used to update the parameters during training**

We will not cover the details in this class (take the **"Optimization for AI (AI505)"** course).

In [75]:
# Define the optimizer (swap in one of the commented-out alternatives to compare)
# optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
# optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

![picture](https://drive.google.com/uc?id=1BvkB6O1hsGZ4YkD92k-E3I59omprN7qz)
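What an optimizer step actually does, sketched for SGD with momentum on the toy loss $\sum_i w_i^2$ (gradient $2w$). PyTorch's SGD keeps a momentum buffer `buf = momentum * buf + grad` (the buffer starts at the first gradient, with the default `dampening=0`) and then updates `w -= lr * buf`; the comments reproduce the two steps by hand.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1, momentum=0.9)

for step in range(2):
    opt.zero_grad()
    loss = (w ** 2).sum()   # toy loss; its gradient is 2w
    loss.backward()
    opt.step()
    print(step, w.detach())

# Hand computation of the same two steps:
# step 0: grad = [2, -4],     buf = [2, -4]                   -> w = [0.8, -1.6]
# step 1: grad = [1.6, -3.2], buf = 0.9*[2, -4] + [1.6, -3.2] -> w = [0.46, -0.92]
```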

### Train the model

In [61]:
# Loss function define (we use cross-entropy)
loss_fn = nn.CrossEntropyLoss()

# Train the model
total_step = len(train_loader)

for epoch in range(10):
    for i, (images, labels) in enumerate(train_loader):  # mini-batch loop
        # flatten images to (batch, 784) and move them to the GPU
        images = images.reshape(-1, 28*28).to('cuda')
        labels = labels.to('cuda')

        # Forward
        outputs = model(images)  # forward(images): get predictions (logits)
        loss = loss_fn(outputs, labels)  # cross-entropy loss between predictions and ground truth

        # Backward and optimize
        optimizer.zero_grad()  # reset gradients, since .backward() accumulates them
        loss.backward()  # automatic gradient calculation (autograd)
        optimizer.step()  # update the parameters passed to the optimizer

        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                   .format(epoch+1, 10, i+1, total_step, loss.item()))

Epoch [1/10], Step [100/469], Loss: 0.3389
Epoch [1/10], Step [200/469], Loss: 0.2541
Epoch [1/10], Step [300/469], Loss: 0.2933
Epoch [1/10], Step [400/469], Loss: 0.3973
Epoch [2/10], Step [100/469], Loss: 0.3464
Epoch [2/10], Step [200/469], Loss: 0.3730
Epoch [2/10], Step [300/469], Loss: 0.3668
Epoch [2/10], Step [400/469], Loss: 0.3443
Epoch [3/10], Step [100/469], Loss: 0.4333
Epoch [3/10], Step [200/469], Loss: 0.1800
Epoch [3/10], Step [300/469], Loss: 0.2343
Epoch [3/10], Step [400/469], Loss: 0.2547
Epoch [4/10], Step [100/469], Loss: 0.2499
Epoch [4/10], Step [200/469], Loss: 0.2613
Epoch [4/10], Step [300/469], Loss: 0.2172
Epoch [4/10], Step [400/469], Loss: 0.3166
Epoch [5/10], Step [100/469], Loss: 0.1807
Epoch [5/10], Step [200/469], Loss: 0.2085
Epoch [5/10], Step [300/469], Loss: 0.2614
Epoch [5/10], Step [400/469], Loss: 0.4689
Epoch [6/10], Step [100/469], Loss: 0.2504
Epoch [6/10], Step [200/469], Loss: 0.1757
Epoch [6/10], Step [300/469], Loss: 0.3342
Epoch [6/10

### Test the model

In [62]:
# Test the model
# In test phase, we don't need to compute gradients (for memory efficiency)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28).to('cuda')
        labels = labels.to('cuda')
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # outputs are logits; the index of the largest one is the top-1 predicted label
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))

Accuracy of the network on the 10000 test images: 92.36 %
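The model outputs logits, not probabilities, but that does not change the prediction: softmax is monotonic within each row, so the argmax is the same either way. `torch.max(t, dim)` returns a `(values, indices)` pair, and the indices are the predicted labels. A sketch with random logits:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)   # 5 samples, 10 classes

pred_from_logits = logits.argmax(dim=1)
pred_from_probs = F.softmax(logits, dim=1).argmax(dim=1)  # softmax keeps the order within a row
print(torch.equal(pred_from_logits, pred_from_probs))     # True

values, indices = torch.max(logits, 1)  # (max values, their positions)
print(torch.equal(indices, pred_from_logits))             # True
```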


### New model: MLP (multi-layer perceptron)

The previous model was multinomial logistic regression (a single linear layer).\
What if we use an **MLP (multi-layer perceptron)**, i.e., a neural network with hidden layers?

In [85]:
# New model with multiple layers (two hidden layers)
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)
        #self.fc4 = nn.Linear(hidden_size, output_size)

        self.sigmoid = nn.Sigmoid()  # sigmoid activation function (you can customize)

    def forward(self, x):
        out = self.fc1(x)
        out = self.sigmoid(out)
        out = self.fc2(out)
        out = self.sigmoid(out)
        out = self.fc3(out)
        #out = self.sigmoid(out)
        #out = self.fc4(out)
        return out

In [88]:
# Generate model
model = NeuralNet(784, 15, 10)  # init(784, 15, 10)
# input dim: 784  / hidden dim: 15  / output dim: 10

# Upload model to GPU
model = model.to('cuda')

# Loss function define (we use cross-entropy)
loss_fn = nn.CrossEntropyLoss()

# Define optimizer
# optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
optimizer = torch.optim.SGD(model.parameters(), lr=0.07, momentum=0.8)
# optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

# Train the model
total_step = len(train_loader)

for epoch in range(10):
    for i, (images, labels) in enumerate(train_loader):  # mini batch for loop
        # upload to gpu
        images = images.reshape(-1, 28*28).to('cuda')
        labels = labels.to('cuda')

        # Forward
        outputs = model(images)  # forward(images): get predictions (logits)
        loss = loss_fn(outputs, labels)  # calculate the loss (cross entropy loss) with ground truth & prediction value

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()  # automatic gradient calculation (autograd)
        optimizer.step()  # update model parameter with requires_grad=True

        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                   .format(epoch+1, 10, i+1, total_step, loss.item()))

Epoch [1/10], Step [100/469], Loss: 2.2830
Epoch [1/10], Step [200/469], Loss: 2.0924
Epoch [1/10], Step [300/469], Loss: 1.6016
Epoch [1/10], Step [400/469], Loss: 1.3453
Epoch [2/10], Step [100/469], Loss: 1.0782
Epoch [2/10], Step [200/469], Loss: 0.9323
Epoch [2/10], Step [300/469], Loss: 0.7414
Epoch [2/10], Step [400/469], Loss: 0.6095
Epoch [3/10], Step [100/469], Loss: 0.5381
Epoch [3/10], Step [200/469], Loss: 0.5030
Epoch [3/10], Step [300/469], Loss: 0.3988
Epoch [3/10], Step [400/469], Loss: 0.4370
Epoch [4/10], Step [100/469], Loss: 0.4054
Epoch [4/10], Step [200/469], Loss: 0.4277
Epoch [4/10], Step [300/469], Loss: 0.2873
Epoch [4/10], Step [400/469], Loss: 0.4587
Epoch [5/10], Step [100/469], Loss: 0.3111
Epoch [5/10], Step [200/469], Loss: 0.3448
Epoch [5/10], Step [300/469], Loss: 0.2499
Epoch [5/10], Step [400/469], Loss: 0.3648
Epoch [6/10], Step [100/469], Loss: 0.3423
Epoch [6/10], Step [200/469], Loss: 0.1940
Epoch [6/10], Step [300/469], Loss: 0.2347
Epoch [6/10

In [87]:
# Test the model
# In test phase, we don't need to compute gradients (for memory efficiency)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28).to('cuda')
        labels = labels.to('cuda')
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # classification -> get the top-1 predicted label
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))

Accuracy of the network on the 10000 test images: 93.01 %


### Change the following options to obtain better accuracy!! (try it yourself)

#### (1) Model configurations:
- number of hidden units in each layer
- number of layers
- type of activation function (e.g., relu, tanh, softplus etc.)

#### (2) Optimization configurations
- learning rate
- number of epochs
- type of optimizer
- momentum hyperparameter
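To try different depths, widths, and activations without rewriting the class each time, one option is to build the network from a list of hidden sizes with `nn.Sequential`. `make_mlp` below is a hypothetical helper (not part of PyTorch); its output can stand in for `NeuralNet(...)` in the training code above.

```python
import torch
import torch.nn as nn

def make_mlp(input_size, hidden_sizes, output_size, activation=nn.ReLU):
    """Hypothetical helper: build an MLP with any number of hidden layers."""
    layers = []
    in_features = input_size
    for h in hidden_sizes:
        layers += [nn.Linear(in_features, h), activation()]
        in_features = h
    # No activation on the last layer: CrossEntropyLoss expects raw logits
    layers.append(nn.Linear(in_features, output_size))
    return nn.Sequential(*layers)

model = make_mlp(784, [128, 64], 10, activation=nn.Tanh)
print(model)

out = model(torch.randn(3, 784))  # 3 flattened fake images
print(out.shape)                  # torch.Size([3, 10])
```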