# Multinomial Logistic Regression

More information is available on [Edward Choi's YouTube channel](https://www.youtube.com/@mp2893/featured).\
KAIST AI504

## Load packages

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from torch.utils.data import DataLoader

import torchvision
import torchvision.transforms as transforms

In [2]:
# Select the GPU by index (set this before any CUDA call so it takes effect)
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # expose only GPU 0 to PyTorch

In [3]:
# print the version of PyTorch
print(torch.__version__)

2.0.1+cu117


![Image Alt Text](img/MLPScheme.png)

## Multinomial Logistic Regression

The MNIST database of **handwritten digits (0 to 9)** has a training set of 60,000 examples and a test set of 10,000 examples.\
Since there are 10 classes (0 to 9), the problem can be treated as **multinomial logistic regression (multi-class classification)**.\
Therefore, we use the **softmax** function to turn the 10 output scores into class probabilities, together with the **cross-entropy** loss function.
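
To make the softmax and cross-entropy pairing concrete, here is a minimal pure-Python sketch for a single example. The helper names `softmax` and `cross_entropy` and the example scores are illustrative assumptions, not part of the notebook; in the notebook itself, `nn.CrossEntropyLoss` performs the equivalent computation on the raw model outputs and averages it over the mini-batch.

```python
import math

def softmax(logits):
    """Turn a list of raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability that softmax assigns to the true class."""
    return -math.log(softmax(logits)[label])

# Hypothetical scores for a 3-class toy problem; the true class is index 0
example_logits = [2.0, 1.0, 0.1]
probs = softmax(example_logits)
loss = cross_entropy(example_logits, 0)

# With 10 equally likely classes the loss equals ln(10), about 2.30
uniform_loss = cross_entropy([0.0] * 10, 3)
print(probs, loss, uniform_loss)
```

A loss near 2.30 therefore corresponds to uniform guessing over the ten digits, which is a useful reference point when reading loss values.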

![Image Alt Text](img/multinominalLogisticRegression.png)

## Loading MNIST

In [4]:
# MNIST dataset 
train_dataset = torchvision.datasets.MNIST(root='../', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = torchvision.datasets.MNIST(root='../', train=False, transform=transforms.ToTensor())

# Data loader
# mini batch size
train_loader = DataLoader(dataset=train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=128, shuffle=False)
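
The number of mini-batches per epoch follows from the dataset size and `batch_size`. A small arithmetic sketch, assuming the `DataLoader` default of `drop_last=False` (the variable names are illustrative):

```python
import math

train_size = 60_000   # MNIST training examples
test_size = 10_000    # MNIST test examples
batch_size = 128

# With drop_last=False the final, smaller batch is kept, so round up
train_steps = math.ceil(train_size / batch_size)
test_steps = math.ceil(test_size / batch_size)

# Size of the last (partial) training batch
last_train_batch = train_size - (train_steps - 1) * batch_size

print(train_steps, test_steps, last_train_batch)  # 469 79 96
```

The value 469 is what `len(train_loader)` returns, and it is the step total that appears in the training output.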

## Model

In [5]:
# Define model class
# A single linear layer maps the 784 input pixels to 10 class scores (logits); there is no hidden layer
class Multinomial_logistic_regression(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, x):
        # Return raw logits; nn.CrossEntropyLoss applies log-softmax internally
        out = self.fc(x)
        return out

In [6]:
# Instantiate the model
model = Multinomial_logistic_regression(784, 10)  # calls __init__(784, 10)
# input dim: 784 (28*28 pixels) / output dim: 10 (digit classes)

In [7]:
model

Multinomial_logistic_regression(
  (fc): Linear(in_features=784, out_features=10, bias=True)
)
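
The printed module holds one weight matrix and one bias vector. As a quick check of the model's size, the parameter count can be worked out by hand (the variable names are illustrative):

```python
in_features = 28 * 28   # 784 pixels per flattened image
out_features = 10       # one score per digit class

weight_params = in_features * out_features  # nn.Linear weight has shape (10, 784)
bias_params = out_features                  # nn.Linear bias has shape (10,)
total_params = weight_params + bias_params

print(total_params)  # 7850
```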

In [8]:
# Move the model parameters to the GPU
model = model.to('cuda')

In [9]:
# Define the optimizer (alternatives are left commented out)
# optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
# optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
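
For intuition about the `momentum=0.9` argument, here is a minimal pure-Python sketch of the SGD-with-momentum update on a one-dimensional quadratic. It follows the update form given in the PyTorch documentation for `torch.optim.SGD` (velocity = momentum * velocity + gradient, with the default zero dampening); the toy objective and all names are assumptions for illustration only.

```python
def sgd_momentum(grad_fn, w0, lr=0.05, momentum=0.9, steps=200):
    """Minimise a scalar function with SGD plus momentum."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        g = grad_fn(w)
        velocity = momentum * velocity + g  # running accumulation of past gradients
        w = w - lr * velocity               # step along the accumulated direction
    return w

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); minimum at w = 3
w_final = sgd_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

Because the velocity carries over between steps, consecutive gradients that point the same way reinforce each other, while noisy mini-batch gradients partly cancel.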

## Training

In [10]:
# Define the loss function (cross-entropy)
loss_fn = nn.CrossEntropyLoss()

# Train the model
total_step = len(train_loader)  # number of mini-batches per epoch

for epoch in range(10):
    for i, (images, labels) in enumerate(train_loader):  # loop over mini-batches
        # Flatten each 28x28 image into a 784-dim vector and move the batch to the GPU
        images = images.reshape(-1, 28*28).to('cuda')
        labels = labels.to('cuda')

        # Forward
        outputs = model(images)  # calls forward(images): get the predicted logits
        loss = loss_fn(outputs, labels)  # cross-entropy loss between the predictions and the ground-truth labels

        # Backward and optimize
        optimizer.zero_grad()  # clear the gradients left over from the previous step
        loss.backward()  # automatic gradient calculation (autograd)
        optimizer.step()  # update the model parameters (those with requires_grad=True)

        if (i+1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch+1, 10, i+1, total_step, loss.item()))

Epoch [1/10], Step [100/469], Loss: 0.4186
Epoch [1/10], Step [200/469], Loss: 0.3590
Epoch [1/10], Step [300/469], Loss: 0.4131
Epoch [1/10], Step [400/469], Loss: 0.4002
Epoch [2/10], Step [100/469], Loss: 0.3498
Epoch [2/10], Step [200/469], Loss: 0.4046
Epoch [2/10], Step [300/469], Loss: 0.2372
Epoch [2/10], Step [400/469], Loss: 0.4787
Epoch [3/10], Step [100/469], Loss: 0.2980
Epoch [3/10], Step [200/469], Loss: 0.1731
Epoch [3/10], Step [300/469], Loss: 0.4050
Epoch [3/10], Step [400/469], Loss: 0.3320
Epoch [4/10], Step [100/469], Loss: 0.2626
Epoch [4/10], Step [200/469], Loss: 0.1943
Epoch [4/10], Step [300/469], Loss: 0.2374
Epoch [4/10], Step [400/469], Loss: 0.4377
Epoch [5/10], Step [100/469], Loss: 0.3977
Epoch [5/10], Step [200/469], Loss: 0.2471
Epoch [5/10], Step [300/469], Loss: 0.2397
Epoch [5/10], Step [400/469], Loss: 0.3813
Epoch [6/10], Step [100/469], Loss: 0.2690
Epoch [6/10], Step [200/469], Loss: 0.2501
Epoch [6/10], Step [300/469], Loss: 0.2219
Epoch [6/10

## Testing

In [11]:
# Test the model
# In the test phase no gradients are needed, so we disable autograd to save memory
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28).to('cuda')
        labels = labels.to('cuda')
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # index of the largest logit = predicted class (top-1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))

Accuracy of the network on the 10000 test images: 92.55 %
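
The prediction rule in the test loop picks the class with the largest output score and counts matches against the labels. A small pure-Python sketch of that top-1 accuracy computation, using made-up scores (the function name and the data are hypothetical):

```python
def top1_accuracy(score_rows, labels):
    """Percentage of rows whose highest-scoring class index equals the label."""
    correct = 0
    for scores, label in zip(score_rows, labels):
        predicted = max(range(len(scores)), key=lambda k: scores[k])  # argmax
        correct += int(predicted == label)
    return 100 * correct / len(labels)

# Three made-up examples over three classes; the last one is misclassified
toy_scores = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.4, 0.9]]
toy_labels = [1, 0, 1]
accuracy = top1_accuracy(toy_scores, toy_labels)
print(accuracy)
```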
