## Assignment Q1
### Yaxni Li; Ruiyang Chen; Haoze Wang

MNIST is a classic dataset that is now easy to work with on modern hardware. We trained a fully connected neural network to classify the handwritten digits. Because this type of network already reaches high accuracy on MNIST (97% on the test set in our run), we expect more complex networks to offer only a small improvement for this task, so we did not use other network types.

### 1. Import the libraries
We use PyTorch in this work. Besides `torch` itself, we import `torch.nn` and `torch.nn.functional` to build the network, `torchvision` for the MNIST dataset and image transforms, and `DataLoader` for mini-batching.

```python
from torchvision.datasets import MNIST
import torchvision.transforms as transforms
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
```

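### 2. Hyperparameter configuration

We set the learning rate, the batch size, and the number of neurons in each of the two hidden layers.

```python
# Config
LEARNING_RATE = 0.01  # SGD step size
BATCH_SIZE = 64       # images per mini-batch
HIDDEN1 = 300         # neurons in the first hidden layer
HIDDEN2 = 200         # neurons in the second hidden layer
```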
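As a quick sanity check of what `BATCH_SIZE = 64` implies, the arithmetic below counts mini-batches per epoch. It assumes the standard MNIST split of 60,000 training and 10,000 test images and the `DataLoader` default `drop_last=False`; `NUM_EPOCHS` is a hypothetical name mirroring `num_epochs = 5` from the training cell. It is a sketch for orientation, not part of the pipeline.

```python
import math

# Values from the notebook; MNIST has 60,000 training and 10,000 test images.
BATCH_SIZE = 64
N_TRAIN, N_TEST = 60_000, 10_000
NUM_EPOCHS = 5  # hypothetical name for num_epochs in the training cell

# DataLoader keeps the partial last batch by default (drop_last=False).
train_batches = math.ceil(N_TRAIN / BATCH_SIZE)
last_train_batch = N_TRAIN - (train_batches - 1) * BATCH_SIZE
test_batches = math.ceil(N_TEST / BATCH_SIZE)

# The training loop prints when i % 100 == 99, i.e. after batches 100, 200, ..., 900.
log_lines_per_epoch = train_batches // 100

print(train_batches, last_train_batch, test_batches, log_lines_per_epoch, train_batches * NUM_EPOCHS)
# prints: 938 32 157 9 4690
```

With 938 batches per epoch, the training log prints nine lines per epoch (after batches 100 through 900), and five epochs amount to 4,690 optimizer steps.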

### 3. Define our fully connected NN

Each MNIST image is 28 × 28 pixels, so the input layer takes 28 * 28 = 784 values. The output layer has 10 units, one for each possible digit (0 to 9). The two hidden layers use the sizes set above, each followed by a ReLU activation. The output layer returns raw scores (logits) without a softmax, because `nn.CrossEntropyLoss` applies log-softmax internally.

```python
class MyNN(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.input_to_hidden1 = nn.Linear(28 * 28, HIDDEN1)    # 784 -> HIDDEN1
        self.hidden1_to_hidden2 = nn.Linear(HIDDEN1, HIDDEN2)  # HIDDEN1 -> HIDDEN2
        self.hidden2_to_output = nn.Linear(HIDDEN2, 10)        # HIDDEN2 -> 10 digit classes

    def forward(self, x):
        x = x.view(-1, 28 * 28)                 # flatten (batch, 1, 28, 28) to (batch, 784)
        x = F.relu(self.input_to_hidden1(x))
        x = F.relu(self.hidden1_to_hidden2(x))
        x = self.hidden2_to_output(x)           # raw logits; the loss applies log-softmax
        return x
```
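To make the model size concrete, the sketch below counts trainable parameters layer by layer from the sizes used above (784, `HIDDEN1 = 300`, `HIDDEN2 = 200`, 10). Each `nn.Linear(n_in, n_out)` holds an `n_out × n_in` weight matrix plus `n_out` biases. The helper `linear_params` is hypothetical and written in plain Python so it runs without PyTorch.

```python
# Layer sizes from the notebook: 784 inputs, HIDDEN1 = 300, HIDDEN2 = 200, 10 outputs.
layer_sizes = [28 * 28, 300, 200, 10]


def linear_params(n_in: int, n_out: int) -> int:
    """Parameters of a fully connected layer: n_out x n_in weights plus n_out biases."""
    return n_in * n_out + n_out


per_layer = [linear_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)  # prints: [235500, 60200, 2010] 297710
```

For `MyNN()` this should agree with `sum(p.numel() for p in model.parameters())`, since every layer is an `nn.Linear` with the default `bias=True`. Almost 80% of the parameters sit in the first layer.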


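### 4. Define the transforms

`ToTensor()` converts each image to a float tensor with values in [0, 1], and `Normalize` with mean 0.5 and standard deviation 0.5 rescales those values to [-1, 1].

```python
transform = transforms.Compose([
    transforms.ToTensor(),                 # PIL image -> float tensor in [0, 1], shape (1, 28, 28)
    transforms.Normalize((0.5,), (0.5,)),  # (x - 0.5) / 0.5 -> values in [-1, 1]
])
```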
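The per-pixel arithmetic of this transform can be checked by hand. The plain-Python sketch below assumes 8-bit grayscale input (MNIST images are loaded as such), which `ToTensor()` divides by 255; `to_normalized` is a hypothetical helper, not a torchvision function.

```python
MEAN, STD = 0.5, 0.5  # the values passed to transforms.Normalize above


def to_normalized(pixel: int) -> float:
    """Map an 8-bit pixel value through ToTensor() and then Normalize((0.5,), (0.5,))."""
    x = pixel / 255.0        # ToTensor(): [0, 255] -> [0.0, 1.0]
    return (x - MEAN) / STD  # Normalize: [0.0, 1.0] -> [-1.0, 1.0]


for p in (0, 128, 255):
    print(p, round(to_normalized(p), 4))
# prints:
# 0 -1.0
# 128 0.0039
# 255 1.0
```

Black pixels map to -1 and white pixels to 1, so the inputs are roughly centred on zero. Mean 0.5 and standard deviation 0.5 are convenient round numbers rather than statistics computed from the MNIST training set.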

### 5. Load the data

Load the training set. It is downloaded to the current directory on first use.

```python
train_set = MNIST(root="./", train=True, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)  # reshuffled every epoch
```

Load the test set. Shuffling does not affect the accuracy, so it is turned off for evaluation.

```python
test_dataset = MNIST(root="./", train=False, download=True, transform=transform)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)
```
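The tensor shapes a `DataLoader` yields can be checked without downloading MNIST. The sketch below assumes PyTorch is installed and substitutes a hypothetical random `TensorDataset` of 100 images for the real data. It shows the `(batch, 1, 28, 28)` image tensors, the smaller final batch, and the flattening that `MyNN.forward` performs with `x.view(-1, 28 * 28)`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for MNIST: 100 random 1x28x28 "images" with labels 0-9.
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=False)

batch_shapes = []
for x, y in loader:
    batch_shapes.append((tuple(x.shape), tuple(y.shape)))
    flat = x.view(-1, 28 * 28)  # same flattening as in MyNN.forward
    print(tuple(x.shape), "->", tuple(flat.shape), "labels:", tuple(y.shape))
# prints:
# (64, 1, 28, 28) -> (64, 784) labels: (64,)
# (36, 1, 28, 28) -> (36, 784) labels: (36,)
```

Using `-1` in `view` lets the same forward pass handle the full batches and the shorter last batch.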

### 6. Train our model

```python
model = MyNN()
criterion = nn.CrossEntropyLoss()  # takes raw logits and integer class labels
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE, momentum=0.9)

num_epochs = 5
for epoch in range(num_epochs):
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(train_loader):
        optimizer.zero_grad()              # clear gradients from the previous step
        outputs = model(inputs)            # forward pass: (batch, 10) logits
        loss = criterion(outputs, labels)
        loss.backward()                    # backpropagate
        optimizer.step()                   # SGD-with-momentum update
        running_loss += loss.item()
        if i % 100 == 99:                  # every 100 mini-batches, print the average loss
            print("[%d, %5d] loss: %.3f" % (epoch + 1, i + 1, running_loss / 100))
            running_loss = 0.0
```

```text
[1,   100] loss: 1.267
[1,   200] loss: 0.460
[1,   300] loss: 0.390
[1,   400] loss: 0.332
[1,   500] loss: 0.298
[1,   600] loss: 0.264
[1,   700] loss: 0.246
[1,   800] loss: 0.226
[1,   900] loss: 0.206
[2,   100] loss: 0.189
[2,   200] loss: 0.164
[2,   300] loss: 0.176
[2,   400] loss: 0.162
[2,   500] loss: 0.164
[2,   600] loss: 0.161
[2,   700] loss: 0.146
[2,   800] loss: 0.170
[2,   900] loss: 0.146
[3,   100] loss: 0.117
[3,   200] loss: 0.114
[3,   300] loss: 0.122
[3,   400] loss: 0.119
[3,   500] loss: 0.112
[3,   600] loss: 0.111
[3,   700] loss: 0.106
[3,   800] loss: 0.117
[3,   900] loss: 0.099
[4,   100] loss: 0.092
[4,   200] loss: 0.094
[4,   300] loss: 0.086
[4,   400] loss: 0.084
[4,   500] loss: 0.095
[4,   600] loss: 0.091
[4,   700] loss: 0.080
[4,   800] loss: 0.092
[4,   900] loss: 0.091
[5,   100] loss: 0.060
[5,   200] loss: 0.073
[5,   300] loss: 0.067
[5,   400] loss: 0.074
[5,   500] loss: 0.068
[5,   600] loss: 0.089
[5,   700] loss: 0.086
[5,   800]
```
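The training cell's `optimizer.step()` applies SGD with momentum 0.9. The plain-Python sketch below reproduces the update rule that PyTorch's `torch.optim.SGD` documents for the settings used here (no dampening, no Nesterov, no weight decay): `v ← momentum·v + g`, then `p ← p − lr·v`. The quadratic objective and the helper names `grad` and `sgd_momentum` are hypothetical, chosen only so the result is easy to check.

```python
def grad(p: float) -> float:
    """Gradient of the hypothetical objective f(p) = (p - 3)^2."""
    return 2.0 * (p - 3.0)


def sgd_momentum(p: float, lr: float = 0.01, momentum: float = 0.9, steps: int = 500) -> float:
    """SGD with momentum: v <- momentum * v + g, then p <- p - lr * v."""
    v = 0.0  # zero start makes the first update a plain gradient step, as in PyTorch
    for _ in range(steps):
        g = grad(p)
        v = momentum * v + g
        p = p - lr * v
    return p


print(round(sgd_momentum(0.0), 6))  # prints: 3.0 (the minimiser)
```

Setting `momentum=0.0` recovers plain SGD, which after the same 500 steps with `lr=0.01` is still about 1e-4 away from the minimum. The accumulated velocity lets momentum make more progress per step at the same learning rate.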

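### 7. Test our model

Our test code evaluates the trained model on the 10,000 test images and prints the accuracy to the console.

```python
# Test
model.eval()  # no dropout or batch norm here, but good practice before evaluation
correct = 0
total = 0
with torch.no_grad():  # gradients are not needed for evaluation
    for images, labels in test_loader:
        outputs = model(images)
        predicted = outputs.argmax(dim=1)  # index of the largest logit = predicted digit
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

# %d truncates the percentage to an integer
print('The accuracy of the network on the test set: %d %%' % (100 * correct / total))
```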

```text
The accuracy of the network on the test set: 97 %
```
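The accuracy computation reduces to an argmax per row followed by a count of matches. The plain-Python sketch below uses hypothetical logits for four images in place of `model(images)`; `argmax` is a hypothetical helper standing in for `outputs.argmax(dim=1)`. The last line also shows that the `%d` format in the test cell truncates the percentage rather than rounding it.

```python
def argmax(row):
    """Index of the largest score in a row, i.e. the predicted class."""
    return max(range(len(row)), key=lambda k: row[k])


# Hypothetical logits for 4 test images (10 class scores each) and their true labels.
logits = [
    [0.1, 2.3, -1.0, 0.0, 0.5, 0.2, -0.3, 0.4, 0.0, 0.1],  # largest at index 1
    [3.0, 0.2, 0.1, -0.5, 0.0, 0.3, 0.1, 0.2, -1.0, 0.4],  # largest at index 0
    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],    # largest at index 9
    [0.2, 0.1, 0.0, 1.5, 0.3, 0.2, 0.1, 0.0, 2.0, 0.4],    # largest at index 8
]
labels = [1, 0, 9, 3]

predicted = [argmax(row) for row in logits]
correct = sum(p == y for p, y in zip(predicted, labels))
accuracy = 100 * correct / len(labels)
print(predicted, correct, accuracy)  # prints: [1, 0, 9, 8] 3 75.0

# '%d' truncates a float percentage; '%.2f' keeps two decimals.
print('%d %%' % (100 * 2 / 3), '%.2f %%' % (100 * 2 / 3))  # prints: 66 % 66.67 %
```

Because of this truncation, the reported "97 %" means the true accuracy lies somewhere in [97%, 98%).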
