# CS 583-C-Homework 3
## Neural networks

---

### ***Fill your details below***
### Name: Corey Heckel
### CWID: 10462028
### Email ID: checkel
### References: ***Cite your references here***


---
### Submission guidelines:

#### 1. Submit this notebook along with its PDF version. You can do this by clicking File->Print->"Save as PDF"

#### 2. Name the file as "<mailID_HWnumber.extension>".  

For example, if your mail ID is abcdefg@stevens.edu, name the files abcdefg_HW1.ipynb and abcdefg_HW1.pdf.

#### 3. Please do not Zip your files.

---

### Installing Pytorch

In [2]:
%pip install torch torchvision

### Q1: (50 points)
#### Design a vision transformer model and obtain a test accuracy above 95%.

In [49]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score

In [50]:
class VisionTransformer(nn.Module):
    def __init__(self, image_size=28, patch_size=7, num_classes=10, embed_dim=256, num_heads=8, num_layers=6, mlp_dim=512):
        super(VisionTransformer, self).__init__()
        
        # Parameters
        self.image_size = image_size
        self.patch_size = patch_size
        self.num_classes = num_classes
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.num_layers = num_layers
        self.mlp_dim = mlp_dim
        
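        # Patch embedding: a strided convolution maps each non-overlapping
        # patch_size x patch_size patch to one embed_dim-dimensional token.
        self.patch_embed = nn.Conv2d(in_channels=1, out_channels=self.embed_dim, kernel_size=self.patch_size, stride=self.patch_size)
        
        # batch_first=True because forward() passes tokens shaped (batch, patches, embed_dim).
        # With the default batch_first=False, dim 0 would be treated as the sequence axis,
        # so attention would mix samples across the batch instead of patches within an image.
        self.encoder_layers = nn.TransformerEncoderLayer(
            d_model=self.embed_dim, nhead=self.num_heads, dim_feedforward=self.mlp_dim,
            batch_first=True
        )
        self.transformer_encoder = nn.TransformerEncoder(self.encoder_layers, num_layers=self.num_layers)
        
        # Classification head applied to the mean-pooled tokens
        self.fc = nn.Linear(self.embed_dim, self.num_classes)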

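    def forward(self, x):
        x = self.patch_embed(x)           # (B, embed_dim, H/patch_size, W/patch_size)
        x = x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)

        x = self.transformer_encoder(x)   # shape unchanged

        x = x.mean(dim=1)                 # average over dim 1 -> (B, embed_dim)

        x = self.fc(x)                    # (B, num_classes) logits
        return x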
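
The shapes produced by the patch embedding above can be checked in isolation. The following is a minimal sketch using the same `image_size=28`, `patch_size=7`, and `embed_dim=256`; the batch size of 4 and the random input are arbitrary. The last two lines show one common way to add learnable position information to the tokens. That step is an assumption for illustration only (the name `pos_embed` is hypothetical) and is not part of the model defined above, which pools the tokens without any positional signal.

```python
import torch
import torch.nn as nn

image_size, patch_size, embed_dim = 28, 7, 256

# Same patch embedding as in the model: one conv step per non-overlapping patch
patch_embed = nn.Conv2d(1, embed_dim, kernel_size=patch_size, stride=patch_size)

x = torch.randn(4, 1, image_size, image_size)     # dummy batch of 4 grayscale images
feature_map = patch_embed(x)                      # (4, 256, 4, 4)
tokens = feature_map.flatten(2).transpose(1, 2)   # (4, 16, 256)

num_patches = (image_size // patch_size) ** 2     # 4 * 4 = 16 patches per image
print(feature_map.shape, tokens.shape, num_patches)

# Hypothetical extension (not in the model above): a learnable position embedding
pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
tokens_with_pos = tokens + pos_embed              # broadcasts over the batch dimension
```

With a 28x28 input and 7x7 patches, each image becomes a sequence of 16 tokens, which is the sequence length the transformer encoder sees.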



In [51]:
transform = transforms.Compose([
    transforms.Resize(28),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
train_data = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_data = datasets.MNIST(root='./data', train=False, transform=transform, download=True)
train_loader = DataLoader(train_data, batch_size=100, shuffle=True)
test_loader = DataLoader(test_data, batch_size=100, shuffle=False)

model = VisionTransformer()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)

In [52]:
def train(model, train_loader, criterion, optimizer, device):
    """Run one training epoch and return (mean batch loss, accuracy in %)."""
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        
        optimizer.zero_grad()
        
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        correct += (predicted == labels).sum().item()
        total += labels.size(0)
    
    accuracy = 100 * correct / total
    return running_loss / len(train_loader), accuracy
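
The training step above passes the model's raw outputs directly to `nn.CrossEntropyLoss`, with no softmax layer in the model. This works because the loss applies log-softmax internally. A minimal sketch on random logits (the shapes and labels here are arbitrary) shows the equivalence:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)               # 5 samples, 10 classes, raw scores
labels = torch.tensor([3, 0, 9, 1, 4])

ce = nn.CrossEntropyLoss()(logits, labels)

# Same value computed in two explicit steps: log-softmax, then negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(ce, manual))
```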


In [53]:
def test(model, test_loader, criterion, device):
    """Evaluate on test_loader and return accuracy in % (criterion is unused)."""
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    
    accuracy = 100 * correct / total
    return accuracy
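
The accuracy bookkeeping used in `train` and `test` can be traced on a small hand-made batch. In this sketch the logits and labels are made up for illustration; `torch.max(outputs, 1)` returns both the maximum values and their indices, and the indices are the predicted classes (the same result as `argmax`).

```python
import torch

# Toy batch: 4 samples, 3 classes
logits = torch.tensor([[2.0, 0.1, -1.0],
                       [0.2, 0.3, 3.0],
                       [1.5, 1.6, 0.0],
                       [0.0, 4.0, 1.0]])
labels = torch.tensor([0, 2, 0, 1])

_, predicted = torch.max(logits, 1)                  # indices of the largest logit per row
same_as_argmax = torch.equal(predicted, logits.argmax(dim=1))

correct = (predicted == labels).sum().item()         # third sample is misclassified
accuracy = 100 * correct / labels.size(0)

print(predicted.tolist(), correct, accuracy)
```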


In [54]:
num_epochs = 10
for epoch in range(num_epochs):
    train_loss, train_accuracy = train(model, train_loader, criterion, optimizer, device)
    test_accuracy = test(model, test_loader, criterion, device)
    
    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.2f}%, Test Accuracy: {test_accuracy:.2f}%")


Epoch 1/10, Loss: 1.1057, Train Accuracy: 70.38%, Test Accuracy: 90.59%
Epoch 2/10, Loss: 0.3578, Train Accuracy: 92.30%, Test Accuracy: 93.38%
Epoch 3/10, Loss: 0.2252, Train Accuracy: 94.62%, Test Accuracy: 95.79%
Epoch 4/10, Loss: 0.1706, Train Accuracy: 95.73%, Test Accuracy: 95.99%
Epoch 5/10, Loss: 0.1396, Train Accuracy: 96.35%, Test Accuracy: 96.86%
Epoch 6/10, Loss: 0.1169, Train Accuracy: 96.87%, Test Accuracy: 96.77%
Epoch 7/10, Loss: 0.1036, Train Accuracy: 97.13%, Test Accuracy: 97.11%
Epoch 8/10, Loss: 0.0895, Train Accuracy: 97.54%, Test Accuracy: 97.06%
Epoch 9/10, Loss: 0.0804, Train Accuracy: 97.69%, Test Accuracy: 97.19%
Epoch 10/10, Loss: 0.0718, Train Accuracy: 97.98%, Test Accuracy: 97.29%
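
The log above can be summarized with a few lines of plain Python. This sketch copies the accuracies from the printed output and reports the first epoch that exceeds the 95% target, the epoch with the best test accuracy, and the final gap between train and test accuracy.

```python
# Accuracies copied from the training log above (epochs 1 to 10)
train_acc = [70.38, 92.30, 94.62, 95.73, 96.35, 96.87, 97.13, 97.54, 97.69, 97.98]
test_acc = [90.59, 93.38, 95.79, 95.99, 96.86, 96.77, 97.11, 97.06, 97.19, 97.29]

target = 95.0
first_above_target = next(i + 1 for i, acc in enumerate(test_acc) if acc > target)
best_epoch = max(range(len(test_acc)), key=test_acc.__getitem__) + 1
final_gap = train_acc[-1] - test_acc[-1]   # train minus test at the last epoch

print(f"First epoch above {target}%: {first_above_target}")
print(f"Best test accuracy: {test_acc[best_epoch - 1]:.2f}% at epoch {best_epoch}")
print(f"Final train/test gap: {final_gap:.2f} percentage points")
```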


### Q2: (50 points)
#### Compare the accuracy with the accuracy of HW1.
#### Discuss vision transformer's performance vs your network in HW1.

The test accuracy of the vision transformer is close to that of the network in HW1: 97.76% for HW1 versus 97.29% for the vision transformer after 10 epochs, a difference of 0.47 percentage points. Although the accuracies are similar, the vision transformer took significantly longer to run than the HW1 neural network because it is a more complex model. A simpler neural network may therefore be sufficient for tasks such as optical character recognition, while vision transformers may be better suited to much larger datasets, where the advantages of the model are easier to observe.
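
One way to make the cost comparison concrete is to count trainable parameters and time a forward pass. The sketch below is illustrative only: the HW1 network is not shown in this notebook, so `mlp` is a hypothetical stand-in, and the encoder stack reuses the Q1 hyperparameters without the patch embedding or classifier head. The helper names `count_parameters` and `time_forward` are likewise assumptions, and timings on random input will vary by machine.

```python
import time
import torch
import torch.nn as nn

def count_parameters(model):
    """Number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def time_forward(model, x, repeats=5):
    """Average wall-clock seconds per forward pass (after one warm-up call)."""
    model.eval()
    with torch.no_grad():
        model(x)
        start = time.perf_counter()
        for _ in range(repeats):
            model(x)
    return (time.perf_counter() - start) / repeats

# Hypothetical stand-in for the HW1 network (the real HW1 model is not shown here)
mlp = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# Encoder stack with the Q1 hyperparameters (embed_dim=256, 8 heads, 6 layers, mlp_dim=512)
layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, dim_feedforward=512, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=6)

images = torch.randn(100, 1, 28, 28)   # one batch of MNIST-sized inputs
tokens = torch.randn(100, 16, 256)     # the same batch as 16 patch tokens per image

mlp_params, enc_params = count_parameters(mlp), count_parameters(encoder)
mlp_time, enc_time = time_forward(mlp, images), time_forward(encoder, tokens)

print(f"MLP:     {mlp_params:>9,} parameters, {mlp_time * 1e3:.2f} ms per batch")
print(f"Encoder: {enc_params:>9,} parameters, {enc_time * 1e3:.2f} ms per batch")
```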