# HW6 - Character Classification
Anirudh Lath | CS6017 | July 24, 2022
In this assignment we'll tackle a slightly more complicated image classification problem than MNIST digit classification. We're going to classify characters that contain (gasp!) letters!

The dataset we'll play with is from the University of California, Irvine (UCI) and contains images of letters in various fonts. Some were printed and scanned; others were screen-captured from a computer. The images are 20x20 pixels, grayscale.

## Step 1: Data Acquisition + Cleanup
Import the data for Arial font

In [6]:
import torchvision
import torch

import numpy as np
import pandas as pd
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
import torch.optim as optim

transform = transforms.Compose(
    [transforms.ToTensor(),                    # convert images to tensors in [0, 1]
     transforms.Normalize((0.5,), (0.5,))])    # then shift/scale to [-1, 1]
mnist_test  = torchvision.datasets.MNIST( "./mnist", train=False, download=True, transform=transform )

# setting device on GPU if available, else CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()


#Additional Info when using cuda
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
    print('Memory Usage:')
    print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
    # memory_cached() is deprecated; memory_reserved() reports the same quantity
    print('Cached:   ', round(torch.cuda.memory_reserved(0)/1024**3,1), 'GB')

df = pd.read_csv('fonts/ARIAL.csv')
df.head()


Using device: cuda

NVIDIA GeForce RTX 3080
Memory Usage:
Allocated: 0.0 GB
Cached:    0.0 GB




FileNotFoundError: [Errno 2] No such file or directory: 'fonts/ARIAL.csv'

Drop all columns except m_label and the pixel values, which are spread across 400 columns labeled rxcy (where x and y are the row and column indices, each ranging from 0 to 19).

In [2]:
df.drop(columns=['font', 'fontVariant', 'strength', 'italic', 'orientation', 'm_top', 'm_left', 'originalH', 'originalW', 'h', 'w'], inplace=True)
df

Unnamed: 0,m_label,r0c0,r0c1,r0c2,r0c3,r0c4,r0c5,r0c6,r0c7,r0c8,...,r19c10,r19c11,r19c12,r19c13,r19c14,r19c15,r19c16,r19c17,r19c18,r19c19
0,61442,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
1,61441,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
2,61440,255,123,123,123,123,123,123,123,123,...,123,123,123,123,123,123,123,123,123,255
3,9674,1,1,1,1,1,1,1,1,64,...,255,192,118,10,1,1,1,1,1,1
4,8805,46,176,238,203,80,80,53,1,1,...,80,80,80,197,93,158,255,255,255,67
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
975,37,1,1,1,1,1,1,1,39,125,...,222,229,125,42,1,1,1,1,1,1
976,36,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
977,35,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
978,34,1,1,1,1,1,1,192,255,255,...,255,64,1,1,1,1,1,1,1,1

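An equivalent way to keep only the label and pixel columns is to select them by name instead of listing every column to drop. The sketch below assumes the pixel columns follow the rxcy naming described above; the helper name `keep_label_and_pixels` and the toy DataFrame are made up for illustration and are not part of the assignment code.

```python
import pandas as pd

def keep_label_and_pixels(df):
    """Return only m_label plus the pixel columns named like r0c0 ... r19c19."""
    # Hypothetical helper: select columns by name pattern rather than dropping the rest.
    pixel_cols = df.filter(regex=r"^r\d+c\d+$").columns.tolist()
    return df[["m_label"] + pixel_cols]

# Toy frame standing in for a font CSV (a 2x2 "image" instead of 20x20).
toy = pd.DataFrame({
    "font": ["ARIAL", "ARIAL"],
    "m_label": [65, 66],
    "italic": [0, 1],
    "r0c0": [1, 255], "r0c1": [1, 0],
    "r1c0": [64, 1], "r1c1": [255, 1],
})
cleaned = keep_label_and_pixels(toy)
print(list(cleaned.columns))
```

Selecting by pattern keeps working if a font file carries extra metadata columns, whereas an explicit drop list has to be kept in sync with the CSV header.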

Now, write a function that takes in a dataframe of this kind and returns 2 numpy arrays: Xs, a #samples x 20 x 20 array containing the pixel values, and Ys, a #samples x 1 array containing the ASCII values for each character. You should normalize the Xs array so the values go from 0 to 1 (most likely this requires dividing by 255).

In [3]:
def extract_data(df):
    # Xs: pixel values scaled to [0, 1] and reshaped to (#samples, 1, 20, 20);
    # the extra axis is the single grayscale channel that Conv2d expects
    X = df.drop(columns='m_label').to_numpy(dtype=np.float64) / 255
    X = np.reshape(X, (-1, 1, 20, 20))

    # Ys: map each m_label code to a contiguous class index 0..len(keys)-1,
    # which is what nn.CrossEntropyLoss expects; keys[Y] recovers the original codes
    Y_data = df["m_label"].to_numpy()
    keys, Y = np.unique(Y_data, return_inverse=True)

    return X, Y, keys

X, Y, keys = extract_data(df)
Y

array([244, 243, 242, 241, 240, 239, 238, 237, 236, 235, 234, 233, 232,
       231, 230, 229, 228, 227, 226, 225, 224, 223, 222, 221, 220, 219,
       218, 217, 216, 215, 214, 213, 212, 211, 210, 209, 208, 207, 206,
       205, 204, 203, 202, 201, 200, 199, 198, 197, 196, 195, 194, 193,
       192, 191, 190, 189, 188, 187, 186, 185, 184, 183, 182, 181, 180,
       179, 178, 177, 176, 175, 174, 173, 172, 171, 170, 169, 168, 167,
       166, 165, 164, 163, 162, 161, 160, 159, 158, 157, 156, 155, 154,
       153, 152, 151, 150, 149, 148, 147, 146, 145, 144, 143, 142, 141,
       140, 139, 138, 137, 136, 135, 134, 133, 132, 131, 130, 129, 128,
       127, 126, 125, 124, 123, 122, 121, 120, 119, 118, 117, 116, 115,
       114, 113, 112, 111, 110, 109, 108, 107, 106, 105, 104, 103, 102,
       101, 100,  99,  98,  97,  96,  95,  94,  93,  92,  91,  90,  89,
        88,  87,  86,  85,  84,  83,  82,  81,  80,  79,  78,  77,  76,
        75,  74,  73,  72,  71,  70,  69,  68,  67,  66,  65,  6

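The class indices returned in Y are positions in the sorted `keys` array, so indexing `keys` with Y recovers the original label codes. Here is a small sketch of that round trip, using made-up label codes rather than values from the dataset:

```python
import numpy as np

# Made-up label codes standing in for m_label values.
labels = np.array([90, 65, 66, 65, 90])

# keys: sorted distinct codes; idx: position of each label within keys.
keys, idx = np.unique(labels, return_inverse=True)

print(keys)  # distinct codes in ascending order
print(idx)   # one class index per sample

# Indexing keys with the class indices gives back the original codes.
recovered = keys[idx]
chars = [chr(code) for code in recovered]
print(chars)
```

The same lookup can turn a predicted class index back into a character once the network produces predictions.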
## Step 2: Build a PyTorch Network
We're going to use the PyTorch library, like we've seen in class, to build/train our network. Check out the notebooks we've made in class or the official documentation/tutorials.

To start with, we're going to use a model very similar to the MNIST CNN we used in class. It will consist of:

* a Convolution2D layer with ReLU activations
* a max pooling layer
* another convolution layer
* another max pooling layer
* a dense layer with ReLU activation
* a dense layer

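The input size of the first dense layer follows from the layer list above. The sketch below works through the arithmetic, assuming 3x3 convolutions with stride 1 and no padding, 2x2 max pooling with stride 2, and 64 channels after the second convolution. The helper `conv_out` is a hypothetical name used only for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

size = 20                                  # input images are 20x20
size = conv_out(size, kernel=3)            # first 3x3 convolution  -> 18
size = conv_out(size, kernel=2, stride=2)  # first 2x2 max pool     -> 9
size = conv_out(size, kernel=3)            # second 3x3 convolution -> 7
size = conv_out(size, kernel=2, stride=2)  # second 2x2 max pool    -> 3

channels = 64                              # assumed channels after the second convolution
flat_features = channels * size * size
print(flat_features)  # 576
```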
Compile and train your network like we did in class. You'll probably have to use the np.reshape() function on your data to make PyTorch happy. I reshaped my X values like np.reshape(Xs, (-1, 1, 20, 20)) to get them in the right format.

For training, you'll want to check out torch.utils.data.DataLoader which can take a TensorDataset so you can iterate over batches like we did in class for the MNIST data.

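As a minimal sketch of the batching described above, the snippet below wraps two arrays in a `TensorDataset` and iterates over them with a `DataLoader`. The random arrays, the batch size of 4, and the five classes are placeholders chosen for illustration, not values from the assignment. A real training step (zero the gradients, forward pass, loss, backward pass, optimizer step) would go inside the loop.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-in data with the same layout as the font images (not real samples).
rng = np.random.default_rng(0)
Xs = rng.random((10, 1, 20, 20)).astype(np.float32)
Ys = rng.integers(0, 5, size=10)  # int64 class indices

dataset = TensorDataset(torch.from_numpy(Xs), torch.from_numpy(Ys))
loader = DataLoader(dataset, batch_size=4, shuffle=True)

batch_shapes = []
for batch_x, batch_y in loader:
    # One optimizer update per batch would happen here.
    batch_shapes.append((tuple(batch_x.shape), tuple(batch_y.shape)))

print(batch_shapes)
```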
In [4]:
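class Network1(nn.Module):
    def __init__(self):
        super().__init__()

        # Feature extractor: two convolution + max-pooling stages
        self.convolution1 = nn.Conv2d(1, 8, 3)    # 1x20x20 -> 8x18x18
        self.pooling1 = nn.MaxPool2d(2, 2)        # -> 8x9x9
        self.convolution2 = nn.Conv2d(8, 64, 3)   # -> 64x7x7
        self.pooling2 = nn.MaxPool2d(2, 2)        # -> 64x3x3

        # Classifier: 64 * 3 * 3 = 576 flattened features in
        self.dense1 = nn.Linear(576, 4000)
        # Output logits; the size must be at least the number of distinct class indices
        self.dense2 = nn.Linear(4000, 3097)


    def forward(self, x):
        x = self.pooling1(F.relu(self.convolution1(x)))
        x = self.pooling2(F.relu(self.convolution2(x)))

        # Flatten everything except the batch dimension
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.dense1(x))
        x = self.dense2(x)
        return x

    def num_flat_features(self, x):
        # Exclude Batch Dimension
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


# Give the instance its own name so the class is not overwritten when the cell is re-run
network1 = Network1()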

def train( model, epochs, data, labels ):
    # model.to(device)

    criterion = nn.CrossEntropyLoss()

    # Adam updates the weights from the computed gradients
    optimizer = optim.Adam( model.parameters(), lr= 1e-4 )

    model.float()

    # Each epoch is a single full-batch step over the whole dataset
    for epoch in range( epochs ):

        optimizer.zero_grad() # Clear gradients left over from the previous step

        outputs = model(data.float()) # Forward pass: class scores (logits)
        loss = criterion(outputs, labels) # Cross-entropy between the logits and the true classes

        loss.backward() # Backpropagate to compute gradients
        optimizer.step() # Update the weights using those gradients

    print('Model has been trained!')

def evaluate( model, data, labels ):
    # Compare the model's predictions against the true labels for the given data
    correct = 0
    total = 0

    with torch.no_grad(): # Don't calculate gradients as it's not necessary here.

        outputs = model(data.float())

        # The predicted class is the index of the largest logit
        _, predicted = torch.max(outputs, 1)

        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print( 'Accuracy of the network on Arial Font: %d %%' % (100 * correct / total))

### Train the model

In [5]:
data = torch.from_numpy(X)
labels = torch.from_numpy(Y)
# data, labels = data.to(device), labels.to(device)

print(min(labels))
print(max(labels))

print("Training the model, please wait...")
train(network1, 15, data, labels)
print("Training complete.")

tensor(0)
tensor(244)
Training the model, please wait...
Model has been trained!
Training complete.
