# Sign Language Dataset

The Sign Language Dataset consists of 9680 grayscale images of hand signs for the digits 0-9 and the letters a-z. This makes it a multiclass classification problem with 36 classes. Your task is to build a machine learning model that can accurately classify images from this dataset.

## Loading the dataset

You **do not** need to upload any data. Both the visible training dataset and the hidden test dataset are already available on the Jupyter hub.

In [1]:
import os
import csv
import cv2
import random
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# Setting the path of the training dataset (that was already provided to you)

running_local = os.getenv('JUPYTERHUB_USER') is None
DATASET_PATH = "."

# Set the location of the dataset
if running_local:
    # If running on your local machine, the sign_lang_train folder's path should be specified here
    local_path = "sign_lang_train"
    if os.path.exists(local_path):
        DATASET_PATH = local_path
else:
    # If running on the Jupyter hub, this data folder is already available
    # You DO NOT need to upload the data!
    DATASET_PATH = "/data/mlproject21/sign_lang_train"

In [3]:
# Utility function

def read_csv(csv_file):
    with open(csv_file, newline='') as f:
        reader = csv.reader(f)
        data = list(reader)
    return data

## Data Loading using PyTorch

For creating and training your model, you can work with any machine learning library of your choice. 

If you choose to work with [PyTorch](https://pytorch.org/), you will need to create your own [Dataset](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) class for loading the data. This is provided below. See [here](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html) for a nice example of how to create a custom data loading pipeline in PyTorch. 

In [4]:
import torch
from torch.utils.data import Dataset, DataLoader, random_split
from torchvision import transforms, utils, io
from torchvision.utils import make_grid

from string import ascii_lowercase

class SignLangDataset(Dataset):
    """Sign language dataset"""

    def __init__(self, csv_file, root_dir, class_index_map=None, transform=None):
        """
        Args:
            csv_file (string): Name of the csv file with annotations, relative to root_dir.
            root_dir (string): Directory with all the images.
            class_index_map (optional): Stored on the instance; not used by this class.
            transform (callable, optional): Optional transform to be applied on a sample.
        """
        self.data = read_csv(os.path.join(root_dir,csv_file))
        self.root_dir = root_dir
        self.class_index_map = class_index_map
        self.transform = transform
        # List of class names in order
        self.class_names = list(map(str, list(range(10)))) + list(ascii_lowercase)

    def __len__(self):
        """
        Returns the number of samples in the dataset.
        """
        return len(self.data)

    def __getitem__(self, idx):
        """
        Returns one sample (dict consisting of an image and its label)
        """
        if torch.is_tensor(idx):
            idx = idx.tolist()

        # Read the image and labels
        image_path = os.path.join(self.root_dir, self.data[idx][1])
        image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        # Add a leading channel axis so the image has shape (C, H, W) with C=1
        image = np.expand_dims(image, 0)
        # The label is the index of the class name in the list ['0','1',...,'9','a','b',...'z']
        # because we should have integer labels in the range 0-35 (for 36 classes)
        label = self.class_names.index(self.data[idx][0])

        sample = {'image': image, 'label': label}

        if self.transform:
            sample = self.transform(sample)

        return sample
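
Because each sample is a dict rather than a bare image, a transform passed to `SignLangDataset` has to accept and return that dict. The following is a minimal sketch of such a transform; the class name `ScaleToUnitRange` and the fake sample are assumptions made for illustration and are not part of the provided notebook.

```python
import numpy as np


class ScaleToUnitRange:
    """Hypothetical transform: converts the image to float32 in [0, 1]."""

    def __call__(self, sample):
        image = sample['image'].astype(np.float32) / 255.0
        # Return a new dict so the original sample is left untouched
        return {'image': image, 'label': sample['label']}


# Stand-in for one dataset sample: a (1, 128, 128) uint8 image and an integer label
fake_sample = {'image': np.full((1, 128, 128), 255, dtype=np.uint8), 'label': 10}
scaled = ScaleToUnitRange()(fake_sample)
print(scaled['image'].dtype, scaled['image'].max(), scaled['label'])
```

If a transform like this were passed as `transform=ScaleToUnitRange()`, any later division by 255 in the training or prediction code would have to be removed to avoid scaling twice.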

## Prepare Dataset and Dataloaders for training and testing the network

In [5]:
# Create a Dataset object
sign_lang_dataset = SignLangDataset(csv_file="labels.csv", root_dir=DATASET_PATH)

# Size of the entire dataset
data_len = len(sign_lang_dataset)

# What percentage of the dataset to use for training
# The remaining images will go into the validation set
train_ratio = 0.8

# Calculate the size of the training and validation sets
train_size = int(train_ratio * data_len)
val_size = data_len - train_size

# Create Dataset objects for training and validation
train_dataset, val_dataset = random_split(sign_lang_dataset, [train_size, val_size])

# Create Dataloader objects for training and validation
train_dataloader = DataLoader(train_dataset, 
                              batch_size=64,
                              shuffle=True, 
                              num_workers=0)

val_dataloader = DataLoader(val_dataset, 
                            batch_size=64,
                            shuffle=True, 
                            num_workers=0)
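
As a quick sanity check, the split and batch arithmetic can be reproduced without loading any images. This sketch assumes the dataset size of 9680 stated in the introduction and the same ratio and batch size as above.

```python
import math

data_len = 9680      # number of images stated in the introduction
train_ratio = 0.8
batch_size = 64

train_size = int(train_ratio * data_len)
val_size = data_len - train_size

# DataLoader keeps the last, smaller batch by default (drop_last=False)
train_batches = math.ceil(train_size / batch_size)
val_batches = math.ceil(val_size / batch_size)

print(train_size, val_size, train_batches, val_batches)
```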

## Definition of our ANN

In the following cell we define our artificial neural network.

In [17]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchsummary import summary
from tqdm import tqdm


class Net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Net, self).__init__()
        self.hidden_size = hidden_size
        self.input_size = input_size
        self.output_size = output_size
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Flatten the input while keeping the batch dimension
        x = x.view(-1, self.input_size)
        # Two hidden layers with ReLU activations
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # No activation on the output layer: nn.CrossEntropyLoss expects raw logits
        x = self.fc3(x)
        return x  # Return x (logits)
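
The reason `fc3` has no activation is that `nn.CrossEntropyLoss` applies log-softmax internally. The sketch below illustrates this on random logits (the tensors are made up for the example): the loss on raw logits matches log-softmax followed by negative log-likelihood, and applying softmax does not change which class is predicted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 36)            # raw outputs for a batch of 4 images, 36 classes
labels = torch.tensor([0, 9, 10, 35])  # integer class labels

# Loss computed directly on the logits
loss_from_logits = nn.CrossEntropyLoss()(logits, labels)

# The same value, spelled out as log-softmax followed by negative log-likelihood
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(loss_from_logits, loss_manual))

# Softmax is monotonic, so the predicted class is the same with or without it
same_argmax = torch.equal(logits.argmax(dim=1),
                          F.softmax(logits, dim=1).argmax(dim=1))
print(same_argmax)
```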

## Definition of hyperparameters (TODO: GRID SEARCH FOR BETTER PARAMETERS)

In [18]:
LEARNING_RATE = 0.008  # 0.005 -> 0.78 train accuracy, 0.72 test accuracy
INPUT_SIZE = 16384  # Number of pixels in one image (128 x 128)
OUTPUT_SIZE = 36  # Number of classes
HIDDEN_SIZE = 400  # Larger than 400 seems useless with the current params
MOMENT = 0.94  # SGD momentum
NUM_EPOCHS = 10
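
The heading above lists a grid search as a TODO. One possible shape for it is sketched here; `grid_search`, `evaluate`, and the candidate values are hypothetical, and `dummy_evaluate` is only a placeholder for a function that would train a network and return its validation accuracy.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Returns the best parameter combination and its score.

    `evaluate` is a hypothetical callable that trains a model with the given
    parameters and returns its validation accuracy.
    """
    names = list(param_grid)
    best_params, best_score = None, float('-inf')
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Illustrative candidate values only; they are not tuned recommendations
param_grid = {
    'learning_rate': [0.005, 0.008],
    'hidden_size': [200, 400],
    'momentum': [0.9, 0.94],
}


def dummy_evaluate(learning_rate, hidden_size, momentum):
    # Placeholder score so the sketch runs without training anything
    return (-abs(learning_rate - 0.008)
            - abs(hidden_size - 400) / 1000
            - abs(momentum - 0.94))


best_params, best_score = grid_search(param_grid, dummy_evaluate)
print(best_params)
```

The number of training runs grows as the product of the list lengths, so with `NUM_EPOCHS = 10` even a small grid can take a while.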

## Function to create and initialize a fresh neural network

In [19]:
def create_and_init_ann():
    ann = Net(INPUT_SIZE, HIDDEN_SIZE, OUTPUT_SIZE)
    crit = nn.CrossEntropyLoss()
    opti = optim.SGD(ann.parameters(), lr=LEARNING_RATE, momentum=MOMENT)
    return ann, crit, opti

## Function to train the network

In [20]:
def train_neural_network_pytorch_minibatch(net, train_loader, optimizer, criterion, num_epochs):
    net.train()
    for epoch in range(num_epochs):
        for data in tqdm(train_loader):
            optimizer.zero_grad()
            # Scale pixel values from [0, 255] to [0, 1] before the forward pass
            outputs = net(data['image'] / 255)
            loss = criterion(outputs, data['label'])
            loss.backward()
            optimizer.step()
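
The training loop does not report any metric, so a separate helper is useful for checking progress on `val_dataloader`. The following is a minimal sketch; `evaluate_accuracy` is a hypothetical helper, and the toy model and fake batches exist only so the example runs without the dataset.

```python
import torch
import torch.nn as nn


def evaluate_accuracy(net, loader):
    """Returns the fraction of correctly classified samples in `loader`."""
    net.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for data in loader:
            # Same scaling as in the training loop
            outputs = net(data['image'] / 255)
            predictions = outputs.argmax(dim=1)
            correct += (predictions == data['label']).sum().item()
            total += data['label'].numel()
    return correct / total


# Stand-ins so the sketch runs on its own: a tiny linear model and two fake batches
torch.manual_seed(0)
toy_net = nn.Sequential(nn.Flatten(), nn.Linear(128 * 128, 36))
toy_loader = [
    {'image': torch.randint(0, 256, (8, 1, 128, 128), dtype=torch.uint8),
     'label': torch.randint(0, 36, (8,))}
    for _ in range(2)
]
toy_accuracy = evaluate_accuracy(toy_net, toy_loader)
print(toy_accuracy)
```

The helper switches the network to evaluation mode, so `net.train()` should be called again before training continues.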

## Function to save the trained network to disk

In [21]:
def save_net(net):
    torch.save(net.state_dict(), "saved_model.pt")

In [23]:
ann, crit, opti = create_and_init_ann()
train_neural_network_pytorch_minibatch(ann, train_dataloader, opti, crit, NUM_EPOCHS)
# Save the weights so that the prediction function can load them from saved_model.pt
save_net(ann)



## Prediction Stub

You will need to provide a function that can be used to make predictions using your final trained model. 

**IMPORTANT**

1. The name of your prediction function must be `leader_board_predict_fn`
2. Your prediction function should be able to take as input a 4-D numpy array of shape [batch_size,1,128,128] and produce predictions in the form of a 1-D numpy array of shape [batch_size,]. 
3. Predictions for each image should be an integer in the range 0-35, that is `0` for the digit $0$, `1` for the digit $1$, .... , `9` for the digit $9$, `10` for the letter $a$, `11` for the letter $b$, ..., `35` for the letter $z$.
4. Your prediction function should internally load your trained model and take care of any data transformations that you need.
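
The label convention in point 3 can be written down as a small lookup. This sketch uses the same ordering as `class_names` in `SignLangDataset`; the helper names `label_to_name` and `name_to_label` are made up for illustration.

```python
from string import ascii_lowercase

# Same ordering as `class_names` in SignLangDataset: '0'..'9' followed by 'a'..'z'
class_names = [str(d) for d in range(10)] + list(ascii_lowercase)


def label_to_name(label):
    """Maps an integer label in 0-35 to its class name."""
    return class_names[label]


def name_to_label(name):
    """Maps a class name such as '7' or 'k' to its integer label."""
    return class_names.index(name)


print(len(class_names), label_to_name(10), name_to_label('z'))
```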

Below we provide an implementation of the `leader_board_predict_fn` function, in which we show how a trained model can be loaded (from the weights saved on the disk) for making predictions. This example is for PyTorch, but you are free to use any framework of your choice for your model. The only requirement is that this function should accept a numpy array (with the proper shape) as the input and should produce a numpy array (with the proper shape) as the output. What you do internally is up to you.

Note that the model that we load here is not properly trained and so its performance is very bad. This example is only for showing you how a model can be loaded in PyTorch and how predictions can be made.

In [15]:
def leader_board_predict_fn(input_batch):
    """
    Function for making predictions using your trained model.
    
    Args:
        input_batch (numpy array): Input images (4D array of shape 
                                   [batch_size, 1, 128, 128])
        
    Returns:
        output (numpy array): Predictions of your trained model 
                             (1D array of int (0-35) of shape [batch_size, ])
    """
    batch_size, channels, height, width = input_batch.shape

    # Scale pixel values to [0, 1], exactly as during training
    input_batch = (input_batch / 255).astype(np.float32)

    # Rebuild the network and load the trained weights from disk
    net = Net(INPUT_SIZE, HIDDEN_SIZE, OUTPUT_SIZE)
    net.load_state_dict(torch.load("saved_model.pt"))
    net.eval()

    # Inference only, so gradient tracking is not needed
    with torch.no_grad():
        net_out = net(torch.from_numpy(input_batch))

    # The predicted class is the index of the largest logit
    prediction = net_out.argmax(dim=1).numpy().reshape((batch_size,))

    assert prediction is not None, "Prediction cannot be None"
    assert isinstance(prediction, np.ndarray), "Prediction must be a numpy array"

    return prediction

## Evaluation

Your final model will be evaluated on a hidden test set containing images similar to the dataset that you are provided with.

To evaluate the performance of your model, we will use the normalized [accuracy_score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html#sklearn.metrics.accuracy_score) metric from sklearn. This is simply the fraction of correct predictions that your model makes over the images of the hidden test set. Hence, if all the predictions are correct, the score is 1.0, and if all predictions are incorrect, the score is 0.0. We use the sklearn metric so that the accuracy function is agnostic to the machine learning framework you use.
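
As a small worked example of the metric, the fraction of correct predictions can be computed in plain NumPy. This is a stand-in for what `accuracy_score(y_true, y_pred, normalize=True)` reports, using made-up labels.

```python
import numpy as np

y_true = np.array([0, 9, 10, 35])   # made-up ground-truth labels
y_pred = np.array([0, 9, 11, 35])   # made-up predictions, one of them wrong

# Fraction of predictions that match the labels
accuracy_value = float(np.mean(y_true == y_pred))
print(accuracy_value)  # 0.75
```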

In [16]:
from sklearn.metrics import accuracy_score
  
def accuracy(dataset_path, max_batches=30):
    """
    Calculates the average prediction accuracy.
    
    IMPORTANT
    =========
    In this function, we use PyTorch only for loading the data. When your `leader_board_predict_fn`
    function is called, we pass the arguments to it as numpy arrays. The output of `leader_board_predict_fn`
    is also expected to be a numpy array. So, as long as your `leader_board_predict_fn` function takes
    numpy arrays as input and produces numpy arrays as output (with the proper shapes), it does not
    matter what framework you used for training your network or for producing your predictions.
    
    Args:
        dataset_path (str): Path of the dataset directory
        max_batches (int): Number of batches to evaluate (default: 30)
        
    Returns:
        accuracy (float): Average accuracy score over the evaluated batches (float in the range 0.0-1.0)
    """

    # Create a Dataset object
    sign_lang_dataset = SignLangDataset(csv_file="labels.csv", root_dir=dataset_path)

    # Create a Dataloader
    sign_lang_dataloader = DataLoader(sign_lang_dataset, 
                                      batch_size=64,
                                      shuffle=True, 
                                      drop_last=True,
                                      num_workers=0)
    
    # Calculate accuracy for each batch
    accuracies = list()
    for batch_idx, sample in enumerate(sign_lang_dataloader):
        x = sample["image"].numpy()
        y = sample["label"].numpy()
        prediction = leader_board_predict_fn(x)
        accuracies.append(accuracy_score(y, prediction, normalize=True))
        
        # We will consider only the first `max_batches` batches (30 by default)
        if batch_idx == (max_batches - 1):
            break

    assert len(accuracies) == max_batches
    
    # Return the average accuracy
    mean_accuracy = np.mean(accuracies)
    return mean_accuracy

We will now use your `leader_board_predict_fn` function for calculating the accuracy of your model. As a check, we provide the code for testing your loaded model on the visible training data. There will be a hidden test which will evaluate your model's performance on the hidden test dataset (this is not visible to you when you validate this notebook).

In [17]:
### LEADER BOARD TEST
seed = 200

torch.manual_seed(seed)
np.random.seed(seed)

# Calculate the accuracy on the training dataset
# to check that your `leader_board_predict_fn` function 
# works without any error
dataset_score = accuracy(dataset_path=DATASET_PATH)

assert isinstance(dataset_score, float), f"type of dataset_score is {type(dataset_score)}, but it must be float"
assert 0.0<=dataset_score<=1.0, f"Value of dataset_score is {dataset_score}, but it must be between 0.0 and 1.0"

# This is your accuracy score on the visible training dataset
# This is NOT used for the leaderboard.
print(f"Accuracy score on training data: {dataset_score}")

# There is a hidden test that will evaluate your trained model on the hidden test set
# This hidden dataset and the accuracy for this will not be visible to you when you
# validate this notebook. The accuracy score on the hidden dataset will be used
# for calculating your leaderboard score.

### LEADER BOARD TEST

Accuracy score on training data: 0.959375
