# ASL Recognition

## Intro

Imagine you want to talk with a deaf person but do not know sign language. Our application offers a real-time solution that uses image recognition to translate hand signs into letters.

## Architecture of Our Neural Network

The basis of our neural network is the VGG16 model, which is pretrained on 14 million images and can extract features. Our custom-built classifier uses these features to assign an input image to a letter.
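
To see where the classifier's input size of 25088 comes from, the following sketch traces a 224x224 input through the convolutional part of VGG16. It assumes the standard VGG16 layout (padded 3x3 convolutions that keep the spatial size, five 2x2 max-pool layers, and 512 channels in the last block); the function name is only illustrative.

```python
def vgg16_feature_count(input_size=224, num_pools=5, final_channels=512):
    """Return the number of values in the flattened VGG16 feature map."""
    size = input_size
    for _ in range(num_pools):
        # Each 2x2 max-pool with stride 2 halves height and width;
        # the padded 3x3 convolutions leave them unchanged.
        size //= 2
    return final_channels * size * size


print(vgg16_feature_count())  # 512 * 7 * 7 = 25088
```

This is why the first linear layer of the custom classifier takes 25088 inputs.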
## Development Process

We started by importing the images of the training dataset, assigning the correct labels, and training a first network. At first we restricted ourselves to "A", "B" and "Nothing" to keep the problem small and to spot possible issues early. We then exported the network so that we could use it as a prediction machine in our application. Using the pygame module, we built the interface and implemented the webcam as live input.

Once everything worked, we could test our program in the real world for the first time, and we quickly noticed that the rate of correct classifications was low. We therefore experimented with skin tone detection to remove background noise. However, it soon turned out that, because of the shadows some hand signs produce, the detection worked only moderately well and made the result worse. Instead, we extended the dataset with data we recorded ourselves, which improved the result considerably.

After that we added more and more letters to our network: first six, then twelve, then 21, and finally 29. But the more classes we added, the less accurate our result became.

Our plan is to improve the prediction by using several frames within a short time interval.
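
One way to use several frames is to average the class probabilities over a short sliding window, so that a single misclassified frame is outvoted. The following is a minimal sketch under that assumption, not the project's implementation; `FrameSmoother`, the window size, and the example probabilities are hypothetical.

```python
from collections import deque


class FrameSmoother:
    """Average class probabilities over the most recent frames."""

    def __init__(self, window_size=5):
        # A deque with maxlen drops the oldest frame automatically.
        self.history = deque(maxlen=window_size)

    def update(self, probabilities):
        """Add one frame's probabilities; return (best_index, mean_probability)."""
        self.history.append(list(probabilities))
        num_frames = len(self.history)
        num_classes = len(probabilities)
        means = [
            sum(frame[c] for frame in self.history) / num_frames
            for c in range(num_classes)
        ]
        best = max(range(num_classes), key=lambda c: means[c])
        return best, means[best]


smoother = FrameSmoother(window_size=3)
# Hypothetical per-frame probabilities for the classes "A", "B", "nothing";
# the second frame on its own would be classified as "B".
frames = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.7, 0.2, 0.1]]
for frame in frames:
    best_index, confidence = smoother.update(frame)
print(best_index, round(confidence, 2))
```

In the live application, `update` would be called once per webcam frame with the probabilities returned by the network.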

## Reflection

Unfortunately, our neural network ultimately does not work as well as we had initially imagined. As long as we restrict ourselves to a few signs, the application works exactly as we envisioned. With almost 30 classes, however, adding considerably more training data will be unavoidable. A more complex network (reinforcement learning) or fine-tuning the entire model could also improve the result slightly.

## Outlook

Once the program works very reliably on the computer, we plan to build a mobile app.



import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

In [3]:
import json

with open('jsons/middle21.json', 'r') as f:
    flower_to_name = json.load(f)
    
print(len(flower_to_name)) 
print(flower_to_name)

21
{'1': 'A', '2': 'B', '3': 'C', '4': 'D', '5': 'E', '6': 'F', '7': 'G', '8': 'H', '9': 'I', '10': 'J', '11': 'K', '12': 'L', '13': 'M', '14': 'N', '15': 'O', '16': 'P', '17': 'Q', '18': 'R', '27': 'del', '28': 'nothing', '29': 'space'}


In [4]:
data_dir = '/home/leonardo/Documents/jupyter/ASL/data-batch'
train_dir = data_dir + '/train'
valid_dir = data_dir + '/valid'
test_dir = data_dir + '/test'

In [5]:
training_transforms = transforms.Compose([transforms.RandomRotation(30),
                                          transforms.RandomResizedCrop(224),
                                          transforms.RandomHorizontalFlip(),
                                          transforms.ToTensor(),
                                          transforms.Normalize([0.485, 0.456, 0.406], 
                                                               [0.229, 0.224, 0.225])])

validation_transforms = transforms.Compose([transforms.Resize(256),
                                            transforms.CenterCrop(224),
                                            transforms.ToTensor(),
                                            transforms.Normalize([0.485, 0.456, 0.406], 
                                                                 [0.229, 0.224, 0.225])])

testing_transforms = transforms.Compose([transforms.Resize(256),
                                         transforms.CenterCrop(224),
                                         transforms.ToTensor(),
                                         transforms.Normalize([0.485, 0.456, 0.406], 
                                                              [0.229, 0.224, 0.225])])

# Load the datasets with ImageFolder
training_dataset = datasets.ImageFolder(train_dir, transform=training_transforms)
validation_dataset = datasets.ImageFolder(valid_dir, transform=validation_transforms)
testing_dataset = datasets.ImageFolder(test_dir, transform=testing_transforms)

# Using the image datasets and the transforms, define the dataloaders
train_loader = torch.utils.data.DataLoader(training_dataset, batch_size=64, shuffle=True)
validate_loader = torch.utils.data.DataLoader(validation_dataset, batch_size=32)
test_loader = torch.utils.data.DataLoader(testing_dataset, batch_size=32)
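
`ImageFolder` assigns class indices by sorting the class folder names as strings, so with numeric folder names the index order differs from numeric order. The sketch below illustrates this with the keys printed from `jsons/middle21.json`; it assumes the class folders are named after those keys (the folder layout is not shown here), and `label_to_name` and `idx_to_letter` are illustrative names.

```python
# Keys and letters as printed from jsons/middle21.json.
label_to_name = {
    '1': 'A', '2': 'B', '3': 'C', '4': 'D', '5': 'E', '6': 'F', '7': 'G',
    '8': 'H', '9': 'I', '10': 'J', '11': 'K', '12': 'L', '13': 'M',
    '14': 'N', '15': 'O', '16': 'P', '17': 'Q', '18': 'R',
    '27': 'del', '28': 'nothing', '29': 'space',
}

# Assumption: the class folders carry these keys as names.
# Sorting strings puts '10' before '2', as ImageFolder does.
class_names = sorted(label_to_name)
class_to_idx = {name: index for index, name in enumerate(class_names)}

# Map a network output index straight to its letter.
idx_to_letter = {index: label_to_name[name] for name, index in class_to_idx.items()}

print(class_names[:3])
print(idx_to_letter[1])
```

Output index 1 therefore corresponds to folder `10` (the letter J), not to folder `2`, which is why predictions have to be mapped back through `class_to_idx`.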

In [6]:
model = models.vgg16(pretrained=True)
model

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d

In [7]:
# Freeze pretrained model parameters
for parameter in model.parameters():
    parameter.requires_grad = False


from collections import OrderedDict

classifier = nn.Sequential(OrderedDict([('fc1', nn.Linear(25088, 8000)),
                                        ('relu', nn.ReLU()),
                                        ('drop', nn.Dropout(p=0.5)),
                                        ('fc2', nn.Linear(8000, 21)),
                                        ('output', nn.LogSoftmax(dim=1))]))

model.classifier = classifier

In [8]:
def validation(model, validateloader, criterion):
    
    val_loss = 0
    accuracy = 0
    
    for images, labels in iter(validateloader):

        images, labels = images.to('cuda'), labels.to('cuda')

        output = model.forward(images)
        val_loss += criterion(output, labels).item()

        probabilities = torch.exp(output)
        
        equality = (labels.data == probabilities.max(dim=1)[1])
        accuracy += equality.type(torch.FloatTensor).mean()
    
    return val_loss, accuracy
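
Note that `validation` sums per-batch accuracies, which are later divided by the number of batches. A smaller final batch then counts as much as a full one. The sketch below shows the difference from counting correct predictions over all samples; the counts are made up for illustration and the helper names are hypothetical.

```python
def mean_of_batch_accuracies(correct_per_batch, batch_sizes):
    """Average the per-batch accuracies (what the notebook computes)."""
    ratios = [c / n for c, n in zip(correct_per_batch, batch_sizes)]
    return sum(ratios) / len(ratios)


def overall_accuracy(correct_per_batch, batch_sizes):
    """Count correct predictions over all samples."""
    return sum(correct_per_batch) / sum(batch_sizes)


# Hypothetical counts: two full batches of 32 and a final batch of 4.
correct = [32, 32, 1]
sizes = [32, 32, 4]
print(mean_of_batch_accuracies(correct, sizes))
print(overall_accuracy(correct, sizes))
```

When all batches have the same size, both values agree.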

In [9]:
# Loss function and gradient descent

criterion = nn.NLLLoss()

optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)

In [10]:
# Train the classifier

def train_classifier():


    epochs = 3
    steps = 0
    print_every = 40

    model.to('cuda')

    for e in range(epochs):

        model.train()

        running_loss = 0

        for images, labels in iter(train_loader):

            steps += 1

            images, labels = images.to('cuda'), labels.to('cuda')

            optimizer.zero_grad()

            output = model.forward(images)
            loss = criterion(output, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

            if steps % print_every == 0:

                model.eval()

                # Turn off gradients for validation, saves memory and computations
                with torch.no_grad():
                    validation_loss, accuracy = validation(model, validate_loader, criterion)

                print("Epoch: {}/{}.. ".format(e+1, epochs),
                      "Training Loss: {:.3f}.. ".format(running_loss/print_every),
                      "Validation Loss: {:.3f}.. ".format(validation_loss/len(validate_loader)),
                      "Validation Accuracy: {:.3f}".format(accuracy/len(validate_loader)))

                running_loss = 0
                model.train()
                    
train_classifier()

Epoch: 1/3..  Training Loss: 6.181..  Validation Loss: 2.305..  Validation Accuracy: 0.300
Epoch: 1/3..  Training Loss: 2.356..  Validation Loss: 1.684..  Validation Accuracy: 0.456
Epoch: 1/3..  Training Loss: 2.104..  Validation Loss: 1.257..  Validation Accuracy: 0.581
Epoch: 1/3..  Training Loss: 1.971..  Validation Loss: 1.344..  Validation Accuracy: 0.565
Epoch: 1/3..  Training Loss: 1.857..  Validation Loss: 0.990..  Validation Accuracy: 0.685
Epoch: 1/3..  Training Loss: 1.781..  Validation Loss: 1.084..  Validation Accuracy: 0.636
Epoch: 1/3..  Training Loss: 1.733..  Validation Loss: 1.056..  Validation Accuracy: 0.651
Epoch: 1/3..  Training Loss: 1.648..  Validation Loss: 0.802..  Validation Accuracy: 0.718
Epoch: 1/3..  Training Loss: 1.632..  Validation Loss: 0.795..  Validation Accuracy: 0.739
Epoch: 1/3..  Training Loss: 1.599..  Validation Loss: 0.726..  Validation Accuracy: 0.735
Epoch: 1/3..  Training Loss: 1.571..  Validation Loss: 0.802..  Validation Accuracy: 0.727

Epoch: 3/3..  Training Loss: 1.107..  Validation Loss: 0.553..  Validation Accuracy: 0.832
Epoch: 3/3..  Training Loss: 1.119..  Validation Loss: 0.610..  Validation Accuracy: 0.824
Epoch: 3/3..  Training Loss: 1.140..  Validation Loss: 0.605..  Validation Accuracy: 0.824
Epoch: 3/3..  Training Loss: 1.139..  Validation Loss: 0.591..  Validation Accuracy: 0.820
Epoch: 3/3..  Training Loss: 1.096..  Validation Loss: 0.576..  Validation Accuracy: 0.822
Epoch: 3/3..  Training Loss: 1.133..  Validation Loss: 0.551..  Validation Accuracy: 0.827
Epoch: 3/3..  Training Loss: 1.093..  Validation Loss: 0.534..  Validation Accuracy: 0.826
Epoch: 3/3..  Training Loss: 1.122..  Validation Loss: 0.503..  Validation Accuracy: 0.834
Epoch: 3/3..  Training Loss: 1.077..  Validation Loss: 0.537..  Validation Accuracy: 0.838
Epoch: 3/3..  Training Loss: 1.128..  Validation Loss: 0.505..  Validation Accuracy: 0.831
Epoch: 3/3..  Training Loss: 1.104..  Validation Loss: 0.447..  Validation Accuracy: 0.851

In [11]:
def test_accuracy(model, test_loader):

    # Do validation on the test set
    model.eval()
    model.to('cuda')

    with torch.no_grad():
    
        accuracy = 0
    
        for images, labels in iter(test_loader):
    
            images, labels = images.to('cuda'), labels.to('cuda')

            output = model.forward(images)

            probabilities = torch.exp(output)
        
            equality = (labels.data == probabilities.max(dim=1)[1])
        
            accuracy += equality.type(torch.FloatTensor).mean()
        
        print("Test Accuracy: {}".format(accuracy/len(test_loader)))    
        
        
test_accuracy(model, test_loader)


Test Accuracy: 1.0


In [12]:
# Save the checkpoint

def save_checkpoint(model):

    model.class_to_idx = training_dataset.class_to_idx

    checkpoint = {'arch': "vgg16",
                  'class_to_idx': model.class_to_idx,
                  'model_state_dict': model.state_dict()
                 }

    torch.save(checkpoint, 'NNs/ABCDEFGHIJKLMNOPQRdns3Gextra1.pth')
    
save_checkpoint(model)

In [None]:
from collections import OrderedDict

# Function that loads a checkpoint and rebuilds the model

def load_checkpoint(filepath):
    
    checkpoint = torch.load(filepath)
    
    if checkpoint['arch'] == 'vgg16':
        
        model = models.vgg16(pretrained=True)
        
        for param in model.parameters():
            param.requires_grad = False
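    else:
        # Stop here; continuing would fail later because no model was built
        raise ValueError("Architecture not recognized: {}".format(checkpoint['arch']))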
    
    model.class_to_idx = checkpoint['class_to_idx']
    
    classifier = nn.Sequential(OrderedDict([('fc1', nn.Linear(25088, 5000)),
                                            ('relu', nn.ReLU()),
                                            ('drop', nn.Dropout(p=0.5)),
                                            ('fc2', nn.Linear(5000, 3)),
                                            ('output', nn.LogSoftmax(dim=1))]))

    model.classifier = classifier
    
    model.load_state_dict(checkpoint['model_state_dict'])
    
    return model

model = load_checkpoint('NNs/ABn3G.pth')
model.cuda()
print(model)


In [None]:
from PIL import Image

def process_image(image_path):
    ''' Scales, crops, and normalizes a PIL image for a PyTorch model,
        returns a NumPy array
    '''
    
    # Process a PIL image for use in a PyTorch model
    
    pil_image = Image.open(image_path)
    
    # Resize
    if pil_image.size[0] > pil_image.size[1]:
        pil_image.thumbnail((5000, 256))
    else:
        pil_image.thumbnail((256, 5000))
        
    # Crop the central 224x224 region; PIL expects (left, upper, right, lower)
    left = (pil_image.width-224)/2
    upper = (pil_image.height-224)/2
    right = left + 224
    lower = upper + 224
    
    pil_image = pil_image.crop((left, upper, right, lower))
    
    # Normalize
    np_image = np.array(pil_image)/255
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    np_image = (np_image - mean) / std
    
    # PyTorch expects the color channel to be the first dimension but it's the third dimension in the PIL image and Numpy array
    # Color channel needs to be first; retain the order of the other two dimensions.
    np_image = np_image.transpose((2, 0, 1))
    
    return np_image

In [None]:
def imshow(image, ax=None, title=None):
    if ax is None:
        fig, ax = plt.subplots()
    
    # PyTorch tensors assume the color channel is the first dimension
    # but matplotlib assumes it is the third dimension
    image = image.transpose((1, 2, 0))
    
    # Undo preprocessing
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    image = std * image + mean
    
    if title is not None:
        ax.set_title(title)
    
    # Image needs to be clipped between 0 and 1 or it looks like noise when displayed
    image = np.clip(image, 0, 1)
    
    ax.imshow(image)
    
    return ax

image = process_image('A_test.jpg')
imshow(image)

In [None]:
def predict(image_path, model, topk=2):
    ''' Predict the class (or classes) of an image using a trained deep learning model.
    '''
    
    image = process_image(image_path)
    
    # Convert image to PyTorch tensor first
    image = torch.from_numpy(image).type(torch.cuda.FloatTensor)
    #print(image.shape)
    #print(type(image))
    
    # Returns a new tensor with a dimension of size one inserted at the specified position.
    image = image.unsqueeze(0)
    
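    # Switch to evaluation mode so dropout is disabled, and skip gradient tracking
    model.eval()
    with torch.no_grad():
        output = model.forward(image)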
    
    probabilities = torch.exp(output)
    
    # Probabilities and the indices of those probabilities corresponding to the classes
    top_probabilities, top_indices = probabilities.topk(topk)
    
    # Convert to lists
    top_probabilities = top_probabilities.detach().cpu().numpy().tolist()[0]
    # Keep the indices as integers so they can be used as dictionary keys
    top_indices = top_indices.detach().cpu().numpy().tolist()[0]
    
    # Convert topk_indices to the actual class labels using class_to_idx
    # Invert the dictionary so you get a mapping from index to class.
    
    idx_to_class = {value: key for key, value in model.class_to_idx.items()}
    #print(idx_to_class)
    
    top_classes = [idx_to_class[index] for index in top_indices]
    
    return top_probabilities, top_classes
    
probs, classes = predict('A_test.jpg', model)   
print(probs)
print(classes)

In [None]:

# Display an image along with the top predicted classes

# Plot the input image
plt.figure(figsize = (6,10))
plot_1 = plt.subplot(2,1,1)

image = process_image('A_test.jpg')

flower_title = flower_to_name['1']

imshow(image, plot_1, title=flower_title);

# Convert from the class integer encoding to the actual letter names
flower_names = [flower_to_name[i] for i in classes]

# Plot the probabilities for the top classes as a bar graph
plt.subplot(2,1,2)

sb.barplot(x=probs, y=flower_names, color=sb.color_palette()[0]);

plt.show()