# Hand recognition - train model

Let's begin the journey of training the model that will recognise hand gestures. You can also proceed with this file on your PC.

The training will allow to detect 7 different classes:
``go``, ``stop``, ``left``, ``right``, ``circle``, ``free`` and ``blocked`` to differentiate betweend hand gestures and blocking objects. In case none of these appear, the ``free`` class should be activated.

For this, we'll use a popular deep learning library *PyTorch*

In [1]:
import torch
import torch.optim as optim
import torch.nn.functional as F
import torchvision
import torchvision.datasets as datasets
import torchvision.models as models
import torchvision.transforms as transforms

### Dataset

Now we will prepare the data for training.  

In [2]:
dataset = datasets.ImageFolder(
    'dataset_hand_recog',
    transforms.Compose([
        transforms.ColorJitter(0.1, 0.1, 0.1, 0.1),
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
)

Next, we split the dataset into training and testing sets (30% of the photos).

In [3]:
testset = round(len(dataset)*0.3)
print(testset)
train_dataset, test_dataset = torch.utils.data.random_split(dataset, [len(dataset) - testset, testset])

259


Some extra utilities to shuffle data.

In [4]:
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=16,
    shuffle=True,
    num_workers=4
)

test_loader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=16,
    shuffle=True,
    num_workers=4
)

### Define the neural network

Now, we define the neural network we'll be training.  The *torchvision* package provides a collection of pre-trained models that we can use.

In a process called *transfer learning*, we can repurpose a pre-trained model (trained on millions of images) for a new task that has possibly much less data available.

Important features that were learned in the original training of the pre-trained model are re-usable for the new task.  We'll use the ``alexnet`` model.

The ``alexnet`` model was originally trained for a dataset that had 1000 class labels. We'll replace
the final layer with a new, untrained layer that has only seven outputs that we gathered photos for.  

In [5]:
model = models.alexnet(pretrained=True)
model.classifier[6] = torch.nn.Linear(model.classifier[6].in_features, 7)

Finally, we transfer our model for execution on the GPU

In [7]:
device = torch.device('cuda')
model = model.to(device)

Using the code below we will train the neural network for 30 epochs, saving the best performing model after each epoch.

> An epoch is a full run through our data.

In [None]:
NUM_EPOCHS = 30
BEST_MODEL_PATH = 'best_model.pth'
best_accuracy = 0.0

optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for epoch in range(NUM_EPOCHS):
    
    for images, labels in iter(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = F.cross_entropy(outputs, labels)
        loss.backward()
        optimizer.step()
    
    test_error_count = 0.0
    for images, labels in iter(test_loader):
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        test_error_count += float(torch.sum(torch.abs(labels - outputs.argmax(1))))
    
    test_accuracy = 1.0 - float(test_error_count) / float(len(test_dataset))
    print('%d: %f' % (epoch, test_accuracy))
    if test_accuracy > best_accuracy:
        torch.save(model.state_dict(), BEST_MODEL_PATH)
        best_accuracy = test_accuracy

0: 0.835000


Once that is finished, you should see a file ``best_model.pth`` in the Jupyter Lab file browser. Please proceed to gesture_recognition Notebook.