# Clasificador de gestos de manos 

- Curso: INFO257 Inteligencia Artificial
- Profesor: Pablo Huijse 
- Consultas por slack o correo: phuijse at inf dot uach dot cl 

## Objetivo

El objetivo de esta actividad es entrenar una red convolucional para clasificar gestos de manos 

Se consideran los siguientes tres gestos

<table>
    <tr>
        <td>
            <img src="img/1.jpg">
        </td>
        <td>
            <img src="img/2.jpg">
        </td>
        <td>
            <img src="img/3.jpg">
        </td>
    </tr>
</table>

1. Un dedo levantado
1. Dos dedos levantados
1. Tres dedos levantados

Más una clase adicional que corresponde a "fondo vacío", totalizando cuatro clases a discriminar


## Datos

Para resolver esta tarea se le ha proporcionado una base de datos que puede descargar en el siguiente enlace: 

> https://drive.google.com/file/d/1m9fKMYpUX24sB9PijXq2g54EBxe2W-eO/view?usp=sharing

La base de datos ya está separada en conjuntos de entrenamiento, validación y prueba




## Instrucciones generales

- Se trabajará en grupos de dos personas
- El grupo debe crear un repositorio privado en www.github.com 
- Invite a su profesor como colaborador (usuario: phuijse)
- No suba los datos al repositorio, suba sólo sus códigos fuente y reportes de resultado
- Se evaluará en base al último *commit* del Lunes 10 de Agosto de 2020
- Desarrolle en PyTorch
- [Sean honestos](https://www.acm.org/about-acm/code-of-ethics-in-spanish)



## Instrucciones específicas

1. Proponga, entrene y compare distintos modelos de red convolucional para resolver el problema
    - Ajuste los parámetros del modelo usando el conjunto de entrenamiento 
    - Calibre sus hiper-parámetros y prevenga el sobreajuste evaluando en el conjunto de validación 
    - Compare los modelos finales midiendo su rendimiento en el conjunto de prueba
    - Justifique sus decisiones de función de costo, optimizador, arquitectura, regularización, etc.
1. Considere al menos dos modelos en su comparación
    - Arquitectura convolucional diseñada por usted. Puede partir de una arquitectura existente (e.g. Lenet5) y proponer mejoras de forma iterativa
    - Arquitectura ResNet18 pre-entrenada en ImageNet como extractor de características. Diseñe sólo el clasificador final y mantenga el modelo extractor congelado para entrenar
1. Presente sus resultados usando matrices de confusión, *accuraccy* y *f1-score*. Considere 5 inicializaciones aleatorias y obtenga barras de error para sus métricas.
1. Reporte su proceso, Analice sus resultados, discuta y concluya

In [5]:
from torchvision.datasets import ImageFolder
from torchvision import transforms 
from torch.utils.data import DataLoader
import torch

In [6]:
my_transform = transforms.Compose([transforms.ToTensor()])

train_dataset = ImageFolder('gestos/train', transform=my_transform)
train_loader = DataLoader(train_dataset, shuffle=True, batch_size=32)

valid_dataset = ImageFolder('gestos/valid', transform=my_transform)
valid_loader = DataLoader(valid_dataset, shuffle=True, batch_size=32)



In [7]:
class Lenet5(torch.nn.Module):
    
    def __init__(self):
        super(type(self), self).__init__()
        self.conv1 = torch.nn.Conv2d(kernel_size=5, in_channels=3, out_channels=12)
        self.conv2 = torch.nn.Conv2d(kernel_size=5, in_channels=12, out_channels=24)
        self.mpool = torch.nn.MaxPool2d(kernel_size=5)
        self.activation = torch.nn.ReLU()
        self.linear1 = torch.nn.Linear(in_features=24*7*7, out_features=120)
        self.linear2 = torch.nn.Linear(in_features=120, out_features=84)
        self.linear3 = torch.nn.Linear(in_features=84, out_features=4)
        
    def forward(self, x):
        h = self.mpool(self.activation(self.conv1(x)))
        #print(h.shape)
        h = self.mpool(self.activation(self.conv2(h)))
        #print(h.shape)
        h = h.view(-1, 24*7*7)
        h = self.activation(self.linear1(h))
        h = self.activation(self.linear2(h))
        return self.linear3(h)
        
    
model = Lenet5()
display(model)

Lenet5(
  (conv1): Conv2d(3, 12, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(12, 24, kernel_size=(5, 5), stride=(1, 1))
  (mpool): MaxPool2d(kernel_size=5, stride=5, padding=0, dilation=1, ceil_mode=False)
  (activation): ReLU()
  (linear1): Linear(in_features=1176, out_features=120, bias=True)
  (linear2): Linear(in_features=120, out_features=84, bias=True)
  (linear3): Linear(in_features=84, out_features=4, bias=True)
)

In [8]:
def train(model, device, train_loader, optimizer, epoch):
    # Set the model to training mode
    model.train()
    train_loss = 0
    print("Epoch:", epoch)
    # Process the images in batches
    for batch_idx, (data, target) in enumerate(train_loader):
        # Use the CPU or GPU as appropriate
        # Recall that GPU is optimized for the operations we are dealing with
        data, target = data.to(device), target.to(device)
        
        # Reset the optimizer
        optimizer.zero_grad()
        
        # Push the data forward through the model layers
        output = model(data)
        
        # Get the loss
        loss = loss_criteria(output, target)

        # Keep a running total
        train_loss += loss.item()
        
        # Backpropagate
        loss.backward()
        optimizer.step()
        
        # Print metrics so we see some progress
        print('\tTraining batch {} Loss: {:.6f}'.format(batch_idx + 1, loss.item()))
            
    # return average loss for the epoch
    avg_loss = train_loss / (batch_idx+1)
    print('Training set: Average loss: {:.6f}'.format(avg_loss))
    return avg_loss

In [9]:
def test(model, device, test_loader):
    # Switch the model to evaluation mode (so we don't backpropagate or drop)
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        batch_count = 0
        for data, target in test_loader:
            batch_count += 1
            data, target = data.to(device), target.to(device)
            
            # Get the predicted classes for this batch
            output = model(data)
            
            # Calculate the loss for this batch
            test_loss += loss_criteria(output, target).item()
            
            # Calculate the accuracy for this batch
            _, predicted = torch.max(output.data, 1)
            correct += torch.sum(target==predicted).item()

    # Calculate the average loss and total accuracy for this epoch
    avg_loss = test_loss / batch_count
    print('Validation set: Average loss: {:.6f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        avg_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
    
    # return average loss for the epoch
    return avg_loss

In [33]:
from ignite.engine import Engine, Events
from ignite.metrics import Loss, Accuracy

model = Lenet5()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss(reduction='sum')
max_epochs = 100  
device = torch.device('cpu')

model = model.to(device)



def train_one_step(engine, batch):
    optimizer.zero_grad()
    x, y = batch
    x, y = x.to(device), y.to(device)
    yhat = model.forward(x)
    loss = criterion(yhat, y)
    loss.backward()
    optimizer.step()
    return loss.item() # Este output puede llamar luego como trainer.state.output

# Esto es lo que hace el engine de evaluación
def evaluate_one_step(engine, batch):
    with torch.no_grad():
        x, y = batch
        x, y = x.to(device), y.to(device)
        yhat = model.forward(x)
        #loss = criterion(yhat, y)
        return yhat, y

    
trainer = Engine(train_one_step)
evaluator = Engine(evaluate_one_step)
metrics = {'Loss': Loss(criterion), 'Acc': Accuracy()}
for name, metric in metrics.items():
    metric.attach(evaluator, name)# Use an "Adam" optimizer to adjust weights
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Specify the loss criteria
loss_criteria = torch.nn.CrossEntropyLoss()

# Track metrics in these arrays
epoch_nums = []
training_loss = []
validation_loss = []

# Train over 10 epochs (We restrict to 10 for time issues)
epochs = 10
print('Training on', device)
for epoch in range(1, epochs + 1):
        train_loss = train(model, device, train_loader, optimizer, epoch)
        test_loss = test(model, device, test_loader)
        epoch_nums.append(epoch)
        training_loss.append(train_loss)
        validation_loss.append(test_loss)

Training on cpu
Epoch: 1
	Training batch 1 Loss: 1.385186
	Training batch 2 Loss: 1.587036
	Training batch 3 Loss: 1.407479
	Training batch 4 Loss: 1.390790
	Training batch 5 Loss: 1.388707
	Training batch 6 Loss: 1.635255
	Training batch 7 Loss: 1.382583
	Training batch 8 Loss: 1.391438
	Training batch 9 Loss: 1.386155
	Training batch 10 Loss: 1.387825
	Training batch 11 Loss: 1.385450
	Training batch 12 Loss: 1.384952
	Training batch 13 Loss: 1.398267
	Training batch 14 Loss: 1.390164
	Training batch 15 Loss: 1.385923
	Training batch 16 Loss: 1.384613
	Training batch 17 Loss: 1.391155
	Training batch 18 Loss: 1.389499
	Training batch 19 Loss: 1.383072
	Training batch 20 Loss: 1.392362
	Training batch 21 Loss: 1.390225
	Training batch 22 Loss: 1.388630
	Training batch 23 Loss: 1.387901
	Training batch 24 Loss: 1.386859
	Training batch 25 Loss: 1.386928
	Training batch 26 Loss: 1.386109
	Training batch 27 Loss: 1.386338
	Training batch 28 Loss: 1.391063
	Training batch 29 Loss: 1.38625

	Training batch 238 Loss: 1.385343
	Training batch 239 Loss: 1.391157
	Training batch 240 Loss: 1.388105
	Training batch 241 Loss: 1.382605
	Training batch 242 Loss: 1.392217
	Training batch 243 Loss: 1.377967
	Training batch 244 Loss: 1.385716
	Training batch 245 Loss: 1.392067
	Training batch 246 Loss: 1.380631
	Training batch 247 Loss: 1.385626
	Training batch 248 Loss: 1.389735
	Training batch 249 Loss: 1.400987
	Training batch 250 Loss: 1.386905
	Training batch 251 Loss: 1.391507
	Training batch 252 Loss: 1.389047
	Training batch 253 Loss: 1.387166
	Training batch 254 Loss: 1.395116
	Training batch 255 Loss: 1.389122
	Training batch 256 Loss: 1.392132
	Training batch 257 Loss: 1.383572
	Training batch 258 Loss: 1.387740
	Training batch 259 Loss: 1.387536
	Training batch 260 Loss: 1.379780
	Training batch 261 Loss: 1.388891
	Training batch 262 Loss: 1.386151
	Training batch 263 Loss: 1.384807
	Training batch 264 Loss: 1.387876
	Training batch 265 Loss: 1.387126
	Training batch 266 

NameError: name 'test_loader' is not defined