<a href="https://colab.research.google.com/github/csar95/Food-Images-Classification-DL/blob/main/basic_implementation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data acquisition
We will use a food image dataset available on [Kaggle](https://www.kaggle.com/trolukovich/food11-image-dataset).

In [1]:
import os

os.environ["KAGGLE_USERNAME"] = "cesargutic"
os.environ["KAGGLE_KEY"] = ""  # Set your own Kaggle API key here before running

!kaggle datasets download trolukovich/food11-image-dataset --unzip

Downloading food11-image-dataset.zip to /content
100% 1.08G/1.08G [00:29<00:00, 43.4MB/s]
100% 1.08G/1.08G [00:29<00:00, 40.0MB/s]


The download contains three folders, for the training, validation, and evaluation data. Each of them holds one subfolder per food class, 11 in total:

- Bread
- Dairy product
- Dessert
- Egg
- Fried food
- Meat
- Noodles-Pasta
- Rice
- Seafood
- Soup
- Vegetable-Fruit
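
The folder layout described above (one subfolder per class inside each split) can be inspected with a short helper. This is a minimal sketch: `count_images_per_class` is a hypothetical name that is not part of the notebook, and it assumes the images use the `.jpg` extension. The demo builds a throwaway folder tree so that it runs without the dataset.

```python
import tempfile
from pathlib import Path


def count_images_per_class(split_dir):
    """Return {class_name: number_of_jpg_files} for one split folder."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = len(list(class_dir.glob("*.jpg")))
    return counts


# Demo on a temporary folder tree that mimics the dataset layout
with tempfile.TemporaryDirectory() as tmp:
    for class_name, n_images in [("Bread", 2), ("Egg", 1)]:
        class_dir = Path(tmp) / class_name
        class_dir.mkdir()
        for i in range(n_images):
            (class_dir / f"{i}.jpg").touch()
    demo_counts = count_images_per_class(tmp)

print(demo_counts)
```

On the real data, the same helper could be called as `count_images_per_class("/content/training")`.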

In [2]:
TRAINDIR = "/content/training"
VALDIR = "/content/validation"
TESTDIR = "/content/evaluation"

## Reducing the number of classes
To make the problem more approachable, we will focus on only six of the available food classes: Bread, Dairy product, Dessert, Egg, Fried food, and Meat.

In [3]:
from glob import glob
import os

valid_classes = {"Bread", "Dairy product", "Dessert", "Egg", "Fried food", "Meat"}
dataset_dirs = {TRAINDIR, VALDIR, TESTDIR}  # Not named `datasets`, to avoid clashing with torchvision.datasets

for dataset_dir in dataset_dirs:
    for classdir in glob(f"{dataset_dir}/*"):  # Find the class subfolders
        if os.path.basename(classdir) not in valid_classes:  # Only delete classes we are not keeping
            print(f"Deleting {classdir}...")
            for fname in glob(f"{classdir}/*.jpg"):  # Remove each image file
                os.remove(fname)
            os.rmdir(classdir)  # Remove the now-empty folder

Deleting /content/evaluation/Vegetable-Fruit...
Deleting /content/evaluation/Seafood...
Deleting /content/evaluation/Noodles-Pasta...
Deleting /content/evaluation/Rice...
Deleting /content/evaluation/Soup...
Deleting /content/validation/Vegetable-Fruit...
Deleting /content/validation/Seafood...
Deleting /content/validation/Noodles-Pasta...
Deleting /content/validation/Rice...
Deleting /content/validation/Soup...
Deleting /content/training/Vegetable-Fruit...
Deleting /content/training/Seafood...
Deleting /content/training/Noodles-Pasta...
Deleting /content/training/Rice...
Deleting /content/training/Soup...
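
The loop above deletes only `.jpg` files before calling `os.rmdir`, which raises an error if a class folder holds anything else. A possible alternative, sketched below, removes each unwanted folder in one call with `shutil.rmtree`. The function name `keep_only_classes` is hypothetical, and the demo runs on a temporary folder rather than on the real dataset.

```python
import shutil
import tempfile
from pathlib import Path


def keep_only_classes(split_dir, keep):
    """Delete every class subfolder of split_dir whose name is not in keep."""
    removed = []
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir() and class_dir.name not in keep:
            shutil.rmtree(class_dir)  # Removes the folder and everything inside it
            removed.append(class_dir.name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    for class_name in ["Bread", "Rice", "Soup"]:
        class_dir = Path(tmp) / class_name
        class_dir.mkdir()
        (class_dir / "0.jpg").touch()
        (class_dir / "notes.txt").touch()  # A non-image file that would make os.rmdir fail
    removed = keep_only_classes(tmp, keep={"Bread"})
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(removed, remaining)
```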


# Processing images from files
This image dataset is large and its images are of good resolution, but they come in different sizes and aspect ratios.

Next, we load the images, resize them all to the same size, and convert them to tensors.

In [4]:
from torchvision import datasets, transforms
from torchvision.transforms import ToTensor
from torch.utils.data import DataLoader

In [5]:
image_size = 32
batch_size = 64

transformations = {
    "train": transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.ToTensor()
    ]),
    "val": transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.ToTensor()
    ]),
    "test": transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.ToTensor()
    ])
}

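data_dirs = {"train": TRAINDIR, "val": VALDIR, "test": TESTDIR}

imageFolders = {
    _set: datasets.ImageFolder(data_dirs[_set], transform=transformations[_set])
    for _set in ["train", "val", "test"]
}

dataLoaders = {
    # Shuffling is only needed for training; validation and test do not need it
    _set: DataLoader(imageFolders[_set], batch_size=batch_size, shuffle=(_set == "train"))
    for _set in ["train", "val", "test"]
}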

In [6]:
imageFolders

{'train': Dataset ImageFolder
     Number of datapoints: 6082
     Root location: /content/training
     StandardTransform
 Transform: Compose(
                Resize(size=(32, 32), interpolation=bilinear, max_size=None, antialias=None)
                ToTensor()
            ), 'val': Dataset ImageFolder
     Number of datapoints: 2108
     Root location: /content/validation
     StandardTransform
 Transform: Compose(
                Resize(size=(32, 32), interpolation=bilinear, max_size=None, antialias=None)
                ToTensor()
            ), 'test': Dataset ImageFolder
     Number of datapoints: 2070
     Root location: /content/evaluation
     StandardTransform
 Transform: Compose(
                Resize(size=(32, 32), interpolation=bilinear, max_size=None, antialias=None)
                ToTensor()
            )}
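
With `batch_size = 64`, the dataset sizes printed above determine how many batches each `DataLoader` yields per epoch. The arithmetic can be sketched as follows, assuming the default `drop_last=False`, so the final, smaller batch is kept. `batches_per_epoch` is a hypothetical helper.

```python
import math


def batches_per_epoch(num_samples, batch_size):
    """Return (number of batches, size of the last batch) when the last partial batch is kept."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch


sizes = {"train": 6082, "val": 2108, "test": 2070}  # Datapoints reported by ImageFolder
summary = {name: batches_per_epoch(n, 64) for name, n in sizes.items()}
print(summary)
# {'train': (96, 2), 'val': (33, 60), 'test': (33, 22)}
```

Because the last batch is smaller, averaging the per-batch losses over the number of batches gives the images in that batch more weight than the rest.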

# Building the network

In [7]:
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'
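
Computing `device` does not move anything by itself: the model and every input batch have to be sent to it explicitly. A minimal sketch of that pattern follows. `toy_model` is a hypothetical stand-in with the same input and output shapes as the network in this notebook, not the network itself.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-in model: flattens a 3x32x32 image and maps it to 6 class scores
toy_model = nn.Sequential(nn.Flatten(start_dim=1), nn.Linear(3 * 32 * 32, 6)).to(device)

# Fake batch of 4 RGB images, created on the CPU like the batches a DataLoader yields
batch = torch.rand(4, 3, 32, 32)

with torch.no_grad():
    scores = toy_model(batch.to(device))  # Inputs must live on the same device as the weights

print(scores.shape, scores.device.type)
```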

In [8]:
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

In [9]:
class DeepLearningNet(nn.Module):

    def __init__(self):
        super().__init__()
        # Input size: (BatchSize, 3, 32, 32), channels first
        self.features = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3,3)),   # (BatchSize, 32, 30, 30)
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2,2))                                 # (BatchSize, 32, 15, 15)
        )
        self.flatten = nn.Flatten(start_dim=1)  # dim=0 is the batch; flatten from dim=1 so each image becomes one vector
        self.classifier = nn.Sequential(                                    # (BatchSize, 7200)
            nn.Linear(in_features=7200, out_features=128),                  # (BatchSize, 128)
            nn.ReLU(),
            nn.Dropout(p=0.4),
            nn.Linear(in_features=128, out_features=6),                     # (BatchSize, 6)
            nn.Softmax(dim=1)   # Class probabilities. Note: nn.CrossEntropyLoss expects raw scores (it applies log-softmax itself)
        )

    def forward(self, x):
        output = self.features(x)
        output = self.flatten(output)
        output = self.classifier(output)

        return output

In [10]:
model = DeepLearningNet()
model

DeepLearningNet(
  (features): Sequential(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
  )
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (classifier): Sequential(
    (0): Linear(in_features=7200, out_features=128, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.4, inplace=False)
    (3): Linear(in_features=128, out_features=6, bias=True)
    (4): Softmax(dim=1)
  )
)
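
The `in_features=7200` of the first linear layer follows from the output shapes of the convolution and pooling layers. The sketch below works through that arithmetic with the standard output-size formula for a window without padding or dilation. The helper name `out_size` is hypothetical.

```python
def out_size(size, kernel, stride=1):
    """Spatial size after a conv/pool window with no padding and no dilation."""
    return (size - kernel) // stride + 1


side = 32                                   # Input images are 32x32
side = out_size(side, kernel=3)             # Conv2d(kernel_size=3, stride=1) -> 30
side = out_size(side, kernel=2, stride=2)   # MaxPool2d(kernel_size=2, stride=2) -> 15
flat_features = 32 * side * side            # 32 channels * 15 * 15

print(side, flat_features)  # 15 7200
```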

In [11]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
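
`nn.CrossEntropyLoss` applies log-softmax to its input internally, so it is meant to receive raw, unnormalized scores. When a model already ends in `nn.Softmax`, the loss only ever sees values in [0, 1] and cannot fall below a floor well above zero. The pure-Python sketch below reproduces that computation for a single sample with 6 classes. `softmax` and `cross_entropy` are hypothetical helpers that mirror the documented behavior; they are not PyTorch code.

```python
import math


def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, target):
    """Negative log of the softmax probability assigned to the target class."""
    return -math.log(softmax(scores)[target])


raw_scores = [10.0, 0.0, 0.0, 0.0, 0.0, 0.0]    # A confident, correct prediction for class 0
loss_from_scores = cross_entropy(raw_scores, target=0)

perfect_probs = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # Best possible output of a final Softmax layer
loss_from_probs = cross_entropy(perfect_probs, target=0)

print(f"{loss_from_scores:.4f} {loss_from_probs:.4f}")
```

With probabilities as input, even a perfect prediction yields ln(1 + 5/e) ≈ 1.04, while a uniform output yields ln 6 ≈ 1.79, so the loss is confined to that narrow band. Removing the final `nn.Softmax` and applying it only when probabilities are needed for display would let the loss approach zero.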

# Training

In [12]:
img, label = next(iter(dataLoaders['train']))
img.shape, label

(torch.Size([64, 3, 32, 32]),
 tensor([2, 5, 5, 2, 4, 3, 0, 1, 3, 0, 5, 2, 1, 3, 5, 3, 2, 4, 0, 0, 3, 3, 2, 2,
         4, 0, 2, 0, 5, 5, 5, 2, 4, 0, 2, 2, 5, 0, 0, 5, 2, 3, 5, 5, 0, 5, 2, 5,
         2, 2, 3, 1, 0, 1, 4, 4, 3, 3, 2, 3, 4, 4, 2, 3]))

In [13]:
probs = model(img)  # The network ends in Softmax, so each row is a probability distribution that sums to 1
probs

tensor([[0.1931, 0.1273, 0.1729, 0.1893, 0.1351, 0.1822],
        [0.1847, 0.1519, 0.1728, 0.1617, 0.1418, 0.1872],
        [0.1817, 0.1309, 0.1683, 0.1753, 0.1675, 0.1763],
        [0.1931, 0.1446, 0.1635, 0.1519, 0.1422, 0.2048],
        [0.1847, 0.1486, 0.1723, 0.1645, 0.1532, 0.1768],
        [0.1891, 0.1425, 0.1754, 0.1712, 0.1422, 0.1796],
        [0.1925, 0.1364, 0.1511, 0.1785, 0.1566, 0.1850],
        [0.1805, 0.1406, 0.1775, 0.1744, 0.1432, 0.1837],
        [0.1793, 0.1438, 0.1778, 0.1646, 0.1592, 0.1754],
        [0.1825, 0.1380, 0.1683, 0.1709, 0.1508, 0.1895],
        [0.1962, 0.1365, 0.1682, 0.1834, 0.1433, 0.1724],
        [0.1845, 0.1543, 0.1677, 0.1675, 0.1437, 0.1823],
        [0.1808, 0.1349, 0.1738, 0.1796, 0.1556, 0.1753],
        [0.1720, 0.1374, 0.1751, 0.1877, 0.1543, 0.1736],
        [0.1775, 0.1494, 0.1699, 0.1662, 0.1523, 0.1847],
        [0.1680, 0.1353, 0.1868, 0.1751, 0.1642, 0.1707],
        [0.1800, 0.1573, 0.1636, 0.1701, 0.1502, 0.1788],
        [0.179

In [14]:
import copy

num_epochs = 20
patience = 7  # Number of epochs with no improvement after which training will be stopped
early_stopping = False
best_loss = float('inf')
epochs_with_no_improvement = 0

for epoch in range(num_epochs):

    epoch_results = {}

    for phase in ["train", "val"]:
        # This sets the execution mode and informs layers (e.g., Dropout, BatchNorm) designed to behave differently during training and evaluation
        if phase == "train":
            model.train()
        else:
            model.eval()

        running_loss = 0.0
        correct_in_dataset = 0
        
        # Iterate over batches; parameters are only updated in the training phase
        for inputs, labels in dataLoaders[phase]:
            optimizer.zero_grad()               # Reset the gradients accumulated in the previous step

            # Track gradients only while training; validation needs no autograd graph
            with torch.set_grad_enabled(phase == "train"):
                outputs = model(inputs)
                loss = criterion(outputs, labels)

                if phase == "train":
                    loss.backward()             # Compute d(loss)/d(param) for every parameter with requires_grad=True
                    optimizer.step()            # Update the parameters using those gradients

            running_loss += loss.item()

            predicted = outputs.argmax(dim=1)   # Index of the highest score = predicted class
            correct_in_dataset += (predicted == labels).sum().item()
        
        if phase == "train":
            epoch_results['train_loss'] = running_loss/len(dataLoaders[phase])
            epoch_results['train_accuracy'] = correct_in_dataset/len(imageFolders[phase])

        if phase == "val":
            epoch_results['val_loss'] = running_loss/len(dataLoaders[phase])
            epoch_results['val_accuracy'] = correct_in_dataset/len(imageFolders[phase])
            
            if epoch_results['val_loss'] < best_loss:
                best_loss = epoch_results['val_loss']
                best_model = copy.deepcopy(model.state_dict())
                epochs_with_no_improvement = 0
            else:
                epochs_with_no_improvement += 1

            if epochs_with_no_improvement == patience:
                model.load_state_dict(best_model)
                early_stopping = True

    print(f"Epoch {epoch+1}/{num_epochs} | loss: {epoch_results['train_loss']:.4} - accuracy: {epoch_results['train_accuracy']:.4} - val_loss: {epoch_results['val_loss']:.4} - val_accuracy: {epoch_results['val_accuracy']:.4}")

    if early_stopping:
        print('Early stopping!')
        break

Epoch 1/20 | loss: 1.722 - accuracy: 0.2964 - val_loss: 1.68 - val_accuracy: 0.3344
Epoch 2/20 | loss: 1.669 - accuracy: 0.3589 - val_loss: 1.634 - val_accuracy: 0.3952
Epoch 3/20 | loss: 1.638 - accuracy: 0.3889 - val_loss: 1.627 - val_accuracy: 0.3961
Epoch 4/20 | loss: 1.626 - accuracy: 0.4025 - val_loss: 1.638 - val_accuracy: 0.3866
Epoch 5/20 | loss: 1.617 - accuracy: 0.4155 - val_loss: 1.616 - val_accuracy: 0.4122
Epoch 6/20 | loss: 1.608 - accuracy: 0.4308 - val_loss: 1.617 - val_accuracy: 0.4099
Epoch 7/20 | loss: 1.594 - accuracy: 0.442 - val_loss: 1.603 - val_accuracy: 0.4303
Epoch 8/20 | loss: 1.577 - accuracy: 0.4558 - val_loss: 1.601 - val_accuracy: 0.4213
Epoch 9/20 | loss: 1.572 - accuracy: 0.4661 - val_loss: 1.599 - val_accuracy: 0.425
Epoch 10/20 | loss: 1.56 - accuracy: 0.4873 - val_loss: 1.6 - val_accuracy: 0.4265
Epoch 11/20 | loss: 1.55 - accuracy: 0.4961 - val_loss: 1.587 - val_accuracy: 0.4478
Epoch 12/20 | loss: 1.54 - accuracy: 0.5013 - val_loss: 1.588 - val_ac
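
The patience logic in the loop above can be isolated into a small helper, which makes it easier to test on its own. This is a sketch under stated assumptions: the class name `EarlyStopper` is hypothetical, and the losses in the demo are made-up values, not results from this notebook.

```python
class EarlyStopper:
    """Signal a stop after `patience` consecutive epochs without a lower validation loss."""

    def __init__(self, patience):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_with_no_improvement = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_with_no_improvement = 0
        else:
            self.epochs_with_no_improvement += 1
        return self.epochs_with_no_improvement >= self.patience


stopper = EarlyStopper(patience=2)
illustrative_losses = [1.0, 0.9, 0.95, 0.92, 0.8]  # Made-up values for the demo
stopped_at = None
for epoch, val_loss in enumerate(illustrative_losses, start=1):
    if stopper.step(val_loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss)
```

In the demo, epochs 3 and 4 fail to beat the best loss of 0.9, so training stops at epoch 4 and the later value 0.8 is never reached. That is the trade-off a small `patience` makes.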

# Evaluation

In [15]:
correct = 0

model.eval()  # Make sure Dropout is disabled during evaluation
with torch.no_grad():
    for images, labels in dataLoaders['test']:
        outputs = model(images)
        predicted = outputs.argmax(dim=1)  # Index of the highest score = predicted class
        correct += (predicted == labels).sum().item()

print(f"Accuracy: {correct / len(imageFolders['test']):.2%}")

Accuracy: 45.60%
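
Overall accuracy hides which classes the model gets wrong. A per-class breakdown can be computed from the true and predicted labels. The sketch below uses plain lists with made-up labels, and `per_class_accuracy` is a hypothetical helper.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return {class_index: fraction of that class's samples predicted correctly}."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in sorted(totals)}


true_labels = [0, 0, 1, 1, 1, 2]        # Made-up labels, not results from the notebook
predicted_labels = [0, 1, 1, 1, 0, 2]
accuracy_by_class = per_class_accuracy(true_labels, predicted_labels)
print(accuracy_by_class)
```

In the evaluation loop, the two lists could be built by extending them with `labels.tolist()` and `predicted.tolist()` for each batch.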
