In [1]:
!pip install thop

Collecting thop
  Downloading thop-0.1.1.post2209072238-py3-none-any.whl.metadata (2.7 kB)
Downloading thop-0.1.1.post2209072238-py3-none-any.whl (15 kB)
Installing collected packages: thop
Successfully installed thop-0.1.1.post2209072238


MobileNet V2 Transfer Learning for Hand Gesture Recognition

=======================================================



This module implements a transfer learning approach for hand gesture recognition. It fine-tunes a MobileNet V2 model, pre-trained on ImageNet, on ASL hand gesture data to improve classification performance.



Project Overview

--------------

Part of the Computer Vision Master's Project at UC3M (Universidad Carlos III de Madrid)  

Date: 30/11/2024  

Version: 1.0



Main Features

-----------

* Transfer learning implementation using MobileNet V2

* Dataset handling for hand gesture images from Kaggle and from a custom webcam dataset

* Fine-tuning capabilities with configurable parameters

* Training progress monitoring and visualization

* Model performance evaluation

* Checkpoint saving and loading

* Learning rate scheduling



Technical Architecture

-------------------

1. Model Architecture:

   - Base: MobileNet V2 pre-trained on ImageNet

   - Modified classifier head for gesture classes

   - Frozen feature extraction layers

   - Fine-tuned top layers



2. Training Pipeline:

   - Custom dataset loading and preprocessing

   - Data augmentation

   - Transfer learning optimization

   - Learning rate scheduling

   - Model checkpointing



3. Evaluation Components:

   - Training/validation loss tracking

   - Accuracy metrics

   - Confusion matrix generation

   - Performance visualization
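The frozen-backbone / fine-tuned-top-layers split can be illustrated with a tiny stand-in network. This is a minimal sketch, not the notebook's `ASLModel`: the toy convolutional backbone and the helper `count_trainable` are hypothetical. It mirrors the two training phases (see `history_phase1` and `history_phase2` in the evaluation code): phase 1 trains only the head, phase 2 unfreezes the last N parameter tensors of the backbone, as `ASLModel.unfreeze_layers` does.

```python
import torch
import torch.nn as nn


def count_trainable(module: nn.Module) -> int:
    """Number of scalar parameters that will receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Hypothetical stand-in for a pretrained backbone plus a new classifier head.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
)
head = nn.Linear(16, 29)
model = nn.ModuleDict({"backbone": backbone, "head": head})

# Phase 1: freeze every backbone parameter, train only the head.
for p in backbone.parameters():
    p.requires_grad = False
phase1_trainable = count_trainable(model)

# Phase 2: unfreeze the last N parameter *tensors* of the backbone
# (N=2 here: the last conv's weight and bias).
for p in list(backbone.parameters())[-2:]:
    p.requires_grad = True
phase2_trainable = count_trainable(model)

# Hand only the trainable parameters to the optimizer.
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
print(phase1_trainable, phase2_trainable)  # 493 1661
```

The head has 16 x 29 + 29 = 493 parameters; unfreezing the last convolution adds 16 x 8 x 3 x 3 + 16 = 1168.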



Dependencies

-----------

* PyTorch >= 1.9.0: Deep learning framework

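* torchvision >= 0.10.0: Pre-trained vision models and utilities

* OpenCV (cv2): Image loading and color conversion

* pandas: Tabular index of image paths and labels

* scikit-learn: Confusion matrix and classification report

* NumPy >= 1.19.0: Numerical computations

* Matplotlib >= 3.3.0 and seaborn: Visualization

* PIL: Image processing

* tqdm: Progress tracking

* thop, torchviz, graphviz, torchsummary, torchmetrics: Model analysis and visualization utilities imported by the notebook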



Input Requirements

----------------

* Dataset Structure:

    - Root directory containing class subdirectories

    - Images organized by gesture classes

    - Supported formats: JPG, PNG  

    - Datasets:

        - Dataset 1: Custom dataset created from webcam data 

        - Dataset 2: [ASL Alphabet data](https://www.kaggle.com/datasets/grassknoted/asl-alphabet/data) from Kaggle

        - Dataset 3: Custom dataset based on the combination of datasets 1 and 2. 

* Training Configuration:

    - Batch size

    - Learning rate

    - Number of epochs

    - Device selection (CPU/GPU)



Output

------

* Trained Model:

    - Saved model checkpoints

    - Best model weights

    - Training state

* Performance Metrics:

    - Training/validation loss curves

    - Accuracy plots

    - Confusion matrix

    - Per-class performance metrics
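The per-class metrics can be read directly off the confusion matrix. A minimal NumPy sketch, assuming scikit-learn's convention (rows are true labels, columns are predictions); it should agree with the precision, recall and F1 that `classification_report` produces for the same labels. The 3-class matrix is made-up illustration data.

```python
import numpy as np


def per_class_metrics(cm):
    """Precision, recall and F1 per class from a confusion matrix.

    Rows are true labels and columns are predictions (scikit-learn's convention).
    Classes with a zero denominator get a score of 0.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how often each class truly occurs
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1


cm = np.array([[8, 2, 0],
               [1, 9, 0],
               [0, 3, 7]])
precision, recall, f1 = per_class_metrics(cm)
print(np.round(precision, 3), np.round(recall, 3), np.round(f1, 3))
```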



Training Parameters

-----------------

* BATCH_SIZE: Mini-batch size for training

* LEARNING_RATE: Initial learning rate

* NUM_EPOCHS: Total training epochs

* WEIGHT_DECAY: L2 regularization factor

* NUM_CLASSES: Number of gesture classes

* CHECKPOINT_DIR: Directory for saving models
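These parameters could be grouped into one immutable config object. A minimal sketch: the class name `TrainingConfig` is hypothetical, and every default except `batch_size` (the `create_data_loaders` default), `num_classes` (the `ASLModel` default, matching the 29 class folders) and `checkpoint_dir` (the `checkpoints` subfolder used by `train_one_phase`) is an illustrative placeholder, not a value taken from the notebook.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the training parameters listed above."""
    batch_size: int = 32             # create_data_loaders default
    learning_rate: float = 1e-3      # placeholder value
    num_epochs: int = 10             # placeholder value
    weight_decay: float = 1e-4       # placeholder value
    num_classes: int = 29            # ASLModel default
    checkpoint_dir: str = "checkpoints"


# Override only what differs from the defaults; frozen=True blocks later edits.
config = TrainingConfig(num_epochs=20)
print(asdict(config))
```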



Model Architecture Details

------------------------

* Base Model: MobileNet V2

* Input Size: 224x224x3

* Feature Extraction: Pre-trained weights

* Classifier Head: Custom fully connected layers

* Output: Raw class logits (softmax is applied implicitly by the cross-entropy loss, or explicitly at inference when probabilities are needed)
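The tensor shapes through this architecture can be checked with plain NumPy. This sketch assumes MobileNet V2's overall stride of 32 (so a 224x224 input yields a 7x7 feature map with 1280 channels) and uses random weights in place of the real layers; batch norm, ReLU and dropout are left out because they do not change shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, num_features, dense_units, num_classes = 4, 1280, 256, 29

# Stand-in for the backbone output: (N, C, H, W) with H = W = 224 // 32 = 7.
features = rng.standard_normal((batch, num_features, 224 // 32, 224 // 32))

# Global average pooling + flatten: average over H and W -> (N, 1280).
pooled = features.mean(axis=(2, 3))

# Dense blocks: 1280 -> 2 * dense_units -> dense_units -> num_classes.
w1 = rng.standard_normal((num_features, dense_units * 2))
w2 = rng.standard_normal((dense_units * 2, dense_units))
w3 = rng.standard_normal((dense_units, num_classes))
logits = pooled @ w1 @ w2 @ w3

print(features.shape, pooled.shape, logits.shape)
# (4, 1280, 7, 7) (4, 1280) (4, 29)
```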



Training Process

--------------

1. Data Preparation:

   - Image resizing and normalization

   - Data augmentation (random transforms)

   - Batch creation



2. Training Loop:

   - Forward pass

   - Loss computation

   - Backpropagation

   - Optimizer step

   - Learning rate adjustment



3. Validation:

   - Model evaluation

   - Metric computation

   - Best model saving



Performance Considerations

------------------------

* GPU Requirements:

    - Recommended: NVIDIA GPU with 6GB+ VRAM

    - CUDA support required for GPU training

* Training Time:

    - Varies with dataset size and epochs

    - GPU training significantly faster

* Memory Usage:

    - Depends on batch size

    - Typical range: 4-8GB RAM



Notes

-----

* Pre-trained weights significantly reduce training time

* Data augmentation crucial for generalization

* Regular checkpointing prevents losing training progress

* Monitor validation metrics for overfitting
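`train_one_phase` tracks `best_val_loss` and `patience_counter` to checkpoint the best model and watch for overfitting. The same patience logic, isolated as a small class, might look like this; the class name `EarlyStopping` and the `min_delta` option are hypothetical additions, not part of the notebook.

```python
class EarlyStopping:
    """Patience-based early stopping on a metric that should decrease (e.g. val loss)."""

    def __init__(self, patience: int, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.counter = 0

    def step(self, value: float) -> bool:
        """Record one epoch's value; return True if training should stop."""
        if value < self.best - self.min_delta:
            self.best = value   # improvement: this is where the best checkpoint is saved
            self.counter = 0
        else:
            self.counter += 1   # no improvement this epoch
        return self.counter >= self.patience


stopper = EarlyStopping(patience=2)
val_losses = [0.90, 0.70, 0.72, 0.71, 0.65]
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        print(f"stopping after epoch {epoch}, best={stopper.best}")
        break
```

With a patience of 2, the two non-improving epochs after 0.70 trigger the stop at epoch 4, before the later 0.65 is seen; a larger patience trades extra epochs for a chance to escape such plateaus.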



References

---------

1. EfficientNet Paper: https://arxiv.org/abs/1905.11946

2. PyTorch Documentation: https://pytorch.org/docs/stable/index.html

3. Transfer Learning Guide: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html


In [2]:
!pip install torchviz

Collecting torchviz
  Downloading torchviz-0.0.3-py3-none-any.whl.metadata (2.1 kB)
Downloading torchviz-0.0.3-py3-none-any.whl (5.7 kB)
Installing collected packages: torchviz
Successfully installed torchviz-0.0.3


In [3]:
!pip install torchsummary

Collecting torchsummary
  Downloading torchsummary-1.5.1-py3-none-any.whl.metadata (296 bytes)
Downloading torchsummary-1.5.1-py3-none-any.whl (2.8 kB)
Installing collected packages: torchsummary
Successfully installed torchsummary-1.5.1


In [4]:
import thop
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchmetrics  # conda install -c conda-forge torchmetrics
import torchvision.models as models
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
from torch.optim import lr_scheduler

import os
import cv2
import pandas as pd
import numpy as np
from datetime import datetime
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from tqdm import tqdm
import time
import json
import torchviz
import graphviz
from torchsummary import summary

In [5]:
DATA_TYPE = 'data3'



if DATA_TYPE == 'data2':

    DATA_PATH = os.path.join('/kaggle/input','asl-alphabet','asl_alphabet_train','asl_alphabet_train')

elif DATA_TYPE == 'data3':

    DATA_PATH = os.path.join('/kaggle/input','unified-data','unified_data','unified_data','unified_data')

else:

    raise ValueError(f"Data {DATA_TYPE} not found.")



# elif DATA_TYPE == 'data1':  # Not used

#     DATA_PATH = os.path.join('..', '..', 'data', 'webcam_data','unified_data')

In [6]:
# Function to create the DataFrame from the dataset
# The resulting DataFrame is saved to asl_dataset_info.csv (data2) or unified_data_dataset_info.csv (data3)

def create_dataframe(data_path):
    """
    Create a DataFrame with image paths and their labels.
    
    Args:
        data_path (str): Path to the directory containing the numbered class folders (e.g. 0-28)
    
    Returns:
        pd.DataFrame: DataFrame with columns ['Filepaths', 'Labels', 'Label_idx', 'Image_size']
    """
    # Convert to a Path object for easier path handling
    data_path = Path(data_path)
    
    if not data_path.exists():
        raise ValueError(f"Directory {data_path} does not exist")
    
    # Lists to store the collected data
    filepaths = []
    labels = []
    label_indices = []
    img_sizes = []
    
    # Get all class folders and sort them numerically
    folders = sorted([f for f in data_path.iterdir() if f.is_dir()], 
                    key=lambda x: int(x.name))
    
    print("Creating DataFrame...")
    # Use tqdm to show progress
    for folder in tqdm(folders, desc="Processing folders"):
        label_idx = int(folder.name)
        
        # Get all images in the folder
        valid_extensions = {'.jpg', '.jpeg', '.png'}
        images = [f for f in folder.iterdir() 
                 if f.suffix.lower() in valid_extensions]
        
        for img_path in images:
            # Check that the image can be read
            try:
                img = cv2.imread(str(img_path))
                if img is None:
                    print(f"Warning: could not read {img_path}")
                    continue
                
                height, width = img.shape[:2]
                
                filepaths.append(str(img_path))
                labels.append(folder.name)
                label_indices.append(label_idx)
                img_sizes.append((width, height))
                
            except Exception as e:
                print(f"Error processing {img_path}: {str(e)}")
    
    # Create DataFrame
    df = pd.DataFrame({
        'Filepaths': filepaths,
        'Labels': labels,
        'Label_idx': label_indices,
        'Image_size': img_sizes
    })
    
    # Show dataset information
    print("\nDataset summary:")
    print(f"Total images: {len(df)}")
    print(f"Number of classes: {len(df['Labels'].unique())}")
    print("\nClass distribution:")
    print(df['Labels'].value_counts().sort_index())
    
    # Check class balance
    min_samples = df['Labels'].value_counts().min()
    max_samples = df['Labels'].value_counts().max()
    print(f"\nMinimum samples per class: {min_samples}")
    print(f"Maximum samples per class: {max_samples}")
    
    # Check image sizes
    sizes = pd.DataFrame(df['Image_size'].tolist(), columns=['width', 'height'])
    print("\nImage sizes:")
    print(f"Minimum: {sizes.min().values}")
    print(f"Maximum: {sizes.max().values}")
    print(f"Mode: {sizes.mode().iloc[0].values}")
    
    return df

try:
    # Images in 'data/...'  
    df = create_dataframe(DATA_PATH)
    
    # Save the DataFrame of image paths and labels
    if DATA_TYPE == 'data2':
        df.to_csv('asl_dataset_info.csv', index=False)
    elif DATA_TYPE == 'data3':
        df.to_csv('unified_data_dataset_info.csv', index=False)

    print("\nFirst rows of the DataFrame:")
    print(df.head())
    
except Exception as e:
    print(f"Error: {str(e)}")

Creating DataFrame...


Processing folders:  14%|█▍        | 4/29 [01:52<11:42, 28.10s/it]

Warning: could not read /kaggle/input/unified-data/unified_data/unified_data/unified_data/4/E1423_vision_data_asl_alphabet_train.jpg


Processing folders: 100%|██████████| 29/29 [13:56<00:00, 28.83s/it]



Dataset summary:
Total images: 98005
Number of classes: 29

Class distribution:
Labels
0     3400
1     3400
10    3400
11    3400
12    3400
13    3400
14    3400
15    3400
16    3400
17    3400
18    3400
19    3400
2     3400
20    3400
21    3400
22    3400
23    3400
24    3400
25    3400
26    3202
27    3202
28    3202
3     3400
4     3399
5     3400
6     3400
7     3400
8     3400
9     3400
Name: count, dtype: int64

Minimum samples per class: 3202
Maximum samples per class: 3400

Image sizes:
Minimum: [200 200]
Maximum: [640 480]
Mode: [200 200]

First rows of the DataFrame:
                                           Filepaths Labels  Label_idx  \
0  /kaggle/input/unified-data/unified_data/unifie...      0          0   
1  /kaggle/input/unified-data/unified_data/unifie...      0          0   
2  /kaggle/input/unified-data/unified_data/unifie...      0          0   
3  /kaggle/input/unified-data/unified_data/unifie...      0          0   
4  /kaggle/
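Building the index took roughly 14 minutes in the run above, largely because `cv2.imread` decodes every pixel just to read an image's size. A possible speed-up, assuming Pillow is installed: `PIL.Image.open` is lazy and reads only the header, so `.size` is available without decoding, and `verify()` performs a cheap integrity check (it catches many, but not all, corrupt files). The generator name `scan_images` is hypothetical; its rows use the same column order as `create_dataframe`.

```python
import os
import tempfile
from pathlib import Path

from PIL import Image, UnidentifiedImageError

VALID_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def scan_images(data_path):
    """Yield (filepath, label, label_idx, (width, height)) for readable images.

    Assumes integer-named class folders, as in create_dataframe.
    """
    folders = sorted((f for f in Path(data_path).iterdir() if f.is_dir()),
                     key=lambda f: int(f.name))
    for folder in folders:
        for img_path in sorted(folder.iterdir()):
            if img_path.suffix.lower() not in VALID_EXTENSIONS:
                continue
            try:
                with Image.open(img_path) as img:  # lazy: parses the header only
                    size = img.size                # (width, height)
                    img.verify()                   # integrity check without a full decode
            except (UnidentifiedImageError, OSError, SyntaxError) as exc:
                print(f"Warning: could not read {img_path}: {exc}")
                continue
            yield str(img_path), folder.name, int(folder.name), size


# Usage demo on a throwaway folder tree: one valid image per class, one corrupt file.
with tempfile.TemporaryDirectory() as tmp:
    for label in ("0", "1"):
        os.makedirs(os.path.join(tmp, label))
        Image.new("RGB", (20, 10)).save(os.path.join(tmp, label, "a.png"))
    with open(os.path.join(tmp, "1", "bad.jpg"), "wb") as f:
        f.write(b"not an image")
    rows = list(scan_images(tmp))
print(len(rows), rows[0][1:])
```

The rows can be turned into the same table with `pd.DataFrame(rows, columns=['Filepaths', 'Labels', 'Label_idx', 'Image_size'])`.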

In [7]:
# Load DataFrame of images previously created

if DATA_TYPE == 'data2':

    df = pd.read_csv('/kaggle/working/asl_dataset_info.csv')

elif DATA_TYPE == 'data3':

    df = pd.read_csv('/kaggle/working/unified_data_dataset_info.csv')



print(df.head())

                                           Filepaths  Labels  Label_idx  \
0  /kaggle/input/unified-data/unified_data/unifie...       0          0   
1  /kaggle/input/unified-data/unified_data/unifie...       0          0   
2  /kaggle/input/unified-data/unified_data/unifie...       0          0   
3  /kaggle/input/unified-data/unified_data/unifie...       0          0   
4  /kaggle/input/unified-data/unified_data/unifie...       0          0   

   Image_size  
0  (200, 200)  
1  (200, 200)  
2  (200, 200)  
3  (200, 200)  
4  (200, 200)  


In [8]:
# Configure the device for training

def setup_device():

    if torch.cuda.is_available():

        device = torch.device('cuda')

        torch.backends.cudnn.benchmark = True  # Let cuDNN auto-tune convolution algorithms (input size is fixed)

        print(f"Using GPU: {torch.cuda.get_device_name(0)}")

        print(f"GPU memory available: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

    else:

        device = torch.device('cpu')

        print("GPU not available, using CPU")

    return device



DEVICE = setup_device()

Using GPU: Tesla P100-PCIE-16GB
GPU memory available: 17.06 GB


In [9]:
class ASLDataset(Dataset):

    """

    Custom Dataset for loading ASL (American Sign Language) images.

    

    This dataset class handles loading and preprocessing of ASL hand gesture images.

    Any data augmentation is applied on the fly through the optional `transform`.

    

    Attributes:

        dataframe (pd.DataFrame): DataFrame containing image paths ('Filepaths') and class indices ('Label_idx')

        transform (callable): Torchvision transforms for image preprocessing

        labels (np.ndarray): Categorical codes of the 'Labels' column (not used by __getitem__, which reads 'Label_idx')

    """



    def __init__(self, dataframe, transform=None):

        """

        Initialize the ASL Dataset.

        

        Args:

            dataframe (pd.DataFrame): DataFrame with columns ['Filepaths', 'Labels', 'Label_idx']

            transform (callable, optional): Transform applied to each RGB image (H x W x 3 uint8 array)

        """

        self.dataframe = dataframe

        self.transform = transform

        self.labels = pd.Categorical(dataframe['Labels']).codes

    

    def __len__(self):

        """Returns the total number of images in the dataset."""

        return len(self.dataframe)

    

    def __getitem__(self, idx):

        """

        Fetch and preprocess a single image item from the dataset.

        

        Args:

            idx (int): Index of the image to fetch

            

        Returns:

            tuple: (image, label) where image is the preprocessed tensor

                  and label is the corresponding class index

        """

        img_path = self.dataframe.iloc[idx]['Filepaths']

        label = self.dataframe.iloc[idx]['Label_idx']  # Must be an integer class index

        

        try:

            # Read and preprocess image

            image = cv2.imread(img_path)

            if image is None:

                raise ValueError(f"Image not found: {img_path}")

            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

            

            if self.transform:

                image = self.transform(image)



            # Convert the label to a long tensor, as required by cross-entropy

            label = torch.tensor(int(label), dtype=torch.long)

            

            return image, label

    

        except Exception as e:

            print(f"Error loading image {img_path}: {str(e)}")

            # Return an all-black placeholder image (the original label is kept)

            if self.transform:

                dummy_image = torch.zeros((3, 224, 224))

            else:

                dummy_image = np.zeros((224, 224, 3))

            return dummy_image, label
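As written, a failed read returns an all-zero image together with the original label, so that black image is trained on as if it were the gesture. One alternative, sketched under the assumption that `__getitem__` is changed to return `None` on failure (which the notebook does not do), is a `collate_fn` that drops those samples before batching. `skip_none_collate` and `FlakyDataset` are hypothetical names; `torch.utils.data.default_collate` is PyTorch's standard batching function.

```python
import torch
from torch.utils.data import DataLoader, Dataset, default_collate


def skip_none_collate(batch):
    """Drop samples returned as None, then batch the rest with default_collate."""
    batch = [sample for sample in batch if sample is not None]
    if not batch:
        return None  # the training loop must skip an entirely empty batch
    return default_collate(batch)


class FlakyDataset(Dataset):
    """Hypothetical dataset where every third item 'fails to load'."""

    def __len__(self):
        return 9

    def __getitem__(self, idx):
        if idx % 3 == 0:
            return None
        return torch.zeros(3, 4, 4), torch.tensor(idx, dtype=torch.long)


loader = DataLoader(FlakyDataset(), batch_size=3, collate_fn=skip_none_collate)
for images, labels in loader:
    print(images.shape, labels.tolist())
```

Batches then shrink slightly instead of containing mislabeled black images; logging the skipped paths keeps the problem visible.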




In [10]:
class ASLModel(nn.Module):

    """

    Custom Neural Network model for ASL gesture recognition using transfer learning.

    

    This model uses MobileNetV2 as the backbone with custom classification layers.

    The architecture is designed to balance accuracy and computational efficiency.

    

    Architecture Overview:

    ---------------------

    1. MobileNetV2 backbone (pretrained on ImageNet)

    2. Custom dense layers with batch normalization

    3. Dropout for regularization

    4. Linear output layer producing class logits (softmax is not applied in forward)

    

    Attributes:

        base_model (nn.Module): Pretrained MobileNetV2 model

        classifier (nn.Sequential): Custom classification layers

        num_classes (int): Number of output classes

    """



    def __init__(self, num_classes=29, base_model_name='mobilenet_v2', dense_units=256, dropout_rate=0.5):

        """

        Initialize the ASL Model.

        

        Args:

            num_classes (int): Number of output classes (ASL gestures)

            base_model_name (str): Name of the pretrained model to use

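            dense_units (int): Units in the second dense block (the first block uses 2 * dense_units)

            dropout_rate (float): Dropout probability after the first dense block (the second block uses a fixed 0.3)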

        """

        super().__init__()

        

        # Load pretrained model 

        if base_model_name.lower() == 'mobilenet_v2':

            # Load MobileNet V2
            self.base_model = models.mobilenet_v2(pretrained=True)
            
            # Remove the last classification layer
            self.base_model = nn.Sequential(*list(self.base_model.children())[:-1])
            
            # MobileNet V2's final feature map has 1280 channels
            num_features = 1280

        else:

            raise ValueError(f"Model {base_model_name} not supported")

        

        self.global_pool = nn.AdaptiveAvgPool2d(1)

        

        # Custom classifier head

        self.dense_block1 = nn.Sequential(

            nn.Linear(num_features, dense_units*2, bias=False),

            nn.BatchNorm1d(dense_units*2),

            nn.ReLU(),

            nn.Dropout(dropout_rate)

        )

        

        self.dense_block2 = nn.Sequential(

            nn.Linear(dense_units*2, dense_units, bias=False),

            nn.BatchNorm1d(dense_units),

            nn.ReLU(),

            nn.Dropout(0.3)

        )

        

        # Final linear layer mapping to class logits

        self.classifier = nn.Linear(dense_units, num_classes)

        

        # Initialize weights

        self._initialize_weights()

        

        # Freeze the pretrained network

        self.freeze_base_model()

    

    def _initialize_weights(self):

        for m in self.modules():

            if isinstance(m, nn.Linear):

                nn.init.kaiming_normal_(m.weight)

                if m.bias is not None:

                    nn.init.constant_(m.bias, 0)

    

    def freeze_base_model(self):

        # Freeze all backbone parameters; only the new classifier head is trained

        for param in self.base_model.parameters():

            param.requires_grad = False

    

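    def unfreeze_layers(self, num_layers=30):

        # Unfreeze the last `num_layers` parameter tensors of the backbone.
        # This counts tensors (convolution weights, BatchNorm weights and biases), not whole layers.

        trainable_layers = list(self.base_model.parameters())[-num_layers:]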

        for param in trainable_layers:

            param.requires_grad = True



    def forward(self, x):

        """

        Forward pass through the network.

        

        Args:

            x (torch.Tensor): Input tensor of shape (batch_size, channels, height, width)

            

        Returns:

            torch.Tensor: Output predictions of shape (batch_size, num_classes)

        """



        # Base model features

        x = self.base_model(x)

        

        # Global pooling

        x = self.global_pool(x)

        x = torch.flatten(x, 1)

        

        # Dense blocks

        x = self.dense_block1(x)

        x = self.dense_block2(x)

        

        # Output layer: raw class logits

        x = self.classifier(x)

        # Softmax is not applied here: F.cross_entropy expects raw logits and applies log-softmax internally

        #out = F.softmax(x, dim=1)

        

        return x
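`forward` returns raw logits because `F.cross_entropy` applies log-softmax internally; feeding it softmax outputs would apply softmax twice. A small NumPy sketch of what the loss computes (with its default mean reduction), and of why `torch.max(outputs.data, 1)` can pick predictions straight from logits: softmax is monotonic, so the argmax is unchanged. The logits and labels are made-up values.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def cross_entropy(logits, labels):
    """Mean negative log-likelihood of the true classes, computed from raw logits."""
    log_probs = log_softmax(logits)
    return -log_probs[np.arange(len(labels)), labels].mean()


logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])

probs = np.exp(log_softmax(logits))
loss = cross_entropy(logits, labels)
same_argmax = (probs.argmax(axis=1) == logits.argmax(axis=1)).all()
print(round(float(loss), 4), same_argmax)
```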

In [11]:
def create_data_loaders(df, transform, batch_size=32, train_split=0.8, val_split=0.1):

    """

    Create train, validation, and test data loaders.

    

    Args:

        df (pd.DataFrame): DataFrame containing image paths and labels

        transform (callable): Torchvision transforms for image preprocessing

        batch_size (int): Batch size for data loaders

        train_split (float): Proportion of data used for training (default: 0.8)

        val_split (float): Proportion of data used for validation (default: 0.1)

        

    Returns:

        tuple: (train_loader, val_loader, test_loader)

    """

    dataset = ASLDataset(df, transform=transform)



    # Calculate sizes

    total_size = len(dataset)

    train_size = int(train_split * total_size)

    val_size = int(val_split * total_size)

    test_size = total_size - train_size - val_size



    # Create splits

    train_dataset, val_dataset, test_dataset = torch.utils.data.random_split(

        dataset,

        [train_size, val_size, test_size],

        generator=torch.Generator().manual_seed(42)

    )



    # Adjust workers according to your CPU cores (generally num_cores - 1)

    # num_workers = min(4, os.cpu_count() - 1) if os.cpu_count() > 1 else 0



    # Configure a common DataLoader for training, validation, and testing dataloaders

    dataloader_kwargs = {

        'batch_size': batch_size,

        'num_workers': 0,  # set to num_workers (see above) for parallel loading

        'pin_memory': torch.cuda.is_available(),

        'persistent_workers': False  # use num_workers > 0 when workers are enabled

    }



    train_loader = DataLoader(train_dataset, shuffle=True, **dataloader_kwargs)

    val_loader = DataLoader(val_dataset, shuffle=False, **dataloader_kwargs)

    test_loader = DataLoader(test_dataset, shuffle=False, **dataloader_kwargs)



    return train_loader, val_loader, test_loader
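The split sizes follow from truncating the train and validation fractions and giving the remainder to the test set. Applied to the 98,005 images indexed above, the arithmetic works out as below; `split_sizes` is a hypothetical helper that only reproduces the calculation in `create_data_loaders`.

```python
def split_sizes(total, train_split=0.8, val_split=0.1):
    """Reproduce the size arithmetic in create_data_loaders."""
    train_size = int(train_split * total)       # truncated, not rounded
    val_size = int(val_split * total)
    test_size = total - train_size - val_size   # remainder goes to test
    return train_size, val_size, test_size


print(split_sizes(98005))  # (78404, 9800, 9801)
```

Note that `random_split` is not stratified: with a fixed seed the split is reproducible, but the share of each class in the validation and test sets only approximates 10%.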

In [12]:
def generate_evaluation_metrics(model, test_loader, history_phase1, history_phase2, evaluation_path):

    """

    Generate and save comprehensive evaluation metrics for the model.

    

    This function creates various visualizations and metrics including:

    - Training/validation loss curves

    - Accuracy plots

    - Confusion matrix

    - Classification report

    - Per-class performance metrics

    

    Evaluation Components:

    ---------------------

    1. Model Performance Metrics:

        - Test Loss (Cross-Entropy)

        - Test Accuracy

        - Per-class Precision, Recall, and F1-score

    

    2. Visualizations:

        - Confusion Matrix: Shows prediction patterns across all classes

        - Training History Plots:

            * Loss curves (training and validation)

            * Accuracy curves (training and validation)

    

    3. Saved Outputs:

        - classification_metrics.csv: Detailed per-class metrics

        - training_history.json: Complete training history

        - confusion_matrix.png: Visual representation of model predictions

        - training_history.png: Learning curves from both training phases

    

    Args:

        model (nn.Module): Trained model to evaluate

        test_loader (DataLoader): DataLoader for test data

        history_phase1 (dict): Training history from phase 1

        history_phase2 (dict): Training history from phase 2

        evaluation_path (str): Directory to save evaluation results

        

    Returns:

        dict: A dictionary containing all evaluation metrics and history:

            {

                'training_history': {

                    'phase1': {train_losses, train_accuracies, val_losses, val_accuracies},

                    'phase2': {train_losses, train_accuracies, val_losses, val_accuracies}

                },

                'final_metrics': {

                    'test_loss': float,

                    'test_accuracy': float,

                    'classification_report': dict

                }

            }

    """



    # Evaluate on test set

    model.eval()

    test_loss = 0

    correct = 0

    total = 0

    all_preds = []

    all_labels = []

    

    with torch.no_grad():

        for inputs, labels in tqdm(test_loader, desc='Evaluating'):

            inputs = inputs.to(DEVICE)

            labels = labels.to(DEVICE, dtype=torch.long)

            

            outputs = model(inputs)

            outputs = outputs.float()

            

            # Loss function using cross entropy 

            loss = F.cross_entropy(outputs, labels)

            test_loss += loss.item()

            

            _, predicted = torch.max(outputs.data, 1)

            total += labels.size(0)

            correct += (predicted == labels).sum().item()

            

            all_preds.extend(predicted.cpu().numpy())

            all_labels.extend(labels.cpu().numpy())

    

    # Calculate final metrics 

    test_loss = test_loss / len(test_loader)

    test_accuracy = 100 * correct / total

    

    # Load a label mapping 

    label_mapping = load_label_mapping('/kaggle/input/unified-data/classs_lookup_v2.json')

    

    # Calculate confusion matrix

    cm = confusion_matrix(all_labels, all_preds)

    classification_rep = classification_report(all_labels, all_preds, 

                                            target_names=list(label_mapping.values()),

                                            output_dict=True)

    

    # Save visualizations

    _save_confusion_matrix(cm, label_mapping, evaluation_path)

    _save_training_history(history_phase1, history_phase2, evaluation_path)

    

    # Save metrics in csv

    metrics_df = pd.DataFrame(classification_rep).transpose()

    metrics_df.to_csv(os.path.join(evaluation_path, 'classification_metrics.csv'))

    

    # Prepare complete history

    history_data = {

        'training_history': {

            'phase1': {

                'train_losses': [float(x) for x in history_phase1['train_losses']],

                'train_accuracies': [float(x) for x in history_phase1['train_accuracies']],

                'val_losses': [float(x) for x in history_phase1['val_losses']],

                'val_accuracies': [float(x) for x in history_phase1['val_accuracies']]

            },

            'phase2': {

                'train_losses': [float(x) for x in history_phase2['train_losses']],

                'train_accuracies': [float(x) for x in history_phase2['train_accuracies']],

                'val_losses': [float(x) for x in history_phase2['val_losses']],

                'val_accuracies': [float(x) for x in history_phase2['val_accuracies']]

            }

        },

        'final_metrics': {

            'test_loss': float(test_loss),

            'test_accuracy': float(test_accuracy),

            'classification_report': classification_rep

        }

    }

    

    # Save history in a JSON

    with open(os.path.join(evaluation_path, 'training_history.json'), 'w') as f:

        json.dump(history_data, f, indent=4)

    

    # Show summary

    print("\nFinal training summary:")

    print(f"Test accuracy: {test_accuracy:.2f}%")

    print(f"Test loss: {test_loss:.4f}")

    print("\nPer-class metrics:")

    print(metrics_df)

    

    return history_data



def _save_confusion_matrix(cm, label_mapping, evaluation_path):

    """Save confusion matrix of model evaluation"""



    plt.figure(figsize=(15, 15))

    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',

                xticklabels=list(label_mapping.values()),

                yticklabels=list(label_mapping.values()))

    plt.title('Confusion Matrix')

    plt.xlabel('Predicted')

    plt.ylabel('True')

    plt.xticks(rotation=45)

    plt.yticks(rotation=45)

    plt.tight_layout()

    plt.savefig(os.path.join(evaluation_path, 'confusion_matrix.png'))

    plt.close()



def _save_training_history(history_phase1, history_phase2, evaluation_path):

    """Save plots of training history"""

    plt.figure(figsize=(15, 5))

    

    # Loss plot

    plt.subplot(1, 2, 1)

    plt.plot(history_phase1['train_losses'], label='Phase 1 Train')

    plt.plot(history_phase1['val_losses'], label='Phase 1 Val')

    plt.plot([len(history_phase1['train_losses']) + i for i in range(len(history_phase2['train_losses']))],

             history_phase2['train_losses'], label='Phase 2 Train')

    plt.plot([len(history_phase1['val_losses']) + i for i in range(len(history_phase2['val_losses']))],

             history_phase2['val_losses'], label='Phase 2 Val')

    plt.title('Training and Validation Loss')

    plt.xlabel('Epoch')

    plt.ylabel('Loss')

    plt.legend()

    

    # Accuracy plot

    plt.subplot(1, 2, 2)

    plt.plot(history_phase1['train_accuracies'], label='Phase 1 Train')

    plt.plot(history_phase1['val_accuracies'], label='Phase 1 Val')

    plt.plot([len(history_phase1['train_accuracies']) + i for i in range(len(history_phase2['train_accuracies']))],

             history_phase2['train_accuracies'], label='Phase 2 Train')

    plt.plot([len(history_phase1['val_accuracies']) + i for i in range(len(history_phase2['val_accuracies']))],

             history_phase2['val_accuracies'], label='Phase 2 Val')

    plt.title('Training and Validation Accuracy')

    plt.xlabel('Epoch')

    plt.ylabel('Accuracy (%)')

    plt.legend()

    

    plt.tight_layout()

    plt.savefig(os.path.join(evaluation_path, 'training_history.png'))

    plt.close()
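`generate_evaluation_metrics` calls `load_label_mapping`, which is not defined in this section. A minimal sketch, assuming the lookup JSON maps class-index strings to display names (e.g. `{"0": "A", ...}`); the real file's format is not shown here. Sorting keys numerically matters because the code relies on `list(label_mapping.values())[i]` being the name of class `i`, and JSON keys like `"10"` sort before `"2"` as strings.

```python
import json
import os
import tempfile


def load_label_mapping(path):
    """Hypothetical loader: {"0": "A", "1": "B", ...} -> {0: "A", 1: "B", ...}.

    Keys are sorted numerically so that list(mapping.values())[i] names class i.
    """
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    return {int(k): v for k, v in sorted(raw.items(), key=lambda kv: int(kv[0]))}


# Usage demo with a throwaway file whose keys are deliberately out of order.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lookup.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"10": "K", "2": "C", "0": "A"}, f)
    mapping = load_label_mapping(path)

print(mapping)  # {0: 'A', 2: 'C', 10: 'K'}
```

If a class never appears in the test split, passing `labels=list(mapping.keys())` together with `target_names` to `classification_report` keeps the names aligned with the right rows.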

In [13]:
def train_one_phase(model, train_loader, val_loader, optimizer, num_epochs, phase_name, early_stopping_patience, output_dir):

    """

    Train the model for a single phase.

    

    Args:

        model (nn.Module): The model to train.

        train_loader (DataLoader): DataLoader for training data.

        val_loader (DataLoader): DataLoader for validation data.    

        optimizer (torch.optim.Optimizer): Optimizer for updating model weights.

        num_epochs (int): Number of training epochs.

        phase_name (str): Name of the training phase.

        early_stopping_patience (int): Number of consecutive epochs without validation-loss improvement tolerated before stopping.

        output_dir (str): Directory to save model checkpoints and logs.

        

    Returns:

        dict: A dictionary containing training history.

    """



    best_val_loss = float('inf')

    patience_counter = 0

    train_losses = []

    train_accuracies = []

    val_losses = []

    val_accuracies = []

    

    # Create output directory

    checkpoint_dir = os.path.join(output_dir, 'checkpoints')

    os.makedirs(checkpoint_dir, exist_ok=True)

    

    scaler = torch.amp.GradScaler('cuda')  # Loss scaler for mixed-precision training

    

    for epoch in range(num_epochs):

        

        # Training mode

        model.train()

        total_train_loss = 0

        train_steps = 0

        train_correct = 0 

        train_total = 0 

        

        # Progress bar during training

        train_pbar = tqdm(train_loader, desc=f'{phase_name} Epoch {epoch+1}/{num_epochs} [Train]')

        

        for inputs, labels in train_pbar:

            inputs = inputs.to(DEVICE)

            labels = labels.to(DEVICE,dtype=torch.long)

            

            optimizer.zero_grad()

            

            # Forward pass under mixed precision (autocast)

            with torch.amp.autocast('cuda'):  # replaces the deprecated torch.cuda.amp.autocast()

                outputs = model(inputs)

                outputs = outputs.float()

                loss = F.cross_entropy(outputs, labels)

            

            scaler.scale(loss).backward()

            scaler.step(optimizer)

            scaler.update()

            

            # Calculate accuracy

            _, predicted = torch.max(outputs.data, 1)

            train_total += labels.size(0)

            train_correct += (predicted == labels).sum().item()



            # Accumulate loss

            total_train_loss += loss.item()

            train_steps += 1

            

            # Running accuracy for the progress bar

            current_train_acc = 100 * train_correct / train_total



            # Update progress bar

            train_pbar.set_postfix({

                'loss': f'{loss.item():.4f}',

                'acc': f'{current_train_acc:.2f}%'

            })

        

        # Epoch-level training metrics

        avg_train_loss = total_train_loss / train_steps

        train_accuracy = 100 * train_correct / train_total



        # Save metrics

        train_losses.append(avg_train_loss)

        train_accuracies.append(train_accuracy)



        # Evaluation mode

        model.eval()

        total_val_loss = 0

        val_steps = 0

        correct = 0

        total = 0

        

        # Progress bar during validation

        val_pbar = tqdm(val_loader, desc=f'{phase_name} Epoch {epoch+1}/{num_epochs} [Val]')

        

        with torch.no_grad():

            for inputs, labels in val_pbar:

                inputs = inputs.to(DEVICE)

                labels = labels.to(DEVICE,dtype=torch.long)

                

                outputs = model(inputs)

                outputs = outputs.float()

                loss = F.cross_entropy(outputs, labels)

                

                total_val_loss += loss.item()

                val_steps += 1

                

                _, predicted = torch.max(outputs.data, 1)

                total += labels.size(0)

                correct += (predicted == labels).sum().item()

                

                # Update progress bar

                val_pbar.set_postfix({'loss': f'{loss.item():.4f}'})

        

        avg_val_loss = total_val_loss / val_steps

        val_accuracy = 100 * correct / total



        val_losses.append(avg_val_loss)

        val_accuracies.append(val_accuracy)



        checkpoint = {

            'epoch': epoch,

            'model_state_dict': model.state_dict(),

            'optimizer_state_dict': optimizer.state_dict(),

            'train_loss': avg_train_loss,

            'train_accuracy': train_accuracy,  

            'val_loss': avg_val_loss,

            'val_accuracy': val_accuracy

        }

        

        # Early stopping

        if avg_val_loss < best_val_loss:

            best_val_loss = avg_val_loss

            patience_counter = 0

            

            # Save best model

            best_model_path = os.path.join(checkpoint_dir, f'{phase_name}_best_model.pth')

            torch.save(checkpoint, best_model_path)

            print(f"\nSaving best model to {best_model_path}")

        else:

            patience_counter += 1



        # Show metrics

        print(f'\n{phase_name} Epoch {epoch+1}/{num_epochs}:')

        print(f'Training Loss: {avg_train_loss:.4f}, Training Accuracy: {train_accuracy:.2f}%')

        print(f'Validation Loss: {avg_val_loss:.4f}, Validation Accuracy: {val_accuracy:.2f}%')



        if patience_counter >= early_stopping_patience:

            print(f'\nEarly stopping triggered after {patience_counter} epochs without improvement')

            break

    

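    # Load the best checkpoint before returning (the checkpoint holds only tensors and plain Python values)

    best_checkpoint = torch.load(best_model_path, map_location=DEVICE, weights_only=True)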

    model.load_state_dict(best_checkpoint['model_state_dict'])



    return {

        'train_losses': train_losses,

        'train_accuracies': train_accuracies,

        'val_losses': val_losses,

        'val_accuracies': val_accuracies

    }
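`avg_train_loss` and `avg_val_loss` above are the mean of per-batch mean losses (`F.cross_entropy` averages within each batch by default). That equals the per-sample mean only when every batch has the same size; unless the loaders drop the last incomplete batch, the final batch of an epoch is usually smaller. The accuracy, in contrast, is already per-sample because it divides by `train_total`. The plain-Python sketch below contrasts the two averages; `epoch_mean_loss` is a hypothetical helper, not part of the notebook.

```python
def epoch_mean_loss(batch_losses, batch_sizes):
    """Return (mean of batch means, sample-weighted mean) for one epoch.

    batch_losses: per-batch *mean* losses, as F.cross_entropy returns by default
    batch_sizes:  number of samples in each batch
    """
    if not batch_losses or len(batch_losses) != len(batch_sizes):
        raise ValueError("need one batch size per loss, and at least one batch")
    # What the loop above computes: total_loss / steps
    per_batch = sum(batch_losses) / len(batch_losses)
    # Weight each batch mean by its size to recover the per-sample mean
    weighted = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / sum(batch_sizes)
    return per_batch, weighted


# Two full batches of 32 samples and a final partial batch of 4
per_batch, weighted = epoch_mean_loss([0.20, 0.30, 1.00], [32, 32, 4])
print(f"mean of batch means: {per_batch:.4f}")  # 0.5000
print(f"per-sample mean:     {weighted:.4f}")   # 0.2941
```

Inside the training loop, the weighted variant would amount to accumulating `loss.item() * labels.size(0)` and dividing by `train_total` at the end of the epoch.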

In [14]:
def train_model_complete(df, transform, num_classes=29, batch_size=32, num_epochs1=5, num_epochs2=5, 

                        learning_rate1=1e-3, learning_rate2=1e-5, early_stopping_patience=3, output_dir='results/'):

    """

    Train the model in two phases: Transfer Learning and Fine-tuning.

    

    Args:

        df (pd.DataFrame): DataFrame containing the training data.

        transform (torchvision.transforms.Compose): Transformations to apply to the input data.

        num_classes (int): Number of classes in the dataset.

        batch_size (int): Batch size for training.

        num_epochs1 (int): Number of epochs for the first phase of training.

        num_epochs2 (int): Number of epochs for the second phase of training.

        learning_rate1 (float): Learning rate for the first phase of training.

        learning_rate2 (float): Learning rate for the second phase of training.

        early_stopping_patience (int): Number of consecutive epochs without validation-loss improvement before training stops.

        output_dir (str): Directory to save model checkpoints and logs.

    Returns:

        tuple: (trained_model, training_history)

    """

    

    print("Starting the complete training...")

    

    # Create data loaders

    print("Creating data loaders...")

    train_loader, val_loader, test_loader = create_data_loaders(

        df=df,

        transform=transform,

        batch_size=batch_size

    )

    

    # Create the model and move it to device

    print("Initializing the model...")

    model = ASLModel(num_classes=num_classes)

    model = model.to(DEVICE)

        

    # Phase 1: Transfer learning

    print("\nPhase 1: Transfer Learning - only new layers")

    optimizer_phase1 = torch.optim.AdamW(

        filter(lambda p: p.requires_grad, model.parameters()),

        lr=learning_rate1,

        weight_decay=1e-4

    )

    

    history_phase1 = train_one_phase(

        model=model,

        train_loader=train_loader,

        val_loader=val_loader,

        optimizer=optimizer_phase1,

        num_epochs=num_epochs1,

        phase_name='Transfer_Learning',

        early_stopping_patience=early_stopping_patience,

        output_dir=output_dir

    )

    

    # Phase 2: Fine-tuning (unfreezes the last 30 layers)

    print("\nPhase 2: Fine-tuning - last 30 layers unfrozen")

    model.unfreeze_layers(num_layers=30)

    

    optimizer_phase2 = torch.optim.AdamW(

        filter(lambda p: p.requires_grad, model.parameters()),

        lr=learning_rate2,

        weight_decay=1e-4

    )

    

    history_phase2 = train_one_phase(

        model=model,

        train_loader=train_loader,

        val_loader=val_loader,

        optimizer=optimizer_phase2,

        num_epochs=num_epochs2,

        phase_name='Fine_Tuning',

        early_stopping_patience=early_stopping_patience,

        output_dir=output_dir

    )

    

    # Create a directory for the evaluation outputs

    evaluation_path = os.path.join(output_dir, 'evaluation')

    os.makedirs(evaluation_path, exist_ok=True)

    

    # Evaluate model

    print("\nRunning final evaluation...")

    history_data = generate_evaluation_metrics(

        model=model,

        test_loader=test_loader,

        history_phase1=history_phase1,

        history_phase2=history_phase2,

        evaluation_path=evaluation_path

    )

    

    print(f"\nResults saved to: {evaluation_path}")

    

    return model, history_data
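`generate_evaluation_metrics` (defined elsewhere in the notebook) receives the two phase histories separately, and how it combines them is not shown here. To plot both phases on one axis, the per-phase lists have to be concatenated and the phase boundary recorded. A minimal sketch, assuming both histories use the keys returned by `train_one_phase`; `merge_histories` is a hypothetical helper, and the sample values are rounded from the training log.

```python
def merge_histories(history_phase1, history_phase2):
    """Concatenate two per-epoch history dicts and record where phase 2 starts."""
    keys = ("train_losses", "train_accuracies", "val_losses", "val_accuracies")
    merged = {k: list(history_phase1[k]) + list(history_phase2[k]) for k in keys}
    # Index of the first phase-2 epoch, e.g. for drawing a vertical separator
    merged["phase2_start"] = len(history_phase1["train_losses"])
    return merged


h1 = {"train_losses": [1.01, 0.63], "train_accuracies": [69.6, 79.9],
      "val_losses": [0.44, 0.33], "val_accuracies": [86.9, 90.6]}
h2 = {"train_losses": [0.22], "train_accuracies": [92.8],
      "val_losses": [0.08], "val_accuracies": [97.8]}
merged = merge_histories(h1, h2)
print(merged["val_losses"], merged["phase2_start"])  # [0.44, 0.33, 0.08] 2
```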

In [15]:
def load_label_mapping(json_path):

    """

    Load label mapping from a JSON file.

    

    Args:

        json_path (str): Path to the JSON file.

        

    Returns:

        dict: A dictionary mapping class indices to class labels.

    """

    try:

        with open(json_path, 'r') as f:

            class_mapping = json.load(f)

        

        # Convert string keys to int and labels to upper case

        label_mapping = {int(k): v.upper() for k, v in class_mapping.items()}

        return label_mapping

    

    except Exception as e:

        print(f"Error loading label mapping: {e}")

        print("Using default label mapping...")

        

        # Default mapping

        default_labels = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 

                         'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 

                         'U', 'V', 'W', 'X', 'Y', 'Z', 'del', 'nothing', 'space']

        label_mapping = {idx: label for idx, label in enumerate(default_labels)}

        return label_mapping
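The `int(k)` conversion is needed because JSON object keys are always strings: a mapping with integer keys written by `json.dump` comes back with keys such as `"0"`. The standard-library round trip below shows this; the file name `class_mapping.json` and its layout (index to lowercase label) are assumptions about the file this function expects.

```python
import json
import os
import tempfile

# Hypothetical mapping as a training script might save it: int index -> label
original = {0: "a", 1: "b", 26: "del", 27: "nothing", 28: "space"}

with tempfile.TemporaryDirectory() as tmp:
    json_file = os.path.join(tmp, "class_mapping.json")
    with open(json_file, "w") as f:
        json.dump(original, f)
    with open(json_file) as f:
        raw = json.load(f)

print(sorted(raw)[:2])  # keys came back as strings: ['0', '1']
restored = {int(k): v.upper() for k, v in raw.items()}
print(restored[28])     # SPACE
```

Note that the fallback list keeps `del`, `nothing` and `space` in lowercase while the JSON branch upper-cases every label, so those three label strings differ depending on which branch runs.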

In [16]:
def save_model_architecture(model, output_dir, input_size=(3, 224, 224)):

    """

    Save a visual representation of the model architecture.

    

    Creates a simplified visualization of the model's architecture using graphviz,

    showing the main components and their connections.

    

    Args:

        model (nn.Module): Model to visualize

        output_dir (str): Directory to save the visualization

        input_size (tuple): Input tensor dimensions (channels, height, width)

        

    Returns:

        None. The PNG/PDF diagrams and a text summary are written to
        <output_dir>/model_architecture.

    """

    # Create output directory

    architecture_dir = os.path.join(output_dir, 'model_architecture')

    os.makedirs(architecture_dir, exist_ok=True)

    

    try:

        # Import inside the try block so a missing graphviz package is
        # reported by the except handler below instead of crashing

        from graphviz import Digraph

        

        # Create graph

        dot = Digraph(comment='Model Architecture')

        dot.attr(rankdir='TB')

        dot.attr('node', shape='box', style='rounded')

        

        # Node font

        dot.attr('node', fontname='Arial')

        

        # Input

        channels, height, width = input_size

        input_label = f'Input\n({height}×{width}×{channels})'

        dot.node('input', input_label, shape='oval')

        

        # MobileNetV2 (Pre-trained)

        dot.node('backbone', 'MobileNetV2\n(Pre-trained)', style='filled', fillcolor='lightgray')

        

        # Global Pooling

        dot.node('pool', 'Global Average Pooling')

        

        # Dense blocks

        dot.node('dense1', 'Dense Block 1\n512 units\nBatchNorm + ReLU\nDropout (0.5)')

        dot.node('dense2', 'Dense Block 2\n256 units\nBatchNorm + ReLU\nDropout (0.3)')

        

        # Output

        dot.node('output', 'Output Layer\n29 classes\nSoftmax', shape='oval')

        

        # Add edges between the nodes

        dot.edge('input', 'backbone')

        dot.edge('backbone', 'pool')

        dot.edge('pool', 'dense1')

        dot.edge('dense1', 'dense2')

        dot.edge('dense2', 'output')

        

        # Save the graph

        dot.render(os.path.join(architecture_dir, 'model_architecture_simplified'), 

                  format='png', cleanup=True)

        dot.render(os.path.join(architecture_dir, 'model_architecture_simplified'), 

                  format='pdf', cleanup=True)

        

        # Save basic summary

        summary_file = os.path.join(architecture_dir, 'model_summary_simplified.txt')

        with open(summary_file, 'w') as f:

            f.write("ASL Hand Gesture Classification Model\n")

            f.write("====================================\n\n")

            f.write("Architecture Overview:\n")

            f.write(f"1. Input Layer: {height}×{width}×{channels}\n")

            f.write("2. Backbone: MobileNetV2 (pre-trained)\n")

            f.write("3. Global Average Pooling\n")

            f.write("4. Dense Block 1:\n")

            f.write("   - 512 units\n")

            f.write("   - Batch Normalization\n")

            f.write("   - ReLU Activation\n")

            f.write("   - Dropout (0.5)\n")

            f.write("5. Dense Block 2:\n")

            f.write("   - 256 units\n")

            f.write("   - Batch Normalization\n")

            f.write("   - ReLU Activation\n")

            f.write("   - Dropout (0.3)\n")

            f.write("6. Output Layer:\n")

            f.write("   - 29 units (classes)\n")

            f.write("   - Softmax activation\n\n")

            f.write("Training Strategy:\n")

            f.write("- Phase 1: Transfer Learning (frozen backbone)\n")

            f.write("- Phase 2: Fine-tuning (last 30 layers unfrozen)\n")

        

        print(f"Visualization saved in {architecture_dir}")

        print(f"Summary saved in {summary_file}")

        

    except Exception as e:

        print(f"Error creating the visualization: {str(e)}")

        print("Make sure to have graphviz installed:")

        print("1. Install graphviz on your system:")

        print("   - Windows: https://graphviz.org/download/")

        print("   - Linux: sudo apt-get install graphviz")

        print("   - macOS: brew install graphviz")

        print("2. Install the Python package: pip install graphviz")
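The `graphviz` Python package only builds DOT source; `dot.render` calls the Graphviz `dot` program to produce the PNG/PDF, which is why both installs are needed. When only the graph description is required, the same simplified diagram can be written as plain DOT text without either dependency. A minimal sketch: `simplified_dot` is a hypothetical helper and its node list mirrors a subset of the nodes above.

```python
def simplified_dot(nodes, edges, rankdir="TB"):
    """Return DOT source for a directed graph.

    nodes: list of (node_id, label) pairs; labels may contain newlines
    edges: list of (tail_id, head_id) pairs
    """
    lines = ["digraph {", f"    rankdir={rankdir}",
             "    node [shape=box style=rounded fontname=Arial]"]
    for node_id, label in nodes:
        # Escape backslashes and quotes, then turn real newlines into DOT's \n
        safe = label.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'    {node_id} [label="{safe}"]')
    for tail, head in edges:
        lines.append(f"    {tail} -> {head}")
    lines.append("}")
    return "\n".join(lines)


nodes = [("input", "Input\n(224×224×3)"), ("backbone", "MobileNetV2\n(Pre-trained)"),
         ("pool", "Global Average Pooling"), ("output", "Output Layer\n29 classes")]
edges = [("input", "backbone"), ("backbone", "pool"), ("pool", "output")]
source = simplified_dot(nodes, edges)
print(source)
```

The text can be saved to a `.dot` file and rendered later on any machine with Graphviz, for example `dot -Tpng model.dot -o model.png`.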

In [17]:
TIME_STAMP = datetime.now().strftime("%Y%m%d_%H%M%S")
BASE_MODEL = 'mobilenet_v2'
combined_name = BASE_MODEL + '_' + DATA_TYPE
# e.g. 'results/mobilenet_v2_data2/evaluation_20241129_114615'

# 'data2' and 'data3' share the same directory layout; anything else is unsupported
if DATA_TYPE not in ('data2', 'data3'):
    raise ValueError(f"Unsupported DATA_TYPE: {DATA_TYPE!r}")

OUTPUT_PATH = os.path.join('results', combined_name, f'evaluation_{TIME_STAMP}')

os.makedirs(OUTPUT_PATH, exist_ok=True)
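The output directory depends only on `BASE_MODEL`, `DATA_TYPE` and the run timestamp. The same layout can be built with `pathlib`; the sketch below uses a hypothetical helper `make_run_dir` with an injectable timestamp (taken from the run logged further down) so the result is reproducible.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def make_run_dir(base, base_model, data_type, now=None):
    """Create and return <base>/<base_model>_<data_type>/evaluation_<YYYYmmdd_HHMMSS>."""
    now = now or datetime.now()
    run_dir = Path(base) / f"{base_model}_{data_type}" / f"evaluation_{now:%Y%m%d_%H%M%S}"
    run_dir.mkdir(parents=True, exist_ok=True)  # like os.makedirs(..., exist_ok=True)
    return run_dir


with tempfile.TemporaryDirectory() as tmp:
    run_path = make_run_dir(tmp, "mobilenet_v2", "data3", datetime(2024, 12, 7, 11, 24, 44))
    print(run_path.relative_to(tmp).as_posix())  # mobilenet_v2_data3/evaluation_20241207_112444
```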

In [18]:
# Define hyperparameters

PARAMS = {

        'input_size': (224, 224, 3),

        'num_classes': 29,

        'batch_size': 32,

        'dense_units': 256,

        'dropout_rate': 0.5,

        'learning_rate1': 1e-3,

        'learning_rate2': 1e-5,

        'epochs_phase1': 20,

        'epochs_phase2': 20,

        'early_stopping_patience': 3

    }
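`PARAMS['input_size']` is stored as (height, width, channels), while PyTorch layers, `summary` and `save_model_architecture` work with (channels, height, width), and `transforms.Resize` takes only (height, width). The model cell therefore reorders the tuple by index. A small helper (hypothetical name `hwc_to_chw`) makes that reordering explicit:

```python
def hwc_to_chw(size_hwc):
    """Reorder a (height, width, channels) tuple to PyTorch's (channels, height, width)."""
    height, width, channels = size_hwc
    return (channels, height, width)


input_size_hwc = (224, 224, 3)     # same layout as PARAMS['input_size']
print(hwc_to_chw(input_size_hwc))  # (3, 224, 224): for summary() and save_model_architecture()
print(input_size_hwc[:2])          # (224, 224): the (H, W) pair transforms.Resize expects
```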

In [19]:
# Model architecture

model = ASLModel(num_classes=PARAMS['num_classes'], base_model_name=BASE_MODEL)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = model.to(device)

# summary expects (channels, height, width); PARAMS['input_size'] is (height, width, channels)

summary(model, input_size=(PARAMS['input_size'][2], PARAMS['input_size'][0], PARAMS['input_size'][1]), device=device.type)



# Save simplified model architecture

save_model_architecture(

    model=model,

    output_dir=OUTPUT_PATH,

    input_size=(PARAMS['input_size'][2], PARAMS['input_size'][0], PARAMS['input_size'][1])  # (C, H, W)

)

Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /root/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth
100%|██████████| 13.6M/13.6M [00:00<00:00, 133MB/s]


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 32, 112, 112]             864
       BatchNorm2d-2         [-1, 32, 112, 112]              64
             ReLU6-3         [-1, 32, 112, 112]               0
            Conv2d-4         [-1, 32, 112, 112]             288
       BatchNorm2d-5         [-1, 32, 112, 112]              64
             ReLU6-6         [-1, 32, 112, 112]               0
            Conv2d-7         [-1, 16, 112, 112]             512
       BatchNorm2d-8         [-1, 16, 112, 112]              32
  InvertedResidual-9         [-1, 16, 112, 112]               0
           Conv2d-10         [-1, 96, 112, 112]           1,536
      BatchNorm2d-11         [-1, 96, 112, 112]             192
            ReLU6-12         [-1, 96, 112, 112]               0
           Conv2d-13           [-1, 96, 56, 56]             864
      BatchNorm2d-14           [-1, 96,
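The parameter counts in this summary follow directly from the layer shapes. A bias-free `Conv2d` has `(in_channels / groups) × out_channels × k × k` weights, a `BatchNorm2d` learns a scale and a shift per channel (its running statistics are buffers, not parameters), and a 3×3 convolution with padding 1 and stride 2 halves the spatial size. The checks below reproduce the first rows of the table. The kernel sizes, strides, paddings and groups are assumptions based on torchvision's MobileNetV2 design (stride-2 3×3 stem, then depthwise-separable inverted residual blocks) and agree with the printed shapes.

```python
def conv2d_params(in_ch, out_ch, kernel, groups=1, bias=False):
    """Learnable parameters of a Conv2d layer with a square kernel."""
    weights = (in_ch // groups) * out_ch * kernel * kernel
    return weights + (out_ch if bias else 0)


def batchnorm2d_params(channels):
    """BatchNorm2d learns one weight and one bias per channel."""
    return 2 * channels


def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


rows = {
    "Conv2d-1 (3x3 stem, stride 2)": conv2d_params(3, 32, 3),                    # 864
    "BatchNorm2d-2": batchnorm2d_params(32),                                     # 64
    "Conv2d-4 (3x3 depthwise)": conv2d_params(32, 32, 3, groups=32),             # 288
    "Conv2d-7 (1x1 projection)": conv2d_params(32, 16, 1),                       # 512
    "Conv2d-10 (1x1 expansion x6)": conv2d_params(16, 96, 1),                    # 1,536
    "Conv2d-13 (3x3 depthwise, stride 2)": conv2d_params(96, 96, 3, groups=96),  # 864
}
for name, count in rows.items():
    print(f"{name:38s} {count:>6,}")

print(conv_out_size(224, kernel=3, stride=2, padding=1))  # 112
print(conv_out_size(112, kernel=3, stride=2, padding=1))  # 56
```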

In [20]:
# Image preprocessing

# Normalize with the ImageNet mean and std used to pretrain the backbone:

# mean = [0.485, 0.456, 0.406]

# std = [0.229, 0.224, 0.225]



transform = transforms.Compose([

    transforms.ToPILImage(),

    transforms.Resize((PARAMS['input_size'][0], PARAMS['input_size'][1])),

    transforms.RandomHorizontalFlip(p=0.3),

    transforms.RandomRotation(15),

    transforms.ToTensor(),

    transforms.Normalize(mean=[0.485, 0.456, 0.406],

                        std=[0.229, 0.224, 0.225])

])
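`ToTensor` scales 8-bit pixel values from [0, 255] to [0, 1], and `Normalize` then maps each channel to `(x - mean) / std` with the ImageNet statistics. The pure-Python sketch below applies that arithmetic to single RGB pixels to show the resulting range (roughly -2.12 to 2.64). If `create_data_loaders` applies this same transform to the validation and test splits, those images are also randomly flipped and rotated; evaluation usually uses a deterministic variant without the two `Random*` steps.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_uint8, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Mimic ToTensor + Normalize for a single (R, G, B) pixel in 0..255."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb_uint8, mean, std))


black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
print([round(v, 4) for v in black])  # [-2.1179, -2.0357, -1.8044]
print([round(v, 4) for v in white])  # [2.2489, 2.4286, 2.64]
```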

In [21]:
# Create and train model

model, history_data = train_model_complete(

    df=df,

    transform=transform,

    num_classes=PARAMS['num_classes'],

    batch_size=PARAMS['batch_size'],

    num_epochs1=PARAMS['epochs_phase1'],

    num_epochs2=PARAMS['epochs_phase2'],

    learning_rate1=PARAMS['learning_rate1'],

    learning_rate2=PARAMS['learning_rate2'],

    early_stopping_patience=PARAMS['early_stopping_patience'],

    output_dir=OUTPUT_PATH

)

Starting the complete training...
Creating data loaders...
Initializing the model...





Phase 1: Transfer Learning - only new layers


Transfer_Learning Epoch 1/20 [Train]: 100%|██████████| 2451/2451 [06:22<00:00,  6.40it/s, loss=1.7444, acc=69.61%]
Transfer_Learning Epoch 1/20 [Val]: 100%|██████████| 307/307 [00:50<00:00,  6.07it/s, loss=0.4941]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Transfer_Learning_best_model.pth

Transfer_Learning Epoch 1/20:
Training Loss: 1.0085, Training Accuracy: 69.61%
Validation Loss: 0.4442, Validation Accuracy: 86.89%


Transfer_Learning Epoch 2/20 [Train]: 100%|██████████| 2451/2451 [06:01<00:00,  6.79it/s, loss=0.4984, acc=79.86%]
Transfer_Learning Epoch 2/20 [Val]: 100%|██████████| 307/307 [00:44<00:00,  6.84it/s, loss=0.5401]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Transfer_Learning_best_model.pth

Transfer_Learning Epoch 2/20:
Training Loss: 0.6319, Training Accuracy: 79.86%
Validation Loss: 0.3304, Validation Accuracy: 90.64%


Transfer_Learning Epoch 3/20 [Train]: 100%|██████████| 2451/2451 [06:06<00:00,  6.68it/s, loss=1.1622, acc=82.47%]
Transfer_Learning Epoch 3/20 [Val]: 100%|██████████| 307/307 [00:45<00:00,  6.79it/s, loss=0.1853]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Transfer_Learning_best_model.pth

Transfer_Learning Epoch 3/20:
Training Loss: 0.5422, Training Accuracy: 82.47%
Validation Loss: 0.2735, Validation Accuracy: 92.05%


Transfer_Learning Epoch 4/20 [Train]: 100%|██████████| 2451/2451 [06:01<00:00,  6.78it/s, loss=1.1879, acc=84.49%]
Transfer_Learning Epoch 4/20 [Val]: 100%|██████████| 307/307 [00:44<00:00,  6.93it/s, loss=0.4870]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Transfer_Learning_best_model.pth

Transfer_Learning Epoch 4/20:
Training Loss: 0.4839, Training Accuracy: 84.49%
Validation Loss: 0.2409, Validation Accuracy: 92.45%


Transfer_Learning Epoch 5/20 [Train]: 100%|██████████| 2451/2451 [06:03<00:00,  6.75it/s, loss=0.3838, acc=85.56%]
Transfer_Learning Epoch 5/20 [Val]: 100%|██████████| 307/307 [00:44<00:00,  6.93it/s, loss=0.1708]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Transfer_Learning_best_model.pth

Transfer_Learning Epoch 5/20:
Training Loss: 0.4464, Training Accuracy: 85.56%
Validation Loss: 0.2191, Validation Accuracy: 93.69%


Transfer_Learning Epoch 6/20 [Train]: 100%|██████████| 2451/2451 [05:59<00:00,  6.81it/s, loss=1.7123, acc=86.61%]
Transfer_Learning Epoch 6/20 [Val]: 100%|██████████| 307/307 [00:45<00:00,  6.74it/s, loss=0.1738]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Transfer_Learning_best_model.pth

Transfer_Learning Epoch 6/20:
Training Loss: 0.4133, Training Accuracy: 86.61%
Validation Loss: 0.1888, Validation Accuracy: 94.22%


Transfer_Learning Epoch 7/20 [Train]: 100%|██████████| 2451/2451 [06:08<00:00,  6.65it/s, loss=3.4057, acc=87.24%]
Transfer_Learning Epoch 7/20 [Val]: 100%|██████████| 307/307 [00:44<00:00,  6.87it/s, loss=0.1653]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Transfer_Learning_best_model.pth

Transfer_Learning Epoch 7/20:
Training Loss: 0.3915, Training Accuracy: 87.24%
Validation Loss: 0.1730, Validation Accuracy: 94.77%


Transfer_Learning Epoch 8/20 [Train]: 100%|██████████| 2451/2451 [05:53<00:00,  6.92it/s, loss=0.0398, acc=87.73%]
Transfer_Learning Epoch 8/20 [Val]: 100%|██████████| 307/307 [00:44<00:00,  6.94it/s, loss=0.2362]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Transfer_Learning_best_model.pth

Transfer_Learning Epoch 8/20:
Training Loss: 0.3763, Training Accuracy: 87.73%
Validation Loss: 0.1679, Validation Accuracy: 94.95%


Transfer_Learning Epoch 9/20 [Train]: 100%|██████████| 2451/2451 [05:57<00:00,  6.85it/s, loss=0.1903, acc=88.44%]
Transfer_Learning Epoch 9/20 [Val]: 100%|██████████| 307/307 [00:44<00:00,  6.95it/s, loss=0.4493]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Transfer_Learning_best_model.pth

Transfer_Learning Epoch 9/20:
Training Loss: 0.3543, Training Accuracy: 88.44%
Validation Loss: 0.1557, Validation Accuracy: 95.34%


Transfer_Learning Epoch 10/20 [Train]: 100%|██████████| 2451/2451 [06:03<00:00,  6.74it/s, loss=1.5946, acc=88.94%]
Transfer_Learning Epoch 10/20 [Val]: 100%|██████████| 307/307 [00:44<00:00,  6.83it/s, loss=0.3202]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Transfer_Learning_best_model.pth

Transfer_Learning Epoch 10/20:
Training Loss: 0.3411, Training Accuracy: 88.94%
Validation Loss: 0.1512, Validation Accuracy: 95.48%


Transfer_Learning Epoch 11/20 [Train]: 100%|██████████| 2451/2451 [06:05<00:00,  6.70it/s, loss=0.0419, acc=89.00%]
Transfer_Learning Epoch 11/20 [Val]: 100%|██████████| 307/307 [00:44<00:00,  6.93it/s, loss=0.0653]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Transfer_Learning_best_model.pth

Transfer_Learning Epoch 11/20:
Training Loss: 0.3350, Training Accuracy: 89.00%
Validation Loss: 0.1399, Validation Accuracy: 95.77%


Transfer_Learning Epoch 12/20 [Train]: 100%|██████████| 2451/2451 [06:12<00:00,  6.59it/s, loss=3.6615, acc=89.45%]
Transfer_Learning Epoch 12/20 [Val]: 100%|██████████| 307/307 [00:45<00:00,  6.71it/s, loss=0.1220]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Transfer_Learning_best_model.pth

Transfer_Learning Epoch 12/20:
Training Loss: 0.3246, Training Accuracy: 89.45%
Validation Loss: 0.1339, Validation Accuracy: 95.96%


Transfer_Learning Epoch 13/20 [Train]: 100%|██████████| 2451/2451 [06:09<00:00,  6.63it/s, loss=1.0182, acc=89.65%]
Transfer_Learning Epoch 13/20 [Val]: 100%|██████████| 307/307 [00:44<00:00,  6.89it/s, loss=0.1699]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Transfer_Learning_best_model.pth

Transfer_Learning Epoch 13/20:
Training Loss: 0.3164, Training Accuracy: 89.65%
Validation Loss: 0.1307, Validation Accuracy: 95.99%


Transfer_Learning Epoch 14/20 [Train]: 100%|██████████| 2451/2451 [06:04<00:00,  6.72it/s, loss=5.0010, acc=90.07%]
Transfer_Learning Epoch 14/20 [Val]: 100%|██████████| 307/307 [00:45<00:00,  6.69it/s, loss=0.0468]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Transfer_Learning_best_model.pth

Transfer_Learning Epoch 14/20:
Training Loss: 0.3060, Training Accuracy: 90.07%
Validation Loss: 0.1210, Validation Accuracy: 96.38%


Transfer_Learning Epoch 15/20 [Train]: 100%|██████████| 2451/2451 [06:06<00:00,  6.69it/s, loss=4.2136, acc=90.07%]
Transfer_Learning Epoch 15/20 [Val]: 100%|██████████| 307/307 [00:45<00:00,  6.71it/s, loss=0.2455]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Transfer_Learning_best_model.pth

Transfer_Learning Epoch 15/20:
Training Loss: 0.3019, Training Accuracy: 90.07%
Validation Loss: 0.1068, Validation Accuracy: 96.70%


Transfer_Learning Epoch 16/20 [Train]: 100%|██████████| 2451/2451 [06:06<00:00,  6.70it/s, loss=0.5447, acc=90.39%]
Transfer_Learning Epoch 16/20 [Val]: 100%|██████████| 307/307 [00:45<00:00,  6.80it/s, loss=0.0604]



Transfer_Learning Epoch 16/20:
Training Loss: 0.2933, Training Accuracy: 90.39%
Validation Loss: 0.1211, Validation Accuracy: 96.27%


Transfer_Learning Epoch 17/20 [Train]: 100%|██████████| 2451/2451 [06:03<00:00,  6.73it/s, loss=6.6210, acc=90.63%]
Transfer_Learning Epoch 17/20 [Val]: 100%|██████████| 307/307 [00:45<00:00,  6.82it/s, loss=0.1352]



Transfer_Learning Epoch 17/20:
Training Loss: 0.2877, Training Accuracy: 90.63%
Validation Loss: 0.1119, Validation Accuracy: 96.73%


Transfer_Learning Epoch 18/20 [Train]: 100%|██████████| 2451/2451 [05:59<00:00,  6.82it/s, loss=2.6219, acc=90.92%]
Transfer_Learning Epoch 18/20 [Val]: 100%|██████████| 307/307 [00:43<00:00,  7.10it/s, loss=0.1813]



Transfer_Learning Epoch 18/20:
Training Loss: 0.2801, Training Accuracy: 90.92%
Validation Loss: 0.1107, Validation Accuracy: 96.52%

Early stopping triggered after 3 epochs without improvement
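The stop at epoch 18 follows from the patience rule in `train_one_phase`: the validation loss last improved at epoch 15 (0.1068), and epochs 16 to 18 (0.1211, 0.1119, 0.1107) did not beat it, so `patience_counter` reached 3. Replaying the logged validation losses through the same rule reproduces this. `replay_early_stopping` is a hypothetical helper written for this check, and it assumes `best_val_loss` starts at infinity, which is consistent with epoch 1 saving a checkpoint.

```python
import math


def replay_early_stopping(val_losses, patience):
    """Apply the patience rule to per-epoch validation losses.

    Returns (best_epoch, stop_epoch), both 1-based; stop_epoch equals
    len(val_losses) when early stopping never triggers.
    """
    best_loss, best_epoch, counter = math.inf, 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:  # strict improvement resets the counter
            best_loss, best_epoch, counter = loss, epoch, 0
        else:
            counter += 1
        if counter >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses)


# Validation losses logged above for the Transfer_Learning phase
phase1 = [0.4442, 0.3304, 0.2735, 0.2409, 0.2191, 0.1888, 0.1730, 0.1679, 0.1557,
          0.1512, 0.1399, 0.1339, 0.1307, 0.1210, 0.1068, 0.1211, 0.1119, 0.1107]
print(replay_early_stopping(phase1, patience=3))  # (15, 18)
```

Because `train_one_phase` reloads the best checkpoint before returning, the weights that enter phase 2 are the ones saved at epoch 15, not those from epoch 18.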

Phase 2: Fine-tuning - last 30 layers unfrozen


Fine_Tuning Epoch 1/20 [Train]: 100%|██████████| 2451/2451 [06:20<00:00,  6.44it/s, loss=1.4617, acc=92.75%]
Fine_Tuning Epoch 1/20 [Val]: 100%|██████████| 307/307 [00:46<00:00,  6.59it/s, loss=0.0125]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Fine_Tuning_best_model.pth

Fine_Tuning Epoch 1/20:
Training Loss: 0.2221, Training Accuracy: 92.75%
Validation Loss: 0.0784, Validation Accuracy: 97.84%


Fine_Tuning Epoch 2/20 [Train]: 100%|██████████| 2451/2451 [06:16<00:00,  6.50it/s, loss=1.6191, acc=94.40%]
Fine_Tuning Epoch 2/20 [Val]: 100%|██████████| 307/307 [00:45<00:00,  6.69it/s, loss=0.2022]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Fine_Tuning_best_model.pth

Fine_Tuning Epoch 2/20:
Training Loss: 0.1709, Training Accuracy: 94.40%
Validation Loss: 0.0666, Validation Accuracy: 97.99%


Fine_Tuning Epoch 3/20 [Train]: 100%|██████████| 2451/2451 [06:27<00:00,  6.33it/s, loss=0.1229, acc=95.07%]
Fine_Tuning Epoch 3/20 [Val]: 100%|██████████| 307/307 [00:43<00:00,  7.07it/s, loss=0.0184]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Fine_Tuning_best_model.pth

Fine_Tuning Epoch 3/20:
Training Loss: 0.1486, Training Accuracy: 95.07%
Validation Loss: 0.0567, Validation Accuracy: 98.55%


Fine_Tuning Epoch 4/20 [Train]: 100%|██████████| 2451/2451 [06:39<00:00,  6.14it/s, loss=2.1592, acc=95.88%]
Fine_Tuning Epoch 4/20 [Val]: 100%|██████████| 307/307 [00:47<00:00,  6.42it/s, loss=0.0512]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Fine_Tuning_best_model.pth

Fine_Tuning Epoch 4/20:
Training Loss: 0.1268, Training Accuracy: 95.88%
Validation Loss: 0.0480, Validation Accuracy: 98.68%


Fine_Tuning Epoch 5/20 [Train]: 100%|██████████| 2451/2451 [06:32<00:00,  6.25it/s, loss=2.3673, acc=96.18%]
Fine_Tuning Epoch 5/20 [Val]: 100%|██████████| 307/307 [00:44<00:00,  6.83it/s, loss=0.0575]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Fine_Tuning_best_model.pth

Fine_Tuning Epoch 5/20:
Training Loss: 0.1149, Training Accuracy: 96.18%
Validation Loss: 0.0428, Validation Accuracy: 98.86%


Fine_Tuning Epoch 6/20 [Train]: 100%|██████████| 2451/2451 [06:25<00:00,  6.36it/s, loss=0.6014, acc=96.52%]
Fine_Tuning Epoch 6/20 [Val]: 100%|██████████| 307/307 [00:45<00:00,  6.78it/s, loss=0.0179]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Fine_Tuning_best_model.pth

Fine_Tuning Epoch 6/20:
Training Loss: 0.1052, Training Accuracy: 96.52%
Validation Loss: 0.0379, Validation Accuracy: 98.93%


Fine_Tuning Epoch 7/20 [Train]: 100%|██████████| 2451/2451 [06:35<00:00,  6.20it/s, loss=2.1434, acc=96.91%]
Fine_Tuning Epoch 7/20 [Val]: 100%|██████████| 307/307 [00:47<00:00,  6.46it/s, loss=0.0032]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Fine_Tuning_best_model.pth

Fine_Tuning Epoch 7/20:
Training Loss: 0.0953, Training Accuracy: 96.91%
Validation Loss: 0.0373, Validation Accuracy: 98.89%


Fine_Tuning Epoch 8/20 [Train]: 100%|██████████| 2451/2451 [06:23<00:00,  6.40it/s, loss=0.1722, acc=97.09%]
Fine_Tuning Epoch 8/20 [Val]: 100%|██████████| 307/307 [00:43<00:00,  7.05it/s, loss=0.0049]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Fine_Tuning_best_model.pth

Fine_Tuning Epoch 8/20:
Training Loss: 0.0885, Training Accuracy: 97.09%
Validation Loss: 0.0326, Validation Accuracy: 99.08%


Fine_Tuning Epoch 9/20 [Train]: 100%|██████████| 2451/2451 [06:26<00:00,  6.34it/s, loss=0.4760, acc=97.34%]
Fine_Tuning Epoch 9/20 [Val]: 100%|██████████| 307/307 [00:47<00:00,  6.50it/s, loss=0.0563]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Fine_Tuning_best_model.pth

Fine_Tuning Epoch 9/20:
Training Loss: 0.0814, Training Accuracy: 97.34%
Validation Loss: 0.0293, Validation Accuracy: 99.21%


Fine_Tuning Epoch 10/20 [Train]: 100%|██████████| 2451/2451 [06:30<00:00,  6.27it/s, loss=0.1714, acc=97.51%]
Fine_Tuning Epoch 10/20 [Val]: 100%|██████████| 307/307 [00:45<00:00,  6.78it/s, loss=0.0020]



Fine_Tuning Epoch 10/20:
Training Loss: 0.0771, Training Accuracy: 97.51%
Validation Loss: 0.0306, Validation Accuracy: 99.18%


Fine_Tuning Epoch 11/20 [Train]: 100%|██████████| 2451/2451 [06:32<00:00,  6.25it/s, loss=0.0537, acc=97.70%]
Fine_Tuning Epoch 11/20 [Val]: 100%|██████████| 307/307 [00:46<00:00,  6.63it/s, loss=0.0129]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Fine_Tuning_best_model.pth

Fine_Tuning Epoch 11/20:
Training Loss: 0.0719, Training Accuracy: 97.70%
Validation Loss: 0.0256, Validation Accuracy: 99.34%


Fine_Tuning Epoch 12/20 [Train]: 100%|██████████| 2451/2451 [06:19<00:00,  6.46it/s, loss=0.0863, acc=97.80%]
Fine_Tuning Epoch 12/20 [Val]: 100%|██████████| 307/307 [00:45<00:00,  6.76it/s, loss=0.0006]



Fine_Tuning Epoch 12/20:
Training Loss: 0.0672, Training Accuracy: 97.80%
Validation Loss: 0.0257, Validation Accuracy: 99.30%


Fine_Tuning Epoch 13/20 [Train]: 100%|██████████| 2451/2451 [06:23<00:00,  6.40it/s, loss=2.0286, acc=98.02%]
Fine_Tuning Epoch 13/20 [Val]: 100%|██████████| 307/307 [00:45<00:00,  6.72it/s, loss=0.0047]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Fine_Tuning_best_model.pth

Fine_Tuning Epoch 13/20:
Training Loss: 0.0630, Training Accuracy: 98.02%
Validation Loss: 0.0237, Validation Accuracy: 99.32%


Fine_Tuning Epoch 14/20 [Train]: 100%|██████████| 2451/2451 [06:18<00:00,  6.47it/s, loss=0.0490, acc=98.07%]
Fine_Tuning Epoch 14/20 [Val]: 100%|██████████| 307/307 [00:46<00:00,  6.61it/s, loss=0.0117]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Fine_Tuning_best_model.pth

Fine_Tuning Epoch 14/20:
Training Loss: 0.0577, Training Accuracy: 98.07%
Validation Loss: 0.0214, Validation Accuracy: 99.36%


Fine_Tuning Epoch 15/20 [Train]: 100%|██████████| 2451/2451 [06:18<00:00,  6.47it/s, loss=0.0502, acc=98.17%]
Fine_Tuning Epoch 15/20 [Val]: 100%|██████████| 307/307 [00:46<00:00,  6.67it/s, loss=0.0147]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Fine_Tuning_best_model.pth

Fine_Tuning Epoch 15/20:
Training Loss: 0.0563, Training Accuracy: 98.17%
Validation Loss: 0.0179, Validation Accuracy: 99.58%


Fine_Tuning Epoch 16/20 [Train]: 100%|██████████| 2451/2451 [06:17<00:00,  6.48it/s, loss=1.4330, acc=98.25%]
Fine_Tuning Epoch 16/20 [Val]: 100%|██████████| 307/307 [00:45<00:00,  6.78it/s, loss=0.0013]



Fine_Tuning Epoch 16/20:
Training Loss: 0.0548, Training Accuracy: 98.25%
Validation Loss: 0.0192, Validation Accuracy: 99.47%


Fine_Tuning Epoch 17/20 [Train]: 100%|██████████| 2451/2451 [06:34<00:00,  6.22it/s, loss=0.0362, acc=98.34%]
Fine_Tuning Epoch 17/20 [Val]: 100%|██████████| 307/307 [00:47<00:00,  6.47it/s, loss=0.0080]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Fine_Tuning_best_model.pth

Fine_Tuning Epoch 17/20:
Training Loss: 0.0513, Training Accuracy: 98.34%
Validation Loss: 0.0173, Validation Accuracy: 99.52%


Fine_Tuning Epoch 18/20 [Train]: 100%|██████████| 2451/2451 [06:19<00:00,  6.45it/s, loss=0.0203, acc=98.54%]
Fine_Tuning Epoch 18/20 [Val]: 100%|██████████| 307/307 [00:45<00:00,  6.82it/s, loss=0.0003]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Fine_Tuning_best_model.pth

Fine_Tuning Epoch 18/20:
Training Loss: 0.0459, Training Accuracy: 98.54%
Validation Loss: 0.0168, Validation Accuracy: 99.52%


Fine_Tuning Epoch 19/20 [Train]: 100%|██████████| 2451/2451 [06:23<00:00,  6.40it/s, loss=3.7453, acc=98.49%]
Fine_Tuning Epoch 19/20 [Val]: 100%|██████████| 307/307 [00:45<00:00,  6.81it/s, loss=0.0001]



Saving best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Fine_Tuning_best_model.pth

Fine_Tuning Epoch 19/20:
Training Loss: 0.0474, Training Accuracy: 98.49%
Validation Loss: 0.0162, Validation Accuracy: 99.55%


Fine_Tuning Epoch 20/20 [Train]: 100%|██████████| 2451/2451 [06:21<00:00,  6.43it/s, loss=4.9163, acc=98.61%]
Fine_Tuning Epoch 20/20 [Val]: 100%|██████████| 307/307 [00:43<00:00,  7.04it/s, loss=0.0000]



Saving the best model to results/mobilenet_v2_data3/evaluation_20241207_112444/checkpoints/Fine_Tuning_best_model.pth

Fine_Tuning Epoch 20/20:
Training Loss: 0.0457, Training Accuracy: 98.61%
Validation Loss: 0.0138, Validation Accuracy: 99.61%

Running final evaluation...


Evaluating: 100%|██████████| 307/307 [00:54<00:00,  5.66it/s]



Final Training Summary:
Test accuracy: 99.61%
Test loss: 0.0132

Per-class metrics:
              precision    recall  f1-score      support
A              0.994624  1.000000  0.997305   370.000000
B              0.996942  0.993902  0.995420   328.000000
C              1.000000  1.000000  1.000000   338.000000
D              0.997143  0.988669  0.992888   353.000000
E              1.000000  0.993976  0.996979   332.000000
F              0.997067  1.000000  0.998532   340.000000
G              1.000000  1.000000  1.000000   325.000000
H              1.000000  1.000000  1.000000   331.000000
I              0.993377  1.000000  0.996678   300.000000
J              1.000000  0.997230  0.998613   361.000000
K              0.997151  0.997151  0.997151   351.000000
L              0.997110  1.000000  0.998553   345.000000
M              0.991176  0.991176  0.991176   340.000000
N              0.994536  0.991826  0.993179   367.000000
O              1.000000  1.000000  1.000
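The best-model checkpoint messages in the log appear at epochs 15, 17, 18, 19 and 20 but not at 16. That pattern fits a rule that saves whenever validation loss drops below the best value seen so far: epoch 16's loss rose to 0.0192. It does not fit an accuracy-based rule, because epoch 17 (99.52%) was saved even though its validation accuracy is below epoch 15's (99.58%). The training code is not shown in this excerpt, so the sketch below is an assumption. `best_checkpoint_epochs` is a hypothetical helper that only replays the inferred rule on the printed losses.

```python
import math


def best_checkpoint_epochs(val_losses, start_epoch=1, best_so_far=math.inf):
    """Replay a 'save when validation loss improves' rule.

    Hypothetical helper: returns the epoch numbers at which a checkpoint
    would be written, given per-epoch validation losses.
    """
    saved = []
    best = best_so_far
    for offset, loss in enumerate(val_losses):
        if loss < best:  # strict improvement only
            best = loss
            saved.append(start_epoch + offset)
    return saved


# Validation losses printed for epochs 15-20; losses before epoch 15 are not in this excerpt
fine_tuning_val_losses = [0.0179, 0.0192, 0.0173, 0.0168, 0.0162, 0.0138]
print(best_checkpoint_epochs(fine_tuning_val_losses, start_epoch=15))  # [15, 17, 18, 19, 20]
```

Because the losses before epoch 15 are unknown, the sketch starts from `best_so_far=math.inf`, so epoch 15 always counts as an improvement here.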
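The per-class table uses the column names of scikit-learn's `classification_report` (precision, recall, f1-score, support). The arithmetic behind each row can be sketched without any library. Precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. `per_class_metrics` is a hypothetical helper, and the toy labels are invented so that class A reproduces its reported row: 370 correct predictions plus 2 false positives gives 370/372 = 0.994624. They are not the project's real predictions, and only class A is meant to match the table.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} computed from label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    true_positives = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)  # TP + FP per label
    actual = Counter(y_true)     # TP + FN per label (the "support")
    report = {}
    for label in labels:
        tp = true_positives[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / actual[label] if actual[label] else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0  # harmonic mean
        report[label] = (precision, recall, f1, actual[label])
    return report


# Toy labels: all 370 "A" samples are correct, and 2 of 328 "B" samples are predicted as "A"
y_true = ["A"] * 370 + ["B"] * 328
y_pred = ["A"] * 372 + ["B"] * 326

for label, (p, r, f1, support) in per_class_metrics(y_true, y_pred).items():
    print(f"{label}  {p:.6f}  {r:.6f}  {f1:.6f}  {support}")
# A  0.994624  1.000000  0.997305  370
# B  1.000000  0.993902  0.996942  328
```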

# Model selection

| Model | Final-Layer Input Features (num_features) | Minimum Input Size | Recommended Input Size |
|--------|----------------------------------------|-------------------------|-------------------|
| AlexNet | 4096 | 63x63 | 224x224 |
| ConvNeXt-Tiny | 768 | 32x32 | 224x224 |
| DenseNet121 | 1024 | 29x29 | 224x224 |
| EfficientNet-B0 | 1280 | 32x32 | 224x224 |
| EfficientNetV2-S | 1280 | 32x32 | 384x384 |
| GoogLeNet | 1024 | 29x29 | 224x224 |
| Inception V3 | 2048 | 75x75 | 299x299 |
| MobileNetV2 | 1280 | 32x32 | 224x224 |
| MobileNetV3-Large | 1280 | 32x32 | 224x224 |
| ResNet50 | 2048 | 32x32 | 224x224 |
| ResNeXt50 | 2048 | 32x32 | 224x224 |
| VGG16 | 4096 | 32x32 | 224x224 |
| ViT (Vision Transformer) | 768 | 224x224 (fixed) | 224x224 |
| Swin-Tiny | 768 | 32x32 | 224x224 |
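The minimum input sizes in the table follow from each network's convolution and pooling arithmetic. Every layer maps a spatial size n to floor((n + 2p - k) / s) + 1, and an input is too small once some layer's output drops below 1. The minimal pure-Python sketch below reproduces the AlexNet 63x63 entry. It assumes torchvision's AlexNet `features` layer settings, transcribed in the code, and ignores the adaptive average pool that follows them, since that pool accepts any map of at least 1x1. `ALEXNET_FEATURES`, `output_size` and `min_input_size` are hypothetical names.

```python
# Layer settings of torchvision's AlexNet `features` block, as (kernel, stride, padding).
# Conv2d and MaxPool2d (ceil_mode=False) both round the output size down.
ALEXNET_FEATURES = [
    (11, 4, 2), (3, 2, 0),            # conv1, maxpool1
    (5, 1, 2), (3, 2, 0),             # conv2, maxpool2
    (3, 1, 1), (3, 1, 1), (3, 1, 1),  # conv3, conv4, conv5
    (3, 2, 0),                        # maxpool3
]


def output_size(size, layers):
    """Propagate one spatial dimension through the layers; return 0 if it collapses."""
    for kernel, stride, padding in layers:
        size = (size + 2 * padding - kernel) // stride + 1
        if size < 1:
            return 0
    return size


def min_input_size(layers, limit=1024):
    """Smallest square input whose final feature map is at least 1x1."""
    for size in range(1, limit + 1):
        if output_size(size, layers) >= 1:
            return size
    return None


print(output_size(224, ALEXNET_FEATURES))  # 6
print(output_size(62, ALEXNET_FEATURES))   # 0
print(min_input_size(ALEXNET_FEATURES))    # 63
```

The same helper can be fed another architecture's (kernel, stride, padding) list, provided every size-changing layer is included.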

In [22]:
# Function to print the number of input features of a torchvision model's final classifier layer

# import torch.nn as nn
# from torchvision import models

# def print_model_features(model_name):
#     try:
#         # Create the model instance with the correct weights
#         if model_name == 'mobilenet_v2':
#             model = getattr(models, model_name)(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
#         elif model_name == 'resnet50':
#             model = getattr(models, model_name)(weights=models.ResNet50_Weights.IMAGENET1K_V1)
#         else:
#             # Try to load the model with the IMAGENET1K_V1 weights
#             model = getattr(models, model_name)(weights='IMAGENET1K_V1')
#
#         # Get the number of features
#         if hasattr(model, 'fc'):
#             num_features = model.fc.in_features
#         elif hasattr(model, 'classifier'):
#             if isinstance(model.classifier, nn.Sequential):
#                 num_features = model.classifier[-1].in_features  # Last layer in the Sequential
#             else:
#                 num_features = model.classifier.in_features
#         elif hasattr(model, 'heads'):
#             num_features = model.heads.head.in_features  # torchvision ViT keeps its classifier in model.heads
#         elif hasattr(model, 'head'):
#             num_features = model.head.in_features
#         else:
#             num_features = None  # No compatible attribute found
#             print(f"Warning: Could not determine the number of features for {model_name}.")
#
#         if num_features is not None:
#             print(f"{model_name}: {num_features} features")
#
#     except Exception as e:
#         print(f"Error with {model_name}: {str(e)}")


# # Example usage
# print_model_features('mobilenet_v2')
# print_model_features('resnet50')
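In transfer learning, the `num_features` value reported by the function above sizes the replacement classifier head. A new `nn.Linear(num_features, num_classes)` holds num_features * num_classes weights plus num_classes biases. The sketch below assumes a single linear layer as the new head and uses a hypothetical `num_classes` of 26 (one class per letter). This excerpt states neither the actual class count nor the head design, and `linear_head_params` is a hypothetical helper.

```python
def linear_head_params(num_features, num_classes, bias=True):
    """Parameter count of a single nn.Linear(num_features, num_classes) head."""
    weights = num_features * num_classes
    return weights + (num_classes if bias else 0)


# num_classes = 26 is a hypothetical value (one class per letter), not taken from the project
for name, num_features in [("MobileNetV2", 1280), ("ResNet50", 2048), ("VGG16", 4096)]:
    print(f"{name}: {linear_head_params(num_features, 26):,} head parameters")
# MobileNetV2: 33,306 head parameters
# ResNet50: 53,274 head parameters
# VGG16: 106,522 head parameters
```

The head scales linearly with `num_features`, so a 1280-feature backbone such as MobileNetV2 needs under a third of the head parameters of a 4096-feature one such as VGG16.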