# Deep Learning for Audio Classification: Real vs. Fake Voice Detection

## Project Overview

This project aims to develop a deep learning model capable of distinguishing between real and fake (deepfake) voice recordings. With the increasing sophistication of voice synthesis technologies, the ability to detect artificially generated audio has become crucial for maintaining trust in digital communications and media.

## Data Sources

The project utilizes two main datasets:

1. **Hugging Face Dataset**: A pre-processed dataset containing audio samples labeled as real or fake.
   - Format: Raw audio data with associated labels
   - Sampling Rate: 22050 Hz
   - Labels: 0 for fake, 1 for real

2. **Kaggle Dataset**: A collection of .wav files organized into 'REAL' and 'FAKE' folders.
   - Format: WAV audio files
   - Sampling Rates: Varied (44100 Hz and 48000 Hz observed)

## Data Preprocessing

### Audio Processing

- **Resampling**: All audio is resampled to 22050 Hz for consistency
- **MFCC Extraction**: Mel-frequency cepstral coefficients are computed for each audio sample
- **Padding/Truncation**: MFCCs are padded or truncated to a fixed length (1000 time steps)

### Dataset Classes

Two custom dataset classes were created to handle the different data sources:

1. `AudioDatasetHuggingFace`: Processes the Hugging Face dataset
2. `AudioDatasetKaggle`: Processes the Kaggle dataset

Both classes ensure consistent MFCC computation and output format.

In [None]:
# Imports and Configuration
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, random_split
import numpy as np
import pandas as pd 
from tqdm import tqdm
from datasets import load_dataset
import librosa
import os



config = {
    'epochs': 20,
    'train_batch_size': 8,
    'eval_batch_size': 8,
    'learning_rate': 3e-5,
    'weight_decay': 1e-8,
    'gradient_accumulation_steps': 4,
    'seed': 42,
    'n_mfcc': 40,
    'sr': 22050,
    'max_length': 500,
    'warmup_ratio': 0.1
}

# Set the seed for reproducibility
torch.manual_seed(config['seed'])
np.random.seed(config['seed'])


### Data Preprocessing for Kaggle dataset

In [75]:
class AudioDatasetKaggle(Dataset):
    def __init__(self, data_source, config):
        self.mfccs = []
        self.labels = []
        self.config = config
        self.target_sr = config['sr']  # Target sampling rate
        
        if isinstance(data_source, str):  # It's a directory path
            self._process_directory(data_source)
        else:
            raise ValueError("data_source must be a directory path for Kaggle dataset")
        
        self.mfccs = np.array(self.mfccs)
        self.labels = np.array(self.labels)
        
        print(f"Loaded {len(self.labels)} audio files.")
        print(f"Real: {np.sum(self.labels == 1)}, Fake: {np.sum(self.labels == 0)}")
        print(f"MFCC shape: {self.mfccs[0].shape}")

    def _process_directory(self, root_dir):
        for label in ['REAL', 'FAKE']:
            folder_path = os.path.join(root_dir, 'KAGGLE', 'AUDIO', label)
            for filename in tqdm(os.listdir(folder_path), desc=f"Processing {label} audio"):
                if filename.endswith('.wav'):
                    file_path = os.path.join(folder_path, filename)
                    
                    # Load audio file with original sampling rate
                    audio, orig_sr = librosa.load(file_path, sr=None)
                    
                    # Resample if necessary
                    if orig_sr != self.target_sr:
                        audio = librosa.resample(audio, orig_sr=orig_sr, target_sr=self.target_sr)
                    
                    # Compute MFCC
                    mfcc = librosa.feature.mfcc(y=audio, sr=self.target_sr, n_mfcc=self.config['n_mfcc'])
                    
                    # Pad or truncate
                    if mfcc.shape[1] < self.config['max_length']:
                        pad_width = ((0, 0), (0, self.config['max_length'] - mfcc.shape[1]))
                        mfcc = np.pad(mfcc, pad_width, mode='constant')
                    else:
                        mfcc = mfcc[:, :self.config['max_length']]
                    
                    self.mfccs.append(mfcc)
                    self.labels.append(1 if label == 'REAL' else 0)  # 1 for REAL, 0 for FAKE

    def __len__(self):
        return len(self.mfccs)

    def __getitem__(self, idx):
        mfcc = torch.tensor(self.mfccs[idx], dtype=torch.float32)
        label = torch.tensor(self.labels[idx], dtype=torch.long)
        return mfcc, label

### Data Preprocessing for Hugging Face (HF) dataset

In [76]:

class AudioDatasetHuggingFace(Dataset):
    def __init__(self, dataset, n_mfcc=40, max_length=500):
        self.dataset = dataset
        self.n_mfcc = n_mfcc
        self.max_length = max_length
        self.mfccs = []
        self.labels = []
        
        for item in tqdm(self.dataset, desc="Processing audio"):
            audio = item['audio']['array']
            sr = item['audio']['sampling_rate']
            label = item['label']
            
            # Compute MFCC
            mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=self.n_mfcc)
            
            # Pad or truncate
            if mfcc.shape[1] < self.max_length:
                pad_width = ((0, 0), (0, self.max_length - mfcc.shape[1]))
                mfcc = np.pad(mfcc, pad_width, mode='constant')
            else:
                mfcc = mfcc[:, :self.max_length]
            
            self.mfccs.append(mfcc)
            self.labels.append(label)
        
        self.mfccs = np.array(self.mfccs)
        self.labels = np.array(self.labels)
        
        print(f"Loaded {len(self.labels)} audio files.")
        print(f"Fake (0): {np.sum(self.labels == 0)}, Real (1): {np.sum(self.labels == 1)}")
        print(f"MFCC shape: {self.mfccs[0].shape}")

    def __len__(self):
        return len(self.mfccs)

    def __getitem__(self, idx):
        mfcc = torch.tensor(self.mfccs[idx], dtype=torch.float32)
        label = torch.tensor(self.labels[idx], dtype=torch.long)
        return mfcc, label

# CNNNetwork Model Architecture

## Overview

The `CNNNetwork` class defines a Convolutional Neural Network (CNN) designed for audio classification tasks. This model is specifically tailored to process Mel-frequency cepstral coefficients (MFCCs) of audio inputs and output a single value, suitable for binary classification tasks such as distinguishing between real and fake voice recordings.

## Model Structure

### Convolutional Layers

The model employs three convolutional layers, each followed by ReLU activation and max pooling:

1. **Conv1**:
   - Input channels: 1 (grayscale MFCC)
   - Output channels: 32
   - Kernel size: 3x3, Stride: 1, Padding: 1
   - Max pooling: 2x2 with stride 2

2. **Conv2**:
   - Input channels: 32
   - Output channels: 64
   - Kernel size: 3x3, Stride: 1, Padding: 1
   - Max pooling: 2x2 with stride 2

3. **Conv3**:
   - Input channels: 64
   - Output channels: 128
   - Kernel size: 3x3, Stride: 1, Padding: 1
   - Max pooling: 2x2 with stride 2

### Adaptive Pooling

An adaptive average pooling layer is used to ensure a fixed output size regardless of input dimensions:
- Output size: 4x4

### Fully Connected Layers

The model concludes with three fully connected layers:

1. **FC1**: 
   - Input: 128 * 4 * 4 = 2048
   - Output: 256
   - Followed by ReLU and Dropout (0.5)

2. **FC2**: 
   - Input: 256
   - Output: 64
   - Followed by ReLU and Dropout (0.5)

3. **FC3** (Output layer): 
   - Input: 64
   - Output: 1 (for binary classification)

## Forward Pass

The `forward` method defines the data flow through the network:

1. Input is unsqueezed to add a channel dimension
2. Data passes through the three convolutional layers
3. Adaptive pooling is applied to ensure consistent dimensionality
4. The output is flattened
5. The flattened tensor goes through the fully connected layers
6. A single output value is produced

## Design Considerations

- Increasing channel depths (32, 64, 128) in convolutional layers capture increasingly complex features
- Adaptive pooling allows for flexibility in input sizes
- Dropout layers (0.5 probability) in fully connected layers prevent overfitting
- The final output is a single value, suitable for binary classification tasks using a threshold or sigmoid activation

This architecture is optimized for processing MFCC representations of audio data, making it well-suited for tasks like distinguishing between real and synthetic voice recordings.

In [77]:
class CNNNetwork(nn.Module):
    def __init__(self, num_mfcc):
        super(CNNNetwork, self).__init__()
        
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        
        self.conv2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        
        self.conv3 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        
        self.adaptive_pool = nn.AdaptiveAvgPool2d((4, 4))
        
        self.fc = nn.Sequential(
            nn.Linear(128 * 4 * 4, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, 64),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(64, 1)
        )

    def forward(self, x):
        x = x.unsqueeze(1)
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.adaptive_pool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

# Data Loader Creation Function

## Purpose

The `create_data_loaders` function is responsible for splitting a dataset into training, validation, and test sets, and creating PyTorch DataLoader objects for each. This function is crucial for preparing the data for model training and evaluation.

## Function Parameters

- `dataset`: The complete dataset to be split
- `config`: A dictionary containing configuration parameters

## Process

1. **Dataset Splitting**:
   - The dataset is split into three parts:
     - Training set: 70% of the data
     - Validation set: 15% of the data
     - Test set: 15% of the data (remaining data)
   - The `random_split` function is used to ensure randomness in the split
   - A fixed seed is used for reproducibility

2. **Data Loader Creation**:
   - Three separate DataLoader objects are created for train, validation, and test sets
   - Each DataLoader is configured with specific parameters:
     - Batch sizes are set according to the config dictionary
     - Training data is shuffled, while validation and test data are not
     - `num_workers=2` is set for parallel data loading
     - `pin_memory=True` is used for faster data transfer to GPU

## Key Features

- **Reproducibility**: The use of a fixed seed ensures that the dataset is split consistently across different runs
- **Flexible Configuration**: Batch sizes and other parameters are controlled via the config dictionary
- **Performance Optimization**: The use of multiple workers and pinned memory optimizes data loading performance

## Output

The function returns three DataLoader objects:
1. `train_loader`: For training the model
2. `val_loader`: For validating the model during training
3. `test_loader`: For final evaluation of the model

This setup allows for efficient training, validation, and testing cycles in the deep learning pipeline.

In [78]:
def create_data_loaders(dataset, config):
    # Define the sizes for train, validation, and test sets
    total_size = len(dataset)
    train_size = int(0.7 * total_size)
    val_size = int(0.15 * total_size)
    test_size = total_size - train_size - val_size

    # Split the dataset
    train_dataset, val_dataset, test_dataset = random_split(
        dataset, 
        [train_size, val_size, test_size],
        generator=torch.Generator().manual_seed(config['seed'])
    )
    
    print(f"Train set size: {len(train_dataset)}")
    print(f"Validation set size: {len(val_dataset)}")
    print(f"Test set size: {len(test_dataset)}")
    
    train_loader = DataLoader(train_dataset, batch_size=config['train_batch_size'], shuffle=True, num_workers=2, pin_memory=True)
    val_loader = DataLoader(val_dataset, batch_size=config['eval_batch_size'], shuffle=False, num_workers=2, pin_memory=True)
    test_loader = DataLoader(test_dataset, batch_size=config['eval_batch_size'], shuffle=False, num_workers=2, pin_memory=True)
    
    return train_loader, val_loader, test_loader

# Train and Evaluate Functions

## Train Function

### Purpose
The `train` function is responsible for training the model on the provided dataset for one epoch.

### Parameters
- `model`: The neural network model
- `dataloader`: DataLoader containing the training data
- `optimizer`: Optimization algorithm (e.g., Adam)
- `criterion`: Loss function
- `device`: Device to run the computations on (CPU or GPU)
- `config`: Dictionary containing configuration parameters

### Process
1. Sets the model to training mode
2. Iterates through the data in batches:
   - Moves data to the specified device
   - Performs forward pass
   - Calculates loss
   - Normalizes loss for gradient accumulation
   - Performs backward pass
   - Updates weights after accumulating gradients
3. Calculates predictions and accuracy
4. Returns average loss and accuracy for the epoch

### Key Features
- Uses `tqdm` for progress visualization
- Implements gradient accumulation for effective training with larger batch sizes
- Converts labels to float and uses sigmoid activation for binary classification

## Evaluate Function

### Purpose
The `evaluate` function assesses the model's performance on a dataset without updating the model's parameters.

### Parameters
- Similar to the `train` function, but without optimizer

### Process
1. Sets the model to evaluation mode
2. Disables gradient calculation for efficiency
3. Iterates through the data:
   - Performs forward pass
   - Calculates loss and accuracy
4. Returns average loss and accuracy

### Key Features
- Uses `torch.no_grad()` to prevent gradient calculation
- Employs the same prediction mechanism as the training function for consistency

## Common Aspects

- Both functions handle binary classification tasks
- They use sigmoid activation and a 0.5 threshold for predictions
- Both return average loss and accuracy over the entire dataset

These functions form the core of the training and evaluation pipeline, enabling efficient model training and performance assessment.

In [79]:
def train(model, dataloader, optimizer, criterion, device, config):
    model.train()
    total_loss, total_acc = 0, 0
    for i, (mfccs, labels) in enumerate(tqdm(dataloader, desc='Train')):
        mfccs, labels = mfccs.to(device), labels.to(device)
        outputs = model(mfccs)
        loss = criterion(outputs, labels.unsqueeze(1).float())  # Convert labels to float
        
        # Normalize loss to account for batch accumulation
        loss = loss / config['gradient_accumulation_steps']
        
        loss.backward()
        
        if (i + 1) % config['gradient_accumulation_steps'] == 0:
            optimizer.step()
            optimizer.zero_grad()
        
        total_loss += loss.item() * config['gradient_accumulation_steps']
        predictions = (torch.sigmoid(outputs) > 0.5).float()
        total_acc += (predictions == labels.unsqueeze(1)).sum().item() / labels.size(0)
    
    return total_loss / len(dataloader), total_acc / len(dataloader)

def evaluate(model, dataloader, criterion, device):
    model.eval()
    total_loss, total_acc = 0, 0
    with torch.no_grad():
        for mfccs, labels in tqdm(dataloader, desc='Eval'):
            mfccs, labels = mfccs.to(device), labels.to(device)
            outputs = model(mfccs)
            loss = criterion(outputs, labels.unsqueeze(1).float())  # Convert labels to float
            total_loss += loss.item()
            predictions = (torch.sigmoid(outputs) > 0.5).float()
            total_acc += (predictions == labels.unsqueeze(1)).sum().item() / labels.size(0)
    return total_loss / len(dataloader), total_acc / len(dataloader)

# Device Selection for Computation

## Purpose

This code snippet determines the appropriate computational device (GPU or CPU) for running the deep learning model.

## Process

1. The code checks if a CUDA-enabled GPU is available using `torch.cuda.is_available()`.
2. If a GPU is available, it sets the device to "cuda" (GPU).
3. If no GPU is available, it defaults to "cpu".

## Output

The code prints a message indicating which device will be used for computations:
- "Using device: cuda" if a GPU is available
- "Using device: cpu" if no GPU is available

## Significance

- **Performance**: GPUs can significantly accelerate deep learning computations compared to CPUs.
- **Flexibility**: This approach allows the code to run on systems with or without a GPU, enhancing portability.
- **Optimization**: By explicitly setting the device, we ensure that tensors and models are allocated on the correct hardware for optimal performance.

This device selection is crucial for efficient training and inference in deep learning projects, especially when dealing with large models or datasets. You can 

# You can also suggest using [Data Parallel GPU Setup with T4 GPUs]

## Overview

For improved performance and to utilize multiple T4 GPUs, we recommend implementing data parallelism using PyTorch's `nn.DataParallel` or `nn.DistributedDataParallel`.

## Implementation Steps

1. **Device Detection**:
   ```python
   import torch

   if torch.cuda.is_available():
       num_gpus = torch.cuda.device_count()
       if num_gpus > 1:
           print(f"Using {num_gpus} GPUs")
           use_data_parallel = True
       else:
           print("Using single GPU")
           use_data_parallel = False
   else:
       print("CUDA is not available. Using CPU")
       use_data_parallel = False
   ```

2. **Model Wrapping**:
   ```python
   if use_data_parallel:
       model = nn.DataParallel(model)
   model = model.to(device)
   ```

3. **Batch Size Adjustment**:
   ```python
   if use_data_parallel:
       config['train_batch_size'] *= num_gpus
       config['eval_batch_size'] *= num_gpus
   ```

## Benefits of Data Parallelism

- **Increased Processing Power**: Utilizes multiple GPUs to process larger batch sizes.
- **Faster Training**: Reduces overall training time by distributing computations.
- **Scalability**: Easily scales with the number of available GPUs.

## Considerations

- **Memory Usage**: Ensure that the increased batch size doesn't exceed GPU memory.
- **Synchronization Overhead**: There's a slight overhead in synchronizing between GPUs.
- **Load Balancing**: Automatic in `DataParallel`, but may need manual tuning for optimal performance.

This setup requires more code changes but offers better performance, especially for multi-node setups.

By implementing these changes, you can effectively utilize multiple T4 GPUs, significantly boosting your model's training performance.

In [80]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


# Loading and Processing Data from Kaggle and Hugging Face

In [81]:
# Importing ConcatDataset
from torch.utils.data import ConcatDataset

In [82]:
# Load Dataset from Hugging Face
ds = load_dataset("Hemg/Deepfakeaudio")
# print(ds)

DatasetDict({
    train: Dataset({
        features: ['audio', 'label'],
        num_rows: 19817
    })
})


In [83]:
# Process the dataset from hugging face
dataset = AudioDatasetHuggingFace(ds['train'])


Processing audio: 100%|██████████| 19817/19817 [07:37<00:00, 43.36it/s]


Loaded 19817 audio files.
Fake (0): 10000, Real (1): 9817
MFCC shape: (40, 500)


In [84]:
# Process the dataset from kaggle
new_dataset = AudioDatasetKaggle('/kaggle/input/deep-voice-deepfake-voice-recognition', config)


Processing REAL audio: 100%|██████████| 8/8 [00:10<00:00,  1.27s/it]
Processing FAKE audio: 100%|██████████| 56/56 [01:07<00:00,  1.20s/it]

Loaded 64 audio files.
Real: 8, Fake: 56
MFCC shape: (40, 500)





# Data visualization before concaternating 
## Uncomment the cells and run to view the structure and type of the datasets

In [85]:
# Check the frequency fo kaggle data

# import wave

# def get_wav_sr(file_path):
#     with wave.open(file_path, 'rb') as wav_file:
#         return wav_file.getframerate()

# # Check a few files
# folder_path = os.path.join('/kaggle/input/deep-voice-deepfake-voice-recognition', 'KAGGLE', 'AUDIO', 'REAL')
# for filename in os.listdir(folder_path)[:5]:  # Check first 5 files
#     if filename.endswith('.wav'):
#         file_path = os.path.join(folder_path, filename)
#         sr = get_wav_sr(file_path)
#         print(f"Sampling rate for {filename}: {sr} Hz")

In [86]:
# # Visualize the data from HF
# print("Dataset Structure:")
# print(ds)

# print("\nAvailable Splits:")
# print(list(ds.keys()))

# # Access the 'train' split
# train_dataset = ds['train']

# print("\nTrain Dataset Features:")
# print(train_dataset.features)

# print("\nTrain Dataset Info:")
# print(train_dataset)

# print("\nSample data (first 5 entries):")
# for i in range(min(5, len(train_dataset))):
#     print(f"\nSample {i+1}:")
#     item = train_dataset[i]
#     for key, value in item.items():
#         if key == 'audio':
#             print(f"  Audio:")
#             print(f"    Array shape: {value['array'].shape}")
#             print(f"    Sampling rate: {value['sampling_rate']} Hz")
#         else:
#             print(f"  {key.capitalize()}: {value}")

# print("\nDataset Methods:")
# print([method for method in dir(train_dataset) if not method.startswith('_')])

In [87]:
# # Check lebels for HF data
# train_dataset = ds['train']

# # Count the occurrences of each label
# label_counts = {0: 0, 1: 0}

# for item in train_dataset:
#     label = item['label']
#     label_counts[label] += 1

# # Print the results
# print("Label Distribution:")
# print(f"0 (likely 'fake'): {label_counts[0]}")
# print(f"1 (likely 'real'): {label_counts[1]}")

# # Calculate percentages
# total = sum(label_counts.values())
# print("\nPercentages:")
# print(f"0 (likely 'fake'): {label_counts[0]/total*100:.2f}%")
# print(f"1 (likely 'real'): {label_counts[1]/total*100:.2f}%")

# # Print a few examples of each label
# print("\nSamples with label 0:")
# for item in train_dataset.shuffle(seed=42).filter(lambda x: x['label'] == 0).select(range(5)):
#     print(f"  Audio shape: {item['audio']['array'].shape}, Label: {item['label']}")

# print("\nSamples with label 1:")
# for item in train_dataset.shuffle(seed=42).filter(lambda x: x['label'] == 1).select(range(5)):
#     print(f"  Audio shape: {item['audio']['array'].shape}, Label: {item['label']}")

In [88]:
# # Check the structure of the first item
# first_item = train_dataset[0]
# audio_data = first_item['audio']['array']

# print(f"Audio data shape: {audio_data.shape}")
# print(f"Audio data type: {audio_data.dtype}")


## concaternating the datasets

In [89]:
combined_dataset = ConcatDataset([dataset, new_dataset])

# Model Setup and Configuration

This code block sets up the core components for training the neural network model. Let's break down each part:

## Data Loaders

```python
train_loader, val_loader, test_loader = create_data_loaders(combined_dataset, config)
```
- Creates data loaders for training, validation, and test sets.
- Uses the `create_data_loaders` function (defined earlier) to split and load the data.
- `combined_dataset` likely contains both Hugging Face and Kaggle datasets.

## Model Initialization

```python
model = CNNNetwork(num_mfcc=config['n_mfcc']).to(device)
```
- Initializes the CNN model with the specified number of MFCC coefficients.
- Moves the model to the appropriate device (GPU if available, otherwise CPU).

## Loss Function

```python
criterion = nn.BCEWithLogitsLoss()
```
- Uses Binary Cross Entropy with Logits Loss.
- Suitable for binary classification tasks.
- Combines a Sigmoid layer and BCELoss in one single class for numerical stability.

## Optimizer

```python
optimizer = optim.Adam(model.parameters(), lr=config['learning_rate'], betas=(0.9, 0.999), eps=1e-8)
```
- Uses Adam optimizer for updating model parameters.
- Learning rate and other hyperparameters are set based on the `config` dictionary.

## Learning Rate Scheduler

```python
total_steps = len(train_loader) * config['epochs']
warmup_steps = int(total_steps * config['warmup_ratio'])
scheduler = optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=warmup_steps)
```
- Implements a linear learning rate warm-up.
- Calculates total training steps and warm-up steps.
- Uses `LinearLR` scheduler to gradually increase the learning rate during the warm-up phase.

## Key Points

- The setup is designed for binary classification (real vs fake audio).
- It uses a CNN architecture specifically designed for audio data (MFCCs).
- The learning rate scheduler implements a warm-up strategy, which can help stabilize early training.
- All components are configured to work with the specified device (GPU/CPU).

This setup provides a solid foundation for training a deep learning model on the audio classification task, with considerations for optimization and learning rate adjustment.

In [90]:

# Create data loaders
train_loader, val_loader, test_loader = create_data_loaders(combined_dataset, config)

# Model initialization
model = CNNNetwork(num_mfcc=config['n_mfcc']).to(device)

# Loss and optimizer
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=config['learning_rate'], betas=(0.9, 0.999), eps=1e-8)

# Learning rate scheduler
total_steps = len(train_loader) * config['epochs']
warmup_steps = int(total_steps * config['warmup_ratio'])
scheduler = optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=warmup_steps)



Train set size: 13916
Validation set size: 2982
Test set size: 2983


# Training Loop

This code implements a training loop.

## Training Loop
```python
train_loss, train_acc = train(model, train_loader, optimizer, criterion, device, config)
```
- Call the `train` function to train the model on the training data for one epoch.
- The function takes the model, training data loader, optimizer, loss criterion, device, and configuration as arguments.
- It returns the training loss and accuracy for the current epoch.

### Validation
```python
val_loss, val_acc = evaluate(model, val_loader, criterion, device)
```
- Call the `evaluate` function to compute the validation loss and accuracy for the current model.
- The function takes the model, validation data loader, loss criterion, and device as arguments.

### Learning Rate Scheduler
```python
scheduler.step()
```
- Call the `step()` method of the learning rate scheduler to update the learning rate if applicable.

### Logging
```python
print(f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}")
print(f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")
```
- Print the training and validation losses and accuracies for the current epoch.

### Best Model Checkpoint
- If the validation accuracy for the current epoch is better than the best validation accuracy seen so far:
  - Update `best_valid_acc` to the current validation accuracy.
  - Create a checkpoint dictionary containing the current epoch, model state, optimizer state, scheduler state, and the updated `best_valid_acc`.
  - Save the checkpoint to a file specified by `checkpoint_path` using `torch.save()`.
  - Print a message indicating that the best model so far has been saved.

## Final Evaluation
- After the training loop ends, perform a final evaluation on the test set using the `evaluate` function.
- Print the test loss and accuracy.

This training loop allows you to train your model for a specified number of epochs, monitor the training and validation performance, save checkpoints of the best model based on validation accuracy, and finally evaluate the model on the test set.

In [91]:
# Training loop
best_valid_acc = 0
for epoch in range(config['epochs']):
    print(f"\nEpoch {epoch+1}/{config['epochs']}")
    train_loss, train_acc = train(model, train_loader, optimizer, criterion, device, config)
    val_loss, val_acc = evaluate(model, val_loader, criterion, device)
    scheduler.step()
    
    print(f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}")
    print(f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")
    
    if val_acc > best_valid_acc:
        best_valid_acc = val_acc
        checkpoint = {
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'scheduler_state_dict': scheduler.state_dict(),
            'best_valid_acc': best_valid_acc,
        }
        torch.save(checkpoint, checkpoint_path)
        print("Saved best model")

# Final evaluation on test set
test_loss, test_acc = evaluate(model, test_loader, criterion, device)
print(f"\nTest Loss: {test_loss:.4f}, Test Acc: {test_acc:.4f}")


Epoch 1/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 182.22it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 300.79it/s]


Train Loss: 0.7279, Train Acc: 0.5023
Val Loss: 0.6940, Val Acc: 0.4977
Saved best model

Epoch 2/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 180.27it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 310.51it/s]


Train Loss: 0.7035, Train Acc: 0.5038
Val Loss: 0.6912, Val Acc: 0.5306
Saved best model

Epoch 3/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 183.35it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 313.09it/s]


Train Loss: 0.6973, Train Acc: 0.5092
Val Loss: 0.6906, Val Acc: 0.5313
Saved best model

Epoch 4/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 182.64it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 311.78it/s]


Train Loss: 0.6944, Train Acc: 0.5139
Val Loss: 0.6903, Val Acc: 0.5366
Saved best model

Epoch 5/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 181.34it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 311.10it/s]


Train Loss: 0.6923, Train Acc: 0.5197
Val Loss: 0.6902, Val Acc: 0.5821
Saved best model

Epoch 6/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 181.94it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 309.70it/s]


Train Loss: 0.6918, Train Acc: 0.5206
Val Loss: 0.6885, Val Acc: 0.6085
Saved best model

Epoch 7/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 183.29it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 308.23it/s]


Train Loss: 0.6898, Train Acc: 0.5372
Val Loss: 0.6867, Val Acc: 0.5939

Epoch 8/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 181.02it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 307.91it/s]


Train Loss: 0.6885, Train Acc: 0.5341
Val Loss: 0.6841, Val Acc: 0.6383
Saved best model

Epoch 9/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 184.15it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 313.00it/s]


Train Loss: 0.6856, Train Acc: 0.5494
Val Loss: 0.6793, Val Acc: 0.6447
Saved best model

Epoch 10/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 184.83it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 309.00it/s]


Train Loss: 0.6830, Train Acc: 0.5611
Val Loss: 0.6751, Val Acc: 0.6567
Saved best model

Epoch 11/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 180.61it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 284.63it/s]


Train Loss: 0.6790, Train Acc: 0.5744
Val Loss: 0.6685, Val Acc: 0.6953
Saved best model

Epoch 12/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 182.59it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 312.79it/s]


Train Loss: 0.6731, Train Acc: 0.5894
Val Loss: 0.6595, Val Acc: 0.7120
Saved best model

Epoch 13/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 184.04it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 310.90it/s]


Train Loss: 0.6690, Train Acc: 0.5988
Val Loss: 0.6523, Val Acc: 0.7261
Saved best model

Epoch 14/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 181.23it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 300.03it/s]


Train Loss: 0.6596, Train Acc: 0.6263
Val Loss: 0.6442, Val Acc: 0.7007

Epoch 15/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 183.54it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 309.64it/s]


Train Loss: 0.6509, Train Acc: 0.6455
Val Loss: 0.6290, Val Acc: 0.7631
Saved best model

Epoch 16/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 181.64it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 312.23it/s]


Train Loss: 0.6388, Train Acc: 0.6640
Val Loss: 0.6134, Val Acc: 0.7784
Saved best model

Epoch 17/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 179.96it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 307.55it/s]


Train Loss: 0.6253, Train Acc: 0.6859
Val Loss: 0.5980, Val Acc: 0.7949
Saved best model

Epoch 18/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 183.40it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 306.11it/s]


Train Loss: 0.6095, Train Acc: 0.7014
Val Loss: 0.5760, Val Acc: 0.8147
Saved best model

Epoch 19/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 183.66it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 310.35it/s]


Train Loss: 0.5924, Train Acc: 0.7195
Val Loss: 0.5547, Val Acc: 0.8277
Saved best model

Epoch 20/20


Train: 100%|██████████| 1740/1740 [00:09<00:00, 180.49it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 304.89it/s]


Train Loss: 0.5725, Train Acc: 0.7421
Val Loss: 0.5318, Val Acc: 0.8210


Eval: 100%|██████████| 373/373 [00:01<00:00, 312.15it/s]


Test Loss: 0.5316, Test Acc: 0.8132





## Load checkpoint add uncomment to train from previous best model saved 

In [100]:
# Load best model

def load_checkpoint(model, optimizer, scheduler, checkpoint_path):
    if os.path.exists(checkpoint_path):
        try:
            checkpoint = torch.load(checkpoint_path, map_location=torch.device('cpu'))
            print("Checkpoint content:")
            for key in checkpoint.keys():
                print(f"  {key}: {type(checkpoint[key])}")
            
            # Load the model state dict
            model.load_state_dict(checkpoint)
            print("Model state dictionary loaded successfully.")

            epoch = 0
            best_valid_acc = 0.0
            
            print(f"Loaded checkpoint. Starting from epoch {epoch} with best validation accuracy: {best_valid_acc:.4f}")
            return epoch, best_valid_acc
        except Exception as e:
            print(f"Error loading checkpoint: {e}")
            return 0, 0
    else:
        print("No checkpoint found. Starting from scratch.")
        return 0, 0

# Usage
checkpoint_path = '/kaggle/working/best_model.pth'
start_epoch, best_valid_acc = load_checkpoint(model, optimizer, scheduler, checkpoint_path)

# Rest of your training loop...
# Training loop
for epoch in range(start_epoch, 50):
    print(f"\nEpoch {epoch+1}/50")
    train_loss, train_acc = train(model, train_loader, optimizer, criterion, device, config)
    val_loss, val_acc = evaluate(model, val_loader, criterion, device)
    scheduler.step()
    
    print(f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}")
    print(f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")
    
    if val_acc > best_valid_acc:
        best_valid_acc = val_acc
        checkpoint = {
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'scheduler_state_dict': scheduler.state_dict(),
            'best_valid_acc': best_valid_acc,
        }
        torch.save(checkpoint, checkpoint_path)
        print("Saved best model")

# Final evaluation on test set
test_loss, test_acc = evaluate(model, test_loader, criterion, device)
print(f"\nTest Loss: {test_loss:.4f}, Test Acc: {test_acc:.4f}")

  checkpoint = torch.load(checkpoint_path, map_location=torch.device('cpu'))


Checkpoint content:
  epoch: <class 'int'>
  model_state_dict: <class 'collections.OrderedDict'>
  optimizer_state_dict: <class 'dict'>
  scheduler_state_dict: <class 'dict'>
  best_valid_acc: <class 'float'>
Error loading checkpoint: Error(s) in loading state_dict for CNNNetwork:
	Missing key(s) in state_dict: "conv1.0.weight", "conv1.0.bias", "conv2.0.weight", "conv2.0.bias", "conv3.0.weight", "conv3.0.bias", "fc.0.weight", "fc.0.bias", "fc.3.weight", "fc.3.bias", "fc.6.weight", "fc.6.bias". 
	Unexpected key(s) in state_dict: "epoch", "model_state_dict", "optimizer_state_dict", "scheduler_state_dict", "best_valid_acc". 

Epoch 1/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 182.87it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 261.31it/s]


Train Loss: 0.1773, Train Acc: 0.9351
Val Loss: 0.0768, Val Acc: 0.9682
Saved best model

Epoch 2/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 182.83it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 306.45it/s]


Train Loss: 0.1017, Train Acc: 0.9619
Val Loss: 0.0516, Val Acc: 0.9799
Saved best model

Epoch 3/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 183.62it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 308.13it/s]


Train Loss: 0.0824, Train Acc: 0.9713
Val Loss: 0.0585, Val Acc: 0.9759

Epoch 4/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 181.79it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 279.25it/s]


Train Loss: 0.0632, Train Acc: 0.9792
Val Loss: 0.0596, Val Acc: 0.9769

Epoch 5/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 180.21it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 300.97it/s]


Train Loss: 0.0665, Train Acc: 0.9771
Val Loss: 0.0409, Val Acc: 0.9832
Saved best model

Epoch 6/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 183.18it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 302.11it/s]


Train Loss: 0.0470, Train Acc: 0.9843
Val Loss: 0.0334, Val Acc: 0.9876
Saved best model

Epoch 7/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 177.00it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 292.08it/s]


Train Loss: 0.0424, Train Acc: 0.9859
Val Loss: 0.0539, Val Acc: 0.9826

Epoch 8/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 180.98it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 300.34it/s]


Train Loss: 0.0476, Train Acc: 0.9829
Val Loss: 0.0405, Val Acc: 0.9856

Epoch 9/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 180.49it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 299.97it/s]


Train Loss: 0.0448, Train Acc: 0.9852
Val Loss: 0.0542, Val Acc: 0.9806

Epoch 10/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 177.60it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 296.16it/s]


Train Loss: 0.0393, Train Acc: 0.9858
Val Loss: 0.0798, Val Acc: 0.9745

Epoch 11/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 181.24it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 252.94it/s]


Train Loss: 0.0362, Train Acc: 0.9884
Val Loss: 0.1118, Val Acc: 0.9618

Epoch 12/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 177.34it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 298.68it/s]


Train Loss: 0.0338, Train Acc: 0.9891
Val Loss: 0.1294, Val Acc: 0.9621

Epoch 13/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 177.66it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 294.10it/s]


Train Loss: 0.0310, Train Acc: 0.9886
Val Loss: 0.0552, Val Acc: 0.9842

Epoch 14/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 181.02it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 290.84it/s]


Train Loss: 0.0288, Train Acc: 0.9890
Val Loss: 0.0298, Val Acc: 0.9893
Saved best model

Epoch 15/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 181.67it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 296.10it/s]


Train Loss: 0.0329, Train Acc: 0.9892
Val Loss: 0.0332, Val Acc: 0.9903
Saved best model

Epoch 16/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 174.78it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 299.54it/s]


Train Loss: 0.0276, Train Acc: 0.9903
Val Loss: 0.0292, Val Acc: 0.9903

Epoch 17/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 180.59it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 299.84it/s]


Train Loss: 0.0215, Train Acc: 0.9934
Val Loss: 0.0318, Val Acc: 0.9886

Epoch 18/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 183.07it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 296.69it/s]


Train Loss: 0.0189, Train Acc: 0.9932
Val Loss: 0.0290, Val Acc: 0.9889

Epoch 19/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 179.60it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 298.75it/s]


Train Loss: 0.0158, Train Acc: 0.9945
Val Loss: 0.0448, Val Acc: 0.9866

Epoch 20/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 180.91it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 297.77it/s]


Train Loss: 0.0257, Train Acc: 0.9914
Val Loss: 0.0288, Val Acc: 0.9906
Saved best model

Epoch 21/50


Train: 100%|██████████| 1740/1740 [00:10<00:00, 173.45it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 290.21it/s]


Train Loss: 0.0134, Train Acc: 0.9948
Val Loss: 0.0309, Val Acc: 0.9916
Saved best model

Epoch 22/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 179.52it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 297.28it/s]


Train Loss: 0.0154, Train Acc: 0.9949
Val Loss: 0.1117, Val Acc: 0.9806

Epoch 23/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 178.62it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 299.20it/s]


Train Loss: 0.0195, Train Acc: 0.9935
Val Loss: 0.0242, Val Acc: 0.9916

Epoch 24/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 180.97it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 301.28it/s]


Train Loss: 0.0235, Train Acc: 0.9919
Val Loss: 0.0367, Val Acc: 0.9899

Epoch 25/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 176.01it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 299.84it/s]


Train Loss: 0.0168, Train Acc: 0.9942
Val Loss: 0.0287, Val Acc: 0.9913

Epoch 26/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 180.19it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 300.05it/s]


Train Loss: 0.0172, Train Acc: 0.9943
Val Loss: 0.0349, Val Acc: 0.9896

Epoch 27/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 179.76it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 299.98it/s]


Train Loss: 0.0126, Train Acc: 0.9968
Val Loss: 0.0420, Val Acc: 0.9873

Epoch 28/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 177.68it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 297.81it/s]


Train Loss: 0.0139, Train Acc: 0.9953
Val Loss: 0.0258, Val Acc: 0.9920
Saved best model

Epoch 29/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 179.99it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 295.30it/s]


Train Loss: 0.0201, Train Acc: 0.9935
Val Loss: 0.0602, Val Acc: 0.9815

Epoch 30/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 181.04it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 262.36it/s]


Train Loss: 0.0168, Train Acc: 0.9935
Val Loss: 0.0260, Val Acc: 0.9910

Epoch 31/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 180.30it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 289.21it/s]


Train Loss: 0.0161, Train Acc: 0.9949
Val Loss: 0.0279, Val Acc: 0.9903

Epoch 32/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 181.31it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 302.50it/s]


Train Loss: 0.0171, Train Acc: 0.9945
Val Loss: 0.0322, Val Acc: 0.9916

Epoch 33/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 179.51it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 267.16it/s]


Train Loss: 0.0122, Train Acc: 0.9964
Val Loss: 0.0299, Val Acc: 0.9896

Epoch 34/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 180.58it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 298.57it/s]


Train Loss: 0.0115, Train Acc: 0.9966
Val Loss: 0.0836, Val Acc: 0.9691

Epoch 35/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 177.80it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 293.58it/s]


Train Loss: 0.0069, Train Acc: 0.9978
Val Loss: 0.0312, Val Acc: 0.9933
Saved best model

Epoch 36/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 178.57it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 302.63it/s]


Train Loss: 0.0198, Train Acc: 0.9932
Val Loss: 0.0383, Val Acc: 0.9910

Epoch 37/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 180.77it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 291.02it/s]


Train Loss: 0.0165, Train Acc: 0.9952
Val Loss: 0.0424, Val Acc: 0.9899

Epoch 38/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 182.88it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 273.69it/s]


Train Loss: 0.0099, Train Acc: 0.9973
Val Loss: 0.0486, Val Acc: 0.9910

Epoch 39/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 174.45it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 295.78it/s]


Train Loss: 0.0048, Train Acc: 0.9986
Val Loss: 0.0504, Val Acc: 0.9879

Epoch 40/50


Train: 100%|██████████| 1740/1740 [00:10<00:00, 173.38it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 277.67it/s]


Train Loss: 0.0267, Train Acc: 0.9913
Val Loss: 0.0219, Val Acc: 0.9926

Epoch 41/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 182.65it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 290.35it/s]


Train Loss: 0.0064, Train Acc: 0.9981
Val Loss: 0.0292, Val Acc: 0.9913

Epoch 42/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 174.98it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 297.28it/s]


Train Loss: 0.0066, Train Acc: 0.9981
Val Loss: 0.0342, Val Acc: 0.9930

Epoch 43/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 182.75it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 298.56it/s]


Train Loss: 0.0208, Train Acc: 0.9933
Val Loss: 0.0368, Val Acc: 0.9903

Epoch 44/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 180.27it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 301.55it/s]


Train Loss: 0.0115, Train Acc: 0.9974
Val Loss: 0.0592, Val Acc: 0.9832

Epoch 45/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 180.10it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 295.31it/s]


Train Loss: 0.0182, Train Acc: 0.9942
Val Loss: 0.0457, Val Acc: 0.9896

Epoch 46/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 180.59it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 291.10it/s]


Train Loss: 0.0108, Train Acc: 0.9964
Val Loss: 0.0434, Val Acc: 0.9846

Epoch 47/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 178.43it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 283.90it/s]


Train Loss: 0.0115, Train Acc: 0.9967
Val Loss: 0.0481, Val Acc: 0.9883

Epoch 48/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 176.38it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 293.06it/s]


Train Loss: 0.0109, Train Acc: 0.9964
Val Loss: 0.0336, Val Acc: 0.9913

Epoch 49/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 180.23it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 298.39it/s]


Train Loss: 0.0137, Train Acc: 0.9963
Val Loss: 0.0342, Val Acc: 0.9892

Epoch 50/50


Train: 100%|██████████| 1740/1740 [00:09<00:00, 179.84it/s]
Eval: 100%|██████████| 373/373 [00:01<00:00, 295.60it/s]


Train Loss: 0.0077, Train Acc: 0.9983
Val Loss: 0.0311, Val Acc: 0.9923


Eval: 100%|██████████| 373/373 [00:01<00:00, 288.39it/s]


Test Loss: 0.0177, Test Acc: 0.9950





In [95]:
# rm -rf ./*