[<img src="https://colab.research.google.com/img/colab_favicon.ico" alt="Open this notebook in Google Colab" width="80">](https://colab.research.google.com/github/GVourvachakis/DeepDLT/blob/main/main.ipynb)

## Custom Components

### create_dataset.py 
Creates the directory "./datasets" of **cropped, brightness-varied (by $\pm$20%) 64x64 images** (create_dataset() function), stored in BMP (bitmap) format for storage efficiency, together with their respective **csv files** (create_data_with_labels_csv() function), split into training, validation, and test sets sampled from the Excel file "./images/all_images.xlsx".
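
The two augmentation steps named above (cropping to 64x64 and varying brightness by up to 20%) can be sketched in plain Python. This is only an illustration under assumptions: the helper names `center_crop` and `vary_brightness` are hypothetical, the crop is taken from the image center, and a single random factor is applied per image. The actual create_dataset() function may crop and scale differently.

```python
import random

def center_crop(image, size):
    """Return the central size x size window of a 2D list of pixel values."""
    height, width = len(image), len(image[0])
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]

def vary_brightness(image, max_change=0.2, rng=random):
    """Scale every pixel by one random factor in [1 - max_change, 1 + max_change].

    Results are rounded and clipped to the 8-bit range 0..255.
    """
    factor = rng.uniform(1.0 - max_change, 1.0 + max_change)
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]

# Hypothetical 128x128 grayscale image with constant intensity 100
source = [[100] * 128 for _ in range(128)]
patch = vary_brightness(center_crop(source, 64), rng=random.Random(0))
print(len(patch), len(patch[0]))
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible, which helps when the generated images need to match the csv files across runs.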

### dataset_loader.py
Defines a flexible, modular **custom dataset class**, LaserDataset(Dataset), with the categorical feature "PP1" ordinal-encoded, and builds the corresponding train/val/test dataloaders (prepare_and_load_data() function).
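
As a rough illustration of ordinal encoding, each distinct category is mapped to an integer code. The helper `fit_ordinal_encoder` and the category values below are hypothetical placeholders; the real "PP1" categories and the encoding order used by LaserDataset come from the csv files and are not shown here.

```python
def fit_ordinal_encoder(values):
    """Map each distinct category to an integer code, in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}

# Hypothetical placeholder categories standing in for the "PP1" column
pp1_column = ["b", "a", "b", "c", "a"]

encoder = fit_ordinal_encoder(pp1_column)
encoded = [encoder[value] for value in pp1_column]

print(encoder)
print(encoded)
```

The mapping should be fitted once (for example on the training split) and reused for validation and test data, so that the same category always receives the same code.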

### stratified_split.py
Contains a subroutine for label-wise k-fold cross-validation splitting (k_fold_cross_validation() function, 5 folds by default) and the main routine that builds the folds under a multi-label cross-validation splitting scheme (main_cross_validation() function).
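
One common way to build label-aware folds is to group sample indices by label and deal each group round-robin across the folds. The sketch below shows that idea for a single label; it is not the implementation in stratified_split.py, and the function name `stratified_k_fold` and the example "angle" values are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=5, seed=0):
    """Assign sample indices to k folds so each fold gets a similar share of every label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_label.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this label round-robin across the folds
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

# Hypothetical "angle" labels: 10 samples at 0, 10 at 45, 5 at 90
labels = [0] * 10 + [45] * 10 + [90] * 5
folds = stratified_k_fold(labels, k=5)

for fold_id, held_out in enumerate(folds, start=1):
    # All other folds form the training set for this round
    train = [i for fold in folds if fold is not held_out for i in fold]
    print(fold_id, len(train), len(held_out))
```

Each fold serves once as the held-out set while the remaining folds are used for training, so every sample is evaluated exactly once.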

### environment.yml
Contains all the dependencies and requirements.

### main.ipynb
Notebook where the whole training and inference pipeline is implemented.

### KFold_split.py
Creates DataLoader instances for 5-fold cross-validation. [optional]

Activate the custom virtual environment

In [2]:
# import os
# !source pytorch_venv/bin/activate
# os.environ['VIRTUAL_ENV']

'/home/georgios-vourvachakis/Desktop/DeepDLT/pytorch_venv'

In [3]:
!nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.



Import native python and torch dependencies

In [2]:
import numpy as np ; import matplotlib; import pandas as pd
import matplotlib.pyplot as plt
import os  # needed by the os.path calls in later cells
import subprocess
import tqdm
import torch ; import torchvision
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

Import quantitative reconstruction evaluation metrics via scikit-image

In [None]:
import skimage as ski
print(ski.__version__)

In [None]:
!python --version
print(f" matplotlib:\t{matplotlib.__version__}\n numpy:\t\t{np.__version__}\
      \n pandas:\t{pd.__version__}\n tqdm:\t\t{tqdm.__version__}\
      \n torch:\t\t{torch.__version__}\n torchvision:\t{torchvision.__version__}\
      \n skimage:\t{ski.__version__}")

Import custom dependencies

In [5]:
from dataset_loader import LaserDataset, prepare_and_load_data
from create_dataset import create_dir

Construct the directory of augmented images along with the train/val/test csv datasets
(only if the directory "./datasets" does not already exist)

In [6]:
# if not os.path.exists("./datasets"):
#     subprocess.run(["python", "create_dataset.py"])

Generate 5-fold cross-validation data that preserves a uniform label distribution (for better *generalization*, accounting for *outliers*, and preventing *overfitting*), provided the train/val/test files to sample from already exist

In [7]:
# if os.path.exists("./datasets/data_with_labels_csv"):
#     subprocess.run(["python", "stratified_split.py"])

**Complete preprocessing pipeline**:
run create_dataset, create the data_with_labels_csv files, and build the global train/val/test DataLoaders

In [28]:
# Define paths
input_dirs = [
                '2020-4-30 tuning ripple period',
                '2020-6-9 Crossed polarized',
                'Paper Data/Double pulses',
                'Paper Data/Repetition 6p & 2p 29-4-2020',
                'Paper Data/Single pulses 2p',
                'Paper Data/Single pulses 4 and half 6',
                'Paper Data/Repetition 6p & 2p 29-4-2020/Details'
             ]
    
base_path = "./images"
excel_path = "./images/all_images.xlsx" # sample data for train/val/test csv files
csv_output_path = "./datasets/data_with_labels_csv"

dim = 64 # set dimensions of augmented images

images_path = f'./datasets/2023_im_dataset_{dim}x{dim}'
output_dir_images = create_dir(images_path)

train_loader, val_loader, test_loader = prepare_and_load_data(
                                                                input_dirs,
                                                                base_path,
                                                                output_dir_images,
                                                                excel_path,
                                                                csv_output_path,
                                                                cropped_dim=dim
                                                             )   

Generating dataset with cropped images...

Creating csv files with (train_ratio, val_ratio, test_ratio) = (0.8, 0.1, 0.1)

Creating DataLoaders...

Training DataLoader...
Validation DataLoader...
Testing DataLoader...
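
The (0.8, 0.1, 0.1) ratios reported in the output above correspond to a shuffled index split. A minimal sketch follows, assuming a hypothetical helper `split_indices` and a made-up sample count of 1000; it is not the code in create_data_with_labels_csv(), which may sample or round differently.

```python
import random

def split_indices(n_samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle indices 0..n_samples-1 and cut them into train/val/test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    n_train = round(ratios[0] * n_samples)
    n_val = round(ratios[1] * n_samples)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

# Hypothetical dataset size, chosen only for illustration
train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Assigning the remainder to the test set guarantees that every sample lands in exactly one split even when the ratios do not divide the sample count evenly.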


Model initialization from models directory

In [32]:
# class CNNAutoencoder is exposed in __init__.py

# Directory structure:
# DeepDLT/
# ├── models/
# │   ├── __init__.py
# │   └── autoencoder.py
# └── autoencoder.ipynb.py

from models.autoencoder import CNNAutoencoder 

In [21]:
from training_pipeline import train_model, load_checkpoint

In [None]:
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using {device}")

# Data Preparation (embedded into the LaserDataset class)
# transform = transforms.Compose([
#     transforms.ToTensor(),
#     transforms.Normalize(mean=(0.5,), std=(0.5,))
# ])
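
For reference, the commented-out transform above would rescale pixel values: ToTensor() yields values in [0, 1], and Normalize(mean=0.5, std=0.5) computes (x - mean) / std, which maps that range to [-1, 1]. The arithmetic can be checked without torch; the helper names below are illustrative only, and whether LaserDataset applies exactly this normalization is an assumption based on the comment.

```python
def normalize(value, mean=0.5, std=0.5):
    """Same arithmetic as a per-channel normalization: (x - mean) / std."""
    return (value - mean) / std

def denormalize(value, mean=0.5, std=0.5):
    """Inverse mapping, useful before displaying reconstructed images."""
    return value * std + mean

pixels = [0.0, 0.25, 0.5, 1.0]          # values as produced on a [0, 1] scale
normalized = [normalize(p) for p in pixels]
restored = [denormalize(n) for n in normalized]

print(normalized)
print(restored)
```

If the model is trained on normalized inputs, reconstructions should be denormalized before plotting or before computing metrics that assume a [0, 1] data range.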

In [None]:
# Model Initialization
model = CNNAutoencoder(activation_function='relu', dropout_strength=0.3).to(device)

learning_rate = 1e-3
optimizer = 'Adam'
epochs = 100

# Train the model
train_losses, val_losses, psnr_values, ssim_values = train_model(model, train_loader, val_loader, device,
                                                                 optimizer=optimizer, num_epochs=epochs//2, learning_rate=learning_rate,
                                                                 checkpoint_name='model_checkpoint', # saving a checkpoint model every 10 epochs
                                                                 best_metric_checkpoint_name='best_model') # saving best models for loss, psnr, and ssim

## Loading Checkpoint for Inference and/or Resuming Training

In [None]:
# Load checkpoint
file_path =  f'./models_history_{optimizer}/model_checkpoint.pt'

model, optimizer, start_epoch, loss = load_checkpoint(model, optimizer, file_path, lr=learning_rate)

# Set model to eval mode for evaluation or train mode to continue training
model.eval()  # For evaluation
# Or:
# model.train()  # For resuming training

print(f"Model restored to epoch {start_epoch} with loss {loss:.4f}")

In [None]:
# Continue Training the model
train_losses, val_losses, psnr_values, ssim_values = train_model(model, train_loader, val_loader, device,
                                                                 optimizer='Adam', start_epoch=start_epoch, num_epochs=epochs, learning_rate=1e-2,
                                                                 checkpoint_name='model_checkpoint', 
                                                                 best_metric_checkpoint_name='best_model')

Obtain the loss curves and the PSNR and SSIM values across epochs

In [None]:
from inference import plotting, visualize_reconstruction

plotting(train_losses=train_losses, val_losses=val_losses, psnr_values=psnr_values, ssim_values=ssim_values)
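
As a reminder of what the PSNR curve measures, peak signal-to-noise ratio is defined as 10 * log10(MAX^2 / MSE), where MAX is the data range of the image. The sketch below computes it on flat lists of pixel values; it is a standalone illustration, not the code inside train_model, which presumably relies on the scikit-image metrics imported earlier. The `data_range=1.0` default assumes pixel values in [0, 1].

```python
import math

def mse(original, reconstruction):
    """Mean squared error between two equally long sequences of pixel values."""
    pairs = list(zip(original, reconstruction))
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)

def psnr(original, reconstruction, data_range=1.0):
    """Peak signal-to-noise ratio in dB; infinite for a perfect reconstruction."""
    error = mse(original, reconstruction)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / error)

# Hypothetical pixel values on a [0, 1] scale
original = [0.0, 0.5, 1.0, 0.5]
reconstruction = [0.1, 0.5, 0.9, 0.5]

print(round(psnr(original, reconstruction), 2))
```

Higher PSNR means a lower reconstruction error. SSIM complements it by comparing local structure rather than per-pixel differences.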

Visualize Reconstruction on Training and Testing Data

In [None]:
# Reconstruction on Training Data
print("Reconstruction on Training Data:")
visualize_reconstruction(train_loader, model, device, num_images=5)

# Reconstruction on Test Data
print("Reconstruction on Test Data:")
visualize_reconstruction(test_loader, model, device, num_images=5)

In [None]:
%%script false --no-raise-error
if os.path.exists("./datasets/data_with_labels_csv"):
    subprocess.run(["python", "stratified_split.py"])

In [27]:
%%script false --no-raise-error
def train_and_evaluate_kfold(model_class, fold_dir, num_folds, device, features , criterion=nn.MSELoss(), num_epochs=10, learning_rate=1e-3):
    fold_train_losses = []
    fold_val_losses = []
    fold_test_losses = []

    print(f"Procedure must be done across ALL features: {features}, now it is operated only on {features[0]}...\n")

    for fold in range(1, num_folds + 1):
        print(f"Starting Fold {fold}/{num_folds}...")
        
        # Associate the appropriate paths with the fold's datasets
        train_path = os.path.join(fold_dir, 'angle', f'fold_{fold}', 'angle_train.csv')
        val_path = os.path.join(fold_dir, 'angle', f'fold_{fold}', 'angle_val.csv')
        test_path = os.path.join(fold_dir, 'angle', f'fold_{fold}', 'angle_test.csv')

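        # Load the datasets for the current fold
        # NOTE: `transform` is not defined in this notebook (the Compose above is commented out); define it before running this cell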
        train_dataset = LaserDataset(train_path, transform=transform)
        val_dataset = LaserDataset(val_path, transform=transform)
        test_dataset = LaserDataset(test_path, transform=transform)

        # Prepare Dataloaders
        train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=4)
        val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False, num_workers=4)
        test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False, num_workers=4)

        # Initialize the model for this fold
        model = model_class(activation_function='relu', dropout_strength=0.2).to(device)

        # Train the model
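        # train_model returns (train_losses, val_losses, psnr_values, ssim_values) in the cells above; the last two are unused here
        train_losses, val_losses, _, _ = train_model(model, train_loader, val_loader, device, criterion=criterion, num_epochs=num_epochs, learning_rate=learning_rate)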

        # Evaluate the model on the test set
        model.eval()
        test_loss = 0.0
        all_outputs = []
        all_targets = []

        with torch.no_grad():
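            # `tqdm` is imported as a module above, so the progress bar is tqdm.tqdm
            for inputs, _ in tqdm.tqdm(test_loader, desc=f"Testing Fold {fold}/{num_folds}"):
                inputs = inputs.to(device)
                outputs = model(inputs)
                # criterion expects (prediction, target); .item() keeps test_loss a plain float
                test_loss += criterion(outputs, inputs).item() * inputs.size(0)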
                all_outputs.extend(outputs.cpu().numpy())
                all_targets.extend(inputs.cpu().numpy())

        test_loss /= len(test_loader.dataset)
        fold_test_losses.append(test_loss)

        # Log fold results
        print(f"Fold {fold}/{num_folds} - Train Loss: {train_losses[-1]:.4f}, Val Loss: {val_losses[-1]:.4f}, Test Loss: {test_loss:.4f}")

        # Store fold-wise losses
        fold_train_losses.append(train_losses)
        fold_val_losses.append(val_losses)

        # Visualize reconstruction
        print(f"Visualizing Reconstruction for Fold {fold}...")
        visualize_reconstruction(test_loader, model, device, num_images=5)

    # Average Test Loss across all folds
    avg_test_loss = sum(fold_test_losses) / num_folds
    print(f"Average Test Loss across all folds for 'angle' feature: {avg_test_loss:.4f}")

    return fold_train_losses, fold_val_losses, fold_test_losses

In [None]:
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Stratified splits path
stratified_dir = "./datasets/uniform_cross_validation_data"
num_folds = 3

labels = ['angle', 'EP1', 'NP', 'PP1']

# Train and evaluate the model across all folds
train_and_evaluate_kfold(CNNAutoencoder, stratified_dir, num_folds, device, features=labels , num_epochs=1, learning_rate=1e-3)