# Ensembling

The following cell installs the Python libraries this notebook needs.

1. `!pip install lightning`: This command installs the PyTorch Lightning library, which is a lightweight PyTorch wrapper for high-performance AI research. It simplifies the process of scaling and distributing models and provides high-level features for fast prototyping.

2. `!pip install segmentation-models-pytorch`: This command installs the `segmentation_models.pytorch` library, which is a collection of PyTorch implementations of various image segmentation models (like U-Net, FPN, etc.) with pre-trained encoders. This library provides a simple and customizable interface for different segmentation models in PyTorch.

In [1]:
!pip install lightning
!pip install segmentation-models-pytorch

Collecting lightning
  Downloading lightning-2.0.6-py3-none-any.whl (1.9 MB)
Collecting croniter<1.5.0,>=1.3.0 (from lightning)
  Downloading croniter-1.4.1-py2.py3-none-any.whl (19 kB)
Collecting dateutils<2.0 (from lightning)
  Downloading dateutils-0.6.12-py2.py3-none-any.whl (5.7 kB)
Collecting deepdiff<8.0,>=5.7.0 (from lightning)
  Downloading deepdiff-6.3.1-py3-none-any.whl (70 kB)
Collecting inquirer<5.0,>=2.10.0 (from lightning)
  Downloading inquirer-3.1.3-py3-none-any.whl (18 kB)
Collecting lightning-cloud>=0.5.37 (from lightning)
  Downloading lightning_cloud-0.5.37-py3-none-any.whl (596 kB)
Collecting pyt

The next block of code does the following:

1. Importing the necessary modules: `torch`, `lightning` (the PyTorch Lightning package, imported under the alias `l`), `segmentation_models_pytorch` (imported as `smp`, for image segmentation models), `CosineAnnealingLR` and `ReduceLROnPlateau` (for learning rate scheduling), `AdamW` (for optimization), `nn` (for neural network operations), `DataLoader` (for loading and batching data), `dice` (for performance evaluation), `sys` (for system-specific parameters and functions), `pandas` (for data manipulation and analysis), and `yaml` (for YAML file handling).

2. Updating the System Path: The `sys.path.append` lines of code are used to add the directories where additional Python modules or packages are stored. Here, it includes paths for pre-trained models, efficientnet models, segmentation models, timm pre-trained models, and checkpoint files. Adding these paths to `sys.path` allows Python to find and import these modules/packages when necessary.
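A minimal sketch of the path setup described above, assuming you only want to add directories that actually exist on the current machine. The helper name `add_existing_paths` is hypothetical and not part of the notebook.

```python
import os
import sys


def add_existing_paths(candidates):
    """Append each existing directory to sys.path once; return the ones added."""
    added = []
    for path in candidates:
        # Skip directories that are missing or already on the search path.
        if os.path.isdir(path) and path not in sys.path:
            sys.path.append(path)
            added.append(path)
    return added


# Two of the notebook's directories; outside Kaggle they are likely absent.
added = add_existing_paths([
    "../input/pretrained-models-pytorch",
    "../input/efficientnet-pytorch",
])
print("Added to sys.path:", added)
```

Checking for existence first keeps `sys.path` free of dead entries when the notebook runs outside the Kaggle environment.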

In [2]:
import torch
import lightning as l
import segmentation_models_pytorch as smp
from torch.optim.lr_scheduler import CosineAnnealingLR, ReduceLROnPlateau
from torch.optim import AdamW
import torch.nn as nn
from torch.utils.data import DataLoader
from torchmetrics.functional import dice
import sys
import pandas as pd
import yaml


sys.path.append("../input/pretrained-models-pytorch")
sys.path.append("../input/efficientnet-pytorch")
sys.path.append("/kaggle/input/smp-github/segmentation_models.pytorch-master")
sys.path.append("/kaggle/input/timm-pretrained-resnest/resnest/")
sys.path.append("/kaggle/input/checkpoints-unet-resnest101e")
sys.path.append("/kaggle/input/checkpoint-deeplabv3plus")



The next cell defines a PyTorch Dataset class, named `ContrailsDataset`, which is utilized to preprocess and provide data for the model training in a structured format.

1. The `__init__` method initializes the dataset with a dataframe, `df`, that holds the path of each record. It also stores the target image size and a `train` flag meant to distinguish training from validation data (the flag is stored but not used in the code shown). The normalization uses the standard ImageNet channel means and standard deviations. If the image size is not 256, a resizing transform is created.

2. The `__getitem__` method defines how to fetch one image-label pair. It looks up a row of the dataframe by index, loads the `.npy` array from the path in that row, splits it into the image (all channels but the last) and the label (the last channel), converts both to PyTorch tensors, reorders the image dimensions to channels-first, resizes the image if necessary, normalizes it, and returns the image and label.

3. The `__len__` method returns the total number of items in the dataset (which is the length of the dataframe).

In essence, this class provides a custom way of accessing and processing the Contrails dataset, tailored to work well with PyTorch's DataLoader and the other components of the training framework.
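The per-item preprocessing can be sketched with NumPy alone. This is an illustration under assumptions: each `.npy` record is taken to be an array of shape `(256, 256, 4)` (inferred from the slicing and reshape in the cell), the data below is random rather than real, and the helper name `split_and_normalize` is hypothetical.

```python
import numpy as np

# ImageNet channel statistics, the same values passed to T.Normalize in the notebook.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def split_and_normalize(con):
    """Split an (H, W, 4) array into a normalized (3, H, W) image and an (H, W) label."""
    img = con[..., :-1].astype(np.float32)   # first three channels: image
    label = con[..., -1].astype(np.float32)  # last channel: mask
    img = (img - MEAN) / STD                 # broadcasts over the trailing channel axis
    img = np.transpose(img, (2, 0, 1))       # HWC -> CHW, like permute(2, 0, 1)
    return img, label


# Synthetic stand-in for one record (assumed layout, not real data).
rng = np.random.default_rng(0)
con = rng.random((256, 256, 4), dtype=np.float32)
img, label = split_and_normalize(con)
print(img.shape, label.shape)
```

The notebook normalizes after moving channels first; normalizing before the transpose, as here, gives the same values.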

In [4]:
# Dataset

import torch
import numpy as np
import torchvision.transforms as T

class ContrailsDataset(torch.utils.data.Dataset):
    def __init__(self, df, image_size=256, train=True):

        self.df = df
        self.trn = train
        self.normalize_image = T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
        self.image_size = image_size
        if image_size != 256:
            self.resize_image = T.Resize(image_size)

    def __getitem__(self, index):
        row = self.df.iloc[index]
        con_path = row.path
        con = np.load(str(con_path))

        img = con[..., :-1]
        label = con[..., -1]

        label = torch.tensor(label)

        img = torch.tensor(np.reshape(img, (256, 256, 3))).to(torch.float32).permute(2, 0, 1)

        if self.image_size != 256:
            img = self.resize_image(img)

        img = self.normalize_image(img)

        return img.float(), label.float()

    def __len__(self):
        return len(self.df)

## Ensemble Part

### Model One

The first model is wrapped in a custom PyTorch Lightning module called `LightningModule_Unet_resnest_101`, which holds a U-Net with a ResNeSt-101 encoder together with its validation logic. U-Net is a commonly used architecture for image segmentation tasks.

1. In the `__init__` method, the U-Net model is defined with "timm-resnest101e" as the encoder, without pretrained weights. The model takes 3-channel images as input and outputs a single-channel mask. No output activation is applied (`activation=None`), so the model returns raw logits. The Dice loss in binary mode, with a smoothing factor of 1.0, is used as the segmentation loss. Two lists, `val_step_outputs` and `val_step_labels`, are initialized to store predictions and labels during validation.

2. The `forward` method passes an input batch through the model and returns the model's output.

3. In the `validation_step` method, the model's forward pass is run on the input images, the output is bilinearly interpolated to 256x256 (this step always runs, whatever the input size), the loss against the ground truth labels is computed, and the loss is logged. The predictions and labels are stored for use at the end of the validation epoch.

4. The `on_validation_epoch_end` method concatenates the stored predictions and labels, applies a sigmoid to turn the logits into probabilities, and computes the Dice score, a common metric for image segmentation tasks. The score is logged. On the main process in a distributed setting (i.e., `self.trainer.global_rank == 0`), the current epoch number is printed.
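To make the metric concrete, here is a minimal pure-Python sketch of a binary Dice score on flat lists. It is an illustration only: the function name `dice_score` and the 0.5 threshold are assumptions, and `torchmetrics.functional.dice` has its own averaging and thresholding options that may give different numbers.

```python
def dice_score(probs, targets, threshold=0.5):
    """Binary Dice = 2*TP / (2*TP + FP + FN) for probabilities and 0/1 labels."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, t in zip(preds, targets) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, targets) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, targets) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    # With no positives in either prediction or target, return 0.0 by convention here.
    return 2 * tp / denom if denom > 0 else 0.0


probs = [0.9, 0.8, 0.2, 0.6, 0.1]
targets = [1, 0, 0, 1, 1]
score = dice_score(probs, targets)
print(f"Dice: {score:.4f}")
```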

In [5]:
import lightning as l

class LightningModule_Unet_resnest_101(l.LightningModule):

    def __init__(self):
        super().__init__()
        self.model = smp.Unet(encoder_name="timm-resnest101e",
                              encoder_weights=None,
                              in_channels=3,
                              classes=1,
                              activation=None,
                              )
        self.loss_module = smp.losses.DiceLoss(mode="binary", smooth=1.0)
        self.val_step_outputs = []
        self.val_step_labels = []


    def forward(self, batch):
        return self.model(batch)
    
    def validation_step(self, batch, batch_idx):
        imgs, labels = batch
        preds = self.model(imgs)
        preds = torch.nn.functional.interpolate(preds, size=256, mode='bilinear')
        loss = self.loss_module(preds, labels)
        self.log("val_loss", loss, on_step=False, on_epoch=True, prog_bar=True)
        self.val_step_outputs.append(preds)
        self.val_step_labels.append(labels)

    def on_validation_epoch_end(self):
        all_preds = torch.cat(self.val_step_outputs)
        all_labels = torch.cat(self.val_step_labels)
        all_preds = torch.sigmoid(all_preds)
        self.val_step_outputs.clear()
        self.val_step_labels.clear()
        val_dice = dice(all_preds, all_labels.long())
        self.log("val_dice", val_dice, on_step=False, on_epoch=True, prog_bar=True)
        if self.trainer.global_rank == 0:
            print(f"\nEpoch: {self.current_epoch}", flush=True)

### Model Two

The second model is wrapped in a PyTorch Lightning module called `LightningModule_DeepLabV3Plus`, which holds a DeepLabV3+ model for semantic image segmentation.

1. In the `__init__` method, the DeepLabV3+ model is defined with "tu-resnest26d" as the encoder, without pretrained weights. The model takes 3-channel images as input and outputs a single-channel mask. No output activation is applied (`activation=None`), so the model returns raw logits. The Dice loss in binary mode, with a smoothing factor of 1.0, is used as the loss function. Two lists, `val_step_outputs` and `val_step_labels`, are initialized to store predictions and labels during validation.

2. The `forward` method returns the output from passing the input batch through the model.

3. In the `validation_step` method, a forward pass is run on the input images, the output is bilinearly interpolated to 256x256 (this step always runs), and the loss against the ground truth labels is computed and logged. The predictions and labels are stored for use at the end of the validation epoch.

4. The `on_validation_epoch_end` method concatenates all the predictions and labels collected during validation, computes the sigmoid function on the predictions to convert them into probabilities, and then computes the Dice score, a popular metric for segmentation tasks. The calculated Dice score is logged. If the current process is the main one in distributed computing settings (i.e., `self.trainer.global_rank == 0`), the current epoch number is printed.
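The role of the smoothing factor in the Dice loss can be sketched in pure Python. This shows one common form of the smoothed soft Dice loss; it is not a copy of the `smp.losses.DiceLoss` implementation, which may differ in details, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def soft_dice_loss(logits, targets, smooth=1.0):
    """1 - (2*sum(p*t) + smooth) / (sum(p) + sum(t) + smooth) on flat lists."""
    probs = [sigmoid(z) for z in logits]
    intersection = sum(p * t for p, t in zip(probs, targets))
    cardinality = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (cardinality + smooth)


# Confident, correct predictions give a loss near 0.
good = soft_dice_loss([10.0, -10.0, 10.0], [1, 0, 1])
# Confident, wrong predictions give a much larger loss.
bad = soft_dice_loss([10.0, -10.0], [0, 1])
# An empty mask predicted as empty stays near 0 thanks to the smoothing term.
empty = soft_dice_loss([-10.0, -10.0], [0, 0])
print(good, bad, empty)
```

Without the smoothing term, the empty-mask case would divide by a value close to zero; with `smooth=1.0` the ratio tends to 1 and the loss to 0.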

In [6]:
class LightningModule_DeepLabV3Plus(l.LightningModule):

    def __init__(self):
        super().__init__()
        self.model = smp.DeepLabV3Plus(encoder_name="tu-resnest26d", 
                              encoder_weights=None,
                              in_channels=3,
                              classes=1,
                              activation=None,
                              )

        self.loss_module = smp.losses.DiceLoss(mode="binary", smooth=1.0)
        self.val_step_outputs = []
        self.val_step_labels = []


    def forward(self, batch):
        return self.model(batch)
    
    def validation_step(self, batch, batch_idx):
        imgs, labels = batch
        preds = self.model(imgs)
        preds = torch.nn.functional.interpolate(preds, size=256, mode='bilinear')
        loss = self.loss_module(preds, labels)
        self.log("val_loss", loss, on_step=False, on_epoch=True, prog_bar=True)
        self.val_step_outputs.append(preds)
        self.val_step_labels.append(labels)

    def on_validation_epoch_end(self):
        all_preds = torch.cat(self.val_step_outputs)
        all_labels = torch.cat(self.val_step_labels)
        all_preds = torch.sigmoid(all_preds)
        self.val_step_outputs.clear()
        self.val_step_labels.clear()
        val_dice = dice(all_preds, all_labels.long())
        self.log("val_dice", val_dice, on_step=False, on_epoch=True, prog_bar=True)
        if self.trainer.global_rank == 0:
            print(f"\nEpoch: {self.current_epoch}", flush=True)

### Ensembling

From https://github.com/Lightning-AI/lightning/discussions/7249 (averaging/stacking with n pretrained models).

This approach can be used to implement simple weighted-average ensembling of predictions.

The next cell defines a PyTorch Lightning module named `MyEnsemble`, which wraps an ensemble of two previously trained models.

1. In the `__init__` method, the two models and their corresponding weights are passed in. These models are assigned to `self.modelA` and `self.modelB`, and the weights are assigned to `self.weight_model_one` and `self.weight_model_two`. Both models are frozen to prevent changes during further training. The Dice loss function is selected as the loss function for this ensemble model with a smoothing factor of 1.0. The hyperparameters (except for the two models) are saved. Two lists, `val_step_outputs` and `val_step_labels`, are also initialized for storing predictions and labels during validation, respectively.

2. The `configure_optimizers` method returns an Adam optimizer with a learning rate of 1e-3. Since both sub-models are frozen and the ensemble weights are plain numbers, the module has no trainable parameters, and this notebook only runs validation.

3. The `forward` method takes an input `x`, runs it through both models (`modelA` and `modelB`), multiplies each raw output (logits) by its model weight, and sums the results to produce the final output.

4. In the `validation_step` method, the ensemble's forward pass is run on the input images, the output is bilinearly interpolated to 256x256 to match the labels, and the Dice loss against the ground truth labels is computed and logged. The predictions and labels are stored for use at the end of the validation epoch.

5. The `on_validation_epoch_end` method concatenates all the predictions and labels collected during the validation, applies the sigmoid function on the predictions to convert them into probabilities, and calculates the Dice score, a common metric for segmentation tasks. The calculated Dice score is logged. If the current process is the main one in distributed computing settings (i.e., `self.trainer.global_rank == 0`), the current epoch number is printed.
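The weighted combination in `forward` can be illustrated on plain numbers. This is a minimal sketch with made-up logits for three pixels; the function names are hypothetical and the real module works on tensors.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def ensemble_logits(logits_a, logits_b, weight_a=0.5, weight_b=0.5):
    """Element-wise weighted sum of two models' raw outputs."""
    return [weight_a * a + weight_b * b for a, b in zip(logits_a, logits_b)]


# Made-up logits from two models for the same three pixels.
logits_a = [2.0, -1.0, 0.5]
logits_b = [0.0, -3.0, 1.5]

combined = ensemble_logits(logits_a, logits_b)
# The sigmoid is applied once, after combining, as in on_validation_epoch_end.
probs = [sigmoid(z) for z in combined]
print(combined, probs)
```

With weights that sum to 1 the result is a weighted average of the logits; other weights rescale the combined logits and therefore shift the probabilities.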

In [7]:
class MyEnsemble(l.LightningModule):
    def __init__(self, model_one, model_two, weight_model_one, weight_model_two):
        super(MyEnsemble, self).__init__()
        self.modelA = model_one
        self.modelB = model_two
        self.weight_model_two = weight_model_two
        self.weight_model_one = weight_model_one
        self.modelA.freeze()
        self.modelB.freeze()
        self.loss_module = smp.losses.DiceLoss(mode="binary", smooth=1.0)
        self.save_hyperparameters(ignore=['model_one','model_two'])
        self.val_step_outputs = []
        self.val_step_labels = []

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer

    def forward(self, x):
        x1 = self.modelA(x)
        x2 = self.modelB(x)
        result = x1*self.weight_model_one + x2*self.weight_model_two
        return result
    
    def validation_step(self, batch, batch_idx):
        imgs, labels = batch
        preds = self.forward(imgs)
        preds = torch.nn.functional.interpolate(preds, size=256, mode='bilinear')
        loss = self.loss_module(preds, labels)
        self.log("val_loss", loss, on_step=False, on_epoch=True, prog_bar=True)
        self.val_step_outputs.append(preds)
        self.val_step_labels.append(labels)

    def on_validation_epoch_end(self):
        all_preds = torch.cat(self.val_step_outputs)
        all_labels = torch.cat(self.val_step_labels)
        all_preds = torch.sigmoid(all_preds)
        self.val_step_outputs.clear()
        self.val_step_labels.clear()
        val_dice = dice(all_preds, all_labels.long())
        self.log("val_dice", val_dice, on_step=False, on_epoch=True, prog_bar=True)
        if self.trainer.global_rank == 0:
            print(f"\nEpoch: {self.current_epoch}", flush=True)

This last block first loads the two trained models from saved checkpoints: the U-Net with a ResNeSt-101 encoder (`model_resnest_101`) and the DeepLabV3+ model (`Deeplab`).

It then combines them into an ensemble (`model`) using the `MyEnsemble` class. Both weights are set to 0.5, so the ensemble's output is the average of the two models' raw outputs (logits).

Next, it prepares the validation dataset:

1. It reads the validation dataframe from a CSV file and builds the full path of each `.npy` file by prepending the directory path to the record ID and appending the `.npy` extension.

2. This dataframe is then used to instantiate an object of the `ContrailsDataset` class, which creates a PyTorch dataset for validation. The image size is set to 384 and the `train` flag is set to `False`, indicating this is for validation/testing, not for training.

3. A DataLoader is then created for the validation dataset, with a batch size of 40. The DataLoader is set not to shuffle the data (since this is for validation, not training), and to use 2 worker processes for data loading.

Finally, a PyTorch Lightning `Trainer` is created, and the `validate` method is called on it with the ensemble model and the validation dataloader, performing validation on the ensemble model with the validation dataset. The validation results will include the logged metrics (e.g., validation loss and Dice score) printed to the console or saved in the Lightning logs.

In [8]:
# load_from_checkpoint is a classmethod, so call it on the class rather than an instance.
model_resnest_101 = LightningModule_Unet_resnest_101.load_from_checkpoint("/kaggle/input/checkpoints-unet-resnest101e/model.ckpt")
Deeplab = LightningModule_DeepLabV3Plus.load_from_checkpoint("/kaggle/input/checkpoint-deeplabv3plus/model (1).ckpt")

model = MyEnsemble(model_resnest_101, Deeplab, 0.5, 0.5)

valid_df = pd.read_csv("/kaggle/input/contrails-images-ash-color/valid_df.csv")
valid_df["path"] = "/kaggle/input/contrails-images-ash-color/contrails/" + valid_df["record_id"].astype(str) + ".npy"

dataset_validation = ContrailsDataset(valid_df, 384, train=False)
data_loader_validation = DataLoader(
    dataset_validation,
    batch_size=40,
    shuffle=False,
    num_workers=2,
)

trainer = l.Trainer()
# trainer.validate(model_resnest_101, dataloaders=data_loader_validation)
trainer.validate(model, dataloaders=data_loader_validation)

INFO: GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: IPU available: False, using: 0 IPUs
INFO: HPU available: False, using: 0 HPUs
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Validation: 0it [00:00, ?it/s]




Epoch: 0


[{'val_loss': 0.5468075275421143, 'val_dice': 0.4649423360824585}]

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

## Ideas

Try adding/weighting models before softmax
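One way to read this idea is to compare combining the models' raw outputs before the activation with combining their probabilities after it. The sketch below does this for a single pixel with made-up logits; it uses a sigmoid because the models here are binary, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def combine_before_activation(logit_a, logit_b, w_a=0.5, w_b=0.5):
    """Weight the logits first, then apply the sigmoid once."""
    return sigmoid(w_a * logit_a + w_b * logit_b)


def combine_after_activation(logit_a, logit_b, w_a=0.5, w_b=0.5):
    """Apply the sigmoid to each model's logit, then weight the probabilities."""
    return w_a * sigmoid(logit_a) + w_b * sigmoid(logit_b)


# Made-up case: model A is confident the pixel is positive, model B mildly disagrees.
logit_a, logit_b = 4.0, -1.0
before = combine_before_activation(logit_a, logit_b)
after = combine_after_activation(logit_a, logit_b)
print(f"before activation: {before:.4f}, after activation: {after:.4f}")
```

The two schemes generally give different probabilities, so which one scores better on validation is an empirical question.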