# **Mini-Project B2 Phase 2: Semantic Segmentation on Lung CT Scans**

In this lab, you will be segmenting CT scan images of lungs. We will be using a combination of the [LUng Nodule Analysis 2016 (LUNA) Competition dataset](https://luna16.grand-challenge.org/) and [Kaggle Data Science Bowl 2017 dataset](https://www.kaggle.com/c/data-science-bowl-2017).

The downloaded dataset can be found here: [Finding and Measuring Lungs in CT Data](https://www.kaggle.com/datasets/kmader/finding-lungs-in-ct-data).

**Problem Motivation:** Medical diagnosis and research often rely on the analysis of medical images to identify lesions and anomalies. In the context of lung cancer diagnosis, analyzing CT images is particularly challenging due to the complexity of grayscale medical scans. Accurate segmentation of the lungs is a critical first step in detecting potential abnormalities and improving disease detection.

**Lung CT Dataset Samples:**

    Number of Samples:
    ・266 2D Scans + 266 Corresponding Masks
    ・4 3D Scans + 4 Corresponding Masks
    ・lung_stats.csv



**Lung CT Annotated Data:**



Corresponding annotated data is stored in `lung_stats.csv`. We will explore it later in the lab.

    

# **1. Finding and Measuring Lungs in CT Dataset**

### Imports

In [None]:
import torch
from torchvision import transforms
from torch.utils.data import DataLoader, Dataset
from PIL import Image
import os
import time
import numpy as np

### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')



In [None]:
image_dir = '/content/drive/Shareddrives/MiniProject-B1/Lung_CT/2d_images'
mask_dir = '/content/drive/Shareddrives/MiniProject-B1/Lung_CT/2d_masks'


## **1.1 Dataset Creation**

As with the Stanford dataset in Lab 3 Phase 3, you will need to create a class that inherits from ```Dataset``` and overrides its ```__init__()```, ```__len__()```, and ```__getitem__()``` methods to load the data.

In [None]:
class LungSegmentationDataset(Dataset):
    def __init__(self, image_dir, mask_dir, transform=None, mask_transform=None):
        """
        Args:
            image_dir (str): Directory with images.
            mask_dir (str): Directory with masks.
            transform (callable, optional): Transformations for images.
            mask_transform (callable, optional): Transformations for masks.
        """

        '''

        TODO

        '''

    def __len__(self):
        ''' TODO '''

    def __getitem__(self, idx):
        ''' TODO '''

        return image, mask
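
As a rough illustration of the three methods, here is a minimal sketch of a paired image/mask dataset. The class name `PairedImageMaskDataset` is hypothetical, and the sketch assumes that each image has a mask with the identical filename and that PIL can open the files and convert them with `convert("RGB")`. HU-valued TIFFs may need explicit rescaling first, which is left out here.

```python
import os

from PIL import Image
from torch.utils.data import Dataset


class PairedImageMaskDataset(Dataset):
    """Hypothetical example: pairs image and mask files that share a filename."""

    def __init__(self, image_dir, mask_dir, transform=None, mask_transform=None):
        self.image_dir = image_dir
        self.mask_dir = mask_dir
        self.transform = transform
        self.mask_transform = mask_transform
        # Assumption: a mask has exactly the same filename as its image.
        mask_names = set(os.listdir(mask_dir))
        self.filenames = sorted(f for f in os.listdir(image_dir) if f in mask_names)

    def __len__(self):
        # Number of (image, mask) pairs found on disk.
        return len(self.filenames)

    def __getitem__(self, idx):
        name = self.filenames[idx]
        # Three channels for the pretrained encoder, one channel for the mask.
        image = Image.open(os.path.join(self.image_dir, name)).convert("RGB")
        mask = Image.open(os.path.join(self.mask_dir, name)).convert("L")
        if self.transform is not None:
            image = self.transform(image)
        if self.mask_transform is not None:
            mask = self.mask_transform(mask)
        return image, mask
```

Sorting the filenames keeps the index-to-file mapping stable between runs. If you add random geometric augmentations, the same transformation has to be applied to the image and its mask, which this sketch does not handle.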


Similar to Lab 3, we will use augmentations from the [`torchvision.transforms`](https://pytorch.org/vision/0.9/transforms.html) library.

You may apply your own augmentations if you wish, but be careful: because the dataset is small, augmentations that are too strong may hurt model performance.

In [None]:
image_transform = transforms.Compose([
    transforms.Resize((512, 512)),  # Fixed size; height and width must be divisible by 32 for the U-Net encoder
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

mask_transform = transforms.Compose([
    transforms.Resize((512, 512), interpolation=transforms.InterpolationMode.NEAREST),  # Nearest-neighbor keeps mask values binary
    transforms.ToTensor()
])
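
The mask transform uses nearest-neighbor interpolation because smoother interpolation modes blend neighboring pixels and produce values that are neither background nor lung. The sketch below illustrates this on a toy 4x4 mask, using PIL's `resize` directly rather than the `torchvision` transform; the toy mask is made up for the illustration.

```python
import numpy as np
from PIL import Image

# Toy 4x4 binary mask stored as 0/255, standing in for a lung mask.
toy_mask = np.zeros((4, 4), dtype=np.uint8)
toy_mask[:, 2:] = 255
mask_img = Image.fromarray(toy_mask)

# Upscale with two different resampling filters.
nearest = np.array(mask_img.resize((8, 8), resample=Image.NEAREST))
bilinear = np.array(mask_img.resize((8, 8), resample=Image.BILINEAR))

# Nearest-neighbor only copies existing values; bilinear adds in-between values at the edge.
print("nearest values: ", np.unique(nearest))
print("bilinear values:", np.unique(bilinear))
```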

In [None]:
from torch.utils.data import random_split

# Since the dataset is small and images are large, we will use a small batch size for this lab
BATCH_SIZE = 4

# Initialize dataset
dataset = LungSegmentationDataset(
    image_dir=image_dir,
    mask_dir=mask_dir,
    transform=image_transform,
    mask_transform=mask_transform
)

# Split dataset into training and validation, decide on split % yourself!
train_size = ''' TODO '''
val_size = ''' TODO '''
train_dataset, val_dataset = random_split(''' TODO ''')

# Create DataLoaders
train_loader = ''' TODO '''
val_loader = ''' TODO '''

# Check the sizes of the splits
print(f"Training samples: {len(train_dataset)}, Validation samples: {len(val_dataset)}")
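
For reference, here is a minimal sketch of a reproducible split on a stand-in dataset. The 80/20 ratio, the seed, and the `toy_*` names are assumptions made for the illustration; you are free to choose a different split.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in dataset with as many samples as there are 2D scans (266).
toy_dataset = TensorDataset(torch.zeros(266, 1, 8, 8), torch.zeros(266, 1, 8, 8))

train_size = int(0.8 * len(toy_dataset))   # assumed 80% for training
val_size = len(toy_dataset) - train_size   # remainder for validation

# A seeded generator makes the random split repeatable across runs.
generator = torch.Generator().manual_seed(42)
train_subset, val_subset = random_split(
    toy_dataset, [train_size, val_size], generator=generator
)

# Shuffle only the training data; validation order does not matter.
train_batches = DataLoader(train_subset, batch_size=4, shuffle=True)
val_batches = DataLoader(val_subset, batch_size=4, shuffle=False)

print(len(train_subset), len(val_subset))
```

Computing `val_size` as the remainder guarantees that the two sizes add up to the dataset length, which `random_split` requires when given integer lengths.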


## **1.3 Visualize Dataset**

Let's first take a look at the 2D CT Scans and the masks in the datasets before segmentation. Below, display 4 CT images and their corresponding masks.

*Note:* This dataset does not contain classes! Display any 4 random images.

In [None]:
'''

TODO

'''
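
One possible layout for this display is a 2 x 4 grid with the scans in the top row and the masks below them. The sketch uses random arrays as stand-ins for the real data, and the helper name `plot_image_mask_pairs` is hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_image_mask_pairs(images, masks):
    """Plot each image (top row) above its mask (bottom row)."""
    n = len(images)
    fig, axes = plt.subplots(2, n, figsize=(3 * n, 6), squeeze=False)
    for col, (image, mask) in enumerate(zip(images, masks)):
        axes[0, col].imshow(image, cmap="gray")
        axes[0, col].set_title(f"CT scan {col + 1}")
        axes[1, col].imshow(mask, cmap="gray")
        axes[1, col].set_title(f"Mask {col + 1}")
    for ax in axes.ravel():
        ax.axis("off")  # hide ticks, which carry no meaning for images
    fig.tight_layout()
    return fig


# Synthetic stand-ins: random "scans" and thresholded "masks".
rng = np.random.default_rng(0)
toy_images = [rng.normal(size=(64, 64)) for _ in range(4)]
toy_masks = [(img > 0).astype(np.uint8) for img in toy_images]
fig = plot_image_mask_pairs(toy_images, toy_masks)
```

Tensors coming out of the `DataLoader` have shape `(C, H, W)`, so they would need to be converted to `(H, W)` or `(H, W, C)` arrays before being passed to `imshow`.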

# **2. Train Segmentation Model**

You will train a deep learning model for lung segmentation using a U-Net architecture with a ResNet34 backbone. U-Net is widely used in medical image analysis, and the encoder here loads weights pretrained on ImageNet to improve performance and speed up training. You can learn more about the available models in the official [`segmentation_models_pytorch`](https://github.com/qubvel-org/segmentation_models.pytorch) documentation.

Please import the library and create your model as follows:

```python
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="resnet34",      # Use a ResNet34 backbone
    encoder_weights="imagenet",  # Pretrained on ImageNet
    in_channels=3,               # RGB images
    classes=1                    # Single-class segmentation (binary mask)
)
```



## **2.1 Install and Run Model**

### **2.1.1 Install Required Libraries**

Before starting, ensure you have the necessary libraries installed:

In [None]:
#!pip install segmentation-models-pytorch
import segmentation_models_pytorch as smp

### **2.1.2 Train Model**

Import and load your model, then train it for 5 to 10 epochs.

Please print the epoch and training loss per epoch:

In [None]:
'''

TODO

'''
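
The general shape of such a training loop is sketched below. To keep the example small and fast, a single convolution layer and random tensors stand in for the U-Net and the CT data. The loss (`BCEWithLogitsLoss`), the optimizer (Adam), the learning rate, and all `toy_*` names are assumptions for illustration, not required choices.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, criterion, optimizer, device):
    """Run one pass over the loader and return the mean per-batch loss."""
    model.train()
    running_loss = 0.0
    for images, masks in loader:
        images, masks = images.to(device), masks.to(device)
        optimizer.zero_grad()
        logits = model(images)           # raw scores, shape (N, 1, H, W)
        loss = criterion(logits, masks)  # the loss applies the sigmoid internally
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    return running_loss / len(loader)


torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-ins: random "images", random binary "masks", one conv layer as the "model".
toy_images = torch.rand(8, 3, 32, 32)
toy_masks = (torch.rand(8, 1, 32, 32) > 0.5).float()
toy_loader = DataLoader(TensorDataset(toy_images, toy_masks), batch_size=4, shuffle=True)
toy_model = nn.Conv2d(3, 1, kernel_size=3, padding=1).to(device)

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-3)

epoch_losses = []
for epoch in range(1, 4):
    epoch_loss = train_one_epoch(toy_model, toy_loader, criterion, optimizer, device)
    epoch_losses.append(epoch_loss)
    print(f"Epoch {epoch}: training loss = {epoch_loss:.4f}")
```

With a single output class the model returns one logit per pixel, so the mask tensors need to be floats with the same `(N, 1, H, W)` shape as the logits.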

## **2.2 Visualize Model Results**

Create a function to pull a random image and visualize results of the model below. Please display:



*   Original CT Scan
*   Ground Truth Corresponding Mask
*   Segmentation Mask Results
*   Segmentation Overlay on Original CT Scan

Please plot all 4 images together with titles for easy visualization.
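
Two small steps are needed before plotting: turning the raw model output into a binary mask, and blending that mask onto the scan. The NumPy sketch below shows one way to do both. The helper names, the 0.5 threshold, the red tint, and the blending weight are illustrative assumptions.

```python
import numpy as np


def logits_to_mask(logits, threshold=0.5):
    """Apply a sigmoid to raw model outputs and threshold to a 0/1 mask."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return (probs > threshold).astype(np.uint8)


def overlay_mask(gray, mask, color=(255, 0, 0), alpha=0.4):
    """Blend a binary mask onto a grayscale image; returns an RGB uint8 array."""
    gray = gray.astype(np.float32)
    span = gray.max() - gray.min()
    # Rescale intensities to [0, 1] for display (guarding against a flat image).
    norm = (gray - gray.min()) / span if span > 0 else np.zeros_like(gray)
    rgb = np.stack([norm * 255.0] * 3, axis=-1)
    tint = np.array(color, dtype=np.float32)
    lung = mask.astype(bool)
    # Alpha-blend the tint into lung pixels only; background stays untouched.
    rgb[lung] = (1.0 - alpha) * rgb[lung] + alpha * tint
    return rgb.astype(np.uint8)


# Toy 2x2 example: made-up logits and intensities.
toy_logits = np.array([[-3.0, 2.0], [0.0, 5.0]])
toy_scan = np.array([[-1000.0, -500.0], [0.0, 1000.0]])
toy_pred = logits_to_mask(toy_logits)
toy_overlay = overlay_mask(toy_scan, toy_pred)
print(toy_pred)
print(toy_overlay.shape)
```

The resulting `(H, W, 3)` array can be passed directly to `plt.imshow` as the fourth panel.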



In [None]:
import random
import matplotlib.pyplot as plt
from PIL import Image
import torch
import numpy as np

def visualize_random_prediction(image_dir, mask_dir, model, transform, device):
    """
    Visualize a random image and mask from the dataset, along with model predictions.

    Args:
        image_dir (str): Path to the directory containing the images.
        mask_dir (str): Path to the directory containing the masks.
        model: Trained PyTorch model for segmentation.
        transform: Transformations to apply to the images before inference.
        device: Device (CPU or GPU) to run the model inference on.
    """

    '''

    TODO

    '''

    return image, mask, pred_mask, img_file


In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

visualize_random_prediction(image_dir, mask_dir, model, image_transform, device)


## **2.3 Statistical Analysis from Segmentation Masks**

Segmentation plays a critical role in the medical field as it allows us to derive valuable information from images automatically. By analyzing segmentation masks, we can extract meaningful statistics that help assess regions of interest, such as lung areas, in CT scans.

In this section, you will calculate statistics from segmentation masks and compare predictions with ground truth data to evaluate the accuracy of the model.

**Hounsfield Unit:**
A Hounsfield Unit (HU) is a measure of radiodensity in a CT scan: it represents how strongly a tissue attenuates X-rays. Air is assigned a value of -1000 HU, water is 0 HU, and dense bone is typically around +1000 HU or higher. For the statistics below, use the pixel intensity values directly from the CT scan, as they are usually already in Hounsfield Units.
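
For reference, the HU scale is a linear rescaling of the measured linear attenuation coefficient $\mu$, anchored at water and air:

$$
\mathrm{HU} = 1000 \times \frac{\mu - \mu_{\text{water}}}{\mu_{\text{water}} - \mu_{\text{air}}}
$$

Substituting $\mu = \mu_{\text{water}}$ gives 0 HU and $\mu = \mu_{\text{air}}$ gives $-1000$ HU, which matches the reference values for water and air.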

**Statistics From Segmentation:**

* `Lung Area in Pixels (lung_area_px)`:
The total number of pixels in the segmentation mask where the lung region is identified.

* `Lung Volume Fraction (lung_volume_fraction)`:
The fraction of lung pixels compared to the total image pixels. This provides an estimate of the proportion of the image occupied by the lung region.

* `Mean Hounsfield Unit (lung_mean_hu)`:
The average intensity value (Hounsfield Unit) inside the lung region. This gives a sense of the lung density, which can indicate abnormalities.

* `95th Percentile Hounsfield Unit (lung_pd95_hu)`:
The value below which 95% of the intensity values in the lung region fall. This statistic can help identify bright regions, such as nodules or lesions.

* `5th Percentile Hounsfield Unit (lung_pd05_hu)`:
The value below which 5% of the intensity values in the lung region fall. This can help identify dark areas, such as air pockets or hollow spaces.

**Write a Function to Calculate Statistics:**

Implement a Python function that calculates the above statistics for a given mask and corresponding CT image. Use the following steps:

1. Count the pixels in the mask where the lung is segmented, and divide by the total number of pixels in the image to get the volume fraction.
2. Compute the mean, 95th percentile, and 5th percentile Hounsfield Units for the lung region.
3. Return all five values as a tuple.
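
The steps above can be sketched in NumPy as follows. The function name `lung_stats_sketch` and the toy arrays are made up for the illustration, and the sketch assumes that the mask and the CT slice have the same shape and that any nonzero mask value means lung.

```python
import numpy as np


def lung_stats_sketch(mask, ct_slice):
    """Illustrative statistics for the lung region selected by a binary mask."""
    lung = mask.astype(bool)                       # nonzero mask values count as lung
    lung_area_px = int(lung.sum())                 # number of lung pixels
    lung_volume_fraction = lung_area_px / lung.size
    if lung_area_px == 0:
        # An empty mask has no intensities to summarize.
        return 0, 0.0, float("nan"), float("nan"), float("nan")
    lung_values = ct_slice[lung]                   # intensities inside the lung only
    lung_mean_hu = float(lung_values.mean())
    lung_pd95_hu = float(np.percentile(lung_values, 95))
    lung_pd05_hu = float(np.percentile(lung_values, 5))
    return lung_area_px, lung_volume_fraction, lung_mean_hu, lung_pd95_hu, lung_pd05_hu


# Toy 2x2 slice with made-up HU values; three of the four pixels are "lung".
toy_ct = np.array([[-1000, -800], [-600, 40]])
toy_mask = np.array([[1, 1], [1, 0]])
toy_stats = lung_stats_sketch(toy_mask, toy_ct)
print(toy_stats)
```

Boolean indexing (`ct_slice[lung]`) restricts the mean and percentiles to lung pixels, so the bright non-lung pixel in the toy slice does not affect them.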

**Apply the Function:**

- Calculate the statistics for the predicted mask using the `calculate_statistics` function.

- Then, calculate the statistics for the ground truth mask using the same function.

- Compare the statistics calculated from the predicted mask with the ground truth statistics (from the CSV) and the statistics calculated from the ground truth mask.
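
One way to organize this three-way comparison is a small `pandas` table with one row per metric. In the sketch below, every number is a made-up placeholder rather than a value from the dataset, and the helper name and column labels are hypothetical.

```python
import pandas as pd

metric_names = [
    "lung_area_px",
    "lung_volume_fraction",
    "lung_mean_hu",
    "lung_pd95_hu",
    "lung_pd05_hu",
]


def build_comparison(pred_stats, mask_stats, csv_stats):
    """Put three 5-tuples of statistics side by side, one row per metric."""
    df = pd.DataFrame(
        {
            "predicted_mask": pred_stats,
            "ground_truth_mask": mask_stats,
            "csv": csv_stats,
        },
        index=metric_names,
    )
    # Signed percentage difference of the prediction relative to the CSV value.
    df["pred_vs_csv_pct"] = 100.0 * (df["predicted_mask"] - df["csv"]) / df["csv"].abs()
    return df


# Placeholder numbers for illustration only.
toy_pred = (1000, 0.25, -700.0, -300.0, -950.0)
toy_mask_stats = (1100, 0.275, -710.0, -310.0, -955.0)
toy_csv = (1100, 0.275, -710.0, -310.0, -955.0)
toy_comparison = build_comparison(toy_pred, toy_mask_stats, toy_csv)
print(toy_comparison)
```

Dividing by the absolute CSV value keeps the sign of the difference meaningful for the HU metrics, which are usually negative inside the lungs.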

In [None]:
import os
import random
import matplotlib.pyplot as plt
from PIL import Image
import torch
import numpy as np
import pandas as pd


def calculate_statistics(mask, ct_slice):
    """
    Calculate statistical metrics from a segmentation mask and corresponding CT slice.

    Args:
        mask (np.ndarray): Binary segmentation mask (0 for background, 1 for lung).
        ct_slice (np.ndarray): CT slice with intensity values (Hounsfield Units).

    Returns:
        tuple: (lung_area_px, lung_volume_fraction, lung_mean_hu, lung_pd95_hu, lung_pd05_hu)
    """

    '''

    TODO

    '''

    return lung_area_px, lung_volume_fraction, lung_mean_hu, lung_pd95_hu, lung_pd05_hu


def visualize_stats(image_dir, mask_dir, csv_path, model, transform, mask_transform, device):
    """
    Visualize a random image and mask, predict the segmentation mask, calculate statistics,
    and compare the results with the ground truth statistics from a CSV file.

    Args:
        image_dir (str): Path to the directory containing the images.
        mask_dir (str): Path to the directory containing the masks.
        csv_path (str): Path to the CSV file containing ground truth statistics.
        model: Trained PyTorch model for segmentation.
        transform: Transformations to apply to the images before inference.
        mask_transform: Transformations to apply to the mask before calculations.
        device: Device (CPU or GPU) to run the model inference on.
    """

    '''

    TODO

    '''


    return comparison_df



In [None]:
comparison_df = visualize_stats(''' TODO ''')

# Visualize results!
comparison_df


## **2.4 Report**

Compare the predicted statistics (from the predicted mask) with the ground truth statistics (from the CSV).

How close are the predictions to the ground truth?

Are there significant differences for specific metrics (e.g., lung area, mean HU)?

In your report, answer the above questions and reflect on the data you've collected in your calculated statistics.

ANSWER HERE

___
___
# End of MiniProject B2 Phase 2 😊🥳