# Analysis of Segmentation Model Training

This document summarizes the key findings and performance analysis across three distinct iterations of the multi-class lung abnormality segmentation model. The goal of this iterative process is to systematically improve the model's ability to accurately segment healthy lung tissue, COVID-19 infections, and non-COVID infections from chest X-ray images.

## Project Overview and Dataset

- **Task:** Multi-class semantic segmentation of chest X-rays, classifying each pixel into one of four categories:
    - **Class 0:** Background
    - **Class 1:** Healthy Lung Tissue
    - **Class 2:** COVID-19 Infection
    - **Class 3:** Non-COVID Infection
- **Motivation:** Practical application of deep learning concepts to a real-world medical imaging problem, aiming to develop a diagnostic tool for clinicians.
- **Dataset:** COVID-QU-Ex Dataset
    - Diverse collection of images
    - Pixel-level annotations for lung boundaries and infection regions
    - Distinct classes for COVID-19 and non-COVID infections
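
The four-class scheme above implies that the separate lung and infection annotations are merged into a single label map per image. The document does not describe how this was done, so the sketch below is a hypothetical helper. It assumes binary lung and infection masks as NumPy arrays, plus an image-level decision about which infection class (COVID-19 or non-COVID) applies, and it lets infection pixels overwrite lung pixels.

```python
import numpy as np
from typing import Optional

# Class indices from the task definition above.
BACKGROUND, HEALTHY_LUNG, COVID, NON_COVID = 0, 1, 2, 3
NUM_CLASSES = 4


def build_label_map(lung_mask: np.ndarray,
                    infection_mask: np.ndarray,
                    infection_class: Optional[int]) -> np.ndarray:
    """Merge binary masks into one label map (hypothetical helper).

    infection_class is COVID or NON_COVID for infected images, or None
    when the image has no infection annotation.
    """
    labels = np.full(lung_mask.shape, BACKGROUND, dtype=np.int64)
    labels[lung_mask > 0] = HEALTHY_LUNG
    if infection_class is not None:
        # Assumption: annotated infection pixels take precedence over lung pixels.
        labels[infection_mask > 0] = infection_class
    return labels


def one_hot(labels: np.ndarray, num_classes: int = NUM_CLASSES) -> np.ndarray:
    """Convert an (H, W) label map to (H, W, num_classes) one-hot targets."""
    return np.eye(num_classes, dtype=np.float32)[labels]


lung = np.array([[0, 1, 1],
                 [0, 1, 1]])
infection = np.array([[0, 0, 1],
                      [0, 0, 0]])
labels = build_label_map(lung, infection, COVID)
print(labels.tolist())        # [[0, 1, 2], [0, 1, 1]]
print(one_hot(labels).shape)  # (2, 3, 4)
```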

## Iteration 1: U-Net with ResNet50 Backbone

### Model and Task Overview

- Baseline model: U-Net architecture with ResNet50 backbone
- Objective: Establish pipeline functionality and initial segmentation performance

### Training and Validation Trends

  <table>
  <tr>
    <td align="center">
      <b>Categorical Accuracy</b><br>
      <img src="../images/segmentation_model/iter_1_epoch_categorical_accuracy.jpg" alt="Categorical Accuracy" width="350"/><br>
      <em>Training and validation categorical accuracy over epochs for Iteration 1.<br>Y-axis: Categorical Accuracy, X-axis: Epoch</em>
      <br>
      Training and validation accuracy both increase steadily and the curves converge, with no sign of overfitting.
    </td>
    <td align="center">
      <b>Dice Coefficient</b><br>
      <img src="../images/segmentation_model/iter_1_epoch_dice_coeff_multi_class.jpg" alt="Dice Coefficient" width="350"/><br>
      <em>Training and validation Dice coefficient over epochs for Iteration 1.<br>Y-axis: Dice Coefficient, X-axis: Epoch</em>
      <br>
      Consistent upward trend for both training and validation, indicating improving overall segmentation quality.
    </td>
  </tr>
</table>


### Initial Performance Metrics

- **Test Loss:** 0.5418
- **Test Categorical Accuracy:** 0.9422
- **Test Dice Coefficient (Mean across classes):** 0.6066
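
For reference, the per-class Dice score and its mean over the four classes can be written as below. This assumes the reported mean is an unweighted average across classes, with $P_c$ and $G_c$ the predicted and ground-truth pixel sets for class $c$; the implementation behind the `dice_coeff_multi_class` metric (named in the figure file) is not shown here and may use softmax probabilities and a smoothing term instead of hard pixel sets.

$$
\mathrm{Dice}_c = \frac{2\,|P_c \cap G_c|}{|P_c| + |G_c|}, \qquad
\overline{\mathrm{Dice}} = \frac{1}{4}\sum_{c=0}^{3} \mathrm{Dice}_c
$$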

**Summary:**  
The model separates the lungs from the background well, but the mean Dice across all classes averages in the high scores of the large background and healthy-lung regions, which masks performance on the much smaller infection regions. A categorical accuracy of 94% is to be expected when the majority of pixels are background, so it says little about per-class performance. With the model and training environment confirmed to work on A100 GPUs in Google Colab, per-class performance metrics were added for the next iteration of the model.
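
The gap between high pixel accuracy and a much lower mean Dice can be reproduced on a toy example. The sketch below works on hard (argmax) label maps with a small smoothing constant; this is an assumption, since the metrics used in training may operate on softmax probabilities. In a 100-pixel image that is 90% background, missing every COVID-19 pixel still leaves pixel accuracy at 98% while the COVID-19 Dice collapses to roughly zero.

```python
import numpy as np

NUM_CLASSES = 4  # background, healthy lung, COVID-19, non-COVID


def pixel_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of pixels whose predicted class matches the ground truth."""
    return float(np.mean(y_true == y_pred))


def per_class_dice(y_true: np.ndarray, y_pred: np.ndarray,
                   num_classes: int = NUM_CLASSES, eps: float = 1e-7) -> list:
    """Dice score for each class on hard label maps."""
    scores = []
    for c in range(num_classes):
        t = y_true == c
        p = y_pred == c
        intersection = np.logical_and(t, p).sum()
        scores.append(float((2 * intersection + eps) / (t.sum() + p.sum() + eps)))
    return scores


# Toy 100-pixel image: 90 background, 6 healthy lung, 2 COVID-19, 2 non-COVID.
y_true = np.zeros(100, dtype=np.int64)
y_true[90:96] = 1
y_true[96:98] = 2
y_true[98:100] = 3

# The model labels the COVID-19 pixels as healthy lung; everything else is correct.
y_pred = y_true.copy()
y_pred[y_true == 2] = 1

dice = per_class_dice(y_true, y_pred)
print(round(pixel_accuracy(y_true, y_pred), 2))  # 0.98
print([round(d, 3) for d in dice])               # [1.0, 0.857, 0.0, 1.0]
print(round(float(np.mean(dice)), 3))            # 0.714
```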

## Iteration 2: U-Net with ResNet50 Backbone and Per-Class Performance Analysis

### Model and Task Overview

- Focus: Per-class performance analysis
- Challenge: Identify how the model is performing on individual classes prior to tuning the model and/or exploring data augmentation. 

### Training and Validation Graphs

- **Categorical Accuracy & Dice Coefficient:**  
  No changes were made to the model from Iteration 1, so, as expected, training behaves similarly and improves over successive epochs.

<table>
  <tr>
    <td align="center">
      <b>Categorical Accuracy</b><br>
      <img src="../images/segmentation_model/epoch_categorical_accuracy.jpg" width="350"/><br>
      <em>Training and validation Categorical Accuracy - Iteration 2.<br>Y-axis: Categorical Accuracy, X-axis: Epoch</em>
    </td>
    <td align="center">
      <b>Dice Coefficient</b><br>
      <img src="../images/segmentation_model/epoch_dice_coeff_multi_class.jpg" width="350"/><br>
      <em>Training and validation Dice Coefficient - Iteration 2.<br>Y-axis: Dice Coefficient, X-axis: Epoch</em>
    </td>
  </tr>
</table>


- **IoU per Class:**

<table>
  <tr>
    <td align="center">
      <b>Background</b><br>
      <img src="../images/segmentation_model/epoch_iou_background.jpg" width="250"/><br>
      <em>IoU for Background class</em>
    </td>
    <td align="center">
      <b>Healthy Lung</b><br>
      <img src="../images/segmentation_model/epoch_iou_healthy_lung.jpg" width="250"/><br>
      <em>IoU for Healthy Lung class</em>
    </td>
  </tr>
  <tr>
    <td align="center">
      <b>COVID-19</b><br>
      <img src="../images/segmentation_model/epoch_iou_covid.jpg" width="250"/><br>
      <em>IoU for COVID-19 class</em>
    </td>
    <td align="center">
      <b>Non-COVID</b><br>
      <img src="../images/segmentation_model/epoch_iou_non_covid.jpg" width="250"/><br>
      <em>IoU for Non-COVID class</em>
    </td>
  </tr>
</table>
<i>x-axis = epoch, y-axis = Class IoU. Orange = Training Set, Blue = Validation Set</i> 

### Per-Class Performance Metrics

| Metric                  | Value  |
|-------------------------|--------|
| Test Loss               | 0.6934 |
| Dice Coefficient (Mean) | 0.5930 |
| IoU (Background)        | 0.9556 |
| IoU (Healthy Lung)      | 0.6601 |
| IoU (COVID-19)          | 1.0000 |
| IoU (Non-COVID)         | 0.5623 |

**Analysis of Class Performance:**

**Background**: Very high IoU (0.9556) is expected because most pixels in chest X-ray masks are background, making it easy for the model to segment.

**Healthy Lung**: Moderate IoU (0.6601) reflects the challenge of distinguishing healthy lung tissue from other classes, especially when masks overlap or boundaries are subtle.

**COVID-19**: IoU reaches 1.0 very early in training. This is likely because:

- COVID-19 masks are rare and small, so the model can quickly learn to predict "no COVID-19" for most images, which matches the ground truth for the majority of cases.
- When the model predicts no COVID-19 pixels on an image that contains none, the COVID-19 IoU for that image can evaluate as perfect (for example, when the metric adds a smoothing term, an empty prediction on an empty mask scores 1.0), and such images dominate the dataset.
- If the validation set is imbalanced, the metric can be misleadingly high even if the model fails to segment actual COVID-19 regions.
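
How an IoU metric treats images where a class is absent explains how a score of 1.0 can arise. The implementation of the COVID-19 IoU metric (plotted in `epoch_iou_covid.jpg`) is not shown in this document, so the sketch below assumes a common smoothed formulation, $(|P \cap G| + s)/(|P \cup G| + s)$ with $s = 1$. Under that assumption, an image with no COVID-19 in either the mask or the prediction scores exactly 1.0, while an image whose COVID-19 region is completely missed scores close to 0.

```python
import numpy as np

COVID = 2  # class index for COVID-19


def class_iou(y_true: np.ndarray, y_pred: np.ndarray,
              class_id: int, smooth: float = 1.0) -> float:
    """Smoothed IoU for one class on hard label maps (assumed formulation)."""
    t = y_true == class_id
    p = y_pred == class_id
    intersection = np.logical_and(t, p).sum()
    union = np.logical_or(t, p).sum()
    # If the class is absent from both maps, this is smooth / smooth = 1.0.
    return float((intersection + smooth) / (union + smooth))


# Image without COVID-19: mask and prediction contain only lung (1) and background (0).
no_covid_true = np.array([[0, 1], [1, 1]])
no_covid_pred = np.array([[0, 1], [1, 0]])

# Image with 20 COVID-19 pixels that the model labels entirely as healthy lung.
covid_true = np.full((4, 5), COVID)
covid_pred = np.ones((4, 5), dtype=np.int64)

print(class_iou(no_covid_true, no_covid_pred, COVID))      # 1.0
print(round(class_iou(covid_true, covid_pred, COVID), 3))  # 0.048
```

Averaged over a validation set dominated by images like the first one, such a metric stays high even though every COVID-19 region is missed.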

**Non-COVID**: Lower and volatile IoU (0.5623) shows the model struggles with this class, likely due to less distinct features and overlap with healthy lung regions.

**Summary:**
The rapid rise to IoU = 1.0 for COVID-19 is an artifact of class imbalance. The model learns to predict the absence of COVID-19, which is correct for most images, but this does not mean it can segment actual COVID-19 regions. This highlights the need for careful metric interpretation and, possibly, more balanced data or targeted evaluation. To test this, the model was used to predict masks for COVID-19-positive images: none of the predicted masks contained any COVID-19 pixels.
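
The check described above, predicting masks for COVID-19-positive images and looking for any COVID-19 pixels, can be sketched as follows. It assumes the model outputs per-pixel softmax probabilities of shape `(batch, height, width, 4)` (for example from Keras `model.predict`); `covid_pixel_counts` is a hypothetical helper name, and the probabilities below are a synthetic stand-in for real model output.

```python
import numpy as np

COVID = 2  # class index for COVID-19


def covid_pixel_counts(probs: np.ndarray) -> np.ndarray:
    """Count pixels predicted as COVID-19 in each image.

    probs: softmax output with shape (batch, height, width, num_classes).
    """
    predicted = np.argmax(probs, axis=-1)         # (batch, height, width)
    return (predicted == COVID).sum(axis=(1, 2))  # (batch,)


# Stand-in for model output on three COVID-19-positive images:
# every pixel favours healthy lung, except one pixel in the first image.
probs = np.zeros((3, 4, 4, 4))
probs[..., 1] = 1.0
probs[0, 0, 0] = [0.0, 0.2, 0.7, 0.1]

counts = covid_pixel_counts(probs)
print(counts.tolist())  # [1, 0, 0]
print(int(np.sum(counts == 0)), "of", len(counts), "images have no COVID-19 pixels")
```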

<b>Predictions Using the Iteration 2 Model on COVID-19 Images Fail to Predict COVID-19 Masks:</b><br>
<img src="../images/segmentation_model/iter2_covid_pred.jpg" width="500"/><br>
<em>The perfect IoU score for COVID-19 is a red flag that the model is not handling COVID-19 cases correctly: the predicted masks contain no COVID-19 pixels (which would appear in red).</em>

**Plan for next Iteration**:
- Use class weighting to increase the loss penalty when the model misclassifies COVID-19 mask pixels.
- Applying these class weights in the loss function should help the model learn COVID-19 mask features.



## Iteration 3: U-Net with ResNet50 Backbone and Class Weights

### Model and Task Overview

- Focus: Overcome the sparsity of COVID-19 mask pixels, which previously caused the COVID-19 IoU to reach a spurious 1.0 early in training.
- Challenge: Class imbalance and mask size discrepancy (COVID-19 masks are small and fragmented)
- Solution: Weighted loss function to emphasize challenging classes

#### Class Weights Used

| Class         | Weight |
|---------------|--------|
| Background    | 0.1    |
| Healthy Lung  | 3.0    |
| COVID-19      | 10.0   |
| Non-COVID     | 3.0    |
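
The table gives the weights but not the loss they were applied to. Assuming a weighted categorical cross-entropy, where each pixel's loss is scaled by the weight of its ground-truth class, the effect can be sketched in NumPy as below (a Keras version would follow the same arithmetic with tensor ops). With these weights, misclassifying a COVID-19 pixel costs 100 times as much as an equally confident mistake on a background pixel.

```python
import numpy as np

# Weights from the table above, indexed by class: background, healthy lung, COVID-19, non-COVID.
CLASS_WEIGHTS = np.array([0.1, 3.0, 10.0, 3.0])


def weighted_categorical_crossentropy(y_true: np.ndarray, y_pred: np.ndarray,
                                      weights: np.ndarray = CLASS_WEIGHTS,
                                      eps: float = 1e-7) -> float:
    """Mean per-pixel cross-entropy, each pixel scaled by its true-class weight.

    y_true: one-hot targets, shape (..., 4); y_pred: softmax probabilities, same shape.
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)                 # avoid log(0)
    per_pixel_ce = -np.sum(y_true * np.log(y_pred), axis=-1)
    per_pixel_weight = np.sum(y_true * weights, axis=-1)     # weight of the true class
    return float(np.mean(per_pixel_ce * per_pixel_weight))


# One COVID-19 pixel and one background pixel, each given only 0.1 probability
# for its true class (most of the rest goes to healthy lung).
covid_true = np.eye(4)[[2]]
covid_pred = np.array([[0.1, 0.8, 0.1, 0.0]])
background_true = np.eye(4)[[0]]
background_pred = np.array([[0.1, 0.8, 0.0, 0.1]])

covid_loss = weighted_categorical_crossentropy(covid_true, covid_pred)
background_loss = weighted_categorical_crossentropy(background_true, background_pred)
print(round(covid_loss / background_loss, 6))  # 100.0
```

Averaging with `np.mean` ties the overall loss scale to the weights; some implementations divide by the summed pixel weights instead, which changes the scale but not the relative emphasis between classes.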

### Training and Validation Graphs

<b>Dice Coefficient</b><br>
<img src="../images/segmentation_model/iter3/epoch_dice_coeff_multi_class.jpg" width="350"/><br>
<em>Training and validation Dice Coefficient - Iteration 3.<br>Y-axis: Dice Coefficient, X-axis: Epoch. Orange = Training Set, Blue = Validation Set</em>

The Dice coefficient is lower than in the previous iteration, peaking at approximately 0.44. Per-class IoU was examined to identify which classes were pulling down this key metric.

- **IoU per Class:**


<table>
  <tr>
    <td align="center">
      <b>Background</b><br>
      <img src="../images/segmentation_model/iter3/epoch_iou_background.jpg" width="250"/><br>
      <em>IoU for Background class</em>
    </td>
    <td align="center">
      <b>Healthy Lung</b><br>
      <img src="../images/segmentation_model/iter3/epoch_iou_healthy_lung.jpg" width="250"/><br>
      <em>IoU for Healthy Lung class</em>
    </td>
  </tr>
  <tr>
    <td align="center">
      <b>COVID-19</b><br>
      <img src="../images/segmentation_model/iter3/epoch_iou_covid.jpg" width="250"/><br>
      <em>IoU for COVID-19 class</em>
    </td>
    <td align="center">
      <b>Non-COVID</b><br>
      <img src="../images/segmentation_model/iter3/epoch_iou_non_covid.jpg" width="250"/><br>
      <em>IoU for Non-COVID class</em>
    </td>
  </tr>
</table>
<i>x-axis = epoch, y-axis = Class IoU. Orange = Training Set, Blue = Validation Set</i> 

- **Background:**  
  IoU for the background class rises quickly and stabilizes above 0.9, indicating the model segments background pixels with high reliability.

- **Healthy Lung:**  
  IoU for healthy lung tissue steadily increases and plateaus around 0.55–0.6, showing moderate segmentation performance and improvement over training.

- **COVID-19:**  
  IoU for COVID-19 remains extremely low (near zero) throughout training, despite increased class weighting. This confirms the model still struggles to identify COVID-19 regions, likely due to class imbalance and mask sparsity.

- **Non-COVID:**  
  IoU for non-COVID infection regions improves gradually but remains below 0.45, indicating the model finds this class challenging and segmentation is less accurate compared to background and healthy lung.

**Summary:**  
The Iteration 3 graphs show that, even with class weighting, the model segments background strongly and healthy lung moderately, but continues to perform poorly on the COVID-19 and non-COVID infection classes.

**Conclusions from Analysis of Iteration 3**

Despite implementing class weighting to address the challenge of sparse COVID-19 mask pixels, the model continued to perform poorly in segmenting COVID-19 regions. The IoU for COVID-19 remained extremely low throughout training, indicating that the model was unable to learn meaningful features for this class, likely due to severe class imbalance and mask sparsity. Attempts to improve performance by unfreezing additional layers of the ResNet50 backbone did not yield significant improvements. These results suggest that further optimization of the segmentation model may not be feasible within the current resource and time constraints. As a result, it is necessary to reconsider the project's aim and scope, potentially shifting focus toward classification-based or hybrid approaches that are more practical given the limitations.

## Next Steps and Future Directions

Based on the analysis of Iteration 3, several challenges remain for multi-class segmentation, particularly for the COVID-19 and non-COVID infection classes. Despite attempts to improve performance by introducing class weighting and unfreezing additional layers of the ResNet50 backbone, the model continued to struggle with segmenting rare and small mask regions. These approaches also significantly increased computational demands, making further optimization impractical given available resources and time constraints.

Given these limitations, the following paths forward were considered:

### 1. Shift to a Classification-Based Model - SELECTED PATH FORWARD

- The segmentation model reliably separated the lungs from the background but struggled with precise mask prediction for the rare infection classes, including COVID-19.
- A classification-based approach is less computationally intensive and more feasible for deployment within current resource constraints.
- Classifying whole images removes the dependence on sparse pixel-level COVID-19 masks, which the segmentation model could not learn, and provides a practical solution for clinical use.

### 2. Obtain More Granular Annotation Data

- Improving segmentation performance for non-COVID infections would require more detailed and specific mask annotations.
- Sourcing or generating additional datasets with granular sub-masks for non-COVID infections would be time-consuming and resource-intensive.
- Given current constraints, this option was not pursued further.

### 3. Implement a Hybrid Model Architecture

- A two-stage model could first classify images as Healthy, COVID-19, or Non-COVID.
- If COVID-19 is detected, a secondary segmentation model could be triggered to provide detailed mask predictions.
- This hybrid approach balances computational efficiency with the need for detailed analysis in complex cases.
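
The control flow of the two-stage design in option 3 can be sketched as below. The `classify` and `segment` callables stand in for the two trained models and are hypothetical; the sketch assumes the classifier returns probabilities in the order Healthy, COVID-19, Non-COVID, and that segmentation runs only for images classified as COVID-19.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

import numpy as np

LABELS = ("Healthy", "COVID-19", "Non-COVID")  # assumed classifier output order


@dataclass
class HybridResult:
    label: str
    mask: Optional[np.ndarray]  # None unless the segmentation stage ran


def hybrid_predict(image: np.ndarray,
                   classify: Callable[[np.ndarray], Sequence[float]],
                   segment: Callable[[np.ndarray], np.ndarray]) -> HybridResult:
    """Stage 1: classify the whole image. Stage 2: segment only COVID-19 cases."""
    label = LABELS[int(np.argmax(classify(image)))]
    mask = segment(image) if label == "COVID-19" else None
    return HybridResult(label=label, mask=mask)


# Stub segmentation model for illustration only.
def fake_segment(image: np.ndarray) -> np.ndarray:
    return np.zeros(image.shape[:2], dtype=np.int64)


image = np.zeros((8, 8))
covid_case = hybrid_predict(image, lambda x: [0.1, 0.8, 0.1], fake_segment)
healthy_case = hybrid_predict(image, lambda x: [0.9, 0.05, 0.05], fake_segment)
print(covid_case.label, covid_case.mask.shape)  # COVID-19 (8, 8)
print(healthy_case.label, healthy_case.mask)    # Healthy None
```

Because `segment` is only called on the COVID-19 branch, the cost of the segmentation model is paid only for that subset of images.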

**Conclusion:**  
Due to the computational power and time required to further augment and annotate data, and the limited improvement seen with segmentation, it was decided to shift the project focus toward a classification-based model. This change in scope aims to deliver a practical and deployable solution within available resources, while still addressing the core clinical need. Notebook 03 details training of the classification model.