# Ovarian Cancer CT Segmentation Lab

Welcome to the ovarian cancer segmentation lab!  
In this lab, you will learn how to:
- Load and explore a real-world medical imaging dataset (CT scans and segmentation masks)
- Visualize and analyze the data
- Prepare the data for deep learning
- (Optionally) Train a segmentation model

**Dataset:**  
A subset of 50 ovarian cancer CT scans and their segmentation masks, pre-packaged for this lab.

Let's get started!

## 2. Mount Google Drive and Extract Data

We will use Google Drive to access the dataset.  
Before running the cell below, upload `Data_Subsample.zip` to the top level of your Google Drive (`MyDrive`). If you put it in a subfolder, adjust the path in the `unzip` command.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# List files in your Drive to confirm the zip location
!ls /content/drive/MyDrive

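# Unzip the data into the current working directory (/content by default in Colab).
# Adjust the path if your zip is in a subfolder; -q suppresses the per-file listing.
!unzip -o -q /content/drive/MyDrive/Data_Subsample.zip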
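
Before moving on, it can help to confirm that the archive extracted into the folders the rest of the lab expects. The sketch below is one way to do that. It assumes the `Data_Subsample/CT` and `Data_Subsample/Segmentation` layout used later in the lab and that each mask shares its CT file name. The helper name `check_extracted_data` is hypothetical and not part of the original notebook.

```python
import os

def check_extracted_data(root="Data_Subsample", ct_subdir="CT",
                         mask_subdir="Segmentation", suffix=".nii.gz"):
    """Return (paired, ct_only, mask_only) lists of file names.

    Assumption: a CT scan and its mask have identical file names.
    """
    ct_dir = os.path.join(root, ct_subdir)
    mask_dir = os.path.join(root, mask_subdir)
    for folder in (ct_dir, mask_dir):
        if not os.path.isdir(folder):
            raise FileNotFoundError(f"Expected folder not found: {folder}")
    ct_files = {f for f in os.listdir(ct_dir) if f.endswith(suffix)}
    mask_files = {f for f in os.listdir(mask_dir) if f.endswith(suffix)}
    paired = sorted(ct_files & mask_files)
    ct_only = sorted(ct_files - mask_files)      # scans without a mask
    mask_only = sorted(mask_files - ct_files)    # masks without a scan
    return paired, ct_only, mask_only
```

Calling `paired, ct_only, mask_only = check_extracted_data()` after the unzip step should give non-empty `paired` and empty `ct_only` and `mask_only` lists if the layout matches these assumptions.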

## 3. Install Required Packages

We need some Python packages for medical image processing and deep learning.

In [None]:
!pip install nibabel scikit-learn torch torchvision matplotlib seaborn tqdm

## 4. Imports and Setup

Let's import the necessary libraries and set up our environment.

In [None]:
import os
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
import nibabel as nib
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm.notebook import tqdm

# For reproducibility
torch.manual_seed(42)
np.random.seed(42)

## 5. Dataset Class and Data Loading

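We define a PyTorch `Dataset` class that loads each CT volume together with its segmentation mask. Each mask is expected to have the same file name as its CT scan. The CT volume is returned with shape `(1, H, W, D)` and the mask with shape `(H, W, D)`.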

In [None]:
class OvarianDataset(Dataset):
    def __init__(self, ct_dir, mask_dir, transform=None):
        self.ct_files = sorted([f for f in os.listdir(ct_dir) if f.endswith('.nii.gz')])
        self.ct_dir = ct_dir
        self.mask_dir = mask_dir
        self.transform = transform

    def __len__(self):
        return len(self.ct_files)

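    def __getitem__(self, idx):
        # The mask is looked up under the same file name as the CT scan.
        ct_path = os.path.join(self.ct_dir, self.ct_files[idx])
        mask_path = os.path.join(self.mask_dir, self.ct_files[idx])
        ct_img = nib.load(ct_path).get_fdata()
        mask = nib.load(mask_path).get_fdata()
        # Min-max normalize to [0, 1]; the guard avoids division by zero
        # for a constant-valued volume.
        ct_min, ct_max = ct_img.min(), ct_img.max()
        ct_img = (ct_img - ct_min) / max(ct_max - ct_min, 1e-8)
        ct_img = torch.from_numpy(ct_img).float().unsqueeze(0)  # (1, H, W, D)
        mask = torch.from_numpy(mask).long()  # (H, W, D)
        if self.transform:
            # Applied to the CT only; spatial transforms would also need
            # to be applied to the mask to keep the two aligned.
            ct_img = self.transform(ct_img)
        return ct_img, mask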

# Set up paths
ct_dir = 'Data_Subsample/CT'
mask_dir = 'Data_Subsample/Segmentation'

dataset = OvarianDataset(ct_dir, mask_dir)
print(f"Total cases: {len(dataset)}")
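
The min-max scaling in `__getitem__` depends on each scan's own extremes, so a few very bright or very dark voxels can compress the soft-tissue range. A common alternative for CT is to clip to a fixed intensity window before rescaling. The sketch below assumes the loaded values are in Hounsfield units, which the lab does not state. The `level` and `width` defaults are illustrative soft-tissue values rather than settings taken from this dataset, and `window_ct` is a hypothetical helper.

```python
import numpy as np

def window_ct(volume, level=40.0, width=400.0):
    """Clip a CT volume to [level - width/2, level + width/2] and rescale to [0, 1].

    Assumption: `volume` holds Hounsfield units; level/width are illustrative.
    """
    low = level - width / 2.0
    high = level + width / 2.0
    clipped = np.clip(volume, low, high)
    return ((clipped - low) / (high - low)).astype(np.float32)
```

To try it, you could replace the min-max lines in `__getitem__` with `ct_img = window_ct(ct_img)` and compare the resulting histograms in the EDA section.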

## 6. Exploratory Data Analysis (EDA)

Let's visualize the intensity distribution and a few slices from the CT scans and masks.

In [None]:
def analyze_intensity_distribution(ct_img, mask):
    plt.figure(figsize=(15, 5))
    plt.subplot(1, 2, 1)
    sns.histplot(ct_img.flatten(), bins=50)
    plt.title('CT Intensity Distribution')
    plt.xlabel('Intensity')
    plt.subplot(1, 2, 2)
    sns.histplot(mask.flatten(), bins=2)
    plt.title('Mask Distribution')
    plt.xlabel('Class')
    plt.show()

def show_slices(ct_img, mask, num_slices=3):
    # ct_img has shape (1, H, W, D); slices are taken along the last (depth) axis.
    depth = ct_img.shape[-1]
    middle = depth // 2
    step = depth // (num_slices + 1)
    plt.figure(figsize=(15, 4 * num_slices))
    for i in range(num_slices):
        idx = middle + (i - num_slices // 2) * step
        plt.subplot(num_slices, 3, i*3 + 1)
        plt.imshow(ct_img[0, :, :, idx], cmap='gray')
        plt.title(f'CT Slice {idx}')
        plt.axis('off')
        plt.subplot(num_slices, 3, i*3 + 2)
        plt.imshow(mask[:, :, idx], cmap='gray')
        plt.title(f'Mask Slice {idx}')
        plt.axis('off')
        plt.subplot(num_slices, 3, i*3 + 3)
        plt.imshow(ct_img[0, :, :, idx], cmap='gray')
        plt.imshow(mask[:, :, idx], alpha=0.3, cmap='Reds')
        plt.title(f'Overlay Slice {idx}')
        plt.axis('off')
    plt.tight_layout()
    plt.show()

# Visualize a sample
ct, mask = dataset[0]
analyze_intensity_distribution(ct.numpy(), mask.numpy())
show_slices(ct.numpy(), mask.numpy())
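
Histograms show the overall class balance, but it is often also useful to know which slices actually contain labelled tissue. The sketch below summarises that for one mask volume. It assumes any non-zero label counts as foreground and that depth is the last axis, matching the indexing in `show_slices`. The helper name `foreground_stats` is hypothetical.

```python
import numpy as np

def foreground_stats(mask):
    """Summarise how much of an (H, W, D) mask volume is labelled.

    Assumption: every non-zero voxel is foreground.
    """
    fg = np.asarray(mask) > 0
    # Indices of depth slices that contain at least one foreground voxel.
    slices_with_fg = np.flatnonzero(fg.any(axis=(0, 1)))
    has_fg = slices_with_fg.size > 0
    return {
        "foreground_fraction": float(fg.mean()),
        "num_slices_with_foreground": int(slices_with_fg.size),
        "first_slice": int(slices_with_fg[0]) if has_fg else None,
        "last_slice": int(slices_with_fg[-1]) if has_fg else None,
    }
```

For example, `foreground_stats(mask.numpy())` on the sample loaded above reports the labelled fraction and the slice range worth visualizing.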

## 7. DataLoader for Model Training

Let's prepare a DataLoader for batching and shuffling the data, ready for model training.

In [None]:
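# To use transforms, pass them when constructing the dataset, e.g.
# OvarianDataset(ct_dir, mask_dir, transform=...). None are used here.
transform = None

# CT volumes often differ in the number of slices, and the default collate
# function can only stack tensors of identical shape, so we load one volume
# per batch. Raise batch_size only if all your volumes share one shape.
dataloader = DataLoader(
    dataset,
    batch_size=1,
    shuffle=True,
    num_workers=2
)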

# Check a batch
for ct_batch, mask_batch in dataloader:
    print(f"CT batch shape: {ct_batch.shape}")
    print(f"Mask batch shape: {mask_batch.shape}")
    break
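
Before training, you will usually want to hold out some cases for validation. The sketch below shows one way to split case indices reproducibly using only the standard library. The 20% validation fraction and the helper name `split_case_indices` are assumptions for illustration, not part of the original lab.

```python
import random

def split_case_indices(num_cases, val_fraction=0.2, seed=42):
    """Return (train_indices, val_indices) for a case-level split.

    Assumption: each dataset index is one patient case, so splitting
    indices keeps all slices of a case on the same side of the split.
    """
    if num_cases < 2:
        raise ValueError("Need at least two cases to split.")
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1.")
    indices = list(range(num_cases))
    random.Random(seed).shuffle(indices)  # local RNG, global state untouched
    num_val = min(num_cases - 1, max(1, round(num_cases * val_fraction)))
    return sorted(indices[num_val:]), sorted(indices[:num_val])
```

The returned lists can be passed to `torch.utils.data.Subset(dataset, train_idx)` and `Subset(dataset, val_idx)` to build separate training and validation loaders.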

## 8. Next Steps

You are now ready to:
- Explore the data further
- Implement and train a segmentation model (e.g., U-Net)
- Evaluate your results

**Tip:**  
You can add more cells for model definition, training, and evaluation as needed!
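
For the evaluation step, the Dice coefficient is a widely used overlap metric for segmentation. The sketch below is a minimal NumPy version for binary masks. It treats any non-zero voxel as foreground and returns 1.0 when both masks are empty. Both choices are assumptions, and the helper name `dice_score` is hypothetical.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks.

    Assumptions: non-zero voxels are foreground; two empty masks score 1.0.
    """
    pred = np.asarray(pred) > 0
    target = np.asarray(target) > 0
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape.")
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # nothing to segment and nothing predicted
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / denom)
```

With a trained model, you would threshold or argmax its output into a predicted mask and call `dice_score(pred_mask, mask.numpy())` for each validation case.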

---

**Congratulations! You have set up a real-world ovarian cancer segmentation lab in Colab!**