<h1 dir=ltr align=center style="line-height:200%;font-family:sans-serif;color:#0099cc">
<font face="sans-serif" color="#0099cc">
Ultravision Operations
</font>
</h1>

<h2 dir=ltr align=left style="line-height:200%;font-family:sans-serif;color:#0099cc">
<font face="sans-serif" color="#0099cc">
Introduction and Problem Statement
</font>
</h2>


<p dir=ltr style="direction: ltr; text-align: justify; line-height:200%; font-family:sans-serif; font-size:medium">
<font face="sans-serif">
Welcome to the final stage of the Quera Image Processing and Computer Vision Olympiad! Here, pixels find meaning, models make decisions, and you command a great scientific mission. In this stage, you will face a multi-part project that challenges your intelligence, precision, and creativity in the segmentation, classification, and analysis of medical images.
</font>
</p>

<h2 dir=ltr align=left style="line-height:200%;font-family:sans-serif;color:#0099cc">
<font face="sans-serif" color="#0099cc">
Dataset Introduction
</font>
</h2>

<p dir=ltr style="direction: ltr; text-align: justify; line-height:200%; font-family:sans-serif; font-size:medium">
<font face="sans-serif" size=3>
The project's root folder contains two folders named <code>train</code> and <code>test</code>, which hold the images for the training and test sets, respectively.
</font>
</p>

<p dir=ltr style="direction: ltr; text-align: justify; line-height:200%; font-family:sans-serif; font-size:medium">
<font face="sans-serif" size=3>
The <code>train</code> folder contains all the training images. It also contains a file named <code>train.csv</code>, which specifies the label of each image in <code>One_Hot</code> format.
</font>
</p>

<p dir=ltr style="direction: ltr; text-align: justify; line-height:200%; font-family:sans-serif; font-size:medium">
<font face="sans-serif" size=3>
The training set includes three classes: benign, malignant, and normal. Each class folder contains two subfolders, <code>images</code> and <code>masks</code>: <code>images</code> holds the original ultrasound images, and <code>masks</code> holds the corresponding segmentation masks. A mask file has the same name as its image, with the suffix <code>_mask</code> added before the extension (e.g., the image <code>benign (1).png</code> has the mask <code>benign (1)_mask.png</code>). This structure gives you easy access to the images and masks of each class for training and evaluating classification or segmentation models.
</font>
</p>

<p dir=ltr style="direction: ltr; text-align: justify; line-height:200%; font-family:sans-serif; font-size:medium">
<font face="sans-serif" size=3>
The <code>test</code> folder contains all the test images, along with a file named <code>test.csv</code>. This file lists the names of the test set images; their labels are unknown. You must predict the label for each image, in the order in which the images appear in this file.
</font>
</p>
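
The naming rule described above can be captured in a small helper. This is a minimal sketch, not part of the provided notebook: the function name `mask_path_for` is hypothetical, and it assumes the layout described above (`<class>/images/<name>.png` next to `<class>/masks/<name>_mask.png`) with forward-slash paths.

```python
from pathlib import PurePosixPath


def mask_path_for(image_path: str) -> str:
    """Map <class>/images/<name>.png to <class>/masks/<name>_mask.png.

    Hypothetical helper; assumes the folder layout described in the text.
    """
    path = PurePosixPath(image_path)
    class_dir = path.parent.parent  # e.g. train/benign
    return str(class_dir / "masks" / f"{path.stem}_mask{path.suffix}")


print(mask_path_for("train/benign/images/benign (1).png"))
# train/benign/masks/benign (1)_mask.png
```

Deriving the mask path from the image path keeps the two in sync without a separate lookup table.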

<h4 dir=ltr align=left style="line-height:200%;font-family:sans-serif;color:#0099cc">
<font face="sans-serif" color="#0099cc">
Introduction to the Training Dataset (train)
</font>
</h4>

<div dir="ltr">
<p dir=ltr style="direction: ltr; text-align: justify; line-height:200%; font-family:sans-serif; font-size:medium">
<font face="sans-serif" size=3>

The <code>train</code> set has 624 images with average dimensions of 500x500. The segmentation masks for each class are located in that class's <code>masks</code> folder.

The <code>train.csv</code> file lists the name of each image together with its label in <code>one_hot</code> format. Its columns are as follows:

</font>
</p>

<center>
<div dir=ltr style="direction: ltr;line-height:200%;font-family:sans-serif;font-size:medium">
<font face="sans-serif" size=3>
    

| Column Name | Explanation |
|:---: |:---: |
| `image` | Image name |
| `class_benign` | Image contains a benign tumor. |
| `class_malignant` | Image contains a malignant tumor. |
| `class_normal` | Image has no tumor. |


</font>
</div>
</center>
</div>
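
As an illustration of the one-hot format, the sketch below recovers a class name from the three label columns using only the standard library. The helper name `one_hot_to_label` is hypothetical, and the inline sample contains a few rows in the format of `train.csv` without the unnamed index column, which is an assumption made to keep the example short.

```python
import csv
import io

CLASS_COLUMNS = ["class_benign", "class_malignant", "class_normal"]


def one_hot_to_label(row):
    """Return the class whose one-hot column holds the largest value."""
    best_column = max(CLASS_COLUMNS, key=lambda col: float(row[col]))
    return best_column[len("class_"):]  # strip the 'class_' prefix


# Sample rows in the format of train.csv (index column omitted for brevity)
sample_csv = """image,class_benign,class_malignant,class_normal
malignant (88).png,0.0,1.0,0.0
benign (81).png,1.0,0.0,0.0
normal (63).png,0.0,0.0,1.0
"""

labels = {
    row["image"]: one_hot_to_label(row)
    for row in csv.DictReader(io.StringIO(sample_csv))
}
print(labels)
```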

<h4 dir=ltr align=left style="line-height:200%;font-family:sans-serif;color:#0099cc">
<font face="sans-serif" color="#0099cc">
Introduction to the Test Dataset (test)
</font>
</h4>

<div dir="ltr">
<p dir=ltr style="direction: ltr; text-align: justify; line-height:200%; font-family:sans-serif; font-size:medium">
<font face="sans-serif" size=3>

The <code>test</code> set has 156 images with dimensions of 500x500.

The <code>test.csv</code> file lists the names of the test set images. Your final predictions must follow this file. Its columns are as follows:

</font>
</p>

<center>
<div dir=ltr style="direction: ltr;line-height:200%;font-family:sans-serif;font-size:medium">
<font face="sans-serif" size=3>
    

| Column Name | Explanation |
|:---: |:---: |
| `image` | Image name |


</font>
</div>
</center>
</div>

<h2 dir=ltr align=left style="line-height:200%;font-family:sans-serif;color:#0099cc">
<font face="sans-serif" color="#0099cc">
Part One: Classification using Vision-Language Models
</font>
</h2>

<p dir=ltr style="direction: ltr; text-align: justify; line-height:200%; font-family:sans-serif; font-size:medium">
<font face="sans-serif" size=3>
As described, place the code for designing a vision-language model in this section. Your training results must be evident in this section.
</font>
</p>


In [2]:
import numpy
print(numpy.__version__) # make sure your numpy version < 2.0.0

1.26.4


In [3]:
from sklearn.metrics import f1_score, classification_report
import clip
import glob
import pandas as pd
import numpy as np
from PIL import Image
from tqdm.notebook import tqdm
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, random_split
import torch.optim as optim
import albumentations as A
from albumentations.pytorch import ToTensorV2
import segmentation_models_pytorch as smp
import zipfile
import os

In [4]:
df = pd.read_csv('train/train.csv')

In [5]:
df

Unnamed: 0,image,class_benign,class_malignant,class_normal
0,malignant (88).png,0.0,1.0,0.0
1,benign (81).png,1.0,0.0,0.0
2,malignant (116).png,0.0,1.0,0.0
3,benign (331).png,1.0,0.0,0.0
4,benign (319).png,1.0,0.0,0.0
...,...,...,...,...
619,normal (63).png,0.0,0.0,1.0
620,benign (214).png,1.0,0.0,0.0
621,malignant (7).png,0.0,1.0,0.0
622,benign (400).png,1.0,0.0,0.0


In [4]:
# --- 1. Settings and Configuration ---
class Config:
    DATA_PATH = "train/" # Path to the train folder
    DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
    CLIP_MODEL = "ViT-B/32"
    EPOCHS = 10
    BATCH_SIZE = 32
    LEARNING_RATE = 1e-5 # Low learning rate for fine-tuning
    VALIDATION_SPLIT = 0.2 # 20% for validation
    RANDOM_SEED = 42

In [5]:
# Initial setup
cfg = Config()
torch.manual_seed(cfg.RANDOM_SEED)
np.random.seed(cfg.RANDOM_SEED)

In [6]:
# --- 2. Data Preparation (Dataset and DataLoader) ---
class XRayDataset(Dataset):
    """
    Custom Dataset class for loading images and labels.
    """
    def __init__(self, dataframe, preprocess):
        self.df = dataframe
        self.preprocess = preprocess
        self.class_to_idx = {"normal": 0, "benign": 1, "malignant": 2}

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image_path = row['image_path']
        label_name = row['label']
        label_idx = self.class_to_idx[label_name]

        # Load image
        try:
            image = Image.open(image_path).convert("RGB")
            # Apply CLIP-required preprocessing
            image_tensor = self.preprocess(image)
            return image_tensor, label_idx
        except Exception as e:
            print(f"Error loading image {image_path}: {e}")
            # On failure, return a zero tensor with label -1; the training and
            # validation loops filter out samples labelled -1
            return torch.zeros((3, 224, 224)), -1

In [7]:
def prepare_dataframe(data_path):
    """
    Traverse folders and build a DataFrame of image paths and labels.
    """
    classes = ["normal", "benign", "malignant"]
    data = []
    for cls in classes:
        img_dir = os.path.join(data_path, cls, "images")
        if not os.path.exists(img_dir):
            print(f"Warning: Directory not found: {img_dir}")
            continue
            
        for img_name in os.listdir(img_dir):
            if img_name.endswith(".png"):
                data.append({
                    "image_path": os.path.join(img_dir, img_name),
                    "label": cls
                })
    
    if not data:
        raise FileNotFoundError(f"No PNG images found in {data_path}. Check your path and structure.")
        
    return pd.DataFrame(data)

In [8]:
def get_class_weights(dataframe):
    """
    Calculate inverse class weights for the loss function.
    """
    class_counts = dataframe['label'].value_counts().sort_index()
    class_map = {"normal": 0, "benign": 1, "malignant": 2}
    
    # Order the counts by class index (normal, benign, malignant)
    counts = [class_counts.get(cls, 1) for cls in class_map.keys()] # fall back to 1 to prevent division by zero
    
    total = sum(counts)
    weights = [total / (len(counts) * count) for count in counts]
    
    # Convert the weights to a float tensor on the target device
    weights_tensor = torch.tensor(weights, dtype=torch.float32).to(cfg.DEVICE)
    print(f"Class Counts: {counts}")
    print(f"Calculated Weights: {weights_tensor}")
    return weights_tensor

In [9]:
# --- 3. Define VLM (CLIP) Model and Logic ---

# Load CLIP model and its preprocessing function
print(f"Loading CLIP model: {cfg.CLIP_MODEL} on {cfg.DEVICE}")
clip_model, preprocess = clip.load(cfg.CLIP_MODEL, device=cfg.DEVICE)


Loading CLIP model: ViT-B/32 on cpu


In [12]:
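# This section is the core of the VLM
# Define text prompts, one per class, in class-index order (normal, benign, malignant)
# Note: the prompts say "x-ray", while the dataset description refers to ultrasound images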
text_prompts = [
    "An x-ray image of a normal case without a mass",
    "An x-ray image containing a benign mass",
    "An x-ray image containing a malignant mass"
]

# Tokenize texts and send to device
text_inputs = clip.tokenize(text_prompts).to(cfg.DEVICE)
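
To make the role of these prompts concrete: CLIP-style classification scores an image by the cosine similarity between its embedding and each prompt's embedding, scales the similarities, and picks the largest. The sketch below illustrates this with toy 2-D vectors in plain Python. The vectors and the `logit_scale` value of 100.0 are made-up placeholders, not values taken from the actual model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_logits(image_feature, text_features, logit_scale=100.0):
    """Scaled cosine similarity between one image and each text embedding."""
    image_unit = l2_normalize(image_feature)
    logits = []
    for text_feature in text_features:
        text_unit = l2_normalize(text_feature)
        cosine = sum(a * b for a, b in zip(image_unit, text_unit))
        logits.append(logit_scale * cosine)
    return logits


# Toy embeddings (placeholders): one per prompt, plus one image
toy_text_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_image_feature = [0.2, 0.9]

logits = similarity_logits(toy_image_feature, toy_text_features)
predicted_index = max(range(len(logits)), key=lambda i: logits[i])
print(predicted_index)  # 1: the image is closest to the second text vector
```

Because the prediction is an argmax over prompts, the order of the prompt list defines the class indices.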

In [13]:
# --- 4. Training and Validation Functions ---

def train_one_epoch(model, dataloader, criterion, optimizer, text_features):
    model.train()
    running_loss = 0.0
    all_preds = []
    all_labels = []

    pbar = tqdm(dataloader, desc="Training Epoch")
    for images, labels in pbar:
        # Remove corrupted images (which have label -1)
        valid_indices = labels != -1
        if not valid_indices.any():
            continue
        
        images = images[valid_indices].to(cfg.DEVICE)
        labels = labels[valid_indices].to(cfg.DEVICE)

        optimizer.zero_grad()

        # Extract image features
        image_features = model.encode_image(images)
        
        # Normalize features (CLIP standard)
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)

        # Calculate similarity (logits)
        # logit_scale is a learnable parameter in the CLIP model
        logit_scale = model.logit_scale.exp()
        logits_per_image = logit_scale * image_features @ text_features.T
        
        # Calculate loss
        loss = criterion(logits_per_image, labels)
        
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
        
        # Store predictions and labels for F1
        preds = logits_per_image.argmax(dim=1)
        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())
        
        pbar.set_postfix(loss=loss.item())

    epoch_loss = running_loss / len(dataloader.dataset)
    epoch_f1_weighted = f1_score(all_labels, all_preds, average='weighted', zero_division=0)
    epoch_f1_macro = f1_score(all_labels, all_preds, average='macro', zero_division=0)
    
    return epoch_loss, epoch_f1_weighted, epoch_f1_macro

In [14]:
def validate(model, dataloader, criterion, text_features):
    model.eval()
    running_loss = 0.0
    all_preds = []
    all_labels = []

    with torch.no_grad():
        pbar = tqdm(dataloader, desc="Validating")
        for images, labels in pbar:
            valid_indices = labels != -1
            if not valid_indices.any():
                continue

            images = images[valid_indices].to(cfg.DEVICE)
            labels = labels[valid_indices].to(cfg.DEVICE)
            
            # Extract features
            image_features = model.encode_image(images)
            image_features = image_features / image_features.norm(dim=-1, keepdim=True)
            
            # Calculate logits
            logit_scale = model.logit_scale.exp()
            logits_per_image = logit_scale * image_features @ text_features.T
            
            loss = criterion(logits_per_image, labels)
            running_loss += loss.item() * images.size(0)

            preds = logits_per_image.argmax(dim=1)
            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())

    epoch_loss = running_loss / len(dataloader.dataset)
    epoch_f1_weighted = f1_score(all_labels, all_preds, average='weighted', zero_division=0)
    epoch_f1_macro = f1_score(all_labels, all_preds, average='macro', zero_division=0)
    
    # Full report
    report = classification_report(all_labels, all_preds, target_names=["normal", "benign", "malignant"], zero_division=0)
    
    return epoch_loss, epoch_f1_weighted, epoch_f1_macro, report


In [15]:
# --- 5. Main Execution Loop ---

print("Starting data preparation...")

# 1. Prepare DataFrame
full_df = prepare_dataframe(cfg.DATA_PATH)
print(f"Total images found: {len(full_df)}")
print(full_df['label'].value_counts())

# 2. Split data into Train and Validation
# random_split only needs len(full_df); it returns Subset objects whose .indices
# are used below to select the corresponding DataFrame rows
val_size = int(len(full_df) * cfg.VALIDATION_SPLIT)
train_size = len(full_df) - val_size
train_df, val_df = random_split(full_df, [train_size, val_size])

# Build a Dataset from the rows selected for each subset
train_dataset = XRayDataset(train_df.dataset.iloc[train_df.indices], preprocess)
val_dataset = XRayDataset(val_df.dataset.iloc[val_df.indices], preprocess)

print(f"Train size: {len(train_dataset)}, Validation size: {len(val_dataset)}")

# 3. Create DataLoaders
# Note: num_workers=0 loads data in the main process. This slows down data loading, but solves the multiprocessing issue.
train_loader = DataLoader(train_dataset, batch_size=cfg.BATCH_SIZE, shuffle=True, num_workers=0)
val_loader = DataLoader(val_dataset, batch_size=cfg.BATCH_SIZE, shuffle=False, num_workers=0)

# 4. Calculate class weights (based on training data)
print("Calculating class weights...")
class_weights = get_class_weights(train_df.dataset.iloc[train_df.indices])

# 5. Define Loss Function and Optimizer
# Use calculated weights to handle imbalance
criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = optim.Adam(clip_model.parameters(), lr=cfg.LEARNING_RATE)

# Extract text features only once (since texts are constant)
with torch.no_grad():
    text_features = clip_model.encode_text(text_inputs)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

Starting data preparation...
Total images found: 624
label
benign       350
malignant    168
normal       106
Name: count, dtype: int64
Train size: 500, Validation size: 124
Calculating class weights...
Class Counts: [79, 298, 123]
Calculated Weights: tensor([2.1097, 0.5593, 1.3550])
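
The printed weights follow directly from the formula in `get_class_weights`: each weight is `total / (n_classes * count)`. The sketch below reproduces the arithmetic in plain Python with the class counts shown above; the function name `inverse_frequency_weights` is hypothetical.

```python
def inverse_frequency_weights(counts):
    """Weight each class by total / (n_classes * count)."""
    total = sum(counts)
    n_classes = len(counts)
    return [total / (n_classes * count) for count in counts]


# Training-split counts from the output above: normal, benign, malignant
counts = [79, 298, 123]
weights = inverse_frequency_weights(counts)
print([round(w, 4) for w in weights])  # [2.1097, 0.5593, 1.355]
```

With this formula, every class contributes the same total weight (`count * weight = total / n_classes`), so the rarest class (normal) gets the largest per-sample weight.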


In [16]:
# 6. Training and Validation Loop
best_f1 = 0.0
print("Starting model training...")
for epoch in range(cfg.EPOCHS):
    print(f"\n--- Epoch {epoch+1}/{cfg.EPOCHS} ---")
    
    train_loss, train_f1_w, train_f1_m = train_one_epoch(clip_model, train_loader, criterion, optimizer, text_features)
    print(f"Train Loss: {train_loss:.4f} | Weighted F1: {train_f1_w:.4f} | Macro F1: {train_f1_m:.4f}")

    val_loss, val_f1_w, val_f1_m, report = validate(clip_model, val_loader, criterion, text_features)
    print(f"Valid Loss: {val_loss:.4f} | Weighted F1: {val_f1_w:.4f} | Macro F1: {val_f1_m:.4f}")
    
    # Save the best model based on Macro F1 (as it gives importance to minority classes)
    if val_f1_m > best_f1:
        best_f1 = val_f1_m
        torch.save(clip_model.state_dict(), "best_clip_classifier.pth")
        print(f"New best model saved with Macro F1: {best_f1:.4f}")
        
print("\nTraining Finished.")
print(f"Best Validation Macro F1 Score: {best_f1:.4f}")

# Load the best model and display the final report
print("\nLoading best model for final report on validation set...")
clip_model.load_state_dict(torch.load("best_clip_classifier.pth"))
_, _, _, final_report = validate(clip_model, val_loader, criterion, text_features)
print("--- Final Validation Report ---")
print(final_report)

Starting model training...

--- Epoch 1/10 ---
Train Loss: 1.0228 | Weighted F1: 0.5210 | Macro F1: 0.4793
Valid Loss: 0.8485 | Weighted F1: 0.5733 | Macro F1: 0.5696
New best model saved with Macro F1: 0.5696

--- Epoch 2/10 ---
Train Loss: 0.4968 | Weighted F1: 0.7869 | Macro F1: 0.7643
Valid Loss: 0.6248 | Weighted F1: 0.7460 | Macro F1: 0.7408
New best model saved with Macro F1: 0.7408

--- Epoch 3/10 ---
Train Loss: 0.3363 | Weighted F1: 0.8883 | Macro F1: 0.8673
Valid Loss: 0.6408 | Weighted F1: 0.7486 | Macro F1: 0.7500
New best model saved with Macro F1: 0.7500

--- Epoch 4/10 ---
Train Loss: 0.1672 | Weighted F1: 0.9466 | Macro F1: 0.9374
Valid Loss: 0.7289 | Weighted F1: 0.7740 | Macro F1: 0.7730
New best model saved with Macro F1: 0.7730

--- Epoch 5/10 ---
Train Loss: 0.1683 | Weighted F1: 0.9548 | Macro F1: 0.9406
Valid Loss: 0.6557 | Weighted F1: 0.7500 | Macro F1: 0.7408

--- Epoch 6/10 ---
Train Loss: 0.0589 | Weighted F1: 0.9900 | Macro F1: 0.9890
Valid Loss: 0.5689 | Weighted F1: 0.7788 | Macro F1: 0.7832
New best model saved with Macro F1: 0.7832

--- Epoch 7/10 ---
Train Loss: 0.0282 | Weighted F1: 0.9880 | Macro F1: 0.9870
Valid Loss: 0.6013 | Weighted F1: 0.7883 | Macro F1: 0.7946
New best model saved with Macro F1: 0.7946

--- Epoch 8/10 ---
Train Loss: 0.0227 | Weighted F1: 0.9920 | Macro F1: 0.9916
Valid Loss: 0.8949 | Weighted F1: 0.7882 | Macro F1: 0.7934

--- Epoch 9/10 ---
Train Loss: 0.0162 | Weighted F1: 0.9960 | Macro F1: 0.9962
Valid Loss: 0.6758 | Weighted F1: 0.8151 | Macro F1: 0.8146
New best model saved with Macro F1: 0.8146

--- Epoch 10/10 ---
Train Loss: 0.0141 | Weighted F1: 0.9920 | Macro F1: 0.9924
Valid Loss: 0.7339 | Weighted F1: 0.7983 | Macro F1: 0.7999

Training Finished.
Best Validation Macro F1 Score: 0.8146

Loading best model for final report on validation set...

--- Final Validation Report ---
              precision    recall  f1-score   support

      normal       0.71      0.93      0.81        27
      benign       0.82      0.79      0.80        52
   malignant       0.90      0.78      0.83        45

    accuracy                           0.81       124
   macro avg       0.81      0.83      0.81       124
weighted avg       0.83      0.81      0.82       124



In [17]:
# --- 7. Predict on Test Data and Create Submission File ---

# 1. Load Test DataFrame
test_df_raw = pd.read_csv('test/test.csv')
TEST_IMAGE_DIR = 'test/images/'

# 2. Prepare Test DataFrame (including full image path)
test_data = []
for img_name in test_df_raw['image']:
    test_data.append({
        "image_path": os.path.join(TEST_IMAGE_DIR, img_name),
        "label": "unknown" # unknown label
    })
test_full_df = pd.DataFrame(test_data)

# 3. Define Test Dataset and DataLoader
class TestXRayDataset(Dataset):
    def __init__(self, dataframe, preprocess):
        self.df = dataframe
        self.preprocess = preprocess

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image_path = row['image_path']
        
        try:
            image = Image.open(image_path).convert("RGB")
            image_tensor = self.preprocess(image)
            return image_tensor
        except Exception as e:
            print(f"Error loading test image {image_path}: {e}")
            return torch.zeros((3, 224, 224))

test_dataset = TestXRayDataset(test_full_df, preprocess)
test_loader = DataLoader(test_dataset, batch_size=cfg.BATCH_SIZE, shuffle=False, num_workers=0)


# 4. Run Prediction (Inference)
clip_model.eval()
all_predictions = []

print("\nStarting Test Data Prediction...")

with torch.no_grad():
    for images in tqdm(test_loader, desc="Predicting"):
        images = images.to(cfg.DEVICE)
        
        # Extract features
        image_features = clip_model.encode_image(images)
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        
        # Calculate logits
        logit_scale = clip_model.logit_scale.exp()
        logits_per_image = logit_scale * image_features @ text_features.T
        
        preds = logits_per_image.argmax(dim=1)
        all_predictions.extend(preds.cpu().numpy())

# 5. Convert predictions to One-Hot and create final DataFrame
# Model indices follow the prompt order: 0 = normal, 1 = benign, 2 = malignant
idx_to_class = {0: 'normal', 1: 'benign', 2: 'malignant'}
pred_classes = [idx_to_class[int(p)] for p in all_predictions]

# Create final DataFrame; column order matches train.csv
# (class_benign, class_malignant, class_normal)
submission = pd.DataFrame({'image': test_df_raw['image']})
for cls in ['benign', 'malignant', 'normal']:
    submission[f'class_{cls}'] = [int(c == cls) for c in pred_classes]

print("\nSubmission DataFrame created successfully.")
print(submission.head())


Starting Test Data Prediction...


Submission DataFrame created successfully.
           image  class_benign  class_malignant  class_normal
0  test_0001.png             0                0             1
1  test_0002.png             1                0             0
2  test_0003.png             1                0             0
3  test_0004.png             1                0             0
4  test_0005.png             0                1             0
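
The index-to-column mapping behind this output can be checked in isolation. The sketch below is a minimal, hypothetical helper (`to_submission_row` is not part of the notebook) that assumes the model's index order is normal, benign, malignant and that the submission columns are ordered benign, malignant, normal, as in the cells above.

```python
# Model index order (prompt order) and submission column order
MODEL_INDEX_TO_CLASS = {0: "normal", 1: "benign", 2: "malignant"}
SUBMISSION_CLASSES = ["benign", "malignant", "normal"]


def to_submission_row(pred_index):
    """One-hot encode a model index in submission column order."""
    predicted = MODEL_INDEX_TO_CLASS[pred_index]
    return [int(cls == predicted) for cls in SUBMISSION_CLASSES]


rows = [to_submission_row(i) for i in (0, 1, 2)]
print(rows)  # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
```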


In [18]:
submission

Unnamed: 0,image,class_benign,class_malignant,class_normal
0,test_0001.png,0,0,1
1,test_0002.png,1,0,0
2,test_0003.png,1,0,0
3,test_0004.png,1,0,0
4,test_0005.png,0,1,0
...,...,...,...,...
151,test_0152.png,0,0,1
152,test_0153.png,0,1,0
153,test_0154.png,0,1,0
154,test_0155.png,1,0,0


<h2 dir=ltr align=left style="line-height:200%;font-family:sans-serif;color:#0099cc">
<font face="sans-serif" color="#0099cc">
Evaluation Metric
</font>
</h2>

<p dir=ltr style="direction: ltr; text-align: justify; line-height:200%; font-family:sans-serif; font-size:medium">
<font face="sans-serif" size=3>
    The model's performance is evaluated with the <code>F1_score</code> metric.
    <br>
    The judging system uses this same metric for scoring.
    <br>
    It is recommended that you evaluate your model on the training or validation set using this metric.
</font>
</p>
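
For a multi-class problem, per-class F1 scores have to be averaged, and the two common choices can differ noticeably on imbalanced data. The text above does not say which averaging the judging system applies, so the sketch below simply contrasts macro and weighted averaging on made-up labels, in plain Python. The function name `per_class_f1` and the toy labels are illustrative assumptions.

```python
def per_class_f1(y_true, y_pred, classes):
    """F1 per class, computed as 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


# Toy labels (made up): class 0 is frequent, class 2 is rare
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 0, 1, 2, 1]

f1 = per_class_f1(y_true, y_pred, classes=[0, 1, 2])
support = {c: y_true.count(c) for c in f1}

macro_f1 = sum(f1.values()) / len(f1)                             # every class counts equally
weighted_f1 = sum(f1[c] * support[c] for c in f1) / len(y_true)   # weighted by class frequency

print(f"macro={macro_f1:.4f} weighted={weighted_f1:.4f}")
# macro=0.5000 weighted=0.7143
```

Macro averaging penalizes the missed rare class much more strongly than weighted averaging does.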


In [19]:
# --- Configuration for Segmentation ---
class SegConfig:
    DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
    IMAGE_SIZE = 256
    ORIGINAL_SIZE = 500
    BATCH_SIZE = 8
    LEARNING_RATE = 1e-4
    NUM_EPOCHS = 20
    BACKBONE = 'resnet34' # U-Net model Encoder
    ACTIVATION = 'sigmoid' # For binary mask output (0 to 1)

seg_cfg = SegConfig()
print(f"Segmentation running on: {seg_cfg.DEVICE}")

# --- Helper functions ---

# 1. Path collection function
def create_seg_dataframe(base_dir='train'):
    data = []
    classes = ['benign', 'malignant', 'normal']
    for cls in classes:
        image_dir = os.path.join(base_dir, cls, 'images')
        mask_dir = os.path.join(base_dir, cls, 'masks')
        image_paths = glob.glob(os.path.join(image_dir, '*.png'))
        
        for img_path in image_paths:
            img_name = os.path.basename(img_path)
            if cls == 'normal':
                mask_path = None # All-black mask
            else:
                base_name = os.path.splitext(img_name)[0]
                mask_path = os.path.join(mask_dir, f"{base_name}_mask.png")
            
            data.append({'image_path': img_path, 'mask_path': mask_path, 'class': cls})
    return pd.DataFrame(data)

# 2. Dataset class for segmentation
class SegmentationXRayDataset(Dataset):
    def __init__(self, dataframe, transform=None):
        self.df = dataframe
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        # Load images as NumPy arrays so Albumentations can process them
        image = np.array(Image.open(row['image_path']).convert("RGB"))

        if row['class'] == 'normal' or row['mask_path'] is None:
            # Create an all-black mask (0) for normal or test images
            mask = np.zeros(image.shape[:2], dtype=np.uint8)
        else:
            # Load mask and ensure it is only 0 and 1
            mask = np.array(Image.open(row['mask_path']).convert("L"))
            mask[mask > 0] = 1 # Tumor white (1), background black (0)

        # Apply transformations (simultaneously on image and mask)
        if self.transform:
            augmented = self.transform(image=image, mask=mask)
            image = augmented['image']
            mask = augmented['mask']
        
        # Output: RGB image (3 channels) and binary mask (1 channel)
        return image, mask.float().unsqueeze(0) 

# 3. Define Combined Loss (Dice + BCE)
class DiceLoss(nn.Module):
    """Soft Dice loss on probabilities: 1 - (2 * intersection + smooth) / (sum of both + smooth)."""
    def __init__(self, smooth=1e-6):
        super(DiceLoss, self).__init__()
        self.smooth = smooth
    def forward(self, inputs, targets):
        inputs = inputs.view(-1)
        targets = targets.view(-1)
        intersection = (inputs * targets).sum()
        dice = (2. * intersection + self.smooth) / (inputs.sum() + targets.sum() + self.smooth)
        return 1 - dice

class CombinedLoss(nn.Module):
    def __init__(self, weight=0.5):
        super(CombinedLoss, self).__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.dice = DiceLoss()
        self.weight = weight 

    def forward(self, inputs, targets):
        # BCEWithLogitsLoss uses raw output (Logits)
        bce_loss = self.bce(inputs, targets) 
        # Dice Loss uses Sigmoid output
        dice_loss = self.dice(torch.sigmoid(inputs), targets)
        return self.weight * bce_loss + (1 - self.weight) * dice_loss

Segmentation running on: cpu
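
To see how the two terms of the combined loss behave, the sketch below evaluates a Dice term and a BCE term on four made-up pixel probabilities in plain Python. It is a simplified illustration under stated assumptions: it works on probabilities directly, whereas the notebook's `BCEWithLogitsLoss` takes raw logits, and the values are invented for the example.

```python
import math


def dice_loss(probs, targets, smooth=1e-6):
    """1 - soft Dice coefficient over flattened pixels."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)
    return 1.0 - dice


def bce_loss(probs, targets):
    """Mean binary cross-entropy on probabilities (not logits)."""
    total = sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for p, t in zip(probs, targets)
    )
    return -total / len(probs)


# Four toy pixels: two tumor pixels, two background pixels
probs = [0.9, 0.8, 0.1, 0.2]
targets = [1.0, 1.0, 0.0, 0.0]

dice_term = dice_loss(probs, targets)
bce_term = bce_loss(probs, targets)
combined = 0.5 * bce_term + 0.5 * dice_term  # same 50/50 weighting as weight=0.5

print(f"dice={dice_term:.4f} bce={bce_term:.4f} combined={combined:.4f}")
```

BCE scores every pixel independently, while Dice measures overlap of the whole mask, which makes the combination less sensitive to the large background area.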


In [20]:
# --- Prepare Training Data ---
full_train_df = create_seg_dataframe(base_dir='train')
full_train_df

Unnamed: 0,image_path,mask_path,class
0,train\benign\images\benign (1).png,train\benign\masks\benign (1)_mask.png,benign
1,train\benign\images\benign (100).png,train\benign\masks\benign (100)_mask.png,benign
2,train\benign\images\benign (101).png,train\benign\masks\benign (101)_mask.png,benign
3,train\benign\images\benign (102).png,train\benign\masks\benign (102)_mask.png,benign
4,train\benign\images\benign (103).png,train\benign\masks\benign (103)_mask.png,benign
...,...,...,...
619,train\normal\images\normal (95).png,,normal
620,train\normal\images\normal (96).png,,normal
621,train\normal\images\normal (97).png,,normal
622,train\normal\images\normal (98).png,,normal


In [21]:
# Split data into training (80%) and validation (20%)
train_size = int(0.8 * len(full_train_df))
val_size = len(full_train_df) - train_size
train_df, val_df = random_split(full_train_df, [train_size, val_size])
# Convert to DataFrame for easier indexing
train_df = full_train_df.iloc[train_df.indices].reset_index(drop=True)
val_df = full_train_df.iloc[val_df.indices].reset_index(drop=True)

In [22]:
# --- Define Transforms ---
train_transform_seg = A.Compose([
    A.Resize(seg_cfg.IMAGE_SIZE, seg_cfg.IMAGE_SIZE),
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=15, p=0.5),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ToTensorV2(),
])

val_transform_seg = A.Compose([
    A.Resize(seg_cfg.IMAGE_SIZE, seg_cfg.IMAGE_SIZE),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ToTensorV2(),
])

In [23]:
# --- Create DataLoaders ---
train_dataset_seg = SegmentationXRayDataset(train_df, transform=train_transform_seg)
val_dataset_seg = SegmentationXRayDataset(val_df, transform=val_transform_seg)

train_loader_seg = DataLoader(train_dataset_seg, batch_size=seg_cfg.BATCH_SIZE, shuffle=True, num_workers=0)
val_loader_seg = DataLoader(val_dataset_seg, batch_size=seg_cfg.BATCH_SIZE, shuffle=False, num_workers=0)

print(f"Train samples: {len(train_dataset_seg)}, Validation samples: {len(val_dataset_seg)}")

Train samples: 499, Validation samples: 125


In [24]:
# --- Define Model, Loss, and Optimizer ---
unet_model = smp.Unet(
    encoder_name=seg_cfg.BACKBONE,     # Pre-trained ResNet34
    encoder_weights="imagenet",        # Use ImageNet weights
    in_channels=3,                     # RGB input
    classes=1,                         # Single-channel output (mask)
    activation=None,                   # To use Logits in BCEWithLogitsLoss
).to(seg_cfg.DEVICE)

loss_fn_seg = CombinedLoss(weight=0.5)
optimizer_seg = optim.Adam(unet_model.parameters(), lr=seg_cfg.LEARNING_RATE)
scheduler_seg = optim.lr_scheduler.ReduceLROnPlateau(optimizer_seg, mode='min', factor=0.1, patience=5)

In [25]:
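# --- Evaluation Function ---
# IoU (Intersection over Union) metric for segmentation performance,
# computed over all pixels in the batch at once (not averaged per image)
def iou_metric(preds, targets):
    preds = (preds > 0.5).float() # Threshold probabilities into a binary mask
    intersection = (preds * targets).sum()
    union = (preds + targets).sum() - intersection
    iou = (intersection + 1e-6) / (union + 1e-6) # Smoothing avoids 0/0 when both masks are empty
    return iou.mean() # iou is already a scalar, so .mean() leaves it unchanged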
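
As a worked example of the metric, the sketch below computes IoU for two tiny binary masks in plain Python. It is an illustration only: the function `iou` and the masks are made up, and returning 1.0 for two empty masks is a convention assumed here (the smoothed formula above gives the same value in that case).

```python
def iou(pred_mask, target_mask):
    """Intersection over Union for two binary masks given as nested lists."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred_mask, target_mask):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]

print(iou(pred, target))  # 0.5: 2 overlapping pixels out of 4 in the union
```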

In [26]:
# --- Training Loop ---
best_val_iou = 0
print("\n--- Starting U-Net Training ---")
for epoch in range(seg_cfg.NUM_EPOCHS):
    # Train Loop
    unet_model.train()
    train_loss = 0
    for images, masks in tqdm(train_loader_seg, desc=f"Epoch {epoch+1} Train"):
        images = images.to(seg_cfg.DEVICE)
        masks = masks.to(seg_cfg.DEVICE)
        
        # Forward pass
        outputs = unet_model(images)
        loss = loss_fn_seg(outputs, masks)
        
        # Backward pass and optimization
        optimizer_seg.zero_grad()
        loss.backward()
        optimizer_seg.step()
        
        train_loss += loss.item()
    
    avg_train_loss = train_loss / len(train_loader_seg)

    # Validation Loop
    unet_model.eval()
    val_loss = 0
    val_iou = 0
    with torch.no_grad():
        for images, masks in tqdm(val_loader_seg, desc=f"Epoch {epoch+1} Val"):
            images = images.to(seg_cfg.DEVICE)
            masks = masks.to(seg_cfg.DEVICE)
            
            outputs = unet_model(images)
            loss = loss_fn_seg(outputs, masks)
            val_loss += loss.item()
            
            # Calculate IoU
            sigmoid_outputs = torch.sigmoid(outputs)
            val_iou += iou_metric(sigmoid_outputs, masks).item()
            
    avg_val_loss = val_loss / len(val_loader_seg)
    avg_val_iou = val_iou / len(val_loader_seg)

    print(f"Epoch {epoch+1}/{seg_cfg.NUM_EPOCHS}: Train Loss: {avg_train_loss:.4f}, Val Loss: {avg_val_loss:.4f}, Val IoU: {avg_val_iou:.4f}")

    # Update the learning-rate scheduler and save the best model
    scheduler_seg.step(avg_val_loss)

    if avg_val_iou > best_val_iou:
        best_val_iou = avg_val_iou
        torch.save(unet_model.state_dict(), 'best_unet_model.pth')
        print(f"  --> Model saved with improved IoU: {best_val_iou:.4f}")

print("\n--- U-Net Training Complete ---")


--- Starting U-Net Training ---

Epoch 1/20: Train Loss: 0.7152, Val Loss: 0.6192, Val IoU: 0.3426
  --> Model saved with improved IoU: 0.3426
Epoch 2/20: Train Loss: 0.5785, Val Loss: 0.5319, Val IoU: 0.4493
  --> Model saved with improved IoU: 0.4493
Epoch 3/20: Train Loss: 0.5039, Val Loss: 0.4721, Val IoU: 0.4872
  --> Model saved with improved IoU: 0.4872
Epoch 4/20: Train Loss: 0.4353, Val Loss: 0.4222, Val IoU: 0.5135
  --> Model saved with improved IoU: 0.5135
Epoch 5/20: Train Loss: 0.3821, Val Loss: 0.3884, Val IoU: 0.5338
  --> Model saved with improved IoU: 0.5338
Epoch 6/20: Train Loss: 0.3498, Val Loss: 0.3593, Val IoU: 0.5252
Epoch 7/20: Train Loss: 0.3068, Val Loss: 0.3468, Val IoU: 0.5074
Epoch 8/20: Train Loss: 0.2744, Val Loss: 0.3268, Val IoU: 0.5063
Epoch 9/20: Train Loss: 0.2426, Val Loss: 0.3034, Val IoU: 0.5317
Epoch 10/20: Train Loss: 0.2263, Val Loss: 0.3076, Val IoU: 0.5134
Epoch 11/20: Train Loss: 0.2145, Val Loss: 0.2862, Val IoU: 0.5244
Epoch 12/20: Train Loss: 0.1878, Val Loss: 0.2686, Val IoU: 0.5560
  --> Model saved with improved IoU: 0.5560
Epoch 13/20: Train Loss: 0.1694, Val Loss: 0.2687, Val IoU: 0.5484
Epoch 14/20: Train Loss: 0.1620, Val Loss: 0.2552, Val IoU: 0.5483
Epoch 15/20: Train Loss: 0.1458, Val Loss: 0.2549, Val IoU: 0.5514
Epoch 16/20: Train Loss: 0.1315, Val Loss: 0.2463, Val IoU: 0.5653
  --> Model saved with improved IoU: 0.5653
Epoch 17/20: Train Loss: 0.1337, Val Loss: 0.2398, Val IoU: 0.5646
Epoch 18/20: Train Loss: 0.1360, Val Loss: 0.2312, Val IoU: 0.5834
  --> Model saved with improved IoU: 0.5834
Epoch 19/20: Train Loss: 0.1518, Val Loss: 0.2376, Val IoU: 0.5680
Epoch 20/20: Train Loss: 0.1342, Val Loss: 0.2387, Val IoU: 0.5671

--- U-Net Training Complete ---
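The training cell steps `ReduceLROnPlateau` on the validation loss with `factor=0.1` and `patience=5`. The sketch below replays the logged validation losses through a simplified version of that plateau rule to see whether the learning rate would have been reduced. It assumes PyTorch's default relative threshold of `1e-4` and no cooldown. The function `replay_plateau` and the starting rate of `1e-4` are illustrative assumptions; the notebook's actual `seg_cfg.LEARNING_RATE` is not shown in this section.

```python
def replay_plateau(losses, lr, factor=0.1, patience=5, threshold=1e-4):
    """Simplified plateau rule: cut lr after more than `patience` epochs without improvement."""
    best = float("inf")
    bad_epochs = 0
    reduced_at = []  # 1-based epochs at which the learning rate is cut
    for epoch, loss in enumerate(losses, start=1):
        if loss < best * (1 - threshold):  # relative improvement, mode='min'
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            lr *= factor
            bad_epochs = 0
            reduced_at.append(epoch)
    return lr, reduced_at


# Validation losses copied from the 20-epoch training log.
logged_val_losses = [
    0.6192, 0.5319, 0.4721, 0.4222, 0.3884, 0.3593, 0.3468, 0.3268, 0.3034, 0.3076,
    0.2862, 0.2686, 0.2687, 0.2552, 0.2549, 0.2463, 0.2398, 0.2312, 0.2376, 0.2387,
]

final_lr, reduced_at = replay_plateau(logged_val_losses, lr=1e-4)
print(f"reductions at epochs: {reduced_at}, final lr: {final_lr}")
```

Under these assumptions the rule never fires on this log: the longest run without improvement is two epochs (19 and 20), so the learning rate would have stayed at its initial value for all 20 epochs.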


In [27]:
# --- Load the best-trained model ---
best_unet_model = smp.Unet(
    encoder_name=seg_cfg.BACKBONE,
    encoder_weights=None, # We will load weights from the file
    in_channels=3,
    classes=1,
    activation=None,
).to(seg_cfg.DEVICE)
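# map_location lets a checkpoint saved on GPU load on a CPU-only machine as well
best_unet_model.load_state_dict(torch.load('best_unet_model.pth', map_location=seg_cfg.DEVICE))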

<All keys matched successfully>

In [28]:
def get_original_image_sizes(test_df_raw, image_dir):
    """Returns a map of image name to original dimensions (width, height)."""
    size_map = {}
    for img_name in test_df_raw['image']:
        img_path = os.path.join(image_dir, img_name)
        try:
            # Open image to read dimensions
            with Image.open(img_path) as img:
                size_map[img_name] = img.size # (width, height)
        except Exception as e:
            # If the image cannot be read, fall back to a default size of 500x500
            print(f"Warning: Could not read image size for {img_name}. Using default 500x500. Error: {e}")
            size_map[img_name] = (500, 500) # Fallback
    return size_map

In [29]:
# --- Define final prediction function ---
def predict_segmentation(model, test_loader, device, output_dir='segmentation_submission_masks'):
    """Predicts a mask for each test image, resizes it to the original dimensions, and saves it."""
    model.eval()
    os.makedirs(output_dir, exist_ok=True)
    
    # Load Test DataFrame
    test_df_raw = pd.read_csv('test/test.csv') 
    
    # A) Collect names and original dimensions
    # Note: TEST_IMAGE_DIR is a global defined later in this cell, before the function is called
    image_names = test_df_raw['image'].tolist()
    original_sizes_map = get_original_image_sizes(test_df_raw, TEST_IMAGE_DIR)

    print(f"Starting prediction and saving masks to {output_dir}/...")

    with torch.no_grad():
        # Prediction loop (images: image tensor, _: dummy mask tensor)
        for i, (images, _) in enumerate(tqdm(test_loader, desc="Predicting Test Masks")):
            
            images = images.to(device)
            
            # 1. Predict (Logits)
            outputs = model(images)
            
            # 2. Convert to binary
            predictions = torch.sigmoid(outputs) > 0.5
            # Convert to 0 or 255
            predictions = predictions.cpu().numpy().astype(np.uint8) * 255 

            # 3. Save masks with original dimensions
            batch_size = predictions.shape[0]
            for j in range(batch_size):
                mask = predictions[j, 0, :, :] # (1, H, W) -> (H, W)
                
                # Find image name and original dimensions
                idx_in_df = i * test_loader.batch_size + j
                img_name = image_names[idx_in_df]
                original_width, original_height = original_sizes_map[img_name]
                
                # Resizing to original dimensions (W, H)
                mask_pil = Image.fromarray(mask)
                # Use Image.NEAREST to resize pixels without color interpolation
                mask_pil = mask_pil.resize(
                    (original_width, original_height), 
                    resample=Image.NEAREST
                ) 
                
                # Save
                base_name = os.path.splitext(img_name)[0]
                output_mask_name = f"{base_name}_mask.png"
                mask_pil.save(os.path.join(output_dir, output_mask_name))

# --- Prepare test data for segmentation ---

TEST_IMAGE_DIR = 'test/images/' # Test data path

# Create Test DataFrame (no mask needed)
test_data_seg = []
test_df_raw = pd.read_csv('test/test.csv')
for img_name in test_df_raw['image']:
    test_data_seg.append({
        "image_path": os.path.join(TEST_IMAGE_DIR, img_name),
        "mask_path": None, # Mask not available
        "class": "unknown"
    })
test_full_df_seg = pd.DataFrame(test_data_seg)

# Define Test DataLoader (without Augmentation)
test_dataset_seg = SegmentationXRayDataset(test_full_df_seg, transform=val_transform_seg)
test_loader_seg = DataLoader(test_dataset_seg, batch_size=seg_cfg.BATCH_SIZE, shuffle=False, num_workers=0)


# --- Call prediction function ---
predict_segmentation(
    model=best_unet_model, 
    test_loader=test_loader_seg, 
    device=seg_cfg.DEVICE, 
    output_dir='segmentation'
)

print("\n--- Prediction Complete ---")
print("All predicted masks are saved in the 'segmentation' folder.")

Starting prediction and saving masks to segmentation/...


--- Prediction Complete ---
All predicted masks are saved in the 'segmentation' folder.
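The prediction function recovers each image's file name from `i * test_loader.batch_size + j`. That arithmetic is only valid when the loader iterates in CSV order (`shuffle=False`) and keeps the final partial batch. A small sketch, assuming those two conditions, shows that the formula enumerates every row exactly once. The helper `flat_indices` and the sample counts are hypothetical.

```python
import math


def flat_indices(num_samples, batch_size):
    """Yield (batch_idx, pos_in_batch, row_idx) for sequential, unshuffled batching."""
    num_batches = math.ceil(num_samples / batch_size)
    for i in range(num_batches):
        # The last batch may hold fewer than batch_size samples.
        current_size = min(batch_size, num_samples - i * batch_size)
        for j in range(current_size):
            yield i, j, i * batch_size + j


# 19 samples in batches of 8 gives batches of size 8, 8, and 3.
triples = list(flat_indices(num_samples=19, batch_size=8))
rows = [row for _, _, row in triples]
print(rows == list(range(19)))
```

If the loader were shuffled or the dataset dropped unreadable files, the row index would no longer line up with `image_names`, and returning the file name from the dataset itself would be the safer design.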


In [30]:
# Helper that bundles the submission files into result.zip.
# This cell and the next one exist only to build the zipped contest submission; you can skip them otherwise.

def compress(file_names):
    print("File Paths to be zipped:")
    print(file_names)
    compression = zipfile.ZIP_DEFLATED
    with zipfile.ZipFile("result.zip", mode="w") as zf:
        for file_name in file_names:
            if os.path.exists(file_name):
                zf.write(file_name, arcname=os.path.basename(file_name), compress_type=compression)
            else:
                print(f"Warning: File not found and will not be added to zip: {file_name}")
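Because `compress` passes `arcname=os.path.basename(file_name)`, every file is stored at the root of the archive and the `segmentation` folder name is dropped. The sketch below reproduces that behavior in a temporary directory with a placeholder file. The names `demo_mask.png` and `demo.zip` are hypothetical.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a placeholder file inside a nested folder, mimicking segmentation/<name>_mask.png.
    mask_dir = os.path.join(tmp, "segmentation")
    os.makedirs(mask_dir)
    mask_path = os.path.join(mask_dir, "demo_mask.png")
    with open(mask_path, "wb") as f:
        f.write(b"placeholder bytes")

    # Write it the same way compress() does: basename as the archive name.
    archive_path = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(archive_path, mode="w") as zf:
        zf.write(
            mask_path,
            arcname=os.path.basename(mask_path),
            compress_type=zipfile.ZIP_DEFLATED,
        )

    # Read back the stored names to see the flattened layout.
    with zipfile.ZipFile(archive_path) as zf:
        archived_names = zf.namelist()

print(archived_names)
```

A flat layout also means two files with the same base name in different folders would collide inside the archive. Whether the contest expects the masks at the archive root or inside a folder is not stated in this section.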

In [31]:
# --- Create final submission files ---

# 1. Save the classification results DataFrame to a CSV file
submission.to_csv('submission.csv', index=False)

# 2. Define the list of files for the zip archive
file_names = ['Breast-Cancer-DetSeg.ipynb', 'submission.csv']

# 3. Define the list of mask files
# Assuming mask files are in 'segmentation' folder
segmentation_dir = 'segmentation'
if os.path.exists(segmentation_dir):
    mask_files = [os.path.join(segmentation_dir, f) for f in os.listdir(segmentation_dir) if f.endswith('_mask.png')]
    file_names.extend(mask_files)

# 4. Create the zip file
compress(file_names)
print("Submission file 'result.zip' created successfully!")

File Paths to be zipped:
['Breast-Cancer-DetSeg.ipynb', 'submission.csv', 'segmentation\\test_0001_mask.png', 'segmentation\\test_0002_mask.png', 'segmentation\\test_0003_mask.png', 'segmentation\\test_0004_mask.png', 'segmentation\\test_0005_mask.png', 'segmentation\\test_0006_mask.png', 'segmentation\\test_0007_mask.png', 'segmentation\\test_0008_mask.png', 'segmentation\\test_0009_mask.png', 'segmentation\\test_0010_mask.png', 'segmentation\\test_0011_mask.png', 'segmentation\\test_0012_mask.png', 'segmentation\\test_0013_mask.png', 'segmentation\\test_0014_mask.png', 'segmentation\\test_0015_mask.png', 'segmentation\\test_0016_mask.png', 'segmentation\\test_0017_mask.png', 'segmentation\\test_0018_mask.png', 'segmentation\\test_0019_mask.png', 'segmentation\\test_0020_mask.png', 'segmentation\\test_0021_mask.png', 'segmentation\\test_0022_mask.png', 'segmentation\\test_0023_mask.png', 'segmentation\\test_0024_mask.png', 'segmentation\\test_0025_mask.png', 'segmentation\\test_0026_m