<a href="https://colab.research.google.com/github/MRameezU/ISIC2017-Unet/blob/main/notebooks/isic_cancer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!nvidia-smi

/bin/bash: nvidia-smi: command not found


## 0 - Get Setup

In [None]:
!pip install --upgrade torch
!pip install --upgrade torchvision

import torch
import torchvision
print(f"torch version: {torch.__version__}")
print(f"torchvision version: {torchvision.__version__}")

Collecting torch
  Downloading torch-2.5.1-cp310-cp310-manylinux1_x86_64.whl.metadata (28 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch)
  Downloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)


Set up device-agnostic code so the notebook uses a GPU when one is available and falls back to the CPU otherwise.

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
device

## 1 - Getting Data

In [1]:
import requests
import zipfile
from pathlib import Path
from tqdm.notebook import tqdm

# path to data folder
data_path = Path("data/")
train_data_path = data_path / "train"
binary_mask_data_path = data_path / "binary"
train_zip_url = "https://isic-challenge-data.s3.amazonaws.com/2017/ISIC-2017_Training_Data.zip"
train_zip_file = data_path / "ISIC-2017_Training_Data.zip"
# binary mask
train_binary_zip_url = "https://isic-challenge-data.s3.amazonaws.com/2017/ISIC-2017_Training_Part1_GroundTruth.zip"
train_binary_zip_file = data_path / "ISIC-2017_Training_Part1_GroundTruth.zip"

def download_file(url, dest_path):
    """Downloads a file from a URL to a destination path with progress bar."""
    response = requests.get(url, stream=True)
    response.raise_for_status()  # Raise an error for bad responses
    total_size = int(response.headers.get('content-length', 0))
    with open(dest_path, "wb") as file, tqdm(
        desc=f"Downloading {dest_path.name}",
        total=total_size,
        unit="B",
        unit_scale=True,
        unit_divisor=1024,
    ) as bar:
        for chunk in response.iter_content(chunk_size=1024):
            file.write(chunk)
            bar.update(len(chunk))
    print(f"Download Complete: {dest_path}")

def extract_zip(file_path, extract_to):
    """Extracts a zip file to the specified directory."""
    with zipfile.ZipFile(file_path, mode="r") as zip_file:
        print(f"Extracting {file_path.name} to {extract_to}")
        zip_file.extractall(extract_to)
    print(f"Extraction Complete: {extract_to}")


In [2]:
# Main script
if train_data_path.is_dir() and binary_mask_data_path.is_dir():
    print(f"{train_data_path} and {binary_mask_data_path} directories already exist.")
else:
    print(f"Preparing data directories at {data_path}")
    train_data_path.mkdir(parents=True, exist_ok=True)

    # Download training data
    print(f"Downloading Training Data from: {train_zip_url}")
    download_file(train_zip_url, train_zip_file)

    # Extract the zip file
    extract_zip(train_zip_file, train_data_path)

    binary_mask_data_path.mkdir(parents=True, exist_ok=True)
    # Download binary mask data
    print(f"Downloading Binary Mask Data from: {train_binary_zip_url}")
    download_file(train_binary_zip_url, train_binary_zip_file)

    # Extract the zip file
    extract_zip(train_binary_zip_file, binary_mask_data_path)

Preparing data directories at data
Downloading Training Data from: https://isic-challenge-data.s3.amazonaws.com/2017/ISIC-2017_Training_Data.zip


Downloading ISIC-2017_Training_Data.zip:   0%|          | 0.00/5.80G [00:00<?, ?B/s]

Download Complete: data/ISIC-2017_Training_Data.zip
Extracting ISIC-2017_Training_Data.zip to data/train
Extraction Complete: data/train
Downloading Binary Mask Data from: https://isic-challenge-data.s3.amazonaws.com/2017/ISIC-2017_Training_Part1_GroundTruth.zip


Downloading ISIC-2017_Training_Part1_GroundTruth.zip:   0%|          | 0.00/8.89M [00:00<?, ?B/s]

Download Complete: data/ISIC-2017_Training_Part1_GroundTruth.zip
Extracting ISIC-2017_Training_Part1_GroundTruth.zip to data/binary
Extraction Complete: data/binary


The training archive contains the images together with their superpixel masks; the binary masks have to be downloaded separately.
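One way to confirm what an archive contains before extracting it is to tally its member names by suffix. The following is a minimal sketch: `summarize_zip` is a hypothetical helper that is not part of this notebook, and the categories assume the `.jpg` and `_superpixels.png` naming that the next section relies on.

```python
import zipfile
from collections import Counter


def summarize_zip(zip_path):
    """Count archive members by kind without extracting anything.

    Hypothetical helper; the suffix rules are assumptions based on the
    file names handled later in this notebook.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path, mode="r") as zip_file:
        for name in zip_file.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            if name.endswith("_superpixels.png"):
                counts["superpixel"] += 1
            elif name.endswith(".jpg"):
                counts["image"] += 1
            else:
                counts["other"] += 1
    return counts
```

Calling `summarize_zip(train_zip_file)` after the download should report matching `image` and `superpixel` counts if every image ships with a superpixel mask.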

## 1.1 Separating the Inputs and Outputs
The extracted training folder contains both the training images and the superpixel masks, so we separate them into different folders.

In [3]:
from pathlib import Path
import shutil


def organize_files(dataset_folder, image_output_folder, superpixel_output_folder):
    """
    Organize files by separating images and superpixel masks into different folders.

    Args:
        dataset_folder (str or Path): Path to the folder containing both images and masks.
        image_output_folder (str or Path): Path to the folder where images will be moved.
        superpixel_output_folder (str or Path): Path to the folder where superpixel masks will be moved.
    """
    # Convert paths to Path objects
    dataset_folder = Path(dataset_folder)
    image_output_folder = Path(image_output_folder)
    superpixel_output_folder = Path(superpixel_output_folder)

    # Create output folders if they don't exist
    image_output_folder.mkdir(parents=True, exist_ok=True)
    superpixel_output_folder.mkdir(parents=True, exist_ok=True)

    # Iterate through all files in the dataset folder
    for file in dataset_folder.iterdir():
        if file.is_file():
            if file.name.endswith(".jpg"):
                # Move image file
                shutil.move(str(file), str(image_output_folder / file.name))
            elif file.name.endswith("_superpixels.png"):
                # Move superpixel mask file
                shutil.move(str(file), str(superpixel_output_folder / file.name))

    print(f"Files have been organized. Images moved to {image_output_folder}, masks to {superpixel_output_folder}.")

if __name__ == "__main__":
    dataset_folder = train_data_path / "ISIC-2017_Training_Data"  # data/train/ISIC-2017_Training_Data
    image_output_folder = "ISIC-2017_Data/Images"
    superpixel_output_folder = "ISIC-2017_Data/Superpixel"

    organize_files(dataset_folder=dataset_folder,
                   image_output_folder=image_output_folder,
                   superpixel_output_folder=superpixel_output_folder)


Files have been organized. Images moved to ISIC-2017_Data/Images, masks to ISIC-2017_Data/Superpixel.


Moving the binary masks into the same consolidated `ISIC-2017_Data` folder.

In [4]:
import shutil

# Move the binary masks to the consolidated location
source_dir = binary_mask_data_path / "ISIC-2017_Training_Part1_GroundTruth"  # data/binary/ISIC-2017_Training_Part1_GroundTruth
destination_dir = Path("ISIC-2017_Data/Binary")
# Create the destination directory if it doesn't exist
destination_dir.mkdir(parents=True, exist_ok=True)
for file in source_dir.iterdir():
    if file.is_file():
        shutil.move(str(file), str(destination_dir / file.name))

## Deleting the Extras
Delete the `data_path` folder, which now holds only the downloaded archives and the leftover extraction folders, to free storage.

In [6]:
# Delete data_path once the desired output is in place, to free storage
if data_path.exists() and data_path.is_dir():
    shutil.rmtree(data_path)
    print(f"Folder '{data_path}' and all its subdirectories have been deleted.")
else:
    print(f"Folder '{data_path}' does not exist.")

Folder 'data' does not exist.
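The output above shows that the `data` folder was already gone when this cell ran. Because the download cell only checks for `data/train` and `data/binary`, re-running the notebook after this cleanup would fetch the archives again. A minimal sketch of an alternative guard that checks the consolidated folder instead is shown here; `dataset_ready` is a hypothetical helper, not part of the notebook, and its default folder names are taken from the cells above.

```python
from pathlib import Path


def dataset_ready(root="ISIC-2017_Data", subfolders=("Images", "Superpixel", "Binary")):
    """Return True if every expected subfolder exists and holds at least one file.

    Hypothetical helper; it only checks for presence, not completeness.
    """
    root = Path(root)
    for name in subfolders:
        folder = root / name
        if not folder.is_dir():
            return False
        if not any(entry.is_file() for entry in folder.iterdir()):
            return False
    return True
```

Wrapping the download and organize steps in `if not dataset_ready(): ...` would let the notebook skip them once the final folders are populated.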


## Result
The `ISIC-2017_Data` folder contains the dataset for the ISIC 2017 skin cancer segmentation project. It is organized into the following subfolders:
## Structure

### Subfolders

1. **Images/**
   - Contains the original images used for training and validation.

2. **Superpixel/**
   - Contains the superpixel masks generated for each image.

3. **Binary/**
   - Contains the binary masks indicating the regions of interest in each image.
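
With this layout, each image can be matched to its binary mask by file name. The sketch below is one possible way to do that. `pair_images_with_masks` is a hypothetical helper, and the `_segmentation.png` mask suffix is an assumption about how the ground-truth files are named; adjust `mask_suffix` if the files in `Binary/` follow a different pattern.

```python
from pathlib import Path


def pair_images_with_masks(root="ISIC-2017_Data", mask_suffix="_segmentation.png"):
    """Pair each image in Images/ with its mask in Binary/.

    Hypothetical helper. Returns (pairs, missing), where pairs is a list of
    (image_path, mask_path) tuples and missing lists the names of images
    that have no mask with the assumed suffix.
    """
    root = Path(root)
    pairs, missing = [], []
    for image in sorted((root / "Images").glob("*.jpg")):
        mask = root / "Binary" / f"{image.stem}{mask_suffix}"
        if mask.is_file():
            pairs.append((image, mask))
        else:
            missing.append(image.name)
    return pairs, missing
```

An empty `missing` list indicates that every image has a mask under the assumed naming, which is worth checking before building a training dataset from these folders.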
