# 🚀 End-to-End Workflow: Cell Segmentation with YOLOv11

This notebook provides a complete, reproducible workflow for the Cell Instance Segmentation project. It handles everything from cloning the repository and downloading the data directly from Kaggle to training an improved model and saving the results to your Google Drive.

**This is the definitive 'control panel' for the project.**

### Workflow Steps:
1.  **Setup**: Clones the GitHub repository and installs all dependencies.
2.  **Authentication**: Configures your Kaggle API credentials.
3.  **Data Acquisition**: Downloads and unzips the dataset from Kaggle into the Colab environment.
4.  **Preprocessing**: Runs the `preprocess.py` script to convert annotations to YOLO format.
5.  **Model Training**: Executes a training run to improve the model (`yolo11s-seg` for 50 epochs).
6.  **Save Results**: Copies the final trained model and results to your personal Google Drive for permanent storage.

## 1. Setup: Clone Repository & Install Dependencies

This cell clones the project's GitHub repository, navigates into the project directory, and installs all required Python libraries from the `requirements.txt` file.

In [None]:
!git clone https://github.com/alicefvictorino/cell-instance-segmentation.git
%cd cell-instance-segmentation
!pip install -r requirements.txt

Cloning into 'cell-instance-segmentation'...
remote: Enumerating objects: 30, done.
remote: Counting objects: 100% (30/30), done.
remote: Compressing objects: 100% (29/29), done.
remote: Total 30 (delta 4), reused 23 (delta 1), pack-reused 0 (from 0)
Receiving objects: 100% (30/30), 5.77 MiB | 10.54 MiB/s, done.
Resolving deltas: 100% (4/4), done.
/content/cell-instance-segmentation
Collecting ultralytics>=8.0.0 (from -r requirements.txt (line 9))
  Downloading ultralytics-8.3.168-py3-none-any.whl.metadata (37 kB)
Collecting pathlib2>=2.3.0 (from -r requirements.txt (line 19))
  Downloading pathlib2-2.3.7.post1-py2.py3-none-any.whl.metadata (3.5 kB)
Collecting ultralytics-thop>=2.0.0 (from ultralytics>=8.0.0->-r requirements.txt (line 9))
  Downloading ultralytics_thop-2.0.14-py3-none-any.whl.metadata (9.4 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.8.0->ultralytics>=8.0.0->-r requirements.txt (line 9))
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-no

## 2. Kaggle API Authentication and Google Drive Mounting

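To download the dataset, we need to authenticate with the Kaggle API. The first cell below prompts you to upload your `kaggle.json` file. A later cell mounts Google Drive so that training results persist after the Colab session ends.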

**Instructions:**
1. Go to your Kaggle account page: [kaggle.com/account](https://www.kaggle.com/account)
2. Click on "Create New API Token" to download `kaggle.json`.
3. Run this cell and use the file browser to upload the `kaggle.json` file you just downloaded.

In [None]:
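import os
import shutil
from google.colab import files

# The Kaggle CLI looks for its credentials in ~/.kaggle/kaggle.json
kaggle_dir = os.path.expanduser("~/.kaggle")
kaggle_path = os.path.join(kaggle_dir, "kaggle.json")
os.makedirs(kaggle_dir, exist_ok=True)

print("🔐 Upload kaggle.json file")
uploaded = files.upload()

if not uploaded:
    print("❌ No file uploaded.")
else:
    # Move the first uploaded file into place (shutil.move also works across filesystems)
    fname = next(iter(uploaded))
    shutil.move(fname, kaggle_path)
    # Owner read/write only, so the Kaggle CLI does not warn about exposed credentials
    os.chmod(kaggle_path, 0o600)
    print("✅ Kaggle API successfully configured!")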

🔐 Upload kaggle.json file


Saving kaggle.json to kaggle.json
✅ Kaggle API successfully configured!
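The upload cell only places `kaggle.json` where the Kaggle CLI expects it. If you would rather not upload the file in every session, the sketch below writes the same file from a username and key. The helper name `write_kaggle_credentials` is hypothetical. The file location, the JSON keys (`username`, `key`), and the `0o600` permissions mirror what the cell above produces. The Kaggle CLI can also read the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables directly, which avoids writing the file at all.

```python
import json
import os


def write_kaggle_credentials(username: str, key: str,
                             kaggle_dir: str = "~/.kaggle") -> str:
    """Write kaggle.json for the Kaggle CLI (hypothetical helper).

    The file holds {"username": ..., "key": ...} and is made readable only
    by its owner, matching the chmod 0o600 in the upload cell.
    Returns the path of the written file.
    """
    kaggle_dir = os.path.expanduser(kaggle_dir)
    os.makedirs(kaggle_dir, exist_ok=True)
    path = os.path.join(kaggle_dir, "kaggle.json")
    with open(path, "w") as f:
        json.dump({"username": username, "key": key}, f)
    os.chmod(path, 0o600)  # owner read/write only
    return path
```

For example, `write_kaggle_credentials(os.environ["KAGGLE_USERNAME"], os.environ["KAGGLE_KEY"])` fills the file from environment variables. Never hard-code the key in a notebook you share.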


In [None]:
!ls -l ~/.kaggle/kaggle.json  # confirm the file exists without printing the secret API key

In [None]:
from google.colab import drive
import os

print("📂 Mounting Google Drive...")
drive.mount('/content/drive')
print("✅ Google Drive successfully mounted on /content/drive")

DRIVE_PROJECT_PATH = "/content/drive/MyDrive/Sartorius_Project"
os.makedirs(DRIVE_PROJECT_PATH, exist_ok=True)

📂 Mounting Google Drive...
Mounted at /content/drive
✅ Google Drive successfully mounted on /content/drive
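The training cell in this notebook writes its results directly to Drive. If a run is written to local Colab storage instead, it is lost when the session ends, but the run folder can be copied into the Drive project folder afterward. The sketch below does this with `shutil.copytree`. The helper name `copy_run_to_drive` is hypothetical, and `dirs_exist_ok=True` requires Python 3.8 or newer.

```python
import os
import shutil


def copy_run_to_drive(run_dir: str, drive_project_path: str) -> str:
    """Copy a local run folder into the Drive project folder (hypothetical helper).

    The folder keeps its own name under drive_project_path. Files that
    already exist at the destination are overwritten. Returns the destination path.
    """
    if not os.path.isdir(run_dir):
        raise FileNotFoundError(f"Run directory not found: {run_dir}")
    dest = os.path.join(drive_project_path,
                        os.path.basename(os.path.normpath(run_dir)))
    shutil.copytree(run_dir, dest, dirs_exist_ok=True)
    return dest
```

Usage would look like `copy_run_to_drive("/content/training_runs/<run_name>", DRIVE_PROJECT_PATH)`, where the local path is a placeholder for wherever the run was written.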


## 3. Download and Unzip Dataset

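This cell uses the configured Kaggle API to download the 'Sartorius - Cell Instance Segmentation' competition dataset. It then unzips the data into a temporary local directory, `/content/raw_data`, within this Colab session. Each step is skipped if its output already exists. Downloading competition data requires that you have accepted the competition rules on Kaggle.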

In [None]:
import os

RAW_DATA_PATH = "/content/raw_data"
ZIP_PATH = f"{RAW_DATA_PATH}/sartorius-cell-instance-segmentation.zip"
CSV_PATH = os.path.join(RAW_DATA_PATH, "train.csv")

# Create the folder if it does not exist
os.makedirs(RAW_DATA_PATH, exist_ok=True)

# Download the competition dataset (skipped if the zip is already present)
if not os.path.exists(ZIP_PATH):
    print("⬇️ Downloading competition dataset...")
    !kaggle competitions download -c sartorius-cell-instance-segmentation -p {RAW_DATA_PATH}
else:
    print("📦 The zip file already exists. Skipping download.")

# Extract only if train.csv is missing, i.e. the archive has not been unzipped yet
if not os.path.exists(CSV_PATH):
    print("📂 Unzipping files...")
    !unzip -q {ZIP_PATH} -d {RAW_DATA_PATH}
else:
    print("🗃️ Files have already been extracted. Skipping unzip.")

# Sanity check: train.csv should exist at this point
if os.path.exists(CSV_PATH):
    print(f"✅ Dataset available in {RAW_DATA_PATH}")
else:
    print("❌ Error: train.csv not found after unzipping.")

📦 The zip file already exists. Skipping download.
🗃️ Files have already been extracted. Skipping unzip.
✅ Dataset available in /content/raw_data
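The cell above shells out to `unzip` and treats the presence of `train.csv` as proof that extraction already happened. The standard-library `zipfile` module can implement the same skip-if-present logic in pure Python, without depending on the `unzip` binary. The sketch below covers only the extraction step; `extract_if_needed` is a hypothetical helper name, and the download still goes through the Kaggle CLI as shown above.

```python
import os
import zipfile


def extract_if_needed(zip_path: str, dest_dir: str, sentinel: str = "train.csv") -> bool:
    """Extract zip_path into dest_dir unless dest_dir/sentinel already exists.

    Returns True if the archive was extracted, False if extraction was skipped.
    """
    if os.path.exists(os.path.join(dest_dir, sentinel)):
        return False  # already extracted in a previous run
    if not os.path.isfile(zip_path):
        raise FileNotFoundError(f"Archive not found: {zip_path}")
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return True
```

Calling `extract_if_needed(ZIP_PATH, RAW_DATA_PATH)` a second time returns `False` without touching the files, which keeps re-running the cell cheap.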


## 4. Execute Preprocessing

This cell runs the `preprocess.py` script. It takes the raw data from `/content/raw_data` and converts the RLE annotations into the YOLO segmentation format, saving the output to `/content/yolo_dataset`.

In [None]:
import os

RAW_DATA_PATH = "/content/raw_data"
PROCESSED_DATA_PATH = "/content/yolo_dataset"
train_csv_path = os.path.join(RAW_DATA_PATH, "train.csv")

# Checks if the data is present before trying to preprocess
if os.path.exists(train_csv_path):
    print("🧪 Running preprocessing script...")
    exit_code = os.system(f"python scripts/preprocess.py --raw_data_dir {RAW_DATA_PATH} --output_dir {PROCESSED_DATA_PATH}")

    if exit_code == 0:
        print(f"✅ Preprocessing complete. YOLO data saved in {PROCESSED_DATA_PATH}")
    else:
        print("❌ Error executing preprocessing script.")
else:
    print("❌ train.csv not found. Check download and unzip step.")

🧪 Running preprocessing script...
✅ Preprocessing complete. YOLO data saved in /content/yolo_dataset
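To make the conversion above concrete, the sketch below shows both ends of it. It is not the project's `preprocess.py`. It assumes these conventions for the RLE strings in `train.csv`:

- space-separated `start length` pairs;
- 1-indexed starts in row-major pixel order;
- a 520×704 image size.

At the other end, the YOLO segmentation label format puts one object per line: a class index followed by polygon vertices normalized to [0, 1]. The mask-to-polygon step in between is usually done with a contour finder such as OpenCV's `cv2.findContours` and is omitted here. `rle_decode` and `polygon_to_yolo_line` are hypothetical names, and the class index is left as a parameter because the mapping of cell types to classes is defined by the project's script.

```python
import numpy as np


def rle_decode(rle: str, shape=(520, 704)) -> np.ndarray:
    """Decode a space-separated 'start length' RLE string into a binary mask.

    Assumes 1-indexed starts and row-major (left-to-right, top-to-bottom)
    pixel order.
    """
    values = np.asarray(rle.split(), dtype=int)
    starts = values[0::2] - 1  # convert to 0-indexed
    lengths = values[1::2]
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    return flat.reshape(shape)  # C (row-major) order


def polygon_to_yolo_line(polygon, width: int, height: int, class_id: int = 0) -> str:
    """Format a pixel polygon [(x, y), ...] as one YOLO segmentation label line.

    Output: '<class_id> x1 y1 x2 y2 ...' with x divided by width, y by height.
    """
    coords = []
    for x, y in polygon:
        coords.append(f"{x / width:.6f}")
        coords.append(f"{y / height:.6f}")
    return " ".join([str(class_id), *coords])
```

For a tiny 2×4 image, `rle_decode("1 3 6 2", shape=(2, 4))` gives `[[1, 1, 1, 0], [0, 1, 1, 0]]`. On a 704×520 image, `polygon_to_yolo_line([(0, 0), (352, 0), (352, 260)], 704, 520)` gives `"0 0.000000 0.000000 0.500000 0.000000 0.500000 0.500000"`.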


## 5. Execute Model Improvement Training Run

This is the core training step. This cell executes the `train.py` script with a specific configuration designed to improve upon the baseline model:

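-   **Model:** `yolo11s-seg.pt` (the 'small' version for a better performance/speed trade-off).
-   **Epochs:** `50`.
-   **Results Directory:** `/content/drive/MyDrive/Sartorius_Project/training_results` (on Google Drive, so results persist after the Colab session ends).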
-   **Run Name:** `yolov11s_seg_50_epochs`.
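This notebook drives training through `scripts/train.py`, whose internals are not shown here. As a reference only, the configuration above maps onto the Ultralytics Python API roughly as sketched below. This is not the script's actual code. `build_train_args` and `train` are hypothetical helpers, and `imgsz=640` is the value visible in the training log. Ultralytics writes each run to `<project>/<name>/`.

```python
def build_train_args(data_yaml: str, project_dir: str, epochs: int = 50,
                     run_name: str = "yolov11s_seg_50_epochs") -> dict:
    """Collect keyword arguments for Ultralytics' model.train(...) (hypothetical helper)."""
    return {
        "data": data_yaml,        # dataset.yaml written by preprocessing
        "epochs": epochs,
        "imgsz": 640,             # image size reported in the training log
        "project": project_dir,   # parent directory for all runs
        "name": run_name,         # subfolder for this run
    }


def train(model_name: str = "yolo11s-seg.pt", **train_args):
    """Load pretrained weights and start training (hypothetical wrapper)."""
    # Imported lazily so build_train_args can be used without ultralytics installed
    from ultralytics import YOLO

    model = YOLO(model_name)
    return model.train(**train_args)
```

For example, `train("yolo11s-seg.pt", **build_train_args("/content/yolo_dataset/dataset.yaml", PROJECT_DIR))` would start a 50-epoch run on the Colab GPU. Keeping the argument dictionary separate makes it easy to log or compare configurations between runs.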

In [None]:
# Define paths and parameters for training
PROCESSED_DATA_PATH = "/content/yolo_dataset"

# Save results directly to Google Drive so they persist after the Colab session ends
PROJECT_DIR = "/content/drive/MyDrive/Sartorius_Project/training_results"

# YOLO11 'small' segmentation weights
MODEL_NAME = 'yolo11s-seg.pt'
EPOCHS = 50

print(f"Results will be saved on: {PROJECT_DIR}")

# Execute the training script (no trailing backslash after the last argument)
!python scripts/train.py \
  --data_dir {PROCESSED_DATA_PATH} \
  --project_dir {PROJECT_DIR} \
  --model_name {MODEL_NAME} \
  --epochs {EPOCHS}

print(f"✅ Training command finished. Check the log above; results are saved in {PROJECT_DIR}")

Results will be saved on: /content/drive/MyDrive/Sartorius_Project/training_results/yolov11s_seg_50_epochs_direct_save
--- Starting YOLOv11 Model Training ---
Using configuration file: /content/yolo_dataset/dataset.yaml
Results will be saved to: /content/drive/MyDrive/Sartorius_Project/training_results
Ultralytics 8.3.168 🚀 Python-3.11.13 torch-2.6.0+cu124 CUDA:0 (Tesla T4, 15095MiB)
[34m[1mengine/trainer: [0magnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=/content/yolo_dataset/dataset.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=50, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=Fals