# Retraining Cellpose on Custom Data

<div class="custom-button-row">
    <a 
        class="custom-button custom-download-button" href="../../../notebooks/05_segmentation/deep_learning/cellpose_retraining_notebook.ipynb" download>
        <i class="fas fa-download"></i> Download this Notebook
    </a>
    <a
    class="custom-button custom-download-button" href="https://colab.research.google.com/github/HMS-IAC/bobiac/blob/gh-pages/colab_notebooks/05_segmentation/deep_learning/cellpose_retraining_notebook.ipynb" target="_blank">
        <img class="button-icon" src="../../../_static/logo/icon-google-colab.svg" alt="Open in Colab">
        Open in Colab
    </a>
</div>

## Overview

In this section, we’ll walk through how to **retrain Cellpose on your own data**. This is useful when the default models don’t perform well on your specific cell type, staining method, or imaging modality.

Retraining lets Cellpose learn directly from your own examples, which typically gives more accurate segmentation and more relevant masks for your experiments.

We’ll cover:
- Preparing your training data (images + label masks)
- Mounting your Google Drive to access files
- Setting training parameters
- Running the training process
- Evaluating the new model on test images

> 💡 You’ll need pairs of raw microscopy images and their corresponding label masks. If you haven’t labeled your images yet, we recommend using the [Cellpose GUI](https://cellpose.readthedocs.io/en/latest/gui.html#training-your-own-cellpose-model) to draw or edit masks manually before starting.

The dataset we’ll use here can be downloaded below. It includes both training and test images:
<a href="../../../_static/data/05_segmentation_cellpose_training.zip" download>
<i class="fas fa-download"></i> Cellpose Training Dataset</a>

> ⚠️ Note: If you're using an Apple Silicon Mac or don't have GPU support locally, please run this notebook on **Google Colab** to speed up training.


## Make sure you have GPU access

To enable the GPU in Colab:

1. Navigate to `Runtime -> Change Runtime Type`.
2. Select `Python 3` as `Runtime Type`.
3. Select one available GPU (e.g. `T4 GPU`) as `Hardware accelerator`.

<br>

<div align="left"> <img src="https://raw.githubusercontent.com/HMS-IAC/bobiac/main/_static/images/cellpose/colab_runtime.png" alt="Colab runtime type settings" width="400"></div>
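
After switching the runtime, a quick way to confirm that an NVIDIA GPU is attached is to call the `nvidia-smi` command-line tool. The sketch below is only an illustration: the helper name `nvidia_gpu_visible` is hypothetical, and it does not detect Apple MPS devices, so the Cellpose check later in this notebook remains the authoritative test.

```python
import shutil
import subprocess


def nvidia_gpu_visible() -> bool:
    """Return True if the `nvidia-smi` tool exists and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # Tool not installed or not on PATH: no NVIDIA driver is visible
        return False
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("NVIDIA GPU visible:", nvidia_gpu_visible())
```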


## Mount your Google Drive

To access the data for the course, you first need to mount your Google Drive.

Run the cell below to connect your Google Drive to Colab, then follow the instructions to authenticate your Google account.

You will need to allow access to your Google Drive so that the notebook can read and write files.

In [None]:
from google.colab import drive

drive.mount("/content/drive")


Then click the `folder icon` in the left sidebar and press the `refresh button`. Your Google Drive folder should now be listed there (e.g., `MyDrive`).

<div align="left"> <img src="https://raw.githubusercontent.com/HMS-IAC/bobiac/main/_static/images/cellpose/colab_folder.png" alt="Colab file browser" width="300"></div>

## Download the Data

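Run the cell below to download the data for this exercise. It creates a folder called `bobiac_data_cellpose` under `/content` in the Colab runtime and unzips the dataset there. Files under `/content` do not persist once the runtime is recycled, so copy the folder to your mounted Google Drive if you want to keep it.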

In [None]:
# Create directory
!mkdir -p /content/bobiac_data_cellpose
# Download the data
!wget https://raw.githubusercontent.com/HMS-IAC/bobiac/main/_static/data/05_segmentation_cellpose_training.zip -O /content/bobiac_data_cellpose/05_segmentation_cellpose_training.zip
# Unzip the data, remove zip file and macOS metadata files (if any)
!cd /content/bobiac_data_cellpose && unzip 05_segmentation_cellpose_training.zip && rm -f 05_segmentation_cellpose_training.zip && rm -rf __MACOSX
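
To confirm that the download and unzip step worked, it can help to count the files that ended up in the folder. The following is a minimal sketch using only the standard library; the helper name `count_files_by_suffix` is hypothetical, and the path assumes the download cell above was run unchanged.

```python
from collections import Counter
from pathlib import Path


def count_files_by_suffix(root):
    """Count files under `root` (recursively), grouped by file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix or "(no extension)"] += 1
    return dict(counts)


# Folder created by the download cell above; adjust if you saved elsewhere.
data_root = "/content/bobiac_data_cellpose"
if Path(data_root).exists():
    for suffix, n in sorted(count_files_by_suffix(data_root).items()):
        print(f"{suffix}: {n} file(s)")
else:
    print(f"{data_root} not found; run the download cell first.")
```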

## Install Cellpose


In [None]:
# !pip install cellpose


## Import Libraries

In [2]:
import tifffile
from cellpose import core, io
import matplotlib.pyplot as plt



Welcome to CellposeSAM, cellpose v
cellpose version: 	4.0.6 
platform:       	darwin 
python version: 	3.13.0 
torch version:  	2.7.1! The neural network component of
CPSAM is much larger than in previous versions and CPU excution is slow. 
We encourage users to use GPU/MPS if available. 




## Setup

In [3]:
io.logger_setup()  # run this to get printing of progress

print("GPU available:", core.use_gpu())

2025-07-11 19:55:40,389 [INFO] WRITING LOG OUTPUT TO /Users/ranit/.cellpose/run.log
2025-07-11 19:55:40,390 [INFO] 
cellpose version: 	4.0.6 
platform:       	darwin 
python version: 	3.13.0 
torch version:  	2.7.1
2025-07-11 19:55:40,435 [INFO] ** TORCH MPS version installed and working. **
GPU available: True


### Init the Model

Before we can train a new model, we need to initialize Cellpose with a pretrained model to start from.

Here, we’ll:
- Check that a GPU is available
- Load the default pretrained model as the base model (in Cellpose 4, the version used here, this is `cpsam`, as the log output shows)

Unlike earlier Cellpose versions, this cell does not set a **model type** (e.g., "cyto" or "nuclei") or **channels**. The location of the saved **model weights** is determined later, in the training step.

> 💡 Even when training a new model, Cellpose builds on a pre-trained backbone (unless you explicitly start from scratch). This helps it learn faster and perform better, especially on small datasets.


In [4]:
from cellpose import core, io, models, plot
from natsort import natsorted

# Check whether this notebook instance has GPU access
gpu = core.use_gpu()
if not gpu:
    raise RuntimeError("No GPU access, change your runtime")


# Initialize the Cellpose model
model = models.CellposeModel(gpu=gpu)

2025-07-11 09:27:02,987 [INFO] ** TORCH MPS version installed and working. **
2025-07-11 09:27:02,987 [INFO] >>>> using CPU
2025-07-11 09:27:02,987 [INFO] >>>> using CPU
2025-07-11 09:27:03,811 [INFO] >>>> loading model /Users/ranit/.cellpose/models/cpsam


## Data Handling

For training, Cellpose expects:
- A folder of raw images (e.g., TIFF or PNG)
- A matching folder of masks, where each mask corresponds to an image and contains labeled regions

You’ll also need to **split your data** into a training set and a test set. This allows the model to learn from one portion of the data, and then be evaluated on a separate portion it hasn't seen before.
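
If your own data is not yet divided, a reproducible random split is one simple option. The sketch below is illustrative only: the helper `split_train_test`, the 25% test fraction, and the example file names are assumptions, not part of Cellpose or of this dataset (which already ships with `train/` and `test/` folders).

```python
import random


def split_train_test(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(items)
    # A fixed seed makes the split reproducible between runs
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names, used only to demonstrate the split
names = [f"img{i:03d}.tif" for i in range(8)]
train_names, test_names = split_train_test(names)
print(len(train_names), "training images;", len(test_names), "test images")
```

Whatever split you use, move each mask together with its image so that the pairs stay intact.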

> ✅ Each mask must share its image’s **base filename**, followed by a mask suffix (e.g., `img001.png` and `img001_masks.png`), so Cellpose can pair them correctly. In this dataset the suffix is `_seg.npy`.

During training, Cellpose will:
- Load batches of training images
- Compare its predictions to the ground-truth masks
- Adjust itself (via backpropagation) to reduce errors over time
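
The three steps above are the same loop used to fit any model by gradient descent. As a toy illustration only (a single weight `w` fitted to `y = 2x`, nothing like Cellpose's actual network or loss), the loop can be sketched as follows; all names and values here are made up for the example.

```python
def train_toy_model(xs, ys, n_epochs=20, batch_size=2, learning_rate=0.02):
    """Fit y = w * x by mini-batch gradient descent.

    Returns the learned weight and the mean loss of each epoch.
    """
    w = 0.0
    epoch_losses = []
    for _ in range(n_epochs):
        batch_losses = []
        for start in range(0, len(xs), batch_size):
            # 1. Load a batch of training examples
            bx = xs[start:start + batch_size]
            by = ys[start:start + batch_size]
            # 2. Compare predictions with the ground truth (mean squared error)
            errors = [w * x - y for x, y in zip(bx, by)]
            batch_losses.append(sum(e * e for e in errors) / len(bx))
            # 3. Move the weight against the gradient of the loss
            grad = sum(2 * e * x for e, x in zip(errors, bx)) / len(bx)
            w -= learning_rate * grad
        epoch_losses.append(sum(batch_losses) / len(batch_losses))
    return w, epoch_losses


w, losses = train_toy_model([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(f"learned w = {w:.4f}")
print(f"first-epoch loss = {losses[0]:.3f}, last-epoch loss = {losses[-1]:.6f}")
```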

Keep your training and test folders organized and double-check for any mismatches.
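
One way to catch mismatches before training is to list every image that lacks a matching mask file. This is a minimal sketch under stated assumptions: the helper `find_unpaired_images` is hypothetical, the default suffix `_seg.npy` matches the one used in this notebook, and the list of image extensions should be adapted to your data.

```python
from pathlib import Path


def find_unpaired_images(folder, mask_suffix="_seg.npy",
                         image_exts=(".tif", ".tiff", ".png")):
    """Return names of images in `folder` with no `<stem><mask_suffix>` file."""
    folder = Path(folder)
    missing = []
    for image in sorted(folder.iterdir()):
        if not image.is_file() or image.name.endswith(mask_suffix):
            continue  # skip sub-folders and the mask files themselves
        if image.suffix.lower() not in image_exts:
            continue  # skip anything that is not an image
        if not (folder / f"{image.stem}{mask_suffix}").exists():
            missing.append(image.name)
    return missing
```

For example, `find_unpaired_images(train_dir)` should return an empty list when every training image has its label file.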

In [5]:
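import os

# Path to the unzipped dataset (on Colab, point this at the folder unzipped under /content/bobiac_data_cellpose)
ROOT_FOLDER_PATH = "../../../_static/data/05_segmentation_cellpose_training/"

train_dir = os.path.join(ROOT_FOLDER_PATH, "train/")
test_dir = os.path.join(ROOT_FOLDER_PATH, "test/")

# Suffix that identifies the label file belonging to each image
masks_ext = "_seg.npy"

# Load images and labels; the two ignored return values are the image file names
train_data, train_labels, _, test_data, test_labels, _ = io.load_train_test_data(train_dir, test_dir, mask_filter=masks_ext)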

2025-07-11 09:27:06,787 [INFO] not all flows are present, running flow generation for all images
2025-07-11 09:27:06,810 [INFO] 5 / 5 images in ../../../_static/data/05_segmentation_cellpose_training/train/ folder have labels
2025-07-11 09:27:06,812 [INFO] not all flows are present, running flow generation for all images
2025-07-11 09:27:06,829 [INFO] 3 / 3 images in ../../../_static/data/05_segmentation_cellpose_training/test/ folder have labels


In [6]:
import numpy as np

# Convert images to float32
train_data = [img.astype(np.float32) for img in train_data]
# Convert labels (masks) to int32
train_labels = [lbl.astype(np.int32) for lbl in train_labels]

# Convert test images to float32 and labels to int32
test_data = [img.astype(np.float32) for img in test_data]
test_labels = [lbl.astype(np.int32) for lbl in test_labels]


In [7]:
from cellpose import metrics

# Baseline: run the pretrained model (before retraining) on the test images
masks = model.eval(test_data, batch_size=32)[0]

# Check performance against the ground-truth labels
ap = metrics.average_precision(test_labels, masks)[0]
print("")
print(f">>> average precision at iou threshold 0.5 = {ap[:, 0].mean():.3f}")

2025-07-11 09:27:09,482 [INFO] 0%|          | 0/3 [00:00<?, ?it/s]
2025-07-11 09:27:49,597 [INFO] 33%|###3      | 1/3 [00:40<01:20, 40.11s/it]
2025-07-11 09:28:30,621 [INFO] 67%|######6   | 2/3 [01:21<00:40, 40.65s/it]
2025-07-11 09:29:13,093 [INFO] 100%|##########| 3/3 [02:03<00:00, 41.48s/it]
2025-07-11 09:29:13,095 [INFO] 100%|##########| 3/3 [02:03<00:00, 41.20s/it]

>>> average precision at iou threshold 0.5 = 0.731
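
The score above is built from match counts: a predicted mask is a true positive (TP) when its intersection over union (IoU) with a ground-truth mask passes the threshold, unmatched predictions are false positives (FP), unmatched ground-truth masks are false negatives (FN), and the average precision is TP / (TP + FP + FN). The sketch below illustrates that counting on tiny label images stored as nested lists. It is a simplified stand-in, not Cellpose's implementation: the function name `average_precision_at` is hypothetical, and it uses a strict `IoU > threshold` test, which for thresholds of 0.5 or higher guarantees each mask matches at most one partner.

```python
from collections import Counter


def average_precision_at(true_mask, pred_mask, threshold=0.5):
    """Return TP / (TP + FP + FN) for two label images (0 = background)."""
    true_flat = [v for row in true_mask for v in row]
    pred_flat = [v for row in pred_mask for v in row]

    # Pixel area of every object, and pixel overlap of every (true, pred) pair
    true_area = Counter(v for v in true_flat if v != 0)
    pred_area = Counter(v for v in pred_flat if v != 0)
    overlap = Counter(
        (t, p) for t, p in zip(true_flat, pred_flat) if t != 0 and p != 0
    )

    tp = 0
    for (t, p), inter in overlap.items():
        union = true_area[t] + pred_area[p] - inter
        if inter / union > threshold:
            tp += 1

    fp = len(pred_area) - tp  # predictions without a match
    fn = len(true_area) - tp  # ground-truth objects that were missed
    total = tp + fp + fn
    return tp / total if total else 0.0


true_mask = [[1, 1, 0, 0],
             [1, 1, 0, 2],
             [0, 0, 0, 2]]
pred_mask = [[1, 1, 0, 0],
             [1, 0, 0, 0],
             [0, 0, 3, 3]]
print(f"AP at IoU 0.5: {average_precision_at(true_mask, pred_mask):.3f}")
```

In this example, object 1 is matched (IoU 0.75), while true object 2 and predicted object 3 overlap too little (IoU 1/3), giving one TP, one FP, and one FN.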


## Train New Model

Now we’re ready to train! In this step, we’ll tell Cellpose to:
- Use the training images and masks
- Save the trained model under the name you choose (`model_name`)
- Run for a defined number of **epochs** (passes over the training images)

You can also set other options like:
- Learning rate
- Weight decay
- Batch size
- Whether to use the GPU (chosen earlier, when the model was initialized)

> 💡 Training time will vary depending on your dataset size and hardware. On Google Colab with a GPU, small datasets may train in just a few minutes.

After training, the model weights will be saved and ready to use for predictions. We’ll evaluate performance on the test data in the next step.
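
To get a feel for how the settings interact, it helps to estimate how many weight updates a run performs. The arithmetic below is a rough sketch that assumes each epoch draws `nimg_per_epoch` images and performs one update per batch; the helper `optimizer_steps` is hypothetical, and the numbers mirror the values used in the training cell (10 epochs, batch size 1, 5 training images).

```python
import math


def optimizer_steps(n_epochs, nimg_per_epoch, batch_size):
    """Return (steps per epoch, total steps), assuming one update per batch."""
    steps_per_epoch = math.ceil(nimg_per_epoch / batch_size)
    return steps_per_epoch, n_epochs * steps_per_epoch


n_train = 5  # number of training images in this dataset
per_epoch, total = optimizer_steps(
    n_epochs=10, nimg_per_epoch=max(2, n_train), batch_size=1
)
print(f"{per_epoch} steps per epoch, {total} steps in total")
```

Raising `batch_size` reduces the number of updates per epoch, while raising `n_epochs` increases the total proportionally.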


In [None]:
from cellpose import train

model_name = "new_model"

# Training params
n_epochs = 10
learning_rate = 1e-5
weight_decay = 0.1
batch_size = 1

# (not passing test data into function to speed up training)

new_model_path, train_losses, test_losses = train.train_seg(
    model.net,
    train_data=train_data,
    train_labels=train_labels,
    batch_size=batch_size,
    n_epochs=n_epochs,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    nimg_per_epoch=max(2, len(train_data)),  # images used per epoch; can be changed
    model_name=model_name,
)

2025-07-11 09:29:16,536 [INFO] computing flows for labels


100%|██████████| 5/5 [00:00<00:00,  5.21it/s]

2025-07-11 09:29:17,499 [INFO] >>> computing diameters



100%|██████████| 5/5 [00:00<00:00, 1860.83it/s]

2025-07-11 09:29:17,503 [INFO] >>> normalizing {'lowhigh': None, 'percentile': None, 'normalize': True, 'norm3D': True, 'sharpen_radius': 0, 'smooth_radius': 0, 'tile_norm_blocksize': 0, 'tile_norm_smooth3D': 1, 'invert': False}
2025-07-11 09:29:17,511 [INFO] >>> n_epochs=10, n_train=5, n_test=None
2025-07-11 09:29:17,511 [INFO] >>> AdamW, learning_rate=0.00001, weight_decay=0.10000
2025-07-11 09:29:17,513 [INFO] >>> saving model to /Users/ranit/Research/github/bobiac/content/05_segmentation/deep_learning/models/new_model





2025-07-11 09:56:13,685 [INFO] 0, train_loss=1.1414, test_loss=0.0000, LR=0.000000, time 1616.17s


## Evaluate on test data

In [None]:
from cellpose import metrics

# Load the retrained weights saved by train_seg
model = models.CellposeModel(gpu=True, pretrained_model=new_model_path)

# Run the retrained model on the test images
masks = model.eval(test_data, batch_size=32)[0]

# Check performance against the ground-truth labels and compare with the pretrained baseline computed earlier
ap = metrics.average_precision(test_labels, masks)[0]
print("")
print(f">>> average precision at iou threshold 0.5 = {ap[:, 0].mean():.3f}")