<a href="https://colab.research.google.com/github/Vasquez-505/Data-Science-Projects-/blob/main/docs/notebooks/Training_and_inference_using_Google_Drive.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Training and inference on your own data using Google Drive

In [8]:
!pip uninstall -qqq -y opencv-python opencv-contrib-python
!pip install -qqq "sleap[pypi]>=1.5.1"

# --- Sanity checks ---
import sleap, torch
print(" SLEAP version:", sleap.__version__)
print(" PyTorch version:", torch.__version__)
print(" CUDA available:", torch.cuda.is_available())


 SLEAP version: 1.5.1
 PyTorch version: 2.8.0+cu126
 CUDA available: True


In [9]:
!pip uninstall -qqq -y opencv-python opencv-contrib-python
!pip install -qqq "sleap[pypi]==1.5.1" sleap-io==0.5.5



In [10]:
!pip install -qqq "sleap-nn[torch-cuda-128]"


In [4]:
# Sanity Check
import sleap, sleap_io
print("SLEAP:", sleap.__version__)
print("SLEAP-IO:", sleap_io.__version__)

SLEAP: 1.5.1
SLEAP-IO: 0.5.5


### Create and export the training job package
A self-contained **training job package** contains a `.slp` file with the labeled data and images that will be used for training, as well as the training configuration file(s) (`.yaml` files in the package used in this notebook).

A training job package can be exported in the SLEAP GUI from the "Run Training..." dialog under the "Predict" menu.

### Upload training job package to Google Drive
To be consistent with the examples in this notebook, name the SLEAP project `colab` and create a directory called `sleap` in the root of your Google Drive. Then upload the exported training job package `colab.slp.training_job.zip` into the `sleap` directory.

If you place your training package somewhere else, or name it differently, adjust the paths, filenames, and parameters below accordingly.

In [11]:
from google.colab import drive
drive.mount('/content/drive/')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


In [6]:
import os
os.chdir("/content/drive/My Drive/sleap")
!unzip colab.slp.training_job.zip
!ls

Archive:  colab.slp.training_job.zip
  inflating: colab.pkg.slp           
  inflating: inference-script.sh     
  inflating: jobs.yaml               
  inflating: single_instance.yaml    
  inflating: train-script.sh         
colab.pkg.slp		    jobs.yaml		  train-script.sh
colab.slp.training_job.zip  models
inference-script.sh	    single_instance.yaml
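
The `!unzip` step can also be done from Python, which avoids the interactive overwrite prompt when the cell is re-run. The following is a minimal sketch that assumes the archive name and working directory used in this notebook; `extract_package` is a hypothetical helper, not part of SLEAP.

```python
import zipfile
from pathlib import Path


def extract_package(zip_path, dest_dir="."):
    """Extract a training job package and return its sorted member names."""
    zip_path = Path(zip_path)
    if not zip_path.is_file():
        raise FileNotFoundError(f"Training job package not found: {zip_path}")
    with zipfile.ZipFile(zip_path) as zf:
        # Existing files with the same names are overwritten without prompting.
        zf.extractall(dest_dir)
        return sorted(zf.namelist())


# Example (run after mounting Drive and changing into the `sleap` directory):
# print(extract_package("colab.slp.training_job.zip"))
```

Returning the member names makes it easy to confirm that the labels file and the training profile are both present before going further.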


In [None]:
# === MASTER CONFIG UPDATER (no inference) ===
# Edits: single_instance.yaml, jobs.yaml, train-script.sh
# Matches your schema exactly.

import os, yaml, datetime, pathlib

# ------------------------------
# PICK YOUR RUN / MODEL NAME HERE
# ------------------------------
RUN_NAME = "drosophila_unet_" + datetime.datetime.now().strftime("%y%m%d_%H%M%S")
LABELS_PATH = "colab.pkg.slp"          # or "colab copy.pkg.slp"
CKPT_DIR = f"models/{RUN_NAME}"        # where checkpoints go
SINGLE_INSTANCE_YAML = "single_instance.yaml"
JOBS_YAML = "jobs.yaml"
TRAIN_SH = "train-script.sh"

# ------------------------------
# FULL PARAMETER BLOCKS (edit anything)
# ------------------------------

DATA_CONFIG = {
    "train_labels_path": [LABELS_PATH],
    "val_labels_path": None,
    "validation_fraction": 0.1,
    "test_file_path": None,
    "provider": "LabelsReader",
    "user_instances_only": True,
    "data_pipeline_fw": "torch_dataset",
    "cache_img_path": None,
    "use_existing_imgs": False,
    "delete_cache_imgs_after_training": True,
    "preprocessing": {
        "ensure_rgb": False,            # set True if your images are RGB
        "ensure_grayscale": False,      # set True if you force grayscale
        "max_height": None,             # e.g., 182
        "max_width": None,              # e.g., 682
        "scale": 1.0,                   # <- critical input scaling
        "crop_size": None,              # or [H, W]
        "min_crop_size": 100,
    },
    "use_augmentations_train": False,
    "augmentation_config": {
        "intensity": {
            "uniform_noise_min": 0.0,
            "uniform_noise_max": 1.0,
            "uniform_noise_p": 0.0,
            "gaussian_noise_mean": 5.0,
            "gaussian_noise_std": 0.0,
            "gaussian_noise_p": 0.0,
            "contrast_min": 0.5,
            "contrast_max": 1.75,
            "contrast_p": 0.0,
            "brightness_min": 0.0,
            "brightness_max": 2.0,
            "brightness_p": 0.0,
        },
        "geometric": {
            "rotation_min": -15.0,
            "rotation_max": 15.0,
            "scale_min": 0.9,
            "scale_max": 1.1,
            "translate_width": 0.0,
            "translate_height": 0.0,
            "affine_p": 1.0,
            "erase_scale_min": 0.0001,
            "erase_scale_max": 0.01,
            "erase_ratio_min": 1.0,
            "erase_ratio_max": 1.0,
            "erase_p": 0.0,
            "mixup_lambda_min": 0.01,
            "mixup_lambda_max": 0.05,
            "mixup_p": 0.0,
        },
    },
    "skeletons": None,
}

MODEL_CONFIG = {
    "init_weights": "default",
    "pretrained_backbone_weights": None,
    "pretrained_head_weights": None,
    "backbone_config": {
        "unet": {
            "in_channels": 1,      # 1 for grayscale, 3 for RGB
            "kernel_size": 3,
            "filters": 32,
            "filters_rate": 1.5,
            "max_stride": 32,
            "stem_stride": None,
            "middle_block": True,
            "up_interpolate": True,
            "stacks": 1,
            "convs_per_block": 2,
            "output_stride": 4,
        },
        "convnext": None,
        "swint": None,
    },
    "head_configs": {
        "single_instance": {
            "confmaps": {
                "part_names": None,   # auto from labels if None
                "sigma": 2.5,
                "output_stride": 4,
            }
        },
        "centroid": None,
        "centered_instance": None,
        "bottomup": None,
        "multi_class_bottomup": None,
        "multi_class_topdown": None,
    },
    "total_params": None,
}

TRAINER_CONFIG = {
    "train_data_loader": {"batch_size": 6, "shuffle": False, "num_workers": 0},
    "val_data_loader": {"batch_size": 6, "shuffle": False, "num_workers": 0},
    "model_ckpt": {"save_top_k": 1, "save_last": False},
    "trainer_devices": None,              # 'auto' or explicit list
    "trainer_device_indices": None,
    "trainer_accelerator": "auto",
    "profiler": None,
    "trainer_strategy": "auto",
    "enable_progress_bar": True,
    "min_train_steps_per_epoch": 200,
    "train_steps_per_epoch": None,
    "visualize_preds_during_training": True,
    "keep_viz": False,
    "max_epochs": 200,
    "seed": None,
    "use_wandb": False,
    "save_ckpt": True,
    "ckpt_dir": CKPT_DIR,
    "run_name": RUN_NAME,
    "resume_ckpt_path": None,
    "wandb": {
        "entity": "",
        "project": "",
        "name": "",
        "save_viz_imgs_wandb": False,
        "api_key": "",
        "wandb_mode": None,
        "prv_runid": "",
        "group": "",
        "current_run_id": None,
    },
    "optimizer_name": "Adam",
    "optimizer": {"lr": 1e-4, "amsgrad": False},
    "lr_scheduler": None,
    "early_stopping": {
        "min_delta": 1e-8,
        "patience": 10,
        "stop_training_on_plateau": True,
    },
    "online_hard_keypoint_mining": {
        "online_mining": False,
        "hard_to_easy_ratio": 2.0,
        "min_hard_keypoints": 2,
        "max_hard_keypoints": None,
        "loss_scale": 5.0,
    },
    "zmq": {
        "controller_port": 9000,
        "controller_polling_timeout": 10,
        "publish_port": 9001,
    },
}

# ------------------------------
# Helpers
# ------------------------------
def _ensure_keys(d, template):
    """Recursively ensure keys from template exist in dict d."""
    if d is None:
        return template
    for k, v in template.items():
        if k not in d or d[k] is None:
            d[k] = v
        else:
            if isinstance(v, dict):
                d[k] = _ensure_keys(d.get(k, {}), v)
    return d

def _safe_update(d, updates):
    """Recursively update d with updates, preserving other keys."""
    for k, v in updates.items():
        if isinstance(v, dict):
            d[k] = _safe_update(d.get(k, {}) if isinstance(d.get(k), dict) else {}, v)
        else:
            d[k] = v
    return d

def _load_yaml(path):
    with open(path, "r") as f:
        return yaml.safe_load(f)

def _save_yaml(path, data):
    with open(path, "w") as f:
        yaml.safe_dump(data, f, sort_keys=False)

# ------------------------------
# 1) Update single_instance.yaml
# ------------------------------
si = _load_yaml(SINGLE_INSTANCE_YAML) or {}  # an empty YAML file loads as None

# Ensure structure exists exactly as your schema
si = _ensure_keys(si, {
    "data_config": {},
    "model_config": {},
    "trainer_config": {},
    "name": "",
    "description": "",
    "sleap_nn_version": si.get("sleap_nn_version", "0.0.2"),
    "filename": "",
})

# Merge updates
si["data_config"]      = _safe_update(si["data_config"], DATA_CONFIG)
si["model_config"]     = _safe_update(si["model_config"], MODEL_CONFIG)
si["trainer_config"]   = _safe_update(si["trainer_config"], TRAINER_CONFIG)
si["trainer_config"]["run_name"] = RUN_NAME
si["trainer_config"]["ckpt_dir"] = CKPT_DIR

# Save
# Make sure the checkpoint folder exists before writing the config
pathlib.Path(CKPT_DIR).mkdir(parents=True, exist_ok=True)
_save_yaml(SINGLE_INSTANCE_YAML, si)
print(f"✅ Updated {SINGLE_INSTANCE_YAML} with run_name={RUN_NAME}")

# ------------------------------
# 2) Update jobs.yaml (best effort; keeps structure)
# ------------------------------
if os.path.exists(JOBS_YAML):
    jobs = _load_yaml(JOBS_YAML)

    # Try common fields used by Colab templates; only update if present
    # We keep your structure intact.
    def set_if_exists(root, keys, value):
        cur = root
        for k in keys[:-1]:
            if not isinstance(cur, dict) or k not in cur:
                return
            cur = cur[k]
        if isinstance(cur, dict) and keys[-1] in cur:
            cur[keys[-1]] = value

    # Common spots we might find these
    set_if_exists(jobs, ["training_job", "trainer_config", "run_name"], RUN_NAME)
    set_if_exists(jobs, ["training_job", "trainer_config", "ckpt_dir"], CKPT_DIR)
    set_if_exists(jobs, ["training_job", "data_config", "train_labels_path"], [LABELS_PATH])
    set_if_exists(jobs, ["training_job", "config_path"], SINGLE_INSTANCE_YAML)

    _save_yaml(JOBS_YAML, jobs)
    print(f"✅ Updated {JOBS_YAML} (run_name, ckpt_dir, labels if present)")
else:
    print(f"ℹ️ {JOBS_YAML} not found — skipped (that’s fine).")

# ------------------------------
# 3) Update train-script.sh
# ------------------------------
# Paths are quoted so that names containing spaces (e.g. "colab copy.pkg.slp") still work.
train_script = f"""#!/bin/bash
# Auto-generated: {RUN_NAME}
echo "Starting SLEAP training: {RUN_NAME}"
sleap-train "{SINGLE_INSTANCE_YAML}" "{LABELS_PATH}" --first-gpu
"""
with open(TRAIN_SH, "w") as f:
    f.write(train_script)
os.chmod(TRAIN_SH, 0o755)
print(f"✅ Updated {TRAIN_SH}")

print(f"\n🎯 Ready. Model/run name: {RUN_NAME}\nCheckpoints: {CKPT_DIR}\nLabels: {LABELS_PATH}")
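
The merge step in the cell above relies on a recursive update: only the keys you specify are overwritten, and everything else already in the loaded YAML is kept. Here is a small self-contained sketch of that behavior on toy data. The `deep_update` name and the toy values are illustrative (not taken from a real profile), and unlike `_safe_update` this variant returns a new dict instead of modifying its input.

```python
import copy


def deep_update(base, updates):
    """Recursively merge `updates` into a copy of `base`."""
    merged = copy.deepcopy(base)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them key by key.
            merged[key] = deep_update(merged[key], value)
        else:
            # Otherwise the update simply replaces the old value.
            merged[key] = copy.deepcopy(value)
    return merged


existing = {"trainer_config": {"max_epochs": 100, "seed": 0}, "name": "demo"}
overrides = {"trainer_config": {"max_epochs": 200}}

result = deep_update(existing, overrides)
print(result)
# {'trainer_config': {'max_epochs': 200, 'seed': 0}, 'name': 'demo'}
```

Note that `seed` and `name` survive the merge even though they are absent from `overrides`.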


## Train a model

Let's train a model with the training profile (the `.yaml` config file in the package used here) and the project data (`.slp` file) you have exported from SLEAP.


### Note on training profiles
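Depending on the pipeline you chose in the training dialog, the config filename(s) will be as listed below. The names are shown with a `.json` extension; the package unzipped above contains a `.yaml` profile (`single_instance.yaml`), so use whichever extension your exported package has: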

- for a **bottom-up** pipeline approach: `multi_instance.json` (this is the pipeline we assume here),

- for a **top-down** pipeline, you'll have a different profile for each of the models: `centroid.json` and `centered_instance.json`,

- for a **single animal** pipeline: `single_instance.json`.
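
To check which pipeline an unzipped package corresponds to, you can look for the profile names listed above. This sketch assumes the profiles sit in the working directory and may carry either a `.json` or a `.yaml` extension; `detect_pipeline` and `PIPELINE_PROFILES` are hypothetical names, not part of SLEAP.

```python
from pathlib import Path

# Profile base names per pipeline, as listed above.
PIPELINE_PROFILES = {
    "bottom-up": ["multi_instance"],
    "top-down": ["centroid", "centered_instance"],
    "single animal": ["single_instance"],
}


def detect_pipeline(directory=".", extensions=(".json", ".yaml")):
    """Return (pipeline, config_paths) for the first pipeline whose profiles all exist."""
    directory = Path(directory)
    for pipeline, stems in PIPELINE_PROFILES.items():
        found = []
        for stem in stems:
            matches = [
                directory / f"{stem}{ext}"
                for ext in extensions
                if (directory / f"{stem}{ext}").is_file()
            ]
            if not matches:
                break  # this pipeline is missing a profile; try the next one
            found.append(matches[0])
        else:
            return pipeline, found
    return None, []


# Example:
# pipeline, configs = detect_pipeline(".")
# print(pipeline, [str(p) for p in configs])
```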


### Note on training process
When you start training, you'll first see the training parameters and then the training and validation loss for each training epoch.

As soon as you're satisfied with the validation loss you see for an epoch during training, you're welcome to stop training by clicking the stop button. The version of the model with the lowest validation loss is saved during training, and that's what will be used for inference.

If you don't stop training, it will run for 200 epochs (`max_epochs`) or until the validation loss fails to improve for some number of epochs (controlled by the `early_stopping` fields, such as `patience`, in the training profile).
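
The stopping rule can be illustrated with a generic patience-based check. This is a sketch of the general idea, borrowing the `patience` and `min_delta` names from the `early_stopping` block of the profile; it is not the code the trainer actually runs, and the loss values are made up.

```python
def stopping_epoch(val_losses, patience=10, min_delta=1e-8, max_epochs=200):
    """Return the 1-based epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if best - loss > min_delta:
            # Counts as an improvement: remember it and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # Never ran out of patience: stop at the last available epoch.
    return min(len(val_losses), max_epochs)


losses = [0.9, 0.5, 0.4, 0.41, 0.42, 0.40, 0.43]  # illustrative values only
print(stopping_epoch(losses, patience=3))  # 6
```

In this toy run the best loss is reached at epoch 3, and the three following epochs fail to beat it by more than `min_delta`, so training would stop at epoch 6.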

In [7]:
!sleap-train single_instance.yaml colab.pkg.slp





INFO:numexpr.utils:NumExpr defaulting to 2 threads.
INFO:sleap.legacy_cli_adaptors:Started training at: 2025-10-17 16:30:39.964863
2025-10-17 16:30:40 | INFO | sleap_nn.training.model_trainer:_setup_train_val_labels:216 | Creating train-val split...
2025-10-17 16:30:40 | INFO | sleap_nn.training.model_trainer:_setup_train_val_labels:261 | # Train Labeled frames: 407
2025-10-17 16:30:40 | INFO | sleap_nn.training.model_trainer:_setup_train_val_labels:262 | # Val Labeled frames: 45
2025-10-17 16:30:40 | INFO | sleap_nn.training.model_trainer:setup_config:512 | Setting up config...
2025-10-17 16:30:40 | INFO | sleap_nn.training.model_trainer:_verify_model_input_channels:417 | Updating backbone in_channels to 3 based on the input image channels.
2025-10-17 16:30:41 | INFO | sleap_nn.training.model_trainer:train:849 | Setting up for training...
2025-10-17 16:30:41 | INFO | sleap_nn.training.model_trainer:_setup_model_ckpt_dir:575 | Setting up model ckpt dir: `models/NoneNonesingle_instance_

If instead of bottom-up you've chosen the top-down pipeline (with two training configs), you would need to invoke two separate training jobs in sequence:

- `!sleap-train centroid.json colab.pkg.slp`
- `!sleap-train centered_instance.json colab.pkg.slp`
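
Running the two jobs in sequence can also be scripted so that the second job starts only if the first one succeeds. The sketch below uses only the standard library; `train_in_sequence` is a hypothetical helper, and its `run` parameter is an illustrative hook (defaulting to `subprocess.run`) that makes the function easy to dry-run.

```python
import subprocess


def train_in_sequence(configs, labels="colab.pkg.slp", run=subprocess.run):
    """Run `sleap-train <config> <labels>` for each config, stopping on the first failure."""
    completed = []
    for config in configs:
        cmd = ["sleap-train", config, labels]
        result = run(cmd)
        if result.returncode != 0:
            raise RuntimeError(
                f"Training failed for {config} (exit code {result.returncode})"
            )
        completed.append(cmd)
    return completed


# Example:
# train_in_sequence(["centroid.json", "centered_instance.json"])
```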


## Run inference to predict instances

Once training finishes, you'll see a new directory (or two new directories for the top-down training pipeline) containing all the model files SLEAP needs to use for inference.

Here we'll use the created model files to run inference in two modes:

- predicting instances in suggested frames from the exported .slp file

- predicting and tracking instances in uploaded video

You can also download the trained models for running inference from the SLEAP GUI on your computer (or anywhere else).
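
To locate the directory that training just wrote, one option is to pick the most recently modified subdirectory. This sketch assumes the models were written under a `models` folder in the working directory, as the training log in this notebook suggests; `latest_model_dir` is a hypothetical helper.

```python
from pathlib import Path


def latest_model_dir(root="models"):
    """Return the most recently modified subdirectory of `root`, or None."""
    root = Path(root)
    if not root.is_dir():
        return None
    subdirs = [p for p in root.iterdir() if p.is_dir()]
    if not subdirs:
        return None
    return max(subdirs, key=lambda p: p.stat().st_mtime)


# Example:
# print(latest_model_dir("models"))
```

The returned path can then be passed to `-m` in the inference commands that follow.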

### Predicting instances in suggested frames
This mode of predicting instances is useful for accelerating the manual labeling work; it allows you to get early predictions on suggested frames and merge them back into the project for faster labeling.

Here we assume you've trained a bottom-up model and that the model files were written to a directory named `colab_demo.bottomup`; later in this notebook we'll also show how to run inference with the pair of top-down models instead.

In [None]:
!sleap-track \
    -m colab_demo.bottomup \
    --only-suggested-frames \
    -o colab.predicted_suggestions.slp \
    colab.pkg.slp

Now, you can download the generated `colab.predicted_suggestions.slp` file and merge it into your labeling project (**File -> Merge into Project...** from the GUI) to get new predictions for your suggested frames.

### Predicting and tracking instances in uploaded video
Let's first upload the video we want to run inference on and name it `colab_demo.mp4`. (If your video is not named `colab_demo.mp4`, adjust the names below accordingly.)

For this demo we'll only get predictions for the first 200 frames; adjust the `--frames` parameter below to change the range, or remove it to run on the whole video.

In [None]:
!sleap-track colab_demo.mp4 \
    --frames 0-200 \
    --tracking.tracker simple \
    -m colab_demo.bottomup

When inference is finished, it will save the predictions in a file which can be opened in the GUI as a SLEAP project file. The file will be in the same directory as the video and the filename will be `{video filename}.predictions.slp`.
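
The default output name follows directly from the rule above, so it can be computed before opening the file. A minimal sketch; `predictions_path` is a hypothetical helper.

```python
from pathlib import Path


def predictions_path(video_path):
    """Return `<video filename>.predictions.slp` in the same directory as the video."""
    video_path = Path(video_path)
    return video_path.with_name(video_path.name + ".predictions.slp")


print(predictions_path("colab_demo.mp4"))  # colab_demo.mp4.predictions.slp
```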

Let's inspect the predictions file:

In [None]:
!sleap-inspect colab_demo.mp4.predictions.slp

You can copy this file from your Google Drive to a local drive and open it in the SLEAP GUI app (or open it directly if you have your Google Drive mounted on your local machine). If the video is in the same directory as the predictions file, SLEAP will automatically find it; otherwise, you'll be prompted to locate the video (since the path to the video on your local machine will be different than the path to the video on Colab).

### Inference with top-down models

If you trained the pair of models needed for top-down inference, you can call `sleap-track` with `-m path/to/model` for each model, like so:

In [None]:
!sleap-track colab_demo.mp4 \
    --frames 0-200 \
    --tracking.tracker simple \
    -m colab_demo.centered_instance \
    -m colab_demo.centroid
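
If you script inference from Python, the argument list can be assembled with one `-m` entry per model directory. This sketch uses only the flags shown in this notebook; `build_track_command` is a hypothetical helper, and the resulting list can be passed to `subprocess.run`.

```python
def build_track_command(video, model_dirs, frames=None, tracker=None):
    """Assemble a `sleap-track` argument list with one `-m` per model directory."""
    cmd = ["sleap-track", video]
    if frames is not None:
        cmd += ["--frames", frames]
    if tracker is not None:
        cmd += ["--tracking.tracker", tracker]
    for model_dir in model_dirs:
        cmd += ["-m", model_dir]
    return cmd


top_down = build_track_command(
    "colab_demo.mp4",
    ["colab_demo.centered_instance", "colab_demo.centroid"],
    frames="0-200",
    tracker="simple",
)
print(" ".join(top_down))
```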