<a href="https://colab.research.google.com/github/Mitch-P-Analyst/BCHS-TPL-Scheduling-Automation/blob/main/notebooks/02_train_model_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Solafune Tree Canopy Project



## Overview
This Google Colab notebook provides an alternative environment for Mitchell Palmer’s Solafune Tree Canopy Segmentation project, making it possible to train the model on free or paid Google Colab GPUs.

Repository: [github.com/Mitch-P-Analyst/solafune_tree_canopy](https://github.com/Mitch-P-Analyst/solafune_tree_canopy)

> 📦 Restructuring in progress to improve reproducibility and remove hardcoded symlink logic in future versions.

## Structure Update

This project was originally designed for local development using relative paths (e.g. `REPO_ROOT / data / ...`), which creates compatibility challenges when running in Google Colab.

This Colab notebook has been reduced from running the entire project to replicating only **Step 01 | Data Preparation** and **Step 02 | Model Training**. The **trained model** is then exported to a local device for the remaining Steps 03 - 05 (testing, predicting, exporting submission).


### Oversight:
To support Colab execution:
- The project is cloned into `/content/solafune_tree_canopy`
- Relative local paths for data and configurations are symlinked to:
  - `/content/solafune_tree_canopy/`
- Data source ZIP files from Solafune must be stored in an accessible Google Drive location
  - Local data paths are symlinked to your chosen Google Drive location

This lets the project run in Colab's virtual environment, capitalising on GPU services for model training, while preserving the original folder structure.
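
As a rough illustration of the symlink approach described above, the sketch below creates a throwaway folder, links an alias path to it, and shows that a file written through the alias ends up in the real folder. The folder and file names are stand-ins chosen for this example, not the paths the notebook actually links.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # Stand-in for the cloned repository's data folder (hypothetical layout)
    repo_data = tmp / "solafune_tree_canopy" / "data"
    repo_data.mkdir(parents=True)

    # Stand-in for the path that locally written code expects to find
    alias = tmp / "data"
    alias.symlink_to(repo_data, target_is_directory=True)

    # A file written through the alias lands in the real folder
    (alias / "example.txt").write_text("hello")
    written_via_alias = (repo_data / "example.txt").read_text()

    # Both paths resolve to the same location on disk
    alias_target = alias.resolve() == repo_data.resolve()

print(written_via_alias)
print(alias_target)
```

Because the alias and the real folder resolve to the same location, code that uses either path reads and writes the same files.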

## Procedure
- [Repository Setup](#repository-setup)
  - Clone repository and install required packages
- [Mount Google Drive](#mount-google-drive)
  - Mount personal Google Drive for data input and model output
- [Step 01 | Data Preparation](#step-01--data-preparation)
  - Extract, prepare and split data using the **same seed** as the local device procedure
- [Step 02 | Train Model](#02_Train_Model)
  - Train YOLO model
- [Export Trained Model Weights](#Export-Trained-Model-Weights)
  - Export trained model weights to Google Drive destination

## Results

Once the trained model weights have been exported to your Google Drive destination, download the folder and place it in the local path:
  - `REPO_ROOT / runs / segment /`

Proceed with the remaining steps on your local device.
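
Before continuing locally, it may be worth confirming that the downloaded run folder is complete. Ultralytics training runs normally write `best.pt` and `last.pt` into a `weights` subfolder; the sketch below checks for both. The helper name `missing_weight_files` and the `example_run` folder are hypothetical and are not part of the repository.

```python
import tempfile
from pathlib import Path

def missing_weight_files(run_dir: Path) -> list[str]:
    """Return the expected weight files that are absent from run_dir/weights."""
    expected = ["best.pt", "last.pt"]
    weights_dir = run_dir / "weights"
    return [name for name in expected if not (weights_dir / name).is_file()]

# Demonstration against a throwaway folder that mimics a partial download
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "runs" / "segment" / "example_run"  # hypothetical run name
    (run_dir / "weights").mkdir(parents=True)
    (run_dir / "weights" / "best.pt").touch()
    missing = missing_weight_files(run_dir)

print(missing)  # ['last.pt']
```

An empty list means both files are present and the folder is ready for the later steps.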








# Repository Setup


In [None]:
# Fresh Directory
# Use %cd rather than !cd: `!cd` runs in a temporary subshell and does not change the notebook's working directory
%cd /content

# Clone repo fresh each session
!git clone https://github.com/Mitch-P-Analyst/solafune_tree_canopy.git

# List all directories present
!ls -al

%cd solafune_tree_canopy


Cloning into 'solafune_tree_canopy'...
remote: Enumerating objects: 590, done.
remote: Counting objects: 100% (56/56), done.
remote: Compressing objects: 100% (38/38), done.
remote: Total 590 (delta 28), reused 41 (delta 17), pack-reused 534 (from 1)
Receiving objects: 100% (590/590), 87.87 MiB | 28.01 MiB/s, done.
Resolving deltas: 100% (204/204), done.
total 20
drwxr-xr-x  1 root root 4096 Nov  1 01:19 .
drwxr-xr-x  1 root root 4096 Nov  1 01:13 ..
drwxr-xr-x  4 root root 4096 Oct 30 13:36 .config
drwxr-xr-x  1 root root 4096 Oct 30 13:36 sample_data
drwxr-xr-x 10 root root 4096 Nov  1 01:19 solafune_tree_canopy
/content/solafune_tree_canopy


In [None]:
# Install requirements.txt
%pip -q install -r /content/solafune_tree_canopy/requirements.txt

# Import Packages
from pathlib import Path
import shutil
import papermill as pm
import os
from datetime import datetime
import numpy as np
import time
import yaml
import torch
import pandas as pd
import ultralytics
from ultralytics import YOLO
import json
from PIL import Image






Creating new Ultralytics Settings v0.0.6 file ✅ 
View Ultralytics Settings with 'yolo settings' or at '/root/.config/Ultralytics/settings.json'
Update Settings with 'y

In [None]:
# Directories Setup within '/content/'

REPO_ROOT = Path("/content/solafune_tree_canopy/")                              # default for Colab; will be overridden by Papermill if needed
REPO_DIR = "/content/solafune_tree_canopy"                                      # Repo directory as a plain string (for shell commands and settings)

## Mount Google Drive

In [None]:
# Mount Google Drive for raw imagery datasets and training model outputs



from google.colab import drive                                                  # Mount Google Drive
drive.mount('/content/solafune_tree_canopy/drive', force_remount=True)

import sys
sys.path.append("/content/solafune_tree_canopy")                                # Add the repo to the import path so its modules can be imported

from ultralytics.utils import SETTINGS  # <-- works on current releases
SETTINGS.update({"datasets_dir": REPO_DIR})

SETTINGS



Mounted at /content/solafune_tree_canopy/drive


{'settings_version': '0.0.6',
 'datasets_dir': '/content/solafune_tree_canopy',
 'weights_dir': 'weights',
 'runs_dir': 'runs',
 'uuid': '569f3ba64b326db489132663f79cd37279811de477381b83ac131e6cdd129cbb',
 'sync': True,
 'api_key': '',
 'openai_api_key': '',
 'clearml': True,
 'comet': True,
 'dvc': True,
 'hub': True,
 'mlflow': True,
 'neptune': True,
 'raytune': True,
 'tensorboard': False,
 'wandb': False,
 'vscode_msg': True,
 'openvino_msg': True}

<a id="step-01"></a>
# 01_Data_Preparation

Create a symlink helper function that connects the relative repository paths with the mounted Google Drive folder.

In [None]:
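def make_symlink(src: Path, dst: Path):
    """Create a symlink at dst pointing to src, replacing anything already at dst."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    # If something already exists at dst, remove it first
    if dst.is_symlink() or dst.exists():
        if dst.is_symlink() or dst.is_file():
            dst.unlink()
        else:
            shutil.rmtree(dst)
    # target_is_directory only matters on Windows; it is ignored on Colab's Linux runtime
    dst.symlink_to(src, target_is_directory=True)
    print(f"Linked: {dst} -> {src}")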

In [None]:
# Symlink local data folders paths to established GC '/content/' virtual environment
gc_data_root = Path("/content/solafune_tree_canopy/data/")
local_data_root = Path("/content/data/")

make_symlink(gc_data_root, local_data_root)

gc_data = Path("/content/solafune_tree_canopy/data/")
local_data = Path("/data/")

make_symlink(gc_data, local_data)


#-- Raw Imagery Data --#
  # Symlink input paths of image ZIP folders from local path to Google Drive location

#--- Training Images ---#

# Input your Google Drive folder path hosting **train_images.zip** from Solafune
training_images_ZIP = "/content/solafune_tree_canopy/drive/MyDrive/Datasets/solafune-tree-canopy/data/raw/ZIPs/train_images.zip"

# Run Symlink to ZIP Folder location
drive_train_zip = Path(training_images_ZIP)
repo_train_zip  = Path("/content/data/raw/zips/train_images.zip")
make_symlink(drive_train_zip, repo_train_zip)

#--- Testing Images ---#

# Input your Google Drive folder path hosting **evaluation_images.zip** from Solafune
testing_images_ZIP = "/content/solafune_tree_canopy/drive/MyDrive/Datasets/solafune-tree-canopy/data/raw/ZIPs/evaluation_images.zip"

# Run Symlink to ZIP Folder location
drive_eval_zip = Path(testing_images_ZIP)
repo_eval_zip = Path("/content/data/raw/zips/evaluation_images.zip")
make_symlink(drive_eval_zip, repo_eval_zip)



Linked: /content/data -> /content/solafune_tree_canopy/data
Linked: /data -> /content/solafune_tree_canopy/data
Linked: /content/data/raw/zips/train_images.zip -> /content/solafune_tree_canopy/drive/MyDrive/Datasets/solafune-tree-canopy/data/raw/ZIPs/train_images.zip
Linked: /content/data/raw/zips/evaluation_images.zip -> /content/solafune_tree_canopy/drive/MyDrive/Datasets/solafune-tree-canopy/data/raw/ZIPs/evaluation_images.zip


In [None]:
# Assign Step 01 notebook and python file to variables for terminal activation
data_prep_nb = f"{REPO_DIR}/notebooks/01_data_preparation.ipynb"
data_prep_py = f"{REPO_DIR}/notebooks/01_data_preparation.py"

# Run Step 01
!jupytext --to py:percent "$data_prep_nb"
!python "$data_prep_py"

[jupytext] Reading /content/solafune_tree_canopy/notebooks/01_data_preparation.ipynb in format ipynb
[jupytext] Writing /content/solafune_tree_canopy/notebooks/01_data_preparation.py in format py:percent
✅ Saved COCO annotations to: /content/data/processed/JSONs/train_annotations_coco.json
Annotations /content/solafune_tree_canopy/data/processed/JSONs/train_annotations_coco.json: 100% 150/150 [00:01<00:00, 107.80it/s]
COCO data converted successfully.
Results saved to /content/solafune_tree_canopy/data/temp/temp_labels
✅ Unzipped: /content/data/raw/zips/train_images.zip → /content/data/temp/temp_images
✅ Unzipped: /content/data/raw/zips/evaluation_images.zip → /content/data/processed/images/predict
Begin Data Split
Data Split By Seed: 0
Successful Data Split:
  Train - 80%
  Val - 20%

Validation files count:
  Labels: 30
  Images: 30
Training files count:
  Labels: 120
  Images: 120
Prediction file count:
  Images: 150
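
The output above reports a split by seed 0 into 120 training and 30 validation images. As a minimal sketch of why a fixed seed gives the same split on Colab and on a local device, the example below shuffles a sorted list of file names with a seeded generator. This is an assumption about the general technique, not the code in `01_data_preparation.ipynb`; the function name `split_by_seed` and the file names are hypothetical.

```python
import random

def split_by_seed(names, val_fraction=0.2, seed=0):
    """Return (train, val) lists using a seeded shuffle of the sorted names."""
    ordered = sorted(names)                 # sort first so input order cannot change the result
    random.Random(seed).shuffle(ordered)    # a private generator leaves global random state untouched
    n_val = int(round(len(ordered) * val_fraction))
    return ordered[n_val:], ordered[:n_val]

# Hypothetical file names standing in for the 150 training images
names = [f"image_{i:03d}.tif" for i in range(150)]

train, val = split_by_seed(names)
train_again, val_again = split_by_seed(list(reversed(names)))

print(len(train), len(val))  # 120 30
print(val == val_again)      # True
```

Sorting before shuffling matters here: directory listings can come back in a different order on different machines, which would otherwise change the split even with the same seed.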


# 02_Train_Model

Utilise Google Colab GPUs to train the YOLO image segmentation model on the sorted data

### Model Training Parameters

Open the `train_model_overrides` file to adjust the training parameters as desired:

/content/solafune_tree_canopy/configurations/train_model_overrides.yaml
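
Conceptually, an overrides file replaces selected default training parameters and leaves the rest untouched. The sketch below shows that idea with plain dictionaries. The `defaults` values are illustrative, the `overrides` values are taken from the training log later in this notebook, and the helper `apply_overrides` is hypothetical; the real script may load and apply the YAML differently.

```python
def apply_overrides(defaults: dict, overrides: dict) -> dict:
    """Return defaults updated with overrides, rejecting unknown keys."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown training parameters: {sorted(unknown)}")
    return {**defaults, **overrides}

# Illustrative defaults (assumed for this example, not read from the library)
defaults = {"epochs": 100, "imgsz": 640, "batch": 16, "lr0": 0.01}

# Values that appear in the training log; whether they live in the YAML file is an assumption
overrides = {"imgsz": 832, "batch": 8, "lr0": 0.0032}

params = apply_overrides(defaults, overrides)
print(params)  # {'epochs': 100, 'imgsz': 832, 'batch': 8, 'lr0': 0.0032}
```

Rejecting unknown keys is a design choice for this sketch: it surfaces a misspelled parameter name immediately instead of silently training with the default.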

### Training Model Py File

The training model is set to `YOLO11s-seg`. To use a different YOLO model, edit `Row 21` of the training script:

/content/solafune_tree_canopy/scripts/02_train_model.py

## Train Model

In [None]:
# Symlink local data folders paths to established GC '/content/' virtual environment

  #-- Configurations --#

gc_configs_path = Path("/content/solafune_tree_canopy/configurations/")
local_configs_path = Path("/configurations/")

make_symlink(gc_configs_path, local_configs_path)

gc_content_configs_path = Path("/content/solafune_tree_canopy/configurations/")
content_configs_path = Path("/content/configurations/")

make_symlink(gc_content_configs_path, content_configs_path)

  #-- Output Runs --#

gc_runs_output = Path("/content/solafune_tree_canopy/runs/")
local_runs_output = Path("/content/runs/")

make_symlink(gc_runs_output, local_runs_output)

Linked: /configurations -> /content/solafune_tree_canopy/configurations
Linked: /content/configurations -> /content/solafune_tree_canopy/configurations
Linked: /content/runs -> /content/solafune_tree_canopy/runs


In [None]:
# Run Training Model File
train_model = f"{REPO_ROOT}/scripts/02_train_model.py"

!python "$train_model"

New https://pypi.org/project/ultralytics/8.3.223 available 😃 Update with 'pip install -U ultralytics'
Ultralytics 8.3.185 🚀 Python-3.12.12 torch-2.8.0+cu126 CUDA:0 (Tesla T4, 15095MiB)
[34m[1mengine/trainer: [0magnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=8, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.15, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=/content/solafune_tree_canopy/configurations/model_data-seg.yaml, degrees=180, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=100, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.028, hsv_s=0.9, hsv_v=0.4, imgsz=832, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.0032, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolo11s-seg.pt, momentum=0.937, mosaic=1.0, multi_scale=False, 

## Export Trained Model Weights

Export model weights for permanent storage and project continuation on a local device

- Requirements
  - Google Drive Output location

In [None]:
# New trained model
  #-- Copy GC path to your trained model weights from:  /content/runs/segment/__ --#

Model_weights = '/content/runs/segment/train_Yolo11s_canopy_832_adamW_' # Input Your Path
trained_model = Path(Model_weights)

# Training Model Parameters
train_parameters = Path("/content/solafune_tree_canopy/configurations/train_model_overrides.yaml")

# Drive Runs Folder
  #-- Choose your Google Drive Output destination for Model Weight --#

Google_Drive_Location = '/content/solafune_tree_canopy/drive/MyDrive/Datasets/solafune-tree-canopy/runs/segment' # Input Your Path

Google_Drive_Runs = Path(Google_Drive_Location)

# Run Folder Output
Google_Drive_Output = Google_Drive_Runs / f"{trained_model.name}_{datetime.now():%Y%m%d-%H%M}"


Google_Drive_Output.parent.mkdir(parents=True, exist_ok=True)

try:
    shutil.copy(train_parameters, trained_model)
    # Copy Training Parameters into Trained Model folder for future reference
    print(f"File '{train_parameters}' successfully copied to '{trained_model}'")
except FileNotFoundError as e:
    # shutil.copy overwrites existing files, so the likely failure is a missing source file or destination path
    print(f"Error: could not copy training parameters: {e}")

try:
    shutil.copytree(trained_model, Google_Drive_Output, dirs_exist_ok=True)
    # Copy Trained Model folder to Google Drive Location
    print(f"Folder '{trained_model}' successfully copied to '{Google_Drive_Output}'")
except FileExistsError:
    print(f"Error: Destination folder '{Google_Drive_Output}' already exists.")
except Exception as e:
    print(f"An error occurred: {e}")

File '/content/solafune_tree_canopy/configurations/train_model_overrides.yaml' successfully copied to '/content/runs/segment/train_Yolo11s_canopy_832_adamW_'
Folder '/content/runs/segment/train_Yolo11s_canopy_832_adamW_' successfully copied to '/content/solafune_tree_canopy/drive/MyDrive/Datasets/solafune-tree-canopy/runs/segment/train_Yolo11s_canopy_832_adamW__20251101-0151'
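
The destination folder name in the output above comes from appending a timestamp to the run name. The sketch below reproduces that naming with a fixed time so the result is predictable; the helper `timestamped_output` is hypothetical, and the fixed `datetime` stands in for `datetime.now()`.

```python
from datetime import datetime
from pathlib import Path

def timestamped_output(runs_root: Path, run_name: str, now: datetime) -> Path:
    """Build the export folder path as <runs_root>/<run_name>_<YYYYmmdd-HHMM>."""
    return runs_root / f"{run_name}_{now:%Y%m%d-%H%M}"

out = timestamped_output(
    Path("/content/solafune_tree_canopy/drive/MyDrive/Datasets/solafune-tree-canopy/runs/segment"),
    "train_Yolo11s_canopy_832_adamW_",
    datetime(2025, 11, 1, 1, 51),  # fixed time used in place of datetime.now()
)

print(out.name)  # train_Yolo11s_canopy_832_adamW__20251101-0151
```

The double underscore appears because the run name itself ends in an underscore. The timestamp keeps repeated exports of the same run from overwriting each other on Google Drive.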
