# Label Propagation: Benchmarking Pipeline

This notebook contains the pipeline used to build label propagation benchmarks on different datasets with one of the following approaches:
1. Point Label Aware Superpixels (PLASPIX)
2. SAM: multiple point-based prompts followed by mask blending
3. SAM-HQ: multiple point-based prompts followed by mask blending

## Setup

### Logger Setup

In [14]:
import logging

In [15]:
# define a formatter to display the messages to console (standard output)
console_formatter = logging.Formatter('%(message)s')
# console_formatter = logging.Formatter('%(levelname)s:%(module)s:%(message)s')
console_handler = logging.StreamHandler()
console_handler.setFormatter(console_formatter)

In [16]:
# define a logger for this notebook and attach the console handler
logger = logging.getLogger('Label-Propagation')
logger.handlers.clear()
logger.propagate = False
logger.addHandler(console_handler)

In [17]:
# set an appropriate level of logging for this notebook
logger.setLevel(logging.INFO)
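
The `handlers.clear()` and `propagate = False` calls above matter because notebook cells tend to be re-run: without them, each run would attach another handler (or let the root logger print the record as well) and every message would appear more than once. A minimal sketch of the same setup wrapped in a function; the `setup_logger` name and the demo logger name are illustrative and not part of this notebook:

```python
import logging

def setup_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a console logger that is safe to configure repeatedly."""
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter('%(message)s'))

    logger = logging.getLogger(name)
    logger.handlers.clear()    # drop handlers left over from earlier runs of the cell
    logger.propagate = False   # keep records away from the root logger's handlers
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger

# configuring twice still leaves exactly one handler, so each message prints once
demo_logger = setup_logger('Label-Propagation-Demo')
demo_logger = setup_logger('Label-Propagation-Demo')
demo_logger.info(f"Handlers attached: {len(demo_logger.handlers)}")
```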

### Mount GCS Bucket

References:
1. [Mount a Cloud Storage bucket using Cloud Storage FUSE](https://cloud.google.com/storage/docs/gcsfuse-quickstart-mount-bucket)
2. [Snippets: Saving Data to Google Cloud Storage](https://colab.research.google.com/notebooks/snippets/gcs.ipynb)
3. [Connect Colab to GCS Bucket Using gcsfuse](https://pub.towardsai.net/connect-colab-to-gcs-bucket-using-gcsfuse-29f4f844d074)

**Authentication**

This step authenticates the user so that the notebook can access the Google Cloud Storage bucket with an authenticated account.

In [5]:
from google.colab import auth
auth.authenticate_user()

ModuleNotFoundError: No module named 'google'
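
The error above is what this cell produces outside Google Colab, where the `google.colab` package is not installed. A small guard, sketched below under the assumption that the Colab-specific steps can simply be skipped elsewhere, makes this explicit; `running_in_colab` is a hypothetical helper name:

```python
def running_in_colab() -> bool:
    """Return True only when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401  (only available inside Colab runtimes)
    except ImportError:
        return False
    return True

if running_in_colab():
    from google.colab import auth
    auth.authenticate_user()
else:
    print("Not running in Colab: skipping Colab authentication")
```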

**Install Cloud Storage FUSE**

1. Add the Cloud Storage FUSE distribution URL as a package source:

In [6]:
# check if gcsfuse.list file exists at the path it normally resides
!ls /etc/apt/sources.list.d

'ls' is not recognized as an internal or external command,
operable program or batch file.


In [7]:
# open gcsfuse.list file and show its contents
!cat /etc/apt/sources.list.d/gcsfuse.list

'cat' is not recognized as an internal or external command,
operable program or batch file.


In [8]:
# add Cloud Storage FUSE distribution URL as a package source
!echo "deb https://packages.cloud.google.com/apt gcsfuse-`lsb_release -c -s` main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list

'sudo' is not recognized as an internal or external command,
operable program or batch file.


In [9]:
# open gcsfuse.list file and show its contents
!cat /etc/apt/sources.list.d/gcsfuse.list

'cat' is not recognized as an internal or external command,
operable program or batch file.


2. Import the Google Cloud APT repository public key and add it to your list of keys:

In [10]:
!curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

'sudo' is not recognized as an internal or external command,
operable program or batch file.


3. Update the list of available packages and install gcsfuse:

In [11]:
!apt -qq update && apt -qq install gcsfuse

'apt' is not recognized as an internal or external command,
operable program or batch file.


In [12]:
# check if gcsfuse is installed
!gcsfuse -v

'gcsfuse' is not recognized as an internal or external command,
operable program or batch file.


**Mount Bucket on Colab Disk**

Initialize the bucket name and the Colab directory on which the bucket will be mounted:

In [13]:
BUCKET_NAME = 'rs_storage_open'
BUCKET_MOUNT_DIR = f"/mnt/gs/{BUCKET_NAME}"

Create a directory to mount the storage bucket to

In [14]:
!mkdir -p {BUCKET_MOUNT_DIR}

The syntax of the command is incorrect.


In [15]:
!ls -l /mnt/gs

'ls' is not recognized as an internal or external command,
operable program or batch file.


Mount your storage bucket using the gcsfuse command:

In [16]:
!gcsfuse --implicit-dirs {BUCKET_NAME} {BUCKET_MOUNT_DIR}

'gcsfuse' is not recognized as an internal or external command,
operable program or batch file.


In [17]:
!ls -l {BUCKET_MOUNT_DIR}

'ls' is not recognized as an internal or external command,
operable program or batch file.


### Mount Google Drive

In [18]:
import os
from google.colab import drive

ModuleNotFoundError: No module named 'google'

In [19]:
DRIVE_MOUNT_DIR = '/content/drive'
drive.mount(DRIVE_MOUNT_DIR, force_remount=True)

NameError: name 'drive' is not defined

### Directory Setup

Local Root Directory

In [20]:
# local root directory for this notebook
LOCAL_ROOT_DIR = '/content'

Bucket Mount Directory

In [21]:
# Root folder for dataset
logger.info(f"Root directory for dataset: {BUCKET_MOUNT_DIR}")

Root directory for dataset: /mnt/gs/rs_storage_open


Project Root Directory

In [22]:
# Root folder in Google Drive for this project
PROJECT_ROOT_DIR = os.path.join(DRIVE_MOUNT_DIR, 'MyDrive', '20231114-FruitPunch-AI-for-Coral-Reefs-2')
logger.info(f"Root directory for project in Google Drive: {PROJECT_ROOT_DIR}")

Root directory for project in Google Drive: /content/drive\MyDrive\20231114-FruitPunch-AI-for-Coral-Reefs-2
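
The logged path above mixes `/` and `\` because `os.path.join` uses the separator of the operating system the notebook runs on, which here was Windows. Since the target paths are Colab (Linux) paths, one option is to build them with `PurePosixPath`, which always joins with `/`. A minimal sketch that reuses the names from the cells above:

```python
from pathlib import PurePosixPath

DRIVE_MOUNT_DIR = '/content/drive'

# PurePosixPath always joins with '/', whatever OS the notebook runs on
PROJECT_ROOT_DIR = str(
    PurePosixPath(DRIVE_MOUNT_DIR) / 'MyDrive' / '20231114-FruitPunch-AI-for-Coral-Reefs-2'
)
print(PROJECT_ROOT_DIR)
```

When the notebook actually runs on Colab, `os.path.join` gives the same result and no change is needed.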


### Custom Libraries

In [23]:
import sys

In [24]:
# add the path where the point label aware superpixels and custom packages are located
sys.path.append(os.path.join(PROJECT_ROOT_DIR, 'packages'))

### Dependencies

In [4]:
import torch
print(torch.__version__)


2.4.1+cu124


In [6]:
# workaround to overcome long duration needed for installing torch_scatter
#!pip install pyg_lib torch_scatter -f https://data.pyg.org/whl/torch-{torch.2.4.1+cu124}.html
!pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.4.0+cu124.html

Looking in links: https://data.pyg.org/whl/torch-2.4.0+cu124.html
Collecting pyg_lib
  Downloading https://data.pyg.org/whl/torch-2.4.0%2Bcu124/pyg_lib-0.4.0%2Bpt24cu124-cp312-cp312-win_amd64.whl (1.8 MB)


[notice] A new release of pip is available: 23.2.1 -> 24.2
[notice] To update, run: python.exe -m pip install --upgrade pip
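
The install command above hard-codes the wheel index for torch 2.4.0 while torch 2.4.1 is installed, and the commented-out line is not valid templating. The sketch below derives the index URL from a version string instead. It assumes that wheels are indexed under the `x.y.0` release of each torch minor version, which matches the URL used above but should be checked against the PyG installation notes; `pyg_wheel_index` is a hypothetical helper name:

```python
def pyg_wheel_index(torch_version: str) -> str:
    """Build the PyG wheel index URL for a torch version string such as '2.4.1+cu124'."""
    release, _, build = torch_version.partition('+')
    major, minor = release.split('.')[:2]
    build = build or 'cpu'  # assumption: no local build tag means a CPU-only install
    # assumption: wheels are indexed under the x.y.0 release of each torch minor version
    return f"https://data.pyg.org/whl/torch-{major}.{minor}.0+{build}.html"

print(pyg_wheel_index('2.4.1+cu124'))
```

In the notebook, the result could be passed to pip with `-f {pyg_wheel_index(torch.__version__)}`.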


In [7]:
!pip install torchmetrics




[notice] A new release of pip is available: 23.2.1 -> 24.2
[notice] To update, run: python.exe -m pip install --upgrade pip


In [8]:
# use double quotes: the Windows shell passes single quotes through to pip literally
!pip install "git+https://github.com/facebookresearch/segment-anything.git"

[notice] A new release of pip is available: 23.2.1 -> 24.2
[notice] To update, run: python.exe -m pip install --upgrade pip


In [9]:
!pip install timm




[notice] A new release of pip is available: 23.2.1 -> 24.2
[notice] To update, run: python.exe -m pip install --upgrade pip


In [10]:
!pip install segment-anything-hq




[notice] A new release of pip is available: 23.2.1 -> 24.2
[notice] To update, run: python.exe -m pip install --upgrade pip


In [11]:
import cv2
import yaml
import pandas as pd
import albumentations

from pathlib import Path

  check_for_updates()


In [12]:
from labelmate.loader import LabelPropDataLoader
from labelmate.patchifier import SimplePatchifier
from labelmate.visualizer import visualize_output
from labelmate.evaluator import LabelPropEvaluator
from labelmate.propagator import PLASPIXLabelProp, SAMPointPromptsLabelProp

ModuleNotFoundError: No module named 'plaspix'

In [18]:
custom_modules = ['labelmate.loader', 'labelmate.visualizer',
                  'labelmate.evaluator', 'labelmate.propagator',
                  'labelmate.hypertuner', 'labelmate.patchifier',
                  ]
for module_name in custom_modules:
    logging.getLogger(module_name).setLevel(logging.INFO)

In [19]:
custom_modules = ['torchmetrics',
                  ]
for module_name in custom_modules:
    logging.getLogger(module_name).setLevel(logging.ERROR)

In [20]:
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

### Pre-trained Weights

In [22]:
import shutil
import os
# download the pre-trained weights mentioned in GitHub and point to that file
PRE_TRAINED_WEIGHTS = os.path.join(PROJECT_ROOT_DIR, 'models', 'standardization_C=100_step70000.pth')

# copy the pre-trained weights file to the code folder created by git clone
if os.path.exists(PRE_TRAINED_WEIGHTS):
    logger.info(f"Copying weights from {PRE_TRAINED_WEIGHTS}...")
    shutil.copyfile(
        PRE_TRAINED_WEIGHTS,
        os.path.join(LOCAL_ROOT_DIR, 'standardization_C=100_step70000.pth'),
        )
    logger.info(os.listdir(LOCAL_ROOT_DIR))

NameError: name 'PROJECT_ROOT_DIR' is not defined

In [27]:
# download the pre-trained weights mentioned in GitHub and point to that file
SAM_WEIGHTS_PATH = '/content/weights/sam'
SAM_WEIGHTS_FILE = 'sam_vit_h_4b8939.pth'
SAM_WEIGHTS_URL = f"https://dl.fbaipublicfiles.com/segment_anything/{SAM_WEIGHTS_FILE}"

#!wget {SAM_WEIGHTS_URL} -P {SAM_WEIGHTS_PATH}
import requests

# Ensure the directory exists
os.makedirs(SAM_WEIGHTS_PATH, exist_ok=True)

# Download the file
response = requests.get(SAM_WEIGHTS_URL)
if response.status_code == 200:
    file_path = os.path.join(SAM_WEIGHTS_PATH, SAM_WEIGHTS_FILE)
    with open(file_path, 'wb') as f:
        f.write(response.content)
    print(f"Weights downloaded and saved to {file_path}")
else:
    print(f"Failed to download the file. Status code: {response.status_code}")


Weights downloaded and saved to /content/weights/sam\sam_vit_h_4b8939.pth


In [29]:
# download the pre-trained weights mentioned in GitHub and point to that file
SAM_HQ_WEIGHTS_PATH = '/content/weights/sam-hq'
SAM_HQ_WEIGHTS_FILE = 'sam_hq_vit_h.pth'
SAM_HQ_WEIGHTS_URL = 'https://huggingface.co/lkeab/hq-sam/resolve/main/sam_hq_vit_h.pth'

#!wget {SAM_HQ_WEIGHTS_URL} -P {SAM_HQ_WEIGHTS_PATH}

os.makedirs(SAM_HQ_WEIGHTS_PATH, exist_ok=True)

# Download the file
response = requests.get(SAM_HQ_WEIGHTS_URL)
if response.status_code == 200:
    file_path = os.path.join(SAM_HQ_WEIGHTS_PATH, SAM_HQ_WEIGHTS_FILE)
    with open(file_path, 'wb') as f:
        f.write(response.content)
    print(f"Weights downloaded and saved to {file_path}")
else:
    print(f"Failed to download the file. Status code: {response.status_code}")


Weights downloaded and saved to /content/weights/sam-hq\sam_hq_vit_h.pth
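
Both download cells above read the whole response into memory (`response.content`) before writing it, and they download again on every run. For checkpoint files of this size, a streamed download that is skipped when the file already exists may be preferable. The sketch below uses only the standard library; `download_file` is a hypothetical helper, not something provided by SAM or SAM-HQ:

```python
import shutil
import urllib.request
from pathlib import Path

def download_file(url: str, dest_dir: str, file_name: str) -> Path:
    """Stream url to dest_dir/file_name, skipping the download if the file exists."""
    dest_path = Path(dest_dir) / file_name
    if dest_path.exists():
        return dest_path

    dest_path.parent.mkdir(parents=True, exist_ok=True)
    part_path = dest_path.parent / (dest_path.name + '.part')
    # copyfileobj moves data in fixed-size chunks, so the file never sits fully in memory
    with urllib.request.urlopen(url) as response, open(part_path, 'wb') as f:
        shutil.copyfileobj(response, f)
    # rename only after a complete download so a partial file is never mistaken for weights
    part_path.replace(dest_path)
    return dest_path
```

It would be called as `download_file(SAM_WEIGHTS_URL, SAM_WEIGHTS_PATH, SAM_WEIGHTS_FILE)`. Unlike the cells above, a failed HTTP request raises an exception (`urllib.error.HTTPError`) instead of printing a status code.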


## Path Variables

In [None]:
# Path to folder that contains images and dense masks
DATA_ROOT_DIR = os.path.join(BUCKET_MOUNT_DIR, 'benthic_datasets', 'mask_labels', 'reef_support')
logger.info(f"Dataset (images and masks) will be accessed from: {DATA_ROOT_DIR}")

In [None]:
# Path to permanently store input images, input sparse masks in image format and output from label propagation
RESULTS_UPLOAD_PATH = os.path.join(PROJECT_ROOT_DIR, 'findings', 'SAM-B1-RAND-300-F')
logger.info(f"Results obtained will be saved to: {RESULTS_UPLOAD_PATH}")

In [None]:
# Path to the csv that contains image level summary for Reef Support dataset
REEF_SUPPORT_MANIFEST_PATH = os.path.join(PROJECT_ROOT_DIR, 'data', 'MANIFEST_REEF_SUPPORT.csv')
logger.info(f"Dataset manifest will be accessed from: {REEF_SUPPORT_MANIFEST_PATH}")

In [None]:
# Combined Point Labels data from Reef Support Dense Masks and Seaview Point Labels with results of label comparison exercise
REEF_SUPPORT_SEAVIEW_POINT_LABELS_PATH = os.path.join(PROJECT_ROOT_DIR, 'data', 'DATA-REEF-SUPPORT-POINT-LABELS-v1.csv')
logger.info(f"Point labels data will be loaded from: {REEF_SUPPORT_SEAVIEW_POINT_LABELS_PATH}")

## Helper Functions

In [None]:
def create_labelmate_dataset(quadrat_ids, manifest_df, point_labels_source='DENSE', point_labels_count=100):
    samples = []

    for quadrat_id in quadrat_ids:
        # look up the manifest row once; .item() raises unless exactly one row matches
        manifest_row = manifest_df.query(f"quadratid == '{quadrat_id}'")
        image_file_name = manifest_row.image_file_name.item()
        folder = manifest_row.folder.item()
        region = manifest_row.region.item()
        manifest_index = manifest_row.index.item()

        # PLASPIX code encounters CUDA OOM error for images larger than 1100 pixels in height or width
        # the best hyper parameter combination was found for images with 100 point labels in 1031x1031 samples
        # so, benchmark 1 is computed by resizing all samples to 1024x1024 and then
        # randomly sampling point labels from the dense mask (100 by default)

        sample = dict(
            sample_id=quadrat_id,
            dataset_name=folder,
            region=region,
            image_path=os.path.join(DATA_ROOT_DIR, folder, 'images', image_file_name),
            mask_path=os.path.join(DATA_ROOT_DIR, folder, 'masks_stitched', f"{quadrat_id}_mask.png"),
            point_labels_source=point_labels_source,
            point_labels_count=point_labels_count,
            manifest_index=manifest_index,
        )
        samples.append(sample)

    dataset_df = pd.DataFrame(samples)

    return dataset_df

In [None]:
def get_resize_transform(image_size=1024):
    resize_transform = \
        albumentations.Compose(
            [albumentations.Resize(
                height=image_size,
                width=image_size,
                interpolation=cv2.INTER_AREA,
                p=1,
                ),
            ]
            )
    return resize_transform

## Dataset Preparation

### Read Manifest & Point Labels

In [None]:
manifest_df = pd.read_csv(REEF_SUPPORT_MANIFEST_PATH, header='infer')
point_labels_df = pd.read_csv(REEF_SUPPORT_SEAVIEW_POINT_LABELS_PATH, header='infer')

In [None]:
point_labels_df['class_name'] = point_labels_df.reef_support_class_name
point_labels_df['class_label'] = point_labels_df.reef_support_class_label

In [None]:
if point_labels_df.quadratid.dtype != object:
    point_labels_df = \
        point_labels_df.astype({'quadratid': str})
    print("Changed Quadrat ID to object data type")
else:
    print("Quadrat ID is already in object data type")
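
The cast above matters because quadrat IDs are later matched as quoted strings (for example `quadratid == '{quadrat_id}'`), and a numeric ID never compares equal to its string form. A plain-Python sketch of the same normalization; `normalize_quadrat_ids` is a hypothetical helper name:

```python
def normalize_quadrat_ids(quadrat_ids):
    """Return quadrat IDs as strings so numeric and text IDs compare consistently."""
    return [str(quadrat_id) for quadrat_id in quadrat_ids]

raw_ids = [17001652802, 'G0088441']       # numeric and text IDs, as seen in this notebook
print(17001652802 == '17001652802')       # an int never equals its string form
print(normalize_quadrat_ids(raw_ids))
```

One caveat: if a column was read as float (for instance because of missing values), the string form gains a trailing `.0`, so such columns would need to be converted to integers first.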

In [None]:
manifest_df.sample(2)

In [None]:
point_labels_df.sample(2)

### Sample Selection

In [None]:
with open('yolov8-config.yaml', 'r') as stream:
    yolov8_config = yaml.safe_load(stream)

print(yolov8_config)
print(list(yolov8_config['val_dataset'].keys()))

In [None]:
REEF_SUPPORT_BENCHMARK_DATASET = yolov8_config['val_dataset']
# REEF_SUPPORT_BENCHMARK_DATASET = dict(TETES_PROVIDENCIA=yolov8_config['val_dataset']['TETES_PROVIDENCIA'])
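
Judging from how the benchmark dataset is used in this notebook, `val_dataset` appears to map each dataset folder to a list of image file names, from which quadrat IDs are obtained by dropping the file extension. The sketch below illustrates that assumed layout; the file names are made up for illustration and `quadrat_ids_by_folder` is a hypothetical helper:

```python
import os

# hypothetical excerpt of the parsed config; the file names are made up for illustration
example_config = {
    'val_dataset': {
        'TETES_PROVIDENCIA': ['G0088441.JPG', 'G0088442.JPG'],
    }
}

def quadrat_ids_by_folder(val_dataset):
    """Map each folder to its quadrat IDs by dropping the file extensions."""
    return {
        folder: [os.path.splitext(file_name)[0] for file_name in file_names]
        for folder, file_names in val_dataset.items()
    }

print(quadrat_ids_by_folder(example_config['val_dataset']))
```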

In [None]:
# quadrat_ids = [17001652802, 24047242502]
# quadrat_ids = [20034061802, 17025816302, 17019874202, 20034067102, 17017815902, 25009056001, 20029004202, 20042103002, 24026197901, 17019871102]
# quadrat_ids = [17001652802]
# quadrat_ids = [12025020201]

quadrat_ids = [
    '32011066301',
    # '12025020201',
    '20046281801',
    '20220914_AnB_CB14 (219)',
    'E19_T2_C11_Corr_22sep22',
    'G0088441',
    ]
quadrat_ids

## Label Propagation

### Parameters Setup

In [None]:
dataset_dict = REEF_SUPPORT_BENCHMARK_DATASET.copy()
# dataset_dict = {'REGION-1': ['17001652802'], 'REGION-2': ['12025020201'], 'REGION-3': ['24047242502']}
# dataset_dict = {'ATL': ['20010148701', '17039238402'], 'PAC_AUS': ['10001026902']}
experiment_name = 'SAM-B1-RAND-300-F'
execution_tag = 'C0'
working_folder = Path('/content/labelprop')

image_size = 1024
point_labels_source = 'DENSE'
point_labels_count = 300
patchifier = None # SimplePatchifier
propagator = SAMPointPromptsLabelProp # PLASPIXLabelProp
image_encoder_weights_path = os.path.join(SAM_WEIGHTS_PATH, SAM_WEIGHTS_FILE)

In [None]:
if 'SAM' in propagator.__name__:
    hyper_params = \
        dict(
            point_labels_source=point_labels_source,
            point_labels_count=point_labels_count,
            image_encoder='vit_h',
            image_encoder_weights_path=image_encoder_weights_path,
            )
elif 'PLASPIX' in propagator.__name__:
    # set value for each hyper parameter
    point_labels_type = 'SPARSE'
    num_spixels = 400
    ensemble = 'No'
    alpha = 10
    xysigma = 0.25
    cnnsigma = 0.1

    # create a dictionary listing all hyper parameters and their ranges
    hyper_params = {
        'point_labels_source': point_labels_source,
        'point_labels_count': point_labels_count,
        'point_labels_type': point_labels_type,
        'num_spixels': num_spixels,
        'ensemble': ensemble,
        'alpha': alpha,
        'xysigma': xysigma,
        'cnnsigma': cnnsigma,
    }

In [None]:
logger.info(f"Experiment name         : {experiment_name}")
logger.info(f"Execution tag           : {execution_tag}")
logger.info(f"Local working folder    : {working_folder}")
logger.info(f"Experiment save folder  : {RESULTS_UPLOAD_PATH}")
logger.info(f"")
logger.info(f"Image size              : {image_size}")
logger.info(f"Point labels source     : {point_labels_source}")
logger.info(f"Point labels count      : {point_labels_count}")
logger.info(f"Patchifier              : {patchifier.__name__ if patchifier else 'None'}")
logger.info(f"Propagator              : {propagator.__name__ if propagator else 'None'}")
logger.info(f"")
logger.info(f"Hyper parameters        : {hyper_params}")
logger.info(f"")
logger.info(f"# of folders in dataset : {len(dataset_dict)}")
logger.info(f"# of samples in dataset : {sum([len(x) for x in dataset_dict.values()])}")

### Propagation Pipeline

In [None]:
dataset_full_df = pd.DataFrame({})
patches_dataset_full = []

for folder in dataset_dict.keys():
    logger.info(f"{'-'*80}")
    logger.info(f"Folder: {folder}")

    # get list of quadratids from the dataset
    quadrat_ids = [os.path.splitext(x)[0] for x in dataset_dict[folder]]
    logger.info(f"# of samples in folder: {len(quadrat_ids)}")

    # build a folder specific dataset in the format expected by labelmate package
    dataset_df = create_labelmate_dataset(quadrat_ids, manifest_df, point_labels_source, point_labels_count)

    # append folder specific dataset to full dataset
    dataset_full_df = pd.concat([dataset_full_df, dataset_df], ignore_index=True)

    # setup labelmate dataset
    logger.info(f"Loading samples into local working folder ...")
    label_prop_data = \
        LabelPropDataLoader(
            experiment_name=f"{experiment_name}-{folder}",
            dataset=dataset_df.copy(),
            point_labels=None,
            num_classes=3,
            working_folder=working_folder,
            transforms=get_resize_transform(image_size=image_size),
            )

    # reset working folder contents
    label_prop_data.delete_sub_folders()
    label_prop_data.create_sub_folders()

    # prepare input data in working folders
    label_prop_data.prepare_input_data()
    logger.info(f"# of samples loaded: {len(label_prop_data)}")

    # patchify data if patchifier is provided
    if patchifier:
        logger.info(f"Patchifying samples ...")
        # setup patchifier dataset
        patch_data = \
            patchifier(
                experiment_name=f"{experiment_name}-{folder}",
                samples=label_prop_data,
                num_classes=3,
                working_folder=Path.joinpath(label_prop_data.working_folder, 'patches'),
                patch_height=256,
                patch_width=256,
                step_size=256
            )

        # reset patchifier working folder contents
        patch_data.delete_sub_folders()
        patch_data.create_sub_folders()

        # create patches for samples
        patch_data.patchify_samples()
        logger.info(f"# of patches loaded: {len(patch_data)}")

        # save patch details into a patches dataset
        patch_dataset = \
            [{'dataset_name': folder, 'region': folder, **patch_data[idx]}
            for idx in range(len(patch_data))]

        # append folder specific patches dataset to full patches dataset
        patches_dataset_full.extend(patch_dataset)
    else:
        logger.info(f"Samples will not be patchified")

    # setup label propagator
    logger.info(f"Propagating labels using {propagator.__name__} ...")
    label_propagator = \
        propagator(
            dataloader=patch_data if patchifier else label_prop_data,
            execution_tag=execution_tag,
            hyper_params={'region': folder, **hyper_params},
            )

    # run label propagation
    label_propagator.run_pipeline()

    # plot confusion matrix
    label_propagator.evaluator.plot_confusion_matrix()

    # save experiment data
    label_propagator.save_experiment(
        save_path=Path(RESULTS_UPLOAD_PATH),
        sub_folders=['images', 'masks', 'predictions'],
        )

    # offload propagator from GPU
    if 'SAM' in propagator.__name__:
        del label_propagator.sam_mask_predictor
        del label_propagator.sam_model
        torch.cuda.empty_cache()

    del label_propagator
    torch.cuda.empty_cache()
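
When a patchifier is enabled in the pipeline above, it is configured with 256x256 patches and a step size of 256 on images resized to `image_size`. The sketch below estimates how many patches per image that yields. It assumes a regular sliding-window grid with no padding, which may differ from what `SimplePatchifier` actually does; `count_patches` is a hypothetical helper:

```python
def count_patches(image_height, image_width, patch_height, patch_width, step_size):
    """Count patches on a regular grid, ignoring any partial patch at the borders."""
    rows = (image_height - patch_height) // step_size + 1
    cols = (image_width - patch_width) // step_size + 1
    return max(rows, 0) * max(cols, 0)

print(count_patches(1024, 1024, 256, 256, 256))
```

Under that assumption, a 1024x1024 image gives a 4x4 grid of non-overlapping patches, and halving the step size would increase the count because neighbouring patches then overlap.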

## Evaluation Metrics

### Evaluation Pipeline

In [None]:
# setup a dummy dataset pointing to the experiment save path
# if patchifier was used, then sample details as well as point labels
# need to be taken from patch dataset related variables
if patchifier:
    label_prop_dummy_dataset = [
        dict(
            sample_id=patches_dataset_full[idx]['sample_id'],
            dataset_name=patches_dataset_full[idx]['dataset_name'],
            region=patches_dataset_full[idx]['region'],
            image_path=os.path.join(RESULTS_UPLOAD_PATH, *(patches_dataset_full[idx]['image_path'].parts[-2:])),
            mask_path=os.path.join(RESULTS_UPLOAD_PATH, *(patches_dataset_full[idx]['mask_path'].parts[-2:])),
            point_labels_source=point_labels_source,
            point_labels_count=point_labels_count,
        ) for idx in range(len(patches_dataset_full))
    ]
    label_prop_dummy_dataset_df = pd.DataFrame(label_prop_dummy_dataset)
    label_prop_dummy_point_labels_df = \
        pd.concat([patches_dataset_full[idx]['point_labels'] for idx in range(len(patches_dataset_full))], ignore_index=True)
    logger.info(f"# of samples (patches): {label_prop_dummy_dataset_df.shape[0]}")
else:
    label_prop_dummy_dataset_df = dataset_full_df.copy()
    label_prop_dummy_point_labels_df = pd.DataFrame({})
    logger.info(f"# of samples (full images): {label_prop_dummy_dataset_df.shape[0]}")

In [None]:
# setup a dummy dataset pointing to the experiment save path
label_prop_data_dummy = \
    LabelPropDataLoader(
        experiment_name=experiment_name,
        dataset=label_prop_dummy_dataset_df.copy(),
        point_labels=label_prop_dummy_point_labels_df,
        num_classes=3,
        working_folder=Path(RESULTS_UPLOAD_PATH),
        execution_tag=execution_tag,
        transforms=None,
        )

# setup evaluator
label_prop_evaluator = \
    LabelPropEvaluator(
        experiment_name=experiment_name,
        execution_tag=execution_tag,
        num_classes=3,
        hyper_params=hyper_params,
    )

# run evaluator on the dummy dataset
eval_results_samples = label_prop_evaluator.evaluate_samples(label_prop_data_dummy)
eval_results_summary = label_prop_evaluator.generate_summary()

# plot confusion matrix
label_prop_evaluator.plot_confusion_matrix()

# show overall results
eval_results_summary
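
For readers unfamiliar with the reported metrics, the sketch below computes per-class IoU and pixel accuracy (PA) from a confusion matrix using the standard definitions: IoU = TP / (TP + FP + FN) and PA = TP / (TP + FN). Whether `LabelPropEvaluator` uses exactly these definitions is an assumption, and `per_class_metrics` and the toy matrix are illustrative only:

```python
def per_class_metrics(confusion):
    """Compute IoU and pixel accuracy per class from a square confusion matrix.

    confusion[t][p] is the number of pixels of true class t predicted as class p.
    """
    num_classes = len(confusion)
    metrics = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                                  # class c pixels predicted as another class
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp   # other pixels predicted as class c
        iou = tp / (tp + fp + fn) if (tp + fp + fn) else float('nan')
        pa = tp / (tp + fn) if (tp + fn) else float('nan')
        metrics.append({'class': c, 'iou': iou, 'pa': pa})
    return metrics

# toy 3-class example (rows: true class, columns: predicted class)
toy_confusion = [
    [80, 10, 10],
    [5, 40, 5],
    [0, 10, 40],
]
for row in per_class_metrics(toy_confusion):
    print(f"class {row['class']}: IoU = {row['iou']:.2%}, PA = {row['pa']:.2%}")
```

IoU penalizes both missed pixels and false alarms, so it is never higher than PA for the same class.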

### Save Evaluation Metrics

In [None]:
label_prop_evaluator.eval_results_summary.to_csv(
    os.path.join(RESULTS_UPLOAD_PATH, f"{experiment_name}-Results-Summary-{execution_tag}.csv"),
    index=False,
    )

In [None]:
label_prop_evaluator.eval_results_samples.to_csv(
    os.path.join(RESULTS_UPLOAD_PATH, f"{experiment_name}-Results-Samples-{execution_tag}.csv"),
    index=False,
    )

## Visualize Results

### Select Samples

In [None]:
sample_ids = [
    '32011066301',
    '12025020201',
    '20046281801',
    '20220914_AnB_CB14 (219)',
    'E19_T2_C11_Corr_22sep22',
    'G0088441',
    ]
# sample_ids = ['17001652802', '12025020201', '24047242502']

if patchifier:
    sample_ids = label_prop_dummy_point_labels_df.query(f"sample_id.isin({sample_ids})").quadratid.unique().tolist()

### Plot Results

In [None]:
for sample_id in sample_ids:
    index = label_prop_dummy_dataset_df.query(f"sample_id == '{sample_id}'").index.item()
    manifest_index = label_prop_dummy_dataset_df.query(f"sample_id == '{sample_id}'").manifest_index.item()
    region = label_prop_dummy_dataset_df.query(f"sample_id == '{sample_id}'").dataset_name.item()
    point_labels_source = label_prop_dummy_dataset_df.query(f"sample_id == '{sample_id}'").point_labels_source.item()
    point_labels_count = label_prop_dummy_dataset_df.query(f"sample_id == '{sample_id}'").point_labels_count.item()

    if label_prop_dummy_point_labels_df.shape[0] == 0:
        if point_labels_source == 'GRID':
            point_labels = \
                label_prop_data_dummy.get_evenly_spaced_point_labels(
                    mask_path=str(Path(label_prop_data_dummy[index]['mask_path']).resolve()),
                    num_point_labels=point_labels_count,
                    )
        else:
            point_labels = \
                label_prop_data_dummy.get_random_point_labels(
                    mask_path=str(Path(label_prop_data_dummy[index]['mask_path']).resolve()),
                    num_point_labels=point_labels_count,
                    random_seed=manifest_index,
                    )
        point_labels_df = pd.DataFrame(point_labels)
        point_labels_df['quadratid'] = sample_id
    else:
        point_labels_df = label_prop_dummy_point_labels_df.copy()

    evaluation_metrics = \
        label_prop_evaluator.eval_results_samples.query(f"sample_id == '{sample_id}'")

    print(f"Sample ID: {sample_id}, Region: {region}")
    print(f"IoU - Hard Coral = {evaluation_metrics.iou_class_1.item():.2%}, IoU - Soft Coral = {evaluation_metrics.iou_class_2.item():.2%}")
    print(f"PA - Hard Coral = {evaluation_metrics.pa_class_1.item():.2%}, PA - Soft Coral = {evaluation_metrics.pa_class_2.item():.2%}")

    visualize_output(
        experiment_name=label_prop_data_dummy[index]['experiment_name'],
        sample_id=label_prop_data_dummy[index]['sample_id'],
        image_path=Path(label_prop_data_dummy[index]['image_path']),
        mask_path=Path(label_prop_data_dummy[index]['mask_path']),
        prediction_path=Path(label_prop_data_dummy[index]['prediction_path']),
        point_labels=point_labels_df,
        )

    print()
    print()
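
The plotting loop above passes `random_seed=manifest_index` so that the random point labels drawn for a sample can be regenerated identically for visualization. The sketch below shows the idea with the standard library on a toy mask. It is not the `labelmate` implementation, and the `row`, `column` and `class_label` keys are assumed names:

```python
import random

def random_point_labels(mask, num_point_labels, random_seed):
    """Sample distinct pixels from a 2D mask (list of rows) with a fixed seed."""
    height, width = len(mask), len(mask[0])
    rng = random.Random(random_seed)   # private generator: global random state is untouched
    count = min(num_point_labels, height * width)
    flat_indices = rng.sample(range(height * width), count)
    return [
        {'row': idx // width, 'column': idx % width, 'class_label': mask[idx // width][idx % width]}
        for idx in flat_indices
    ]

toy_mask = [[0, 0, 1, 1], [0, 2, 2, 1], [0, 0, 2, 1]]
first = random_point_labels(toy_mask, num_point_labels=5, random_seed=42)
second = random_point_labels(toy_mask, num_point_labels=5, random_seed=42)
print(first == second)
```

Seeding per sample, rather than once globally, keeps each sample's points stable even when the set or order of processed samples changes.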

## YOLOv8 Comparison

### Select Samples

In [None]:
sample_ids = [
    '32011066301',
    '12025020201',
    '20046281801',
    '20220914_AnB_CB14 (219)',
    'E19_T2_C11_Corr_22sep22',
    'G0088441',
    ]
# sample_ids = ['17001652802', '12025020201', '24047242502']
sample_ids = ['20010148701', '17039238402', '10001026902']

if patchifier:
    sample_ids = label_prop_dummy_point_labels_df.query(f"sample_id.isin({sample_ids})").quadratid.unique().tolist()

In [None]:
# setup a dummy dataset pointing to the experiment save path
label_prop_data_dummy = \
    LabelPropDataLoader(
        experiment_name=experiment_name,
        dataset=label_prop_dummy_dataset_df.copy(),
        point_labels=label_prop_dummy_point_labels_df,
        num_classes=3,
        working_folder=Path(RESULTS_UPLOAD_PATH),
        execution_tag=execution_tag,
        transforms=None,
        )

### Plot Results

In [None]:
for sample_id in sample_ids:
    index = label_prop_dummy_dataset_df.query(f"sample_id == '{sample_id}'").index.item()
    manifest_index = label_prop_dummy_dataset_df.query(f"sample_id == '{sample_id}'").manifest_index.item()
    region = label_prop_dummy_dataset_df.query(f"sample_id == '{sample_id}'").dataset_name.item()
    point_labels_source = label_prop_dummy_dataset_df.query(f"sample_id == '{sample_id}'").point_labels_source.item()
    point_labels_count = label_prop_dummy_dataset_df.query(f"sample_id == '{sample_id}'").point_labels_count.item()

    if label_prop_dummy_point_labels_df.shape[0] == 0:
        if point_labels_source == 'GRID':
            point_labels = \
                label_prop_data_dummy.get_evenly_spaced_point_labels(
                    mask_path=str(Path(label_prop_data_dummy[index]['mask_path']).resolve()),
                    num_point_labels=point_labels_count,
                    )
        else:
            point_labels = \
                label_prop_data_dummy.get_random_point_labels(
                    mask_path=str(Path(label_prop_data_dummy[index]['mask_path']).resolve()),
                    num_point_labels=point_labels_count,
                    random_seed=manifest_index,
                    )
        point_labels_df = pd.DataFrame(point_labels)
        point_labels_df['quadratid'] = sample_id
    else:
        point_labels_df = label_prop_dummy_point_labels_df.copy()

    evaluation_metrics = \
        label_prop_evaluator.eval_results_samples.query(f"sample_id == '{sample_id}'")

    print(f"Sample ID: {sample_id}, Region: {region}")
    print(f"IoU - Hard Coral = {evaluation_metrics.iou_class_1.item():.2%}, IoU - Soft Coral = {evaluation_metrics.iou_class_2.item():.2%}")
    print(f"PA - Hard Coral = {evaluation_metrics.pa_class_1.item():.2%}, PA - Soft Coral = {evaluation_metrics.pa_class_2.item():.2%}")

    visualize_output(
        experiment_name=label_prop_data_dummy[index]['experiment_name'],
        sample_id=label_prop_data_dummy[index]['sample_id'],
        image_path=Path(label_prop_data_dummy[index]['image_path']),
        mask_path=Path(label_prop_data_dummy[index]['mask_path']),
        prediction_path=Path(label_prop_data_dummy[index]['prediction_path']),
        point_labels=point_labels_df,
        )

    print()
    print()

In [None]:
quadrat_ids = ['20010148701', '17039238402', '10001026902']

In [None]:
dataset_df = create_labelmate_dataset(quadrat_ids, manifest_df, point_labels_source, point_labels_count)

In [None]:
label_prop_data = \
    LabelPropDataLoader(
        experiment_name=f"{experiment_name}-{folder}",
        dataset=dataset_df.copy(),
        point_labels=point_labels_df,
        num_classes=3,
        execution_tag='C0',
        working_folder=Path(RESULTS_UPLOAD_PATH),
        transforms=get_resize_transform(image_size=image_size),
        )

In [None]:
index = 1
visualize_output(
    experiment_name=label_prop_data[index]['experiment_name'],
    sample_id=label_prop_data[index]['sample_id'],
    image_path=Path(label_prop_data[index]['image_path']),
    mask_path=Path(label_prop_data[index]['mask_path']),
    prediction_path=Path(label_prop_data[index]['prediction_path']),
    point_labels=point_labels_df,
    )

In [None]:
os.path.exists(Path(label_prop_data[0]['prediction_path']))

In [None]:
Path(label_prop_data[2]['prediction_path'])

In [None]:
os.path.exists('/content/drive/MyDrive/20231114-FruitPunch-AI-for-Coral-Reefs-2/findings/SAM-B1-V2/predictions/10001026902-C0.png')