# Analyzer Evaluation Demonstration Notebook

This notebook demonstrates how to use the functions in `source_analyzer_evaluation.py` to evaluate three analyzers:

- **AISAnalyzer**: run on the ground truth dataset `vessel_groundtruth`.
- **InfrastructureAnalyzer**: run on the ground truth dataset `infrastructure_groundtruth`.
- **DarkAnalyzer**: run on the ground truth dataset `dark_vessel_groundtruth`.

For AISAnalyzer and InfrastructureAnalyzer, we label results using the source name (`st_name`). For DarkAnalyzer, we label results based on the spatial distance (using a 0.005° threshold) to the ground truth dark vessel location.


In [None]:
import os
import sys
from dotenv import load_dotenv

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from IPython.display import clear_output

load_dotenv()

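# Read the required paths from the environment (.env) and fail early if any is missing.
download_path = os.getenv("ASA_DOWNLOAD_PATH")
git_path = os.getenv("GIT_FOLDER")
cv3_path = os.getenv("CV3_FOLDER")
for name, value in [("ASA_DOWNLOAD_PATH", download_path), ("GIT_FOLDER", git_path), ("CV3_FOLDER", cv3_path)]:
    if not value:
        raise EnvironmentError(f"Environment variable {name} is not set")

# Ensure the download folder exists.
os.makedirs(download_path, exist_ok=True)

# Make both repositories importable.
sys.path.append(git_path)
sys.path.append(cv3_path)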

In [None]:
from cerulean_cloud.cloud_function_asa.utils.analyzer import (
    AISAnalyzer,
    InfrastructureAnalyzer,
    DarkAnalyzer,
)

In [None]:
# Import the evaluation helpers from source_analyzer_evaluation.py.
from source_analyzer_evaluation import (
    label_dark_vessel_results_with_distance,
    label_results_with_st_name,
    apply_labeling,
    calculate_metrics,
    plot_metrics,
    process_groundtruth_on_analyzer
)

## Load Groundtruth Datasets

Load the dark vessel dataset, the dark vessel SAR detections, the HITL (human-in-the-loop) verifications for vessels and infrastructure, and the GFW infrastructure data. Be sure to change `eval_folder` to point to where these datasets are stored.

In [None]:
eval_folder = os.path.join(cv3_path, "asa_analysis", "evaluation")
refined_dark_vessel_dataset = os.path.join(eval_folder, 'refined_dark_vessel_dataset.csv')
sar_detections_hitl_dark_ds = os.path.join(eval_folder, 'sar_detections_hitl_dark_ds.csv')
slick_to_source_dump_2024_12_31 = os.path.join(eval_folder, 'slick_to_source dump 2024-12-31.csv')
nonoise_SAR_Fixed_Infrastructure = os.path.join(eval_folder, 'nonoise_SAR_Fixed_Infrastructure.csv')
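
Before loading, it can help to confirm that `eval_folder` actually contains the four files named above. The following is a minimal sketch using only the standard library; `find_missing_files` is a hypothetical helper (it is not part of `source_analyzer_evaluation.py`), and the temporary folder exists only so the example runs on its own.

```python
import os
import tempfile


def find_missing_files(folder, filenames):
    """Return the filenames that are not present in `folder` (hypothetical helper)."""
    return [name for name in filenames if not os.path.isfile(os.path.join(folder, name))]


# File names taken from the cell above.
expected_files = [
    "refined_dark_vessel_dataset.csv",
    "sar_detections_hitl_dark_ds.csv",
    "slick_to_source dump 2024-12-31.csv",
    "nonoise_SAR_Fixed_Infrastructure.csv",
]

# Demonstration on a throwaway folder that contains only the first file.
with tempfile.TemporaryDirectory() as demo_folder:
    open(os.path.join(demo_folder, expected_files[0]), "w").close()
    missing = find_missing_files(demo_folder, expected_files)
```

In the notebook itself, the same check would be called as `find_missing_files(eval_folder, expected_files)`; an empty list means every dataset was found.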

The loaded data may require some post-processing to prepare it for evaluation.

In [None]:
dark_vess_df = pd.read_csv(refined_dark_vessel_dataset)
sar_detections = pd.read_csv(sar_detections_hitl_dark_ds)

dark_vessel_groundtruth = gpd.GeoDataFrame(
    dark_vess_df, 
    geometry=gpd.points_from_xy(dark_vess_df['lon'], dark_vess_df['lat']),
    crs="EPSG:4326"
)

sar_detections_gdf = gpd.GeoDataFrame(
    sar_detections, 
    geometry=gpd.points_from_xy(sar_detections['detect_lon'], sar_detections['detect_lat']),
    crs="EPSG:4326"
)
sar_detections_gdf = sar_detections_gdf[sar_detections_gdf['structure_id'].isna()]
sar_detections_gdf = sar_detections_gdf.reset_index()

hitl_df = pd.read_csv(slick_to_source_dump_2024_12_31)

infrastructure_groundtruth = hitl_df[(hitl_df['type'] == 2) & (hitl_df['hitl_verification'] == True)]
vessel_groundtruth = hitl_df[(hitl_df['type'] == 1) & (hitl_df['hitl_verification'] == True)]

if "slick" in vessel_groundtruth.columns:
    vessel_groundtruth = vessel_groundtruth.rename(columns={"slick": "slick_id"})
if "slick" in infrastructure_groundtruth.columns:
    infrastructure_groundtruth = infrastructure_groundtruth.rename(columns={"slick": "slick_id"})

df = pd.read_csv(nonoise_SAR_Fixed_Infrastructure)
gfw_gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df['lon'], df['lat']),
    crs="EPSG:4326"
)



Limit the number of slicks processed for testing.

In [None]:
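# Keep only the rows at positions 1-10 (row 0 is skipped) to shorten the test run.
vessel_groundtruth = vessel_groundtruth.iloc[1:11]
# Uncomment to restrict the other datasets to a single row.
# infrastructure_groundtruth = infrastructure_groundtruth.iloc[[0]]
# dark_vessel_groundtruth = dark_vessel_groundtruth.iloc[[0]]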

## Process Groundtruth with Each Analyzer

We use the `process_groundtruth_on_analyzer` function to run each analyzer over its respective groundtruth.
This function loops over each groundtruth row, downloads the associated slick and scene GeoJSON, and then computes
coincidence scores using the given analyzer class.
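
The loop described above might look roughly like the sketch below. It is not the implementation in `source_analyzer_evaluation.py`: `fetch_slick_and_scene`, `StubAnalyzer`, its `compute_coincidence_scores` method, and the `coincidence_score` column are hypothetical stand-ins chosen so the example runs without network access or the `cerulean_cloud` package.

```python
import pandas as pd


def fetch_slick_and_scene(slick_id):
    """Hypothetical stand-in for the download step; returns dummy GeoJSON-like dicts."""
    slick = {"type": "FeatureCollection", "features": [], "slick_id": slick_id}
    scene = {"type": "FeatureCollection", "features": []}
    return slick, scene


class StubAnalyzer:
    """Hypothetical analyzer with an assumed interface (not the cerulean_cloud API)."""

    def __init__(self, scene, **params):
        self.scene = scene
        self.params = params

    def compute_coincidence_scores(self, slick):
        # One row per candidate source, with made-up scores.
        return pd.DataFrame({"st_name": [101, 102], "coincidence_score": [0.8, 0.3]})


def run_analyzer_over_groundtruth(analyzer_cls, groundtruth, analyzer_params=None):
    """Loop over ground truth rows and collect the candidate scores for each slick."""
    frames = []
    for _, row in groundtruth.iterrows():
        slick, scene = fetch_slick_and_scene(row["slick_id"])
        analyzer = analyzer_cls(scene, **(analyzer_params or {}))
        scores = analyzer.compute_coincidence_scores(slick)
        scores["slick_id"] = row["slick_id"]  # remember which slick produced these rows
        frames.append(scores)
    if not frames:
        return pd.DataFrame(columns=["st_name", "coincidence_score", "slick_id"])
    return pd.concat(frames, ignore_index=True)


demo_groundtruth = pd.DataFrame({"slick_id": [1, 2], "st_name": [101, 999]})
demo_results = run_analyzer_over_groundtruth(StubAnalyzer, demo_groundtruth)
```

Collecting one frame per slick and concatenating at the end keeps every candidate row tagged with its `slick_id`, which is what the later labeling and metric steps need to group on.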


In [None]:
# Process InfrastructureAnalyzer on infrastructure_groundtruth.
results_infra = process_groundtruth_on_analyzer(
    InfrastructureAnalyzer, infrastructure_groundtruth, points_gdf=gfw_gdf, analyzer_params={}
)

In [None]:
# Process DarkAnalyzer on dark_vessel_groundtruth.
results_dark = process_groundtruth_on_analyzer(
    DarkAnalyzer, dark_vessel_groundtruth, points_gdf=sar_detections_gdf, analyzer_params={}
)

In [None]:
# Process AISAnalyzer on vessel_groundtruth (no points_gdf is passed for this analyzer).
results_vessel = process_groundtruth_on_analyzer(
    AISAnalyzer, vessel_groundtruth, analyzer_params={}
)

## Label the Results

For the AIS and Infrastructure analyzers, we label detections by `st_name` using `label_results_with_st_name`. Both the results and the groundtruth dataframes must store the source identifier in an `st_name` column of type `int`; otherwise matching identifiers may fail to compare as equal.
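
As an illustration of name-based labeling, the sketch below marks a candidate as a true source when its `st_name` equals the ground truth `st_name` for the same slick. This is an assumed reading of what `label_results_with_st_name` does, not its actual code; `label_by_source_name` and the `is_true_source` and `coincidence_score` columns are hypothetical names.

```python
import pandas as pd


def label_by_source_name(results, groundtruth):
    """Flag candidate rows whose st_name matches the ground truth for the same slick."""
    truth = groundtruth[["slick_id", "st_name"]].rename(columns={"st_name": "truth_st_name"})
    # A left merge keeps every candidate row, in its original order.
    merged = results.merge(truth, on="slick_id", how="left")
    merged["is_true_source"] = merged["st_name"] == merged["truth_st_name"]
    return merged.drop(columns="truth_st_name")


demo_results = pd.DataFrame(
    {
        "slick_id": [1, 1, 2],
        "st_name": [101, 102, 201],
        "coincidence_score": [0.9, 0.4, 0.7],
    }
)
demo_truth = pd.DataFrame({"slick_id": [1, 2], "st_name": [101, 999]})
labeled = label_by_source_name(demo_results, demo_truth)
```

Here only the first candidate of slick 1 is flagged, because slick 2's ground truth source (999) is not among its candidates.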

For the DarkAnalyzer, we label based on distance using `label_dark_vessel_results_with_distance`.
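
The distance test can be sketched as follows, using the 0.005° threshold mentioned in the introduction. The details are assumptions: the sketch uses planar Euclidean distance in degrees and an inclusive comparison, whereas `label_dark_vessel_results_with_distance` may compute distance differently; `is_within_threshold` is a hypothetical name.

```python
import math

DISTANCE_THRESHOLD_DEG = 0.005  # threshold stated in this notebook


def is_within_threshold(candidate_lon, candidate_lat, truth_lon, truth_lat,
                        threshold_deg=DISTANCE_THRESHOLD_DEG):
    """Return True when the candidate lies within threshold_deg of the ground truth point.

    Assumption: planar Euclidean distance in degrees (EPSG:4326 coordinates),
    which ignores the shrinking of longitude degrees away from the equator.
    """
    distance_deg = math.hypot(candidate_lon - truth_lon, candidate_lat - truth_lat)
    return distance_deg <= threshold_deg


truth = (12.3400, 45.6700)  # made-up (lon, lat) of a ground truth dark vessel
near = is_within_threshold(12.3420, 45.6710, *truth)  # offset of about 0.0022 degrees
far = is_within_threshold(12.3500, 45.6700, *truth)   # offset of 0.01 degrees
```

For scale, 0.005° of latitude is roughly 550 m, so only detections very close to the recorded vessel position would count as matches under this assumption.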


In [None]:
# Post-process outputs if necessary to align with the groundtruth.
results_infra['st_name'] = results_infra['structure_id']
results_vessel['st_name'] = results_vessel['st_name'].astype(int)

In [None]:
results_vessel_labeled = apply_labeling(results_vessel, vessel_groundtruth, label_results_with_st_name)
results_infra_labeled = apply_labeling(results_infra, infrastructure_groundtruth, label_results_with_st_name)
# Label dark vessel results based on distance.
results_dark_labeled = apply_labeling(results_dark, dark_vessel_groundtruth, label_dark_vessel_results_with_distance)

## Calculate and Display Metrics

We now compute evaluation metrics (e.g., top-1 and top-3 source rates, average coincidence scores, and the ratio of true to false coincidence scores)
for each method.
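
To make these metrics concrete, the sketch below computes a top-k rate and a true-to-false score ratio on a small made-up table. It is one plausible definition, not necessarily the one used by `calculate_metrics`; the function names and the `slick_id`, `coincidence_score`, and `is_true_source` columns are assumptions.

```python
import pandas as pd


def top_k_rate(labeled, k):
    """Fraction of slicks whose true source is among the k highest-scoring candidates."""
    ranked = labeled.sort_values("coincidence_score", ascending=False)
    top_k = ranked.groupby("slick_id").head(k)
    hits = top_k.groupby("slick_id")["is_true_source"].any()
    return hits.mean()


def true_to_false_score_ratio(labeled):
    """Mean score of true sources divided by the mean score of false candidates."""
    is_true = labeled["is_true_source"]
    true_mean = labeled.loc[is_true, "coincidence_score"].mean()
    false_mean = labeled.loc[~is_true, "coincidence_score"].mean()
    return true_mean / false_mean


# Made-up labeled results: slick 3 has no true source among its candidates.
demo_labeled = pd.DataFrame(
    {
        "slick_id": [1, 1, 2, 2, 2, 3, 3],
        "coincidence_score": [0.9, 0.4, 0.7, 0.6, 0.5, 0.8, 0.2],
        "is_true_source": [True, False, False, False, True, False, False],
    }
)

top1 = top_k_rate(demo_labeled, 1)   # only slick 1 has its true source ranked first
top3 = top_k_rate(demo_labeled, 3)   # slicks 1 and 2 have it within the top three
ratio = true_to_false_score_ratio(demo_labeled)
```

Note that a slick with no candidate rows at all would drop out of the `groupby` and therefore out of the denominator; whether that is the desired behavior depends on how the real metrics are defined.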

In [None]:
all_results = {
    "AISAnalyzer": results_vessel_labeled,
    "InfrastructureAnalyzer": results_infra_labeled,
    "DarkAnalyzer": results_dark_labeled
}

metrics_df = calculate_metrics(all_results)
print("Evaluation Metrics:")
plot_metrics(metrics_df)