# Trust Weight Tuning
This notebook captures how we derive per-adapter trust weights before using them in other ensemble experiments. Workflow:
1. Load the adapters/datasets exactly as in `m2_ensemble_merging_v2`.
2. Evaluate each adapter individually to get `map_50`/`map_50_95` baselines.
3. Convert the `map_50` scores into an initial weight vector (normalized to sum to 1).
4. Run fused evaluations with those weights (and optional score normalization) using `evaluate_with_weights`.
5. Iterate on the weights and normalization settings, logging the metrics of each run, until we settle on a configuration to reuse elsewhere.
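Every candidate weight vector in the runs below is written by hand and should contain one non-negative weight per adapter, summing to 1. A small guard like the following catches typos before a ~20 s evaluation is spent on them. It is a minimal sketch: `check_weights` is a hypothetical helper, not part of `ml_carbucks`, and renormalizing (rather than rejecting) a vector that does not sum to 1 is a design choice.

```python
import math
from typing import Sequence


def check_weights(weights: Sequence[float], n_adapters: int, tol: float = 1e-6) -> list[float]:
    """Hypothetical helper: validate a hand-written trust-weight vector.

    Enforces one finite, non-negative weight per adapter and renormalizes
    the vector to sum to 1 if it does not already.
    """
    if len(weights) != n_adapters:
        raise ValueError(f"expected {n_adapters} weights, got {len(weights)}")
    if any(w < 0 or not math.isfinite(w) for w in weights):
        raise ValueError(f"weights must be finite and non-negative: {list(weights)}")
    total = sum(weights)
    if total <= tol:
        raise ValueError("weights must not all be zero")
    if abs(total - 1.0) > tol:
        # Renormalize instead of passing an unnormalized vector to the fusion step.
        return [w / total for w in weights]
    return list(weights)
```

For example, `check_weights([0.1, 0.7, 0.2], len(adapters))` returns the vector unchanged, while a vector of the wrong length raises before any evaluation runs.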

In [2]:
from pathlib import Path
from typing import Any, Dict, List, Tuple
from dataclasses import dataclass

from torchmetrics.detection.mean_ap import MeanAveragePrecision
from tqdm import tqdm

from ml_carbucks import DATA_DIR
from ml_carbucks.adapters.EfficientDetAdapter import EfficientDetAdapter
from ml_carbucks.adapters.FasterRcnnAdapter import FasterRcnnAdapter
from ml_carbucks.adapters.UltralyticsAdapter import RtdetrUltralyticsAdapter, YoloUltralyticsAdapter
from ml_carbucks.utils.logger import setup_logger
from ml_carbucks.utils.preprocessing import create_clean_loader
from ml_carbucks.utils.postprocessing import postprocess_evaluation_results
from ml_carbucks.adapters.BaseDetectionAdapter import BaseDetectionAdapter

logger = setup_logger("adapter_fusion")


classes = ["scratch", "dent", "crack"]

adapters = [
    YoloUltralyticsAdapter(
        classes=classes,
        **{
            "img_size": 384,
            "batch_size": 32,
            "epochs": 27,
            "lr": 0.0015465639515144544,
            "momentum": 0.3628781599889685,
            "weight_decay": 0.0013127041660177367,
            "optimizer": "NAdam",
            "verbose": False,
        },
        weights="/home/bachelor/ml-carbucks/results/ensemble_demos/trial_4_YoloUltralyticsAdaptermodel.pt",
    ),
    RtdetrUltralyticsAdapter(
        classes=classes,
        **{
            "img_size": 384,
            "batch_size": 16,
            "epochs": 10,
            "lr": 0.0001141043015859849,
            "momentum": 0.424704619626319,
            "weight_decay": 0.00012292547851740234,
            "optimizer": "AdamW",
        },
        weights="/home/bachelor/ml-carbucks/results/ensemble_demos/trial_4_RtdetrUltralyticsAdaptermodel.pt",
    ),
    # FasterRcnnAdapter is skipped in this ensemble because it scored lower
    # than the other adapters.

    EfficientDetAdapter(
        classes=classes,
        **{
            "img_size": 384,
            "batch_size": 8,
            "epochs": 26,
            "optimizer": "momentum",
            "lr": 0.003459928723120903,
            "weight_decay": 0.0001302610542371722,
        },
        weights="/home/bachelor/ml-carbucks/results/ensemble_demos/trial_4_EfficientDetAdaptermodel.pth",
    ),
]

train_datasets = [
    (
        DATA_DIR / "car_dd" / "images" / "train",
        DATA_DIR / "car_dd" / "annotations" / "instances_train_curated.json",
    )
]

val_datasets: List[Tuple[str | Path, str | Path]] = [
    (
        DATA_DIR / "car_dd" / "images" / "val",
        DATA_DIR / "car_dd" / "annotations" / "instances_val_curated.json",
    )
]

In [3]:
from ml_carbucks.ensemble.EnsembleModel import EnsembleModel

ensemble_model = EnsembleModel(
    classes=classes,
    adapters=adapters,
    fusion_strategy="nms",         # or "wbf"
    fusion_conf_threshold=0.25,
    fusion_iou_threshold=0.55,
    fusion_max_detections=300,
    loader_batch_size=8,
).setup()

per_adapter_metrics = ensemble_model.evaluate_adapters_by_predict_from_dataset(val_datasets)

print("Per Adapter Metrics: ", per_adapter_metrics)

adap_preds, gts, per_adapter_preds = ensemble_model.predict_from_datasets(val_datasets)

loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
INFO ml_carbucks.ensemble.EnsembleModel 18:08:43 | Collecting adapter predictions...


Ensemble loader: 100%|██████████| 102/102 [00:23<00:00,  4.43it/s]

INFO ml_carbucks.ensemble.EnsembleModel 18:09:06 | Finished collecting adapter predictions.
INFO ml_carbucks.ensemble.EnsembleModel 18:09:06 | Evaluating adapter predictions: YoloUltralyticsAdapter
INFO ml_carbucks.ensemble.EnsembleModel 18:09:06 | YoloUltralyticsAdapter metrics -> map_50: 0.311 | map_75: 0.153 | map_50_95: 0.165
INFO ml_carbucks.ensemble.EnsembleModel 18:09:06 | Evaluating adapter predictions: RtdetrUltralyticsAdapter
INFO ml_carbucks.ensemble.EnsembleModel 18:09:06 | RtdetrUltralyticsAdapter metrics -> map_50: 0.449 | map_75: 0.243 | map_50_95: 0.251
INFO ml_carbucks.ensemble.EnsembleModel 18:09:06 | Evaluating adapter predictions: EfficientDetAdapter
INFO ml_carbucks.ensemble.EnsembleModel 18:09:06 | EfficientDetAdapter metrics -> map_50: 0.356 | map_75: 0.137 | map_50_95: 0.166
Per Adapter Metrics:  [{'map_50': 0.3114479184150696, 'map_50_95': 0.1649264544248581, 'map_75': 0.1527780294418335, 'classes': [1, 2, 3]}, {'map_50': 0.44884592294692993, 'map_50_95': 0.2507704794406891, 'map_75': 0.24307547509670258, 'classes': [1, 2, 3]}, {'map_50': 0.3560166358947754, 'map_50_95': 0.16611579060554504, 'map_75': 0.13654088973999023, 'classes': [1, 2, 3]}]
loading annotations into memory...
Ensemble loader: 100%|██████████| 102/102 [00:22<00:00,  4.60it/s]

INFO ml_carbucks.ensemble.EnsembleModel 18:09:29 | Finished collecting adapter predictions.
INFO ml_carbucks.ensemble.merging 18:09:29 | Applying NMS fusion strategy...
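`EnsembleModel` is configured with `fusion_strategy="nms"`, a confidence threshold of 0.25, an IoU threshold of 0.55 and at most 300 detections, but its fusion code is not shown in this notebook. The sketch below is one plausible reading of trust-weighted NMS, under stated assumptions: each adapter's scores are multiplied by its trust weight, the pooled boxes are filtered by the confidence threshold, and greedy NMS runs separately per class. `weighted_nms_fusion` and `iou_one_to_many` are hypothetical names, and applying the weights before the threshold is an assumption.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one [x1, y1, x2, y2] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / np.maximum(area + areas - inter, 1e-9)


def weighted_nms_fusion(per_adapter, trust_weights, conf_thr=0.25, iou_thr=0.55, max_det=300):
    """Hypothetical trust-weighted NMS over pooled detections from several adapters.

    per_adapter: one (boxes (N, 4), scores (N,), labels (N,)) tuple per adapter,
    in the same order as trust_weights. Assumption: weighting happens before thresholding.
    """
    boxes = np.concatenate([np.asarray(b, dtype=float).reshape(-1, 4) for b, _, _ in per_adapter])
    scores = np.concatenate(
        [np.asarray(s, dtype=float) * w for (_, s, _), w in zip(per_adapter, trust_weights)]
    )
    labels = np.concatenate([np.asarray(lab) for _, _, lab in per_adapter])

    # Drop low-confidence boxes after the trust weights have been applied.
    keep = scores >= conf_thr
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]

    kept = []
    for cls in np.unique(labels):  # boxes never suppress boxes of another class
        idx = np.where(labels == cls)[0]
        idx = idx[np.argsort(-scores[idx])]  # highest weighted score first
        while idx.size > 0:
            best, rest = idx[0], idx[1:]
            kept.append(best)
            # Keep only boxes that overlap the current best box less than iou_thr.
            idx = rest[iou_one_to_many(boxes[best], boxes[rest]) < iou_thr]

    kept = np.asarray(kept, dtype=int)
    kept = kept[np.argsort(-scores[kept])][:max_det]  # global cap, best first
    return boxes[kept], scores[kept], labels[kept]
```

One consequence worth checking against the real implementation: if the threshold is applied after weighting, as in this sketch, an adapter with weight 0.1 would need a pre-weighting score of at least 2.5 to clear 0.25, which is impossible for scores in [0, 1], so that adapter would be effectively switched off.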
In [4]:
import numpy as np

map50_values = np.array([metrics.get("map_50", 0.0) for metrics in per_adapter_metrics])
if np.all(map50_values == 0):
    initial_trust_weights = [1.0 / len(map50_values)] * len(map50_values)
else:
    initial_trust_weights = (map50_values / map50_values.sum()).tolist()

print("Initial trust weights (normalized map_50):", initial_trust_weights)


Initial trust weights (normalized map_50): [0.2789975770723058, 0.4020798264385419, 0.3189225964891522]
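The cell above makes each weight proportional to the adapter's `map_50`. The later runs score better when RT-DETR gets more weight than this proportional split gives it, so one hypothetical way to generate such vectors is to raise the scores to a power before normalizing. `weights_from_metric` and its `power` parameter are illustrative only, not part of the notebook's API; `power=1.0` reproduces the cell above.

```python
import numpy as np


def weights_from_metric(scores, power=1.0):
    """Hypothetical helper: map per-adapter metric scores to trust weights summing to 1.

    power=1.0 is plain proportional normalization (same as the cell above);
    power > 1 shifts weight towards the strongest adapter.
    """
    values = np.clip(np.asarray(scores, dtype=float), 0.0, None) ** power
    if values.sum() == 0:
        # Same fallback as above: equal weights when every score is zero.
        return [1.0 / len(values)] * len(values)
    return (values / values.sum()).tolist()
```

With the logged `map_50` values, `power=3` gives roughly `[0.18, 0.55, 0.27]`; whether that beats the hand-picked variants would need its own evaluation run.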


In [5]:
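def evaluate_with_weights(weights, apply_normalization=True, norm_method="minmax", note=""):
    """Run one fused evaluation on `val_datasets` with the given trust weights."""
    # These settings are stored on the shared ensemble_model, so they also affect
    # any later ensemble_model.evaluate() call made outside this helper.
    ensemble_model.fusion_apply_normalization = apply_normalization
    ensemble_model.fusion_trust_weights = weights
    ensemble_model.fusion_norm_method = norm_method
    # evaluate() re-collects predictions from every adapter before fusing them.
    metrics = ensemble_model.evaluate(datasets=val_datasets)
    print(f"Run: {note}")
    print(metrics)
    return metrics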

fused_metrics_initial = evaluate_with_weights(
    initial_trust_weights,
    apply_normalization=True,
    norm_method="minmax",
    note="Initial map_50-normalized weights",
)


loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
INFO ml_carbucks.ensemble.EnsembleModel 18:09:29 | Collecting adapter predictions...


Ensemble loader: 100%|██████████| 102/102 [00:22<00:00,  4.60it/s]

INFO ml_carbucks.ensemble.EnsembleModel 18:09:51 | Finished collecting adapter predictions.
INFO ml_carbucks.ensemble.merging 18:09:51 | Applying NMS fusion strategy...
Run: Initial map_50-normalized weights
{'map_50': 0.3177802860736847, 'map_50_95': 0.18673156201839447, 'map_75': 0.18854616582393646, 'classes': [1, 2, 3]}
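`evaluate_with_weights` passes `norm_method="minmax"` and `apply_normalization=True`, but the notebook does not show what these do inside `EnsembleModel`. Assuming min-max means rescaling each adapter's confidence scores to [0, 1] before its trust weight is applied, a minimal sketch looks like this. The function names are hypothetical, and both normalizing per adapter over whatever batch of scores it receives and mapping a zero-spread batch to 1.0 are assumptions.

```python
import numpy as np


def minmax_normalize(scores, eps=1e-9):
    """Rescale one adapter's confidence scores so the lowest maps to 0 and the highest to 1."""
    scores = np.asarray(scores, dtype=float)
    if scores.size == 0:
        return scores
    lo, hi = scores.min(), scores.max()
    if hi - lo < eps:
        # No spread to stretch (e.g. a single box): treat every score as maximal.
        return np.ones_like(scores)
    return (scores - lo) / (hi - lo)


def normalize_and_weight(per_adapter_scores, trust_weights, apply_normalization=True):
    """Optionally min-max normalize each adapter's scores, then multiply by its trust weight."""
    weighted = []
    for scores, weight in zip(per_adapter_scores, trust_weights):
        scores = minmax_normalize(scores) if apply_normalization else np.asarray(scores, dtype=float)
        weighted.append(scores * weight)
    return weighted
```

The trade-off: min-max puts all adapters on a common scale, but it maps each adapter's least confident box to 0 and its most confident box to 1 regardless of how well calibrated the raw scores were.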


In [6]:
def pretty_print_run(note, weights, metrics, apply_normalization, norm_method):
    print(f"{note}")
    print(f"  weights: {weights}")
    print(f"  normalization: {norm_method} (apply={apply_normalization})")
    print(
        f"  map_50={metrics['map_50']:.3f} | "
        f"map_50_95={metrics['map_50_95']:.3f} | "
        f"map_75={metrics['map_75']:.3f}"
    )
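`pretty_print_run` only prints, so the summary table at the end of this notebook has to be assembled by hand. A hypothetical `TrustRun` record could collect each run, render it in the same column layout, and pick the best run by a chosen metric. This is a sketch, not existing project code.

```python
from dataclasses import dataclass


@dataclass
class TrustRun:
    """Hypothetical record of one fused-evaluation run, matching the summary table's columns."""
    note: str
    weights: list
    norm_method: str
    apply_normalization: bool
    metrics: dict
    observation: str = ""

    def to_markdown_row(self) -> str:
        weights = ", ".join(f"{w:.3g}" for w in self.weights)  # 0.2789... -> 0.279
        return (
            f"| {self.note} | `[{weights}]` | {self.norm_method} (apply={self.apply_normalization}) | "
            f"{self.metrics['map_50']:.3f} | {self.metrics['map_50_95']:.3f} | "
            f"{self.metrics['map_75']:.3f} | {self.observation} |"
        )


def best_run(runs, key="map_50_95"):
    """Return the run with the highest value of the chosen metric."""
    return max(runs, key=lambda run: run.metrics[key])
```

Appending a `TrustRun` after each `evaluate_with_weights` call and printing `to_markdown_row()` would regenerate the table rows. Ranking by `map_50_95` follows the COCO convention of treating AP over IoU 0.50:0.95 as the primary metric, but any of the three keys works.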


In [7]:
candidate_weights = [0.5, 0.3, 0.2]
metrics = evaluate_with_weights(
    candidate_weights,
    apply_normalization=True,
    norm_method="minmax",
    note="Variant 1 – YOLO emphasis"
)
pretty_print_run("Variant 1 – YOLO emphasis", candidate_weights, metrics, True, "minmax")

loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
INFO ml_carbucks.ensemble.EnsembleModel 18:09:51 | Collecting adapter predictions...


Ensemble loader: 100%|██████████| 102/102 [00:22<00:00,  4.56it/s]

INFO ml_carbucks.ensemble.EnsembleModel 18:10:14 | Finished collecting adapter predictions.
INFO ml_carbucks.ensemble.merging 18:10:14 | Applying NMS fusion strategy...
Run: Variant 1 – YOLO emphasis
{'map_50': 0.22517482936382294, 'map_50_95': 0.1333712637424469, 'map_75': 0.1354820430278778, 'classes': [1, 2, 3]}
Variant 1 – YOLO emphasis
  weights: [0.5, 0.3, 0.2]
  normalization: minmax (apply=True)
  map_50=0.225 | map_50_95=0.133 | map_75=0.135


In [8]:
candidate_weights = [0.2, 0.5, 0.3]
metrics = evaluate_with_weights(
    candidate_weights,
    apply_normalization=True,
    norm_method="minmax",
    note="Variant 2 – RT-DETR emphasis"
)
pretty_print_run("Variant 2 – RT-DETR emphasis", candidate_weights, metrics, True, "minmax")

loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
INFO ml_carbucks.ensemble.EnsembleModel 18:11:17 | Collecting adapter predictions...


Ensemble loader: 100%|██████████| 102/102 [00:22<00:00,  4.52it/s]

INFO ml_carbucks.ensemble.EnsembleModel 18:11:40 | Finished collecting adapter predictions.
INFO ml_carbucks.ensemble.merging 18:11:40 | Applying NMS fusion strategy...
Run: Variant 2 – RT-DETR emphasis
{'map_50': 0.36686989665031433, 'map_50_95': 0.21141718327999115, 'map_75': 0.21115650236606598, 'classes': [1, 2, 3]}
Variant 2 – RT-DETR emphasis
  weights: [0.2, 0.5, 0.3]
  normalization: minmax (apply=True)
  map_50=0.367 | map_50_95=0.211 | map_75=0.211


In [9]:
candidate_weights = [0.1, 0.7, 0.2]
metrics = evaluate_with_weights(
    candidate_weights,
    apply_normalization=True,
    norm_method="minmax",
    note="Variant 3 – RT-DETR emphasis"
)
pretty_print_run("Variant 3 – RT-DETR emphasis", candidate_weights, metrics, True, "minmax")

loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
INFO ml_carbucks.ensemble.EnsembleModel 18:12:10 | Collecting adapter predictions...


Ensemble loader: 100%|██████████| 102/102 [00:22<00:00,  4.52it/s]

INFO ml_carbucks.ensemble.EnsembleModel 18:12:32 | Finished collecting adapter predictions.
INFO ml_carbucks.ensemble.merging 18:12:32 | Applying NMS fusion strategy...
Run: Variant 3 – RT-DETR emphasis
{'map_50': 0.40788909792900085, 'map_50_95': 0.2290259301662445, 'map_75': 0.22435036301612854, 'classes': [1, 2, 3]}
Variant 3 – RT-DETR emphasis
  weights: [0.1, 0.7, 0.2]
  normalization: minmax (apply=True)
  map_50=0.408 | map_50_95=0.229 | map_75=0.224


In [10]:
candidate_weights = [0.1, 0.5, 0.4]
metrics = evaluate_with_weights(
    candidate_weights,
    apply_normalization=True,
    norm_method="minmax",
    note="Variant 4 – RT-DETR and EffDet emphasis"
)
pretty_print_run("Variant 4 – RT-DETR and EffDet emphasis", candidate_weights, metrics, True, "minmax")

loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
INFO ml_carbucks.ensemble.EnsembleModel 18:13:04 | Collecting adapter predictions...


Ensemble loader: 100%|██████████| 102/102 [00:22<00:00,  4.51it/s]

INFO ml_carbucks.ensemble.EnsembleModel 18:13:27 | Finished collecting adapter predictions.
INFO ml_carbucks.ensemble.merging 18:13:27 | Applying NMS fusion strategy...
Run: Variant 4 – RT-DETR and EffDet emphasis
{'map_50': 0.3758591115474701, 'map_50_95': 0.21452584862709045, 'map_75': 0.21281582117080688, 'classes': [1, 2, 3]}
Variant 4 – RT-DETR and EffDet emphasis
  weights: [0.1, 0.5, 0.4]
  normalization: minmax (apply=True)
  map_50=0.376 | map_50_95=0.215 | map_75=0.213
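The variants above are hand-picked points. A more systematic alternative is to sweep a regular grid of weight vectors on the simplex (all entries non-negative, summing to 1). The generator below is a hypothetical helper; it counts in integer steps so that vectors like `[0.1, 0.7, 0.2]` come out exactly.

```python
from itertools import product


def simplex_grid(n_adapters, step=0.1, min_weight=0.0):
    """Hypothetical helper: yield every weight vector on a regular grid that sums to 1.

    Counts in integer units of `step` so entries such as 0.7 come out exactly.
    """
    units = round(1 / step)
    min_units = round(min_weight / step)
    # Choose the first n-1 weights freely; the last one is whatever is left over.
    for combo in product(range(min_units, units + 1), repeat=n_adapters - 1):
        last = units - sum(combo)
        if last >= min_units:
            yield [round(c * step, 10) for c in (*combo, last)]


# Example sweep (commented out: it needs the notebook's ensemble_model and data):
# for w in simplex_grid(3, step=0.1, min_weight=0.1):
#     evaluate_with_weights(w, note=f"grid {w}")
```

With `step=0.1` and `min_weight=0.1` there are 36 vectors for three adapters. In the logs each `evaluate_with_weights` call spends roughly 22 to 23 s re-collecting adapter predictions, so a full sweep would take on the order of 14 minutes unless the raw predictions are cached and only the fusion step is repeated. Choosing weights on the same validation split that is used to report metrics will also tend to overstate the result.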


### Trust Weight Experiments
| Note | Weights `[YOLO, RT-DETR, EfficientDet]` | Normalization | map_50 | map_50_95 | map_75 | Observation |
| --- | --- | --- | --- | --- | --- | --- |
| Initial map_50-normalized | `[0.279, 0.402, 0.319]` | minmax (apply=True) | 0.318 | 0.187 | 0.189 | Baseline fusion without manual tuning. |
| Variant 1 – boost YOLO | `[0.5, 0.3, 0.2]` | minmax (apply=True) | 0.225 | 0.133 | 0.135 | Overweighting YOLO, the weakest single adapter, drops all three metrics below the initial weights; its boxes likely suppress better ones from the other adapters. |
| Variant 2 – RT-DETR emphasis | `[0.2, 0.5, 0.3]` | minmax (apply=True) | 0.367 | 0.211 | 0.211 | Leaning into RT-DETR improves on the initial weights across all three metrics. |
| Variant 3 – RT-DETR heavy | `[0.1, 0.7, 0.2]` | minmax (apply=True) | **0.408** | **0.229** | **0.224** | Best fused run so far; RT-DETR dominates while YOLO contributes minimally. Still below RT-DETR alone (map_50 0.449, map_50_95 0.251). |
| Variant 4 – EfficientDet bump | `[0.1, 0.5, 0.4]` | minmax (apply=True) | 0.376 | 0.215 | 0.213 | Shifting weight from RT-DETR to EfficientDet lowers performance slightly. |
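The table compares fused runs with each other. Comparing them with the single-adapter baselines logged earlier adds context: on those numbers, no fused configuration beats RT-DETR on its own, and the best run (`[0.1, 0.7, 0.2]`) trails it by about 0.041 `map_50` and 0.022 `map_50_95`. The snippet below reproduces that check from the logged, rounded values; `gap_to_best_single` is a hypothetical helper.

```python
# Rounded values copied from the logged outputs in this notebook.
single = {
    "YoloUltralyticsAdapter": {"map_50": 0.311, "map_50_95": 0.165},
    "RtdetrUltralyticsAdapter": {"map_50": 0.449, "map_50_95": 0.251},
    "EfficientDetAdapter": {"map_50": 0.356, "map_50_95": 0.166},
}
fused = {
    "initial [0.279, 0.402, 0.319]": {"map_50": 0.318, "map_50_95": 0.187},
    "[0.5, 0.3, 0.2]": {"map_50": 0.225, "map_50_95": 0.133},
    "[0.2, 0.5, 0.3]": {"map_50": 0.367, "map_50_95": 0.211},
    "[0.1, 0.7, 0.2]": {"map_50": 0.408, "map_50_95": 0.229},
    "[0.1, 0.5, 0.4]": {"map_50": 0.376, "map_50_95": 0.215},
}


def gap_to_best_single(fused_metrics, singles, key):
    """Return (best single adapter, fused score minus that adapter's score) for one metric."""
    best = max(singles, key=lambda name: singles[name][key])
    return best, fused_metrics[key] - singles[best][key]


for run, metrics in fused.items():
    for key in ("map_50", "map_50_95"):
        best, gap = gap_to_best_single(metrics, single, key)
        print(f"{run:<30} {key:<10} vs {best}: {gap:+.3f}")
```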
