# Pseudo-Labeling Flow

This notebook implements an automated pseudo-labeling pipeline designed to streamline the annotation process for object detection and instance segmentation tasks. The tool iteratively improves model performance by using an initial model trained on a small set of manually annotated data to generate labels on new images, which can then be refined and used to retrain progressively better models.

**A "flow" is a complete pseudo-labeling run with specific configuration settings (model type, initial dataset size, correction strategy), while "iterations" are the individual training cycles within a flow, in each of which new data is added and the model is retrained.**

The features within this notebook include:
- **Automated Pipeline**: Complete workflow from data preparation to model training
- **Database Logging**: Records metadata for every iteration in a local database
- **CVAT Integration**: For viewing and adjusting annotations
- **Flexible Configuration**: Supports different model architectures and training settings
- **Status Monitoring**: Real-time pipeline status and progress tracking



Note: automatic export to CVAT has only been tested, and is only functional, for instance segmentation and object detection.

## Imports

In [1]:
import onedl.client

from pseudo_labeling import PseudoLabelingPipeline

## Global Initializers
Configure the pipeline with your project-specific settings:


In [2]:
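pipeline = PseudoLabelingPipeline(
    project_name="daniel-osman---streamlining-annotation-bootstrapping/pipeline-test",
    main_dataset_name="full-dataset:0",  # full image pool to sample from (inputs only)
    initial_annotated_dataset_name="initial-annotations:0",  # manually annotated seed set
    validation_dataset="val:0",
    sample_size_per_iter=150,  # new images sampled per iteration
    current_flow=0,  # flow index; reported as flow ID "f0"
    min_confidence=0.5,  # predictions scoring below this are filtered out
    local_path='',
    cvat_project_id=88,  # ignore for now
    db_path="pseudo_labeling_metadata_ptest.db",  # metadata database for all iterations
)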

print("Pipeline initialized")


2025-07-11 16:02:17.486 | INFO     | onedl._local_store.datasets:pull:848 - Pulling dataset full-dataset:0 from remote='daniel-osman---streamlining-annotation-bootstrapping/pipeline-test' with pull_policy=missing.
2025-07-11 16:02:17.492 | INFO     | onedl._local_store.datasets:pull:868 - Dataset full-dataset:0 already exists in local store. Skipping
2025-07-11 16:02:19.863 | INFO     | onedl._local_store.datasets:pull:848 - Pulling dataset initial-annotations:0 from remote='daniel-osman---streamlining-annotation-bootstrapping/pipeline-test' with pull_policy=missing.
2025-07-11 16:02:19.865 | INFO     | onedl._local_store.datasets:pull:868 - Dataset initial-annotations:0 already exists in local store. Skipping


GLOBAL INITIALIZATIONS INITIALIZED
Project: daniel-osman---streamlining-annotation-bootstrapping/pipeline-test
Main dataset: full-dataset:0
Initial annotated dataset: initial-annotations:0
Sample size per iteration: 150
Selected flow: f0
Initial annotated dataset contains: 50 samples
Flow f0 already exists in database - ready to resume
Last completed iteration: 1


2025-07-11 16:02:20.282 | INFO     | onedl._local_store.datasets:resolve_latest_version:592 - Resolved latest version of dataset train-f0 to 2 with remote='daniel-osman---streamlining-annotation-bootstrapping/pipeline-test'.
2025-07-11 16:02:20.283 | INFO     | onedl._local_store.datasets:load:385 - Resolved latest version of dataset train-f0 to 2.
2025-07-11 16:02:20.283 | INFO     | onedl._local_store.datasets:pull:848 - Pulling dataset train-f0:2 from remote='daniel-osman---streamlining-annotation-bootstrapping/pipeline-test' with pull_policy=missing.
2025-07-11 16:02:20.288 | INFO     | onedl._local_store.datasets:pull:868 - Dataset train-f0:2 already exists in local store. Skipping


Training dataset train-f0 exists with 150 samples
Ready for iteration 2

Attempting auto-recovery...
Recovering state for f0 iteration 2...
No database record found - this appears to be a new iteration
Pipeline initialized
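The output above suggests a naming convention. Flow `current_flow=0` is reported as flow ID `f0`, and its datasets are named `train-f0`, `pseudo-f0`, and `manual-corrections-f0`. The following is a minimal sketch of that convention. It assumes the names are derived purely from the flow index. `flow_dataset_names` is a hypothetical helper, not part of `PseudoLabelingPipeline`.

```python
def flow_dataset_names(current_flow: int) -> dict[str, str]:
    """Derive per-flow identifiers from the flow index (hypothetical helper)."""
    if current_flow < 0:
        raise ValueError("current_flow must be non-negative")
    flow_id = f"f{current_flow}"
    return {
        "flow_id": flow_id,
        "train": f"train-{flow_id}",
        "pseudo": f"pseudo-{flow_id}",
        "manual_corrections": f"manual-corrections-{flow_id}",
    }


names = flow_dataset_names(0)
print(names["flow_id"], names["train"], names["pseudo"])  # f0 train-f0 pseudo-f0
```

Under this assumption, each value of `current_flow` gets its own datasets. A new flow can therefore start without touching the datasets of an existing one.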


## Training Config
Set up your model training parameters, either interactively with a widget or directly as a dictionary:


In [3]:
# Option 1: Interactive widget setup (uncomment to use)
# train_cfg = pipeline.setup_training_config()

# Option 2: Set the configuration directly as a dictionary (recommended when you know the exact settings)
pipeline.train_cfg = {
    'model_type': 'FasterRCNNConfig',
    'task_type': 'object_detection',
    'backbone': 'RESNET_50',
    'epochs': 6,
    'batch_size': 6,
}

print("Training configuration set:")
print(pipeline.train_cfg)


Training configuration set:
{'model_type': 'FasterRCNNConfig', 'task_type': 'object_detection', 'backbone': 'RESNET_50', 'epochs': 6, 'batch_size': 6}
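A training job can run for tens of minutes, so it can help to sanity-check `train_cfg` before launching one. The sketch below assumes the five keys shown above are the ones the pipeline expects. The pipeline may accept or require others. `check_train_cfg` is a hypothetical helper.

```python
# Assumed required keys, taken from the dictionary used in this notebook.
REQUIRED_KEYS = {"model_type", "task_type", "backbone", "epochs", "batch_size"}


def check_train_cfg(cfg: dict) -> list[str]:
    """Return a list of problems found in a training config (empty if none)."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(cfg))]
    for key in ("epochs", "batch_size"):
        value = cfg.get(key)
        # Only validate keys that are present; missing ones are reported above.
        if value is not None and (not isinstance(value, int) or value <= 0):
            problems.append(f"{key} must be a positive integer, got {value!r}")
    return problems


cfg = {
    "model_type": "FasterRCNNConfig",
    "task_type": "object_detection",
    "backbone": "RESNET_50",
    "epochs": 6,
    "batch_size": 6,
}
print(check_train_cfg(cfg))  # []
```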


# (1) Initial Flow, Training, and Evaluation Setup

This step starts the current flow and establishes the baseline model from your initial annotated dataset:
1. Loads the initial annotated dataset.
2. Creates the training set for the current flow.
3. Trains the baseline model (iteration 0) for the specified flow.
4. Evaluates the model on the validation set.
5. Logs metadata. The only variables generated by the pipeline during this process are 'predicted_dataset_name', 'model_uid', 'evaluation_uid', and 'evaluation_info'.

⚠️ ATTENTION: Skip this section if your current flow already exists and already has a baseline model.

In [4]:
pipeline.get_pipeline_status()


PIPELINE STATUS REPORT
Flow ID: f0
Current Iteration: 2
Training Dataset: train-f0
Current Model UID: None
Training Configuration: {'model_type': 'FasterRCNNConfig', 'task_type': 'object_detection', 'backbone': 'RESNET_50', 'epochs': 6, 'batch_size': 6}
Database Path: pseudo_labeling_metadata_ptest.db
Sample Size Per Iteration: 150
Minimum Confidence Threshold: 0.5

RECENT ITERATIONS:
  Iteration 1: COMPLETED (completed: 2025-07-11 15:14:48)
  Iteration 0: COMPLETED (completed: 2025-07-11 13:21:04)


### 1.1 Train Initial Model and Evaluate on Validation Set
Train the baseline model on your initial annotated dataset.
Evaluate the initial model performance on the validation dataset.




To use an existing model instead, run <span style="color:#d73a49; font-family:monospace;">pipeline.log_iteration_0_external_model("smoky-shepherd-0")</span> and then skip to section (2).


In [10]:
pipeline.train_model()
pipeline.evaluate_model()

Starting model training...
Training FasterRCNNConfig on dataset: initial-annotations:0
Configuration: 30 epochs, batch size 6
Backbone: RESNET_50


2025-07-11 13:04:00.571 | INFO     | onedl.client.operations.clients._common:create_event_stream:79 - Subscribing to job events...
2025-07-11 13:04:00.572 | INFO     | onedl.client.operations.clients._common:create_event_stream:80 - Job rounded-type-0 in WAITING state
2025-07-11 13:04:01.594 | INFO     | onedl.client.operations.clients._common:create_event_stream:84 - Job rounded-type-0 in RUNNING state
2025-07-11 13:04:01.596 | INFO     | onedl.client.operations.clients._common:create_event_stream:84 - Job rounded-type-0 in RUNNING state
2025-07-11 13:15:12.655 | INFO     | onedl.client.operations.clients._common:create_event_stream:84 - Job rounded-type-0 in DONE state


Training job submitted
Model UID: rounded-type-0
Training job state: DONE
Evaluating rounded-type-0 on val:0


2025-07-11 13:15:16.340 | INFO     | onedl.client.operations.clients._common:create_event_stream:79 - Subscribing to job events...
2025-07-11 13:15:16.342 | INFO     | onedl.client.operations.clients._common:create_event_stream:80 - Job rigid-spread-0 in WAITING state
2025-07-11 13:15:17.158 | INFO     | onedl.client.operations.clients._common:create_event_stream:84 - Job rigid-spread-0 in RUNNING state
2025-07-11 13:20:56.437 | INFO     | onedl.client.operations.clients._common:create_event_stream:84 - Job rigid-spread-0 in DONE state


Evaluation job submitted
Evaluation UID: rigid-spread-0
Evaluation job state: DONE



Evaluation complete
Report URL: https://21e007818fa1dd0840eac0d6d59ba986.eu.r2.cloudflarestorage.com/onedl-data/daniel-osman---streamlining-annotation-bootstrapping/pipeline-test/-/92699172c2ef5c4f9aeb58a06eab97f0.html?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=bb17714b86b2e84a836c55404335cef8%2F20250711%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20250711T112058Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=439b20968c66e259d9e9f47a04e83258c9a66d2d0c7f1993871493f7af17c5ae
Metrics: {"mAP50": 0.05244764684098055, "mAP75": 0.0014695996497915195, "mAP_all": 0.019946033489825075, "fn_count": 72, "fp_count": 359, "tp_count": 80}




### 1.2 Log
Save all metadata for the initial training iteration to the database.
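This notebook does not show the database schema. The sketch below is hypothetical. It assumes a SQLite file like `pseudo_labeling_metadata_ptest.db` with one table keyed by flow and iteration. It logs the four fields named in Section (1). It then derives the iteration to resume from, matching the "Last completed iteration: 1 ... Ready for iteration 2" output earlier. The table layout, `log_iteration`, and `next_iteration` are all illustrative.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per (flow, iteration).
SCHEMA = """
CREATE TABLE IF NOT EXISTS iterations (
    flow_id TEXT NOT NULL,
    iteration INTEGER NOT NULL,
    status TEXT NOT NULL,
    predicted_dataset_name TEXT,
    model_uid TEXT,
    evaluation_uid TEXT,
    evaluation_info TEXT,
    completed_at TEXT,
    PRIMARY KEY (flow_id, iteration)
)
"""


def log_iteration(conn, flow_id, iteration, **fields):
    """Insert or overwrite one iteration record and mark it COMPLETED."""
    conn.execute(
        "INSERT OR REPLACE INTO iterations VALUES (?, ?, 'COMPLETED', ?, ?, ?, ?, ?)",
        (
            flow_id,
            iteration,
            fields.get("predicted_dataset_name"),
            fields.get("model_uid"),
            fields.get("evaluation_uid"),
            fields.get("evaluation_info"),
            datetime.now(timezone.utc).isoformat(timespec="seconds"),
        ),
    )
    conn.commit()


def next_iteration(conn, flow_id):
    """Resume point: last COMPLETED iteration + 1, or 0 for a new flow."""
    (last,) = conn.execute(
        "SELECT MAX(iteration) FROM iterations WHERE flow_id = ? AND status = 'COMPLETED'",
        (flow_id,),
    ).fetchone()
    return 0 if last is None else last + 1


conn = sqlite3.connect(":memory:")  # pass a file path instead to persist across sessions
conn.execute(SCHEMA)
log_iteration(conn, "f0", 0, model_uid="rounded-type-0", evaluation_uid="rigid-spread-0")
log_iteration(conn, "f0", 1, model_uid="chill-muffler-0", evaluation_uid="furious-henry-0")
print(next_iteration(conn, "f0"), next_iteration(conn, "f1"))  # 2 0
```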


In [11]:
pipeline.log_iteration_0()  # Currently required; planned for removal in a later version
print("Initial model training and evaluation complete, Current status:")

Iteration 0 logged for f0
Initial model training and evaluation complete, Current status:


# (2) Pseudo-Labeling Iteration Workflow

This step runs one complete pseudo-labeling iteration, using the model from the previous iteration to generate labels on new data:
1. Sets up the next iteration with a correction strategy (manual or automated).
2. Samples new unlabeled images from the full dataset.
3. Runs inference with the previous iteration's model to generate pseudo-labels.
4. Handles corrections according to the strategy: exports to CVAT for manual correction, or merges the pseudo-labels directly.
5. Trains an updated model on the expanded dataset (original + new data).
6. Evaluates the updated model's performance.
7. Logs iteration metadata to track progress and results.
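Without the platform calls, the iteration loop can be sketched in plain Python. This is a toy, self-contained illustration:

- `train`, `predict`, and `evaluate` are hypothetical callables supplied by the caller. They are not `PseudoLabelingPipeline` methods.
- Manual correction is left out.
- The whole pseudo set is re-labeled each iteration, as the "150 existing + 150 new = 300 total" output in Section 2.1 suggests.

```python
import random


def run_flow(all_ids, initial_labels, sample_size, n_iterations,
             train, predict, evaluate, seed=0):
    """Toy flow: baseline on GT, then sample -> infer -> merge -> train -> evaluate."""
    rng = random.Random(seed)
    gt = dict(initial_labels)  # human annotations, never overwritten
    pseudo = {}                # model-generated labels, refreshed every iteration
    model = train(gt)          # iteration 0: baseline on the initial annotations
    history = [evaluate(model)]
    for _ in range(n_iterations):
        used = gt.keys() | pseudo.keys()
        unseen = [i for i in all_ids if i not in used]
        new_ids = rng.sample(unseen, min(sample_size, len(unseen)))
        # Re-label old and new pseudo images with the latest model.
        pseudo = {i: predict(model, i) for i in list(pseudo) + new_ids}
        model = train({**pseudo, **gt})  # GT takes precedence on any overlap
        history.append(evaluate(model))  # a real pipeline would also log here
    return gt, pseudo, history


gt, pseudo, history = run_flow(
    all_ids=list(range(1000)),
    initial_labels={i: "gt" for i in range(50)},
    sample_size=150,
    n_iterations=2,
    train=lambda labels: len(labels),   # stub "model": just the training-set size
    predict=lambda model, i: "pseudo",  # stub prediction
    evaluate=lambda model: model,       # stub metric
)
print(len(gt), len(pseudo), history)  # 50 300 [50, 200, 350]
```

With these stubs, the training-set sizes (50, 200, 350) match the counts this notebook reports for iterations 0, 1, and 2.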

**⚠️ ATTENTION: Set 'manual_corrections=True' for CVAT workflow with human review, or 'manual_corrections=False' for fully automated pseudo-labeling**


In [4]:
pipeline.get_pipeline_status()


PIPELINE STATUS REPORT
Flow ID: f0
Current Iteration: 2
Training Dataset: train-f0
Current Model UID: None
Training Configuration: {'model_type': 'FasterRCNNConfig', 'task_type': 'object_detection', 'backbone': 'RESNET_50', 'epochs': 6, 'batch_size': 6}
Database Path: pseudo_labeling_metadata_ptest.db
Sample Size Per Iteration: 150
Minimum Confidence Threshold: 0.5

RECENT ITERATIONS:
  Iteration 1: COMPLETED (completed: 2025-07-11 15:14:48)
  Iteration 0: COMPLETED (completed: 2025-07-11 13:21:04)


### Local Initializer
Configure the parameters for the next iteration.

Set `manual_corrections = True` for CVAT manual review.

Set `manual_corrections = False` for fully automated pseudo-labeling.


In [5]:
manual_corrections = False
pipeline.setup_next_iteration(manual_corrections)


No status found for iteration 2. Proceeding...
----------------------------------------
ITERATION INITIALIZED - PERSISTENT ARCHITECTURE
Flow ID: f0
Current iteration: 2
Manual corrections: False
Sample size this iteration: 150
GT added this iteration: 0
Pseudo added this iteration: 150
Total GT images after this step: 50
Total pseudo-labeled images after this step: 300
Total expected training set size: 350
Train dataset name: train-f0
Persistent pseudo dataset: pseudo-f0
Manual corrections dataset: manual-corrections-f0
Pseudo input dataset: pseudo-f0
Initial annotations: initial-annotations:0
Inference model UID: chill-muffler-0
----------------------------------------
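The counts printed above come from simple bookkeeping. There are 50 ground-truth images, 150 pseudo-labels carried over from iteration 1, and 150 newly sampled images. The sketch below reproduces them. It assumes that with `manual_corrections=True` the newly sampled images would count as ground truth once corrected in CVAT. Only the `False` case appears in this notebook. `expected_counts` is a hypothetical helper.

```python
def expected_counts(prev_gt, prev_pseudo, sample_size, manual_corrections):
    """Hypothetical bookkeeping for one iteration's dataset sizes."""
    # Assumption: corrected images become GT; uncorrected ones stay pseudo-labels.
    gt_added = sample_size if manual_corrections else 0
    pseudo_added = 0 if manual_corrections else sample_size
    total_gt = prev_gt + gt_added
    total_pseudo = prev_pseudo + pseudo_added
    return {
        "gt_added": gt_added,
        "pseudo_added": pseudo_added,
        "total_gt": total_gt,
        "total_pseudo": total_pseudo,
        "total": total_gt + total_pseudo,
    }


print(expected_counts(prev_gt=50, prev_pseudo=150, sample_size=150, manual_corrections=False))
# {'gt_added': 0, 'pseudo_added': 150, 'total_gt': 50, 'total_pseudo': 300, 'total': 350}
```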


### 2.1 Sample New Data

In [6]:
pipeline.sample_unseen_inputs()



=== SAMPLING FOR RE-INFERENCE (ITERATION 2) ===


2025-07-11 16:02:36.459 | INFO     | onedl._local_store.datasets:resolve_latest_version:592 - Resolved latest version of dataset train-f0 to 2 with remote='daniel-osman---streamlining-annotation-bootstrapping/pipeline-test'.
2025-07-11 16:02:36.459 | INFO     | onedl._local_store.datasets:load:385 - Resolved latest version of dataset train-f0 to 2.
2025-07-11 16:02:36.459 | INFO     | onedl._local_store.datasets:pull:848 - Pulling dataset train-f0:2 from remote='daniel-osman---streamlining-annotation-bootstrapping/pipeline-test' with pull_policy=missing.
2025-07-11 16:02:36.463 | INFO     | onedl._local_store.datasets:pull:868 - Dataset train-f0:2 already exists in local store. Skipping


✓ Current training dataset has 150 images
Total images in full dataset: 12630
Images already in training: 150
Images NOT in training (available): 12480
✓ Sampled 150 new images from unused set


2025-07-11 16:02:36.740 | INFO     | onedl._local_store.datasets:resolve_latest_version:592 - Resolved latest version of dataset pseudo-f0 to 4 with remote='daniel-osman---streamlining-annotation-bootstrapping/pipeline-test'.
2025-07-11 16:02:36.740 | INFO     | onedl._local_store.datasets:load:385 - Resolved latest version of dataset pseudo-f0 to 4.
2025-07-11 16:02:36.741 | INFO     | onedl._local_store.datasets:pull:848 - Pulling dataset pseudo-f0:4 from remote='daniel-osman---streamlining-annotation-bootstrapping/pipeline-test' with pull_policy=missing.
2025-07-11 16:02:36.743 | INFO     | onedl._local_store.datasets:pull:868 - Dataset pseudo-f0:4 already exists in local store. Skipping
2025-07-11 16:02:36.764 | INFO     | onedl._local_store.datasets:

✓ Found existing pseudo dataset: 150 images
✓ Combined: 150 existing + 150 new = 300 total


Files Uploaded: 100%|██████████| 297/297 [00:22<00:00, 10.45file/s]
Confirming files: 100%|██████████| 141/141 [00:01<00:00, 137.48file/s]
2025-07-11 16:03:00.656 | INFO     | onedl._local_store.datasets:push:440 - Pushing

✓ Saved inputs-only dataset: pseudo-f0
✓ Ready for re-inference on 300 images with evolved model
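`sample_unseen_inputs` draws new images only from those not yet used for training (12,480 of 12,630 here). It then appends them to the existing pseudo inputs, so the whole set is re-labeled by the newer model. Below is a minimal sketch of that selection using Python's `random` module. It assumes images are identified by hashable IDs. The function name and the `seed` parameter are illustrative, not the pipeline's API.

```python
import random


def sample_unseen(all_ids, train_ids, existing_pseudo_ids, k, seed=None):
    """Draw k images not yet used, then append them to the existing pseudo inputs."""
    excluded = set(train_ids) | set(existing_pseudo_ids)
    available = [i for i in all_ids if i not in excluded]
    if k > len(available):
        raise ValueError(f"only {len(available)} unused images left, requested {k}")
    new_ids = random.Random(seed).sample(available, k)
    # Old pseudo images stay in the set so the newer model can re-label them too.
    return list(existing_pseudo_ids) + new_ids


gt_ids = list(range(50))
pseudo_ids = list(range(50, 200))
combined = sample_unseen(range(12630), gt_ids + pseudo_ids, pseudo_ids, k=150, seed=0)
print(len(combined))  # 300
```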


### 2.2 Generate Predictions/Pseudo-Labels
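Predictions scoring below `min_confidence` (0.5 in this run) are filtered out before they become pseudo-labels, as the "Filtered predictions by confidence >= 0.5" output below shows. The following is a minimal sketch of such a filter. It assumes each prediction is a dict with `label`, `score`, and `box` keys. The real prediction format is not shown in this notebook, and the image names are made up.

```python
from typing import Any


def filter_by_confidence(predictions: dict[str, list[dict[str, Any]]],
                         min_confidence: float = 0.5) -> dict[str, list[dict[str, Any]]]:
    """Keep only detections whose score is >= min_confidence, per image."""
    return {
        image_id: [det for det in dets if det["score"] >= min_confidence]
        for image_id, dets in predictions.items()
    }


preds = {
    "img_001.jpg": [
        {"label": "Defect", "score": 0.91, "box": (10, 20, 50, 80)},
        {"label": "Stone", "score": 0.32, "box": (5, 5, 15, 15)},
    ],
    "img_002.jpg": [{"label": "Tip", "score": 0.50, "box": (0, 0, 8, 8)}],
}
kept = filter_by_confidence(preds)
print({k: [d["label"] for d in v] for k, v in kept.items()})
# {'img_001.jpg': ['Defect'], 'img_002.jpg': ['Tip']}
```

Images left with no detections keep an empty list in this sketch. Whether the pipeline keeps or drops such images is not shown.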


In [7]:
pipeline.run_inference()

Running inference with model: chill-muffler-0
Running inference on persistent pseudo dataset: pseudo-f0


2025-07-11 16:03:11.443 | INFO     | onedl._local_store.datasets:resolve_latest_version:592 - Resolved latest version of dataset pseudo-f0 to 5 with remote='daniel-osman---streamlining-annotation-bootstrapping/pipeline-test'.
2025-07-11 16:03:12.015 | INFO     | onedl._local_store.datasets:resolve_latest_version:592 - Resolved latest version of dataset pseudo-f0 to 5 with remote='daniel-osman---streamlining-annotation-bootstrapping/pipeline-test'.
2025-07-11 16:03:13.528 | INFO     | onedl.client.operations.clients._common:create_event_stream:79 - Subscribing to job events...
2025-07-11 16:03:13.530 | INFO     | onedl.client.operations.clients._common:create_event_stream:80 - Job ebony-body-0 in WAITING state
2025-07-11 16:03:14.346 | INFO     | onedl.client.o

Inference complete
Predictions saved as: pseudo-f0-5--cpu--91f39:0

=== REPLACING PERSISTENT PSEUDO DATASET WITH PREDICTIONS ===


2025-07-11 16:12:10.263 | INFO     | onedl._local_store.blobs:get_path_many:489 - Pulling 1/1 blobs from remote. pull_policy=<PullPolicy.missing: 'missing'>
Downloading files: 100%|██████████| 1/1 [00:00<00:00,  4.51file/s]
2025-07-11 16:12:11.262 | INFO     | onedl._local_store.datasets:save:139 - Overwriting dataset `pseudo-f0`.
2025-07-11 16:12:11.264 | INFO     | onedl._local_store.datasets:delete_local:1147 - version=None


✓ Filtered predictions by confidence >= 0.5


ValueError: Attempting to delete a dataset with unversioned name pseudo-f0, which has multiple versions. Please specify a version to delete, or pass delete_all_versions=True to delete all versions.

### 2.3 CVAT Export
Run this cell even if `manual_corrections` is `False`; in that case it skips the export.

In [16]:
if pipeline.manual_corrections_global:
    print("Manual corrections enabled - proceeding to CVAT export")
    pipeline.manually_correct_cvat()
    print("After completing corrections in CVAT, manually update the predicted dataset and run the merge cell below")
else:
    print("No Manual Correction, Proceed to merging the datasets")


No Manual Correction, Proceed to merging the datasets


### 2.4 Merge Data
This cell merges the pseudo-labeled dataset into the current training set. If **manual_corrections = True**, the corrected annotations are exported and merged.


In [17]:
pipeline.merge_pseudo_labels()

2025-07-11 13:35:15.267 | INFO     | onedl._local_store.datasets:pull:848 - Pulling dataset initial-annotations:0 from remote='daniel-osman---streamlining-annotation-bootstrapping/pipeline-test' with pull_policy=missing.
2025-07-11 13:35:15.271 | INFO     | onedl._local_store.datasets:pull:868 - Dataset initial-annotations:0 already exists in local store. Skipping


Starting simplified merge process...

=== AUTO PSEUDO-LABELING MODE ===
✓ Pseudo dataset already updated after inference

=== REBUILDING TRAINING DATASET (SIMPLIFIED) ===
✓ Started with initial dataset: 50 images


2025-07-11 13:35:15.855 | INFO     | onedl._local_store.datasets:resolve_latest_version:586 - There is no remote version. Resolved latest version of dataset manual-corrections-f0 to 0 local='daniel-osman---streamlining-annotation-bootstrapping/pipeline-test'
2025-07-11 13:35:15.855 | INFO     | onedl._local_store.datasets:load:385 - Resolved latest version of dataset manual-corrections-f0 to 0.
2025-07-11 13:35:15.855 | INFO     | onedl._local_store.datasets:pull:848 - Pulling dataset manual-corrections-f0:0 from remote='daniel-osman---streamlining-annotation-bootstrapping/pipeline-test' with pull_policy=missing.
2025-07-11 13:35:15.856 | INFO     | onedl._local_store.datasets:pull:859 - Pulling dataset manual-corrections-f0:0 from remote. pull_policy=<PullPolicy.missing: 'missing'> an

✓ No manual corrections found for this flow: This resource cannot be found. GET https://api.onedl.ai/v2/storage/contexts/daniel-osman---streamlining-annotation-bootstrapping/pipeline-test/-/datasets/manual-corrections-f0:0/info
404 Not Found - {'detail': 'Dataset manual-corrections-f0:0 was not found in project daniel-osman---streamlining-annotation-bootstrapping/pipeline-test.'}
Received Body b'{"detail":"Dataset manual-corrections-f0:0 was not found in project daniel-osman---streamlining-annotation-bootstrapping/pipeline-test."}'


2025-07-11 13:35:16.373 | INFO     | onedl._local_store.datasets:resolve_latest_version:592 - Resolved latest version of dataset pseudo-f0 to 1 with remote='daniel-osman---streamlining-annotation-bootstrapping/pipeline-test'.
2025-07-11 13:35:16.373 | INFO     | onedl._local_store.datasets:load:385 - Resolved latest version of dataset pseudo-f0 to 1.
2025-07-11 13:35:16.374 | INFO     | onedl._local_store.datasets:pull:848 - Pulling dataset pseudo-f0:1 from remote='daniel-osman---streamlining-annotation-bootstrapping/pipeline-test' with pull_policy=missing.
2025-07-11 13:35:16.376 | INFO     | onedl._local_store.datasets:pull:868 - Dataset pseudo-f0:1 already exists in local store. Skipping
2025-07-11 13:35:16.393 | INFO     | onedl._local_store.datasets:

✓ Added pseudo dataset: 150 images, total: 200


Generating label map from unique labels: 100%|██████████| 200/200 [00:00<00:00, 5431.24it/s]
2025-07-11 13:35:16.484 | INFO     | onedl.datasets.columns.base_column:_generate_label_map_from_unique_labels:457 - Generated label map: {0: 'AboveGround', 1: 'Defect', 2: 'Overgrown', 3: 'Stone', 4: 'Tip'}
Generating label map from unique labels: 100%|██████████| 200/200 [00:00<00:00, 1155455.65it/s]
2025-07-11 13:35:16.485 | INFO     | onedl.datasets.columns.base_column:_generate_label_map_from_unique_labels:457 - Generated label map: {0: 'AboveGround'}
2025-07-11 13:35:16.498 | INFO     | onedl._local_store.datasets:save:191 - Saved dataset train-f0:1 to local store.
2025-07-11 13:35:16.498 | INFO     | onedl._local_store.datasets:resolve_latest_version:577 - Resolved latest version of data


✓ TRAINING DATASET REBUILT:
  - Initial GT: 50 images
  - Pseudo labels: 150 images
  - Total: 200 images
  - Saved as: train-f0
  - Label map: {0: 'AboveGround', 1: 'Defect', 2: 'Overgrown', 3: 'Stone', 4: 'Tip'}
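The rebuild above starts from the initial ground truth and adds the pseudo-labels (and manual corrections, when any exist). The sketch below illustrates that merge. It assumes images are keyed by ID. It also assumes initial ground truth overrides manual corrections, which in turn override pseudo-labels. This precedence is an assumption, since this run used auto mode and found no corrections. The sketch also builds a label map by sorting unique class names, which is what the logged `{0: 'AboveGround', 1: 'Defect', ...}` ordering suggests. Both helpers are hypothetical.

```python
def rebuild_training_set(initial_gt, pseudo, manual_corrections=None):
    """Merge label sources into one training set; later updates win on overlap."""
    merged = {}
    merged.update(pseudo)                    # lowest priority: model predictions
    merged.update(manual_corrections or {})  # human-corrected pseudo-labels
    merged.update(initial_gt)                # highest priority: original annotations
    return merged


def build_label_map(dataset):
    """Map sorted unique class names to consecutive integer IDs."""
    names = sorted({label for labels in dataset.values() for label in labels})
    return dict(enumerate(names))


initial_gt = {"a.jpg": ["Defect"], "b.jpg": ["Stone", "Tip"]}
pseudo = {"c.jpg": ["AboveGround"], "d.jpg": ["Overgrown", "Defect"]}
train = rebuild_training_set(initial_gt, pseudo)
print(len(train), build_label_map(train))
# 4 {0: 'AboveGround', 1: 'Defect', 2: 'Overgrown', 3: 'Stone', 4: 'Tip'}
```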


### 2.5 Train Updated Model
Train a new model on the expanded training set.

In [18]:
pipeline.train_model()

Starting model training...


2025-07-11 13:35:39.424 | INFO     | onedl._local_store.datasets:resolve_latest_version:592 - Resolved latest version of dataset train-f0 to 1 with remote='daniel-osman---streamlining-annotation-bootstrapping/pipeline-test'.
2025-07-11 13:35:39.968 | INFO     | onedl._local_store.datasets:resolve_latest_version:592 - Resolved latest version of dataset train-f0 to 1 with remote='daniel-osman---streamlining-annotation-bootstrapping/pipeline-test'.


Training FasterRCNNConfig on dataset: train-f0
Configuration: 30 epochs, batch size 6
Backbone: RESNET_50


2025-07-11 13:35:42.478 | INFO     | onedl.client.operations.clients._common:create_event_stream:79 - Subscribing to job events...
2025-07-11 13:35:42.478 | INFO     | onedl.client.operations.clients._common:create_event_stream:80 - Job chill-muffler-0 in WAITING state
2025-07-11 13:35:43.632 | INFO     | onedl.client.operations.clients._common:create_event_stream:84 - Job chill-muffler-0 in RUNNING state
2025-07-11 14:19:54.672 | INFO     | onedl.client.operations.clients._common:create_event_stream:84 - Job chill-muffler-0 in DONE state


Training job submitted
Model UID: chill-muffler-0
Training job state: DONE


### 2.6 Evaluate Performance

In [6]:
pipeline.evaluate_model()

Evaluating chill-muffler-0 on val:0


2025-07-11 15:13:53.167 | INFO     | onedl.client.operations.clients._common:create_event_stream:79 - Subscribing to job events...
2025-07-11 15:13:53.168 | INFO     | onedl.client.operations.clients._common:create_event_stream:80 - Job furious-henry-0 in WAITING state
2025-07-11 15:13:53.889 | INFO     | onedl.client.operations.clients._common:create_event_stream:84 - Job furious-henry-0 in RUNNING state
2025-07-11 15:14:46.413 | INFO     | onedl.client.operations.clients._common:create_event_stream:84 - Job furious-henry-0 in DONE state


Evaluation job submitted
Evaluation UID: furious-henry-0
Evaluation job state: DONE



Evaluation complete
Report URL: https://21e007818fa1dd0840eac0d6d59ba986.eu.r2.cloudflarestorage.com/onedl-data/daniel-osman---streamlining-annotation-bootstrapping/pipeline-test/-/19e6aefea00922c219416e00ce1ddf9c.html?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=bb17714b86b2e84a836c55404335cef8%2F20250711%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20250711T131448Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=b2b5ad59cc4303c3a2d2365c40cb698f79e866a142de7644da48f1dc320bfa49
Metrics: {"mAP50": 0.1888017474404626, "mAP75": 0.09925452144850186, "mAP_all": 0.103202114225167, "fn_count": 35, "fp_count": 141, "tp_count": 103}
✓ Iteration automatically marked as COMPLETED
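The evaluation metrics include raw `tp_count`, `fp_count`, and `fn_count`. Precision and recall follow directly from these counts. This makes it easier to compare the baseline (`rounded-type-0`, Section 1.1) with the iteration-1 model (`chill-muffler-0`). The ratios below are computed here for illustration only. The evaluation job does not report them, and they inherit whatever matching criterion that job uses to count true positives.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# (tp, fp, fn) copied from the two "Metrics:" lines in this notebook.
runs = {
    "rounded-type-0 (iteration 0)": (80, 359, 72),
    "chill-muffler-0 (iteration 1)": (103, 141, 35),
}
for name, (tp, fp, fn) in runs.items():
    p, r = precision_recall(tp, fp, fn)
    print(f"{name}: precision={p:.3f}, recall={r:.3f}")
# rounded-type-0 (iteration 0): precision=0.182, recall=0.526
# chill-muffler-0 (iteration 1): precision=0.422, recall=0.746
```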




### 2.7 Status

In [12]:
pipeline.get_pipeline_status()


PIPELINE STATUS REPORT
Flow ID: f0
Current Iteration: 2
Current Status: SAMPLING_COMPLETE
Training Dataset: train-f0
Current Model UID: chill-muffler-0
Training Configuration: {'model_type': 'FasterRCNNConfig', 'task_type': 'object_detection', 'backbone': 'RESNET_50', 'epochs': 6, 'batch_size': 6}
Database Path: pseudo_labeling_metadata_ptest.db
Ground Truth Images: 50
Pseudo-labeled Images: 300
Total Training Images: 350
Sample Size Per Iteration: 150
Minimum Confidence Threshold: 0.5

RECENT ITERATIONS:
  Iteration 2: SAMPLING_COMPLETE
  Iteration 1: COMPLETED (completed: 2025-07-11 15:14:48)
  Iteration 0: COMPLETED (completed: 2025-07-11 13:21:04)


# Additional Runs
To run additional iterations, repeat Section 2 after logging. To create a new flow, update `current_flow` in the Global Initializers cell and run Section 1 again.
