
cane-robotics

Foundation Model Active Learning (FMAL) for autonomous robot object discovery.

Fuses three vision-language foundation models -- GroundingDINO, DINO, and CLIP -- into a unified acquisition function for active learning. The system enables robots to efficiently discover and learn novel objects in unstructured environments with minimal human annotation.

Install

pip install cane-robotics

Quick Start

# Run a single active learning experiment
cane-robotics run --images-dir data/images --labels-dir data/labels --classes box laptop chair

# Run all ablation variants across multiple seeds
cane-robotics ablations --images-dir data/images --labels-dir data/labels

# Evaluate sim-to-real transfer
cane-robotics sim2real --synthetic-dir data/synthetic --real-dir data/real

# Launch annotation GUI
cane-robotics annotate novel_detections/

# Plot experiment results
cane-robotics plot results/

# Generate synthetic training data (Isaac Sim)
cane-robotics generate --output-dir data/synthetic --num-scenes 50

How It Works

The active learning pipeline scores candidate object detections using three complementary signals:

  1. GroundingDINO -- open-vocabulary detection confidence
  2. DINO ViT -- class-agnostic attention saliency (filters background clutter)
  3. CLIP -- semantic novelty relative to known object classes

These are combined into a unified acquisition score:

score(x) = 0.5 * conf_gdino + 0.3 * attn_dino + 0.2 * sim_fg - 0.2 * sim_bg
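
The weighted combination above can be written directly as a small function. This is a minimal sketch of the formula as stated, assuming all four signals are normalized scalars; it is not the package's internal implementation.

```python
def acquisition_score(conf_gdino: float, attn_dino: float,
                      sim_fg: float, sim_bg: float) -> float:
    """Fuse the three foundation-model signals into one acquisition score.

    conf_gdino -- GroundingDINO detection confidence
    attn_dino  -- DINO ViT attention saliency
    sim_fg     -- CLIP similarity to known foreground classes
    sim_bg     -- CLIP similarity to known background classes

    Higher foreground similarity and lower background similarity both
    raise the score, so novel, salient, confidently detected objects
    rank first.
    """
    return 0.5 * conf_gdino + 0.3 * attn_dino + 0.2 * sim_fg - 0.2 * sim_bg
```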

A temporal deduplication module tracks previously queried objects via embedding similarity, reducing redundant annotation queries by ~69%.
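
The idea behind embedding-based deduplication can be sketched as follows. This is a hypothetical illustration (the class name, threshold, and cosine-similarity test are assumptions), not the package's TemporalDeduplicator:

```python
import numpy as np

class EmbeddingDeduplicator:
    """Skip a detection if its embedding is near any previously queried one.

    Hypothetical sketch: tracks unit-normalized embeddings and flags a new
    detection as a duplicate when its cosine similarity to any stored
    embedding exceeds the threshold.
    """

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.seen: list[np.ndarray] = []  # unit-normalized embeddings

    def is_duplicate(self, emb: np.ndarray) -> bool:
        emb = emb / np.linalg.norm(emb)
        # Cosine similarity of unit vectors is just their dot product.
        if any(float(emb @ s) >= self.threshold for s in self.seen):
            return True
        self.seen.append(emb)  # novel object: remember it for later rounds
        return False
```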

Each round, the top-scoring proposals are labeled (by human or oracle), added to the training set, and a YOLOv8 detector is retrained. The loop repeats until convergence.
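
The round structure described above can be sketched as a plain loop. The `pipeline`, `oracle`, and `retrain` objects here are placeholders standing in for the real components, not the package API:

```python
def active_learning_loop(pipeline, unlabeled, train_set, oracle,
                         rounds: int = 5, budget: int = 10):
    """Sketch of the query-label-retrain loop.

    Each round: score all unlabeled proposals, query the oracle (or a
    human) for labels on the top `budget` of them, grow the training
    set, and retrain the detector.
    """
    for _ in range(rounds):
        # Rank proposals by acquisition score, highest first.
        scored = sorted(unlabeled, key=pipeline.score, reverse=True)
        queries, unlabeled = scored[:budget], scored[budget:]
        # Label the selected proposals and add them to the training set.
        train_set.extend(oracle.label(q) for q in queries)
        # Retrain the detector (YOLOv8 in the pipeline described above).
        pipeline.retrain(train_set)
    return train_set
```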

Package Structure

cane_robotics/
  pipeline/        Core active learning pipeline, offline replay, ROS node
  models/          Foundation model wrappers (GDINO, CLIP, DINO, dedup)
  dataset/         Dataset management and augmentation
  config/          Experiment configuration (dataclasses + YAML)
  experiments/     Experiment runners, ablations, sim2real evaluation
  training/        YOLO training and dataset preparation
  sim/             Isaac Sim synthetic data generation
  tools/           Annotation GUI, result plotting

Python API

from cane_robotics import (
    ActiveLearningPipeline,
    create_gdino_pipeline,
    ExperimentConfig,
    DatasetManager,
    TemporalDeduplicator,
)

# Create pipeline with full multi-VLM acquisition
pipeline = create_gdino_pipeline(
    known_classes=["mug", "bowl", "can"],
    acquisition_type="full",
    enable_dedup=True,
)

# Process a single image
result = pipeline.process_image("frame_001.jpg")
for obj in result["novel_objects"]:
    print(f"{obj['label']} (score={obj['score']:.3f})")

Ablation Variants

The experiment framework supports 8 acquisition function variants for systematic comparison:

Variant          Description
full             All three VLM signals combined (default)
random           Random scoring baseline
gdino_only       GroundingDINO confidence only
clip_only        CLIP novelty signal only
dino_only        DINO attention only
no_fg_bg_gate    Full formula without foreground/background gating
no_dedup         Full scoring with deduplication disabled
no_sam           Full scoring with SAM splitting disabled

Dependencies

Core: numpy, pyyaml, torch, torchvision, ultralytics, opencv-python, Pillow, transformers

Optional:

  • [sim] -- Isaac Sim for synthetic data generation
  • [dev] -- pytest, ruff for development

License

MIT
