# Foundation Model Active Learning (FMAL)

Foundation Model Active Learning (FMAL) for autonomous robot object discovery.

FMAL fuses three vision-language foundation models -- GroundingDINO, DINO, and CLIP -- into a unified acquisition function for active learning. The system enables robots to efficiently discover and learn novel objects in unstructured environments with minimal human annotation.
## Installation

```bash
pip install cane-robotics
```

## Usage

```bash
# Run a single active learning experiment
cane-robotics run --images-dir data/images --labels-dir data/labels --classes box laptop chair

# Run all ablation variants across multiple seeds
cane-robotics ablations --images-dir data/images --labels-dir data/labels

# Evaluate sim-to-real transfer
cane-robotics sim2real --synthetic-dir data/synthetic --real-dir data/real

# Launch annotation GUI
cane-robotics annotate novel_detections/

# Plot experiment results
cane-robotics plot results/

# Generate synthetic training data (Isaac Sim)
cane-robotics generate --output-dir data/synthetic --num-scenes 50
```

## How it works

The active learning pipeline scores candidate object detections using three complementary signals:
- GroundingDINO -- open-vocabulary detection confidence
- DINO ViT -- class-agnostic attention saliency (filters background clutter)
- CLIP -- semantic novelty relative to known object classes
These are combined into a unified acquisition score:

```
score(x) = 0.5 * conf_gdino + 0.3 * attn_dino + 0.2 * sim_fg - 0.2 * sim_bg
```
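As a reference, the weighted fusion above can be sketched in a few lines of plain Python. The argument names mirror the symbols in the formula; this is an illustrative sketch, not the package's internal API:

```python
def acquisition_score(conf_gdino: float, attn_dino: float,
                      sim_fg: float, sim_bg: float) -> float:
    """Weighted fusion of the three VLM signals.

    The CLIP terms act as a foreground/background gate: similarity to
    foreground content adds to the score, background similarity subtracts.
    """
    return 0.5 * conf_gdino + 0.3 * attn_dino + 0.2 * sim_fg - 0.2 * sim_bg


# A confident, salient detection that looks unlike the background
# scores high; a low-confidence background patch scores near zero.
print(acquisition_score(0.9, 0.8, 0.6, 0.1))
```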
A temporal deduplication module tracks previously queried objects via embedding similarity, reducing redundant annotation queries by ~69%.
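The deduplication idea can be sketched as cosine-similarity matching against a memory of previously queried embeddings. The class name, threshold, and memory policy below are illustrative assumptions, not the actual `TemporalDeduplicator` implementation:

```python
import numpy as np


class EmbeddingDedup:
    """Skip proposals whose embedding is close to a previously queried one."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold  # cosine-similarity cutoff (assumed value)
        self.memory: list[np.ndarray] = []

    def is_duplicate(self, emb: np.ndarray) -> bool:
        emb = emb / np.linalg.norm(emb)  # unit-normalize for cosine similarity
        for seen in self.memory:
            if float(emb @ seen) >= self.threshold:
                return True
        self.memory.append(emb)  # first sighting: remember it
        return False


dedup = EmbeddingDedup()
a = np.array([1.0, 0.0, 0.0])
print(dedup.is_duplicate(a))        # False: stored as a new object
print(dedup.is_duplicate(3.0 * a))  # True: same direction after normalization
```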
Each round, the top-scoring proposals are labeled (by human or oracle), added to the training set, and a YOLOv8 detector is retrained. The loop repeats until convergence.
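The loop can be summarized in sketch form; `oracle_label` and `retrain_yolo` are hypothetical placeholders standing in for the package's annotation and training components:

```python
def active_learning_loop(pipeline, unlabeled_images, oracle_label,
                         retrain_yolo, rounds=5, budget_per_round=20):
    """One FMAL run: score proposals, query top-k labels, retrain, repeat."""
    train_set = []
    for _ in range(rounds):
        # Score every candidate detection in the unlabeled pool.
        proposals = []
        for img in unlabeled_images:
            proposals.extend(pipeline.process_image(img)["novel_objects"])
        # Spend the annotation budget on the top-scoring proposals only.
        proposals.sort(key=lambda p: p["score"], reverse=True)
        for p in proposals[:budget_per_round]:
            train_set.append(oracle_label(p))
        # Retrain the downstream detector on the grown training set.
        retrain_yolo(train_set)
    return train_set
```

In practice the loop would also remove queried objects from the pool (the deduplication module handles this) and stop early on convergence; both are omitted here for brevity.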
## Project structure

```
cane_robotics/
  pipeline/      Core active learning pipeline, offline replay, ROS node
  models/        Foundation model wrappers (GDINO, CLIP, DINO, dedup)
  dataset/       Dataset management and augmentation
  config/        Experiment configuration (dataclasses + YAML)
  experiments/   Experiment runners, ablations, sim2real evaluation
  training/      YOLO training and dataset preparation
  sim/           Isaac Sim synthetic data generation
  tools/         Annotation GUI, result plotting
```
## Python API

```python
from cane_robotics import (
    ActiveLearningPipeline,
    create_gdino_pipeline,
    ExperimentConfig,
    DatasetManager,
    TemporalDeduplicator,
)

# Create pipeline with full multi-VLM acquisition
pipeline = create_gdino_pipeline(
    known_classes=["mug", "bowl", "can"],
    acquisition_type="full",
    enable_dedup=True,
)

# Process a single image
result = pipeline.process_image("frame_001.jpg")
for obj in result["novel_objects"]:
    print(f"{obj['label']} (score={obj['score']:.3f})")
```

## Acquisition variants

The experiment framework supports 8 acquisition function variants for systematic comparison:
| Variant | Description |
|---|---|
| `full` | All three VLM signals combined (default) |
| `random` | Random scoring baseline |
| `gdino_only` | GroundingDINO confidence only |
| `clip_only` | CLIP novelty signal only |
| `dino_only` | DINO attention only |
| `no_fg_bg_gate` | Full formula without foreground/background gating |
| `no_dedup` | Full scoring with deduplication disabled |
| `no_sam` | Full scoring with SAM splitting disabled |
## Dependencies

Core: numpy, pyyaml, torch, torchvision, ultralytics, opencv-python, Pillow, transformers

Optional extras:
- `[sim]` -- Isaac Sim for synthetic data generation
- `[dev]` -- pytest, ruff for development
## License

MIT