Skip to content

Feat/2036 with 2037#2107

Open
bogwi wants to merge 47 commits into
mainfrom
feat/2036-with-2037
Open

Feat/2036 with 2037#2107
bogwi wants to merge 47 commits into
mainfrom
feat/2036-with-2037

Conversation

@bogwi
Copy link
Copy Markdown
Collaborator

@bogwi bogwi commented May 16, 2026

Supersedes #2098 so self-hosted CI can run. Prior review discussion is preserved there.


This ship AprilTag marker 3D detector as per #2036

Goes in two parts:

I ) included a camera calibration tool - dedicated utility to calibrate camera and obtain camera_info.yaml
dimos/dimos/utils/cli/cameracalibrate

uv run pytest dimos/utils/cli/cameracalibrate

How to calibrate is explained in the dimos/docs/usage/camera_calibration.md


II ) The detector module here dimos/dimos/perception/fiducial
The module has been tested in real-life detecting

  • individual tags
  • a group of twelve tags
    all tried on on distances up to 4m, and various yaw, pitch, roll changes, slant changes.

verified streaming into rerun.
APRILTAG_36h11_12_A4_rerun_screenshot_2026-05-16 at 7 06 52

or partial detection, notice correctly detected tags on the bottom

APRILTAG_36h11_12_A4_rerun_screenshot_2026-05-16 at 7 17 08

If you have obtained for your camera a camera_info.yaml after the calibration step then you can sub it here dimos/dimos/perception/fiducial/blueprints/fixtures/camera_info.yaml and reproduce testing steps as

Manual sequence (two terminals from dimos repo, uv run dimos):

  1. term1: uv run dimos stop then uv run dimos run desk-marker-tf --daemon — note the printed Log: path.
  2. term2: uv run dimos rerun-bridge — default opens a native viewer (--rerun-open web for browser, none if headless). Waits until Ctrl+C.
  3. In the Rerun timeline / 3D view, expand entities under world/tf/ and confirm base_link, camera_optical, marker_tf/markers, marker_tf/marker_<id> when the printed tag is in view (markers appear in bursts matching detection).
  4. End: Ctrl+C on the bridge, uv run dimos stop.

Python tests are based on:

  • fixture PNGs load; (obtained from rerun)
  • OpenCV detects expected AprilTag IDs;
  • detected corners match the generated 12-tag PDF layout via homography;
  • swapped IDs/layout mismatches fail;
  • MarkerTfModule publishes expected marker frames with finite transforms;
  • AprilTag PDF generator layout remains stable;
  • existing MarkerTf unit behavior still passes.

uv run pytest dimos/perception/fiducial

leshy and others added 30 commits May 9, 2026 22:01
Switch from mypy-ignore to types-reportlab>=4.5.0 (matches reportlab 4.5
in deps), matching the project's pattern for the other ~15 types-* packages.
The stubs immediately caught a real bug — Canvas.setKeywords expects str |
None, not list[str].
Add a top-level `pytest.importorskip("cv2.aruco")` if not already present so CI without the extra skips, not errors.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 16, 2026

Greptile Summary

This PR introduces ArUco/AprilTag marker-based pose estimation (MarkerTfModule), a desk-camera blueprint (DeskStaticTfModule + webcam), an interactive camera calibration CLI (dimos cameracalibrate), and fixture-image verification helpers — along with thorough unit and integration tests for each piece.

  • MarkerTfModule: subscribes to color_image + camera_info, runs solvePnP (IPPE_SQUARE) for each detected marker, and publishes the full world → markers → marker_{id} TF chain. The desk blueprint wires a webcam and a fixed TF publisher that republishes at 10 Hz so TF lookups at image timestamps stay within tolerance.
  • Camera calibration CLI (dimos cameracalibrate / standalone cameracalibrate calibrate): supports webcam and folder sources, auto-detects chessboard orientation/count interpretation, and writes ROS CameraInfo YAML; RuntimeError from camera-open failures and early quit is not caught in either CLI handler, exposing raw tracebacks to the user.
  • fixture_verification.py: validates checked-in board images against the generated PDF layout via homography residuals; imports private _grid_layout (coupling risk), and emits a misleading "0.00px threshold" rejection message when no markers are detected in a positive frame.

Confidence Score: 3/5

Safe to review further but two CLI commands will show raw Python tracebacks on webcam failures before those are fixed.

Both CLI entry points only catch ValueError while _capture_frames_from_webcam raises RuntimeError for camera-open failures, repeated read failures, and user quit — so any of those conditions produce an unformatted traceback instead of a clean error message for the user. The rest of the new code — MarkerTfModule, the desk blueprint, fixture verification, and the test suite — is well-structured and has good coverage, but fixture_verification.py carries a fragile hidden dependency on a private layout symbol.

dimos/robot/cli/dimos.py and dimos/utils/cli/cameracalibrate/cameracalibrate.py both need their exception handlers widened; dimos/perception/fiducial/fixture_verification.py warrants a second look for the private import and the misleading rejection message.

Important Files Changed

Filename Overview
dimos/perception/fiducial/marker_tf_module.py New module: subscribes to color_image + camera_info, runs ArUco/AprilTag detection via solvePnP, and publishes marker poses on the TF bus. Logic looks sound; TF chain composition (world→base→optical→marker) is correct.
dimos/perception/fiducial/fixture_verification.py New helper for AprilTag board image validation: imports private _grid_layout (fragile coupling), and the rejection message for zero-detection positive frames is misleading.
dimos/utils/cli/cameracalibrate/cameracalibrate.py New camera calibration CLI: chessboard detection, webcam capture loop, and ROS CameraInfo YAML writer are well-structured; RuntimeError from webcam failures is not caught in the calibrate command, producing raw tracebacks.
dimos/robot/cli/dimos.py Adds cameracalibrate sub-command; only catches ValueError but RuntimeError can escape from webcam capture, exposing raw tracebacks to CLI users.
dimos/robot/cli/topic.py Refactors inline LCM message decoding into _decode_typed_lcm_message helper; test added; behavior is equivalent to the old code.
dimos/perception/fiducial/blueprints/desk_marker_tf.py New blueprint wiring DeskStaticTfModule, CameraModule (webcam), and MarkerTfModule; static TF republishing at 10 Hz and graceful thread shutdown look correct.
dimos/robot/all_blueprints.py Registers desk-marker-tf blueprint and desk-static-tf-module / marker-tf-module entries; straightforward registry additions.

Sequence Diagram

sequenceDiagram
    participant WC as Webcam
    participant CM as CameraModule
    participant DST as DeskStaticTfModule
    participant MTF as MarkerTfModule
    participant TF as TF Bus

    DST->>TF: "publish(world→base_link, base_link→camera_optical) @ 10 Hz"
    WC->>CM: raw BGR frames
    CM->>MTF: color_image + camera_info
    MTF->>TF: "get(world→base_link, ts, tol=0.5s)"
    TF-->>MTF: T_world_base
    MTF->>TF: "get(base_link→camera_optical, ts, tol=0.5s)"
    TF-->>MTF: T_base_optical
    Note over MTF: detectMarkers → solvePnP per marker
    MTF->>TF: publish(world→markers, markers→marker_N …)
Loading

Reviews (1): Last reviewed commit: "Ignore layout.tags for false positive; r..." | Re-trigger Greptile

Comment thread dimos/robot/cli/dimos.py
Comment on lines +714 to +718
f"({result.get('pattern_label', 'requested inner corners')})"
)
if out is not None:
typer.echo(f"Wrote camera info YAML to {out}")
if preview_out is not None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 RuntimeError from webcam failures surfaces as raw Python tracebacks

_capture_frames_from_webcam raises RuntimeError when the camera device fails to open, when consecutive reads fail, or when the user quits before the target_count is reached. Only ValueError is caught here (and in the mirror calibrate command in cameracalibrate.py), so any of those conditions will produce an unformatted traceback for the user instead of a typer.BadParameter message.

Suggested change
f"({result.get('pattern_label', 'requested inner corners')})"
)
if out is not None:
typer.echo(f"Wrote camera info YAML to {out}")
if preview_out is not None:
try:
result = run_calibration(
source=source,
device_index=device_index,
images=images,
cols=cols,
rows=rows,
square_size_m=square_size_m,
out=out,
preview_out=preview_out,
camera_name=camera_name,
target_count=target_count,
no_display=no_display,
)
except (ValueError, RuntimeError) as exc:
raise typer.BadParameter(str(exc)) from exc

Comment on lines +630 to +645
try:
result = run_calibration(
source=source,
device_index=device_index,
images=images,
cols=cols,
rows=rows,
square_size_m=square_size_m,
out=out,
preview_out=preview_out,
camera_name=camera_name,
target_count=target_count,
no_display=no_display,
)
except ValueError as exc:
raise typer.BadParameter(str(exc)) from exc
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 The same RuntimeError-not-caught issue exists in the standalone calibrate command. Webcam camera-open failures, read failures, and early quit all raise RuntimeError which escapes as a raw traceback here too.

Suggested change
try:
result = run_calibration(
source=source,
device_index=device_index,
images=images,
cols=cols,
rows=rows,
square_size_m=square_size_m,
out=out,
preview_out=preview_out,
camera_name=camera_name,
target_count=target_count,
no_display=no_display,
)
except ValueError as exc:
raise typer.BadParameter(str(exc)) from exc
try:
result = run_calibration(
source=source,
device_index=device_index,
images=images,
cols=cols,
rows=rows,
square_size_m=square_size_m,
out=out,
preview_out=preview_out,
camera_name=camera_name,
target_count=target_count,
no_display=no_display,
)
except (ValueError, RuntimeError) as exc:
raise typer.BadParameter(str(exc)) from exc

"""Run detector, classification, and PDF-layout checks for one manifest row."""
detection = detect_apriltag_frame(
repo_root / frame["image_path"],
manifest["fixture"]["opencv_dictionary"],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Dependency on private _grid_layout creates fragile coupling

fixture_verification.py imports _grid_layout (a leading-underscore private symbol) from dimos.utils.cli.apriltag. Because fixture_verification must reproduce the exact PDF layout algorithm to validate fixtures, any change to _grid_layout's spacing, arguments, or tile geometry would silently break layout verification — detected markers would be compared against stale expected positions, causing spurious rejection of valid board images. Consider either making _grid_layout part of a shared public API or co-locating the layout logic where both the generator and verifier can share it.

Comment on lines +395 to +400
if is_positive:
if board_layout_geometry is None or not board_layout_geometry.ok:
reject_reasons.append(
"Board layout homography exceeds p95 threshold: "
f"{metrics.board_layout_error_px_p95:.2f}px"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Misleading rejection reason when a positive frame detects no markers

When board_layout_geometry is None because no markers were detected, metrics.board_layout_error_px_p95 resolves to 0.0, so the rejection reason reads "Board layout homography exceeds p95 threshold: 0.00px". Zero pixels is not a geometry error — the real problem is that no markers were detected. Distinguishing the two failure modes makes the rejection reasons actionable for callers.

Suggested change
if is_positive:
if board_layout_geometry is None or not board_layout_geometry.ok:
reject_reasons.append(
"Board layout homography exceeds p95 threshold: "
f"{metrics.board_layout_error_px_p95:.2f}px"
)
if is_positive:
if board_layout_geometry is None:
reject_reasons.append("No markers detected; cannot verify board layout geometry")
elif not board_layout_geometry.ok:
reject_reasons.append(
"Board layout homography exceeds p95 threshold: "
f"{metrics.board_layout_error_px_p95:.2f}px"
)

@bogwi bogwi mentioned this pull request May 16, 2026

class DeskStaticTfModuleConfig(ModuleConfig):
world_frame: str = "world"
base_frame: str = "base_link"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does marker care about world, base link etc, those frames are emitted by other modules, it should only care about camera_optical -> it's own detections.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without folding in world -> base -> optical, you would only have optical <- marker.
Then:

A marker sitting on a desk would jump in world whenever the robot moves because optical moves with the robot.
Nav, planning, maps, and multi-module stacks that already reason in world / map and base_link would each have to repeat the same chain as I understand: look up base and camera, compose with marker, and stay in sync on timestamps.

)


class DeskStaticTfModuleConfig(ModuleConfig):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I want to plug this into for example go2
https://github.com/dimensionalOS/dimos/blob/main/dimos/robot/unitree/go2/blueprints/smart/unitree_go2.py#L34

how I should add this module? How do I tell it what's the CameraInfo for that robot?

Copy link
Copy Markdown
Collaborator Author

@bogwi bogwi May 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unitree_go2 = autoconnect(
    unitree_go2_basic,
    VoxelGridMapper.blueprint(),
    CostMapper.blueprint(),
    ReplanningAStarPlanner.blueprint(),
    WavefrontFrontierExplorer.blueprint(),
    PatrollingModule.blueprint(),
    MovementManager.blueprint(),
).global_config(n_workers=10, robot_model="unitree_go2")

You add the fiducial module the same way

from dimos.perception.fiducial.marker_tf_module import MarkerTfModule

unitree_go2 = autoconnect(
    unitree_go2_basic,
    MarkerTfModule.blueprint(
        marker_length_m=...,  # physical edge length of the printed tag, meters
        # optional: aruco_dictionary, marker_namespace_prefix, world_frame, base_frame, max_freq, ...
    ),
    VoxelGridMapper.blueprint(),
    # ... rest unchanged
).global_config(n_workers=10, robot_model="unitree_go2")

You do not pass CameraInfo into MarkerTfModule config. That module has In[CameraInfo] (and In[Image]) and uses whatever stream is connected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants