Feat/2036 with 2037 #2107
…nto apriltag-generator
Switch from mypy-ignore to types-reportlab>=4.5.0 (matches reportlab 4.5 in deps), matching the project's pattern for the other ~15 types-* packages. The stubs immediately caught a real bug — Canvas.setKeywords expects str | None, not list[str].
# Conflicts: # uv.lock
Add a top-level `pytest.importorskip("cv2.aruco")` if not already present so CI without the extra skips, not errors.
…annel: str, data: bytes) -> None
# Conflicts: # dimos/robot/cli/dimos.py # pyproject.toml # uv.lock
Greptile Summary
This PR introduces ArUco/AprilTag marker-based pose estimation (
Confidence Score: 3/5
Safe to review further, but until the exception handlers are fixed, two CLI commands will show raw Python tracebacks on webcam failures.
Both CLI entry points catch only ValueError, while _capture_frames_from_webcam raises RuntimeError for camera-open failures, repeated read failures, and user quit, so any of those conditions produces an unformatted traceback instead of a clean error message for the user. The rest of the new code (MarkerTfModule, the desk blueprint, fixture verification, and the test suite) is well-structured and has good coverage, but fixture_verification.py carries a fragile hidden dependency on a private layout symbol.
dimos/robot/cli/dimos.py and dimos/utils/cli/cameracalibrate/cameracalibrate.py both need their exception handlers widened; dimos/perception/fiducial/fixture_verification.py warrants a second look for the private import and the misleading rejection message.
Sequence Diagram
sequenceDiagram
participant WC as Webcam
participant CM as CameraModule
participant DST as DeskStaticTfModule
participant MTF as MarkerTfModule
participant TF as TF Bus
DST->>TF: "publish(world→base_link, base_link→camera_optical) @ 10 Hz"
WC->>CM: raw BGR frames
CM->>MTF: color_image + camera_info
MTF->>TF: "get(world→base_link, ts, tol=0.5s)"
TF-->>MTF: T_world_base
MTF->>TF: "get(base_link→camera_optical, ts, tol=0.5s)"
TF-->>MTF: T_base_optical
Note over MTF: detectMarkers → solvePnP per marker
MTF->>TF: publish(world→markers, markers→marker_N …)
Reviews (1): Last reviewed commit: "Ignore layout.tags for false positive; r..."
        f"({result.get('pattern_label', 'requested inner corners')})"
    )
if out is not None:
    typer.echo(f"Wrote camera info YAML to {out}")
if preview_out is not None:
RuntimeError from webcam failures surfaces as raw Python tracebacks
_capture_frames_from_webcam raises RuntimeError when the camera device fails to open, when consecutive reads fail, or when the user quits before the target_count is reached. Only ValueError is caught here (and in the mirror calibrate command in cameracalibrate.py), so any of those conditions will produce an unformatted traceback for the user instead of a typer.BadParameter message.
Suggested change:
try:
    result = run_calibration(
        source=source,
        device_index=device_index,
        images=images,
        cols=cols,
        rows=rows,
        square_size_m=square_size_m,
        out=out,
        preview_out=preview_out,
        camera_name=camera_name,
        target_count=target_count,
        no_display=no_display,
    )
except (ValueError, RuntimeError) as exc:
    raise typer.BadParameter(str(exc)) from exc
try:
    result = run_calibration(
        source=source,
        device_index=device_index,
        images=images,
        cols=cols,
        rows=rows,
        square_size_m=square_size_m,
        out=out,
        preview_out=preview_out,
        camera_name=camera_name,
        target_count=target_count,
        no_display=no_display,
    )
except ValueError as exc:
    raise typer.BadParameter(str(exc)) from exc
The same RuntimeError-not-caught issue exists in the standalone calibrate command
Webcam camera-open failures, read failures, and early quit all raise RuntimeError, which escapes as a raw traceback here too.
Suggested change:
try:
    result = run_calibration(
        source=source,
        device_index=device_index,
        images=images,
        cols=cols,
        rows=rows,
        square_size_m=square_size_m,
        out=out,
        preview_out=preview_out,
        camera_name=camera_name,
        target_count=target_count,
        no_display=no_display,
    )
except (ValueError, RuntimeError) as exc:
    raise typer.BadParameter(str(exc)) from exc
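The failure mode flagged in both comments can be reproduced without a camera. This is a minimal sketch under stated assumptions: `BadParameter` is a plain exception standing in for `typer.BadParameter`, and `fake_capture` is a hypothetical substitute for `_capture_frames_from_webcam` that always raises `RuntimeError`. Neither is the project's actual code; the point is only that `RuntimeError` is not a subclass of `ValueError`, so the narrow handler lets it through.

```python
class BadParameter(Exception):
    """Stand-in for typer.BadParameter (assumption, for illustration only)."""


def fake_capture() -> None:
    # Hypothetical substitute for _capture_frames_from_webcam:
    # simulates a camera device that fails to open.
    raise RuntimeError("could not open camera device 0")


def narrow_handler() -> str:
    """Mirrors the original handler: only ValueError is translated."""
    try:
        fake_capture()
    except ValueError as exc:
        raise BadParameter(str(exc)) from exc
    return "ok"


def widened_handler() -> str:
    """Mirrors the suggested handler: RuntimeError is translated as well."""
    try:
        fake_capture()
    except (ValueError, RuntimeError) as exc:
        raise BadParameter(str(exc)) from exc
    return "ok"


def outcome(handler) -> str:
    """Return the name of the exception type that escapes the handler."""
    try:
        handler()
    except Exception as exc:
        return type(exc).__name__
    return "no error"


print("narrow handler lets through:", outcome(narrow_handler))
print("widened handler raises:", outcome(widened_handler))
```

With the narrow handler the `RuntimeError` reaches the caller untranslated, which in a CLI means a raw traceback; with the widened tuple it is converted into the user-facing error type.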
"""Run detector, classification, and PDF-layout checks for one manifest row."""
detection = detect_apriltag_frame(
    repo_root / frame["image_path"],
    manifest["fixture"]["opencv_dictionary"],
Dependency on private _grid_layout creates fragile coupling
fixture_verification.py imports _grid_layout (a leading-underscore private symbol) from dimos.utils.cli.apriltag. Because fixture_verification must reproduce the exact PDF layout algorithm to validate fixtures, any change to _grid_layout's spacing, arguments, or tile geometry would silently break layout verification — detected markers would be compared against stale expected positions, causing spurious rejection of valid board images. Consider either making _grid_layout part of a shared public API or co-locating the layout logic where both the generator and verifier can share it.
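One way to read the suggestion above is a small public layout function that both the PDF generator and the verifier import. The sketch below is hypothetical throughout: `TagCell`, `grid_layout`, and its parameters are illustrative names, and the row-major, fixed-pitch geometry is an assumption, not the actual behavior or signature of `_grid_layout`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagCell:
    """One tag's placement on the page (hypothetical structure)."""

    tag_id: int
    x_mm: float  # left edge of the tag
    y_mm: float  # top edge of the tag
    size_mm: float


def grid_layout(
    first_id: int,
    cols: int,
    rows: int,
    tag_size_mm: float,
    gap_mm: float,
    margin_mm: float,
) -> list[TagCell]:
    """Public, shared layout: row-major grid with a fixed pitch (assumed geometry)."""
    pitch = tag_size_mm + gap_mm
    return [
        TagCell(
            tag_id=first_id + row * cols + col,
            x_mm=margin_mm + col * pitch,
            y_mm=margin_mm + row * pitch,
            size_mm=tag_size_mm,
        )
        for row in range(rows)
        for col in range(cols)
    ]


# Generator and verifier both call the same public function, so a change to the
# spacing shows up on both sides at once instead of leaving stale expectations.
params = dict(first_id=0, cols=3, rows=2, tag_size_mm=50.0, gap_mm=10.0, margin_mm=20.0)
expected_by_generator = grid_layout(**params)
expected_by_verifier = grid_layout(**params)
print(expected_by_generator == expected_by_verifier)
```

Whether this lives in the generator module under a public name or in a separate shared module is a design choice the comment leaves open.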
if is_positive:
    if board_layout_geometry is None or not board_layout_geometry.ok:
        reject_reasons.append(
            "Board layout homography exceeds p95 threshold: "
            f"{metrics.board_layout_error_px_p95:.2f}px"
        )
Misleading rejection reason when a positive frame detects no markers
When board_layout_geometry is None because no markers were detected, metrics.board_layout_error_px_p95 resolves to 0.0, so the rejection reason reads "Board layout homography exceeds p95 threshold: 0.00px". Zero pixels is not a geometry error — the real problem is that no markers were detected. Distinguishing the two failure modes makes the rejection reasons actionable for callers.
Suggested change:
if is_positive:
    if board_layout_geometry is None:
        reject_reasons.append("No markers detected; cannot verify board layout geometry")
    elif not board_layout_geometry.ok:
        reject_reasons.append(
            "Board layout homography exceeds p95 threshold: "
            f"{metrics.board_layout_error_px_p95:.2f}px"
        )
class DeskStaticTfModuleConfig(ModuleConfig):
    world_frame: str = "world"
    base_frame: str = "base_link"
Why does the marker module care about world, base_link, etc.? Those frames are emitted by other modules; it should only care about camera_optical -> its own detections.
Without folding in world -> base -> optical, you would only have optical <- marker.
Then:
- A marker sitting on a desk would jump in world whenever the robot moves, because optical moves with the robot.
- As I understand it, nav, planning, maps, and multi-module stacks that already reason in world / map and base_link would each have to repeat the same chain: look up base and camera, compose with marker, and stay in sync on timestamps.
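The chain described in this reply can be checked numerically. The following is a rough sketch in plain Python with homogeneous 4x4 transforms, not the module's implementation: all poses are made-up values, rotations are yaw-only, and the camera's optical-axis convention is ignored for simplicity. It simulates what a detector would measure (optical <- marker) from two robot poses and then folds in world -> base -> optical.

```python
import math

Matrix = list[list[float]]


def make_transform(yaw_rad: float, x: float, y: float, z: float) -> Matrix:
    """Homogeneous transform with a rotation about Z and a translation."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, -s, 0.0, x], [s, c, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def compose(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product a @ b: apply b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert(t: Matrix) -> Matrix:
    """Inverse of a rigid transform: rotation R^T, translation -R^T p."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[0] + [p[0]], r_t[1] + [p[1]], r_t[2] + [p[2]], [0.0, 0.0, 0.0, 1.0]]


def translation(t: Matrix) -> tuple[float, float, float]:
    return (t[0][3], t[1][3], t[2][3])


# Made-up ground truth: a marker fixed on a desk, and a camera mounted on the base.
T_WORLD_MARKER = make_transform(0.0, 2.0, 1.0, 0.75)
T_BASE_OPTICAL = make_transform(0.0, 0.3, 0.0, 0.2)

# Two robot poses: at the origin, then moved and turned by 90 degrees.
robot_poses = [
    make_transform(0.0, 0.0, 0.0, 0.0),
    make_transform(math.pi / 2, 1.0, 0.5, 0.0),
]

optical_positions = []
world_positions = []
for t_world_base in robot_poses:
    t_world_optical = compose(t_world_base, T_BASE_OPTICAL)
    # What the detector would report: the marker pose in the optical frame.
    t_optical_marker = compose(invert(t_world_optical), T_WORLD_MARKER)
    # Folding in world -> base -> optical recovers the marker pose in world.
    t_world_marker = compose(t_world_optical, t_optical_marker)
    optical_positions.append(translation(t_optical_marker))
    world_positions.append(translation(t_world_marker))

for opt, world in zip(optical_positions, world_positions):
    print("optical:", [round(v, 3) for v in opt], "world:", [round(v, 3) for v in world])
```

The optical-frame position differs between the two robot poses while the composed world-frame position stays the same, which is the property the reply relies on.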
)


class DeskStaticTfModuleConfig(ModuleConfig):
If I want to plug this into, for example, go2
https://github.com/dimensionalOS/dimos/blob/main/dimos/robot/unitree/go2/blueprints/smart/unitree_go2.py#L34
how should I add this module? How do I tell it what the CameraInfo is for that robot?
unitree_go2 = autoconnect(
    unitree_go2_basic,
    VoxelGridMapper.blueprint(),
    CostMapper.blueprint(),
    ReplanningAStarPlanner.blueprint(),
    WavefrontFrontierExplorer.blueprint(),
    PatrollingModule.blueprint(),
    MovementManager.blueprint(),
).global_config(n_workers=10, robot_model="unitree_go2")

You add the fiducial module the same way:

from dimos.perception.fiducial.marker_tf_module import MarkerTfModule

unitree_go2 = autoconnect(
    unitree_go2_basic,
    MarkerTfModule.blueprint(
        marker_length_m=...,  # physical edge length of the printed tag, meters
        # optional: aruco_dictionary, marker_namespace_prefix, world_frame, base_frame, max_freq, ...
    ),
    VoxelGridMapper.blueprint(),
    # ... rest unchanged
).global_config(n_workers=10, robot_model="unitree_go2")

You do not pass CameraInfo into MarkerTfModule config. That module has In[CameraInfo] (and In[Image]) and uses whatever stream is connected.
Supersedes #2098 so self-hosted CI can run. Prior review discussion is preserved there.
This ships an AprilTag marker 3D detector as per #2036.
It comes in two parts:
I) A camera calibration tool: a dedicated utility to calibrate a camera and obtain camera_info.yaml.
dimos/dimos/utils/cli/cameracalibrate
uv run pytest dimos/utils/cli/cameracalibrate
How to calibrate is explained in dimos/docs/usage/camera_calibration.md.
II) The detector module, in dimos/dimos/perception/fiducial.
The module has been tested in real life: every case was tried at distances up to 4 m, with various yaw, pitch, roll, and slant changes.
Streaming into rerun was verified.

Or partial detection: note the correctly detected tags at the bottom.
If you have obtained a camera_info.yaml for your camera from the calibration step, you can substitute it at dimos/dimos/perception/fiducial/blueprints/fixtures/camera_info.yaml and reproduce the testing steps as follows.
Manual sequence (two terminals from the dimos repo, `uv run dimos`):
1. `uv run dimos stop`, then `uv run dimos run desk-marker-tf --daemon`; note the printed `Log:` path.
2. `uv run dimos rerun-bridge`; by default this opens a native viewer (`--rerun-open web` for browser, `none` if headless). Waits until Ctrl+C.
3. In the viewer, look under `world/tf/` and confirm `base_link`, `camera_optical`, `marker_tf/markers`, and `marker_tf/marker_<id>` when the printed tag is in view (markers appear in bursts matching detection).
4. `uv run dimos stop`.

Python tests are based on:
uv run pytest dimos/perception/fiducial