
feat(memory2): Space raster backend + experimental memory2 agent #2188

Open: Mgczacki wants to merge 8 commits into dimensionalOS:main from Mgczacki:memory2-vis-and-agent

Conversation

@Mgczacki
Collaborator

@Mgczacki Mgczacki commented May 20, 2026

An approach for #1913

What does it do?

Two things in one PR:

  1. memory2 Space gets a cv2 raster backend. Space.to_bgr() / Space.to_png() produce the same world view a vision LLM gets, alongside the existing SVG and Rerun renderers. Plus the elements you need to draw on top of an occupancy map: Polygon, Wedge, RasterOverlay, and Point.shape / halo variants.

  2. An experimental LangChain agent at dimos/memory2/experimental/memory2_agent/. Given a memory2 database, it introspects every stream and, where a tool is compatible with the stream's modality, fuses modalities to build rich temporal-spatial representations, validate domain structure, and take measurements on spatial data.

How do we achieve this?

(rendering)

  1. Raster backend mirrors the SVG backend: walks space.elements, accumulates a world-frame Bounds, paints onto a BGR ndarray. The Bounds class is lifted into dimos/memory2/vis/space/bounds.py so both backends share it.
  2. Three new element types — Polygon, Wedge, RasterOverlay — cover what the agent draws on top of occupancy maps: room boundaries, camera FOV cones, arbitrary world-frame masks (overclaim highlights, heatmaps, etc.).
  3. Point grew shape ("dot" / "cross" / "x" / "square") and halo (black underlay), so markers stay readable on busy maps.
  4. resolve_deferred now walks color, fill, AND stroke, so cmap-based colors work on Polygon's two-color split.
  5. PointCloud2 only gets inflated once per render — the bounds pass and the draw pass share a per-call cache keyed by id(el).
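
The per-call cache in item 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: `Cloud`, `inflate`, and `render` are hypothetical stand-ins, and the "inflation" here is a trivial copy standing in for the expensive PointCloud2 work.

```python
from dataclasses import dataclass


@dataclass
class Cloud:
    """Hypothetical stand-in for an element that is expensive to expand."""
    points: list[tuple[float, float]]


inflate_calls = 0


def inflate(cloud: Cloud) -> list[tuple[float, float]]:
    """Placeholder for the costly step; counts how often it runs."""
    global inflate_calls
    inflate_calls += 1
    return [(x, y) for x, y in cloud.points]


def render(elements: list[Cloud]) -> tuple[tuple[float, float, float, float], int]:
    # The cache lives only for this call, so id(el) stays unique:
    # every element is alive for the whole render.
    cache: dict[int, list[tuple[float, float]]] = {}

    def inflated(el: Cloud) -> list[tuple[float, float]]:
        key = id(el)
        if key not in cache:
            cache[key] = inflate(el)
        return cache[key]

    # Pass 1: accumulate world-frame bounds (min_x, min_y, max_x, max_y).
    xs = [x for el in elements for x, _ in inflated(el)]
    ys = [y for el in elements for _, y in inflated(el)]
    bounds = (min(xs), min(ys), max(xs), max(ys))

    # Pass 2: "draw" (here: just count points), reusing the cached expansion.
    drawn = sum(len(inflated(el)) for el in elements)
    return bounds, drawn


clouds = [Cloud([(0.0, 0.0), (2.0, 1.0)]), Cloud([(-1.0, 3.0)])]
bounds, drawn = render(clouds)
```

Keying by `id()` is only safe because the cache never outlives the elements; a cache that persisted across renders would need a different key.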

(agent)

  1. Memory2 introspection: tools to reason over memory2 types (listing streams, summarizing them, fetching recent observations).
  2. Map rendering: render occupancy maps and place points on them, giving the model a visual reference for coordinates.
  3. Temporal indexing: timestamps give a consistent key for joining modalities, regardless of sampling rate and measurement error.
  4. Searching for entities: find (through CLIP) which entities exist in an embedded stream, and when they appear.
  5. Searching for coordinates: find (through raytracing) whether a 2D coordinate has been observed in any of the images the system captured.
  6. Walkthrough tools: let the system step through the frames of a heavily downsampled video stream to understand, frame by frame, what is happening.
  7. Calculations: a minimized Python REPL for mathematical operations.
  8. Room sizing: propose bounding-polygon points on the occupancy map, then grow or shrink the polygons based on visual evidence (images), out-of-room lidar measurements, out-of-room odom measurements, and room intersections, iteratively refining the borders between rooms.
  9. Skills: harder procedures that can be built by composing the tools above. The skill set covers room sizing/counting (room_extents), describing a specific room (describe_room), finding exploration frontiers (unexplored_spaces), counting unique instances of a kind of thing across many frames (count_unique_things), distances, object positions, and reasoning from another entity's viewpoint.
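
The temporal join in item 3 can be illustrated with a nearest-timestamp lookup. This is a sketch only: `nearest_index` and the sample data are hypothetical, and the agent's actual join logic is not shown in this PR description.

```python
from bisect import bisect_left


def nearest_index(sorted_ts: list[float], t: float) -> int:
    """Index of the timestamp in sorted_ts closest to t (ties go to the earlier one)."""
    if not sorted_ts:
        raise ValueError("empty timestamp list")
    i = bisect_left(sorted_ts, t)
    if i == 0:
        return 0
    if i == len(sorted_ts):
        return len(sorted_ts) - 1
    before, after = sorted_ts[i - 1], sorted_ts[i]
    return i if (after - t) < (t - before) else i - 1


# Hypothetical streams sampled at different rates.
odom_ts = [0.0, 0.1, 0.2, 0.3, 0.4]
odom_xy = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0), (0.4, 0.0)]
image_ts = [0.04, 0.26, 0.9]

# Attach the nearest odometry pose to each image timestamp.
joined = [(t, odom_xy[nearest_index(odom_ts, t)]) for t in image_ts]
```

A real join would likely also reject matches whose time gap exceeds some threshold; the last image above (t=0.9) is paired with a pose half a second old.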

Full tool/skill inventory in dimos/memory2/experimental/memory2_agent/README.md (15 tools, 7 skills).

Examples (what the agent sees)

The screenshots below are real tool returns from a memory2 agent run.

show_map — top-down lidar map with the robot's pose pinned. The agent uses this to orient itself in world coordinates.

[Screenshot: show_map]

frames_facing — top-down view with viewing cones. Given a world (x, y), the tool overlays the cones of the camera frames whose field of view could contain that point. Used for finding which recorded frames "saw" a target location.

[Screenshot: frames_facing cones]
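
A 2D version of the "could this frame contain the point" test might look like the sketch below. The function name `in_fov`, the 90° field of view, and the 10 m range are illustrative assumptions; the check also ignores wall occlusion, which the PR says is handled through raytracing.

```python
import math


def in_fov(cam_x: float, cam_y: float, cam_yaw: float,
           px: float, py: float,
           hfov_deg: float = 90.0, max_range: float = 10.0) -> bool:
    """True if (px, py) lies inside the camera's horizontal viewing wedge."""
    dx, dy = px - cam_x, py - cam_y
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist > max_range:
        return False
    bearing = math.atan2(dy, dx)
    # Wrap the angular difference into [-pi, pi] so yaw near ±pi works.
    diff = math.atan2(math.sin(bearing - cam_yaw), math.cos(bearing - cam_yaw))
    return abs(diff) <= math.radians(hfov_deg) / 2.0


ahead = in_fov(0.0, 0.0, 0.0, 2.0, 0.0)               # straight ahead
behind = in_fov(0.0, 0.0, 0.0, -2.0, 0.0)             # directly behind
wrapped = in_fov(0.0, 0.0, math.pi, -2.0, 0.1)        # camera facing -x
too_far = in_fov(0.0, 0.0, 0.0, 20.0, 0.0)            # beyond max_range
```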

verify_room_partition — map with the agent's room polygons + per-room areas. The agent submits candidate polygons; the tool overlays them and flags issues (overlap, unpartitioned floor blobs, odometry outside any room).

[Screenshot: verify_room_partition]
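
For intuition about the per-room areas, the area of a simple polygon can be computed with the shoelace formula. This is a sketch; `polygon_area` is a hypothetical helper, and the tool itself may well derive areas from rasterized masks instead.

```python
def polygon_area(vertices: list[tuple[float, float]]) -> float:
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula."""
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    twice_area = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


# A 10 m x 8 m rectangular room, in world-frame metres.
rect_area = polygon_area([(0.0, 0.0), (10.0, 0.0), (10.0, 8.0), (0.0, 8.0)])

# An L-shaped room: a 4 x 4 square with a 2 x 2 corner removed.
l_area = polygon_area([(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)])
```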

walkthrough — annotated frame strip across a time range. Each tile is captioned with t, robot (x, y), and yaw. Used for summarising what was visible across a stretch of the walk in one call.

[Screenshot: walkthrough]

frames_facing (per-frame) — recorded camera frame with the query point reprojected as a red X. Used to verify whether a candidate (x, y) actually lands on the target object: if the red cross sits on the body of the thing across multiple views, the position is right.

[Screenshot: frames_facing red-X]

How do we test the agent?

End-to-end tests live in dimos/memory2/experimental/test_memory2_agent_ask.py. They run the real LangChain agent against a recorded SqliteStore + a live OpenAI model, so they're gated behind the new experimental marker and excluded from the default pytest run.

Each prompt ends with an explicit format directive ("Reply with only the number, nothing else") so the agent commits a clean, parseable final answer instead of dumping its full reasoning chain.

Two recordings, two env vars. The repo doesn't ship the .db files.

Tool-coverage tests (go2_short.db)

These assert the agent picked the right kind of tool — they don't grade the answer's content.

  • test_lists_streams: "How many streams does this memory store have?" → expects list_streams to be called and 4 in the answer (build_memory.py writes 4 streams).
  • test_visual_question_uses_image_tool: "At t=22s show me what the robot saw directly forward and describe it in one sentence." → expects at least one of {show_image, recall_view, walkthrough, show_map, frames_facing} to be called and a non-empty answer afterwards. Confirms the langgraph Command path is end-to-end functional.

Content-grounded QA on go2_short.db: test_short_recording_qa (10 cases)

go2_short.db — a short go2 walk through an office with two rooms, two white robots, and a long meeting table. Path supplied via MEMORY2_AGENT_DB.

| id | Question (verbatim) | Expected |
| --- | --- | --- |
| rooms_count_2 | "How many rooms are there? Reply with only the number, nothing else." | 2 |
| biggest_room_area_~80m2 | "What's the area of the biggest room? Reply with only the number in m², nothing else." | ~80 m², accept 64–96 (±20%) |
| start_equals_end_room | "Did you start in the same room as you ended? Reply with only yes or no, nothing else." | yes |
| closest_to_meeting_table_2m | "What's the closest distance in meters that you got to the long meeting table? Round to whole numbers no decimals. Reply with only the number, nothing else." | ≤ 2 |
| white_robots_count_2 | "How many white robots did you pass by? Reply with only the number, nothing else." | 2 |
| white_robots_distance_apart | "What's the approximate straight-line distance in meters between the two white robots (not walking distance — the real distance, even across walls)? Round to a whole number. Reply with only the number, nothing else." | 3–6 m (inclusive) |
| man_in_black_moved_hand | "What did the man in black move at the end? Reply with only the single body part, nothing else." | hand or finger |
| multi_choice_letter_B | 4-option multi-choice: plants vs trashcans behind robots (answer is permuted to position B to dodge first-position bias) | letter B |
| exploration_waypoint_roi | "What's the highest-ROI waypoint to explore next to expand the map? Reply with only the coordinate in the format `x, y` (two numbers separated by a comma), nothing else." | ~(+4.2, +9.0), the east-lobe frontier, ±1.5 m per axis |
| passed_through_doorway_top_left | "Where is the doorway you passed through that's at the top-left of your trajectory? Reply with only the coordinate in the format `x, y` (two numbers separated by a comma), nothing else." | ~(−2.0, +9.1), the interior doorway at the upper-left bend, ±1.5 m per axis |
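
Because every prompt forces a bare number or an `x, y` pair, grading can be a small parse-and-compare step. The sketch below is an assumption about how such checks could be written, not the parser added in this PR; `parse_number`, `parse_xy`, `within`, and `xy_close` are hypothetical names.

```python
import re

_NUM = r"[-+]?\d+(?:\.\d+)?"


def _normalize(reply: str) -> str:
    # Models sometimes emit a Unicode minus sign (U+2212); map it to ASCII.
    return reply.replace("\u2212", "-")


def parse_number(reply: str) -> float:
    """First number found in the reply, e.g. '78.5 m²' -> 78.5."""
    m = re.search(_NUM, _normalize(reply))
    if m is None:
        raise ValueError(f"no number in reply: {reply!r}")
    return float(m.group())


def parse_xy(reply: str) -> tuple[float, float]:
    """Parse a reply that is exactly 'x, y'."""
    m = re.fullmatch(rf"\s*({_NUM})\s*,\s*({_NUM})\s*", _normalize(reply))
    if m is None:
        raise ValueError(f"not an 'x, y' pair: {reply!r}")
    return float(m.group(1)), float(m.group(2))


def within(value: float, lo: float, hi: float) -> bool:
    return lo <= value <= hi


def xy_close(got: tuple[float, float], want: tuple[float, float], tol: float) -> bool:
    """Per-axis tolerance, matching the '±1.5 m per axis' wording."""
    return abs(got[0] - want[0]) <= tol and abs(got[1] - want[1]) <= tol


area_ok = within(parse_number("78.5 m²"), 64.0, 96.0)
waypoint_ok = xy_close(parse_xy("4.0, 9.8"), (4.2, 9.0), 1.5)
doorway_ok = xy_close(parse_xy("\u22122.0, +9.1"), (-2.0, 9.1), 1.5)
```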

Content-grounded QA on go2_hongkong_office.db: test_hongkong_recording_qa (3 cases, new)

A longer recording of the Hong Kong office (elevator room, multiple rooms, richer layout). Path supplied via MEMORY2_AGENT_DB_HONGKONG.

| id | Question (verbatim) | Expected |
| --- | --- | --- |
| white_robots_count_2_hk | "How many white robots did you pass by? Reply with only the number, nothing else." | 2 (Need to verify) |
| elevator_room_center | "What's the center coordinate of the room with the elevators? Reply with only the coordinate in the format `x, y` (two numbers separated by a comma), nothing else." | ~(+4.55, +2.22), the boundary between the lower central corridor and the right connector, ±1.5 m per axis |
| total_floor_area | "What's the total floor area of the office, summed across all rooms, in square meters? Reply with only the number, nothing else." | ~400 m² (eyeballed), accept 300–500 (±100) to absorb polygon-tightness variance |

How to run

(default suite — experimental excluded; should stay green for everyone)

```
pytest
```

(unit tests for the new memory2 plotting surface — no LLM, no recording needed)

```
pytest dimos/memory2/vis/space/test_space.py
```

(end-to-end agent tests — opt-in, needs OpenAI + the recording(s); skips cleanly if a recording env var is unset)

```
export OPENAI_API_KEY=...
export MEMORY2_AGENT_DB=/path/to/go2_short.db # required for test_short_recording_qa
export MEMORY2_AGENT_DB_HONGKONG=/path/to/go2_hongkong_office.db # required for test_hongkong_recording_qa
export MEMORY2_AGENT_MODEL=gpt-4.1-mini # optional, default gpt-5.5
pytest -m experimental dimos/memory2/experimental/ -v
```

(one-shot CLI for ad-hoc questions)

```
python -m dimos.memory2.experimental.memory2_agent.ask \
--db /path/to/recording.db \
--model gpt-4.1-mini \
"Where is the biggest room?"
```

(broader smoke run — 7 mixed questions, no assertions, just prints traces)

```
python -m dimos.memory2.experimental.memory2_agent.run_smoke --db /path/to/recording.db
```

Out of scope

  • No new core dependencies — LangChain / `langchain_openai` / `langchain_google_genai` only get imported under `dimos/memory2/experimental/`.
  • SVG renderer behaviour is preserved (same default padding, same Pose render shape).

Mario Garrido and others added 2 commits May 19, 2026 20:25
Adds a cv2-based raster renderer for `dimos.memory2.vis.space.Space` so
maps can be sent as PNGs to vision LLMs, alongside the existing SVG +
Rerun backends. New Space elements (Polygon, Wedge, RasterOverlay) and
Point shape/halo variants cover the agent's overlay needs.

`dimos.memory2.experimental.memory2_agent` is a LangChain agent that
uses the new rendering surface to answer questions about a recorded
memory2 SqliteStore (occupancy maps, FOV cones, room polygons, image
recall). Tests are gated behind a new `experimental` pytest marker so
they don't run by default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented May 20, 2026

❌ 1 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
| --- | --- | --- | --- |
| 1855 | 1 | 1854 | 43 |
View the top 1 failed test(s) by shortest run time
dimos.project.test_no_sections::test_no_section_markers
Stack Traces | 0.766s run time
def test_no_section_markers():
        """
        Fail if any file contains section-style comment markers.
    
        If a file is too complicated to be understood without sections, then the
        sections should be files. We don't need "subfiles".
        """
        violations = find_section_markers()
        if violations:
            report_lines = [
                f"Found {len(violations)} section marker(s). "
                "If a file is too complicated to be understood without sections, "
                'then the sections should be files. We don\'t need "subfiles".',
                "",
            ]
            for path, lineno, text in violations:
                report_lines.append(f"  {path}:{lineno}: {text.strip()}")
>           raise AssertionError("\n".join(report_lines))
E           AssertionError: Found 1 section marker(s). If a file is too complicated to be understood without sections, then the sections should be files. We don't need "subfiles".
E           
E             .../experimental/memory2_agent/map_view.py:453: # --- Begin Space construction ----------------------------------------

lineno     = 453
path       = '.../experimental/memory2_agent/map_view.py'
report_lines = ['Found 1 section marker(s). If a file is too complicated to be understood without sections, then the sections should ....../experimental/memory2_agent/map_view.py:453: # --- Begin Space construction ----------------------------------------']
text       = '    # --- Begin Space construction ----------------------------------------'
violations = [('.../experimental/memory2_agent/map_view.py', 453, '    # --- Begin Space construction ----------------------------------------')]

dimos/project/test_no_sections.py:145: AssertionError


Mario Garrido and others added 2 commits May 20, 2026 01:10
…e test fixtures

- Add `describe_room` skill: answers "what's in room X" by reading frames inside the
  room (composes room_extents) instead of using semantic search, avoiding question bias.
- Add `unexplored_spaces` skill: surfaces exploration frontiers as the unpartitioned
  orange blobs flagged by verify_room_partition that aren't enclosed by walls.
- Wire MEMORY2_AGENT_DB_HONGKONG fixture + (x, y) parser for content-grounded eval
  cases bound to the larger Hong Kong office recording.
- Update README skill list (now 7 skills, including count_unique_things).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Mgczacki Mgczacki changed the title from "feat(memory2): Space raster backend + experimental LangChain agent" to "feat(memory2): Space raster backend + experimental memory2 agent" May 20, 2026
@Mgczacki Mgczacki marked this pull request as ready for review May 20, 2026 17:47
@greptile-apps
Contributor

greptile-apps Bot commented May 20, 2026

Greptile Summary

This PR delivers two related features: a cv2 raster backend for the existing memory2 Space visualisation layer (to_bgr() / to_png()), and an experimental LangChain agent that uses that rendering pipeline to answer spatial/temporal questions over a recorded SqliteStore.

  • Raster backend (raster.py, bounds.py, elements.py, svg.py, space.py): mirrors the SVG renderer's element walk; adds Polygon, Wedge, and RasterOverlay element types plus shape/halo on Point; shares the extracted Bounds class; PointCloud2 inflation is cached so the bounds and draw passes don't duplicate work.
  • Experimental agent (memory2_agent/): 15 LangChain tools covering stream listing, semantic search, walkthrough, occupancy-map rendering, room partitioning, and a sandboxed Python REPL; tests are opt-in behind the new experimental marker and skipped when db env vars are unset, so the default CI run is unaffected.

Confidence Score: 4/5

Safe to merge with the room-partition index bug acknowledged; the experimental module is opt-in and the failure is caught and returned as an error string rather than crashing the agent.

The raster backend, element additions, and agent wiring are all solid. The one real defect is in verify_room_partition in map_view.py: when any room dict in the input list lacks a valid "polygon" or "rect" key, polys_world ends up shorter than rooms, causing an IndexError in the pairwise-overlap loop and misattributed room labels in the stats. Since all of this lives under the experimental marker and the tool wrapper catches Exception, the agent sees a recoverable error string rather than a hard crash — but the room-partitioning analysis is entirely broken for those inputs until the index alignment is fixed.

dimos/memory2/experimental/memory2_agent/map_view.py — specifically the verify_room_partition function's handling of rooms that fail shape validation.

Important Files Changed

| Filename | Overview |
| --- | --- |
| dimos/memory2/experimental/memory2_agent/map_view.py | Large new file with occupancy rendering, walkthrough, and frame-visibility helpers. Contains a P1 index-alignment bug in verify_room_partition where polys_world can be shorter than rooms, leading to an IndexError in the pairwise-overlap loop and wrong room metadata in the output. |
| dimos/memory2/experimental/memory2_agent/tools.py | LangChain tool wrappers; show_image now uses narrow before/after/at queries instead of loading the full stream. known_streams is populated dynamically from store.list_streams() at build time, fixing the stale hardcoded set from earlier drafts. |
| dimos/memory2/experimental/memory2_agent/agent.py | LangGraph agent with custom _OrderedAgentState to keep parallel tool-response ToolMessages contiguous; pre-seeds the skills listing; uses create_agent from langchain.agents (newer LangChain API, confirmed as valid). |
| dimos/memory2/experimental/memory2_agent/ask.py | CLI entry-point; OPENAI_API_KEY guard fires for all models including Gemini, preventing non-OpenAI use without a workaround. |
| dimos/memory2/vis/space/raster.py | New cv2 raster renderer mirroring svg.py; shares Bounds via bounds.py; PointCloud2 inflated grid cached per-call between bounds and draw passes; alpha compositing and all element types correctly handled. |
| dimos/memory2/vis/space/elements.py | Added Polygon, RasterOverlay, Wedge dataclasses and shape/halo fields to Point; clean additions with no regressions to existing elements. |
| dimos/memory2/vis/space/svg.py | Refactored occupancy-grid image emission into shared _rgba_image_to_svg; added SVG renderers for Polygon, Wedge, RasterOverlay, and all new Point shapes; Bounds class moved to bounds.py. |
| dimos/memory2/vis/space/space.py | Added to_bgr() and to_png() methods backed by the new raster renderer; correctly resolves deferred colors before rendering. |
| dimos/memory2/vis/space/bounds.py | New shared Bounds dataclass extracted from svg.py; straightforward and correct. |
| dimos/memory2/vis/color.py | resolve_deferred now walks color, fill, and stroke attributes so Polygon's two-color split supports cmap-deferred colors; clean change. |
| dimos/memory2/vis/space/rerun.py | Added Polygon and Wedge rendering via rr.LineStrips3D; RasterOverlay silently skipped as documented; _edge closure captures ox, oy, length as loop-local variables correctly. |
| dimos/memory2/experimental/test_memory2_agent_ask.py | End-to-end agent tests gated behind experimental marker and skipped when db env vars are unset; won't affect the default CI run. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Q[User question] --> Agent

    subgraph Agent["LangGraph Agent (agent.py)"]
        direction TB
        PS[Pre-seed list_skills] --> LLM[LLM loop]
        LLM -->|tool call| Tools
        Tools -->|ToolMessage + HumanMessage| LLM
        LLM -->|final text| Answer[Final answer]
    end

    subgraph Tools["LangChain Tools (tools.py)"]
        direction TB
        T1[list_streams / summary / recent]
        T2[search_semantic / near]
        T3[show_image / recall_view]
        T4[show_map]
        T5[walkthrough / walkthrough_timestamps]
        T6[frames_facing]
        T7[verify_room_partition]
        T8[calc / list_skills / load_skill]
    end

    subgraph Render["Space Render Pipeline"]
        direction LR
        Space --> |to_bgr/to_png| Raster[raster.py]
        Space --> |to_svg| SVG[svg.py]
        Space --> |to_rerun| Rerun[rerun.py]
        Raster --> BGRImage[BGR ndarray]
        BGRImage --> |base64 encode| MultiModal[LangChain multimodal msg]
    end

    subgraph Store["SqliteStore"]
        direction TB
        OdomStream[odom stream]
        LidarStream[lidar stream]
        ImageStream[color_image stream]
        EmbStream[color_image_embedded stream]
    end

    T4 --> MapRenderer[MapRenderer]
    T6 --> MapRenderer
    T7 --> MapRenderer
    MapRenderer --> |lidar fusion| Occupancy[OccupancyGrid]
    Occupancy --> Space
    MapRenderer --> Render

    T3 --> ImageStream
    T2 --> EmbStream
    T1 --> Store

Reviews (2): Last reviewed commit: "[autofix.ci] apply automated fixes"

Comment on lines +731 to +737
proj = _project_world_xy_to_pixel(cam_pose=cam_pose, query_x=query_x, query_y=query_y)
if proj is None:
    return bgr  # query behind camera; return unaltered
px, py_floor, z_cam = proj
H, W = bgr.shape[:2]
if px < -W // 2 or px > W + W // 2:
    return bgr  # very far off image; not informative

P1 _annotate_query_in_frame is documented as returning None when the query is behind the camera or far off-screen, but it always returns the (possibly unmodified) bgr array instead. The caller in tools.py guards on if annotated is None to fall back to the original image encoding path — that branch is unreachable, so every frame (including those where the red cross was never drawn) gets JPEG-encoded and sent to the model as if it were annotated. Changing the two early returns to return None restores the intended contract.

Suggested change
- proj = _project_world_xy_to_pixel(cam_pose=cam_pose, query_x=query_x, query_y=query_y)
- if proj is None:
-     return bgr  # query behind camera; return unaltered
- px, py_floor, z_cam = proj
- H, W = bgr.shape[:2]
- if px < -W // 2 or px > W + W // 2:
-     return bgr  # very far off image; not informative
+ proj = _project_world_xy_to_pixel(cam_pose=cam_pose, query_x=query_x, query_y=query_y)
+ if proj is None:
+     return None  # query behind camera; caller falls back to unannotated frame
+ px, py_floor, z_cam = proj
+ H, W = bgr.shape[:2]
+ if px < -W // 2 or px > W + W // 2:
+     return None  # very far off image; not informative
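
The contract this comment asks for (return `None` when nothing was drawn, so the caller can take its fallback path) can be shown in isolation. The sketch below uses plain nested lists instead of a BGR ndarray and a strict in-bounds check instead of the half-width margin above; `annotate` and `frame_for_model` are hypothetical names.

```python
from typing import Optional

Frame = list[list[int]]  # hypothetical stand-in for an image array


def annotate(frame: Frame, pixel: Optional[tuple[int, int]]) -> Optional[Frame]:
    """Return an annotated copy, or None if no marker could be drawn."""
    if pixel is None:  # e.g. the query point is behind the camera
        return None
    col, row = pixel
    height, width = len(frame), len(frame[0])
    if not (0 <= row < height and 0 <= col < width):
        return None  # off-image: nothing informative to draw
    out = [r[:] for r in frame]  # copy so the original frame is untouched
    out[row][col] = 255  # the "marker"
    return out


def frame_for_model(frame: Frame, pixel: Optional[tuple[int, int]]) -> tuple[Frame, bool]:
    """Pick the frame to send and report whether it is actually annotated."""
    annotated = annotate(frame, pixel)
    if annotated is None:
        return frame, False  # fallback branch: reachable only if annotate returns None
    return annotated, True


blank = [[0, 0], [0, 0]]
marked, was_marked = frame_for_model(blank, (1, 0))
plain, plain_marked = frame_for_model(blank, None)
```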

Comment on lines +1044 to +1057
img_obs = store.stream(stream).to_list()
if not img_obs:
    return f"walkthrough: stream {stream!r} is empty"

resolved = _resolve_walkthrough_range(
    "walkthrough",
    store,
    t_start,
    t_end,
    step_seconds,
    WALKTHROUGH_FRAMES_MAX,
)
if isinstance(resolved, str):
    return resolved

P1 walkthrough_frames calls store.stream(stream).to_list() (loading every image frame into memory) before the range-validation check. If the agent supplies an oversized range, the entire image stream is loaded and then discarded when _resolve_walkthrough_range returns an error string. For a 60 s recording at 10 fps with 1280×720 RGB frames each frame is ~2.8 MB uncompressed — 600 frames = ~1.7 GB materialized unnecessarily. Moving the range check before the stream fetch avoids this.

Suggested change
- img_obs = store.stream(stream).to_list()
- if not img_obs:
-     return f"walkthrough: stream {stream!r} is empty"
- resolved = _resolve_walkthrough_range(
-     "walkthrough",
-     store,
-     t_start,
-     t_end,
-     step_seconds,
-     WALKTHROUGH_FRAMES_MAX,
- )
- if isinstance(resolved, str):
-     return resolved
+ resolved = _resolve_walkthrough_range(
+     "walkthrough",
+     store,
+     t_start,
+     t_end,
+     step_seconds,
+     WALKTHROUGH_FRAMES_MAX,
+ )
+ if isinstance(resolved, str):
+     return resolved
+ img_obs = store.stream(stream).to_list()
+ if not img_obs:
+     return f"walkthrough: stream {stream!r} is empty"
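
The memory estimate quoted in this comment can be reproduced with a few lines of arithmetic, assuming 8-bit samples, 3 channels, and decimal units (1 MB = 10^6 bytes):

```python
# Assumed frame geometry and recording parameters from the review comment.
width, height, channels = 1280, 720, 3  # 8-bit RGB, one byte per channel
fps, duration_s = 10, 60

bytes_per_frame = width * height * channels
frames = fps * duration_s
total_bytes = bytes_per_frame * frames

frame_mb = bytes_per_frame / 1e6  # decimal megabytes
total_gb = total_bytes / 1e9      # decimal gigabytes
```

Rounded to one decimal place this gives about 2.8 MB per frame and about 1.7 GB for the full stream, matching the figures in the comment.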

Comment on lines +88 to +92
def _validate_stream(name: str) -> str | None:
    """Return an error string if the stream name is invalid, else None."""
    if name not in _KNOWN_STREAMS:
        return f"unknown stream {name!r}; available: {sorted(_KNOWN_STREAMS)}"
    return None

P2 Hardcoded stream names may mislead the agent

_KNOWN_STREAMS is a static set of four names. list_streams() queries the actual SQLite store and shows the agent whatever streams actually exist. If a recording uses any stream name not in this set, the agent will be informed (via list_streams) that it exists but will receive "unknown stream 'X'" from every other tool that calls _validate_stream — including summary, recent, search_semantic, near, and show_image. The disconnect makes the agent appear broken rather than incapable. Consider deriving the allowed set dynamically from store.list_streams() at build time.
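
One way to derive the allowed set at build time is to close over a snapshot of the store's stream names. This is a sketch of the idea only: `make_stream_validator` is a hypothetical helper, and the list passed to it stands in for whatever `store.list_streams()` returns.

```python
from typing import Callable, Iterable, Optional


def make_stream_validator(stream_names: Iterable[str]) -> Callable[[str], Optional[str]]:
    """Build a validator from the streams that actually exist in the store."""
    known = frozenset(stream_names)  # snapshot taken once, at build time

    def validate(name: str) -> Optional[str]:
        """Return an error string if the stream name is invalid, else None."""
        if name not in known:
            return f"unknown stream {name!r}; available: {sorted(known)}"
        return None

    return validate


# Stand-in for store.list_streams(); includes a name a static set would miss.
validate = make_stream_validator(["odom", "lidar", "thermal"])
ok = validate("thermal")
err = validate("depth")
```

With this shape, a stream the agent learns about from list_streams is accepted by every other tool that validates names.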

Comment on lines +308 to +313
try:
    all_obs = store.stream(stream).to_list()
    if not all_obs:
        return f"stream {stream!r} is empty"
    obs = min(all_obs, key=lambda o: abs(o.ts - float(ts)))
except Exception as e:

P2 Full image stream materialized on every show_image call

store.stream(stream).to_list() deserializes every observation in the image stream into memory before the single nearest-timestamp entry is picked with min(...). At 10 fps over 60 s, this is ~600 full-resolution images (~2.8 MB each uncompressed) loaded unnecessarily for each tool call. The store already supports ordered queries (see recent which uses .order_by("ts", desc=True).limit(n)). A narrower query or at minimum deferring data decoding would avoid the O(N) memory spike. The same pattern appears in frames_that_could_see_point (loads all color_image frames before filtering by FOV).
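
The "narrower query" idea can be demonstrated with plain `sqlite3`: fetch at most one row on each side of the target timestamp and keep the closer one. The `frames` table and its columns are made up for this sketch and are not the SqliteStore schema; the memory2 store's own query API is not used here.

```python
import sqlite3
from typing import Optional


def nearest_row(conn: sqlite3.Connection, ts: float) -> Optional[tuple]:
    """Nearest-timestamp row using two LIMIT 1 queries instead of a full scan."""
    before = conn.execute(
        "SELECT ts, payload FROM frames WHERE ts <= ? ORDER BY ts DESC LIMIT 1",
        (ts,),
    ).fetchone()
    after = conn.execute(
        "SELECT ts, payload FROM frames WHERE ts > ? ORDER BY ts ASC LIMIT 1",
        (ts,),
    ).fetchone()
    candidates = [row for row in (before, after) if row is not None]
    if not candidates:
        return None  # empty table
    return min(candidates, key=lambda row: abs(row[0] - ts))


conn = sqlite3.connect(":memory:")
# PRIMARY KEY on ts gives SQLite an index to satisfy both ordered lookups.
conn.execute("CREATE TABLE frames (ts REAL PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO frames VALUES (?, ?)",
    [(10.0, "a"), (20.0, "b"), (30.0, "c")],
)

hit = nearest_row(conn, 22.0)
```

Only two rows are ever decoded per lookup, regardless of how long the recording is.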

Mario Garrido and others added 2 commits May 20, 2026 12:55
- _annotate_query_in_frame: return None (not bgr) on behind-camera /
  off-screen so the caller's fallback branch is reachable, matching the
  docstring.
- walkthrough_frames: validate range before materializing the image
  stream so invalid ranges don't trigger a full stream load.
- build_tools: snapshot store.list_streams() into known_streams instead
  of a static set, so the agent sees consistent answers between
  list_streams() and stream-named tool calls.
- show_image: replace full-stream materialization with three indexed
  pushdown queries (before/at-exact/after). Image streams join blobs
  eagerly in SqliteObservationStore, so the previous to_list() decoded
  every JPEG just to pick one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines +269 to +310
            continue

        # Numeric index as the on-image label so the legend (text) and the
        # marker (image) line up one-to-one.
        space.add(
            SpacePoint(
                GeoPoint(wx, wy, 0.0),
                color=_POINT_COLOR_TO_MEMORY2[color_name],
                radius=_POINT_OVERLAY_RADIUS_M,
                shape="dot",
                halo=True,
                label=str(idx),
            )
        )

        line = f" {idx:>2} ({color_name:<7s}) at ({wx:+.2f}, {wy:+.2f})"
        if label:
            line += f" — {label}"
        lines.append(line)

    legend = f"Points ({len(pts)}):\n" + "\n".join(lines)
    if dropped:
        legend += f"\n [dropped {dropped} points beyond soft cap of {POINT_SOFT_CAP}]"
    if invalid_colors:
        legend += (
            "\n [invalid color(s) "
            + ", ".join(sorted(set(invalid_colors)))
            + f"; defaulted to '{POINT_DEFAULT_COLOR}'. Allowed: "
            + ", ".join(POINT_COLORS_ALLOWED)
            + "]"
        )
    return legend


def encode_space_as_multimodal(
    space: Space, caption: str, *, width_px: int
) -> list[dict[str, Any]]:
    """Render *space* to PNG and return LangChain multimodal content blocks."""
    png = space.to_png(width_px=width_px, padding_m=0.0)
    b64 = base64.b64encode(png).decode("ascii")
    return [
        {"type": "text", "text": caption},

P1 polys_world / rooms index mismatch in verify_room_partition

When any room dict is missing a valid "polygon" or "rect" key (e.g. a polygon with fewer than 3 vertices fails the len(r["polygon"]) >= 3 guard), that room is silently skipped and polys_world ends up shorter than rooms. All subsequent index-based accesses then use the wrong room entry or blow up entirely:

  1. overlap_with_per_room is created with range(len(rooms)) entries, but the inner loop room_masks_bool[i] is indexed up to len(rooms)-1 while room_masks_bool only has len(polys_world) items — an IndexError when i >= len(polys_world).
  2. The per-room Space.add(SpacePolygon(..., label=f"#{ident} {desc}")) and stats.append(PartitionStats(id=room.get("id",…))) loops both do room = rooms[i] where i is the polys_world index, so the label/metadata drifts for every room that was skipped.

The except Exception in the tool wrapper converts the crash to an error string so the agent won't hang, but the partition analysis is entirely unusable. Fix by collecting (room, poly) pairs together and iterating the shorter zipped list throughout.
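
The suggested fix (keep each room together with its polygon so indices cannot drift) might look like the sketch below. It is an illustration under assumptions: `valid_polygon` and `pair_rooms` are hypothetical helpers, and the `rect` layout `(x0, y0, x1, y1)` is a guess, since the real format is not shown in this thread.

```python
from typing import Any, Optional

Polygon = list[tuple[float, float]]


def valid_polygon(room: dict[str, Any]) -> Optional[Polygon]:
    """Return the room's world-frame polygon, or None if it has no usable shape."""
    poly = room.get("polygon")
    if isinstance(poly, list) and len(poly) >= 3:
        return [(float(x), float(y)) for x, y in poly]
    rect = room.get("rect")  # assumed layout: (x0, y0, x1, y1)
    if isinstance(rect, (list, tuple)) and len(rect) == 4:
        x0, y0, x1, y1 = map(float, rect)
        return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    return None


def pair_rooms(rooms: list[dict[str, Any]]) -> tuple[list[tuple[dict[str, Any], Polygon]], list[Any]]:
    """Collect (room, polygon) pairs; report skipped room ids instead of dropping them silently."""
    pairs: list[tuple[dict[str, Any], Polygon]] = []
    skipped: list[Any] = []
    for room in rooms:
        poly = valid_polygon(room)
        if poly is None:
            skipped.append(room.get("id"))
        else:
            pairs.append((room, poly))
    return pairs, skipped


rooms = [
    {"id": "A", "polygon": [[0, 0], [4, 0], [4, 3], [0, 3]]},
    {"id": "B", "polygon": [[0, 0], [1, 1]]},  # too few vertices
    {"id": "C", "rect": [5, 0, 7, 2]},
]
pairs, skipped = pair_rooms(rooms)
```

Every later loop (masks, overlaps, labels, stats) would then iterate over `pairs`, so room "C" keeps its own metadata even though "B" was skipped.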
