Model scaffolding and runs by seanrivera · Pull Request #4 · ManifoldRG/MultiNet-v2.0

seanrivera · 2026-04-25T21:24:22Z

This is part 3 of the big 4-part code review. It contains the scaffolding and interface code for the different models. It is currently focused mostly on local models, with a hacky interface for dealing with the chat interfaces of frontier models.

pranavguru

Did not review the multigrid portion, as it is not high priority right now.

pranavguru · 2026-05-01T20:31:06Z

+        )
+
+        # Process and generate
+        inputs = self.processor(


Why is the prompt here different from the prompt used in the lmstudio and ollama adapters?

Honestly, Paligemma is both too old and too weak to handle grid nav, and the adapter has been left behind for many iterations. We can probably remove it if we're fine cutting some comparisons from v1.0.

Sounds good - let's remove it then @seanrivera

pranavguru · 2026-05-01T20:42:23Z

We can get rid of GUIAction - we are not treating this as a separate domain as of now.

@seanrivera this hasn't been addressed yet

pranavguru · 2026-05-01T20:54:01Z

+        self._task_spec = task_spec
+        return task_spec
+
+    def to_canonical(self, domain_spec: TaskSpecification) -> CanonicalTaskSpec:


I can see some issues in the round-trip conversion (from_canonical --> to_canonical).

Rules are silently dropped: from_canonical always produces Rules() defaults (line 184), and to_canonical doesn't serialize Rules at all. So a maze with observability="view_cone", key_consumption=False, or hidden_mechanisms=["s1"] round-trips into a maze without those constraints. The hidden-switch tier-5 mazes lose all their identifying features. Either add CanonicalRules to the canonical taxonomy or stuff the rules into domain_config. There seems to be a similar issue with dependency_chain and distractors.
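To make the second option concrete, here is a minimal sketch of carrying the rules through domain_config. The field names come from the comment above; the default values, the stripped-down `CanonicalTaskSpec`, and the `"rules"` key are assumptions for illustration, not the code in this PR.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Rules:
    # Field names are taken from the review comment; the defaults are assumed.
    observability: str = "full"
    key_consumption: bool = True
    hidden_mechanisms: List[str] = field(default_factory=list)


@dataclass
class CanonicalTaskSpec:
    # Hypothetical, minimal stand-in with a free-form domain_config dict.
    domain_config: Dict[str, Any] = field(default_factory=dict)


def to_canonical(rules: Rules) -> CanonicalTaskSpec:
    # Serialize the rules under an assumed "rules" key instead of dropping them.
    return CanonicalTaskSpec(domain_config={"rules": asdict(rules)})


def from_canonical(spec: CanonicalTaskSpec) -> Rules:
    # Fall back to defaults only when no rules were serialized.
    return Rules(**spec.domain_config.get("rules", {}))


original = Rules(
    observability="view_cone",
    key_consumption=False,
    hidden_mechanisms=["s1"],
)
restored = from_canonical(to_canonical(original))
```

A round-trip test of the form `restored == original` over the tier-5 mazes would catch this class of regression; the same pattern would presumably apply to dependency_chain and distractors.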

Coordinate normalization is lossy at boundaries (lines 80–82). int(pos[0] * (grid_w - 1)) plus the max/min clamp silently moves objects toward the interior. Combined with to_canonical's pos.x / (grid_w - 1), the round trip isn't idempotent at the right/bottom edges.
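One way to check this is an exhaustive round-trip scan over grid widths. The sketch below models only the two expressions quoted above; the helper names and the clamp bounds (0 and grid_w - 1) are assumptions, so which cells drift in the real adapter may differ. It compares truncation with `int()` against `round()`, since a product such as (1/49)*49 can land just below the intended integer in floating point.

```python
from typing import Callable, Iterable, List, Tuple


def to_norm(x: int, grid_w: int) -> float:
    # Mirrors the to_canonical expression quoted in the review.
    return x / (grid_w - 1)


def from_norm_trunc(u: float, grid_w: int) -> int:
    # Truncating conversion; the clamp bounds here are assumed.
    return max(0, min(grid_w - 1, int(u * (grid_w - 1))))


def from_norm_round(u: float, grid_w: int) -> int:
    # Same conversion, but rounding to the nearest cell.
    return max(0, min(grid_w - 1, round(u * (grid_w - 1))))


def drifting_cells(
    from_norm: Callable[[float, int], int], widths: Iterable[int]
) -> List[Tuple[int, int, int]]:
    # Collect (grid_w, x, x_after_round_trip) for every cell that moves.
    moved = []
    for grid_w in widths:
        for x in range(grid_w):
            back = from_norm(to_norm(x, grid_w), grid_w)
            if back != x:
                moved.append((grid_w, x, back))
    return moved


widths = range(2, 129)
trunc_drift = drifting_cells(from_norm_trunc, widths)
round_drift = drifting_cells(from_norm_round, widths)
```

If `trunc_drift` is non-empty while `round_drift` is empty, switching to `round()` (or storing integer cell coordinates in the canonical form) would be a simple candidate fix.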

Double-check whether the above are actually issues.

@seanrivera this hasn't been addressed yet

pranavguru · 2026-05-01T21:14:36Z

+from typing import Dict
+
+
+class MiniGridActions(IntEnum):


Is this the official action space defined by MiniGrid?

@seanrivera this hasn't been answered yet
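For reference, MiniGrid's own `Actions` enum (in `minigrid.core.actions`) defines seven actions: left=0, right=1, forward=2, pickup=3, drop=4, toggle=5, done=6. A small check like the following could answer the question mechanically. The `MiniGridActions` members below are a hypothetical stand-in, since the enum body is not shown in this diff, and the helper name is an assumption.

```python
from enum import IntEnum
from typing import Dict, Optional, Tuple, Type

# Action indices as defined by MiniGrid's minigrid.core.actions.Actions.
OFFICIAL_MINIGRID_ACTIONS: Dict[str, int] = {
    "left": 0,
    "right": 1,
    "forward": 2,
    "pickup": 3,
    "drop": 4,
    "toggle": 5,
    "done": 6,
}


class MiniGridActions(IntEnum):
    # Hypothetical stand-in for the enum under review.
    LEFT = 0
    RIGHT = 1
    FORWARD = 2
    PICKUP = 3
    DROP = 4
    TOGGLE = 5
    DONE = 6


def action_space_mismatches(
    enum_cls: Type[IntEnum],
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    # Map each differing action name to (local value, official value).
    local = {member.name.lower(): int(member) for member in enum_cls}
    names = sorted(set(local) | set(OFFICIAL_MINIGRID_ACTIONS))
    return {
        name: (local.get(name), OFFICIAL_MINIGRID_ACTIONS.get(name))
        for name in names
        if local.get(name) != OFFICIAL_MINIGRID_ACTIONS.get(name)
    }


mismatches = action_space_mismatches(MiniGridActions)
```

An empty `mismatches` dict means the local enum matches the table. If the minigrid package is installed in the test environment, the hard-coded table could be replaced by values read from `minigrid.core.actions.Actions` directly.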

seanrivera requested a review from pranavguru April 25, 2026 21:24

pranavguru requested changes May 1, 2026

seanrivera force-pushed the model-scaffolding-and-runs branch from 0445a24 to fe82acd Compare May 1, 2026 22:08

pranavguru mentioned this pull request May 1, 2026

Reporting and examples #5

Open

seanrivera added 15 commits May 9, 2026 22:58

Add foundation and ingestion code

af55a86

Tighten foundation validation and tests

c664dc7

Move shared schema and package metadata to foundation

5e6f703

This is the core gridworld code. Builds the grid off of the spec, has…

ad6091c

… the running environment, including the interfaces, includes multigrid for contamination and spatial issues. Essentially this is the meat of the running pipeline.

Small fixes to address the new mazes with switches, and clean up some…

43b65fb

… small Multigrid rendering issues

Preserve switch colors across backends

4c0ea41

Move backend surface files to maze branch

01c18e4

Update backend test comments for MultiNet v2

23fd440

Move backend interface docs to maze layer

3e36ee7

This is the model scaffolding area. Covers the evaluation harnesses, …

84c2540

…the running code, the target-specific adapters, and several helpful debugging scripts for VLMs to make sure we're testing correctly

Moved the debug scripts into a scripts directory to keep the top leve…

6a93674

…l repo clean

Update model comments for MultiNet v2

cfc67ae

Update cross-domain comments for MultiNet v2

c43146d

Fix canonical gridworld round trips

00b7200

Remove untracked NL domain references from model scaffold

1256b78

seanrivera force-pushed the model-scaffolding-and-runs branch from fe82acd to 1256b78 Compare May 9, 2026 21:38

seanrivera wants to merge 15 commits into main from model-scaffolding-and-runs
