Model scaffolding and runs #4
pranavguru left a comment
Did not review the multigrid portion, as it is not a high priority right now.
)

# Process and generate
inputs = self.processor(
Why is the prompt here different from the prompt used in the lmstudio and ollama adapters?
Honestly, PaliGemma is both too old and too weak to handle grid navigation, and its adapter has fallen behind over many iterations. We can probably remove it if we're fine cutting some comparisons from v1.0.
We can get rid of GUIAction; we are not treating this as a separate domain for now.
        self._task_spec = task_spec
        return task_spec

    def to_canonical(self, domain_spec: TaskSpecification) -> CanonicalTaskSpec:
I can see some issues in the round-trip conversion (from_canonical -> to_canonical):
- Rules are silently dropped: from_canonical always produces Rules() defaults (line 184), and to_canonical doesn't serialize Rules at all. So a maze with observability="view_cone", key_consumption=False, or hidden_mechanisms=["s1"] round-trips into a maze without those constraints, and the hidden-switch tier-5 mazes lose all their identifying features. Either add CanonicalRules to the canonical taxonomy or stuff the rules into domain_config. There seems to be a similar issue with dependency_chain and distractors.
- Coordinate normalization is lossy at boundaries (lines 80–82). int(pos[0] * (grid_w - 1)) plus the max/min clamp silently moves objects toward the interior. Combined with to_canonical's pos.x / (grid_w - 1), the round trip isn't idempotent at the right/bottom edges.
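For the "stuff into domain_config" option, a minimal sketch might look like the one below. It assumes Rules is (or can be treated as) a dataclass with the three fields named above; the defaults, the `RULES_KEY` name, and both helper functions are hypothetical, not taken from the PR. dependency_chain and distractors would need the same treatment.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Rules:
    # Hypothetical mirror of the maze Rules discussed above. Field names come
    # from the review comment; these default values are assumptions.
    observability: str = "full"
    key_consumption: bool = True
    hidden_mechanisms: List[str] = field(default_factory=list)


RULES_KEY = "rules"  # hypothetical key inside domain_config


def rules_to_domain_config(rules: Rules, domain_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of domain_config with the rules serialized under RULES_KEY."""
    out = dict(domain_config)
    out[RULES_KEY] = asdict(rules)  # asdict also copies the list field
    return out


def rules_from_domain_config(domain_config: Dict[str, Any]) -> Rules:
    """Rebuild Rules, falling back to defaults only when nothing was stored."""
    stored = domain_config.get(RULES_KEY)
    return Rules(**stored) if stored is not None else Rules()
```

The point is that to_canonical writes the rules and from_canonical reads them back, so Rules() defaults only appear when a spec genuinely had no rules. Adding a CanonicalRules type would do the same job with a typed schema instead of a free-form dict.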
Please double-check whether the above are actual issues.
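One quick way to check the coordinate point is a brute-force round-trip test over grid sizes. This is a sketch that assumes canonical x is a float in [0, 1] and that decoding clamps to [1, grid_w - 2] (to keep objects off the outer wall). The clamp bounds and all function names are hypothetical; only the two formulas come from the comment above.

```python
def to_canonical_x(x: int, grid_w: int) -> float:
    # Mirrors the quoted to_canonical formula: pos.x / (grid_w - 1).
    return x / (grid_w - 1)


def _clamp(x: int, grid_w: int) -> int:
    # Assumed clamp bounds; adjust to whatever the adapter actually uses.
    return max(1, min(grid_w - 2, x))


def from_canonical_x_truncating(u: float, grid_w: int) -> int:
    # Mirrors the quoted from_canonical formula: int(pos[0] * (grid_w - 1)).
    # int() truncates toward zero, so 1.9999999999999998 becomes 1.
    return _clamp(int(u * (grid_w - 1)), grid_w)


def from_canonical_x_rounding(u: float, grid_w: int) -> int:
    # Possible fix: round to the nearest cell before clamping.
    return _clamp(round(u * (grid_w - 1)), grid_w)


def lost_cells(decode, max_w: int = 200) -> list:
    """Interior cells (grid_w, x) that do not survive grid -> canonical -> grid."""
    return [
        (w, x)
        for w in range(3, max_w)
        for x in range(1, w - 1)
        if decode(to_canonical_x(x, w), w) != x
    ]
```

With rounding, `lost_cells` comes back empty. With truncation it does not, because a product like x / (w - 1) * (w - 1) can land a hair below x. Edge values move under either decoder: a canonical 1.0 decodes to grid_w - 2 under the assumed clamp and maps back to (grid_w - 2) / (grid_w - 1). So either the normalization and the clamp need to share the same range, or the edge shift needs to be documented.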
from typing import Dict


class MiniGridActions(IntEnum):
Is this the official action space defined by MiniGrid?
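For reference, the Farama `minigrid` package defines its discrete action space as `Actions(IntEnum)` in `minigrid.core.actions`, with seven members: left, right, forward, pickup, drop, toggle, done (values 0 to 6). The sketch below copies those values locally so it runs without minigrid installed, then diffs a project enum against them. The `MiniGridActions` body shown is a hypothetical stand-in, since the PR's actual members aren't visible in this thread.

```python
from enum import IntEnum

# Local copy of the values in minigrid.core.actions.Actions, so this check
# runs without minigrid installed.
UPSTREAM_ACTIONS = {
    "left": 0,      # turn left
    "right": 1,     # turn right
    "forward": 2,   # move forward
    "pickup": 3,    # pick up an object
    "drop": 4,      # drop an object
    "toggle": 5,    # toggle/activate an object (e.g. open a door)
    "done": 6,      # declare the task complete
}


class MiniGridActions(IntEnum):
    # Hypothetical stand-in for the PR's enum; swap in the real definition.
    left = 0
    right = 1
    forward = 2
    pickup = 3
    drop = 4
    toggle = 5
    done = 6


def mismatches(enum_cls: type) -> dict:
    """Return {name: (local_value, upstream_value)} for every disagreement."""
    local = {member.name: member.value for member in enum_cls}
    names = set(local) | set(UPSTREAM_ACTIONS)
    return {
        name: (local.get(name), UPSTREAM_ACTIONS.get(name))
        for name in sorted(names)
        if local.get(name) != UPSTREAM_ACTIONS.get(name)
    }
```

An empty result means the names and integer values line up. If minigrid is installed, comparing against `{a.name: a.value for a in minigrid.core.actions.Actions}` directly avoids relying on the hand-copied table.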
0445a24 to fe82acd
… the running environment, including the interfaces, includes multigrid for contamination and spatial issues. Essentially, this is the meat of the running pipeline.
… small Multigrid rendering issues
…the running code, the target-specific adapters, and several helpful debugging scripts for VLMs to make sure we're testing correctly
fe82acd to 1256b78
This is part 3 of the big four-part code review. It covers the scaffolding and interface code for the different models. It currently focuses mostly on local models, with a hacky interface for handling the chat interfaces of frontier models.