Change how we pass and resolve supervisor policy uris for behavior cloning #4161

nishu-builder · 2025-12-03T22:00:04Z

Centralized URI-to-PolicySpec resolution. Now `metta://xxx` works with non-mpt policies.

A) The MettaSchemeResolver now returns either:

- The S3 path containing the policy spec (preferred), or
- The raw .mpt checkpoint path for legacy policies

B) policy_spec_from_uri() now handles all URI types uniformly:
- .mpt files → creates MptPolicy spec
- S3 paths → downloads and extracts policy spec archive
- Local dirs → loads policy_spec.json directly
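As a rough illustration of the three branches listed under B), here is a minimal sketch of the dispatch decision only. The function name `classify_policy_location` and the returned labels are hypothetical and not part of this PR; the real policy_spec_from_uri() also does the downloading, extraction, and spec construction, none of which is shown here.

```python
from urllib.parse import urlparse


def classify_policy_location(location: str) -> str:
    """Return which loading branch a resolved location would take.

    Hypothetical helper for illustration; the labels are made up.
    """
    # Legacy checkpoints are recognized by extension, local or remote.
    if location.endswith(".mpt"):
        return "mpt_checkpoint"
    # Any other S3 location is assumed to hold a policy spec archive.
    if urlparse(location).scheme == "s3":
        return "s3_spec_archive"
    # Everything else is treated as a local directory with policy_spec.json.
    return "local_spec_dir"
```

Checking the `.mpt` extension first means an S3 path to a legacy checkpoint is treated as a checkpoint rather than as a spec archive, which matches the ordering of the bullets above.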

resolve_uri() is renamed to get_path_to_policy_spec_or_mpt() to clarify its purpose.

Removed the other contextmanager-based load-policy-from-S3 logic and the several places with custom .mpt handling; this should now all live in policy_spec_from_uri(). We now just download to a local cache folder, with optional cleanup in atexit handlers.
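The "local cache folder with optional atexit cleanup" pattern can be sketched with the standard library alone. This is only an assumption about the general shape, not the code in this PR: `make_policy_cache_dir` and its `cleanup_at_exit` flag are hypothetical names, and the actual download step is omitted.

```python
import atexit
import shutil
import tempfile
from pathlib import Path


def make_policy_cache_dir(cleanup_at_exit: bool = True) -> Path:
    """Create a local cache directory for downloaded policy files.

    Hypothetical helper: unlike a context manager, the directory stays
    valid for the rest of the process, so callers can hold on to paths
    inside it without keeping a `with` block open.
    """
    cache_dir = Path(tempfile.mkdtemp(prefix="policy-cache-"))
    if cleanup_at_exit:
        # Remove the directory when the interpreter shuts down normally.
        atexit.register(shutil.rmtree, cache_dir, ignore_errors=True)
    return cache_dir
```

The trade-off compared with a context manager is that files live until interpreter exit rather than until the end of a block, which is simpler for callers that pass paths across several layers.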

Fix passing of behavior-cloning policy URI

Before, we passed around an EnvSupervisorConfig containing a URI string that was being misinterpreted as a path. Now, TrainTool resolves the URI into a PolicySpec upfront and passes the resolved spec directly to VectorizedTrainingEnvironment.

Cogames CLI policy parsing cleanup
The CLI policy parser now uses more shared code for deciding if an input is a URI and parsing it, and allows passing in additional args via key=value pairs.

So now you can do e.g.:

uv run cogames evaluate -m machina_1.machinatrainerbig -p class=random -p metta://policy/dinky,proportion=2 -p s3://softmax-public/policies/daveey.1x4.cvc.dr.maps.1.bct.0/daveey.1x4.cvc.dr.maps.1.bct.0:v1990.mpt,proportion=4
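To make the `-p` syntax above concrete, here is a minimal sketch of how one such argument could be split into a location plus key=value options. It is an illustration under assumptions, not the parser in packages/cogames/src/cogames/cli/policy.py: `parse_policy_arg` is a hypothetical name, values are left as strings, and a URI that itself contains a comma would not survive this simple split.

```python
from typing import Dict, Optional, Tuple


def parse_policy_arg(arg: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Split one -p argument into (location, options).

    Hypothetical sketch: comma-separated parts are either a URI/path
    or a key=value option.
    """
    location: Optional[str] = None
    options: Dict[str, str] = {}
    for part in arg.split(","):
        key, sep, value = part.partition("=")
        if "://" in part:
            # Anything with a scheme is treated as the policy URI.
            location = part
        elif sep:
            options[key] = value
        else:
            # A bare token is treated as a local path or name.
            location = part
    return location, options
```

With this sketch, `class=random` yields no location and a single `class` option, while the `metta://` and `s3://` arguments yield a location plus a `proportion` option.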

TrainTool.invoke():

1. supervisor_policy_uri on TrainingEnvironmentConfig (in recipes)
2. policy_spec_from_uri() resolves URI → PolicySpec in TrainTool.invoke
3. VectorizedTrainingEnvironment(cfg, supervisor_policy_spec=...)
4. make_vecenv(..., supervisor_policy_spec=...)
5. MettaGridPufferEnv(..., supervisor_policy_spec=...)
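The "resolve once, pass the spec down" flow can be sketched with stand-in classes. Everything prefixed with `Stub` is hypothetical and far simpler than the real TrainingEnvironmentConfig, PolicySpec, and VectorizedTrainingEnvironment; the `resolve` callable stands in for policy_spec_from_uri().

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class StubPolicySpec:
    """Stand-in for PolicySpec; the real class has its own fields."""
    source_uri: str


@dataclass
class StubEnvConfig:
    """Stand-in for TrainingEnvironmentConfig: it holds only the URI string."""
    supervisor_policy_uri: Optional[str] = None


class StubTrainingEnv:
    """Stand-in for VectorizedTrainingEnvironment: it only ever sees a spec."""

    def __init__(self, cfg: StubEnvConfig,
                 supervisor_policy_spec: Optional[StubPolicySpec] = None):
        self.cfg = cfg
        self.supervisor_policy_spec = supervisor_policy_spec


def invoke(cfg: StubEnvConfig,
           resolve: Callable[[str], StubPolicySpec]) -> StubTrainingEnv:
    # Resolve the URI exactly once, at the top of the call chain.
    spec = resolve(cfg.supervisor_policy_uri) if cfg.supervisor_policy_uri else None
    # Lower layers receive the resolved spec and never reinterpret the string.
    return StubTrainingEnv(cfg, supervisor_policy_spec=spec)
```

Because the environment layers only accept an already-resolved spec, there is no place left where the URI string could be mistaken for a filesystem path.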

datadog-official · 2025-12-04T01:00:20Z

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

This comment will be updated automatically if new data arrives.

Commit SHA: 9137b53

Co-authored-by: Richard Higgins <richard@relh.net>
Co-authored-by: Richard Higgins <richard@softmax.com>

github-actions bot assigned nishu-builder Dec 3, 2025

nishu-builder requested a review from relh December 3, 2025 22:00

nishu-builder assigned relh Dec 3, 2025

More generic metta:// resolver

1c0e8e3

Nishad added 2 commits December 3, 2025 17:01

touchups

30f79ec

fix circular imports

623c466

graphite-app bot reviewed Dec 4, 2025

packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py (outdated, resolved)

Merge branch 'main' into nishad/fix-bc_policy_uri-resolving

2f3349e

relh approved these changes Dec 4, 2025

relh enabled auto-merge December 4, 2025 02:22

relh and others added 4 commits December 3, 2025 18:31

Merge branch 'main' into nishad/fix-bc_policy_uri-resolving

112210a

resolver test fix

07a39b3

Merge remote-tracking branch 'origin/main' into nishad/fix-bc_policy_…

e53ab27

…uri-resolving

wrap with error for tests

97cecb2

relh added this pull request to the merge queue Dec 4, 2025

nishu-builder removed this pull request from the merge queue due to a manual request Dec 4, 2025

Nishad added 3 commits December 4, 2025 13:48

small differences

e0f8ad2

fix scheme parsing in cogames

1f41111

clean up cogames policy parsing

56faae5

graphite-app bot reviewed Dec 4, 2025

packages/cogames/src/cogames/cli/policy.py (outdated, resolved)

Nishad and others added 3 commits December 4, 2025 15:46

fix test

ea04e8f

fix

af2b03d

Merge branch 'main' into nishad/fix-bc_policy_uri-resolving

9137b53

nishu-builder added this pull request to the merge queue Dec 5, 2025

Merged via the queue into main with commit 604fbc6 Dec 5, 2025
12 checks passed

nishu-builder deleted the nishad/fix-bc_policy_uri-resolving branch December 5, 2025 00:13
