Add padding support for mismatched action spaces in ActionProbs component #4277

akrbc9 · 2025-12-09T18:53:40Z

TL;DR

Added support for policies with fewer actions than the environment expects by implementing action logits padding.

What changed?

- Added a new `pad_to_env_actions` flag to `ActionProbsConfig` (default: `True`)
- Implemented padding logic in `ActionProbs.forward_inference()` that adds `-inf` logits when the policy has fewer actions than the environment
- Changed the action space mismatch error in `load_or_create_policy` to a warning message that suggests using the padding feature
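The padding step can be sketched in plain Python. This is a minimal illustration and not the code from this PR: the helper name `pad_logits_to_env_actions`, the use of lists instead of tensors, and the choice to append the padding at the end of the logits are all assumptions made for the example.

```python
from typing import List

NEG_INF = float("-inf")


def pad_logits_to_env_actions(logits: List[float], num_env_actions: int) -> List[float]:
    """Hypothetical helper: right-pad one row of action logits with -inf.

    Assumes the policy's actions occupy the first len(logits) slots of the
    environment's action space, so the padding goes at the end.
    """
    missing = num_env_actions - len(logits)
    if missing <= 0:
        # The policy already covers every environment action; nothing to pad.
        return list(logits)
    return list(logits) + [NEG_INF] * missing


# A policy with 3 actions used in an environment that expects 5.
padded = pad_logits_to_env_actions([0.5, 1.5, -0.2], num_env_actions=5)
```

The original logits are left untouched, and a policy whose action count already matches the environment passes through unchanged.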

How to test?

1. Create a policy with fewer actions than the environment expects
2. Ensure `pad_to_env_actions=True` is set in the `ActionProbsConfig`
3. Verify that the policy loads and runs without errors
4. Check logs for the warning message about action space mismatch

Why make this change?

This change enables more flexibility when using policies across different environments or when using policies that only support a subset of the available actions. Instead of failing with an error when action spaces don't match exactly, the system can now pad the logits with -inf values, effectively giving zero probability to the extra actions while still maintaining compatibility.
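To see why `-inf` logits translate into zero probability, here is a small softmax sketch in plain Python. It is illustrative only and assumes at least one logit is finite; the function and variable names are made up for the example and do not come from the PR.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax; assumes at least one finite logit."""
    peak = max(logits)
    # math.exp(-inf) evaluates to 0.0, so padded entries contribute nothing.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


policy_logits = [0.5, 1.5, -0.2]
padded_logits = policy_logits + [float("-inf"), float("-inf")]

probs = softmax(padded_logits)
unpadded_probs = softmax(policy_logits)
```

The padded entries receive probability exactly 0.0, and the probabilities of the policy's own actions match what the unpadded policy would have produced, so sampling can never select an action the policy does not know about.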


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.


metta/rl/training/checkpointer.py

datadog-official · 2025-12-09T19:37:54Z

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🔗 Commit SHA: e76342b

packages/mettagrid/python/src/mettagrid/policy/mpt_artifact.py


…nent (#4303)

Updated packages/mettagrid/python/src/mettagrid/policy/mpt_policy.py so MptPolicy accepts a pad_action_space flag and forwards it to artifact.instantiate, enabling padded action-space loading when requested.

- Why it broke: PR #4277 added pad_action_space handling inside MptArtifact.instantiate, and cogames submit now passes that flag when an environment has more actions than the checkpoint. Because MptPolicy.__init__ didn’t accept or forward the flag, Hydra/CLI instantiation raised TypeError: __init__() got an unexpected keyword argument 'pad_action_space', so the run failed before the padding logic could run.
- How it happened: the wrapper class got out of sync with the underlying artifact API during the PR; the new option was only wired into MptArtifact, not the public MptPolicy entry point used by CLI/config.

Tests not run (small constructor change only).

Co-authored-by: Axel Kerbec <akerbec@umich.edu>
Co-authored-by: Axel K <ak@Axels-MacBook-Pro.local>
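The failure mode described above can be reproduced with a toy wrapper. The classes below are hypothetical stand-ins and not the real MptPolicy or MptArtifact; only the `pad_action_space` keyword is taken from the commit message, and the constructor signatures are assumptions made for the sketch.

```python
class ToyArtifact:
    """Hypothetical stand-in for an artifact whose instantiate() knows the flag."""

    def instantiate(self, num_env_actions: int, pad_action_space: bool = False) -> dict:
        return {"num_actions": num_env_actions, "padded": pad_action_space}


class StalePolicy:
    """Wrapper left out of sync: its constructor does not accept the new flag."""

    def __init__(self, artifact: ToyArtifact, num_env_actions: int):
        self.model = artifact.instantiate(num_env_actions)


class FixedPolicy:
    """Wrapper that accepts the flag and forwards it to the artifact."""

    def __init__(self, artifact: ToyArtifact, num_env_actions: int, pad_action_space: bool = False):
        self.model = artifact.instantiate(num_env_actions, pad_action_space=pad_action_space)


# The caller passes the new keyword, as a config-driven instantiation would.
try:
    StalePolicy(ToyArtifact(), 5, pad_action_space=True)
    stale_error = None
except TypeError as exc:
    stale_error = str(exc)

fixed = FixedPolicy(ToyArtifact(), 5, pad_action_space=True)
```

The stale wrapper fails at construction time with a `TypeError` naming the unexpected keyword, before any padding logic runs, while the fixed wrapper passes the flag through.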


akrbc9 changed the title ~~add adding during evaluation inference~~ Add padding support for mismatched action spaces in ActionProbs component Dec 9, 2025

akrbc9 marked this pull request as ready for review December 9, 2025 18:56

akrbc9 requested a review from relh December 9, 2025 18:56

akrbc9 assigned relh Dec 9, 2025

chatgpt-codex-connector bot reviewed Dec 9, 2025

metta/rl/training/checkpointer.py (outdated, resolved)

graphite-app bot reviewed Dec 9, 2025

packages/mettagrid/python/src/mettagrid/policy/mpt_artifact.py (outdated, resolved)

akrbc9 force-pushed the axel/pad-action-space branch from e3a82ce to 81266c3 December 9, 2025 21:57

add adding during evaluation inference

e76342b

akrbc9 force-pushed the axel/pad-action-space branch from a6f455a to e76342b December 9, 2025 22:20

relh approved these changes Dec 9, 2025


relh enabled auto-merge December 9, 2025 22:28

relh added this pull request to the merge queue Dec 9, 2025

Merged via the queue into main with commit 1bcfc2b Dec 9, 2025
15 of 16 checks passed

relh deleted the axel/pad-action-space branch December 9, 2025 22:58

relh mentioned this pull request Dec 10, 2025

Add padding support for mismatched action spaces in ActionProbs component #4303

Merged

relh added a commit that referenced this pull request Dec 10, 2025

Revert "Add padding support for mismatched action spaces in ActionProbs component (#4277)"

9494168

This reverts commit 1bcfc2b.

relh mentioned this pull request Dec 10, 2025

Revert "Add padding support for mismatched action spaces in ActionPro… #4304

Merged

relh added a commit that referenced this pull request Dec 10, 2025

Revert "Add padding support for mismatched action spaces in ActionProbs component (#4277)" (#4304)

7ecf0c6

This reverts commit 1bcfc2b.

zfogg pushed a commit that referenced this pull request Dec 20, 2025

Revert "Add padding support for mismatched action spaces in ActionProbs component (#4277)" (#4304)

3de11c4

This reverts commit 1bcfc2b.
