@akrbc9 akrbc9 commented Dec 9, 2025

TL;DR

Added support for policies with fewer actions than the environment expects by implementing action logits padding.

What changed?

  • Added a new pad_to_env_actions flag to ActionProbsConfig (default: True)
  • Implemented padding logic in ActionProbs.forward_inference() that adds -inf logits when the policy has fewer actions than the environment
  • Changed the action space mismatch error in load_or_create_policy to a warning message that suggests using the padding feature
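
The padding step can be sketched in plain Python. This is an illustration only: `pad_logits` and its parameters are hypothetical names, and the actual `ActionProbs.forward_inference()` presumably works on batched tensors rather than lists.

```python
import math


def pad_logits(logits, num_env_actions, pad_to_env_actions=True):
    """Pad a policy's logits with -inf up to the environment's action count.

    Hypothetical helper for illustration; not the repository's code.
    """
    missing = num_env_actions - len(logits)
    if not pad_to_env_actions or missing <= 0:
        # Nothing to pad: the flag is off, or the policy already covers the env.
        return list(logits)
    # Actions the policy never learned get -inf so they can never be selected.
    return list(logits) + [-math.inf] * missing


policy_logits = [0.5, 1.5, -0.25]  # the policy knows 3 actions
padded = pad_logits(policy_logits, num_env_actions=5)  # the env expects 5
```

The original logits are left untouched and the extra entries are appended at the end, which assumes the environment's additional actions come after the ones the policy was trained on.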

How to test?

  1. Create a policy with fewer actions than the environment expects
  2. Ensure pad_to_env_actions=True is set in the ActionProbsConfig
  3. Verify that the policy loads and runs without errors
  4. Check logs for the warning message about action space mismatch
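
Step 4 could be checked along these lines. This is a sketch under stated assumptions: `needs_padding`, the logger name, and the message wording are all hypothetical stand-ins for the check in `load_or_create_policy`, which is not shown in this PR description.

```python
import logging

logger = logging.getLogger("policy_loader")  # hypothetical logger name


def needs_padding(policy_actions, env_actions):
    """Warn instead of raising when the policy has fewer actions than the env.

    Hypothetical stand-in for the mismatch check; returns True on a mismatch.
    """
    if policy_actions < env_actions:
        logger.warning(
            "Policy has %d actions but the environment expects %d; "
            "consider pad_to_env_actions=True.",
            policy_actions,
            env_actions,
        )
        return True
    return False


class ListHandler(logging.Handler):
    """Collects formatted log messages so a test can inspect them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
mismatch = needs_padding(policy_actions=3, env_actions=5)
```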

Why make this change?

This change enables more flexibility when using policies across different environments or when using policies that only support a subset of the available actions. Instead of failing with an error when action spaces don't match exactly, the system can now pad the logits with -inf values, effectively giving zero probability to the extra actions while still maintaining compatibility.
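
To see why `-inf` logits give the extra actions zero probability, consider a plain softmax. The snippet below is a standalone sketch (the `softmax` helper is written here for illustration and is not taken from the codebase):

```python
import math


def softmax(logits):
    """Numerically stable softmax that tolerates -inf entries."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # math.exp(-inf) is 0.0
    total = sum(exps)
    return [e / total for e in exps]


original = [0.5, 1.5, -0.25]
padded = original + [-math.inf, -math.inf]  # two actions the policy lacks

probs_original = softmax(original)
probs_padded = softmax(padded)
```

Because the padded entries contribute nothing to the normalizing sum, the probabilities of the policy's own actions are unchanged and the padded actions can never be sampled.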

akrbc9 commented Dec 9, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@akrbc9 akrbc9 changed the title from "add adding during evaluation inference" to "Add padding support for mismatched action spaces in ActionProbs component" Dec 9, 2025
@akrbc9 akrbc9 marked this pull request as ready for review December 9, 2025 18:56
@akrbc9 akrbc9 requested a review from relh December 9, 2025 18:56

datadog-official bot commented Dec 9, 2025

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🔗 Commit SHA: e76342b

@akrbc9 akrbc9 force-pushed the axel/pad-action-space branch from e3a82ce to 81266c3 December 9, 2025 21:57
@akrbc9 akrbc9 force-pushed the axel/pad-action-space branch from a6f455a to e76342b December 9, 2025 22:20
@relh relh enabled auto-merge December 9, 2025 22:28
@relh relh added this pull request to the merge queue Dec 9, 2025
Merged via the queue into main with commit 1bcfc2b Dec 9, 2025
15 of 16 checks passed
@relh relh deleted the axel/pad-action-space branch December 9, 2025 22:58
relh added a commit that referenced this pull request Dec 10, 2025
relh added a commit that referenced this pull request Dec 10, 2025
relh pushed a commit that referenced this pull request Dec 10, 2025
…nent (#4277)

Co-authored-by: Axel K <ak@Axels-MacBook-Pro.local>
github-merge-queue bot pushed a commit that referenced this pull request Dec 10, 2025
…nent (#4303)

Updated packages/mettagrid/python/src/mettagrid/policy/mpt_policy.py so
MptPolicy accepts a pad_action_space flag and forwards it to
artifact.instantiate, enabling padded action-space loading when
requested.

- Why it broke: PR #4277 added pad_action_space handling inside
MptArtifact.instantiate, and cogames submit now passes that flag when an
environment has more actions than the checkpoint. Because
MptPolicy.__init__ didn’t accept or forward the flag, Hydra/CLI
instantiation raised TypeError: __init__() got an unexpected keyword
argument 'pad_action_space', so the run failed before the padding logic
could run.
- How it happened: the wrapper class got out of sync with the underlying
artifact API during the PR; the new option was only wired into
MptArtifact, not the public MptPolicy entry point used by CLI/config.

  Tests not run (small constructor change only).
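
The failure mode and the fix can be reproduced with a toy wrapper. This is a minimal sketch, not the real classes: `Artifact`, `PolicyBefore`, and `PolicyAfter` are hypothetical stand-ins for `MptArtifact` and `MptPolicy`, and their signatures are assumptions made for illustration.

```python
class Artifact:
    """Hypothetical stand-in for the checkpoint artifact."""

    def instantiate(self, num_env_actions, pad_action_space=False):
        return {"num_env_actions": num_env_actions, "padded": pad_action_space}


class PolicyBefore:
    """Wrapper whose constructor does not know about the new flag."""

    def __init__(self, artifact, num_env_actions):
        self.policy = artifact.instantiate(num_env_actions)


class PolicyAfter:
    """Wrapper that accepts the flag and forwards it to the artifact."""

    def __init__(self, artifact, num_env_actions, pad_action_space=False):
        self.policy = artifact.instantiate(
            num_env_actions, pad_action_space=pad_action_space
        )


try:
    # The caller passes the new flag, but the old constructor rejects it.
    PolicyBefore(Artifact(), 5, pad_action_space=True)
    error = None
except TypeError as exc:
    error = str(exc)

# With the flag accepted and forwarded, instantiation reaches the artifact.
fixed = PolicyAfter(Artifact(), 5, pad_action_space=True)
```

Defaulting the new keyword to `False` in the wrapper keeps existing callers working, which is one reasonable way to keep a wrapper in sync with the API it fronts.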

---------

Co-authored-by: Axel Kerbec <akerbec@umich.edu>
Co-authored-by: Axel K <ak@Axels-MacBook-Pro.local>
zfogg pushed a commit that referenced this pull request Dec 20, 2025
…nent (#4277)

Co-authored-by: Axel K <ak@Axels-MacBook-Pro.local>
zfogg pushed a commit that referenced this pull request Dec 20, 2025
zfogg pushed a commit that referenced this pull request Dec 20, 2025
…nent (#4303)


Co-authored-by: Axel Kerbec <akerbec@umich.edu>
Co-authored-by: Axel K <ak@Axels-MacBook-Pro.local>