-
Notifications
You must be signed in to change notification settings - Fork 48
Add padding support for mismatched action spaces in ActionProbs component #4277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
e3a82ce to
81266c3
Compare
a6f455a to
e76342b
Compare
…nent (#4277) ### TL;DR Added support for policies with fewer actions than the environment expects by implementing action logits padding. ### What changed? - Added a new `pad_to_env_actions` flag to `ActionProbsConfig` (default: `True`) - Implemented padding logic in `ActionProbs.forward_inference()` that adds `-inf` logits when the policy has fewer actions than the environment - Changed the action space mismatch error in `load_or_create_policy` to a warning message that suggests using the padding feature ### How to test? 1. Create a policy with fewer actions than the environment expects 2. Ensure `pad_to_env_actions=True` is set in the `ActionProbsConfig` 3. Verify that the policy loads and runs without errors 4. Check logs for the warning message about action space mismatch ### Why make this change? This change enables more flexibility when using policies across different environments or when using policies that only support a subset of the available actions. Instead of failing with an error when action spaces don't match exactly, the system can now pad the logits with `-inf` values, effectively giving zero probability to the extra actions while still maintaining compatibility. Co-authored-by: Axel K <ak@Axels-MacBook-Pro.local>
…nent (#4303) Updated packages/mettagrid/python/src/mettagrid/policy/mpt_policy.py so MptPolicy accepts a pad_action_space flag and forwards it to artifact.instantiate, enabling padded action-space loading when requested. - Why it broke: PR #4277 added pad_action_space handling inside MptArtifact.instantiate, and cogames submit now passes that flag when an environment has more actions than the checkpoint. Because MptPolicy.__init__ didn’t accept or forward the flag, Hydra/CLI instantiation raised TypeError: __init__() got an unexpected keyword argument 'pad_action_space', so the run failed before the padding logic could run. - How it happened: wrapper class got out of sync with the underlying artifact API during the PR; the new option was only wired into MptArtifact, not the public MptPolicy entry point used by CLI/config. Tests not run (small constructor change only). --------- Co-authored-by: Axel Kerbec <akerbec@umich.edu> Co-authored-by: Axel K <ak@Axels-MacBook-Pro.local>
…nent (#4277) ### TL;DR Added support for policies with fewer actions than the environment expects by implementing action logits padding. ### What changed? - Added a new `pad_to_env_actions` flag to `ActionProbsConfig` (default: `True`) - Implemented padding logic in `ActionProbs.forward_inference()` that adds `-inf` logits when the policy has fewer actions than the environment - Changed the action space mismatch error in `load_or_create_policy` to a warning message that suggests using the padding feature ### How to test? 1. Create a policy with fewer actions than the environment expects 2. Ensure `pad_to_env_actions=True` is set in the `ActionProbsConfig` 3. Verify that the policy loads and runs without errors 4. Check logs for the warning message about action space mismatch ### Why make this change? This change enables more flexibility when using policies across different environments or when using policies that only support a subset of the available actions. Instead of failing with an error when action spaces don't match exactly, the system can now pad the logits with `-inf` values, effectively giving zero probability to the extra actions while still maintaining compatibility. Co-authored-by: Axel K <ak@Axels-MacBook-Pro.local>
…nent (#4303) Updated packages/mettagrid/python/src/mettagrid/policy/mpt_policy.py so MptPolicy accepts a pad_action_space flag and forwards it to artifact.instantiate, enabling padded action-space loading when requested. - Why it broke: PR #4277 added pad_action_space handling inside MptArtifact.instantiate, and cogames submit now passes that flag when an environment has more actions than the checkpoint. Because MptPolicy.__init__ didn’t accept or forward the flag, Hydra/CLI instantiation raised TypeError: __init__() got an unexpected keyword argument 'pad_action_space', so the run failed before the padding logic could run. - How it happened: wrapper class got out of sync with the underlying artifact API during the PR; the new option was only wired into MptArtifact, not the public MptPolicy entry point used by CLI/config. Tests not run (small constructor change only). --------- Co-authored-by: Axel Kerbec <akerbec@umich.edu> Co-authored-by: Axel K <ak@Axels-MacBook-Pro.local>

TL;DR
Added support for policies with fewer actions than the environment expects by implementing action logits padding.
What changed?
pad_to_env_actionsflag toActionProbsConfig(default:True)ActionProbs.forward_inference()that adds-inflogits when the policy has fewer actions than the environmentload_or_create_policyto a warning message that suggests using the padding featureHow to test?
pad_to_env_actions=Trueis set in theActionProbsConfigWhy make this change?
This change enables more flexibility when using policies across different environments or when using policies that only support a subset of the available actions. Instead of failing with an error when action spaces don't match exactly, the system can now pad the logits with
-infvalues, effectively giving zero probability to the extra actions while still maintaining compatibility.