-
Notifications
You must be signed in to change notification settings - Fork 13
Early environment resets based on agents' respawn status. #167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional Comments (2)
-
pufferlib/ocean/drive/drive.py, line 253 (link)logic:
termination_modeparameter missing in resamplingenv_initcall (line 234-260), but present in initial call (line 167-189) -
pufferlib/resources/drive/puffer_drive_weights.binlogic: This 2.4MB binary file appears to have been accidentally deleted - it exists on the base branch but is unrelated to the early termination feature
5 files reviewed, 3 comments
pufferlib/ocean/drive/drive.py
Outdated
| offroad_behavior=self.offroad_behavior, | ||
| dt=dt, | ||
| scenario_length=(int(scenario_length) if scenario_length is not None else None), | ||
| termination_mode=int(self.termination_mode), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: int(None) raises TypeError when termination_mode is not provided
| termination_mode=int(self.termination_mode), | |
| termination_mode=(int(self.termination_mode) if self.termination_mode is not None else 0), |
Prompt To Fix With AI
This is a comment left during a code review.
Path: pufferlib/ocean/drive/drive.py
Line: 184:184
Comment:
**logic:** `int(None)` raises TypeError when `termination_mode` is not provided
```suggestion
termination_mode=(int(self.termination_mode) if self.termination_mode is not None else 0),
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
… early-termination-ricky
|
@riccardosavorgnan a note, though it shouldn't block merging. When you suddenly reset the environment like this, it means that the agents don't know when the environment is going to end so it converts it into a problem with a stochastic, unobservable terminal condition. This may work better if we add value truncation when that happens. |
Agree that we should add value truncation. Just wanted to comment that I think the agents currently don’t know when the environment is going to end either. |
… early-termination-ricky
…ab/PufferDrive into early-termination-ricky
…for at least num_agents.
… early-termination-ricky
This PR introduces the ability to reset an environment once all agents have respawned at least once. The behavior is controlled through the termination_mode parameter (0 for "Perform H steps, where H is the set max episode length", 1 for "Terminate once all agents have respawned").
The change improves* convergence speed in early training stages and provides slightly better asymptotic performance.
*based on the limited number of experiments ran so far.