@riccardosavorgnan riccardosavorgnan commented Dec 5, 2025

This PR adds the option to reset an environment once every agent has respawned at least once. The behavior is controlled by the `termination_mode` parameter: 0 runs each episode for H steps, where H is the configured maximum episode length; 1 terminates the episode once all agents have respawned.
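The two modes described above can be sketched roughly as follows. This is only an illustration, not the PR's implementation: `should_reset`, its arguments, and the constant names are hypothetical, and the sketch assumes that the maximum episode length still caps the episode in mode 1, which the description does not state.

```python
from typing import Sequence

FIXED_HORIZON = 0   # mode 0: run for the full maximum episode length
ALL_RESPAWNED = 1   # mode 1: stop once every agent has respawned at least once


def should_reset(step: int, max_episode_length: int,
                 respawned: Sequence[bool], termination_mode: int) -> bool:
    """Decide whether to reset the environment (hypothetical helper)."""
    # Assumption: the horizon H is a hard cap in both modes.
    if step >= max_episode_length:
        return True
    if termination_mode == ALL_RESPAWNED:
        # Reset early only when there are agents and all have respawned.
        return len(respawned) > 0 and all(respawned)
    return False
```

With `termination_mode=0` the respawn flags are ignored, so the episode always runs to the horizon.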

The change improves* convergence speed in early training stages and provides slightly better asymptotic performance.

[Image: Screenshot 2025-12-13 at 13 30 23]

*Based on the limited number of experiments run so far.

@greptile-apps greptile-apps bot left a comment

Additional Comments (2)

  1. pufferlib/ocean/drive/drive.py, line 253 (link)

    logic: the `termination_mode` parameter is missing from the resampling `env_init` call (lines 234-260) but present in the initial call (lines 167-189)

  2. pufferlib/resources/drive/puffer_drive_weights.bin

    logic: This 2.4MB binary file appears to have been accidentally deleted - it exists on the base branch but is unrelated to the early termination feature
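One way to avoid the mismatch flagged in the first comment is to build the keyword arguments once and pass the same dictionary to both calls. The sketch below is an assumption-laden illustration: `build_env_kwargs` and `fake_env_init` are hypothetical names, and `fake_env_init` merely stands in for the real `env_init`, whose full signature is not shown in this thread.

```python
def build_env_kwargs(termination_mode, dt, scenario_length, offroad_behavior):
    """Collect settings that every env_init call should receive (hypothetical helper)."""
    return {
        "offroad_behavior": offroad_behavior,
        "dt": dt,
        "scenario_length": int(scenario_length) if scenario_length is not None else None,
        "termination_mode": int(termination_mode) if termination_mode is not None else 0,
    }


def fake_env_init(**kwargs):
    """Stand-in for the real env_init binding; it just records what it was given."""
    return dict(kwargs)


# Both the initial call and the resampling call reuse the same kwargs,
# so a newly added parameter cannot be forgotten in one of them.
shared = build_env_kwargs(termination_mode=1, dt=0.1, scenario_length=91, offroad_behavior=0)
initial_env = fake_env_init(**shared)
resampled_env = fake_env_init(**shared)
```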

5 files reviewed, 3 comments

offroad_behavior=self.offroad_behavior,
dt=dt,
scenario_length=(int(scenario_length) if scenario_length is not None else None),
termination_mode=int(self.termination_mode),

logic: int(None) raises TypeError when termination_mode is not provided

Suggested change
- termination_mode=int(self.termination_mode),
+ termination_mode=(int(self.termination_mode) if self.termination_mode is not None else 0),
Path: pufferlib/ocean/drive/drive.py, line 184
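The failure mode in this comment is easy to reproduce in isolation. The snippet below is a minimal sketch: `coerce_termination_mode` is a hypothetical helper that mirrors the suggested guard, and the default of 0 follows the suggestion rather than anything confirmed elsewhere in the PR.

```python
def coerce_termination_mode(value, default=0):
    """Convert to int, falling back to a default when the value is None."""
    return int(value) if value is not None else default


# int(None) raises TypeError, which is what the unguarded call would hit.
try:
    int(None)
    raised = False
except TypeError:
    raised = True
```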

@daphne-cornelisse daphne-cornelisse left a comment

Looks great!

@Emerge-Lab Emerge-Lab deleted a comment from greptile-apps bot Dec 6, 2025
@eugenevinitsky

@riccardosavorgnan a note, though it shouldn't block merging. When you suddenly reset the environment like this, it means that the agents don't know when the environment is going to end so it converts it into a problem with a stochastic, unobservable terminal condition. This may work better if we add value truncation when that happens.

@daphne-cornelisse

> @riccardosavorgnan a note, though it shouldn't block merging. When you suddenly reset the environment like this, it means that the agents don't know when the environment is going to end so it converts it into a problem with a stochastic, unobservable terminal condition. This may work better if we add value truncation when that happens.

Agree that we should add value truncation. Just wanted to comment that I think the agents currently don’t know when the environment is going to end either.
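One common reading of the "value truncation" suggestion, offered here as an interpretation rather than what the reviewers necessarily have in mind, is to keep bootstrapping from the value estimate when an episode is cut off by a reset, and to drop the bootstrap term only at a true terminal state. The sketch below shows this for a one-step TD target; `one_step_target` and its arguments are hypothetical and not part of the PR.

```python
def one_step_target(reward, next_value, terminated, gamma=0.99):
    """One-step TD target that distinguishes termination from truncation.

    terminated: True only when the episode reached a genuine terminal state.
    A truncated episode (cut off by a reset) passes terminated=False, so the
    target still bootstraps from next_value, the value estimate of the state
    the agent would have continued from.
    """
    bootstrap = 0.0 if terminated else gamma * next_value
    return reward + bootstrap


# True terminal state: no future value is added.
terminal_target = one_step_target(1.0, 10.0, terminated=True, gamma=0.5)
# Early reset treated as truncation: future value is still credited.
truncated_target = one_step_target(1.0, 10.0, terminated=False, gamma=0.5)
```

Under this interpretation, an early reset does not teach the value function that returns drop to zero at an unpredictable step.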

@daphne-cornelisse daphne-cornelisse merged commit 19b5eb6 into main Dec 13, 2025
14 checks passed
@daphne-cornelisse daphne-cornelisse deleted the early-termination-ricky branch December 13, 2025 23:46