Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change end-of-episode in CarRacing to termination as opposed to truncation #813

Merged
merged 5 commits into from
Aug 9, 2024

Conversation

RedTachyon
Copy link
Member

There's been some heated debate about this, I'm still open to have my mind changed, but imo with the new termination-truncation split, this is a clear example of a termination as opposed to truncation.

Truncation is meant to represent hitting the "externally imposed" time limit as opposed to actually reaching a terminal state. When the car does the laps that it needs, the episode should be terminated.

To give some more information, I also added an info entry describing the reason for termination (whether or not the lap was finished)

This also requires a version bump, since implementations that correctly handle termination/truncation may converge to different policies

@pseudo-rnd-thoughts
Copy link
Member

There are a couple of options which we need to plot the training curves for

  1. Do nothing, keep the buggy truncation case
  2. Fix the truncation testing to check that the agent truncates correctly after 2 (or more) laps (add the truncation lap number as an environment parameter, for later) - https://pypi.org/project/ufal.pybox2d/
  3. Change to termination after a single lap

@Kallinteris-Andreas
Copy link
Collaborator

For reference, this is the initial discussion: #106

Here is my analysis, giving a truncation signal after finishing the LAP would imply that the environment is not over, that there would be more LAPs left to complete, while giving a termination signal would indicate that nothing matters after completing the LAP, for example crushing the car afterwords would be totally fine (realistically that is not going to happen).

Learning performance wise, I suspect both options would be really close regardless.

@pseudo-rnd-thoughts
Copy link
Member

@RedTachyon Any chance you can run some training plots for this to measure the change?
Can you update the changelog for the environment, otherwise good with me

@RedTachyon
Copy link
Member Author

Re: changelong - do you mean in the docstring? That's already done. The docs site page (I think) gets automatically generated from that.

Re: experiments - probably too much effort for the benefit. The impact would likely be visible in the qualitative behavior of the agent towards the very end of the race, and that's contingent on the implementation correctly accounting for the termination/truncation split

@pseudo-rnd-thoughts pseudo-rnd-thoughts merged commit c9e2957 into main Aug 9, 2024
23 checks passed
@pseudo-rnd-thoughts pseudo-rnd-thoughts deleted the carracing-termination branch August 30, 2024 09:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants