disable env.reset() after every episode #66

Closed
zyzhang1130 opened this issue Feb 14, 2020 · 5 comments
@zyzhang1130

Hi,
If I want to keep the environment as it is after each training episode, should I just comment out line 147 in main.py, or should I also comment out line 130? Also, what should I do if I only want to reset the agent's position while keeping the rest of the environment unchanged after each training episode?

Thank you.

@Kaixhin
Owner

Kaixhin commented Feb 14, 2020

Commenting out line 147 would prevent the environment resetting during training. Commenting out line 130 would prevent the environment resetting during collection of data for validating Q-values. It seems that you might need to write a different environment and use a different set of functions if you want more control.
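
To illustrate what "a different environment" with more control might look like, here is a minimal sketch. It assumes a hypothetical toy grid world: `ToyGridEnv`, `reset_agent()` and the wall layout are made up for illustration and are not part of this repository's environment code. The idea is only to separate a full reset (new layout) from a partial reset (agent position only).

```python
import random


class ToyGridEnv:
    """Hypothetical toy grid world, NOT this repository's Env class."""

    def __init__(self, size=5, n_walls=4, seed=0):
        self.size = size
        self.n_walls = n_walls
        self.rng = random.Random(seed)
        self.start = (0, 0)
        self.goal = (size - 1, size - 1)
        self.walls = set()
        self.agent = self.start
        self.reset()

    def reset(self):
        """Full reset: regenerate the wall layout and move the agent to the start."""
        cells = [(r, c) for r in range(self.size) for c in range(self.size)
                 if (r, c) not in (self.start, self.goal)]
        self.walls = set(self.rng.sample(cells, self.n_walls))
        return self.reset_agent()

    def reset_agent(self):
        """Partial reset: keep the layout, only move the agent back to the start."""
        self.agent = self.start
        return self.agent

    def step(self, action):
        """Actions 0..3 move up/down/left/right; blocked moves leave the agent in place."""
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        r, c = self.agent[0] + dr, self.agent[1] + dc
        if 0 <= r < self.size and 0 <= c < self.size and (r, c) not in self.walls:
            self.agent = (r, c)
        done = self.agent == self.goal
        return self.agent, float(done), done
```

With an environment shaped like this, a training loop could call `reset_agent()` at the end of each episode instead of `reset()`. Whether something similar is feasible for the environment used here depends on what its underlying emulator exposes.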

@Kaixhin Kaixhin closed this as completed Feb 14, 2020
@zyzhang1130
Author

In general, validating Q-values and training should be consistent, right? (If one resets the environment, the other should too.)

@Kaixhin
Owner

Kaixhin commented Feb 17, 2020

Yes, it makes sense to keep them consistent.

@zyzhang1130
Author

zyzhang1130 commented Feb 17, 2020

When I also commented out the reset at line 130, I got this error:
AttributeError: 'NoneType' object has no attribute 'metadata'

It was raised when line 132 ran:
next_state, _, done = env.step(np.random.randint(0, action_space))

Does this line just make the agent take a random action? And what could be causing the error?

@Kaixhin
Owner

Kaixhin commented Feb 17, 2020

That line uses random actions to collect data for validating Q-values. I'm not sure why your edit is causing the error, so you will need to try and debug it yourself.
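
As a starting point for debugging, the random-action collection pattern can be sketched as below. This is not the code from main.py: `CountingEnv` and `collect_random_states` are hypothetical stand-ins, and the sketch assumes the common pattern where `done` starts as `True` so that the first `reset()` is what initialises the environment. Under that assumption, removing the reset call would mean `step()` runs on an environment that was never initialised, which is one possible (unconfirmed) source of an error like the one above.

```python
import random


class CountingEnv:
    """Hypothetical stand-in environment; each episode lasts `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = None  # stays None until reset() is called

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        if self.t is None:
            raise RuntimeError("step() called before reset()")
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 0.0, done


def collect_random_states(env, n_states, n_actions, seed=0):
    """Collect states by acting uniformly at random, resetting when an episode ends."""
    rng = random.Random(seed)
    states = []
    done = True  # forces a reset before the very first step
    while len(states) < n_states:
        if done:
            state, done = env.reset(), False
        next_state, _, done = env.step(rng.randrange(n_actions))
        states.append(state)
        state = next_state
    return states
```

A quick check is to call `step()` on a freshly constructed environment without calling `reset()` first and see whether the same error appears.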
