
Reward is always 0 when training Breakout-v0 #10

Closed
NeymarL opened this issue Aug 21, 2017 · 8 comments

Comments

@NeymarL

NeymarL commented Aug 21, 2017

I trained the model overnight on Breakout-v0, but the reward is always 0. What could cause this? Or could you tell me which parameters you use when training on Breakout-v0? Thank you. Here is the log file.
log.txt

@dgriff777
Owner

dgriff777 commented Aug 21, 2017

It looks like you are getting an error on all 4 training processes. The test process stays running, but all 4 training processes have terminated, so it's not learning anything. It's some CUDA error. You should not be using CUDA with how this is set up.
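
A crash like this is easy to miss because the test process keeps printing a reward of 0. As a minimal sketch, the main process could poll its workers and report any that have exited. The helper name `find_dead_workers` is hypothetical and not part of this repository; it only assumes each worker exposes `name`, `is_alive()` and `exitcode`, as `multiprocessing.Process` objects do.

```python
from typing import Iterable, List, Tuple


def find_dead_workers(workers: Iterable) -> List[Tuple[str, int]]:
    """Return (name, exitcode) for every worker that has stopped.

    Hypothetical helper: each worker is expected to expose `name`,
    `is_alive()` and `exitcode`, like multiprocessing.Process.
    """
    dead = []
    for w in workers:
        # exitcode stays None until the process has actually terminated,
        # so workers that were never started are skipped.
        if not w.is_alive() and w.exitcode is not None:
            dead.append((w.name, w.exitcode))
    return dead
```

Calling this every few seconds from the launching script and logging any non-zero exit code would make it obvious when the training processes have died while the test process is still running.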

@NeymarL
Author

NeymarL commented Aug 21, 2017

Yes, that's the problem! Thank you!
The CUDA error went away after I removed CUDA support. But when I train again with 3 workers, the reward is still zero all the time (maybe because the training period is too short?). Could you give me some hints?
log.txt

@dgriff777
Owner

dgriff777 commented Aug 21, 2017

That's a very short amount of training, especially with 3 workers. I've never tried with that few workers, but I would estimate maybe 1 hr 30 min until the score starts going up on Breakout, as it takes about 30 minutes with 16 workers until the score really starts going up. In Breakout the game does not automatically restart after each life, so the agent needs to learn to press the fire button to get the game started again.
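
For anyone who would rather not wait for the agent to discover the fire button, a common workaround is an environment wrapper that presses FIRE after a reset and after each lost life. The sketch below is an illustration, not something this thread says the repository does. It assumes the older Gym API where `step()` returns `(obs, reward, done, info)`, that FIRE is action index 1, and that remaining lives are reported under `info["ale.lives"]`; the class name `FireOnLifeLoss` is hypothetical.

```python
class FireOnLifeLoss:
    """Press FIRE after reset and after each lost life.

    Assumptions (not from the thread): step() returns
    (obs, reward, done, info), FIRE is action index 1, and the
    remaining lives are reported under info["ale.lives"].
    """

    def __init__(self, env, fire_action=1, lives_key="ale.lives"):
        self.env = env
        self.fire_action = fire_action
        self.lives_key = lives_key
        self.lives = None

    def reset(self):
        self.env.reset()
        # Launch the ball once so the episode starts moving.
        obs, _, _, info = self.env.step(self.fire_action)
        self.lives = info.get(self.lives_key)
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        lives = info.get(self.lives_key)
        lost_life = (
            self.lives is not None and lives is not None and lives < self.lives
        )
        self.lives = lives
        if lost_life and not done:
            # Relaunch the ball and keep any reward from the extra step.
            obs, extra, done, info = self.env.step(self.fire_action)
            reward += extra
            self.lives = info.get(self.lives_key, self.lives)
        return obs, reward, done, info
```

The trade-off is that the agent no longer has to learn the restart behaviour itself, which changes the task slightly compared with training on the raw environment.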

@NeymarL
Author

NeymarL commented Aug 21, 2017

Thanks. Then I am going to wait a few hours and see.....

@NeymarL
Author

NeymarL commented Aug 21, 2017

Cool, it works! The reward starts to increase after 3 hours of training! Thanks for your code and help!
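
As a rough sanity check on the timings in this thread, the 30 minutes quoted for 16 workers can be scaled to 3 workers. This is only back-of-the-envelope arithmetic under the assumption that time to first improvement scales inversely with the number of workers, which is not guaranteed for A3C; the function name is hypothetical.

```python
def scaled_minutes(reference_minutes: float, reference_workers: int, workers: int) -> float:
    """Naive estimate: total work is fixed, so time scales as 1 / workers."""
    return reference_minutes * reference_workers / workers


# 30 minutes with 16 workers, rescaled to 3 workers.
estimate = scaled_minutes(30, 16, 3)
print(f"{estimate:.0f} minutes, about {estimate / 60:.1f} hours")
```

Under that assumption the estimate is 160 minutes, which is in the same range as the 3 hours reported here.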

@NeymarL NeymarL closed this as completed Aug 21, 2017
@dgriff777
Owner

dgriff777 commented Aug 22, 2017

Yeah, it's gonna take a while with only 3 workers. I would actually recommend using another algorithm if you're only going to train with 3 workers, as having so few workers is detrimental to A3C's overall performance as well as making it very slow to train.

@NeymarL
Author

NeymarL commented Aug 22, 2017

Which algorithms are suited to training efficiently with 3 workers?

@dgriff777
Owner

I would first just recommend using more workers, but if that's not doable, then the best algorithm really depends on what you are looking to accomplish or learn.


2 participants