Reward is always 0 when training Breakout-v0 #10

NeymarL · 2017-08-21T02:19:13Z

I trained the model overnight on Breakout-v0, but the reward is always 0. What could cause this? Or could you tell me which parameters you use when training on Breakout-v0? Thank you. Here is the log file.
log.txt

dgriff777 · 2017-08-21T03:33:03Z

It looks like you are getting an error on all 4 training processes. The test process stays running, but all 4 training processes terminated, so it's not learning anything. It's some CUDA error. You should not be using CUDA with the way this is set up.
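(Editor's sketch, not code from this repo.) The failure described above is easy to miss because the test process keeps printing a reward of 0 after the training processes have died. A minimal check, assuming the workers are `multiprocessing.Process` objects (or anything else exposing `name` and `exitcode`), is to poll their exit codes from the main process. The helper name `failed_workers` is hypothetical.

```python
def failed_workers(processes):
    """Return (name, exitcode) for every process that has exited with an error.

    `exitcode` is None while a process is still running, 0 after a clean
    exit, and non-zero after a crash (negative if killed by a signal).
    """
    failed = []
    for p in processes:
        if p.exitcode is not None and p.exitcode != 0:
            failed.append((p.name, p.exitcode))
    return failed
```

Calling this every few seconds in the launcher, and stopping the run when the list is non-empty, would surface a crashed worker immediately instead of after a night of flat rewards.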

NeymarL · 2017-08-21T06:51:11Z

Yes, that's the problem! Thank you!
The CUDA error went away after I removed CUDA support. But when I train again with 3 workers, the reward is still zero all the time (maybe because the training period is too short?). Could you give me some hints?
log.txt

dgriff777 · 2017-08-21T07:05:23Z

That's a very short amount of training, especially with 3 workers. I've never tried with that few workers, but I would estimate maybe 1 hr 30 min until the score starts going up on Breakout, as it takes about 30 minutes with 16 workers before the score really starts going up. In Breakout the game does not automatically restart after each life, so the agent needs to learn to press the fire button to get the game started again.
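(Editor's sketch, not what this repo does.) The setup above leaves it to the agent to discover the fire button. A common alternative is to wrap the environment so that FIRE is pressed automatically on reset and after each lost life. The sketch below assumes the older Gym API, where `step` returns `(obs, reward, done, info)`, and an Atari environment exposing `unwrapped.get_action_meanings()` and `unwrapped.ale.lives()`. The class name `FireOnReset` is hypothetical.

```python
class FireOnReset:
    """Press FIRE on reset and after each lost life so the ball is launched."""

    def __init__(self, env):
        meanings = env.unwrapped.get_action_meanings()
        if "FIRE" not in meanings:
            raise ValueError("environment has no FIRE action")
        self.env = env
        self.fire_action = meanings.index("FIRE")
        self.lives = 0

    def _lives(self):
        # Remaining lives as reported by the emulator.
        return self.env.unwrapped.ale.lives()

    def reset(self):
        self.env.reset()
        # Launch the ball so the first agent action sees a live game.
        obs, _, done, _ = self.env.step(self.fire_action)
        if done:
            obs = self.env.reset()
        self.lives = self._lives()
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        lives = self._lives()
        if not done and lives < self.lives:
            # A life was lost: relaunch the ball with one extra FIRE step.
            obs, extra, done, info = self.env.step(self.fire_action)
            reward += extra
        self.lives = lives
        return obs, reward, done, info
```

This trades a slightly less "pure" learning problem for faster early progress, since the agent no longer has to stumble on FIRE before it can earn any reward.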

NeymarL · 2017-08-21T07:11:14Z

Thanks. Then I am going to wait a few hours and see.....

NeymarL · 2017-08-21T11:20:48Z

Cool, it works! The reward starts to increase after 3 hours of training! Thanks for your code and help!

dgriff777 · 2017-08-22T10:42:53Z

Yeah, it's going to take a while with only 3 workers. I would actually recommend using another algorithm if you're only going to train with 3 workers, as having so few workers is detrimental to A3C's overall performance, as well as making it very slow to train.

NeymarL · 2017-08-22T12:58:45Z

Which algorithms are suited to training efficiently with 3 workers?

dgriff777 · 2017-08-23T16:21:45Z

I would first just recommend using more workers, but if that's not doable, then the best algorithm really depends on what you are looking to accomplish/learn.

NeymarL closed this as completed Aug 21, 2017
