Some of the notebook's claims that the training graphs look the same don't seem to be quite right. For example, the PBRS wrapper that aims for 'zero' value does, in fact, improve training on the small lake. More confusingly, the 'initializing Q table' training and the PBRS training don't match, even though a paper claims they should be identical. Is this only a seeding issue, or something more?
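To help separate a seeding effect from a real discrepancy, here is a minimal sketch of the equivalence I would expect. It assumes the claim is the usual one for tabular Q-learning: shaping with F(s, s') = gamma * phi(s') - phi(s) from a zero Q table should track an unshaped learner whose table starts at Q(s, a) = phi(s), as long as both see the same transitions. Everything in it is hypothetical and not taken from the notebook: the function name `run_equivalence_check`, the small slippery chain environment (a stand-in for the lake), the random potential `phi`, and the hyperparameters. Both learners are driven by one shared random behavior policy, so seeding cannot make them diverge.

```python
import numpy as np


def run_equivalence_check(n_states=6, n_actions=2, gamma=0.95,
                          alpha=0.1, steps=5000, seed=0):
    """Feed identical transitions to a PBRS learner and a Q-init learner.

    Hypothetical toy setup (not the notebook's environment): a chain where
    action 1 moves right, action 0 moves left, and the move is flipped with
    probability 0.2. Reaching the last state ends the episode with reward 1.
    """
    rng = np.random.default_rng(seed)
    goal = n_states - 1

    # Arbitrary potential; assumption: the terminal potential is fixed to 0.
    phi = rng.uniform(0.0, 1.0, size=n_states)
    phi[goal] = 0.0

    q_shaped = np.zeros((n_states, n_actions))       # PBRS, zero init
    q_init = np.tile(phi[:, None], (1, n_actions))   # no shaping, init = phi

    s = 0
    max_gap = 0.0
    for _ in range(steps):
        a = rng.integers(n_actions)                  # shared random behavior
        move = 1 if a == 1 else -1
        if rng.random() < 0.2:                       # slip
            move = -move
        s2 = min(max(s + move, 0), goal)
        done = s2 == goal
        r = 1.0 if done else 0.0

        shaping = gamma * phi[s2] - phi[s]
        # No bootstrapping from terminal states in either learner.
        boot_shaped = 0.0 if done else gamma * q_shaped[s2].max()
        boot_init = 0.0 if done else gamma * q_init[s2].max()

        q_shaped[s, a] += alpha * (r + shaping + boot_shaped - q_shaped[s, a])
        q_init[s, a] += alpha * (r + boot_init - q_init[s, a])

        # Expected invariant: q_init(s, a) - q_shaped(s, a) == phi(s).
        gap = np.abs(q_init - q_shaped - phi[:, None]).max()
        max_gap = max(max_gap, gap)

        s = 0 if done else s2
    return q_shaped, q_init, phi, max_gap


if __name__ == "__main__":
    *_, max_gap = run_equivalence_check()
    print("largest deviation from the invariant:", max_gap)
```

In this sketch the two tables differ by exactly `phi(s)` per state, so the per-state action ranking is the same in both. The invariant only holds here because the terminal potential is zero and both learners consume the same transitions. If the notebook's potential is nonzero at terminal states, or if each run samples its own actions and slips, the curves could differ for reasons other than the seed, but I have not confirmed which applies.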