Are you sure Deep recurrent notebook is correct? #58

Open
Joshuaalbert opened this issue Jan 13, 2018 · 2 comments

Comments

Joshuaalbert commented Jan 13, 2018

In the notebook I don't see where your recurrent Q-value model gets its trace dimension. You're just reshaping the output of a convnet and feeding it directly into an LSTM. Furthermore, shouldn't you also provide the non-zero initial state determined at play time? I.e., the internal state should be stored in the experience buffer and used during training. Correct me if I'm wrong, please.
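To make the second point concrete, here is a minimal sketch of what "storing the internal state in the experience buffer" could look like. It is only an illustration of the idea raised in the question, not the notebook's code: `Step`, `EpisodeBuffer`, `sample_trace`, and the `rnn_state` field are all hypothetical names, and the recurrent state is treated as an opaque value recorded before each action.

```python
import random
from collections import namedtuple

# Hypothetical record: a transition plus the recurrent state the agent
# held *before* acting on `obs` at play time.
Step = namedtuple("Step", ["obs", "action", "reward", "next_obs", "done", "rnn_state"])


class EpisodeBuffer:
    """Stores whole episodes so that contiguous traces can be sampled."""

    def __init__(self):
        self.episodes = []

    def add_episode(self, steps):
        self.episodes.append(list(steps))

    def sample_trace(self, trace_length, rng=random):
        # Only episodes long enough to contain a full trace are eligible.
        candidates = [ep for ep in self.episodes if len(ep) >= trace_length]
        if not candidates:
            raise ValueError("no episode is long enough for this trace length")
        episode = rng.choice(candidates)
        start = rng.randint(0, len(episode) - trace_length)  # inclusive bounds
        trace = episode[start:start + trace_length]
        # The stored state of the first step is the initial state for training.
        return trace, trace[0].rnn_state


# Usage: a 10-step episode whose recurrent state is a dummy (h, c) pair.
buffer = EpisodeBuffer()
buffer.add_episode(
    Step(obs=t, action=0, reward=0.0, next_obs=t + 1, done=False,
         rnn_state=("h%d" % t, "c%d" % t))
    for t in range(10)
)
trace, initial_state = buffer.sample_trace(trace_length=8)
print(len(trace), initial_state)
```

One caveat with this design: a stored state was produced by older network weights, so it goes stale as training proceeds. That is one reason an implementation might prefer a zero initial state during training.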

Joshuaalbert changed the title from "Are you sure this is correct?" to "Are you sure Deep recurrent notebook is correct?" on Jan 13, 2018
@geonyeong-park

Yep, I thought along similar lines at first, but it turns out the code seems to work correctly.

  1. Reshaping issue
    Here batch_size and trace_length are set to 4 and 8, so each Qnetwork object (main, target) receives batch*trace = 32 frames. After conv4, the dimensions become (32, 1, 1, 512) = (batch*trace, w, h, hidden units), so the reshape before the LSTM can split them into (batch, trace, hidden units) = (4, 8, 512).
  2. Non-zero H0
    The state is updated at every step and passed in via feed_dict[network.state]. This state is the 'last hidden state' returned by each LSTM forward pass.
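The shape arithmetic in point 1 can be checked with a small NumPy sketch. This is a stand-in for the TensorFlow reshape, not the notebook's code; it assumes the 32 frames arrive batch-major, i.e. the 8 frames of each trace are contiguous, and the variable names are illustrative.

```python
import numpy as np

batch_size, trace_length, h_size = 4, 8, 512  # values quoted in the comment above

# Stand-in for the conv4 output: one 512-vector per frame, 32 frames in total.
# Every frame is filled with its own flat index so we can see where it lands.
conv_out = np.zeros((batch_size * trace_length, 1, 1, h_size), dtype=np.float32)
for i in range(batch_size * trace_length):
    conv_out[i] = i

# Drop the 1x1 spatial dims, then split the leading axis into (batch, trace).
flat = conv_out.reshape(batch_size * trace_length, h_size)
rnn_input = flat.reshape(batch_size, trace_length, h_size)

print(rnn_input.shape)
# Frame i ends up at position [i // trace_length, i % trace_length].
print(rnn_input[2, 3, 0])
```

The reshape only recovers the trace dimension correctly if the frames really are ordered trace by trace. If the batch were assembled in any other order, the LSTM would see frames from different episodes interleaved.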

@Michaeljurado42

I had another thought. Isn't the target network unnecessary in this notebook in the first place, since you set the target network equal to mainDQN right before training?
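Whether that update makes the target network redundant depends on whether it is a hard copy or a soft (blended) update. The sketch below shows the difference on plain Python lists; `update_target` and the `tau` values are hypothetical and are not taken from the notebook.

```python
def update_target(main_weights, target_weights, tau):
    """Blend the main weights into the target weights.

    tau = 1.0 is a hard copy (target becomes identical to main);
    a small tau moves the target only a little toward main.
    """
    return [tau * m + (1.0 - tau) * t for m, t in zip(main_weights, target_weights)]


main = [1.0, -2.0, 0.5]
target = [0.0, 0.0, 0.0]

hard = update_target(main, target, tau=1.0)    # target == main afterwards
soft = update_target(main, target, tau=0.001)  # target barely moves

print(hard)
print(soft)
```

With a hard copy before every training step, the two networks would indeed produce the same Q-values and the target network would add nothing. With a small `tau`, the target lags behind the main network, which is the usual reason for keeping it.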

3 participants