guide on how to use LSTM version of DDPG on gym environments #562
When you use a recurrent model with DDPG, you need to set `episodic_update=True` so that it uses a batch of sequences, not a batch of transitions, for updates. You can also specify the maximum length of the sequences by `episodic_update_len`.
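To make the distinction concrete, here is a toy sketch in plain Python (all names here are hypothetical illustrations, not ChainerRL's actual API) of the two batch layouts:

```python
import random

# A transition is (state, action, reward, next_state); two toy episodes.
episodes = [
    [("s%d" % t, "a%d" % t, 0.0, "s%d" % (t + 1)) for t in range(3)],
    [("s%d" % t, "a%d" % t, 0.0, "s%d" % (t + 1)) for t in range(10, 13)],
]

# Non-recurrent DDPG: sample independent transitions from a flat pool.
flat_pool = [tr for ep in episodes for tr in ep]
transition_batch = random.sample(flat_pool, 4)

# Recurrent DDPG (episodic update): sample whole sequences so the LSTM
# state can be unrolled over consecutive steps. A maximum sequence
# length truncates long episodes.
max_len = 2
sequence_batch = [ep[:max_len] for ep in random.sample(episodes, 2)]

print(len(transition_batch))                 # 4 independent transitions
print([len(seq) for seq in sequence_batch])  # sequences of consecutive steps
```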
That works, thanks!
Hi @muupan, just a quick follow-up on how `episodic_update` works. Does this mean that, if I had a multivariate time-series problem and wanted to use an LSTM in the networks, I could input the states with shape (1, N), i.e. (sequence_length, number_of_features), or simply (N,) if sequence_length=1 anyway? I ask because in traditional sequential deep learning (without RL), a time-series input to an LSTM will have, for example, sequence_length=5. Besides an explanation, maybe you could point me to a paper on this. Any help would be much appreciated :)
You don't need to change the shape of a state. If the shape of a state is (N,), DDPG internally concatenates states to make an input batch of states of shape (minibatch_size, N) and feeds it to the network. If `episodic_update=True`, minibatches are instead sampled as sequences of consecutive transitions, and the recurrent state is unrolled along each sequence.
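The batching of (N,)-shaped states described above can be illustrated with NumPy (the shapes here are arbitrary example values):

```python
import numpy as np

N = 2  # number of observation features per state
states = [np.random.rand(N).astype(np.float32) for _ in range(32)]

# Stacking a minibatch of (N,)-shaped states yields a
# (minibatch_size, N) array, which is what the network receives.
batch = np.stack(states)
print(batch.shape)  # (32, 2)
```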
@muupan thanks for the quick reply. To confirm my understanding: for an env with 2 observation features X and Y, if one episode has 3 time steps and we run for 2 episodes, the buffer would store (X1, Y1) through (X6, Y6) in FIFO order. And during learning, when sampling, say, 2 episodes, would the samples also come out in FIFO order, i.e. the first sample always from episode 1 and the second from episode 2? And just to highlight, are the sampled transitions kept in time order as well?
EpisodicReplayBuffer is FIFO in the sense that the oldest episode in the buffer is discarded when it hits the capacity limit. Sampling from it is random, not FIFO. The content of the two samples can be random, i.e., they can be as below. It is still guaranteed that one sample is taken from the first episode and the other from the second episode when it has only two episodes.
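A minimal toy sketch of that behavior (this is an illustration only, not ChainerRL's EpisodicReplayBuffer implementation; the class and method names are made up):

```python
import random
from collections import deque

class ToyEpisodicBuffer:
    """Toy buffer: episodes are evicted oldest-first when the capacity
    is hit (FIFO), but sampling among stored episodes is random."""

    def __init__(self, capacity):
        self.episodes = deque(maxlen=capacity)  # FIFO eviction

    def append_episode(self, episode):
        self.episodes.append(episode)

    def sample_episodes(self, n):
        return random.sample(list(self.episodes), n)

buf = ToyEpisodicBuffer(capacity=2)
buf.append_episode([("X1", "Y1"), ("X2", "Y2"), ("X3", "Y3")])
buf.append_episode([("X4", "Y4"), ("X5", "Y5"), ("X6", "Y6")])
buf.append_episode([("X7", "Y7")])  # oldest episode is discarded

sample = buf.sample_episodes(2)  # random choice among episodes, not FIFO
print(len(buf.episodes))  # 2
```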
Although the sub-sequence to sample is picked at random, looking at this line, the transitions within a sub-sequence appear to stay consecutive. That means when sampling a sub-sequence in, say, episode 2 (for the example above), a possible combination is (X4, Y4) followed by (X5, Y5) ONLY, and in no situation would this order be reversed, e.g. (X5, Y5) followed by (X4, Y4). Is that right?
Correct, it is always in order. |
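The "random start, but always in order" behavior can be sketched like this (a toy function under the assumption above, not the library's actual code):

```python
import random

episode = [("X4", "Y4"), ("X5", "Y5"), ("X6", "Y6")]

def sample_subseq(episode, length):
    # The start index is random, but the slice keeps the
    # original time order of the transitions.
    start = random.randrange(len(episode) - length + 1)
    return episode[start:start + length]

sub = sample_subseq(episode, 2)
# sub is one of the two consecutive pairs, never reversed
print(sub)
```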
Thanks @muupan, you helped me a lot! Really appreciate your time! :)
Hi @muupan, may I ask where
I am trying to run DDPG with the gym Pendulum-v0 environment. However, I am getting this error:
This is my code:
Here is the full run output and the error: