Confusion in few terms used in Hyper parameters in yaml file. #2252
Hi @Junggy. 1. Processing means going through one iteration of observation, action, and reward update.
Thank you for the quick answer, @unityjeffrey. Hmmmm, okay, you got my 5th question wrong. So did you define a buffer_size amount of experience as one full dataset (one epoch)? So internally it works like this: every time buffer_size is filled, it calculates gradients with batch_size experiences at a time, buffer_size / batch_size times, then post-processes (i.e. averages the gradients) and updates the model, and continues this for num_epoch passes before collecting a new buffer_size amount of experiences. Is this correct?

Regarding time horizon, the docs say: "time_horizon corresponds to how many steps of experience to collect per-agent before adding it to the experience buffer. When this limit is reached before the end of an episode, a value estimate is used to predict the overall expected reward from the agent's current state." The experience buffer is mentioned here. Is this experience buffer the same buffer as the one sized by buffer_size? (I really don't get this part. What is the relation between time_horizon and batch_size, buffer_size, and num_epoch?)
Thanks. I also find this all confusing, so it would be good to clarify.
cc: @xiaomaogy
Hi all. Let me try to clarify: time_horizon is how many steps of experience to collect in a single trajectory before calculating the discounted returns and advantages for that trajectory and then adding it to the buffer. buffer_size is how big this buffer of trajectories can get before we use it for training. Once the buffer reaches this size, we go through it num_epoch times, taking a random batch of batch_size at a time. After those epochs of training, we clear the buffer and start filling it again from scratch.
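The scheme described above can be sketched in a few lines of Python. This is a hypothetical, simplified illustration, not the actual ml-agents code; names like `collect_trajectory`, `env_step`, and `update_fn` are stand-ins invented for this sketch:

```python
import random

def collect_trajectory(env_step, time_horizon):
    """Collect up to time_horizon single experiences (one trajectory)."""
    trajectory = []
    for _ in range(time_horizon):
        experience = env_step()        # one {obs, action, reward, done} dict
        trajectory.append(experience)
        if experience["done"]:         # episode ended before the horizon
            break
    return trajectory

def compute_returns(trajectory, gamma=0.99, bootstrap_value=0.0):
    """Discounted returns, computed per trajectory before buffering.
    If the horizon cut the episode short, bootstrap from a value estimate."""
    ret = 0.0 if trajectory[-1]["done"] else bootstrap_value
    for experience in reversed(trajectory):
        ret = experience["reward"] + gamma * ret
        experience["return"] = ret

def ppo_update(env_step, update_fn, *, time_horizon, buffer_size,
               batch_size, num_epoch):
    buffer = []                        # holds single experiences
    while len(buffer) < buffer_size:
        trajectory = collect_trajectory(env_step, time_horizon)
        compute_returns(trajectory)    # returns computed per trajectory
        buffer.extend(trajectory)      # then flattened into the buffer
    for _ in range(num_epoch):         # num_epoch passes over the buffer
        random.shuffle(buffer)
        for i in range(0, len(buffer), batch_size):
            update_fn(buffer[i:i + batch_size])  # one gradient step per batch
    buffer.clear()                     # refill from scratch next time
```

With, say, buffer_size = 8, batch_size = 2, and num_epoch = 3, this performs 3 * (8 / 2) = 12 gradient steps per buffer fill.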
@awjuliani Thanks for the answer. Pretty much everything makes sense. So one unit of the buffer is not a single experience, but a time_horizon amount of experiences (i.e. a single trajectory)? In other words, the buffer holds buffer_size * single_trajectory (each trajectory being time_horizon experiences), not buffer_size * single_experience?
The buffer consists of single experiences, and buffer_size corresponds to the number of experiences. When a trajectory is added, it is added as single experiences, not as a whole unit. That being said, when dealing with LSTMs, the experiences from trajectories are kept in temporal order so they can be re-used during training.
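So a trajectory of time_horizon steps contributes time_horizon separate entries, and buffer_size counts experiences rather than trajectories. A toy sketch (hypothetical names, just to show the counting):

```python
def add_trajectory(buffer, trajectory):
    # Each step of the trajectory becomes its own buffer entry, so a
    # trajectory of length T grows the buffer by T entries, not by 1.
    buffer.extend(trajectory)

buffer = []
add_trajectory(buffer, [{"step": t} for t in range(4)])  # time_horizon = 4
add_trajectory(buffer, [{"step": t} for t in range(4)])
print(len(buffer))  # 8 experiences, not 2 trajectories
```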
Could you define the word "experience"? Does an experience include all the vector observations from a single timestep? So one experience might consist of 40 vector observations for one game, but 100 vector observations for another. Also, could you define the word "trajectory"?
@awjuliani Thanks, it's getting clearer! This is what I understood in the end. Can you clarify whether it's wrong or not? Let's say time_horizon = 4, buffer_size = 4, batch_size = 2, n_epoch = 2:

------------------- repeat until buffer is filled ------------------------
......
-------------------------- buffer is filled ---------------------------
calculate gradients with batch_size number of samples from the buffer (i.e. sample size = 2)
empty everything and start over again

Is this correct?
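If that reading is right, the numbers above work out as follows. This is a quick counting sketch under that interpretation, not the actual implementation:

```python
time_horizon, buffer_size, batch_size, num_epoch = 4, 4, 2, 2

buffer = []
while len(buffer) < buffer_size:
    trajectory = list(range(time_horizon))  # one 4-step trajectory
    buffer.extend(trajectory)               # here one trajectory fills the buffer

# num_epoch passes over the buffer, batch_size samples per gradient step
gradient_steps = num_epoch * (len(buffer) // batch_size)
print(gradient_steps)  # 2 epochs * (4 experiences / 2 per batch) = 4
buffer.clear()         # then everything is emptied and collection restarts
```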
I think an experience is everything you receive from Unity after an action, like a dictionary.
This should totally be added to the docs. Thanks for asking and answering.
Thank you for the discussion. We are closing this issue due to inactivity. |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Hello,
I am a bit confused by some hyperparameters used in the *.yaml file.
(Experiences, Time Horizon, Batch Size, Buffer Size, Num Epoch)
According to the documentation (https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-PPO.md), this is what I understood so far:
1. Experience: the agent's collected [observations, actions, rewards] per step. (But what is meant by "processing the experiences"?)
2. Time Horizon: how many experiences to collect to be used in the value estimate.
3. Batch Size: how many experiences to use for a single gradient.
4. Buffer Size: how many gradients should be processed (i.e. averaged over multiple gradients) before actually updating the model; here, buffer_size = n_gradients * batch_size.
5. Num Epoch: this one I don't understand. What is meant by "number of passes through the experience buffer during gradient descent"? Can you give me a detailed explanation, or a reference if one is available?

If there is any misunderstanding or anything wrong here, please correct me and give me a detailed explanation.
Thanks in advance.