Confusion about a few terms used in the hyperparameters in the YAML file #2252

Closed
Junggy opened this issue Jul 12, 2019 · 14 comments

Junggy commented Jul 12, 2019

Hello,

I am a bit confused by some of the hyperparameters used in the *.yaml file
(Experiences, Time Horizon, Batch Size, Buffer Size, Num Epoch).
According to the documentation (https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-PPO.md), this is what I have understood so far:

1. Experience: the agent's collected [observations, actions, rewards] for one step. (But what is meant by processing the experiences?)
2. Time Horizon: how many experiences to collect before they are used in the value estimate.
3. Batch Size: how many experiences to use for a single gradient update.
4. Buffer Size: how many gradients should be processed (i.e. averaging multiple gradients) before actually updating the model; here, buffer_size = n_gradients * batch_size.
5. Num Epoch: this one I don't understand; what is meant by the number of passes through the experience buffer during gradient descent? Can you give me a detailed explanation, or a reference if one is available?

If there is any misunderstanding or anything wrong, please correct me and give me a more detailed explanation.

Thanks in advance.

@Junggy Junggy added the discussion Issue contains general discussion. label Jul 12, 2019
shihzy commented Jul 12, 2019

Hi @Junggy

1 - Processing means going through one iteration of observation, action, and reward update.
2-4 - Not sure what you mean by these questions; can you elaborate?
5 - This article does a good job of explaining epochs and batches for gradient descent: https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/
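
To put rough numbers on it (the figures below are made up for illustration, not ml-agents defaults):

```python
# Illustrative only: generic epoch/batch arithmetic.
dataset_size = 2048   # samples available for one update (the experience buffer)
batch_size = 256      # samples per gradient step
num_epoch = 3         # full passes over those samples

batches_per_epoch = dataset_size // batch_size        # 8 gradient steps per pass
total_gradient_steps = batches_per_epoch * num_epoch  # 24 gradient steps in total
print(batches_per_epoch, total_gradient_steps)        # 8 24
```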

@shihzy shihzy self-assigned this Jul 12, 2019
Junggy commented Jul 12, 2019

Thank you for the quick answer. @unityjeffrey

Hmmmm, okay, you misunderstood my 5th question.
I know perfectly well what a batch and an epoch are. I just don't understand what that sentence means, and especially how the epoch is defined in this case.

So did you define a buffer_size amount of experience as one full dataset (one epoch)?

So how it works internally is: every time the buffer of buffer_size is filled, it calculates a gradient with a batch_size amount of experience "buffer_size / batch_size" times, then post-processes (i.e. averages the gradients) and updates the model, and continues this n_epoch times before collecting a new buffer_size amount of experiences.

Is this correct?

Regarding Time Horizon, it is written that:

"time_horizon corresponds to how many steps of experience to collect per-agent before adding it to the 1) experience buffer. 2) When this limit is reached before the end of an episode, a value estimate is used to predict the overall expected reward from the agent's current state"

1) The experience buffer is mentioned here. Is this experience buffer the same buffer as the one with Buffer Size? (I really don't get this part. What is the relation between "Time Horizon" and "Batch Size" & "Buffer Size" & "Num Epoch"?)
So does the experience buffer take a time-horizon number of experiences as one unit, and is one epoch then unit * buffer_size experiences?
Or is the experience buffer mentioned here a different kind of buffer?

2) Here it says that when this limit is reached, it calculates a value estimate. So when the Buffer Size is filled, are those experiences used to calculate the value estimate? And is the layer that calculates the value estimate also updated n_epoch times?

@mattinjersey

Thanks, I also find this all confusing, so it would be good to clarify.

shihzy commented Jul 15, 2019

cc: @xiaomaogy

@xiaomaogy

@awjuliani

@awjuliani

Hi all. Let me try to clarify:

Time horizon is how many steps of experience to collect in a single trajectory before calculating the discounted returns and advantages for that trajectory, and then adding it to the buffer.

The buffer size is how big this buffer of trajectories can get before we use it for training. Once the buffer reaches this size, we then go through it (num_epoch) number of times, taking a random (batch_size) size batch at a time. After the epochs of training, we then clear the buffer and start filling it again from scratch.
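
To make that concrete, here is a minimal Python sketch of the cycle described above. This is not the actual ml-agents trainer code; collect_trajectory, compute_returns_and_advantages, and policy.update are hypothetical placeholders used purely for illustration:

```python
import random

def train(env, policy, time_horizon, buffer_size, batch_size, num_epoch):
    buffer = []  # individual experiences, each annotated with its return/advantage
    while True:
        # Collect up to `time_horizon` steps, then compute discounted returns and
        # advantages for that trajectory (bootstrapping with a value estimate if
        # the episode has not ended yet). Both helpers are hypothetical.
        trajectory = collect_trajectory(env, policy, max_steps=time_horizon)
        compute_returns_and_advantages(trajectory, policy)
        buffer.extend(trajectory)  # the trajectory is flattened into single experiences

        # Once the buffer is full, make `num_epoch` passes over it in random
        # `batch_size`-sized mini-batches, then clear it and start from scratch.
        if len(buffer) >= buffer_size:
            for _ in range(num_epoch):
                random.shuffle(buffer)
                for start in range(0, len(buffer), batch_size):
                    policy.update(buffer[start:start + batch_size])
            buffer.clear()
```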

Junggy commented Jul 16, 2019

@awjuliani Thanks for the answer.

Pretty much everything makes sense.
But just one last thing; this is what I have always been confused about.

You said:
"Time horizon is how many steps of experience to collect in a single trajectory before calculating the discounted returns and advantages for that trajectory, and then adding it to the buffer."

So one unit of the buffer is not a single experience, but a time-horizon amount of experiences (meaning a single trajectory)? (i.e. buffer = buffer_size * single_trajectory (meaning time_horizon * experiences), not buffer_size * single_experience)

Something like: the buffer is filled with a buffer_size amount of trajectories, and each trajectory is filled with a time_horizon amount of experiences. Right?

@awjuliani

@Junggy

The buffer consists of single experiences, and the buffer size corresponds to the number of experiences. When a trajectory is added, it is added as single experiences, not as a whole unit. That being said, when dealing with LSTMs the experiences from trajectories are kept in temporal order so that they can be re-used during training.
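
For example, with purely illustrative numbers (not recommended values):

```python
time_horizon = 64    # a full-length trajectory contributes 64 individual experiences
buffer_size = 2048   # counted in individual experiences, not in trajectories

# A full buffer therefore holds roughly this many full-length trajectories' worth of steps:
print(buffer_size // time_horizon)  # 32
```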

mattinjersey commented Jul 16, 2019

Could you define the word "experience"? Does an experience include all the vector observations from a single timestep? So 1 experience might consist of 40 vector observations for one game, but 100 vector observations for another game.

Also, could you define the word "trajectory"?

Junggy commented Jul 18, 2019

@awjuliani thanks, it's getting clearer!

This is what I understood in the end. Can you clarify whether it is right or wrong?

let's say, time_horizon = 4, buffer_size= 4, batch_size=2, n_epoch=2

  1. A time_horizon amount of experience is gathered, and the discounted advantage is calculated - a trajectory
    (i.e. time_horizon_buffer = [exp_1, exp_2, exp_3, exp_4] -> adv_1)

  2. The first experience and its advantage go into the buffer.
    (i.e. buffer = [(exp_1, adv_1)])

  3. Discard the first experience in the time_horizon buffer.
    (i.e. time_horizon_buffer = [exp_2, exp_3, exp_4])

------------------- repeat until buffer is filled ------------------------
i.e.

  1. A new experience is received, and the next discounted advantage is calculated - a new trajectory
    (i.e. time_horizon_buffer = [exp_2, exp_3, exp_4, exp_5] -> adv_2)

  2. The first experience and its advantage go into the buffer.
    (i.e. buffer = [(exp_1, adv_1), (exp_2, adv_2)])

  3. Discard the first experience in the time_horizon_buffer.
    (i.e. time_horizon_buffer = [exp_3, exp_4, exp_5])

......
......
...... repeated 2 more times

-------------------------- buffer is filled ---------------------------
(i.e. buffer = [(exp_1, adv_1), (exp_2, adv_2), (exp_3, adv_3), (exp_4, adv_4)])

Calculate the gradient with a batch_size number of samples from the buffer (i.e. sample size = 2),
repeat until it has gone through all samples in the buffer (i.e. 4 / 2 = 2 times),
and repeat all of this n_epoch times (i.e. 2 times).

Empty everything and start over again
(time_horizon_buffer = [], buffer = [])

Is this correct?
Thanks in advance!

Junggy commented Jul 18, 2019

@mattinjersey

I think an experience is everything you receive from Unity after an action, like a dictionary.
e.g. exp = {observation: some_vector, visual_observation: some_images, reward: some_scalar}
Something like this, and it seems it will change depending on the setup.

@Nanocentury

This should totally be added to the docs. Thanks for asking and answering.

@vincentpierre

Thank you for the discussion. We are closing this issue due to inactivity.
