Hi
This is to discuss how the episode length may affect the learning process.
Case 1: The default as in the repo
game.set_episode_timeout(300)
The smoothed steady-state reward is around 0.55 (see figure below).
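For context, here is a minimal sketch of where that call sits in a typical ViZDoom setup; the `basic.cfg` path is a placeholder, not necessarily the scenario config the repo uses:

```python
from vizdoom import DoomGame

game = DoomGame()
game.load_config("basic.cfg")   # placeholder scenario config (assumption)
game.set_episode_timeout(300)   # Case 1: episode is cut off after 300 tics
game.init()
```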
Case 2: Shorter episode
game.set_episode_timeout(150)
Very similar to Case 1
Case 3: Very short episode
game.set_episode_timeout(70)
The agent should find the policy quickly, because it has a very limited time window to explore.
Convergence is delayed (after 500 episodes).
However, the reward is around 0.65, higher than Case 1's 0.55 (see figure below). Why? Convergence is delayed, but on the other hand we get better rewards. I mean that the agent usually accomplishes the task in less time than in Case 1; the agent seems more efficient and focused than in Case 1. What do you think?!
Case 4: Longer episode
game.set_episode_timeout(450)
Smoothed reward is around 0.62.
Smoothed episode length is around 33.
Convergence is delayed compared to Case 1.
Case 5: Each worker has its own episode length
Is it even a valid idea?!
Where: episode length = 75 + (worker_number * 25), as in the sketch after this list:
worker_0: episode length = 75
worker_1: episode length = 100
worker_2: episode length = 125
...
worker_7: episode length = 250
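A minimal sketch of how I assign these timeouts, assuming the repo creates one DoomGame per A3C worker and `number` is the worker index (`make_env` and the `basic.cfg` path are my own placeholders):

```python
from vizdoom import DoomGame

def make_env(number, base=75, step=25):
    game = DoomGame()
    game.load_config("basic.cfg")                  # placeholder scenario config
    game.set_episode_timeout(base + number * step) # per-worker timeout
    game.init()
    return game

# Case 5: worker_0 -> 75 tics, worker_1 -> 100, ..., worker_7 -> 250
envs = [make_env(number) for number in range(8)]
```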
It seems that worker_7 with episode length 250 converged faster than worker_0 with episode length 75.
The following figure includes all workers:
The following figure includes only worker_0 (episode length = 75) and worker_7 (episode length = 250)
However, all workers share the same global network. Do you think having different episode lengths could affect or even enhance the learning? What do you think?
Again: Is it even a valid idea?!
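To make the question concrete, here is a toy sketch of the asynchronous pattern I mean. This is not the repo's code: the parameter vector stands in for the global network, each thread for a worker, and the noise scale encodes only my assumption that longer episodes average over more steps per update:

```python
import threading
import numpy as np

TARGET = np.array([1.0, -2.0, 0.5, 3.0])  # hypothetical optimum
global_params = np.zeros(4)                # stand-in for the global network
lock = threading.Lock()

def worker(episode_timeout, episodes=200):
    rng = np.random.default_rng(episode_timeout)
    for _ in range(episodes):
        with lock:
            local = global_params.copy()   # sync local <- global
        # One "episode": a noisy gradient estimate; the assumption is that a
        # longer episode averages over more steps, hence less noise.
        noise = rng.normal(scale=1.0 / np.sqrt(episode_timeout), size=4)
        grad = 2.0 * (local - TARGET) + noise
        with lock:
            global_params[:] -= 0.05 * grad  # apply update to the global net

threads = [threading.Thread(target=worker, args=(75 + n * 25,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("distance to optimum:", np.linalg.norm(global_params - TARGET))
```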
Case 6: Each worker has its own episode length, with a larger range
Where: episode length = 100 + (worker_number * 50), as in the one-liner after this list:
worker_0: episode length = 100
worker_1: episode length = 150
worker_2: episode length = 200
...
worker_7: episode length = 450
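Under the same assumptions as the Case 5 sketch, only the two constants change:

```python
# Case 6: worker_0 -> 100 tics, worker_7 -> 450 (reusing make_env from above)
envs = [make_env(number, base=100, step=50) for number in range(8)]
```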
The following figure includes all workers:
The following figure includes only worker_0 (episode length = 100), worker_4 (episode length = 300), and worker_7 (episode length = 450).
The longer the episode, the faster the learning.