A3C Basic Doom: effect of episode length (Discuss) #23

IbrahimSobh commented Mar 18, 2017

Hi

This is to discuss how the episode length may affect the learning process.

Case 1: The default as in the repo

The smoothed reward is steady at around 0.55 (see figure below)

game.set_episode_timeout(300)
[figure: doom_basic_all]
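
For reference, a minimal sketch of where this timeout fits in a ViZDoom setup; the config path and the action encoding are assumptions for illustration, not taken from the repo:

```python
from vizdoom import DoomGame

# Minimal basic-scenario setup (the config path below is hypothetical).
game = DoomGame()
game.load_config("scenarios/basic.cfg")  # assumed path, not from the repo
game.set_episode_timeout(300)            # Case 1: episode ends after 300 tics
game.init()

# Assumed 3-button encoding for the basic scenario: left, right, shoot.
actions = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

game.new_episode()
while not game.is_episode_finished():
    reward = game.make_action(actions[2])  # placeholder policy: always shoot
game.close()
```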

Case 2: Shorter episode

game.set_episode_timeout(150)

Very similar to Case 1

[figure: doom_basic_episode_150]

Case 3: Very short episode

game.set_episode_timeout(70)

The agent should find the policy fast because it has a very limited time window to explore.
Convergence is delayed (after 500 episodes).

  • However, the reward is around 0.65 > Case 1 (0.55) (see figure below).
    Why? Convergence is delayed, but on the other hand we get better rewards. I mean that the agent usually accomplishes the task in less time than in Case 1, i.e. it is more efficient and focused. What do you think?!

[figure: doom_basic_episode_70]

Case 4: Longer episode

game.set_episode_timeout(450)

Smoothed reward is around 0.62
Smoothed episode length is around 33
Convergence is delayed compared to Case 1

[figure: doom_basic_episode_450]

Case 5: Each worker has its own length

Is it even a valid idea?!

Where: episode length = 75 + (worker number * 25); see the code sketch at the end of this case.

worker_0: episode length = 75
worker_1: episode length = 100
worker_2: episode length = 125
...
worker_7: episode length = 250

It seems that worker_7 (episode length 250) converged faster than worker_0 (episode length 75).

The following figure includes all workers:
[figure: doom_basic_episode_75_to_250_all]

The following figure includes only worker_0 (episode length = 75) and worker_7 (episode length = 250)
[figure: doom_basic_episode_75_to_250_two]

However, all workers share the same global network. Do you think that having different episode lengths could affect or enhance the learning? What do you think?

Again: Is it even a valid idea?!
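
To make the idea concrete, here is a minimal sketch of per-worker timeouts. The helper name and config path are hypothetical; only the timeout formula comes from above:

```python
from vizdoom import DoomGame

NUM_WORKERS = 8

def make_game(worker_number, base=75, step=25):
    # Each worker gets its own timeout: episode length = base + worker_number * step.
    # The config path is an assumption, not taken from the repo.
    game = DoomGame()
    game.load_config("scenarios/basic.cfg")  # hypothetical path
    game.set_episode_timeout(base + worker_number * step)
    game.init()
    return game

# Case 5: worker_0 -> 75 tics, worker_1 -> 100, ..., worker_7 -> 250.
games = [make_game(i) for i in range(NUM_WORKERS)]
```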

Case 6: Each worker has its own length, with a larger range

Where: episode length = 100 + (worker number * 50); see the snippet after the list.

worker_0: episode length = 100
worker_1: episode length = 150
worker_2: episode length = 200
...
worker_7: episode length = 450
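
With the hypothetical make_game sketch from Case 5, this range is only a change of parameters:

```python
# Case 6: worker_0 -> 100 tics, worker_1 -> 150, ..., worker_7 -> 450.
games = [make_game(i, base=100, step=50) for i in range(NUM_WORKERS)]
```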

The following figure includes all workers:
[figure: doom_basic_episode_100_to_450_all]

The following figure includes only worker_0 (episode length = 100), worker_4 (episode length = 300), and worker_7 (episode length = 450):

[figure: doom_basic_episode_100_to_450_three]

It seems that the longer the episode, the faster the learning.
