
History is not updated with new game screen created after a terminal state is reached #48

Closed
hipoglucido opened this issue Mar 9, 2018 · 2 comments

@hipoglucido

hipoglucido commented Mar 9, 2018

Hi. I am trying to understand the code and I came across what I think is a bug in:

def train(self):

It is related to the way the agent interacts with the environment. At the beginning of training the environment is reset via self.env.new_random_game(), and the history is then filled with the new random state via self.history.add(screen). This is needed because the agent always chooses its actions with that history as input, via action = self.predict(self.history.get()).
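For reference, the start of train() looks roughly like this (a paraphrased sketch based on the calls mentioned above, not the exact code):

screen, reward, action, terminal = self.env.new_random_game()

# fill the frame history with the initial screen so that
# self.history.get() returns a full stack for the first prediction
for _ in range(self.history_length):
    self.history.add(screen)

# inside the training loop, actions are always chosen from that history
action = self.predict(self.history.get())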

When a terminal state is reached, a new random game is created, but this time the new random state is not added to the history. As a result, the agent uses the terminal state of the previous episode to decide which action to take in the first state of the new episode, which I think is wrong.

A way to fix it would be to add

# refill the history with the first screen of the new game
for _ in range(self.history_length):
    self.history.add(screen)

after this line.

I don't know if fixing this would have any positive impact on performance, since it only affects the first self.history_length steps of each episode, but I wanted to share it anyway.
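To make the effect concrete: as far as I understand (I may be off on the details), History is just a rolling buffer of the last history_length screens, along these lines:

import numpy as np

class History:
    # approximate sketch of a rolling buffer of the last `history_length` screens
    def __init__(self, history_length, screen_height, screen_width):
        self.history = np.zeros(
            [history_length, screen_height, screen_width], dtype=np.float32)

    def add(self, screen):
        # shift everything one slot towards the past and append the new frame
        self.history[:-1] = self.history[1:]
        self.history[-1] = screen

    def get(self):
        return self.history

So without the refill, the first self.history_length stacks of a new episode still contain frames from the previous episode's terminal state.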

Thanks in advance.

@Richardxxxxxxx

Oh, I just noticed the same issue. #59

After making this change, did you get better results?

Thank you very much.

@douglasrizzo

Maybe if we did something like this

if terminal:
    screen, reward, action, terminal = self.env.new_random_game()
    for _ in range(self.history_length):
        self.history.add(screen)

we would mimic what happens when the agent is first initialized: we just fill its history with the first observation. I'm not sure if that's theoretically the way to go, though. I'll take a look at the paper and see if they mention anything about the history in the first state.
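In the meantime, one way to keep the two code paths in sync would be a small helper along these lines (hypothetical, the method name is made up; the calls are the ones from the snippets above):

def reset_game_and_history(self):
    # start a new random game and refill the history with its first screen,
    # exactly as is done once before the training loop starts
    screen, reward, action, terminal = self.env.new_random_game()
    for _ in range(self.history_length):
        self.history.add(screen)
    return screen, reward, action, terminal

It could then be called both before the training loop and in the if terminal: branch.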
