Great work! Been having a bunch of fun. #183

Closed
mikecann opened this Issue Dec 18, 2017 · 26 comments

@mikecann

mikecann commented Dec 18, 2017

This isn't really an issue, just a thank-you for this awesome lib.

I have started a blog post series where I eventually want to be able to train an agent to play one of my previous games (Mr Nibbles Forever).

https://mikecann.co.uk/machine-learning/a-game-developer-learns-machine-learning-a-little-deeper/

For now I have only managed to get a very simple 1D agent going, but I plan on adding more complexity as I go along.

Anyways feel free to close this issue, just wanted to say thanks.

@MarcoMeter

Contributor

MarcoMeter commented Dec 18, 2017

Mr Nibbles Forever looks like a pretty cool use case. Coming up with a suitable state space sounds like a major challenge to me. Curriculum Learning should be a pretty helpful addition once you face the entire complexity of your game.

@mikecann

mikecann commented Dec 18, 2017

@MarcoMeter My original plan was Mr Nibbles Forever, here is a quick video of the gameplay: https://www.youtube.com/watch?v=vO6mjWDz5RM

But I have thought on it more and I think it would be simpler (to begin with at least) to try to train an agent on its predecessor game Mr Nibbles:

https://youtu.be/lyAf7VVLdKg?t=25

It's a simpler puzzle game instead of an endless runner, so it may be easier to do Curriculum Learning on that instead.

But yes, I was wondering about state space. I was thinking that it might have to be a camera feed from the game... thoughts?

@MarcoMeter

Contributor

MarcoMeter commented Dec 18, 2017

@mikecann

I agree on using a camera image as input. I'd set up a separate camera for the agent to exclude the background image. In the end, the image fed to the brain should be pretty abstract (grayscale and maybe less than 64x64 pixels). As this repository's PPO implementation does not feature frame stacking yet, the state space could be extended by the current velocity of the agent.

@mikecann

mikecann commented Dec 18, 2017

@MarcoMeter oh cool, thanks for those tips. Not sure what frame stacking is yet so I will have a look into it.

The original Mr Nibbles is actually a grid-based game (apart from Mr Nibbles himself), so there is potential I might not need to use the camera.

@MarcoMeter

Contributor

MarcoMeter commented Dec 18, 2017

@mikecann Frame stacking means feeding the current frame plus the n most recent past frames to the neural net. Given the current and the previous frame, the agent could derive its velocity from that input, for example.
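
The idea can be sketched in a few lines. This is a minimal illustration, not the library's actual implementation; the class name and flat-list observations are assumptions:

```python
from collections import deque


class FrameStacker:
    """Keep the n most recent observations and concatenate them into one
    network input, so the net can see change over time (e.g. velocity)."""

    def __init__(self, n_frames, frame_size):
        # Pre-fill with zero frames so the stack always holds n_frames entries.
        self.frames = deque(
            [[0.0] * frame_size for _ in range(n_frames)], maxlen=n_frames
        )

    def add(self, frame):
        # Appending past maxlen automatically drops the oldest frame.
        self.frames.append(frame)

    def stacked(self):
        # Oldest frame first, newest last, flattened into one vector.
        return [v for frame in self.frames for v in frame]


stacker = FrameStacker(n_frames=2, frame_size=3)
stacker.add([1.0, 2.0, 3.0])
state = stacker.stacked()  # zero padding followed by the newest frame
```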

@Fangh

Fangh commented Dec 18, 2017

Thank you @mikecann for this blog post. I think I will read it and use it to make my first ML agent.

@awjuliani

Collaborator

awjuliani commented Dec 18, 2017

Very cool blog @mikecann!

As for state-space, there are three main possible approaches:

  1. Encode everything relevant into the state vector. For simple environments this works well, but it doesn't scale with dynamic numbers of objects within a scene.
  2. Use the camera as an observation. This captures everything relevant, but is harder to learn from, and also, since we don't yet have frame-stacking, the agent doesn't learn the important temporal information (like velocity).
  3. Use ray-casting or similar methods to capture all relevant objects close to the agent. This combines the "directness" of 1 with the "perception" of 2. Of course, it currently suffers from the same issue of not containing temporal information as 2, but we are working on an automated way to ask for "past 3 states" as your input to the network. You could also code something like this yourself, where CollectState() keeps track of the last 3 states itself, and passes them in a big vector.

Hope that helps! Definitely looking forward to seeing how things progress.
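
The do-it-yourself stacking mentioned in point 3 could look roughly like this. A minimal Python sketch only (the real agent code is C# and `collect_state` here stands in for CollectState(); all names are illustrative):

```python
HISTORY = 3  # keep the last 3 states, as suggested above


class StackedAgent:
    """Sketch of manual state stacking: each call to collect_state()
    returns the last HISTORY raw states flattened into one big vector."""

    def __init__(self, state_size):
        self.state_size = state_size
        # Start from zero states so the output size is constant.
        self.past = [[0.0] * state_size for _ in range(HISTORY)]

    def collect_state(self, raw_state):
        # Drop the oldest state, append the newest...
        self.past = self.past[1:] + [list(raw_state)]
        # ...and hand the network HISTORY * state_size values.
        return [v for s in self.past for v in s]
```

With `state_size=2`, the first call returns four zeros followed by the newest state; temporal information like velocity is then implicit in the stacked vector.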

@mikecann

mikecann commented Dec 19, 2017

@awjuliani thank you so much for that excellent advice! Solution 3 sounds very interesting.

I was wondering: why can't I just pass the velocity vector in as another state property?

Another solution is that because the original mr nibbles' world is laid out on a grid I could just encode that 2D world in states. This is exactly how I built the maps for the original game:

[image: the original game's grid-based level map]

I could represent the state of the level in a 100x100 grid where each cell could be one of a number of different "types" (black is solid floor, yellow is a nibble, blue is a spider, etc.). Then I would also pass in the position of Mr Nibbles and his velocity and a few other things. I'm thinking that might do it.

I don't think I would even have to supply collision information, as the NN should just be able to learn the mapping between the relative states.

So I'm thinking something like 100x100x(number of cell types) + position + velocity + is-in-air... so something like 80,003 states. I'm thinking this would take a long time to train?
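
A small sketch of that encoding, with made-up cell-type names, on a tiny 2x2 grid (a 100x100 grid with one-hot cell types scales the same way to the tens of thousands of inputs mentioned above):

```python
def encode_level(grid, cell_types, position, velocity, in_air):
    """Flatten a tile grid (one-hot per cell) plus a few agent scalars
    into a single state vector. Illustrative only."""
    state = []
    for row in grid:
        for cell in row:
            one_hot = [0.0] * len(cell_types)
            one_hot[cell_types.index(cell)] = 1.0
            state.extend(one_hot)
    state.extend(position)                # 2 values
    state.extend(velocity)                # 2 values
    state.append(1.0 if in_air else 0.0)  # 1 value
    return state


CELL_TYPES = ["empty", "floor", "nibble", "spider"]
level = [["empty", "floor"],
         ["nibble", "empty"]]
# 2*2 cells * 4 types + 2 + 2 + 1 = 21 inputs for this toy level.
state = encode_level(level, CELL_TYPES, [0.5, 1.0], [0.0, 0.0], False)
```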

@MarcoMeter

Contributor

MarcoMeter commented Dec 19, 2017

DeepTraffic is a great example of the cell-like approach.

There are two things to keep in mind:

  • start at a small scale and increase complexity as results improve (like you already did)
  • only take an excerpt of the nibble environment, because if you train on the whole level your model will just be good at that level and thus won't generalize to other levels

@mikecann

mikecann commented Dec 19, 2017

@MarcoMeter thanks for sharing DeepTraffic, that's really cool.

As to the points you mentioned, I totally agree with starting off slow. As I'm probably going to have to pretty much rebuild the game anyway (because the old one was written in Haxe), that's another reason to start simple, then train, then add more complexity, then train some more.

@awjuliani

Collaborator

awjuliani commented Dec 19, 2017

@mikecann sounds like a great approach! You can also definitely feed velocity information into the state to augment it.

Feel free to post here on the results of your experimentation, as I'd be interested in learning what approaches do and don't work for problems like this.

@mikecann

mikecann commented Dec 20, 2017

@awjuliani Awesome, I will do. Planning on doing more work on this over the holidays.

@mikecann

mikecann commented Jan 13, 2018

Hey @awjuliani just thought you would like to know I have finally found time to do a little more work on this and write it up: https://mikecann.co.uk/machine-learning/a-game-developer-learns-machine-learning-mr-nibbles-basics/

I ran into a few issues that I think mean I will have to use curriculum learning in the future.

Do you think I need to provide velocity or should the network be able to infer velocity from the previous position state?

@MarcoMeter

Contributor

MarcoMeter commented Jan 13, 2018

Hi @mikecann
I skimmed your blog post and code. It looks like you don't add temporal information to your state space, so I suggest adding the velocity. Another potential problem is that you are not normalizing your state space. Maybe change normalize to true in your PPO hyperparameters, or do it manually in CollectState() of your implementation. I usually do it manually. You can find more information about it in the best practices doc.
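
Manual normalization can be as simple as scaling each input by a known range. A hedged Python sketch (the real CollectState() is C#, and LEVEL_WIDTH / MAX_SPEED are made-up constants standing in for whatever ranges the game actually has):

```python
LEVEL_WIDTH = 100.0  # assumed level extent, illustrative
MAX_SPEED = 10.0     # assumed top speed, illustrative


def normalize(value, lo, hi):
    # Map a raw value in [lo, hi] onto [0, 1].
    return (value - lo) / (hi - lo)


def collect_state(pos_x, vel_x):
    # Position scaled to [0, 1]; velocity scaled to [-1, 1], so every
    # input the network sees lives in a comparable range.
    return [normalize(pos_x, 0.0, LEVEL_WIDTH), vel_x / MAX_SPEED]
```

Keeping all inputs in similar ranges stops large raw values (like world-space positions) from dominating the small ones during training.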

@mikecann

mikecann commented Jan 15, 2018

@MarcoMeter I was hoping that the velocity could be inferred from the current position and last position, but I think you are right, I need to supply the velocity manually. I will read up on normalization; I'm not sure exactly what the reasons for doing that are just yet.

@MarcoMeter

Contributor

MarcoMeter commented Jan 15, 2018

Having multiple positions in the state space should work as well. But if there is a possibility to reduce dimensions (e.g. by aggregation), that is beneficial to your model as it reduces the input scale. Two position vectors take more inputs than one velocity vector, but both contain roughly the same information.
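
Concretely, the aggregation here is just a finite difference: two 2-D positions (4 inputs) collapse into one velocity (2 inputs). A tiny illustrative sketch:

```python
def velocity_from_positions(pos, prev_pos, dt):
    """Aggregate two position samples taken dt apart into one velocity
    vector, halving the input dimensions while keeping the information."""
    return [(p - q) / dt for p, q in zip(pos, prev_pos)]
```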

@mikecann

mikecann commented Jan 16, 2018

@MarcoMeter sorry, I apologize. What I meant by "inferred from the current position and last position" wasn't that I actually supply the current pos and last pos per CollectState(), as that would essentially just be the velocity, like you mentioned.

Instead, what I meant was that I thought that by default all state gathered by CollectState() is temporal. So the network automatically uses the previous state gathered by CollectState() as an input, and that temporal nature should mean the velocity can be inferred from the current position and the last.

Hope that makes sense :)

@MarcoMeter

Contributor

MarcoMeter commented Jan 16, 2018

the network automatically uses the previous state gathered by "CollectState()" as an input

That's not the case (right now). The development branches contain an implementation to stack the state space up to 9 steps into the past (plus the current one).

@mikecann

mikecann commented Jan 16, 2018

@MarcoMeter oh really? Ahh okay, in which case I will definitely need to provide the velocity then.

@mikecann

mikecann commented Jan 21, 2018

Hey guys, having some issues trying to get my agent to learn how to jump:

https://youtu.be/2lIXYjx1RBw

He has learnt he needs to head towards the exit just fine, but seems to be struggling to jump over a little hurdle.

In the example above I have attempted to limit session length by setting done = true when the cumulative reward drops below -80.

I have also tried rewarding the agent each time it jumps, but that doesn't seem to have helped and is technically wrong because the agent shouldn't be jumping all the time.

Here are my hyperparams: https://github.com/mikecann/MrNibblesML/blob/master/python/mrnibbles.ipynb

And this is the agent: https://github.com/mikecann/MrNibblesML/blob/master/unity/Assets/MrNibbles/Scripts/MrNibblesAgent.cs

Anyone got any clues? I would try curriculum learning but I'm not sure how that would help with teaching him how to jump over this small obstacle.

Any help would be greatly appreciated!
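
The session-length cutoff described above amounts to a running total with a done flag. A minimal sketch with illustrative names (not the actual ML-Agents API):

```python
class Episode:
    """End the episode once cumulative reward drops below a cutoff,
    mirroring the done = true trick described above."""

    CUTOFF = -80.0

    def __init__(self):
        self.cumulative_reward = 0.0
        self.done = False

    def add_reward(self, reward):
        self.cumulative_reward += reward
        if self.cumulative_reward < self.CUTOFF:
            # Flag the episode as finished so training resets the agent.
            self.done = True
```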

@MarcoMeter

Contributor

MarcoMeter commented Jan 21, 2018

For your discrete action space, you've chosen a huge batch size. I'd go with a batch size of at most 128, preferably smaller, like 64.

Besides the hyperparameters, I think your state space does not contain enough information. I'd add some information about obstacles. Maybe cast a ray to let the agent sense any obstacles in front of it. Even if you are using a camera observation as input, I'd still try that.

@mikecann

mikecann commented Jan 22, 2018

Hi @MarcoMeter, thanks for those tips.

I went for the batch size I did because my action space is actually continuous. I want the agent to be able to perform multiple actions at once (e.g. jump and move right), and I think I read somewhere that your action space needs to be continuous for that. But I will try lowering my batch size and see where I get to 👍

Well, I was hoping that I wouldn't need to provide collision information, as the world is on an evenly spaced grid, very much like the pixels in a camera... so if an agent can learn from the pixels in a camera, it should be able to learn from the cells in my grid-world, no?

@MarcoMeter

Contributor

MarcoMeter commented Jan 22, 2018

I overlooked the fact that you are adding each tile's position. I'm wondering if a regular dense layer is capable of sensing the spatial information.

You could try making each tile's position relative to the player.

In your case I'd go with discrete actions. If you want two actions to be done at once, then create an action which does both. That should be two more actions in your case, I guess. That's not too high-dimensional for a discrete action space.

Concerning continuous control, I'd clamp each value to (-1, 1) first. For moving horizontally, one action is enough. Jumping would be triggered if the value is positive.

Maybe this gives you some more ideas.
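
Both suggestions can be sketched side by side. Illustrative Python only (the real agent is C#, and the action table is a made-up example of "create an action which does both"):

```python
# Discrete: enumerate combined actions so "move right and jump" is one index.
DISCRETE_ACTIONS = [
    (0, False),   # idle
    (-1, False),  # move left
    (1, False),   # move right
    (-1, True),   # move left + jump  (the two extra combined actions)
    (1, True),    # move right + jump
]


def act_discrete(index):
    move, jump = DISCRETE_ACTIONS[index]
    return move, jump


def act_continuous(move_raw, jump_raw):
    # Continuous: clamp each network output to [-1, 1];
    # jump fires only when its clamped value is positive.
    def clamp(v):
        return max(-1.0, min(1.0, v))

    return clamp(move_raw), clamp(jump_raw) > 0.0
```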

@mikecann

mikecann commented Jan 22, 2018

You can try to make the tile's position relative to the player.

This is interesting. I'm not sure why making the coordinates relative to the player would help. I'm feeding in the position of both the player and each tile in world coords, so why does it matter if it's relative to the player or not?

I coded up something last night (but haven't tested it yet) where I only provide the tiles that surround the agent, which will mean I can have levels of arbitrary size.

In your case I'd do with discrete actions. If you want two actions to be done at once, then create an action which does both.

Good call, I'll try that. Just for my understanding, could you explain why a discrete action space is preferred over continuous?

Concerning continuous control, I'd try to clamp (-1,1) each value first. For moving horizontally, one action is enough. Jumping would be triggered if the value is positive.

Not sure I fully follow your meaning here. If I am using discrete action-space is this still relevant?

The player can press jump to jump, but if he holds the jump key down longer he can jump higher. I want the agent to learn this. I was hoping the agent would just be able to learn this without any more state, but perhaps I should feed in another state variable so the agent can learn that it can continue to hold jump to get higher?
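
The surrounding-tiles idea mentioned above could look something like this. A hedged sketch, not the actual code from the repo; grid layout, the "solid" padding value, and all names are assumptions:

```python
def tiles_around(grid, agent_x, agent_y, radius):
    """Collect the (2*radius+1)^2 cells centred on the agent, padding
    out-of-bounds cells with "solid" so the state size stays constant
    regardless of level size."""
    window = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = agent_x + dx, agent_y + dy
            if 0 <= y < len(grid) and 0 <= x < len(grid[0]):
                window.append(grid[y][x])
            else:
                window.append("solid")
    return window
```

Because the window is always relative to the agent, the same trained policy can be applied to levels of arbitrary size.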

@MarcoMeter

Contributor

MarcoMeter commented Jan 22, 2018

Im not sure why making the coordinates relative to the player would help?

It might add some spatial value: everything close to the agent then has small values.

Just for my understanding could you explain why discreet action space is preferred over continuous?

Continuous is much more complex. As long as your environment can be played using keyboard keys, I'd go with discrete actions.

Not sure I fully follow your meaning here. If I am using discrete action-space is this still relevant?

This doesn't apply to discrete action spaces. In continuous space, the output for each action can be any real number. It's pretty common practice to clamp the output before taking the action.

The player can press jump to jump, but if he holds the jump key down longer he can jump higher.

That's an interesting scenario; that's where the timing of events matters. Well, I'd try not to add any further inputs or rewards to enforce such a jumping behavior. The reinforcement learning algorithm should be able to solve it.

@eshvk eshvk added the discussion label Mar 28, 2018

@mmattar

Member

mmattar commented Mar 28, 2018

Folks, closing the issue as it has been inactive for 30 days. Feel free to reopen to continue the discussion if anything else comes up.

@mmattar mmattar closed this Mar 28, 2018
