flamme rouge game initial version #5

Merged
merged 2 commits into from May 18, 2021

zorgluf
Contributor

@zorgluf zorgluf commented May 10, 2021

Implementation of the "Flamme Rouge" game (https://www.ultraboardgames.com/flamme-rouge/game-rules.php), with the "Peloton" expansion, for 5 players. I'm proposing it for merge into the source project, if it is relevant for you.
The best_model needs much more training (so far only 2 hours on one CPU). The model can be improved a lot; I am still learning deep learning...

@davidADSP: many thanks for sharing this wonderful project!

@davidADSP
Owner

This looks great! Can you add the name of the game in the README under Getting Started and I'll be happy to merge it in - many thanks!

@zorgluf
Contributor Author

zorgluf commented May 12, 2021

Sure! Done.

@davidADSP davidADSP merged commit 2acfd25 into davidADSP:main May 18, 2021
@davidADSP
Owner

davidADSP commented Jun 12, 2021

Hey, just had a look at your Flamme Rouge code @zorgluf - nice work :) Could you share the training parameters you used on the command line? I'll try running it on my multi-core server and see how well we can train the agent.

@davidADSP
Owner

davidADSP commented Jun 12, 2021

@zorgluf - The main thing I would recommend is to change the shape of the observation, so that additional information is added as channels, rather than along the width or height dimension of the input.

For example, the board shape is (120, 3, 7) = (length, lanes, tile types).
Let's say we're playing a 2 player game, so the observation shape gets expanded to (120, 3, 11): 4 extra channels for the rouleur and sprinteur locations of each player 👍

At the moment, extra information is being added along the length dimension:
e.g. played cards = (12, ) -> resized and padded with zeros to (12, 3, 11) -> appended along length dimension to obs = (132, 3, 11).

This isn't ideal, as the convolutional layer won't be able to easily synthesise information from the early cells in the board, e.g. (0,...), (1,...), with the extra information, which at the moment is literally stored at the other end of the board :) i.e. at (120,...), (121,...) etc.

Instead this extra info needs to be distributed across the entire board as extra channels (exactly like you've done for the position of the cyclists). Also, note that it should be smeared across the board (i.e. repeated) rather than padded with zeros.

So for example:
played cards = (12, ) -> repeated and reshaped to (120, 3, 12) -> appended along channel dimension to obs = (120, 3, 23).

In total it looks like you're creating 84 extra channels for a 2 player game (12 * 2 players for played cards, 12 for discarded cards, 12 for hand cards, 36 for actions), so the final observation should have shape (120, 3, 95).
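
A minimal NumPy sketch of the reshaping described above, using the 2 player shapes quoted in this comment. The helper `smear` and the variable names are hypothetical and the arrays are zero-filled placeholders; this is not the code from either branch, only an illustration of repeating each 1D feature across every board cell and appending it along the channel axis.

```python
import numpy as np

BOARD_LENGTH, N_LANES = 120, 3  # board dimensions quoted in the comment


def smear(vector, length=BOARD_LENGTH, lanes=N_LANES):
    """Repeat a 1D feature vector at every board cell.

    Returns an array of shape (length, lanes, len(vector)).
    Hypothetical helper, not part of the SIMPLE codebase.
    """
    vector = np.asarray(vector, dtype=np.float32)
    return np.broadcast_to(vector, (length, lanes, vector.shape[0])).copy()


# Placeholder data (assumption): the real encodings come from the environment.
board = np.zeros((BOARD_LENGTH, N_LANES, 11), dtype=np.float32)  # 7 tile types + 4 cyclist channels
played_cards = [np.zeros(12, dtype=np.float32) for _ in range(2)]  # one vector per player
discarded_cards = np.zeros(12, dtype=np.float32)
hand_cards = np.zeros(12, dtype=np.float32)
actions = np.zeros(36, dtype=np.float32)

# 24 + 12 + 12 + 36 = 84 extra channels, appended along the last (channel) axis.
extras = played_cards + [discarded_cards, hand_cards, actions]
obs = np.concatenate([board] + [smear(v) for v in extras], axis=-1)
print(obs.shape)  # (120, 3, 95)
```

Because every cell carries the same copy of the card features, a convolution looking at any part of the board sees them alongside the local tile and cyclist channels.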

Hope that makes sense - I'm seriously impressed with the implementation. I think with a few tweaks to the design it will be able to get to superhuman performance. Can't wait to see it in action!

@zorgluf
Contributor Author

zorgluf commented Jun 12, 2021

Thanks for the comments and advice, I will have a look. By the way, I have already started improving the model on my fork https://github.com/zorgluf/SIMPLE/tree/frouge, so that learning is done separately: conv2D on the board and dense layers on the card channels (without the unnecessary empty dimensions). It improves the learning time significantly. Before suggesting a new pull request, I have to review the experiment; even though it was only a few weeks ago, I have the memory of a goldfish...
By "training parameters", what do you mean? I used your train script without any extra parameters.
As for "superhuman" performance, there is still a gap: I frequently manage to win against the best model so far... I will follow your advice and see if it gets better with a new model.
Thanks !
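
A rough sketch of the two-input idea mentioned above, assuming the same 2 player shapes discussed earlier in the thread. The function `split_observation` and the dictionary keys are hypothetical; the fork's actual implementation is not shown here. The board stays a 3D array suitable for a conv2D branch, while the card features are flattened into one 1D vector for a dense branch, with no padding or repetition.

```python
import numpy as np


def split_observation(board, card_features):
    """Build two separate model inputs (hypothetical layout).

    board: 3D array (length, lanes, channels) for a convolutional branch.
    card_features: list of 1D arrays, concatenated for a dense branch.
    """
    board = np.asarray(board, dtype=np.float32)
    cards = np.concatenate([np.ravel(f) for f in card_features]).astype(np.float32)
    return {"board": board, "cards": cards}


# Placeholder data (assumption): zero-filled arrays with the shapes from the thread.
board = np.zeros((120, 3, 11), dtype=np.float32)
card_features = [
    np.zeros(12), np.zeros(12),  # played cards, one vector per player
    np.zeros(12),                # discarded cards
    np.zeros(12),                # hand cards
    np.zeros(36),                # actions
]

inputs = split_observation(board, card_features)
print(inputs["board"].shape, inputs["cards"].shape)  # (120, 3, 11) (84,)
```

Compared with repeating the card features over every cell, this keeps the input at 120 * 3 * 11 + 84 values, at the cost of needing a model with two input branches that are merged later.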

@davidADSP
Owner

davidADSP commented Jun 12, 2021

Ok great - I've also just committed the changes described above as a new branch (for testing I've just simplified to the basic board layout and 2 player game).

https://github.com/davidADSP/SIMPLE/tree/frouge_df
See what you think. Your idea is also interesting - using 3D array input for the board and 1D array input for the cards - that would also work I'm sure.

I just wondered if you changed the default arguments to the train.py command line function, e.g. batch_size, number of timesteps per training iteration etc. They probably don't need changing - it's always a bit of trial and error with these things.

I'm training now using my updated branch and seeing promising results - will leave it overnight and see what happens.
