Merge e060d3d into 2a0378b
Edouard360 committed Oct 5, 2017
2 parents 2a0378b + e060d3d commit 15faa43
Showing 7 changed files with 170 additions and 136 deletions.
93 changes: 2 additions & 91 deletions README.md
@@ -11,10 +11,6 @@

<a href="https://halite.io/">Halite</a> is an open source artificial intelligence programming challenge, created by <a href="https://www.twosigma.com/">Two Sigma</a>, where players build bots using the coding language of their choice to battle on a two-dimensional virtual board. The last bot standing or the bot with all the territory wins. Victory will require micromanaging the movement of pieces, optimizing a bot’s combat ability, and braving a branching factor billions of times higher than that of Go.

## Documentation

The documentation is available <a href="https://edouard360.github.io/Halite-Python-RL/">here</a>.

## Objective

The objective of the project is to apply **Reinforcement Learning** strategies to teach the Bot to perform as well as possible. We teach an agent to learn the best actions to play at each turn. More precisely, given the game state, our untrained Bot **initially performs random actions, but gets rewarded for the good ones**. Over time, the Bot automatically learns how to conquer the map efficiently.
@@ -31,91 +27,6 @@ Indeed, unlike chess or go, in the Halite turn-based game, we can do **multiple

In this repository, we will mainly explore solutions based on **Neural Networks**, starting with a very simple <a href="https://en.wikipedia.org/wiki/Multilayer_perceptron">MLP</a>. This is inspired by a <a href="https://medium.com/@awjuliani/super-simple-reinforcement-learning-tutorial-part-2-ded33892c724">tutorial</a> on Reinforcement Learning agents.
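
To give an idea of what such a network looks like, here is a minimal NumPy sketch of an MLP policy mapping the flattened game state to probabilities over the five actions. The layer sizes and names are illustrative assumptions, not the project's actual architecture.

```python
import numpy as np

# Illustrative sketch: a tiny MLP policy for a 3x3 map with 3 features per
# square and 5 actions (STILL, NORTH, EAST, SOUTH, WEST). Sizes are assumptions.
N_INPUT, N_HIDDEN, N_ACTIONS = 3 * 3 * 3, 32, 5

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(N_INPUT, N_HIDDEN))
W2 = rng.normal(scale=0.1, size=(N_HIDDEN, N_ACTIONS))

def policy(state):
    """Return action probabilities for a flattened game state."""
    hidden = np.maximum(0.0, state.reshape(-1) @ W1)   # ReLU hidden layer
    logits = hidden @ W2
    exp = np.exp(logits - logits.max())                # numerically stable softmax
    return exp / exp.sum()

# An untrained policy samples essentially random actions; training reinforces
# the ones that led to a reward.
state = rng.random((3, 3, 3))
action = rng.choice(N_ACTIONS, p=policy(state))
```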

## Documentation & Articles

## Detailing the approach step by step

We will explain the rules of the game in this section, along with our strategy for training the agent. To start simple, we will try to conquer a 3*3 map where we are the only player (see below). As we can see, this trained agent is already pretty efficient at conquering the map.

<br>
<p align="center">
<img alt="conquermap" src="https://user-images.githubusercontent.com/15527397/30869334-20c1a650-a2e1-11e7-9c1b-9233640ccd01.gif" height="190" width="32%">
<br></p>

### How does it start?

Each player starts with a single square of the map and, at each turn, can decide either:

- To **stay** in order to increase the strength of its square (action = STILL).

- To **move** to (and possibly conquer) a neighboring square (action = NORTH, SOUTH, EAST, WEST).

Conquering is only possible once the square's strength is high enough, so a wise bot first waits for its strength to increase before attacking an adjacent square, since **squares don't produce when they attack**.

> To conquer a square, we must move in its direction with a strictly superior strength (action = NORTH, SOUTH, EAST, WEST).
<br>
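
To make these two rules concrete, here is a minimal illustrative sketch; the function names and the action encoding are assumptions for illustration, not the game engine's actual code.

```python
# Illustrative sketch of the two rules above; names and the STILL encoding are assumed.
STILL = 0  # assumed encoding; NORTH/SOUTH/EAST/WEST would be 1..4

def can_conquer(my_strength, target_strength):
    """A move only conquers the target square with strictly superior strength."""
    return my_strength > target_strength

def production_gain(production, action):
    """Squares don't produce when they attack: only a STILL square gains its production."""
    return production if action == STILL else 0
```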

The white numbers on the map below represent the current strength of the squares. On the left is a snapshot of the initial state of the game. On the right you can see the strength of the blue square increase over time, because our agent decides to stay (action = STILL).

<p align="center">
<img height="220" width="32%" alt="the strength map" src="https://user-images.githubusercontent.com/15527397/30869344-24b55702-a2e1-11e7-9383-0dc7f562e5d6.png">
<img height="220" width="32%" src="https://user-images.githubusercontent.com/15527397/30869349-27abe944-a2e1-11e7-8b6e-94dfde9e15a1.gif">
</p>

The increase in strength is determined by a fixed production map. In our example, the blue square's strength increases by 4 each turn. Each square has its own production rate, represented by the white numbers below the squares (see below). On the left is a snapshot of the initial game, whereas the game's dynamics are shown on the right.

<p align="center">
<img height="220" width="32%" alt="production map" src="https://user-images.githubusercontent.com/15527397/30869351-299bd8c2-a2e1-11e7-80d2-62699551aaa2.png">
<img height="220" width="32%" src="https://user-images.githubusercontent.com/15527397/30869356-2bce1fce-a2e1-11e7-86e6-339335636e0e.gif">
</p>

This production map is invariant over time, and it is information we should use to train our agent. Since we are interested in maximizing our production, we should intuitively train the agent to target squares with a high production rate. On the other hand, we should also consider the strength map, since squares with low strength are easier to conquer.

<p align="center">
<img height="220" width="32%" src="https://user-images.githubusercontent.com/15527397/30869359-2e235f3c-a2e1-11e7-87ce-109ea5c08c27.gif">
</p>

### The Agent

We will teach our agent with:

- The successive **Game States**.
- The agent's **Moves** (initially random).
- The corresponding **Reward** for each Move (that we have to compute).

For now, the Game State is a (3 * 3) * 3 matrix, i.e. (width * height) * n_features, the features being:

- The **Strength** of the Square
- The **Production** of the Square
- The **Owner** of the Square

<p align="center">
<img height="220" width="32%" alt="matrix" src="https://user-images.githubusercontent.com/15527397/30869363-30c46a56-a2e1-11e7-8882-1c22bc2256f8.png">
<img height="220" width="32%" src="https://user-images.githubusercontent.com/15527397/30869368-32e9be94-a2e1-11e7-831e-3d74b19981a4.gif">
</p>
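
As an illustration, the state could be assembled as below; the map values are made up and only the (width, height, n_features) layout matters.

```python
import numpy as np

# Illustrative sketch: stack the three feature maps into the (3, 3, 3) game state.
# The numbers are made up; only the (width, height, n_features) layout matters.
strength   = np.array([[10, 5, 3], [4, 12, 6], [2, 8, 7]], dtype=np.float32)
production = np.array([[ 1, 2, 1], [3,  4, 2], [1, 2, 3]], dtype=np.float32)
owner      = np.array([[ 0, 0, 0], [0,  1, 0], [0, 0, 0]], dtype=np.float32)  # 1 = our bot

state = np.stack([strength, production, owner], axis=-1)
assert state.shape == (3, 3, 3)  # (width, height, n_features)
```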

### The Reward

<br>
As for the reward, we focus on the production. Since each conquered square increases the total production of our territory, the action leading to the conquest is rewarded according to the production rate of the conquered square. This strategy rewards the conquest of highly productive squares the most.

<p align="center">
<img height="220" width="32%" src="https://user-images.githubusercontent.com/15527397/30869372-363a5c7a-a2e1-11e7-8784-9a83d4c62c44.gif">
</p>
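
A rough sketch of that reward, with assumed names (the project's actual computation may differ):

```python
import numpy as np

# Illustrative sketch: an action that conquers a square is rewarded with that
# square's production. Names (owner ids, my_id) are assumptions.
def conquest_reward(owner_before, owner_after, production, my_id=1):
    """Total production of the squares gained this turn."""
    newly_conquered = (owner_after == my_id) & (owner_before != my_id)
    return float(production[newly_conquered].sum())

owner_before = np.array([[0, 0], [0, 1]])
owner_after  = np.array([[0, 0], [1, 1]])   # the bottom-left square was conquered
production   = np.array([[1, 2], [3, 4]])
assert conquest_reward(owner_before, owner_after, production) == 3.0
```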

### Current results

We train over 500 games and observe a significant improvement in the total reward obtained over time.

<p align="center">
<img alt="screen shot 2017-09-26 at 17 34 04" src="https://user-images.githubusercontent.com/15527397/30869383-3e046b94-a2e1-11e7-91c7-ecf2381eb83f.png" height="190" width="32%">
</p>

On the left, you can observe the behaviour of the original, untrained bot taking random actions, whereas on the right you can see the trained bot.

<p align="center">
<img height="220" width="32%" src="https://user-images.githubusercontent.com/15527397/30869385-3fd296e4-a2e1-11e7-81f7-3a9436740792.gif">
<img height="220" width="32%" src="https://user-images.githubusercontent.com/15527397/30869390-41fe0d22-a2e1-11e7-9205-88fd2c47a544.gif">
</p>

#### Isn't that amazing?
To get started, blog articles and documentation are available at <a href="https://edouard360.github.io/Halite-Python-RL/">this page</a>.
1 change: 0 additions & 1 deletion docs/.config.yml

This file was deleted.

44 changes: 1 addition & 43 deletions docs/README.md
@@ -1,45 +1,3 @@
---
title: Sidebar Navigation
summary: "My man!"
sidebar: mydoc_sidebar
permalink: mydoc_sidebar_navigation.html
folder: mydoc
---

# Documentation

Go read the documentation [here](https://edouard360.github.io/Halite-Python-RL/).

## Run the Bot

In your console:

`cd networking; python start_game.py`

In another tab:

`cd public; python MyBot.py`

This will run 1 game. Options can be added when starting the game, for example:

`python start_game.py -g 5 -x 30 -z 50`

This will run 5 games of at most 30 turns, with squares of strength at most 50.

## Visualize the Bot

In your console:

`cd visualize; export FLASK_APP=visualize.py; flask run`

Then either:

Look at http://127.0.0.1:5000/performance.png for performance insights.

Or at http://127.0.0.1:5000/ for game replays.

## Working with PyCharm

To run the Bot in PyCharm, you should provide a **mute** argument, since `MyBot.py` needs to know it is running locally rather than on the Halite server.

Go to Edit Configurations and add the script argument 2000 (it could be any other number).
To see the docs, click [here](https://edouard360.github.io/Halite-Python-RL/).
15 changes: 14 additions & 1 deletion docs/_config.yml
@@ -1 +1,14 @@
theme: jekyll-theme-cayman
# Setup
theme: jekyll-theme-cayman

title: Halite Challenge
tagline: A data science project

author:
name: Edouard Mehlman
url: edouard.mehlman@polytechnique.edu

collections:
documentation:
output: true
permalink: /:collection/:name # This is just display
40 changes: 40 additions & 0 deletions docs/_documentation/first_steps.md
@@ -0,0 +1,40 @@
---
layout: default
title: "First Steps"

---


## Run the Bot

In your console:

`cd networking; python start_game.py`

In another tab:

`cd public; python MyBot.py`

This will run 1 game. Options can be added when starting the game, for example:

`python start_game.py -g 5 -x 30 -z 50`

This will run 5 games of at most 30 turns, with squares of strength at most 50.

## Visualize the Bot

In your console:

`cd visualize; export FLASK_APP=visualize.py; flask run`

Then either:

Look at http://127.0.0.1:5000/performance.png for performance insights.

Or at http://127.0.0.1:5000/ for game replays.

## Working with PyCharm

To run the Bot in PyCharm, you should provide a **mute** argument, since `MyBot.py` needs to know it is running locally rather than on the Halite server.

Go to Edit Configurations and add the script argument `slave` (so that the bot knows it is in slave mode).
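
For reference, such a check could look roughly like the sketch below; the actual logic in `MyBot.py` may well differ.

```python
import sys

# Illustrative sketch only, not the project's actual code: any extra script
# argument (e.g. `slave`) tells the bot it is running locally rather than on
# the Halite server, so it can enable local-only behaviour such as logging.
RUNNING_LOCALLY = len(sys.argv) > 1

if RUNNING_LOCALLY:
    import logging
    logging.basicConfig(filename="bot_local.log", level=logging.DEBUG)
```
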
93 changes: 93 additions & 0 deletions docs/_posts/2017-09-26-simple-approach.markdown
@@ -0,0 +1,93 @@
---
layout: default
title: "A simple approach"
date: 2016-02-12 17:50:00
categories: main
---

## Detailing the approach step by step

We will explain the rules of the game in this section, along with our strategy for training the agent. To start simple, we will try to conquer a 3*3 map where we are the only player (see below). As we can see, this trained agent is already pretty efficient at conquering the map.

<br>
<p align="center">
<img alt="conquermap" src="https://user-images.githubusercontent.com/15527397/30869334-20c1a650-a2e1-11e7-9c1b-9233640ccd01.gif" height="190" width="32%">
<br></p>


### How does it start?

Each player starts with a single square of the map and, at each turn, can decide either:

- To **stay** in order to increase the strength of its square (action = STILL).

- To **move** to (and possibly conquer) a neighboring square (action = NORTH, SOUTH, EAST, WEST).

Conquering is only possible once the square's strength is high enough, so a wise bot first waits for its strength to increase before attacking an adjacent square, since **squares don't produce when they attack**.

> To conquer a square, we must move in its direction with a strictly superior strength (action = NORTH, SOUTH, EAST, WEST).
<br>

The white numbers on the map below represent the current strength of the squares. On the left is a snapshot of the initial state of the game. On the right you can see the strength of the blue square increase over time, because our agent decides to stay (action = STILL).

<p align="center">
<img height="220" width="220" alt="the strength map" src="https://user-images.githubusercontent.com/15527397/30869344-24b55702-a2e1-11e7-9383-0dc7f562e5d6.png">
<img height="220" width="220" src="https://user-images.githubusercontent.com/15527397/30869349-27abe944-a2e1-11e7-8b6e-94dfde9e15a1.gif">
</p>

The increase in strength is determined by a fixed production map. In our example, the blue square's strength increases by 4 each turn. Each square has its own production rate, represented by the white numbers below the squares (see below). On the left is a snapshot of the initial game, whereas the game's dynamics are shown on the right.

<p align="center">
<img height="220" width="220" alt="production map" src="https://user-images.githubusercontent.com/15527397/30869351-299bd8c2-a2e1-11e7-80d2-62699551aaa2.png">
<img height="220" width="220" src="https://user-images.githubusercontent.com/15527397/30869356-2bce1fce-a2e1-11e7-86e6-339335636e0e.gif">
</p>

This production map is invariant over time, and it is information we should use to train our agent. Since we are interested in maximizing our production, we should intuitively train the agent to target squares with a high production rate. On the other hand, we should also consider the strength map, since squares with low strength are easier to conquer.

<p align="center">
<img height="220" width="220" src="https://user-images.githubusercontent.com/15527397/30869359-2e235f3c-a2e1-11e7-87ce-109ea5c08c27.gif">
</p>
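
To illustrate the update rule with made-up values and assumed array names (Halite caps strength at 255):

```python
import numpy as np

# Illustrative sketch with made-up values: every owned square that stayed STILL
# this turn gains its production, and strength is capped at 255.
strength = np.array([[10,  5,  3],
                     [ 4, 12,  6],
                     [ 2,  8,  7]])
production = np.array([[1, 2, 1],
                       [3, 4, 2],
                       [1, 2, 3]])
stayed_still = np.zeros((3, 3), dtype=bool)
stayed_still[1, 1] = True  # e.g. the blue square in the figures above

strength = np.where(stayed_still, np.minimum(strength + production, 255), strength)
```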

### The Agent

We will teach our agent with:

- The successive **Game States**.
- The agent's **Moves** (initially random).
- The corresponding **Reward** for each Move (that we have to compute).
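
In practice these three ingredients can be collected as one (state, move, reward) triplet per turn; below is a minimal sketch with placeholder data and assumed shapes.

```python
import numpy as np

# Minimal sketch with placeholder data: record one (state, move, reward)
# triplet per turn and keep the whole episode for the learning update.
rng = np.random.default_rng(0)
episode_states, episode_moves, episode_rewards = [], [], []

for turn in range(30):                      # e.g. a 30-turn game
    state = rng.random((3, 3, 3))           # placeholder for the real game state
    move = int(rng.integers(5))             # random action among STILL/N/S/E/W
    reward = float(rng.random() < 0.1)      # placeholder for the conquest reward
    episode_states.append(state)
    episode_moves.append(move)
    episode_rewards.append(reward)
```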

For now, the Game State is a (3 * 3) * 3 matrix, i.e. (width * height) * n_features, the features being:

- The **Strength** of the Square
- The **Production** of the Square
- The **Owner** of the Square

<p align="center">
<img height="220" width="220" alt="matrix" src="https://user-images.githubusercontent.com/15527397/30869363-30c46a56-a2e1-11e7-8882-1c22bc2256f8.png">
<img height="220" width="220" src="https://user-images.githubusercontent.com/15527397/30869368-32e9be94-a2e1-11e7-831e-3d74b19981a4.gif">
</p>

### The Reward

<br>
As for the reward, we focus on the production. Since each conquered square increases the total production of our territory, the action leading to the conquest is rewarded according to the production rate of the conquered square. This strategy rewards the conquest of highly productive squares the most.

<p align="center">
<img height="220" width="220" src="https://user-images.githubusercontent.com/15527397/30869372-363a5c7a-a2e1-11e7-8784-9a83d4c62c44.gif">
</p>
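
In a REINFORCE-style setup like the tutorial mentioned earlier, each conquest reward is usually also spread backwards over the earlier moves that made it possible, via discounting. Here is a generic sketch; the discount factor and the exact scheme used in this project are assumptions.

```python
import numpy as np

def discount_rewards(rewards, gamma=0.99):
    """Discounted cumulative rewards: earlier moves get partial credit
    for conquests they enabled later in the game."""
    discounted = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        discounted[t] = running
    return discounted

print(discount_rewards([0.0, 0.0, 3.0]))  # approximately [2.9403, 2.97, 3.0]
```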

### Current results

We train over 500 games and observe a significant improvement in the total reward obtained over time.

<p align="center">
<img height="220" width="350" alt="screen shot 2017-09-26 at 17 34 04" src="https://user-images.githubusercontent.com/15527397/30869383-3e046b94-a2e1-11e7-91c7-ecf2381eb83f.png" >
</p>

On the left, you can observe the behaviour of the original, untrained bot taking random actions, whereas on the right you can see the trained bot.

<p align="center">
<img height="220" width="220" src="https://user-images.githubusercontent.com/15527397/30869385-3fd296e4-a2e1-11e7-81f7-3a9436740792.gif">
<img height="220" width="220" src="https://user-images.githubusercontent.com/15527397/30869390-41fe0d22-a2e1-11e7-9205-88fd2c47a544.gif">
</p>
20 changes: 20 additions & 0 deletions docs/index.html
@@ -0,0 +1,20 @@
---
layout: default
title: {{ site.name }}
---

<div id="home">
<h1>Documentation</h1>
<ul class="posts">
{% for doc in site.documentation %}
<li><a href="{{ site.baseurl }}{{ doc.url }}">{{ doc.title }}</a></li>
{% endfor %}
</ul>
<h1>Blog Posts</h1>
<ul class="posts">
{% for post in site.posts %}
<li> <a href="{{ site.baseurl }}{{ post.url }}">{{ post.title }}</a> <span>({{ post.date | date_to_string }})</span></li>
{% endfor %}
</ul>

</div>
