Merge e060d3d into 2a0378b
Edouard360 committed Oct 5, 2017
2 parents 2a0378b + e060d3d commit 15faa43
Showing 7 changed files with 170 additions and 136 deletions.
93 changes: 2 additions & 91 deletions README.md
@@ -11,10 +11,6 @@

<a href="https://halite.io/">Halite</a> is an open source artificial intelligence programming challenge, created by <a href="https://www.twosigma.com/">Two Sigma</a>, where players build bots using the coding language of their choice to battle on a two-dimensional virtual board. The last bot standing or the bot with all the territory wins. Victory will require micromanaging the movement of pieces, optimizing a bot’s combat ability, and braving a branching factor billions of times higher than that of Go.

## Documentation

The documentation is available <a href="https://edouard360.github.io/Halite-Python-RL/">here</a>.

## Objective

The objective of the project is to apply **Reinforcement Learning** strategies to teach the Bot to perform as well as possible. We teach an agent to learn the best actions to play at each turn. More precisely, given the game state, our untrained Bot **initially performs random actions, but gets rewarded for the good ones**. Over time, the Bot automatically learns how to conquer the map efficiently.
@@ -31,91 +27,6 @@ Indeed, unlike chess or go, in the Halite turn-based game, we can do **multiple

In this repository, we will mainly explore solutions based on **Neural Networks**, starting with a very simple <a href="https://en.wikipedia.org/wiki/Multilayer_perceptron">MLP</a>. This is inspired by a <a href="https://medium.com/@awjuliani/super-simple-reinforcement-learning-tutorial-part-2-ded33892c724">tutorial</a> on Reinforcement Learning agents.
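
To give an idea of what such a network looks like, here is a minimal NumPy sketch of an MLP policy mapping the flattened game state to probabilities over the five actions. The layer sizes and names are illustrative assumptions, not the project's actual architecture.

```python
import numpy as np

# Illustrative sketch: a tiny MLP policy for a 3x3 map with 3 features per
# square and 5 actions (STILL, NORTH, EAST, SOUTH, WEST). Sizes are assumptions.
N_INPUT, N_HIDDEN, N_ACTIONS = 3 * 3 * 3, 32, 5

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(N_INPUT, N_HIDDEN))
W2 = rng.normal(scale=0.1, size=(N_HIDDEN, N_ACTIONS))

def policy(state):
    """Return action probabilities for a flattened game state."""
    hidden = np.maximum(0.0, state.reshape(-1) @ W1)   # ReLU hidden layer
    logits = hidden @ W2
    exp = np.exp(logits - logits.max())                # numerically stable softmax
    return exp / exp.sum()

# An untrained policy samples essentially random actions; training reinforces
# the ones that led to a reward.
state = rng.random((3, 3, 3))
action = rng.choice(N_ACTIONS, p=policy(state))
```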

## Documentation & Articles

## Detailing the approach step by step

We will explain the rules of the game in this section, along with our strategy for training the agent. To start simple, we will try to conquer a 3*3 map where we are the only player (see below). As we can see, this trained agent is already pretty efficient at conquering the map.

<br>
<p align="center">
<img alt="conquermap" src="https://user-images.githubusercontent.com/15527397/30869334-20c1a650-a2e1-11e7-9c1b-9233640ccd01.gif" height="190" width="32%">
<br></p>

### How does it start?

Each player starts with a single square of the map and, at each turn, can decide either:

- To **stay** in order to increase the strength of its square (action = STILL).

- To **move** to (and possibly conquer) a neighboring square (action = NORTH, SOUTH, EAST, WEST).

Conquering is only possible once the square's strength is high enough, so a wise bot first waits for its strength to increase before attacking an adjacent square, since **squares don't produce when they attack**.

> To conquer a square, we must move in its direction with a strictly superior strength (action = NORTH, SOUTH, EAST, WEST).
<br>
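
To make these two rules concrete, here is a minimal illustrative sketch; the function names and the action encoding are assumptions for illustration, not the game engine's actual code.

```python
# Illustrative sketch of the two rules above; names and the STILL encoding are assumed.
STILL = 0  # assumed encoding; NORTH/SOUTH/EAST/WEST would be 1..4

def can_conquer(my_strength, target_strength):
    """A move only conquers the target square with strictly superior strength."""
    return my_strength > target_strength

def production_gain(production, action):
    """Squares don't produce when they attack: only a STILL square gains its production."""
    return production if action == STILL else 0
```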

The white numbers on the map below represent the current strength of the squares. On the left is a snapshot of the initial state of the game. On the right you can see the strength of the blue square increase over time, because our agent decides to stay (action = STILL).

<p align="center">
<img height="220" width="32%" alt="the strength map" src="https://user-images.githubusercontent.com/15527397/30869344-24b55702-a2e1-11e7-9383-0dc7f562e5d6.png">
<img height="220" width="32%" src="https://user-images.githubusercontent.com/15527397/30869349-27abe944-a2e1-11e7-8b6e-94dfde9e15a1.gif">
</p>

The increase in strength is determined by a fixed production map. In our example, the blue square's strength increases by 4 each turn. Each square has its own production rate, represented by the white numbers below the squares (see below). On the left is a snapshot of the initial game, whereas the game's dynamics are shown on the right.

<p align="center">
<img height="220" width="32%" alt="production map" src="https://user-images.githubusercontent.com/15527397/30869351-299bd8c2-a2e1-11e7-80d2-62699551aaa2.png">
<img height="220" width="32%" src="https://user-images.githubusercontent.com/15527397/30869356-2bce1fce-a2e1-11e7-86e6-339335636e0e.gif">
</p>

This production map is invariant over time, and it is information we should use to train our agent. Since we are interested in maximizing our production, we should intuitively train the agent to target squares with a high production rate. On the other hand, we should also consider the strength map, since squares with low strength are easier to conquer.

<p align="center">
<img height="220" width="32%" src="https://user-images.githubusercontent.com/15527397/30869359-2e235f3c-a2e1-11e7-87ce-109ea5c08c27.gif">
</p>

### The Agent

We will teach our agent with:

- The successive **Game States**.
- The agent's **Moves** (initially random).
- The corresponding **Reward** for each Move (that we have to compute).

For now, the Game State is a (3 * 3) * 3 matrix, i.e. (width * height) * n_features, the features being:

- The **Strength** of the Square
- The **Production** of the Square
- The **Owner** of the Square

<p align="center">
<img height="220" width="32%" alt="matrix" src="https://user-images.githubusercontent.com/15527397/30869363-30c46a56-a2e1-11e7-8882-1c22bc2256f8.png">
<img height="220" width="32%" src="https://user-images.githubusercontent.com/15527397/30869368-32e9be94-a2e1-11e7-831e-3d74b19981a4.gif">
</p>
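
As an illustration, the state could be assembled as below; the map values are made up and only the (width, height, n_features) layout matters.

```python
import numpy as np

# Illustrative sketch: stack the three feature maps into the (3, 3, 3) game state.
# The numbers are made up; only the (width, height, n_features) layout matters.
strength   = np.array([[10, 5, 3], [4, 12, 6], [2, 8, 7]], dtype=np.float32)
production = np.array([[ 1, 2, 1], [3,  4, 2], [1, 2, 3]], dtype=np.float32)
owner      = np.array([[ 0, 0, 0], [0,  1, 0], [0, 0, 0]], dtype=np.float32)  # 1 = our bot

state = np.stack([strength, production, owner], axis=-1)
assert state.shape == (3, 3, 3)  # (width, height, n_features)
```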

### The Reward

<br>
As for the reward, we focus on the production. Since each conquered square increases the total production of our territory, the action leading to the conquest is rewarded according to the production rate of the conquered square. This strategy rewards the conquest of highly productive squares the most.

<p align="center">
<img height="220" width="32%" src="https://user-images.githubusercontent.com/15527397/30869372-363a5c7a-a2e1-11e7-8784-9a83d4c62c44.gif">
</p>
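
A rough sketch of that reward, with assumed names (the project's actual computation may differ):

```python
import numpy as np

# Illustrative sketch: an action that conquers a square is rewarded with that
# square's production. Names (owner ids, my_id) are assumptions.
def conquest_reward(owner_before, owner_after, production, my_id=1):
    """Total production of the squares gained this turn."""
    newly_conquered = (owner_after == my_id) & (owner_before != my_id)
    return float(production[newly_conquered].sum())

owner_before = np.array([[0, 0], [0, 1]])
owner_after  = np.array([[0, 0], [1, 1]])   # the bottom-left square was conquered
production   = np.array([[1, 2], [3, 4]])
assert conquest_reward(owner_before, owner_after, production) == 3.0
```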

### Current results

We train over 500 games and observe a significant improvement in the total reward obtained over time.

<p align="center">
<img alt="screen shot 2017-09-26 at 17 34 04" src="https://user-images.githubusercontent.com/15527397/30869383-3e046b94-a2e1-11e7-91c7-ecf2381eb83f.png" height="190" width="32%">
</p>

On the left, you can observe the behaviour of the original, untrained bot taking random actions, whereas on the right you can see the trained bot.

<p align="center">
<img height="220" width="32%" src="https://user-images.githubusercontent.com/15527397/30869385-3fd296e4-a2e1-11e7-81f7-3a9436740792.gif">
<img height="220" width="32%" src="https://user-images.githubusercontent.com/15527397/30869390-41fe0d22-a2e1-11e7-9205-88fd2c47a544.gif">
</p>

#### Isn't that amazing?
To get started, blog articles and documentation are available at <a href="https://edouard360.github.io/Halite-Python-RL/">this page</a>.
1 change: 0 additions & 1 deletion docs/.config.yml

This file was deleted.

44 changes: 1 addition & 43 deletions docs/README.md
@@ -1,45 +1,3 @@
---
title: Sidebar Navigation
summary: "My man!"
sidebar: mydoc_sidebar
permalink: mydoc_sidebar_navigation.html
folder: mydoc
---

# Documentation

Go read the documentation [here](https://edouard360.github.io/Halite-Python-RL/).

## Run the Bot

In your console:

`cd networking; python start_game.py`

In another tab:

`cd public; python MyBot.py`

This will run 1 game. Options can be added when starting the game, for example:

`python start_game.py -g 5 -x 30 -z 50`

This will run 5 games of at most 30 turns, with squares of strength at most 50.

## Visualize the Bot

In your console:

`cd visualize; export FLASK_APP=visualize.py; flask run`

Then either:

Look at http://127.0.0.1:5000/performance.png for performance insights.

Or at http://127.0.0.1:5000/ for game replays.

## Working with PyCharm

To run the Bot in PyCharm, you should provide a **mute** argument, since `MyBot.py` needs to know it is running locally rather than on the Halite server.

Go to Edit Configurations and add the script argument 2000 (it could be any other number).
To see the docs, click [here](https://edouard360.github.io/Halite-Python-RL/).
15 changes: 14 additions & 1 deletion docs/_config.yml
@@ -1 +1,14 @@
theme: jekyll-theme-cayman
# Setup
theme: jekyll-theme-cayman

title: Halite Challenge
tagline: A data science project

author:
name: Edouard Mehlman
url: edouard.mehlman@polytechnique.edu

collections:
documentation:
output: true
permalink: /:collection/:name # This is just display
40 changes: 40 additions & 0 deletions docs/_documentation/first_steps.md
@@ -0,0 +1,40 @@
---
layout: default
title: "First Steps"

---


## Run the Bot

In your console:

`cd networking; python start_game.py`

In another tab:

`cd public; python MyBot.py`

This will run 1 game. Options can be added when starting the game, for example:

`python start_game.py -g 5 -x 30 -z 50`

This will run 5 games of at most 30 turns, with squares of strength at most 50.

## Visualize the Bot

In your console:

`cd visualize; export FLASK_APP=visualize.py; flask run`

Then either:

Look at http://127.0.0.1:5000/performance.png for performance insights.

Or at http://127.0.0.1:5000/ for game replays.

## Working with PyCharm

To run the Bot in PyCharm, you should provide a **mute** argument, since `MyBot.py` needs to know it is running locally rather than on the Halite server.

Go to Edit Configurations and add the script argument `slave` (so that the bot knows it is in slave mode).
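
For reference, such a check could look roughly like the sketch below; the actual logic in `MyBot.py` may well differ.

```python
import sys

# Illustrative sketch only, not the project's actual code: any extra script
# argument (e.g. `slave`) tells the bot it is running locally rather than on
# the Halite server, so it can enable local-only behaviour such as logging.
RUNNING_LOCALLY = len(sys.argv) > 1

if RUNNING_LOCALLY:
    import logging
    logging.basicConfig(filename="bot_local.log", level=logging.DEBUG)
```
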
93 changes: 93 additions & 0 deletions docs/_posts/2017-09-26-simple-approach.markdown
@@ -0,0 +1,93 @@
---
layout: default
title: "A simple approach"
date: 2016-02-12 17:50:00
categories: main
---

## Detailing the approach step by step

We will explain the rules of the game in this section, along with our strategy for training the agent. To start simple, we will try to conquer a 3*3 map where we are the only player (see below). As we can see, this trained agent is already pretty efficient at conquering the map.

<br>
<p align="center">
<img alt="conquermap" src="https://user-images.githubusercontent.com/15527397/30869334-20c1a650-a2e1-11e7-9c1b-9233640ccd01.gif" height="190" width="32%">
<br></p>


### How does it start?

Each player starts with a single square of the map and, at each turn, can decide either:

- To **stay** in order to increase the strength of its square (action = STILL).

- To **move** to (and possibly conquer) a neighboring square (action = NORTH, SOUTH, EAST, WEST).

Conquering is only possible once the square's strength is high enough, so a wise bot first waits for its strength to increase before attacking an adjacent square, since **squares don't produce when they attack**.

> To conquer a square, we must move in its direction with a strictly superior strength (action = NORTH, SOUTH, EAST, WEST).
<br>

The white numbers on the map below represent the current strength of the squares. On the left is a snapshot of the initial state of the game. On the right you can see the strength of the blue square increase over time, because our agent decides to stay (action = STILL).

<p align="center">
<img height="220" width="220" alt="the strength map" src="https://user-images.githubusercontent.com/15527397/30869344-24b55702-a2e1-11e7-9383-0dc7f562e5d6.png">
<img height="220" width="220" src="https://user-images.githubusercontent.com/15527397/30869349-27abe944-a2e1-11e7-8b6e-94dfde9e15a1.gif">
</p>

The increase in strength is determined by a fixed production map. In our example, the blue square's strength increases by 4 each turn. Each square has its own production rate, represented by the white numbers below the squares (see below). On the left is a snapshot of the initial game, whereas the game's dynamics are shown on the right.

<p align="center">
<img height="220" width="220" alt="production map" src="https://user-images.githubusercontent.com/15527397/30869351-299bd8c2-a2e1-11e7-80d2-62699551aaa2.png">
<img height="220" width="220" src="https://user-images.githubusercontent.com/15527397/30869356-2bce1fce-a2e1-11e7-86e6-339335636e0e.gif">
</p>

This production map is invariant over time, and it is information we should use to train our agent. Since we are interested in maximizing our production, we should intuitively train the agent to target squares with a high production rate. On the other hand, we should also consider the strength map, since squares with low strength are easier to conquer.

<p align="center">
<img height="220" width="220" src="https://user-images.githubusercontent.com/15527397/30869359-2e235f3c-a2e1-11e7-87ce-109ea5c08c27.gif">
</p>
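
To illustrate the update rule with made-up values and assumed array names (Halite caps strength at 255):

```python
import numpy as np

# Illustrative sketch with made-up values: every owned square that stayed STILL
# this turn gains its production, and strength is capped at 255.
strength = np.array([[10,  5,  3],
                     [ 4, 12,  6],
                     [ 2,  8,  7]])
production = np.array([[1, 2, 1],
                       [3, 4, 2],
                       [1, 2, 3]])
stayed_still = np.zeros((3, 3), dtype=bool)
stayed_still[1, 1] = True  # e.g. the blue square in the figures above

strength = np.where(stayed_still, np.minimum(strength + production, 255), strength)
```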

### The Agent

We will teach our agent with:

- The successive **Game States**.
- The agent's **Moves** (initially random).
- The corresponding **Reward** for each Move (that we have to compute).
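
In practice these three ingredients can be collected as one (state, move, reward) triplet per turn; below is a minimal sketch with placeholder data and assumed shapes.

```python
import numpy as np

# Minimal sketch with placeholder data: record one (state, move, reward)
# triplet per turn and keep the whole episode for the learning update.
rng = np.random.default_rng(0)
episode_states, episode_moves, episode_rewards = [], [], []

for turn in range(30):                      # e.g. a 30-turn game
    state = rng.random((3, 3, 3))           # placeholder for the real game state
    move = int(rng.integers(5))             # random action among STILL/N/S/E/W
    reward = float(rng.random() < 0.1)      # placeholder for the conquest reward
    episode_states.append(state)
    episode_moves.append(move)
    episode_rewards.append(reward)
```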

For now, the Game State is a (3 * 3) * 3 matrix, i.e. (width * height) * n_features, the features being:

- The **Strength** of the Square
- The **Production** of the Square
- The **Owner** of the Square

<p align="center">
<img height="220" width="220" alt="matrix" src="https://user-images.githubusercontent.com/15527397/30869363-30c46a56-a2e1-11e7-8882-1c22bc2256f8.png">
<img height="220" width="220" src="https://user-images.githubusercontent.com/15527397/30869368-32e9be94-a2e1-11e7-831e-3d74b19981a4.gif">
</p>

### The Reward

<br>
As for the reward, we focus on the production. Since each conquered square increases the total production of our territory, the action leading to the conquest is rewarded according to the production rate of the conquered square. This strategy rewards the conquest of highly productive squares the most.

<p align="center">
<img height="220" width="220" src="https://user-images.githubusercontent.com/15527397/30869372-363a5c7a-a2e1-11e7-8784-9a83d4c62c44.gif">
</p>
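
In a REINFORCE-style setup like the tutorial mentioned earlier, each conquest reward is usually also spread backwards over the earlier moves that made it possible, via discounting. Here is a generic sketch; the discount factor and the exact scheme used in this project are assumptions.

```python
import numpy as np

def discount_rewards(rewards, gamma=0.99):
    """Discounted cumulative rewards: earlier moves get partial credit
    for conquests they enabled later in the game."""
    discounted = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        discounted[t] = running
    return discounted

print(discount_rewards([0.0, 0.0, 3.0]))  # approximately [2.9403, 2.97, 3.0]
```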

### Current results

We train over 500 games and observe a significant improvement in the total reward obtained over time.

<p align="center">
<img height="220" width="350" alt="screen shot 2017-09-26 at 17 34 04" src="https://user-images.githubusercontent.com/15527397/30869383-3e046b94-a2e1-11e7-91c7-ecf2381eb83f.png" >
</p>

On the left, you can observe the behaviour of the original, untrained bot taking random actions, whereas on the right you can see the trained bot.

<p align="center">
<img height="220" width="220" src="https://user-images.githubusercontent.com/15527397/30869385-3fd296e4-a2e1-11e7-81f7-3a9436740792.gif">
<img height="220" width="220" src="https://user-images.githubusercontent.com/15527397/30869390-41fe0d22-a2e1-11e7-9205-88fd2c47a544.gif">
</p>
20 changes: 20 additions & 0 deletions docs/index.html
@@ -0,0 +1,20 @@
---
layout: default
title: {{ site.name }}
---

<div id="home">
<h1>Documentation</h1>
<ul class="posts">
{% for doc in site.documentation %}
<li><a href="{{ site.baseurl }}{{ doc.url }}">{{ doc.title }}</a></li>
{% endfor %}
</ul>
<h1>Blog Posts</h1>
<ul class="posts">
{% for post in site.posts %}
<li> <a href="{{ site.baseurl }}{{ post.url }}">{{ post.title }}</a> <span>({{ post.date | date_to_string }})</span></li>
{% endfor %}
</ul>

</div>
