# Miniproject ML4 Snake

# 1. and 2. single- and turn-based multi-player snake

It's much harder to get good results with RL than in other areas of ML. So always **start your RL as simple as possible. Once you've got good results, you can slowly add complexity/realism.** So rather than starting directly with multi-player snake, let's start with single-player snake.

OpenAI Gym is the de facto standard RL platform, so it's wise to use a snake environment based on OpenAI Gym. Unfortunately, snake is not one of the environments officially supported by OpenAI Gym, so we use **a modified version of** [this repo](https://github.com/grantsrb/Gym-Snake), which provides a single- and multi-snake environment and nice graphics. The adapted environment can be found in the zip on Blackboard->ML4->Miniproject, in the folder `gym_snake`.


## Understanding the OpenAI Gym platform

The nice thing about OpenAI Gym is that it separates agent and environment via a well-defined interface for the environment. The important methods on the interface of the environment are:
* `env.reset()` creates a new, clean environment and returns the initial observation; for snake, it initializes a new game
* `obs, reward, done, info = env.step(action)` performs an agent action and immediately returns:
  * `obs`: the changed environment state, called the observation
  * `reward`: the reward, resulting from the action
  * `done`: whether the game has finished
  * `info`: environment-specific additional info
  Based on this information the agent decides on its next action. The agent continues this loop until `done` is true.
* `env.render()` draws the current state of the environment

The idea is that you create the agent and leave the environment untouched. In practice, however, you might want to change the environment. An example is the reward function, which is part of the environment.
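If you only want to change the reward, you can also wrap the environment instead of editing its source. OpenAI Gym offers `gym.RewardWrapper` for this; the sketch below uses a plain class so that it is self-contained. It assumes the 4-tuple `step()` API described above; the shaping rule (a small penalty per step) and the names `StepPenaltyWrapper` and `DummyEnv` are illustrative assumptions, not part of the snake environment.

```python
class StepPenaltyWrapper:
    """Hypothetical wrapper: forwards reset/step/render to the wrapped env
    and subtracts a fixed penalty from every reward."""

    def __init__(self, env, step_penalty=0.01):
        self.env = env
        self.step_penalty = step_penalty

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, reward - self.step_penalty, done, info

    def render(self):
        return self.env.render()


class DummyEnv:
    """Stand-in environment for illustration: reward 1 on every step, done after 3 steps."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3, {}

    def render(self):
        pass


# the usual agent loop, unchanged, now receives the shaped rewards
env = StepPenaltyWrapper(DummyEnv(), step_penalty=0.1)
obs = env.reset()
done = False
rewards = []
while not done:
    obs, reward, done, info = env.step(0)
    rewards.append(reward)
print(rewards)
```

Because the agent only talks to the `reset`/`step`/`render` interface, it cannot tell the wrapper from the original environment.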


## Pixel-based vs. coordinate-based environment state representation

The original snake environment by grantsrb uses a pixel-based environment state representation rather than a coordinate-based one. This hugely increases the state size and therefore makes learning slower and harder, which goes against the start-as-simple-as-possible principle.

One way to reduce the dimensionality of the state is to use a CNN: the CNN processes the pixel-based state and reduces its dimensionality before passing it to the RL algorithm. But again, a CNN component must be tuned, and this adds complexity.

To overcome this, the original code has been slightly adapted: it still uses a pixel-based environment state internally, but just before passing the state to the agent, it transforms it into a coordinate-based environment state. This is really a hack, as transforming from pixel-based to coordinate-based on every world-tick is very inefficient, but completely changing the internal state representation would have been a lot of work. Note that the change from pixel-based to coordinate-based reduces the state size by a factor of 300 (every cell has 10x10 pixels with 3 color channels each), but not the state space size (the grid, and therefore the number of distinct game situations, is unchanged).
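The idea of such a pixel-to-coordinate transformation can be sketched as follows: read one pixel per grid cell (here the cell center) and map its color to a code. This is an illustration under assumptions, not the code in `snake_env.py`: the function name `pixels_to_grid` and the colors and codes in `color_codes` are hypothetical, and `unit_size` is the number of pixels per cell side.

```python
import numpy as np


def pixels_to_grid(pixels, unit_size, color_codes, default=0):
    """Downsample an (H, W, 3) RGB frame to a (H // unit_size, W // unit_size)
    grid of codes by reading the center pixel of each cell."""
    rows = pixels.shape[0] // unit_size
    cols = pixels.shape[1] // unit_size
    grid = np.full((rows, cols), default, dtype=np.uint8)
    center = unit_size // 2
    for r in range(rows):
        for c in range(cols):
            rgb = tuple(int(v) for v in pixels[r * unit_size + center, c * unit_size + center])
            grid[r, c] = color_codes.get(rgb, default)  # unknown colors count as empty
    return grid


# a 2x3-cell frame with 10x10-pixel cells; the bottom-right cell is red
frame = np.zeros((20, 30, 3), dtype=np.uint8)
frame[10:20, 20:30] = (255, 0, 0)
print(pixels_to_grid(frame, unit_size=10, color_codes={(255, 0, 0): 3}))
```

Reading only the center pixel avoids trouble with any gap drawn between cells. The loop over all cells on every tick also illustrates why doing this transformation each world-tick is costly.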

Once you get good results with the coordinate-based environment you might want to try the pixel-based environment. This is where RL shows its magic. To use the pixel-based environment, set `self.coordinate_based = False` in the file `snake_env.py`.


## Installation of the snake environment

* create a new conda environment running python 3.9: `conda create -n py39 python=3.9`
* `conda activate py39`
* unpack the zip from Blackboard->ML4->Miniproject
* go to the folder of the zip and put your RL code there
* If you use a Jupyter notebook for your agent and you change something in the environment, you need to **restart the kernel** to see the environment changes.

This is not the normal way of working. Normally, you use `pip install -e ./` to install the snake package **and** register the snake environment with OpenAI Gym. The disadvantage of the normal way is that every time you change the environment, you need to re-register it to see the changes. This is quite a hassle and easily forgotten. The advantage of the above method is that changes to the environment code are available immediately.


## Running snake

The snake rendering does not work well in a Jupyter notebook. You can use a Jupyter notebook to develop your solution, but to get correct rendering, run the code from the command prompt or from an IDE. It's probably better not to use a Jupyter notebook at all for your project.

## Example agent for 1. single-player snake

See the [README](https://github.com/grantsrb/Gym-Snake) for explanation about the snake environment. 

I didn't go through the hassle of registering it with OpenAI Gym, so you use it like any other Python class. For example, to create an environment: `env = SnakeEnv(grid_size=[6, 6], snake_size=2)`.

Below is a very simple example agent for single-player snake that gives you an idea of how to start your own agent. Don't put your agent code inside the `gym_snake` folder, but one level above it.

```python
# Example agent for single-player snake

import numpy as np
import matplotlib.pyplot as plt
import random
import time
import sys
import logging
import gym
#import gym_snake  # don't use the registered snake
from gym_snake.envs.snake_env import SnakeEnv

log = logging.getLogger("miniproject_snake")
log.setLevel(logging.INFO)
log.addHandler(logging.StreamHandler())

# actions
UP = 0
RIGHT = 1
DOWN = 2
LEFT = 3

env = SnakeEnv(grid_size=[6, 6], snake_size=2)
obs = env.reset()  # construct instance of game
done = False
log.info("start game")
for i in range(24):
    if not done:
        env.render()
        obs, reward, done, info = env.step(i%4)  # pass action to step()
        log.info("reward: %.0f", reward)
env.close()
```


## Solution for single-player snake

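For the solution I cheated a bit: instead of implementing my own solution, I used the RL library Stable Baselines3. The algorithm chosen is DQN (deep Q-learning). Note that, unlike the original Stable Baselines, whose DQN used double deep Q-learning (DDQN) by default, Stable Baselines3 implements vanilla DQN (with a target network) without the double-Q extension.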
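DQN and double DQN differ only in the target used for the Q-update. In the sketch below (plain NumPy, not Stable Baselines3 code), `q_next_target` and `q_next_online` hold the Q-values of the next states according to the target network and the online network respectively; terminal transitions (`dones == 1`) get no bootstrapped value. The function names are illustrative.

```python
import numpy as np


def dqn_targets(rewards, dones, q_next_target, gamma):
    """Vanilla DQN: the target network both selects and evaluates the next action."""
    return rewards + gamma * (1.0 - dones) * q_next_target.max(axis=1)


def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma):
    """Double DQN: the online network selects the next action,
    the target network evaluates it."""
    best_actions = q_next_online.argmax(axis=1)
    rows = np.arange(len(best_actions))
    return rewards + gamma * (1.0 - dones) * q_next_target[rows, best_actions]


rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])  # the second transition ends the episode
q_next_target = np.array([[0.5, 2.0], [3.0, 1.0]])
q_next_online = np.array([[1.0, 0.0], [0.0, 1.0]])
print(dqn_targets(rewards, dones, q_next_target, gamma=0.9))
print(double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.9))
```

Here the first transition gets a target of 2.8 with DQN but 1.45 with double DQN: taking the max over the target network's own estimates tends to overestimate Q-values, which double DQN counters by decoupling action selection from evaluation.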

### Installation of the stable-baselines RL library

* `conda activate py39`
* `conda install -c pytorch pytorch`
* `conda install pip`
* `python -m pip install stable-baselines3` (running pip via `python -m pip` ensures that the pip of the active `py39` environment is used)
* `python -m pip install tensorboard`
    
### Environment code changes needed for Stable Baselines

Below are the environment code changes needed for Stable Baselines. The adapted environment, including these changes, can be found in the zip in the folder `gym_snake`.

**In snake_env.py:**

Replace `Discrete(4)` with `spaces.Discrete(4)`:

`self.action_space = spaces.Discrete(4)`

Add this line for a coordinate-based observation space:

`self.observation_space = spaces.Box(low=0, high=3, shape=(self.grid_size[1], self.grid_size[0]), dtype=np.uint8)`

Or alternatively, add this line for a pixel-based observation space:

`self.observation_space = spaces.Box(low=0, high=255, shape=(self.grid_size[1] * self.unit_size, self.grid_size[0] * self.unit_size, 3), dtype=np.uint8)`

**In controller.py:**

Replace `if type(directions) == type(int()):` with:

`if type(directions) == type(np.int64()) or type(directions) == type(int()):`
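A somewhat more defensive variant of this type check normalizes whatever arrives into a list of Python ints. This is a hypothetical helper, not code from `controller.py`: it accepts a Python int, a NumPy integer such as `np.int64`, a 0-d NumPy array (assumption: `model.predict()` may return a single action in that form), or a list with one action per snake.

```python
import numpy as np


def normalize_directions(directions):
    """Hypothetical helper: return the action(s) as a list of Python ints."""
    if np.ndim(directions) == 0:  # int, np.int64 or 0-d array: one action
        return [int(directions)]
    return [int(d) for d in np.asarray(directions).ravel()]


print(normalize_directions(2))
print(normalize_directions(np.int64(3)))
print(normalize_directions(np.array(1)))
print(normalize_directions([0, 2]))
```

Checking the number of dimensions instead of the exact type avoids having to list every integer type NumPy might produce.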


### Discussion

The learned model is saved. With `training = False` the learned model is loaded from disk and used by the agent.

**Assessing learning behavior**
Note that during training, with `verbose=1`, Stable Baselines3 periodically prints training statistics, including `ep_rew_mean`, the mean reward over the last 100 episodes (== games). This number should slowly increase as training proceeds, indicating that the agent is learning.

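Note that with the default settings of Stable Baselines3's DQN, the exploration rate (epsilon) starts at 100% and decreases linearly to 5% during the first 10% of the training timesteps.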
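As a sketch, such a linear schedule can be written as below, assuming Stable Baselines3's DQN defaults `exploration_initial_eps=1.0`, `exploration_final_eps=0.05` and `exploration_fraction=0.1`. These are constructor parameters of `DQN`, so you can pass other values, e.g. `DQN("MlpPolicy", env, exploration_final_eps=0.02)`; the function `epsilon` itself is only an illustration, not library code.

```python
def epsilon(timestep, total_timesteps, initial_eps=1.0, final_eps=0.05, fraction=0.1):
    """Decay epsilon linearly from initial_eps to final_eps over the first
    `fraction` of training, then keep it constant."""
    progress = min(timestep / (fraction * total_timesteps), 1.0)
    return initial_eps + progress * (final_eps - initial_eps)


# with total_timesteps=10000, as in the solution below
for t in (0, 500, 1000, 5000):
    print(t, round(epsilon(t, 10000), 3))
```

With only 10000 timesteps, the agent explores heavily for the first 1000 steps and then acts mostly greedily.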

Another way of evaluating the learning behavior is to look at the tensorboard logs. 
* Install TensorBoard (if you haven't already) by opening an Anaconda prompt and typing `conda install tensorboard`.
* In the anaconda command prompt, go to the folder where your project resides, using `cd`.
* Start Tensorboard by typing `tensorboard --logdir tensorboard_logs/` in the anaconda command prompt.
 
**Snake behavior**
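Note that the reward function does not give the snake an incentive to capture food efficiently. Only the discount factor helps the snake become efficient: with a discount factor below 1, a food reward obtained sooner is worth more than the same reward obtained later.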
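A small worked example of this effect: the discounted return of an episode is the sum of gamma^t * r_t. With gamma = 0.99 (the default `gamma` of Stable Baselines3's DQN), food eaten on the 3rd step is worth more than food eaten on the 10th step. The function name `discounted_return` is illustrative.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over an episode (t starts at 0)."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


fast = discounted_return([0, 0, 1], gamma=0.99)      # food on the 3rd step
slow = discounted_return([0] * 9 + [1], gamma=0.99)  # food on the 10th step
print(round(fast, 4), round(slow, 4))  # 0.9801 0.9135
```

The difference is small, which is one reason the snake learns efficiency only slowly; a per-step penalty in the reward function would make the incentive explicit.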

```python
# Solution for single-player snake using RL library Stable Baselines3

import numpy as np
import matplotlib.pyplot as plt
import random
import time
import sys
import logging
import gym
#import gym_snake  # don't use the registered snake
from stable_baselines3 import DQN

from gym_snake.envs.snake_env import SnakeEnv


log = logging.getLogger("miniproject_snake")
log.setLevel(logging.INFO)
log.addHandler(logging.StreamHandler())


'''
# loading the environment if you've installed and registered it
from stable_baselines3.common.vec_env import DummyVecEnv
env = gym.make("snake-v0")
env = DummyVecEnv([lambda: env])
'''

'''
# loading the environment locally (without registering); the advantage is that you can change the environment
# without the need for reinstalling and re-registering
# some algorithms require a vectorized environment
from stable_baselines3.common.vec_env import DummyVecEnv
env = DummyVecEnv([lambda: SnakeEnv(grid_size=[6, 6], snake_size=2, random_init=False)])
env.envs[0].grid_size = [6, 6]
env.envs[0].snake_size = 2
env.envs[0].random_init = False
'''

#env = SnakeEnv(grid_size=[6, 6], snake_size=2, random_init=False)
env = SnakeEnv(grid_size=[6, 6], snake_size=2)
obs = env.reset()  # construct instance of game

'''
print("grid_size", env.grid_size)
print("unit_size", env.unit_size)
print("unit_gap", env.unit_gap)
print("snake_size", env.snake_size)
print("n_snakes", env.n_snakes)
print("n_foods", env.n_foods)
snakes_array = env.controller.snakes
snake_object1 = snakes_array[0]
'''

'''
# get a zoomable & resizable notebook if you want to work interactively. BUG: only one zoomable notebook can be active.
# If you don't deactivate it (end the interaction), you can't draw another one and you get weird bugs in the following cells.
# This is really bad if you have loops that generate plots in your code...
%matplotlib notebook
%matplotlib inline
env.render()
env.render()
'''

UP = 0
RIGHT = 1
DOWN = 2
LEFT = 3

training = True
if training:
    model = DQN("MlpPolicy", env, verbose=1, tensorboard_log="tensorboard_logs/snake_dqn_agent/")
    # Stable Baselines3 sets the network size via net_arch (the layers= keyword is from the old Stable Baselines)
    #model = DQN("MlpPolicy", env, verbose=1, learning_rate=0.0005, gamma=0.99, policy_kwargs=dict(net_arch=[64, 64]), tensorboard_log="tensorboard_logs/snake_dqn_agent/")

    model.learn(total_timesteps=10000, reset_num_timesteps=False)
    model.save("learned_models/snake_dqn_agent")
else:
    #del model # remove to demonstrate saving and loading
    model = DQN.load("learned_models/snake_dqn_agent")

log.info("finished training, now use the model and render the env")
obs = env.reset()
done = False
log.info("start game")
while not done:
    env.render()
    action, _states = model.predict(obs)
    obs, reward, done, info = env.step(action)
    #log.info("reward: %.0f", reward)
env.close()
```


## Example agent for 2. turn-based multi-player snake

Normally, in multi-player snake, all snakes move one step every world-tick. However, this poses a problem, as most RL libraries don't have multi-agent support. A relatively easy way out is to play multi-player snake in a turn-based way. Turn-based means that at every world-tick only one snake advances and gets a reward. The original snake environment has been adapted so that it **only supports turn-based multi-player snake**. If you want to use *normal* multi-player snake, use the environment from the original repo.

To use the environment for turn-based multi-player snake:
* declare an environment with multiple snakes
* pass a single action (e.g. `DOWN`) to `env.step()` and receive a single reward (e.g. `1`). This reward is for the single snake that moved (the others didn't move)

The [README](https://github.com/grantsrb/Gym-Snake) says that you should use the `snake_extrahard_env.py` environment for multi-player snake. This does not apply here: **you can use `snake_env.py` for turn-based multi-player snake**.
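To track learning progress per snake, you need to know which snake each scalar reward belongs to. A minimal sketch, assuming the snakes take turns in a fixed round-robin order starting with snake 0; this is an assumption, so check `snake_env.py` for the actual order and for what happens when a snake dies. The function name is hypothetical.

```python
def rewards_per_snake(rewards, n_snakes):
    """Attribute each scalar reward from env.step() to the snake that moved,
    assuming (hypothetically) that snake i moves on turns i, i + n_snakes, ..."""
    totals = [0.0] * n_snakes
    for turn, reward in enumerate(rewards):
        totals[turn % n_snakes] += reward
    return totals


print(rewards_per_snake([1, 0, 0, 1, -1], n_snakes=2))  # [0.0, 1.0]
```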

```python
# Example agent for *turn-based* multi-player snake

import numpy as np
import matplotlib.pyplot as plt
import random
import time
import sys
import logging
import gym
#import gym_snake  # don't use the registered snake
from gym_snake.envs.snake_env import SnakeEnv

log = logging.getLogger("miniproject_snake")
log.setLevel(logging.INFO)
log.addHandler(logging.StreamHandler())

NOMOVE = -1
UP = 0
RIGHT = 1
DOWN = 2
LEFT = 3

#env = SnakeEnv(grid_size=[12, 12], snake_size=2, n_snakes=3, n_foods=3)
env = SnakeEnv(grid_size=[9, 9], snake_size=2, n_snakes=2, n_foods=2)
obs = env.reset()  # construct instance of game
done = False
log.info("start game")
for i in range(30):
    if not done:
        env.render()
        #action = [DOWN, DOWN]  # *normal* multi-player snake: all snakes move at the same time and you receive a list of rewards
        action = DOWN  # turn-based multi-player snake: one snake moves, other snakes don't move
        obs, reward, done, info = env.step(action)  # reward is for the snake that has moved
        log.info("reward: %.0f", reward)
env.close()
```

## Solution for turn-based multi-player snake

As for single-player snake, DQN from the RL library Stable Baselines3 has been used.

### Discussion
    
As for single-player snake, verify that during training the mean reward over the last 100 episodes (`ep_rew_mean`) slowly increases as training proceeds, indicating that the agent is learning. Also look at the TensorBoard logs to assess the learning behavior.

About 100 times more training is needed to get any decent results.

```python
# Solution for turn-based multi-player snake using RL library Stable Baselines3

import numpy as np
import matplotlib.pyplot as plt
import random
import time
import sys
import logging
import gym
#import gym_snake  # don't use the registered snake
from stable_baselines3 import DQN

from gym_snake.envs.snake_env import SnakeEnv

log = logging.getLogger("miniproject_snake")
log.setLevel(logging.INFO)
log.addHandler(logging.StreamHandler())

env = SnakeEnv(grid_size=[9, 9], snake_size=2, n_snakes=2, n_foods=2)
obs = env.reset()

training = True
if training:
    model = DQN("MlpPolicy", env, verbose=1, tensorboard_log="tensorboard_logs/multisnake_dqn_agent/")

    model.learn(total_timesteps=100000, reset_num_timesteps=False)
    model.save("learned_models/multisnake_dqn_agent")
else:
    model = DQN.load("learned_models/multisnake_dqn_agent")

log.info("finished training, now use the model and render the env")
obs = env.reset()
done = False
log.info("start game")
while not done:
    env.render()
    action, _states = model.predict(obs)
    obs, reward, done, info = env.step(action)
    #log.info("reward: %.0f", reward)
env.close()
```

# 3. multi-player snake (cygni)

## Installing & running the snake client

**Installing the environment for the snake client**
* `conda create -n py37 python=3.7` (the snake client requires python 3.7)
* `conda activate py37`
* `conda install numpy`
* `conda install colorlog`
* `pip install autobahn` (not available in conda)
* download [this](https://github.com/cygni/snakebot-client-python/tree/master/client) folder

**Running the client in training mode**
* go to the snake client folder
* `python client/client.py -r snakebot.avans-informatica-breda.nl -p 8080 -l info`
* Your bot plays against 4 randomly chosen bots
* the second-to-last line of output contains the URL where the game can be replayed

**Running the client in tournament mode**
* go to http://snakebot.avans-informatica-breda.nl:8090/#/?_k=ynfagb
* login: emil/lime
* click “tournament” menu and create a tournament, e.g. “ercotournament”
* `python client/client.py -r snakebot.avans-informatica-breda.nl -p 8080 -l info -v tournament`
* add multiple bots in the same way, each with a different name
* click “start tournament”
* click “go to game” (the game is started and played very quickly)
* some seconds later the GUI shows the game in slow motion
* tournament options can be changed


## The python client

**Server logic**
* turn-based server: all snakes are called once every world tick
* server calls `get_next_move(self, game_map)` on every client
* client has 250ms to respond (no timely response means no turn -> snake continues in same direction)
* so as a client you make a move, and only in the next invocation of `get_next_move(self, game_map)` do you receive the updated map, from which you can determine the reward for that move. This will take some programming effort.
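This bookkeeping can be sketched as follows: remember your points and your move from the previous call, and when the next map arrives, use the change in points as the reward for that previous move. `TransitionTracker` is a hypothetical helper, not part of the cygni client; it works on the game map as a dictionary with the `snakeInfos`, `id` and `points` fields shown in the example output further down, the move names are placeholders, and using the points difference as reward is just one possible choice.

```python
class TransitionTracker:
    """Hypothetical helper: pairs each move with the reward that only becomes
    visible in the next call of get_next_move()."""

    def __init__(self, my_id):
        self.my_id = my_id
        self.prev_points = None
        self.prev_move = None

    def my_points(self, game_map):
        for info in game_map["snakeInfos"]:
            if info["id"] == self.my_id:
                return info["points"]
        raise KeyError("snake %s not found in game map" % self.my_id)

    def update(self, game_map, move):
        """Record the move chosen for this tick and return
        (previous_move, reward) for the previous tick, or None on the first call."""
        points = self.my_points(game_map)
        transition = None
        if self.prev_move is not None:
            transition = (self.prev_move, points - self.prev_points)
        self.prev_points = points
        self.prev_move = move
        return transition


tracker = TransitionTracker(my_id="me")
map_tick0 = {"snakeInfos": [{"id": "me", "points": 0}]}
map_tick1 = {"snakeInfos": [{"id": "me", "points": 2}]}
print(tracker.update(map_tick0, "UP"))    # None
print(tracker.update(map_tick1, "LEFT"))  # ('UP', 2)
```

Because the snake object is recreated per game in the original client, such a tracker only survives across games if it is stored somewhere shared, e.g. as a class attribute.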


**Client logic**
* pass `-l debug` instead of `-l info` to get more info about what's happening.
* `snake.py` contains the actual snake logic. `snake.py` is passive; it's `client.py` that invokes methods on `snake.py`.
* `util.py` contains utility methods, for example to translate positions to coordinates and vice versa
* `client.py` creates the snake object and communicates with the server. No need to change code here.
* `messages.py` contains definitions for communication with the server. No need to change code here.

Note that as the full game map is passed to the snake with every invocation of `get_next_move(self, game_map)` the snake has full knowledge of the world!

Let's have a look at the [code](https://github.com/cygni/snakebot-client-python/tree/master/client).


**Game world: game map**

Map position: positions are numbered from left to right, from top to bottom, starting from 0

![](gameworld.png)
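Given this numbering, converting between a map position and (x, y) coordinates only needs the map width. `util.py` contains its own helpers for this; the functions below are a hypothetical sketch of the arithmetic, checked against the width of 46 and StraightBot's position 219 from the example game map.

```python
def position_to_coordinates(position, width):
    """Map position -> (x, y): x counts columns from the left, y rows from the top."""
    y, x = divmod(position, width)
    return x, y


def coordinates_to_position(x, y, width):
    """(x, y) -> map position."""
    return y * width + x


x, y = position_to_coordinates(219, width=46)
print(x, y)                                     # 35 4
print(coordinates_to_position(x, y, width=46))  # 219
```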

The statement `log.debug("*****game map content: " + str(vars(game_map)))` in `snake.py` (line 15) gives as output:

```
{
   "game_map":{
      "width":46,
      "height":34,
      "worldTick":0,
      "snakeInfos":[
         {
            "name":"StraightBot",
            "points":0,
            "positions":[
               219
            ],
            "tailProtectedForGameTicks":0,
            "id":"da72d46b-5727-472d-a901-f5d54e70468e"
         },
         {
            "name":"ercosnake.py",
            "points":0,
            "positions":[
               194
            ],
            "tailProtectedForGameTicks":0,
            "id":"afe9ef0d-5904-41fb-b146-cca05471148e"
         },
         {
            "name":"RandomBot",
            "points":0,
            "positions":[
               1495
            ],
            "tailProtectedForGameTicks":0,
            "id":"d14ea681-7290-4bb9-8f79-843bf2bf5736"
         },
         {
            "name":"Snakey",
            "points":0,
            "positions":[
               1008
            ],
            "tailProtectedForGameTicks":0,
            "id":"ca3e65ba-5633-4478-8f91-b09989d46f52"
         },
         {
            "name":"StraightBot",
            "points":0,
            "positions":[
               969
            ],
            "tailProtectedForGameTicks":0,
            "id":"777ec222-3959-484e-80ac-69907f5f1c9e"
         }
      ],
      "foodPositions":[
      ],
      "obstaclePositions":[
         70,
         450,
         451,
         496,
         497,
         1360,
         1361,
         1362,
         1406,
         1407,
         1408,
         1430,
         1452,
         1453,
         1454,
         1532
      ]
   },
   "width":46,
   "height":34
}
```
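To feed this map to a neural network, you could turn it into a coordinate-based grid, similar to the coordinate-based state of the gym snake environment. A minimal sketch that works on the inner `game_map` dictionary shown above; the codes (0 empty, 1 obstacle, 2 food, 3 other snake, 4 own snake) and the function name are arbitrary choices for illustration.

```python
import numpy as np

EMPTY, OBSTACLE, FOOD, OTHER_SNAKE, OWN_SNAKE = 0, 1, 2, 3, 4


def game_map_to_grid(game_map, my_id):
    """Convert the cygni game map dictionary into a (height, width) uint8 grid."""
    width = game_map["width"]
    grid = np.full((game_map["height"], width), EMPTY, dtype=np.uint8)

    def mark(positions, code):
        for position in positions:
            y, x = divmod(position, width)  # positions run left-to-right, top-to-bottom
            grid[y, x] = code

    mark(game_map["obstaclePositions"], OBSTACLE)
    mark(game_map["foodPositions"], FOOD)
    for info in game_map["snakeInfos"]:
        mark(info["positions"], OWN_SNAKE if info["id"] == my_id else OTHER_SNAKE)
    return grid


small_map = {"width": 4, "height": 3, "obstaclePositions": [0], "foodPositions": [5],
             "snakeInfos": [{"id": "me", "positions": [6, 7]},
                            {"id": "other", "positions": [11]}]}
print(game_map_to_grid(small_map, my_id="me"))
```

Distinguishing your own snake from the others matters, because the server sends the same map to every client.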


**Game world: default game rules**

* Snake grows every third game tick
* Each client must respond within 250ms
* 1 point per snake growth
* 2 points per food item (star) consumed
* 10 points per tail nibble
* 5 points per caused death (another snake crashes into your snake and dies)
* 5 obstacles (black holes)
* A nibbled tail is protected for 3 game ticks
* **The last surviving snake always wins. Dead snakes are ranked by accumulated points.**


**Game world: game settings**

The statement `log.debug('*****player_registered: %s', msg)` in `client.py` (line 116) gives as output the game settings:
```
{
   "type":"se.cygni.snake.api.response.PlayerRegistered",
   "gameId":"b2eaa2de-2a04-4f71-9e31-7f9299dcd92a",
   "name":"ercosnake.py",
   "gameSettings":{
      "maxNoofPlayers":5,
      "startSnakeLength":1,
      "timeInMsPerTick":250,
      "obstaclesEnabled":True,
      "foodEnabled":True,
      "headToTailConsumes":True,
      "tailConsumeGrows":False,
      "addFoodLikelihood":15,
      "removeFoodLikelihood":5,
      "spontaneousGrowthEveryNWorldTick":3,
      "trainingGame":False,
      "pointsPerLength":1,
      "pointsPerFood":2,
      "pointsPerCausedDeath":5,
      "pointsPerNibble":10,
      "noofRoundsTailProtectedAfterNibble":3,
      "startFood":0,
      "startObstacles":5
   },
   "gameMode":"TRAINING",
   "receivingPlayerId":"4165f2d1-0114-486c-ab45-ec0e522c0c47",
   "timestamp":1569673113458
}
```
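Assuming points simply add up per event as listed in the rules, a snake's score follows directly from these settings. A hypothetical helper using the `pointsPerLength`, `pointsPerFood`, `pointsPerNibble` and `pointsPerCausedDeath` keys from `gameSettings`, e.g. to check a reward function against the official scoring:

```python
def expected_points(settings, growths=0, foods=0, nibbles=0, caused_deaths=0):
    """Points according to the game settings, given counts of each scoring event."""
    return (growths * settings["pointsPerLength"]
            + foods * settings["pointsPerFood"]
            + nibbles * settings["pointsPerNibble"]
            + caused_deaths * settings["pointsPerCausedDeath"])


game_settings = {"pointsPerLength": 1, "pointsPerFood": 2,
                 "pointsPerCausedDeath": 5, "pointsPerNibble": 10}
print(expected_points(game_settings, growths=3, foods=2, nibbles=1, caused_deaths=1))  # 22
```

Since the last surviving snake wins regardless of points, a reward function based only on points would not capture the full objective.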


**Snake client and ML**
* Heavily training the client via the network might be problematic -> you might decide to run the server locally.
* The code has clearly been written by software engineers. Some rewriting of the snake client is needed to make it usable for ML-training. 
* One way to run the snake for multiple episodes (games) is to change the following code in `client.py`:

```
factory = WebSocketClientFactory(u"ws://%s:%s/%s" % (args.host, args.port, args.venue))
factory.protocol = SnakebotProtocol
coro = loop.create_connection(factory, args.host, args.port)
loop.run_until_complete(coro)
loop.run_forever()
loop.close()
sys.exit(0)
```

to

```
factory = WebSocketClientFactory(u"ws://%s:%s/%s" % (args.host, args.port, args.venue))
for i in range(nr_of_episodes):  # nr_of_episodes: number of games to play, define it yourself
    factory.protocol = SnakebotProtocol
    coro = loop.create_connection(factory, args.host, args.port)
    loop.run_until_complete(coro)
    loop.run_forever()  # returns once loop.stop() is called (presumably when the game's connection closes)
loop.close()
sys.exit(0)
```
  
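* In the cygni code, every instance of the communication protocol class creates its own state machine object (== the snake). Since a new protocol instance is created for every connection, the snake, including everything it has learned, would be recreated for every game. This is not suitable for ML training. To change this, we can make the snake a static variable (in Python: a class attribute) that is shared between instances of the communication protocol class. In Python you can create such a static variable as follows: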

```
# Python equivalent of a static variable: a class attribute shared by all instances
class Snake:
    def __init__(self):
        self.name = ""

class SnakebotProtocol:
    static_snake = Snake()  # created once, when the class is defined

    def get_snake_name(self):
        return self.static_snake.name

    def set_snake_name(self, name):
        self.static_snake.name = name

snakebot_protocol1 = SnakebotProtocol()
snakebot_protocol1.set_snake_name("dqn_snake")
snakebot_protocol2 = SnakebotProtocol()
print(snakebot_protocol2.get_snake_name())  # prints dqn_snake: both instances share one Snake
```

[Based on](https://github.com/cygni/snakebot/)