# MinAtar integration

## The MinAtar wrapper.

In **standardized *Reinforcement Learning* (RL) environments and benchmarks**, one usually has:

- a `reset` method with signature `None -> Tensor` (resets the environment and gives the first observation)

- a `step` method with signature `int -> (Tensor, float, bool, dict)` (takes an action, steps the environment and gives an `(observation, reward, done, info)` tuple)

- a `render` method with signature `None -> None or Tensor` (renders the current environment state, optionally returning an image)

However, **MinAtar does not use this standardized approach**! Instead, one has access to:

- a `reset` method with signature `None -> None` (resets only)

- an `act` method with signature `int -> (float, bool)` (steps and gives (reward, done) tuple)

- a `state` method with signature `None -> Tensor` (gives an observation)

- a `display_state` method with signature `int (optional) -> None` (renders only)

In order to adapt the MinAtar benchmark to standard RL environments, and to make as few changes to the original code as possible, we implemented a wrapper class that looks like:

In [None]:
class MinatarWrapper(Environment):
    def reset(self):
        """
            Resets the environment.

            Return:
                (observation) the first observation.
        """

    def step(self, actions):
        """
            Steps in the environment.

            Args:
                actions (int): the action to take.

            Return:
                (tensor, float, bool, dict) new observation, reward, done signal and additional information.
        """

In [None]:
    def render(self, time=0, done=False):
        """
            Renders the environment.

            Args:
                time (int): the number of milliseconds for each frame. If 0, there will be no live animation.
                done (bool): tells if the episode is done.

            Return:
                (Image) the current image of the game.
        """

    def _state(self):
        """
            Reduces the dimensions of the raw observation and normalizes it.
        """
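
As a minimal sketch of how such an adapter can be written, the block below wraps an object exposing MinAtar-style methods (`reset`, `act`, `state`) behind the standard `reset`/`step` interface. `FakeMinatarEnv` and `SketchWrapper` are hypothetical names used only for illustration, and the channels-first observation layout is an assumption; this is not the actual `MinatarWrapper` code.

```python
import numpy as np


class FakeMinatarEnv:
    """Hypothetical stand-in with MinAtar-style methods (not the real library)."""

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        # MinAtar-style reset: returns nothing.
        self.t = 0

    def act(self, action):
        # MinAtar-style step: returns (reward, done) only.
        self.t += 1
        return 1.0, self.t >= self.episode_length

    def state(self):
        # Channels-first binary observation (an assumption of this sketch).
        obs = np.zeros((2, 10, 10))
        obs[0, 0, self.t % 10] = 1
        obs[1, 9, 5] = 1
        return obs


class SketchWrapper:
    """Adapts the MinAtar-style interface to reset/step."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        # Reset, then fetch the first observation separately.
        self.env.reset()
        return self.env.state()

    def step(self, action):
        # Combine act() and state() into the usual 4-tuple.
        reward, done = self.env.act(action)
        return self.env.state(), reward, done, {}


env = SketchWrapper(FakeMinatarEnv())
obs = env.reset()
total, done = 0.0, False
while not done:
    obs, reward, done, info = env.step(0)
    total += reward
```

The wrapper only composes existing calls, so the wrapped environment itself needs no modification.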

In `_state`, **reduction** and **normalization** tricks are applied.

In [None]:
# Sum the object channels into a single image, weighting channel i by (i + 1).
state = np.sum([state[i] * (i+1) for i in range(state.shape[0])], axis=0)

# Normalize the image to the range [-1, 1] (assumes the image is not constant, i.e. M > m).
m, M = np.min(state), np.max(state)
state = 2 * (state - m) / (M - m) - 1
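
To make the effect of these two steps concrete, here is a small self-contained sketch applied to a tiny 2-channel, 2x2 observation. The function name `reduce_and_normalize` and the guard for constant images are additions made for this example, not part of the original wrapper, and the input is assumed to be channels-first.

```python
import numpy as np


def reduce_and_normalize(state):
    """Collapse a channels-first stack into one image scaled to [-1, 1]."""
    # Weight channel i by (i + 1) so that each object type gets its own value.
    image = np.sum([state[i] * (i + 1) for i in range(state.shape[0])], axis=0)
    low, high = np.min(image), np.max(image)
    if high == low:
        # Constant image: avoid dividing by zero (a choice made for this sketch).
        return np.zeros_like(image, dtype=float)
    return 2 * (image - low) / (high - low) - 1


state = np.array([
    [[1, 0], [0, 0]],   # channel 0
    [[0, 1], [0, 0]],   # channel 1
])
image = reduce_and_normalize(state)
# image is [[0, 1], [-1, -1]]
```

Note that overlapping objects add up: a cell occupied by both channel 0 and channel 1 gets the same value as a cell occupied only by channel 2, so the reduction is not lossless.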

## Environment hyper-definition

### Breakout

In [None]:
breakout = Game(env_name="minatar:breakout",
                actionSelect="softmax",
                input_size=100,
                output_size=6,
                time_factor=0,
                layers=[5, 5],
                i_act=np.full(5, 1),
                h_act=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                o_act=np.full(1, 1),
                weightCap=2.0,
                noise_bias=0.0,
                output_noise=[False, False, False],
                max_episode_length=1000,
                in_out_labels=['x', 'x_dot', 'cos(theta)', 'sin(theta)', 'theta_dot',
                               'force']
                )
games["minatar:breakout"] = breakout

### Freeway

In [None]:
freeway = Game(env_name="minatar:freeway",
               actionSelect="softmax",
               input_size=100,
               output_size=6,
               time_factor=0,
               layers=[5, 5],
               i_act=np.full(5, 1),
               h_act=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
               o_act=np.full(1, 1),
               weightCap=2.0,
               noise_bias=0.0,
               output_noise=[False, False, False],
               max_episode_length=1000,
               in_out_labels=['x', 'x_dot', 'cos(theta)', 'sin(theta)', 'theta_dot',
                              'force']
               )
games["minatar:freeway"] = freeway

## Automatic hyperparameter search.

In [None]:
from collections import OrderedDict

parameters = OrderedDict(
    popSize=[64, 200],

    prob_addConn=[.025, .1],
    prob_addNode=[.015, .06],
    prob_crossover=[.7, .9],
    prob_enable=[.005, .02],
    prob_mutConn=[.7, .9],
    prob_initEnable=[.8, 1.],
)

In [None]:
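from itertools import product


class RunBuilder:
    @staticmethod
    def get_runs(parameters):
        # Build one dict per combination in the Cartesian product of all value lists.
        runs = []
        for v in product(*parameters.values()):
            runs.append(dict(zip(parameters.keys(), v)))

        return runs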
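
To make the size of this search explicit: with seven parameters taking two values each, the Cartesian product contains 2^7 = 128 combinations. The sketch below reproduces the enumeration in a self-contained way, using the grid values listed above; the variable names `runs`, `n_runs` and `first` are introduced here for illustration.

```python
from collections import OrderedDict
from itertools import product

parameters = OrderedDict(
    popSize=[64, 200],
    prob_addConn=[.025, .1],
    prob_addNode=[.015, .06],
    prob_crossover=[.7, .9],
    prob_enable=[.005, .02],
    prob_mutConn=[.7, .9],
    prob_initEnable=[.8, 1.],
)

# One dict per combination, in the same order as itertools.product yields them.
runs = [dict(zip(parameters.keys(), values))
        for values in product(*parameters.values())]

n_runs = len(runs)   # 2 ** 7 combinations
first = runs[0]      # the first value of every parameter
```

Since every combination requires a full evaluation, the cost of the search grows exponentially with the number of parameters in the grid.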

In [None]:
# Keep the best fitness and the corresponding run over the whole grid.
b_fit, b_run = float("-inf"), None
for run in RunBuilder.get_runs(parameters):
    fitness = run_one_hyp(hyp, run)
    if fitness > b_fit:
        b_fit = fitness
        b_run = run
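
The same search pattern can be sketched end to end on a toy problem. Here `toy_fitness` is a hypothetical stand-in for `run_one_hyp` (it does no training at all), the two-parameter `grid` is a reduced example, and `history` illustrates one possible way of recording every fitness for later inspection.

```python
from itertools import product


def toy_fitness(run):
    # Hypothetical stand-in for run_one_hyp: NOT a real training run.
    return -abs(run["popSize"] - 64) - run["prob_addNode"]


grid = {"popSize": [64, 200], "prob_addNode": [.015, .06]}
runs = [dict(zip(grid, values)) for values in product(*grid.values())]

history = []  # every (fitness, run) pair, kept for later analysis
best_fit, best_run = float("-inf"), None
for run in runs:
    fitness = toy_fitness(run)
    history.append((fitness, run))
    if fitness > best_fit:
        best_fit, best_run = fitness, run
```

Starting from negative infinity means a best run is always selected, even when every fitness is zero or negative, as is the case with this toy objective.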

- The runs were performed on **Breakout** because it is a lot faster to evaluate.

- **Fitnesses** were **recorded** for further investigation.

However, the results were *not good* at all!

The best set of hyperparameters was:

| popSize | prob_addConn | prob_addNode | prob_crossover | prob_enable | prob_mutAct | prob_mutConn | prob_initEnable | budget |
| ------- | ------------ | ---- | ---- | ---- | ---- | ---- | ---- | ----- |
|    32   |    .025      | .015 |  .7  |  .02 |  .0  |  .9  |  1.  | 50000 |

and the fitness seen during the search was 6.0.

For the final training, we therefore used the above set of parameters.

## The experiment.

- Run 50000 learning processes 3 times to obtain statistical results.

- Use the parameters found by the search above.
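
As a sketch of how repeated runs can be summarized, the block below aggregates the final fitness of three repetitions into a mean and a sample standard deviation. `toy_training` is a hypothetical placeholder that returns a seeded random number; it stands in for a full training run and its values are not experimental results.

```python
import random
import statistics


def toy_training(seed):
    # Hypothetical stand-in for one full training run; returns a made-up fitness.
    rng = random.Random(seed)
    return rng.uniform(0.0, 1.0)


seeds = [0, 1, 2]                        # one seed per repetition
fitnesses = [toy_training(seed) for seed in seeds]
mean_fit = statistics.mean(fitnesses)
std_fit = statistics.stdev(fitnesses)    # sample standard deviation (n - 1)
```

With only three repetitions the standard deviation is a coarse estimate, so it is best read as an indication of spread rather than a precise confidence measure.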

## The results.

- time spent for Breakout: **~ 2 hours**
- final fitness on Breakout: **max of ~ 2.80**

- time spent for Freeway: **~ 21 hours** (cumulative)
- final fitness on Freeway: **14**

A random breakout agent...

<img src="./log/breakout/gifs/3484933263.gif" width="600" align="center">

performing around 0.50

A random freeway agent...

<img src="./357277723.gif" width="600" align="center">

almost never reaches the top

The final best breakout agent given by ***NEAT***:

<img src="./log/breakout/gifs/3292025515.gif" width="600" align="center">

performing around 2.80, and such an agent takes significantly more time to evaluate!

The final best freeway agent given by ***NEAT***:

<img src="./357277723.gif" width="600" align="center">

performing around 14 but clearly not very efficient...

Same issue for both problems...

<img src="./Graph.png" width="600" align="center">

Local minima prevent innovation and diversity; the species mechanism is not effective enough with our set of hyperparameters.