# Using FAI to solve Atari environments

# TLDR 

1. In the notebook toolbar click Kernel -> **Restart & Run all**.
2. **Wait a bit while you enjoy how the Agent plays MsPacman**.
3. **You should have finished** at least the **first level of MsPacman-v0** using a uniform prior, and about 150 samples per action.
4. A **video** of a game played by the Agent is available **inside** the ***videos* folder** of this repository.

### Import everything we will need

In [1]:
```python
from fractalai.model import RandomDiscreteModel
from fractalai.environment import ExternalProcess, ParallelEnvironment, AtariEnvironment
from fractalai.fractalmc import FractalMC
```

## Available games 

This is a list of all the Atari games that can be played in OpenAI Gym using RGB images as observations. Just by changing the game name, you can see how the algorithm performs on different environments.

**['AirRaid-v0',  'Alien-v0', 'Amidar-v0', 'Assault-v0', 'Asterix-v0', 'Asteroids-v0', 'Atlantis-v0', 'BankHeist-v0',
 'BattleZone-v0', 'BeamRider-v0', 'Berzerk-v0', 'Bowling-v0', 'Boxing-v0', 'Breakout-v0', 'Carnival-v0',
 'Centipede-v0', 'ChopperCommand-v0', 'CrazyClimber-v0', 'DemonAttack-v0', 'DoubleDunk-v0', 'ElevatorAction-v0',
 'Enduro-v0', 'FishingDerby-v0', 'Freeway-v0', 'Frostbite-v0', 'Gopher-v0', 'Gravitar-v0', 'Hero-v0', 'IceHockey-v0', 'Jamesbond-v0', 'JourneyEscape-v0', 'Kangaroo-v0', 'Krull-v0', 'KungFuMaster-v0', 'MontezumaRevenge-v0', 'MsPacman-v0', 'NameThisGame-v0', 'Phoenix-v0', 'Pitfall-v0', 'Pong-v0', 'Pooyan-v0', 'PrivateEye-v0', 'Qbert-v0',
 'Riverraid-v0', 'RoadRunner-v0', 'Robotank-v0', 'Seaquest-v0', 'Skiing-v0', 'Solaris-v0', 'SpaceInvaders-v0',
 'StarGunner-v0', 'Tennis-v0', 'TimePilot-v0', 'Tutankham-v0', 'UpNDown-v0', 'Venture-v0', 'VideoPinball-v0',
 'WizardOfWor-v0', 'YarsRevenge-v0', 'Zaxxon-v0']**

### Using RAM as observations

Instead of a matrix of pixels, you can also use the Atari's RAM as observations. This makes the calculations a bit lighter, so do not be afraid to try it!

To use RAM as observations, insert "-ram" between the game name and "-v0", as shown here:

> 'MsPacman-v0' --> 'MsPacman**-ram**-v0'
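
If you want to switch between the two variants programmatically, a small helper can do the renaming. This is a hypothetical helper (`to_ram_env_name` is not part of fractalai or Gym), and it assumes the names follow the `Game-v0` pattern used in the list above.

```python
def to_ram_env_name(name: str) -> str:
    """Turn an image-based Atari environment name such as 'MsPacman-v0'
    into its RAM-based counterpart, 'MsPacman-ram-v0'."""
    base, sep, version = name.rpartition("-")
    if not sep or not version.startswith("v"):
        raise ValueError(f"Expected a name like 'Game-v0', got {name!r}")
    if base.endswith("-ram"):
        return name  # already a RAM environment, leave it unchanged
    return f"{base}-ram-{version}"


print(to_ram_env_name("MsPacman-v0"))  # MsPacman-ram-v0
```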

## Interpreting the parameter choice

The agent relies on four parameters:

- **Fixed steps**: The number of consecutive times the same action is applied to the environment each time we perturb it by choosing an action. Although the appropriate value depends on the Environment, we can use it to manually set how often the Agent acts. Taking more consecutive actions allows exploring further into the future at the cost of slower reactions.

- **Time Horizon**: This value represents "how far we need to look into the future when taking an action". A useful rule of thumb is **Time Horizon = Nt / Fixed steps**, where **Nt** is the number of frames between the moment the agent performs the actions that inevitably lead to losing a life (dying) and the moment it actually loses it. This parameter determines the time horizon of the largest potential well that the Agent should be able to escape.

- **Max states**: The maximum number of walkers that can be part of the Swarm. This number is related to "how thick" we want the resulting causal cone to be. The algorithm will try to use as many walkers as possible, up to this limit.

- **Max samples**: The maximum number of times that we can perturb the environment when using a Swarm to build a causal cone. It is an upper bound, and the algorithm will try to use fewer samples while still reaching the defined **time horizon**. It is a convenient way to limit how much computation each action can take. A reasonable value is **max walkers** \* **time horizon** \* ***N***, where ***N = 5*** works well in Atari games, although the best value depends on the task.
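
The two rules of thumb above can be written as a quick calculator. This is a sketch: the helper names are hypothetical, and the inputs Nt = 50 frames and Fixed steps = 2 are assumptions chosen to match the MsPacman example later in this notebook.

```python
import math


def suggested_time_horizon(frames_to_lose_life: int, fixed_steps: int) -> int:
    """Rule of thumb: Time Horizon = Nt / Fixed steps.
    Rounded up so that the lookahead covers at least Nt frames."""
    return math.ceil(frames_to_lose_life / fixed_steps)


def suggested_max_samples(max_walkers: int, time_horizon: int, n: int = 5) -> int:
    """Rule of thumb: Max samples = max walkers * time horizon * N (N = 5 works well in Atari)."""
    return max_walkers * time_horizon * n


horizon = suggested_time_horizon(frames_to_lose_life=50, fixed_steps=2)
print(horizon)  # 25
print(suggested_max_samples(max_walkers=150, time_horizon=horizon))  # 18750
```

With these assumed inputs the rules give a time horizon of 25, and a max samples value of 18750 for 150 walkers; the Minimal Pacman example deliberately uses a much smaller sample budget.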


You can take a look at the [Fractal AI Performance Sheet](https://docs.google.com/spreadsheets/d/1JcNw2L0YL_I2iGZPJ0bNKJshlTaqMuEl5CP2W5zie6M/edit?usp=sharing) to check the parameters we used to run our experiments.

## Practical example

### Minimal Pacman

We will tune the Agent to get a decent score on MsPacman while deliberately using as few computational resources per action as possible.

By doing so, we want to address concerns about edge cases of the theory, showing how the algorithm performs when the Swarm is very small compared to the size of the state space.

In order to do so, we can give the parameters the following values:

#### Environment Parameters

In [2]:
```python
name = "MsPacman-ram-v0"
render = True  # It is more fun if the game is displayed on the screen
clone_seeds = False  # This will speed things up a bit
max_steps = 1e6  # Play until the game is finished
skip_frames = 80  # The Agent cannot do anything anyway, so it is faster to skip some frames at the beginning
n_repeat_action = 2  # Apply each chosen action for 2 consecutive environment steps
reward_limit = 10000
render_every = 2
dt_mean = 3
dt_std = 3
```
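
`n_repeat_action` appears to play the role of the **Fixed steps** parameter described earlier. The idea of repeating an action can be sketched as follows; this is an illustration on a hypothetical toy environment (`CountingEnv` and `repeat_action` are made up here), not fractalai's or Gym's implementation.

```python
from typing import Any, Callable, Tuple

# (observation, reward, episode_finished)
StepResult = Tuple[Any, float, bool]


class CountingEnv:
    """Hypothetical toy environment: every step advances one frame, pays reward 1,
    and the episode ends after `limit` frames."""

    def __init__(self, limit: int):
        self.limit = limit
        self.frame = 0

    def step(self, action: int) -> StepResult:
        self.frame += 1
        return self.frame, 1.0, self.frame >= self.limit


def repeat_action(step_fn: Callable[[int], StepResult], action: int,
                  n_repeat_action: int) -> StepResult:
    """Apply the same action n_repeat_action times, summing the rewards.
    Stops early if the episode ends in the middle of the repetition."""
    obs, total_reward, done = None, 0.0, False
    for _ in range(n_repeat_action):
        obs, reward, done = step_fn(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done


env = CountingEnv(limit=3)
print(repeat_action(env.step, action=0, n_repeat_action=2))  # (2, 2.0, False)
print(repeat_action(env.step, action=0, n_repeat_action=2))  # (3, 1.0, True)
```

The second call stops after a single step because the episode ends, which is why larger repeat values trade reaction time for reach.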


#### FAI parameters

In [3]:
```python
max_samples = 5500  # Upper bound on the number of perturbations used to compute each action
max_walkers = 150  # Let's set a really small number to make everything faster
time_horizon = 25  # 25 steps x 2 repeated actions = 50 frames, enough to realise you have been eaten by a ghost
```

With these parameters we are aiming for 100 samples per step, keeping up to another 200 samples in reserve in case the agent runs into trouble. Using so few samples means that performance can vary widely between runs.

In our tests, this agent finished the first level most of the time when we set max_states = 15 (150 samples). With only 100 samples, the Agent will struggle to find distant rewards, so by the end of the first level you will be relying mostly on luck.

To get better scores, increase the computational budget, mainly max_walkers and max_samples.
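
To build some intuition for why a very small sample budget makes the result depend on luck, the sketch below uses a much simpler planner than FractalMC: plain Monte Carlo random rollouts, with no walker cloning and no causal-cone construction. It reuses the names `max_walkers`, `time_horizon` and `max_samples` only by analogy; `toy_step` and `plan_action` are hypothetical, and the 1-D toy world is made up for illustration.

```python
import random
from typing import Callable, Tuple

# (next_state, reward, lost_a_life)
StepFn = Callable[[int, int], Tuple[int, float, bool]]


def toy_step(state: int, action: int) -> Tuple[int, float, bool]:
    """Hypothetical 1-D world: action 1 moves right and pays 1, action 0 moves left and pays 0.
    Reaching position -3 or below counts as losing a life."""
    new_state = state + (1 if action == 1 else -1)
    reward = 1.0 if action == 1 else 0.0
    return new_state, reward, new_state <= -3


def plan_action(state: int, n_actions: int, max_walkers: int, time_horizon: int,
                max_samples: int, step_fn: StepFn, rng: random.Random,
                death_penalty: float = 10.0) -> int:
    """Return the first action with the best mean rollout return (plain Monte Carlo, not FMC)."""
    totals = [0.0] * n_actions
    counts = [0] * n_actions
    samples = 0  # environment steps spent so far
    for walker in range(max_walkers):
        if samples >= max_samples:
            break  # budget exhausted: the remaining walkers are never launched
        first_action = walker % n_actions  # spread walkers evenly over the first action
        s, ret, action = state, 0.0, first_action
        for _ in range(time_horizon):
            if samples >= max_samples:
                break
            s, reward, dead = step_fn(s, action)
            samples += 1
            ret += reward
            if dead:
                ret -= death_penalty
                break
            action = rng.randrange(n_actions)  # later actions are uniformly random
        totals[first_action] += ret
        counts[first_action] += 1
    # Actions that no walker tried get -inf so they are never preferred.
    means = [totals[a] / counts[a] if counts[a] else float("-inf") for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: means[a])


# A tiny budget: only one rollout is evaluated, so the choice is whatever it happened to try.
print(plan_action(0, 2, max_walkers=2000, time_horizon=5, max_samples=1,
                  step_fn=toy_step, rng=random.Random(0)))  # 0
```

With a generous budget (for example `max_samples=10**6`) the planner reliably prefers moving right, while with `max_samples=1` it settles for the first action it tried.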

### Creating the agent

In [4]:
```python
env = ParallelEnvironment(name=name, env_class=AtariEnvironment,
                          blocking=False, n_workers=8, n_repeat_action=n_repeat_action)  # We will play an Atari game
model = RandomDiscreteModel(max_wakers=max_walkers,
                            n_actions=env.n_actions, samples=10000)  # The Agent will take discrete actions at random

fmc = FractalMC(model=model, env=env, max_walkers=max_walkers,
                reward_limit=reward_limit, render_every=render_every,
                time_horizon=time_horizon, dt_mean=dt_mean, dt_std=dt_std)
```

In [5]:
```python
fmc.run_agent(render=render)  # Uses the `render` flag set in the environment parameters
```

## Replay Game

In [None]:
```python
fmc.render_game()
```

## We would really appreciate your feedback