## Necessary Imports

### Install the following packages:
- `pip install gym-super-mario-bros==7.4.0`
- `pip install nes-py==8.2.1`
- `pip install gym==0.23.1`

In [1]:
import gym_super_mario_bros
from nes_py.wrappers import JoypadSpace
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT

In [2]:
SIMPLE_MOVEMENT # This is the list of possible actions that our character can take

[['NOOP'],
 ['right'],
 ['right', 'A'],
 ['right', 'B'],
 ['right', 'A', 'B'],
 ['A'],
 ['left']]

## Setup Our Mario Game

### Setting up the environment

In [3]:
env = gym_super_mario_bros.make('SuperMarioBros-v0')
actions_before_wrapping = env.action_space # This is the action space available to our character before wrapping
env = JoypadSpace(env, SIMPLE_MOVEMENT) # Restrict the actions to the button combinations in SIMPLE_MOVEMENT



- Compare the action space before and after applying the wrapper.

- The `JoypadSpace` wrapper reduces the action space from 256 discrete actions to the 7 button combinations in `SIMPLE_MOVEMENT`. With fewer actions to explore, the model should train faster.

In [4]:
print("Number of actions before wrapping: ", actions_before_wrapping)
print("Number of actions after wrapping: ", env.action_space)

Number of actions before wrapping:  Discrete(256)
Number of actions after wrapping:  Discrete(7)
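To see where the numbers 256 and 7 come from, the sketch below encodes each button combination as a single byte, one bit per NES controller button (8 buttons give 2**8 = 256 possible bytes). It is an illustrative reimplementation, not the wrapper's code: the `BUTTON_BITS` table and the `to_byte` helper are assumptions based on a reading of nes-py's `JoypadSpace`, and the exact bit assignments may differ in the library.

```python
# Assumed bit for each NES controller button (illustrative, not taken from this notebook).
BUTTON_BITS = {
    "right": 0b10000000,
    "left": 0b01000000,
    "down": 0b00100000,
    "up": 0b00010000,
    "start": 0b00001000,
    "select": 0b00000100,
    "B": 0b00000010,
    "A": 0b00000001,
    "NOOP": 0b00000000,
}

# Same list that gym_super_mario_bros.actions.SIMPLE_MOVEMENT prints above.
SIMPLE_MOVEMENT = [
    ["NOOP"],
    ["right"],
    ["right", "A"],
    ["right", "B"],
    ["right", "A", "B"],
    ["A"],
    ["left"],
]


def to_byte(buttons):
    """Combine a list of button names into one controller byte (hypothetical helper)."""
    byte = 0
    for button in buttons:
        byte |= BUTTON_BITS[button]
    return byte


# Each of the 7 wrapped actions corresponds to one of the 256 raw byte actions.
byte_actions = [to_byte(combo) for combo in SIMPLE_MOVEMENT]
print(2 ** 8)             # 256 raw actions
print(len(byte_actions))  # 7 wrapped actions
```

Under these assumptions, the wrapped action index selects an entry of `byte_actions`, and that byte is what the emulator receives.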


- Each observation is the game screen as an RGB image of shape 240 x 256 x 3 (height x width x color channels), and the wrapped action space has 7 actions.

In [5]:
print("Observation space: ", env.observation_space.shape)

Observation space:  (240, 256, 3)
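As a quick sanity check on what that shape implies, the arithmetic below counts the values in a single frame. The one-byte-per-value figure is an assumption (pixel values in the 0 to 255 range, stored as unsigned 8-bit integers); the shape itself is taken from the output above.

```python
# Shape reported by env.observation_space.shape
height, width, channels = (240, 256, 3)

# Number of individual color values in one frame
values_per_frame = height * width * channels
print(values_per_frame)  # 184320

# Assuming one byte per value (uint8), the size of one frame in KiB
kib_per_frame = values_per_frame / 1024
print(kib_per_frame)  # 180.0
```

This is why raw frames are often reduced in size before being fed to a model, although this notebook section does not do so.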


- All the possible actions that we can take are:

In [6]:
print("Action List: ", SIMPLE_MOVEMENT)

Action List:  [['NOOP'], ['right'], ['right', 'A'], ['right', 'B'], ['right', 'A', 'B'], ['A'], ['left']]
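Because `env.step` expects an integer rather than button names, it can help to look up the index for a given combination. The sketch below builds that lookup from the list printed above; `ACTION_INDEX` and `action_for` are hypothetical names introduced for illustration, not part of the library.

```python
# Same list that gym_super_mario_bros.actions.SIMPLE_MOVEMENT prints above.
SIMPLE_MOVEMENT = [
    ["NOOP"],
    ["right"],
    ["right", "A"],
    ["right", "B"],
    ["right", "A", "B"],
    ["A"],
    ["left"],
]

# Map each button combination to its position in the list.
ACTION_INDEX = {tuple(buttons): i for i, buttons in enumerate(SIMPLE_MOVEMENT)}


def action_for(*buttons):
    """Return the integer action for a button combination, in the order listed above."""
    return ACTION_INDEX[tuple(buttons)]


print(action_for("right", "A"))  # 2
print(action_for("left"))        # 6
```

With the real environment, `env.step(action_for("right", "A"))` would then be equivalent to `env.step(2)`.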


### Game loop

In [15]:
done = True
for step in range(100000):
    if done:
        # starts the game or resets it if it is over
        state = env.reset()
    state, reward, done, info = env.step(env.action_space.sample()) # random action
    env.render() # render the game to the screen
env.close() # closes the game window
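A variation on the loop above plays a single episode and adds up the rewards, which gives a simple baseline for a random agent. This is a minimal sketch assuming the 4-tuple `step` API used in this notebook (gym 0.23.1); `run_random_episode` is a hypothetical helper, and `FakeEnv` is a made-up stand-in so the example runs without the emulator.

```python
import random


def run_random_episode(env, max_steps=1000, render=False):
    """Play one episode with random actions; return (total_reward, steps, last_info)."""
    env.reset()
    total_reward, steps, info = 0.0, 0, {}
    while steps < max_steps:
        _, reward, done, info = env.step(env.action_space.sample())
        total_reward += reward
        steps += 1
        if render:
            env.render()
        if done:
            break
    return total_reward, steps, info


class _FakeSpace:
    """Stand-in for Discrete(7)."""
    n = 7

    def sample(self):
        return random.randrange(self.n)


class FakeEnv:
    """Hypothetical environment: every step gives reward 1.0 and the episode ends after a fixed length."""
    action_space = _FakeSpace()

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self._t = 0

    def reset(self):
        self._t = 0
        return None

    def step(self, action):
        self._t += 1
        done = self._t >= self.episode_length
        return None, 1.0, done, {"x_pos": 40 + self._t}


total, steps, last_info = run_random_episode(FakeEnv(episode_length=5))
print(total, steps, last_info)
```

With the real game, the same function could be called as `run_random_episode(env, render=True)` on the wrapped `env` created earlier.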

### Understanding the reward system

- Actions:
    - `env.action_space.sample()` returns a random action

    - `env.action_space.n` returns the number of actions

    - Since we have 7 actions, the valid actions are the integers 0 to 6, so `env.action_space.sample()` returns a number from 0 to 6
- Step:
    - `env.step(action)` takes an action and returns a tuple: the next state, the reward for that action, whether the game is over, and a dictionary of additional information

    - Every call to `env.step` advances the game by one step, so each example below takes a new step rather than inspecting the same one:

    - `env.step(1)[0]` returns the next state

    - `env.step(1)[1]` returns the reward

    - `env.step(1)[2]` returns whether the game is over

    - `env.step(1)[3]` returns the info dictionary
- Reset:
    - `env.reset()` resets the environment and returns the initial state
- Info:
    - `env.step(1)[3]` returns a dictionary describing the game after the action. The table below lists its keys and values:

        | Key | Type | Description |
        | --- | --- | --- |
        | coins | int | Number of coins collected |
        | flag_get | bool | True if the level was completed |
        | life | int | Remaining lives |
        | score | int | Current score |
        | stage | int | Current stage |
        | status | str | Mario's status: `small`, `tall`, or `fireball` |
        | time | int | Remaining time |
        | world | int | Current world |
        | x_pos | int | Mario's x position |
        | y_pos | int | Mario's y position |
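Since indexing `env.step(1)[k]` takes a fresh step each time, it is usually clearer to call `step` once and name all four parts. The sketch below does that with a named tuple; `StepResult` and `unpack_step` are hypothetical names, and the tuple passed in is a stand-in for what `env.step(1)` returns under the 4-tuple API used here.

```python
from collections import namedtuple

# Names for the four values returned by env.step(action)
StepResult = namedtuple("StepResult", ["state", "reward", "done", "info"])


def unpack_step(step_output):
    """Wrap the (state, reward, done, info) tuple so fields can be read by name."""
    return StepResult(*step_output)


# Stand-in for `env.step(1)`; the real state would be a 240 x 256 x 3 array.
example = unpack_step(("<frame>", 0.0, False, {"x_pos": 40, "life": 2}))
print(example.reward, example.done, example.info["x_pos"])  # 0.0 False 40
```

With the real environment this would read `result = unpack_step(env.step(1))`, after which `result.state`, `result.reward`, `result.done`, and `result.info` all refer to the same step.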


#### Taking samples of the state, reward, done and info

In [15]:
env.step(1)[0] # returns the state of the game after taking action 1

array([[[104, 136, 252],
        [104, 136, 252],
        [104, 136, 252],
        ...,
        [104, 136, 252],
        [104, 136, 252],
        [104, 136, 252]],

       [[104, 136, 252],
        [104, 136, 252],
        [104, 136, 252],
        ...,
        [104, 136, 252],
        [104, 136, 252],
        [104, 136, 252]],

       [[104, 136, 252],
        [104, 136, 252],
        [104, 136, 252],
        ...,
        [104, 136, 252],
        [104, 136, 252],
        [104, 136, 252]],

       ...,

       [[240, 208, 176],
        [228,  92,  16],
        [228,  92,  16],
        ...,
        [228,  92,  16],
        [228,  92,  16],
        [  0,   0,   0]],

       [[240, 208, 176],
        [228,  92,  16],
        [228,  92,  16],
        ...,
        [228,  92,  16],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[228,  92,  16],
        [  0,   0,   0],
        [  0,   0,   0],
        ...,
        [  0,   0,   0],
        [  0,   0,   0],
        [228,  92,  16]]

In [16]:
env.step(1)[1] # returns the reward after taking action 1

0.0

In [13]:
env.step(1)[2] # returns the done boolean after taking action 1

False

In [14]:
env.step(1)[3] # returns the info after taking action 1

{'coins': 0,
 'flag_get': False,
 'life': 2,
 'score': 0,
 'stage': 1,
 'status': 'small',
 'time': 400,
 'world': 1,
 'x_pos': 40,
 'y_pos': 79}
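When logging many steps, a one-line summary of the info dictionary is easier to scan than the full dictionary. The sketch below formats the keys from the table above; `describe_info` is a hypothetical helper, and `sample_info` simply copies the output shown in the previous cell.

```python
def describe_info(info):
    """Format the info dictionary returned by env.step as a single line."""
    return (
        f"World {info['world']}-{info['stage']} | "
        f"x={info['x_pos']} y={info['y_pos']} | "
        f"status={info['status']} lives={info['life']} | "
        f"coins={info['coins']} score={info['score']} time={info['time']} | "
        f"flag_get={info['flag_get']}"
    )


# Copied from the output of env.step(1)[3] above
sample_info = {
    "coins": 0,
    "flag_get": False,
    "life": 2,
    "score": 0,
    "stage": 1,
    "status": "small",
    "time": 400,
    "world": 1,
    "x_pos": 40,
    "y_pos": 79,
}

print(describe_info(sample_info))
```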