# Mario Environment Setup

In [4]:
%pip install gym_super_mario_bros==7.3.0 nes_py

Collecting gym_super_mario_bros==7.3.0
  Using cached gym_super_mario_bros-7.3.0-py2.py3-none-any.whl (198 kB)
Collecting nes_py
  Using cached nes_py-8.2.1.tar.gz (77 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: nes_py
  Building wheel for nes_py (setup.py): started
  Building wheel for nes_py (setup.py): finished with status 'done'
  Created wheel for nes_py: filename=nes_py-8.2.1-cp39-cp39-win_amd64.whl size=45819 sha256=7e379aa5587b8e944c4d76c4cb56747e688181995e0c8bddf98d130aed9c939a
  Stored in directory: c:\users\anurag verma\appdata\local\pip\cache\wheels\c6\e1\4b\dbbd5d4a46ad80c0149d5671edb272c728c130e4d5750ca1d2
Successfully built nes_py
Installing collected packages: nes_py, gym_super_mario_bros
Successfully installed gym_super_mario_bros-7.3.0 nes_py-8.2.1
Note: you may need to restart the kernel to use updated packages.


In [44]:
# Import the game
import gym_super_mario_bros
# Import the Joypad wrapper
from nes_py.wrappers import JoypadSpace
# Import the SIMPLIFIED controls
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT

* Use the `make()` function to set up the simulation.

* The `make()` function takes the name of the RL task as a `str` argument.
* In RL, the task together with its simulation is usually called the environment.

In [45]:
# Setup game
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)

The first line creates the Super Mario Bros environment by calling `gym_super_mario_bros.make()` with the environment name, in this case `'SuperMarioBros-v0'`. The environment represents the game and provides the interface for interacting with it.

The second line wraps the environment in `JoypadSpace`, which restricts the agent to a predefined list of actions. `SIMPLE_MOVEMENT` is such a list: a small set of button combinations for moving left or right, jumping, and so on.
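The wrapping idea can be sketched without the emulator. The classes below (`RecordingEnv`, `IndexToButtons`) are hypothetical stand-ins written for illustration only; they do not claim to mirror how `JoypadSpace` is implemented inside `nes_py`. They only show the general pattern: the wrapper accepts an integer action and forwards the matching button combination to the wrapped environment.

```python
class RecordingEnv:
    """Hypothetical inner environment that only records the buttons it receives."""

    def __init__(self):
        self.received = []

    def step(self, buttons):
        self.received.append(list(buttons))
        # Dummy (state, reward, done, info) in the classic 4-tuple layout.
        return None, 0.0, False, {}


class IndexToButtons:
    """Hypothetical wrapper: takes integer actions, forwards button lists."""

    def __init__(self, env, movements):
        self.env = env
        self.movements = movements
        self.n_actions = len(movements)

    def step(self, action):
        if not 0 <= action < self.n_actions:
            raise ValueError(f"action must be in [0, {self.n_actions}), got {action}")
        return self.env.step(self.movements[action])


inner = RecordingEnv()
wrapped = IndexToButtons(inner, [['NOOP'], ['right'], ['right', 'A']])
wrapped.step(2)
wrapped.step(0)
print(inner.received)  # [['right', 'A'], ['NOOP']]
```

The benefit of this design is that the agent only has to choose among a handful of integers instead of every possible button combination.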

In [46]:
# checking the environment
print(env.observation_space)

Box(0, 255, (240, 256, 3), uint8)


The code `print(env.observation_space)` is used to check the observation space of the Super Mario Bros environment that was previously set up.

The observation space describes the state of the environment as the agent sees it at any given time, which is the information the agent uses to make decisions. Printing it shows the type, shape, and value range of an observation.

Here the output is `Box(0, 255, (240, 256, 3), uint8)`: each observation is a 3-dimensional array of 240 x 256 pixels (height x width) with 3 color channels (RGB), and every value is an unsigned 8-bit integer between 0 and 255.

Knowing the observation space tells you what input the agent receives, and therefore how the state of the environment has to be represented in the agent's internal model.
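As a rough illustration of what this shape implies, the sketch below unpacks the shape printed above and computes how many values a single frame holds. The variable names and the `is_valid_pixel` helper are hypothetical, and the code uses plain Python rather than the Gym API.

```python
# Shape and value range copied from the printed Box(0, 255, (240, 256, 3), uint8).
obs_shape = (240, 256, 3)
low, high = 0, 255

height, width, channels = obs_shape
values_per_frame = height * width * channels
# A uint8 value occupies one byte, so this is also the frame size in bytes.
bytes_per_frame = values_per_frame * 1


def is_valid_pixel(pixel):
    """Check that a pixel has one in-range integer per color channel."""
    return len(pixel) == channels and all(low <= v <= high for v in pixel)


print(values_per_frame)               # 184320
print(is_valid_pixel((255, 0, 128)))  # True
print(is_valid_pixel((256, 0, 0)))    # False
```

A single frame therefore carries 184,320 numbers, which is why image observations are often downsampled or converted to grayscale before being fed to a model.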

In [47]:
# checking the action space
print(env.action_space)


Discrete(7)


The action space describes the actions an agent can take in the environment. Here, `Discrete(7)` means there are seven actions, identified by the integers 0 through 6, one for each entry in `SIMPLE_MOVEMENT`.

In [48]:
env.reset()
# checking the environment
print(env.observation_space)
# checking the action space
print(env.action_space)

Box(0, 255, (240, 256, 3), uint8)
Discrete(7)


In [74]:
action = env.action_space.sample()
action

3

`env.action_space.sample()` draws a random action from the environment's action space. For a `Discrete` space the result is an integer that identifies the action; in the run above it returned 3.

By generating a random action, an agent can explore the environment and learn more about how different actions affect the state and reward. However, random actions are usually not very effective for achieving high scores or completing tasks, and more sophisticated algorithms and strategies are needed to achieve good performance.
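To make the behavior of sampling concrete, here is a minimal sketch of a discrete action space. `TinyDiscrete` is a hypothetical, simplified class written for this illustration; it is not Gym's `Discrete` implementation, and it assumes actions are sampled uniformly.

```python
import random


class TinyDiscrete:
    """Hypothetical, simplified stand-in for a Discrete(n) action space."""

    def __init__(self, n, seed=None):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self):
        # Pick one of the integers 0, 1, ..., n - 1 uniformly at random.
        return self._rng.randrange(self.n)

    def contains(self, action):
        return isinstance(action, int) and 0 <= action < self.n


space = TinyDiscrete(7, seed=0)
samples = [space.sample() for _ in range(1000)]
# Every sampled action is a valid index into a 7-action list.
print(all(space.contains(a) for a in samples))  # True
print(space.contains(7))                        # False
```

Because each draw ignores the current state of the game, a purely random policy explores but does not improve, which is the limitation described above.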

In [56]:
SIMPLE_MOVEMENT

[['NOOP'],
 ['right'],
 ['right', 'A'],
 ['right', 'B'],
 ['right', 'A', 'B'],
 ['A'],
 ['left']]
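Each integer in the action space is an index into this list, so the sampled action `3` above corresponds to `['right', 'B']`. The short sketch below makes that lookup explicit; the list is copied from the output above and `describe_action` is a hypothetical helper, not part of `gym_super_mario_bros`.

```python
# Copied from the SIMPLE_MOVEMENT output above.
SIMPLE_MOVEMENT = [
    ['NOOP'],
    ['right'],
    ['right', 'A'],
    ['right', 'B'],
    ['right', 'A', 'B'],
    ['A'],
    ['left'],
]


def describe_action(index):
    """Return a readable label such as 'right + A' for an action index."""
    return ' + '.join(SIMPLE_MOVEMENT[index])


for index in range(len(SIMPLE_MOVEMENT)):
    print(index, describe_action(index))

print(describe_action(3))  # right + B
```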

* Use the `reset()` function to reset the environment to its initial state.
* After setup, you can usually inspect the environment at any time by calling the `render()` function.

In [54]:
# reset the environment
env.reset()
# render the environment
env.render()

In [None]:
# Create a flag - restart or not
done = True
# Create a counter for the number of frames
frame = 0
# Create a counter for the number of episodes
episode = 0

try:
    # Loop through the episodes until the kernel is interrupted
    while True:
        # Restart the game
        if done:
            # Reset the environment
            state = env.reset()
            # Increment the episode counter
            episode += 1
        # Render the environment
        env.render()
        # Take a random action
        action = env.action_space.sample()
        # Step exactly once, then unpack according to the Gym API in use
        result = env.step(action)
        if len(result) == 5:
            # Newer Gym API: terminated and truncated are reported separately
            next_state, reward, terminated, truncated, info = result
            done = terminated or truncated
        else:
            # Classic Gym API: a single done flag
            next_state, reward, done, info = result
        # Increment the frame counter
        frame += 1
        # Print the number of frames and episodes
        print('Frames: %s, Episodes: %s' % (frame, episode), end='\r')
except KeyboardInterrupt:
    # Interrupting the kernel is the intended way to stop the loop
    pass
finally:
    # Close the environment
    env.close()


This cell runs a random agent in the environment created earlier. It initializes a `done` flag, a `frame` counter, and an `episode` counter, then loops until the kernel is interrupted. Whenever `done` is `True`, the environment is reset and the episode counter is incremented. On each iteration the environment is rendered with `env.render()`, a random action is drawn with `env.action_space.sample()`, and the action is applied with a single call to `env.step(action)`, which returns the next state, the reward, the end-of-episode signal, and an `info` dictionary. Classic Gym returns these as four values; newer Gym versions return five, with the end of the episode split into `terminated` and `truncated`, so the code handles both. The `frame` counter is then incremented and both counters are printed on one line. When the loop is interrupted, the `finally` clause closes the environment with `env.close()`.
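The control flow of this loop can be tried without the emulator. The sketch below replaces the game with `StubEnv`, a hypothetical stand-in that ends every episode after a fixed number of steps and uses the classic 4-tuple `step` layout; the reward rule is invented purely for the demo. Unlike the cell above, the loop is bounded by `max_frames` so that it terminates on its own.

```python
import random


class StubEnv:
    """Hypothetical stand-in for the Mario environment (classic 4-tuple step)."""

    def __init__(self, n_actions=7, episode_length=10):
        self.n_actions = n_actions
        self.episode_length = episode_length
        self.steps = 0
        self.closed = False

    def reset(self):
        self.steps = 0
        return 0  # dummy initial state

    def step(self, action):
        self.steps += 1
        # Invented reward rule: actions 1-4 (the "right" moves) earn a point.
        reward = 1.0 if action in (1, 2, 3, 4) else 0.0
        done = self.steps >= self.episode_length
        return self.steps, reward, done, {}

    def close(self):
        self.closed = True


def run_random_agent(env, max_frames, seed=0):
    """Run a random policy for max_frames steps and count episodes."""
    rng = random.Random(seed)
    done = True
    frame = 0
    episode = 0
    total_reward = 0.0
    try:
        while frame < max_frames:
            if done:
                env.reset()
                episode += 1
            action = rng.randrange(env.n_actions)
            _, reward, done, _ = env.step(action)
            total_reward += reward
            frame += 1
    finally:
        # Always release the environment, even if the loop raises.
        env.close()
    return frame, episode, total_reward


env_stub = StubEnv()
frames, episodes, total_reward = run_random_agent(env_stub, max_frames=25)
print(frames, episodes, env_stub.closed)  # 25 3 True
```

With episodes of 10 steps, 25 frames span two complete episodes and part of a third, so the episode counter ends at 3.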