# Training an Agent to Solve a MiniGrid Environment

This notebook runs an agent in the `MiniGrid-SimpleCrossingS9N1-v0` environment. The agent does no learning here: it takes uniformly random actions until the mission is complete, and we then display the final reward and the sequence of actions it took.

## Import Libraries

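We start by importing the necessary libraries. The `gymnasium` library is used for creating and interacting with the environment, `minigrid` registers the MiniGrid environments with `gymnasium`, and the `random` library is used to select random actions.

In [6]:
import gymnasium as gym
import minigrid  # registers the MiniGrid-* environment IDs (needed on recent gymnasium versions)
import random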

## Create the Environment

We create the `MiniGrid-SimpleCrossingS9N1-v0` environment using the `gymnasium` library. This environment involves navigating a grid with some obstacles.

In [2]:
# Create the environment (pass render_mode='human' to watch the agent in a window)
env = gym.make('MiniGrid-SimpleCrossingS9N1-v0')

## Reset the Environment

We reset the environment with a fixed seed so that the grid layout is the same on every run. `reset` returns the initial observation and an info dictionary, which we discard.

In [3]:
# Reset the environment to get an initial observation
obs, _ = env.reset(seed=46)
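
Note that the seed passed to `reset` only fixes the environment's own randomness. The actions in this notebook are drawn with Python's `random` module, which is seeded separately, so the action sequence differs from run to run. A minimal sketch of making the action draws repeatable as well, assuming a dedicated `random.Random` instance and a placeholder list of action names (`ACTION_NAMES` and `sample_actions` are hypothetical helpers, not part of MiniGrid):

```python
import random

# Placeholder action names, mirroring those printed in this notebook's output.
ACTION_NAMES = ["left", "right", "forward", "pickup", "drop", "toggle", "done"]

def sample_actions(seed, n):
    """Draw n random action names from a private, seeded generator."""
    rng = random.Random(seed)  # independent of the global random state
    return [rng.choice(ACTION_NAMES) for _ in range(n)]

first = sample_actions(seed=46, n=10)
second = sample_actions(seed=46, n=10)
print(first == second)  # same seed, same sequence
```

Using a private generator rather than calling `random.seed` keeps the action draws unaffected by any other code that touches the global random state.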

## Take Random Actions Until the Mission Is Complete

We initialize an empty list to store the sequence of actions taken by the agent. We then loop until the mission is complete, performing random actions and appending them to the action sequence.

In [4]:
# Initialize the sequence of actions
action_sequence = []

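# Loop until the mission is complete (the agent reaches the goal)
terminated = False
while not terminated:

    # Perform a random action
    action = random.choice(list(env.get_wrapper_attr('actions')))

    # step() returns (obs, reward, terminated, truncated, info).
    # The truncated flag (step limit reached) is ignored here, so the
    # loop keeps stepping past the step limit until the goal is reached.
    _, reward, terminated, _, _ = env.step(action)
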
    # Append the action to the sequence
    action_sequence.append(action)
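
The loop above only checks `terminated`, so it keeps stepping even after the environment reports `truncated` (the step limit). A common alternative is to end the episode on either flag. The sketch below illustrates that pattern against a tiny stand-in environment. `StubEnv` and `run_random_episode` are hypothetical names used for illustration, and only the five-value `step` return mirrors the Gymnasium API:

```python
import random

class StubEnv:
    """Hypothetical stand-in that mimics Gymnasium's five-value step() return."""

    def __init__(self, max_steps=20, goal_action="forward"):
        self.max_steps = max_steps
        self.goal_action = goal_action
        self.step_count = 0
        self.progress = 0

    def step(self, action):
        self.step_count += 1
        if action == self.goal_action:
            self.progress += 1
        terminated = self.progress >= 5                # "goal reached"
        truncated = self.step_count >= self.max_steps  # step limit hit
        reward = 1.0 if terminated else 0.0
        return None, reward, terminated, truncated, {}

def run_random_episode(env, actions, rng):
    """Take random actions until the episode terminates OR is truncated."""
    history = []
    terminated = truncated = False
    reward = 0.0
    while not (terminated or truncated):
        action = rng.choice(actions)
        _, reward, terminated, truncated, _ = env.step(action)
        history.append(action)
    return reward, terminated, truncated, history

rng = random.Random(0)
reward, terminated, truncated, history = run_random_episode(
    StubEnv(max_steps=20), ["left", "right", "forward"], rng
)
print(len(history), terminated, truncated)
```

With this stopping rule an episode can never run longer than the step limit, at the cost of sometimes ending before the goal is reached.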

## Display the Results

The loop above has already run to completion, so this cell only reports the outcome: the reward returned by the final step, followed by every action the agent took, in order.
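
The reward this run prints is negative, which may look surprising for a completed mission. It is consistent with MiniGrid's success reward, `1 - 0.9 * (step_count / max_steps)`, being evaluated after more steps than the step limit, which can happen because the loop ignores truncation. The sketch below reproduces the arithmetic. The formula and the default limit of `4 * size**2` steps are assumptions to check against the installed MiniGrid version, and `success_reward` is a hypothetical helper:

```python
def success_reward(step_count, max_steps):
    """Assumed MiniGrid-style reward on reaching the goal."""
    return 1 - 0.9 * (step_count / max_steps)

size = 9                    # the "S9" in the environment name
max_steps = 4 * size ** 2   # assumed default step limit: 324

print(success_reward(100, max_steps))  # within the limit: positive
print(success_reward(434, max_steps))  # past the limit: negative
```

Under these assumptions, an episode of 434 steps gives a reward of roughly -0.2056, which matches the value in the output of this run.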

In [5]:
# Printing reward of agent
print("reward:", reward)

# Printing action sequence to complete the task
for action in action_sequence:
    print("Action sequence:", action.name)

reward: -0.2055555555555555
Action sequence: drop
Action sequence: done
Action sequence: forward
Action sequence: left
Action sequence: done
Action sequence: right
Action sequence: left
Action sequence: pickup
Action sequence: drop
Action sequence: forward
Action sequence: left
Action sequence: forward
Action sequence: pickup
Action sequence: pickup
Action sequence: pickup
Action sequence: done
Action sequence: forward
Action sequence: done
Action sequence: toggle
Action sequence: pickup
Action sequence: forward
Action sequence: left
Action sequence: left
Action sequence: forward
Action sequence: right
Action sequence: left
Action sequence: toggle
Action sequence: pickup
Action sequence: left
Action sequence: toggle
Action sequence: done
Action sequence: drop
Action sequence: left
Action sequence: drop
Action sequence: drop
Action sequence: drop
Action sequence: right
Action sequence: pickup
Action sequence: drop
Action sequence: drop
Action sequence: left
Action sequence: forward
Acti