The Pacman environment
----------------------
The Pacman is an agent that moves around a grid. When it enters a cell containing a breadcrumb, it "eats" the breadcrumb
and receives a positive reward. When all of the breadcrumbs have been eaten, the Pacman wins.

The Pacman moves 1 cell per timestep in one of 4 directions: up, down, left or right. All 4 sides of the
environment grid are lined with obstacles so that the agent cannot move out of the grid, and there are further obstacles
inside the grid. If the Pacman tries to move into an obstacle, it receives a negative reward and stays in the
cell it moved from. If the Pacman moves into an empty cell, it receives a small negative reward, which is an incentive to
complete the game as quickly as possible.
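
The movement and reward rules described above can be sketched as follows. This is only an illustration: the reward
values, the MOVES table and the move function are assumptions made for this example, not the values or names used in
the project's grid module. The cell symbols match the legend printed by the notebook.

```python
# Hypothetical reward values; the real ones are defined in the project code.
OBSTACLE, EMPTY, BREADCRUMB = "X", ".", "b"
REWARDS = {OBSTACLE: -10, EMPTY: -1, BREADCRUMB: 10}
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def move(grid, pos, action):
    """Return (new_pos, reward) for one move on a list-of-lists grid."""
    d_row, d_col = MOVES[action]
    row, col = pos[0] + d_row, pos[1] + d_col
    cell = grid[row][col]
    if cell == OBSTACLE:
        # Blocked: the agent stays where it was and is penalised.
        return pos, REWARDS[OBSTACLE]
    if cell == BREADCRUMB:
        grid[row][col] = EMPTY  # the breadcrumb is eaten
    return (row, col), REWARDS[cell]


# The border of obstacles means a move never indexes outside the grid.
grid = [list("XXXXX"),
        list("X.b.X"),
        list("XXXXX")]
print(move(grid, (1, 1), "up"))     # ((1, 1), -10)
print(move(grid, (1, 1), "right"))  # ((1, 2), 10)
```

Because every step into an empty cell costs a little, the shortest route that collects all breadcrumbs gives the highest
total reward.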

The whole algorithm is run from the cell below.

Before running this code, check config.py for the configurable parameters used by the algorithm.
These include the number of breadcrumbs, the locations of the breadcrumbs and the locations of the obstacles.
The Pacman is more likely to win if there are fewer breadcrumbs in the grid.
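
As a rough idea of what such a configuration might look like, here is a minimal sketch. The class HyperSketch, the
attribute names breadcrumbs and obstacles, the coordinates and the check_layout helper are all hypothetical; only
total_episodes, is_ghost and show_step are names that appear in this notebook, and where show_step actually lives is
an assumption. Consult config.py for the real definitions.

```python
class HyperSketch:
    """Hypothetical stand-in for the settings held in config.py."""
    total_episodes = 1000   # assumed value
    is_ghost = True         # include a ghost in the environment
    show_step = False       # suppress step-by-step grid output
    # Assumed (row, column) coordinates, for illustration only.
    breadcrumbs = [(1, 2), (2, 4), (3, 1)]
    obstacles = [(2, 2), (3, 3)]


def check_layout(cfg):
    """Return the breadcrumb count, or raise if a breadcrumb sits on an obstacle."""
    clash = set(cfg.breadcrumbs) & set(cfg.obstacles)
    if clash:
        raise ValueError(f"breadcrumbs placed on obstacles: {sorted(clash)}")
    return len(cfg.breadcrumbs)


print(check_layout(HyperSketch))  # 3
```

A check like this, run before training, catches a layout in which a breadcrumb can never be reached.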

The Pacman learns by completing a large number of episodes. If you intend to run many episodes, it is a
good idea to set the flag show_step = False, because the step-by-step output would otherwise run to too many lines.
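
During training, epsilon is updated after every timestep. The epsilon values in the sample output are consistent with a
multiplicative decay applied once per timestep, starting from 0.95 with a decay of 0.995 and a floor of 0.001. The
sketch below reproduces that schedule; decay_epsilon is a hypothetical helper written for this example and is not the
project's update_epsilon method, whose implementation is not shown here.

```python
def decay_epsilon(epsilon, decay=0.995, minimum=0.001):
    """Shrink epsilon by a constant factor, never going below the minimum."""
    return max(minimum, epsilon * decay)


epsilon = 0.95
# Episode lengths taken from the first two lines of the sample output.
for timesteps in (99, 43):
    for _ in range(timesteps):
        epsilon = decay_epsilon(epsilon)
    print(round(epsilon, 4))
# Prints 0.5784 and then 0.4662, matching the logged epsilon values.
```

Note that epsilon carries over from one episode to the next, so it keeps shrinking across the whole run rather than
being reset for each episode.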

The ghost provides stochasticity in the environment.

If the Pacman moves onto the ghost, the Pacman loses. Setting the flag Hyper.is_ghost to True ensures
there is a ghost in the environment.
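
One simple way a ghost could introduce randomness is to step to a randomly chosen neighbouring cell on each timestep.
The sketch below assumes exactly that; the actual ghost behaviour in the grid module may differ, and ghost_move and
pacman_lost are hypothetical names used only for this example.

```python
import random

# The four possible steps as (row, column) offsets.
STEPS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def ghost_move(grid, ghost_pos, rng=random):
    """Move the ghost to a random neighbouring cell that is not an obstacle."""
    row, col = ghost_pos
    options = [(row + dr, col + dc) for dr, dc in STEPS
               if grid[row + dr][col + dc] != "X"]
    # A ghost that is boxed in on all sides stays where it is.
    return rng.choice(options) if options else ghost_pos


def pacman_lost(pacman_pos, ghost_pos):
    """The episode is lost when both occupy the same cell."""
    return pacman_pos == ghost_pos


grid = [list("XXXXX"),
        list("X...X"),
        list("XXXXX")]
ghost = ghost_move(grid, (1, 2))
print(ghost in [(1, 1), (1, 3)])  # True
```

Because the ghost's position is not predictable, the same action from the same cell can lead to different outcomes,
which is what makes the environment stochastic.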

In [1]:
from grid import Pacman_grid
from config import Hyper, Constants


def print_legend():
    """Explain the symbols used when the grid is printed."""
    print("The grid is updated and printed for each step. In each cell you see the following symbols:")
    print("X - obstacle")
    print(". - empty")
    print("b - breadcrumb")
    print("A - Agent (Pacman)")
    print("G - Ghost")


print("\n"*10)
print("-"*100)
print("Start of environment design for Pacman")
Hyper.display()
print("-"*100)
print_legend()

pacman_grid = Pacman_grid()
for i in range(Hyper.total_episodes):
    pacman_grid.reset()
    done = False
    # Step until the episode ends, decaying epsilon after every timestep.
    while not done:
        if Hyper.is_ghost:
            done = pacman_grid.ghost_step(i)
        else:
            done = pacman_grid.step(i)
        pacman_grid.policy.update_epsilon()
    # Episodes are reported starting from 1.
    pacman_grid.print_episode_results(i + 1)
    pacman_grid.save_episode_stats()

pacman_grid.print_results()
print()
print_legend()
print("\n"*3)
print("-"*100)
Hyper.display()
print("End of environment design for Pacman")
print("-"*100)

----------------------------------------------------------------------------------------------------
Start of environment design for Pacman
The Hyperparameters
-------------------
Threshold for exploitation (epsilon) = 0.95
epsilon decay = 0.995
minimum value of epsilon = 0.001
learning rate (alpha) = 0.8
discount factor (gamma) = 0.51
total number of breadcrumbs 10
----------------------------------------------------------------------------------------------------
The grid is updated and printed for each step. In each cell you see the following symbols:
X - obstacle
. - empty
b - breadcrumb
A - Agent (Pacman)
G - Ghost
Completed environment after 1 episodes and 99 timesteps, total reward: -4543 with epsilon: 0.5783737835841118
You lost to the ghost!
Completed environment after 2 episodes and 43 timesteps, total reward: -1462 with epsilon: 0.4662309189661894
You lost to the ghost!
Completed environment after 3 episodes and 38 timesteps, total reward: -2088 with epsilon: 0.38

When all of the episodes are completed, look in the local images folder to view the results.