In [1]:
import sys
import numpy as np
import math
import random
import gym
import Pacman

## How the game is set up

Pacman's goal is to eat all the balls in the grid without dying. At the end of each episode, the total accumulated reward is reported together with max_reward, which is computed from the number of balls on the map. If Pacman hits a ghost, it dies, loses a large number of points, and returns to the starting position. The map is not reset, so balls that have already been eaten are not restored. In the original game this can happen at most three times, after which Pacman dies permanently and the map is reset. In this simplified version there is no limit on how many times Pacman can die and return to the starting position without the map being reset.
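
As a rough illustration of how max_reward could follow from the number of balls, here is a minimal sketch. It assumes that every interior cell that is not an obstacle, a ghost, or Pacman's starting cell holds exactly one ball worth step_cost. The function name `max_reward_estimate` is hypothetical and is not part of the Pacman environment, whose actual computation may differ.

```python
def max_reward_estimate(step_cost, n_obstacles, n_ghosts, grid_size=8):
    """Estimate the maximum reward, assuming one ball per free interior cell."""
    # The outer ring of the grid is wall, so only the interior can hold balls.
    interior_cells = (grid_size - 2) ** 2
    # Assumption: obstacles, ghosts and Pacman's starting cell hold no ball.
    balls = interior_cells - n_obstacles - n_ghosts - 1
    return balls * step_cost


# 6 interior obstacles and 5 ghosts on an 8x8 grid, as configured in this notebook
print(max_reward_estimate(step_cost=1, n_obstacles=6, n_ghosts=5))  # 24
```

Under these assumptions the 8x8 grid has 36 interior cells, of which 24 hold a ball.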

The grid has a fixed size of 8x8. You can define step_cost, ghosts, and obstacles. The environment is fully observable. Ghosts and walls are static. Pacman performs random actions. As shown in the guide on the gym website, the game can be played for "max_episodes" episodes, each made up of at most "max_iter_for_ep" iterations.
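
To make the layout concrete, the sketch below builds the 8x8 grid by hand from lists of obstacle and ghost coordinates. It is an illustration only, not the environment's own code: `build_grid` is a hypothetical helper, coordinates are assumed to be (row, column) pairs, and Pacman's starting cell (1, 1) and the 'Wal' / 'Gho' / 'Pac' / '0.0' labels are read off the grid printed by the environment.

```python
# Coordinates copied from this notebook's configuration cell, as (row, column).
OBSTACLES = [(2, 3), (3, 2), (3, 3), (4, 2), (5, 4), (5, 5)]
GHOSTS = [(1, 5), (3, 1), (3, 6), (4, 5), (6, 2)]


def build_grid(obstacles, ghosts, start=(1, 1), size=8):
    """Return a size x size grid of labels with a wall perimeter."""
    grid = [["0.0"] * size for _ in range(size)]
    # Perimeter walls are always present, so they are not listed in obstacles.
    for i in range(size):
        grid[0][i] = grid[size - 1][i] = "Wal"
        grid[i][0] = grid[i][size - 1] = "Wal"
    for row, col in obstacles:
        grid[row][col] = "Wal"
    for row, col in ghosts:
        grid[row][col] = "Gho"
    start_row, start_col = start  # assumption: Pacman starts in the top-left free cell
    grid[start_row][start_col] = "Pac"
    return grid


for line in build_grid(OBSTACLES, GHOSTS):
    print(" ".join(line))
```

Printing the result row by row gives the same arrangement of cells as the grid rendered by the environment at the start of an episode.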

In [2]:
# Hyperparameters
max_episodes=1
max_iter_for_ep=10
# step_cost is the reward for each ball; the negative reward for hitting a ghost is a multiple of it
step_cost = 1
# Perimeter obstacles do not need to be listed; they are set by default
# ostacoli = obstacles, given as (row, column) cells
ostacoli = [(2, 3),
            (3, 2), (3, 3),
            (4, 2),
            (5, 4), (5, 5)]
# fantasmi = ghosts, given as (row, column) cells
fantasmi = [(1, 5),
            (3, 1), (3, 6),
            (4, 5), 
            (6, 2)]

In [3]:
my_env = gym.make('PacManGame-v0', step_cost=step_cost, ghosts=fantasmi, obstacles=ostacoli)

for i in range(max_episodes):
    my_env.reset()
    print(my_env.descrEnv())
    my_env.render()
    for j in range(max_iter_for_ep):
        obs, reward, done, info = my_env.step(my_env.action_space.sample()) # take a random action
        print(info)
        my_env.render()
        if done:
            print("\nYou ate all the balls. You won!!")
            break
    # datiRew holds (total reward of the episode, maximum achievable reward)
    datiRew = my_env.ritValReward()
    print(f"\nEpisode Completed\nThe final reward of the episode number {i+1} is {datiRew[0]} out of a maximum of {datiRew[1]}")
    print("\n####################################################\n")

The environment grid is 8x8 in size and it is composed as follows:

[['Wal' 'Wal' 'Wal' 'Wal' 'Wal' 'Wal' 'Wal' 'Wal']
 ['Wal' 'Pac' '0.0' '0.0' '0.0' 'Gho' '0.0' 'Wal']
 ['Wal' '0.0' '0.0' 'Wal' '0.0' '0.0' '0.0' 'Wal']
 ['Wal' 'Gho' 'Wal' 'Wal' '0.0' '0.0' 'Gho' 'Wal']
 ['Wal' '0.0' 'Wal' '0.0' '0.0' 'Gho' '0.0' 'Wal']
 ['Wal' '0.0' '0.0' '0.0' 'Wal' 'Wal' '0.0' 'Wal']
 ['Wal' '0.0' 'Gho' '0.0' '0.0' '0.0' '0.0' 'Wal']
 ['Wal' 'Wal' 'Wal' 'Wal' 'Wal' 'Wal' 'Wal' 'Wal']]

Next selected action: down
Reward obtained from this action: 1.0

[['Wal' 'Wal' 'Wal' 'Wal' 'Wal' 'Wal' 'Wal' 'Wal']
 ['Wal' '0.0' '0.0' '0.0' '0.0' 'Gho' '0.0' 'Wal']
 ['Wal' 'Pac' '0.0' 'Wal' '0.0' '0.0' '0.0' 'Wal']
 ['Wal' 'Gho' 'Wal' 'Wal' '0.0' '0.0' 'Gho' 'Wal']
 ['Wal' '0.0' 'Wal' '0.0' '0.0' 'Gho' '0.0' 'Wal']
 ['Wal' '0.0' '0.0' '0.0' 'Wal' 'Wal' '0.0' 'Wal']
 ['Wal' '0.0' 'Gho' '0.0' '0.0' '0.0' '0.0' 'Wal']
 ['Wal' 'Wal' 'Wal' 'Wal' 'Wal' 'Wal' 'Wal' 'Wal']]

Next selected action: down
Reward obtained fro