# AI Exam

Consider the following environment:

<img src="images/road_env.jpg" style="zoom: 40%;"/>

The agent starts in cell $(0, 0)$ and must reach the goal in cell $(8,6)$. The agent can move in the four directions (except when a wall is present), and for each step taken the agent receives a negative reward.
In cells representing roads with intersections, the agent must wait for the traffic light to turn green before proceeding. At busy intersections (indicated by two traffic lights in the same cell), the agent will have to wait a long time to cross the intersection. This means that if the agent tries to move to another cell, the action may not succeed, causing the agent to remain in the same cell for an unknown amount of time.
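The waiting behaviour described above can be pictured as a "sticky" transition: the chosen move succeeds with some probability and otherwise leaves the agent where it is. The sketch below is only an illustration. The function `sticky_step` and the values in `SUCCESS_PROB` are hypothetical, since the real probabilities are hidden inside the environment and are exactly what the agent cannot see.

```python
import random

# Hypothetical success probabilities per cell type (assumed values, not the
# environment's real ones): 'R' plain road, 'Ts' less busy intersection,
# 'Tl' very busy intersection.
SUCCESS_PROB = {"R": 1.0, "Ts": 0.7, "Tl": 0.3}


def sticky_step(cell_type, current, intended, rng=random):
    """Return `intended` if the move succeeds, otherwise stay in `current`."""
    p = SUCCESS_PROB.get(cell_type, 1.0)
    return intended if rng.random() < p else current
```

Under these assumed numbers, leaving a `Tl` cell fails about 70% of the time, so every attempt that fails costs one more negative step reward.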

Assume that you have no access to the motion model or to the reward function, and that the problem is undiscounted. Find a solution for the environment shown above using a suitable algorithm of your choice, and motivate your choice.
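One reasonable choice under these assumptions is tabular Q-learning: it is model-free, so it learns only from sampled transitions and rewards, and setting `gamma = 1.0` matches the undiscounted setting. The sketch below is a minimal illustration, not the reference solution. `ToyCorridor` is a hypothetical stand-in environment (a 5-cell corridor with a chance of staying put), and it assumes the classic Gym convention that `step` returns `(state, reward, done, info)`. The hyperparameters are also assumptions.

```python
import numpy as np


class ToyCorridor:
    """Hypothetical stand-in: 5 cells in a row, start at 0, goal at cell 4."""
    n_states, n_actions = 5, 2  # actions: 0 = left, 1 = right

    def __init__(self, stay_prob=0.3, seed=0):
        self.stay_prob = stay_prob
        self.rng = np.random.default_rng(seed)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # With probability stay_prob the move fails and the agent stays put.
        if self.rng.random() >= self.stay_prob:
            delta = 1 if action == 1 else -1
            self.state = min(max(self.state + delta, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        return self.state, -1.0, done, {}  # -1 reward for every step taken


def q_learning(env, episodes=2000, alpha=0.1, gamma=1.0, epsilon=0.1,
               max_steps=200, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = np.random.default_rng(seed)
    q = np.zeros((env.n_states, env.n_actions))
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                a = int(rng.integers(env.n_actions))
            else:
                a = int(np.argmax(q[s]))
            s2, r, done, _ = env.step(a)
            # Off-policy TD target: bootstrap from the best next action,
            # with no bootstrap when the episode has ended.
            target = r if done else r + gamma * np.max(q[s2])
            q[s, a] += alpha * (target - q[s, a])
            s = s2
            if done:
                break
    return np.argmax(q, axis=1), q


policy, q = q_learning(ToyCorridor())
print(policy)
```

On the exam environment the same loop would presumably take its table sizes from `environment.observation_space.n` and `environment.action_space.n`, and the greedy policy returned at the end has the shape that the rendering cell expects.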

In [8]:
import os, sys 

module_path = os.path.abspath(os.path.join('tools'))
if module_path not in sys.path:
    sys.path.append(module_path)

import gym, envs
from utils.ai_lab_functions import *
import numpy as np
from timeit import default_timer as timer
from tqdm import tqdm

env_name = 'RoadEnv-v0'
env = gym.make(env_name)

env.render()

print("\nActions encoding: ", env.actions)

# Remember that you can look up the type of a cell whenever you need it
# by indexing the environment's grid with the state:
print("Cell type of start state: ", env.grid[env.startstate])
print("Cell type of goal state: ", env.grid[env.goalstate])
state = 15  # a very busy intersection
print(f"Cell type of cell {env.state_to_pos(state)}: ", env.grid[state])
state = 10  # a less busy intersection
print(f"Cell type of cell {env.state_to_pos(state)}: ", env.grid[state])

[['S' 'R' 'W' 'W' 'W' 'W' 'R' 'W' 'W']
 ['W' 'Ts' 'R' 'R' 'R' 'R' 'Tl' 'R' 'R']
 ['W' 'R' 'W' 'W' 'W' 'W' 'R' 'W' 'W']
 ['R' 'Ts' 'R' 'Ts' 'R' 'R' 'Ts' 'W' 'W']
 ['W' 'W' 'W' 'R' 'W' 'W' 'R' 'Ts' 'R']
 ['W' 'R' 'R' 'Tl' 'W' 'W' 'W' 'R' 'W']
 ['W' 'R' 'W' 'R' 'Ts' 'R' 'R' 'Tl' 'R']
 ['W' 'R' 'W' 'W' 'R' 'W' 'W' 'R' 'W']
 ['R' 'Ts' 'R' 'R' 'Tl' 'R' 'G' 'Ts' 'R']]

Actions encoding:  {0: 'L', 1: 'R', 2: 'U', 3: 'D'}
Cell type of start state:  S
Cell type of goal state:  G
Cell type of cell (1, 6):  Tl
Cell type of cell (1, 1):  Ts


In [9]:
# IMPLEMENT THIS FUNCTION. YOU CAN CHANGE THE FUNCTION'S PARAMETERS IF THIS IS USEFUL.
def my_solution(environment):
    # Placeholder: picks a uniformly random action for every state.
    return np.random.choice(environment.action_space.n, environment.observation_space.n)

In [10]:
t = timer()

solution = my_solution(env)
print(f"Execution time: {round(timer() - t, 4)}s") 
solution_render = np.vectorize(env.actions.get)(solution.reshape(env.shape))
print(solution_render)

Execution time: 0.0015s
[['L' 'D' 'U' 'R' 'U' 'R' 'L' 'U' 'R']
 ['U' 'D' 'L' 'R' 'R' 'L' 'D' 'D' 'U']
 ['R' 'U' 'R' 'U' 'L' 'R' 'R' 'U' 'L']
 ['D' 'D' 'L' 'D' 'U' 'U' 'D' 'R' 'R']
 ['R' 'D' 'L' 'U' 'D' 'D' 'U' 'L' 'L']
 ['D' 'U' 'D' 'L' 'D' 'U' 'L' 'D' 'R']
 ['U' 'U' 'D' 'U' 'L' 'U' 'R' 'R' 'L']
 ['L' 'L' 'L' 'L' 'L' 'L' 'R' 'D' 'D']
 ['D' 'L' 'L' 'U' 'L' 'D' 'D' 'D' 'L']]
