# Problem set 4

## Admissible heuristics for A*
The heuristic value of every cell must be **smaller than or equal to** the true number of steps from that cell to the goal.  
This means that the heuristic matrix `h` may be all zeros, in which case A* gets no guidance and behaves like uniform-cost search. If a value overestimates the remaining steps, that cell is expanded later than it should be, and A* may return a path that is not optimal.
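
As a rough illustration of the admissibility condition, the sketch below compares a heuristic matrix against exact step counts obtained by breadth-first search from the goal. The helper names (`true_steps_to_goal`, `is_admissible`, `manhattan`) are made up for this example, and it assumes a 4-connected grid with unit step cost where `0` marks a free cell.

```python
from collections import deque


def true_steps_to_goal(grid, goal):
    """Exact step counts to the goal via breadth-first search (None = unreachable)."""
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0
    queue = deque([(goal[0], goal[1])])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (0, -1), (1, 0), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and dist[nr][nc] is None):
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def is_admissible(h, grid, goal):
    """True if h never overestimates the step count on any reachable cell."""
    dist = true_steps_to_goal(grid, goal)
    return all(
        h[r][c] <= dist[r][c]
        for r in range(len(grid))
        for c in range(len(grid[0]))
        if dist[r][c] is not None
    )


def manhattan(grid, goal):
    """Manhattan distance to the goal, ignoring obstacles."""
    return [[abs(r - goal[0]) + abs(c - goal[1]) for c in range(len(grid[0]))]
            for r in range(len(grid))]


grid = [[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
goal = [0, 3]

zeros = [[0] * 4 for _ in range(4)]
doubled = [[2 * v for v in row] for row in manhattan(grid, goal)]

print(is_admissible(zeros, grid, goal))                  # True
print(is_admissible(manhattan(grid, goal), grid, goal))  # True
print(is_admissible(doubled, grid, goal))                # False
```

The all-zero matrix and the Manhattan distance both pass, while doubling the Manhattan distance overestimates the cell next to the goal (2 > 1) and fails.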

## Stochastic motion
In practice, you don't necessarily want a robot to follow the very shortest path, as this means moving very tightly around obstacles, and robots are not infallible. It's safer to have the robot keep a little more distance from obstacles.  
To do this, we model the robot's actions as stochastic. The planner then accounts for the chance that the robot makes a wrong move _(e.g. it turns left instead of going straight)_ and prefers paths that keep clear of obstacles _(here, paths that do not hug the left wall)_.
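
One way to picture this motion model is as a small probability table per commanded heading. The sketch below is only illustrative: `outcome_distribution` is a hypothetical helper, and it assumes the robot either moves as commanded or veers 90 degrees to one side, with the failure probability split evenly between the two sides.

```python
# Headings in counter-clockwise order: up, left, down, right.
delta_name = ['^', '<', 'v', '>']


def outcome_distribution(ind, success_prob):
    """List (heading, probability) pairs for commanding heading index `ind`."""
    failure_prob = (1.0 - success_prob) / 2.0
    return [
        (delta_name[(ind - 1) % 4], failure_prob),  # veers to the right
        (delta_name[ind], success_prob),            # moves as commanded
        (delta_name[(ind + 1) % 4], failure_prob),  # veers to the left
    ]


# Commanding "up" with a 50% success rate.
print(outcome_distribution(0, 0.5))  # [('>', 0.25), ('^', 0.5), ('<', 0.25)]
```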

### Computing the value for stochastic motion
```
value = (
    (p_turn_left * cost_turn_left) +
    (p_go_straight * cost_go_straight) +
    (p_turn_right * cost_turn_right)
)
```
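Each `cost_*` term is the value of the cell that outcome lands in. An outcome that runs into an obstacle or off the grid is assigned a collision cost much higher than the normal step cost, which steers the policy away from collisions. The value of a cell is then the step cost plus this expectation, minimized over the four possible headings.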
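
To make the arithmetic concrete, here is a small sketch that evaluates two headings for the cell directly below the goal, using the converged neighbor values from the expected output at the end of this section. The function name `expected_action_value` is invented for this example; the parameters (`success_prob = 0.5`, `cost_step = 1`, `collision_cost = 1000`) are the ones used in the test case.

```python
def expected_action_value(outcome_values, success_prob, cost_step):
    """Step cost plus the probability-weighted value of the three outcomes.

    outcome_values = (veer one side, intended move, veer other side).
    """
    side_a, intended, side_b = outcome_values
    failure_prob = (1.0 - success_prob) / 2.0
    return (failure_prob * side_a
            + success_prob * intended
            + failure_prob * side_b
            + cost_step)


collision_cost = 1000
goal_value = 0
left_cell = 183.69314862430264   # value of the cell to the left
below_cell = 335.3944132514738   # value of the cell below

# Heading up (straight at the goal): veering right leaves the grid.
cost_up_heading = expected_action_value(
    (collision_cost, goal_value, left_cell), success_prob=0.5, cost_step=1)

# Heading left: veering up reaches the goal, veering down stays on the grid.
cost_left_heading = expected_action_value(
    (goal_value, left_cell, below_cell), success_prob=0.5, cost_step=1)

print(cost_up_heading)    # about 296.92
print(cost_left_heading)  # about 176.70
```

Heading left is cheaper even though it does not point at the goal, because no outcome of that action can collide with the grid boundary. This agrees with the `'<'` policy entry and the value of about 176.695 listed for that cell.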

```python
# --------------
# USER INSTRUCTIONS
#
# Write a function called stochastic_value that
# returns two grids. The first grid, value, should
# contain the computed value of each cell as shown
# in the video. The second grid, policy, should
# contain the optimum policy for each cell.

delta = [[-1, 0 ], # go up
         [ 0, -1], # go left
         [ 1, 0 ], # go down
         [ 0, 1 ]] # go right

delta_name = ['^', '<', 'v', '>'] # Use these when creating your policy grid.


def stochastic_value(grid, goal, cost_step, collision_cost, success_prob):
    # Probability(stepping left) = prob(stepping right) = failure_prob
    failure_prob = (1.0 - success_prob) / 2.0

    policy = [[' ' for col in range(len(grid[0]))] for row in range(len(grid))]

    # value function, initialized to the worst case for every cell
    value = [[collision_cost for col in range(len(grid[0]))] for row in range(len(grid))]

    # sweep the grid until no cell value improves any further
    change = True
    while change:
        change = False
        for x in range(len(grid)):
            for y in range(len(grid[0])):
                if x == goal[0] and y == goal[1]:
                    if value[x][y] > 0:
                        value[x][y] = 0
                        policy[x][y] = '*'
                        change = True
                elif grid[x][y] == 0:
                    for ind, d in enumerate(delta):
                        # value of each possible outcome of this action:
                        # veer to one side, move as intended, veer to the other side
                        g = []
                        for motion in [-1, 0, 1]:
                            x_motion = x + delta[(ind + motion) % 4][0]
                            y_motion = y + delta[(ind + motion) % 4][1]

                            # check for collisions with walls
                            if (
                                x_motion < 0 or y_motion < 0
                                or x_motion >= len(grid) or y_motion >= len(grid[0])
                                ):
                                g.append(collision_cost)
                            # check for collisions with obstacles
                            elif grid[x_motion][y_motion] != 0:
                                g.append(collision_cost)
                            else:
                                g.append(value[x_motion][y_motion])

                        # expected value over the three outcomes
                        cost = (failure_prob * g[0]) + (success_prob * g[1]) + (failure_prob * g[2])

                        # coordinates in case of correct motion
                        x2 = x + d[0]
                        y2 = y + d[1]
                        if 0 <= x2 < len(grid) and 0 <= y2 < len(grid[0]):
                            f2 = cost + cost_step
                            if value[x][y] > f2:
                                value[x][y] = f2
                                policy[x][y] = delta_name[ind]
                                change = True
    return value, policy
```


```python
grid = [[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
goal = [0, len(grid[0])-1] # Goal is in top right corner
cost_step = 1
collision_cost = 1000
success_prob = 0.5
```

```python
value, policy = stochastic_value(grid, goal, cost_step, collision_cost, success_prob)
for row in value:
    print(row)
for row in policy:
    print(row)

# Expected outputs:
#
# [471.9397246855924, 274.85364957758316, 161.5599867065471, 0]
# [334.05159958720344, 230.9574434590965, 183.69314862430264, 176.69517762501977]
# [398.3517867450282, 277.5898270101976, 246.09263437756917, 335.3944132514738]
# [700.1758933725141, 1000, 1000, 668.697206625737]
#
# ['>', 'v', 'v', '*']
# ['>', '>', '^', '<']
# ['>', '^', '^', '<']
# ['^', ' ', ' ', '^']
```

```
[471.9397246855924, 274.85364957758316, 161.5599867065471, 0]
[334.05159958720344, 230.9574434590965, 183.69314862430264, 176.69517762501977]
[398.3517867450282, 277.5898270101976, 246.09263437756917, 335.3944132514738]
[700.1758933725141, 1000, 1000, 668.697206625737]
['>', 'v', 'v', '*']
['>', '>', '^', '<']
['>', '^', '^', '<']
['^', ' ', ' ', '^']
```
