# AA228/CS238 Optional Final Project: Escape Roomba

## Baseline Model II: Bumper Sensor Heuristic

In [1]:
# activate project environment
# include these lines of code in any future scripts/notebooks
#---
import Pkg
if !haskey(Pkg.installed(), "AA228FinalProject")
    jenv = joinpath(dirname(@__FILE__()), ".") # this assumes the notebook is in the same dir
    # as the Project.toml file, which should be in top level dir of the project. 
    # Change accordingly if this is not the case.
    Pkg.activate(jenv)
end
#---

"/Users/cbartolm/Desktop/Projects/CS238_Project/Project.toml"

In [2]:
# import necessary packages
using AA228FinalProject
using POMDPs
using POMDPPolicies
using BeliefUpdaters
using ParticleFilters
using POMDPSimulators
using Cairo
using Statistics
using Printf
using Gtk
using Random



## Define sensor and construct POMDP

In the following cell, we first instantiate a bumper sensor (```Bumper```), which indicates when any part of the Roomba has made contact with a wall.
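To make the bumper observation concrete, here is a minimal sketch of a binary contact check. It assumes, purely for illustration, a circular robot in an axis-aligned rectangular room; the function ```bumper_reading``` and its arguments are hypothetical and are not part of ```AA228FinalProject```, whose room geometry and contact test may differ.

```julia
# Toy bumper model (hypothetical, not the AA228FinalProject implementation):
# a circular robot of radius r inside the rectangle [xmin, xmax] x [ymin, ymax].
# The bumper fires when any part of the body touches a wall, i.e. when the
# robot's center is within r of at least one wall.
function bumper_reading(x::Real, y::Real, r::Real,
                        xmin::Real, xmax::Real, ymin::Real, ymax::Real)
    return (x - r <= xmin) || (x + r >= xmax) ||
           (y - r <= ymin) || (y + r >= ymax)
end

# A robot of radius 0.5 in a 10 x 10 room
println(bumper_reading(5.0, 5.0, 0.5, 0.0, 10.0, 0.0, 10.0))  # false: center of the room
println(bumper_reading(9.6, 5.0, 0.5, 0.0, 10.0, 0.0, 10.0))  # true: touching the right wall
```

The observation is a single Boolean, so it carries no information about where the robot is until a wall is actually touched.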

Next, we instantiate the MDP, which defines the underlying simulation environment under the assumption of full observability. The MDP takes many arguments to specify details of the problem. One argument we must specify here is ```config```, which can take the values 1, 2, or 3 and selects the room configuration; each configuration places the goal and the stairs in a different location.

The three configurations can be used to test how well the policy performs in three different environments; the cells below use ```config = 1```.

Finally, we instantiate the POMDP. The POMDP takes as an argument the underlying MDP as well as the sensor, which it uses to define the observation model. 

In [3]:
sensor = Bumper()
config = 1 # 1,2, or 3

m = RoombaPOMDP(sensor=sensor, mdp=RoombaMDP(config=config));

### Setting up a Particle Filter

First, we instantiate a resampler, which is responsible for updating the belief state given an observation. The first argument of either resampler (```BumperResampler``` here, or the lidar resampler used with the lidar sensor) is the number of particles that represent the belief state. The lidar resampler additionally takes a low-variance resampler, which is responsible for efficiently resampling a weighted set of particles.
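Low-variance (systematic) resampling can be sketched as follows. This is the standard textbook algorithm written from scratch for illustration; ```low_variance_resample``` is a hypothetical name, and the resamplers in ```ParticleFilters.jl``` and ```AA228FinalProject``` may be implemented differently. Instead of drawing n independent samples, it draws a single random offset and sweeps n evenly spaced points through the cumulative weights, which lowers the sampling variance and runs in linear time.

```julia
using Random

# Systematic ("low-variance") resampling of a weighted particle set.
function low_variance_resample(particles::AbstractVector{T}, weights::AbstractVector{<:Real},
                               n::Integer, rng::AbstractRNG) where {T}
    W = sum(weights)
    spacing = W / n               # distance between consecutive sample points
    r = rand(rng) * spacing       # single random offset in [0, spacing)
    out = Vector{T}(undef, n)
    i = 1
    c = weights[1]                # cumulative weight of particles 1..i
    for k in 1:n
        u = r + (k - 1) * spacing # k-th evenly spaced sample point
        # Advance to the particle whose cumulative-weight interval contains u;
        # the bound on i guards against floating-point round-off at the end.
        while u > c && i < length(weights)
            i += 1
            c += weights[i]
        end
        out[k] = particles[i]
    end
    return out
end

rng = MersenneTwister(1)
println(low_variance_resample([:a, :b, :c], [0.0, 1.0, 3.0], 4, rng))
# [:b, :c, :c, :c]: the zero-weight :a is never drawn, and :c gets 3 of 4 copies
```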

Next, we instantiate a ```SimpleParticleFilter```, which enables us to perform our belief updates.

Finally, we pass this particle filter into a custom struct called ```RoombaParticleFilter```, which takes two additional arguments: the noise coefficients for the velocity and the turn rate, which are used when propagating particles according to the action taken. These can be tuned depending on the type of sensor used.
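The role of the two noise coefficients can be illustrated with a toy propagation step. The sketch below assumes unicycle kinematics and zero-mean uniform noise on the commanded speed and turn rate; ```Pose``` and ```propagate``` are hypothetical, and the exact noise model inside ```RoombaParticleFilter``` may differ.

```julia
using Random, Statistics

# Hypothetical particle state: position (x, y) and heading th in radians.
struct Pose
    x::Float64
    y::Float64
    th::Float64
end

# Propagate one particle for a time step dt under commanded forward speed v
# and turn rate om, using simple unicycle kinematics.
# ASSUMPTION: each command is perturbed by zero-mean uniform noise whose
# width equals the corresponding coefficient.
function propagate(p::Pose, v, om, dt, v_noise_coeff, om_noise_coeff, rng::AbstractRNG)
    v_pert = v + v_noise_coeff * (rand(rng) - 0.5)
    om_pert = om + om_noise_coeff * (rand(rng) - 0.5)
    th = p.th + om_pert * dt
    return Pose(p.x + v_pert * cos(th) * dt, p.y + v_pert * sin(th) * dt, th)
end

# Spread of 1000 particles after one step from the same pose: a larger
# velocity coefficient produces a wider particle cloud along the heading.
rng = MersenneTwister(0)
origin = Pose(0.0, 0.0, 0.0)
for c in (0.5, 2.0, 8.0)
    xs = [propagate(origin, 4.0, 0.0, 0.5, c, 0.5, rng).x for _ in 1:1000]
    println("v_noise_coeff = ", c, ": std of x = ", round(std(xs), digits=3))
end
```

Larger coefficients keep the belief spread out (robust to unmodeled motion error but less precise); smaller ones let it collapse quickly.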

In [4]:
num_particles = 4000
resampler = BumperResampler(num_particles)

spf = SimpleParticleFilter(m, resampler)

v_noise_coefficient = 2.0
om_noise_coefficient = 0.5

belief_updater = RoombaParticleFilter(spf, v_noise_coefficient, om_noise_coefficient);

### Define a policy: a heuristic based on the bumper sensor

First, we create a mutable struct that subtypes the ```Policy``` abstract type defined in ```POMDPs.jl```. Its fields are counters that track which phase of the heuristic the Roomba is currently in.

Next, we define how the policy maps the current belief to an action by adding a new method to ```POMDPs.action``` for our policy type.
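The pattern used by ```ToEnd``` (mutable counters that select which phase of a piecewise heuristic is active) can be sketched without any POMDP packages. ```Act```, ```PhasePolicy```, and ```next_action!``` below are hypothetical stand-ins for ```RoombaAct```, ```ToEnd```, and the ```POMDPs.action``` method, and the bump reading is passed in directly instead of being read from a belief.

```julia
# Hypothetical stand-in for RoombaAct: forward speed and turn rate.
struct Act
    v::Float64
    om::Float64
end

# Hypothetical stand-in for ToEnd: counters that persist between calls.
mutable struct PhasePolicy
    phase::Int   # 0: drive to the first wall, 1: turn at the wall, 2: drive on
    turns::Int   # turns taken so far in phase 1
end

# Return the next action and update the counters in place.
function next_action!(pol::PhasePolicy, bumped::Bool)
    if pol.phase == 0
        bumped || return Act(4.0, 0.0)  # no contact yet: keep driving
        pol.phase = 1                    # first contact: switch phase
    end
    if pol.phase == 1
        if pol.turns < 3
            pol.turns += 1
            return Act(0.0, 1.0)         # turn in place
        end
        pol.phase = 2
    end
    return Act(4.0, 0.0)                 # phase 2: drive straight
end

pol = PhasePolicy(0, 0)
bumps = [false, false, true, true, true, true, false]
acts = [next_action!(pol, b) for b in bumps]
println([a.om for a in acts])  # [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```

Because the counters live in a mutable struct, a fresh instance must be created before each simulation, which is why the cells below call ```ToEnd(0,0,0,0,0)``` before every run.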

In [5]:
# Define the policy to test
mutable struct ToEnd <: Policy
    num_translation_1::Int64 # Translations taken after the first wall contact (capped at 9)
    num_translation_2::Int64 # Translations taken on the way to the next wall contact
    num_rot_1::Int64 # Steps taken so far, including the initial spin-in-place phase
    num_rot_2::Int64 # Rotations taken at the current wall contact
    num_collisions::Int64 # Phase counter driven by wall contacts;
    # selects which piece of the piecewise policy is active
end

# define a new function that takes in the policy struct and current belief,
# and returns the desired action
function POMDPs.action(p::ToEnd, b::ParticleCollection{RoombaState})
    
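    v = 4 # Constant forward speed used for every translation
    
    # Goal coordinates (computed but not currently used by the heuristic).
    # Note that this and the wall_contact checks below rely on the global POMDP `m`.
    goal_xy = get_goal_xy(m)
    goal_x, goal_y = goal_xy
    
    ## First action:
    # Spin in place during the first 25 time steps to localize.
    # With a bumper-only sensor this gains little information, since there
    # is no long-range measurement to localize against.
    
    if p.num_rot_1 < 25
        p.num_rot_1 += 1
        return RoombaAct(0., 1) 
    end
    p.num_rot_1 += 1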
    
    ## Second action:
    # Move until you reach a wall
    if p.num_collisions == 0 && !AA228FinalProject.wall_contact(m, particle(b, 1))
        return RoombaAct(v, 0)
    end
    
    # Record the first wall contact
    if p.num_collisions == 0 && AA228FinalProject.wall_contact(m, particle(b, 1))
        p.num_collisions = 1 
    end
    
    ## Third action:
    # Rotate in place at the first wall contact (5 steps: num_rot_2 = 0, ..., 4)
    if p.num_collisions == 1 && p.num_rot_2 <= 4
        p.num_rot_2 += 1
        return RoombaAct(0., 1)
    end
    
    ## Fourth action:
    # Then translate for 9 steps (num_translation_1 = 0, ..., 8)
    if p.num_collisions == 1 && p.num_rot_2 > 4 && p.num_translation_1 <= 8
        p.num_translation_1 += 1
        return RoombaAct(v, 0.)
    end
    
    
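    ## Fifth action:
    # After the translations, move on to the next phase.
    # NOTE: num_rot_2 is already greater than 4 here and is incremented rather
    # than reset, so the RoombaAct(0., -1) block below (which requires
    # num_rot_2 <= 3) never runs. Use `p.num_rot_2 = 0` instead if a turn
    # is intended at this point.
    if p.num_collisions == 1 && p.num_translation_1 > 8
        p.num_rot_2 += 1
        p.num_collisions = 2 
    end

    if p.num_collisions == 2 && p.num_rot_2 <= 3
        p.num_rot_2 += 1
        return RoombaAct(0., -1)
    end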
    
    ## Sixth action:
    # Drive straight until the next wall contact
    if p.num_collisions == 2 && p.num_rot_2 > 3 && !AA228FinalProject.wall_contact(m, particle(b, 1))
        p.num_translation_2 += 1
        return RoombaAct(v, 0.)
    end  
    
    # Record that contact and enter the final phase
    if p.num_collisions == 2 && AA228FinalProject.wall_contact(m, particle(b, 1))
        p.num_rot_2 = 0
        p.num_collisions = 3 
    end
    
    ## Final phase:
    # Drive straight while there is no wall contact
    if p.num_collisions == 3 && !AA228FinalProject.wall_contact(m, particle(b, 1))
        return RoombaAct(v, 0.)
    end
    
    # On contact, rotate in place for 3 steps (num_rot_2 = 0, 1, 2)
    if p.num_collisions == 3 && p.num_rot_2 <= 2 && AA228FinalProject.wall_contact(m, particle(b, 1))
        p.num_rot_2 += 1
        return RoombaAct(0., -1.1)
    end
    
    # Then reset the rotation counter and take one step forward
    if p.num_collisions == 3 && p.num_rot_2 > 2
        p.num_rot_2 = 0
        return RoombaAct(v, 0.)
    end   
    
end

### Simulation and rendering

Here, we will demonstrate how to seed the environment, run a simulation, and render the simulation. To render the simulation, we use the ```Gtk``` package. 

The simulation is carried out using the ```stepthrough``` function defined in the package ```POMDPSimulators.jl```. During a simulation, a window will open that renders the scene. It may be hidden behind other windows on your desktop.

In [7]:
# first seed the environment
Random.seed!(2)

# reset the policy
p = ToEnd(0,0,0,0,0) # all five counters start at 0, resetting the policy's internal state

# run the simulation
c = @GtkCanvas()
win = GtkWindow(c, "Roomba Environment", 600, 600)
for (t, step) in enumerate(stepthrough(m, p, belief_updater, max_steps=150))
    @guarded draw(c) do widget
        
        # the following lines render the room, the particles, and the roomba
        ctx = getgc(c)
        set_source_rgb(ctx,1,1,1)
        paint(ctx)
        render(ctx, m, step)
        
        # render some information that can help with debugging
        # here, we render the time-step, the state, and the observation
        move_to(ctx,300,400)
        show_text(ctx, @sprintf("t=%d, state=%s, o=%.3f",t,string(step.s),step.o))
    end
    show(c)
    sleep(0.1) # to slow down the simulation
end

### Specifying initial states and beliefs
If, for debugging purposes, you would like to hard-code a starting location or initial belief state for the roomba, you can do so by taking the following steps.

First, we define the initial state using the following line of code:
```
is = RoombaState(x,y,th,0.)
```
Here, ```x``` and ```y``` are the coordinates of the starting location and ```th``` is the heading in radians. The last entry, ```0.```, represents whether the state is terminal and should remain unchanged.

If you would like to initialize the Roomba's belief as perfectly localized, you can do so with the following line of code:
```
b0 = Deterministic(is)
```
If you would like to initialize the standard unlocalized belief, use these lines:
```
dist = initialstate_distribution(m)
b0 = initialize_belief(belief_updater, dist)
```
Finally, we call the ```stepthrough``` function using the initial state and belief as follows:
```
stepthrough(m, p, belief_updater, b0, is, max_steps=300)
```

### Evaluation 

Here, we demonstrate a simple evaluation of the policy's performance over several random seeds. This is meant to serve only as an example, and we encourage you to develop your own evaluation metrics.

We initialize the robot using 30 different random seeds and simulate each run for at most 150 time steps. We sum the rewards received during each interaction with the environment, record this total reward for each of the 30 trials, and also save the results to a .csv file.
Finally, we report the mean and standard error of the total reward. The standard error is the sample standard deviation divided by the square root of the number of samples, and represents the uncertainty in the estimate of the mean.
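As a worked example of this formula, the sketch below computes the mean and standard error for the first five total rewards printed by the evaluation cell below (rounded to one decimal place). ```standard_error``` is a hypothetical helper; it computes the same quantity as ```std(total_rewards)/sqrt(exps)``` in that cell.

```julia
using Statistics, Printf

# Standard error of the mean: sample standard deviation divided by the square
# root of the number of samples. Statistics.std uses the unbiased (n - 1)
# normalization by default.
standard_error(x) = std(x) / sqrt(length(x))

rewards = [-10.4, -1.8, -16.0, -20.6, -24.0]
@printf("mean = %.3f, stderr = %.3f\n", mean(rewards), standard_error(rewards))
# prints: mean = -14.560, stderr = 3.923
```

With only five samples the standard error is large relative to the mean, which is one reason the cell below uses 30 seeds.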

In [None]:
import Pkg; Pkg.add("DataFrames")

In [None]:
import Pkg; Pkg.add("CSV")

In [8]:
using DataFrames
using CSV

start = time()

method = "baseline_bumper"
df = DataFrame(num_experience = Int[], reward = String[])

total_rewards = []

exps = 30

for exp = 1:exps    
    
    Random.seed!(exp)
    
    p = ToEnd(0,0,0,0,0)
    traj_rewards = sum([step.r for step in stepthrough(m,p,belief_updater, max_steps=150)])
    
    println("Experience: ", string(exp), "----- Reward: ", traj_rewards)
    push!(total_rewards, traj_rewards)
    
    # Save in a dataframe to finally pull results in a .csv file
    push!(df, (exp, string(traj_rewards)))
    CSV.write(string(method, ".csv"), df)
end

elapsed = time() - start

print("The total time elapsed in seconds is: ", elapsed, "\n")
@printf("Mean Total Reward: %.3f, StdErr Total Reward: %.3f", mean(total_rewards), std(total_rewards)/sqrt(exps))



Experience: 1----- Reward: -10.400000000000004
Experience: 2----- Reward: -1.7999999999999972
Experience: 3----- Reward: -15.999999999999998
Experience: 4----- Reward: -20.6
Experience: 5----- Reward: -24.000000000000007
Experience: 6----- Reward: -23.599999999999998
Experience: 7----- Reward: -30.1
Experience: 8----- Reward: -9.500000000000016
Experience: 9----- Reward: -18.1
Experience: 10----- Reward: -3.899999999999997
Experience: 11----- Reward: -25.000000000000007
Experience: 12----- Reward: -23.00000000000001
Experience: 13----- Reward: -25.499999999999993
Experience: 14----- Reward: -5.800000000000002
Experience: 15----- Reward: 5.5
Experience: 16----- Reward: 3.700000000000003
Experience: 17----- Reward: -21.299999999999997
Experience: 18----- Reward: -23.000000000000007
Experience: 19----- Reward: -23.000000000000007
Experience: 20----- Reward: -23.000000000000007
Experience: 21----- Reward: -5.699999999999996
Experience: 22----- Reward: -24.000000000000007
Experience: 23----

### Hyperparameter analysis 

#### Influence of the particle-filter noise coefficients
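Each of the cells below repeats the same evaluation loop with a different belief updater. A minimal sketch of a helper that factors that loop out is shown here; ```evaluate_trials``` is hypothetical (not part of any package), and it seeds the global RNG with 1, ..., n just as the cells do. A stand-in trial function keeps the sketch runnable on its own.

```julia
using Random, Statistics

# Run n seeded trials and summarize them. `run_trial` is any zero-argument
# function returning the total reward of one trajectory.
function evaluate_trials(run_trial, n::Integer)
    rewards = Float64[]
    for seed in 1:n
        Random.seed!(seed)            # same seeding scheme as the cells below
        push!(rewards, run_trial())
    end
    return rewards, mean(rewards), std(rewards) / sqrt(n)
end

# Stand-in trial so the sketch runs on its own:
rewards, mu, se = evaluate_trials(() -> randn(), 30)

# With the notebook's objects in scope, a do-block passes the trial function:
# rewards, mu, se = evaluate_trials(30) do
#     sum(step.r for step in stepthrough(m, ToEnd(0,0,0,0,0), belief_updater_2, max_steps=150))
# end
```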

###### i) v_noise_coefficient = 8.0

In [6]:
using Statistics
using DataFrames
using CSV

v_noise_coefficient_2 = 8.0
om_noise_coefficient_2 = 0.5

belief_updater_2 = RoombaParticleFilter(spf, v_noise_coefficient_2, om_noise_coefficient_2);

start = time()

method = "baseline_bumper_v8.0"
df = DataFrame(num_experience = Int[], reward = String[])

total_rewards = []

exps = 30

for exp = 1:exps    
    
    Random.seed!(exp)
    
    p = ToEnd(0,0,0,0,0)
    traj_rewards = sum([step.r for step in stepthrough(m,p,belief_updater_2, max_steps=150)])
    
    println("Experience: ", string(exp), " ----- Reward: ", traj_rewards)
    push!(total_rewards, traj_rewards)
    
    # Save in a dataframe to finally pull results in a .csv file
    push!(df, (exp, string(traj_rewards)))
    CSV.write(string(method, ".csv"), df)
end

total_time = time() - start

print("The total execution time of the 30 experiments is: ", total_time, "\n")

@printf("Mean Total Reward: %.3f, StdErr Total Reward: %.3f", mean(total_rewards), std(total_rewards)/sqrt(exps))



Experience: 1 ----- Reward: -10.400000000000004
Experience: 2 ----- Reward: -1.7999999999999972
Experience: 3 ----- Reward: -15.999999999999998
Experience: 4 ----- Reward: -20.6
Experience: 5 ----- Reward: -24.000000000000007
Experience: 6 ----- Reward: -23.599999999999998
Experience: 7 ----- Reward: -30.1
Experience: 8 ----- Reward: -9.500000000000016
Experience: 9 ----- Reward: -18.1
Experience: 10 ----- Reward: -3.899999999999997
Experience: 11 ----- Reward: -25.000000000000007
Experience: 12 ----- Reward: -23.00000000000001
Experience: 13 ----- Reward: -25.499999999999993
Experience: 14 ----- Reward: -5.800000000000002
Experience: 15 ----- Reward: 5.5
Experience: 16 ----- Reward: 3.700000000000003
Experience: 17 ----- Reward: -21.299999999999997
Experience: 18 ----- Reward: -23.000000000000007
Experience: 19 ----- Reward: -23.000000000000007
Experience: 20 ----- Reward: -23.000000000000007
Experience: 21 ----- Reward: -5.699999999999996
Experience: 22 ----- Reward: -24.000000000000

###### ii) v_noise_coefficient = 0.5

In [7]:
using Statistics
using DataFrames
using CSV

v_noise_coefficient_3 = 0.5
om_noise_coefficient_3 = 0.5

belief_updater_3 = RoombaParticleFilter(spf, v_noise_coefficient_3, om_noise_coefficient_3);

start = time()

method = "baseline_bumper_v0.5"
df = DataFrame(num_experience = Int[], reward = String[])

total_rewards = []

exps = 30

for exp = 1:exps    
    
    Random.seed!(exp)
    
    p = ToEnd(0,0,0,0,0)
    traj_rewards = sum([step.r for step in stepthrough(m,p,belief_updater_3, max_steps=150)])
    
    println("Experience: ", string(exp), " ----- Reward: ", traj_rewards)
    push!(total_rewards, traj_rewards)
    
    # Save in a dataframe to finally pull results in a .csv file
    push!(df, (exp, string(traj_rewards)))
    CSV.write(string(method, ".csv"), df)
end

total_time = time() - start

print("The total execution time of the 30 experiments is: ", total_time, "\n")

@printf("Mean Total Reward: %.3f, StdErr Total Reward: %.3f", mean(total_rewards), std(total_rewards)/sqrt(exps))

Experience: 1 ----- Reward: -10.400000000000004
Experience: 2 ----- Reward: -1.7999999999999972
Experience: 3 ----- Reward: -15.999999999999998
Experience: 4 ----- Reward: -20.6
Experience: 5 ----- Reward: -24.000000000000007
Experience: 6 ----- Reward: -23.599999999999998
Experience: 7 ----- Reward: -30.1
Experience: 8 ----- Reward: -9.500000000000016
Experience: 9 ----- Reward: -18.1
Experience: 10 ----- Reward: -3.899999999999997
Experience: 11 ----- Reward: -25.000000000000007
Experience: 12 ----- Reward: -23.00000000000001
Experience: 13 ----- Reward: -25.499999999999993
Experience: 14 ----- Reward: -5.800000000000002
Experience: 15 ----- Reward: 5.5
Experience: 16 ----- Reward: 3.700000000000003
Experience: 17 ----- Reward: -21.299999999999997
Experience: 18 ----- Reward: -23.000000000000007
Experience: 19 ----- Reward: -23.000000000000007
Experience: 20 ----- Reward: -23.000000000000007
Experience: 21 ----- Reward: -5.699999999999996
Experience: 22 ----- Reward: -24.000000000000

###### iii) om_noise_coefficient = 2.5

In [8]:
using Statistics
using DataFrames
using CSV

v_noise_coefficient_4 = 2
om_noise_coefficient_4 = 2.5

belief_updater_4 = RoombaParticleFilter(spf, v_noise_coefficient_4, om_noise_coefficient_4);

start = time()

method = "baseline_bumper_om2.5"
df = DataFrame(num_experience = Int[], reward = String[])

total_rewards = []

exps = 30

for exp = 1:exps    
    
    Random.seed!(exp)
    
    p = ToEnd(0,0,0,0,0)
    traj_rewards = sum([step.r for step in stepthrough(m,p,belief_updater_4, max_steps=150)])
    
    println("Experience: ", string(exp), " ----- Reward: ", traj_rewards)
    push!(total_rewards, traj_rewards)
    
    # Save in a dataframe to finally pull results in a .csv file
    push!(df, (exp, string(traj_rewards)))
    CSV.write(string(method, ".csv"), df)
end

total_time = time() - start

print("The total execution time of the 30 experiments is: ", total_time, "\n")

@printf("Mean Total Reward: %.3f, StdErr Total Reward: %.3f", mean(total_rewards), std(total_rewards)/sqrt(exps))

Experience: 1 ----- Reward: -10.400000000000004
Experience: 2 ----- Reward: -1.7999999999999972
Experience: 3 ----- Reward: -15.999999999999998
Experience: 4 ----- Reward: -20.6
Experience: 5 ----- Reward: -24.000000000000007
Experience: 6 ----- Reward: -23.599999999999998
Experience: 7 ----- Reward: -30.1
Experience: 8 ----- Reward: -9.500000000000016
Experience: 9 ----- Reward: -18.1
Experience: 10 ----- Reward: -3.899999999999997
Experience: 11 ----- Reward: -25.000000000000007
Experience: 12 ----- Reward: -23.00000000000001
Experience: 13 ----- Reward: -25.499999999999993
Experience: 14 ----- Reward: -5.800000000000002
Experience: 15 ----- Reward: 5.5
Experience: 16 ----- Reward: 3.700000000000003
Experience: 17 ----- Reward: -21.299999999999997
Experience: 18 ----- Reward: -23.000000000000007
Experience: 19 ----- Reward: -23.000000000000007
Experience: 20 ----- Reward: -23.000000000000007
Experience: 21 ----- Reward: -5.699999999999996
Experience: 22 ----- Reward: -24.000000000000

###### iv) om_noise_coefficient = 0.2

In [9]:
using Statistics
using DataFrames
using CSV

v_noise_coefficient_5 = 2
om_noise_coefficient_5 = 0.2

belief_updater_5 = RoombaParticleFilter(spf, v_noise_coefficient_5, om_noise_coefficient_5);

start = time()

method = "baseline_bumper_om0.2"
df = DataFrame(num_experience = Int[], reward = String[])

total_rewards = []

exps = 30

for exp = 1:exps    
    
    Random.seed!(exp)
    
    p = ToEnd(0,0,0,0,0)
    traj_rewards = sum([step.r for step in stepthrough(m,p,belief_updater_5, max_steps=150)])
    
    println("Experience: ", string(exp), " ----- Reward: ", traj_rewards)
    push!(total_rewards, traj_rewards)
    
    # Save in a dataframe to finally pull results in a .csv file
    push!(df, (exp, string(traj_rewards)))
    CSV.write(string(method, ".csv"), df)
end

total_time = time() - start

print("The total execution time of the 30 experiments is: ", total_time, "\n")

@printf("Mean Total Reward: %.3f, StdErr Total Reward: %.3f", mean(total_rewards), std(total_rewards)/sqrt(exps))

Experience: 1 ----- Reward: -10.400000000000004
Experience: 2 ----- Reward: -1.7999999999999972
Experience: 3 ----- Reward: -15.999999999999998
Experience: 4 ----- Reward: -20.6


InterruptException:

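Same values! The total rewards are identical for every noise setting tried above, whether the velocity or the turn-rate coefficient is changed (the last run was interrupted after four trials, which also match). For the bumper baseline, the particle-filter noise coefficients therefore appear to make no difference. A likely explanation is that the policy reads the belief only through ```wall_contact(m, particle(b, 1))```, which after a bumper update should mostly agree with the actual bump observation, so the chosen actions, and hence the rewards, do not depend on how widely the particles are spread.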