# Challenge Assignment
## Autonomous Vehicle Braking

## CSCI E-82A

>**Make sure** you include your name along with the name of your team and team members in the notebook you submit.

**Your name and team name here:** 

## Introduction

As is typically the case with robots, autonomous vehicles use multiple sensors. Agent actions rely on integrating the percepts of these sensors. A number of methods are used to integrate uncertain information from sensors; directed graphical models are a powerful and flexible approach to sensor integration.

Another difficulty for autonomous vehicles, and many other robotics problems, is uncertainty about the environment. In the case of autonomous vehicles, the uncertainty can include road conditions, visibility conditions, and the actions of human drivers and pedestrians.

Directed graphical models provide a powerful representation for reasoning with uncertain sensor data and uncertainty in the environment. In this challenge you will perform learning and inference on a directed graphical model of braking for an autonomous vehicle. The goal of the agent is to control the braking of the autonomous vehicle to avoid collisions.

The control of an actual autonomous vehicle is extremely complicated. Autonomous vehicles use many task-specific sensors. Further, any useful model has a large number of variables, many with complex continuous distribution models (e.g., mixture models). For this challenge, the number of sensors and variables has been limited. In addition, all variables are binary, so every distribution is a simple Binomial distribution.

A practical autonomous vehicle would have a very low probability of collision with an object in its path, e.g., $p(collision) < 10^{-7}$. Since such small probabilities are hard to work with, the probabilities of sensor errors and collisions in this challenge are unrealistically high.

### Description of Problem

A Directed Acyclic Graph (DAG) of the autonomous vehicle braking decision model is shown below. There are 11 variables, two utility functions and one decision node.

<img src="BrakingDAG.JPG" alt="Drawing" style="width:600px; height:400px"/>
<center> DAG for autonomous vehicle braking control problem </center>

#### Variables  

The variables for the joint probability distribution are:

1. **Road Condition** is the condition of the road surface; 0 = good, 1 = slippery, e.g., wet or icy.
2. **Weather Visibility** is the optical (visual) visibility for the visual sensor; 0 = good, 1 = poor, e.g., rain or fog.
3. **Light Dark** is the lighting conditions for the road ahead; 0 = good, 1 = poor or dark.
4. **Object** indicates an object in the vehicle's path; 0 = no object, 1 = object in path.
5. **Road Condition Detection** is the conditional probability of the reading (percept) from a sensor that determines road condition, given the Road Condition variable; 0 = good, 1 = slippery, e.g., wet or icy.
6. **Weather Detection** is the conditional probability of the reading (percept) from a sensor that determines weather visibility, given the Weather Visibility variable; 0 = good, 1 = poor, e.g., rain or fog.
7. **Visual Sensor Detection** is the conditional probability that the visual sensors see an object in the vehicle's path, or sense a non-existent object (false positive), given the Weather Visibility, Light Dark and Object variables; 0 = no object, 1 = object in path.
8. **LIDAR Sensor Detection** is the conditional probability that the LIDAR sensor sees an object in the vehicle's path, or senses a non-existent object (false positive), given the Object variable; 0 = no object, 1 = object in path. LIDAR uses infrared lasers for imaging and ranging. Compared with a visual (optical) sensor, LIDAR is much less affected by rain and fog but has lower resolution.
9. **Sensor Detection** is the integrated posterior distribution of an object being in the vehicle's path, or of sensing a non-existent object (false positive), given the Weather Detection, Light Dark, Visual Sensor Detection and LIDAR Sensor Detection variables; 0 = no object, 1 = object in path.
10. **Early Braking** (`Early_Breaking` in the dataset) is the conditional probability that the autonomous vehicle should apply the brakes early to avoid a collision, given the Road Condition Detection and Sensor Detection variables; 0 = normal braking, 1 = apply early braking. Early braking should reduce the chance of a collision but incurs a cost in delay to the vehicle and to other traffic.
11. **Collision** is the conditional probability of the vehicle colliding with an object given the Object, Road Condition Detection and Sensor Detection variables; 0 = no collision, 1 = collision.
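The parent sets described in the list above can be recorded as a plain Python dictionary, which makes it easy to build the edge list for the model and to see how many columns each CPD needs. This is a sketch: the underscore-separated names are assumed from the pseudo-count cell later in the notebook (note `LIDAR_Sensor` rather than `LIDAR_Sensor_Detection`), and both the names and the edges should be checked against the CSV header and the DAG figure.

```python
# Parents of each variable, as described in the Variables list.
# Names are assumed to match the dataset columns; verify against the CSV header.
PARENTS = {
    'Road_Condition': [],
    'Weather_Visibility': [],
    'Light_Dark': [],
    'Object': [],
    'Road_Condition_Detection': ['Road_Condition'],
    'Weather_Detection': ['Weather_Visibility'],
    'Visual_Sensor_Detection': ['Weather_Visibility', 'Light_Dark', 'Object'],
    'LIDAR_Sensor': ['Object'],
    'Sensor_Detection': ['Weather_Detection', 'Light_Dark',
                         'Visual_Sensor_Detection', 'LIDAR_Sensor'],
    'Early_Breaking': ['Road_Condition_Detection', 'Sensor_Detection'],
    'Collision': ['Object', 'Road_Condition_Detection', 'Sensor_Detection'],
}

# Directed (parent, child) edges: the list form accepted by pgmpy's BayesianModel.
EDGES = [(parent, child) for child, parents in PARENTS.items() for parent in parents]

# Every variable is binary, so each CPD has 2 rows and 2**(number of parents) columns.
CPD_COLUMNS = {child: 2 ** len(parents) for child, parents in PARENTS.items()}

print(len(PARENTS), 'variables,', len(EDGES), 'edges')
print(CPD_COLUMNS)
```

The column counts agree with the widths of the pseudo-count arrays given for Bayesian estimation (16 for `Sensor_Detection`, 8 for `Collision`, 4 for `Early_Breaking`), which is a useful consistency check on the parent sets.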

#### Utility Functions

The utility functions for this problem are:   

- Utility of applying early braking:

|  | No Early Braking | Early Braking |
|----|----|----|
|Utility | 0 | -1 |


- Utility of collision: 

|  | No Collision | Collision |
|----|----|----|
|Utility | 0 | -1000 |

Total utility for this problem is the sum of the Early Braking utility and the Collision utility.
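Because the total utility is additive, its expected value under a joint distribution over Early Braking and Collision (the kind of result that queries 1 to 4 below return) is the probability-weighted sum of the two utilities. A minimal sketch, assuming the joint is available as a 2×2 nested list indexed `[early_braking][collision]`; the probabilities in the example are made up for illustration and are not results from this dataset.

```python
# Utility tables from the problem statement, indexed by state (0 or 1).
U_EARLY_BRAKING = [0.0, -1.0]    # no early braking, early braking
U_COLLISION = [0.0, -1000.0]     # no collision, collision

def expected_total_utility(joint):
    """Expected utility given joint[eb][c] = P(Early_Breaking=eb, Collision=c)."""
    total = sum(joint[eb][c] for eb in (0, 1) for c in (0, 1))
    if abs(total - 1.0) > 1e-9:
        raise ValueError('joint probabilities must sum to 1')
    # Each joint state contributes the sum of its two utilities, weighted by its probability.
    return sum(joint[eb][c] * (U_EARLY_BRAKING[eb] + U_COLLISION[c])
               for eb in (0, 1) for c in (0, 1))

# Illustrative (made-up) joint distribution.
example = [[0.60, 0.01],
           [0.38, 0.01]]
print(expected_total_utility(example))
```

Since the utilities add, the same number can be obtained from the marginals alone as $-1 \cdot p(\text{Early Braking}=1) - 1000 \cdot p(\text{Collision}=1)$.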

#### Decision Node

There is one decision node in this problem, Early Braking. This decision node is implemented as evidence for the `Early_Breaking` variable; 0 = normal braking, 1 = early braking.
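Since the decision enters the model as evidence on `Early_Breaking`, one way to choose it is to query the probability of a collision once for each value of that evidence and keep the value with the higher expected utility. The sketch below assumes those two probabilities have already been obtained from queries; the function name is illustrative and the numbers in the example are hypothetical.

```python
def best_braking_decision(p_collision_given_eb):
    """Return (decision, expected_utilities) maximising expected total utility.

    p_collision_given_eb[eb] is P(Collision = 1 | Early_Breaking = eb, other evidence).
    """
    utilities = {}
    for eb, p_c in p_collision_given_eb.items():
        # Early braking (eb = 1) costs 1 unit; a collision costs 1000 units.
        utilities[eb] = -1.0 * eb - 1000.0 * p_c
    decision = max(utilities, key=utilities.get)
    return decision, utilities

# Hypothetical collision probabilities, for illustration only.
decision, eu = best_braking_decision({0: 0.020, 1: 0.005})
print(decision, eu)
```

With these made-up numbers early braking wins because it removes far more collision cost than the one unit it costs; when collisions are already very unlikely, the fixed cost of early braking dominates and normal braking is preferred.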

## Instructions

In this challenge you and your team will do the following:   

### 1. Load Dataset     
Load the required packages and the csv file of 5,000 cases into a Pandas data frame.   

> **Hint:** Carefully examine the variable names. You will need to make sure that you use these names to construct your model. 

### 2. Define the Graphical Model     

Using the tools in pgmpy, define the BayesianModel class object for this problem.
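pgmpy builds the model from a list of `(parent, child)` edges and raises an error if an added edge would create a loop. The acyclicity of an edge list can also be checked independently with Kahn's topological sort; this standard-library sketch uses a fragment of the braking DAG taken from the variable descriptions, with the helper name chosen for illustration.

```python
from collections import defaultdict, deque

def topological_order(edges):
    """Return a topological order of the nodes, or raise ValueError on a cycle."""
    children = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        in_degree[child] += 1
        nodes.update((parent, child))
    # Start from nodes with no parents (the root variables).
    queue = deque(sorted(n for n in nodes if in_degree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        # Removing a node releases any child whose parents are all processed.
        for child in children[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError('edge list contains a cycle')
    return order

# A fragment of the braking DAG.
edges = [('Object', 'LIDAR_Sensor'),
         ('LIDAR_Sensor', 'Sensor_Detection'),
         ('Sensor_Detection', 'Collision'),
         ('Object', 'Collision')]
print(topological_order(edges))
```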

### 3. Factorization of Distribution

Using Markdown, write out the joint distribution and the factorization defined by the graphical model. You may abbreviate the long variable names.
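For reference, the general form of such a factorization: a Bayesian network over variables $X_1, \dots, X_n$ factorizes into one conditional distribution per node given its parents $\mathrm{pa}(X_i)$ in the DAG,

$$
p(X_1, \dots, X_n) = \prod_{i=1}^{n} p\left(X_i \mid \mathrm{pa}(X_i)\right)
$$

where a root node (no parents) contributes its marginal $p(X_i)$.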

How many states are there in the joint distribution of 11 binary (Binomially distributed) variables?

How many states are in the factorized distribution?
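Both counts follow from the variable cardinalities: the full joint table has one entry per combination of all the variables, while the factorized form needs one CPD table per variable, with one entry per combination of that variable and its parents. A small helper, shown here on a hypothetical three-node chain A → B → C rather than the braking model:

```python
from math import prod

def joint_table_size(cardinality):
    """Number of entries (states) in the full joint table."""
    return prod(cardinality.values())

def factorized_table_size(cardinality, parents):
    """Total number of entries across all CPD tables p(X | pa(X))."""
    return sum(cardinality[x] * prod(cardinality[p] for p in parents[x])
               for x in cardinality)

# Hypothetical chain A -> B -> C of binary variables.
card = {'A': 2, 'B': 2, 'C': 2}
pa = {'A': [], 'B': ['A'], 'C': ['B']}
print(joint_table_size(card), factorized_table_size(card, pa))  # 8 10
```

The gap between the two counts grows exponentially with the number of variables, which is the practical payoff of the factorization.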

**ANS**: 

**ANS**:

**ANS**: 

### 4. Verify Independencies

With the skeleton of your DAG defined, you can verify the independencies. To simplify this problem, use the pgmpy local_independencies method. Recall that the local independencies state that each variable is independent of its non-descendants, given its parents.
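For each node $X$, the local independencies have the form $X \perp \mathrm{NonDesc}(X) \setminus \mathrm{pa}(X) \mid \mathrm{pa}(X)$. The standard-library sketch below enumerates these statements from a parent dictionary, so they can be compared with what pgmpy's local_independencies reports; the four-node network and helper names are hypothetical.

```python
def descendants(node, children):
    """All nodes reachable from node by following child links."""
    seen, stack = set(), list(children.get(node, []))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(children.get(n, []))
    return seen

def local_independencies(parents):
    """Map each node to (non-descendants other than its parents, its parents)."""
    children = {}
    for child, pas in parents.items():
        for p in pas:
            children.setdefault(p, []).append(child)
    result = {}
    for node, pas in parents.items():
        others = set(parents) - {node} - descendants(node, children) - set(pas)
        result[node] = (others, set(pas))
    return result

# Hypothetical network: Object -> LIDAR -> Detection <- Weather
pa = {'Object': [], 'Weather': [], 'LIDAR': ['Object'],
      'Detection': ['LIDAR', 'Weather']}
for node, (indep, given) in local_independencies(pa).items():
    if indep:
        print(f'{node} _|_ {sorted(indep)} | {sorted(given)}')
```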

Are the independencies you find consistent with your factorization of the distribution and why?

**ANS**:

### 5. Maximum Likelihood Estimation of Model Parameters

Next, use pgmpy to perform maximum likelihood estimation of the model parameters using the dataset provided.

Print the CPDs and carefully examine the results. Notice that some of the probabilities for the Collision variable are either 0.0 or 1.0. Is this reasonable and why?
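Maximum likelihood estimation of a CPD amounts to normalized counting: $\hat{p}(X = x \mid \mathrm{pa} = u) = N(x, u) / N(u)$. The sketch below does this with the standard library on a few hypothetical records, and shows how a parent configuration in which the child always takes the same value yields an estimate of exactly 1.0 (and 0.0 for the other value).

```python
from collections import Counter

def mle_cpd(records, child, parents):
    """Return {parent_values: {child_value: probability}} by relative frequency.

    Child values never observed for a parent configuration are omitted,
    i.e. their estimated probability is 0.0.
    """
    joint = Counter()
    marginal = Counter()
    for r in records:
        u = tuple(r[p] for p in parents)
        joint[(u, r[child])] += 1
        marginal[u] += 1
    cpd = {}
    for (u, x), n in joint.items():
        cpd.setdefault(u, {})[x] = n / marginal[u]
    return cpd

# Hypothetical records: collisions only ever occur when an object is present.
records = [
    {'Object': 0, 'Collision': 0}, {'Object': 0, 'Collision': 0},
    {'Object': 0, 'Collision': 0}, {'Object': 1, 'Collision': 1},
    {'Object': 1, 'Collision': 0}, {'Object': 1, 'Collision': 0},
    {'Object': 1, 'Collision': 0},
]
print(mle_cpd(records, 'Collision', ['Object']))
```

A parent configuration that never appears in the data gets no estimate at all here, and one that appears only a few times can easily produce extreme values, which is worth keeping in mind when interpreting the Collision CPD.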

**ANS**:

### 6. Queries

With the parameters fit, you are ready to perform queries on your model. Using the pgmpy VariableElimination class, perform the queries specified in the table below, compute the total utility, examine the results, and answer the questions.

| Query | Query Variables | Evidence |
|:----|:----|:----|
|1 | Collision, Early_Breaking | Road_Condition = 0 |
|2 | Collision, Early_Breaking | Road_Condition = 1 |
|3 | Collision, Early_Breaking | Light_Dark = 0, Weather_Visibility = 0 |
|4 | Collision, Early_Breaking | Light_Dark = 1, Weather_Visibility = 1  |
|5 | Collision | Early_Breaking = 0, Object = 1, Road_Condition = 1 |
|6 | Collision | Early_Breaking = 1, Object = 1, Road_Condition = 1 |
|7 | Collision | Early_Breaking = 0, Object = 1, Light_Dark = 1, Weather_Visibility = 1 |
|8 | Collision | Early_Breaking = 1, Object = 1, Light_Dark = 1, Weather_Visibility = 1 |
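VariableElimination performs exact inference, so its answers should match what summing the full joint distribution gives; elimination just orders the sums so the full joint table is never built. For a network small enough to enumerate, the brute-force version is a handy cross-check. The two-node network, its probabilities and the helper names below are hypothetical.

```python
from itertools import product

# Hypothetical network Object -> LIDAR_Sensor with binary variables.
p_object = {0: 0.8, 1: 0.2}
p_lidar_given_object = {0: {0: 0.9, 1: 0.1},   # P(LIDAR | Object = 0)
                        1: {0: 0.1, 1: 0.9}}   # P(LIDAR | Object = 1)

def joint(o, l):
    """Factorized joint p(Object = o, LIDAR_Sensor = l)."""
    return p_object[o] * p_lidar_given_object[o][l]

def posterior_object(lidar_evidence):
    """P(Object | LIDAR_Sensor = lidar_evidence) by enumerating every state."""
    unnormalized = {0: 0.0, 1: 0.0}
    for o, l in product((0, 1), repeat=2):
        if l == lidar_evidence:          # keep only states consistent with the evidence
            unnormalized[o] += joint(o, l)
    z = sum(unnormalized.values())       # probability of the evidence
    return {o: p / z for o, p in unnormalized.items()}

print(posterior_object(1))
```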

**Q 1:** Compare the probability of Collision, Early Braking and total utility for the different values of evidence specified in queries 1 and 2. Are these values consistent with what you expect and why? 

**Q 2:** Compare the probability of Collision, Early Braking and total utility for the different values of the evidence variables specified in queries 3 and 4. Are these values significantly different? Given how the sensor data is integrated, do these differences seem reasonable?

**Q 3:** Compare the probability of Collision and total utility for the different values of the evidence variables specified in queries 5 and 6. Are these values consistent with what you expect and why? 

**Q 4:** Compare the probability of Collision and total utility for the different values of the evidence variables specified in queries 7 and 8. Are these values consistent with what you expect and why? 

> **Note:** You cannot perform a query on an evidence variable. When Early_Breaking is evidence, make sure it is not a query variable. 

**ANS 1**:

**ANS 2**:

**ANS 3**:

**ANS 4**:

### 7. Bayesian Estimation of Model Parameters

Next, use pgmpy to perform Bayesian estimation of the model parameters using the dataset provided. The pseudo counts for the prior are given in a cell below.

**Q 1:** The prior is generally weak, with the exception of one variable. Which variable has a strong prior and do you think using this strong prior is reasonable in the interest of improving safety and why? 

**Q 2:** Print the CPDs and carefully examine the results. Pay particular attention to the Early_Breaking variable, comparing the values to the values obtained with maximum likelihood estimation. Is this difference reasonable given the prior and why?

**Q 3:** Finally, compare the estimated values for the Collision variable with the ones found using maximum likelihood estimation. Are these differences expected and why? 

**ANS 1:**

**ANS 2:**

**ANS 3:**

```python
pseudo_counts = {'Road_Condition':[[5],[5]],
                 'Weather_Visibility':[[5],[5]],
                 'Light_Dark':[[5],[5]],
                 'Object':[[5],[5]],
                 'LIDAR_Sensor':[[5,5],[5,5]],
                 'Weather_Detection':[[5,5],[5,5]],
                 'Sensor_Detection':[[5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5],
                                     [5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5]],
                 'Road_Condition_Detection':[[5,5],[5,5]],
                 'Visual_Sensor_Detection':[[5,5,5,5,5,5,5,5],
                                            [5,5,5,5,5,5,5,5]],
                 'Early_Breaking':[[500,500,500,500], 
                                   [5000,5000,5000,5000]],
                 'Collision':[[100,100,100,100,5,5,5,5],[1,1,1,1,5,5,5,5]]}
```
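With a Dirichlet prior, each column of a CPD is estimated, as we understand pgmpy's `prior_type='dirichlet'` option to work, by adding the pseudo counts to the observed counts for that parent configuration and normalizing, i.e. taking the posterior mean. The sketch below uses hypothetical counts to show how the `Early_Breaking` pseudo counts of 500 and 5000 pull an estimate toward early braking; the helper names are illustrative.

```python
def mle_column(counts):
    """Relative-frequency estimate of one CPD column."""
    z = sum(counts)
    return [n / z for n in counts]

def dirichlet_column(counts, pseudo_counts):
    """Posterior-mean estimate of one CPD column: (N_x + a_x) / sum(N + a)."""
    totals = [n + a for n, a in zip(counts, pseudo_counts)]
    z = sum(totals)
    return [t / z for t in totals]

# Hypothetical counts of Early_Breaking = 0 and 1 for one parent configuration.
counts = [300, 200]
prior = [500, 5000]   # one column of the Early_Breaking pseudo counts above
print(mle_column(counts))               # [0.6, 0.4]
print(dirichlet_column(counts, prior))
```

Because these pseudo counts are large relative to the number of cases that fall into any one parent configuration, the prior dominates this column; with the weak prior of 5 used for most other variables, the data dominates instead.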

### 8. Queries

With the parameters fit, you are ready to perform queries on your model with Bayesian parameter estimates. Using the pgmpy VariableElimination class, perform queries 1 and 2 from the table shown previously, compute the total utility, examine the results, and answer the questions.

**Q 1:** Compare the probability of Collision and total utility for the different values of evidence specified in queries 1 and 2. Are these values consistent with what you expect and why? 

**ANS:**

# Solution

Create cells below for your solution to the stated problem. Be sure to include some Markdown text and code comments to explain each component of your algorithm. 

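```python
# pgmpy: model structure, CPD tables, exact inference and parameter estimators.
# Note: newer pgmpy releases deprecate BayesianModel in favour of BayesianNetwork.
from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination, BeliefPropagation
from pgmpy.estimators.MLE import MaximumLikelihoodEstimator
from pgmpy.estimators.BayesianEstimator import BayesianEstimator
# Numerical and data-frame tools
import numpy as np
import numpy.random as nr
import pandas as pd
```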

### Import Data File

```python
# Load the 5,000 cases; column names must match the model's node names
samples = pd.read_csv('BreakingData.csv')
```
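After loading, it is worth confirming that every variable in the model appears as a column and holds only 0/1 values, since pgmpy's estimators look up each node's data by column name. A sketch using a tiny in-memory CSV in place of `BreakingData.csv`; the expected column names are assumptions taken from the pseudo-count cell and should be checked against `samples.columns`, and the helper name is illustrative.

```python
import io
import pandas as pd

EXPECTED = ['Road_Condition', 'Weather_Visibility', 'Light_Dark', 'Object',
            'Road_Condition_Detection', 'Weather_Detection',
            'Visual_Sensor_Detection', 'LIDAR_Sensor', 'Sensor_Detection',
            'Early_Breaking', 'Collision']

def check_columns(df, expected=EXPECTED):
    """Return (missing columns, columns holding values other than 0/1)."""
    missing = [c for c in expected if c not in df.columns]
    non_binary = [c for c in expected
                  if c in df.columns and not df[c].isin([0, 1]).all()]
    return missing, non_binary

# Tiny stand-in for the real file; in the notebook, pass `samples` instead.
csv_text = 'Object,Collision,Light_Dark\n0,0,1\n1,1,0\n1,0,2\n'
demo = pd.read_csv(io.StringIO(csv_text))
print(check_columns(demo, ['Object', 'Collision', 'Light_Dark', 'LIDAR_Sensor']))
```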