# Fruit Inspection Challenge

## CSCI E82A

>**Make sure** you include your name along with the name of your team and team members in the notebook you submit. 

## Introduction

In a previous homework assignment you computed the utility of various approaches to fruit inspection using two unreliable sensors along with human inspection. This challenge exercise differs from the homework assignment in the following ways:
1. Most importantly, this is not a guided lab; you are free to apply the methods of your choice.
2. The parameters for the CPDs must be estimated from data samples.
3. There are a larger number of CPDs.
4. You will perform a query on your graphical model.  

###  Background

Bob's Orchards is a premium seller of apples and pears. Bob's customers pay a substantial premium for superior fruit. To satisfy these customers, Bob's must ensure that the fruit delivered is correctly packed and perfectly ripe. 

Like many legacy industries requiring specialized human skills, Bob's is facing a talent problem. Many of the human inspectors who expertly check each piece of fruit shipped for ripeness are retiring. Management's attempts to recruit and train younger people to apprentice as fruit inspectors have had mixed results. In fact, not only is it difficult to recruit people willing to train as inspectors, but the newly trained inspectors are also believed to be prone to errors. Therefore, it has become imperative to find some type of automated system which can reduce the workload on the diminishing number of human inspectors. To address this problem, Bob's has deployed technology from Robots R Us.

The first robotic system to be deployed at Bob's uses a multi-sensor array to determine if the fruit being shipped is at the correct ripeness. There are two sensors: a vision system that examines the fruit for spots or damage, indicating the fruit is overripe, and a smell sensor that determines if the fruit is not ripe enough. If either sensor indicates the fruit is bad, it is sent to a human inspector. In addition, some customers may reject even perfect fruit for no apparent reason, whereas others seem perfectly happy with less than perfect fruit.



## Scenario 

In order to better understand the fruit inspection process and customer acceptance of the fruit, Bob's management has authorized the shipment of 1,000 randomly selected orders. All available inspection methods will be applied to each order. Further, a team of the most experienced inspectors will provide an absolute baseline on order quality. The orders will be shipped to customers regardless of the outcome of the inspections. 

Shipping orders regardless of inspection outcome is a significant departure from long-held beliefs and traditions at Bob's. However, the data collected provide a powerful source of information for improving Bob's overall customer satisfaction, which is highly valued by Bob's management.

Your goal, as the consulting team, is to determine which inspection methods and any other possible process improvement Bob's should apply to maximize customer satisfaction as measured by utility. You will use the data collected from the 1,000 orders to 

### Data description 

For the 1,000 orders in the test sample a number of attributes have been collected. These data are in the `fruit_data.csv` file. The columns in the data set are:
1.  **weather:** indicates the weather conditions the day before the fruit is harvested; 0 = wet, 1 = dry. Prior information indicates that the statistics of weather are constant over the harvest period. 
2. **week:** indicates the week the fruit is harvested; 0 = week 1, 1 = week 2. There is a two-week harvest season for the Bob's orchard where the fruit comes from.
3. **good_bad:** is the quality assigned to the fruit shipment by an independent inspection team of highly experienced inspectors. At least three inspectors have agreed on the fruit quality, and these indicators are believed to have absolute accuracy.
4. **smell_sensor:** are the indicators emitted by the smell fruit inspection sensor; 0 = bad, 1 = good.
5. **visual_sensor:** are the indicators emitted by the visual fruit inspection sensor; 0 = bad, 1 = good.
6. **inspector:** are the indicators determined by a single entry-level fruit inspector; 0 = bad, 1 = good.
7. **accepted:** indicates if the customer accepted the order as received, or complained and requested an adjustment; 0 = not accepted, 1 = accepted. 

### Bayesian Graph Representation

A directed acyclic graph (DAG) representing the fruit quality process is shown in the diagram below.  

<img src="FruitQualityGraph.JPG" alt="Drawing" style="width:800px; height:450px"/>
<center> **DAG fruit quality process**    
Decision nodes are not shown for simplicity</center>

The representation shown in the diagram illustrates the CPDs in a DAG. There are a number of utility nodes shown. Notice that the multiple decision nodes are not shown.

There are two utility functions in this problem. The **utility of a human inspection** is -10.0. And the **utility of the satisfied and unsatisfied customers** is:

|  | Satisfied | Not Satisfied |
|----|----|----|
|Utility | 20 | -40 |

Notice that the DAG shows causality between the CPDs. **Consider how this causality is important in the representation of this problem**.
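
As a minimal sketch of how the two utility functions above enter the analysis, the expected utility of a policy is the probability-weighted sum of the customer utilities, plus the inspection utility weighted by how often a human inspection happens. The function name `expected_utility` and the argument `p_inspect` are illustrative and not part of the assignment; the probabilities in the usage lines are placeholders.

```python
import numpy as np

# Utilities taken from the table above: index 0 = not satisfied, 1 = satisfied.
CUSTOMER_UTILITY = np.array([-40.0, 20.0])
# Utility of one human inspection, as stated above.
INSPECTION_UTILITY = -10.0

def expected_utility(p_accept, p_inspect=0.0):
    """Expected utility given [P(not accepted), P(accepted)].

    p_inspect is the (assumed) probability that an order goes to a human inspector.
    """
    p_accept = np.asarray(p_accept, dtype=float)
    return float(p_accept @ CUSTOMER_UTILITY + p_inspect * INSPECTION_UTILITY)

# Placeholder probabilities, chosen only to show the call.
print(expected_utility([0.5, 0.5]))        # no human inspection
print(expected_utility([0.5, 0.5], 1.0))   # every order inspected by a human
```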

### Goals for this analysis

Your goals in this challenge are as follows:

1. Estimate the parameters of the leaf CPDs (nodes) using the data provided. These are the nodes of the graph with unconditional distributions. You may use simple ML/frequentist estimates of these parameters.
2. Compute the conditional probabilities for the remaining nodes. 
3. Using the conditional probability distributions and the utility functions, compute and compare the utilities of the four possible inspection methods:
  - No inspection whatsoever. 
  - Inspection by human inspectors only.
  - Inspection with sensors only.
  - Inspection first with sensors and then with humans for cases where the sensors indicate the fruit may be bad. 
4. Now perform a query on your model when the weather is always dry (evidence). Recompute and compare the utilities for the different inspection methods as you did for step 3. 

> **Methods:** You may use methods of your choice. You can do the calculations directly on the arrays of the CPDs. Alternatively, you can likely use the pgmpy package. If you are ambitious, you can try both approaches. 

# Introduction & approach taken

* We started the challenge by exploring a number of different methods and tools to produce the desired results.
* We explored the use of pgmpy, defining elements of the graph using `TabularCPD` objects.
* As part of our pgmpy work we explored an MLE fit approach and an approach using Dirichlet priors.
* Ultimately, due to time constraints, we decided to adopt an approach using pandas and numpy to calculate the CPDs manually, as we needed to understand the step-by-step calculations.
* The fact that pgmpy does not support utilities also factored into our decision.
* We worked out each calculation carefully by first understanding the shape of the matrix at each point, which helped us understand the output at each subsequent node after the matrix multiplication.
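
The shape bookkeeping described in the last bullet can be sketched as follows. The arrays are made-up placeholders, not estimates from `fruit_data.csv`; only the shapes and the axis being summed over matter here.

```python
import numpy as np

# Hypothetical parent marginal P(parent), shape (2,).
p_parent = np.array([0.3, 0.7])

# Hypothetical CPD P(child | parent), shape (2, 2):
# rows index the parent state, columns index the child state, rows sum to 1.
cpd_child_given_parent = np.array([[0.9, 0.1],
                                   [0.2, 0.8]])

# (2,) @ (2, 2) -> (2,): sums over the parent axis, giving P(child).
p_child = p_parent @ cpd_child_given_parent

print(p_parent.shape, cpd_child_given_parent.shape, p_child.shape)
print(p_child, p_child.sum())
```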


In [206]:
import pandas as pd
import numpy as np

data = pd.read_csv('fruit_data.csv')


# calculation of fruit quality
data['fruit_quality'] = np.where(((data['weather'] == 1) & (data['week'] == 1 ) & (data['good_bad'] == 1 )), 1, 0)

# calculation of sensor_inspect node
data['sensor_inspect'] = np.where(((data['smell_sensor'] == 1) & (data['visual_sensor'] == 1 ) & (data['fruit_quality'] == 1 )), 1, 0)

# calculation of inspector accuracy
data['inspector_accuracy'] = np.where(((data['inspector'] == data['good_bad'] )), 1, 0)

# calculation of manual inspection
data['manual_inspection'] = np.where(((data['fruit_quality'] == 0 )), 1, 0)

# calculation of manual inspection acceptance
data['manual_inspection_accept'] = np.where(((data['manual_inspection'] == 1 )), 1, 0)

# calculation of no inspection acceptance
data['no_inspection_accept'] = np.where(((data['fruit_quality'] == 1) & (data['accepted'] == 1 )), 1, 0)

# calculation of sensor impact acceptance
data['sensor_impact_accept'] = np.where(((data['sensor_inspect'] == 1) & (data['accepted'] == 1 )), 1, 0)

# calculation of sensor manual inspection
data['sensor_manual_inspect'] = np.where(((data['sensor_inspect'] == 0) & (data['inspector_accuracy'] == 0 )), 1, 0)

# calculation of acceptance after manual inspection
data['sensor_manual_inspect_accept'] = np.where(((data['sensor_manual_inspect'] == 1) & (data['accepted'] == 1 )), 1, 0)

data

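| | weather | week | good_bad | smell_sensor | visual_sensor | inspector | accepted | fruit_quality | sensor_inspect | inspector_accuracy | manual_inspection | manual_inspection_accept | no_inspection_accept | sensor_impact_accept | sensor_manual_inspect | sensor_manual_inspect_accept |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 3 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 4 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 5 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 6 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 7 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| 8 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 9 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |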


# 1. Estimate the parameters of the leaf CPDs (nodes) using the data provided. These are the nodes of the graph with unconditional distributions. You may use simple ML/frequentist estimates of these parameters.

In [310]:
# We adopt a frequentist approach here

def get_leaf_value(leaf):
    return data[leaf].mean()

leaf_nodes = ['week', 'weather', 'smell_sensor', 'visual_sensor', 'inspector_accuracy']

for leaf in leaf_nodes:
    print('The {} leaf node has a positive value of {} and a negative value of {}'.format(leaf, get_leaf_value(leaf), 1-get_leaf_value(leaf)))


The week leaf node has a positive value of 0.479 and a negative value of 0.521
The weather leaf node has a positive value of 0.663 and a negative value of 0.33699999999999997
The smell_sensor leaf node has a positive value of 0.68 and a negative value of 0.31999999999999995
The visual_sensor leaf node has a positive value of 0.76 and a negative value of 0.24
The inspector_accuracy leaf node has a positive value of 0.961 and a negative value of 0.039000000000000035
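
For a binary 0/1 column, the mean used above is the maximum-likelihood estimate of P(value = 1). A minimal sketch of the same estimate written as relative frequencies, using a made-up toy column rather than the real data:

```python
import pandas as pd

# Toy stand-in for one binary column of fruit_data.csv (values are made up).
toy = pd.DataFrame({"weather": [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]})

# Maximum-likelihood estimate of a categorical distribution = relative frequencies.
p_weather = toy["weather"].value_counts(normalize=True).sort_index()

print(p_weather)
# For a 0/1 column the mean equals the relative frequency of 1.
print(toy["weather"].mean())
```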


### Customer acceptance node
 * This was dealt with slightly differently, as we were advised by the professor that we should take the good_bad variable into account. Our calculation was therefore:
 


In [314]:
acccept_good_bad_matrix =  np.array([[142, 20], [37, 801]])
acccept_good_bad_matrix = pd.DataFrame(acccept_good_bad_matrix, columns=['accept0', 'accept1'])
print(acccept_good_bad_matrix)

marginalized_cust_accept = acccept_good_bad_matrix.transpose()/acccept_good_bad_matrix.sum(axis=1)
marginalized_cust_accept_trans =marginalized_cust_accept.transpose()
marginalized_cust_accept_trans

   accept0  accept1
0      142       20
1       37      801


| | accept0 | accept1 |
|----|----|----|
| 0 | 0.876543 | 0.123457 |
| 1 | 0.044153 | 0.955847 |
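
The count matrix above is entered by hand. As a hedged sketch, the same kind of table can be built directly from two columns with `pd.crosstab`, and row normalisation can be written without the double transpose. The `toy` frame is made up for illustration; `counts_doc` repeats the counts from the cell above.

```python
import pandas as pd

# Toy stand-in for fruit_data.csv (values are made up for illustration).
toy = pd.DataFrame({
    "good_bad": [0, 0, 0, 1, 1, 1, 1, 1],
    "accepted": [0, 0, 1, 1, 1, 1, 0, 1],
})

# Counts of accepted (columns) for each value of good_bad (rows).
counts = pd.crosstab(toy["good_bad"], toy["accepted"])

# normalize="index" divides each row by its total, giving P(accepted | good_bad).
cpd = pd.crosstab(toy["good_bad"], toy["accepted"], normalize="index")

# The counts from the cell above, normalised row by row in one step.
counts_doc = pd.DataFrame([[142, 20], [37, 801]], columns=["accept0", "accept1"])
row_norm = counts_doc.div(counts_doc.sum(axis=1), axis=0)

print(counts)
print(cpd)
print(row_norm)
```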


# 2. Compute the conditional probabilities for the remaining nodes. 


### Fruit quality node:
We first attempted to calculate this node by taking the probabilities of the week and weather nodes, as the graph would suggest. The professor advised that we needed to take the good_bad field into account; marginalizing over week and weather then gives a result that is the same as the good_bad distribution.


In [296]:
week_weather_good_bad_matrix = np.array([[40, 48, 47, 27], [123, 126, 311, 278]])
print(week_weather_good_bad_matrix)

week_weather_good_bad_matrix = pd.DataFrame(week_weather_good_bad_matrix, columns=['Wk0Weath0', 'Wk1Weath0', 'Wk0Weath1', 'Wk1Weath1'])
week_weather_good_bad_matrix.sum(axis=1)

[[ 40  48  47  27]
 [123 126 311 278]]


0    162
1    838
dtype: int64

In [297]:
week_weather_good_bad_matrix

| | Wk0Weath0 | Wk1Weath0 | Wk0Weath1 | Wk1Weath1 |
|----|----|----|----|----|
| 0 | 40 | 48 | 47 | 27 |
| 1 | 123 | 126 | 311 | 278 |


In [298]:
fruit_qual_pcts = np.array(week_weather_good_bad_matrix.sum(axis=1)/week_weather_good_bad_matrix.sum().sum())
fruit_qual_pcts

array([ 0.162,  0.838])
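
The marginal above sums the counts over week and weather. As a sketch of the conditional form, normalising each column of the same count table gives P(good_bad | week, weather), and weighting those columns by P(week, weather) recovers the marginal. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Counts copied from the table above: rows = good_bad (0, 1),
# columns = (week, weather) combinations.
counts = pd.DataFrame(
    [[40, 48, 47, 27], [123, 126, 311, 278]],
    columns=["Wk0Weath0", "Wk1Weath0", "Wk0Weath1", "Wk1Weath1"],
)

# Normalising each column gives P(good_bad | week, weather).
cpd_quality = counts / counts.sum(axis=0)

# P(week, weather) from the column totals.
p_parents = counts.sum(axis=0) / counts.to_numpy().sum()

# Summing the CPD over the parents recovers the marginal P(good_bad).
p_quality = cpd_quality.to_numpy() @ p_parents.to_numpy()

print(cpd_quality.round(3))
print(p_quality)
```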

### Sensor impact node:

The inputs to this node are the visual sensor, the smell sensor, and fruit quality. We combined all three to get the conditional probabilities of the sensor impact node.

In [304]:
smell_sensor = np.array([1-data.smell_sensor.mean(), data.smell_sensor.mean()])
smell_sensor

array([ 0.32,  0.68])

In [305]:
visual_sensor = np.array([1-data.visual_sensor.mean(), data.visual_sensor.mean()])
visual_sensor

array([ 0.24,  0.76])

In [307]:
visual_and_smell_cpd = (smell_sensor * visual_sensor)
visual_and_smell_cpd

array([ 0.0768,  0.5168])

In [308]:
fruit_qual_pcts

array([ 0.162,  0.838])

In [309]:
sensor_inspect = (smell_sensor * visual_sensor) * fruit_qual_pcts
sensor_inspect

array([ 0.0124416,  0.4330784])
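
The element-wise product `smell_sensor * visual_sensor` multiplies only matching states, so it covers the bad/bad and good/good cases and does not sum to 1. As a hedged alternative, assuming the two sensor readings are independent (an assumption made only for this sketch, not something the data has been checked for), the full joint over both sensors is the outer product of the marginals:

```python
import numpy as np

# Marginals from the cells above: index 0 = bad, 1 = good.
smell_sensor = np.array([0.32, 0.68])
visual_sensor = np.array([0.24, 0.76])

# Outer product: joint[i, j] = P(smell = i) * P(visual = j),
# valid only under the independence assumption stated above.
joint = np.outer(smell_sensor, visual_sensor)

# Probability that at least one sensor flags the fruit as bad.
p_any_bad = 1.0 - joint[1, 1]

print(joint)
print(joint.sum(), p_any_bad)
```

The diagonal of `joint` reproduces the two numbers from the element-wise product; the off-diagonal entries are the cases where the sensors disagree.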

### No inspection, accept order node:

#### This is a combination of the fruit quality and the customer acceptance leaf nodes. We combine them below using a dot product

In [325]:
joint_prob_qual_accept = np.dot(fruit_qual_pcts, marginalized_cust_accept_trans)
print('No inspection, accept order node: {}'.format(joint_prob_qual_accept))

No inspection, accept order node: [ 0.179  0.821]
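
The dot product above is the sum rule applied to the quality node. Writing $Q$ for fruit quality and $A$ for customer acceptance (symbols introduced here for illustration), and using the counts from the customer acceptance table:

$$
P(A = a) = \sum_{q \in \{0, 1\}} P(Q = q)\, P(A = a \mid Q = q)
$$

$$
P(A = 1) = 0.162 \cdot \frac{20}{162} + 0.838 \cdot \frac{801}{838} = 0.020 + 0.801 = 0.821
$$

$$
P(A = 0) = 0.162 \cdot \frac{142}{162} + 0.838 \cdot \frac{37}{838} = 0.142 + 0.037 = 0.179
$$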


### Manual inspect node:

#### This is a combination of the inspector accuracy and the fruit quality nodes. We combine them below using a dot product

In [327]:
inspector_accuracy = data.inspector_accuracy.mean()
inspector_accuracy

0.961

In [329]:
fruit_qual_pcts

array([ 0.162,  0.838])

In [332]:
fruit_qual_inspect_accur_dot_prod =np.dot(inspector_accuracy,fruit_qual_pcts)
fruit_qual_inspect_accur_dot_prod

array([ 0.155682,  0.805318])

In [335]:
fruit_qual_inspect_accur_dot_prod_marginal = fruit_qual_inspect_accur_dot_prod/fruit_qual_inspect_accur_dot_prod.sum()
fruit_qual_inspect_accur_dot_prod_marginal

array([ 0.162,  0.838])
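
Scaling the quality marginal by a scalar and then renormalising returns the marginal unchanged, as the output above shows. One possible way to let inspector accuracy affect the result is to treat it as a 2x2 CPD, P(inspector verdict | quality). The sketch below assumes the inspector is equally accurate on good and bad fruit (0.961 in both cases), which the data may not support.

```python
import numpy as np

fruit_qual_pcts = np.array([0.162, 0.838])   # [P(bad), P(good)] from above
inspector_accuracy = 0.961                   # estimated above

# Assumed symmetric CPD: rows = true quality, columns = inspector verdict.
cpd_inspector = np.array([
    [inspector_accuracy, 1 - inspector_accuracy],   # true bad
    [1 - inspector_accuracy, inspector_accuracy],   # true good
])

# Marginal over the inspector's verdict: [P(says bad), P(says good)].
p_verdict = fruit_qual_pcts @ cpd_inspector

print(p_verdict)
```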

### Manual, accept order node:

#### This is a combination of the manual inspect and the customer acceptance leaf nodes. We combine them below using a dot product

In [336]:
fruit_qual_inspect_accur_dot_prod_marginal

array([ 0.162,  0.838])

In [337]:
marginalized_cust_accept_trans

| | accept0 | accept1 |
|----|----|----|
| 0 | 0.876543 | 0.123457 |
| 1 | 0.044153 | 0.955847 |


In [341]:
marginalized_cust_accept_trans * fruit_qual_inspect_accur_dot_prod_marginal

| | accept0 | accept1 |
|----|----|----|
| 0 | 0.142 | 0.103457 |
| 1 | 0.007153 | 0.801 |


In [338]:
np.dot(marginalized_cust_accept_trans, fruit_qual_inspect_accur_dot_prod_marginal)

array([ 0.24545679,  0.80815274])
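
The result above does not sum to 1 because `np.dot(matrix, vector)` sums over the columns (the acceptance axis) rather than over the rows (the quality axis). A small sketch of the two orientations, using the counts and marginal from the cells above:

```python
import numpy as np

# Rows = fruit quality (bad, good); columns = customer response (reject, accept).
cpd_accept = np.array([[142 / 162, 20 / 162],
                       [37 / 838, 801 / 838]])
p_quality = np.array([0.162, 0.838])

# Vector on the left: sums over the quality axis and yields P(accepted).
p_accept = p_quality @ cpd_accept

# Matrix on the left: weights the acceptance columns by P(quality),
# which does not produce a probability distribution.
mismatched = cpd_accept @ p_quality

print(p_accept, p_accept.sum())
print(mismatched, mismatched.sum())
```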

### Sensor inspect accept order node:

#### This is a combination of the sensor inspect and the customer acceptance leaf nodes. We combine them below using a dot product

In [342]:
marginalized_cust_accept_trans

| | accept0 | accept1 |
|----|----|----|
| 0 | 0.876543 | 0.123457 |
| 1 | 0.044153 | 0.955847 |


In [343]:
sensor_inspect

array([ 0.0124416,  0.4330784])

In [345]:
sensor_inspect_accept_order_node = np.dot(marginalized_cust_accept_trans, sensor_inspect)
sensor_inspect_accept_order_node

array([ 0.06437207,  0.41450613])

In [353]:
sensor_inspect_accept_order_node_marginal = sensor_inspect_accept_order_node/sensor_inspect_accept_order_node.sum()
sensor_inspect_accept_order_node_marginal

array([ 0.13442263,  0.86557737])

### Manual inspect after sensor inspect node

#### This is a combination of the sensor inspect and the inspector accuracy leaf nodes. We combine them below using a dot product

In [348]:
sensor_inspect

array([ 0.0124416,  0.4330784])

In [349]:
inspector_accuracy

0.961

In [350]:
manual_inspect_after_sensor_inspect_node = np.dot(inspector_accuracy,sensor_inspect)
manual_inspect_after_sensor_inspect_node

array([ 0.01195638,  0.41618834])

In [351]:
manual_inspect_after_sensor_inspect_node_marginal = manual_inspect_after_sensor_inspect_node/manual_inspect_after_sensor_inspect_node.sum()
manual_inspect_after_sensor_inspect_node_marginal

array([ 0.02792602,  0.97207398])

### Sensor manual inspect accept order node

#### This is a combination of the manual_inspect_after_sensor_inspect_node and the customer acceptance leaf node. We combine them below using a dot product

In [352]:
manual_inspect_after_sensor_inspect_node_marginal

array([ 0.02792602,  0.97207398])

In [354]:
marginalized_cust_accept_trans

| | accept0 | accept1 |
|----|----|----|
| 0 | 0.876543 | 0.123457 |
| 1 | 0.044153 | 0.955847 |


In [356]:
sensor_manual_inspect_accept_node = np.dot(marginalized_cust_accept_trans, manual_inspect_after_sensor_inspect_node_marginal)
sensor_manual_inspect_accept_node

array([ 0.1444875 ,  0.93038726])

In [357]:
sensor_manual_inspect_accept_node_marginal = sensor_manual_inspect_accept_node/sensor_manual_inspect_accept_node.sum()
sensor_manual_inspect_accept_node_marginal

array([ 0.13442263,  0.86557737])

# 3. Using the conditional probability distributions and the utility functions, compute and compare the utilities of the four possible inspection methods:
  - No inspection whatsoever. 
  - Inspection by human inspectors only.
  - Inspection with sensors only.
  - Inspection first with sensors and then with humans for cases where the sensors indicate the fruit may be bad. 

# U2 --> No inspection whatsoever. 

Here we take two approaches: one using the dot product of the input nodes, and the other explicitly doing the calculations by hand in order to double check our approach.

### U2


In [363]:
joint_prob_qual_accept = np.dot(fruit_qual_pcts, marginalized_cust_accept_trans)
joint_prob_qual_accept

array([ 0.179,  0.821])

In [364]:
utility_no_inspection = [-40, 20]

In [365]:
(joint_prob_qual_accept * utility_no_inspection)

array([ -7.16,  16.42])

In [366]:
print('Utility of no human inspection: {}'.format((joint_prob_qual_accept * utility_no_inspection).sum()))

Utility of no human inspection: 9.259999999999998


In [367]:
CustomerAccep_given_GoodQuality = 801/(801+37)
CustomerAccep_given_BadQuality = 20/(20+142)
CustomerNonAccep_given_GoodQuality = 37/(801+37)
CustomerNonAccep_given_BadQuality = 142/(20+142)
bad_quality = 0.162
good_quality = 0.838

customerAccep_GoodQuality = 20 * (CustomerAccep_given_GoodQuality * good_quality + \
                                 CustomerAccep_given_BadQuality * bad_quality)

customerAccep_BadQuality = -40 * (CustomerNonAccep_given_GoodQuality * good_quality + \
                                 CustomerNonAccep_given_BadQuality * bad_quality)


print(customerAccep_GoodQuality)
print(customerAccep_BadQuality)
print('Utility of no human inspection double check: {}'.format(customerAccep_GoodQuality + customerAccep_BadQuality))

16.42
-7.160000000000001
Utility of no human inspection double check: 9.260000000000002
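
Both calculations above evaluate the same expression. Writing $A$ for customer acceptance (a symbol introduced here for illustration) and using the utilities from the assignment table:

$$
E[U] = 20 \cdot P(A = 1) + (-40) \cdot P(A = 0) = 20(0.821) - 40(0.179) = 16.42 - 7.16 = 9.26
$$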


# U2


In [368]:
good_given_inspector_inaccurate = 838 * (1-inspector_accuracy)
good_given_inspector_inaccurate

32.68200000000003

In [369]:
good_given_inspector_accurate = 838 * inspector_accuracy
good_given_inspector_accurate

805.318

In [370]:
good_given_inspector_accurate/(good_given_inspector_inaccurate + good_given_inspector_accurate)

0.961

In [371]:
fruit_qual_pcts

array([ 0.162,  0.838])

In [274]:
prob_given_accuracy = fruit_qual_pcts * inspector_accuracy
prob_given_accuracy

array([ 0.155682,  0.805318])

In [276]:
prob_given_accuracy_marginal = prob_given_accuracy/prob_given_accuracy.sum()
prob_given_accuracy_marginal

array([ 0.162,  0.838])

In [277]:
utility_inspection = [-10]

In [278]:
prob_given_accuracy_marginal * utility_inspection

array([-1.62, -8.38])

In [279]:
(prob_given_accuracy * utility_inspection).sum()

-9.6099999999999994

# U3 Calc --> Inspection by human inspectors only.


In [377]:
prob_given_accuracy_marginal

array([ 0.162,  0.838])

In [378]:
marginalized_cust_accept_trans

| | accept0 | accept1 |
|----|----|----|
| 0 | 0.876543 | 0.123457 |
| 1 | 0.044153 | 0.955847 |


In [379]:
probabilities_of_accepting_updated = prob_given_accuracy_marginal * marginalized_cust_accept_trans
probabilities_of_accepting_updated

| | accept0 | accept1 |
|----|----|----|
| 0 | 0.142 | 0.103457 |
| 1 | 0.007153 | 0.801 |


In [381]:
marginal_probs =  probabilities_of_accepting_updated.sum(axis=1)
marginal_probs = marginal_probs/marginal_probs.sum()
marginal_probs

0    0.232968
1    0.767032
dtype: float64

In [384]:
# Double check
np.dot(marginalized_cust_accept_trans, prob_given_accuracy_marginal)

array([ 0.24545679,  0.80815274])

In [385]:
manual_inspection_utilities = np.array([-50, 10])

In [386]:
manual_inspection_utilities * marginal_probs

0   -11.648376
1     7.670325
dtype: float64

In [437]:
u3_expected_utility = (manual_inspection_utilities * marginal_probs).sum()
u3_expected_utility

-3.9780506170652661

In [438]:
print('Utility of inspection by human inspectors only: {}'.format(u3_expected_utility))

Utility of inspection by human inspectors only: -3.978050617065266
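
The vector `[-50, 10]` used above appears to be the customer utilities with the inspection utility of -10 added to each outcome. A minimal sketch of that reading, with illustrative variable names:

```python
import numpy as np

customer_utility = np.array([-40, 20])   # [not satisfied, satisfied]
inspection_utility = -10                 # utility of one human inspection

# Every order is inspected under this policy, so the cost applies to both outcomes.
manual_inspection_utilities = customer_utility + inspection_utility

print(manual_inspection_utilities)
```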


# U5 Calc --> Inspection with sensors only.


In [486]:
def calc_u5():
    # Node as previously calculated above
    print ('sensor_manual_inspect_accept_node: ', sensor_manual_inspect_accept_node)
    
    utilities = np.array([-40, 20])
    print('\n utilities:', utilities)
    
    expected_utility_u5 = np.dot(sensor_manual_inspect_accept_node, utilities)
    print('\n expected_utility_u5:', expected_utility_u5)

    # double check
    print('\n double check:', (sensor_manual_inspect_accept_node * utilities).sum() )
         

In [487]:
calc_u5()

sensor_manual_inspect_accept_node:  [ 0.1444875   0.93038726]

 utilities: [-40  20]

 expected_utility_u5: 12.8282453096

 double check: 12.8282453096


# U4 --> Inspection first with sensors and then with humans for cases where the sensors indicate the fruit may be bad. 

In [482]:
def calc_u4():
    # from earlier calculation of node
    sensor_inspect_accept_order_node_marginal
    print('sensor_inspect_accept_order_node_marginal: ', sensor_inspect_accept_order_node_marginal)
    print('probability it will be sent for manual inspection: {}'.format(sensor_inspect_accept_order_node_marginal[0]))
    
    manual_inspect_node = sensor_inspect_accept_order_node_marginal[0] * inspector_accuracy
    print('\n manual_inspect_node: ', manual_inspect_node)
    
    manual_inspect_node_marginal = np.array([manual_inspect_node, (1-manual_inspect_node)])
    print('\n manual_inspect_node_marginal: ', manual_inspect_node_marginal)
    
    print('\n marginalized_cust_accept_trans', marginalized_cust_accept_trans)
    
    probabilities_of_cust_accepting_sensor_manual= manual_inspect_node_marginal * marginalized_cust_accept_trans
    print('\n probabilities_of_cust_accepting_sensor_manual:\n', probabilities_of_cust_accepting_sensor_manual)
    
    marginal_probs =  probabilities_of_accepting_updated.sum(axis=1)
    marginal_probs = marginal_probs/marginal_probs.sum()
    print('\n marginal_probs:', marginal_probs)
    
    manual_inspection_utilities = np.array([-50, 10])
    print('\n manual_inspection_utilities:', manual_inspection_utilities)
    
    expected_utility = manual_inspection_utilities * marginal_probs
    print('\n expected_utility:', expected_utility.sum())
    
    # double check
    print('\n double check expected_utility: ',  np.dot(manual_inspection_utilities, marginal_probs))
    
    print('Utility of no human inspection: {}'.format(expected_utility.sum()))

In [483]:
calc_u4()

sensor_inspect_accept_order_node_marginal:  [ 0.13442263  0.86557737]
probability it will be sent for manual inspection: 0.13442263428592138

 manual_inspect_node:  0.129180151549

 manual_inspect_node_marginal:  [ 0.12918015  0.87081985]

 marginalized_cust_accept_trans     accept0   accept1
0  0.876543  0.123457
1  0.044153  0.955847

 probabilities_of_cust_accepting_sensor_manual:
     accept0   accept1
0  0.113232  0.107509
1  0.005704  0.832371

 marginal_probs: 0    0.232968
1    0.767032
dtype: float64

 manual_inspection_utilities: [-50  10]

 expected_utility: -3.97805061707

 double check expected_utility:  -3.97805061707
Utility of no human inspection: -3.978050617065266


# Summary of utilities:


In [444]:
print('Utility of inspection by human inspectors only: {}'.format(u3_expected_utility))
print('Inspection with sensors, then with humans: {}'.format(expected_utility.sum()))
print('Utility of no human inspection: {}'.format(customerAccep_GoodQuality + customerAccep_BadQuality))
print('Inspection with sensors only: {}'.format(expected_utility_u5))

Utility of inspection by human inspectors only: -3.978050617065266
Inspection with sensors, then with humans: -3.978050617065266
Utility of no human inspection: 9.260000000000002
Inspection with sensors only: 12.828245309570148
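
As a small sketch, the four results can be collected in one dictionary and ranked, which makes the comparison explicit. The numbers are copied from the printed summary above; the dictionary keys are illustrative labels.

```python
# Expected utilities copied from the printed summary above.
utilities = {
    "human inspectors only": -3.978050617065266,
    "sensors, then humans": -3.978050617065266,
    "no inspection": 9.260000000000002,
    "sensors only": 12.828245309570148,
}

# Rank the methods from highest to lowest expected utility.
ranked = sorted(utilities.items(), key=lambda item: item[1], reverse=True)
for name, value in ranked:
    print(f"{name:<25s} {value:8.3f}")

best_method = ranked[0][0]
```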


#  4. Now perform a query on your model when the weather is always dry (evidence). Recompute and compare the utilities for the different inspection methods as you did for step 3. 


In [488]:
"""We take a subset of the original matrix where the weather 
is always dry and recompute the marginal probabilities for fruit_quality."""

week_weather_good_bad_matrix = week_weather_good_bad_matrix[['Wk0Weath1', 'Wk1Weath1']]
week_weather_good_bad_matrix

| | Wk0Weath1 | Wk1Weath1 |
|----|----|----|
| 0 | 47 | 27 |
| 1 | 311 | 278 |


In [544]:
fruit_qual_pcts = np.array(week_weather_good_bad_matrix.sum(axis=1)/week_weather_good_bad_matrix.sum().sum())
fruit_qual_pcts

array([ 0.11161388,  0.88838612])
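
Selecting the dry-weather columns and renormalising is one way to apply the evidence. An equivalent route, sketched here on a made-up toy frame rather than the real data, is to filter the rows where `weather == 1` and re-estimate the distribution from what remains:

```python
import pandas as pd

# Toy stand-in for fruit_data.csv (values are made up for illustration).
toy = pd.DataFrame({
    "weather":  [1, 1, 0, 1, 0, 1],
    "good_bad": [1, 0, 0, 1, 1, 1],
})

# Evidence: weather is dry (1). Keep only the matching rows.
dry = toy[toy["weather"] == 1]

# Re-estimate P(good_bad | weather = 1) as relative frequencies.
p_quality_dry = dry["good_bad"].value_counts(normalize=True).sort_index()

print(p_quality_dry)
```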

In [545]:
smell_sensor = np.array([1-data.smell_sensor.mean(), data.smell_sensor.mean()])
smell_sensor

array([ 0.32,  0.68])

In [546]:
visual_sensor = np.array([1-data.visual_sensor.mean(), data.visual_sensor.mean()])
visual_sensor

array([ 0.24,  0.76])

In [547]:
visual_and_smell_cpd = (smell_sensor * visual_sensor)
visual_and_smell_cpd

array([ 0.0768,  0.5168])

In [548]:
fruit_qual_pcts

array([ 0.11161388,  0.88838612])

In [549]:
sensor_inspect = (smell_sensor * visual_sensor) * fruit_qual_pcts
sensor_inspect

array([ 0.00857195,  0.45911795])

### No inspection, accept order node:

#### This is a combination of the fruit quality and the customer acceptance leaf nodes. We combine them below using a dot product

In [550]:
joint_prob_qual_accept = np.dot(fruit_qual_pcts, marginalized_cust_accept_trans)
print('No inspection, accept order node: {}'.format(joint_prob_qual_accept))

No inspection, accept order node: [ 0.13705907  0.86294093]


### Manual inspect node:

#### This is a combination of the inspector accuracy and the fruit quality nodes. We combine them below using a dot product

In [551]:
inspector_accuracy = data.inspector_accuracy.mean()
inspector_accuracy

0.961

In [552]:
fruit_qual_pcts

array([ 0.11161388,  0.88838612])

In [553]:
fruit_qual_inspect_accur_dot_prod =np.dot(inspector_accuracy,fruit_qual_pcts)
fruit_qual_inspect_accur_dot_prod

array([ 0.10726094,  0.85373906])

In [554]:
fruit_qual_inspect_accur_dot_prod_marginal = fruit_qual_inspect_accur_dot_prod/fruit_qual_inspect_accur_dot_prod.sum()
fruit_qual_inspect_accur_dot_prod_marginal

array([ 0.11161388,  0.88838612])

### Manual, accept order node:

#### This is a combination of the manual inspect and the customer acceptance leaf nodes. We combine them below using a dot product

In [555]:
fruit_qual_inspect_accur_dot_prod_marginal

array([ 0.11161388,  0.88838612])

In [556]:
marginalized_cust_accept_trans

| | accept0 | accept1 |
|----|----|----|
| 0 | 0.876543 | 0.123457 |
| 1 | 0.044153 | 0.955847 |


In [557]:
marginalized_cust_accept_trans * fruit_qual_inspect_accur_dot_prod_marginal

| | accept0 | accept1 |
|----|----|----|
| 0 | 0.097834 | 0.109677 |
| 1 | 0.004928 | 0.849161 |


In [558]:
np.dot(marginalized_cust_accept_trans, fruit_qual_inspect_accur_dot_prod_marginal)

array([ 0.20751168,  0.8540895 ])

### Sensor inspect accept order node:

#### This is a combination of the sensor inspect and the customer acceptance leaf nodes. We combine them below using a dot product

In [559]:
marginalized_cust_accept_trans

| | accept0 | accept1 |
|----|----|----|
| 0 | 0.876543 | 0.123457 |
| 1 | 0.044153 | 0.955847 |


In [560]:
sensor_inspect

array([ 0.00857195,  0.45911795])

In [561]:
sensor_inspect_accept_order_node = np.dot(marginalized_cust_accept_trans, sensor_inspect)
sensor_inspect_accept_order_node

array([ 0.06419491,  0.43922511])

In [562]:
sensor_inspect_accept_order_node_marginal = sensor_inspect_accept_order_node/sensor_inspect_accept_order_node.sum()
sensor_inspect_accept_order_node_marginal

array([ 0.12751759,  0.87248241])

### Manual inspect after sensor inspect node

#### This is a combination of the sensor inspect and the inspector accuracy leaf nodes. We combine them below using a dot product

In [563]:
sensor_inspect

array([ 0.00857195,  0.45911795])

In [564]:
inspector_accuracy

0.961

In [565]:
manual_inspect_after_sensor_inspect_node = np.dot(inspector_accuracy,sensor_inspect)
manual_inspect_after_sensor_inspect_node

array([ 0.00823764,  0.44121235])

In [566]:
manual_inspect_after_sensor_inspect_node_marginal = manual_inspect_after_sensor_inspect_node/manual_inspect_after_sensor_inspect_node.sum()
manual_inspect_after_sensor_inspect_node_marginal

array([ 0.01832827,  0.98167173])

### Sensor manual inspect accept order node

#### This is a combination of the manual_inspect_after_sensor_inspect_node and the customer acceptance leaf node. We combine them below using a dot product

In [567]:
manual_inspect_after_sensor_inspect_node_marginal

array([ 0.01832827,  0.98167173])

In [568]:
marginalized_cust_accept_trans

| | accept0 | accept1 |
|----|----|----|
| 0 | 0.876543 | 0.123457 |
| 1 | 0.044153 | 0.955847 |


In [569]:
sensor_manual_inspect_accept_node = np.dot(marginalized_cust_accept_trans, manual_inspect_after_sensor_inspect_node_marginal)
sensor_manual_inspect_accept_node

array([ 0.13725956,  0.93913747])

In [570]:
sensor_manual_inspect_accept_node_marginal = sensor_manual_inspect_accept_node/sensor_manual_inspect_accept_node.sum()
sensor_manual_inspect_accept_node_marginal

array([ 0.12751759,  0.87248241])

# 3. Using the conditional probability distributions and the utility functions, compute and compare the utilities of the four possible inspection methods:
  - No inspection whatsoever. 
  - Inspection by human inspectors only.
  - Inspection with sensors only.
  - Inspection first with sensors and then with humans for cases where the sensors indicate the fruit may be bad. 

# U2 --> No inspection whatsoever. 

Here we take two approaches: one using the dot product of the input nodes, and the other explicitly doing the calculations by hand in order to double check our approach.

In [598]:
fruit_qual_pcts

array([ 0.11161388,  0.88838612])

### U2


In [599]:
joint_prob_qual_accept = np.dot(fruit_qual_pcts, marginalized_cust_accept_trans)
joint_prob_qual_accept

array([ 0.13705907,  0.86294093])

In [600]:
utility_no_inspection = [-40, 20]

In [601]:
(joint_prob_qual_accept * utility_no_inspection)

array([ -5.48236284,  17.25881858])

In [602]:
print('Utility of no human inspection: {}'.format((joint_prob_qual_accept * utility_no_inspection).sum()))

Utility of no human inspection: 11.77645573592974


# U2


In [576]:
good_given_inspector_inaccurate = 838 * (1-inspector_accuracy)
good_given_inspector_inaccurate

32.68200000000003

In [577]:
good_given_inspector_accurate = 838 * inspector_accuracy
good_given_inspector_accurate

805.318

In [578]:
good_given_inspector_accurate/(good_given_inspector_inaccurate + good_given_inspector_accurate)

0.961

In [579]:
fruit_qual_pcts

array([ 0.11161388,  0.88838612])

In [580]:
prob_given_accuracy = fruit_qual_pcts * inspector_accuracy
prob_given_accuracy

array([ 0.10726094,  0.85373906])

In [581]:
prob_given_accuracy_marginal = prob_given_accuracy/prob_given_accuracy.sum()
prob_given_accuracy_marginal

array([ 0.11161388,  0.88838612])

In [582]:
utility_inspection = [-10]

In [583]:
prob_given_accuracy_marginal * utility_inspection

array([-1.11613876, -8.88386124])

In [584]:
(prob_given_accuracy * utility_inspection).sum()

-9.6099999999999994

# U3 Calc --> Inspection by human inspectors only.


In [585]:
prob_given_accuracy_marginal

array([ 0.11161388,  0.88838612])

In [586]:
marginalized_cust_accept_trans

| | accept0 | accept1 |
|----|----|----|
| 0 | 0.876543 | 0.123457 |
| 1 | 0.044153 | 0.955847 |


In [587]:
probabilities_of_accepting_updated = prob_given_accuracy_marginal * marginalized_cust_accept_trans
probabilities_of_accepting_updated

| | accept0 | accept1 |
|----|----|----|
| 0 | 0.097834 | 0.109677 |
| 1 | 0.004928 | 0.849161 |


In [588]:
marginal_probs =  probabilities_of_accepting_updated.sum(axis=1)
marginal_probs = marginal_probs/marginal_probs.sum()
marginal_probs

0    0.19547
1    0.80453
dtype: float64

In [589]:
# Double check
np.dot(marginalized_cust_accept_trans, prob_given_accuracy_marginal)

array([ 0.20751168,  0.8540895 ])

In [590]:
manual_inspection_utilities = np.array([-50, 10])

In [591]:
manual_inspection_utilities * marginal_probs

0   -9.773524
1    8.045295
dtype: float64

In [592]:
u3_expected_utility = (manual_inspection_utilities * marginal_probs).sum()
u3_expected_utility

-1.7282283530302731

In [593]:
print('Utility of inspection by human inspectors only: {}'.format(u3_expected_utility))

Utility of inspection by human inspectors only: -1.7282283530302731


In [594]:
# We then recompute the utilities below
calc_u4()

sensor_inspect_accept_order_node_marginal:  [ 0.12751759  0.87248241]
probability it will be sent for manual inspection: 0.12751759387032627

 manual_inspect_node:  0.122544407709

 manual_inspect_node_marginal:  [ 0.12254441  0.87745559]

 marginalized_cust_accept_trans     accept0   accept1
0  0.876543  0.123457
1  0.044153  0.955847

 probabilities_of_cust_accepting_sensor_manual:
     accept0   accept1
0  0.107415  0.108328
1  0.005411  0.838714

 marginal_probs: 0    0.19547
1    0.80453
dtype: float64

 manual_inspection_utilities: [-50  10]

 expected_utility: -1.72822835303

 double check expected_utility:  -1.72822835303
Utility of no human inspection: -1.7282283530302731


In [595]:
calc_u5()

sensor_manual_inspect_accept_node:  [ 0.13725956  0.93913747]

 utilities: [-40  20]

 expected_utility_u5: 13.2923670893

 double check: 13.2923670893


# Summary of utilities:


In [604]:
print('Utility of inspection by human inspectors only: {}'.format(u3_expected_utility))
print('Inspection with sensors, then with humans: {}'.format(expected_utility.sum()))
print('Utility of no human inspection: {}'.format((joint_prob_qual_accept * utility_no_inspection).sum()))
print('Inspection with sensors only: {}'.format(expected_utility_u5))


Utility of inspection by human inspectors only: -1.7282283530302731
Inspection with sensors, then with humans: -3.978050617065266
Utility of no human inspection: 11.77645573592974
Inspection with sensors only: 12.828245309570148


### Summary of challenge

* The challenge was extremely interesting and allowed us to explore many different approaches and deal with the tradeoffs and challenges that come from structuring the problem in different ways.
* The decision of when to marginalize and when not to was a large source of confusion.
* Decisions about when to transpose a matrix and when not to were also a challenge.
* Overall it was a great learning experience.