# Defendant's Fallacy

For more, see this very thorough [treatment](https://www.untrammeledmind.com/2018/12/defense-attorneys-fallacy-a-conditional-probability-problem/) of this problem.

In [1]:
import numpy as np
import pandas as pd
from custom.defendants_fallacy import data_generator, data_tests

In [2]:
# Generate and load data
df = data_generator()
data_tests(df)

No errors!


Note that we fabricated data on $N = 5{,}000{,}000$ couples and assume that the unconditional probability of a wife being murdered is $P(M) = 0.0005$.
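The source of `data_generator` is not shown here, so the following is only a hypothetical reconstruction: one set of cell counts that reproduces every probability quoted in this notebook. The names `cells` and `sketch_df` are made up for illustration, and the real generator may work quite differently.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction: counts chosen to match the probabilities
# quoted in this notebook. The real data_generator() may differ.
# Each tuple is (M, G, A, number of couples with that combination).
cells = [
    (0, 0, 0, 4_372_950),  # wife not murdered, husband not abusive
    (0, 0, 1,   624_550),  # wife not murdered, husband abusive
    (1, 0, 0,     1_800),  # murdered by someone else, no abuse
    (1, 0, 1,       200),  # murdered by someone else, abuse
    (1, 1, 0,       250),  # murdered by husband, no abuse
    (1, 1, 1,       250),  # murdered by husband, abuse
]

values = np.array([c[:3] for c in cells], dtype=np.int8)
counts = np.array([c[3] for c in cells])

# Repeat each (M, G, A) pattern as many times as its count.
sketch_df = pd.DataFrame(np.repeat(values, counts, axis=0),
                         columns=['M', 'G', 'A'])

print(len(sketch_df))
print(sketch_df.mean())
```

Working from counts rather than random draws keeps the relative frequencies exact, which is convenient when the goal is to check textbook probabilities.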

## Motivation 👨🏼‍⚖️

Consider this quote from *Calculated Risks: How to Know When Numbers Deceive You* (Gigerenzer, 2002) regarding the proceedings of the infamous trial of OJ Simpson: 
    
> …the prosecution presented evidence that Simpson had been violent toward his wife, while the defense argued that there was only one woman murdered for every 2,500 women who were subjected to spousal abuse, and that any history of Simpson being violent toward his wife was irrelevant to the trial.

### Question

*Should* this fact be thrown out on the grounds of irrelevance? 

To answer this question, let's use data. Here's what we know using the data on hand and what's been said in court:

- A woman has been murdered, and her husband is accused of having committed the murder. 
- It is known that the man abused his wife repeatedly in the past, and the prosecution argues that this is important evidence pointing towards the man’s guilt. The defense attorney says that the history of abuse is irrelevant, as only 1-in-2500 men who beat their wives end up murdering them.
- Assume that the defense attorney is not committing perjury and the 1-in-2500 figure is correct.
- Our data tell us that half of men who murder their wives previously abused them. 
- Our data also tell us that 20% of murdered married women were killed by their husbands, and that if a woman is murdered and the husband is not guilty, then there is only a 10% chance that the husband abused her. 

## Analysis

Below, we start down the path of trying to figure out whether it matters that the husband abused his wife or not. 

### Step 0

We need to define some events. (Hint: Look at the column headers in our data.) Begin to think about Bayes' Rule...🤔

In [3]:
df.columns

Index(['M', 'G', 'A'], dtype='object')

In [5]:
df.head()

| | M | G | A |
|---|---|---|---|
| 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 |
| 2 | 0 | 0 | 1 |
| 3 | 0 | 0 | 1 |
| 4 | 0 | 0 | 1 |


In [7]:
for col in df:
    display(df[col].unique())

array([0, 1])

array([0, 1])

array([1, 0])

These columns all look like *indicator* random variables. We can also look at the relative frequencies, or the *probabilities*, of each of these events:

In [6]:
df.mean()

M    0.0005
G    0.0001
A    0.1250
dtype: float64
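Why does `.mean()` give a probability? For a 0/1 indicator column, the sum counts the rows where the event occurred, so the mean is exactly the relative frequency. A minimal sketch on a made-up toy column (`toy` is not part of the notebook's data):

```python
import pandas as pd

# Toy indicator column, made up for illustration: 1 = event occurred.
toy = pd.Series([0, 1, 0, 0, 1, 0, 0, 0], name='event')

freq_by_count = (toy == 1).sum() / len(toy)  # rows with the event / all rows
freq_by_mean = toy.mean()                    # mean of the 0/1 column

print(freq_by_count, freq_by_mean)
```

Both expressions give the same number, which is why `df.mean()` can be read column by column as $P(M)$, $P(G)$ and $P(A)$.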

In [22]:
## TODO: Explore the data on your own if you desire.
pd.crosstab(df.M, df.G)

| M \ G | 0 | 1 |
|---|---|---|
| 0 | 4997500 | 0 |
| 1 | 2000 | 500 |
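A crosstab of raw counts can also be turned into conditional probabilities by normalizing each row. The sketch below uses a small made-up frame (`toy`) rather than the notebook's data. It relies on the `normalize='index'` option of `pd.crosstab`, which divides every row by its row total.

```python
import pandas as pd

# Made-up toy data: 6 wives not murdered, 5 murdered, 1 of them by her husband.
toy = pd.DataFrame({
    'M': [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    'G': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
})

# Each row now sums to 1, so row M=1 holds P(G=0 | M=1) and P(G=1 | M=1).
row_shares = pd.crosstab(toy['M'], toy['G'], normalize='index')
print(row_shares)

p_G_given_M_toy = row_shares.loc[1, 1]
```

Applied to the counts in the table above, the `M = 1` row would read 2000/2500 = 0.8 and 500/2500 = 0.2.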


### Step 1 

Inspect the data. What does it mean? Can you confirm the facts drawn from our data (see above)?

- Show that the defense attorney is correct, i.e., $P(G|A) = \frac{1}{2500}$.
- Show that half of men who murder their wives previously abused them. 
- Show that 20% of murdered married women were killed by their husbands.
- Show that if a woman is murdered and the husband is not guilty, then there is only a 10% chance that the husband abused her. 

In [20]:
## TODO: Confirm the facts of the case using the data.

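# P(G|A): share of abusive husbands who murdered their wives
p_G_given_A = df.loc[df['A'] == 1, 'G'].mean()
print('P(G|A) = {}'.format(p_G_given_A))

# P(A|G and M): share of guilty husbands who had abused their wives
p_A_given_G_and_M = df.loc[(df['G'] == 1) & (df['M'] == 1), 'A'].mean()
print('P(A|G and M) = {}'.format(p_A_given_G_and_M))

# P(G|M): share of murdered wives who were killed by their husbands
p_G_given_M = df.loc[df['M'] == 1, 'G'].mean()
print('P(G|M) = {}'.format(p_G_given_M))

# P(A|notG and M): share of murdered wives with an innocent husband who were abused
p_A_given_notG_and_M = df.loc[(df['G'] == 0) & (df['M'] == 1), 'A'].mean()
print('P(A|notG and M) = {}'.format(p_A_given_notG_and_M))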

P(G|A) = 0.0004
P(A|G and M) = 0.5
P(G|M) = 0.2
P(A|notG and M) = 0.1
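All four checks follow the same pattern: filter the rows that satisfy the conditioning events, then take the mean of the target indicator. A small helper can make that pattern explicit. The function `cond_prob` and the frame `toy` below are hypothetical names introduced only for this sketch.

```python
import pandas as pd

def cond_prob(data, target, given):
    """Estimate P(target = 1 | given) as a relative frequency.

    `given` maps column names to required values, e.g. {'M': 1, 'G': 0}.
    """
    mask = pd.Series(True, index=data.index)
    for col, val in given.items():
        mask &= data[col] == val  # keep only rows matching every condition
    return data.loc[mask, target].mean()

# Tiny made-up frame, just to exercise the helper.
toy = pd.DataFrame({'M': [1, 1, 1, 1, 0, 0],
                    'G': [1, 1, 0, 0, 0, 0],
                    'A': [1, 0, 0, 0, 1, 0]})

print(cond_prob(toy, 'A', {'G': 1, 'M': 1}))
print(cond_prob(toy, 'G', {'M': 1}))
```

With the notebook's data, a call such as `cond_prob(df, 'A', {'G': 0, 'M': 1})` should correspond to the last fact checked above.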


### Step 2 

Hope you've been pondering Bayes' Theorem. What is the _prior_ probability that the husband is guilty of murdering his wife?

- Calculate $P(G|M)$.

In [23]:
## TODO: Calculate the prior probability of guilt given the wife has been murdered.
##       Ask yourself if you've already done this calculation...
print('P(G|M) = {}'.format(df[(df['M']==1)]['G'].mean()))

P(G|M) = 0.2


We will need to compare our *posterior* to this prior to see whether the evidence of abuse makes the husband more likely to be guilty. If it does, the defense attorney is trying to pull the wool over the jury's eyes!

### Step 3

Figure out what the posterior probability of guilt is.

- You know that the posterior probability is $P(G|...)$. 
- Think about what event(s) to condition on (i.e., what evidence do we have?).
- Drawing a tree diagram might help.

In [27]:
## TODO: Calculate the posterior probability of guilt. 
##       Do this either analytically or numerically (directly on the data)

#### Numerically

In [30]:
print('P(G|A and M) = {}'.format(np.round(df[(df['A']==1) & (df['M']==1)]['G'].mean(), 4)))

P(G|A and M) = 0.5556
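The same number can be read off a tree of counts, which is one way to act on the tree-diagram hint. The sketch below starts from the 2,500 murdered wives in the crosstab and splits them using the stated facts. The variable names are made up for illustration.

```python
# Root of the tree: murdered wives (2000 + 500 in the crosstab).
murdered = 2500

# First split: was the husband guilty?
guilty = 500                    # 20% of murdered wives
not_guilty = murdered - guilty  # the remaining 2000

# Second split: had the husband abused his wife?
guilty_and_abuse = guilty * 50 // 100          # half of guilty husbands
not_guilty_and_abuse = not_guilty * 10 // 100  # 10% of innocent husbands

# Posterior: among murdered wives with an abusive husband,
# the share whose husband is guilty.
abuse_total = guilty_and_abuse + not_guilty_and_abuse
posterior = guilty_and_abuse / abuse_total

print(guilty_and_abuse, not_guilty_and_abuse, round(posterior, 4))
# → 250 200 0.5556
```

Of the 450 murdered wives with an abusive husband, 250 were killed by that husband, which matches the value computed directly on the data.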


#### Analytically

$$
\begin{split}
& P(G | A \cap M) & & = \frac{P(G \cap A \cap M)}{P(A \cap M)} \\ \\
& & & = \frac{P(G \cap A \cap M)}{P(A | M)P(M)} \\ \\
& & & = \frac{P(A | G \cap M)P(G \cap M)}{P(A | M)P(M)} \\ \\
& & & = \frac{P(A | G \cap M)P(G | M)P(M)}{P(A | M)P(M)} \\ \\
& & & = \frac{P(A | G \cap M)P(G | M)}{P(A | M)} \\ \\
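& & & = \frac{P(A | G \cap M)P(G | M)}{P(A | G \cap M)P(G | M) + P(A | G^c \cap M)P(G^c | M)} \\ \\
& & & = \frac{(0.5)(0.2)}{(0.5)(0.2) + (0.1)(0.8)} = \frac{0.10}{0.18} \approx 0.5556 \\ \\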
\end{split}
$$
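To evaluate the last expression, $P(A | M)$ can be expanded over guilty and not-guilty husbands (law of total probability). Below is a minimal sketch of that calculation using the probabilities confirmed in Step 1. The function name `posterior_guilt` and its parameter names are assumptions made for this example.

```python
def posterior_guilt(p_a_given_g, p_g, p_a_given_not_g):
    """Bayes' rule with the law of total probability in the denominator.

    All inputs are understood to be conditional on M (the wife was murdered).
    """
    numerator = p_a_given_g * p_g
    # P(A|M) = P(A|G,M) P(G|M) + P(A|notG,M) P(notG|M)
    evidence = numerator + p_a_given_not_g * (1 - p_g)
    return numerator / evidence

prior = 0.2  # P(G|M)
posterior = posterior_guilt(p_a_given_g=0.5, p_g=prior, p_a_given_not_g=0.1)

print(round(posterior, 4), round(posterior / prior, 2))
# → 0.5556 2.78
```

The posterior is nearly three times the prior of 0.2, so the history of abuse is far from irrelevant once we condition on the wife having been murdered.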