### Conjunction Fallacy: 

- A formal fallacy that occurs when a conjunction of specific conditions is assumed to be more probable than a single, general one.

Example: 

```
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which is more probable?

- Linda is a bank teller.

- Linda is a bank teller and is active in the feminist movement.
```

The second option might seem more consistent with the description, but it can't be more probable.
- If we find 1000 people who fit Linda's description and 10 of them work as bank tellers, at most 10 are also feminists (so the probability is at most equal, never greater).
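The counting argument can be sketched in a few lines. The counts below are invented purely for illustration (they are not survey data); the only point is that the conjunction counts a subset of the tellers, so its probability cannot exceed theirs.

```python
# Hypothetical population of 1000 people who fit Linda's description.
# All counts are made up for illustration.
n_people = 1000
n_tellers = 10            # bank tellers
n_feminist_tellers = 7    # tellers who are also feminists; cannot exceed n_tellers

p_teller = n_tellers / n_people
p_teller_and_feminist = n_feminist_tellers / n_people

# The conjunction describes a subset of the tellers, so it is never more probable.
assert p_teller_and_feminist <= p_teller
print(p_teller, p_teller_and_feminist)
```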

In [1]:
# Load the data file
# Source: https://github.com/AllenDowney/ThinkBayes2/blob/master/notebooks/chap01.ipynb

from os.path import basename, exists

def download(url):
    filename = basename(url)
    if not exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)
    
download('https://github.com/AllenDowney/BiteSizeBayes/raw/master/gss_bayes.csv')

In [2]:
import pandas as pd

gss = pd.read_csv('gss_bayes.csv', index_col=0)
gss.head()

| caseid | year | age | sex | polviews | partyid | indus10 |
|---|---|---|---|---|---|---|
| 1 | 1974 | 21.0 | 1 | 4.0 | 2.0 | 4970.0 |
| 2 | 1974 | 41.0 | 1 | 5.0 | 0.0 | 9160.0 |
| 5 | 1974 | 58.0 | 2 | 6.0 | 1.0 | 2670.0 |
| 6 | 1974 | 30.0 | 1 | 5.0 | 4.0 | 6870.0 |
| 7 | 1974 | 48.0 | 1 | 5.0 | 4.0 | 7860.0 |


### Basic Probability:

In the `indus10` column, code 6870 is "banking and related activities".

Thoughts:
- I appreciate the simplicity of the definition of probability here. Similar to Statistical Rethinking, at the most basic level we are just counting the subset that meets a condition relative to some larger set, which is a nice way to approach what can be a complex topic.

In [3]:
banker = (gss['indus10'] == 6870)
banker.sum() # total count 

728

In [4]:
banker.mean() # fraction of bankers, or probability of getting a banker when choosing randomly

0.014769730168391155
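The reason `.mean()` gives a probability is that True counts as 1 and False as 0, so the mean of a boolean Series is the count of True values divided by the total. A minimal sketch on a toy Series (hypothetical values, not taken from the GSS file):

```python
import pandas as pd

# Toy boolean Series (hypothetical data for illustration only).
is_banker_toy = pd.Series([True, False, False, True, False])

fraction_by_count = is_banker_toy.sum() / len(is_banker_toy)  # True values over total rows
fraction_by_mean = is_banker_toy.mean()                       # True counts as 1, False as 0

# Both routes give the same fraction.
assert abs(fraction_by_count - fraction_by_mean) < 1e-12
```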

In [5]:
def prob(A):
    """Computes the probability of a proposition, A"""
    return A.mean()

# Probability of female
# Note: higher than in the US population, since the survey doesn't include
# people in institutions such as prisons and the military, which
# tend to have a higher proportion of men.
female = (gss['sex'] == 2)
prob(female)

0.5378575776019476

In [6]:
# Liberal is 1-3
liberal = (gss['polviews'] <= 3)
prob(liberal) 

0.27374721038750255

In [7]:
# dems
democrat = (gss['partyid'] <= 1)
prob(democrat)


0.3662609048488537

### Conjunction: 

This is just the logical `and` operation.

If we have two propositions, `A` and `B`, then the conjunction `A and B` is True if both `A` and `B` are True; otherwise it is False.

In [8]:
import numpy as np
A = np.array([1,1,0])
B = np.array([0,1,1])
A & B

array([0, 1, 0])

In [9]:
# We can now revisit the Linda problem above, using Democrat in place of feminist
prob(banker)

0.014769730168391155

In [10]:
prob(democrat)

0.3662609048488537

In [11]:
# conjunction: Banker and Democrat:
prob(banker & democrat)

0.004686548995739501

#### Comments:

As we'd expect, `prob(banker & democrat)` is less than `prob(banker)`, since not all bankers are Democrats.

We expect conjunction to be commutative: `A & B` should be equal to `B & A`.
- This is clear here, since the total number of Trues from the conjunction does not change with the ordering.
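Commutativity can be checked directly on a pair of small boolean arrays. The arrays and the names `a_toy` and `b_toy` are hypothetical, chosen so they don't overwrite anything defined earlier in the notebook.

```python
import numpy as np

# Toy boolean arrays (hypothetical values for illustration).
a_toy = np.array([True, True, False, False])
b_toy = np.array([False, True, True, False])

a_and_b = a_toy & b_toy
b_and_a = b_toy & a_toy

# Element-wise `and` gives the same array in either order,
# so the fraction of True values (the probability) matches too.
assert np.array_equal(a_and_b, b_and_a)
assert a_and_b.mean() == b_and_a.mean()
```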

### Conditional Probability: 

We now move to a probability that is dependent on a condition. 

- What is the prob of being a Democrat, given they are liberal?
- What is the prob of being female, given they are a banker?

A conditional probability asks how likely proposition A is once we know that proposition B is true. E.g. given that the person is liberal, how does this change the probability of being a Democrat?

Basic Steps:
1) Select all respondents who are liberal

2) Compute the fraction of selected respondents who are Democrats

In [12]:
# subset to libs
liberal = gss[gss['polviews'] <= 3]

# within liberals, who is democratic?
dem_lib = (liberal['partyid'] <= 1)

# pass in prob
prob(dem_lib)

0.5206403320240125

#### 

What is the probability that a respondent is female given they are a banker?

In [13]:
banker = gss[gss['indus10'] == 6870] # subset to bankers
fem_banker = (banker['sex'] == 2)
prob(fem_banker)

0.7706043956043956

In [14]:
def conditional(proposition, given):
    """Probability of `proposition`, conditioned on `given`"""
    return prob(proposition[given])

# banker given - series of T/F
banker = (gss['indus10'] == 6870)

# female proposition - series of T/F
female = (gss['sex'] == 2)

conditional(female, banker)

0.7706043956043956

In [15]:
# Liberal given female
conditional((gss['polviews'] <= 3), given = female)

0.27581004111500884

#### More Thoughts:

- Conditional Probability is not commutative

In [16]:
# demonstration that conditional probability is not commutative

# given banker, likelihood of female
print(conditional(female, banker))

# given female, likelihood of banker
print(conditional(banker, female))

0.7706043956043956
0.02116102749801969


### Condition and Conjunction:

In [17]:
liberal = (gss['polviews'] <= 3)
democrat = (gss['partyid'] <= 1)

# given lib & dem, likelihood of being female
print(conditional(female, given=liberal & democrat))

# given banker, likelihood of lib & fem
print(conditional(liberal & female, given = banker))


0.576085409252669
0.17307692307692307


### Laws of Prob: 

- T1: Using a conjunction to compute a conditional prob

$P(A|B) = \frac{P(A \&\ B)}{P(B)}$

Answers the question: What fraction of bankers are female?
- Solve as a ratio of two probabilities:
    - respondents who are female bankers over respondents who are bankers

- Intuition:
    - We start by finding the population rate of female & banker.
    - We then normalize this by the population rate of banker: among bankers, what fraction are female?

In [18]:
# approach 1
female[banker].mean()

0.7706043956043956

In [19]:
# approach 2:
conditional(female, given = banker)

0.7706043956043956

In [20]:
# approach 3: ratio 
prob(female & banker) / prob(banker)

0.7706043956043956

In [21]:
prob(female & banker)

0.011381618989653074
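Plugging the rounded values computed above into T1 shows the ratio explicitly:

$$P(\text{female} \mid \text{banker}) = \frac{P(\text{female} \&\ \text{banker})}{P(\text{banker})} \approx \frac{0.011382}{0.014770} \approx 0.7706$$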

T2: Using a conditional probability to compute a conjunction

$P(A \&\ B) = P(B) P(A|B)$

Intuition:
- We know the probability of A given that B occurred.
- To get the conjunction A & B, we multiply it by the probability of proposition B.


In [22]:
# method 1
prob(liberal & democrat)

0.1425238385067965

In [23]:
# method 2
prob(democrat) * conditional(liberal, democrat)

0.1425238385067965

T3: Bayes's Theorem, using `conditional(B, A)` to compute `conditional(A, B)`

$P(A|B) = \frac{P(A)P(B|A)}{P(B)}$

Intuition: 
- The numerator is just `P(A & B)` (T2 with A and B swapped, plus commutativity of conjunction).
- We then normalize by dividing by P(B), which is just the probability of proposition B.
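The step from T2 to Bayes's Theorem can be written out as a short derivation. It uses only T2 and the commutativity of conjunction, and it assumes $P(B) > 0$ so that the division is defined:

$$P(A \&\ B) = P(B)\,P(A|B), \qquad P(B \&\ A) = P(A)\,P(B|A)$$

$$P(A \&\ B) = P(B \&\ A) \;\Rightarrow\; P(B)\,P(A|B) = P(A)\,P(B|A) \;\Rightarrow\; P(A|B) = \frac{P(A)\,P(B|A)}{P(B)}$$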

In [24]:
conditional(liberal, given=banker)

0.2239010989010989

In [25]:
prob(liberal) * conditional(banker, liberal) / prob(banker)

0.2239010989010989

### Law of Total Probability: 

The version below assumes B is split into two parts, B1 and B2:

$P(A) = P(B_{1} \&\ A) + P(B_{2} \&\ A)$

Applies if:
- B1 & B2 are mutually exclusive (can't both be true)
- Collectively exhaustive (one must be true) 
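Both conditions can be tested on boolean arrays before the law is applied. The sketch below uses made-up toy arrays, and the names `a`, `b1`, and `b2` are hypothetical stand-ins for A, B1, and B2.

```python
import numpy as np

# Toy data (hypothetical): `a` is the proposition of interest,
# `b1` and `b2` split the same six respondents into two groups.
a = np.array([True, False, True, True, False, False])
b1 = np.array([True, True, True, False, False, False])
b2 = ~b1  # complement of b1

mutually_exclusive = not (b1 & b2).any()        # never both True
collectively_exhaustive = bool((b1 | b2).all()) # at least one is always True

# Law of total probability: P(A) = P(B1 & A) + P(B2 & A)
total = (b1 & a).mean() + (b2 & a).mean()

assert mutually_exclusive and collectively_exhaustive
assert np.isclose(a.mean(), total)
```

If either condition failed (for example, rows where neither `b1` nor `b2` is True), the two conjunctions would no longer add up to `a.mean()`.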

In [26]:
male = (gss['sex'] == 1)
female = (gss['sex'] == 2)

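# Compare with a tolerance rather than ==, since floating-point sums can differ in the last digit
assert np.isclose(prob(banker), prob(male & banker) + prob(female & banker)), "Law of total prob failed!"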

#### 

MECE - Mutually Exclusive and Collectively Exhaustive

Applying Theorem 2 (T2) to each term, we can also write it this way:

$P(A) = P(B_{1})P(A|B_{1}) + P(B_{2})P(A|B_{2})$

In [27]:
(prob(male) * conditional(banker, given=male) +
prob(female) * conditional(banker, given=female))

0.014769730168391153

And as a summation: 

$P(A) = \sum_{i} P(B_{i})P(A|B_{i})$

In [28]:
# Gather political views (1 - 7) as B
B = gss['polviews']
B.value_counts().sort_index()

1.0     1442
2.0     5808
3.0     6243
4.0    18943
5.0     7940
6.0     7319
7.0     1595
Name: polviews, dtype: int64

In [29]:
%%timeit
# Generator expression
sum(prob(B==i) * conditional(banker, B==i)
    for i in range(1, 8))

2.8 ms ± 11.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [30]:
%%timeit
# List comp
sum([prob(B==i) * conditional(banker, B==i) 
     for i in range(1, 8)])

2.8 ms ± 6.93 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [31]:
%%timeit

# Loop
my_sum = 0
for i in range(1,8):
    my_sum += prob(B==i) * conditional(banker, B==i) 

2.81 ms ± 15 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
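All three versions loop over the categories in Python. As an alternative sketch, the same sum can be built from pandas `value_counts(normalize=True)` and `groupby(...).mean()`. The toy Series below are hypothetical, and this version was not timed, so no speed claim is implied.

```python
import pandas as pd

# Toy data (hypothetical): a categorical variable and a boolean proposition.
views_toy = pd.Series([1, 1, 2, 2, 2, 3, 3, 3, 3, 3])
banker_toy = pd.Series([True, False, False, True, False,
                        False, False, True, False, False])

p_view = views_toy.value_counts(normalize=True).sort_index()  # P(B_i) for each category
p_banker_given_view = banker_toy.groupby(views_toy).mean()    # P(A | B_i) for each category

# Sum over categories of P(B_i) * P(A | B_i); the two Series align on the category labels.
total_prob = (p_view * p_banker_given_view).sum()

assert abs(total_prob - banker_toy.mean()) < 1e-12
```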
