# Notes

## Set-up

### Packages

In [1]:
import os
from urllib.request import urlretrieve

import numpy as np
import pandas as pd

from const import DATA_DIR

### Load Data

In [2]:
def get_data_path(filename):
    return os.path.join(DATA_DIR, filename)


def download(url):
    """Download url into DATA_DIR unless the file is already there."""
    filename = os.path.basename(url)
    filepath = get_data_path(filename)
    if not os.path.exists(filepath):
        urlretrieve(url, filepath)
        print('Downloaded ' + filename)

    
download("https://github.com/AllenDowney/ThinkBayes2/raw/master/data/gss_bayes.csv")

gss = pd.read_csv(get_data_path("gss_bayes.csv"), index_col=0)
gss.head()

| caseid | year | age | sex | polviews | partyid | indus10 |
|---|---|---|---|---|---|---|
| 1 | 1974 | 21.0 | 1 | 4.0 | 2.0 | 4970.0 |
| 2 | 1974 | 41.0 | 1 | 5.0 | 0.0 | 9160.0 |
| 5 | 1974 | 58.0 | 2 | 6.0 | 1.0 | 2670.0 |
| 6 | 1974 | 30.0 | 1 | 5.0 | 4.0 | 6870.0 |
| 7 | 1974 | 48.0 | 1 | 5.0 | 4.0 | 7860.0 |


## Probability

**Definition:** A *probability* is a fraction of a finite set.

*Example:* The probability that a GSS respondent is male is the fraction of the respondents who are male.

In [3]:
def prob(A):
    """Probability of an event, A, a boolean Series."""
    return A.mean()


male = (gss["sex"] == 1)
prob(male)

0.46214242239805237
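
To see why `A.mean()` gives a fraction, here is a minimal plain-Python sketch on a made-up list of booleans. The names `toy_male` and `prob_list` are hypothetical and the values are not GSS data. Since `True` counts as 1 and `False` as 0, the mean of the booleans is the number of `True` values divided by the size of the set.

```python
# Hypothetical toy data: True means "respondent is male" (not GSS values).
toy_male = [True, False, False, True, False]


def prob_list(A):
    """Fraction of True values in a non-empty list of booleans."""
    return sum(A) / len(A)


toy_prob = prob_list(toy_male)
print(toy_prob)  # 2 of the 5 values are True
```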

## Conjunction

**Conjunction**: The logical `and` operation. The probability of the conjunction `A` and `B` is the fraction of the set for which both `A` and `B` are true.

In [4]:
banker = (gss['indus10'] == 6870)
prob(male & banker)

0.003388111178738081

Conjunction is commutative:

In [5]:
prob(banker & male)

0.003388111178738081

## Conditional Probability

*Example:* What is the probability that a respondent is male, given that they are a banker? We interpret this as: "Of all the respondents who are bankers, what fraction are male?"

To answer this we:
- Restrict to bankers.
- Compute the fraction that are male.

In [6]:
prob(male[banker])

0.22939560439560439

In [7]:
def conditional(proposition, given):
    """Probability of the proposition, conditional on given."""
    return prob(proposition[given])

conditional(male, given=banker)

0.22939560439560439

Conditional probability isn't commutative:

In [8]:
conditional(banker, given=male)

0.007331313929496466
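
The asymmetry comes from the denominator: both conditionals count the same male bankers, but one divides by the number of bankers and the other by the number of males. A plain-Python sketch of this, assuming hypothetical toy lists (`toy_male`, `toy_banker`) and a hypothetical helper `conditional_list` that mirrors `conditional`:

```python
# Hypothetical toy data, one entry per respondent (not GSS values).
toy_male = [True, True, True, False, False, False, False, False]
toy_banker = [True, False, False, True, True, True, False, False]


def conditional_list(proposition, given):
    """Fraction of entries where proposition holds, among those where given holds."""
    kept = [p for p, g in zip(proposition, given) if g]
    return sum(kept) / len(kept)


# One male banker in both cases, but 4 bankers versus 3 males.
p_male_given_banker = conditional_list(toy_male, toy_banker)
p_banker_given_male = conditional_list(toy_banker, toy_male)
print(p_male_given_banker, p_banker_given_male)
```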

## Condition and Conjunction

We can combine conditions and conjunctions:

In [9]:
liberal = (gss['polviews'] <= 3)

conditional(liberal, given=banker & male)

0.2215568862275449

In [10]:
conditional(liberal & male, given=banker)

0.050824175824175824
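
The two results above are linked. Writing $A$ for liberal, $B$ for male, and $C$ for banker (labels chosen here for illustration), moving a proposition from the condition into the conjunction multiplies by its conditional probability:

\begin{equation}
    P(A \text{ and } B | C) = P(A | B \text{ and } C) \cdot P(B | C).
\end{equation}

With the values computed so far, $0.2216 \times 0.2294 \approx 0.0508$, which agrees with the output of `conditional(liberal & male, given=banker)`.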

## Laws of Probability

### Theorem 1

\begin{equation}
    P(A|B) = \frac{P(A \text{ and } B)}{P(B)}.
\end{equation}

In [11]:
print(conditional(male, given=banker))
print(prob(male & banker) / prob(banker))

0.22939560439560439
0.22939560439560439


### Theorem 2

\begin{equation}
    P(A \text{ and } B) = P(A|B) \cdot P(B).
\end{equation}

In [12]:
print(prob(male & banker))
print(conditional(male, given=banker) * prob(banker))

0.003388111178738081
0.003388111178738081


### Theorem 3 (Bayes)

\begin{equation}
    P(A|B) \cdot P(B) = P(B|A) \cdot P(A)
\end{equation}

or 

\begin{equation}
    P(A|B)= \frac{P(B|A) \cdot P(A)}{P(B)}.
\end{equation}

In [13]:
print(conditional(male, given=banker) * prob(banker))
print(conditional(banker, given=male) * prob(male))

0.003388111178738081
0.003388111178738081
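
Theorem 3 follows from Theorem 2 and the commutativity of conjunction. A short derivation using only the results above:

\begin{align}
    P(A|B) \cdot P(B)
        & = P(A \text{ and } B) \\
        & = P(B \text{ and } A) \\
        & = P(B|A) \cdot P(A).
\end{align}

Dividing both sides by $P(B)$, assuming $P(B) > 0$, gives the second form.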


## The Law of Total Probability

If $B_1$ and $B_2$ are Mutually Exclusive and Collectively Exhaustive (MECE) then

\begin{align}
    P(A)
        & = P(A \text{ and } B_1) + P(A \text{ and } B_2) \\
        & = P(A | B_1) \cdot P(B_1) + P(A | B_2) \cdot P(B_2).
\end{align}

In [14]:
print(prob(banker))
print(prob(male & banker) + prob(~male & banker))

0.014769730168391155
0.014769730168391155


If $B_1,\ldots, B_n$ are MECE then

\begin{equation}
    P(A) = \sum_{i=1}^n P(A | B_i) \cdot P(B_i).
\end{equation}

In [15]:
gss["polviews"].value_counts().sort_index()

polviews
1.0     1442
2.0     5808
3.0     6243
4.0    18943
5.0     7940
6.0     7319
7.0     1595
Name: count, dtype: int64

In [16]:
print(prob(male))

0.46214242239805237


In [17]:
prob_sum = 0
for i in range(1, 8):
    condition = (gss["polviews"] == i)
    prob_sum += conditional(male, given=condition) * prob(condition)

print(prob_sum)

0.4621424223980523


This can be written more concisely using a *generator expression* (analogous to a list comprehension):

In [18]:
polview = gss["polviews"]
sum(conditional(male, given=(polview==i)) * prob(polview==i) for i in range(1, 8))

0.4621424223980523
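
The summed value differs from `prob(male)` only in the last digit, most likely because of floating-point rounding in the repeated multiply-and-add. As a sketch of how to check the law without that noise, the example below uses hypothetical toy data and the standard library's `fractions.Fraction` for exact arithmetic, then `math.isclose` for the float comparison. The names `toy` and `frac` are made up for illustration.

```python
from fractions import Fraction
from math import isclose

# Hypothetical toy data: (is_male, polviews) per respondent, not GSS values.
toy = [(True, 1), (False, 1), (True, 2), (True, 2), (False, 2), (False, 3), (True, 3)]


def frac(flags):
    """Exact fraction of True values in a non-empty list of booleans."""
    return Fraction(sum(flags), len(flags))


# Left-hand side: P(A) computed directly.
direct = frac([m for m, _ in toy])

# Right-hand side: sum of P(A | B_i) * P(B_i) over the MECE groups.
total = Fraction(0)
for view in sorted({v for _, v in toy}):
    in_group = [v == view for _, v in toy]
    male_in_group = [m for m, v in toy if v == view]
    total += frac(male_in_group) * frac(in_group)

print(direct, total)
print(isclose(float(direct), float(total)))
```

With floats, comparing via `isclose` rather than `==` avoids treating a last-digit rounding difference as a failure of the law.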

>Many of the methods in this book are based on discrete distributions, which makes some people worry about numerical errors. But for real-world problems, numerical errors are almost always smaller than modeling errors.
>
>Furthermore, the discrete approach often allows better modeling decisions, and I would rather have an approximate solution to a good model than an exact solution to a bad model.

# Exercises

## Exercise 1

>Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which is more probable?
>
>Linda is a banker.
>
>Linda is a banker and considers herself a liberal Democrat.

To answer this question, compute
- The probability that Linda is a female banker,
- The probability that Linda is a liberal female banker, and
- The probability that Linda is a liberal female banker and a Democrat.

### Solution

In [25]:
female = (gss["sex"] == 2)
banker = (gss['indus10'] == 6870)
liberal = (gss['polviews'] <= 3)
democrat = (gss['partyid'] <= 1)

print(f"Probability Linda is a female banker: {prob(female & banker):.4f}")
print(f"Probability Linda is a liberal female banker: {prob(liberal & female & banker):.4f}")
print(f"Probability Linda is a liberal female banker and a Democrat: {prob(liberal & female & banker & democrat):.4f}")

Probability Linda is a female banker: 0.0114
Probability Linda is a liberal female banker: 0.0026
Probability Linda is a liberal female banker and a Democrat: 0.0012
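
The probabilities shrink with each added condition, which answers the question: a conjunction can never be more probable than any one of its parts, so "banker" is at least as probable as "banker and liberal Democrat". A minimal sketch of that inequality on hypothetical toy lists (not GSS values; `prob_list` is a made-up stand-in for `prob`):

```python
# Hypothetical toy data, one entry per respondent (not GSS values).
toy_banker = [True, True, True, False, False, True]
toy_liberal = [True, False, True, True, False, False]


def prob_list(A):
    """Fraction of True values in a non-empty list of booleans."""
    return sum(A) / len(A)


# Element-wise `and`: True only where both propositions hold.
toy_both = [b and l for b, l in zip(toy_banker, toy_liberal)]

p_banker = prob_list(toy_banker)
p_both = prob_list(toy_both)
print(p_both <= p_banker)
```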


## Exercise 2

Use `conditional` to compute the following probabilities:

- What is the probability that a respondent is liberal, given that they are a Democrat?
- What is the probability that a respondent is a Democrat, given that they are liberal?

Think carefully about the order of the arguments you pass to `conditional`.

### Solution

In [26]:
print(f"Probability respondent is liberal, given they are a Democrat: {conditional(liberal, given=democrat):.2f}")
print(f"Probability respondent is a Democrat, given they are liberal: {conditional(democrat, given=liberal):.2f}")

Probability respondent is liberal, given they are a Democrat: 0.39
Probability respondent is a Democrat, given they are liberal: 0.52


## Exercise 3

There’s a famous quote about young people, old people, liberals, and conservatives that goes something like:

>If you are not a liberal at 25, you have no heart. If you are not a conservative at 35, you have no brain.

Use `prob` and `conditional` to compute the following probabilities.

- What is the probability that a randomly chosen respondent is a young liberal?
- What is the probability that a young person is liberal?
- What fraction of respondents are old conservatives?
- What fraction of conservatives are old?

### Solution

In [30]:
# from the book
young = (gss['age'] < 30)
old = (gss['age'] >= 65)
conservative = (gss['polviews'] >= 5)

print(f"Probability that a randomly chosen respondent is a young liberal: {prob(young & liberal):.3f}")
print(f"Probability that a young person is liberal: {conditional(liberal, given=young):.3f}")
print(f"Fraction of respondents that are old conservatives: {prob(old & conservative):.3f}")
print(f"Fraction of conservatives that are old: {conditional(old, given=conservative):.3f}")

Probability that a randomly chosen respondent is a young liberal: 0.066
Probability that a young person is liberal: 0.339
Fraction of respondents that are old conservatives: 0.067
Fraction of conservatives that are old: 0.196
