# Chapter 1. Probability
[Link to chapter online](https://allendowney.github.io/ThinkBayes2/chap01.html)

## Warning

The content of this file may be incorrect, erroneous, and/or harmful. Use it at your own risk.

## Imports

In [None]:
import CSV as Csv
import DataFrames as Dfs
import StatsBase as SB
import Statistics as Stats

## Functionality developed in this chapter

In [None]:
function getProb(v::BitVector)::Float64
    return Stats.mean(v)
end

In [None]:
function getCondProb(
    proposition::BitVector, given::BitVector
    )::Float64
    return getProb(proposition[given])
end
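
To see what these two helpers compute, here is a minimal sketch on made-up data. The vectors `toyIndus` and `toySex` are hypothetical and exist only for illustration; the two functions are repeated so that the snippet runs on its own.

```julia
import Statistics as Stats

# Same definitions as above, repeated so that this snippet is self-contained
function getProb(v::BitVector)::Float64
    return Stats.mean(v)
end

function getCondProb(
    proposition::BitVector, given::BitVector
    )::Float64
    return getProb(proposition[given])
end

# Hypothetical codes for 8 made-up respondents (not taken from the GSS data)
toyIndus = [6870, 6870, 770, 770, 8190, 8190, 770, 8190]
toySex = [2, 1, 2, 2, 1, 1, 2, 1]

toyBanker = (toyIndus .== 6870)  # BitVector, true for 2 of 8 respondents
toyFemale = (toySex .== 2)       # BitVector, true for 4 of 8 respondents

(
    getProb(toyBanker),                # fraction of all respondents: 2/8
    getCondProb(toyFemale, toyBanker)  # fraction of the 2 bankers: 1/2
)
```

The broadcasted comparison `.==` returns a `BitVector`, which is why both helpers accept that type.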

## Linda the Banker

> Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which is more probable?
> 1. Linda is a bank teller.
> 2. Linda is a bank teller and is active in the feminist movement.

## Probability

[Link to the data file (gss_bayes.csv) online](https://raw.githubusercontent.com/AllenDowney/BiteSizeBayes/master/gss_bayes.csv)

In [None]:
gss = Dfs.DataFrame(Csv.File("./gss_bayes.csv"))
first(gss, 5)

## Fraction of bankers

The code for “Banking and related activities” is 6870, so we can select bankers like this:

In [None]:
banker = (gss.indus10 .== 6870)
first(banker, 5)

In [None]:
sum(banker)

In [None]:
Stats.mean(banker)

## The probability function

<pre>
function getProb(v::BitVector)::Float64
    return Stats.mean(v)
end
</pre>

In [None]:
getProb(banker)

Column `gss.sex` coding:
1. male
2. female

In [None]:
female = (gss.sex .== 2);

In [None]:
getProb(female)

## Political Views and Parties

`polviews` coding:
1. Extremely liberal
2. Liberal
3. Slightly liberal
4. Moderate
5. Slightly conservative
6. Conservative
7. Extremely conservative

In [None]:
liberal = (gss.polviews .<= 3);

In [None]:
getProb(liberal)

`partyid` coding:
0. Strong democrat
1. Not strong democrat
2. Independent, near democrat
3. Independent
4. Independent, near republican
5. Not strong republican
6. Strong republican
7. Other party

In [None]:
democrat = (gss.partyid .<= 1);

In [None]:
getProb(democrat)

## Conjunction

“Conjunction” is another name for the logical `and` operation. If you have two propositions, `A` and `B`, the conjunction `A` and `B` is `true` if both `A` and `B` are `true`, and `false` otherwise.
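
For boolean vectors in Julia, the element-wise form of this operation is `.&`. A small sketch with two made-up propositions (the names `toyA` and `toyB` are hypothetical):

```julia
# Two hypothetical propositions evaluated for four made-up respondents
toyA = BitVector([true, true, false, false])
toyB = BitVector([true, false, true, false])

# Element-wise conjunction: true only where both inputs are true
toyBoth = toyA .& toyB

# Swapping the operands gives the same vector
(toyBoth, toyBoth == (toyB .& toyA))
```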

In [None]:
getProb(banker)

In [None]:
getProb(democrat)

In [None]:
getProb(banker .& democrat)

As we should expect, `getProb(banker .& democrat)` is less than `getProb(banker)`, because not all bankers are Democrats.

We expect conjunction to be commutative; that is, `A .& B` should be the same as `B .& A`. To check, we can also compute `getProb(democrat .& banker)`:

In [None]:
getProb(democrat .& banker)

## Conditional Probability

Questions:
- What is the probability that a respondent is a Democrat, given that they are liberal?
- What is the probability that a respondent is female, given that they are a banker?
- What is the probability that a respondent is liberal, given that they are female?

In [None]:
democratGivenLiberal = democrat[liberal];

In [None]:
getProb(democratGivenLiberal)

In [None]:
femaleGivenBanker = female[banker]
getProb(femaleGivenBanker)

In [None]:
getCondProb(liberal, female)

## Conditional Probability is Not Commutative

In [None]:
getCondProb(female, banker)

About 77% of bankers are female.

In [None]:
getCondProb(banker, female)

About 2% of female respondents are bankers.
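
One way to see why the two results differ: both conditional probabilities share the same numerator (the number of female bankers) but divide by different group sizes. The sketch below uses made-up data; all `toy...` names are hypothetical.

```julia
import Statistics as Stats

# Hypothetical toy data: 10 made-up respondents (1 = yes, 0 = no)
toyBanker = ([1, 1, 0, 0, 0, 0, 0, 0, 0, 0] .== 1)
toyFemale = ([1, 0, 1, 1, 1, 1, 0, 0, 0, 0] .== 1)

# Respondents who are both female and bankers
toyBothCount = sum(toyBanker .& toyFemale)

# Same numerator, different denominators
toyFemaleGivenBanker = toyBothCount / sum(toyBanker)  # 1 of 2 bankers
toyBankerGivenFemale = toyBothCount / sum(toyFemale)  # 1 of 5 women

# The subset-then-mean computations agree with the count-based ones
(
    toyFemaleGivenBanker ≈ Stats.mean(toyFemale[toyBanker]),
    toyBankerGivenFemale ≈ Stats.mean(toyBanker[toyFemale])
)
```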

## Condition and Conjunction

In [None]:
getCondProb(female, liberal .& democrat)

About 57% of liberal Democrats are female.

In [None]:
getCondProb(liberal .& female, banker)

About 17% of bankers are liberal women.

## Laws of Probability

The next sections derive three theorems that relate conjunction and conditional probability. With them, we can:
- Theorem 1: use a conjunction to compute a conditional probability.
- Theorem 2: use a conditional probability to compute a conjunction.
- Theorem 3: use `getCondProb(A, B)` to compute `getCondProb(B, A)`.

Theorem 3 is also known as Bayes’s Theorem.

The three theorems can be written mathematically using the following notation:
- $P(A)$ is the probability of proposition $A$.
- $P(A~\mathrm{and}~B)$ is the probability of the conjunction of $A$ and $B$, that is, the probability that both are true.
- $P(A | B)$ is the conditional probability of $A$ given that $B$ is true. The vertical line between $A$ and $B$ is pronounced "given".

### Theorem 1

What fraction of bankers are female?

In [None]:
(
    Stats.mean(female[banker]),
    # or
    getCondProb(female, banker)
)

Another way to compute this conditional probability is to take the ratio of two probabilities:
1. The fraction of respondents who are female bankers, and
2. The fraction of respondents who are bankers.

In other words:

$P(A | B) = \frac{P(A~\mathrm{and}~B)}{P(B)}$

In [None]:
getProb(female .& banker) / getProb(banker)
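
Theorem 1 can also be wrapped in a small function. The sketch below is only an illustration on made-up data: `getCondProbViaConjunction` and the `toy...` vectors are hypothetical names that are not defined elsewhere in this chapter.

```julia
import Statistics as Stats

# Hypothetical helper: Theorem 1 as code,
# P(A | B) = P(A and B) / P(B)
function getCondProbViaConjunction(a::BitVector, b::BitVector)::Float64
    return Stats.mean(a .& b) / Stats.mean(b)
end

# Hypothetical toy data: 8 made-up respondents
toyFemale = ([2, 1, 2, 2, 1, 1, 2, 1] .== 2)
toyBanker = ([1, 1, 1, 0, 0, 0, 0, 0] .== 1)

(
    Stats.mean(toyFemale[toyBanker]),                # direct: subset, then mean
    getCondProbViaConjunction(toyFemale, toyBanker)  # via Theorem 1
)
```

Both entries of the tuple should agree up to floating-point rounding. If `b` is never true, `Stats.mean(b)` is zero and the division yields `NaN`, so the helper assumes $P(B) > 0$.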

### Theorem 2

Start from Theorem 1:

$P(A | B) = \frac{P(A~\mathrm{and}~B)}{P(B)}$

Multiply both sides by $P(B)$:

$P(A|B) ~ P(B) = P(A~\mathrm{and}~B)$

Swap the two sides:

$P(A~\mathrm{and}~B) = P(A|B) ~ P(B)$

Finally, reorder the factors on the right-hand side to obtain Theorem 2:

$P(A~\mathrm{and}~B) = P(B) ~ P(A|B)$

In [None]:
(
    getProb(liberal .& democrat),
    # or
    getProb(democrat) * getCondProb(liberal, democrat)
)
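
The same identity can be traced by hand on a small made-up sample, where the counts are easy to verify. All `toy...` names and values below are hypothetical.

```julia
import Statistics as Stats

# Hypothetical toy data: 10 made-up respondents (1 = yes, 0 = no)
toyDemocrat = ([1, 1, 1, 1, 0, 0, 0, 0, 0, 0] .== 1)
toyLiberal  = ([1, 1, 0, 0, 1, 0, 0, 0, 0, 0] .== 1)

toyPDemocrat = Stats.mean(toyDemocrat)                          # P(B) = 4/10
toyPLiberalGivenDemocrat = Stats.mean(toyLiberal[toyDemocrat])  # P(A|B) = 2/4

(
    Stats.mean(toyLiberal .& toyDemocrat),   # P(A and B), counted directly: 2/10
    toyPDemocrat * toyPLiberalGivenDemocrat  # P(B) P(A|B)
)
```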

### Theorem 3

We have established that conjunction is commutative:

$P(A~\mathrm{and}~B) = P(B~\mathrm{and}~A)$

Applying Theorem 2 to the left-hand side gives:

$P(A~\mathrm{and}~B) = P(B)~P(A|B)$

and, by analogy, to the right-hand side:

$P(B~\mathrm{and}~A) = P(A)~P(B|A)$

Because the two conjunctions are equal, so are the two products:

$P(B)~P(A|B) = P(A)~P(B|A)$

Here’s one way to interpret that: if you want to check $A$ and $B$, you can do it in either order:
1. You can check $B$ first, then $A$ conditioned on $B$, or
2. You can check $A$ first, then $B$ conditioned on $A$.

If we divide both sides by $P(B)$ we get Theorem 3:

$P(A|B) = \frac{P(A)P(B|A)}{P(B)}$

That is **Bayes's Theorem**.

In [None]:
# fraction of bankers who are liberal

(
    getCondProb(liberal, banker),
    # or
    getProb(liberal) * getCondProb(banker, liberal) / getProb(banker)
)
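
When only the three probabilities are known, rather than the underlying data, Bayes's Theorem can be applied to plain numbers. The following is a minimal sketch: the function `bayes` is a hypothetical helper, and the input values are made up for illustration.

```julia
# Hypothetical helper: Bayes's Theorem as code,
# P(A | B) = P(A) P(B | A) / P(B)
function bayes(pA::Float64, pBGivenA::Float64, pB::Float64)::Float64
    return pA * pBGivenA / pB
end

# Made-up probabilities, chosen only so that the arithmetic is easy to follow
toyPA = 0.3        # P(A)
toyPBGivenA = 0.5  # P(B | A)
toyPB = 0.25       # P(B)

bayes(toyPA, toyPBGivenA, toyPB)  # 0.3 * 0.5 / 0.25, about 0.6
```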

## The Law of Total Probability

The law of total probability, expressed mathematically:

$P(A) = P(B_1 ~\mathrm{and}~A) + P(B_2 ~\mathrm{and}~A)$

This holds under two assumptions about the events $B_1$ and $B_2$. They must be:
- Mutually exclusive (only one of them can be true), and
- Collectively exhaustive (one of them must be true)
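
Both assumptions can be checked directly on boolean vectors. The sketch below uses made-up sex codes; `toySex`, `toyMale`, and `toyFemale` are hypothetical names.

```julia
# Hypothetical toy data: sex codes for 6 made-up respondents
toySex = [1, 2, 2, 1, 2, 1]
toyMale = (toySex .== 1)
toyFemale = (toySex .== 2)

# Mutually exclusive: no respondent satisfies both propositions
mutuallyExclusive = !any(toyMale .& toyFemale)

# Collectively exhaustive: every respondent satisfies at least one of them
collectivelyExhaustive = all(toyMale .| toyFemale)

(mutuallyExclusive, collectivelyExhaustive)
```

If either check fails, the sum of the pieces would generally not equal $P(A)$.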

In [None]:
# P(A), computed directly
getProb(banker)

In [None]:
male = (gss.sex .== 1)
# P(A), computed with the law of total probability
getProb(male .& banker) + getProb(female .& banker)

Recall Theorem 2:

$P(A~\mathrm{and}~B) = P(B) ~ P(A|B)$

Conjunction is commutative, as established earlier, so $P(A~\mathrm{and}~B) = P(B~\mathrm{and}~A)$ and we can write:

$P(B~\mathrm{and}~A) = P(B) ~ P(A|B)$

Substituting the right-hand side for each term $P(B_i~\mathrm{and}~A)$ in the law of total probability:

$P(A) = P(B_1 ~\mathrm{and}~A) + P(B_2 ~\mathrm{and}~A)$

gives:

$P(A) = P(B_1)P(A|B_1) + P(B_2)P(A|B_2)$

In [None]:
# let's test the last formula
(getProb(male) * getCondProb(banker, male)) +
(getProb(female) * getCondProb(banker, female))

For any number of mutually exclusive and collectively exhaustive events $B_i$, the last formula can be written more compactly as:

$P(A) = \sum_i P(B_i) P(A|B_i)$
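
The summation translates naturally into a loop over the pieces. The following is a sketch under stated assumptions: `getTotalProb` is a hypothetical helper (not defined elsewhere in this chapter), and the toy data is made up.

```julia
import Statistics as Stats

# Hypothetical helper:
# P(A) = sum over i of P(B_i) * P(A | B_i),
# where `parts` holds one BitVector per B_i. The B_i are assumed to be
# mutually exclusive, collectively exhaustive, and non-empty
# (the conditional probability is undefined for an empty B_i).
function getTotalProb(a::BitVector, parts::Vector{BitVector})::Float64
    return sum(Stats.mean(b) * Stats.mean(a[b]) for b in parts)
end

# Hypothetical toy data: 9 made-up respondents in three groups
toyGroup = [1, 1, 1, 2, 2, 2, 3, 3, 3]
toyA = ([1, 0, 0, 1, 1, 0, 0, 0, 0] .== 1)

# One BitVector per group
toyParts = [toyGroup .== g for g in 1:3]

(
    Stats.mean(toyA),             # P(A), computed directly: 3/9
    getTotalProb(toyA, toyParts)  # P(A), summed over the three groups
)
```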

In [None]:
# on this scale, 4.0 represents "Moderate"
countsPolviews = SB.countmap(gss.polviews)
for k in sort(collect(keys(countsPolviews)))
    println(k, " => ", countsPolviews[k])
end

In [None]:
# probability of a moderate banker
moderate = (gss.polviews .== 4)
(
    getProb(moderate) * getCondProb(banker, moderate),
    getProb(moderate .& banker)
)

In [None]:
(
    getProb(banker),
    [
        getProb(gss.polviews .== i)  *
        getCondProb(banker, gss.polviews .== i)
        for i in 1:7
    ] |> sum
)

## Summary

**Theorem 1.** Computing conditional probability using conjunction

$P(A|B) = \frac{P(A~\mathrm{and}~B)}{P(B)}$

**Theorem 2.** Computing conjunction using a conditional probability

$P(A~\mathrm{and}~B) = P(B) ~ P(A|B)$

**Theorem 3.** Bayes's Theorem, a way to get from $P(A|B)$ to $P(B|A)$, or the other way around:

$P(A|B) = \frac{P(A)P(B|A)}{P(B)}$

**The Law of Total Probability.** A way to compute probabilities by adding up the pieces:

$P(A) = \sum_i P(B_i) P(A|B_i)$

## Exercises

### Exercise 1

Compute:
- The probability that Linda is a female banker,
- The probability that Linda is a liberal female banker, and
- The probability that Linda is a liberal female banker and a Democrat.

In [None]:
# female banker
getProb(female .& banker)

In [None]:
getProb(female .& banker .& liberal)

In [None]:
getProb(female .& banker .& liberal .& democrat)
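
The three results should form a non-increasing sequence, because each added condition can only keep or shrink the set of matching respondents. This is the same reason that option 2 in the Linda problem cannot be more probable than option 1. A minimal sketch on made-up data (all `toy...` names are hypothetical):

```julia
import Statistics as Stats

# Hypothetical toy data: 8 made-up respondents (1 = yes, 0 = no)
toyFemale  = ([1, 1, 1, 1, 1, 0, 0, 0] .== 1)
toyBanker  = ([1, 1, 1, 0, 0, 1, 0, 0] .== 1)
toyLiberal = ([1, 1, 0, 1, 0, 0, 1, 0] .== 1)

toyPBanker = Stats.mean(toyBanker)                       # 4/8
toyPFemaleBanker = Stats.mean(toyFemale .& toyBanker)    # 3/8
toyPLiberalFemaleBanker =
    Stats.mean(toyFemale .& toyBanker .& toyLiberal)     # 2/8

# A conjunction is never more probable than any of its parts
toyPLiberalFemaleBanker <= toyPFemaleBanker <= toyPBanker
```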

### Exercise 2

Compute:
- The probability that a respondent is liberal, given that they are a Democrat.
- The probability that a respondent is a Democrat, given that they are liberal.

In [None]:
getCondProb(liberal, democrat)

In [None]:
getCondProb(democrat, liberal)