# Chapter 4: Probabilities and Proportions

Using the textbook by Wild & Seber as a guide, I document some statistical lessons in this notebook.

The primary emphasis of the chapter is providing a foundation for developing statistical theory. Most people have an intuitive feeling for probability, but care is needed as intuition can lead you astray if there isn't a sure foundation.

There are 

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline
import seaborn as sns
from scipy import stats, integrate

# Code formatting
%load_ext nb_black


## Introduction

xxxxx

## Coin Tossing and Probability Models

xxxxx

## Where do probabilities come from?

xxxxx

## Simple probability models

xxxxx

## Probability rules

xxxxx

## Conditional probability

xxxxx

## Statistical independence

## Summary

1. **Some basic ideas about probabilities**
    - how probabilities arise
    - the idea of a simple probability model
    - important but different notions of mutually exclusive events and statistically independent events
    - a probability depends on the information available
    - a formalization of this through conditional probability

2. **Some facility with manipulating probabilities and proportions**
    - Some standard types of probability manipulation that will be used repeatedly. For example:
    - **Adding** of probabilities of **mutually exclusive** events to get the probability that at least one of them occurs
    - **Multiplying** the probabilities of **independent events** to get the probability that all of them occur.
    - Also, some problems can best be solved by constructing two-way tables of counts or proportions.

### Summary of concepts

1. **Probabilities** come from three main sources:
    - **Models** (idealizations such as the notion of equally likely outcomes, which suggest probabilities by symmetry)
    - **Data** (e.g. relative frequencies with which the event has occurred in the past)
    - **Subjective feelings** representing a degree of belief


2. A simple **probability model** consists of a sample space and a probability distribution.


3. The **sample space, S**, for a random experiment is the set of all possible outcomes of the experiment.


4. Probability distribution


5. An event is a collection of outcomes. An event occurs if any outcome making up that event occurs.


6. The probability of event A can be obtained by adding up the probabilities of all the outcomes in A.


7. If all outcomes are equally likely:
<br>
    $ \text{pr(A)} = \frac{\text{number of outcomes in A}}{\text{total number of outcomes}} $
<br>
<br>
8. The complement of an event A, denoted $\bar{A}$, occurs if A does not occur.


9. It is useful to represent events diagrammatically using Venn diagrams.


10. A union of events, A or B, contains all outcomes in A or B (including those in both). It occurs if at least one of A or B occurs.


11. An intersection of events, A and B, contains all outcomes that are in both A and B. It occurs only if both A and B occur.


12. Mutually exclusive events cannot occur at the same time.


13. The conditional probability of A occurring given that B occurs is given by:
<br>
      $ \text{pr(A | B)} = \frac{\text{pr(A and B)}}{\text{pr(B)}} $
<br>
<br>
14. Events A and B are statistically independent if knowing whether B has occurred gives no new information about the chances of A occurring, that is, if pr(A | B) = pr(A).

15. If events are physically independent, then under any sensible probability model they are also statistically independent.

16. Assuming that events are independent when in reality they are not can often lead to answers that are grossly too big or grossly too small.
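
Several of these concepts (sample space, event, complement, union, intersection) can be tried out with Python sets. The sketch below assumes a single roll of a fair six-sided die, which is my own illustrative example and not one from the textbook; the helper `pr` is a hypothetical name.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die (all outcomes equally likely).
sample_space = {1, 2, 3, 4, 5, 6}


def pr(event):
    """pr(A) = number of outcomes in A / total number of outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


A = {2, 4, 6}  # event: the roll is even
B = {4, 5, 6}  # event: the roll is at least 4

complement_A = sample_space - A  # occurs if A does not occur
A_or_B = A | B  # union: at least one of A or B occurs
A_and_B = A & B  # intersection: both A and B occur

print(pr(A), pr(complement_A), pr(A_or_B), pr(A_and_B))
```

Using `Fraction` keeps the probabilities exact, which makes it easier to compare them with hand calculations.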

### Summary of useful formulas

1. For discrete sample spaces, pr(A) can be obtained by adding the probabilities of all outcomes in A.
2. For equally likely outcomes in a finite sample space
<br>
    $ \text{pr(A)} = \frac{\text{number of outcomes in A}}{\text{total number of outcomes}} $
<br>
<br>

**General probability rules**
<br>
1. pr(S) = 1
2. pr($\bar{A}$) = 1 - pr(${A}$)
3. If A and B are mutually exclusive events, then 
<br>
    pr(A or B) = pr(A) + pr(B)    
<br>
4. If $A_1, A_2, ..., A_k$ are mutually exclusive events, then 
<br>
    pr($A_1$ or $A_2$ or ... or $A_k$) = pr($A_1$) + pr($A_2$) + ... + pr($A_k$)
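
The general rules can be checked numerically on a small discrete distribution. This is only a sketch: the loaded four-sided die and its probabilities are made up for illustration, and `pr` is a hypothetical helper that adds up outcome probabilities.

```python
from fractions import Fraction

# Assumed probability distribution for a loaded four-sided die (illustrative only).
dist = {
    1: Fraction(1, 2),
    2: Fraction(1, 4),
    3: Fraction(1, 8),
    4: Fraction(1, 8),
}


def pr(event):
    """Add the probabilities of all outcomes in the event."""
    return sum((dist[outcome] for outcome in event), Fraction(0))


S = set(dist)  # sample space
A = {1}
B = {3, 4}  # shares no outcome with A, so A and B are mutually exclusive

rule_1 = pr(S) == 1  # pr(S) = 1
rule_2 = pr(S - A) == 1 - pr(A)  # complement rule
rule_3 = pr(A | B) == pr(A) + pr(B)  # addition rule for mutually exclusive events

print(rule_1, rule_2, rule_3)
```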
    
**Conditional probability**
<br>
1. Definition:
<br>
$ \text{pr(A | B)} = \frac{\text{pr(A and B)}}{\text{pr(B)}} $
<br>
2. Multiplication rule:
<br>
$ \text{pr(A and B)} = \text{pr(B|A)}\text{pr(A)} = \text{pr(A|B)}\text{pr(B)} $
<br>
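
As a rough illustration of the definition and the multiplication rule, the sketch below computes conditional probabilities from a two-way table of counts. The counts are invented for the example and the variable names are my own.

```python
from fractions import Fraction

# Assumed two-way table of counts (illustrative numbers only).
counts = {
    ("A", "B"): 30,
    ("A", "not B"): 20,
    ("not A", "B"): 10,
    ("not A", "not B"): 40,
}
total = sum(counts.values())

pr_A_and_B = Fraction(counts[("A", "B")], total)
pr_A = Fraction(counts[("A", "B")] + counts[("A", "not B")], total)
pr_B = Fraction(counts[("A", "B")] + counts[("not A", "B")], total)

# Definition of conditional probability
pr_A_given_B = pr_A_and_B / pr_B
pr_B_given_A = pr_A_and_B / pr_A

# Multiplication rule: both products recover pr(A and B)
print(pr_A_given_B, pr_B_given_A)
print(pr_B_given_A * pr_A == pr_A_given_B * pr_B == pr_A_and_B)
```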

**Independence**
<br>
1. If A and B are independent events, then
<br>
$ \text{pr(A and B)} = \text{pr(A)pr(B)} $
<br>
2. If $A_1, A_2, ..., A_k$ are mutually independent, then it follows that
<br>
    pr($A_1$ and $A_2$ and ... and $A_k$) = pr($A_1$)pr($A_2$)...pr($A_k$)
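
To see how independence differs from mutual exclusivity, here is a small sketch that assumes two tosses of a fair coin (my own example). It checks the product rule for two independent events and shows that it fails for two mutually exclusive ones.

```python
from fractions import Fraction
from itertools import product

# Assumed example: two tosses of a fair coin, four equally likely outcomes.
sample_space = set(product("HT", repeat=2))


def pr(event):
    return Fraction(len(event), len(sample_space))


A = {o for o in sample_space if o[0] == "H"}  # first toss is heads
B = {o for o in sample_space if o[1] == "H"}  # second toss is heads
C = {o for o in sample_space if o[0] == "T"}  # first toss is tails

# A and B are independent: pr(A and B) equals pr(A)pr(B)
a_b_independent = pr(A & B) == pr(A) * pr(B)

# A and C are mutually exclusive, so pr(A and C) = 0, which is not pr(A)pr(C)
a_c_independent = pr(A & C) == pr(A) * pr(C)

print(a_b_independent, a_c_independent)
```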
    

## End of the chapter problems

### Bertrand's Box Problem

24. A historic problem called Bertrand's Box Problem. A box contains three drawers: one containing two gold coins, one containing two silver coins, and one containing one gold and one silver coin. A drawer is chosen at random, and a coin is randomly selected from that drawer. If the selected coin turns out to be gold, what is the probability that the chosen drawer is the one with two gold coins?

A. G,G
B. S,S
C. G,S

#### My application of Bayes

A. G,G
B. S,S
C. G,S

**Right answer - not 100% sure about reasoning but I think it's okay.**

Probability of it coming from Drawer B is 0. If it came from Drawer C, gold is 1 out of 2. If it came from Drawer A, gold is 2 out of 2.

P(A given drawing one gold coin) = P(A|G) = P(A & G) / P(G) 

P(A & G) = P(G|A)P(A) = P(A|G)P(G)

P(A|G) = P(G|A)P(A)/P(G)

= (1)*(1/3) / (1/2)

= 2/3

[Bertrand's box problem on Wikipedia](https://en.wikipedia.org/wiki/Bertrand%27s_box_paradox)

#### Reasoning

Let's say that each of the boxes has two drawers. The problem can be re-framed as, "If you randomly choose a box, and then find a gold coin in one of the drawers, what is the probability that the other will be a gold coin?"

You can eliminate box B (S,S). That leaves two, but the answer is NOT 1/2.

The coin you pick can be any one of the following three gold coins, each equally likely:
- G1 (box A)
- G2 (box A)
- G3 (box C)

Therefore, it's a 2/3 probability that it comes from box A.
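
The counting argument can be written out as a short enumeration. This sketch assumes the labeling used in the problem statement (A = G,G; B = S,S; C = G,S) and treats every (box, coin) pair as equally likely.

```python
from fractions import Fraction

# Labels follow the problem statement: A = (G, G), B = (S, S), C = (G, S).
boxes = {"A": ["G", "G"], "B": ["S", "S"], "C": ["G", "S"]}

# Each (box, coin) pair has probability 1/3 * 1/2, so all six are equally likely.
draws = [(box, coin) for box, coins in boxes.items() for coin in coins]

# Keep only the draws where the selected coin is gold.
gold_draws = [box for box, coin in draws if coin == "G"]

pr_A_given_gold = Fraction(gold_draws.count("A"), len(gold_draws))
print(gold_draws, pr_A_given_gold)
```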

#### Bayes solution (Wikipedia) - application of conditional probability

P(A|G)P(G) = P(G|A)P(A)

P(A|G) = P(G|A)P(A) / P(G)

Let's look at the denominator:

P(G) = P(G|A)P(A) + P(G|B)P(B) + P(G|C)P(C)
     = (1)*(1/3) + (0)*(1/3) + (1/2)*(1/3)
     = 1/2

P(A|G) = (1)*(1/3) / [(1)*(1/3) + (1/2)*(1/3)]

(the 1/3 factors cancel)
       = 1 / 1.5
       = 2/3
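
The same Bayes calculation can be done with exact fractions. This is a minimal sketch that assumes the labeling from the problem statement (A = G,G; B = S,S; C = G,S); the dictionary names are my own.

```python
from fractions import Fraction

# Prior: each box/drawer is equally likely to be chosen.
prior = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}

# Likelihood of drawing a gold coin: A = (G, G), B = (S, S), C = (G, S).
likelihood_gold = {"A": Fraction(1), "B": Fraction(0), "C": Fraction(1, 2)}

# Denominator: total probability of drawing a gold coin.
pr_gold = sum(likelihood_gold[b] * prior[b] for b in prior)

# Bayes' rule applied to each box.
posterior = {b: likelihood_gold[b] * prior[b] / pr_gold for b in prior}

print(pr_gold, posterior["A"])
```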

#### Experimental / simulation approach

In [17]:
import random

random.sample(["A", "B", "C"], 1)

['A']


In [29]:
# Run a lot of trials

from collections import Counter
import random

# Labels follow the problem statement: A = (G, G), B = (S, S), C = (G, S)
boxes = ["A", "B", "C"]
coin_side = [1, 2]  # Assume gold is on side 1 of Box C

box_count = Counter()
trial_box_list = list()

trial_box_list_wgold = list()
box_count_wgold = Counter()

for trial in range(10000):
    # Randomly pick a box
    box = random.choice(boxes)
    trial_box_list.append(box)

    # Keep track of box distribution
    box_count[box] += 1

    # Randomly pick a coin after picking a box
    side = random.choice(coin_side)

    if box == "A":
        # Both coins in Box A are gold, so either side counts
        trial_box_list_wgold.append(box)
        box_count_wgold[box] += 1  # Keep track of box with gold distribution

    elif box == "C":
        if side == 1:  # Gold is on side 1 of Box C
            trial_box_list_wgold.append(box)
            box_count_wgold[box] += 1  # Keep track of box with gold distribution

    # Box B (S, S) never yields a gold coin, so it is ignored

In [33]:
# Proportion of gold draws that came from Box A
box_count_wgold["A"] / (box_count_wgold["A"] + box_count_wgold["C"])

0.6625049544193421


#### Other versions

Card version:
- 3 cards with a color on each side (BB, WW, BW)
- One card is pulled and the side you see is black. What is the probability that the other side is also black?

An alternative question: pick a card at random. What is the probability that it has the same color on both sides?
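
A quick simulation of the card version, written as a sketch under my own assumptions: each card and each face is equally likely to be picked, and the seed and trial count are arbitrary choices for reproducibility.

```python
import random

# Three cards: black/black, white/white, black/white.
cards = [("B", "B"), ("W", "W"), ("B", "W")]
rng = random.Random(0)  # arbitrary fixed seed

seen_black = 0
other_side_black = 0
for _ in range(100_000):
    card = rng.choice(cards)
    up = rng.randrange(2)  # index of the side facing up
    if card[up] == "B":
        seen_black += 1
        if card[1 - up] == "B":
            other_side_black += 1

# Estimate of pr(other side is black | visible side is black)
estimate = other_side_black / seen_black

# Alternative question: proportion of cards with the same color on both sides
same_color = sum(front == back for front, back in cards) / len(cards)

print(estimate, same_color)
```

Both quantities should come out near 2/3, which matches the structure of the coin problem: seeing black rules out the WW card, just as seeing gold rules out the all-silver drawer.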
    

### Satellite signals

25. Digital data are transmitted as a sequence of signals that represent 0s and 1s. Suppose that data are being transmitted to a satellite and then relayed to a distant site. Suppose that, due to electrical interference in the atmosphere, there is a 1-in-1000 chance that a transmitted 0 will be reversed between the sender and satellite (such that the satellite interprets it as a 1) and a 2-in-1000 chance that a transmitted 1 will be reversed. Suppose that 40% of transmitted digits are 0s.

a. What is the probability that a transmitted digit is correctly received by the satellite?
b. Assuming independence, what is the probability that all digits are received correctly (i) if 1000 digits are transmitted? (ii) if 10,000 are transmitted?
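
One possible way to set up the calculation, as a sketch rather than a checked solution: part (a) adds the probabilities of the two mutually exclusive ways a digit can arrive correctly, and part (b) multiplies independent probabilities. The variable and function names are my own.

```python
# Values taken from the problem statement
p_zero = 0.40  # proportion of transmitted digits that are 0s
p_flip_given_zero = 1 / 1000  # a transmitted 0 is reversed
p_flip_given_one = 2 / 1000  # a transmitted 1 is reversed

# (a) pr(correct) = pr(correct | 0)pr(0) + pr(correct | 1)pr(1)
p_correct = p_zero * (1 - p_flip_given_zero) + (1 - p_zero) * (1 - p_flip_given_one)


# (b) Under independence, multiply the per-digit probabilities
def p_all_correct(n_digits):
    return p_correct**n_digits


print(p_correct)
print(p_all_correct(1000))
print(p_all_correct(10_000))
```

Even with a per-digit success rate above 99.8%, the probability of an error-free message falls quickly as the message gets longer.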