# Definitions

- Population: Entire group of individuals for which we want to perform inference.
- Sample: The data set; we want the sample to be representative of the population.
- Random Sample: Every individual has an equal chance of being selected.
- Experiment: The process of observing one sample.
- Outcome: One possible result of the experiment.
- Sample Space: Set of all possible outcomes
- Event: Set of outcomes (any subset of the sample space)
- Observation / trial: One sub-experiment within the larger experiment; e.g., one coin toss
- Random variable: Function that maps each outcome to a number
- Parameter: Unknown value that we want to estimate
- Statistic: Summary of our data (e.g., sample mean)

$\hat{\mu} = \bar{x}$

$\hat{\sigma} = s$
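As a minimal sketch of these two estimators, the snippet below computes the sample mean $\bar{x}$ and the sample standard deviation $s$ with Python's standard library. The data values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample, made up purely for illustration
sample = [4.1, 5.3, 4.8, 6.0, 5.2]

mu_hat = statistics.mean(sample)      # sample mean x-bar, our estimate of mu
sigma_hat = statistics.stdev(sample)  # sample standard deviation s (divides by n - 1)

print(mu_hat, sigma_hat)
```

Note that `statistics.stdev` uses the n - 1 denominator; `statistics.pstdev` would divide by n instead.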

# Coin tossing example

- Experiment: Toss a coin three times
- Population: All potential coin tosses (there are infinitely many)
- Sample: Three coin tosses
- Outcomes: HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
- Sample Space: {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
- Events: Any collection of the above outcomes; with 8 outcomes there are 2^8 = 256 possible events
- Example 1 of random variable: X = 1, 2, ..., 8 (the position of the outcome in the list above)
- Example 2 of random variable: Y = 1 if H first; otherwise 0
- Example 3 of random variable: Z = Number of heads: 0, 1, 2, 3

- P(Z = 0) = 1/8
- P(Z = 1) = 3/8
- P(Z = 2) = 3/8
- P(Z = 3) = 1/8
- Example events: {HHH}, {HHH, HHT}, {HHH, HTH}, {HHH, HHT, HTH}, ...

P(Z = 1) = P({HTT, THT, TTH}) = 3/8
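The coin tossing example can be reproduced by brute-force enumeration. The sketch below assumes a fair coin, so that all 8 outcomes are equally likely; the names `sample_space`, `Z`, and `pmf` are chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 2**3 sequences of H and T
sample_space = ["".join(t) for t in product("HT", repeat=3)]

def Z(outcome):
    """Random variable: maps an outcome to its number of heads."""
    return outcome.count("H")

# Count how many outcomes map to each value of Z, then divide by 8
counts = Counter(Z(o) for o in sample_space)
pmf = {z: Fraction(c, len(sample_space)) for z, c in sorted(counts.items())}

for z, p in pmf.items():
    print(f"P(Z = {z}) = {p}")
```

Using `Fraction` keeps the probabilities exact, so they can be compared directly with 1/8 and 3/8.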

Suppose you're analyzing a multiple-choice exam with 6 questions, where each question has 4 options and only one is correct. Suppose a student guesses at random on every question. What's the probability that the student gets exactly 2 correct?

- Outcomes: aaaaaa, aaaaab, ...
- Sample: One exam (with 6 questions) with answers from a student
- Population: Exams from all students in the class

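Let's use the binomial distribution, recording each question only as correct (c) or incorrect (i):
- Outcomes: cccccc, ccccci, ...
- p (probability of guessing a question correctly): 1/4 = 25%
- n (number of questions): 6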

$X \sim \text{binomial}(p=1/4, n=6)$

X is the number of correct answers

We need to calculate $P(X = 2)$

$P(X = x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}$

$P(X = 2) = \frac{6 \cdot 5}{2} (1/4)^2 (3/4)^4 = 15 (1/4)^2 (3/4)^4$

In [2]:
15 * (1/4)**2 * (3/4)**4

0.296630859375
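The hand calculation above can be wrapped in a small reusable function. This is a sketch using only the standard library; `binomial_pmf` is a name introduced here, not something from the original notebook.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ binomial(n, p): n choose x, times p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Exactly 2 correct answers out of 6 when guessing among 4 options
prob = binomial_pmf(2, 6, 1/4)
print(prob)
```

Here `comb(6, 2)` evaluates to 15, the same coefficient used in the hand calculation.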

In [5]:
from scipy.stats import binom

In [13]:
# P(X >= 12) for X ~ binomial(n=20, p=3/4), computed as 1 - P(X <= 11)
1 - binom.cdf(k=11, n=20, p=3/4)

0.9590748322934814
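The expression `1 - binom.cdf(k=11, n=20, p=3/4)` is the upper-tail probability P(X ≥ 12) for a binomial with n = 20 and p = 3/4. As a cross-check that does not depend on SciPy, the sketch below sums the binomial probabilities for x = 12, ..., 20 directly; the variable name `upper_tail` is chosen here for illustration.

```python
from math import comb

n, p = 20, 3/4

# P(X >= 12) as a direct sum of the binomial pmf over x = 12, ..., 20
upper_tail = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(12, n + 1))
print(upper_tail)
```

The result should agree with the complement-of-CDF value up to floating-point rounding.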

- Mutually Exclusive: A and B cannot both occur, i.e., P(A and B) = 0
- P(A or B) = P(A) + P(B) (if mutually exclusive)
- P(A or B) = P(A) + P(B) - P(A and B) (in general)
- Independence: Knowing that one event occurred doesn't change the probability of the other
- Mathematically, this means P(A and B) = P(A) * P(B)

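Conditional probability of A given B (defined when P(B) > 0):

$P(A | B) = \frac{P(A\ and\ B)}{P(B)}$

If A and B are independent, conditioning on B leaves the probability of A unchanged:

$P(A | B) = \frac{P(A\ and\ B)}{P(B)} = \frac{P(A) P(B)}{P(B)} = P(A)$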
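These rules can be checked on the three-toss sample space from the coin example. The sketch below assumes a fair coin (equally likely outcomes); the events `A`, `B`, `C`, `D` and the helper `P` are chosen here for illustration. Note that in the code `|` and `&` are Python's set union and intersection, not the conditional-probability bar.

```python
from fractions import Fraction
from itertools import product

sample_space = {"".join(t) for t in product("HT", repeat=3)}

def P(event):
    """Probability of an event (a set of outcomes) under equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

A = {o for o in sample_space if o[0] == "H"}   # first toss is heads
B = {o for o in sample_space if o[-1] == "H"}  # last toss is heads
C = {"TTT"}                                    # no heads
D = {"HHH"}                                    # three heads

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
print(P(A | B) == P(A) + P(B) - P(A & B))

# C and D are mutually exclusive, so their probabilities simply add
print(P(C & D) == 0 and P(C | D) == P(C) + P(D))

# A and B are independent: P(A and B) = P(A) * P(B)
print(P(A & B) == P(A) * P(B))

# Conditional probability P(A given B) = P(A and B) / P(B), which equals P(A) here
p_A_given_B = P(A & B) / P(B)
print(p_A_given_B, P(A))
```

Each comparison uses exact fractions, so equality holds without floating-point tolerance.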