In [13]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import math
import random


# Chapter 1: Probability and Distributions
## Notes on Mathematical Statistics

Main reference: *Introduction to Mathematical Statistics* by Hogg, McKean, and Craig



# 1.1 Introduction

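This section reviews material from a first-year probability course.
## Definitions:
**Random Experiment**: an experiment whose outcome cannot be predicted with certainty, but which can be repeated under the same conditions.

**Sample Space**: the collection of every possible outcome, denoted $\mathcal{C}$.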

### Example 1.1.1
Coin toss

### Example 1.1.2
Roll die



In [22]:
#Example 1.1.1: Coin Toss
def probability_k_heads(k, n):
  """Calculates the probability of getting k heads in n coin tosses.

  Args:
    k: The number of heads.
    n: The number of coin tosses.

  Returns:
    The probability of getting k heads in n tosses, or 0 if the input is invalid.
  """
  if not isinstance(n, int) or not isinstance(k, int) or n < 0 or k < 0 or k > n:
    return 0  # Handle invalid inputs

  combinations = math.comb(n, k)
  probability = combinations * (0.5 ** n)

  return probability


#Example 1.1.2: Roll die
#n times, k sides

def roll_dice(j, k, n):
  """Rolls j dice with k sides n times.

  Args:
    j: The number of dice.
    k: The number of sides on each die.
    n: The number of times to roll the dice.

  Returns:
    A list of lists, where each inner list represents the outcome of a single roll of j dice.
  """
  results = []
  for _ in range(n):
    roll_results = []
    for _ in range(j):
      roll_results.append(random.randint(1, k))
    results.append(roll_results)
  return results


In [23]:
print(probability_k_heads(1,4))
print(roll_dice(2,5,10))


0.25
[[4, 2], [2, 2], [5, 5], [5, 2], [2, 3], [3, 5], [2, 3], [3, 4], [1, 4], [1, 2]]


### Example 1.1.3
Roll a pair of dice and let $B$ be the collection of every ordered pair whose sum is equal to seven.

In [None]:
#Example 1.1.3: Dice roll sum


### Remark 1.1.1
This interpretation of probability is referred to as *relative frequency approach* and it obviously depends upon the fact that an experiment can be repeated under essentially identical conditions.
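
The relative frequency idea can be illustrated with a small simulation. This is only a sketch: the function name `relative_frequency_of_seven` and the fixed seed are assumptions made for illustration, and the dice are assumed fair and six-sided.

```python
import random

def relative_frequency_of_seven(n_rolls, seed=0):
  """Estimate P(sum of two fair dice == 7) by relative frequency."""
  rng = random.Random(seed)  # local generator so the run is reproducible
  hits = 0
  for _ in range(n_rolls):
    # One trial: cast two six-sided dice and check whether the sum is seven.
    if rng.randint(1, 6) + rng.randint(1, 6) == 7:
      hits += 1
  return hits / n_rolls

# The exact probability is 6/36 = 1/6; the estimate should settle near it.
for n in (100, 10_000, 100_000):
  print(n, relative_frequency_of_seven(n))
```

As the number of repetitions grows, the printed frequencies should cluster more tightly around $1/6$.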

# 1.2 Sets

## DeMorgan's Laws

\begin{align}
(A \cap B)^c = A^c \cup B^c \\
(A \cup B)^c = A^c \cap B^c
\end{align}
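
DeMorgan's laws can be checked on a small finite example with Python sets. The universe and the events `A` and `B` below are arbitrary choices made for illustration, not taken from the reference.

```python
# Hypothetical finite universe and events, chosen only for illustration.
universe = set(range(1, 11))
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}

def complement(S):
  """Complement of S relative to the universe."""
  return universe - S

# (A ∩ B)^c = A^c ∪ B^c
print(complement(A & B) == (complement(A) | complement(B)))
# (A ∪ B)^c = A^c ∩ B^c
print(complement(A | B) == (complement(A) & complement(B)))
```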

## Set Functions
Set functions can be defined in terms of sums or integrals:
\begin{align}
\int_A f(x) dx \\
\sum_A f(x)
\end{align}

### Geometric Series

\begin{align}
\sum_{n=0}^\infty a^n = \frac{1}{1-a}, \quad \text{if } \left|{a}\right| < 1
\end{align}
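
A quick numerical check of the geometric series: the partial sums should approach $1/(1-a)$ when $|a| < 1$. The value `a = 0.5` is an arbitrary choice for this sketch.

```python
def geometric_partial_sum(a, n_terms):
  """Sum of a**n for n = 0, ..., n_terms - 1."""
  return sum(a ** n for n in range(n_terms))

a = 0.5  # any value with |a| < 1 works; 0.5 is an arbitrary choice
for n_terms in (5, 10, 50):
  # Print the partial sum next to the limit 1 / (1 - a).
  print(n_terms, geometric_partial_sum(a, n_terms), 1 / (1 - a))
```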


# 1.3 Probability Set Function

A $\sigma$-field $\mathcal{B}$ of subsets of $\mathcal{C}$ is a nonempty collection of events that is closed under complements and countable unions.

### Definition 1.3.1 (Probability)
Let $\mathcal{C}$ be a sample space and let $\mathcal{B}$ be the set of events. Let $P$ be a real-valued function defined on $\mathcal{B}$. Then $P$ is a **probability set function** if $P$ satisfies the following three conditions:

1. $P(A) \geq 0$ for all $A \in \mathcal{B}$.
2. $P\left(\mathcal{C}\right) = 1$.
3. If $\{A_n\}$ is a sequence of events in $\mathcal{B}$ and $A_m \cap A_n = \emptyset$ for all $m \neq n$, then

$$
P\left( \bigcup^\infty_{n=1} A_n\right) = \sum^\infty_{n=1 } P(A_n)
$$

A collection of events whose members are pairwise disjoint, as in $(3)$, is said to be a **mutually exclusive** collection and its union is often referred to as a **disjoint union**. The collection is further said to be **exhaustive** if the union of its events is the sample space, in which case $\sum^\infty_{n=1} P(A_n) = 1$. We often say that a mutually exclusive and exhaustive collection of events forms a **partition** of $\mathcal{C}$.

## Counting Rules
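**Multiplication rule**: if a first task can be done in $m$ ways and, for each of these, a second task can be done in $n$ ways, then the two tasks together can be done in $mn$ ways.

**Addition rule**: if two tasks cannot both be done and they can be done in $m$ and $n$ ways respectively, then one or the other can be done in $m + n$ ways.

**Permutation**: an ordered arrangement of $k$ objects chosen from $n$ distinct objects; there are $\frac{n!}{(n-k)!}$ of them.

**Combination**: an unordered selection of $k$ objects chosen from $n$ distinct objects; there are $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ of them.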


### Example 1.3.3 (Birthday Problem)
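A minimal sketch of the birthday computation, assuming 365 equally likely birthdays (no leap years) and independence between people. The function name `prob_shared_birthday` is hypothetical. It uses the complement: multiply the probabilities that each successive person avoids all earlier birthdays.

```python
def prob_shared_birthday(n, days=365):
  """P(at least two of n people share a birthday)."""
  if n > days:
    return 1.0  # more people than days forces a shared birthday
  prob_all_distinct = 1.0
  for i in range(n):
    # Person i + 1 must avoid the i birthdays already taken.
    prob_all_distinct *= (days - i) / days
  return 1.0 - prob_all_distinct

for n in (10, 23, 50):
  print(n, round(prob_shared_birthday(n), 4))
```

With these assumptions, the probability first exceeds one half at $n = 23$.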

### Example 1.3.4 (Poker Hands)
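A sketch of a few poker hand probabilities using the counting rules, assuming a standard 52-card deck and that all 5-card hands are equally likely. The particular hands computed here are chosen for illustration and may differ from the ones worked in the reference.

```python
import math

total_hands = math.comb(52, 5)  # number of equally likely 5-card hands

# All five cards are spades: choose 5 of the 13 spades.
p_all_spades = math.comb(13, 5) / total_hands

# Exactly one pair: choose the pair's rank and 2 of its 4 suits, then
# 3 other distinct ranks with one card (4 suit choices) from each.
one_pair_count = 13 * math.comb(4, 2) * math.comb(12, 3) * 4**3
p_one_pair = one_pair_count / total_hands

# Full house: a rank for the triple (3 of 4 suits) and a different
# rank for the pair (2 of 4 suits).
full_house_count = 13 * math.comb(4, 3) * 12 * math.comb(4, 2)
p_full_house = full_house_count / total_hands

print(p_all_spades, p_one_pair, p_full_house)
```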


In [None]:
#Birthday Problem

In [None]:
#Poker Hands

# 1.4 Conditional Probability and Independence

### Definition 1.4.1 (Conditional Probability)
Let $B$ and $A$ be events with $P(A) > 0$. Then we define the **conditional probability** of $B$ given $A$ as:
$$
\begin{align}
P(B|A)=\frac{P(A\cap B)}{P(A)}
\end{align}
$$

### Lemma 1.4.1 (Law of Total Probability)
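Consider $k$ mutually exclusive and exhaustive events $A_1, A_2, \dots, A_k$ such that $P(A_i) > 0$ for $i = 1, 2, \dots, k$; i.e., $A_1, \dots, A_k$ form a partition of $\mathcal{C}$.

Here the events $A_i$ do not need to be equally likely. Let $B$ be another event such that $P(B) > 0$. Then $B$ occurs with one and only one of the events $A_i$, so $B = (B \cap A_1) \cup \dots \cup (B \cap A_k)$ is a disjoint union. Applying additivity and then $P(B \cap A_i) = P(B \mid A_i) P(A_i)$ gives: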
$$
P(B) = \sum_{i=1}^k P(B \mid A_i) P(A_i)
$$
![image.png](attachment:image.png)


### Theorem 1.4.1 (Bayes)
Let $A_1, A_2, \dots, A_k$ be events such that $P(A_i) > 0$ for $i = 1, 2, \dots, k$. Assume further that $A_1, \dots, A_k$ form a partition of the sample space $\mathcal{C}$. Let $B$ be any event. Then:
$$
P(A_j \mid B) = \frac{P(A_j) P(B \mid A_j)}{\sum_{i=1}^k P(A_i) P(B \mid A_i)}
$$

**Proof:** Based on the definition of conditional probability:
$$
P(A_j \mid B) = \frac{P(B \cap A_j)}{P(B)} = \frac{P(A_j) P(B \mid A_j)}{P(B)}.
$$
The result then follows by the law of total probability.
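
Bayes' theorem translates directly into a few lines of code. In this sketch the function name `bayes_posterior` and the numbers are hypothetical: three partition events with made-up priors $P(A_i)$ and likelihoods $P(B \mid A_i)$.

```python
def bayes_posterior(priors, likelihoods):
  """Return [P(A_j | B)] from priors P(A_i) and likelihoods P(B | A_i)."""
  joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A_i) P(B | A_i)
  p_b = sum(joint)  # law of total probability gives P(B)
  return [j / p_b for j in joint]

# Hypothetical inputs, chosen only for illustration.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.01, 0.02, 0.05]
posteriors = bayes_posterior(priors, likelihoods)
print(posteriors)
```

Note how the event with the smallest prior ends up with the largest posterior here, because its likelihood is much higher.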

## Independence

Sometimes the occurrence of event $A$ does not change the probability of event $B$; that is, when $P(A) > 0$:
$$
P(B \mid A) = P(B).
$$
In this case, events $A$ and $B$ are independent, and the multiplication rule becomes:
$$
P(A \cap B) = P(A) P(B \mid A) = P(A) P(B).
$$
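
Independence can be verified by enumeration on a small sample space. The sketch below assumes two fair six-sided dice with 36 equally likely ordered outcomes; the events (first die even, sum equal to seven) and helper names are chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice, all equally likely.
space = list(product(range(1, 7), repeat=2))

def prob(event):
  """P(event), where event is a predicate on an outcome (d1, d2)."""
  return Fraction(sum(1 for c in space if event(c)), len(space))

def event_a(c):
  return c[0] % 2 == 0  # first die is even

def event_b(c):
  return c[0] + c[1] == 7  # sum is seven

p_a, p_b = prob(event_a), prob(event_b)
p_ab = prob(lambda c: event_a(c) and event_b(c))
print(p_a, p_b, p_ab, p_ab == p_a * p_b)
```

Using `Fraction` keeps the arithmetic exact, so the product rule can be tested with `==` rather than a floating-point tolerance.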

### Definition 1.4.2 (Independence)
Events $A$ and $B$ are **independent** if:
$$
P(A \cap B) = P(A) P(B).
$$
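Events $A_1, A_2, \dots, A_n$ are **mutually independent** if, for every subcollection $A_{d_1}, \dots, A_{d_k}$ with $2 \leq k \leq n$,
$$
P(A_{d_1} \cap \dots \cap A_{d_k}) = P(A_{d_1}) \cdots P(A_{d_k}).
$$
Pairwise independence alone does not imply mutual independence.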




# 1.5 Random Variables

### Definition 1.5.1 (Random Variable)
Consider a random experiment with a sample space $\mathcal{C}$. A function $X$, which assigns to each element $c \in \mathcal{C}$ one and only one number $X(c) = x$, is called a random variable. The space or range of $X$ is the set of real numbers:
$$
\mathcal{D} = \{x : x = X(c), \, c \in \mathcal{C}\}.
$$

**Continuous Random Variable:** Characterized by a probability density function (PDF).  
**Discrete Random Variable:** Characterized by a probability mass function (PMF).

### Definition 1.5.2 (Cumulative Distribution Function)
Let $X$ be a random variable. Then its **Cumulative Distribution Function (CDF)** is defined as:
$$
F_X(x) = P_X((-\infty, x]) = P(\{c \in \mathcal{C} : X(c) \leq x\}).
$$
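
As a concrete case, here is a sketch of the CDF when $X$ is the face shown by a fair six-sided die (an assumed example; `die_cdf` is a hypothetical name). The result is a step function that jumps by $1/6$ at each face value.

```python
import math

def die_cdf(x, sides=6):
  """F(x) = P(X <= x) for a fair die with faces 1..sides."""
  if x < 1:
    return 0.0
  # Count the faces that are <= x, capped at the number of sides.
  return min(math.floor(x), sides) / sides

for x in (0.5, 1, 3, 3.9, 6, 100):
  print(x, die_cdf(x))
```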

### Theorem 1.5.1
Let $X$ be a random variable with CDF $F(x)$. Then:

(a) For all $a$ and $b$, if $a < b$ then $F(a) \leq F(b)$ ($F$ is nondecreasing).

(b) $\lim_{x \to -\infty} F(x) = 0$ (the lower limit of $F$ is 0).

(c) $\lim_{x \to \infty} F(x) = 1$ (the upper limit of $F$ is 1).

(d) $\lim_{x \downarrow x_0} F(x) = F(x_0)$ ($F$ is right continuous).

Proof:


# 1.6 Discrete Random Variable

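### Definition 1.6.1 (Discrete Random Variable)
A random variable is a **discrete random variable** if its space is either finite or countable.

### Definition 1.6.2 (Probability Mass Function)
Let $X$ be a discrete random variable with space $\mathcal{D}$. The **probability mass function (PMF)** of $X$ is given by:
$$
p_X(x) = P[X=x], \quad \text{for } x \in \mathcal{D}
$$
with properties:

(i) $0 \leq p_X(x) \leq 1$, $x \in \mathcal{D}$, and

(ii) $\sum_{x \in \mathcal{D}} p_X(x) = 1$.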
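
The two PMF properties can be checked on a small example. This sketch assumes $X$ is the number of heads in $n = 4$ tosses of a fair coin, so that $p_X(k) = \binom{n}{k} (1/2)^n$; the choice of $n$ is arbitrary.

```python
import math

n = 4  # number of fair coin tosses, an arbitrary choice
pmf = {k: math.comb(n, k) * 0.5**n for k in range(n + 1)}

# Property (i): every value lies in [0, 1].
print(all(0.0 <= p <= 1.0 for p in pmf.values()))
# Property (ii): the values sum to 1 over the space {0, ..., n}.
print(sum(pmf.values()))
```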

## 1.6.1 Transformations
Suppose $Y$ is a transformation of a random variable $X$, say $Y = g(X)$, and we want to find the distribution of $Y$. Assume $X$ is discrete with space $\mathcal{D}_X$. Then the space of $Y$ is $\mathcal{D}_Y = \{g(x) : x \in \mathcal{D}_X\}$. We consider two cases:

Case 1: $g$ is one-to-one. Then the PMF of $Y$ is obtained as:
$$
p_Y(y) = P[Y = y] = P[g(X) = y] = P[X = g^{-1}(y)] = p_X(g^{-1}(y)).
$$
### Example 1.6.3

### Example 1.6.4

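Case 2: the transformation $g$ is not one-to-one. Several values of $x$ may then map to the same $y$, and the PMF of $Y$ is obtained by summing over them: $p_Y(y) = \sum_{x : g(x) = y} p_X(x)$.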
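
Both cases can be handled by one routine that adds up $p_X(x)$ over all $x$ mapping to the same $y$; when $g$ is one-to-one each sum has a single term. This is a sketch: `pmf_of_transform` is a hypothetical helper and the uniform distribution on $\{-2, \dots, 2\}$ is an arbitrary example.

```python
from collections import defaultdict
from fractions import Fraction

def pmf_of_transform(pmf_x, g):
  """PMF of Y = g(X): add up p_X(x) over all x with g(x) = y."""
  pmf_y = defaultdict(Fraction)  # missing keys start at Fraction(0)
  for x, p in pmf_x.items():
    pmf_y[g(x)] += p
  return dict(pmf_y)

# Hypothetical X: uniform on {-2, -1, 0, 1, 2}.
pmf_x = {x: Fraction(1, 5) for x in range(-2, 3)}

print(pmf_of_transform(pmf_x, lambda x: 2 * x + 1))  # g one-to-one
print(pmf_of_transform(pmf_x, lambda x: x * x))      # g not one-to-one
```

For $g(x) = x^2$ the values $\pm 1$ and $\pm 2$ collapse, so $Y$ takes the values $0, 1, 4$ with probabilities $1/5, 2/5, 2/5$.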


# 1.7 Continuous Random Variables
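We say a random variable is a continuous random variable if its cumulative distribution function $F_X(x)$ is a continuous function for all $x \in \mathbb{R}$.

For most continuous random variables the CDF can be written as $F_X(x) = \int_{-\infty}^x f_X(t)\,dt$ for some function $f_X$, the probability density function (PDF). If $f_X$ is also continuous, the Fundamental Theorem of Calculus implies:
$$
\frac{d}{dx} F_X(x) = f_X(x).
$$

The support of a continuous random variable $X$ consists of all points $x$ such that $f_X(x) > 0$.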
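
The relation between the CDF and the PDF can be checked numerically. This sketch assumes an exponential distribution with rate 1 as the example, so $F_X(x) = 1 - e^{-x}$ and $f_X(x) = e^{-x}$ for $x > 0$, and approximates the derivative with a central difference.

```python
import numpy as np

def cdf(x):
  return 1.0 - np.exp(-x)  # assumed example: exponential with rate 1

def pdf(x):
  return np.exp(-x)

x = np.linspace(0.1, 5.0, 50)  # points inside the support
h = 1e-5
numeric_derivative = (cdf(x + h) - cdf(x - h)) / (2 * h)  # central difference

# Largest gap between the numerical derivative of F and the PDF.
print(np.max(np.abs(numeric_derivative - pdf(x))))
```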

## 1.7.1 Quantiles

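For $0 < p < 1$, a quantile of order $p$ of the distribution of $X$ is a value $\xi_p$ such that $P(X < \xi_p) \leq p$ and $P(X \leq \xi_p) \geq p$. The quartiles are $q_1 = \xi_{0.25}$, $q_2 = \xi_{0.5}$ (the median), and $q_3 = \xi_{0.75}$.

The difference $\text{iq} = q_3 - q_1$ is called the interquartile range of $X$. It is used as a measure of spread or dispersion of the distribution of $X$.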
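
A worked sketch of the interquartile range, assuming an exponential distribution with rate 1 as the example. Its CDF $F(x) = 1 - e^{-x}$ is strictly increasing on $x > 0$, so the quantile of order $p$ solves $F(x) = p$, giving $-\ln(1 - p)$. The helper name `exp_quantile` is hypothetical.

```python
import math

def exp_quantile(p):
  """Quantile of order p for F(x) = 1 - exp(-x), found by solving F(x) = p."""
  return -math.log(1.0 - p)

q1, q3 = exp_quantile(0.25), exp_quantile(0.75)
iq = q3 - q1  # interquartile range
print(q1, q3, iq)
```

For this distribution the interquartile range simplifies to $\ln 4 - \ln(4/3) = \ln 3$.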

## 1.7.2 Transformations

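Let $X$ be a continuous random variable with PDF $f_X$ and let $Y = g(X)$ be a transformation of $X$. The CDF of $Y$ can often be obtained directly from $F_Y(y) = P[g(X) \leq y]$, and the PDF of $Y$ then follows by differentiation.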

# 1.8 Expectation of a RV

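The expectation of a constant $k$, viewed as a random variable with all its mass at $k$ (so $p(k) = 1$), is:
$$
E(k) = k\,p(k) = k.
$$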

## Theorem 1.8.1
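Let $X$ be a random variable and let $Y = g(X)$ for some function $g$.

(a) If $X$ is continuous with PDF $f_X(x)$ and $\int_{-\infty}^{\infty} |g(x)| f_X(x)\,dx < \infty$, then
$$
E(Y) = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx.
$$

(b) If $X$ is discrete with PMF $p_X(x)$ and $\sum_{x \in \mathcal{D}_X} |g(x)| p_X(x) < \infty$, then
$$
E(Y) = \sum_{x \in \mathcal{D}_X} g(x) p_X(x).
$$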
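
The discrete case of the theorem is easy to compute directly. This sketch assumes $X$ is a fair six-sided die and uses a hypothetical helper `expectation`; exact fractions avoid rounding.

```python
from fractions import Fraction

# Assumed example: X is the face of a fair six-sided die.
pmf_x = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(g, pmf):
  """E[g(X)] for a discrete X: sum of g(x) * p_X(x) over the space."""
  return sum(g(x) * p for x, p in pmf.items())

mean = expectation(lambda x: x, pmf_x)
second_moment = expectation(lambda x: x**2, pmf_x)
print(mean, second_moment)
```

There is no need to find the distribution of $Y = X^2$ first; the sum runs over the space of $X$.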



# 1.9 Special Expectations

**Moment Generating Function**: Let $X$ be a random variable such that, for some $h > 0$, the expectation of $e^{tX}$ exists for $-h < t < h$ (an open interval around $0$). The MGF of $X$ is defined to be the function
$$
M(t) = E(e^{tX}), \quad -h < t < h.
$$
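
A small sketch of an MGF in code, assuming $X$ is a fair six-sided die (the helper `die_mgf` is hypothetical). It also uses a standard property not stated in these notes: the derivative of the MGF at $t = 0$ equals $E(X)$, approximated here by a central difference.

```python
import math

def die_mgf(t, sides=6):
  """M(t) = E[exp(tX)] for a fair die with faces 1..sides."""
  return sum(math.exp(t * k) for k in range(1, sides + 1)) / sides

h = 1e-5
mean_from_mgf = (die_mgf(h) - die_mgf(-h)) / (2 * h)  # approximates M'(0)
print(die_mgf(0.0), mean_from_mgf)
```

The first printed value is $M(0) = 1$, which holds for any MGF, and the second should be close to the die's mean of 3.5.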



# 1.10 Important Inequalities

## Theorem 1.10.1
Let $X$ be a random variable and let $m$ be a positive integer. Suppose $E[X^m]$ exists. If $k$ is a positive integer and $k \leq m$, then $E[X^k]$ exists.

## Theorem 1.10.2 (Markov's Inequality)

Let $u(X)$ be a nonnegative function of the random variable $X$. If $E[u(X)]$ exists, then for every positive constant $c$,

$$
P\left[u(X) \geq c\right] \leq \frac{E[u(X)]}{c}.
$$
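
Markov's inequality can be checked exactly on a small example. This sketch assumes $X$ is a fair six-sided die with $u(X) = X$ and $c = 5$, all arbitrary choices for illustration.

```python
from fractions import Fraction

# Assumed example: fair six-sided die, u(X) = X, c = 5.
pmf_x = {x: Fraction(1, 6) for x in range(1, 7)}
c = 5

tail_prob = sum(p for x, p in pmf_x.items() if x >= c)  # P[X >= c]
bound = sum(x * p for x, p in pmf_x.items()) / c        # E[X] / c
print(tail_prob, bound, tail_prob <= bound)
```

Here the exact tail probability is $1/3$ while the bound is $7/10$, which shows that the inequality holds but can be loose.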

## Theorem 1.10.3 (Chebyshev's Inequality)
Let $X$ be a random variable with finite variance $\sigma^2$ (so the mean $\mu = E(X)$ exists by Theorem 1.10.1). Then for every $k > 0$:
$$
P(|X-\mu| \geq k\sigma) \leq \frac{1}{k^2},
$$
or, equivalently:
$$
P(|X-\mu| < k\sigma) \geq 1-\frac{1}{k^2}.
$$
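
Chebyshev's inequality can be compared with exact tail probabilities. This sketch again assumes a fair six-sided die as the example distribution; the values of $k$ are arbitrary.

```python
import math

faces = range(1, 7)  # assumed example: fair six-sided die
mu = sum(faces) / 6
sigma = math.sqrt(sum((x - mu) ** 2 for x in faces) / 6)

def tail(k):
  """Exact P(|X - mu| >= k * sigma) for the fair die."""
  return sum(1 for x in faces if abs(x - mu) >= k * sigma) / 6

for k in (1.0, 1.2, 1.5, 2.0):
  # Exact tail probability next to the Chebyshev bound 1 / k^2.
  print(k, tail(k), 1 / k**2)
```

The bound holds for every $k$ but is far from tight for this distribution, which is typical since Chebyshev's inequality uses only the mean and variance.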

## Example 1.10.4 (Harmonic and Geometric Means)
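
The notes leave this example blank. Assuming it concerns the standard ordering of means for positive numbers (harmonic $\leq$ geometric $\leq$ arithmetic), a quick numerical check with Python's `statistics` module looks like this; the data values are arbitrary.

```python
import statistics

data = [1.0, 2.0, 4.0, 8.0]  # arbitrary positive numbers

hm = statistics.harmonic_mean(data)
gm = statistics.geometric_mean(data)  # available in Python 3.8+
am = statistics.mean(data)
print(hm, gm, am, hm <= gm <= am)
```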