<a href="https://colab.research.google.com/github/JardRily/Mathematical-Methods-Data-Sciences/blob/main/2_2_Probability_Distribution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 2.2 Probability Distribution

A probability distribution is a function that gives the probabilities of the possible events of a given experiment. There are both discrete probability distributions and continuous probability distributions.

## 2.2.1 Probability Axioms

Experiment - an activity or process whose outcome is uncertain. This can be something as complex as measuring the occurrence of a genome in a species of insects, or as simple as flipping a coin.

Sample Space - Denoted as $S$, the set of all possible outcomes of an experiment. Can be infinite or finite.

Event - Any subset of $S$. This can be thought of as the collection of outcomes we are studying.

Probability Distribution Function - Given an experiment with sample space $S$, a probability distribution assigns to each event $A$ a number $P(A)$ that represents the probability of that event occurring. The probability assignments must follow these rules:
- For any event $A$, $0 \le P(A) \le 1$
- $P(S) = 1$
- If $A_1, A_2, A_3, \ldots$ is an infinite collection of disjoint events, then 
$P(A_1 \cup A_2 \cup A_3 \cup \ldots) = \sum_{i=1}^\infty P(A_i)$
- For any event $A$ with complement $A'$, $P(A) + P(A') = 1$, from which $P(A) = 1 - P(A')$
- When events $A$ and $B$ are mutually exclusive, $P(A \cup B)=P(A) + P(B)$
- For any two events $A$ and $B$, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
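
The rules above can be checked on a small concrete case. The sketch below is only an illustration: the loaded die, its probabilities, and the helper `P` are assumptions made up for this example, not part of the notes. It uses exact fractions so that the equalities hold without rounding.

```python
from fractions import Fraction

# Hypothetical loaded die: face 6 comes up half the time (illustrative numbers only).
pmf = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
       4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2)}
S = set(pmf)

def P(event):
    """Probability of an event, i.e. a subset of the sample space S."""
    return sum((pmf[outcome] for outcome in event), Fraction(0))

A = {2, 4, 6}   # the roll is even
B = {4, 5, 6}   # the roll is greater than 3

print(P(S))                      # 1
print(P(A) + P(S - A))           # 1   (complement rule)
print(P(A | B))                  # 4/5
print(P(A) + P(B) - P(A & B))    # 4/5 (general addition rule)
```

Here `A | B` and `A & B` are Python's set union and intersection, which mirror $A \cup B$ and $A \cap B$.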

#### Example 1
Let's look at a common experiment, flipping a coin. When we flip a coin there are 2 possible outcomes, so our sample space is $S = \{heads,\;tails\}$. For a fair coin, there is a $\frac{1}{2}$ chance of getting the outcome $heads$ and a $\frac{1}{2}$ chance of getting the outcome $tails$. We denote this as $P(heads) = \frac{1}{2}$ and $P(tails) = \frac{1}{2}$. We can double check that this is consistent by verifying one of the rules: $P(S) = P(heads) + P(tails) = 1$. 

#### Example 2
Now consider the act of rolling a die. Our sample space is $S = \{1, \;2, \;3, \;4, \;5, \;6\}$. For a fair die, the probability of rolling any particular number in $S$ is $\frac{1}{6}$ (e.g. $P(3) = \frac{1}{6}$). If we add up all the probabilities, we see $P(S) = P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = 1$.

Notice how in both examples there were $N$ outcomes, and each outcome had probability exactly $\frac{1}{N}$. In such cases, the probability of any event $A$ is $P(A) = \frac{N(A)}{N}$, where $N(A)$ is the number of outcomes contained in $A$.
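
As a rough illustration of the counting rule $P(A) = \frac{N(A)}{N}$, the sketch below enumerates the outcomes of rolling two fair dice and counts those whose sum is 7. The two-dice experiment and the variable names are assumptions chosen for this example.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of rolling two fair dice: (1, 1), (1, 2), ..., (6, 6).
outcomes = list(product(range(1, 7), repeat=2))
N = len(outcomes)

# Event A: the two faces sum to 7.
A = [o for o in outcomes if sum(o) == 7]

p_A = Fraction(len(A), N)   # N(A) / N
print(N, len(A), p_A)       # 36 6 1/6
```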

## 2.2.2 Conditional Probability

Conditional probability is the likelihood of an event occurring given that another event has occurred. For events $A$ and $B$ with $P(B) > 0$, it is defined as the ratio $P(A | B) = \frac{P(A \cap B)}{P(B)}$. This also means that $P(A \cap B) = P(A | B) \cdot P(B)$. From this rule we obtain a definition of independence: $A$ and $B$ are independent events if $P(A | B) = P(A)$ or, equivalently, $P(A \cap B) = P(A) \cdot P(B)$. This extends to a more general rule: events $A_1, \ldots, A_n$ are mutually independent if for every $k = 2, 3, \ldots, n$ and for every subset of indices $i_1, i_2, \ldots, i_k$: $P(A_{i_1} \cap A_{i_2} \cap \ldots \cap A_{i_k}) = P(A_{i_1}) \cdot P(A_{i_2}) \cdots P(A_{i_k})$ (that is, the probability that all events in any sub-collection occur together is the product of their individual probabilities).
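
The definitions above can be tried out on a single roll of a fair die. In this sketch the events `A`, `B`, `C` and the helpers `P` and `P_given` are hypothetical names introduced for illustration; `A` turns out to be independent of `B` but not of `C`.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # one roll of a fair die, outcomes equally likely

def P(event):
    return Fraction(len(event), len(S))

def P_given(a, b):
    """Conditional probability P(a | b) = P(a and b) / P(b), assuming P(b) > 0."""
    return P(a & b) / P(b)

A = {2, 4, 6}       # the roll is even
B = {1, 2, 3, 4}    # the roll is at most 4
C = {4, 5, 6}       # the roll is at least 4

print(P(A), P_given(A, B))        # 1/2 1/2 -> knowing B does not change P(A)
print(P(A & B) == P(A) * P(B))    # True    -> A and B are independent
print(P_given(A, C))              # 2/3     -> differs from P(A)
print(P(A & C) == P(A) * P(C))    # False   -> A and C are not independent
```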

## 2.2.3 Discrete Random Variables

A random variable is a measurable function whose values depend on the outcomes of a random phenomenon. For a given sample space $S$, a random variable is any rule that associates a number with each outcome in $S$ (i.e. the random variable is a function whose domain is $S$ and whose range is a subset of $\mathbb{R}$).

A discrete random variable is a random variable whose possible values are a finite set, or can be listed in an infinite sequence. 

A random variable is continuous if: 
- Its set of possible values consists of all numbers in a single interval on the number line.
- $P(X = c) = 0$ for any individual possible value $c$.

Probability Mass Function - a function that gives the probability that a discrete random variable is exactly equal to some value. The pmf specifies the probability of observing each value when the experiment is performed. The probability distribution or pmf of a discrete random variable $X$ is defined for every number $x$ by $p(x) = P(X=x)=P(\text{all } s\in S:X(s)=x)$

Cumulative Distribution Function - The cdf $F(x)$ of a discrete random variable $X$ with pmf $p(x)$ is defined for every number $x$ by $F(x) = P(X \le x) = \sum_{y:y\le x}\;p(y)$
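
To make the pmf and cdf definitions concrete, the sketch below builds both for a small made-up experiment: flipping a fair coin three times, with $X$ counting the heads. The experiment and the names `X`, `pmf`, and `cdf` are assumptions for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all sequences of three flips, each equally likely.
sample_space = list(product("HT", repeat=3))

def X(s):
    """Random variable: the number of heads in outcome s."""
    return s.count("H")

# p(x) = P(all s in S with X(s) = x), obtained by counting outcomes.
counts = Counter(X(s) for s in sample_space)
pmf = {x: Fraction(c, len(sample_space)) for x, c in counts.items()}

def cdf(x):
    """F(x) = P(X <= x): add up p(y) over all possible values y <= x."""
    return sum((p for y, p in pmf.items() if y <= x), Fraction(0))

print(sorted(pmf.items()))         # pmf values 1/8, 3/8, 3/8, 1/8 for x = 0..3
print(cdf(1), cdf(2.5), cdf(-1))   # 1/2 7/8 0
```

Note that the cdf is defined for every real number $x$, not only for the values $X$ can take, which is why `cdf(2.5)` is meaningful.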

### 2.2.3.1 Bernoulli Distribution

Any random variable whose only possible values are 0 and 1 is called a Bernoulli random variable. The binomial random variable $X$ associated with a sequence of $n$ independent Bernoulli trials is defined as $X = \text{the number of 1's among the } n \text{ trials}$. The probability of success (a trial resulting in 1) is a constant $p$ from trial to trial. The pmf of $X$ has the form 
\begin{gather*}
b(x;n, p) = 
\begin{cases}
{n \choose x}p^x(1-p)^{n-x} & x=0, 1, 2, 3,\ldots,n\\
0 & otherwise
\end{cases}
\end{gather*}
The cdf of $X$ has the form
\begin{gather*}
B(x;n, p) = P(X \le x)= \sum_{y \le x}b(y;n, p) = \sum_{y=0}^x {n \choose y}p^y(1-p)^{n-y}, \quad x = 0, 1, \ldots, n
\end{gather*}
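
The binomial pmf and cdf translate almost directly into code. This is a minimal sketch using only the standard library (`math.comb` requires Python 3.8 or later); the function names `binom_pmf` and `binom_cdf` are chosen for this example.

```python
from math import comb

def binom_pmf(x, n, p):
    """b(x; n, p): probability of exactly x successes in n independent trials."""
    if isinstance(x, int) and 0 <= x <= n:
        return comb(n, x) * p**x * (1 - p)**(n - x)
    return 0.0

def binom_cdf(x, n, p):
    """B(x; n, p) = P(X <= x), summing the pmf over y = 0, ..., x."""
    return sum(binom_pmf(y, n, p) for y in range(0, min(x, n) + 1))

# Ten flips of a fair coin: probability of exactly 5 heads and of at most 5 heads.
print(binom_pmf(5, 10, 0.5))   # about 0.2461
print(binom_cdf(5, 10, 0.5))   # about 0.6230
```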



### 2.2.3.2 Poisson Distribution

The Poisson distribution is a discrete probability distribution that describes the probability of a given number of events occurring in a fixed interval of time or space. We use this distribution when events occur at a known constant mean rate and independently of the time since the last event. A discrete random variable $X$ has a Poisson distribution with parameter $\mu$ $(\mu > 0)$ if the pmf of $X$ is 
\begin{gather*}
p(x;\mu) = \frac{e^{-\mu}\mu^x}{x!}, x=0, 1, 2, 3, \ldots
\end{gather*}
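
A short sketch of the Poisson pmf, assuming a hypothetical mean rate of $\mu = 3$ events per interval. The function name `poisson_pmf` is made up for this example. Summing the pmf over a long enough range of $x$ should give a value very close to 1.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """p(x; mu) = e^(-mu) * mu^x / x! for x = 0, 1, 2, ..."""
    return exp(-mu) * mu**x / factorial(x)

mu = 3.0  # illustrative mean number of events per interval
for x in range(5):
    print(x, round(poisson_pmf(x, mu), 4))

# The probabilities over all x sum to 1; 50 terms are plenty when mu = 3.
total = sum(poisson_pmf(x, mu) for x in range(50))
print(total)
```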

### 2.2.3.3 The Expected Value and Variance of X
The expected value of a random variable $X$ is a generalization of the weighted average, and is intuitively the arithmetic mean of a large number of independent realizations of $X$. Let $X$ be a discrete random variable with set of possible values $D$ and pmf $p(x)$. The expected value or mean value of $X$, denoted by $E(X)$, $\mu_X$, or $\mu$, is $E(X) = \mu_X = \sum_{x\in D} \;x \cdot p(x)$.

#### Example 1
Let $X$ be a Bernoulli random variable with pmf $p(1) = p$ and $p(0) = 1 - p$, from which $E(X) = 0 \times p(0) + 1 \times p(1)=p$. That is, the expected value of $X$ is just the probability that $X$ takes on the value $1$.

If the random variable $X$ has a set of possible values $D$ and pmf $p(x)$, then the expected value of any function $h(X)$, denoted by $E[h(X)]$ or $\mu_{h(X)}$, is computed by the first formula below. The second formula is the special case of a linear function $h(X) = aX + b$:
\begin{gather*}
E[h(X)] = \sum_D h(x) \cdot p(x)\\
E(aX + b) = a \cdot E(X) + b
\end{gather*}

The variance measures how far a set of numbers is spread out from its average value. Let $X$ have pmf $p(x)$ and expected value $\mu$. Then the variance of $X$, denoted by $V(X)$ or $\sigma_X^2$, or just $\sigma^2$, is 
\begin{gather*}
V(X) = \sum_D(x-\mu)^2 \cdot p(x) = E[(X-\mu)^2]
\end{gather*}

The standard deviation of X is $\sigma_X = \sqrt{\sigma_X^2}$

There are two distributions whose expected values and variances are important to know:
- If $X$ is a binomial random variable with parameters $n, p$, then $E(X) = np, V(X) = np(1-p), \sigma_X = \sqrt{np(1-p)}$
- If $X$ has a Poisson distribution with parameter $\mu$, then $E(X) = \mu, V(X) = \mu$
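
These closed-form results can be compared against the defining sums $E(X) = \sum x \cdot p(x)$ and $V(X) = \sum (x-\mu)^2 \cdot p(x)$. The sketch below does this for one hypothetical choice of parameters; the helper `mean_and_variance` is an assumption of this example, and the Poisson pmf is truncated at 100 terms, which is only an approximation of the infinite sum.

```python
from math import comb, exp, factorial

def mean_and_variance(pmf):
    """Compute E(X) and V(X) directly from a pmf given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var

# Binomial with illustrative parameters n = 10, p = 0.3.
n, p = 10, 0.3
binom = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}
binom_mean, binom_var = mean_and_variance(binom)
print(binom_mean, n * p)             # both close to 3.0
print(binom_var, n * p * (1 - p))    # both close to 2.1

# Poisson with illustrative parameter mu = 4, truncated at x = 99.
mu_poisson = 4.0
poisson = {x: exp(-mu_poisson) * mu_poisson**x / factorial(x) for x in range(100)}
poisson_mean, poisson_var = mean_and_variance(poisson)
print(poisson_mean, poisson_var)     # both close to 4.0
```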

## 2.2.4 Continuous Random Variables
A random variable $X$ is continuous if its possible values comprise either a single interval on the number line or a union of disjoint intervals. Let $X$ be a continuous random variable. Then a probability distribution or probability density function (pdf) of $X$ is a function $f(x)$ such that for any two numbers $a$ and $b$ with $a \le b$:
\begin{gather*}
P(a \le X \le b) = \int_a^b f(x)dx
\end{gather*}  

That is, the probability that $X$ takes on a value in the interval $[a, b]$ is the area under the probability density function $f(x)$ over this interval. $f(x)$ must satisfy the following two conditions:
- $f(x) \ge 0$ for all $x$
- $\int_{-\infty}^{\infty} f(x)dx = 1$
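
As a small numerical illustration, the sketch below takes a made-up pdf, $f(x) = 2x$ on $[0, 1]$ and $0$ elsewhere, and approximates the integrals with a simple midpoint rule. Both the pdf and the helper `integrate` are assumptions for this example, and the integration is approximate rather than exact in general.

```python
def f(x):
    """Hypothetical pdf: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(g, a, b, steps=10_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

total_area = integrate(f, 0, 1)   # f is zero outside [0, 1], so this is the whole area
prob = integrate(f, 0.5, 1)       # P(0.5 <= X <= 1)

print(total_area)   # close to 1
print(prob)         # close to 0.75
```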

### 2.2.4.1 Expected Values and Variances
The expected or mean value of a continuous random variable $X$ with pdf $f(x)$ is $\mu_X = E(X) = \int_{-\infty}^{\infty} x \cdot f(x) dx$. The variance of a continuous random variable $X$ with pdf $f(x)$ and mean $\mu$ is $\sigma_X^2 = V(X) = \int_{-\infty}^{\infty} (x- \mu)^2 \cdot f(x)dx = E[(X - \mu)^2]$. The standard deviation of $X$ is $\sigma_X = \sqrt{V(X)}$. The expected value and variance have the following properties:
- If $X$ is a continuous random variable with pdf $f(x)$ and $h(X)$ is any function of $X$, then $E[h(X)] = \mu_{h(X)} = \int_{-\infty}^{\infty} h(x) \cdot f(x)dx$
- $V(X) = E(X^2) - [E(X)]^2$

$X$ is said to have an exponential distribution with parameter $\lambda (\lambda > 0) $ if the pdf of $X$ is 
\begin{gather*}
f(x;\lambda)=
\begin{cases}
\lambda e^{-\lambda x} & x \ge 0\\
0 & otherwise
\end{cases}
\end{gather*}

The expected value of an exponentially distributed random variable $X$ is $E(X) = \int_{0}^{\infty} x\lambda e^{-\lambda x}dx$.
Obtaining this expected value requires an integration by parts. The variance of $X$ can be computed using the fact that $V(X) = E(X^2) - [E(X)]^2$. Determining $E(X^2)$ requires integrating by parts twice in succession. The results are $\mu = \frac{1}{\lambda}$ and $\sigma^2 = \frac{1}{\lambda^2}$. Both the mean and the standard deviation of the exponential distribution equal $\frac{1}{\lambda}$.
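
Instead of carrying out the integration by parts, the results can be sanity-checked by simulation. This sketch draws samples with the standard library's `random.expovariate` and compares the sample mean and variance with $\frac{1}{\lambda}$ and $\frac{1}{\lambda^2}$. The rate $\lambda = 2$, the seed, and the sample size are arbitrary choices for illustration, and the agreement is only approximate.

```python
import random
import statistics

lam = 2.0          # illustrative rate parameter
random.seed(0)     # fixed seed so the run is repeatable

samples = [random.expovariate(lam) for _ in range(100_000)]

sample_mean = statistics.fmean(samples)
sample_var = statistics.pvariance(samples)

print(sample_mean, 1 / lam)       # sample mean vs. 1/lambda = 0.5
print(sample_var, 1 / lam**2)     # sample variance vs. 1/lambda^2 = 0.25
```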

### 2.2.4.2 The Normal Distribution

Normal distributions are often used to represent real-valued random variables whose distributions are not known. A continuous random variable $X$ is said to have a normal distribution with parameters $\mu$ and $\sigma$ (or $\mu$ and $\sigma^2$), where $-\infty < \mu < \infty$ and $0 < \sigma$, if the pdf of $X$ is $f(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}}e^{\frac{-(x-\mu)^2}{2\sigma^2}}$ where $-\infty < x < \infty$.

The computation of $P(a \le X \le b)$ when $X$ is a normal random variable with parameters $\mu$ and $\sigma$ requires evaluating
\begin{gather*}
\int_a^b \frac{1}{\sigma \sqrt{2\pi}}e^{\frac{-(x-\mu)^2}{2\sigma^2}} dx
\end{gather*}

The normal distribution with parameter values $\mu = 0$ and $\sigma = 1$ is called the standard normal distribution. A random variable having a standard normal distribution is called a standard normal random variable and will be denoted by $Z$. The pdf of $Z$ is $f(z; 0, 1) = \frac{1}{\sqrt{2\pi}}e^{\frac{-z^2}{2}}$ where $-\infty < z < \infty$. The graph of $f(z; 0, 1)$ is known as the standard normal curve (or $z$ curve). Its inflection points are at $1$ and $-1$. The cdf of $Z$ is $P(Z \le z) = \int_{-\infty}^z f(y; 0, 1) dy$, which is denoted by $\Phi(z)$.

A normal random variable $X \sim N(\mu, \sigma^2)$ can be converted to the standard normal variable $Z = \frac{X-\mu}{\sigma}$. 
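
Standardizing is what makes $P(a \le X \le b)$ practical to compute: $P(a \le X \le b) = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)$. The sketch below evaluates $\Phi$ through the error function, using the identity $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right)\right]$, which is not stated in these notes. The function names and the values $\mu = 100$, $\sigma = 15$ are assumptions for illustration.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_prob(a, b, mu, sigma):
    """P(a <= X <= b) for X ~ N(mu, sigma^2), obtained by standardizing a and b."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

print(phi(0))                           # 0.5
print(normal_prob(85, 115, 100, 15))    # about 0.6827 (within one standard deviation)
print(normal_prob(70, 130, 100, 15))    # about 0.9545 (within two standard deviations)
```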