<a href="https://colab.research.google.com/github/aaronyu888/mat-494-notebooks/blob/main/Probability_Distribution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#2.2 Probability Distribution
A probability distribution is a function that gives the probabilities of different possible outcomes of an experiment.


---



#2.2.1 Probability Axioms
An **experiment** is any activity or process whose outcome is subject to uncertainty.

The **sample space** $S$ of an experiment is the set of all possible outcomes of that experiment. It is usually more meaningful to study collections of outcomes from $S$ than individual outcomes.

An **event** is any subset of outcomes contained in the sample space $S$. An event is simple if it consists of only one outcome and compound if it consists of multiple.

A **probability distribution** is a function that assigns to each event $A$ a number $P(A)$, which gives a precise measure of the chance that $A$ will occur. The first three statements below are the axioms; the remaining three follow from them.

*   For any event $A$, $0 \leq P(A) \leq 1$
*   $P(S) = 1$
*   If $A_1, A_2, \ldots$ is an infinite collection of disjoint events, then $P(A_1\cup A_2\cup \cdots) = \sum\limits_{i=1}^{\infty} P(A_i)$
*   For any event $A$, $P(A) + P(A') = 1$, from which $P(A) = 1 - P(A')$
*   When events $A$ and $B$ are mutually exclusive, $P(A\cup B) = P(A) + P(B)$
*   For any two events $A$ and $B$, $P(A\cup B) = P(A) + P(B) - P(A\cap B)$
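
The properties above can be checked by direct counting on a small finite sample space. The following is a minimal sketch that assumes a single roll of a fair six-sided die (an illustrative choice, not taken from the text); the helper `prob` and the events `A`, `B`, `C` are hypothetical names, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
S = frozenset(range(1, 7))

def prob(event):
    """P(event) under equally likely outcomes: |event| / |S|."""
    return Fraction(len(event & S), len(S))

A = frozenset({2, 4, 6})   # "roll is even"
B = frozenset({4, 5, 6})   # "roll is at least 4"
C = frozenset({1})         # disjoint from A

print(prob(S))                                          # 1
print(prob(A) + prob(S - A))                            # complement rule: 1
print(prob(A | C) == prob(A) + prob(C))                 # mutually exclusive: True
print(prob(A | B) == prob(A) + prob(B) - prob(A & B))   # general addition rule: True
```

Here `A | B` and `A & B` are Python's set union and intersection, standing in for $A\cup B$ and $A\cap B$.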

In [8]:
import random
from collections import Counter

# Simulate 10,000 draws from a generator with 5 equally likely outcomes
# and count how often each outcome appears.
counts = Counter(random.randint(1, 5) for _ in range(10000))
for outcome in range(1, 6):
  print(f"number of {outcome}s: {counts[outcome]}")

number of 1s: 2051
number of 2s: 1980
number of 3s: 1939
number of 4s: 2055
number of 5s: 1975


Given a random number generator with 5 equally likely outcomes, each outcome should occur about 2,000 times in 10,000 draws. In the output above, every count is within roughly 3% of 2,000, which is consistent with a uniform distribution over the five outcomes.

#2.2.2 Conditional Probability
**Conditional probability** is the probability that an event occurs given that another event has already occurred. It is expressed as a ratio of unconditional probabilities: the numerator is the probability of the intersection of the two events, and the denominator is the probability of the conditioning event $B$. For a fixed $B$, the conditional probability of $A$ given $B$ is therefore proportional to $P(A\cap B)$.

For any two events $A$ and $B$ with $P(B) > 0$, the conditional probability of $A$ given that $B$ has occurred is defined by $P(A|B) = \frac{P(A\cap B)}{P(B)}$.

Conditional probability also gives rise to the multiplication rule
$P(A\cap B) = P(A|B) \cdot P(B)$. This is important because $P(A\cap B)$ is often desired, and $P(B)$ and $P(A|B)$ are specified in the problem.

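$A$ and $B$ are independent events if $P(A|B) = P(A)$ or, equivalently, $P(A\cap B) = P(A)\cdot P(B)$. The definition extends to collections of events: they are mutually independent if the product rule holds for every finite subcollection.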



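In [24]:
p_a = 0.4
p_b = 0.8

# Independent case: the joint probability P(A ∩ B) equals P(A) * P(B).
p_a_and_b = p_a * p_b
print("{:.2f}".format(p_a_and_b / p_b))

# Dependent case: an illustrative joint probability that differs from P(A) * P(B).
p_a_and_b = 0.256
print("{:.2f}".format(p_a_and_b / p_b))

0.40
0.32


In the first case the joint probability is set to $P(A)\cdot P(B)$, so the formula $P(A|B) = \frac{P(A\cap B)}{P(B)}$ returns $0.40 = P(A)$, which is exactly the condition for $A$ and $B$ to be independent. In the second case the joint probability is $0.256$ rather than $P(A)\cdot P(B) = 0.32$, so $P(A|B) = 0.32 \neq P(A)$ and the events are dependent.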
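
The definition and the independence criterion can also be checked by counting outcomes rather than plugging in numbers. This sketch assumes two fair six-sided dice (an illustrative setup that is not part of the text above); `prob` and `cond_prob` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

# Assumed example: two fair six-sided dice, 36 equally likely outcomes.
S = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under equally likely outcomes."""
    return Fraction(len(event), len(S))

def cond_prob(a, b):
    """P(a | b) = P(a ∩ b) / P(b), defined only when P(b) > 0."""
    return prob(a & b) / prob(b)

first_is_six = {s for s in S if s[0] == 6}
second_is_six = {s for s in S if s[1] == 6}
sum_is_twelve = {s for s in S if sum(s) == 12}

# Independent: knowing the second die says nothing about the first.
print(cond_prob(first_is_six, second_is_six), prob(first_is_six))    # 1/6 1/6
# Dependent: conditioning on the first die changes the chance of a sum of 12.
print(cond_prob(sum_is_twelve, first_is_six), prob(sum_is_twelve))   # 1/6 1/36
```

In the first pair the conditional and unconditional probabilities agree, so the events are independent; in the second they differ, so the events are dependent.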


#2.2.3 Discrete Random Variables
A **discrete random variable** is a random variable whose possible values either constitute a finite set or else can be listed in an infinite sequence.

A **probability mass function** (pmf) of a discrete random variable $X$ is defined for every number $x$ by $p(x) = P(X = x) = P(\{s \in S : X(s) = x\})$.

The **cumulative distribution function** of a discrete random variable $X$ with pmf $p(x)$ is defined for every number $x$ by $F(x) = P(X \leq x) = \sum\limits_{y:y\leq x} p(y)$.
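
As a concrete illustration of these two definitions, the sketch below builds the pmf and cdf of the sum of two fair dice. The dice example and the names `outcomes`, `pmf`, and `cdf` are assumptions made for illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumed example: X(s) = sum of the two faces, for two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
counts = Counter(a + b for a, b in outcomes)

# pmf: p(x) = P(X = x) = (number of outcomes s with X(s) = x) / |S|
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

def cdf(x):
    """F(x) = P(X <= x): sum of p(y) over all possible values y <= x."""
    return sum((p for y, p in pmf.items() if y <= x), Fraction(0))

print(pmf[7])    # 1/6
print(cdf(4))    # 1/6
print(cdf(12))   # 1
```

Because the cdf sums over all possible values $y \leq x$, it is defined for every real $x$, not only for values that $X$ can take; for instance `cdf(3.5)` equals `cdf(3)`.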

Any random variable whose only possible values are 0 and 1 is called a **Bernoulli random variable**.

The **Poisson distribution** is a discrete probability distribution that describes the probability of a given number of events occurring in a fixed interval of time or space, provided these events occur with a known constant mean rate and independently of the time since the last event. A discrete random variable $X$ has a Poisson distribution with parameter $\mu > 0$ if the pmf of $X$ has the form $p(x;\mu) = \frac{e^{-\mu}\mu^x}{x!}$ for $x = 0, 1, 2, \ldots$
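
The Poisson pmf can be evaluated directly from the formula. This is a minimal sketch using only the standard library; the function name `poisson_pmf` and the value $\mu = 3$ are assumptions chosen for illustration.

```python
import math

def poisson_pmf(x, mu):
    """p(x; mu) = e^(-mu) * mu^x / x! for x = 0, 1, 2, ..."""
    return math.exp(-mu) * mu**x / math.factorial(x)

mu = 3.0  # assumed mean number of events per interval
for x in range(6):
    print(x, round(poisson_pmf(x, mu), 4))

# The probabilities over all x sum to 1; 100 terms is ample for mu = 3.
total = sum(poisson_pmf(x, mu) for x in range(100))
print(round(total, 6))   # 1.0
```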

The **expected value** of a random variable $X$ is a generalization of the weighted average and is intuitively the arithmetic mean of a large number of independent realizations of $X$. If $X$ has a set of possible values $D$ and pmf $p(x)$, it can be written as $E(X) = \mu_X = \sum\limits_{x\in D} x\cdot p(x)$.

If the random variable $X$ has a set of possible values $D$ and pmf $p(x)$, then the expected value of any function $h(X)$, denoted by $E[h(X)]$ or $\mu_{h(X)}$, is computed by $E[h(X)] = \sum\limits_D h(x)\cdot p(x)$. In particular, $E(aX + b) = a \cdot E(X) + b$.

Let $X$ have pmf $p(x)$ and expected value $\mu$. Then the **variance** of $X$, denoted by $V(X)$ or $\sigma_X^2$ or just $\sigma^2$, is $V(X) = \sum\limits_D (x-\mu)^2 \cdot p(x) = E[(X - \mu)^2]$.

The **standard deviation** of $X$ is $\sigma_X = \sqrt{\sigma_X^2}$.
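
The formulas for $E(X)$, $E[h(X)]$, $V(X)$, and $\sigma_X$ translate almost line for line into code. The sketch below assumes a small made-up pmf on the values 0, 1, 2; the helper `expect` is a hypothetical name.

```python
import math
from fractions import Fraction

# Assumed pmf for illustration: X takes values 0, 1, 2.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expect(h=lambda x: x):
    """E[h(X)] = sum over D of h(x) * p(x)."""
    return sum(h(x) * p for x, p in pmf.items())

mean = expect()                             # E(X)
var = expect(lambda x: (x - mean) ** 2)     # V(X) = E[(X - mu)^2]
sd = math.sqrt(var)                         # sigma_X

print(mean, var, round(sd, 4))              # 1 1/2 0.7071
# Linearity: E(aX + b) = a * E(X) + b, here with a = 3 and b = 2.
print(expect(lambda x: 3 * x + 2) == 3 * mean + 2)   # True
```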

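If $X$ is a binomial random variable with parameters $n$ and $p$ (the number of successes in $n$ independent trials, each with success probability $p$), then $E(X) = np$, $V(X) = np(1-p)$, and $\sigma_X = \sqrt{np(1-p)}$.

If $X$ has a Poisson distribution with parameter $\mu$, then $E(X) = \mu$ and $V(X) = \mu$.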
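
These closed forms can be checked numerically by summing over the pmf. The sketch below is illustrative only: the binomial pmf $\binom{n}{x}p^x(1-p)^{n-x}$ is the standard one but is not stated in the text above, the parameter values are assumptions, and `moments` is a hypothetical helper.

```python
import math

def moments(pmf, support):
    """Return (mean, variance) computed directly from a pmf."""
    mean = sum(x * pmf(x) for x in support)
    var = sum((x - mean) ** 2 * pmf(x) for x in support)
    return mean, var

n, p = 10, 0.3   # assumed binomial parameters
binom = lambda x: math.comb(n, x) * p**x * (1 - p) ** (n - x)
b_mean, b_var = moments(binom, range(n + 1))
print(round(b_mean, 6), round(b_var, 6))   # 3.0 2.1, matching np and np(1-p)

mu = 4.0         # assumed Poisson parameter
poisson = lambda x: math.exp(-mu) * mu**x / math.factorial(x)
# The Poisson support is infinite; truncating at 100 leaves a negligible tail.
p_mean, p_var = moments(poisson, range(100))
print(round(p_mean, 6), round(p_var, 6))   # 4.0 4.0, both equal to mu
```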

#2.2.4 Continuous Random Variables
A random variable $X$ is **continuous** if its possible values comprise either a single interval on the number line or a union of disjoint intervals.

A **probability density function** of a continuous random variable $X$ is a function $f(x)$ such that for any two numbers $a$ and $b$ with $a\leq b$, $P(a \leq X \leq b) = \int_a^b f(x)dx$.
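
To make the integral concrete, the sketch below approximates $P(a \leq X \leq b)$ numerically. The pdf $f(x) = 2x$ on $[0, 1]$ is an assumed example, and `integrate` is a hypothetical midpoint-rule helper rather than a library function.

```python
def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

# Assumed pdf for illustration: f(x) = 2x on [0, 1], and 0 elsewhere.
def f(x):
    return 2 * x if 0 <= x <= 1 else 0.0

print(round(integrate(f, 0, 1), 6))        # total probability: 1.0
print(round(integrate(f, 0.25, 0.5), 6))   # P(0.25 <= X <= 0.5): 0.1875
```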

The expected value of a continuous random variable $X$ with pdf $f(x)$ is $\mu_X = E(X) = \int_{-\infty}^{\infty} x\cdot f(x)dx$.

The variance of a continuous random variable $X$ with pdf $f(x)$ and mean $\mu$ is $\sigma_X^2 = V(X) = \int_{-\infty}^{\infty} (x-\mu)^2 \cdot f(x)dx = E[(X - \mu)^2]$.

The **standard deviation** of $X$ is $\sigma_X = \sqrt{V(X)}$.
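
The mean, variance, and standard deviation of a continuous random variable can be approximated the same way. This sketch again assumes the illustrative pdf $f(x) = 2x$ on $[0, 1]$, for which the exact values are $E(X) = 2/3$ and $V(X) = 1/18$; `integrate` is a hypothetical midpoint-rule helper.

```python
import math

def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    width = (b - a) / steps
    return width * sum(g(a + (i + 0.5) * width) for i in range(steps))

# Assumed pdf for illustration: f(x) = 2x on [0, 1]; it is zero elsewhere,
# so the integrals over (-inf, inf) reduce to integrals over [0, 1].
f = lambda x: 2 * x

mean = integrate(lambda x: x * f(x), 0, 1)                # exact value 2/3
var = integrate(lambda x: (x - mean) ** 2 * f(x), 0, 1)   # exact value 1/18
sd = math.sqrt(var)
print(round(mean, 4), round(var, 4), round(sd, 4))        # 0.6667 0.0556 0.2357
```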

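For an exponentially distributed random variable $X$ with rate parameter $\lambda > 0$, whose pdf is $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$, the expected value is $E(X) = \int_0^\infty x\lambda e^{-\lambda x}dx = \frac{1}{\lambda}$.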
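
The integral can be evaluated with integration by parts, taking $u = x$ and $dv = \lambda e^{-\lambda x}dx$ (so $v = -e^{-\lambda x}$), under the usual assumption that $\lambda > 0$:

$$
\int_0^\infty x\lambda e^{-\lambda x}dx = \left[-x e^{-\lambda x}\right]_0^\infty + \int_0^\infty e^{-\lambda x}dx = 0 + \left[-\frac{1}{\lambda}e^{-\lambda x}\right]_0^\infty = \frac{1}{\lambda}
$$

The boundary term vanishes because $x e^{-\lambda x} \to 0$ as $x \to \infty$ when $\lambda > 0$.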

A continuous random variable $X$ is said to have a normal distribution with parameters $\mu$ and $\sigma$, where $-\infty<\mu<\infty$ and $0<\sigma$, if the pdf of $X$ is $f(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\sigma}e^{-(x-\mu)^2/(2\sigma^2)}$.

The normal distribution with parameter values $\mu = 0$ and $\sigma = 1$ is called the standard normal distribution. A random variable having a standard normal distribution is called a standard normal random variable and will be denoted by $Z$. The pdf of $Z$ is $f(z;0,1) = \frac{1}{\sqrt{2\pi}}e^{-z^2/2}$. The graph of $f(z;0,1)$ is called the standard normal curve.
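
The normal pdf can be evaluated and integrated numerically to get probabilities for $Z$. This is a minimal sketch using only the standard library; `normal_pdf` and `integrate` are hypothetical helper names, and the intervals are chosen for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x; mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    width = (b - a) / steps
    return width * sum(g(a + (i + 0.5) * width) for i in range(steps))

print(round(normal_pdf(0.0), 4))                # peak of the standard normal curve: 0.3989
print(round(integrate(normal_pdf, -1, 1), 4))   # P(-1 <= Z <= 1): 0.6827
print(round(integrate(normal_pdf, -8, 8), 4))   # nearly all of the area: 1.0
```

With the default arguments `mu=0.0` and `sigma=1.0`, `normal_pdf` is the standard normal pdf; passing other values gives the general $f(x;\mu,\sigma)$.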