<div style="background-image: linear-gradient(145deg, rgba(35, 47, 62, 1) 0%, rgba(0, 49, 129, 1) 40%, rgba(32, 116, 213, 1) 60%, rgba(244, 110, 197, 1) 85%, rgba(255, 173, 151, 1) 100%); padding: 1rem 2rem; width: 95%"><img style="width: 60%;" src="../../images/MLU_logo.png"></div>

# <a name="0">MLU Mathematical Fundamentals for Machine Learning</a>
# <a name="0">Lecture 3: Probability and Statistics Fundamentals</a>
## <a name="0">Lab 3.3: Probability Distributions</a>

 1. <a href="#1">Discrete Distributions</a> 
 2. <a href="#2">Continuous Distributions</a> 
 
This notebook covers some important discrete and continuous statistical distributions.

In [None]:
# Import libraries
import numpy as np
import matplotlib.pyplot as plt

from IPython.display import Markdown, display

# Set a seed for reproducibility
np.random.seed(99)

## <a name="1">1. Discrete Distributions</a>
(<a href="#0">Go to top</a>)

Now that we have learned how to work with probability in both the discrete and the continuous setting, let us get to know some of the common distributions encountered. Depending on the area of machine learning, we may need to be familiar with vastly more of these, or for some areas of deep learning potentially none at all. This is, however, a good basic list to be familiar with.

## Bernoulli

This is the simplest random variable usually encountered.  This random variable encodes a coin flip which comes up $1$ with probability $p$ and $0$ with probability $1-p$.  If we have a random variable $X$ with this distribution, we will write

$$
X \sim \mathrm{Bernoulli}(p).
$$

The cumulative distribution function is 

$$F(x) = \begin{cases} 0 & x < 0, \\ 1-p & 0 \le x < 1, \\ 1 & x \ge 1 . \end{cases}$$

The probability mass function is plotted below.

In [None]:
p = 0.3

plt.figure(figsize=(5,3))
plt.stem([0, 1], [1 - p, p])
plt.xlabel("x")
plt.ylabel("p.m.f.")
plt.show()

If $X \sim \mathrm{Bernoulli}(p)$, then:

* $\mu_X = p$,
* $\sigma_X^2 = p(1-p)$.

We can sample an array of arbitrary shape from a Bernoulli random variable as follows.

In [None]:
1 * (np.random.rand(10, 10) < p)
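
As a quick sanity check on the mean and variance formulas above, the following sketch draws a large Bernoulli sample and compares the sample statistics with $p$ and $p(1-p)$. The names `rng`, `p_success`, and `samples`, the seed, and the sample size are choices made for this illustration only.

```python
import numpy as np

# Local generator with an arbitrary seed (an assumption of this sketch)
rng = np.random.default_rng(0)
p_success = 0.3  # same value as p above, renamed so p is not overwritten

# Threshold uniform draws to obtain Bernoulli(p_success) samples
samples = (rng.random(100_000) < p_success).astype(int)

sample_mean = samples.mean()
sample_var = samples.var()

print(f"sample mean     = {sample_mean:.4f} (theory: {p_success:.4f})")
print(f"sample variance = {sample_var:.4f} (theory: {p_success * (1 - p_success):.4f})")
```

With this many draws, both sample statistics should land close to the theoretical values, although the exact digits depend on the seed.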

## Discrete Uniform

The next commonly encountered random variable is a discrete uniform.  For our discussion here, we will assume that it is supported on the integers $\{1, 2, \ldots, n\}$, however any other set of values can be freely chosen.  The meaning of the word *uniform* in this context is that every possible value is equally likely.  The probability for each value $i \in \{1, 2, 3, \ldots, n\}$ is $p_i = \displaystyle{\frac{1}{n}}$.  We will denote a random variable $X$ with this distribution as

$$
X \sim U(n).
$$

The cumulative distribution function is 

$$F(x) = \begin{cases} 0 & x < 1, \\ \displaystyle{\frac{k}{n}} & k \le x < k+1 \text{ with } 1 \le k < n, \\ 1 & x \ge n . \end{cases}$$

Let us first plot the probability mass function.

In [None]:
n = 5

plt.figure(figsize=(5,3))
plt.stem([i + 1 for i in range(n)], n * [1 / n])
plt.xlabel("x")
plt.ylabel("p.m.f.")
plt.show()

If $X \sim U(n)$, then:

* $\mu_X = \displaystyle{\frac{1+n}{2}}$,
* $\sigma_X^2 = \displaystyle{\frac{n^2-1}{12}}$.

We can sample an array of arbitrary shape from a discrete uniform random variable as follows.


In [None]:
# randint excludes the upper bound, so pass n + 1 to include the value n
np.random.randint(1, n + 1, size=(10, 10))
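
NumPy's integer samplers exclude the upper bound by default, so it is worth checking that every value in $\{1, \ldots, n\}$ actually appears. The sketch below uses the `Generator.integers` method for this; the names `rng`, `n_values`, and `draws`, the seed, and the sample size are assumptions of this illustration.

```python
import numpy as np

# Local generator with an arbitrary seed (an assumption of this sketch)
rng = np.random.default_rng(0)
n_values = 5  # same value as n above

# The upper bound is exclusive, so use n_values + 1 to cover 1, ..., n_values
draws = rng.integers(1, n_values + 1, size=100_000)

# Relative frequency of each observed value
values, counts = np.unique(draws, return_counts=True)
freqs = counts / draws.size

print("values:     ", values)
print("frequencies:", np.round(freqs, 3))
print(f"sample mean     = {draws.mean():.3f} (theory: {(1 + n_values) / 2:.3f})")
print(f"sample variance = {draws.var():.3f} (theory: {(n_values**2 - 1) / 12:.3f})")
```

Each frequency should be close to $1/n$, and the sample mean and variance should be close to $(1+n)/2$ and $(n^2-1)/12$.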

## Binomial

Let us make things a little more complex and examine the *binomial* random variable.  This random variable originates from performing a sequence of $n$ independent experiments, each of which has probability $p$ of succeeding, and asking how many successes we expect to see.

Let us express this mathematically.  Each experiment is an independent random variable $X_i$ where we will use $1$ to encode success, and $0$ to encode failure.  Since each is an independent coin flip which is successful with probability $p$, we can say that $X_i \sim \mathrm{Bernoulli}(p)$.  Then, the binomial random variable is

$$
X = \sum_{i=1}^n X_i.
$$

In this case, we will write

$$
X \sim \mathrm{Binomial}(n, p).
$$

To get the cumulative distribution function, we need to notice that getting exactly $k$ successes can occur in $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ ways each of which has a probability of $p^k(1-p)^{n-k}$ of occurring.  Thus the cumulative distribution function is

$$F(x) = \begin{cases} 0 & x < 0, \\ \displaystyle{\sum_{m=0}^{k} \binom{n}{m} p^m(1-p)^{n-m}}  & k \le x < k+1 \text{ with } 0 \le k < n, \\ 1 & x \ge n . \end{cases}$$

Let us first plot the probability mass function.

In [None]:
n, p = 10, 0.2

# Compute binomial coefficient
def binom(n, k):
    comb = 1
    for i in range(min(k, n - k)):
        comb = comb * (n - i) // (i + 1)
    return comb


pmf = np.array([p ** i * (1 - p) ** (n - i) * binom(n, i) for i in range(n + 1)])

plt.figure(figsize=(5,3))
plt.stem([i for i in range(n + 1)], pmf)
plt.xlabel("x")
plt.ylabel("p.m.f.")
plt.show()

While the probability mass function is not simple, the mean and variance are.  If $X \sim \mathrm{Binomial}(n, p)$, then:

* $\mu_X = np$,
* $\sigma_X^2 = np(1-p)$.

This can be sampled as follows.


In [None]:
np.random.binomial(n, p, size=(10, 10))
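
The definition of the binomial as a sum of independent Bernoulli variables can also be checked by simulation. The following sketch builds binomial samples from Bernoulli trials and compares the empirical p.m.f. with the closed form $\binom{n}{k} p^k (1-p)^{n-k}$. The names `rng`, `n_trials`, `p_success`, and `trials`, the seed, and the number of simulated runs are assumptions of this illustration.

```python
import math
import numpy as np

# Local generator with an arbitrary seed (an assumption of this sketch)
rng = np.random.default_rng(0)
n_trials, p_success = 10, 0.2  # same values as n, p above

# Each row holds n_trials independent Bernoulli(p_success) experiments
trials = rng.random((50_000, n_trials)) < p_success
successes = trials.sum(axis=1)  # one Binomial(n_trials, p_success) draw per row

# Empirical p.m.f. from the simulation
empirical_pmf = np.bincount(successes, minlength=n_trials + 1) / successes.size

# Closed-form p.m.f. and the c.d.f. obtained as its running sum
exact_pmf = np.array([
    math.comb(n_trials, k) * p_success**k * (1 - p_success) ** (n_trials - k)
    for k in range(n_trials + 1)
])
exact_cdf = np.cumsum(exact_pmf)

print("largest p.m.f. gap:", np.abs(empirical_pmf - exact_pmf).max())
print(f"sample mean     = {successes.mean():.3f} (theory: {n_trials * p_success:.3f})")
print(f"sample variance = {successes.var():.3f} (theory: {n_trials * p_success * (1 - p_success):.3f})")
```

The running sum `exact_cdf` gives the values of $F$ at the integers $0, 1, \ldots, n$, and its last entry should equal $1$ up to floating-point error.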

## <a name="2">2. Continuous Distributions</a>
(<a href="#0">Go to top</a>)

## Continuous Uniform

Next, let us discuss the continuous uniform distribution. The idea behind this random variable is that if we increase the $n$ in the discrete uniform distribution, and then scale it to fit within the interval $[a, b]$, we will approach a continuous random variable that just picks an arbitrary value in $[a, b]$ all with equal probability.  We will denote this distribution as

$$
X \sim U(a, b).
$$

The probability density function is 

$$p(x) = \begin{cases} \displaystyle{\frac{1}{b-a}} & x \in [a, b], \\ 0 & x \not\in [a, b].\end{cases}$$

The cumulative distribution function is 

$$F(x) = \begin{cases} 0 & x < a, \\ \displaystyle{\frac{x-a}{b-a}} & x \in [a, b], \\ 1 & x \ge b . \end{cases}$$

Let us first plot the probability density function.

In [None]:
a, b = 1, 3

x = np.arange(0, 4, 0.01)
p = (x > a) * (x < b) / (b - a)

plt.figure(figsize=(5,3))
plt.plot(x, p)
plt.xlabel("x")
plt.ylabel("p.d.f.")
plt.show()

If $X \sim U(a, b)$, then:

* $\mu_X = \displaystyle{\frac{a+b}{2}}$,
* $\sigma_X^2 = \displaystyle{\frac{(b-a)^2}{12}}$.

We can sample an array of arbitrary shape from a uniform random variable as follows.  Note that `np.random.rand` samples from $U(0,1)$, so if we want a different range we need to scale and shift the result.

In [None]:
(b - a) * np.random.rand(10, 10) + a
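
To connect the samples with the formulas above, this sketch compares the empirical c.d.f. at a single point with $(x-a)/(b-a)$, and the sample mean and variance with their theoretical values. The names `rng`, `low`, `high`, and `x0`, the seed, and the sample size are assumptions of this illustration.

```python
import numpy as np

# Local generator with an arbitrary seed (an assumption of this sketch)
rng = np.random.default_rng(0)
low, high = 1, 3  # same values as a, b above

samples = rng.uniform(low, high, size=100_000)

# Fraction of samples at or below x0, compared with (x0 - a) / (b - a)
x0 = 1.5
empirical_cdf = np.mean(samples <= x0)
exact_cdf = (x0 - low) / (high - low)

print(f"F({x0}) empirical = {empirical_cdf:.4f}, exact = {exact_cdf:.4f}")
print(f"sample mean     = {samples.mean():.4f} (theory: {(low + high) / 2:.4f})")
print(f"sample variance = {samples.var():.4f} (theory: {(high - low) ** 2 / 12:.4f})")
```

Here `rng.uniform(low, high)` takes the interval endpoints directly, which is equivalent to scaling and shifting a $U(0,1)$ sample.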

## Gaussian or Normal

We say a random variable $X$ is normally distributed with given mean $\mu$ and variance $\sigma^2$, written $X \sim \mathcal{N}(\mu, \sigma^2)$ if $X$ has density

$$p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\displaystyle{\frac{(x-\mu)^2}{2\sigma^2}}}.$$

Let us first plot the probability density function.

In [None]:
mu, sigma = 0, 1

x = np.arange(-3, 3, 0.01)
p = 1 / np.sqrt(2 * np.pi * sigma ** 2) * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


plt.figure(figsize=(5,3))
plt.plot(x, p)
plt.xlabel("x")
plt.ylabel("p.d.f.")
plt.show()

The Gaussian is what is known as a *maximum entropy distribution*.  We will get into entropy more deeply in the Information Theory section of lecture 5, however all we need to know at this point is that it is a measure of randomness.  In a rigorous mathematical sense, we can think of the Gaussian as the *most* random choice of random variable with fixed mean and variance.  Thus, if we know that our random variable has some mean and variance, the Gaussian is in a sense the most conservative choice of distribution we can make.
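
To make the maximum entropy claim slightly more concrete, the sketch below compares three distributions that share the same variance $\sigma^2$, using their standard closed-form differential entropies (in nats), quoted here without derivation: $\tfrac{1}{2}\ln(2\pi e \sigma^2)$ for the Gaussian, $\ln(\sigma\sqrt{12})$ for a continuous uniform, and $1 + \ln(\sqrt{2}\,\sigma)$ for a Laplace distribution. The Laplace distribution is not covered in this lab and is included only as an extra point of comparison; the variable names are assumptions of this illustration.

```python
import numpy as np

sigma_common = 1.0  # common standard deviation for all three distributions

# Gaussian N(mu, sigma^2): entropy does not depend on the mean
h_gaussian = 0.5 * np.log(2 * np.pi * np.e * sigma_common**2)

# Continuous uniform with variance sigma^2 has width sigma * sqrt(12)
h_uniform = np.log(sigma_common * np.sqrt(12))

# Laplace with variance sigma^2 has scale sigma / sqrt(2)
h_laplace = 1 + np.log(2 * sigma_common / np.sqrt(2))

print(f"Gaussian: {h_gaussian:.4f} nats")
print(f"Laplace:  {h_laplace:.4f} nats")
print(f"Uniform:  {h_uniform:.4f} nats")
```

For equal variance, the Gaussian has the largest entropy of the three, which is consistent with it being the maximum entropy distribution for a fixed mean and variance.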

To close the section, let us recall that if $X \sim \mathcal{N}(\mu, \sigma^2)$, then:

* $\mu_X = \mu$,
* $\sigma_X^2 = \sigma^2$.

We can sample from the Gaussian (or normal) distribution as shown below. With $\mu = 0$ and $\sigma = 1$, as set above, this is the standard normal distribution.


In [None]:
np.random.normal(mu, sigma, size=(10, 10))
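
Note that the second argument of `np.random.normal` is the standard deviation $\sigma$, not the variance $\sigma^2$. Any normal sample can also be obtained by shifting and scaling a standard normal sample, since $\mu + \sigma Z \sim \mathcal{N}(\mu, \sigma^2)$ when $Z \sim \mathcal{N}(0, 1)$. The sketch below illustrates this; the names `rng`, `target_mean`, `target_std`, and `z`, the seed, and the parameter values are assumptions of this illustration.

```python
import numpy as np

# Local generator with an arbitrary seed (an assumption of this sketch)
rng = np.random.default_rng(0)
target_mean, target_std = 2.0, 0.5  # illustrative values, not from the lab

# Standard normal draws, then shift and scale
z = rng.standard_normal(100_000)
x_scaled = target_mean + target_std * z

print(f"sample mean = {x_scaled.mean():.4f} (theory: {target_mean:.4f})")
print(f"sample std  = {x_scaled.std():.4f} (theory: {target_std:.4f})")
print(f"sample var  = {x_scaled.var():.4f} (theory: {target_std**2:.4f})")
```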

### Exercise 1

<div style="align: left; border: 4px solid cornflowerblue; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; width: 65%">
        <img style="float: left; max-width: 80%; max-height:80%; margin: 5px;" src="../../images/MLU_challenge.png" alt="MLU challenge" width=12% height=12%/>
    <span style="padding: 20px; align: left;">
        <p><b>Try it yourself!</b></p>
        <p><b>Exercise 1.</b> Look at the <code>scipy</code> Python package documentation and find the function to generate the standard normal PDF (Probability Density Function) and plot it. Make sure to include the necessary imports.</p>
    </span>
</div>

In [None]:
###### YOUR CODE HERE ######






###### END OF CODE ######

In [None]:
# %load solutions/lab33_ex1_solutions.txt

### Exercise 2

<div style="align: left; border: 4px solid cornflowerblue; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; width: 65%">
        <img style="float: left; max-width: 80%; max-height:80%; margin: 5px;" src="../../images/MLU_challenge.png" alt="MLU challenge" width=12% height=12%/>
    <span style="padding: 20px; align: left;">
        <p><b>Try it yourself!</b></p>
        <p><b>Exercise 2.</b> Look at the <code>scipy</code> Python package documentation and find the function to generate the standard normal CDF (Cumulative Distribution Function) and plot it. You should have already imported the necessary library in Exercise 1.</p>
    </span>
</div>

In [None]:
###### YOUR CODE HERE ######






###### END OF CODE ######

In [None]:
# %load solutions/lab33_ex2_solutions.txt

### Exercise 3

<div style="align: left; border: 4px solid cornflowerblue; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; width: 65%">
        <img style="float: left; max-width: 80%; max-height:80%; margin: 5px;" src="../../images/MLU_challenge.png" alt="MLU challenge" width=12% height=12%/>
    <span style="padding: 20px; align: left;">
        <p><b>Try it yourself!</b></p>
        <p><b>Exercise 3.</b> Look at the <code>scipy</code> Python package documentation and find the function to generate a 10 by 10 array of random samples from the standard normal distribution and print it. You should have already imported the necessary library in Exercise 1.</p>
    </span>
</div>

In [None]:
###### YOUR CODE HERE ######






###### END OF CODE ######

In [None]:
# %load solutions/lab33_ex3_solutions.txt

In these three exercises we have seen how straightforward it is to use the <code>scipy</code> library to generate theoretical probability distributions from their parameters, as well as to draw random samples from a given distribution.

#### Final Project Note
<img src="../../images/MLU_question.png" width=80 height=80 />

Predicting human behaviour exactly is impossible: there is randomness in the data generation, the data collection, and the matrix decomposition choices, and there is also noise layered on top of the ground truth. For instance, if you asked a person to rate a book on 100 different random days, they might not always give the same rating; perhaps personal concerns would nudge the score up or down by a star. How do we encode the variability of the data and of the process into the model? Gaussian noise is one way to account for and incorporate noise into machine learning models.
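
As a purely illustrative sketch of the book-rating example, the code below models the rating given on each of 100 days as a fixed underlying preference plus Gaussian noise, rounded to whole stars and clipped to the 1 to 5 range. The underlying rating, the noise level, the rounding rule, and all variable names are hypothetical choices for this illustration and are not part of the final project specification.

```python
import numpy as np

# Local generator with an arbitrary seed (an assumption of this sketch)
rng = np.random.default_rng(0)

true_rating = 4.0  # hypothetical underlying preference, in stars
noise_std = 0.7    # hypothetical day-to-day variability, in stars

# One noisy rating per day for 100 days
noisy = true_rating + rng.normal(0.0, noise_std, size=100)

# Round to whole stars and keep the result on the 1-5 scale
observed = np.clip(np.rint(noisy), 1, 5).astype(int)

stars, counts = np.unique(observed, return_counts=True)
print("ratings given:", dict(zip(stars.tolist(), counts.tolist())))
print(f"average observed rating = {observed.mean():.2f}")
```

Most of the simulated ratings stay near the underlying preference, with occasional days one star above or below it.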

<div style="display: flex; align-items: center; justify-content: left; background-color:#330066; width:99%;"> 
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="../../images/MLU_robot.png" alt="MLU robot" width="100" height="100"/>
    <span style="color: white; padding-left: 10px; align: left; margin: 15px;">
        <h3>Congratulations!</h3>
        You have completed Lab 3.3: Probability Distributions of Lecture 3: Probability and Statistics Fundamentals of MLU Mathematical Fundamentals for Machine Learning.
        <br/>
    </span>
</div>