## MLOps Lifecycle Toolkit Chapter 2 Lab: Mathematical Statistics Fundamentals
* [Probability Distributions in Python](#probability-distributions)
* [Creating Your Own Mathematical Functions in Python](#mathematical-functions)
* [Coin Toss Experiment](#coin-toss-experiment)
* [Computing the Characteristic Function of a Discrete Probability Distribution in Python](#installing-packages)
* [Recovering the Probability Distribution from a Characteristic Function in Python](#bayesian-example)


## Import some mathematical packages 

In [None]:
import numpy as np
import random
import math

Let's look at the characteristic function of a normal distribution N(5, 1) with mean 5 and unit standard deviation. This is a complex-valued function, φ(t) = exp(iμt − σ²t²/2). Characteristic functions are a fundamental tool in mathematical statistics and are even used in the proof of the Central Limit Theorem.


In [None]:
sigma = 1
mu = 5
N = 100


def char(t):
 """
 Closed-form characteristic function of a normal distribution
 with mean mu and standard deviation sigma, evaluated at t
 """
 val = np.exp((1j)*t*mu-(1/2)*sigma**2 * t**2)
 return val


char(5)

(3.693869103004312e-06-4.932290693320294e-07j)

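Note that a characteristic function is a Fourier transform of the distribution: for a continuous distribution such as the normal it is the Fourier transform of the probability density function (under the sign convention e^{itx}), and for a discrete distribution on the integers it is a discrete-time Fourier transform of the probability mass function. The derivation of the closed form used above is standard in mathematical statistics textbooks, so we take it as given.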
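Let's sanity-check the closed form numerically. The sketch below compares it with a Monte Carlo estimate, the sample mean of exp(itX) over draws from N(5, 1). The helper names `char_normal` and `char_empirical`, the sample size, and the seed are our own choices for illustration, not part of the lab.

```python
import numpy as np

# Same parameters as the cell above
mu, sigma = 5.0, 1.0

def char_normal(t):
    """Closed-form characteristic function of N(mu, sigma^2)."""
    return np.exp(1j * t * mu - 0.5 * sigma**2 * t**2)

def char_empirical(samples, t):
    """Monte Carlo estimate: the sample mean of exp(i*t*X)."""
    return np.mean(np.exp(1j * t * samples))

# Draw samples from N(mu, sigma^2); seed and size are arbitrary choices
rng = np.random.default_rng(0)
samples = rng.normal(loc=mu, scale=sigma, size=200_000)

t = 1.0
exact = char_normal(t)
approx = char_empirical(samples, t)
print("closed form:", exact)
print("empirical:  ", approx)
print("abs. error: ", abs(exact - approx))
```

The two values should agree to roughly two decimal places, since the Monte Carlo error shrinks like one over the square root of the sample size.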

We need to create a function that estimates the probability mass function for a discrete training set X_train. Here X_train will be a NumPy array of values representing the results of repeated coin tosses.

In [None]:



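def estimate_pmf(X_train):
 """
 Estimate the probability mass function for training data X_train.
 Returns the relative frequency of each unique value, in sorted order.
 """
 _, freq = np.unique(X_train, return_counts=True)
 # len() works for both lists and NumPy arrays
 estimate = freq / len(X_train)
 return estimate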
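To see what this estimate looks like, here is a small sketch on a made-up five-toss sample. The `toy` array is purely illustrative, and the function is repeated so the snippet runs on its own.

```python
import numpy as np

def estimate_pmf(X_train):
    """Relative frequency of each unique value, in sorted order."""
    _, freq = np.unique(X_train, return_counts=True)
    return freq / len(X_train)

# Made-up sample: 2 heads and 3 tails
toy = np.array(['H', 'T', 'T', 'H', 'T'])

# np.unique returns the values sorted, so 'H' comes before 'T'
values = np.unique(toy)
pmf = estimate_pmf(toy)

result = {str(v): float(p) for v, p in zip(values, pmf)}
print(result)  # {'H': 0.4, 'T': 0.6}
```

Because `np.unique` sorts its output, index 0 of the estimate corresponds to 'H' and index 1 to 'T'.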


## Coin Toss Simulation 

Let's generate our training set. We simulate N = 10,000 random flips of a fair coin. We encode each flip as 'H' for heads and 'T' for tails and store the results in a NumPy array called X_train.

In [None]:
#set random seed for reproducibility
random.seed(123)

#simulate 10,000 trials
N = 10000

def toss_simulation():
 """
 Simulate coin toss for a fair coin
 """
 return random.choice(['H', 'T'])


# training data size = N
X_train = np.array([toss_simulation() for _ in np.arange(N)])
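How far from 0.5 should we expect the observed frequency of heads in X_train to fall? A rough sketch, assuming independent tosses of a fair coin and the usual normal approximation to the binomial (the 1.96 multiplier for an approximate 95% range is our assumption here, not something the lab specifies):

```python
import math

N = 10_000      # number of simulated tosses, as above
p_true = 0.5    # fair coin

# Standard error of a sample proportion: sqrt(p * (1 - p) / N)
std_err = math.sqrt(p_true * (1 - p_true) / N)

# Approximate 95% range for the observed frequency of heads
low = p_true - 1.96 * std_err
high = p_true + 1.96 * std_err

print(f"standard error: {std_err:.4f}")
print(f"approximate 95% range: [{low:.4f}, {high:.4f}]")
```

With N = 10,000 the standard error is 0.005, so the observed frequency of heads will usually land within about one percentage point of 0.5.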


We also need a function that builds the characteristic function from the estimated PMF. In Python, a function can return another function!

In [None]:
def compute_char(vector):
  """
  This returns the characteristic
  function of a pmf estimated
  from the samples in vector.
  """
  # estimate the pmf once, not on every call to char
  pmf = estimate_pmf(vector)
  l = len(pmf)
  def char(w):
      """
      Compute characteristic function at point w given PMF
      """
      # phi(w) = sum_k p_k * exp(i*w*k), with outcomes encoded as k = 0, 1, ..., l-1
      phi = [np.exp((1j)*w*k) for k in range(l)]
      return np.dot(phi, pmf)
  return char

char = compute_char(X_train)

Can we use it to recover the original Bernoulli probability distribution for our coin toss from the characteristic function? Yes, we need to use the inverse discrete Fourier transform.

In [None]:
def proba(x, char, N):
 """
 x (int): value of random variable rv in range 0, 1, 2, ..., N-1
 char (func): estimate of the characteristic function for the random variable
 N (int): total number of discrete outcomes in probability mass function
 returns: reconstructed probability of rv at x
 """
 if 0 <= x < N:
   # inverse DFT: p_x = (1/N) * sum_n phi(2*pi*n/N) * exp(-i*2*pi*x*n/N)
   probability = 1/N * np.sum(
      np.array(
      [char(2*n*math.pi/N) * np.exp((-1j)*2*math.pi*x*n/N) for n in np.arange(N)]
      )
   )
   # probability should be a real number between 0 and 1
   if probability.real > 1:
     raise ValueError("Probability estimate out of bounds.")
   return probability.real
 else:
   raise ValueError("Point x is not in support of random variable.")

Finally, let's print out the probability of heads and tails for our coin toss based on the recovered probability distribution to see if our experiment worked. Because np.unique sorts its values, index 0 corresponds to 'H' and index 1 to 'T'.

In [None]:
n_outcomes = 2 # heads or tails

p, q = [proba(x, char, n_outcomes) for x in range(n_outcomes)]

print(f"The probability of heads is {p:.4f} and the probability of tails is {q:.4f}")

The probability of heads is 0.4993 and the probability of tails is 0.5007


The recovered probability of heads is 0.4993, which is very close to 0.5. It also equals the relative frequency of heads in X_train, as it should: the inverse transform undoes the transform exactly, up to floating-point error. Our experiment worked!
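The inversion step can also be checked on a distribution whose PMF is known exactly. The sketch below assumes outcomes encoded as k = 0, 1, ..., N-1 and the characteristic function φ(w) = Σ p_k exp(iwk). The helpers `char_from_pmf` and `recover_pmf` and the three-outcome PMF are hypothetical, made up for illustration.

```python
import numpy as np

def char_from_pmf(pmf):
    """Return phi(w) = sum_k pmf[k] * exp(i*w*k) for outcomes k = 0..len(pmf)-1."""
    pmf = np.asarray(pmf, dtype=float)
    k = np.arange(len(pmf))
    return lambda w: np.sum(pmf * np.exp(1j * w * k))

def recover_pmf(char, n_outcomes):
    """Invert phi on the grid w_n = 2*pi*n/N with an inverse DFT."""
    n = np.arange(n_outcomes)
    samples = np.array([char(2 * np.pi * m / n_outcomes) for m in n])
    # p_x = (1/N) * sum_n phi(w_n) * exp(-i*2*pi*x*n/N)
    return np.array([
        np.mean(samples * np.exp(-2j * np.pi * x * n / n_outcomes)).real
        for x in n
    ])

# Made-up three-outcome distribution
known_pmf = [0.2, 0.5, 0.3]
recovered = recover_pmf(char_from_pmf(known_pmf), len(known_pmf))
print(np.round(recovered, 6))
```

Sampling the characteristic function at the N frequencies 2πn/N is enough to recover all N probabilities, which is why only two evaluations are needed for a coin toss.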