# TOC 

[Fundamentals](#distributions-fundamentals)<br>
<br>

[Random Variables](#random-variables)<br>
<br>

[Distribution Definition](#defining-distributions)<br>
<br>

[Types Of Distribution](#types-of-probability-distributions-overview)<br>
<br>

[Discrete Distribution Overview](#discrete-distribution-overview)<br>
<br>

[Continuous Distribution](#continuous-distributions)<br>
<br>

[Discrete]()<br>
<br>

# Distributions Fundamentals

## Random Variables 
A Random Variable, in its simplest form, is a function. In probability we often use random variables to represent random events: a Random Variable converts the result of a real-world experiment into a numerical value that we can perform some mathematics on. For example, we could use a random variable to represent the outcome of a die roll, which takes one of the values 1 through 6. <br>

* Random Variables can be either Discrete or Continuous
 * Discrete Random Variables take countable values, such as the outcome of a die roll (1, 2, 3, 4, 5, 6)
 * Continuous Random Variables take uncountably many values, such as the height of a person (5.6 feet, 5.67 feet, 5.678 feet, etc.)


In [3]:
import numpy as np
from IPython.display import display, Math

die_6 = range(1, 7)

# Sample with replacement so both dice can show the same face
rolls = np.random.choice(die_6, size=2, replace=True)
ans = str(rolls[0]) + ', ' + str(rolls[1])

disp = '\\text{Dice Roll: }%s'

display(Math(disp%(ans)))


### Random Variable Example

Say you are sitting on a park bench, enjoying the nice weather as you people watch. You decide to count how many people walk by in a five minute period. You note the hair color and height of each person. One observation would be Person1 has brown hair and his height is 165 cm.<br>
<br>
After you complete this experiment you ask yourself a few questions:<br>

1. How many people walked past?<br>
&emsp;10 people passed by. This observed result is one element of the sample space (the set of all possible counts), and it is referred to as the <b>outcome</b>:<br>
&emsp;&emsp;$\text{Outcome }\Rightarrow \Omega_1 = 10~people$<br>
<br>
&emsp;Since this is an outcome of the experiment, you want to make a <b>random variable:</b><br>
&emsp;&emsp;$\text{Random Variable }\Rightarrow X_1(\Omega_1) = 10$<br>
<br>

2. What was the average height of the people who walked past?<br>
&emsp;&emsp;$\text{Outcome }\Rightarrow \Omega_2 = 165.32~cm$<br>
&emsp;&emsp;$\text{Random Variable }\Rightarrow X_2(\Omega_2) = 165.32$<br>
<br>

<br>
<br>

### Defining Distributions

In short, a distribution describes the possible values a variable can take and how frequently they occur.<br>
<br>
<b>ℹ️ Notations:</b><br>

* $Y \rightarrow \text{The actual outcome of an event}$
* $y \rightarrow \text{One of the possible outcomes}$
* $P(Y = y) \rightarrow \text{The probability that Y takes the value y, also written } p(y)$
 
 * $Y \rightarrow \text{The number of balls ⚫ we draw out of a bag}$
 
 * $y = 3 \rightarrow \text{One specific value that Y can take (⚫ ⚫ ⚫)}$
 
 * $P(Y = 3) \rightarrow \text{The probability of getting exactly 3 balls, which can also be expressed as p(3)} $
 
  * $\text{p(y) is referred to as the } \textbf{Probability Function}\text{, and p(3) is its value at } y = 3$
<br>
<br>

<b>Probability Frequency Distribution</b><br>
Also referred to simply as <b>Probabilities</b>, this is the measure of the likelihood of each outcome, based on how often it appears in the Sample Space $\Omega$.<br>
<br>

The Frequency Distribution Table below details the roll of 2 dice. The Sum column is the sum of the two dice, from 2 (1 + 1) to 12 (6 + 6). Each probability is equal to the Frequency divided by the size of the Sample Space $\Omega$, which holds $6 \times 6 = 36$ equally likely rolls. The highest frequency belongs to the sum of 7: there are 6 different ways to roll a 7, resulting in a probability of $\dfrac{6}{36}$ or $\dfrac{1}{6}$.<br>
<br>
This is the typical way to construct probabilities when we have a <b>Finite</b> number of elements.<br>
<br>


But what do you do with Infinite Sample Spaces?<br>
It would be impossible to record the frequency for each element. The data above is <b>Discrete</b>. However, when dealing with the amount of rain or a population's height, the values are <b>Continuous</b>.<br>



In [None]:

info = {'Sum' : [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
        'Frequency' : [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1],
        'Probability': ['$\\dfrac{1}{36}$', '$\\dfrac{1}{18}$', '$\\dfrac{1}{12}$', '$\\dfrac{1}{9}$',  '$\\dfrac{5}{36}$', '$\\dfrac{1}{6}$',
                        '$\\dfrac{5}{36}$', '$\\dfrac{1}{9}$', '$\\dfrac{1}{12}$', '$\\dfrac{1}{18}$', '$\\dfrac{1}{36}$']
        }

from tabulate import tabulate 
print(tabulate(info, headers = 'keys', tablefmt = 'pipe', numalign='center', stralign='center'))



| Sum | Frequency |  Probability  |
|:-----:|:-----------:|:---------------:|
|  2  |   1   | $\dfrac{1}{36}$ |
|  3  |   2   | $\dfrac{1}{18}$ |
|  4  |   3   | $\dfrac{1}{12}$ |
|  5  |   4   | $\dfrac{1}{9}$ |
|  6  |   5   | $\dfrac{5}{36}$ |
|  7  |   6   | $\dfrac{1}{6}$ |
|  8  |   5   | $\dfrac{5}{36}$ |
|  9  |   4   | $\dfrac{1}{9}$ |
| 10  |   3   | $\dfrac{1}{12}$ |
| 11  |   2   | $\dfrac{1}{18}$ |
| 12  |   1   | $\dfrac{1}{36}$ |
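
The frequencies in the table do not have to be typed in by hand. As a minimal sketch (standard library only, with variable names of my own choosing), we can enumerate all 36 rolls, count how often each sum appears, and divide by the size of the sample space:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (die1, die2) pairs and count each sum.
faces = range(1, 7)
frequency = Counter(a + b for a, b in product(faces, repeat=2))
sample_space_size = sum(frequency.values())  # 6 * 6 = 36

# Probability = frequency / size of the sample space, kept as an exact fraction.
probability = {total: Fraction(count, sample_space_size)
               for total, count in sorted(frequency.items())}

for total, prob in probability.items():
    print(f"sum={total:>2}  frequency={frequency[total]}  probability={prob}")
```

`Fraction` reduces automatically, so the sum of 7 prints as 1/6 rather than 6/36.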

<br><br>

### Distribution Example

It helps me to start with an example when I am trying to grasp something that is not initially clear to me. So that is what I will do here with a Distribution<br>
<br>
Let's start with the random variable X, which will represent the number of heads in 3 tosses of a fair coin. The possible outcomes of three tosses are:<br>
<br>
<b>H = heads<br>
T = tails<br></b>

* HHH
* HHT
* HTH
* THH
* TTH
* THT
* HTT
* TTT

<b>These 8 outcomes make up the sample space on which the Random Variable X is defined</b><br>

<b>The Possible Outcomes Of X:</b><br>
There are FOUR possible values for X:<br>

* you can flip THREE times and get ZERO Heads (TTT)
* you can flip THREE times and get ONE Heads (TTH, THT, HTT)
* you can flip THREE times and get TWO Heads (HHT, HTH, THH)
* you can flip THREE times and get THREE Heads (HHH)

<b>The Probabilities of the Possible Outcomes Of X:</b><br>

* Probability of getting 0 Heads (TTT): $P(X = 0) = \dfrac{1}{8} = 0.125$
* Probability of getting 1 Head (TTH, THT, HTT): $P(X = 1) = \dfrac{3}{8} = 0.375$
* Probability of getting 2 Heads (HHT, HTH, THH): $P(X = 2) = \dfrac{3}{8} = 0.375$
* Probability of getting 3 Heads (HHH): $P(X = 3) = \dfrac{1}{8} = 0.125$
<br>
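
The same counting can be sketched in a few lines of Python. This is only an illustration under the assumptions above (a fair coin, so all 8 sequences are equally likely); the function name `X` is my own choice to mirror the notation:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The sample space: every sequence of three tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))


def X(outcome):
    """The random variable: map an outcome to its number of heads."""
    return outcome.count("H")


# Count how many outcomes map to each value of X, then divide by 8.
counts = Counter(X(o) for o in outcomes)
pmf = {k: Fraction(counts[k], len(outcomes)) for k in sorted(counts)}

for k, p in pmf.items():
    print(f"P(X = {k}) = {p} = {float(p)}")
```

Note how the random variable really is just a function: it takes an outcome such as `('H', 'T', 'H')` and returns a number.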
<b>Show The Distribution Of X</b><br>




In [9]:

import numpy as np

from bokeh.io import curdoc, show, output_notebook
from bokeh.models import ColumnDataSource, Grid, LinearAxis, Plot, VBar, TeX

output_notebook(hide_banner=True)
curdoc().theme = 'dark_minimal'

x = list(range(0, 4))
y = [1, 3, 3, 1]

source = ColumnDataSource(dict(x=x,top=y,))

plot = Plot(
    title="Discrete Distribution\nFor Random Variable X", width=500, height=400,
    min_border=0, toolbar_location=None)

glyph = VBar(x="x", top="top", bottom=0, width=1, fill_color="dodgerblue")
plot.add_glyph(source, glyph)

xaxis = LinearAxis()
plot.add_layout(xaxis, 'below')

yaxis = LinearAxis()
plot.add_layout(yaxis, 'left')

plot.add_layout(Grid(dimension=0, ticker=xaxis.ticker))
plot.add_layout(Grid(dimension=1, ticker=yaxis.ticker))

# Configure y-axis ticks and labels
from bokeh.models import FixedTicker
plot.yaxis.ticker = FixedTicker(ticks=[0, 1, 2, 3])
plot.yaxis.major_label_overrides = {
    0: TeX(r"0"),
    1: TeX(r"\dfrac{1}{8}"),
    2: TeX(r"\dfrac{1}{4}"),
    3: TeX(r"\dfrac{3}{8}"),
}
# Add minor ticks for better visibility
plot.yaxis.minor_tick_line_color = "gray"
plot.yaxis.minor_tick_line_alpha = 0.5

plot.yaxis.axis_label = 'Probabilities'
plot.xaxis.axis_label = 'Number of Heads'

curdoc().add_root(plot)

show(plot)

In [7]:

from IPython.display import display, Math
import scipy.stats as  stats 

x = 1 # The value of interest (number of heads)
n = 3 # The number of trials 
p = 0.5 # The probability of success 

ans = round(stats.binom.pmf(x, n, p), 4)

# Singular only when the count is exactly 1, so that 0 reads as "0 times"
times = (lambda x: 'time' if x == 1 else 'times')
tosses = (lambda n: 'toss' if n == 1 else 'tosses')


msg = '\\text{What is the probability of getting heads %s %s in %s %s ? %s}'

display(Math(msg%(x, times(x), n, tosses(n), ans)))




### C++ Implementation

Below is a C++ implementation of the binomial probability mass function that provides the same functionality as the SciPy version above, but with potential performance benefits for computationally intensive applications.

**Key Features:**
- Uses the `tgamma` function instead of integer factorials, which avoids integer overflow (note that `std::tgamma` itself overflows a `double` once its argument exceeds roughly 171)
- Checks for invalid parameters and throws `std::invalid_argument`
- Intended for integration with Python (the wrapper later in this section uses ctypes)
- Aims for both accuracy and performance

#### C++ Header File (binomial_stats.hpp)

```cpp
#ifndef BINOMIAL_STATS_HPP
#define BINOMIAL_STATS_HPP

#include <cmath>
#include <stdexcept>
#include <iomanip>
#include <sstream>
#include <string>

namespace BinomialStats {

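/**
 * Calculate binomial coefficient C(n, k) using the gamma function
 * Avoids the integer overflow of a naive factorial; std::tgamma itself
 * overflows a double once its argument exceeds roughly 171
 * (inline because the definition lives in a header)
 */
inline double binomial_coefficient(int n, int k) {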
  if (k < 0 || k > n) return 0.0;
  if (k == 0 || k == n) return 1.0;
  
  // Use gamma function: C(n,k) = Γ(n+1) / (Γ(k+1) * Γ(n-k+1))
  return std::tgamma(n + 1) / (std::tgamma(k + 1) * std::tgamma(n - k + 1));
}

/**
 * Calculate binomial probability mass function
 * P(X = k) = C(n,k) * p^k * (1-p)^(n-k)
 */
inline double binomial_pmf(int k, int n, double p) {
  // Input validation
  if (n < 0) {
    throw std::invalid_argument("Number of trials (n) must be non-negative");
  }
  if (p < 0.0 || p > 1.0) {
    throw std::invalid_argument("Probability (p) must be between 0 and 1");
  }
  if (k < 0 || k > n) {
    return 0.0; // Outside valid range
  }
  
  // Handle edge cases
  if (n == 0) return (k == 0) ? 1.0 : 0.0;
  if (p == 0.0) return (k == 0) ? 1.0 : 0.0;
  if (p == 1.0) return (k == n) ? 1.0 : 0.0;
  
  // Calculate PMF
  double coeff = binomial_coefficient(n, k);
  double prob_success = std::pow(p, k);
  double prob_failure = std::pow(1.0 - p, n - k);
  
  return coeff * prob_success * prob_failure;
}

/**
 * Format probability question with proper grammar
 */
inline std::string format_probability_question(int k, int n, double p, double result, int precision = 4) {
  std::ostringstream oss;
  
  // Handle singular/plural forms
  std::string times_word = (k == 1) ? "time" : "times";
  std::string tosses_word = (n == 1) ? "toss" : "tosses";
  
  oss << "What is the probability of getting heads " 
    << k << " " << times_word 
    << " in " << n << " " << tosses_word 
    << " ? " << std::fixed << std::setprecision(precision) << result;
  
  return oss.str();
}

} // namespace BinomialStats

#endif // BINOMIAL_STATS_HPP
```

#### C++ Implementation File (binomial_stats.cpp)

```cpp
#include "binomial_stats.hpp"
#include <iostream>
#include <vector>

using namespace BinomialStats;

/**
 * Main function demonstrating the C++ binomial distribution implementation
 * This creates the same calculations as the Python scipy.stats.binom.pmf examples above
 */
int main() {
  std::cout << "C++ Binomial Distribution Examples\n";
  std::cout << "==================================\n\n";
  
  // Example 1: Single probability calculation
  int n1 = 10, k1 = 3;
  double p1 = 0.5;
  double result1 = binomial_pmf(k1, n1, p1);
  
  std::cout << format_probability_question(k1, n1, p1, result1) << "\n\n";
  
  // Example 2: Multiple probability calculations
  std::vector<int> k_values = {0, 1, 2, 3, 4, 5};
  int n2 = 5;
  double p2 = 0.5;
  
  std::cout << "Probability distribution for " << n2 << " coin tosses:\n";
  std::cout << "k\tP(X = k)\n";
  std::cout << "---------------\n";
  
  for (int k : k_values) {
    double prob = binomial_pmf(k, n2, p2);
    std::cout << k << "\t" << std::fixed << std::setprecision(4) << prob << "\n";
  }
  
  // Example 3: Verification against known values
  std::cout << "\nVerification (should match Python scipy.stats.binom.pmf results):\n";
  std::cout << "P(X=2|n=5,p=0.5) = " << binomial_pmf(2, 5, 0.5) << " (expected: 0.3125)\n";
  std::cout << "P(X=3|n=10,p=0.5) = " << binomial_pmf(3, 10, 0.5) << " (expected: 0.1172)\n";
  
  return 0;
}
```

#### Makefile for Compilation

```makefile
# Makefile for Binomial Statistics C++ Implementation
CXX = g++
CXXFLAGS = -std=c++17 -Wall -Wextra -O2
TARGET = binomial_stats
SOURCES = binomial_stats.cpp
HEADERS = binomial_stats.hpp

# Default target
all: $(TARGET)

# Build the executable
$(TARGET): $(SOURCES) $(HEADERS)
	$(CXX) $(CXXFLAGS) -o $(TARGET) $(SOURCES)

# Clean build artifacts
clean:
	rm -f $(TARGET)

# Run the program
run: $(TARGET)
	./$(TARGET)

# Debug build with symbols
debug: CXXFLAGS += -g -DDEBUG
debug: $(TARGET)

# Install dependencies (if needed)
install-deps:
	# No external dependencies required for this implementation

.PHONY: all clean run debug install-deps
```

#### Compilation and Usage Instructions

To compile and run the C++ implementation:

```bash
# Compile the program
make

# Run the executable
make run

# Or run directly
./binomial_stats

# Clean build artifacts
make clean

# Debug build
make debug
```

#### Performance Comparison

The C++ implementation offers several potential advantages over the Python/SciPy version:

1. **Speed**: Compiled C++ can be substantially faster than an equivalent pure-Python loop; the actual factor depends on the workload and should be measured
2. **Memory**: Lower memory footprint, no NumPy array overhead
3. **Precision**: Direct control over numerical precision and error handling
4. **Independence**: No external library dependencies beyond the standard C++ library

For educational purposes, the Python implementation with SciPy is more readable and interactive, while the C++ version is better for production applications requiring high performance statistical computations.

#### Python Wrapper Using ctypes

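The Python wrapper provides a user-friendly interface to the compiled C++ library, making the C++ functions accessible through familiar Python syntax. Note that ctypes looks symbols up by their unmangled C names, so the wrapper assumes the code has been built as a shared library (for example `libbinomial_stats.so`) that exports `binomial_coefficient` and `binomial_pmf` with `extern "C"` linkage. The header and Makefile shown above build a standalone executable with namespaced (name-mangled) functions, so they need that extra step before the wrapper can load them.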

In [None]:
"""
binomial_wrapper.py - Python wrapper for C++ binomial statistics library

This module provides a Pythonic interface to the high-performance C++ 
binomial distribution implementation.
"""

import ctypes
import os
from pathlib import Path
from typing import Union, List
import numpy as np


class BinomialStats:
    """
    Python wrapper for C++ binomial statistics functions.
    
    This class loads the compiled C++ shared library and provides
    Python-friendly interfaces to the optimized C++ functions.
    """
    
    def __init__(self, lib_path: Union[str, Path] = None):
        """
        Initialize the wrapper and load the C++ library.
        
        Args:
            lib_path: Path to the compiled shared library (.so on Linux, .dylib on Mac)
                     If None, searches in common locations
        """
        if lib_path is None:
            # Search for the library in common locations
            search_paths = [
                './libbinomial_stats.so',
                './libbinomial_stats.dylib',
                '../lib/libbinomial_stats.so',
                '/usr/local/lib/libbinomial_stats.so',
            ]
            for path in search_paths:
                if os.path.exists(path):
                    lib_path = path
                    break
            
            if lib_path is None:
                raise FileNotFoundError(
                    "Could not find binomial_stats library. "
                    "Please compile the C++ code and specify lib_path."
                )
        
        # Load the shared library
        self._lib = ctypes.CDLL(str(lib_path))
        
        # Configure function signatures
        # double binomial_coefficient(int n, int k)
        self._lib.binomial_coefficient.argtypes = [ctypes.c_int, ctypes.c_int]
        self._lib.binomial_coefficient.restype = ctypes.c_double
        
        # double binomial_pmf(int k, int n, double p)
        self._lib.binomial_pmf.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_double]
        self._lib.binomial_pmf.restype = ctypes.c_double
    
    def binomial_coefficient(self, n: int, k: int) -> float:
        """
        Calculate binomial coefficient C(n, k) = n! / (k! * (n-k)!)
        
        Args:
            n: Total number of items
            k: Number of items to choose
            
        Returns:
            The binomial coefficient as a float
            
        Example:
            >>> bs = BinomialStats()
            >>> bs.binomial_coefficient(10, 3)
            120.0
        """
        return self._lib.binomial_coefficient(n, k)
    
    def pmf(self, k: Union[int, List[int], np.ndarray], 
            n: int, p: float) -> Union[float, np.ndarray]:
        """
        Calculate binomial probability mass function.
        
        Computes P(X = k) for a binomial distribution with parameters n and p.
        This matches the behavior of scipy.stats.binom.pmf(k, n, p).
        
        Args:
            k: Number of successes (can be int, list, or array)
            n: Number of trials
            p: Probability of success on each trial
            
        Returns:
            Probability (float if k is int, array if k is list/array)
            
        Examples:
            >>> bs = BinomialStats()
            >>> # Single value
            >>> bs.pmf(3, 10, 0.5)
            0.1171875
            
            >>> # Multiple values
            >>> bs.pmf([0, 1, 2, 3], 5, 0.5)
            array([0.03125, 0.15625, 0.3125 , 0.3125 ])
        """
        # Handle scalar input
        # Handle scalar input (plain Python ints and NumPy integer scalars)
        if isinstance(k, (int, np.integer)):
            return self._lib.binomial_pmf(int(k), n, p)
        
        # Handle list/array input
        k_array = np.asarray(k, dtype=int)
        result = np.zeros_like(k_array, dtype=float)
        
        for i, k_val in enumerate(k_array.flat):
            result.flat[i] = self._lib.binomial_pmf(int(k_val), n, p)
        
        return result
    
    def pmf_range(self, k_min: int, k_max: int, n: int, p: float) -> float:
        """
        Calculate probability for a range of values: P(k_min <= X <= k_max)
        
        This efficiently computes the sum of PMF values over the range.
        
        Args:
            k_min: Minimum number of successes (inclusive)
            k_max: Maximum number of successes (inclusive)
            n: Number of trials
            p: Probability of success on each trial
            
        Returns:
            Total probability over the range
            
        Example:
            >>> bs = BinomialStats()
            >>> # Probability of 2-4 heads in 10 coin flips
            >>> bs.pmf_range(2, 4, 10, 0.5)
            0.3662109375
        """
        total_prob = 0.0
        for k in range(k_min, k_max + 1):
            total_prob += self._lib.binomial_pmf(k, n, p)
        return total_prob
    
    def format_question(self, k: int, n: int, p: float, precision: int = 4) -> str:
        """
        Format a probability question with proper grammar.
        
        Args:
            k: Number of successes
            n: Number of trials
            p: Probability of success
            precision: Number of decimal places for the result
            
        Returns:
            Formatted question string with answer
            
        Example:
            >>> bs = BinomialStats()
            >>> bs.format_question(3, 10, 0.5)
            'What is the probability of getting heads 3 times in 10 tosses? 0.1172'
        """
        result = self.pmf(k, n, p)
        times_word = "time" if k == 1 else "times"
        tosses_word = "toss" if n == 1 else "tosses"
        
        return (f"What is the probability of getting heads {k} {times_word} "
                f"in {n} {tosses_word}? {result:.{precision}f}")


# Convenience function for quick usage
def binom_pmf(k: Union[int, List[int], np.ndarray], 
              n: int, p: float) -> Union[float, np.ndarray]:
    """
    Convenience function matching scipy.stats.binom.pmf interface.
    
    Args:
        k: Number of successes
        n: Number of trials
        p: Probability of success
        
    Returns:
        Probability mass function value(s)
        
    Example:
        >>> binom_pmf(3, 10, 0.5)
        0.1171875
    """
    bs = BinomialStats()
    return bs.pmf(k, n, p)

#### Usage Examples

Here's how to use the Python wrapper to access the C++ binomial statistics functions:

In [None]:
# Example 1: Single probability calculation
bs = BinomialStats()

# What's the probability of getting exactly 3 heads in 10 coin tosses?
prob = bs.pmf(3, 10, 0.5)
print(f"P(X=3|n=10,p=0.5) = {prob:.4f}")

# Example 2: Multiple probabilities at once (like scipy.stats.binom.pmf)
k_values = [0, 1, 2, 3, 4, 5]
probabilities = bs.pmf(k_values, 5, 0.5)

print("\nProbability distribution for 5 coin tosses:")
print("k\tP(X=k)")
print("-" * 20)
for k, p in zip(k_values, probabilities):
    print(f"{k}\t{p:.4f}")

# Example 3: Range probability (sum of PMF values)
# Probability of getting between 2 and 4 heads in 10 tosses
range_prob = bs.pmf_range(2, 4, 10, 0.5)
print(f"\nP(2 <= X <= 4|n=10,p=0.5) = {range_prob:.4f}")

# Example 4: Using the convenience function (scipy-like interface)
print("\nUsing convenience function:")
print(f"binom_pmf(3, 10, 0.5) = {binom_pmf(3, 10, 0.5):.4f}")

# Example 5: Formatted question output
question = bs.format_question(3, 10, 0.5)
print(f"\n{question}")

#### Performance Comparison: C++ vs SciPy

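Let's compare the performance of our C++ implementation (via the Python wrapper) against SciPy. Keep in mind that `scipy.stats.binom.pmf` is itself backed by compiled, vectorized code, while the wrapper calls into C++ once per element from a Python loop, so the C++ path is not guaranteed to come out ahead here: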

In [None]:
import time
import numpy as np
from scipy.stats import binom

# Initialize our C++ wrapper
bs = BinomialStats()

# Test parameters
n, p = 100, 0.5
k_values = np.arange(0, n+1)
iterations = 1000

print("Performance Comparison: C++ vs SciPy")
print("=" * 50)
print(f"Calculating PMF for {len(k_values)} values, {iterations} iterations\n")

# Benchmark C++ implementation
start = time.perf_counter()
for _ in range(iterations):
    cpp_result = bs.pmf(k_values, n, p)
cpp_time = time.perf_counter() - start

# Benchmark SciPy implementation
start = time.perf_counter()
for _ in range(iterations):
    scipy_result = binom.pmf(k_values, n, p)
scipy_time = time.perf_counter() - start

# Verify results match
max_diff = np.max(np.abs(cpp_result - scipy_result))

print(f"C++ Implementation:   {cpp_time:.4f} seconds")
print(f"SciPy Implementation: {scipy_time:.4f} seconds")
print(f"SciPy / C++ time:     {scipy_time/cpp_time:.2f}x (above 1 means C++ was faster)")
print(f"Max difference:       {max_diff:.2e} (numerical precision)")

# Verify correctness
print("\nSample Results Comparison:")
print("k\tC++ PMF\t\tSciPy PMF\tMatch")
print("-" * 60)
for k in [0, 25, 50, 75, 100]:
    cpp_val = bs.pmf(k, n, p)
    scipy_val = binom.pmf(k, n, p)
    match = "✓" if np.isclose(cpp_val, scipy_val) else "✗"
    print(f"{k}\t{cpp_val:.6f}\t{scipy_val:.6f}\t{match}")

#### Key Features of the Python Wrapper

The Python wrapper provides several advantages for building a hybrid C++/Python statistical computing library:

1. **Pythonic Interface**: Mirrors SciPy's call signature (`binom.pmf(k, n, p)`) for easy adoption
2. **Type Flexibility**: Accepts int, list, or NumPy arrays for the `k` parameter
3. **Automatic Library Loading**: Searches common installation locations for the compiled C++ library
4. **Error Handling**: Raises a clear `FileNotFoundError` when the compiled library cannot be found; parameter validation itself lives on the C++ side
5. **Array Input**: Processes arrays of k values element by element through the C++ function
6. **Educational Features**: Includes `format_question()` for generating readable probability statements
7. **Performance**: Runs the numerical work in compiled C++ while keeping a Python interface

#### Integration with pip Package

To create a distributable pip package, you would structure your project like this:

```
binomial_stats/
├── setup.py
├── pyproject.toml
├── README.md
├── binomial_stats/
│  ├── __init__.py
│  ├── wrapper.py (the Python code above)
│  └── lib/
│    ├── binomial_stats.hpp
│    ├── binomial_stats.cpp
│    └── Makefile
├── tests/
│  └── test_binomial.py
└── docs/
  └── examples.ipynb
```

The `setup.py` would use `setuptools` with build extensions to compile the C++ code during installation, ensuring users get optimized binaries for their platform.


<b>Distributions Characteristics</b><br>
<br>

* Mean
 * The average value of a Sample Space or Set
 * Denoted with mu $\mu$
 
* Variance
 * How spread out the data is 
 * The spread is measured in terms of how far away from the mean the data is: the more dispersed the data, the higher the variance
 * Denoted with sigma squared $\sigma^2$
 * Population Variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$
 * Sample Variance: $s^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$

* Standard Deviation
 * The positive square root of the variance $\rightarrow \sqrt{\sigma^2}$ 
 * One Standard Deviation around the mean spans the interval from $\mu - \sigma$ to $\mu + \sigma$
  * The more concentrated the data is within this interval, the larger the share of the data that falls inside it
  * The less concentrated the data is within this interval, the smaller the share of the data that falls inside it
 * Population Standard Deviation is denoted as $\sigma$
  * Formula: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$
 * Sample Standard Deviation is denoted as $s$
  * Formula: $s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$
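
To make the population versus sample formulas concrete, here is a small sketch using Python's built-in `statistics` module. The data set is made up for illustration; the point is only that the `p`-prefixed functions divide by $N$ while the others divide by $n - 1$:

```python
import statistics

# Hypothetical data, chosen so the numbers come out clean (mean is 5).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.fmean(data)

# Treat the data as the whole population: divide by N.
pop_var = statistics.pvariance(data)
pop_std = statistics.pstdev(data)

# Treat the data as a sample: divide by n - 1 (Bessel's correction).
sample_var = statistics.variance(data)
sample_std = statistics.stdev(data)

print(f"mean                = {mu}")
print(f"population variance = {pop_var},  std = {pop_std}")
print(f"sample variance     = {sample_var:.4f},  std = {sample_std:.4f}")
```

The squared deviations sum to 32, so the population variance is 32 / 8 and the sample variance is 32 / 7; the sample version is always the larger of the two.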


<b>ℹ️ NOTE:</b><br>
Variance is measured in squared units. For example, if X represents time in seconds, the variance would be measured in $seconds^2$, which makes the analysis of the data more difficult. Standard Deviation is typically used in place of variance because it is measured in the same units as the mean.<br>
<br>
<b>ℹ️ NOTE:</b><br>
Sample variance and sample standard deviation use $n-1$ in the denominator (Bessel's correction) instead of $n$ to provide an unbiased estimator of the population variance. This correction accounts for the fact that we're using the sample mean $\bar{x}$ instead of the true population mean $\mu$.<br>
<br>

<b>Population vs Sample</b><br>
<br>

* Population Data
 * All the data
  * population mean $\mu$
  * population variance $\sigma^2$
  * population standard deviation $\sigma$
  
 * Sample Data 
  * sample mean $\bar{x}$
  * sample variance $s^2$
  * sample standard deviation $s$
<br>

<b>ℹ️ NOTE:</b><br>
There is a fixed relationship between mean and variance<br>
&emsp;The variance is equal to the expected value of the squared difference from the mean<br>

&emsp;&emsp; $\sigma^2 = E((Y - \mu)^2) = E(Y^2) - \mu^2$<br>
<br>
<br>
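
The identity $E((Y - \mu)^2) = E(Y^2) - \mu^2$ is easy to check on a small case. The sketch below assumes a fair six-sided die (my choice of example) and uses exact fractions so there is no rounding to argue about:

```python
from fractions import Fraction

# A fair die: each value 1..6 has probability 1/6.
values = range(1, 7)
p = Fraction(1, 6)

mu = sum(y * p for y in values)  # E(Y)

# Left-hand side: expected squared distance from the mean.
var_definition = sum((y - mu) ** 2 * p for y in values)

# Right-hand side: E(Y^2) minus the mean squared.
var_shortcut = sum(y ** 2 * p for y in values) - mu ** 2

print(f"mu = {mu}")
print(f"E((Y - mu)^2)  = {var_definition}")
print(f"E(Y^2) - mu^2  = {var_shortcut}")
```

Both expressions give the same fraction, which is what the formula promises.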

#### Types of Probability Distributions Overview
<br>
<b>First A Note On Distribution Notation</b>
<br>

&emsp;&nbsp; Format: $\text{variable tilde type (characteristics of data set)}$<br>
<br>
&emsp;&nbsp; Example: $X \sim N(\mu, \sigma^2)$<br>
<br>
The characteristics are usually mean and variance, however they may vary depending on the type of distribution<br>
<br>
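
One practical wrinkle with this notation: the math writes the Normal distribution with the variance $\sigma^2$, while many libraries ask for the standard deviation $\sigma$ instead. A minimal sketch with the standard library's `statistics.NormalDist` (the numbers are hypothetical):

```python
from statistics import NormalDist

# Hypothetical characteristics for X ~ N(mu, sigma^2).
mu = 165.0
variance = 49.0

# NormalDist takes the standard deviation, not the variance,
# so we pass the square root of sigma^2.
X = NormalDist(mu=mu, sigma=variance ** 0.5)

print(f"mean = {X.mean}, stdev = {X.stdev}, variance = {X.variance}")
```

It is worth checking which of the two a given library expects before plugging the characteristics in.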
<Br>
<span style = "color:mediumseagreen;font-size:104%">


### Discrete Distribution Overview

* Discrete Distribution 
 * Uniform Distribution 
  * Equiprobable 
   * equally likely outcomes 
   * Example
    * You are equally likely to get heads or tails on the toss of a coin
 * Bernoulli Distribution 
  * A single trial with two possible outcomes, True or False, also described as Success or Failure
  * Does not have to be Equiprobable
 * Binomial Distribution
  * A discrete distribution over many repeated trials where each trial has two possible outcomes, such as true/false or pass/fail
  * Example
   * If I flip a coin 10 times, what is the probability of getting heads 5 times, $P(X = 5)$, where X is the number of heads
 * Poisson Distribution
  * A discrete distribution that is used to predict or explain the number of events occurring within a given interval of time or space
  * Example
   * An NFL team averages 27 points per game. What is the probability of that team scoring 10 points in the first quarter of their next game? The average is given per game, but we want the probability for a quarter of a game, so the rate has to be rescaled to that interval ($27 / 4 = 6.75$ points per quarter). <br>
<br>
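
The two examples above can be worked out with nothing more than the `math` module. This is a sketch of the textbook formulas, not a replacement for `scipy.stats`; the helper names are my own, and the Poisson part assumes the scoring rate is spread evenly over the four quarters, which is a simplification:

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) when events occur at an average rate lam per interval."""
    return exp(-lam) * lam ** k / factorial(k)


# Binomial: exactly 5 heads in 10 flips of a fair coin.
print(f"P(5 heads in 10 flips) = {binomial_pmf(5, 10, 0.5)}")

# A Bernoulli trial is the special case n = 1.
print(f"P(success in 1 trial, p = 0.3) = {binomial_pmf(1, 1, 0.3)}")

# Poisson: 27 points per game rescaled to one quarter (assumed even pacing).
rate_per_quarter = 27 / 4
print(f"P(10 points in a quarter) = {poisson_pmf(10, rate_per_quarter):.4f}")
```

With SciPy installed, `scipy.stats.binom.pmf(5, 10, 0.5)` and `scipy.stats.poisson.pmf(10, 6.75)` compute the same two quantities.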
A Discrete Distribution has a countable number of outcomes (finite in most of the examples here). Therefore it can be expressed with any of the following:<br>

* table 
* graph 
* formula 

The only criterion is that every unique outcome has a probability assigned to it.<br>
<b>However, we are often more interested in an interval than in a single value.</b>
<br> It's easy to do this with discrete probabilities: we simply add all the probabilities within that interval or range. <br>
<br>
Example:<br>
Suppose we want to know the probability of drawing three spades or fewer. First we would calculate the probability for getting zero, one, two or three spades. Then sum the four. <br>
<br>

&emsp;$P(0) + P(1) + P(2) + P(3) = P(Y \leq 3)$ <br>
<br>

<b>ℹ️NOTE:</b><br>
One peculiarity of discrete (integer-valued) events is that the probability of Y being less than or equal to y is the same as the probability of Y being less than y + 1.<br>I know... weird!<br>
<br>

&emsp;$P(Y \leq y) = P[Y < (y + 1)]$<br>
<br> 
&emsp;$P(♠️ \leq 3) = P(♠️ < 4)$<br>
<br>
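
Here is the "add up the interval" idea as a short sketch. The spades example does not say how many cards are drawn, so I am swapping in a different, fully specified case: the number of heads in 10 flips of a fair coin. Everything else follows the text:

```python
from fractions import Fraction
from math import comb

# PMF for the number of heads in 10 fair coin flips (assumed example).
n = 10
pmf = {k: Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}

# P(Y <= 3): add the probabilities of 0, 1, 2 and 3 heads.
p_at_most_3 = sum(pmf[k] for k in range(0, 4))

# P(Y < 4): written differently, but it selects exactly the same values.
p_less_than_4 = sum(p for k, p in pmf.items() if k < 4)

print(f"P(Y <= 3) = {p_at_most_3}")
print(f"P(Y <  4) = {p_less_than_4}")
```

Because Y only takes whole numbers, there is nothing between 3 and 4 to make the two sums differ.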


#### Discrete Distributions And The General Use Of PMF

Recall in the Random Variable discussion, we used the example of an experiment of people watching where 10 people were observed. The result of the experiment was:<br>
&emsp;10 people passed by, and that observed count is referred to as the <b>outcome</b>:<br>
<br>
&emsp;&emsp;$\text{People Count Outcome }\Rightarrow \Omega_1 = 10~people$<br>
&emsp;&emsp;$\text{Random Variable }\Rightarrow X_1(\Omega_1) = 10$<br>
<br>
&emsp;&emsp;$\text{Average Height Outcome }\Rightarrow \Omega_2 = 165.32~cm$<br>
&emsp;&emsp;$\text{Random Variable }\Rightarrow X_2(\Omega_2) = 165.32$<br>
<br>

What if you wanted to ask a different question, such as <b>What is the probability of 3 people walking past?</b> <br>
The random variable $X_1$ took the value 10 in the experiment. However, this random variable could have taken the value 0, 1, 2, 3, .... The key takeaway is that this set of numbers is <b>discrete</b>.<br>
<br>
A Probability Mass Function is a type of Probability Distribution Function<br>
<br>
The notation for a Probability Distribution Function is : <br>
<br>

$$P(X = x) \in [0, 1]$$
<br>

* P is the probability distribution function
* X is the random variable
* x is a variable that can take on a value in the space of X
* [0, 1] denotes that every probability is greater than or equal to 0 and less than or equal to 1<br>
<br>
With this example, the probability distribution function for the Random Variable $X_1$ would look like this:<br>

$$P(X_1 = x) \in [0, 1]$$
<br>
<br>
A typical notation for the PMF would be:<br>

$$\displaystyle p_{X_1}(x) \in [0, 1]$$
<br>

* The value of the PMF will always be between 0 and 1 $\Rightarrow 0 \leq p_{X_1}(x) \leq 1$<br>

* The sum of all the probabilities is equal to 1 $\Rightarrow \sum_x p_{X_1}(x) = 1$

For this exercise, let's make the set of possible values $\{1, 2, 3, 4, 5, 6\}$. We can graph this as a <b>Probability Distribution Function</b><br>
<br>
<B>NOTE:</b><br>
As more values are added to the set (the x-axis in the graph), the bars will get smaller. This is because the sum of the heights of all the bars will always equal 1, no matter how many bars are required. This is part of the reason the function is termed Probability <b>Mass</b> Function: the "mass" of each rectangular bar has meaning. Notice that the probabilities of the bars sum to 1. 

<br><Br>
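
The two PMF rules (every value between 0 and 1, and all values summing to 1) can be checked directly. The sketch below uses illustrative counts over the set $\{1, \dots, 6\}$; they are an assumption for the example, not measured data:

```python
from fractions import Fraction

# Possible values and illustrative counts for each of them.
support = [1, 2, 3, 4, 5, 6]
counts = [1, 0, 3, 2, 1, 1]

# Turn counts into a PMF by dividing by the total number of observations.
total = sum(counts)
pmf = {x: Fraction(c, total) for x, c in zip(support, counts)}

# Rule 1: every probability lies in [0, 1].
all_in_range = all(0 <= p <= 1 for p in pmf.values())

# Rule 2: the bar heights always add up to 1.
total_mass = sum(pmf.values())

print(pmf)
print(f"all in [0, 1]: {all_in_range}, total mass: {total_mass}")

# Spreading the same unit of mass over more values shrinks every bar:
for n_values in (6, 12, 60):
    print(f"{n_values} equally likely values -> each bar is {Fraction(1, n_values)}")
```

The last loop is the "bars get smaller" remark in miniature: the total stays at 1, so more bars means less height for each.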

Using Python for these use cases can run into performance issues, especially with population-sized data.<br>

In a production setting I would use a combined C++/Python statistical library. Implementing both population and sample versions ensures users can apply the appropriate calculation for their specific use case, whether analyzing complete populations or working with sample data to make inferences about larger populations.<br>

Python is used here because it's easier to read and doesn't get in the way of the discussion. 
<br><Br>

In [14]:

import numpy as np

from bokeh.io import curdoc, show, output_notebook
from bokeh.models import ColumnDataSource, Grid, LinearAxis, Plot, VBar, TeX


curdoc().theme = 'dark_minimal'
output_notebook(hide_banner=True)


x = list(range(1, 7))
y1 = [1, 0, 3, 2, 1, 1]
y2 = [0, 0, 3, 0, 0, 0]

source1 = ColumnDataSource(dict(x=x,top=y1,))
source2 = ColumnDataSource(dict(x=x,top=y2,))

plot = Plot(
    title="Discrete Distribution\nFor Random Variable X", width=500, height=400,
    min_border=0, toolbar_location=None)

glyph1 = VBar(x="x", top="top", bottom=0, width=1, fill_color="dodgerblue")
plot.add_glyph(source1, glyph1)

xaxis = LinearAxis()
xaxis.axis_label = 'Passersby'
plot.add_layout(xaxis, 'below')

yaxis = LinearAxis()
yaxis.ticker = [0, 0.5, 1, 1.5, 2, 2.5, 3]
yaxis.major_label_overrides = {
    0: TeX(r"0.0"),
    0.5: TeX(r""),
    1: TeX(r"0.167"),
    1.5: TeX(r""),
    2: TeX(r"0.333"),
    2.5: TeX(r""),
    3: TeX(r"0.5"),
}
yaxis.axis_label = 'Probabilities'
plot.add_layout(yaxis, 'left')

plot.add_layout(Grid(dimension=0, ticker=xaxis.ticker))
plot.add_layout(Grid(dimension=1, ticker=yaxis.ticker))

curdoc().add_root(plot)

show(plot)

In [13]:

import numpy as np

from bokeh.io import curdoc, show, output_notebook
from bokeh.models import ColumnDataSource, Grid, LinearAxis, Plot, VBar, TeX


curdoc().theme = 'dark_minimal'
output_notebook(hide_banner=True)


x = list(range(1, 7))
y1 = [1, 0, 0, 2, 1, 1]
y2 = [0, 0, 3, 0, 0, 0]

source1 = ColumnDataSource(dict(x=x,top=y1,))
source2 = ColumnDataSource(dict(x=x,top=y2,))

plot = Plot(
    title="Discrete Distribution\nFor Random Variable X", width=500, height=400,
    min_border=0, toolbar_location=None)

glyph1 = VBar(x="x", top="top", bottom=0, width=1, fill_color="dodgerblue")
plot.add_glyph(source1, glyph1)

glyph2 = VBar(x="x", top="top", bottom=0, width=1, fill_color="firebrick")
plot.add_glyph(source2, glyph2)

xaxis = LinearAxis()
plot.add_layout(xaxis, 'below')

yaxis = LinearAxis()
plot.add_layout(yaxis, 'left')

plot.add_layout(Grid(dimension=0, ticker=xaxis.ticker))
plot.add_layout(Grid(dimension=1, ticker=yaxis.ticker))

plot.yaxis.ticker = [0, 0.5, 1, 1.5, 2, 2.5, 3]
plot.yaxis.major_label_overrides = {
    0: TeX(r"0.0"),
    0.5: TeX(r""),
    1: TeX(r"0.167"),
    1.5: TeX(r""),
    2: TeX(r"0.333"),
    2.5: TeX(r""),
    3: TeX(r"0.5"),
}

plot.yaxis.axis_label = 'Probabilities'
plot.xaxis.axis_label = 'Passersby'

curdoc().add_root(plot)

show(plot)

#### Using The PMF To Calculate A Range

We have seen that we can calculate the probability of observing a specific value using a probability mass function. What if we want to find the probability of observing a range of values for a discrete random variable? One way we could do this is by adding up the probability of each value.<br>
<br>
For example, let’s say we flip a fair coin 10 times, and want to know the probability of getting between 2 and 4 heads.<br>
<Br>


In [17]:
""" 
The code below is designed to get a specific probability or a range of probabilities. 
If a range is used, the x variable needs to be set to zero
"""

import scipy.stats as stats
from IPython.display import display, Math

x = 0 # The value of interest
prob_range = (2,4) # this tuple is used to get a range of values of interest, x needs to be set to 0
n = 10 # The number of trials 
p = 0.5 # the probability of success 

def get_pmf_probabilities(x : int, prob_range: tuple, n: int, p: float ):
    msg = ''

    if x == 0:
        lst = list(range(prob_range[0], (prob_range[1] + 1)))
        ans_lst = []
        for i in lst:
            ans_lst.append(stats.binom.pmf(i, n, p))
        msg = msg + '\\text{What is the probability of getting heads %s to %s times out of %s? %s}'
        ans = round(sum(ans_lst),5)
        display(Math(msg%(lst[0], lst[-1], n, ans)))
    else:
        ans = round(stats.binom.pmf(x, n, p), 5)
        times = (lambda x: 'times' if x > 1 else 'time')
        tosses = (lambda n: 'tosses' if n > 1 else 'toss')
        msg = msg + '\\text{What is the probability of getting heads %s %s in %s %s ? %s}'
        display(Math(msg%(x, times(x),n, tosses(n), ans)))

get_pmf_probabilities(x, prob_range, n, p)
    

<IPython.core.display.Math object>

We can also calculate the probability of observing less than a certain value, let’s say 3 heads, by adding up the probabilities of the values below it:<br>
<br>
What is the probability of getting heads less than 3 times

In [18]:

x = 0
prob_range = (0, 2)
n = 10
p = 0.5
get_pmf_probabilities(x, prob_range, n, p)

<IPython.core.display.Math object>

<p>&nbsp;</p>

The cumulative distribution function for a discrete random variable can be derived from the probability mass function. However, instead of the probability of observing a specific value, the cumulative distribution function gives the probability of observing a specific value <b>OR LESS</b><br>
<br>
Cumulative distribution functions are non-decreasing, so for two different numbers that the random variable could take on, the value of the function for the larger number will always be greater than or equal to its value for the smaller number.<br>
<br>
We showed how the probability mass function can be used to calculate the probability of observing less than 3 heads out of 10 coin flips by adding up the probabilities of observing 0, 1, and 2 heads. The cumulative distribution function produces the same answer by evaluating the function at CDF(X=2). In this case, using the CDF is simpler than the PMF because it requires one calculation rather than three<br>
<br>
We can use a cumulative distribution function to calculate the probability of a specific range by taking the difference between two values from the cumulative distribution function. For example, to find the probability of observing between 3 and 6 heads, we can take the probability of observing 6 or fewer heads and subtract the probability of observing 2 or fewer heads. This leaves the probability of observing between 3 and 6 heads<br>
<br>
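
The 3 to 6 heads calculation described above can be sketched two ways to confirm they agree. The helper names `binom_pmf` and `binom_cdf` are made up for this illustration and use only the standard library; `scipy.stats.binom.pmf` and `scipy.stats.binom.cdf` should give the same values.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k): the PMF summed from 0 through k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 10, 0.5

# P(3 <= X <= 6) two ways: summing the PMF, and differencing the CDF.
by_pmf = sum(binom_pmf(k, n, p) for k in range(3, 7))
by_cdf = binom_cdf(6, n, p) - binom_cdf(2, n, p)

# The CDF never decreases as k grows.
cdf_values = [binom_cdf(k, n, p) for k in range(n + 1)]
non_decreasing = all(a <= b for a, b in zip(cdf_values, cdf_values[1:]))

print(round(by_pmf, 5), round(by_cdf, 5), non_decreasing)
```
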

#### Calculating Probability With The CDF
<br>
We can use SciPy's binom.cdf() function to calculate the cumulative distribution function<br>
Calculate the probability of observing 6 or fewer heads (0 to 6) from 10 fair coin flips


In [None]:

import scipy.stats as stats
from IPython.display import display, Math 

x = 6 
n = 10
p = 0.5

def get_cdf_probabilities(x: int, n: int, p: float):
    return round(stats.binom.cdf(x, n, p), 5)

    
ans = get_cdf_probabilities(x, n, p)

msg = '\\text{The CDF probability of getting heads %s or fewer times with %s flips is %s}'
display(Math(msg%(x,n,ans)))


<IPython.core.display.Math object>

Calculating the probability of observing between 4 and 8 heads from 10 fair coin flips can be thought of as subtracting the value of the cumulative distribution function at 3 from the value of the cumulative distribution function at 8:<br>
<br>

$P(4~to~8~Heads) = P(0~to~8~Heads) - P(0~to~3~Heads)$

In [13]:

x1 = 8
x2 = 3
n = 10
p = 0.5

""" 
Using the PMF would have been a lot more coding:
print(stats.binom.pmf(4, n=10, p=.5) 
+ stats.binom.pmf(5, n=10, p=.5) 
+ stats.binom.pmf(6, n=10, p=.5) 
+ stats.binom.pmf(7, n=10, p=.5) 
+ stats.binom.pmf(8, n=10, p=.5))
"""


ans1 = get_cdf_probabilities(x1, n, p)
ans2 = get_cdf_probabilities(x2, n, p)
prob_ans2 = ans1 - ans2
msg = '\\text{The CDF probability of getting between 4 and 8 heads is %s}'
display(Math(msg%(prob_ans2)))


<IPython.core.display.Math object>

To calculate the probability of observing more than 6 heads from 10 fair coin flips we subtract the value of the cumulative distribution function at 6 from 1. Mathematically, this looks like the following:<br>
<br>

$\text{P(more than 6 heads) = 1 - P(6 or fewer heads)}$


In [14]:
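x = 6
n = 10
p = 0.5
# Keep the CDF value in its own name so p (the probability of success) is not overwritten
cdf_at_x = get_cdf_probabilities(x, n, p)
ans = round((1 - cdf_at_x), 5)
msg = '\\text{The CDF probability of getting more than 6 heads is %s}'
display(Math(msg%(ans)))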

<IPython.core.display.Math object>

#### Continuous Distributions 
<br>
Before defining a Continuous Distribution, I need to review what a Continuous Random Variable is. <b>A continuous random variable</b> is one which can take any of an infinite number of possible values within an interval. Continuous random variables are usually measurements.<br>
<br>
Examples include height, weight, the amount of sugar in an orange, the time required to run a mile<br>
<br>
A key point about Continuous Random Variables is that their sample space is infinite.<br>
<br>
Examples of Continuous Random Variables:<br>

* The time it takes to complete a 60 minute exam. Possible values are all real numbers on the interval [0, 60], such as 0.00000001 minutes, which means the possible values are infinite
 
* Age of a fossil. Possible values are all real numbers on the interval [minimal age, maximal age] which are infinite

* Miles per gallon for a Toyota Prius. Possible Values are all real numbers on the interval [minimal MPG, maximum MPG] which are infinite
<br>

<span style = "color:tomato;font-size:101%">
The main difference between continuous and discrete random variables is that continuous random variables are measured over intervals, while discrete random variables take distinct, countable values
</span><br>
<br>
For example, it is effectively impossible that an exam took exactly 32 minutes to complete. It may be 32.1230437408746 minutes, but never exactly 32<br> 
<br>
So with that understanding of a continuous random variable, we can say that a <b>continuous distribution</b> describes the probabilities of the possible values of a continuous random variable. It has infinitely many possible outcomes, so the sample space of a continuous random variable is infinite<br>
<br>
<br>
As stated, discrete probability is calculated based on the frequency of exact points or events, such as a roll of dice as depicted in the histogram below. The histogram shows the frequency of each result of the roll of two dice, with a roll of 7 being the most frequent. <br>
<br>
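
The two-dice frequencies mentioned above can be reproduced by enumerating every outcome. This is a small sketch that prints a text histogram; the variable names are chosen only for this illustration.

```python
from collections import Counter
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two six-sided dice
# and count how often each sum occurs.
sum_counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# The sum with the highest frequency and how many outcomes produce it.
most_common_sum, frequency = sum_counts.most_common(1)[0]

# Text histogram: one '#' per outcome that produces the sum.
for total in sorted(sum_counts):
    print(f"{total:2d} {'#' * sum_counts[total]}")
```
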
<b>Probability Density Function</b><br>
The sample space of a Continuous Random Variable is infinite. We cannot record each distinct value like we can with Discrete Random Variables, so the sample space cannot be represented with a table. Instead we represent it with a graph, specifically the <b>Probability Density Function (PDF)</b>. To be clear, Discrete Random Variables have an analogous function, the Probability Mass Function (PMF)
</span>
<br>

$f(y) \geq 0 \rightarrow \text{for every y in the sample space, the density f(y) is greater than or equal to zero}$<br>
<br>

You can plot a continuous distribution in a histogram, but there will be more bins. And since there will be more bins, the bins will be narrower, which makes the bin values harder to read. To compensate for that we can use a line that <i>should</i> pass through the middle of each bar or along the top. This line is called the <b>Probability Density Curve (PDC)</b>.<br> <span style = "color:mediumseagreen;font-size:101%">The PDC is the output or result of the PDF</span>. 
<br>
Imagine using $P(y) = \dfrac{favorable}{\text{sample space}}$ to calculate the probabilities for continuous random variables.<br>
<br>
Since the sample space is infinite, $P(y) = \dfrac{favorable}{\text{sample space}} = \dfrac{1}{\infty}$. The denominator is so large that the fraction is effectively zero.<br>
<br>
Thus it is stated that $P(y) = \dfrac{1}{\infty} = 0$.<br>
<br>
A practical example of this is that someone weighing exactly 200 lbs is next to impossible, or at least the probability is so insignificant that we treat it as zero.<br>
<Br>
This leads to an important point:<br>
<br>
<span style = "color:mediumseagreen;font-size:101%">
Because the probability of a specific value is zero, we can say: <br>
<br>
&emsp;$P(X > x) = P(X \geq x)$<br>
</span>
<br>
Example:<br>
The probability of a marathoner running a mile in less than six minutes $P(X < 6~min)$ is equal to the probability of the same marathoner running a mile in six minutes or less $P(X \leq 6~min)$ because the probability of running a mile in <i>exactly</i> 6 minutes is zero.<br>
<br>
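
To see why a single exact value carries no probability, the sketch below shrinks an interval around 6 minutes and watches its probability fall toward zero. The Normal(6.5, 0.5) model for mile times is an assumption invented for this illustration, not data from the text. The standard-library `statistics.NormalDist` is used so the example runs without extra packages.

```python
from statistics import NormalDist

# Assumed model, for illustration only: mile times ~ Normal(mean 6.5 min, sd 0.5 min).
mile_time = NormalDist(mu=6.5, sigma=0.5)

# P(6 - eps < X <= 6 + eps) for ever narrower intervals around 6 minutes.
widths = (0.1, 0.01, 0.001, 0.0001)
interval_probs = [mile_time.cdf(6 + eps) - mile_time.cdf(6 - eps) for eps in widths]

for eps, prob in zip(widths, interval_probs):
    print(f"6 +/- {eps}: {prob:.8f}")
```

As the interval narrows, the probability keeps shrinking, which is why including or excluding the single endpoint does not change the result.
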


#### Probability Density Function Calculations Of Continuous Values

Similar to how discrete random variables relate to probability mass functions, continuous random variables relate to probability density functions. They define the probability distributions of continuous random variables and span across all possible values that the given random variable can take on.<br>
<br>
<span style = "color:yellowgreen;font-size:101%">
When graphed, a probability density function is a curve across all possible values the random variable can take on, and the total area under this curve adds up to 1<br>
<br>
In a probability density function, we cannot calculate the probability at a single point. This is because the area of the curve underneath a single point is always zero
<br>
<br>
</span><br>
<br>
We can calculate the area under the curve using the cumulative distribution function for the given probability distribution.<br>
<br>
For example, heights are approximately <b>Normally Distributed</b>. The parameters for the normal distribution are the mean and the standard deviation, and we use the form Normal(mean, standard deviation)<br>
<br>
We know that women’s heights have a mean of 167.64 cm with a standard deviation of 8 cm, which makes them fall under the Normal(167.64, 8) distribution<br>
<br>
Let’s say we want to know the probability that a randomly chosen woman is less than 158 cm tall. We can use the cumulative distribution function to calculate the area under the probability density function curve to the left of 158 to find that probability<br>
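
A minimal sketch of that calculation, using the Normal(167.64, 8) parameters from the text. It uses the standard-library `statistics.NormalDist`; `scipy.stats.norm.cdf(158, 167.64, 8)` should return the same value.

```python
from statistics import NormalDist

# Parameters taken from the text: mean 167.64 cm, standard deviation 8 cm.
womens_heights = NormalDist(mu=167.64, sigma=8)

# P(height < 158 cm) is the area under the density curve to the left of 158.
p_under_158 = womens_heights.cdf(158)

print(round(p_under_158, 4))
```
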

In [24]:

from bokeh.plotting import figure, show
from bokeh.io import output_notebook
import numpy as np
from scipy import stats

# Generate sample data
np.random.seed(42)
data = np.random.normal(0, 1, 1000)

# Calculate PDF
kde = stats.gaussian_kde(data)
x_range = np.linspace(min(data), max(data), 200)
pdf = kde(x_range)

# Create Bokeh plot
p = figure(title='Probability Density Function', 
          x_axis_label='Value',
          y_axis_label='Density')

# Plot PDF line
p.line(x_range, pdf, line_color='goldenrod', line_width=2)

# Fill area under curve
p.patch(x_range, pdf, alpha=0.2, color='dodgerblue')

# Show plot
output_notebook(hide_banner=True)
show(p)

#### Cumulative Distribution Function (CDF) With Continuous Data
A cumulative distribution function (CDF) describes the probabilities of a random variable having values less than or equal to x. It is a cumulative function because it sums the total likelihood up to that point. Its output always ranges between 0 and 1<br>
<br>
CDFs have the following definition: <br>
<br>

$$CDF(x) = P(X \leq x) \\~\\ \text{Where X is the random variable, and x is a specific value}$$
<br>
<br>
You also see/use:<br>
<br>

$$F(y) = P(Y \leq y) \Rightarrow \text{the probability that Y is lower than or equal to a specific value y}$$
<br>
<br>
A CDF is non-decreasing. As x increases, the cumulative probability can either increase or stay constant, but it can't decrease<br>
<br>

Since no value can be lower than $-\infty$, plugging $-\infty$ into the CDF gives $F(-\infty) = 0$. Similarly, all values are less than $\infty$, so plugging $\infty$ into the CDF gives $F(\infty) = 1$<br>
<br>
All this really means is the CDF will always be between 0 and 1 ($0 \leq F(x) \leq 1$)
<br><br>
<b>CDF vs PDF</b><br>
Both probability density functions (PDFs) and cumulative distribution functions (CDFs) provide likelihoods for random variables. However, PDFs give the probability density at x, while CDFs give the probability of observing a value ≤ x<br>
<Br>
Put simply, the accumulated area of the PDF defines the CDF. $\int PDF = CDF$. The CDF adds up the area of the PDF. <Br>
<br>
<br>
<b>ℹ️NOTE:</b><br>
You can use the CDF on discrete random variables as well. There, the cumulative values are obtained by adding up the Probability Mass Function values rather than by integrating.<br>
<Br>
<span style = "color:mediumseagreen;font-size:101%">
CDFs are really useful when we want to estimate the probability of an interval. 
</span><br>
<br>
<b>Using Cumulative Distribution Functions</b><br>
Cumulative distribution functions are fantastic for comparing two distributions. By comparing the CDFs of two random variables, we can see if one is more likely to be less than or equal to a specific value than the other. That helps us make decisions about whether one is more likely to have a particular property<br>
<br>
Additionally, these cumulative probabilities are equivalent to percentiles. A cumulative probability of 0.80 is the same as the 80th percentile. So, CDFs are great for finding percentiles<br>
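
The link between cumulative probabilities and percentiles can be sketched with the inverse CDF. The Normal(500, 50) parameters are an assumption that mirrors the values used in this notebook's plots. `NormalDist.inv_cdf` is the standard-library counterpart of `scipy.stats.norm.ppf`.

```python
from statistics import NormalDist

# Assumed example distribution: mean 500, standard deviation 50.
dist = NormalDist(mu=500, sigma=50)

# inv_cdf inverts the CDF: it maps a cumulative probability back to the
# value below which that fraction of the distribution lies.
p80 = dist.inv_cdf(0.80)

# Round trip: the CDF evaluated at the 80th percentile returns 0.80.
round_trip = dist.cdf(p80)

print(round(p80, 2), round(round_trip, 2))
```
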

Imagine we're a clothing manufacturer and want to compare the prevalence of 6' tall men to women.<br>
<br>
Adult males in the U.S. have heights that follow a normal distribution with a mean of 69.2 inches and a standard deviation of 2.66 inches. Consequently, we'll need to use a normal CDF with these parameters to answer our question. Because we're working in inches, I'll enter 72 inches for 6 feet<br>
<br>
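
A rough sketch of the comparison follows. The men's parameters come from the paragraph above. The women's parameters are not given there, so reusing the Normal(167.64 cm, 8 cm) distribution from earlier in this notebook is an assumption made only to complete the example.

```python
from statistics import NormalDist

# Men: parameters from the text, in inches.
men = NormalDist(mu=69.2, sigma=2.66)

# Women: Normal(167.64 cm, 8 cm) reused from earlier in the notebook (assumption).
women = NormalDist(mu=167.64, sigma=8)

six_feet_in = 72
six_feet_cm = six_feet_in * 2.54  # convert inches to centimeters

# P(height >= 6 ft) = 1 - CDF(6 ft)
p_men_tall = 1 - men.cdf(six_feet_in)
p_women_tall = 1 - women.cdf(six_feet_cm)

print(round(p_men_tall, 4), round(p_women_tall, 4))
```
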

<br>
<br>
Graphically the area under the Density Curve would represent the probability within that interval <br>
We find this area by computing the integral of the density curve over the interval from point a to b<br>
<br>

$$\int_{a}^{b} p(x)dx \\~\\ \text{the probability that x will be between a and b}$$
<br>
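
The integral above can be approximated numerically and compared with the difference of two CDF values. This sketch assumes a Normal(500, 50) density and the interval [450, 550]; `trapezoid_area` is a helper written for this illustration, not a library function.

```python
from statistics import NormalDist

# Assumed parameters, matching the values used in this notebook's plots.
dist = NormalDist(mu=500, sigma=50)
a, b = 450, 550

def trapezoid_area(f, lo, hi, steps=10_000):
    """Approximate the integral of f over [lo, hi] with the trapezoid rule."""
    width = (hi - lo) / steps
    inner = sum(f(lo + i * width) for i in range(1, steps))
    return width * (0.5 * f(lo) + inner + 0.5 * f(hi))

# Area under the density curve between a and b, computed two ways.
area = trapezoid_area(dist.pdf, a, b)
by_cdf = dist.cdf(b) - dist.cdf(a)

print(round(area, 5), round(by_cdf, 5))
```
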


In [29]:


import numpy as np
from bokeh.plotting import figure, show, output_notebook, curdoc, ColumnDataSource
from bokeh.models import TextInput, CustomJS, Div
from bokeh.layouts import column, row
import datum  # helper module assumed to be available alongside this notebook


output_notebook()

curdoc().theme ='dark_minimal'

fig = figure(width=670, height=400, toolbar_location=None,
           title="Cumulative Distribution Function\nThe Area Under The Density Curve\n\n")

N = 1000

mu = 500

sigma = 50

new_data = datum.Data(N = N, mu = mu,  sigma = sigma)

ex1_min, ex1_max, ex1_xbar, ex1_std, ex1_df = new_data.make_data()

values = [value for value in range(350, 650)]


# dist = norm(ex1_xbar, ex1_std)

dist = new_data.make_normal_pdf()

probabilities = [dist.pdf(value) for value in values]

y1 = np.zeros(len(values)) # think of these as the floor of the varea 
  
source = ColumnDataSource(
    dict(x = values, y1 = y1, y2 = probabilities)
    )

source_sel = ColumnDataSource(
    dict(x = values, y1 = y1, y2 = probabilities)
    )

div = Div(text = "Cumulative Probability")

fig.line(
       x = 'x'
       ,y = 'y2'
       ,color = 'firebrick'
       ,line_width = 2
       ,legend_label = 'Probability Density Curve'
       ,source = source 
       )

fig.varea(
        x = "x"
        ,y1 = "y1"
        ,y2 = "y2"
        ,color = 'dodgerblue'
        ,alpha = 0.25
        ,legend_label = 'integral of the density curve '
        ,source = source_sel
    )

fig.xaxis.axis_label = "CDF(X)"

fig.legend.location = 'top_left'


start = TextInput(value = '350', title = "Start", width = 75)
end = TextInput(value = '650', title = 'End', width = 75)

callback = CustomJS(args=dict(source=source,
                              source_sel = source_sel, 
                              start = start,
                              end = end), code="""
    var data = source.data    
    var d1 = source_sel.data
    d1['x'] = [];
    d1['y1'] = [];
    d1['y2'] = [];
    
    const find_closest_idx = (val) => {
        return data.x.reduce(({delta, idx}, curr, curr_idx) => {
            const curr_delta = Math.abs(curr - val);
            return (delta === undefined || curr_delta < delta) ? {delta: curr_delta, idx: curr_idx} : {delta, idx};
        }, {}).idx;
    }
    
    const start_idx = find_closest_idx(start.value);        
    const end_idx = find_closest_idx(end.value);   
    
    var area = 0;
    var width = 1;  // spacing between consecutive x values
    for (var i = start_idx; i <= (end_idx); i++) {    
        // Trapezoid rule: the first point has no left neighbor inside the range
        if (i > start_idx) {
            area += 0.5 * (data.y2[i] + data.y2[i-1]) * width;
        }
        d1['x'].push(data.x[i])
        d1['y1'].push(data.y1[i])
        d1['y2'].push(data.y2[i])
        
    }
    console.log(area, d1)

    source.change.emit()
    source_sel.change.emit()
""")

start.js_on_change('value', callback)
end.js_on_change('value', callback)



show(row(fig, column(start, end)))


**ℹ️NOTE:**
I set the max range of x for the area under the curve to 450 (referred to as point y) for the sake of this tutorial.

Notice how the cumulative probability is simply the probability from $-\infty$ to 450.
This suggests that the CDF for the specified value y is equal to the integral of the density function over the interval from $-\infty$ to y. This gives us a way to obtain the CDF from the PDF.

$$PDF \rightarrow CDF$$

$$\int_{-\infty}^y p(y)dy = F(y)$$

However keep in mind: <span style="color:mediumseagreen;font-size:102%">The opposite of integration is differentiation</span>

$$PDF~\underleftarrow{Derivative}~CDF$$

So to obtain a PDF from a CDF we would have to find the first derivative of the CDF

$\text{The PDF of the sample space y equals the first derivative of the CDF with respect to y}$

$p(y) = \frac{d}{dy}F(y)$
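
The derivative relationship can be checked numerically with a central difference. The Normal(500, 50) parameters and the evaluation point 450 are assumptions chosen to match the note above. The step size `h` is an arbitrary small value.

```python
from statistics import NormalDist

# Assumed parameters, matching the values used in this notebook's plots.
dist = NormalDist(mu=500, sigma=50)

y = 450   # point at which to compare the slope of the CDF with the PDF
h = 1e-4  # small step for the finite-difference approximation

# A central difference approximates the first derivative of the CDF at y.
cdf_slope = (dist.cdf(y + h) - dist.cdf(y - h)) / (2 * h)

# The PDF evaluated at the same point should be nearly identical.
density = dist.pdf(y)

print(round(cdf_slope, 6), round(density, 6))
```
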

<b>Expected Value and Variance In Continuous Variables</b>
Often times when dealing with continuous variables we are only given the Probability Density Function (PDF).
To create a graph with this information we need to compute the Expected Value $E(y)$ and Variance $Var(y)$.

Expected Value:
 The probability of each individual element is 0. Therefore we cannot apply the summation formula we used for discrete  outcomes.

 $P(y) = 0 \Rightarrow \text{We can't apply the summation formula}$

 <span style="color:mediumseagreen;font-size:102%">When dealing with a Continuous Variable the Expected Value is an Integral</span>

 $\text{The integral, from } -\infty~\text{to}~\infty\text{, of each value multiplied by its associated probability density}$

 $E(y) = \int_{-\infty}^{\infty}~yp(y)dy$

Variance:
 You compute the Variance for a Continuous Distribution the same way you compute it for Discrete Variables.

 The Variance is equal to the Expected Value of the squared variable minus the Expected Value of the variable  squared.

 $Var(y) = E(y^2) - [E(y)]^2$
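
Both integrals can be approximated numerically. This sketch assumes a Normal(500, 50) density and replaces the infinite limits with the mean plus or minus 10 standard deviations, where the density is negligible. The `integrate` helper is written for this illustration only.

```python
from statistics import NormalDist

# Assumed parameters, matching the values used in this notebook's plots.
dist = NormalDist(mu=500, sigma=50)

def integrate(f, lo, hi, steps=20_000):
    """Trapezoid-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    inner = sum(f(lo + i * width) for i in range(1, steps))
    return width * (0.5 * f(lo) + inner + 0.5 * f(hi))

# Finite stand-ins for -infinity and +infinity: mean +/- 10 standard deviations.
lo, hi = 0, 1000

# E(y) = integral of y * p(y), and E(y^2) = integral of y^2 * p(y).
e_y = integrate(lambda y: y * dist.pdf(y), lo, hi)
e_y2 = integrate(lambda y: y**2 * dist.pdf(y), lo, hi)

# Var(y) = E(y^2) - [E(y)]^2
var_y = e_y2 - e_y**2

print(round(e_y, 3), round(var_y, 3))
```

The results should land close to the mean (500) and the squared standard deviation (2500) that define the assumed distribution.
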



#### Normal Distribution

A Normal Distribution is denoted as $N(\mu, \sigma^2)$ <Br>
<br>

&emsp;$X \sim N(\mu, \sigma^2) \Leftarrow \text{ Variable X follows a normal distribution with mean } \mu \text{ and variance } \sigma^2$<br>

When dealing with the actual data of a normal distribution we would generally know $\mu$ and $\sigma^2$<br>
<br>
<b>Characteristics Of A Normal Distribution</b><br>
&emsp;- Bell Shape<br>
&emsp;- Symmetrical <br>
&emsp;- The majority of the data is centered around the mean<br>
&emsp;- Values further away from the mean are less likely to occur <br>
&emsp;- Data points at equal distances from the mean in opposite directions (+ / -) are equally likely to occur, since the distribution is symmetrical<Br>

The outcomes of many events in nature closely resemble a normal distribution. For example, based on multiple reports the mean weight of a polar bear is around 500 kg. However, there have been records of weights below 350 kg and others above 650 kg. These would be considered <b>outliers</b> and by definition do not feature frequently in a data set. And the larger the dataset, the smaller the percentage of outliers. <Br>
<br>
<b>Expected Value Of A Normal Distribution</b><br>

&emsp;The Expected Value of a Normal Distribution is its mean: $E(X) = \mu$<br>
<br>
<b>
The Variance Of A Normal Distribution</b><br>

&emsp;The Variance of a Normal Distribution is defined by the distribution: $Var(X) = \sigma^2$.<br>
&emsp;However, if $\sigma^2$ is not given, we can derive it from the Expected Values of $X$ and $X^2$:<br>
<br>
&emsp;$\quad Var(X) = E(X^2) - E(X)^2$<br>
<br>
<b>Empirical Rule Of Normal Distributions</b><br>
The Empirical Rule states that, approximately:<br>

1. 68% of all the data in a Normal Distribution falls within 1 $\sigma$ of the mean
2. 95% of all the data in a Normal Distribution falls within 2 $\sigma$ of the mean
3. 99.7% of all the data in a Normal Distribution falls within 3 $\sigma$ of the mean
<br>
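
The three percentages can be checked against the normal CDF. The sketch below is a minimal example that assumes the polar bear parameters used in this section ($\mu = 500$, $\sigma = 50$) and uses `scipy.stats.norm`. The variable names are illustrative only.

```python
from scipy.stats import norm

mu, sigma = 500, 50                 # polar bear example values from the text
dist = norm(loc=mu, scale=sigma)    # frozen Normal Distribution N(mu, sigma^2)

# Share of the distribution within k standard deviations of the mean:
# P(mu - k*sigma < X < mu + k*sigma) = CDF(upper) - CDF(lower)
within = {
    k: dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)
    for k in (1, 2, 3)
}

for k, share in within.items():
    print(f"within {k} sigma: {share:.4f}")
```

The shares come out to roughly 0.683, 0.954, and 0.997, so the 68 / 95 / 99.7 figures are rounded values. Changing `mu` or `sigma` should not change them, since the rule is stated in units of $\sigma$.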

[Top](#fundamentals)<br>
<br>


In [33]:


import numpy as np

from bokeh.layouts import column
from bokeh.models import Div, TeX
from bokeh.plotting import figure, show, output_notebook, curdoc, ColumnDataSource
from scipy.stats import norm
import datum 


output_notebook(hide_banner=True)

curdoc().theme ='dark_minimal'

fig = figure(width=670, height=400, toolbar_location=None,
           title="Continuous Normal Distribution\nPolar Bear Weights(kg)\n")

N = 1000

mu = 500

sigma = 50

new_data = datum.Data(N = N, mu = mu,  sigma = sigma)

ex1_min, ex1_max, ex1_xbar, ex1_std, ex1_df = new_data.make_data()

values = [value for value in range(350, 650)]


# dist = norm(ex1_xbar, ex1_std)
dist = new_data.make_normal_pdf()


probabilities = [dist.pdf(value) for value in values]

y1 = np.zeros(len(values))

source = ColumnDataSource(
    dict(x = values, y1 = y1, y2 = probabilities)
    )

source_sel = ColumnDataSource(
    dict(x = values, y1 = y1, y2 = probabilities)
    )


fig.line(
       x = 'x'
       ,y = 'y2'
       ,color = 'firebrick'
       ,line_width = 2
       ,legend_label = 'weights'
       ,source = source 
       )

fig.varea(
        x = "x"
        ,y1 = "y1"
        ,y2 = "y2"
        ,color = 'dodgerblue'
        ,alpha = 0.25
        ,legend_label = 'Probabilities'
        ,source = source_sel
    )

fig.xaxis.axis_label = "Weight in Kg"


show(fig)

### Standardizing 

Before discussing Standardizing we need to define Transformation<br>
<br>
<b>Transformation</b><br>
Transformation is a way to alter every element of a distribution, resulting in a new distribution with similar characteristics <br>
<br>
<span style = "color:mediumseagreen;font-size:101%">
For a Normal Distribution we can add, subtract, multiply by, or divide by a constant without changing the type of distribution. In other words, applying any of these four operations with a constant (nonzero for multiplication and division) results in a new Normal Distribution. 
</span><br>
<br>

$$X \sim N(\mu, \sigma^2) \Rightarrow X + 3 \sim N(\mu + 3, \sigma^2)$$

<b>Adding and Subtracting From X</b><br>
The graph in row 1 below is the original Normal Distribution ($\mu = 500$, $\sigma = 50$)<br>

* The first graph in row 2 has 50 added to each value in X. Therefore it is shifted to the right by 50
 * $\mu = 550$
* The second graph in row 2 has 50 subtracted from each value in X. Therefore it is shifted to the left by 50
 * $\mu = 450$
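
The effect of adding or subtracting a constant can also be checked numerically. The sketch below is a minimal example on simulated data. The seed, the sample size, and the variable names are assumptions made for illustration, and it uses NumPy's random generator rather than the `datum` helper module.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for repeatability

# Simulated weights, using the same mu = 500 and sigma = 50 as the plots
x = rng.normal(loc=500, scale=50, size=100_000)

x_plus = x + 50    # every value shifted right by 50
x_minus = x - 50   # every value shifted left by 50

for name, data in [("X", x), ("X + 50", x_plus), ("X - 50", x_minus)]:
    print(f"{name:>7}: mean = {data.mean():7.2f}, std = {data.std():6.2f}")
```

The means should differ by 50 in each direction while the standard deviation stays the same, which is why the curves move sideways without changing shape.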


In [34]:

import numpy as np

from bokeh.layouts import gridplot
from bokeh.models import Div, TeX, Range1d
from bokeh.plotting import figure, show, output_notebook, curdoc, ColumnDataSource
from scipy.stats import norm
import datum 


output_notebook(hide_banner=True)

curdoc().theme ='dark_minimal'

fig = figure(width=400, height=400, toolbar_location=None,
           title="Transformation\n")

fig_plus = figure(width=400, height=400, toolbar_location=None,
           title="y = f(x + 50)\n")

fig_minus = figure(width=400, height=400, toolbar_location=None,
           title="y = f(x - 50)\n")

N = 1000

mu = 500
mu_plus = 550
mu_minus = 450

sigma = 50

new_data = datum.Data(N = N, mu = mu,  sigma = sigma)
new_data_plus = datum.Data(N = N, mu = mu_plus, sigma = sigma)
new_data_minus = datum.Data(N = N, mu = mu_minus, sigma = sigma)


ex1_min, ex1_max, ex1_xbar, ex1_std, ex1_df = new_data.make_data()

# x axis values
values = [value for value in range(350, 650)]
values_plus = [value for value in range(400, 700)]
values_minus = [value for value in range(300, 600)]


# dist = norm(ex1_xbar, ex1_std)
dist = new_data.make_normal_pdf()
dist_plus = new_data_plus.make_normal_pdf()
dist_minus = new_data_minus.make_normal_pdf()


# y axis values 
probabilities = [dist.pdf(value) for value in values]
probabilities_plus = [dist_plus.pdf(value) for value in values_plus]
probabilities_minus = [dist_minus.pdf(value) for value in values_minus]


y1 = np.zeros(len(values))
y1_plus = np.zeros(len(values_plus))
y1_minus = np.zeros(len(values_minus))

source = ColumnDataSource(
    dict(x = values, y1 = y1, y2 = probabilities)
    )

source_sel = ColumnDataSource(
    dict(x = values, y1 = y1, y2 = probabilities)
    )

source_plus = ColumnDataSource(
    dict(x = values_plus, y1 = y1_plus, y2 = probabilities_plus)
    )

source_sel_plus = ColumnDataSource(
    dict(x = values_plus, y1 = y1_plus, y2 = probabilities_plus)
    )


source_minus = ColumnDataSource(
    dict(x = values_minus, y1 = y1_minus, y2 = probabilities_minus)
    )

source_sel_minus = ColumnDataSource(
    dict(x = values_minus, y1 = y1_minus, y2 = probabilities_minus)
    )




fig.line(
       x = 'x'
       ,y = 'y2'
       ,color = 'firebrick'
       ,line_width = 2
       ,source = source 
       )

fig.varea(
        x = "x"
        ,y1 = "y1"
        ,y2 = "y2"
        ,color = 'dodgerblue'
        ,alpha = 0.25
        ,source = source_sel
    )

fig_plus.line(
       x = 'x'
       ,y = 'y2'
       ,color = 'firebrick'
       ,line_width = 2
       ,source = source_plus 
       )

fig_plus.varea(
        x = "x"
        ,y1 = "y1"
        ,y2 = "y2"
        ,color = 'dodgerblue'
        ,alpha = 0.25
        ,source = source_sel_plus
    )

fig_minus.line(
       x = 'x'
       ,y = 'y2'
       ,color = 'firebrick'
       ,line_width = 2
       ,source = source_minus
       )

fig_minus.varea(
        x = "x"
        ,y1 = "y1"
        ,y2 = "y2"
        ,color = 'dodgerblue'
        ,alpha = 0.25
        ,source = source_sel_minus
    )

fig.x_range = Range1d(350, 650)
fig_plus.x_range = Range1d(400, 700)
fig_minus.x_range = Range1d(300, 600)

p = gridplot([[fig, None],[fig_plus, fig_minus]])

show(p)

<b>Multiplying and Dividing X</b><br>
<Br>

* When you multiply a distribution by a constant k:
 * The mean (μ) is multiplied by k
 * The standard deviation (σ) is multiplied by |k|

* When you divide a distribution by a nonzero constant k:
 * The mean (μ) is divided by k 
 * The standard deviation (σ) is divided by |k|

The cell below checks this for k = 2:

In [35]:


import numpy  as np 
import datum 
from IPython.display import display, Math 

N = 1000
mu = 500
sigma = 50
low = 0
upp = N

new_data = datum.Data(N = N, mu = mu, sigma = sigma)

cnt, min_val, max_val, x_bar, std, df_x = new_data.make_normal_dist(low = low, upp = upp)

# Use numpy arrays for vectorized operations instead of list comprehensions
multi_df = np.array(df_x) * 2
div_df = np.array(df_x) / 2

multi_min = np.min(multi_df)
multi_max = np.max(multi_df)
multi_mu = round(np.mean(multi_df), 3)
multi_sigma = round(np.std(multi_df), 3)
multi_cnt = len(df_x)


div_min = np.min(div_df)
div_max = np.max(div_df)
div_mu = round(np.mean(div_df), 3)
div_sigma = round(np.std(div_df), 3)
div_cnt = len(df_x)

# LaTeX templates, filled in below with %-formatting
base = '\\text{Base Data:}\\\\' + 'count:~%s\\\\' + '\\mu: %s\\\\' + '\\sigma: %s'
multi = '\\text{Base Data times 2:}\\\\' + 'count:~%s\\\\' + '\\mu: %s\\\\' + '\\sigma: %s'
div = '\\text{Base Data divided by 2:}\\\\' + 'count:~%s\\\\' + '\\mu: %s\\\\' + '\\sigma: %s'

display(Math(base%(cnt, x_bar, std)))
display(Math(multi%(multi_cnt, multi_mu, multi_sigma)))
display(Math(div%(div_cnt, div_mu, div_sigma)))



<IPython.core.display.Math object>

<IPython.core.display.Math object>

<IPython.core.display.Math object>

<b>Standardizing</b><br>
Standardizing is a special kind of Transformation in which we make the Expected Value equal zero and the variance equal to 1<br>
<br>

$$E(X) = 0,~Var(X) = 1 $$
$~$
$$ X \sim N(E(X), \sigma^2) = N(0, 1)$$
$~$
$\text{X follows a Normal Distribution where the Expected Value = 0, and Variance = 1}$
<br>
<br>
A Normal Distribution that has $E(X) = 0~and~Var(X) = 1$ is called a <b>Standard Normal Distribution </b><br>
<br>
<b>Characteristics Of A Standard Normal Distribution</b><br>
<br>

* Expected Value = 0
* Variance = 1
* Empirical Rule (68, 95, 99.7 rule)
* Z-Table (A table that lists the CDF values for a Standard Normal Distribution)

<b>Standard Normal Distribution Notation</b><br>
If we represent a Standard Normal Distribution with <b>Z</b> and any normally distributed variable with <b>Y</b>, we can express the transformation of a Normal Distribution to a Standard Normal Distribution as:<br>

$$
Z = \dfrac{Y - \mu}{\sigma} \Rightarrow Y \rightarrow Z
$$

<br>
<b>Transform A Normal Distribution Into A Standard Normal Distribution</b><br>

1. Transform the mean to a value of 0
  1. Subtract the mean from each value in the distribution. This works whether the mean is positive or negative (subtracting a negative mean shifts the values up)
2. Transform the standard deviation to a value of 1
  1. Divide each of the resulting differences by the standard deviation
<br>
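
The two standardizing steps can be sketched in a few lines of NumPy. This is a minimal, self-contained example on simulated data. It does not use the `datum` helper module, and the seed and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # seed chosen arbitrarily for repeatability

# Simulated sample from a Normal Distribution with mu = 200 and sigma = 15
y = rng.normal(loc=200, scale=15, size=500)

# Step 1: subtract the mean so the new mean is 0
centered = y - y.mean()

# Step 2: divide by the standard deviation so the new standard deviation is 1
z = centered / y.std()

print(f"Y: mean = {y.mean():.3f}, std = {y.std():.3f}")
print(f"Z: mean = {z.mean():.3f}, std = {z.std():.3f}")
```

Because the sample's own mean and standard deviation are used, the standardized values have mean 0 and standard deviation 1 up to floating point error.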


In [21]:
import datum 
from IPython.display import display, Math

''' 
new_data.convert_to_std_norm() is the function I wrote to convert a normal distribution 
to a standard normal distribution
'''

N = 500
mu = 200
sigma = 15

new_data = datum.Data(N=N, mu=mu, sigma=sigma)

cnt_X, min_val_X, max_val_X, mu_X, sigma_X, tst_X = new_data.make_data(N = N, mu = mu, sigma = sigma)

tst_y = new_data.convert_to_std_norm(tst_X)

cnt_Y, mu_Y, sigma_Y, min_val_Y, max_val_Y = new_data.get_central_tendency(list(tst_y), std_out="N")

norm_dist = '\\text{Normal Distribution:}\\\\' + 'count:~%s\\\\' + '\\mu: %s\\\\' + '\\sigma: %s'
stand_norm_dist = '\\text{Standard Normal Distribution:}\\\\' + 'count:~%s\\\\' + '\\mu: %s\\\\' + '\\sigma: %s'

display(Math(norm_dist%(cnt_X, mu_X, sigma_X)))
display(Math(stand_norm_dist%(cnt_Y, mu_Y, sigma_Y)))



<IPython.core.display.Math object>

<IPython.core.display.Math object>

<b>Every element $Y_i$ in the non-standardized normal distribution is represented in the transformed standard normal distribution by the number of standard deviations $(\sigma)$ it is away from the mean.</b> <br>
<br>


Example:<br>
If a value y is 2.3 standard deviations above the mean in a normal distribution, its transformed value in the standard normal distribution would be 2.3.<br>
<br>

$$y = \mu + 2.3\sigma \Rightarrow z = 2.3$$
<br>
<br>
<b>NOTE:</b><BR>
It's no coincidence that the standard normal distribution value is represented by the notation <b>z</b>, which aligns with the z-table<br>
<br>
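
To make the link between a z value and the z-table concrete, here is a small sketch. The values $\mu = 200$ and $\sigma = 15$ are illustrative assumptions, and `scipy.stats.norm.cdf` stands in for looking the value up in a printed z-table.

```python
from scipy.stats import norm

mu, sigma = 200, 15        # illustrative values, not taken from real data
y = mu + 2.3 * sigma       # a value 2.3 standard deviations above the mean

z = (y - mu) / sigma       # standardize: Z = (Y - mu) / sigma
p_below = norm.cdf(z)      # what a z-table lists for this z: P(Z <= z)

print(f"y = {y}, z = {z:.2f}, P(Z <= z) = {p_below:.4f}")
```

The CDF value is the share of the distribution that lies below `y`, which for $z = 2.3$ is a little under 99%.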
<span style = "color:orangered;font-size:100%">
Normal Distributions are pretty straightforward, but there can be pitfalls. One key thing to remember is that they require a lot of data. If the sample size is less than 30, you run the risk of outliers drastically affecting your analysis. 
</span><br>
<br>
That is where the Student's T Distribution comes in.
<br>
<br>
