# Probabilistic Modeling

<hr>

## Probability distributions

Models are often built from many possible factors, but sometimes a simple probability-distribution analysis is a quick and easy way to answer the primary question, provided that question does not depend much on other factors.

- **Bernoulli / Binomial**
    - Probability of getting $x$ successes out of $n$ independent identically distributed trials
    - Example: Charity asks for donations from 10% of their mailing list each month
    - For each person:
        - $\text{P(sends donation)} = p$
        - $\text{P(does not send donation)} = 1 - p$
    - Assumptions
        - Donations are the same size
        - $p$ does not change month to month, i.e. $\text{iid}$
    - The # of donations each month is then binomially distributed
    - With large $n$, the distribution is approximately Gaussian

    
- **Geometric**
    - Example questions:
        - How many interviews until first job offer?
        - How many hits until a baseball bat breaks?
        - How many good manufactured units before a defective one?
    - i.e. *How many Bernoulli trials until...*
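    - Probability of having $x$ Bernoulli($p$) failures before the first success
    - Assumes each trial is $iid$
    - Can compare data to the Geometric distribution to test whether the $iid$ assumption holds
        - Example: Hits until a baseball bat breaks
            - If the data fits a Geometric distribution, the hits are consistent with $iid$ Bernoulli trials
            - If not, the hits are likely not $iid$ (e.g. the bat weakens with each hit)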
            
- **Poisson**
    - Good at modeling random arrivals
    - Probability mass function: $f_x (x) = \frac{\lambda^x e^{-\lambda}}{x!}$
    - $\lambda$ - average number of arrivals within a time period
    - Assumes arrivals are independent and identically distributed
    
    
- **Exponential**
    - If arrivals are Poisson($\lambda$) then the time between successive arrivals is exponential($\lambda$) distributed
    - Probability density function: $f_x (x) = \lambda e^{-\lambda x}$
    
    
- **Weibull**
    - Useful for modeling the amount of time required before something fails (inter-failure times)
    - Probability density function: $f_x (x) = \frac{k}{\lambda} (\frac{x}{\lambda})^{k-1} e^{-(\frac{x}{\lambda})^k}$
    - Scale parameter ($\lambda$) and shape parameter ($k$)
    - Example:
        - How many lightswitch flips on/off until bulb fails? (Geometric)
        - Leave the bulb on; How long until bulb fails? (Weibull)
    - Using $k$
        - $k < 1$
            - Modeling when failure rate decreases with time
            - i.e. Worst things fail first and failure rate decreases after
        - $k > 1$
            - Modeling when failure rate increases with time
            - i.e. Things wear out over time, failure rate increases
        - $k = 1$
            - Modeling when failure rate is constant over time
            - When $k = 1$ the Weibull reduces to the exponential distribution, with the exponential rate equal to $\frac{1}{\lambda}$ (one over the Weibull scale parameter)
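
The formulas in the list above can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function names and the mailing-list size of 1000 are illustrative assumptions, not part of the original notes. It also checks two claims from the list numerically: the Binomial is close to a Gaussian for large $n$, and the Weibull with $k = 1$ matches an exponential with rate $\frac{1}{\lambda}$.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(exactly x successes in n iid Bernoulli(p) trials)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def geometric_pmf(x: int, p: float) -> float:
    """P(x failures before the first success), x = 0, 1, 2, ..."""
    return (1 - p) ** x * p


def poisson_pmf(x: int, lam: float) -> float:
    """P(exactly x arrivals in a period that averages lam arrivals)."""
    return lam**x * math.exp(-lam) / math.factorial(x)


def exponential_pdf(x: float, rate: float) -> float:
    """Density of the time between Poisson(rate) arrivals, x >= 0."""
    return rate * math.exp(-rate * x)


def weibull_pdf(x: float, scale: float, shape: float) -> float:
    """Weibull density with scale lambda and shape k, x > 0."""
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z**shape))


# Hypothetical numbers: 1000 people are asked, each donates with p = 0.1.
n, p = 1000, 0.1
prob_100 = binomial_pmf(100, n, p)

# Gaussian density with the same mean and variance, evaluated at 100.
mean, var = n * p, n * p * (1 - p)
gauss_100 = math.exp(-((100 - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
print(f"Binomial P(X=100) = {prob_100:.5f}, Gaussian approx = {gauss_100:.5f}")

# With k = 1, Weibull(scale = 4) should match exponential(rate = 1 / 4).
weibull_value = weibull_pdf(2.0, scale=4.0, shape=1.0)
exponential_value = exponential_pdf(2.0, rate=0.25)
print(weibull_value, exponential_value)
```

The Geometric function counts failures before the first success, which matches the definition used in the list; some references count the total number of trials instead, which shifts $x$ by one.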
            

**Which distribution above is best for data available?**

- Use software/code to fit the data with various distributions and parameters

    - [Finding the best distribution that fits your data using Python's Fitter library](https://medium.com/the-researchers-guide/finding-the-best-distribution-that-fits-your-data-using-pythons-fitter-library-319a5a0972e9)
    - [Fitting empirical data to theoretical distributions with SciPy](https://stackoverflow.com/questions/6620471/fitting-empirical-distribution-to-theoretical-ones-with-scipy-python)


- Use a quantile-quantile (Q-Q) plot to check for goodness of fit between empirical data and theoretical distribution
    - Is the current distribution the same as the distribution a year ago?
    - Is the chosen distribution a good fit for the empirical data?
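
As a rough illustration of the Q-Q idea, the sketch below pairs sorted observations with the matching quantiles of a fitted exponential distribution and summarizes how straight the resulting Q-Q line is with a correlation coefficient. It uses only the standard library and synthetic data; the helper names, the seed, and the choice of the exponential as the candidate distribution are assumptions made for the example. In practice, a library such as SciPy or Fitter (linked above) would normally do the fitting and plotting.

```python
import math
import random


def exponential_quantile(q: float, rate: float) -> float:
    """Inverse CDF of the exponential distribution."""
    return -math.log(1.0 - q) / rate


def qq_points(data: list[float]) -> list[tuple[float, float]]:
    """Pair each sorted observation with the matching exponential quantile.

    The rate is fitted by maximum likelihood (1 / sample mean).
    """
    n = len(data)
    rate = n / sum(data)
    ordered = sorted(data)
    # Plotting positions (i + 0.5) / n avoid the quantiles 0 and 1.
    return [
        (exponential_quantile((i + 0.5) / n, rate), x)
        for i, x in enumerate(ordered)
    ]


def correlation(points: list[tuple[float, float]]) -> float:
    """Pearson correlation of the Q-Q points; near 1 suggests a good fit."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_e = sum(e for _, e in points) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in points)
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    var_e = sum((e - mean_e) ** 2 for _, e in points)
    return cov / math.sqrt(var_t * var_e)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
exp_data = [rng.expovariate(2.0) for _ in range(2000)]    # truly exponential
unif_data = [rng.uniform(0.0, 1.0) for _ in range(2000)]  # not exponential

r_exp = correlation(qq_points(exp_data))
r_unif = correlation(qq_points(unif_data))
print(f"Q-Q correlation, exponential data: {r_exp:.4f}")
print(f"Q-Q correlation, uniform data:     {r_unif:.4f}")
```

Points that fall close to a straight line (correlation near 1) suggest the candidate distribution is plausible; the uniform sample should score visibly lower. The same comparison works between two empirical samples, for example this year's data against last year's.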

****

## Using probability distributions to model behaviour

A classic call-center example would be as follows:

- Calls arrive to the queue based on a probability distribution of inter-arrival times
- A queue holds the calls waiting to be handled
- Some number $C$ of employees answering calls
- Calls finish based on probability distribution of talking time

<img alt="Queue example" src="assets/queue_example.png" width="400">

The arrivals can be modeled as a Poisson process with rate parameter $\lambda$, while call durations can be modeled as exponentially distributed with rate parameter $\mu$.

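For a single employee ($C = 1$) with $\lambda < \mu$, the transition probabilities (while a call is in progress) and the steady-state results are as follows: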

- $\text{P(next event is an arrival)} = \frac{\lambda}{\lambda + \mu}$
- $\text{P(next event is finished call)} = \frac{\mu}{\lambda + \mu}$
- Expected fraction of time employee is busy = $\frac{\lambda}{\mu}$
- Expected customer waiting time before talking to employee = $\frac{\lambda}{\mu(\mu-\lambda)}$
- Expected number of calls waiting in queue = $\frac{\lambda^2}{\mu(\mu - \lambda)}$
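
The expressions above are simple enough to wrap in a small helper. This is a minimal sketch for the single-employee case with $\lambda < \mu$; the function name, the dictionary keys, and the rates of 8 and 10 calls per hour are hypothetical choices for illustration.

```python
def mm1_metrics(arrival_rate: float, service_rate: float) -> dict[str, float]:
    """Evaluate the single-employee queue formulas for given rates."""
    if not 0 < arrival_rate < service_rate:
        # The steady-state formulas only hold when calls are handled
        # faster than they arrive.
        raise ValueError("need 0 < arrival_rate < service_rate")
    lam, mu = arrival_rate, service_rate
    return {
        "p_next_is_arrival": lam / (lam + mu),
        "p_next_is_finished_call": mu / (lam + mu),
        "utilization": lam / mu,
        "mean_wait_in_queue": lam / (mu * (mu - lam)),
        "mean_calls_in_queue": lam**2 / (mu * (mu - lam)),
    }


# Hypothetical rates: 8 calls arrive per hour, 10 can be handled per hour.
metrics = mm1_metrics(arrival_rate=8.0, service_rate=10.0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these rates the employee is busy 80% of the time, a caller waits 0.4 hours on average, and 3.2 calls are waiting on average. The last two numbers are consistent with each other: the mean number waiting equals the arrival rate times the mean wait.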

This relates closely to [queueing theory](https://en.wikipedia.org/wiki/Queueing_theory), and the potential model parameters are represented by [Kendall's notation](https://en.wikipedia.org/wiki/Kendall%27s_notation):

- General arrival distribution, $A$
- General service distribution, $S$
- Number of servers, $c$
- Size of the queue, $K$
- Population size, $N$
- Queue discipline, $D$


****

## Stochastic simulations

- Continuous-time simulations
    - Changes happen continuously
    - Example: Chemical processes, propagation of diseases
    - Often modeled with differential equations


- Discrete-event simulations
    - Changes happen at discrete time points
    - Example: call-center simulations
        - Someone calls or worker finishes talking to someone
    - Valuable when systems have high variability
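
A discrete-event simulation of the call-center example can be sketched in a few lines. The version below assumes one employee, Poisson arrivals, and exponential call durations; the function name, the rates, and the seed are hypothetical. The clock jumps from one event to the next (a call arrives or the employee finishes a call) instead of advancing in fixed steps.

```python
import random
from collections import deque


def simulate_call_center(arrival_rate: float, service_rate: float,
                         n_calls: int, seed: int = 0) -> float:
    """Simulate one employee answering calls in arrival order.

    Returns the average time a caller waits before being answered.
    """
    rng = random.Random(seed)
    idle = float("inf")          # marker: no call currently in progress
    next_arrival = rng.expovariate(arrival_rate)
    next_departure = idle
    waiting = deque()            # arrival times of callers on hold
    total_wait = 0.0
    answered = 0

    while answered < n_calls:
        if next_arrival <= next_departure:
            # Event: someone calls.
            clock = next_arrival
            if next_departure == idle:
                # Employee is free, so the call is answered with zero wait.
                answered += 1
                next_departure = clock + rng.expovariate(service_rate)
            else:
                waiting.append(clock)
            next_arrival = clock + rng.expovariate(arrival_rate)
        else:
            # Event: the employee finishes talking to someone.
            clock = next_departure
            if waiting:
                total_wait += clock - waiting.popleft()
                answered += 1
                next_departure = clock + rng.expovariate(service_rate)
            else:
                next_departure = idle

    return total_wait / answered


# Hypothetical what-if: is a faster employee (12 vs 10 calls/hour) worth it?
wait_base = simulate_call_center(8.0, 10.0, n_calls=200_000)
wait_fast = simulate_call_center(8.0, 12.0, n_calls=200_000)
print(f"mean wait at 10 calls/hour: {wait_base:.3f} hours")
print(f"mean wait at 12 calls/hour: {wait_fast:.3f} hours")
```

For this simple setup the simulated wait can be checked against the closed-form value $\frac{\lambda}{\mu(\mu-\lambda)}$. The value of simulation shows up once the assumptions change (several employees, non-exponential call times, workers of different speeds), where closed-form results are harder to obtain.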
    
    
We can then use the simulations to answer what-if questions:

- Simulated change in throughput with investment in faster machines
- Value of hiring an extra worker
- Where to station baggage tugs at an airport

We can compare simulated options to determine the best course of action. Simulations can be powerful, but they can lead to incorrect answers when important information is missing or wrong. For example, a call-center simulation might assume that all workers answer calls equally quickly, when in reality there is a lot of variability between workers.

<hr>

# Basic code
A `minimal, reproducible example`