# Probability Theory

Probability theory provides the foundation for reasoning about uncertainty. In data science, probability is used to make predictions, model uncertainty, and quantify randomness in datasets or processes.

In this notebook:
1. **Probability Distributions:** Uniform, Normal (Gaussian), Binomial, Poisson, Exponential, etc.
2. **Conditional Probability:** Bayes’ Theorem, Independence, and Conditional Independence
3. **Law of Large Numbers and Central Limit Theorem (CLT)**
4. **Combinatorics:** Permutations and Combinations

*Requirements*

`pip install numpy plotly nbformat`

## Probability Theory Example

In [8]:
import numpy as np

# Simulate 1,000 tosses of a fair coin to illustrate empirical probability.
np.random.seed(999)
coin_flips = np.random.choice(['H', 'T'], size=1000)
probability_heads = np.mean(coin_flips == 'H')
probability_tails = np.mean(coin_flips == 'T')

# The proportion of each outcome is an empirical estimate of its probability (0.5 for a fair coin)
print(f'Probability of Heads: {probability_heads}')
print(f'Probability of Tails: {probability_tails}')

Probability of Heads: 0.495
Probability of Tails: 0.505


## Probability Distributions

### 1. Uniform Distribution

Every outcome in a given range is equally likely. It is used in simulations and random sampling.

In [3]:
import plotly.graph_objects as go

# Simulate rolling a fair six-sided die 10,000 times
die_rolls = np.random.randint(1, 7, size=10000)
unique, counts = np.unique(die_rolls, return_counts=True)

fig = go.Figure(data=[go.Bar(x=unique, y=counts, text=counts, textposition='auto')])
fig.update_layout(
    title='Uniform Distribution: Rolling a Die',
    xaxis_title='Die Face',
    yaxis_title='Frequency'
)
fig.show()
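
As a numerical complement to the bar chart, the sketch below compares the empirical frequency of each face with the theoretical value of 1/6. It uses `np.random.default_rng` with an arbitrary seed rather than the global `np.random` state used in the cell above; that choice, and the variable names, are assumptions made for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility
rolls = rng.integers(1, 7, size=10_000)  # upper bound is exclusive, so faces are 1..6

faces, counts = np.unique(rolls, return_counts=True)
empirical = counts / rolls.size  # relative frequency of each face
theoretical = 1 / 6              # every face is equally likely

for face, freq in zip(faces, empirical):
    print(f"Face {face}: empirical={freq:.4f}, theoretical={theoretical:.4f}")
```

With 10,000 rolls, each empirical frequency should land close to 0.1667, though not exactly on it.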


### 2. Normal (Gaussian) Distribution

The most common distribution in statistics: values cluster symmetrically around the mean, and the standard deviation controls the spread.
It is used to model many natural processes and measurement errors.

In [13]:
mu, sigma = 0, 1
normal_data = np.random.normal(mu, sigma, 10000)

fig = go.Figure(go.Histogram(
    x=normal_data,
    nbinsx=80,
    marker=dict(color='blue'),
    opacity=0.7
))
fig.update_layout(
    title='Normal Distribution Example',
    xaxis_title='Value',
    yaxis_title='Frequency'
)
fig.show()
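
One way to make "clusters around a mean" concrete is the 68-95-99.7 rule: for a normal distribution, roughly 68.3%, 95.4%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean. The sketch below checks this empirically; the seed and sample size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
mu, sigma = 0, 1
samples = rng.normal(mu, sigma, 100_000)

# Share of samples within k standard deviations of the mean
within = {k: np.mean(np.abs(samples - mu) <= k * sigma) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"Within {k} sigma: {share:.4f}")
```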

### 3. Binomial Distribution

Models the number of successes in `n` independent trials, where each trial has the same probability `p` of success.

- $ P(X = x) = {n \choose x} p^{x} q^{n-x} $, where $ q = 1 - p $ is the probability of failure and $ x $ is the number of successes
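
The formula above can be evaluated directly with `math.comb`. The helper name `binomial_pmf` is hypothetical, and `q` is taken to be `1 - p`, the probability of failure on a single trial.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p  # probability of failure on a single trial
    return math.comb(n, x) * p**x * q**(n - x)

n, p = 10, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(pmf[5])    # 252 / 1024 = 0.24609375
print(sum(pmf))  # the probabilities over x = 0..n sum to 1
```

For `n = 10` and `p = 0.5`, the most likely outcome is 5 successes, which occurs a little under a quarter of the time.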

In [74]:
'''
For this example we will simulate:
    - 10 independent trials per experiment (n)
    - With a probability of success of 50% (p) <- Change this value and watch the distribution shift :)
    - A total of 100 experiments
    - Each result is the number of successes (0 to 10) in one experiment
'''

n, p = 10, 0.5
binom_data = np.random.binomial(n, p, 100)

fig = go.Figure(go.Histogram(
    x=binom_data,
    nbinsx=20,
    marker=dict(color='green'),
    opacity=0.7
))
fig.update_layout(
    title='Binomial Distribution Example',
    xaxis_title='Value',
    yaxis_title='Frequency'
)
fig.show()

### 4. Poisson Distribution

Describes the number of events happening in a fixed interval of time or space, given a known average rate.

In [79]:
poisson_data = np.random.poisson(15, 10000) # Lambda = 15 (average number of events per interval)

fig = go.Figure(go.Histogram(
    x=poisson_data,
    nbinsx=20,
    marker=dict(color='purple'),
    opacity=0.7
))
fig.update_layout(
    title='Poisson Distribution Example',
    xaxis_title='Value',
    yaxis_title='Frequency'
)
fig.show()
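
The standard Poisson probability mass function is $ P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!} $, and both the mean and the variance equal $ \lambda $. The sketch below checks these properties against simulated data; the helper `poisson_pmf`, the seed, and the sample size are assumptions for illustration.

```python
import math
import numpy as np

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 15
rng = np.random.default_rng(1)  # arbitrary seed, only for reproducibility
samples = rng.poisson(lam, 100_000)

# For a Poisson distribution, mean and variance are both equal to lambda
print(f"Sample mean: {samples.mean():.3f}, sample variance: {samples.var():.3f}")

# Compare the empirical frequency of k = 15 with the formula
empirical_p15 = np.mean(samples == 15)
print(f"P(X = 15): empirical={empirical_p15:.4f}, formula={poisson_pmf(15, lam):.4f}")
```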

### 5. Exponential Distribution

Used to model the time between events in a Poisson process. 
Often used in survival analysis and queuing theory.

In [94]:
exp_data = np.random.exponential(scale=1, size=10000)

fig = go.Figure(go.Histogram(
    x=exp_data,
    nbinsx=60,
    marker=dict(color='red'),
    opacity=0.7
))
fig.update_layout(
    title='Exponential Distribution',
    xaxis_title='Value',
    yaxis_title='Frequency'
)
fig.show()
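
In NumPy, the `scale` parameter is the mean waiting time (the inverse of the event rate), and the probability of waiting longer than `t` is $ e^{-t/\text{scale}} $. The sketch below checks both facts on simulated data; the seed, the sample size, and the threshold `t = 2.0` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed, only for reproducibility
scale = 1.0  # mean waiting time; the event rate is 1 / scale
samples = rng.exponential(scale=scale, size=100_000)

print(f"Sample mean: {samples.mean():.4f} (expected {scale})")

# Probability that the waiting time exceeds t
t = 2.0
empirical_tail = np.mean(samples > t)
theoretical_tail = np.exp(-t / scale)
print(f"P(X > {t}): empirical={empirical_tail:.4f}, theoretical={theoretical_tail:.4f}")
```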

## Conditional Probability

Conditional probability is the likelihood of an event occurring, given that another event has occurred.

### Bayes' Theorem

Bayes' theorem relates the conditional and marginal probabilities of events. It's fundamental for classification tasks in data science, especially in **Bayesian inference** and **Naive Bayes classifiers**.

$ P(A\mid B)=\frac {P(B\mid A) \cdot P(A)}{P(B)} $
 
- $ A, B $ = events
- $ P(A \mid B) $ = probability of A given that B is true
- $ P(B \mid A) $ = probability of B given that A is true
- $ P(A), P(B) $ = the marginal probabilities of A and B, with $ P(B) \neq 0 $




### Example of Bayes' Theorem
Let’s say we have data about emails, and we want to classify whether an email is spam based on the presence of certain words.

$ P(\text{Spam} \mid \text{Word}) = \frac {P(\text{Word} \mid \text{Spam}) \cdot P(\text{Spam})}{P(\text{Word})} $

In [95]:
# Given
P_spam = 0.2   # Prior probability of spam
P_word_given_spam = 0.8   # Likelihood of the word given spam
P_word = 0.3   # Probability of the word appearing in any email

# Bayes' Theorem to find P(spam | word)
P_spam_given_word = (P_word_given_spam * P_spam) / P_word
P_spam_given_word


0.5333333333333334
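
In practice, `P_word` is often not given directly and is derived with the law of total probability: $ P(W) = P(W \mid S) P(S) + P(W \mid \neg S) P(\neg S) $. The sketch below reproduces the result above that way. The value `P_word_given_ham = 0.175` is not part of the original example; it is an assumed figure, chosen because it is the value that makes `P_word` come out to 0.3.

```python
P_spam = 0.2               # prior probability of spam
P_word_given_spam = 0.8    # likelihood of the word given spam
P_word_given_ham = 0.175   # ASSUMED likelihood of the word in non-spam email

# Law of total probability: sum over the spam and non-spam cases
P_word = P_word_given_spam * P_spam + P_word_given_ham * (1 - P_spam)

# Bayes' theorem with the derived denominator
P_spam_given_word = P_word_given_spam * P_spam / P_word

print(f"P(word) = {P_word:.3f}")
print(f"P(spam | word) = {P_spam_given_word:.4f}")
```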

#### Independence and Conditional Independence:
- **Independence**: Two events are independent if the occurrence of one does not affect the probability of the other.
- **Conditional Independence**: Two events are conditionally independent given a third event if, when the third event is known, the occurrence of one event does not affect the probability of the other.
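
Independence can be checked numerically: events A and B are independent exactly when $ P(A \cap B) = P(A) P(B) $. The sketch below simulates two dice and contrasts an independent pair of events with a dependent one. The events, seed, and sample size are illustrative assumptions, and the example covers plain independence only, not conditional independence.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed, only for reproducibility
die1 = rng.integers(1, 7, size=200_000)
die2 = rng.integers(1, 7, size=200_000)

A = die1 % 2 == 0        # first die is even
B = die2 > 4             # second die shows 5 or 6
C = (die1 + die2) >= 10  # the sum is at least 10

p_a, p_b, p_c = A.mean(), B.mean(), C.mean()
p_ab = (A & B).mean()
p_ac = (A & C).mean()

# A and B involve different dice, so the product rule should hold
print(f"P(A and B)={p_ab:.4f} vs P(A)P(B)={p_a * p_b:.4f}")
# C depends on the first die, so the product rule should fail
print(f"P(A and C)={p_ac:.4f} vs P(A)P(C)={p_a * p_c:.4f}")
```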

## Law of Large Numbers

As the number of independent trials increases, the sample mean converges to the population mean (the expected value).

In [111]:
import numpy as np
import plotly.graph_objects as go
import plotly.io as pio

# 0 for Tails, 1 for Heads
coin_flips = np.random.choice([0, 1], size=10000)
cumulative_mean = np.cumsum(coin_flips) / np.arange(1, 10001)

fig = go.Figure()
fig.add_trace(go.Scatter(x=np.arange(1, 10001), 
                         y=cumulative_mean, 
                         mode='lines', 
                         name='Cumulative Mean'))

fig.add_shape(type='line',
              x0=0, x1=10000, y0=0.5, y1=0.5,
              line=dict(color='red', dash='dash'),
              name="Expected Probability")

fig.update_layout(title="Law of Large Numbers Example",
                  xaxis_title="Number of Flips",
                  yaxis_title="Cumulative Mean of Heads",
                  showlegend=False)

pio.show(fig)
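
The same convergence can be read off numerically by printing the running mean at a few checkpoints. This is a minimal sketch with an arbitrary seed and checkpoint list; the error tends to shrink as `n` grows, but not necessarily at every step.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed, only for reproducibility
flips = rng.integers(0, 2, size=100_000)  # 0 = Tails, 1 = Heads
running_mean = np.cumsum(flips) / np.arange(1, flips.size + 1)

# Inspect the running mean after n flips for a few values of n
for n in (10, 100, 1_000, 10_000, 100_000):
    mean_n = running_mean[n - 1]
    print(f"n={n:>6}: mean={mean_n:.4f}, |error|={abs(mean_n - 0.5):.4f}")
```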

## Central Limit Theorem (CLT)

No matter the distribution of the population (provided it has a finite variance), the distribution of the sample mean approaches a normal distribution as the sample size grows.

In [115]:
sample_means = [
    np.mean(np.random.exponential(scale=1, size=10000)) 
    for _ in range(10000)
]
fig = go.Figure(go.Histogram(
        x=sample_means, 
        nbinsx=50,
        marker=dict(color='green'),
        opacity=0.7
        ))
fig.update_layout(
    title='Central Limit Theorem Example',
    xaxis_title='Value',
    yaxis_title='Frequency'
)
fig.show()
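
The CLT also predicts how wide the histogram of sample means should be: its standard deviation is approximately $ \sigma / \sqrt{n} $, where $ \sigma $ is the population standard deviation and $ n $ the sample size. The sketch below checks this for exponential data. The sample size of 50 and the 20,000 repetitions are assumptions chosen to keep the run fast, and differ from the cell above.

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed, only for reproducibility
scale = 1.0         # for the exponential, mean and standard deviation both equal scale
sample_size = 50    # observations per sample (illustrative choice)
n_samples = 20_000  # number of repeated samples (illustrative choice)

# Each row is one sample; take the mean of every row
draws = rng.exponential(scale=scale, size=(n_samples, sample_size))
sample_means = draws.mean(axis=1)

predicted_se = scale / np.sqrt(sample_size)
print(f"Mean of sample means: {sample_means.mean():.4f} (population mean = {scale})")
print(f"Std of sample means:  {sample_means.std(ddof=1):.4f} (sigma / sqrt(n) = {predicted_se:.4f})")
```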

## Combinatorics: Permutations and Combinations

### 1. Permutations:

The number of ways to arrange `r` objects chosen from `n` distinct objects, where order matters.

$ _{n} P_{r}=\frac{n !}{(n-r) !} $

- $ _{n} P_{r} $ = number of permutations
- $ n $ = total number of objects
- $ r $ = number of objects selected


In [125]:
import math

n = 5 # Total Objects
r = 3 # Number of objects to arrange

permutations = math.perm(n, r)
permutations

60
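
To see what is being counted, the arrangements can be enumerated with `itertools.permutations` and compared with the factorial formula. The item labels below are placeholders chosen for illustration.

```python
import itertools
import math

items = ['A', 'B', 'C', 'D', 'E']  # placeholder labels for n = 5 objects
r = 3

# Every ordered arrangement of r distinct items
arrangements = list(itertools.permutations(items, r))
print(len(arrangements))  # 60

# The same count from the formula n! / (n - r)!
n = len(items)
print(math.factorial(n) // math.factorial(n - r))  # 60

# ('A', 'B', 'C') and ('C', 'B', 'A') are counted separately because order matters
print(('A', 'B', 'C') in arrangements, ('C', 'B', 'A') in arrangements)  # True True
```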

### 2. Combinations:

The number of ways to select `r` objects from `n` objects where order does **not** matter.

$ _n C_r=\frac{n !}{r ! (n-r) !} $

- $ _n C_r $ = number of combinations
- $ n $ = total number of objects in the set
- $ r $ = number of objects chosen from the set


In [127]:
n = 5 # Total Objects
r = 3 # Number of objects to choose

combinations = math.comb(n, r)
combinations

10
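
Each selection of `r` objects can be ordered in `r!` ways, which gives the relationship $ _n P_r = {}_n C_r \cdot r! $. The sketch below enumerates the selections with `itertools.combinations` and checks that relationship; the item labels are placeholders chosen for illustration.

```python
import itertools
import math

items = ['A', 'B', 'C', 'D', 'E']  # placeholder labels for n = 5 objects
r = 3

# Every unordered selection of r distinct items
selections = list(itertools.combinations(items, r))
print(len(selections))  # 10

# Each selection can be arranged in r! orders, linking the two counts
n = len(items)
print(math.perm(n, r) == math.comb(n, r) * math.factorial(r))  # True
```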

## Summary and Examples

| **Concept**                    | **When to Use**                                                | **Real-World Example**                                |
|---------------------------------|---------------------------------------------------------------|-------------------------------------------------------|
| **Probability Theory**          | Quantifying uncertainty and randomness                        | Predicting if an email is spam                        |
| **Uniform Distribution**        | All outcomes have equal likelihood                            | Lottery draw                                          |
| **Normal Distribution**         | Data clusters around a mean with symmetric tails              | People's height in a population                       |
| **Binomial Distribution**       | Counting successes in a fixed number of independent trials     | Clicks out of a fixed number of ad impressions        |
| **Poisson Distribution**        | Counting events in a fixed interval, random timing            | Number of customers arriving at a store per hour      |
| **Exponential Distribution**    | Modeling the time between random events                       | Time between server failures                          |
| **Conditional Probability**     | Updating probability based on new evidence                    | Medical diagnosis                                     |
| **Law of Large Numbers**        | Sample mean converges to population mean with large data      | Insurance companies setting premiums                  |
| **Central Limit Theorem**       | Distribution of sample means approaches normal distribution    | A/B testing for online ads                            |
| **Permutations**                | Order matters in arrangement                                  | Generating passwords                                  |
| **Combinations**                | Order does not matter in selection                            | Choosing a team from a group of employees             |
