### 1. The belief-update story (before code)
* Prior: how common the disease is before testing
* Likelihood: how reliable the test is
* Evidence: what result you observed
* Posterior: updated belief after seeing the result

Bayes answers one question:

> “Given what I just saw, what should I believe now?”
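
In symbols, the four pieces above combine into Bayes' rule. The letters are notation chosen here for illustration: A is the hypothesis (the person has the disease) and B is the evidence (the test result).

$$
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)},
\qquad
P(B) = P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)
$$

Here P(A) is the prior, P(B | A) the likelihood, P(B) the evidence, and P(A | B) the posterior.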

### 2. Simple disease test example (single update)

Assume:
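* Disease prevalence = 10% (the prior)
* Test sensitivity = 95% (probability of a positive result when the disease is present)
* Test specificity = 95% (probability of a negative result when the disease is absent, so the false-positive rate is 5%)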

In [3]:
# prior
P_disease = 0.1
P_no_disease = 1 -  P_disease

# test accuracy
P_positive_given_disease = 0.95
P_positive_given_no_disease = 0.05

# total probability of positive test
P_positive =(
    P_positive_given_disease*P_disease + 
    P_positive_given_no_disease*P_no_disease
)

# Bayes update
P_disease_given_positive =(
    P_positive_given_disease*P_disease
    )/P_positive

print("P(Disease|Positive test):",P_disease_given_positive)

P(Disease|Positive test): 0.6785714285714285


##### Key takeaway
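* Even with an accurate test, one positive result is not conclusive: here the posterior is about 68%, because healthy people who test positive make up a sizeable share of all positives.
* The rarer the disease, the lower this posterior becomes. This surprises many people and is why Bayes is essential.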
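
To see how strongly the result depends on the prior, the sketch below wraps the same calculation in a function and sweeps the prevalence. The function name and the list of prevalences are assumptions made for illustration; the 0.95 sensitivity and 0.05 false-positive rate are the values used in the code cell above.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) from Bayes' rule."""
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive

# Same test, different priors (hypothetical prevalences).
for prevalence in (0.001, 0.01, 0.1, 0.5):
    post = posterior_given_positive(prevalence, 0.95, 0.05)
    print(f"prevalence={prevalence}: P(disease | positive) = {post:.3f}")
```

The test itself never changes in this loop; only the prior does, and the posterior moves with it.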

### 3. Visual intuition using counts (same logic, concrete)

> Imagine testing 10,000 people.

In [9]:
population = 10_000
diseased = int(population*P_disease)
healthy = population -diseased

true_positives = int(diseased* P_positive_given_disease)
false_positives = int(healthy*P_positive_given_no_disease)

print("True positives:",true_positives)
print("false positives:",false_positives)

posterior = true_positives/(true_positives+ false_positives)
print("Posterior probability:",posterior)

True positives: 950
false positives: 450
Posterior probability: 0.6785714285714286


#### Why this helps
* You can see exactly how many positives are false alarms: 450 of the 1,400 positive results
* Bayes is just counting in disguise
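
To check the "counting in disguise" claim, the sketch below computes the posterior twice, once from the formula and once from head counts, using exact fractions so that rounding cannot hide a difference. The function names are made up for this illustration; the numbers are the ones used in the code above.

```python
from fractions import Fraction

def posterior_from_formula(prior, sens, fpr):
    """Bayes' rule written with probabilities."""
    return sens * prior / (sens * prior + fpr * (1 - prior))

def posterior_from_counts(population, prior, sens, fpr):
    """The same quantity written with head counts."""
    diseased = population * prior
    healthy = population - diseased
    true_positives = diseased * sens
    false_positives = healthy * fpr
    return true_positives / (true_positives + false_positives)

prior, sens, fpr = Fraction(1, 10), Fraction(95, 100), Fraction(5, 100)
by_formula = posterior_from_formula(prior, sens, fpr)
by_counts = posterior_from_counts(10_000, prior, sens, fpr)
print(by_formula, by_counts, float(by_formula))
```

The population size cancels out of the ratio, which is why any imagined population gives the same posterior.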

### 4. Updating belief twice (sequential Bayes)

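What if the test is repeated, and the second result is independent of the first given the person's true disease status? Then the posterior from the first test becomes the prior for the second.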

In [12]:
# second positive test
prior = P_disease_given_positive

P_positive_again = (
    P_positive_given_disease*prior +
    P_positive_given_no_disease*(1-prior)
)

posterior_after_two = (
    P_positive_given_disease*prior
)/P_positive_again

print("After two positive tests:",posterior_after_two)

After two positive tests: 0.9756756756756756


#### ML intuition
* Bayesian models update beliefs incrementally
* Evidence compounds over time
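
The incremental update can be written as a loop. This is a minimal sketch under the same assumption of conditionally independent test results; the helper name `update` and the choice of three tests are illustrative, while 0.1, 0.95 and 0.05 are the values from the code above. It also compares the step-by-step result with a single batch update that uses all results at once.

```python
def update(prior, p_pos_given_disease, p_pos_given_healthy):
    """One Bayes update after a positive result."""
    numerator = p_pos_given_disease * prior
    return numerator / (numerator + p_pos_given_healthy * (1 - prior))

belief = 0.1  # starting prior
for n in range(1, 4):
    belief = update(belief, 0.95, 0.05)
    print(f"after {n} positive test(s): {belief:.4f}")

# Batch version: multiply the likelihoods of all three results first.
batch = 0.1 * 0.95**3 / (0.1 * 0.95**3 + 0.9 * 0.05**3)
print(f"batch update with three positives: {batch:.4f}")
```

Updating one result at a time and updating on all results together give the same posterior, up to floating-point rounding.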

### 5. Connecting to Naive Bayes (ML view)
In Naive Bayes:
* Prior → class frequency
* Likelihood → feature probabilities
* Posterior → predicted class probability
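
Written out, with c for a class and x_1, ..., x_n for the observed features (notation assumed here for illustration), the Naive Bayes posterior is:

$$
P(c \mid x_1, \dots, x_n) =
\frac{P(c)\,\prod_{i=1}^{n} P(x_i \mid c)}
     {\sum_{c'} P(c')\,\prod_{i=1}^{n} P(x_i \mid c')}
$$

The product over features is where the "naive" conditional-independence assumption enters; the denominator only rescales the scores so they sum to 1.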

Toy spam example

In [14]:
# prior
P_spam = 0.3
P_not_spam = 0.7

# likelihoods
P_free_given_spam = 0.8
P_free_given_not_spam = 0.1

# evidence 
P_free = (
    P_free_given_spam*P_spam+
    P_free_given_not_spam* P_not_spam
)

P_spam_given_free = (
    P_free_given_spam*P_spam
)/ P_free

print("P(spam|'free'):",P_spam_given_free)

P(spam|'free'): 0.7741935483870968


This is a Naive Bayes classifier with a single feature.
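
The same update can be read in odds form: posterior odds equal prior odds times the likelihood ratio. A small sketch using the spam numbers above; the helper names `to_odds` and `to_prob` are made up for this example.

```python
def to_odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1 + odds)

prior_odds = to_odds(0.3)        # spam vs. not spam before reading the email
likelihood_ratio = 0.8 / 0.1     # "free" is 8 times more likely in spam
posterior_odds = prior_odds * likelihood_ratio

print("P(spam|'free'):", to_prob(posterior_odds))
```

The likelihood ratio summarizes how much one word shifts the belief, independent of the prior.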

### 6. Advanced example: multiple features (Naive assumption)

Assume features are conditionally independent.

In [15]:
#  more evidence 
P_offer_given_spam = 0.7
P_offer_given_not_spam = 0.05

# unnormalized posteriors
spam_score  = (
    P_spam*
    P_free_given_spam*
    P_offer_given_spam
)

not_spam_score = (
    P_not_spam*
    P_free_given_not_spam*
    P_offer_given_not_spam
)

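# normalize (use a new name so the prior P_spam is not overwritten)
total = spam_score + not_spam_score
P_spam_given_free_and_offer = spam_score / total

print("P(Spam|'free' and 'offer'):", P_spam_given_free_and_offer)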

P(Spam|'free' and 'offer'): 0.9795918367346939


#### Why this works
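* The independence assumption is often false: in real email, words such as "free" and "offer" tend to appear together, so the combined probability can be overconfident
* The predicted class is often still correct, because classification only needs the right class to receive the highest score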
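
With many features, multiplying small probabilities can underflow to zero in floating point. A common workaround, sketched here as an assumption about how one might scale this example up (the cell above multiplies directly), is to add log-probabilities and normalize with the log-sum-exp trick. The dictionary layout and variable names are illustrative; the probabilities are the ones from the spam example.

```python
import math

class_priors = {"spam": 0.3, "not_spam": 0.7}
word_likelihoods = {
    "spam": {"free": 0.8, "offer": 0.7},
    "not_spam": {"free": 0.1, "offer": 0.05},
}
observed_words = ["free", "offer"]

# Sum of logs instead of product of probabilities.
log_scores = {
    c: math.log(class_priors[c])
    + sum(math.log(word_likelihoods[c][w]) for w in observed_words)
    for c in class_priors
}

# Log-sum-exp: subtract the maximum before exponentiating for stability.
max_log = max(log_scores.values())
log_total = max_log + math.log(sum(math.exp(s - max_log) for s in log_scores.values()))
class_posteriors = {c: math.exp(s - log_total) for c, s in log_scores.items()}

print(class_posteriors)
```

For two features the result matches the direct product; the log form only starts to matter when there are many more factors.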

### 7. Monte Carlo check (learning probabilities)

In [16]:
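import numpy as np
N = 100_000

# true disease status, drawn with the prior from section 2
disease = np.random.rand(N) < P_disease

# test result: diseased people test positive at the sensitivity,
# healthy people at the false-positive rate
test_positive = np.where(
    disease,
    np.random.rand(N) < P_positive_given_disease,
    np.random.rand(N) < P_positive_given_no_disease
)

# share of positive tests that belong to diseased people
posterior_estimate = disease[test_positive].mean()
print("Estimated P(Disease|Positive):",posterior_estimate)

The printed estimate changes from run to run, but with this sample size it should land close to the exact value of about 0.679 from section 2.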


This mirrors how ML learns probabilities from data: as relative frequencies in observed samples.
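
A simulation estimate gets closer to the exact answer as the sample grows. The sketch below uses only the standard library and a fixed seed so that reruns give the same numbers. The function name, the seed and the sample sizes are assumptions for illustration; the default parameters are the values from the code in section 2.

```python
import random

def estimate_posterior(n, prior=0.1, sens=0.95, fpr=0.05, seed=0):
    """Estimate P(disease | positive) from n simulated people."""
    rng = random.Random(seed)
    positives = 0
    true_positives = 0
    for _ in range(n):
        diseased = rng.random() < prior
        positive = rng.random() < (sens if diseased else fpr)
        if positive:
            positives += 1
            true_positives += diseased  # True counts as 1
    return true_positives / positives

exact = 0.95 * 0.1 / (0.95 * 0.1 + 0.05 * 0.9)
for n in (1_000, 20_000, 200_000):
    print(n, round(estimate_posterior(n), 4), "exact:", round(exact, 4))
```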

### 8. Why Bayes matters in ML
* Handles uncertainty explicitly
* Explains model confidence
* Separates prior belief from evidence
* Foundation of probabilistic ML

Many modern classifiers output something that can be read as a posterior probability over classes, even when Bayes' rule is never mentioned explicitly.

#### Mental model to remember
* Prior → belief before seeing data
* Likelihood → how probable the data is under each hypothesis
* Evidence → how common the observation is
* Posterior → belief after seeing data

Bayes is not a formula.
It is rational belief updating.

# Bayes’ rule intuition with Iris data samples

### 1. Set the scene in plain terms

In the Iris dataset:
* Hypothesis (A) → the flower species (setosa, versicolor, virginica)
* Evidence (B) → observed features (petal length, petal width, etc.)

Bayes answers:
> “Given what I measured, how likely is each species?”

### 2. Load Iris data

In [20]:
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

df = pd.DataFrame(iris.data,columns = iris.feature_names)
df["species"] = iris.target
df["species"] = df["species"].map(dict(enumerate(iris.target_names)))

Each row is one observed flower.

### 3. Prior probabilities (before seeing any evidence)

This is P(species).

In [24]:
priors = df["species"].value_counts(normalize=True)
priors

species
setosa        0.333333
versicolor    0.333333
virginica     0.333333
Name: proportion, dtype: float64

#### Interpretation
* All species are equally common in this dataset (50 flowers each)
* Prior belief is uniform
* No bias before observing features

### 4. Turn continuous features into events (important step)

The discrete form of Bayes' rule used here works with events, so we convert continuous measurements into simple yes/no evidence.

Example:
Event → petal length is small (below 2.5 cm)

In [25]:
df["petal_length_small"] = df["petal length (cm)"]<2.5

### 5. Likelihood: P(evidence | species)

This answers:

> “If the flower is setosa, how likely is a small petal?”

In [27]:
likelihood = (
    df.groupby("species")["petal_length_small"].mean()
)

print(likelihood)

species
setosa        1.0
versicolor    0.0
virginica     0.0
Name: petal_length_small, dtype: float64


##### What this means
* In this dataset, every setosa has a small petal (likelihood 1.0)
* No versicolor or virginica does (likelihood 0.0)

This is P(B | A).

### 6. Evidence probability: P(evidence)

In [28]:
P_evidence = df["petal_length_small"].mean()
print("P(small petal):",P_evidence)

P(small petal): 0.3333333333333333


This is how common the observation is overall.
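
The same number also follows from the law of total probability: weight each species' likelihood by its prior and add them up. A plain-Python sketch using the values printed above; the dictionary names are chosen here so they do not clash with the notebook's variables.

```python
prior_by_species = {"setosa": 1 / 3, "versicolor": 1 / 3, "virginica": 1 / 3}
likelihood_small_petal = {"setosa": 1.0, "versicolor": 0.0, "virginica": 0.0}

# P(small petal) = sum over species of P(small petal | species) * P(species)
p_small_petal = sum(
    likelihood_small_petal[s] * prior_by_species[s] for s in prior_by_species
)
print("P(small petal):", p_small_petal)
```

Dividing by this value is what makes the posteriors over the three species sum to 1.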

### 7. Bayes update: posterior belief

We compute P(species | small petal).

In [29]:
posterior = (likelihood*priors)/P_evidence
print(posterior)

species
setosa        1.0
versicolor    0.0
virginica     0.0
dtype: float64


In this dataset, a small petal identifies setosa with certainty.

### 8. The same result by counting

A contingency table of species against the event shows where the posterior comes from.

In [31]:
ct = pd.crosstab(df["species"],df["petal_length_small"])
print(ct)

petal_length_small  False  True 
species                         
setosa                  0     50
versicolor             50      0
virginica              50      0


##### We are literally counting worlds:
* Rows = hypotheses
* Columns = evidence
* Bayes is normalized counting
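
"Normalized counting" can be made literal: take one column of the table and divide by its total. The sketch below copies the counts from the table above into plain dictionaries; the names `normalize`, `counts_small` and `counts_not_small` are made up for this example.

```python
# Columns of the contingency table above.
counts_small = {"setosa": 50, "versicolor": 0, "virginica": 0}       # True column
counts_not_small = {"setosa": 0, "versicolor": 50, "virginica": 50}  # False column

def normalize(column):
    """Turn a column of counts into P(species | evidence)."""
    total = sum(column.values())
    return {species: n / total for species, n in column.items()}

print("given small petal:    ", normalize(counts_small))
print("given not small petal:", normalize(counts_not_small))
```

The second line shows the less clear-cut case: a petal that is not small rules out setosa but leaves versicolor and virginica equally likely.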

### 9. Multiple evidence (Naive Bayes intuition)

Add another observation: petal width.

In [33]:
df["petal_width_small"] = df["petal width (cm)"]<0.8
# Compute likelihoods:
likelihood_pl =(
    df.groupby("species")["petal_length_small"].mean()
)

likelihood_pw =(
    df.groupby("species")["petal_width_small"].mean()
)
# Naive Bayes assumes conditional independence:

unnormalized = (
    priors*
    likelihood_pl*
    likelihood_pw
)

posterior_multi = unnormalized/unnormalized.sum()
print(posterior_multi)

species
setosa        1.0
versicolor    0.0
virginica     0.0
dtype: float64


#### Key insight

Even with a crude independence assumption, the posterior becomes extremely confident.
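
Likelihoods of exactly 0.0 make this confidence absolute: no amount of other evidence could ever bring versicolor back. A common remedy, not used in this notebook and shown here only as a hedged sketch, is add-one (Laplace) smoothing of the counts. The function name and `alpha` parameter are assumptions; the counts are the 50/0/0 from the small-petal table.

```python
def smoothed_likelihood(event_count, class_count, alpha=1.0):
    """P(event | class) for a yes/no event with add-alpha smoothing."""
    return (event_count + alpha) / (class_count + 2 * alpha)

small_petal_counts = {"setosa": 50, "versicolor": 0, "virginica": 0}

# Uniform prior of 1/3 times the smoothed likelihood, then normalize.
smoothed_scores = {
    species: (1 / 3) * smoothed_likelihood(count, 50)
    for species, count in small_petal_counts.items()
}
total_score = sum(smoothed_scores.values())
smoothed_posterior = {s: v / total_score for s, v in smoothed_scores.items()}

print(smoothed_posterior)
```

Setosa still wins clearly, but the other species keep a small non-zero probability.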

### 10. One-sample classification (real ML thinking)

Suppose we observe a flower:

In [34]:
sample = {
    "petal_length_small": True,
    "petal_width_small": True
}
# Posterior belief:

scores = priors.copy()

scores *= likelihood_pl if sample["petal_length_small"] else (1 - likelihood_pl)
scores *= likelihood_pw if sample["petal_width_small"] else (1 - likelihood_pw)

posterior = scores / scores.sum()
print(posterior)


species
setosa        1.0
versicolor    0.0
virginica     0.0
Name: proportion, dtype: float64


This is how a Naive Bayes classifier predicts: multiply the prior by each feature's likelihood, then normalize.
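
Thresholding each measurement into a yes/no event throws away detail. scikit-learn's `GaussianNB` applies the same prior-times-likelihood logic but models each continuous feature with a per-class normal distribution. A minimal sketch; the measurements in `flower` are a made-up example, not a row quoted from the dataset.

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris_data = load_iris()
model = GaussianNB().fit(iris_data.data, iris_data.target)

# Hypothetical flower: sepal length, sepal width, petal length, petal width (cm)
flower = [[5.0, 3.4, 1.5, 0.2]]

probabilities = model.predict_proba(flower)[0]
predicted = iris_data.target_names[model.predict(flower)[0]]

for name, p in zip(iris_data.target_names, probabilities):
    print(f"P({name} | measurements) = {p:.3f}")
print("Predicted species:", predicted)
```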

### 11. Why this works so well on Iris

* Features are highly informative
* Conditional probabilities are sharp
* Classes separate cleanly

That is why Iris is a textbook dataset.

### 12. How this connects to ML models

* Naive Bayes explicitly applies Bayes’ rule
* Logistic regression models the posterior directly
* Neural networks approximate complex posteriors
* Decision trees split to maximize conditional purity

All are different ways of answering:

> “Given this evidence, what should I believe?”
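
One concrete link between the first two views: the Naive Bayes posterior from the spam example can be rewritten as a sigmoid applied to a sum of log terms, which is the functional form that logistic regression fits directly. The sketch reuses the spam numbers from earlier; the names `sigmoid`, `bias`, `weight_free` and `weight_offer` are chosen here for illustration.

```python
import math

def sigmoid(z):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

bias = math.log(0.3 / 0.7)            # log prior odds of spam
weight_free = math.log(0.8 / 0.1)     # log likelihood ratio for "free"
weight_offer = math.log(0.7 / 0.05)   # log likelihood ratio for "offer"

# Both words are present, so both weights are added.
log_odds = bias + weight_free + weight_offer
print("P(Spam|'free' and 'offer'):", sigmoid(log_odds))
```

Naive Bayes derives the weights from counted probabilities, whereas logistic regression would learn them from data without the independence assumption.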

#### Final intuition to lock in

* Prior → what you believe before data
* Likelihood → how probable the evidence is under each hypothesis
* Posterior → belief after seeing data

> Bayes is not about numbers.
It is about rational belief revision.