### 1. Core idea in one line

> P(A | B) means: “Once I know B happened, how likely is A?”

In ML, this is how evidence updates belief.
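
Written as a formula, the standard definition (valid whenever P(B) > 0) is:

$$
P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0
$$

The numerator is a joint probability and the denominator is a marginal probability, which is why the following sections compute those two quantities first.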

### 2. Joint, marginal, and conditional (small table example)
##### Example: Medical test
|Disease|Test Positive|Count|
|----|----|----|
|Yes|Yes|40|
|Yes|No|10|
|No|Yes|20|
|No|No|130|

In [7]:
import pandas as pd
data = {
    "Disease": ["Yes","Yes","No","No"],
    "Test":["Positive","Negative","Positive","Negative"],
    "Count":[40,10,20,130]
}

df = pd.DataFrame(data)
df

| |Disease|Test|Count|
|----|----|----|----|
|0|Yes|Positive|40|
|1|Yes|Negative|10|
|2|No|Positive|20|
|3|No|Negative|130|


### 3. Joint probability

Joint probability = probability that both events happen together

In [16]:
total = df["Count"].sum()

P_disease_and_positive = df.loc[
    (df["Disease"] == "Yes") & (df["Test"] == "Positive"),
    "Count"].values[0] / total

print("P(Disease AND Positive):", P_disease_and_positive)


P(Disease AND Positive): 0.2


### 4. Marginal probability

Marginal probability = probability of one event regardless of the other variable, obtained by summing over it.

In [29]:
P_positive = df[df["Test"] == "Positive"]["Count"].sum() / total
P_disease = df[df["Disease"] == "Yes"]["Count"].sum() / total

print("P(Positive):", P_positive)
print("P(Disease):", P_disease)

P(Positive): 0.3
P(Disease): 0.25


### 5. Conditional probability (this is the key ML step)

In [30]:
# P(A | B) = P(A AND B) / P(B)
P_disease_given_positive = P_disease_and_positive / P_positive
print("P(Disease|Positive):", P_disease_given_positive)

P(Disease|Positive): 0.6666666666666667


#### Interpretation
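* Even after a positive test, disease is not guaranteed: P(Disease | Positive) ≈ 0.67, not 1
* The evidence still matters: belief in disease rises from the prior P(Disease) = 0.25 to 0.67
* Probabilistic ML models update belief from evidence in the same way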
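
The same posterior can be reached through Bayes' rule, which is the route probabilistic classifiers usually take. The sketch below is illustrative only: it re-enters the counts from the medical-test table by hand, and its variable names (`counts`, `n_patients`, the lowercase `p_...` names) are not part of the notebook.

```python
# Counts copied from the medical-test table: (Disease, Test) -> number of patients.
counts = {
    ("Yes", "Positive"): 40,
    ("Yes", "Negative"): 10,
    ("No", "Positive"): 20,
    ("No", "Negative"): 130,
}
n_patients = sum(counts.values())

n_diseased = counts[("Yes", "Positive")] + counts[("Yes", "Negative")]
n_positive = counts[("Yes", "Positive")] + counts[("No", "Positive")]

p_disease = n_diseased / n_patients      # prior P(Disease)
p_positive = n_positive / n_patients     # evidence P(Positive)

# Likelihood: how often a diseased patient tests positive.
p_positive_given_disease = counts[("Yes", "Positive")] / n_diseased

# Bayes' rule: P(D | +) = P(+ | D) * P(D) / P(+)
p_disease_given_positive = p_positive_given_disease * p_disease / p_positive

print("P(Positive | Disease):", p_positive_given_disease)
print("P(Disease | Positive) via Bayes:", round(p_disease_given_positive, 4))
```

The result agrees with the direct joint-over-marginal calculation, because both are rearrangements of the same definition.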

### 6. Independence check
Events A and B are independent if:
> P(A | B) = P(A)

In [33]:
print("P(Disease):", P_disease)
print("P(Disease | Positive):", P_disease_given_positive)
# If these differ, the events are dependent.

P(Disease): 0.25
P(Disease | Positive): 0.6666666666666667
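
An equivalent way to state independence is the product rule: A and B are independent exactly when P(A AND B) = P(A) · P(B). The sketch below applies that check to the numbers from the 200-patient table; the variable names are illustrative and the probabilities are entered by hand.

```python
import math

# Probabilities from the 200-patient medical-test table.
p_disease = 50 / 200
p_positive = 60 / 200
p_joint = 40 / 200

# Under independence the joint would equal the product of the marginals.
p_joint_if_independent = p_disease * p_positive

# isclose avoids a brittle exact comparison between floats.
independent = math.isclose(p_joint, p_joint_if_independent)

print("Observed joint:      ", p_joint)
print("Joint if independent:", round(p_joint_if_independent, 4))
print("Independent?", independent)
```

The observed joint is well above the product of the marginals, so disease and a positive test are positively associated.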


### 7. Using pandas crosstab (data-driven probability)

Now compute the same quantities from row-level records, the form in which an ML pipeline usually receives data.

In [36]:
# expand the count table into row-level records (one row per patient)
records = []
for _, row in df.iterrows():
    records.extend([[row["Disease"], row["Test"]]] * row["Count"])
data_expanded = pd.DataFrame(records, columns=["Disease", "Test"])


ct = pd.crosstab(data_expanded["Disease"], data_expanded["Test"])
print(ct)

Test     Negative  Positive
Disease                    
No            130        20
Yes            10        40


### 8. Conditional probability from crosstab
P(Disease|Test)

In [50]:
P_disease_given_test = ct.div(ct.sum(axis=0),axis=1)
print(P_disease_given_test)
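# This is the posterior P(Disease | Test). Naive Bayes estimates the reverse
# direction, P(Test | Disease), plus the prior P(Disease), and reaches this table via Bayes' rule.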

Test     Negative  Positive
Disease                    
No       0.928571  0.333333
Yes      0.071429  0.666667
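
Which axis you normalize over decides which conditional you get. As a hedged sketch, `pd.crosstab` can normalize for you through its `normalize` argument; the `rows` frame below is rebuilt by hand from the table counts and is not the notebook's own variable.

```python
import pandas as pd

# Row-level records rebuilt from the counts in the medical-test table.
rows = pd.DataFrame({
    "Disease": ["Yes"] * 50 + ["No"] * 150,
    "Test": ["Positive"] * 40 + ["Negative"] * 10
            + ["Positive"] * 20 + ["Negative"] * 130,
})

# Columns sum to 1: P(Disease | Test), matching ct.div(ct.sum(axis=0), axis=1).
p_disease_given_test = pd.crosstab(rows["Disease"], rows["Test"], normalize="columns")

# Rows sum to 1: P(Test | Disease), the opposite conditioning direction.
p_test_given_disease = pd.crosstab(rows["Disease"], rows["Test"], normalize="index")

print(p_disease_given_test.round(3))
print(p_test_given_disease.round(3))
```

Reading the two tables side by side makes the asymmetry visible: P(Disease | Positive) and P(Positive | Disease) are different numbers.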


### 9. Spam detection example (classic ML case)

In [51]:
spam_data = pd.DataFrame({
    "Spam": ["Yes", "Yes", "No", "No", "No", "Yes", "No"],
    "Word_free": ["Yes", "Yes", "No", "Yes", "No", "Yes", "No"]
})

In [53]:
ct_spam = pd.crosstab(spam_data["Spam"],spam_data["Word_free"])
print(ct_spam)

Word_free  No  Yes
Spam              
No          3    1
Yes         0    3


In [54]:
P_spam_given_free = ct_spam.div(ct_spam.sum(axis=0),axis=1)
print(P_spam_given_free)

Word_free   No   Yes
Spam                
No         1.0  0.25
Yes        0.0  0.75


#### Interpretation
* Seeing the word “free” shifts probability toward spam: P(Spam) = 3/7 ≈ 0.43 rises to P(Spam | free) = 0.75
* This is conditional probability driving prediction

### 10. Advanced example: weather and activity prediction

In [56]:
weather = pd.DataFrame({
    "Rain": ["Yes","No","Yes","No","No","Yes","No","Yes"],
    "Umbrella": ["Yes","No","Yes","No","Yes","Yes","No","Yes"]
})
ct_weather = pd.crosstab(weather["Rain"],weather["Umbrella"])
print(ct_weather)

Umbrella  No  Yes
Rain             
No         3    1
Yes        0    4


In [57]:
P_rain_given_umbrella = ct_weather.div(ct_weather.sum(axis=0),axis=1)
P_rain_given_umbrella

|Rain|Umbrella = No|Umbrella = Yes|
|----|----|----|
|No|1.0|0.2|
|Yes|0.0|0.8|


#### ML intuition
* Observations update belief
* Models rarely output a hard label directly; most output a probability that is then thresholded
* They update probabilities as evidence accumulates
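
To make "evidence accumulates" concrete, here is a minimal sketch of repeated Bayes updates for a binary hypothesis. The prior 0.3 and the likelihoods 0.9 and 0.2 are illustrative values (the same ones the simulation in section 11 uses), and treating successive observations as conditionally independent given the weather is an assumption. The helper `update` is hypothetical, not a library function.

```python
def update(prior, likelihood_if_true, likelihood_if_false):
    """One Bayes update for a binary hypothesis; returns the posterior."""
    numerator = likelihood_if_true * prior
    evidence = numerator + likelihood_if_false * (1 - prior)
    return numerator / evidence

# Illustrative numbers, not estimated from the small table above.
belief = 0.3                  # prior P(Rain)
p_umbrella_if_rain = 0.9      # P(Umbrella | Rain)
p_umbrella_if_dry = 0.2       # P(Umbrella | No rain)

history = [belief]
for _ in range(3):            # three passers-by, each carrying an umbrella
    belief = update(belief, p_umbrella_if_rain, p_umbrella_if_dry)
    history.append(belief)

print([round(b, 3) for b in history])
```

Each posterior becomes the prior for the next observation, so belief in rain climbs with every umbrella seen.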

### 11. Monte Carlo estimation (how ML learns probabilities)

In [61]:
import numpy as np
N = 100000
rain = np.random.choice(["Yes","No"],size=N,p=[0.3,0.7])
umbrella = np.where(rain=="Yes",
                   np.random.choice(["Yes","No"],size=N,p=[0.9,0.1]),
                   np.random.choice(["Yes","No"],size=N,p=[0.2,0.8]))

sim = pd.DataFrame({"rain":rain,"Umbrella":umbrella})


In [62]:
pd.crosstab(sim["rain"],sim["Umbrella"],normalize="index")
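# Rows sum to 1 here, so this table is P(Umbrella | rain): the estimates recover the
# simulation parameters 0.9 and 0.2. This mirrors how probabilities are learned from data.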

|rain|Umbrella = No|Umbrella = Yes|
|----|----|----|
|No|0.799848|0.200152|
|Yes|0.101071|0.898929|
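
A classifier usually needs the opposite direction, P(Rain | Umbrella). The sketch below re-runs a similar simulation with a seeded generator (the seed and the names `draws`, `estimated` and `exact` are assumptions for illustration) and compares the column-normalized estimate with the exact value from Bayes' rule.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)     # seeded so the run is repeatable
n = 100_000

is_rain = rng.random(n) < 0.3                       # P(Rain) = 0.3
p_umbrella = np.where(is_rain, 0.9, 0.2)            # P(Umbrella | Rain), P(Umbrella | No rain)
has_umbrella = rng.random(n) < p_umbrella

draws = pd.DataFrame({
    "Rain": np.where(is_rain, "Yes", "No"),
    "Umbrella": np.where(has_umbrella, "Yes", "No"),
})

# Column-normalized: each column sums to 1, giving P(Rain | Umbrella).
estimated = pd.crosstab(draws["Rain"], draws["Umbrella"], normalize="columns")

# Exact value from Bayes' rule with the simulation parameters.
exact = 0.9 * 0.3 / (0.9 * 0.3 + 0.2 * 0.7)

print("Estimated P(Rain | Umbrella):", round(estimated.loc["Yes", "Yes"], 3))
print("Exact     P(Rain | Umbrella):", round(exact, 3))
```

With this many samples the estimate should land close to the exact value, though the last digits depend on the random draw.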


### 12. How this maps to ML models
* Naive Bayes → combines the prior P(y) with per-feature conditionals P(xᵢ | y)
* Logistic regression → models P(y | x)
* Neural networks → approximate conditional distributions
* Forecasting → update belief as new evidence arrives

##### Mental model to keep
* Joint: what happens together
* Marginal: ignore everything else
* Conditional: belief after seeing evidence
* Learning: estimating these from data

# Conditional probability walkthrough with the Iris dataset

The Iris flower dataset is a classic ML dataset.

### 1. Load the data

In [93]:
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target
df["species"] = df["species"].map(dict(enumerate(iris.target_names)))

df.head()

| |sepal length (cm)|sepal width (cm)|petal length (cm)|petal width (cm)|species|
|----|----|----|----|----|----|
|0|5.1|3.5|1.4|0.2|setosa|
|1|4.9|3.0|1.4|0.2|setosa|
|2|4.7|3.2|1.3|0.2|setosa|
|3|4.6|3.1|1.5|0.2|setosa|
|4|5.0|3.6|1.4|0.2|setosa|


#### Mental model
* Each row = one flower (an outcome)
* Species = hidden label
* Measurements = observed evidence

### 2. Create events from continuous features

Conditional probability needs events, so we discretize a feature.

Example:
* Event A → flower is setosa
* Event B → petal length is small

In [94]:
# create a categorical feature
df["petal_length_small"]=df["petal length (cm)"]<2.5

#### Now we have:
* Event A: species = setosa
* Event B: petal_length_small = True

### 3. Joint probability

P(Setosa AND small petal length)

In [95]:
total = len(df)

# A boolean column works directly as a mask; comparing it with True is redundant
joint_count = len(df[(df["species"] == "setosa") & df["petal_length_small"]])

P_joint = joint_count / total
print("P(Setosa AND small petal length):", P_joint)


P(Setosa AND small petal length): 0.3333333333333333


### 4. Marginal probabilities

In [97]:
P_setosa = len(df[df["species"]=="setosa"])/total
P_small_petal = len(df[df["petal_length_small"]])/total

print("P(Setosa):",P_setosa)
print("P(Small petal length):",P_small_petal)

P(Setosa): 0.3333333333333333
P(Small petal length): 0.3333333333333333


### 5. Conditional probability (core ML step)

> P(Setosa | small petal length)

In [98]:
P_setosa_given_small = P_joint/P_small_petal
print("P(Setosa|small petal length):",P_setosa_given_small)

P(Setosa|small petal length): 1.0


#### Interpretation
* Once we see a small petal, the probability of setosa jumps from 0.33 to 1.0
* Classifiers exploit features in the same way: a feature is useful when conditioning on it changes the class probabilities

### 6. Independence check
> If species and petal length were independent, P(Setosa | small petal) would equal P(Setosa).

In [99]:
print("P(Setosa):",P_setosa)
print("P(Setosa|small petal):",P_setosa_given_small)

P(Setosa): 0.3333333333333333
P(Setosa|small petal): 1.0


They are very different → strong dependence.

This is why petal length is such a powerful feature.

### 7. Using pandas crosstab (clean ML-style approach)

In [100]:
ct = pd.crosstab(df["species"],df["petal_length_small"])
ct

|species|petal_length_small = False|petal_length_small = True|
|----|----|----|
|setosa|0|50|
|versicolor|50|0|
|virginica|50|0|


### 8. Conditional probability table
> P(species | petal_length_small)

In [101]:
P_species_given_petal = ct.div(ct.sum(axis=0),axis=1)
P_species_given_petal

|species|petal_length_small = False|petal_length_small = True|
|----|----|----|
|setosa|0.0|1.0|
|versicolor|0.5|0.0|
|virginica|0.5|0.0|


#### How to read this
* Columns = observed evidence
* Rows = belief about species
* Each column sums to 1

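This posterior table is what Naive Bayes arrives at for a single feature: it estimates P(feature | species) and P(species), then applies Bayes' rule.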
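
As a hedged check of that connection, the sketch below starts from the crosstab counts (typed in by hand, with the columns renamed `not_small` and `small` for readability) and shows that prior × likelihood, renormalized, reproduces the column-normalized table. The variable names are illustrative.

```python
import pandas as pd

# Counts copied from the species x petal_length_small crosstab.
counts = pd.DataFrame(
    {"not_small": [0, 50, 50], "small": [50, 0, 0]},
    index=["setosa", "versicolor", "virginica"],
)

prior = counts.sum(axis=1) / counts.values.sum()        # P(species)
likelihood = counts.div(counts.sum(axis=1), axis=0)     # P(petal | species), rows sum to 1

# Bayes' rule: multiply, then renormalize each column.
unnormalized = likelihood.mul(prior, axis=0)
posterior = unnormalized.div(unnormalized.sum(axis=0), axis=1)

# Direct route: normalize the raw counts by column.
direct = counts.div(counts.sum(axis=0), axis=1)

print(posterior)
print("Same as direct column normalization:",
      bool((posterior - direct).abs().values.max() < 1e-12))
```

With one feature and count-based estimates, the two routes coincide; they start to differ only when several features are combined under the independence assumption.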

### 9. More realistic example: binning petal width

In [102]:
df["petal_width_bin"] = pd.cut(
    df["petal width (cm)"],
    bins = [0,0.5,1.5,2.5],
    labels = ["small","medium","large"]
    )

In [103]:
ct_width = pd.crosstab(df["species"],df["petal_width_bin"])
print(ct_width)

petal_width_bin  small  medium  large
species                              
setosa              49       1      0
versicolor           0      45      5
virginica            0       3     47


In [104]:
P_species_given_width = ct_width.div(ct_width.sum(axis=0),axis=1)
P_species_given_width

|species|small|medium|large|
|----|----|----|----|
|setosa|1.0|0.020408|0.0|
|versicolor|0.0|0.918367|0.096154|
|virginica|0.0|0.061224|0.903846|


#### Interpretation
* Large petal width → virginica with probability ≈ 0.90
* Medium petal width → versicolor with probability ≈ 0.92
* Small petal width → setosa with probability 1.0 in this sample
* This is conditional probability driving classification.
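
Turning the table into a classifier takes one more step: pick the most probable species in each bin. The sketch below does that from hand-entered crosstab counts; the names `width_counts`, `prediction` and `accuracy` are illustrative, and the accuracy is measured on the same 150 flowers used to build the table, so it is optimistic.

```python
import pandas as pd

# Counts copied from the species x petal_width_bin crosstab.
width_counts = pd.DataFrame(
    {"small": [49, 0, 0], "medium": [1, 45, 3], "large": [0, 5, 47]},
    index=["setosa", "versicolor", "virginica"],
)

# P(species | bin): normalize each column to sum to 1.
posterior = width_counts.div(width_counts.sum(axis=0), axis=1)

prediction = posterior.idxmax(axis=0)    # most probable species per bin
confidence = posterior.max(axis=0)       # its posterior probability

# Flowers classified correctly if every bin predicts its majority species.
n_correct = sum(width_counts.loc[prediction[b], b] for b in width_counts.columns)
accuracy = n_correct / width_counts.values.sum()

print(prediction.to_dict())
print(confidence.round(3).to_dict())
print("Training accuracy of the one-feature rule:", round(accuracy, 2))
```

A single binned feature already separates most flowers, which is the practical meaning of "features strongly condition the label".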

### 10. Advanced ML intuition: multiple conditions
What ML models implicitly do:

> P(species | petal length, petal width, sepal length, sepal width)

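Naive Bayes approximates this by assuming the features are conditionally independent given the species:
> P(species | feature₁, feature₂, ...) ∝ P(species) × P(feature₁ | species) × P(feature₂ | species) × ...

Each factor is a conditional probability estimated from counts, like the tables you just computed but normalized per species (by row) rather than per feature value (by column).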
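
A minimal sketch of combining two features the Naive Bayes way, assuming the features are conditionally independent given the species. The per-species likelihoods are entered by hand from the two crosstabs (50 flowers per species), and the query "petal length not small, petal width medium" is an arbitrary example.

```python
species = ["setosa", "versicolor", "virginica"]

# P(species): each species has 50 of the 150 flowers.
prior = {s: 50 / 150 for s in species}

# P(petal_length_small = False | species), from the petal-length crosstab.
p_length_not_small = {"setosa": 0 / 50, "versicolor": 50 / 50, "virginica": 50 / 50}

# P(petal_width_bin = medium | species), from the petal-width crosstab.
p_width_medium = {"setosa": 1 / 50, "versicolor": 45 / 50, "virginica": 3 / 50}

# Naive Bayes score: prior times the product of per-feature likelihoods.
score = {s: prior[s] * p_length_not_small[s] * p_width_medium[s] for s in species}

# Normalize so the scores sum to 1 and can be read as a posterior.
score_total = sum(score.values())
posterior = {s: score[s] / score_total for s in species}

for s in species:
    print(f"P({s} | not small, medium) = {posterior[s]:.4f}")
```

The petal-length evidence rules out setosa entirely, and the petal-width evidence then favors versicolor over virginica.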

### 11. Why this matters for ML
* Decision trees split on features to maximize conditional purity
* Naive Bayes explicitly multiplies conditional probabilities
* Logistic regression models P(y | x) directly
* Neural networks approximate complex conditional distributions
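
To illustrate what "conditional purity" means for a decision tree, the sketch below computes Gini impurity before and after splitting on `petal_length_small`, using the class counts from the crosstab. Gini is one common purity measure, chosen here as an example; the helper `gini` is hypothetical, not a library call.

```python
def gini(class_counts):
    """Gini impurity of a list of class counts (0 means perfectly pure)."""
    n = sum(class_counts)
    return 1.0 - sum((c / n) ** 2 for c in class_counts)

# Class counts in the order setosa, versicolor, virginica.
parent = [50, 50, 50]        # all flowers
small = [50, 0, 0]           # petal_length_small == True
not_small = [0, 50, 50]      # petal_length_small == False

n_parent = sum(parent)
weighted_child = (
    sum(small) / n_parent * gini(small)
    + sum(not_small) / n_parent * gini(not_small)
)

print("Impurity before split:", round(gini(parent), 4))
print("Impurity after split: ", round(weighted_child, 4))
print("Impurity reduction:   ", round(gini(parent) - weighted_child, 4))
```

The split halves the impurity because one branch becomes pure setosa, which is the same dependence the conditional probability table showed.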

#### Final mental model
* Joint → what happens together in the data
* Marginal → overall frequency
* Conditional → belief after seeing evidence
* Learning → estimating these from samples

The Iris dataset works so well because features strongly condition the label.