# Notebook 6 — Exercises: Probability Models & ML Foundations

## Learning Objectives
By the end of this notebook, you should be able to:
- Apply probability rules (joint, marginal, conditional) in ML contexts.
- Derive and compute Maximum Likelihood Estimates (MLE).
- Understand and implement Naive Bayes classification.
- Compare MLE with Maximum A Posteriori (MAP) estimation.
- Use Python to implement probability-based models.

## 1. Probability Refresher — Joint, Marginal, Conditional

**Task 1.1:** Given the joint probability table below for two binary variables `A` and `B`, compute:
- Marginal probabilities `P(A=1)` and `P(B=1)`
- Conditional probability `P(A=1 | B=1)`

| A | B | P(A,B) |
|---|---|--------|
| 0 | 0 | 0.1    |
| 0 | 1 | 0.3    |
| 1 | 0 | 0.2    |
| 1 | 1 | 0.4    |
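
As a warm-up, the sketch below works through the same kind of computation on a different, made-up joint table, so the exercise itself stays open. The array `joint_demo` and its values are assumptions chosen for illustration. Rows index `A` and columns index `B`, the same layout as the starter code.

```python
import numpy as np

# Hypothetical joint table (not the one from Task 1.1).
# Rows index A (0, 1); columns index B (0, 1).
joint_demo = np.array([
    [0.25, 0.15],
    [0.35, 0.25],
])

# Marginals: sum the joint over the variable you want to remove.
p_a = joint_demo.sum(axis=1)  # P(A): sum over B (columns)
p_b = joint_demo.sum(axis=0)  # P(B): sum over A (rows)

# Conditional: P(A=1 | B=1) = P(A=1, B=1) / P(B=1)
p_a1_given_b1 = joint_demo[1, 1] / p_b[1]

print("P(A=1) =", p_a[1])
print("P(B=1) =", p_b[1])
print("P(A=1 | B=1) =", p_a1_given_b1)
```

A quick sanity check for any joint table is that all entries sum to 1 and that each marginal vector also sums to 1.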

In [None]:
# Task 1.1 — Your code here
import numpy as np

# Rows index A (0, 1); columns index B (0, 1)
joint_probs = np.array([
    [0.1, 0.3],  # A=0: P(A=0,B=0), P(A=0,B=1)
    [0.2, 0.4]   # A=1: P(A=1,B=0), P(A=1,B=1)
])

# TODO: Compute P(A=1), P(B=1), P(A=1|B=1)


## 2. Bayes' Theorem in ML Context

A spam filter uses the probability of words appearing in spam vs. non-spam emails.

**Task 2.1:** Using Bayes' theorem, compute:
\[ P(\text{spam} \mid \text{word} = \text{'offer'}) \]
Given:
- P(spam) = 0.4
- P(word='offer' | spam) = 0.6
- P(word='offer' | not spam) = 0.05
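
The general pattern can be sketched as a small helper. The function name `posterior` and the numbers passed to it are hypothetical and deliberately differ from the task values. The denominator is the evidence, obtained with the law of total probability over the two classes.

```python
def posterior(prior, lik_pos, lik_neg):
    """P(class | evidence) for a two-class problem via Bayes' theorem.

    prior   : P(class)
    lik_pos : P(evidence | class)
    lik_neg : P(evidence | not class)
    """
    # Law of total probability: P(evidence)
    evidence = lik_pos * prior + lik_neg * (1 - prior)
    return lik_pos * prior / evidence

# Hypothetical values, not the ones from Task 2.1
p_demo = posterior(prior=0.2, lik_pos=0.5, lik_neg=0.1)
print("posterior:", p_demo)
```

With these made-up inputs the numerator is 0.5 × 0.2 = 0.1 and the evidence is 0.1 + 0.1 × 0.8 = 0.18, so the posterior is 0.1 / 0.18 ≈ 0.556.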

In [None]:
# Task 2.1 — Your code here
P_spam = 0.4
P_offer_given_spam = 0.6
P_offer_given_not_spam = 0.05

# TODO: Use Bayes' theorem to compute P(spam | offer)


## 3. Maximum Likelihood Estimation (MLE)

**Task 3.1:** You toss a coin 10 times and get 7 heads. Estimate the probability of heads using MLE for the Bernoulli distribution.

**Task 3.2:** Generate 100 samples from a normal distribution with mean 5 and standard deviation 2. Compute the maximum likelihood estimates of the mean and variance. Note that the MLE of the variance divides by `n`, not `n - 1`.
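
For reference, here is a minimal sketch of both estimators on made-up data. The variable names and values are assumptions, not the task inputs. For a Bernoulli model the MLE is the sample proportion; for a Gaussian it is the sample mean and the variance computed with a `1/n` factor.

```python
import numpy as np

# Bernoulli MLE: p_hat = (number of successes) / (number of trials)
heads_demo, n_demo = 3, 12  # hypothetical counts
p_hat = heads_demo / n_demo

# Gaussian MLE on a small hypothetical sample
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mu_hat = sample.mean()
var_hat = np.mean((sample - mu_hat) ** 2)  # divides by n (biased estimator)

print("p_hat =", p_hat)
print("mu_hat =", mu_hat, "var_hat =", var_hat)
```

`np.var(sample)` uses `ddof=0` by default and therefore matches `var_hat`; passing `ddof=1` would give the unbiased sample variance instead, which is not the MLE.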

In [None]:
# Task 3.1 — Bernoulli MLE
heads = 7
n = 10

# TODO: MLE for Bernoulli parameter p

# Task 3.2 — Gaussian MLE
np.random.seed(42)
data = np.random.normal(5, 2, 100)

# TODO: MLE for mean and variance


## 4. Maximum A Posteriori (MAP) Estimation

**Task 4.1:** Take the same coin-toss experiment (7 heads out of 10), but now suppose you hold a prior belief that the coin is roughly fair, encoded as a Beta(α=2, β=2) prior on p (its mode is p=0.5).
Compute the MAP estimate for p.
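
Because the Beta prior is conjugate to the Bernoulli likelihood, the posterior is Beta(α + heads, β + tails) and the MAP estimate is its mode. The sketch below assumes a hypothetical helper `beta_bernoulli_map` and uses made-up inputs rather than the task values.

```python
def beta_bernoulli_map(heads, n, alpha, beta):
    """Mode of the Beta posterior for a Bernoulli parameter p."""
    a_post = alpha + heads        # posterior alpha
    b_post = beta + (n - heads)   # posterior beta
    # The interior mode formula only holds when both parameters exceed 1.
    if a_post <= 1 or b_post <= 1:
        raise ValueError("posterior mode is not in the interior of (0, 1)")
    return (a_post - 1) / (a_post + b_post - 2)

# Hypothetical experiment: 3 heads in 12 tosses with a Beta(3, 3) prior
p_map_demo = beta_bernoulli_map(heads=3, n=12, alpha=3.0, beta=3.0)
p_mle_demo = 3 / 12

print("MAP:", p_map_demo, "MLE:", p_mle_demo)
```

In this made-up case the MAP estimate (5/16) sits between the MLE (0.25) and the prior mode (0.5), which is the usual shrinkage effect of an informative prior. With a uniform Beta(1, 1) prior the MAP estimate reduces to the MLE.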

In [None]:
# Task 4.1 — MAP estimation with Beta prior
alpha_prior = 2
beta_prior = 2

# TODO: Compute MAP for p


## 5. Naive Bayes Classifier from Scratch

**Task 5.1:** Implement a simple Naive Bayes classifier for binary classification using discrete features.
- Create a small dataset with binary features and binary labels.
- Compute class priors and likelihoods.
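- Predict labels for test samples (the starter code defines only training data, so either add a few test rows yourself or predict on `X_train`).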

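**Task 5.2:** Compare your results with `sklearn.naive_bayes.BernoulliNB`. Its default `alpha=1.0` applies Laplace smoothing, so your implementation needs the same smoothing for the probabilities to match.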
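
One possible shape for such a classifier is sketched below, on a toy dataset that differs from the exercise data. The function names `fit_bernoulli_nb` and `predict_bernoulli_nb` are hypothetical, and the additive smoothing with `alpha=1.0` is an assumption chosen to mirror the `BernoulliNB` default. Treat it as a starting point, not a reference solution.

```python
import numpy as np

def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate log class priors and smoothed P(x_j = 1 | class)."""
    classes = np.unique(y)
    log_prior = np.log(np.array([np.mean(y == c) for c in classes]))
    # theta[c, j] = (count of x_j = 1 in class c + alpha) / (class size + 2 * alpha)
    theta = np.array([
        (X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
        for c in classes
    ])
    return classes, log_prior, theta

def predict_bernoulli_nb(X, classes, log_prior, theta):
    """Pick the class with the highest log posterior (up to a constant)."""
    # log P(x | c) = sum_j [x_j * log(theta_cj) + (1 - x_j) * log(1 - theta_cj)]
    log_lik = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
    return classes[np.argmax(log_prior + log_lik, axis=1)]

# Hypothetical toy data (not the exercise dataset)
X_demo = np.array([[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]])
y_demo = np.array([1, 1, 1, 0, 0, 0])

params = fit_bernoulli_nb(X_demo, y_demo)
preds = predict_bernoulli_nb(np.array([[1, 1], [0, 0]]), *params)
print("predictions:", preds)
```

Working in log space avoids numerical underflow when many features are multiplied together, and the smoothing keeps every `theta` strictly between 0 and 1 so the logarithms stay finite.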

In [None]:
# Task 5.1 — Implement Naive Bayes from scratch
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Example dataset
X_train = np.array([
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0]
])
y_train = np.array([1, 1, 0, 0])

# TODO: Implement Naive Bayes manually

# Task 5.2 — Compare to sklearn
clf = BernoulliNB()
clf.fit(X_train, y_train)
print("sklearn predictions:", clf.predict(X_train))
