# Part 2: Bayesian Probability 

## Formula:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

- $P(A \mid B)$ = Posterior probability (probability of A given B)
- $P(B \mid A)$ = Likelihood (probability of B given A)
- $P(A)$ = Prior probability of A
- $P(B)$ = Marginal probability of B (normalizing factor)

For spam filtering, let:
- A be "the email is spam"
- B be "the words in the email"
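As a quick numeric illustration of the formula, the sketch below computes a posterior directly. The prior P(spam) = 0.4 matches the training set in the code below (2 of 5 emails are spam); the two likelihoods are hypothetical values chosen only for illustration, and P(B) is expanded with the law of total probability over the two classes. The helper name `posterior` is also hypothetical.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A | B) via Bayes' theorem for a binary event A."""
    # Marginal P(B) by the law of total probability:
    # P(B) = P(B | A) P(A) + P(B | not A) P(not A)
    evidence = likelihood_b_given_a * prior_a + likelihood_b_given_not_a * (1 - prior_a)
    return likelihood_b_given_a * prior_a / evidence


# Hypothetical likelihoods (not derived from the dataset):
# P(B | spam) = 0.30, P(B | not spam) = 0.05
p_spam_given_words = posterior(prior_a=0.4, likelihood_b_given_a=0.30, likelihood_b_given_not_a=0.05)
print(f"P(spam | words) = {p_spam_given_words:.2f}")  # prints 0.80
```

Here P(B) = 0.4 · 0.30 + 0.6 · 0.05 = 0.15, so the posterior is 0.12 / 0.15 = 0.8: observing B raises the spam probability from the prior 0.4 to 0.8.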

```python
from collections import defaultdict
import math

# Training dataset: (message, label) pairs
emails = [
    ("Buy cheap meds now", "spam"),
    ("Limited offer just for you", "spam"),
    ("Meeting tomorrow at 10am", "not_spam"),
    ("Project deadline extended", "not_spam"),
    ("No meeting today", "not_spam"),
]

# Lowercases the text and splits it into words on whitespace
def word_splitter(text):
    return text.lower().split()

# Count how often each word appears in each class
word_counts = {
    "spam": defaultdict(int),
    "not_spam": defaultdict(int)
}
class_counts = defaultdict(int)

for message, label in emails:
    class_counts[label] += 1
    for word in word_splitter(message):
        word_counts[label][word] += 1

# Total number of messages
total_messages = sum(class_counts.values())

# Prior probability of each class: P(spam) and P(not_spam)
priors = {
    label: count / total_messages
    for label, count in class_counts.items()
}

# Shared vocabulary (all distinct training words) and total word count per class
vocabulary = {word for counts in word_counts.values() for word in counts}
total_words = {label: sum(counts.values()) for label, counts in word_counts.items()}

# Conditional probability P(word | label) with Laplace (add-one) smoothing.
# .get() avoids inserting unseen words into the defaultdict, which would
# otherwise change the counts every time a new word is scored.
def word_prob(word, label):
    return (word_counts[label].get(word, 0) + 1) / (total_words[label] + len(vocabulary))

# Naive Bayes classifier that logs every step of the computation
def predict_verbose(message):
    words = word_splitter(message)
    scores = {}

    print(f"\n📩 Predicting message: \"{message}\"\nWords: {words}\n")

    for label in class_counts:
        print(f"🔍 Label: '{label}'")

        # Start from the log prior
        prior = priors[label]
        log_prob = math.log(prior)
        print(f"  ➤ Prior P({label}) = {prior:.4f}")
        print(f"  ➤ log(Prior) = {log_prob:.4f}")

        # Add the log likelihood of each word (naive independence assumption)
        for word in words:
            cond_prob = word_prob(word, label)
            log_word_prob = math.log(cond_prob)
            log_prob += log_word_prob
            print(f"    🔹 P('{word}'|{label}) = {cond_prob:.4f} → log = {log_word_prob:.4f}")

        scores[label] = log_prob
        print(f"  ✅ Total log score for '{label}': {log_prob:.4f}\n")

    # Convert log scores to probabilities (subtract the max for numerical stability)
    max_log = max(scores.values())
    exp_scores = {label: math.exp(scores[label] - max_log) for label in scores}
    total = sum(exp_scores.values())
    probs = {label: exp_scores[label] / total for label in exp_scores}

    for label in probs:
        print(f"🔢 P({label} | message) = {probs[label]:.4f}")

    prediction = max(probs, key=probs.get)
    print(f"\n🏁 Final prediction: {prediction.upper()}")
    return prediction

# Test email
test_email = "We have a cheap offer now"
predict_verbose(test_email)
```

```
📩 Predicting message: "We have a cheap offer now"
Words: ['we', 'have', 'a', 'cheap', 'offer', 'now']

🔍 Label: 'spam'
  ➤ Prior P(spam) = 0.4000
  ➤ log(Prior) = -0.9163
    🔹 P('we'|spam) = 0.0370 → log = -3.2958
    🔹 P('have'|spam) = 0.0370 → log = -3.2958
    🔹 P('a'|spam) = 0.0370 → log = -3.2958
    🔹 P('cheap'|spam) = 0.0741 → log = -2.6027
    🔹 P('offer'|spam) = 0.0741 → log = -2.6027
    🔹 P('now'|spam) = 0.0741 → log = -2.6027
  ✅ Total log score for 'spam': -18.6119

🔍 Label: 'not_spam'
  ➤ Prior P(not_spam) = 0.6000
  ➤ log(Prior) = -0.5108
    🔹 P('we'|not_spam) = 0.0357 → log = -3.3322
    🔹 P('have'|not_spam) = 0.0357 → log = -3.3322
    🔹 P('a'|not_spam) = 0.0357 → log = -3.3322
    🔹 P('cheap'|not_spam) = 0.0357 → log = -3.3322
    🔹 P('offer'|not_spam) = 0.0357 → log = -3.3322
    🔹 P('now'|not_spam) = 0.0357 → log = -3.3322
  ✅ Total log score for 'not_spam': -20.5041

🔢 P(spam | message) = 0.8690
🔢 P(not_spam | message) = 0.1310

🏁 Final prediction: SPAM

'spam'
```
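The last step of `predict_verbose` converts the summed log scores back into probabilities. Subtracting the largest score before calling `math.exp` (the log-sum-exp trick) matters for longer messages, where raw log scores can be so negative that `exp` underflows to 0.0 and the normalization would divide by zero. The sketch below repeats that normalization as a standalone helper; `normalize_log_scores` and the scores passed to it are hypothetical, chosen only to trigger the underflow.

```python
import math


def normalize_log_scores(log_scores):
    """Convert unnormalized log scores into probabilities that sum to 1."""
    # Shift by the largest score so the biggest exponent is exp(0) = 1;
    # this keeps math.exp from underflowing to 0.0 for very negative scores.
    max_log = max(log_scores.values())
    shifted = {label: math.exp(score - max_log) for label, score in log_scores.items()}
    total = sum(shifted.values())
    return {label: value / total for label, value in shifted.items()}


# Hypothetical log scores for a very long message
scores = {"spam": -1000.0, "not_spam": -1002.0}
print(math.exp(scores["spam"]))  # 0.0: naive exponentiation underflows
probs = normalize_log_scores(scores)
print(probs)  # spam ≈ 0.881, not_spam ≈ 0.119
```

Only the difference between the scores matters: a gap of 2 in log space gives P(spam) = 1 / (1 + e^-2), regardless of how negative the individual scores are.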


![Scanned notes](<CamScanner ⁨6-22-25 18.57⁩_1.jpg>)

![Scanned notes](<CamScanner ⁨6-22-25 18.57⁩_1-1.jpg>)

![Scanned notes](<CamScanner ⁨6-22-25 18.57⁩_3.jpg>)

![Scanned notes](<CamScanner ⁨6-22-25 18.57⁩_4.jpg>)

![Scanned notes](<CamScanner ⁨6-22-25 18.57⁩_5.jpg>)

# Part 4: Gradient Descent in Code

*In this section, we convert the manual calculation from Part 3 into Python code using NumPy and SciPy.*

### Import libraries:
- numpy: used for numerical computations.
- matplotlib.pyplot: used for creating visualizations and plotting data.
- scipy.optimize.approx_fprime: computes numerical (finite-difference) approximations of gradients.


```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import approx_fprime
```

### Initialization of Dataset and Parameters

1. Initial slope: m = -1
2. Initial intercept: b = 1
3. Learning rate: α = 0.1
4. Data points: (1, 3) and (3, 6)

```python
# Initial parameters of the line y = m * x + b
m = -1.0     # slope
b = 1.0      # intercept
alpha = 0.1  # learning rate

# Dataset: the points (1, 3) and (3, 6)
X = np.array([1, 3])
Y = np.array([3, 6])
```

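### Mean Squared Error Cost Function
The cost function measures how large the prediction error is by averaging the squared differences between the predicted and true y values.

### Components:
1. N = len(X): the number of data points; here N = 2, since X has two elements ([1, 3]).
2. predictions: the model's output m * X + b for each x value.
3. error: the difference between each prediction and the true y value.
4. cost: the Mean Squared Error, i.e. the mean of the squared errors.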
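In symbols, with N data points (x_i, y_i), the cost computed by `cost_function` is the first line below. Plugging in the initial values m = -1 and b = 1 reproduces the cost printed for Iteration 1 of the training loop:

$$
\begin{aligned}
J(m, b) &= \frac{1}{N} \sum_{i=1}^{N} \left( m x_i + b - y_i \right)^2 \\
J(-1, 1) &= \frac{1}{2} \left[ (-1 \cdot 1 + 1 - 3)^2 + (-1 \cdot 3 + 1 - 6)^2 \right] = \frac{(-3)^2 + (-8)^2}{2} = \frac{73}{2} = 36.5
\end{aligned}
$$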


```python
def cost_function(m, b, X, Y):
    """Mean Squared Error of the line y = m * x + b on the data (X, Y)."""
    N = len(X)
    predictions = m * X + b
    error = predictions - Y
    cost = (1 / N) * np.sum(error ** 2)
    return cost
```

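### Gradient Calculation:

We define a function to compute the gradients of the cost function with respect to the parameters `m` and `b`.
These gradients tell us how to adjust `m` and `b` to reduce the error: each parameter is moved in the direction opposite to its gradient.

### Components:

- GoM: the gradient of the cost with respect to m (tells us how to update m)
- GoB: the gradient of the cost with respect to b (tells us how to update b)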
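Differentiating the MSE cost with the chain rule gives the two expressions computed by the `gradients` function. Evaluated at the initial parameters m = -1 and b = 1, where the errors are -3 and -8, they come out as:

$$
\begin{aligned}
\text{GoM} = \frac{\partial J}{\partial m} &= \frac{2}{N} \sum_{i=1}^{N} \left( m x_i + b - y_i \right) x_i = \frac{2}{2} \left[ (-3)(1) + (-8)(3) \right] = -27 \\
\text{GoB} = \frac{\partial J}{\partial b} &= \frac{2}{N} \sum_{i=1}^{N} \left( m x_i + b - y_i \right) = \frac{2}{2} \left[ (-3) + (-8) \right] = -11
\end{aligned}
$$

Both gradients are negative, so increasing m and b lowers the cost.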



```python
# Gradient calculation
def gradients(m, b, X, Y):
    """Return the partial derivatives of the MSE cost with respect to m and b."""
    N = len(X)
    predictions = m * X + b
    error = predictions - Y
    GoM = (2 / N) * np.sum(error * X)  # dJ/dm
    GoB = (2 / N) * np.sum(error)      # dJ/db
    return GoM, GoB
```
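The notebook imports `approx_fprime` but never calls it. One natural use, sketched here as an assumption about its intended purpose, is to check the hand-derived gradients against a finite-difference estimate. `approx_fprime` expects a function of a single parameter vector, so this sketch packs m and b into one array; the helper names `cost` and `analytic_gradient` and the step size `1e-6` are illustrative choices.

```python
import numpy as np
from scipy.optimize import approx_fprime

X = np.array([1.0, 3.0])
Y = np.array([3.0, 6.0])


def cost(params):
    """MSE cost with params packed as [m, b], the form approx_fprime expects."""
    m, b = params
    error = m * X + b - Y
    return np.mean(error ** 2)


def analytic_gradient(params):
    """Closed-form gradient [dJ/dm, dJ/db], the same formulas as gradients()."""
    m, b = params
    error = m * X + b - Y
    return np.array([2 * np.mean(error * X), 2 * np.mean(error)])


point = np.array([-1.0, 1.0])                # the initial m and b
numeric = approx_fprime(point, cost, 1e-6)   # forward finite differences
analytic = analytic_gradient(point)
print("numeric: ", numeric)
print("analytic:", analytic)                 # [-27. -11.]
print("match:", np.allclose(numeric, analytic, atol=1e-4))
```

The finite-difference estimate is only approximate (its error scales with the step size), hence the tolerance in `np.allclose`.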

```python
# Perform gradient descent for a fixed number of steps
max_iterations = 4
```
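Each pass of the training loop applies the gradient descent update rule to both parameters simultaneously: both gradients are computed before either parameter changes. At the initial point the gradients are GoM = -27 and GoB = -11, so with α = 0.1 the first step works out as follows, matching the values printed for Iteration 1:

$$
\begin{aligned}
m &\leftarrow m - \alpha \, \frac{\partial J}{\partial m}, \qquad b \leftarrow b - \alpha \, \frac{\partial J}{\partial b} \\[4pt]
m_1 &= -1 - 0.1 \cdot (-27) = 1.7 \\
b_1 &= 1 - 0.1 \cdot (-11) = 2.1
\end{aligned}
$$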


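```python
for i in range(max_iterations):
    # Cost at the current parameters, before this iteration's update
    current_cost = cost_function(m, b, X, Y)
    GoM, GoB = gradients(m, b, X, Y)

    # Step each parameter against its gradient, scaled by the learning rate
    m = m - alpha * GoM
    b = b - alpha * GoB

    print(f"Iteration {i+1}: Cost={current_cost:.4f}, m={m:.4f}, b={b:.4f}")
```

```
Iteration 1: Cost=36.5000, m=1.7000, b=2.1000
Iteration 2: Cost=1.0400, m=1.2600, b=1.9000
Iteration 3: Cost=0.0640, m=1.3400, b=1.9160
Iteration 4: Cost=0.0348, m=1.3336, b=1.8968
```

The cost on each line is evaluated before that iteration's update, so it belongs to the previous line's m and b (the 36.5 of Iteration 1 is the cost at the initial m = -1, b = 1).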
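Four iterations leave m and b short of their final values. Because the two points lie exactly on one line (slope (6 - 3) / (3 - 1) = 1.5, intercept 3 - 1.5 = 1.5), the minimum cost is 0 at m = b = 1.5. A minimal sketch, reusing the same data, starting point and learning rate, runs many more iterations (1000 is an arbitrary choice) and compares the result with NumPy's least-squares fit `np.polyfit`:

```python
import numpy as np

X = np.array([1.0, 3.0])
Y = np.array([3.0, 6.0])
m, b, alpha = -1.0, 1.0, 0.1  # same starting point and learning rate as above

for _ in range(1000):
    error = m * X + b - Y
    grad_m = 2 * np.mean(error * X)
    grad_b = 2 * np.mean(error)
    # Tuple assignment updates both parameters from the same gradients
    m, b = m - alpha * grad_m, b - alpha * grad_b

# Least-squares reference: for deg=1, np.polyfit returns [slope, intercept]
slope, intercept = np.polyfit(X, Y, 1)
print(f"gradient descent: m={m:.4f}, b={b:.4f}")
print(f"np.polyfit:       m={slope:.4f}, b={intercept:.4f}")
```

Both lines should report m and b of about 1.5. With this data, a learning rate above roughly 0.17 would make the iterates diverge instead, since the step then overshoots along the steepest direction of the cost surface.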
