# **Naive Bayes – Enhanced Jupyter Notebook (2025 Edition)**
A complete, exam-oriented Naive Bayes notebook covering:
- Theory + formulas
- Gaussian NB from scratch
- Multinomial NB from scratch
- Text classification (spam filter)
- Laplace smoothing
- Comparison with scikit-learn


## **1. Bayes' Theorem Basics**
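Bayes' theorem gives the posterior probability of class \(y\) given features \(x\):
\[ P(y|x) = \frac{P(x|y)P(y)}{P(x)} \]
The naive assumption treats the features as conditionally independent given the class:
\[ P(x|y)=\prod_i P(x_i|y) \]
Because \(P(x)\) is the same for every class, prediction only needs \(\arg\max_y P(y)\prod_i P(x_i|y)\).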
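As a small worked example of the two formulas, the sketch below computes a posterior for a two-class problem. The class names, feature names, and all probabilities are hypothetical values chosen for illustration; they do not come from any dataset in this notebook.

```python
# Hypothetical priors P(y) and per-feature likelihoods P(x_i | y).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"has_win": 0.5, "has_meeting": 0.1},
    "ham": {"has_win": 0.05, "has_meeting": 0.4},
}

def posterior(features):
    """Return P(y | x) per class under the naive independence assumption."""
    joint = {}
    for label, prior in priors.items():
        p = prior
        for f in features:
            p *= likelihoods[label][f]  # product over the observed features
        joint[label] = p                # P(x | y) * P(y)
    evidence = sum(joint.values())      # P(x), the normalizing constant
    return {label: p / evidence for label, p in joint.items()}

post = posterior(["has_win"])
print(post)
```

With these numbers the joint terms are 0.4 × 0.5 = 0.2 for spam and 0.6 × 0.05 = 0.03 for ham, so the spam posterior is 0.2 / 0.23, roughly 0.87.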

## **2. Gaussian Naive Bayes – From Scratch**

In [2]:
import numpy as np

class GaussianNB:
    def fit(self,X,y):
        # Estimate per-class feature means, variances, and priors.
        self.classes=np.unique(y)
        self.mean={}; self.var={}; self.priors={}
        for c in self.classes:
            Xc=X[y==c]
            self.mean[c]=Xc.mean(axis=0)
            self.var[c]=Xc.var(axis=0)+1e-9  # small constant guards against zero variance
            self.priors[c]=Xc.shape[0]/X.shape[0]

    def log_pdf(self,cls,x):
        # Per-feature log of the Gaussian density, computed directly so that
        # points far from the mean do not end up as log(0).
        mean=self.mean[cls]; var=self.var[cls]
        return -0.5*np.log(2*np.pi*var)-(x-mean)**2/(2*var)

    def predict(self,X):
        preds=[]
        for x in X:
            post=[]
            for c in self.classes:
                prior=np.log(self.priors[c])
                cond=np.sum(self.log_pdf(c,x))  # independence: sum the per-feature logs
                post.append(prior+cond)
            preds.append(self.classes[np.argmax(post)])  # class with the highest score
        return np.array(preds)

# Example
np.random.seed(42)
X=np.vstack([np.random.randn(50,2)+0, np.random.randn(50,2)+3])
y=np.array([0]*50+[1]*50)
model=GaussianNB(); model.fit(X,y)
print("Predictions:",model.predict(X[:5]))

Predictions: [0 0 0 0 0]
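
Taking the log of a density that has already been evaluated can fail for points far from the class mean, because the exponential underflows to zero first. The sketch below contrasts that route with evaluating the log-density in closed form. The helper name `gaussian_log_pdf` and the test point at 50 standard deviations are assumptions made for illustration.

```python
import numpy as np

def gaussian_log_pdf(x, mean, var):
    """Element-wise log of the univariate normal density."""
    return -0.5 * np.log(2.0 * np.pi * var) - (x - mean) ** 2 / (2.0 * var)

x = np.array([0.0, 50.0])      # second value is far from the mean
mean = np.array([0.0, 0.0])
var = np.array([1.0, 1.0])

# Density first, then log: exp(-1250) underflows to 0.0, so the log is -inf.
with np.errstate(divide="ignore"):
    naive = np.log(np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var))

# Closed-form log-density stays finite for the same point.
stable = gaussian_log_pdf(x, mean, var)

print("naive: ", naive)
print("stable:", stable)
```

A score of `-inf` for every class would make `argmax` fall back to the first class regardless of the data, which is why the closed-form version is the safer choice.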


## **3. Multinomial Naive Bayes – For Text Classification**

In [3]:
from collections import defaultdict
import numpy as np

class MultinomialNB:
    def fit(self, X, y):
        # X is a document-by-word count matrix; y holds the class labels.
        self.classes=np.unique(y)
        self.total_words={c:0 for c in self.classes}
        self.word_counts={c:defaultdict(int) for c in self.classes}
        self.priors={}

        for c in self.classes:
            Xc=X[y==c]
            self.priors[c]=len(Xc)/len(X)  # fraction of documents in class c
            for row in Xc:
                for i,count in enumerate(row):
                    self.word_counts[c][i]+=count  # occurrences of word i in class c
                    self.total_words[c]+=count     # all word occurrences in class c

        self.vocab_size=X.shape[1]

    def predict(self, X):
        preds=[]
        for row in X:
            scores={}
            for c in self.classes:
                log_prob=np.log(self.priors[c])
                for i,count in enumerate(row):
                    if count>0:
                        # Laplace (add-one) smoothing: no word gets probability 0.
                        num=self.word_counts[c][i]+1
                        den=self.total_words[c]+self.vocab_size
                        log_prob+=count*np.log(num/den)
                scores[c]=log_prob
            preds.append(max(scores,key=scores.get))  # class with the highest log score
        return np.array(preds)

# Example tiny BOW
X=np.array([[2,1,0],[0,1,3],[3,0,0]])
y=np.array([0,1,0])
mnb=MultinomialNB(); mnb.fit(X,y)
print("Prediction:", mnb.predict(X))

Prediction: [0 1 0]
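
The `+1` in the numerator and `+vocab_size` in the denominator of `predict` are Laplace (add-one) smoothing: \( P(w|c) = \frac{\text{count}(w,c)+1}{N_c+|V|} \), where \(N_c\) is the total word count in class \(c\) and \(|V|\) is the vocabulary size. A minimal sketch of its effect on class 0 of the tiny example above, with variable names chosen here for illustration:

```python
import numpy as np

# Same tiny bag-of-words matrix as the example above; rows are documents.
X = np.array([[2, 1, 0], [0, 1, 3], [3, 0, 0]])
y = np.array([0, 1, 0])

counts = X[y == 0].sum(axis=0)   # word counts in class 0: [5, 1, 0]
total = counts.sum()             # 6 word occurrences in class 0
vocab_size = X.shape[1]          # 3 distinct words

unsmoothed = counts / total                      # third word gets probability 0
smoothed = (counts + 1) / (total + vocab_size)   # add-one estimate: 6/9, 2/9, 1/9

print("unsmoothed:", unsmoothed)
print("smoothed:  ", smoothed)
```

Without smoothing, a single occurrence of the third word would give class 0 a likelihood of zero (a log of `-inf`), no matter what the other words say. The smoothed estimates stay positive and still sum to 1.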


## **4. Simple Spam Filter (Multinomial NB)**

In [4]:
from sklearn.feature_extraction.text import CountVectorizer

docs=["win money now","free prize win","meeting at office","project discussion"]
labels=np.array([1,1,0,0])

cv=CountVectorizer()
X=cv.fit_transform(docs).toarray()

mnb=MultinomialNB()
mnb.fit(X,labels)

test=["win free money","office meeting now"]
Xt=cv.transform(test).toarray()
print("Predictions:", mnb.predict(Xt))

Predictions: [1 0]


## **5. Compare GaussianNB With sklearn**

In [5]:
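from sklearn.naive_bayes import GaussianNB as SkGNB

# X and y now hold the text data from sections 3-4, so rebuild the numeric data from section 2.
np.random.seed(42)
X_num=np.vstack([np.random.randn(50,2)+0, np.random.randn(50,2)+3])
y_num=np.array([0]*50+[1]*50)

sk=SkGNB()
sk.fit(X_num,y_num)
print("Sklearn Pred:",sk.predict(X_num[:5]))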

## **6. Summary**
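- Gaussian NB: continuous numeric features, modeled with a per-class normal distribution
- Multinomial NB: count features such as bag-of-words text
- Laplace smoothing: add 1 to every word count so an unseen word cannot zero out a class
- Log-probabilities: sum logs instead of multiplying probabilities to avoid underflow
- Fast to train and predict, and often a solid baseline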
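
The underflow point can be checked directly. The sketch below multiplies 100 small likelihoods and compares the result with the sum of their logs; the value `1e-5` and the count of 100 are hypothetical choices for illustration.

```python
import math

probs = [1e-5] * 100   # hypothetical per-feature likelihoods

product = 1.0
for p in probs:
    product *= p       # true value is 1e-500, below the float64 range, so this ends at 0.0

log_sum = sum(math.log(p) for p in probs)   # about -1151.29, easily representable

print("product:", product)
print("log sum:", log_sum)
```

Once the product reaches 0.0 for every class the classes can no longer be ranked, whereas the log sums remain distinct and comparable.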
