# Naive Bayes Classifier — Notes

---

## 1. Basic Idea
- Naive Bayes is a **probabilistic classifier** based on **Bayes’ Theorem**.  
- It assumes that all features are **conditionally independent** given the class label (the "naive" assumption).  

**Bayes’ Theorem:**  
\[
P(C \mid X) = \frac{P(X \mid C) \cdot P(C)}{P(X)}
\]  

Where:  
- \( C \) = class (e.g., spam / not spam)  
- \( X \) = features (e.g., word frequencies)  
- \( P(C \mid X) \) = posterior probability of the class given the features (what we want)  
- \( P(X \mid C) \) = likelihood (probability of the features given the class)  
- \( P(C) \) = prior probability of the class  
- \( P(X) \) = marginal probability of the features (the evidence); it is the same for every class, so it can be dropped when comparing classes  
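
Under the conditional-independence assumption, the likelihood factorizes over the individual features. Writing \( X = (x_1, \dots, x_n) \) (the indexing by \( n \) features is notation introduced here for illustration) and dropping \( P(X) \), the predicted class \( \hat{C} \) is the one with the largest unnormalized posterior:

\[
P(X \mid C) = \prod_{i=1}^{n} P(x_i \mid C)
\quad\Longrightarrow\quad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
\]

In practice the product is usually evaluated as a sum of logarithms, \( \log P(C) + \sum_{i} \log P(x_i \mid C) \), to avoid numerical underflow when \( n \) is large.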

---

## 2. Intuition
- Imagine classifying emails as spam or not spam.  
- If certain words (like *free*, *money*, *win*) appear often in spam, Naive Bayes will assign higher probability to the spam class when those words are present.  
- Even though words are not truly independent, assuming that they are keeps the math simple, and the resulting classifier works surprisingly well in practice.  
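
The intuition above can be sketched with a tiny hand-computed example. All numbers below (the priors and the per-word likelihoods) are hypothetical values chosen for illustration, not estimates from any dataset, and the helper `posterior` is a made-up name. For simplicity only the words that are present in the email are scored.

```python
import math

# Hypothetical priors and likelihoods, chosen only for illustration.
prior = {"spam": 0.4, "ham": 0.6}
# P(word appears | class) for each word.
likelihood = {
    "spam": {"free": 0.30, "money": 0.20, "win": 0.10},
    "ham": {"free": 0.05, "money": 0.02, "win": 0.01},
}


def posterior(words):
    """Return P(class | words) under the naive independence assumption."""
    # Unnormalized log score: log P(C) + sum of log P(word | C).
    log_score = {
        c: math.log(prior[c]) + sum(math.log(likelihood[c][w]) for w in words)
        for c in prior
    }
    # Normalize so the scores sum to 1 (this plays the role of dividing by P(X)).
    largest = max(log_score.values())
    unnorm = {c: math.exp(s - largest) for c, s in log_score.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


result = posterior(["free", "money"])
print(result)
```

With these made-up numbers the spam score is 0.4 × 0.30 × 0.20 = 0.024 and the ham score is 0.6 × 0.05 × 0.02 = 0.0006, so the posterior probability of spam is 0.024 / 0.0246, roughly 0.976.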

---

## 3. Types of Naive Bayes
1. **GaussianNB** – for continuous features (assumes each feature is normally distributed within each class).  
2. **MultinomialNB** – for count data (e.g., word frequencies).  
3. **BernoulliNB** – for binary features (e.g., word present or not).  
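
To make the Gaussian variant concrete, here is a minimal from-scratch sketch in plain Python. It is not scikit-learn's implementation: the function names `fit_gaussian_nb` and `predict_gaussian_nb`, the toy data, and the small constant `eps` are all assumptions made for illustration (scikit-learn's `var_smoothing` instead adds a fraction of the largest feature variance).

```python
import math


def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate, per class, the prior and each feature's mean and variance."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        n_features = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        # eps keeps the variance strictly positive (hypothetical simplification).
        variances = [
            sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + eps
            for j in range(n_features)
        ]
        model[c] = (len(rows) / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log prior + summed Gaussian log-likelihood."""

    def log_posterior(c):
        prior, means, variances = model[c]
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
            for xj, m, v in zip(x, means, variances)
        )
        return math.log(prior) + log_lik

    return max(model, key=log_posterior)


# Toy data: two well-separated clusters with two continuous features each.
X_toy = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y_toy = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X_toy, y_toy)
print(predict_gaussian_nb(model, [1.1, 2.0]), predict_gaussian_nb(model, [5.0, 6.1]))
```

The multinomial and Bernoulli variants follow the same pattern and differ only in the per-feature likelihood term: count-based probabilities for the former, presence/absence probabilities for the latter.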

---

## 4. Advantages
- Fast to train and predict.  
- Works well on small datasets.  
- Handles high-dimensional data (e.g., text classification).  
- Simple and interpretable.  
- Often performs surprisingly well despite the independence assumption.  

---

## 5. Disadvantages
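- Assumes the features are independent given the class, which is rarely true in real data.  
- GaussianNB assumes continuous features are normally distributed within each class; predictions degrade when they are not.  
- Struggles when features are highly correlated, because correlated evidence is effectively counted more than once.  
- Probability estimates can be poor for rare events: a feature value never seen with a class gets probability zero, which zeroes out the whole product unless smoothing (e.g., Laplace smoothing) is applied.  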
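
The effect of Laplace (additive) smoothing can be shown in a few lines. This is a sketch with hypothetical counts; the function name `smoothed_prob` and the numbers are made up for illustration. The default `alpha=1.0` mirrors the `alpha` value that `MultinomialNB` reports later in this notebook.

```python
def smoothed_prob(count, total, n_values, alpha=1.0):
    """Additive smoothing: (count + alpha) / (total + alpha * n_values)."""
    return (count + alpha) / (total + alpha * n_values)


# Hypothetical setting: a word never appeared among 100 spam word tokens,
# and the vocabulary has 50 distinct words.
unsmoothed = 0 / 100                  # zero probability wipes out the product
smoothed = smoothed_prob(0, 100, 50)  # small but non-zero: 1 / 150

# Smoothed estimates over all possible values still sum to 1.
counts = [3, 0, 7]
probs = [smoothed_prob(c, sum(counts), len(counts)) for c in counts]
print(unsmoothed, smoothed, probs)
```

Larger values of `alpha` pull the estimates further toward a uniform distribution; `alpha=0` recovers the raw frequencies.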

---

## 6. Common Use Cases
- Spam detection (spam vs not spam).  
- Text classification (news categorization, sentiment analysis).  
- Medical diagnosis (probability of disease given symptoms).  
- Recommendation systems (e.g., predicting user preferences).  

---

## 7. Summary
Naive Bayes is a simple yet powerful baseline model for classification, especially in text and document classification problems. While it may not always be the most accurate model, it is computationally efficient, interpretable, and often surprisingly competitive.  

---


In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

In [8]:
headers = [
    "word_freq_make", "word_freq_address", "word_freq_all", "word_freq_3d",
    "word_freq_our", "word_freq_over", "word_freq_remove", "word_freq_internet",
    "word_freq_order", "word_freq_mail", "word_freq_receive", "word_freq_will",
    "word_freq_people", "word_freq_report", "word_freq_addresses", "word_freq_free",
    "word_freq_business", "word_freq_email", "word_freq_you", "word_freq_credit",
    "word_freq_your", "word_freq_font", "word_freq_000", "word_freq_money",
    "word_freq_hp", "word_freq_hpl", "word_freq_george", "word_freq_650",
    "word_freq_lab", "word_freq_labs", "word_freq_telnet", "word_freq_857",
    "word_freq_data", "word_freq_415", "word_freq_85", "word_freq_technology",
    "word_freq_1999", "word_freq_parts", "word_freq_pm", "word_freq_direct",
    "word_freq_cs", "word_freq_meeting", "word_freq_original", "word_freq_project",
    "word_freq_re", "word_freq_edu", "word_freq_table", "word_freq_conference",
    "char_freq_;", "char_freq_(", "char_freq_[", "char_freq_!", "char_freq_$",
    "char_freq_#", "capital_run_length_average", "capital_run_length_longest",
    "capital_run_length_total", "class"
]



In [9]:
df = pd.read_csv('spambase.data', encoding='latin1', names=headers)

In [10]:
df.head()

Unnamed: 0,word_freq_make,word_freq_address,word_freq_all,word_freq_3d,word_freq_our,word_freq_over,word_freq_remove,word_freq_internet,word_freq_order,word_freq_mail,word_freq_receive,word_freq_will,word_freq_people,word_freq_report,word_freq_addresses,word_freq_free,word_freq_business,word_freq_email,word_freq_you,word_freq_credit,word_freq_your,word_freq_font,word_freq_000,word_freq_money,word_freq_hp,word_freq_hpl,word_freq_george,word_freq_650,word_freq_lab,word_freq_labs,word_freq_telnet,word_freq_857,word_freq_data,word_freq_415,word_freq_85,word_freq_technology,word_freq_1999,word_freq_parts,word_freq_pm,word_freq_direct,word_freq_cs,word_freq_meeting,word_freq_original,word_freq_project,word_freq_re,word_freq_edu,word_freq_table,word_freq_conference,char_freq_;,char_freq_(,char_freq_[,char_freq_!,char_freq_$,char_freq_#,capital_run_length_average,capital_run_length_longest,capital_run_length_total,class
0,0.0,0.64,0.64,0.0,0.32,0.0,0.0,0.0,0.0,0.0,0.0,0.64,0.0,0.0,0.0,0.32,0.0,1.29,1.93,0.0,0.96,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.778,0.0,0.0,3.756,61,278,1
1,0.21,0.28,0.5,0.0,0.14,0.28,0.21,0.07,0.0,0.94,0.21,0.79,0.65,0.21,0.14,0.14,0.07,0.28,3.47,0.0,1.59,0.0,0.43,0.43,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.132,0.0,0.372,0.18,0.048,5.114,101,1028,1
2,0.06,0.0,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,0.38,0.45,0.12,0.0,1.75,0.06,0.06,1.03,1.36,0.32,0.51,0.0,1.16,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.12,0.0,0.06,0.06,0.0,0.0,0.01,0.143,0.0,0.276,0.184,0.01,9.821,485,2259,1
3,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,0.31,0.31,0.31,0.0,0.0,0.31,0.0,0.0,3.18,0.0,0.31,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.137,0.0,0.137,0.0,0.0,3.537,40,191,1
4,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,0.31,0.31,0.31,0.0,0.0,0.31,0.0,0.0,3.18,0.0,0.31,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.135,0.0,0.135,0.0,0.0,3.537,40,191,1


In [4]:
df.shape

(4600, 58)

In [12]:
df.isnull().sum()

word_freq_make                0
word_freq_address             0
word_freq_all                 0
word_freq_3d                  0
word_freq_our                 0
word_freq_over                0
word_freq_remove              0
word_freq_internet            0
word_freq_order               0
word_freq_mail                0
word_freq_receive             0
word_freq_will                0
word_freq_people              0
word_freq_report              0
word_freq_addresses           0
word_freq_free                0
word_freq_business            0
word_freq_email               0
word_freq_you                 0
word_freq_credit              0
word_freq_your                0
word_freq_font                0
word_freq_000                 0
word_freq_money               0
word_freq_hp                  0
word_freq_hpl                 0
word_freq_george              0
word_freq_650                 0
word_freq_lab                 0
word_freq_labs                0
word_freq_telnet              0
word_fre

In [13]:
X = df.iloc[:, :-1]  # all columns except the last: the 57 features
y = df.iloc[:, -1]   # last column: the class label

In [14]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

In [15]:
from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB() 

gnb.fit(X_train, y_train) 

GaussianNB()
Parameters: priors=None, var_smoothing=1e-09


In [16]:
y_pred = gnb.predict(X_test)

In [17]:
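from sklearn.metrics import accuracy_score, classification_report

# Note: scikit-learn expects (y_true, y_pred). These calls pass (y_pred, y_test),
# which leaves accuracy unchanged but swaps the precision and recall columns and
# makes `support` count predicted labels in the report printed below.
print(accuracy_score(y_pred, y_test))
print(classification_report(y_pred, y_test))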

0.8124547429398986
              precision    recall  f1-score   support

           0       0.73      0.95      0.82       631
           1       0.94      0.70      0.80       750

    accuracy                           0.81      1381
   macro avg       0.83      0.82      0.81      1381
weighted avg       0.84      0.81      0.81      1381
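
When reading the report above, keep in mind that it was produced with the arguments in the order `(y_pred, y_test)`, whereas scikit-learn documents the order as `(y_true, y_pred)`. The sketch below uses a hand-written helper (`precision_recall` is a hypothetical name, and the labels are toy values) to show what that reversal does: precision and recall trade places.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for one class, computed from raw label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    predicted_pos = sum(1 for p in y_pred if p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall


# Toy labels, chosen only for illustration.
y_true_toy = [1, 1, 1, 1, 0, 0]
y_hat_toy = [1, 1, 0, 0, 0, 1]

correct_order = precision_recall(y_true_toy, y_hat_toy)  # (precision, recall)
swapped_order = precision_recall(y_hat_toy, y_true_toy)  # the two values trade places
print(correct_order, swapped_order)
```

Applied to the report above, this means the column labelled `precision` holds each class's recall, the column labelled `recall` holds its precision, and `support` counts predictions rather than true labels. The F1 score and accuracy are unaffected by the swap.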



In [18]:
from sklearn.naive_bayes import MultinomialNB

mnb = MultinomialNB() 

mnb.fit(X_train, y_train)

MultinomialNB()
Parameters: alpha=1.0, force_alpha=True, fit_prior=True, class_prior=None


In [19]:
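y_pred1 = mnb.predict(X_test)
# As in the GaussianNB cell, the arguments are passed as (y_pred, y_true), so the
# precision and recall columns below are swapped and `support` counts predictions.
print(accuracy_score(y_pred1, y_test))
print(classification_report(y_pred1, y_test))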

0.8095582910934106
              precision    recall  f1-score   support

           0       0.85      0.83      0.84       839
           1       0.75      0.77      0.76       542

    accuracy                           0.81      1381
   macro avg       0.80      0.80      0.80      1381
weighted avg       0.81      0.81      0.81      1381

