Q1. A company conducted a survey of its employees and found that 70% of the employees use the
company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the
probability that an employee is a smoker given that he/she uses the health insurance plan?

### Solution Using Conditional Probability  


Given Data:  
- P(Uses Insurance) = 0.70  
- P(Smoker | Uses Insurance) = 0.40  

#### Using Conditional Probability Formula:  
$$P(\text{Smoker} \mid \text{Uses Insurance}) = \frac{P(\text{Smoker} \cap \text{Uses Insurance})}{P(\text{Uses Insurance})}$$

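The problem statement already gives this conditional probability: "40% of the employees who use the plan are smokers" is P(Smoker | Uses Insurance) itself. The 70% figure is only needed to recover the joint probability, P(Smoker ∩ Uses Insurance) = 0.40 × 0.70 = 0.28, and substituting it into the formula gives 0.28 / 0.70 = **0.40**:  
P(Smoker | Uses Insurance) = 0.40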

#### Final Answer:  
**The probability that an employee is a smoker given that they use the health insurance plan is 0.40 (or 40%). ✅**  
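
As a quick numeric check, the short sketch below recomputes the joint probability with the multiplication rule and then divides it back out. The variable names are illustrative; the two input values come directly from the question.

```python
# Values taken from the problem statement.
p_insurance = 0.70          # P(Uses Insurance)
p_smoker_given_ins = 0.40   # P(Smoker | Uses Insurance)

# Multiplication rule: P(Smoker ∩ Uses Insurance) = P(Smoker | Uses) * P(Uses)
p_joint = p_smoker_given_ins * p_insurance

# Dividing the joint probability by P(Uses Insurance) recovers the conditional.
p_recovered = p_joint / p_insurance

print(f"P(Smoker and Uses Insurance) = {p_joint:.2f}")
print(f"P(Smoker | Uses Insurance)   = {p_recovered:.2f}")
```

The recovered value matches the given 0.40, which confirms that the 70% figure does not change the answer.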


Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

### Difference Between Bernoulli Naïve Bayes and Multinomial Naïve Bayes  

| Feature                | Bernoulli Naïve Bayes | Multinomial Naïve Bayes |
|------------------------|---------------------- |-------------------------|
| Data Type              | Binary (0 or 1)       | Count-based (integer frequencies) |
| Use Case               | Text classification with word presence/absence | Text classification with word frequency |
| Feature Representation | Boolean (Word exists or not) | Counts of words in a document |
| Example Application    | Spam detection, fraud detection | Sentiment analysis, topic classification |
| Assumption             | Each feature is a binary (Bernoulli) variable, conditionally independent given the class | Feature counts follow a multinomial distribution given the class |

✅ **Key Takeaway:**  
- **Use BernoulliNB** when dealing with binary (yes/no) features.  
- **Use MultinomialNB** when working with word frequencies or count-based features.  
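
The contrast can be sketched on a tiny, made-up word-count matrix (the data and the names `X_counts`, `X_binary`, and `new_doc` are hypothetical). The sketch assumes scikit-learn's `BernoulliNB`, whose `binarize` parameter defaults to 0.0, so fitting it on counts is equivalent to fitting it on presence/absence flags, while `MultinomialNB` uses the counts themselves.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical word-count matrix: rows are documents, columns are words.
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 2, 2],
])
y = np.array([0, 0, 1, 1])

# Presence/absence view of the same documents.
X_binary = (X_counts > 0).astype(int)

# BernoulliNB thresholds inputs at `binarize` (0.0 by default),
# so it learns the same parameters from counts and from binary flags.
bnb_counts = BernoulliNB().fit(X_counts, y)
bnb_binary = BernoulliNB().fit(X_binary, y)

# MultinomialNB models how often each word occurs.
mnb = MultinomialNB().fit(X_counts, y)

new_doc = np.array([[5, 0, 1]])
print("BernoulliNB  :", bnb_counts.predict(new_doc))
print("MultinomialNB:", mnb.predict(new_doc))
print("Same Bernoulli parameters:",
      np.allclose(bnb_counts.feature_log_prob_, bnb_binary.feature_log_prob_))
```

On data this small both models agree; the difference shows up when how often a word occurs carries information beyond whether it occurs at all.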


Q3. How does Bernoulli Naive Bayes handle missing values?

### How Bernoulli Naïve Bayes Handles Missing Values  

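1. **Treating Missing as Absent (0)**  
   - Bernoulli Naïve Bayes works with binary features (0 or 1), so a common convention is to encode a missing value as **0** (absence of the feature). This is a preprocessing choice rather than something the algorithm does on its own; scikit-learn's `BernoulliNB`, for example, raises an error on `NaN` input.  

2. **Imputation Strategies**  
   - If a significant share of values is missing, common imputation techniques include:  
     - **Mode Imputation**: Replacing missing values with the most frequent value (0 or 1).  
     - **Mean/Median Imputation**: Using the average presence rate of a feature across samples. The imputed value must then be rounded or thresholded back to 0 or 1, because the model expects binary input.  

3. **Ignoring Missing Values (During Training)**  
   - Because Naïve Bayes estimates each feature's probability separately for each class, a custom implementation can estimate a feature's probability from only the samples where that feature is observed. This is not built into scikit-learn's `BernoulliNB`.  

✅ **Key Takeaway:**  
- BernoulliNB has no built-in handling of missing values. Encode them as absent (0) or impute them before training.  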
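
A minimal sketch of the first two strategies, assuming scikit-learn's `SimpleImputer` and `BernoulliNB` combined in a pipeline. The toy matrix and the names `zero_fill`, `mode_fill`, and `sample` are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical binary features with some missing entries.
X = np.array([
    [1, 0, np.nan],
    [1, np.nan, 0],
    [0, 1, 1],
    [np.nan, 1, 1],
])
y = np.array([0, 0, 1, 1])

# Fitting directly on NaN is rejected by scikit-learn's input validation.
try:
    BernoulliNB().fit(X, y)
    raised = False
except ValueError:
    raised = True
print("Direct fit on NaN raised ValueError:", raised)

# Strategy 1: treat missing as absent (fill with 0).
zero_fill = make_pipeline(
    SimpleImputer(strategy="constant", fill_value=0),
    BernoulliNB(),
)
zero_fill.fit(X, y)

# Strategy 2: mode imputation (fill with each column's most frequent value).
mode_fill = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    BernoulliNB(),
)
mode_fill.fit(X, y)

# The pipeline applies the same imputation to new samples at prediction time.
sample = np.array([[1, 0, np.nan]])
print("Zero-fill prediction:", zero_fill.predict(sample))
print("Mode-fill prediction:", mode_fill.predict(sample))
```

Keeping the imputer inside the pipeline means the fill values are learned from the training data only and reused unchanged for new samples.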


Q4. Can Gaussian Naive Bayes be used for multi-class classification?

### Can Gaussian Naïve Bayes Be Used for Multi-Class Classification?  

✅ **Yes, Gaussian Naïve Bayes (GNB) can be used for multi-class classification.**  

### How It Works:  
- GaussianNB assumes that, within each class, every feature follows a **normal (Gaussian) distribution**, and it estimates a separate mean and variance per feature and class.  
- It calculates the probability for each class using Bayes' theorem and assigns the instance to the class with the highest probability.  
- The classifier supports **multiple classes (not just binary classification).**  

### Example Use Cases:  
- **Iris Classification** (classifying flowers into different species).  
- **Handwritten Digit Recognition** (classifying digits 0-9).  

### Key Takeaway:  
- **GaussianNB supports multi-class classification** by computing probabilities for all classes and selecting the most likely one.  
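
The Iris use case can be sketched as follows, assuming scikit-learn's bundled `load_iris` dataset (three species). The split ratio and `random_state` are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Iris has three classes, so this is a multi-class problem.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

gnb = GaussianNB().fit(X_train, y_train)

# predict_proba returns one posterior probability per class for each sample.
proba = gnb.predict_proba(X_test)
y_pred = gnb.predict(X_test)

print("Classes:", gnb.classes_)
print("Probability matrix shape:", proba.shape)
print("Test accuracy:", gnb.score(X_test, y_test))
```

Each row of `proba` sums to 1 across the three classes, and `predict` returns the class with the largest value in that row.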


Q5. Assignment:

In [1]:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.preprocessing import StandardScaler

# Load the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
columns = [f'feature_{i}' for i in range(57)] + ['label']
data = pd.read_csv(url, header=None, names=columns)

# Split features and target
X = data.iloc[:, :-1]  # Features
y = data.iloc[:, -1]    # Labels (Spam or Not Spam)

# Scaler for GaussianNB; it is fit on the training split only, inside the loop below
scaler = StandardScaler()

# Splitting data into training and test sets (80-20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Define models
models = {
    "Bernoulli Naïve Bayes": BernoulliNB(),
    "Multinomial Naïve Bayes": MultinomialNB(),
    "Gaussian Naïve Bayes": GaussianNB()
}

# Evaluate each model on the held-out test set and with 10-fold cross-validation
results = {}

for name, model in models.items():
    if name == "Gaussian Naïve Bayes":
        X_train_fit = scaler.fit_transform(X_train)
        X_test_fit = scaler.transform(X_test)
    else:
        X_train_fit, X_test_fit = X_train, X_test

    # Train and test the model
    model.fit(X_train_fit, y_train)
    y_pred = model.predict(X_test_fit)

    # Compute evaluation metrics
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    
    # Mean 10-fold cross-validation accuracy on the full, unscaled feature matrix
    cross_val = cross_val_score(model, X, y, cv=10, scoring='accuracy').mean()

    # Store results
    results[name] = {
        "Accuracy": accuracy,
        "Precision": precision,
        "Recall": recall,
        "F1 Score": f1,
        "Cross-Validation Accuracy": cross_val
    }

# Display results
print("Performance of Naïve Bayes Classifiers on Spambase Dataset:\n")
for name, metrics in results.items():
    print(f"--- {name} ---")
    for metric, value in metrics.items():
        print(f"{metric}: {value:.4f}")
    print("\n")


Performance of Naïve Bayes Classifiers on Spambase Dataset:

--- Bernoulli Naïve Bayes ---
Accuracy: 0.8762
Precision: 0.8716
Recall: 0.8044
F1 Score: 0.8367
Cross-Validation Accuracy: 0.8839


--- Multinomial Naïve Bayes ---
Accuracy: 0.7763
Precision: 0.7199
Recall: 0.7080
F1 Score: 0.7139
Cross-Validation Accuracy: 0.7863


--- Gaussian Naïve Bayes ---
Accuracy: 0.8328
Precision: 0.7146
Recall: 0.9587
F1 Score: 0.8188
Cross-Validation Accuracy: 0.8218
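
In the cell above, the cross-validation score is computed on the unscaled features, while the hold-out score for GaussianNB uses scaled ones. One way to apply the same preprocessing in both is to wrap the scaler and the classifier in a pipeline, so the scaler is refit on the training part of every fold. The sketch below illustrates this on synthetic data from `make_classification` as a stand-in for Spambase; it is not the code that produced the results above, and the names `X_demo`, `y_demo`, and `scaled_gnb` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 57 features and a binary label (not the real Spambase data).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=57, n_informative=10, random_state=42
)

# The pipeline refits StandardScaler on the training part of each fold,
# so no statistics from the validation part leak into preprocessing.
scaled_gnb = make_pipeline(StandardScaler(), GaussianNB())

scores = cross_val_score(scaled_gnb, X_demo, y_demo, cv=10, scoring="accuracy")
print(f"Mean 10-fold accuracy: {scores.mean():.4f}")
```

The same pattern would apply to the real dataset by passing `X` and `y` in place of the synthetic arrays.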


