
### Q1. Probability Calculation

Given:
- \( P(H) = 0.7 \): probability that an employee uses the health insurance plan
- \( P(S | H) = 0.4 \): probability that an employee is a smoker, given that they use the health insurance plan

The quantity we need, \( P(S | H) \), is exactly the conditional probability stated in the problem, so Bayes' theorem is not required and \( P(H) \) does not enter the answer.

So, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.4, or 40%.
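
As a quick sanity check, the short sketch below uses the definition \( P(S | H) = P(S \cap H) / P(H) \) to compute the implied joint probability and then recover the conditional. The variable names are illustrative only.

```python
# Values stated in the problem
p_h = 0.7          # P(H): employee uses the health insurance plan
p_s_given_h = 0.4  # P(S | H): employee is a smoker, given plan use

# Joint probability implied by the definition P(S | H) = P(S and H) / P(H)
p_s_and_h = p_s_given_h * p_h

# Dividing the joint by P(H) recovers the conditional we started with
p_s_given_h_check = p_s_and_h / p_h

print(f"P(S and H) = {p_s_and_h:.2f}")
print(f"P(S | H)   = {p_s_given_h_check:.2f}")
```

The joint probability comes out to 0.28, and dividing it by \( P(H) \) returns the stated 0.40.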

### Q2. Difference Between Bernoulli Naive Bayes and Multinomial Naive Bayes

**Bernoulli Naive Bayes:**
- Used for binary/boolean features (e.g., whether a word occurs in a document or not).
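- Each feature is assumed to be binary (0 or 1) and conditionally independent of the other features given the class.
- Explicitly models the absence of a feature, so a word that does not occur in a document also counts as evidence.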

**Multinomial Naive Bayes:**
- Used for discrete data (e.g., word counts or frequencies).
- Suitable for text classification problems where the input data is represented as word counts or term frequencies.
- Each feature represents the number of occurrences of a word or token in the document.
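
To make the contrast concrete, here is a minimal sketch on a small, made-up word-count matrix (the data and variable names are hypothetical). It relies on scikit-learn's `feature_count_` attribute and on `BernoulliNB`'s default `binarize=0.0`, which maps any count above zero to 1.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical word-count matrix: 4 documents x 3 vocabulary words
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 1, 2],
])
y = np.array([0, 0, 1, 1])

mnb = MultinomialNB().fit(X_counts, y)
bnb = BernoulliNB().fit(X_counts, y)  # default binarize=0.0 maps counts > 0 to 1

# MultinomialNB accumulates total word counts per class
print(mnb.feature_count_)
# BernoulliNB counts how many documents per class contain each word
print(bnb.feature_count_)

# Fitting BernoulliNB on explicit presence/absence features gives the same model
bnb_binary = BernoulliNB().fit((X_counts > 0).astype(int), y)
same_model = np.allclose(bnb.feature_log_prob_, bnb_binary.feature_log_prob_)
print(same_model)
```

The Multinomial model keeps how often each word appears, while the Bernoulli model only keeps whether it appears at all.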

### Q3. How Bernoulli Naive Bayes Handles Missing Values

Bernoulli Naive Bayes does not handle missing values inherently, so they need to be imputed before the classifier is applied. Because the features are binary, filling in the mode (most frequent value) of each feature is the natural choice: it keeps the feature binary, whereas the mean generally does not and would have to be thresholded afterward. More advanced options include k-nearest neighbors or regression imputation.
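
One way this could look in scikit-learn is a pipeline that imputes with `SimpleImputer(strategy="most_frequent")` before `BernoulliNB`. This is a sketch on a small, made-up binary matrix, not the only reasonable approach.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical binary features with a few missing entries (np.nan)
X = np.array([
    [1, 0, np.nan],
    [1, np.nan, 0],
    [1, 0, 0],
    [0, 1, 1],
    [np.nan, 1, 1],
    [0, 1, 1],
], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# Impute each column with its most frequent value, then fit the classifier
model = make_pipeline(SimpleImputer(strategy="most_frequent"), BernoulliNB())
model.fit(X, y)

# Inspect what the imputer learned and confirm no missing values remain
imputer = model.named_steps["simpleimputer"]
X_imputed = imputer.transform(X)
preds = model.predict(X)

print(imputer.statistics_)  # per-column fill values
print(np.isnan(X_imputed).any())
```

Keeping the imputer inside the pipeline means the fill values are learned from the training data only, which matters once cross-validation is involved.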

### Q4. Can Gaussian Naive Bayes Be Used for Multi-Class Classification?

Yes, Gaussian Naive Bayes can be used for multi-class classification. It models each class's feature distribution as a Gaussian (normal) distribution and applies Bayes' theorem to predict the class of new instances. Scikit-learn's `GaussianNB` supports multi-class classification out-of-the-box.
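
As an illustration, the sketch below fits `GaussianNB` on the three-class Iris dataset bundled with scikit-learn. The dataset choice and split settings are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Iris has three classes and four continuous features
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = GaussianNB().fit(X_train, y_train)

# One row of per-feature means is learned for each class
print(clf.classes_)
print(clf.theta_.shape)

# predict_proba returns one posterior probability per class
proba = clf.predict_proba(X_test)
accuracy = clf.score(X_test, y_test)
print(proba.shape)
print(f"Test accuracy: {accuracy:.3f}")
```

No one-vs-rest wrapper is needed: the classifier estimates a separate Gaussian per class and feature, then picks the class with the highest posterior.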

### Q5. Assignment: Naive Bayes Classifiers on Spambase Data Set

#### Data Preparation:
Download the "Spambase Data Set" from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Spambase).

#### Implementation:

```python
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
data = pd.read_csv(url, header=None)

# Split data into features and labels
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

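# Standardize features for GaussianNB only; MultinomialNB requires non-negative
# inputs and BernoulliNB binarizes at 0, so both are trained on the raw features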
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Initialize classifiers
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

# Train classifiers
bernoulli_nb.fit(X_train, y_train)
multinomial_nb.fit(X_train, y_train)
gaussian_nb.fit(X_train_scaled, y_train)

# Predictions
y_pred_bernoulli = bernoulli_nb.predict(X_test)
y_pred_multinomial = multinomial_nb.predict(X_test)
y_pred_gaussian = gaussian_nb.predict(X_test_scaled)

# Performance metrics (spam, label 1, is treated as the positive class)
def print_metrics(y_true, y_pred):
    print(f"Accuracy: {accuracy_score(y_true, y_pred):.4f}")
    print(f"Precision: {precision_score(y_true, y_pred):.4f}")
    print(f"Recall: {recall_score(y_true, y_pred):.4f}")
    print(f"F1 Score: {f1_score(y_true, y_pred):.4f}")

print("Bernoulli Naive Bayes:")
print_metrics(y_test, y_pred_bernoulli)

print("\nMultinomial Naive Bayes:")
print_metrics(y_test, y_pred_multinomial)

print("\nGaussian Naive Bayes:")
print_metrics(y_test, y_pred_gaussian)

# Hyperparameter tuning using GridSearchCV
from sklearn.model_selection import GridSearchCV

param_grid = {
    'alpha': [0.1, 0.5, 1.0, 5.0, 10.0]
}

grid_search_bernoulli = GridSearchCV(BernoulliNB(), param_grid, cv=10, scoring='f1')
grid_search_multinomial = GridSearchCV(MultinomialNB(), param_grid, cv=10, scoring='f1')

grid_search_bernoulli.fit(X_train, y_train)
grid_search_multinomial.fit(X_train, y_train)

print("Best parameters for BernoulliNB:", grid_search_bernoulli.best_params_)
print("Best parameters for MultinomialNB:", grid_search_multinomial.best_params_)

# GridSearchCV already refit the best model on the full training set (refit=True by default)
bernoulli_nb_best = grid_search_bernoulli.best_estimator_
multinomial_nb_best = grid_search_multinomial.best_estimator_

y_pred_bernoulli_best = bernoulli_nb_best.predict(X_test)
y_pred_multinomial_best = multinomial_nb_best.predict(X_test)

print("Tuned Bernoulli Naive Bayes:")
print_metrics(y_test, y_pred_bernoulli_best)

print("Tuned Multinomial Naive Bayes:")
print_metrics(y_test, y_pred_multinomial_best)
```
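
The metrics above come from a single 80/20 split, which can be noisy. A minimal sketch of scoring the three classifiers with 10-fold cross-validation via `cross_val_score` follows. To stay runnable offline it uses a synthetic, non-negative stand-in matrix (`X_demo`, `y_demo`, both hypothetical); in the actual assignment these would be replaced by the Spambase `X` and `y`.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Synthetic stand-in for Spambase (assumption): non-negative count-like features
rng = np.random.default_rng(0)
n_samples, n_features = 400, 10
y_demo = rng.integers(0, 2, size=n_samples)
rates = np.where(y_demo[:, None] == 1, 3.0, 1.0) * np.ones((1, n_features))
X_demo = rng.poisson(rates)

models = {
    "BernoulliNB": BernoulliNB(),
    "MultinomialNB": MultinomialNB(),
    "GaussianNB": GaussianNB(),
}

# With an integer cv and a classifier, scikit-learn uses stratified folds
cv_f1 = {
    name: cross_val_score(model, X_demo, y_demo, cv=10, scoring="f1")
    for name, model in models.items()
}

for name, scores in cv_f1.items():
    print(f"{name}: mean F1 = {scores.mean():.3f} (std {scores.std():.3f})")
```

Reporting the mean and spread across folds gives a steadier basis for comparing the three variants than one held-out split.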

#### Results:

Report the performance metrics (Accuracy, Precision, Recall, F1 score) for each classifier.

#### Discussion:

Discuss the results obtained, comparing the performance of Bernoulli, Multinomial, and Gaussian Naive Bayes classifiers. Explain why one variant might perform better than the others, considering the nature of the dataset and the assumptions each model makes. Discuss any limitations observed in Naive Bayes classifiers, such as handling correlated features or assuming feature independence.

#### Conclusion:

Summarize your findings and provide suggestions for future work. Consider experimenting with different preprocessing techniques, feature engineering, or trying other classifiers to further improve performance.

