# Naive Bayes
Naive Bayes is a family of probabilistic classifiers based on Bayes' theorem. The "naive" part is the assumption that the features are conditionally independent of one another given the class. It is particularly popular for text classification tasks such as spam detection and sentiment analysis.

### Naive Bayes Overview:
1. **Bayes' Theorem**:  
   \[
   P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
   \]
   In the context of classification, we use:
   \[
   P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}
   \]
   Where:
   - \( P(C|X) \) is the posterior probability of class \( C \) given the feature set \( X \).
   - \( P(X|C) \) is the likelihood of feature set \( X \) given the class \( C \).
   - \( P(C) \) is the prior probability of class \( C \).
   - \( P(X) \) is the evidence, i.e. the marginal probability of the feature set \( X \). It can be ignored for classification because it is the same for every class and therefore does not change which class scores highest.

2. **Types of Naive Bayes**:
   - **Gaussian Naive Bayes**: Assumes that the features follow a normal distribution.
   - **Multinomial Naive Bayes**: Used for discrete features like word counts.
   - **Bernoulli Naive Bayes**: Used for binary/boolean features.
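
To make the formula concrete, here is a minimal sketch that applies Bayes' theorem by hand to a two-class spam filter. The priors and per-word likelihoods are hypothetical numbers chosen only for illustration, and `posterior` is an illustrative helper name, not part of any library.

```python
import math

# Hypothetical prior probabilities P(C) for each class.
priors = {"spam": 0.4, "ham": 0.6}

# Hypothetical likelihoods P(word | C); under the naive assumption
# these are multiplied together for all observed words.
likelihoods = {
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham": {"free": 0.2, "meeting": 0.5},
}

def posterior(words):
    """Return P(C | X) for each class C, given the observed words X."""
    # Numerator of Bayes' theorem: P(C) * product of P(x_i | C).
    scores = {
        c: priors[c] * math.prod(likelihoods[c][w] for w in words)
        for c in priors
    }
    # Denominator P(X): the same for every class, so it only normalizes.
    evidence = sum(scores.values())
    return {c: s / evidence for c, s in scores.items()}

post = posterior(["free"])
print(post)
```

With only the word "free" observed, the spam score is 0.4 × 0.8 = 0.32 and the ham score is 0.6 × 0.2 = 0.12, so the spam posterior is 0.32 / 0.44, about 0.73. Dividing by the evidence changes the numbers but not the winning class.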

### Explanation:
The code examples later in this document use the following pieces:
- **Train-test split**: Divides the dataset into training (70%) and testing (30%) sets (`test_size=0.3`).
- **GaussianNB**: A Naive Bayes classifier that assumes each feature is normally distributed within each class.
- **Accuracy**: The ratio of correct predictions to total predictions.
- **Confusion Matrix**: A table of actual classes (rows) against predicted classes (columns). In the binary case its four cells are the true negatives, false positives, false negatives, and true positives.
- **Classification Report**: Includes precision, recall, F1-score, and support (the number of true samples) for each class.

### Parameters

#### **1. `alpha`**

- **What It Does**: Adds `alpha` to every feature count in every class (additive, also called Laplace or Lidstone, smoothing). It is a parameter of `MultinomialNB` and `BernoulliNB` and defaults to `1.0`.
- **Why It's Needed**: A feature may never occur together with a particular class in the training data. Without smoothing, its estimated probability for that class is zero, and that single zero wipes out the whole product of likelihoods for the class.
- **When to Adjust**: Smoothing is already on by default. Lower `alpha` to stay closer to the raw counts, or raise it to smooth more strongly; treat it as a hyperparameter to tune, for example with cross-validation.

**Example**: If you're classifying emails and a word appears in your training set but never in a spam email, `alpha` gives that word a small, non-zero count for the spam class, so an email containing it can still be scored as spam.
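
As a rough illustration of what the smoothing does, the sketch below recomputes the smoothed word probabilities of one class by hand and compares them with what `MultinomialNB` stores in `feature_log_prob_`. The tiny count matrix is made up for the example.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count matrix: 3 documents x 3 vocabulary words.
X = np.array([[2, 1, 0],
              [1, 0, 0],
              [0, 1, 3]])
y = np.array([0, 0, 1])

alpha = 1.0
model = MultinomialNB(alpha=alpha).fit(X, y)

# Class 0 word counts are [3, 1, 0]: the third word never occurs in class 0.
counts_class0 = X[y == 0].sum(axis=0)

# Additive smoothing: (count + alpha) / (total count + alpha * n_features).
manual = (counts_class0 + alpha) / (counts_class0.sum() + alpha * X.shape[1])

fitted = np.exp(model.feature_log_prob_[0])
print(manual)
print(fitted)
```

The third word gets probability 1/7 instead of 0 for class 0, which is exactly the effect described above.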

#### **2. `binarize`**

- **What It Does**: Sets the threshold used to map each feature value to "present" (1) or "absent" (0): values strictly greater than the threshold become 1, all others become 0. It is a parameter of `BernoulliNB` and defaults to `0.0`; with `binarize=None` the input is assumed to be binary already.
- **Why It's Needed**: `BernoulliNB` models binary features (like whether a word is present in an email), so non-binary input has to be converted first.
- **When to Adjust**: Change it when your features are counts or continuous values and the default cutoff of 0 is not the right boundary between "present" and "absent".

**Example**: If you have a feature that measures the frequency of a word and you set `binarize` to 0.5, any frequency above 0.5 is treated as "present" (1), and any frequency of 0.5 or below is treated as "absent" (0).
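
The thresholding rule can be tried in isolation with `sklearn.preprocessing.binarize`, which, to the best of my understanding, applies the same "strictly greater than" rule that `BernoulliNB(binarize=0.5)` applies to its input. The feature values are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import binarize

# Hypothetical word frequencies for a single document.
X = np.array([[0.2, 0.5, 0.9]])

# Values strictly greater than the threshold become 1, the rest become 0.
X_bin = binarize(X, threshold=0.5)
print(X_bin)
```

Note that the value exactly equal to the threshold (0.5) is mapped to 0.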

#### **3. `var_smoothing`**

- **What It Does**: Adds a small value to the variance of every feature. The value is `var_smoothing` (default `1e-9`) times the largest feature variance in the training data. It is a parameter of `GaussianNB`.
- **Why It's Needed**: The Gaussian likelihood divides by the variance. If a feature has zero or near-zero variance within a class, that division becomes numerically unstable; the added value keeps the calculation stable.
- **When to Adjust**: Increase it if some features are (nearly) constant within a class or if the predicted probabilities look extreme; larger values also act as a mild regularizer.

**Example**: If you're using a Gaussian Naive Bayes model and some features have almost no variability, `var_smoothing` helps the model handle those features more reliably.
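
A small sketch of the effect, using made-up data in which the first feature is constant: the fitted model exposes the absolute amount added to the variances as `epsilon_`, and prediction still works despite the zero-variance feature.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical data: the first feature is constant, so its variance is zero.
X = np.array([[1.0, 5.0],
              [1.0, 6.0],
              [1.0, 9.0],
              [1.0, 10.0]])
y = np.array([0, 0, 1, 1])

model = GaussianNB(var_smoothing=1e-9).fit(X, y)

# The added amount is var_smoothing times the largest feature variance.
expected_epsilon = 1e-9 * np.var(X, axis=0).max()
print(model.epsilon_, expected_epsilon)

# Prediction remains well defined despite the constant feature.
pred = model.predict(np.array([[1.0, 5.5]]))
print(pred)
```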

#### **4. `fit_prior`**

- **What It Does**: Decides whether the model learns the prior probabilities of the classes from the training data (`True`, the default) or uses a uniform prior, i.e. equal probabilities for all classes (`False`). It is a parameter of `MultinomialNB` and `BernoulliNB`.
- **Why It's Needed**: If you want the model to use the actual distribution of classes in your data, set this to `True`. If you prefer all classes to be treated equally, set it to `False`.
- **When to Adjust**: Set it to `False` when the class balance of the training data is an artifact of how the data was collected and should not influence predictions.

**Example**: If you have 90% of class A and 10% of class B in your training data, `fit_prior=True` makes the model take this imbalance into account. With `fit_prior=False`, the model assumes that both classes are equally likely, regardless of the actual distribution.
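
The 90/10 case can be reproduced on a made-up dataset; the sketch below only inspects the learned priors through `class_log_prior_`.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical data: 9 samples of class 0 and 1 sample of class 1.
X = np.ones((10, 2))
y = np.array([0] * 9 + [1])

learned = MultinomialNB(fit_prior=True).fit(X, y)
uniform = MultinomialNB(fit_prior=False).fit(X, y)

priors_learned = np.exp(learned.class_log_prior_)  # follows the data
priors_uniform = np.exp(uniform.class_log_prior_)  # equal for all classes
print(priors_learned, priors_uniform)
```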

#### **5. `class_prior`**

- **What It Does**: Allows you to specify the prior probabilities of the classes manually instead of learning them from the data. `MultinomialNB` and `BernoulliNB` call this parameter `class_prior`; the equivalent parameter of `GaussianNB` is named `priors`.
- **Why It's Needed**: Use this if you have prior knowledge about the class distribution and want the model to use this information instead of learning it from the training data.
- **When to Adjust**: Set this if you have strong reasons to believe in specific class probabilities that differ from what your training data shows.

**Example**: If you know that in the real world class A is twice as likely as class B, you can set `class_prior=[2/3, 1/3]` when creating the model.
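
As a quick check of the "twice as likely" case, the following sketch trains on a made-up, perfectly balanced dataset and confirms that the model keeps the priors it was given rather than the 50/50 split in the data.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical balanced data: 5 samples per class.
X = np.ones((10, 2))
y = np.array([0] * 5 + [1] * 5)

# Class 0 is declared twice as likely as class 1.
model = MultinomialNB(class_prior=[2 / 3, 1 / 3]).fit(X, y)

priors_used = np.exp(model.class_log_prior_)
print(priors_used)
```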

In summary, these parameters help adjust how the Naive Bayes model handles feature data, class distributions, and numerical stability to improve accuracy and performance.

### Running the Code:
- The `load_iris()` function loads the Iris dataset (150 flower samples, 4 numeric features, 3 classes) for demonstration purposes.
- The output shows the performance of the Naive Bayes classifier on the test set.

## Gaussian Naive Bayes
Assumes that, within each class, every feature follows a normal (Gaussian) distribution.

In [1]:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [2]:
# Sample dataset (e.g., iris dataset)
from sklearn.datasets import load_iris
data = load_iris()
X = data.data  # Features
y = data.target  # Labels

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [3]:
# Initialize the classifier
nb = GaussianNB()


# Train the model
nb.fit(X_train, y_train)

# Make predictions
y_pred = nb.predict(X_test)

In [4]:
# Evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

# Output the metrics
print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)

Accuracy: 0.9777777777777777
Confusion Matrix:
[[19  0  0]
 [ 0 12  1]
 [ 0  0 13]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      0.92      0.96        13
           2       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45
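
The numbers in the report can be re-derived from the confusion matrix alone. The sketch below copies the matrix printed above (rows are actual classes, columns are predicted classes) and recomputes accuracy, recall, and precision with NumPy.

```python
import numpy as np

# Confusion matrix copied from the output above.
conf_matrix = np.array([[19, 0, 0],
                        [0, 12, 1],
                        [0, 0, 13]])

# Accuracy: correct predictions (the diagonal) over all predictions.
accuracy = np.trace(conf_matrix) / conf_matrix.sum()

# Recall per class: diagonal over row sums (actual samples of the class).
recall = np.diag(conf_matrix) / conf_matrix.sum(axis=1)

# Precision per class: diagonal over column sums (predictions of the class).
precision = np.diag(conf_matrix) / conf_matrix.sum(axis=0)

print(accuracy, recall, precision)
```

The single error, one class-1 sample predicted as class 2, lowers the recall of class 1 to 12/13 and the precision of class 2 to 13/14, matching the 0.92 and 0.93 in the report.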



## Multinomial Naive Bayes
- Use Case: Best for discrete features, especially word counts or frequencies. It's commonly used in text classification tasks.
- Feature Assumption: Assumes that each feature is a non-negative count of occurrences, such as the number of times a word appears in a document.
- Probability Calculation: The probability of a feature given a class is modeled using a multinomial distribution, with one probability per feature and class estimated from the smoothed counts.
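
To show how the multinomial model turns counts into a class score, here is a minimal sketch that rebuilds the log-posterior from a fitted model's `class_log_prior_` and `feature_log_prob_` and compares it with `predict_log_proba`. The count matrix and the new document are made up for the example.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word counts: 4 documents x 3 vocabulary words.
X = np.array([[2, 1, 0],
              [1, 0, 0],
              [0, 1, 3],
              [0, 0, 2]])
y = np.array([0, 0, 1, 1])
model = MultinomialNB().fit(X, y)

x_new = np.array([[1, 0, 2]])

# Joint log-score per class: log P(C) + sum_i count_i * log P(word_i | C).
joint = model.class_log_prior_ + x_new @ model.feature_log_prob_.T

# Subtracting log P(X) normalizes the scores into log-posteriors.
log_posterior = joint - logsumexp(joint, axis=1, keepdims=True)
print(np.exp(log_posterior))
```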

In [5]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Sample dataset
texts = ["I love programming", "Python is great", "I hate bugs", "Coding is fun", "Debugging is hard"]
labels = [1, 1, 0, 1, 0]  # Example labels: 1 for positive sentiment, 0 for negative sentiment

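# Convert text data into word-count feature vectors.
# Note: fitting the vectorizer before the split lets test-set vocabulary
# leak into the features; acceptable only for a toy demonstration.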
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3, random_state=42)

# Initialize the classifier
model = MultinomialNB()

# Train the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

# Output the metrics
print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)

Accuracy: 0.5
Confusion Matrix:
[[0 1]
 [0 1]]
Classification Report:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         1
           1       0.50      1.00      0.67         1

    accuracy                           0.50         2
   macro avg       0.25      0.50      0.33         2
weighted avg       0.25      0.50      0.33         2



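
The example above fits `CountVectorizer` on all five texts before splitting. A common alternative, sketched here on the same toy data, is to split the raw texts first and wrap the vectorizer and classifier in a pipeline, so that the vocabulary is learned from the training texts only. With just five documents the resulting scores are not meaningful either way; the point is only the structure.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["I love programming", "Python is great", "I hate bugs",
         "Coding is fun", "Debugging is hard"]
labels = [1, 1, 0, 1, 0]  # 1 = positive sentiment, 0 = negative sentiment

# Split the raw texts first, so the test texts stay unseen.
texts_train, texts_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=42
)

# The pipeline fits the vectorizer and the classifier on training data only.
pipeline = make_pipeline(CountVectorizer(), MultinomialNB())
pipeline.fit(texts_train, y_train)
y_pred = pipeline.predict(texts_test)

# Vocabulary learned by the vectorizer step (training words only).
vocabulary = set(pipeline.named_steps["countvectorizer"].vocabulary_)
print(sorted(vocabulary))
```

Words that occur only in the test texts are simply ignored at prediction time.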


## Bernoulli Naive Bayes
- Use Case: Suitable for binary/boolean features, where each feature represents the presence or absence of a characteristic (e.g., whether a word is present or not in a document).
- Feature Assumption: Assumes binary features (0 or 1); how often a word occurs is ignored.
- Probability Calculation: The probability of a feature given a class is modeled using a Bernoulli distribution, so the absence of a feature also contributes to the class score.
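
The role of absent features can be seen by rebuilding the Bernoulli score by hand. The sketch below, on a made-up binary matrix, adds a log-probability term for every present feature and a complementary term for every absent one, then compares the result with `predict_log_proba`.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary features: 4 documents x 3 vocabulary words.
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]])
y = np.array([0, 0, 1, 1])
model = BernoulliNB().fit(X, y)

x_new = np.array([[1, 0, 0]])

log_p = model.feature_log_prob_          # log P(word present | C)
log_not_p = np.log1p(-np.exp(log_p))     # log P(word absent | C)

# Present words contribute log p, absent words contribute log(1 - p).
joint = (model.class_log_prior_
         + x_new @ log_p.T
         + (1 - x_new) @ log_not_p.T)
log_posterior = joint - logsumexp(joint, axis=1, keepdims=True)
print(np.exp(log_posterior))
```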

In [6]:
from sklearn.naive_bayes import BernoulliNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Sample dataset
texts = ["I love programming", "Python is great", "I hate bugs", "Coding is fun", "Debugging is hard"]
labels = [1, 1, 0, 1, 0]  # Example labels: 1 for positive sentiment, 0 for negative sentiment

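# Convert text data into binary (present/absent) feature vectors.
# Note: fitting the vectorizer before the split lets test-set vocabulary
# leak into the features; acceptable only for a toy demonstration.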
vectorizer = CountVectorizer(binary=True)  # Binary feature vectors
X = vectorizer.fit_transform(texts)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3, random_state=42)

# Initialize the classifier
model = BernoulliNB()

# Train the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

# Output the metrics
print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)


Accuracy: 0.5
Confusion Matrix:
[[0 1]
 [0 1]]
Classification Report:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         1
           1       0.50      1.00      0.67         1

    accuracy                           0.50         2
   macro avg       0.25      0.50      0.33         2
weighted avg       0.25      0.50      0.33         2





#### Prepared By,
Ahamed Basith