## **NAIVE BAYES ML MODELS**

- Naive Bayes is a family of simple, probabilistic supervised learning algorithms used primarily for classification problems.
- The algorithms apply Bayes' theorem under the "naive" assumption that the features are conditionally independent of one another given the class label.
- Despite this simplification, Naive Bayes is effective for many classification tasks, especially with text data (e.g., spam filtering or sentiment analysis).
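
To make the role of Bayes' theorem and the independence assumption concrete, the classification rule can be written out as follows. The symbols are notation introduced here for illustration: $y$ is the class label and $x_1, \dots, x_n$ are the feature values.

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
$$

Assuming the features are conditionally independent given the class, the joint likelihood factorizes, and the denominator can be dropped because it does not depend on $y$:

$$
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

The variants of Naive Bayes differ only in how they model the per-feature term $P(x_i \mid y)$.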

### There are three main types of Naive Bayes

1. **Gaussian Naive Bayes**
    - Used for continuous numerical features that are assumed to follow a normal distribution (also known as a Gaussian distribution).
    - The likelihood P(features | class) is modeled using a normal distribution for each feature, with a mean and variance estimated from the training data for each class.

2. **Multinomial Naive Bayes**
    - Used for discrete count data, such as word counts in text classification.
    - The likelihood P(features | class) is modeled using the multinomial distribution, which gives the probability of observing a particular set of feature counts given the class label.

3. **Bernoulli Naive Bayes**
    - Also used for discrete data such as text, but the features are binary (0 or 1, e.g., whether a word is present or absent) rather than counts.
    - The likelihood P(features | class) is modeled using the Bernoulli distribution: each feature is treated as a binary variable that is present with a class-specific probability.
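
As a rough illustration of the three likelihood models listed above, the following sketch evaluates each one by hand in plain Python. The function names and example numbers are made up for illustration, and the parameters (means, variances, per-feature probabilities) are assumed to have been estimated already. Library implementations such as scikit-learn additionally apply smoothing and typically work in log space.

```python
import math


def gaussian_likelihood(x, mean, var):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def multinomial_log_likelihood(counts, feature_probs):
    """Log-likelihood of a count vector given per-feature probabilities.

    The multinomial coefficient is omitted because it is the same for every
    class and therefore does not affect which class scores highest.
    """
    return sum(c * math.log(p) for c, p in zip(counts, feature_probs))


def bernoulli_likelihood(x, feature_probs):
    """Probability of a binary vector; feature i is 1 with probability feature_probs[i]."""
    result = 1.0
    for xi, p in zip(x, feature_probs):
        result *= p if xi == 1 else (1.0 - p)
    return result


# Hypothetical parameters for a single class, chosen only for illustration
print(gaussian_likelihood(5.1, mean=5.0, var=0.12))       # continuous feature
print(multinomial_log_likelihood([2, 1], [0.5, 0.5]))      # word counts
print(bernoulli_likelihood([1, 0], [0.8, 0.5]))            # 0.8 * (1 - 0.5) = 0.4
```

In a full classifier, these per-class likelihoods would be multiplied by the class prior (or added to its logarithm), and the class with the highest result would be predicted.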


In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split 
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB, ComplementNB
from sklearn.metrics import accuracy_score 

# load the iris dataset 
iris = load_iris()
X = iris.data
y = iris.target 

# split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the Naive Bayes models 
gnb = GaussianNB()
mnb = MultinomialNB()
bnb = BernoulliNB()
cnb = ComplementNB()

# Train the models on the training set 
gnb.fit(X_train, y_train)
mnb.fit(X_train, y_train)
bnb.fit(X_train, y_train)
cnb.fit(X_train, y_train)

# Make predictions on the testing set using each model
gnb_pred = gnb.predict(X_test)
mnb_pred = mnb.predict(X_test)
bnb_pred = bnb.predict(X_test)
cnb_pred = cnb.predict(X_test)

# Calculate the accuracy scores for each model
gnb_score = accuracy_score(y_test, gnb_pred)
mnb_score = accuracy_score(y_test, mnb_pred)
bnb_score = accuracy_score(y_test, bnb_pred)
cnb_score = accuracy_score(y_test, cnb_pred)

# Print the accuracy scores
print("Gaussian Naive Bayes accuracy:", gnb_score)
print("Multinomial Naive Bayes accuracy:", mnb_score)
print("Bernoulli Naive Bayes accuracy:", bnb_score)
print("Complement Naive Bayes accuracy:", cnb_score)

# Select the best model based on the accuracy score
# (max compares the (score, name) tuples by score first)
best_model = max([(gnb_score, 'Gaussian'), (mnb_score, 'Multinomial'), (bnb_score, 'Bernoulli'), (cnb_score, 'Complement')])
# Print a separating line in the output
print("---------------------------")

print("Best model:", best_model[1])
print("Best accuracy:", best_model[0])


Gaussian Naive Bayes accuracy: 0.9777777777777777
Multinomial Naive Bayes accuracy: 0.9555555555555556
Bernoulli Naive Bayes accuracy: 0.28888888888888886
Complement Naive Bayes accuracy: 0.7111111111111111
---------------------------
Best model: Gaussian
Best accuracy: 0.9777777777777777
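
The low Bernoulli score above is most likely a consequence of the binary-feature assumption. `BernoulliNB` thresholds its inputs with `binarize=0.0` by default, and every iris measurement is positive, so all features become 1 and carry no information. The sketch below checks this and then binarizes the data manually. The per-feature median threshold is an arbitrary choice made here for illustration, not part of the original experiment.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# With the default threshold of 0.0, every feature value maps to 1
all_ones = bool((X > 0.0).all())
print("All features become 1 under binarize=0.0:", all_ones)

# Assumption: use the training-set median of each feature as its threshold
thresholds = np.median(X_train, axis=0)
X_train_bin = (X_train > thresholds).astype(int)
X_test_bin = (X_test > thresholds).astype(int)

# binarize=None tells BernoulliNB that the inputs are already binary
bnb = BernoulliNB(binarize=None)
bnb.fit(X_train_bin, y_train)
acc = accuracy_score(y_test, bnb.predict(X_test_bin))
print("Bernoulli Naive Bayes accuracy with median thresholds:", acc)
```

Even with a better threshold, reducing each measurement to a single bit discards information, so Gaussian Naive Bayes remains the more natural fit for continuous data such as iris.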


## **Same code using a for loop**

In [4]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB, ComplementNB
from sklearn.metrics import accuracy_score

# load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the Naive Bayes models
models = {
    'Gaussian': GaussianNB(),
    'Multinomial': MultinomialNB(),
    'Bernoulli': BernoulliNB(),
    'Complement': ComplementNB()
}

# Train each model, report its test accuracy, and keep the scores for comparison
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores[name] = accuracy_score(y_test, y_pred)
    print(f"{name} Naive Bayes accuracy: {scores[name]}")

# Find the best model based on the stored accuracy scores
best_model_name = max(scores, key=scores.get)
print("---------------------------")
print("Best model:", best_model_name)
print("Best accuracy:", scores[best_model_name])


Gaussian Naive Bayes accuracy: 0.9777777777777777
Multinomial Naive Bayes accuracy: 0.9555555555555556
Bernoulli Naive Bayes accuracy: 0.28888888888888886
Complement Naive Bayes accuracy: 0.7111111111111111
---------------------------
Best model: Gaussian
Best accuracy: 0.9777777777777777
