#### About

> Multinomial Naive Bayes

Multinomial Naive Bayes is a supervised machine learning algorithm commonly used for text classification, where the features are word counts or term frequencies in a document. It applies Bayes' theorem under the "naive" assumption that features are conditionally independent given the class label, and it models each class's feature counts with a multinomial distribution.
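In symbols, the classifier picks the class with the highest log-posterior score. The notation here is introduced for illustration. $x_i$ is the count of term $i$ in the document and $n$ is the vocabulary size. $N_{c,i}$ is the total count of term $i$ in the training documents of class $c$, with $N_c = \sum_i N_{c,i}$. $\alpha$ is the additive smoothing parameter; scikit-learn's `MultinomialNB` defaults to $\alpha = 1$.

$$
\hat{y} = \arg\max_{c} \left[ \log P(c) + \sum_{i=1}^{n} x_i \log \hat{\theta}_{c,i} \right],
\qquad
\hat{\theta}_{c,i} = \frac{N_{c,i} + \alpha}{N_c + \alpha n}
$$

The document's multinomial coefficient is omitted because it is identical for every class and does not change the argmax. Smoothing keeps $\hat{\theta}_{c,i} > 0$ for terms never seen in class $c$, so a single unfamiliar word cannot rule a class out entirely.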

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
```


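```python
# Load all 20 Newsgroups posts (train and test subsets combined), stripping headers,
# signature blocks, and quoted replies so the model sees only the message bodies
newsgroups = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))
X = newsgroups.data    # list of raw post texts
y = newsgroups.target  # integer class labels, one per post
```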

```python
# Hold out 20% of the posts as a test set; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```


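```python
# Convert text to sparse word-count matrices. The vocabulary is learned from the
# training texts only; transform() ignores words that never appeared in training.
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)
```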
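As a rough illustration of what `CountVectorizer` produces, the sketch below rebuilds a bag-of-words encoding in plain Python. It assumes only two of the vectorizer's defaults: lowercasing and the token pattern `(?u)\b\w\w+\b` (tokens of two or more word characters). It returns dense lists rather than the sparse matrix scikit-learn returns. The helper names `tokenize`, `fit_vocabulary`, and `transform_counts` are hypothetical, and the sentences are made-up toy data.

```python
import re
from collections import Counter

# Assumed to mirror CountVectorizer's default token_pattern (2+ word characters)
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")

def tokenize(doc):
    """Lowercase the text and extract tokens of two or more word characters."""
    return TOKEN_RE.findall(doc.lower())

def fit_vocabulary(docs):
    """Map each distinct training token to a column index, in alphabetical order."""
    terms = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {term: idx for idx, term in enumerate(terms)}

def transform_counts(docs, vocab):
    """Return one dense count row per document; tokens missing from vocab are ignored."""
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, count in Counter(tokenize(doc)).items():
            if tok in vocab:
                row[vocab[tok]] += count
        rows.append(row)
    return rows

train_docs = ["The GPU renders the scene", "The engine needs new oil"]
test_docs = ["Oil on the GPU, a new scene"]

vocab = fit_vocabulary(train_docs)                # learned from training text only
X_train_counts = transform_counts(train_docs, vocab)
X_test_counts = transform_counts(test_docs, vocab)

print(vocab)
# {'engine': 0, 'gpu': 1, 'needs': 2, 'new': 3, 'oil': 4, 'renders': 5, 'scene': 6, 'the': 7}
print(X_train_counts)  # [[0, 1, 0, 0, 0, 1, 1, 2], [1, 0, 1, 1, 1, 0, 0, 1]]
print(X_test_counts)   # [[0, 1, 0, 1, 1, 0, 1, 1]]
```

In the test row, `on` is dropped because it never occurs in the training texts, and `a` is dropped by the two-character token rule. This mirrors why the notebook calls `fit_transform` on the training split but only `transform` on the test split.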


```python
# Train a Multinomial Naive Bayes model (default alpha=1.0, i.e. Laplace smoothing)
clf = MultinomialNB()
clf.fit(X_train_vec, y_train)
```
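What `fit` estimates can be sketched without scikit-learn. The hypothetical function `fit_multinomial_nb` below works on a small dense count matrix of made-up data. It computes empirical class priors and the additively smoothed term probabilities $(N_{c,i} + \alpha) / (N_c + \alpha n)$, both in log space. The fitted `clf` stores the analogous values in its `class_log_prior_` and `feature_log_prob_` attributes.

```python
import math

def fit_multinomial_nb(X, y, alpha=1.0):
    """Hypothetical sketch of MultinomialNB's estimates from a dense count matrix.

    X is a list of count rows and y the class labels. Returns per-class log
    priors and per-class lists of log term probabilities.
    """
    n_features = len(X[0])
    class_log_prior, feature_log_prob = {}, {}
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        # Empirical prior: fraction of training documents that belong to class c
        class_log_prior[c] = math.log(len(rows) / len(X))
        # N_{c,i}: total count of term i over all documents of class c
        totals = [sum(col) for col in zip(*rows)]
        # Additive smoothing: (N_{c,i} + alpha) / (N_c + alpha * n_features)
        denom = sum(totals) + alpha * n_features
        feature_log_prob[c] = [math.log((t + alpha) / denom) for t in totals]
    return class_log_prior, feature_log_prob

# Toy data: class "a" never uses term 2, yet smoothing keeps its probability above zero
X = [[2, 0, 0], [3, 1, 0], [0, 2, 2], [1, 3, 3]]
y = ["a", "a", "b", "b"]
priors, log_theta = fit_multinomial_nb(X, y)

print([round(math.exp(v), 4) for v in log_theta["a"]])  # [0.6667, 0.2222, 0.1111]
print([round(math.exp(v), 4) for v in log_theta["b"]])  # [0.1429, 0.4286, 0.4286]
```

Each class's smoothed probabilities still sum to 1, because the denominator adds $\alpha$ once for every vocabulary term.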


```python
# Predict on test set
y_pred = clf.predict(X_test_vec)
```
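Prediction then reduces to an argmax over per-class scores computed in log space. The sketch below hard-codes hypothetical parameters for two classes over a three-term vocabulary. These are illustrative values, not the 20 Newsgroups model. It also shows why logs are used: a product of many small probabilities underflows to zero in double precision, while the equivalent sum of logs stays finite and comparable.

```python
import math

# Hypothetical fitted parameters (illustrative values only)
class_log_prior = {"a": math.log(0.5), "b": math.log(0.5)}
feature_log_prob = {
    "a": [math.log(0.6), math.log(0.2), math.log(0.2)],
    "b": [math.log(1 / 7), math.log(3 / 7), math.log(3 / 7)],
}

def joint_log_likelihood(x, c):
    """log P(c) + sum_i x_i * log theta_{c,i}; the multinomial coefficient is
    dropped because it is the same for every class."""
    return class_log_prior[c] + sum(cnt * lp for cnt, lp in zip(x, feature_log_prob[c]))

def predict(x):
    """Return the class with the highest joint log-likelihood for count vector x."""
    return max(class_log_prior, key=lambda c: joint_log_likelihood(x, c))

print(predict([4, 0, 1]))  # a
print(predict([0, 3, 2]))  # b

# Multiplying 400 probabilities of 1e-5 underflows; summing their logs does not
p = 1e-5
print(p ** 400)           # 0.0
print(400 * math.log(p))  # about -4605.17
```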


```python
# Evaluate model performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=newsgroups.target_names))
```

Accuracy: 0.6175066312997347
Classification Report:
                           precision    recall  f1-score   support

             alt.atheism       0.61      0.25      0.36       151
           comp.graphics       0.48      0.75      0.58       202
 comp.os.ms-windows.misc       0.73      0.04      0.08       195
comp.sys.ibm.pc.hardware       0.53      0.73      0.62       183
   comp.sys.mac.hardware       0.86      0.58      0.69       205
          comp.windows.x       0.68      0.80      0.74       215
            misc.forsale       0.88      0.53      0.66       193
               rec.autos       0.87      0.63      0.73       196
         rec.motorcycles       0.49      0.58      0.53       168
      rec.sport.baseball       0.99      0.67      0.80       211
        rec.sport.hockey       0.92      0.80      0.86       198
               sci.crypt       0.59      0.77      0.67       201
         sci.electronics       0.84      0.49      0.62       202
                 sci.m