CMU 10-601 Machine Learning 2015: HW 3
CMU 10-701 Machine Learning 2011: HW 3

Problem 1: Implementing Naive Bayes
The dataset suggested in the graduate-level course 10-601 is not available, so it is replaced here with the dataset suggested in the PhD-level course 10-701, which is loaded with sklearn's fetch_20newsgroups.

Preprocessing is delegated to sklearn so that the work can focus on the core problem, Naive Bayes itself.

First, the Naive Bayes classifier is implemented from scratch with Beta priors, as described in 10-601.
For reference, its accuracy is then compared with the accuracy of sklearn's BernoulliNB.
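
As background for the Beta-prior approach mentioned above, the following is a minimal sketch of a MAP estimate for a single Bernoulli feature parameter. It assumes binary (presence/absence) features and a Beta(a, b) prior; the function name bernoulli_map, the hyperparameter values a = b = 2, and the toy matrix are illustrative assumptions, not taken from the course material or from the notebook's own implementation.

```python
import numpy as np

def bernoulli_map(X_binary, a=2.0, b=2.0):
    """MAP estimate of P(X_i = 1) per feature under a Beta(a, b) prior.

    The posterior is Beta(n_i + a, N - n_i + b), whose mode is
    (n_i + a - 1) / (N + a + b - 2) when both parameters exceed 1.
    """
    n_docs = X_binary.shape[0]          # N: number of documents
    n_present = X_binary.sum(axis=0)    # n_i: documents containing feature i
    return (n_present + a - 1) / (n_docs + a + b - 2)

# Toy data (made up): 4 documents, 3 binary features.
X_toy = np.array([[1, 0, 1],
                  [1, 0, 0],
                  [0, 0, 1],
                  [1, 0, 1]])

theta_toy = bernoulli_map(X_toy)  # (n_i + 1) / (N + 2) for a = b = 2
```

With a = b = 2 the prior acts like one extra "present" and one extra "absent" observation per feature, so a feature that never occurs in the toy data still gets a nonzero probability.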

In [1]:
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import metrics
import numpy as np
# categories_multinomial = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
categories_binomial = ['soc.religion.christian', 'comp.graphics']

twenty_train = fetch_20newsgroups(subset='train',
                                 categories=categories_binomial,
                                 shuffle=True,
                                 random_state=42)
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(twenty_train.data)

twenty_test = fetch_20newsgroups(subset='test',
                                categories=categories_binomial,
                                shuffle=True,
                                random_state=42)
docs_test = twenty_test.data
docs_test_counts = count_vect.transform(docs_test)

In [69]:
X_train_counts.shape

(1183, 22953)

In [70]:
X_train_counts.data.shape

(183710,)
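
The two shapes above imply that the count matrix is very sparse: 183710 stored entries out of 1183 x 22953 cells, roughly 0.68%. A minimal sketch of that arithmetic follows. The helper name density and the dense toy array are hypothetical and used only for illustration; for a SciPy sparse matrix the nnz attribute gives the stored-entry count directly.

```python
import numpy as np

def density(X):
    """Fraction of nonzero cells in a dense 2-D array."""
    n_rows, n_cols = X.shape
    return np.count_nonzero(X) / (n_rows * n_cols)

# Toy count matrix (made up): 2 documents, 4 vocabulary terms.
toy = np.array([[0, 2, 0, 0],
                [1, 0, 0, 3]])
toy_density = density(toy)  # 3 nonzero cells out of 8

# Same ratio computed from the figures printed in the notebook.
reported_density = 183710 / (1183 * 22953)
```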

In [3]:
set(twenty_train.target)

{0, 1}

Implementing Naive Bayes with BETA Priors from scratch

In [85]:
# P(Y = 1 | X1, ... , Xn) = P(X1, ... , Xn | Y = 1) * P(Y=1) / sum_y[P(X1, ... , Xn | Y = y) * P(Y=y)]
# P(Y = 1 | X1, ... , Xn) = product_i(P(Xi | Y = 1)) * P(Y=1) / sum_y(product_i(P(Xi | Y = y)) * P(Y=y))
# Bernoulli feature model: P(Xi = x | Y = y) = theta_iy**x * (1 - theta_iy)**(1 - x), with x in {0, 1}
# P(Y = 1) is estimated in NB_YPrior as the fraction of training documents with y = 1

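def NB_XGivenY(X_train, y_train):
    """Estimate smoothed per-class feature parameters, shape (2, n_features)."""
    # smoothing pseudo-counts: beta1 is added to each numerator,
    # beta0 is added once per feature to each denominator
    beta1 = 2
    beta0 = 1
    theta_MAP = np.zeros((2,X_train.shape[1]))
    # split the training rows by class label
    X_train_given_y1 = X_train[y_train==1]
    X_train_given_y0 = X_train[y_train==0]
    
    # per-feature count totals within each class (1 x n_features)
    X_train_given_y1_ysum = np.sum(X_train_given_y1, axis=0)
    X_train_given_y0_ysum = np.sum(X_train_given_y0, axis=0)
    # total count over all features within each class (1 x 1)
    X_train_given_y1_total = np.sum(X_train_given_y1_ysum, axis=1)
    X_train_given_y0_total = np.sum(X_train_given_y0_ysum, axis=1)
    
    features_count = X_train_given_y1_ysum.shape[1]
    
    # smoothed ratio of each feature's summed count to the class total
    theta_MAP[0,:] = (X_train_given_y0_ysum + beta1)/(X_train_given_y0_total + features_count*beta0)
    theta_MAP[1,:] = (X_train_given_y1_ysum + beta1)/(X_train_given_y1_total + features_count*beta0)  

    return theta_MAP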

def NB_YPrior(y_train):
    y1_count = np.sum((y_train==1)*1)
    y0_count = np.sum((y_train==0)*1)
    return y1_count/(y1_count + y0_count)

def NB_Classify(theta_MAP, prior, X_test):
    # binary indicator: 1 where a word occurs in the document, 0 otherwise
    is_feature = (X_test.todense()!=0)*1
    # |x - (1 - theta)| equals theta where x = 1 and 1 - theta where x = 0
    logP_of_X_given_y1 = np.log(np.abs(is_feature - (1-theta_MAP[1]) ) ) 
    logP_of_X_given_y0 = np.log(np.abs(is_feature - (1-theta_MAP[0]) ) )
    
    logP_y1_given_X = np.sum(logP_of_X_given_y1, axis=1) + np.log(prior)
    logP_y0_given_X = np.sum(logP_of_X_given_y0, axis=1) + np.log(1-prior)
    # log-odds of class 1 against class 0; the shared normalizer cancels
    log_odds = logP_y1_given_X - logP_y0_given_X

    return (log_odds > 0)*1
    
     

theta_MAP = NB_XGivenY(X_train_counts, twenty_train.target)
prior = NB_YPrior(twenty_train.target)
y_predicted = NB_Classify(theta_MAP, prior, docs_test_counts)
y_test = twenty_test.target.reshape(twenty_test.target.shape[0], 1)

y_predicted_test = np.concatenate((y_predicted, y_test), axis=1)
false_classified_boolean = (y_predicted!=y_test)
correct_classified_boolean = (y_predicted==y_test)
correct_classified = (correct_classified_boolean)*1
correct_classified_sum = np.sum(correct_classified)

print("Accuracy self-implemented:", correct_classified_sum/y_predicted.shape[0])
print(metrics.classification_report(y_test, 
                                    y_predicted,
                                   target_names=twenty_test.target_names))

Accuracy self-implemented: 0.9809402795425667
                        precision    recall  f1-score   support

         comp.graphics       0.97      0.99      0.98       389
soc.religion.christian       0.99      0.97      0.98       398

              accuracy                           0.98       787
             macro avg       0.98      0.98      0.98       787
          weighted avg       0.98      0.98      0.98       787
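
The absolute-value expression used in NB_Classify is a compact way of writing the Bernoulli log-likelihood. The sketch below, on made-up toy values (theta and x are illustrative, not taken from the trained model), checks that it matches the explicit form x*log(theta) + (1-x)*log(1-theta), and also shows an algebraically equivalent linear form.

```python
import numpy as np

# Toy parameters and binary documents (made up for illustration).
theta = np.array([0.7, 0.2, 0.05])
x = np.array([[1, 0, 0],
              [0, 1, 1]])

# Form used in NB_Classify: theta where x = 1, 1 - theta where x = 0.
log_lik_abs = np.log(np.abs(x - (1 - theta))).sum(axis=1)

# Explicit Bernoulli log-likelihood.
log_lik_explicit = (x * np.log(theta) + (1 - x) * np.log(1 - theta)).sum(axis=1)

# Linear form: a constant plus a matrix-vector product with the indicators.
log_lik_linear = np.log(1 - theta).sum() + x @ (np.log(theta) - np.log(1 - theta))
```

Because NB_Classify densifies the test matrix with todense(), the linear form may be worth considering for larger vocabularies, since it only needs a product with a binarized matrix, which can stay sparse.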



For reference: comparison with sklearn's BernoulliNB classifier

In [52]:
from sklearn.naive_bayes import BernoulliNB

clf = BernoulliNB().fit(X_train_counts, twenty_train.target)
predicted = clf.predict(docs_test_counts)
print(metrics.classification_report(twenty_test.target, 
                                    predicted,
                                   target_names=twenty_test.target_names))

                        precision    recall  f1-score   support

         comp.graphics       0.86      0.98      0.91       389
soc.religion.christian       0.98      0.84      0.90       398

              accuracy                           0.91       787
             macro avg       0.92      0.91      0.91       787
          weighted avg       0.92      0.91      0.91       787
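
One possible contributor to the gap between the two reports, offered as an assumption rather than a diagnosis, is that the parameters are estimated differently. NB_XGivenY sums raw word counts per class, whereas BernoulliNB with its default settings binarizes the features and works with smoothed document frequencies. The sketch below contrasts the two estimates for a single class on a made-up toy matrix; alpha = 1.0 mirrors sklearn's default smoothing, and all variable names are illustrative.

```python
import numpy as np

# Toy count matrix for one class (made up): 3 documents, 4 vocabulary terms.
X_counts = np.array([[3, 0, 1, 0],
                     [0, 0, 2, 0],
                     [5, 1, 0, 0]])
n_docs, n_features = X_counts.shape

# (a) Count-based estimate, mirroring NB_XGivenY with beta1 = 2, beta0 = 1.
beta1, beta0 = 2, 1
term_totals = X_counts.sum(axis=0)                 # summed counts per term
theta_counts = (term_totals + beta1) / (term_totals.sum() + n_features * beta0)

# (b) Presence-based estimate with additive smoothing, as in a Bernoulli model.
alpha = 1.0
doc_freq = (X_counts > 0).sum(axis=0)              # documents containing each term
theta_presence = (doc_freq + alpha) / (n_docs + 2 * alpha)
```

On this toy data the two estimates differ noticeably for rare terms, which suggests the classifiers are not expected to produce identical predictions even on the same input.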



Implementing Naive Bayes with DIRICHLET Priors from scratch