# Naive Bayes Classifier

In basic probability, Bayes' theorem states:

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$

For classification, with class $y$ and features $x_1, ..., x_j$, this becomes:

$$P(y|x_1, ..., x_j) = \frac{P(x_1, ..., x_j|y)P(y)}{P(x_1, ..., x_j)}$$
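
The "naive" part of the classifier is the assumption that the features are conditionally independent given the class. Under that assumption the likelihood factorizes, and because the denominator does not depend on $y$, it can be dropped when choosing a class. A short derivation, using the same symbols as the formulas above:

$$P(x_1, ..., x_j|y) = \prod_{i=1}^{j} P(x_i|y)$$

$$P(y|x_1, ..., x_j) \propto P(y)\prod_{i=1}^{j} P(x_i|y)$$

$$\hat{y} = \arg\max_y P(y)\prod_{i=1}^{j} P(x_i|y)$$

The variants below differ only in how they model $P(x_i|y)$.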

### Gaussian

Use `GaussianNB` when the features are continuous; each feature is modeled as normally distributed within each class.

In [1]:
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

In [2]:
iris = datasets.load_iris()

In [3]:
features = iris.data
target = iris.target

In [4]:
clf = GaussianNB()

In [5]:
model = clf.fit(features, target)

In [6]:
new_obs = [[4, 4, 4, 0.4]]

In [7]:
model.predict(new_obs)

array([1])
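
Beyond the predicted label, it can help to look at the posterior probabilities behind it. The following is a minimal sketch that refits the same model on iris and calls `predict_proba`; the variable names `proba` and `predicted` are introduced here for illustration.

```python
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
model = GaussianNB().fit(iris.data, iris.target)

new_obs = [[4, 4, 4, 0.4]]

# Posterior probability of each class for the new observation
proba = model.predict_proba(new_obs)
print(model.classes_)   # class labels, in the column order of proba
print(proba.round(3))

# predict() returns the class with the highest posterior
predicted = model.classes_[proba.argmax(axis=1)]
print(predicted)
```

Each row of `proba` sums to 1, and its largest entry corresponds to the label returned by `predict`.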

In [8]:
# Assign prior probabilities to the three target classes (they must sum to 1)
clf = GaussianNB(priors=[0.25, 0.25, 0.5])

In [9]:
model = clf.fit(features, target)
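
To see what the `priors` argument changes, one option is to compare the fitted `class_prior_` attribute with and without it. This is a sketch under the assumption that a recent scikit-learn is installed; the names `default_model`, `prior_model`, and `rejected` are illustrative only.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Without priors, class_prior_ is estimated from the class frequencies
default_model = GaussianNB().fit(X, y)
print(default_model.class_prior_)

# With priors, the supplied values are used as given
prior_model = GaussianNB(priors=[0.25, 0.25, 0.5]).fit(X, y)
print(prior_model.class_prior_)

# Priors that do not sum to 1 are rejected when fit() is called
try:
    GaussianNB(priors=[0.25, 0.25, 0.25]).fit(X, y)
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```

Because iris has the same number of samples in each class, the default priors come out equal, so any non-uniform `priors` shifts predictions toward the classes given more weight.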

### Multinomial

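Use `MultinomialNB` for discrete or count data, such as word counts.

The main hyperparameter is `alpha`, the additive (Laplace/Lidstone) smoothing parameter.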
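
The effect of `alpha` is easiest to see by recomputing the smoothed term probabilities by hand: for class $y$ and term $i$, the estimate is $(N_{yi} + \alpha) / (N_y + \alpha n)$, where $N_{yi}$ is the count of term $i$ in class $y$, $N_y$ is the total count in class $y$, and $n$ is the number of terms. The sketch below uses a made-up count matrix and checks the hand computation against `feature_log_prob_`.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy count matrix (made up for illustration): 3 documents, 4 terms
X = np.array([[2, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 3, 1]])
y = np.array([0, 0, 1])

alpha = 1.0  # alpha=1 is Laplace smoothing; 0 < alpha < 1 is Lidstone smoothing
model = MultinomialNB(alpha=alpha).fit(X, y)

# Recompute the smoothed per-class term probabilities by hand
counts = np.vstack([X[y == c].sum(axis=0) for c in model.classes_])
smoothed = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * X.shape[1])

print(np.exp(model.feature_log_prob_).round(3))
print(smoothed.round(3))
```

Without smoothing, a term that never appears in a class would get probability zero and wipe out the whole product for that class; with `alpha > 0` every term keeps a small nonzero probability.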

In [12]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np

In [13]:
text_data = np.array(['I like Cardi B. ', 'Tribeca is a strange place.', ' Germany is where they make volkswagen cars.'])
count = CountVectorizer()
bag_of_words = count.fit_transform(text_data)

In [14]:
features = bag_of_words.toarray()
target = np.array([0, 0, 1])
# Prior probability for each class; the values should sum to 1
clf = MultinomialNB(class_prior=[0.25, 0.75])

In [15]:
model = clf.fit(features, target)

In [16]:
new_observation = [[0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1]]

In [17]:
model.predict(new_observation)

array([1])
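
The hand-written vector above only makes sense relative to the vectorizer's vocabulary, since each position corresponds to one token. As a sketch, the fitted `vocabulary_` attribute can be used to decode it, and `transform` can build such vectors from raw text. The names `vocab`, `active`, and `new_text`, and the example sentence, are introduced here for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

text_data = np.array(['I like Cardi B. ', 'Tribeca is a strange place.', ' Germany is where they make volkswagen cars.'])
count = CountVectorizer()
count.fit(text_data)

# vocabulary_ maps each token to its column index; sort tokens by that index
vocab = sorted(count.vocabulary_, key=count.vocabulary_.get)
print(vocab)

# Decode the hand-written vector back into the tokens it switches on
new_observation = [[0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1]]
active = [vocab[i] for i, v in enumerate(new_observation[0]) if v]
print(active)

# Building the vector from text avoids counting columns by hand.
# The sentence below is a made-up example, not from the original notebook.
new_text = count.transform(['They make cars in Germany'])
print(new_text.toarray())
```

Tokens that were not seen during `fit` (here, "in") are silently ignored by `transform`.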

### Bernoulli

Use `BernoulliNB` for binary (0/1) feature data.

In [18]:
from sklearn.naive_bayes import BernoulliNB

In [19]:
features = np.random.randint(2, size = (100, 3))

In [20]:
target = np.random.randint(2, size = (100, 1)).ravel()

In [23]:
# Prior probability for each class; the values should sum to 1
clf = BernoulliNB(class_prior=[0.25, 0.75])

In [25]:
model = clf.fit(features, target)
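
`BernoulliNB` also accepts non-binary input and thresholds it through its `binarize` parameter. The sketch below, on made-up continuous data with an arbitrary seed and an arbitrary threshold of 0.5, checks that letting the estimator binarize gives the same predictions as binarizing by hand.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.RandomState(0)    # fixed seed, chosen arbitrarily
X_cont = rng.rand(100, 3)         # made-up continuous features in [0, 1)
y = rng.randint(2, size=100)

# Let the estimator threshold the features at 0.5 ...
auto = BernoulliNB(binarize=0.5).fit(X_cont, y)

# ... or binarize by hand and switch the built-in thresholding off
X_bin = (X_cont > 0.5).astype(int)
manual = BernoulliNB(binarize=None).fit(X_bin, y)

same = np.array_equal(auto.predict(X_cont), manual.predict(X_bin))
print(same)
```

With features that are already 0/1, as in the cells above, the default `binarize=0.0` leaves them unchanged.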

In [26]:
import seaborn as sns

In [27]:
titanic = sns.load_dataset('titanic')

In [28]:
titanic.head()

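| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | | Southampton | no | True |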


In [29]:
from sklearn.model_selection import train_test_split

In [30]:
X_train, X_test, y_train, y_test = train_test_split(titanic.fare, titanic.survived)

In [31]:
gnb = GaussianNB()

In [32]:
gnb.fit(X_train.values.reshape(-1,1), y_train)

GaussianNB(priors=None)

In [34]:
pred = gnb.predict(X_test.values.reshape(-1,1))

In [37]:
gnb.score(X_test.values.reshape(-1,1), y_test)

0.6322869955156951
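
An accuracy figure is easier to judge next to a trivial baseline, such as always predicting the most frequent class. The helper below is a hypothetical illustration (the name `majority_baseline_accuracy` is not from scikit-learn), shown on made-up labels.

```python
import numpy as np

def majority_baseline_accuracy(y):
    """Accuracy obtained by always predicting the most frequent class in y."""
    y = np.asarray(y)
    counts = np.bincount(y)
    return counts.max() / counts.sum()

# Made-up labels for illustration: 6 of 10 are class 0
example_labels = [0, 0, 0, 1, 0, 1, 1, 0, 0, 1]
baseline = majority_baseline_accuracy(example_labels)
print(baseline)
```

Calling it on `y_test` from the split above would give the number to compare against the score of the fare-only model.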

In [38]:
from sklearn.metrics import classification_report

In [43]:
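# classification_report expects (y_true, y_pred). The arguments here are in the
# opposite order, so the precision and recall columns are exchanged and
# "support" counts predicted labels rather than true ones.
print(classification_report(pred, y_test))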

             precision    recall  f1-score   support

          0       0.98      0.62      0.76       207
          1       0.14      0.81      0.24        16

avg / total       0.92      0.63      0.72       223
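
Argument order matters for the functions in `sklearn.metrics`: the true labels come first and the predictions second. A small sketch on made-up labels shows that swapping them exchanges precision and recall.

```python
from sklearn.metrics import precision_score, recall_score

# Made-up labels for illustration
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1]

# Correct order: true labels first, predictions second
precision = precision_score(y_true, y_pred)   # 2 of 4 predicted positives are right
recall = recall_score(y_true, y_pred)         # 2 of 2 actual positives are found

# Swapped order exchanges the two metrics
swapped_precision = precision_score(y_pred, y_true)
swapped_recall = recall_score(y_pred, y_true)

print(precision, recall)
print(swapped_precision, swapped_recall)
```

The same applies to `classification_report`, where the `support` column counts occurrences in whatever is passed as the first argument.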



In [48]:
titanic.nlargest(10, 'fare')

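| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 258 | 1 | 1 | female | 35.0 | 0 | 0 | 512.3292 | C | First | woman | False | | Cherbourg | yes | True |
| 679 | 1 | 1 | male | 36.0 | 0 | 1 | 512.3292 | C | First | man | True | B | Cherbourg | yes | False |
| 737 | 1 | 1 | male | 35.0 | 0 | 0 | 512.3292 | C | First | man | True | B | Cherbourg | yes | True |
| 27 | 0 | 1 | male | 19.0 | 3 | 2 | 263.0 | S | First | man | True | C | Southampton | no | False |
| 88 | 1 | 1 | female | 23.0 | 3 | 2 | 263.0 | S | First | woman | False | C | Southampton | yes | False |
| 341 | 1 | 1 | female | 24.0 | 3 | 2 | 263.0 | S | First | woman | False | C | Southampton | yes | False |
| 438 | 0 | 1 | male | 64.0 | 1 | 4 | 263.0 | S | First | man | True | C | Southampton | no | False |
| 311 | 1 | 1 | female | 18.0 | 2 | 2 | 262.375 | C | First | woman | False | B | Cherbourg | yes | False |
| 742 | 1 | 1 | female | 21.0 | 2 | 2 | 262.375 | C | First | woman | False | B | Cherbourg | yes | False |
| 118 | 0 | 1 | male | 24.0 | 0 | 1 | 247.5208 | C | First | man | True | B | Cherbourg | no | False |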
