# Classifying Texts with a Naive Bayes Classifier

In [1]:
import numpy as np
import pandas as pd

df = pd.read_csv("data_set_amazon.csv")
df.head()

Unnamed: 0,review,class
0,watch for the song no me queda mas by far the ...,Acompanhado
1,a final precious gift from roy to his fans i ...,Acompanhado
2,like a cat dragged in from the rain depeche mo...,Acompanhado
3,folk/rock country/rock what a great artist. d...,Acompanhado
4,"one of my very favorite ""sing along"" albums if...",Acompanhado


In [2]:
df.loc[0, 'review']

'watch for the song no me queda mas by far the highlight of the album is the song  no me queda mas. i used to play it for my students while teaching in albania. we had a lesson where each student had to bring a song  we would listen to it  and we would explain the personal significance it had for us. the song brings gentleness to my heart  the sweetness that was selena  the simple and adorned style she sometimes could do so well. perhaps what impresses me the most about selena is the amazing variety of styles she could sing in. my only wish she could have sung more songs like the beautiful lo me queda mas.'

# Vector representation of text
Let's represent each text as a vector of indicators: for every word in a vocabulary, the value is 1 if the word occurs in the text and 0 otherwise. This is the simplest bag-of-words (BoW) model.

Let's build the vocabulary with the CountVectorizer class. Note that the code below fits it on the full dataset, not only on the training part:

In [3]:
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(binary=True)
vectorizer.fit(df.review)

CountVectorizer(binary=True)

In [4]:
len(vectorizer.vocabulary_)

11164
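To make the indicator representation concrete, here is a minimal sketch on a two-sentence toy corpus. The sentences and the toy_ names are invented for illustration and are not part of the dataset. With binary=True, a word that occurs twice ("the" in the first sentence) still produces a 1:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, invented for illustration (not part of the Amazon dataset).
toy_docs = ["the cat sat on the mat", "the dog sat"]

toy_vectorizer = CountVectorizer(binary=True)
toy_matrix = toy_vectorizer.fit_transform(toy_docs).toarray()

# vocabulary_ maps each word to its column index; list the words in column order.
toy_vocab = sorted(toy_vectorizer.vocabulary_, key=toy_vectorizer.vocabulary_.get)
print(toy_vocab)
print(toy_matrix)
```

Each row is one document and each column one vocabulary word, which is the same layout as the 11164-column matrices used in the rest of the notebook.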

# Splitting a dataset

In [5]:
from sklearn.model_selection import train_test_split

df_train, df_test = train_test_split(df, test_size=0.2, shuffle=True)



Let's convert the training and test texts into feature vectors:



In [6]:
X_train = vectorizer.transform(df_train.review)
X_test = vectorizer.transform(df_test.review)

X_train.shape, X_test.shape

((410, 11164), (103, 11164))
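The split above is not stratified and has no fixed seed, so rare classes can end up with very few test examples and the numbers change from run to run. A possible variant, sketched here on a hypothetical stand-in DataFrame (toy_df and its contents are assumptions, not the notebook's data), stratifies by class, fixes random_state, and fits the vocabulary on the training part only:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for df: 20 short reviews in two imbalanced classes.
toy_df = pd.DataFrame({
    "review": [f"review number {i} about music" for i in range(15)]
              + [f"review number {i} about books" for i in range(5)],
    "class": ["Sozinho"] * 15 + ["Casal"] * 5,
})

# stratify keeps the class proportions the same in both parts;
# random_state makes the split repeatable.
toy_train, toy_test = train_test_split(
    toy_df, test_size=0.2, shuffle=True, stratify=toy_df["class"], random_state=0
)

# Fit the vocabulary on the training part only, then reuse it for the test part.
toy_vec = CountVectorizer(binary=True).fit(toy_train["review"])
toy_X_train = toy_vec.transform(toy_train["review"])
toy_X_test = toy_vec.transform(toy_test["review"])

print(toy_X_train.shape, toy_X_test.shape)
print(toy_test["class"].value_counts().to_dict())
```

Whether this changes the results on the real data is not something the notebook measures; the sketch only shows the mechanics.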

In [7]:
df_train

Unnamed: 0,review,class
95,This book is the last word on the F-14. I want...,Sozinho
279,one of the few hip-hop cd's i can love i first...,Família
425,top 10 movie of all time... i've shown this mo...,Amigos
109,I shall just get right to the point: if you ar...,Sozinho
91,This book seems to be missing a lot of words. ...,Sozinho
...,...,...
474,closely patterned after the book most of the f...,Casal
397,the best cd in the universe not many people ha...,Amigos
443,beautiful portrayal of washington my girlfrien...,Amigos
35,indescribable my friend is a cheap-70's-horror...,Sozinho


# BernoulliNB
BernoulliNB is a naive Bayes classifier for binary features:

In [8]:
from sklearn.naive_bayes import BernoulliNB

clf = BernoulliNB().fit(X_train, df_train['class'])

Prior probabilities for categories (how did these numbers come about?)

In [9]:
np.exp(clf.class_log_prior_)

array([0.04390244, 0.13170732, 0.14146341, 0.22195122, 0.46097561])
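As for where these numbers come from: with default settings, BernoulliNB estimates each class prior as the share of training documents that belong to that class. A minimal check, using the class counts from the "support" column of the training-set classification report (the variable names are illustrative):

```python
import numpy as np

# Training-set class counts, taken from the "support" column of the
# classification report for the training data.
class_counts = {
    "Acompanhado": 18, "Amigos": 54, "Casal": 58, "Família": 91, "Sozinho": 189,
}

n_train = sum(class_counts.values())  # 410 training documents
priors = np.array(list(class_counts.values())) / n_train
print(priors.round(8))
```

For example, 18 / 410 ≈ 0.0439 for Acompanhado and 189 / 410 ≈ 0.4610 for Sozinho, which matches the array above.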

Let's display the 10 most significant words in each category:

In [10]:
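def show_top10(classifier, vectorizer, categories=('neg', 'pos')):
    # Note: the default labels 'neg'/'pos' are placeholders. Row i of
    # feature_log_prob_ belongs to classifier.classes_[i], so here rows
    # 0 and 1 are 'Acompanhado' and 'Amigos'.
    # On scikit-learn < 1.0 use vectorizer.get_feature_names() instead.
    feature_names = np.asarray(vectorizer.get_feature_names_out())
    for i, category in enumerate(categories):
        # Indices of the 10 features with the highest log-probability
        top10 = np.argsort(classifier.feature_log_prob_[i])[-10:]
        print("%s: %s" % (category, " ".join(feature_names[top10])))

show_top10(clf, vectorizer)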

neg: to was for of is this in and it the
pos: was in is of to it my the this and




The top of each list consists of common words that are not specific to any category. We will deal with that later. For now, let's run the predict method on the training part of the dataset:

In [11]:
from sklearn.metrics import classification_report

predicts = clf.predict(X_train)
print(classification_report(df_train['class'], predicts))

              precision    recall  f1-score   support

 Acompanhado       0.00      0.00      0.00        18
      Amigos       1.00      0.07      0.14        54
       Casal       1.00      0.03      0.07        58
     Família       0.46      1.00      0.63        91
     Sozinho       0.86      0.95      0.90       189

    accuracy                           0.67       410
   macro avg       0.67      0.41      0.35       410
weighted avg       0.77      0.67      0.59       410



Compare with the metrics on the test part:

In [12]:
predicts = clf.predict(X_test)
print(classification_report(df_test['class'], predicts))

              precision    recall  f1-score   support

 Acompanhado       0.00      0.00      0.00         3
      Amigos       0.00      0.00      0.00        15
       Casal       0.00      0.00      0.00        10
     Família       0.32      0.94      0.48        17
     Sozinho       0.94      0.86      0.90        58

    accuracy                           0.64       103
   macro avg       0.25      0.36      0.28       103
weighted avg       0.58      0.64      0.59       103





The model clearly needs to be improved. First of all, it makes sense to pay attention to the feature vector.

# The Bag-of-Words Model

When constructing a feature vector, we will take into account not just the fact that a word occurs in the text, but also count the number of occurrences:

In [13]:
count_vect = CountVectorizer(binary=False).fit(df.review)

X_train_counts = count_vect.transform(df_train.review)
X_test_counts = count_vect.transform(df_test.review)

For example, let's display the words of the first training text together with their occurrence counts:

In [14]:
dict(zip(count_vect.inverse_transform(X_train_counts[0])[0], X_train_counts[0].data))

{'14': 2,
 '1980': 1,
 '1988': 1,
 '79': 1,
 'about': 1,
 'against': 1,
 'aircraft': 1,
 'and': 1,
 'before': 1,
 'book': 1,
 'but': 1,
 'during': 1,
 'fall': 1,
 'how': 1,
 'iran': 2,
 'iraq': 1,
 'iraqi': 1,
 'is': 1,
 'last': 1,
 'mentioned': 1,
 'more': 1,
 'nothing': 1,
 'of': 1,
 'on': 1,
 'performed': 1,
 'read': 1,
 'shah': 1,
 'sold': 1,
 'soviet': 1,
 'supplied': 1,
 'that': 1,
 'the': 6,
 'they': 1,
 'this': 1,
 'to': 2,
 'wanted': 1,
 'war': 1,
 'was': 1,
 'were': 1,
 'word': 1}

This text representation is called Bag-of-Words (BoW).

# MultinomialNB

MultinomialNB is a Bayesian classifier for features expressing the number of events that have occurred:

In [15]:
from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB().fit(X_train_counts, df_train['class'])

In [16]:
# Metrics on the training set:
predicts = clf.predict(X_train_counts)
print(classification_report(df_train['class'], predicts))

              precision    recall  f1-score   support

 Acompanhado       1.00      0.17      0.29        18
      Amigos       0.98      0.85      0.91        54
       Casal       1.00      0.55      0.71        58
     Família       0.78      0.95      0.86        91
     Sozinho       0.86      0.99      0.92       189

    accuracy                           0.86       410
   macro avg       0.92      0.70      0.74       410
weighted avg       0.88      0.86      0.85       410



Let's evaluate the new model on the test set:

In [17]:
X_test_counts = count_vect.transform(df_test.review)
predicts = clf.predict(X_test_counts)
print(classification_report(df_test['class'], predicts))

              precision    recall  f1-score   support

 Acompanhado       0.00      0.00      0.00         3
      Amigos       0.00      0.00      0.00        15
       Casal       0.00      0.00      0.00        10
     Família       0.25      0.41      0.31        17
     Sozinho       0.73      0.93      0.82        58

    accuracy                           0.59       103
   macro avg       0.20      0.27      0.23       103
weighted avg       0.45      0.59      0.51       103





Non-specific words are still leading:

In [18]:
show_top10(clf, count_vect)

neg: that was this is in it to of and the
pos: in was my is of to this and it the




It's time to get rid of them by removing the stop words:

In [19]:
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

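# stop_words='english' selects the same built-in list as ENGLISH_STOP_WORDS;
# recent scikit-learn versions reject a frozenset passed to this parameter.
count_vect = CountVectorizer(stop_words='english', binary=False).fit(df.review)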

X_train_counts = count_vect.transform(df_train.review)
X_test_counts = count_vect.transform(df_test.review)

In [20]:
clf = MultinomialNB().fit(X_train_counts, df_train['class'])

Now let's see whether removing the stop words improved the model:

In [21]:
predicts = clf.predict(X_test_counts)
print(classification_report(df_test['class'], predicts))

              precision    recall  f1-score   support

 Acompanhado       0.00      0.00      0.00         3
      Amigos       0.64      0.47      0.54        15
       Casal       0.50      0.10      0.17        10
     Família       0.39      0.76      0.52        17
     Sozinho       0.89      0.88      0.89        58

    accuracy                           0.70       103
   macro avg       0.49      0.44      0.42       103
weighted avg       0.71      0.70      0.68       103





Test accuracy rose from 0.59 to 0.70. The top words are now more meaningful:

In [22]:
show_top10(clf, count_vect)

neg: best orbison roy good time just great song movie album
pos: songs don friends just like album cd song great movie




Let's keep improving the feature representation.

# TF-IDF

TF-IDF is a measure of how important a word is to a document. It is the product of the term frequency (TF, how often the word occurs in the document) and the inverse document frequency (IDF, the reciprocal of the proportion of documents in the dataset that contain the word). In practice the IDF is usually taken on a logarithmic scale.
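As a worked illustration, the sketch below computes TF-IDF by hand for a three-sentence toy corpus (invented for illustration) and compares it with TfidfVectorizer. It assumes scikit-learn's default settings, where the IDF is ln((1 + n) / (1 + df)) + 1 and each document vector is then scaled to unit L2 length; other libraries and textbooks use slightly different formulas:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy corpus, invented for illustration.
toy_docs = ["the cat sat", "the dog sat", "the cat ran"]

counts = CountVectorizer().fit_transform(toy_docs).toarray()  # term frequencies
n_docs = counts.shape[0]
doc_freq = (counts > 0).sum(axis=0)  # number of documents containing each word

# scikit-learn default (smooth_idf=True): idf = ln((1 + n) / (1 + df)) + 1
idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1

tfidf = counts * idf
tfidf = tfidf / np.linalg.norm(tfidf, axis=1, keepdims=True)  # L2-normalise each row

reference = TfidfVectorizer().fit_transform(toy_docs).toarray()
print(np.allclose(tfidf, reference))
```

The word "the" occurs in every document, so it receives the smallest IDF, while a word such as "dog" that occurs in one document receives the largest.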

Let's build the vocabulary and compute the IDF values (the code below fits on the full dataset):

In [23]:
from sklearn.feature_extraction.text import TfidfVectorizer

# stop_words='english' selects the same built-in list as ENGLISH_STOP_WORDS
vectorizer = TfidfVectorizer(stop_words='english')
vectorizer = vectorizer.fit(df.review)

Let's convert the texts of the training and test dataset into vectors:

In [24]:
X_train_vectors = vectorizer.transform(df_train.review)
X_test_vectors = vectorizer.transform(df_test.review)

The non-zero elements of the vectors contain the TF-IDF values for the words in the document:

In [25]:
X_train_vectors[0].data

array([0.13814654, 0.14942613, 0.13814654, 0.22487051, 0.22487051,
       0.18185537, 0.22487051, 0.07973702, 0.21094837, 0.16633591,
       0.22487051, 0.22487051, 0.44974102, 0.17322618, 0.06410383,
       0.21094837, 0.22487051, 0.19340855, 0.21094837, 0.40214091])

In [26]:
# Words of the first document in ascending order of TF-IDF:
vectorizer.inverse_transform(X_train_vectors[0])[0][np.argsort(X_train_vectors[0].data)]

array(['book', 'read', 'word', 'wanted', 'war', 'mentioned', 'fall',
       'sold', '1988', 'performed', '1980', 'aircraft', 'shah', 'soviet',
       'iraqi', 'iraq', 'supplied', '79', '14', 'iran'], dtype='<U22')

As intended, common words appear at the beginning of the list, and words specific to this particular document appear towards the end.

Let's apply the MultinomialNB model. (How can MultinomialNB handle non-integer features?)

In [27]:
clf = MultinomialNB().fit(X_train_vectors, df_train['class'])
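On the question of non-integer features: scikit-learn's MultinomialNB does not check that its inputs are integers. It requires non-negative values, and its documentation notes that fractional counts such as TF-IDF may also work in practice, even though the multinomial model is formally defined for counts. A minimal sketch on toy data (the corpus and toy_ names are invented for illustration):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy data, invented for illustration.
toy_docs = ["great album great songs", "boring book", "lovely songs", "long boring book"]
toy_labels = ["music", "books", "music", "books"]

toy_vec = TfidfVectorizer().fit(toy_docs)
toy_clf = MultinomialNB().fit(toy_vec.transform(toy_docs), toy_labels)

# feature_count_ holds the per-class sums of the feature values; with TF-IDF
# inputs these are real-valued weights rather than integer counts.
print(toy_clf.feature_count_.round(3))

toy_pred = toy_clf.predict(toy_vec.transform(["long book"]))
print(toy_pred)
```

The model simply treats the TF-IDF weights as fractional "counts" when it estimates the per-class word probabilities.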

In [28]:
# Top list of words by category:
show_top10(clf, vectorizer)

neg: just orbison songs music best latin song album great movie
pos: don friend watch songs album great film friends cd movie




In [29]:
# Metrics on the training set:
predicts = clf.predict(X_train_vectors)
print(classification_report(df_train['class'], predicts))

              precision    recall  f1-score   support

 Acompanhado       0.00      0.00      0.00        18
      Amigos       1.00      0.02      0.04        54
       Casal       1.00      0.03      0.07        58
     Família       0.84      0.81      0.83        91
     Sozinho       0.59      1.00      0.74       189

    accuracy                           0.65       410
   macro avg       0.69      0.37      0.33       410
weighted avg       0.73      0.65      0.54       410





In [30]:
# Model metrics on the test dataset:
X_test_vectors = vectorizer.transform(df_test.review)
predicts = clf.predict(X_test_vectors)
print(classification_report(df_test['class'], predicts))

              precision    recall  f1-score   support

 Acompanhado       0.00      0.00      0.00         3
      Amigos       0.00      0.00      0.00        15
       Casal       0.00      0.00      0.00        10
     Família       0.33      0.06      0.10        17
     Sozinho       0.58      1.00      0.73        58

    accuracy                           0.57       103
   macro avg       0.18      0.21      0.17       103
weighted avg       0.38      0.57      0.43       103





Test accuracy dropped to 0.57, below the 0.59 of the first MultinomialNB model on raw counts and well below the 0.70 obtained after removing stop words. There is room for improvement.

Let's try to count not only single words, but also pairs of words:

In [31]:
# stop_words='english' selects the same built-in list as ENGLISH_STOP_WORDS
vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1, 2)).fit(df.review)

X_train_vectors = vectorizer.transform(df_train.review)
X_test_vectors = vectorizer.transform(df_test.review)


Let's list the words and word pairs of the first document in ascending order of TF-IDF:

In [32]:
vectorizer.inverse_transform(X_train_vectors[0])[0][np.argsort(X_train_vectors[0].data)]

array(['book', 'read', 'word', 'wanted', 'war', 'mentioned', 'fall',
       'sold', '1988', '1980', 'performed', 'aircraft', 'fall shah',
       'book word', 'word 14', 'iran fall', '79 14', '79', '1988 iran',
       '1980 1988', '14 wanted', 'aircraft 1980', 'iran iraq', 'iraqi',
       'iraq war', 'war mentioned', 'wanted read', 'supplied iraqi',
       'supplied', 'soviet supplied', 'soviet', 'iraq', 'sold iran',
       'shah', 'read 79', 'performed soviet', 'iraqi aircraft', '14 sold',
       'shah performed', '14', 'iran'], dtype='<U34')

Apply the MultinomialNB model again:

In [33]:
clf = MultinomialNB().fit(X_train_vectors, df_train['class'])

In [34]:
# Metrics on the training set:
predicts = clf.predict(X_train_vectors)
print(classification_report(df_train['class'], predicts))

              precision    recall  f1-score   support

 Acompanhado       0.00      0.00      0.00        18
      Amigos       1.00      0.02      0.04        54
       Casal       0.00      0.00      0.00        58
     Família       0.99      0.92      0.95        91
     Sozinho       0.58      1.00      0.74       189

    accuracy                           0.67       410
   macro avg       0.51      0.39      0.35       410
weighted avg       0.62      0.67      0.56       410





In [35]:
# Metrics on the test set:
X_test_vectors = vectorizer.transform(df_test.review)
predicts = clf.predict(X_test_vectors)
print(classification_report(df_test['class'], predicts))

              precision    recall  f1-score   support

 Acompanhado       0.00      0.00      0.00         3
      Amigos       0.00      0.00      0.00        15
       Casal       0.00      0.00      0.00        10
     Família       0.00      0.00      0.00        17
     Sozinho       0.56      1.00      0.72        58

    accuracy                           0.56       103
   macro avg       0.11      0.20      0.14       103
weighted avg       0.32      0.56      0.41       103





# SVM
Next we try a support vector machine (SVM) on the same data, tuning C, gamma, and the kernel with a grid search:

In [37]:
from sklearn.svm import SVC

param_grid = {'C': [0.1,1, 10, 100], 'gamma': [1,0.1,0.01,0.001],'kernel': ['rbf', 'poly', 'sigmoid']}

In [40]:
from sklearn.model_selection import GridSearchCV
grid = GridSearchCV(SVC(),param_grid,refit=True,verbose=2)
grid.fit(X_train,df_train['class'])

Fitting 5 folds for each of 48 candidates, totalling 240 fits
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.5s
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.3s
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.3s
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.3s
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.3s
[CV] END ........................C=0.1, gamma=1, kernel=poly; total time=   0.4s
[CV] END ........................C=0.1, gamma=1, kernel=poly; total time=   0.3s
[CV] END ........................C=0.1, gamma=1, kernel=poly; total time=   0.5s
[CV] END ........................C=0.1, gamma=1, kernel=poly; total time=   0.5s
[CV] END ........................C=0.1, gamma=1, kernel=poly; total time=   0.4s
[CV] END .....................C=0.1, gamma=1, kernel=sigmoid; total time=   0.2s
[CV] END .....................C=0.1, gamma=1, k

[CV] END ....................C=1, gamma=0.01, kernel=sigmoid; total time=   0.1s
[CV] END ....................C=1, gamma=0.01, kernel=sigmoid; total time=   0.2s
[CV] END ....................C=1, gamma=0.01, kernel=sigmoid; total time=   0.2s
[CV] END ....................C=1, gamma=0.01, kernel=sigmoid; total time=   0.2s
[CV] END .......................C=1, gamma=0.001, kernel=rbf; total time=   0.2s
[CV] END .......................C=1, gamma=0.001, kernel=rbf; total time=   0.2s
[CV] END .......................C=1, gamma=0.001, kernel=rbf; total time=   0.2s
[CV] END .......................C=1, gamma=0.001, kernel=rbf; total time=   0.2s
[CV] END .......................C=1, gamma=0.001, kernel=rbf; total time=   0.2s
[CV] END ......................C=1, gamma=0.001, kernel=poly; total time=   0.2s
[CV] END ......................C=1, gamma=0.001, kernel=poly; total time=   0.2s
[CV] END ......................C=1, gamma=0.001, kernel=poly; total time=   0.2s
[CV] END ...................

[CV] END ......................C=100, gamma=0.1, kernel=poly; total time=   0.3s
[CV] END ......................C=100, gamma=0.1, kernel=poly; total time=   0.3s
[CV] END ...................C=100, gamma=0.1, kernel=sigmoid; total time=   0.1s
[CV] END ...................C=100, gamma=0.1, kernel=sigmoid; total time=   0.1s
[CV] END ...................C=100, gamma=0.1, kernel=sigmoid; total time=   0.1s
[CV] END ...................C=100, gamma=0.1, kernel=sigmoid; total time=   0.1s
[CV] END ...................C=100, gamma=0.1, kernel=sigmoid; total time=   0.1s
[CV] END ......................C=100, gamma=0.01, kernel=rbf; total time=   0.3s
[CV] END ......................C=100, gamma=0.01, kernel=rbf; total time=   0.3s
[CV] END ......................C=100, gamma=0.01, kernel=rbf; total time=   0.2s
[CV] END ......................C=100, gamma=0.01, kernel=rbf; total time=   0.3s
[CV] END ......................C=100, gamma=0.01, kernel=rbf; total time=   0.3s
[CV] END ...................

GridSearchCV(estimator=SVC(),
             param_grid={'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001],
                         'kernel': ['rbf', 'poly', 'sigmoid']},
             verbose=2)
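A fitted GridSearchCV object also records which parameter combination won, through its best_params_ and best_score_ attributes; the notebook does not print them. The sketch below shows how to read them on a small synthetic problem. The data from make_classification and the reduced grid are assumptions made to keep the example fast, not the notebook's setup:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (not the review features used in the notebook).
X_toy, y_toy = make_classification(n_samples=60, n_features=8, random_state=0)

# A reduced, illustrative grid: 3 values of C x 2 kernels = 6 candidates.
small_grid = {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}
toy_search = GridSearchCV(SVC(), small_grid, cv=3).fit(X_toy, y_toy)

print(toy_search.best_params_)            # winning parameter combination
print(round(toy_search.best_score_, 3))   # its mean cross-validated accuracy
```

On the notebook's search, grid.best_params_ would report which of the 48 candidates was refitted on the full training data.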

The best SVM found by the grid search gives the following results on the test set:

In [41]:
from sklearn.metrics import classification_report, confusion_matrix 
grid_predictions = grid.predict(X_test)
print(confusion_matrix(df_test['class'],grid_predictions))
print(classification_report(df_test['class'],grid_predictions))

[[ 0  0  1  1  1]
 [ 0 10  0  4  1]
 [ 0  0  5  4  1]
 [ 0  5  4  6  2]
 [ 0  1  2  5 50]]
              precision    recall  f1-score   support

 Acompanhado       0.00      0.00      0.00         3
      Amigos       0.62      0.67      0.65        15
       Casal       0.42      0.50      0.45        10
     Família       0.30      0.35      0.32        17
     Sozinho       0.91      0.86      0.88        58

    accuracy                           0.69       103
   macro avg       0.45      0.48      0.46       103
weighted avg       0.69      0.69      0.69       103





# KFold cross validation

In [77]:
X = df_train['review']
Y = df_train['class']
tfidf_vectorizer = TfidfVectorizer(max_features=5000,ngram_range=(2,2))
# TF-IDF feature matrix
X= tfidf_vectorizer.fit_transform(X)

In [84]:
from sklearn import preprocessing 
label_encoder = preprocessing.LabelEncoder() 
# Encode the labels in column 'class' as integers.
Y= label_encoder.fit_transform(Y) 
Y

array([4, 3, 1, 4, 4, 0, 0, 4, 4, 3, 3, 2, 3, 4, 2, 3, 4, 2, 3, 4, 0, 3,
       3, 2, 3, 3, 4, 1, 4, 4, 2, 1, 4, 3, 4, 3, 4, 4, 3, 2, 1, 4, 4, 2,
       1, 0, 4, 4, 0, 4, 4, 4, 1, 2, 4, 3, 4, 4, 4, 1, 2, 1, 1, 2, 3, 4,
       0, 2, 3, 1, 3, 2, 3, 4, 4, 3, 4, 3, 3, 3, 4, 1, 4, 0, 4, 3, 4, 4,
       1, 2, 3, 4, 3, 1, 4, 4, 3, 4, 4, 4, 4, 4, 4, 1, 3, 4, 4, 4, 1, 4,
       3, 1, 4, 4, 4, 1, 0, 1, 4, 1, 3, 4, 3, 2, 2, 4, 1, 2, 2, 1, 2, 4,
       1, 4, 2, 3, 4, 2, 4, 4, 3, 0, 4, 4, 3, 4, 2, 4, 4, 3, 4, 4, 4, 4,
       1, 1, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 4, 3, 2, 4, 3, 4, 1, 4, 3, 4,
       3, 4, 4, 3, 4, 1, 0, 1, 3, 4, 3, 4, 4, 2, 3, 4, 4, 3, 3, 4, 4, 4,
       4, 4, 4, 4, 4, 4, 3, 4, 4, 4, 2, 3, 2, 2, 3, 3, 4, 3, 4, 4, 3, 1,
       4, 4, 2, 4, 3, 4, 4, 4, 0, 4, 3, 3, 3, 2, 1, 4, 3, 3, 2, 4, 2, 2,
       3, 4, 4, 2, 4, 3, 3, 4, 0, 2, 4, 4, 3, 4, 4, 4, 4, 1, 4, 4, 4, 3,
       1, 4, 3, 4, 2, 4, 4, 0, 3, 4, 4, 2, 4, 2, 1, 3, 4, 1, 4, 2, 3, 4,
       4, 3, 1, 4, 1, 2, 1, 3, 3, 4, 0, 4, 4, 4, 1,

In [86]:
from collections import Counter
from imblearn.over_sampling import SMOTE
print(f'Original dataset shape : {Counter(Y)}')

smote = SMOTE(random_state=42)
X_res, y_res = smote.fit_resample(X, Y)

print(f'Resampled dataset shape {Counter(y_res)}')

Original dataset shape : Counter({4: 189, 3: 91, 2: 58, 1: 54, 0: 18})
Resampled dataset shape Counter({4: 189, 3: 189, 1: 189, 0: 189, 2: 189})


In [87]:
## Divide the dataset into Train and Test
X_train, X_test, y_train, y_test = train_test_split(X_res, y_res, test_size=0.25, random_state=0)

In [88]:
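# Create the models to compare
from sklearn.model_selection import cross_val_score

svc_cv = SVC()
nb_cv = BernoulliNB()
cv_dict = {0: 'SVM', 1: 'Naive Bayes'}
cv_models = [svc_cv, nb_cv]

# Note: the score is the mean 10-fold cross-validation accuracy on the
# original (not SMOTE-resampled) X and Y.
for i, model in enumerate(cv_models):
    print("{} Test Accuracy: {}".format(cv_dict[i], cross_val_score(model, X, Y, cv=10, scoring='accuracy').mean()))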

SVM Test Accuracy: 0.4853658536585367
Naive Bayes Test Accuracy: 0.573170731707317
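In the cell above the TF-IDF vocabulary is fitted once on all of X before cross-validation, so every fold has already seen the words of its own validation part. One way to avoid that, sketched here on toy texts, is to put the vectorizer and the classifier into a single pipeline so that both are refitted inside each fold. The toy corpus, the labels, and the choice of five stratified folds are assumptions for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Toy texts and labels, invented for illustration.
toy_texts = ([f"great songs album {i}" for i in range(10)]
             + [f"boring long book {i}" for i in range(10)])
toy_y = [0] * 10 + [1] * 10

# The vectorizer is part of the model, so it is refitted on each training fold.
pipe = make_pipeline(TfidfVectorizer(), BernoulliNB())

# Stratified folds keep the class balance the same in every fold.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, toy_texts, toy_y, cv=folds, scoring="accuracy")
print(scores.mean())
```

The toy classes are trivially separable, so the score here says nothing about the review data; the point is only the structure of the evaluation.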
