<a href="https://colab.research.google.com/github/VighneshS/sentiment_prediction/blob/master/sentiment_prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sentiment Prediction using Naive Bayes Classifier (NBC)
This notebook shows how a Naive Bayes Classifier (NBC) works and how it can classify text by sentiment.

We will also see how smoothing helps it cope with missing data, that is, words that never appeared in the training reviews for a given sentiment.

## Settings
Fraction of the data used for training. The remaining rows are split evenly between the development and test sets.

In [657]:
training_ratio = 60 / 100

## Importing the Data
We used the [kaggle dataset](https://storage.googleapis.com/kagglesdsdata/datasets/22169/30047/sentiment%20labelled%20sentences/imdb_labelled.txt?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=gcp-kaggle-com%40kaggle-161607.iam.gserviceaccount.com%2F20210425%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20210425T202010Z&X-Goog-Expires=259199&X-Goog-SignedHeaders=host&X-Goog-Signature=6133706ef10bc2dcd0b58f8398b4d73ab9e9d788de1718b07334df91f6007e1e4ca0b78e3176f95b8250e0c4535ce1633528f4fabffeb7e4124af3ee3f895ac34c03044fca9b23b23c4ddb8fa90d84dfc14869ff4806f03783cafad53b19445b3c3052983fdf1ca4384257eac1bc0a4270d238a1ea89d1289866c7a0ea7ad7c97a76f2e142c148019e39cc5a1295f92650747ac5ea5946b026f7ad6d5d262d4c4a370aee6bc1f5d5b445bb6d93692debe678a79e5e1c1fe3d3e68ea4f2fad3115795d3361e0626e98156fbc7f5967beb7cf0f00e07351d23a00d8677ebb75e3e13b1bfa07762266efabf6f6f9d53206be31b7623cf3614f60f8cf5011cf23def) to get the ground truth of sample IMDB reviews.

In [658]:
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import math

data = pd.read_csv(
    r"http://storage.googleapis.com/kagglesdsdata/datasets/22169/30047/sentiment%20labelled%20sentences/imdb_labelled.txt?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=gcp-kaggle-com%40kaggle-161607.iam.gserviceaccount.com%2F20210425%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20210425T202010Z&X-Goog-Expires=259199&X-Goog-SignedHeaders=host&X-Goog-Signature=6133706ef10bc2dcd0b58f8398b4d73ab9e9d788de1718b07334df91f6007e1e4ca0b78e3176f95b8250e0c4535ce1633528f4fabffeb7e4124af3ee3f895ac34c03044fca9b23b23c4ddb8fa90d84dfc14869ff4806f03783cafad53b19445b3c3052983fdf1ca4384257eac1bc0a4270d238a1ea89d1289866c7a0ea7ad7c97a76f2e142c148019e39cc5a1295f92650747ac5ea5946b026f7ad6d5d262d4c4a370aee6bc1f5d5b445bb6d93692debe678a79e5e1c1fe3d3e68ea4f2fad3115795d3361e0626e98156fbc7f5967beb7cf0f00e07351d23a00d8677ebb75e3e13b1bfa07762266efabf6f6f9d53206be31b7623cf3614f60f8cf5011cf23def",
    delimiter="\t", header=None, names=["IMDB Review", "Sentiment"])
data

Index,IMDB Review,Sentiment
0,"A very, very, very slow-moving, aimless movie ...",0
1,Not sure who was more lost - the flat characte...,0
2,Attempting artiness with black & white and cle...,0
3,Very little music or anything to speak of.,0
4,The best scene in the movie was when Gerardo i...,1
...,...,...
743,I just got bored watching Jessice Lange take h...,0
744,"Unfortunately, any virtue in this film's produ...",0
745,"In a word, it is embarrassing.",0
746,Exceptionally bad!,0


### Split Data
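We split the data into training, development, and test sets. The first 60% of the rows, in file order, form the training set, and the remaining rows are divided evenly between development and test.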
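The split above takes rows in the order they appear in the file. As a hedged alternative, the sketch below shuffles the rows before cutting them with the same 60/20/20 proportions. It works on plain Python lists rather than a DataFrame, and `shuffled_split`, `seed`, and `toy_rows` are hypothetical names introduced only for illustration.

```python
import math
import random


def shuffled_split(rows, training_ratio, seed=0):
    """Shuffle a copy of rows, then return (train, dev, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    train_end = math.floor(len(shuffled) * training_ratio)
    # The remaining rows are divided as evenly as possible between dev and test.
    dev_end = train_end + math.ceil((len(shuffled) - train_end) / 2)
    return shuffled[:train_end], shuffled[train_end:dev_end], shuffled[dev_end:]


# Toy (review, sentiment) pairs, invented for illustration.
toy_rows = [(f"review {i}", i % 2) for i in range(10)]
toy_train, toy_dev, toy_test = shuffled_split(toy_rows, training_ratio=60 / 100)
print(len(toy_train), len(toy_dev), len(toy_test))  # 6 2 2
```

Fixing the seed keeps the split reproducible between runs. Whether shuffling changes the accuracies reported in this notebook has not been checked here.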

In [659]:
# .copy() makes train an independent DataFrame, so columns can be added to it later.
train = data[:math.floor(data.shape[0] * training_ratio)].copy()

In [660]:
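validation = data[math.floor(data.shape[0] * training_ratio):]
# Cut the remaining rows in half; the first half gets the extra row if the count is odd.
midpoint = math.ceil(validation.shape[0] / 2)
dev, test = validation.iloc[:midpoint].copy(), validation.iloc[midpoint:].copy()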

In [661]:
train, dev, test

(                                           IMDB Review  Sentiment
 0    A very, very, very slow-moving, aimless movie ...          0
 1    Not sure who was more lost - the flat characte...          0
 2    Attempting artiness with black & white and cle...          0
 3         Very little music or anything to speak of.            0
 4    The best scene in the movie was when Gerardo i...          1
 ..                                                 ...        ...
 443  It's a campy sort of film that's a joy to watc...          1
 444  There's barely a boring moment in the film and...          1
 445        The cast is always entertaining as usual.            1
 446                              Overall, a delight!            1
 447  This movie is so mind-bendingly awful, it coul...          0
 
 [448 rows x 2 columns],
                                            IMDB Review  Sentiment
 448  The film lacks any real scares or tension & so...          0
 449      The least said about the 

## Generation of Vocabulary list

In [662]:
def split_words(review):
    # Lowercase, strip punctuation and possessive 's, turn '-' and '/' into spaces, then split.
    return (review.lower()
            .replace(',', '').replace('"', '').replace('(', '').replace(')', '')
            .replace('\'s', '').replace('.', '').replace('!', '')
            .replace('-', ' ').replace('/', ' ')
            .split())


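def get_word_count(review_data_frame: pd.DataFrame, column_name: str):
    """Count how often each word occurs across all reviews in the frame."""
    # One row of per-word counts per review; summing the rows gives corpus-wide totals.
    vocab = review_data_frame["IMDB Review"].apply(
        lambda review: pd.Series(split_words(review)).value_counts()).sum(axis=0).to_frame()
    vocab.columns = [column_name]
    vocab.reset_index(inplace=True)
    vocab = vocab.rename(columns={'index': 'Word'})
    return vocab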

In [663]:
vocabulary = get_word_count(train, "Frequency")
vocabulary

Index,Word,Frequency
0,very,45.0
1,a,270.0
2,man,9.0
3,moving,2.0
4,slow,3.0
...,...,...
2298,scot,1.0
2299,nonetheless,1.0
2300,campy,1.0
2301,delight,1.0


### Probability of the word
P(word) = (number of times the word occurs in the training reviews) / (total number of words in the training reviews)
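As a small worked example of this ratio, the sketch below computes word probabilities for three invented reviews using only the standard library. The names `toy_reviews`, `word_counts`, and `word_probability` are hypothetical and are not part of the notebook, and the tokenisation is a plain whitespace split rather than `split_words`.

```python
from collections import Counter

# Three invented reviews, for illustration only.
toy_reviews = ["a good movie", "a bad movie", "good good fun"]

# Count every occurrence of every word across all reviews.
word_counts = Counter(word for review in toy_reviews for word in review.lower().split())
total_word_count = sum(word_counts.values())

# Probability of a word = its count divided by the total number of words.
word_probability = {word: count / total_word_count for word, count in word_counts.items()}
print(word_counts["good"], total_word_count)  # 3 9
```

Here "good" occurs 3 times out of 9 words, so its probability is 1/3, and the probabilities of all words sum to 1.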

### Total Number of words

In [664]:
total_words = vocabulary["Frequency"].sum(axis=0)
total_words

9440.0

In [665]:
total_sentiments = train.count(axis=0)['Sentiment']
total_sentiments

448

In [666]:
vocabulary['Word Probability'] = vocabulary["Frequency"].div(total_words)
vocabulary

Index,Word,Frequency,Word Probability
0,very,45.0,0.004767
1,a,270.0,0.028602
2,man,9.0,0.000953
3,moving,2.0,0.000212
4,slow,3.0,0.000318
...,...,...,...
2298,scot,1.0,0.000106
2299,nonetheless,1.0,0.000106
2300,campy,1.0,0.000106
2301,delight,1.0,0.000106


### Conditional Probability based on sentiment
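That is, P(word | sentiment), estimated separately for positive (1) and negative (0) reviews. In this notebook it is the word's count in the training reviews of that sentiment divided by the number of training reviews with that sentiment.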
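For reference, these conditional probabilities feed the standard Naive Bayes decision rule, which treats the words of a review as independent given the sentiment. The symbols $s$ (sentiment), $w_i$ (the $i$-th word of the review), and $n$ (the number of words in the review) are notation introduced here, not names from the notebook.

```latex
P(s \mid w_1, \dots, w_n) \;\propto\; P(s) \prod_{i=1}^{n} P(w_i \mid s),
\qquad
\hat{s} = \arg\max_{s \in \{0, 1\}} \; P(s) \prod_{i=1}^{n} P(w_i \mid s)
```

The predicted sentiment $\hat{s}$ is simply the class with the larger product, so the shared normalising constant never needs to be computed.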

In [667]:
positive_sentiments = train[train['Sentiment'] == 1]
positive_vocabulary = get_word_count(positive_sentiments, "Positive Sentiment Count")
vocabulary = vocabulary.merge(positive_vocabulary, how='left', on='Word')
vocabulary

Index,Word,Frequency,Word Probability,Positive Sentiment Count
0,very,45.0,0.004767,12.0
1,a,270.0,0.028602,105.0
2,man,9.0,0.000953,4.0
3,moving,2.0,0.000212,
4,slow,3.0,0.000318,
...,...,...,...,...
2298,scot,1.0,0.000106,1.0
2299,nonetheless,1.0,0.000106,1.0
2300,campy,1.0,0.000106,1.0
2301,delight,1.0,0.000106,1.0


In [668]:
# Despite the name, this is the number of positive reviews (rows), not a word count.
total_positive_words = positive_sentiments.count(axis=0)['Sentiment']
total_positive_words

188

In [669]:
probability_of_positive_sentiments = total_positive_words / total_sentiments
probability_of_positive_sentiments

0.41964285714285715

In [670]:
vocabulary['Positive Sentiments Probability'] = vocabulary['Positive Sentiment Count'].div(total_positive_words)
vocabulary

Index,Word,Frequency,Word Probability,Positive Sentiment Count,Positive Sentiments Probability
0,very,45.0,0.004767,12.0,0.063830
1,a,270.0,0.028602,105.0,0.558511
2,man,9.0,0.000953,4.0,0.021277
3,moving,2.0,0.000212,,
4,slow,3.0,0.000318,,
...,...,...,...,...,...
2298,scot,1.0,0.000106,1.0,0.005319
2299,nonetheless,1.0,0.000106,1.0,0.005319
2300,campy,1.0,0.000106,1.0,0.005319
2301,delight,1.0,0.000106,1.0,0.005319


In [671]:
negative_sentiments = train[train['Sentiment'] == 0]
negative_vocabulary = get_word_count(negative_sentiments, "Negative Sentiment Count")
vocabulary = vocabulary.merge(negative_vocabulary, how='left', on='Word')
vocabulary

Index,Word,Frequency,Word Probability,Positive Sentiment Count,Positive Sentiments Probability,Negative Sentiment Count
0,very,45.0,0.004767,12.0,0.063830,33.0
1,a,270.0,0.028602,105.0,0.558511,165.0
2,man,9.0,0.000953,4.0,0.021277,5.0
3,moving,2.0,0.000212,,,2.0
4,slow,3.0,0.000318,,,3.0
...,...,...,...,...,...,...
2298,scot,1.0,0.000106,1.0,0.005319,
2299,nonetheless,1.0,0.000106,1.0,0.005319,
2300,campy,1.0,0.000106,1.0,0.005319,
2301,delight,1.0,0.000106,1.0,0.005319,


In [672]:
# Despite the name, this is the number of negative reviews (rows), not a word count.
total_negative_words = negative_sentiments.count(axis=0)['Sentiment']
total_negative_words

260

In [673]:
probability_of_negative_sentiments = total_negative_words / total_sentiments
probability_of_negative_sentiments



0.5803571428571429

In [674]:
vocabulary['Negative Sentiments Probability'] = vocabulary['Negative Sentiment Count'].div(total_negative_words)
vocabulary

Index,Word,Frequency,Word Probability,Positive Sentiment Count,Positive Sentiments Probability,Negative Sentiment Count,Negative Sentiments Probability
0,very,45.0,0.004767,12.0,0.063830,33.0,0.126923
1,a,270.0,0.028602,105.0,0.558511,165.0,0.634615
2,man,9.0,0.000953,4.0,0.021277,5.0,0.019231
3,moving,2.0,0.000212,,,2.0,0.007692
4,slow,3.0,0.000318,,,3.0,0.011538
...,...,...,...,...,...,...,...
2298,scot,1.0,0.000106,1.0,0.005319,,
2299,nonetheless,1.0,0.000106,1.0,0.005319,,
2300,campy,1.0,0.000106,1.0,0.005319,,
2301,delight,1.0,0.000106,1.0,0.005319,,


In [675]:
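def get_probabilities(review: str, sentiment: bool, smoothening: bool):
    """Return P(sentiment) times the product of P(word | sentiment) over the review's words."""
    prior = probability_of_positive_sentiments if sentiment else probability_of_negative_sentiments
    column_name = 'Positive Sentiments Probability' if sentiment else 'Negative Sentiments Probability'
    # Factor used until the first known word is seen: 0 without smoothing, 1 / prior with it.
    individual_prob = 1 / prior if smoothening else 0
    prob = 1
    for word in split_words(review):
        # Look the word up in the 'Word' column only, not in every cell of the frame.
        if word in vocabulary['Word'].values:
            individual_prob = vocabulary[vocabulary['Word'] == word].iloc[0][column_name]
        # Caveat: a word missing from the vocabulary reuses the factor of the previous word.
        # NaN means the word never occurred with this sentiment, so the product becomes 0.
        prob *= 0 if math.isnan(individual_prob) else individual_prob
    return prob * prior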
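The product in `get_probabilities` multiplies many factors below 1, which is why the scores in the tables that follow are as small as 1e-30. A common variant, sketched here under stated assumptions, sums logarithms instead, which keeps long reviews from underflowing to zero. The function `log_score`, its parameters, and the probabilities below are hypothetical illustrations and are not taken from the notebook. In this sketch every word without an estimate falls back to the same `unseen_probability`.

```python
import math


def log_score(words, word_probability, prior, unseen_probability):
    """Return log P(sentiment) plus the sum of log P(word | sentiment)."""
    total = math.log(prior)
    for word in words:
        # Words without an estimate fall back to unseen_probability every time.
        total += math.log(word_probability.get(word, unseen_probability))
    return total


# Invented probabilities, for illustration only.
positive = {"great": 0.05, "film": 0.04}
negative = {"great": 0.01, "film": 0.04}
words = ["great", "film", "unknownword"]

positive_score = log_score(words, positive, prior=0.42, unseen_probability=0.005)
negative_score = log_score(words, negative, prior=0.58, unseen_probability=0.005)
print(positive_score > negative_score)  # True
```

Because the logarithm is monotonic, comparing log scores picks the same class as comparing the raw products whenever those products are non-zero. `unseen_probability` must be greater than 0, since `math.log(0)` raises an error.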

In [676]:
train["Conditional Positive Probability"] = train["IMDB Review"].apply(
    lambda review: get_probabilities(review, True, False))
train["Conditional Negative Probability"] = train["IMDB Review"].apply(
    lambda review: get_probabilities(review, False, False))
train["Predicted sentiment"] = train["Conditional Positive Probability"] > train["Conditional Negative Probability"]
print("Train Accuracy: ",
      train.loc[train["Predicted sentiment"] == train["Sentiment"]].count(axis=0)['Sentiment'] * 100 /
      train.count(axis=0)['Sentiment'])
train.loc[train["Predicted sentiment"] != train["Sentiment"]]

Train Accuracy:  96.65178571428571


Index,IMDB Review,Sentiment,Conditional Positive Probability,Conditional Negative Probability,Predicted sentiment
20,This if the first movie I've given a 10 to in ...,1,7.714623e-14,4.937223e-13,False
25,I gave it a 10,1,2.152155e-05,2.480739e-05,False
28,It actually turned out to be pretty decent as ...,1,1.3652140000000002e-28,5.62493e-28,False
39,I don't think you will be disappointed.,1,1.968216e-10,5.99309e-10,False
59,A great film by a great director.,1,5.811941e-08,2.379532e-07,False
61,The music in the film is really nice too.,1,2.061381e-09,4.882636e-09,False
65,I liked this movie way too much.,1,1.509136e-09,1.253581e-08,False
132,There were too many close ups.,0,7.907807e-11,5.36554e-11,True
164,Everything from acting to cinematography was s...,1,4.501529e-11,1.005614e-09,False
251,"I won't say any more - I don't like spoilers, ...",1,1.227248e-30,2.111939e-30,False


In [677]:
dev["Conditional Positive Probability"] = dev["IMDB Review"].apply(
    lambda review: get_probabilities(review, True, False))
dev["Conditional Negative Probability"] = dev["IMDB Review"].apply(
    lambda review: get_probabilities(review, False, False))
dev["Predicted sentiment"] = dev["Conditional Positive Probability"] > dev["Conditional Negative Probability"]
print("Dev Accuracy: ",
      dev.loc[dev["Predicted sentiment"] == dev["Sentiment"]].count(axis=0)['Sentiment'] * 100 / dev.count(axis=0)[
          'Sentiment'])
dev.loc[dev["Predicted sentiment"] != dev["Sentiment"]]

Dev Accuracy:  43.333333333333336


Index,IMDB Review,Sentiment,Conditional Positive Probability,Conditional Negative Probability,Predicted sentiment
450,This movie does an excellent job of revealing ...,1,8.851617e-22,6.733142e-19,False
451,I believe every one should see this movie as I...,1,0.000000e+00,2.514524e-31,False
452,Nothing short of magnificent photography/cinem...,1,0.000000e+00,0.000000e+00,False
453,"The acting is fantastic, the stories are seaml...",1,0.000000e+00,0.000000e+00,False
459,Macbeth (Jason Connery) moved me to tears with...,1,0.000000e+00,0.000000e+00,False
...,...,...,...,...,...
592,"As a courtroom drama, it's compelling, as an i...",1,0.000000e+00,2.194450e-18,False
593,This film highlights the fundamental flaws of ...,1,0.000000e+00,2.276611e-22,False
595,This mostly routine fact-based TV drama gets a...,1,0.000000e+00,0.000000e+00,False
596,"Predictable, but not a bad watch.",1,1.916122e-09,9.227881e-07,False


In [678]:
test["Conditional Positive Probability"] = test["IMDB Review"].apply(
    lambda review: get_probabilities(review, True, False))
test["Conditional Negative Probability"] = test["IMDB Review"].apply(
    lambda review: get_probabilities(review, False, False))
test["Predicted sentiment"] = test["Conditional Positive Probability"] > test["Conditional Negative Probability"]
print("Test Accuracy: ",
      test.loc[test["Predicted sentiment"] == test["Sentiment"]].count(axis=0)['Sentiment'] * 100 / test.count(axis=0)[
          'Sentiment'])
test.loc[test["Predicted sentiment"] != test["Sentiment"]]

Test Accuracy:  40.666666666666664


Index,IMDB Review,Sentiment,Conditional Positive Probability,Conditional Negative Probability,Predicted sentiment
598,She carries the movie well.,1,0.0,1.338965e-07,False
599,Constantine gives everything the right intensi...,1,0.0,0.000000e+00,False
600,"It is wonderful and inspiring to watch, and I ...",1,0.0,3.372625e-21,False
609,Editing: The editing of this film was phenomen...,1,0.0,0.000000e+00,False
610,When a song could explain the emotions of the ...,1,0.0,0.000000e+00,False
...,...,...,...,...,...
736,It was a riot to see Hugo Weaving play a sex-o...,1,0.0,2.024268e-28,False
737,":) Anyway, the plot flowed smoothly and the ma...",1,0.0,0.000000e+00,False
738,"The opening sequence of this gem is a classic,...",1,0.0,0.000000e+00,False
740,Lange had become a great actress.,1,0.0,0.000000e+00,False


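## Smoothing
Without smoothing, a single word that never occurred with a sentiment in training sets that sentiment's whole product to 0. Add-one (Laplace) smoothing gives every word a non-zero count for both sentiments.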
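As a minimal sketch of the idea, the example below adds 1 to every word count, treating a missing word as a count of 0, and divides by an adjusted denominator. The counts for "delight" and "a" and the figure of 188 positive reviews are taken from the tables above. The helper `add_one_probabilities` and its parameter names are hypothetical.

```python
from collections import Counter


def add_one_probabilities(class_counts, vocabulary_words, denominator):
    """Add 1 to every word count (0 for unseen words) and divide by denominator."""
    return {word: (class_counts.get(word, 0) + 1) / denominator for word in vocabulary_words}


vocabulary_words = ["slow", "delight", "a"]
# "slow" never occurred in a positive training review, so it has no entry here.
positive_counts = Counter({"delight": 1, "a": 105})

# 188 positive reviews plus 2, mirroring the denominator this notebook uses.
smoothed = add_one_probabilities(positive_counts, vocabulary_words, denominator=188 + 2)
print(round(smoothed["slow"], 6), round(smoothed["delight"], 6))  # 0.005263 0.010526
```

The word "slow" now contributes a small positive factor instead of 0. Textbook Laplace smoothing for word counts usually adds the vocabulary size to the denominator. The sketch follows this notebook's choice of adding 2 instead.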

In [679]:
# Add-one smoothing: every word gets one extra occurrence.
vocabulary["Frequency"] += 1
# Every total used as a denominator is increased by 2.
total_words += 2
total_sentiments += 2
vocabulary['Word Probability'] = vocabulary["Frequency"].div(total_words)

# NaN + 1 stays NaN, so words never seen in positive reviews end up with a count of 1.
vocabulary["Positive Sentiment Count"] += 1
vocabulary["Positive Sentiment Count"] = vocabulary["Positive Sentiment Count"].fillna(value=1)

total_positive_words += 2

# Recompute the prior and the per-word probabilities from the adjusted totals.
probability_of_positive_sentiments = total_positive_words / total_sentiments

vocabulary['Positive Sentiments Probability'] = vocabulary['Positive Sentiment Count'].div(total_positive_words)

# Same treatment for the negative counts.
vocabulary["Negative Sentiment Count"] += 1
vocabulary["Negative Sentiment Count"] = vocabulary["Negative Sentiment Count"].fillna(value=1)

total_negative_words += 2

probability_of_negative_sentiments = total_negative_words / total_sentiments

vocabulary['Negative Sentiments Probability'] = vocabulary['Negative Sentiment Count'].div(total_negative_words)
vocabulary

Index,Word,Frequency,Word Probability,Positive Sentiment Count,Positive Sentiments Probability,Negative Sentiment Count,Negative Sentiments Probability
0,very,46.0,0.004872,13.0,0.068421,34.0,0.129771
1,a,271.0,0.028702,106.0,0.557895,166.0,0.633588
2,man,10.0,0.001059,5.0,0.026316,6.0,0.022901
3,moving,3.0,0.000318,1.0,0.005263,3.0,0.011450
4,slow,4.0,0.000424,1.0,0.005263,4.0,0.015267
...,...,...,...,...,...,...,...
2298,scot,2.0,0.000212,2.0,0.010526,1.0,0.003817
2299,nonetheless,2.0,0.000212,2.0,0.010526,1.0,0.003817
2300,campy,2.0,0.000212,2.0,0.010526,1.0,0.003817
2301,delight,2.0,0.000212,2.0,0.010526,1.0,0.003817


In [680]:
train["Conditional Positive Probability"] = train["IMDB Review"].apply(
    lambda review: get_probabilities(review, True, True))
train["Conditional Negative Probability"] = train["IMDB Review"].apply(
    lambda review: get_probabilities(review, False, True))
train["Predicted sentiment"] = train["Conditional Positive Probability"] > train["Conditional Negative Probability"]
print("Train Accuracy: ",
      train.loc[train["Predicted sentiment"] == train["Sentiment"]].count(axis=0)['Sentiment'] * 100 /
      train.count(axis=0)['Sentiment'])
train.loc[train["Predicted sentiment"] != train["Sentiment"]]

Train Accuracy:  90.17857142857143


Index,IMDB Review,Sentiment,Conditional Positive Probability,Conditional Negative Probability,Predicted sentiment
11,The movie showed a lot of Florida at it's best...,1,1.206873e-17,2.333813e-17,False
13,It Was So Cool.,1,8.400641e-05,9.844006e-05,False
14,"This is a very ""right on case"" movie that deli...",1,1.483574e-20,4.161515e-20,False
20,This if the first movie I've given a 10 to in ...,1,3.745276e-13,2.00999e-12,False
21,If there was ever a movie that needed word-of-...,1,1.324184e-17,2.724938e-16,False
24,Give this one a look.,1,5.921379e-06,6.239798e-06,False
25,I gave it a 10,1,3.487757e-05,4.040556e-05,False
28,It actually turned out to be pretty decent as ...,1,3.251217e-26,6.19972e-26,False
39,I don't think you will be disappointed.,1,6.464498e-10,1.237284e-09,False
45,The only thing really worth watching was the s...,1,2.323397e-17,1.309205e-16,False


In [681]:
dev["Conditional Positive Probability"] = dev["IMDB Review"].apply(lambda review: get_probabilities(review, True, True))
dev["Conditional Negative Probability"] = dev["IMDB Review"].apply(
    lambda review: get_probabilities(review, False, True))
dev["Predicted sentiment"] = dev["Conditional Positive Probability"] > dev["Conditional Negative Probability"]
print("Dev Accuracy: ",
      dev.loc[dev["Predicted sentiment"] == dev["Sentiment"]].count(axis=0)['Sentiment'] * 100 / dev.count(axis=0)[
          'Sentiment'])
dev.loc[dev["Predicted sentiment"] != dev["Sentiment"]]

Dev Accuracy:  48.0


Index,IMDB Review,Sentiment,Conditional Positive Probability,Conditional Negative Probability,Predicted sentiment
450,This movie does an excellent job of revealing ...,1,6.965734e-20,8.701317e-18,False
451,I believe every one should see this movie as I...,1,9.585102e-32,5.926157e-30,False
452,Nothing short of magnificent photography/cinem...,1,2.877094e-11,2.049215e-10,False
453,"The acting is fantastic, the stories are seaml...",1,2.325788e-25,2.752622e-24,False
454,Don't be afraid of subtitles........ its worth...,1,3.365471e-17,4.682728e-17,False
...,...,...,...,...,...
592,"As a courtroom drama, it's compelling, as an i...",1,1.416773e-19,4.839757e-17,False
593,This film highlights the fundamental flaws of ...,1,6.282122e-27,2.237575e-21,False
595,This mostly routine fact-based TV drama gets a...,1,3.393606e-24,7.094748e-23,False
596,"Predictable, but not a bad watch.",1,6.992181e-09,1.305069e-06,False


In [682]:
test["Conditional Positive Probability"] = test["IMDB Review"].apply(
    lambda review: get_probabilities(review, True, True))
test["Conditional Negative Probability"] = test["IMDB Review"].apply(
    lambda review: get_probabilities(review, False, True))
test["Predicted sentiment"] = test["Conditional Positive Probability"] > test["Conditional Negative Probability"]
print("Test Accuracy: ",
      test.loc[test["Predicted sentiment"] == test["Sentiment"]].count(axis=0)['Sentiment'] * 100 / test.count(axis=0)[
          'Sentiment'])
test.loc[test["Predicted sentiment"] != test["Sentiment"]]

Test Accuracy:  44.666666666666664


Index,IMDB Review,Sentiment,Conditional Positive Probability,Conditional Negative Probability,Predicted sentiment
598,She carries the movie well.,1,1.554827e-07,5.778270e-07,False
599,Constantine gives everything the right intensi...,1,2.739577e-21,5.262533e-21,False
600,"It is wonderful and inspiring to watch, and I ...",1,1.162408e-21,8.255424e-20,False
609,Editing: The editing of this film was phenomen...,1,2.647006e-08,1.270103e-07,False
610,When a song could explain the emotions of the ...,1,2.292733e-42,6.386539e-41,False
...,...,...,...,...,...
736,It was a riot to see Hugo Weaving play a sex-o...,1,1.875002e-29,3.059007e-27,False
737,":) Anyway, the plot flowed smoothly and the ma...",1,1.251378e-09,1.172515e-06,False
738,"The opening sequence of this gem is a classic,...",1,1.436837e-23,5.729227e-23,False
739,Fans of the genre will be in heaven.,1,9.387690e-09,1.133963e-08,False
