# AI Project - NLP Sentiment Classifier

In this project, we implement an AI model that classifies the sentiment of a sentence: given a piece of text as input, it predicts whether the message is negative or positive. The model is built with NLP techniques: a Bag of Words representation of the text, fed to a linear SVM classifier.

We use an Amazon dataset of user reviews on products (here, books), each with its text and a star rating. The dataset is located in the **data** folder.

We start by importing the packages used in this project. The full list of dependencies is in the **requirements.txt** file.

In [64]:
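import numpy as np    # numerical arrays
import json           # parse one JSON review per line
import pandas as pd   # tabular data (not used in the cells below)
import random         # shuffling when balancing the classes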

We define three classes. `Sentiments` holds three constants for the negative, neutral and positive labels. `Review` stores a review's text and star score, and derives its sentiment from the score: 1 or 2 stars is negative, 3 is neutral, and 4 or 5 is positive. `ReviewContainer` wraps a list of `Review` objects and provides helpers to extract the texts and the labels, and to balance the two classes.

In [66]:
class Sentiments:
    NEGATIVE = "NEGATIVE"
    NEUTRAL = "NEUTRAL"
    POSITIVE = "POSITIVE"

class Review:
    
    def __init__(self, text, score):
        self.text = text
        self.score = score
        self.sentiment = self.get_sentiments()
        
    def display_review(self):
        print(self.text + " | " + str(self.score) + " | " + self.sentiment)
        
    def get_text(self):
        return self.text
    
    def get_score(self):
        return self.score
            
    def get_sentiments(self):
        if self.score <= 2:
            return Sentiments.NEGATIVE
        elif self.score == 3:
            return Sentiments.NEUTRAL
        elif self.score > 3:
            return Sentiments.POSITIVE

class ReviewContainer:
    def __init__(self, reviews):
        self.reviews = reviews
        
    def get_text(self):
        return [x.text for x in self.reviews]
    
    def get_sentiment(self):
        return [x.sentiment for x in self.reviews]
        
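    def evenly_distribute(self):
        # Balance the classes by undersampling: keep every negative review and
        # the same number of positive ones. Neutral reviews are discarded.
        # Assumes positive reviews outnumber negative ones.
        negative = list(filter(lambda x: x.sentiment == Sentiments.NEGATIVE, self.reviews))
        positive = list(filter(lambda x: x.sentiment == Sentiments.POSITIVE, self.reviews))
        positive_shrunk = positive[:len(negative)]
        self.reviews = negative + positive_shrunk
        # Shuffle so the two classes are interleaved rather than grouped
        random.shuffle(self.reviews)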
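The thresholds in `get_sentiments` are easy to check in isolation. The sketch below is not the notebook's code: it assumes an alternative design in which the label constants become a `str`-based `enum.Enum` named `Sentiment`, and it uses a hypothetical helper `score_to_sentiment`. Because `overall` is stored as a float such as `5.0`, the check `score == 3` still matches `3.0`.

```python
from enum import Enum


class Sentiment(str, Enum):
    """Hypothetical label set mirroring the notebook's Sentiments constants."""
    NEGATIVE = "NEGATIVE"
    NEUTRAL = "NEUTRAL"
    POSITIVE = "POSITIVE"


def score_to_sentiment(score: float) -> Sentiment:
    """Map a 1-5 star rating to a sentiment label (hypothetical helper)."""
    if score <= 2:
        return Sentiment.NEGATIVE
    if score == 3:
        return Sentiment.NEUTRAL
    return Sentiment.POSITIVE


for stars in (1.0, 2.0, 3.0, 4.0, 5.0):
    print(stars, score_to_sentiment(stars).value)
```

Mixing in `str` keeps each member equal to its plain string value, so the labels would still compare equal to the strings scikit-learn returns from `predict`.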

Next we load **Books_small_10000.json**. Each line of the file is one JSON object describing a review; we keep only two fields: `reviewText` (the review itself) and `overall` (the star rating). Each pair is wrapped in a `Review` object and appended to the `reviews` list, which the rest of the notebook builds on.

In [67]:
path_data = "../data/Books_small_10000.json"

reviews = []

with open(path_data) as f:
    for line in f:
        review_ = json.loads(line)
        
        # Uncomment to display reviewText and overall
        #print("- " + review_['reviewText'])
        #print(review_['overall'])
        
        review_text = review_['reviewText']
        review_overall = review_['overall']
        
        reviews.append(Review(review_text, review_overall))
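The loading loop treats the file as JSON Lines: every line is a self-contained JSON object, so `json.loads` is called once per line instead of `json.load` on the whole file. A minimal sketch with two made-up records in an in-memory file (hypothetical data; real records may carry additional fields, which are simply ignored):

```python
import io
import json

# Two hypothetical records in JSON Lines format (one object per line)
fake_file = io.StringIO(
    '{"reviewText": "Loved every page.", "overall": 5.0}\n'
    '{"reviewText": "Could not finish it.", "overall": 1.0}\n'
)

pairs = []
for line in fake_file:
    record = json.loads(line)  # parse a single line into a dict
    # Keep only the two fields the classifier needs
    pairs.append((record["reviewText"], record["overall"]))

print(pairs)
```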

In [68]:
reviews[0].display_review()
print(len(reviews))

I bought both boxed sets, books 1-5.  Really a great series!  Start book 1 three weeks ago and just finished book 5.  Sloane Monroe is a great character and being able to follow her through both private life and her PI life gets a reader very involved!  Although clues may be right in front of the reader, there are twists and turns that keep one guessing until the last page!  These are books you won't be disappointed with. | 5.0 | POSITIVE
10000


Our data now lives in the **reviews** list, but machine learning (ML) models work on numerical vectors, not raw text. We therefore convert each review into a vector of numbers using the Bag of Words (BoW) technique: we build a vocabulary of every word that appears in the training texts, then represent each review by how many times each vocabulary word occurs in it. Word order is ignored. In this notebook, scikit-learn's `CountVectorizer` performs this conversion and returns a sparse matrix.
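To make the idea concrete, here is a minimal pure-Python sketch of Bag of Words. It is not how `CountVectorizer` is implemented, but it mimics what we understand to be its default settings: lowercasing, the token pattern `(?u)\b\w\w+\b` (words of two or more characters) and columns ordered alphabetically. The helper names are hypothetical.

```python
import re
from collections import Counter

# Token pattern matching CountVectorizer's default: 2+ word characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(texts):
    """Map every distinct token to a column index, in alphabetical order."""
    words = sorted({tok for text in texts for tok in tokenize(text)})
    return {word: i for i, word in enumerate(words)}


def to_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    vector = [0] * len(vocabulary)
    for token, count in Counter(tokenize(text)).items():
        if token in vocabulary:
            vector[vocabulary[token]] = count
    return vector


docs = ["Great book, great story", "Boring book"]
vocab = build_vocabulary(docs)
print(vocab)                                  # {'book': 0, 'boring': 1, 'great': 2, 'story': 3}
print([to_vector(d, vocab) for d in docs])    # [[1, 0, 2, 1], [1, 1, 0, 0]]
```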

Before vectorizing, we split the data into two sets. The training set (70% of the reviews) is what the model learns from. The test set (the remaining 30%) is held back: once the model is trained, we run it on these unseen reviews and compare its predictions with the true labels to measure how well it has learned.
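A train/test split amounts to shuffling the data and cutting it at a fixed proportion. The sketch below assumes a hypothetical `split_train_test` helper with a seeded local random generator; it reproduces the 70/30 proportion of `train_test_split(..., test_size=0.3, random_state=42)` but not scikit-learn's exact shuffle, so the selected reviews would differ.

```python
import random


def split_train_test(items, test_size=0.3, seed=42):
    """Shuffle a copy of the items and cut it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG: reproducible, no global state
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test(range(10))
print(len(train), len(test))
```

Every item lands in exactly one of the two lists, which is what keeps the test set "unseen" during training.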

In [70]:
# Prep data (training and testing) split
from sklearn.model_selection import train_test_split

training_data, test_data = train_test_split(reviews, test_size=0.3, random_state=42)

train_container = ReviewContainer(training_data)

test_container = ReviewContainer(test_data)

In [71]:
# Display training_data size and test_data size
print("training data : ", len(training_data))
print("test data : ", len(test_data))

training data :  7000
test data :  3000


In [73]:
train_container.evenly_distribute()
train_x = train_container.get_text()
train_y = train_container.get_sentiment()

test_container.evenly_distribute()
test_x = test_container.get_text()
test_y = test_container.get_sentiment()

print(train_y.count(Sentiments.POSITIVE))
print(train_y.count(Sentiments.NEGATIVE))

461
461
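The counts above show that only 461 of the 7000 training reviews are negative, since `evenly_distribute` keeps every negative review and trims the positives to match. Without this step, the classifier could score well just by favoring the majority positive class. A minimal standalone sketch of the same undersampling on parallel lists of texts and labels (the `undersample` helper is hypothetical):

```python
import random


def undersample(texts, labels, seed=0):
    """Keep every NEGATIVE example and the same number of POSITIVE ones.

    NEUTRAL examples are dropped, as in ReviewContainer.evenly_distribute.
    Assumes there are at least as many positives as negatives.
    """
    pairs = list(zip(texts, labels))
    negative = [p for p in pairs if p[1] == "NEGATIVE"]
    positive = [p for p in pairs if p[1] == "POSITIVE"]
    balanced = negative + positive[:len(negative)]
    random.Random(seed).shuffle(balanced)  # interleave the two classes
    return [t for t, _ in balanced], [label for _, label in balanced]


texts = ["a", "b", "c", "d", "e"]
labels = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEUTRAL", "POSITIVE"]
bal_x, bal_y = undersample(texts, labels)
print(bal_y.count("POSITIVE"), bal_y.count("NEGATIVE"), bal_y.count("NEUTRAL"))
```

Undersampling throws data away; it is the simplest balancing choice, not the only one.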


## Bag of Words (BoW)

In [74]:
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
train_x_vec = vectorizer.fit_transform(train_x)

test_x_vec = vectorizer.transform(test_x)

print(train_x[0])
print(train_x_vec[0].toarray())

#print(train_x_vec.toarray())

The book is in a comic book format. I have never liked comic books and the writing in this format is impossible to read and follow on a kindle. Had I realized it was a comic book, I would not have purchased it.
[[0 0 0 ... 0 0 0]]
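The printed row looks empty because a single review uses only a small fraction of the training vocabulary, so almost every entry is zero and NumPy summarizes the long array with `...`. This is why `fit_transform` returns a SciPy sparse matrix that stores only the non-zero counts (in the notebook, `train_x_vec[0].nnz` gives how many are stored). A pure-Python sketch of the idea, with a hypothetical vocabulary and hypothetical helpers, keeping `{column index: count}` for one review:

```python
from collections import Counter

# Hypothetical vocabulary: word -> column index
vocabulary = {"book": 0, "boring": 1, "comic": 2, "great": 3, "kindle": 4, "story": 5}


def to_sparse(tokens, vocabulary):
    """Return only the non-zero entries of the count vector as {column: count}."""
    counts = Counter(t for t in tokens if t in vocabulary)
    return {vocabulary[word]: n for word, n in sorted(counts.items())}


def to_dense(sparse, size):
    """Expand the sparse dict back into a full list of counts."""
    dense = [0] * size
    for column, count in sparse.items():
        dense[column] = count
    return dense


row = to_sparse(["comic", "book", "comic", "kindle"], vocabulary)
print(row)                               # {0: 1, 2: 2, 4: 1}
print(to_dense(row, len(vocabulary)))    # [1, 0, 2, 0, 1, 0]
```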


## Classification

In [75]:
## Linear SVM option

from sklearn import svm

clf_svm = svm.SVC(kernel='linear')
clf_svm.fit(train_x_vec, train_y)

print(test_x[0])

clf_svm.predict(test_x_vec[0])


All my annoyance melted. "You dumb-a@#," I crooned, kissing her on the forehead. "You don't share me. You own me."This book owned me. I couldn't even think about putting it down and if I had to then all I could think about was getting back to it. Honestly I was quite content and happy with how If I Stay ended. It was an epic ending if you ask me. But of course if Gayle Forman is going to offer more of Adam and Mia's story, then I'm going to take it. And I did and loved it. It crushed me and then made me whole again.It was very hard for me to not flip to the end to see how this story was going to conclude but I kept control and just moved forward with the story. It was just that I really didn't know how this was going to end and it was ripping me apart. This year I've read some books with unhappy endings and have actually enjoyed them, kind of like a breath of fresh-air, something different, but there's no way I could have handled an ending like that with this book.I fell in love with G

array(['NEGATIVE'], dtype='<U8')
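With a linear kernel, `svm.SVC` learns one weight per vocabulary word plus a bias, and predicts from the sign of `w·x + b`: positive-weight words push a review toward one class, negative-weight words toward the other (in the notebook, `clf_svm.coef_` exposes these weights). scikit-learn fits the model with libsvm; purely to illustrate the model, the sketch below trains the same kind of linear, L2-regularized hinge-loss classifier with plain stochastic subgradient descent, a different optimizer. The learning rate, regularization strength, epoch count and two-word count vectors are arbitrary assumptions.

```python
def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=100):
    """Minimize L2-regularized hinge loss with stochastic subgradient steps.

    X: list of feature vectors (lists of numbers); y: labels in {+1, -1}.
    Returns the weight vector w and the bias b.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Inside the margin: step along the hinge-loss subgradient
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Correct with enough margin: only the regularizer shrinks w
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Label a count vector by the sign of the linear decision function."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return "POSITIVE" if score >= 0 else "NEGATIVE"


# Hypothetical counts for the vocabulary ["great", "bad"]
X = [[2, 0], [1, 0], [0, 2], [0, 1]]
y = [+1, +1, -1, -1]
w, b = train_linear_svm(X, y)
print(w, b)
print(predict(w, b, [3, 0]), predict(w, b, [0, 3]))
```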

## Evaluation

In [76]:
clf_svm.score(test_x_vec, test_y)

0.8114754098360656
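`clf_svm.score` returns the mean accuracy: the fraction of test reviews whose predicted label equals the true one. Because `evenly_distribute` gave the test set exactly as many positive as negative reviews, a classifier that always predicts the same label would score exactly 0.5, so 0.81 is a real improvement over that baseline. A sketch of the computation on made-up labels (the `accuracy` helper is hypothetical):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels: 3 of the 4 predictions are correct
y_true = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]
y_pred = ["POSITIVE", "NEGATIVE", "NEGATIVE", "NEGATIVE"]
print(accuracy(y_true, y_pred))  # → 0.75

# Constant "always POSITIVE" baseline on this balanced set
print(accuracy(y_true, ["POSITIVE"] * 4))  # → 0.5
```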

## F1 Score

In [77]:
from sklearn.metrics import f1_score

# average=None returns one F1 score per class, in sorted label order (NEGATIVE, POSITIVE)
f1_score(test_y, clf_svm.predict(test_x_vec), average=None)

array([0.8109589, 0.8119891])
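With `average=None`, `f1_score` returns one score per class rather than a single average; when no `labels` argument is passed, the classes come out in sorted order, so the first value is for `NEGATIVE` and the second for `POSITIVE`. The two values are close, so the model does about equally well on both classes. F1 is the harmonic mean of precision and recall, which reduces to 2·TP / (2·TP + FP + FN). A from-scratch sketch on made-up labels (the `f1_per_class` helper is hypothetical; it returns 0.0 for a class with no true or predicted examples):

```python
def f1_per_class(y_true, y_pred):
    """Per-class F1 = 2*TP / (2*TP + FP + FN), with labels in sorted order."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


y_true = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]
y_pred = ["POSITIVE", "NEGATIVE", "NEGATIVE", "NEGATIVE"]
print(f1_per_class(y_true, y_pred))  # {'NEGATIVE': 0.8, 'POSITIVE': 0.6666666666666666}
```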

In [78]:
print(train_y.count(Sentiments.POSITIVE))
print(train_y.count(Sentiments.NEUTRAL))
print(train_y.count(Sentiments.NEGATIVE))

461
0
461


## Let's see our model in action

In [89]:
test_set = ['that is sooo great book']
new_test = vectorizer.transform(test_set)

clf_svm.predict(new_test)

array(['POSITIVE'], dtype='<U8')
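`vectorizer.transform` only counts words that belong to the vocabulary learned by `fit_transform` on the training set; any other token is ignored rather than added as a new column. So if a word such as "sooo" never appeared in the training reviews, it contributes nothing to the prediction above, which then rests on the known words. A small pure-Python sketch of fit-then-transform, with a hypothetical two-review training corpus and hypothetical helpers:

```python
import re

TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")  # mirrors CountVectorizer's default token pattern


def fit_vocabulary(train_texts):
    """Learn the vocabulary from the training texts only."""
    words = sorted({w for t in train_texts for w in TOKEN_RE.findall(t.lower())})
    return {w: i for i, w in enumerate(words)}


def transform(text, vocabulary):
    """Count known words; tokens outside the vocabulary are dropped."""
    vec = [0] * len(vocabulary)
    for word in TOKEN_RE.findall(text.lower()):
        if word in vocabulary:
            vec[vocabulary[word]] += 1
    return vec


vocab = fit_vocabulary(["a great book", "a dull book"])
print(vocab)                                        # {'book': 0, 'dull': 1, 'great': 2}
print(transform("that is sooo great book", vocab))  # [1, 0, 1]
```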