### Load Data

In [1]:
import pandas as pd
df = pd.read_json('data/sentiment/Books_small_10000.json')

In [2]:
df.head()

| | reviewerID | asin | reviewerName | helpful | reviewText | overall | summary | unixReviewTime | reviewTime |
|---|---|---|---|---|---|---|---|---|---|
| 0 | A1F2H80A1ZNN1N | B00GDM3NQC | Connie Correll | [0, 0] | I bought both boxed sets, books 1-5. Really a... | 5 | Can't stop reading! | 1390435200 | 01 23, 2014 |
| 1 | AI3DRTKCSK4KX | B00A5MREAM | Grandma | [0, 0] | I enjoyed this short book. But it was way way ... | 3 | A leaf on the wind of all hallows | 1399593600 | 05 9, 2014 |
| 2 | A3KAKFHY9DAC8A | 0446547573 | toobusyreading "Inspired Kathy" | [1, 1] | I love Nicholas Sparks. I&#8217;ve read everyt... | 4 | Great writing from Nicholas Sparks. | 1404518400 | 07 5, 2014 |
| 3 | ATYBCYD6BIXVL | 0955809215 | Chrissie | [0, 0] | I really enjoyed this adventure and look forwa... | 4 | great | 1389225600 | 01 9, 2014 |
| 4 | A17K95SEU3J68U | 0991500776 | Sirde "artist761" | [0, 0] | It was a decent read.. typical story line. Not... | 3 | It was a decent read.. typical story line ... | 1404864000 | 07 9, 2014 |


In [3]:
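# Map star ratings to labels: 4-5 -> POSITIVE, 3 -> NEUTRAL, 1-2 -> NEGETIVE.
# The negative label is spelled 'NEGETIVE' here, so later cells must use the same string.
df["sentiment"] = df["overall"].apply(
    lambda score: 'POSITIVE' if score >= 4 else 'NEUTRAL' if score > 2 else 'NEGETIVE'
)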

In [4]:
reviews = df[["reviewText","sentiment"]]
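
Before training, it can help to check how the ratings map onto the three labels and how balanced the classes are. The following is a minimal sketch on a small made-up frame; the `toy` data and the `to_sentiment` helper are illustrative assumptions and not part of the notebook, while the label strings are copied from the cell above.

```python
import pandas as pd

def to_sentiment(score):
    """Map a 1-5 star rating to the label strings used in this notebook."""
    if score >= 4:
        return 'POSITIVE'
    if score > 2:
        return 'NEUTRAL'
    return 'NEGETIVE'  # spelled as in the notebook's label column

# Hypothetical ratings, for illustration only
toy = pd.DataFrame({"overall": [5, 4, 3, 2, 1, 5]})
toy["sentiment"] = toy["overall"].apply(to_sentiment)

# value_counts() shows how many rows fall into each class
counts = toy["sentiment"].value_counts()
print(counts)
```

Running `value_counts()` on the real `reviews["sentiment"]` column would show whether one class dominates, which matters when reading accuracy figures later.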

### Prep Data

In [5]:
from sklearn.model_selection import train_test_split

# Hold out 33% of the reviews for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    reviews["reviewText"].values, reviews["sentiment"].values,
    test_size=0.33, random_state=42)
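
When classes are imbalanced, a plain random split can leave the test set with a different class mix than the training set. A hedged sketch of the `stratify` option of `train_test_split`, using made-up labels (the `texts` and `labels` lists are assumptions for illustration only):

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 8 of one class, 4 of the other
texts = [f"review {i}" for i in range(12)]
labels = ["POSITIVE"] * 8 + ["NEUTRAL"] * 4

X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# With stratify, the 2:1 class ratio is preserved in both parts
train_counts = Counter(y_tr)
test_counts = Counter(y_te)
print(train_counts, test_counts)
```

The notebook's split does not pass `stratify`; whether it would change the scores reported below is not something this sketch can show.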

#### Bag of words vectorization

In [6]:
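from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Example inputs the vectorizer has to tell apart:
#   This book is great !
#   This book was so bad

# TF-IDF weights each token count by how rare the token is across reviews.
# Fit on the training text only, then reuse the fitted vocabulary for the test text.
vectorizer = TfidfVectorizer()
train_x_vectors = vectorizer.fit_transform(X_train)
test_x_vectors = vectorizer.transform(X_test)

print(X_train[0])
print(train_x_vectors[0].toarray())
print(y_train[0])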

Olivia Hampton arrives at the Dunraven family home as cataloger of their extensive library. What she doesn't expect is a broken carriage wheel on the way. Nor a young girl whose mind is clearly gone, an old man in need of care himself (and doesn&#8217;t quite seem all there in Olivia&#8217;s opinion). Furthermore, Marion Dunraven, the only sane one of the bunch and the one Olivia is inexplicable drawn to, seems captive to everyone in the dusty old house. More importantly, she doesn't expect to fall in love with Dunraven's daughter Marion.Can Olivia truly believe the stories of sadness and death that surround the house, or are they all just local neighborhood rumor?Was that carriage trouble just a coincidence or a supernatural sign to stay away? If she remains, will the Castle&#8217;s dark shadows take Olivia down with them or will she and Marion long enough to declare their love?Patty G. Henderson has created an atmospheric and intriguing story in her Gothic tale. I found this to be an
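
To see what the vectorizer produces on something small, the two example sentences from the comments in the cell above can be vectorized on their own. This is a minimal sketch; `docs`, `toy_vectorizer`, and `toy_vectors` are names introduced here for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["This book is great !", "This book was so bad"]

toy_vectorizer = TfidfVectorizer()
toy_vectors = toy_vectorizer.fit_transform(docs)

# One column per distinct token; the default tokenizer lowercases text and
# drops punctuation and single-character tokens
vocab = sorted(toy_vectorizer.vocabulary_)
print(vocab)
print(toy_vectors.shape)

# Tokens shared by both sentences ("book") get a lower weight than
# tokens unique to one sentence ("great")
w_great = toy_vectors[0, toy_vectorizer.vocabulary_["great"]]
w_book = toy_vectors[0, toy_vectorizer.vocabulary_["book"]]
print(w_great > w_book)
```

Each row is one document and each column one vocabulary token, which is the same layout `train_x_vectors` has at a much larger scale.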

## Classification

#### Linear SVM

In [7]:
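from sklearn import svm

# Linear-kernel support vector classifier trained on the TF-IDF vectors
clf_svm = svm.SVC(kernel='linear')
clf_svm.fit(train_x_vectors, y_train)

# Predict the label of the first test review
clf_svm.predict(test_x_vectors[0])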

array(['POSITIVE'], dtype=object)

#### Decision Tree

In [8]:
from sklearn.tree import DecisionTreeClassifier

clf_dec = DecisionTreeClassifier()
clf_dec.fit(train_x_vectors, y_train)

clf_dec.predict(test_x_vectors[0])


array(['POSITIVE'], dtype=object)

#### Naive Bayes

In [9]:
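from sklearn.naive_bayes import GaussianNB

# Note: GaussianNB is imported but not used. This cell fits a second
# DecisionTreeClassifier, so every clf_gnb result below is a decision tree result.
clf_gnb = DecisionTreeClassifier()
clf_gnb.fit(train_x_vectors, y_train)

clf_gnb.predict(test_x_vectors[0])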


array(['POSITIVE'], dtype=object)
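
The cell above imports `GaussianNB`, so here is a minimal sketch of what fitting it on TF-IDF features could look like. It assumes a tiny made-up corpus (`toy_texts`, `toy_labels` are hypothetical) and converts the sparse matrix with `.toarray()`, because `GaussianNB` expects dense input. On the full review set that conversion may need a lot of memory.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB

# Hypothetical mini corpus, for illustration only
toy_texts = ["great book", "loved it", "bad book", "waste of time"]
toy_labels = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE"]

toy_vec = TfidfVectorizer()
toy_X = toy_vec.fit_transform(toy_texts)

# GaussianNB expects dense arrays, so convert the sparse matrix first
toy_gnb = GaussianNB()
toy_gnb.fit(toy_X.toarray(), toy_labels)

# New text goes through the same fitted vectorizer, then the same conversion
toy_pred = toy_gnb.predict(toy_vec.transform(["great book"]).toarray())
print(toy_pred)
```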

#### Logistic Regression

In [10]:
from sklearn.linear_model import LogisticRegression

clf_log = LogisticRegression()
clf_log.fit(train_x_vectors, y_train)

clf_log.predict(test_x_vectors[0])



array(['POSITIVE'], dtype=object)

## Evaluation

In [11]:
# Mean accuracy on the held-out test set, in order:
# SVM, decision tree, clf_gnb, logistic regression
print(clf_svm.score(test_x_vectors, y_test))
print(clf_dec.score(test_x_vectors, y_test))
print(clf_gnb.score(test_x_vectors, y_test))
print(clf_log.score(test_x_vectors, y_test))

0.8566666666666667
0.7693939393939394
0.7678787878787878
0.8536363636363636


In [12]:
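# F1 Scores
from sklearn.metrics import f1_score

# One score per label, in the order given. Each label must match a string in the
# sentiment column exactly; the column spells the third class 'NEGETIVE', so
# "NEGATIVE" matches no samples and its score is reported as 0 (with a warning).
f1_score(y_test, clf_log.predict(test_x_vectors), average=None, labels=["POSITIVE","NEUTRAL","NEGATIVE"])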

  average, "true nor predicted", 'F-score is', len(true_sum)


array([0.92359249, 0.15143603, 0.        ])
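
The zero in the last position of the array above is what `f1_score` reports for a label string that never occurs in the data (the notebook's label column spells that class `NEGETIVE`). A small sketch with made-up labels shows the effect; `y_true` and `y_pred` are hypothetical, and `zero_division=0` is passed only to silence the warning.

```python
from sklearn.metrics import f1_score

# Hypothetical labels, for illustration only
y_true = ["POSITIVE", "POSITIVE", "NEGETIVE", "NEGETIVE"]
y_pred = ["POSITIVE", "POSITIVE", "NEGETIVE", "POSITIVE"]

# "NEGATIVE" does not occur in y_true or y_pred, so its F1 is set to 0
mismatched = f1_score(y_true, y_pred, average=None,
                      labels=["POSITIVE", "NEGATIVE"], zero_division=0)

# Using the exact string from the data scores the intended class
matched = f1_score(y_true, y_pred, average=None,
                   labels=["POSITIVE", "NEGETIVE"])
print(mismatched, matched)
```

One way to avoid this kind of mismatch is to take the label list from the fitted classifier's `classes_` attribute instead of typing it by hand.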

In [13]:
test_set = ['very fun', "bad book do not buy", 'horrible waste of time']
new_test = vectorizer.transform(test_set)

clf_svm.predict(new_test)


array(['POSITIVE', 'NEGETIVE', 'NEGETIVE'], dtype=object)

### Tuning our model (with Grid Search)

In [14]:
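from sklearn.model_selection import GridSearchCV

# degree only affects the 'poly' kernel; 'rbf' ignores it
parameters = {'kernel': ('rbf', 'poly'), 'degree': (2, 3)}

svc = svm.SVC()
# 5-fold cross-validated search over every kernel/degree combination
clf = GridSearchCV(svc, parameters, cv=5)
clf.fit(train_x_vectors, y_train)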

GridSearchCV(cv=5, error_score=nan,
             estimator=SVC(C=1.0, break_ties=False, cache_size=200,
                           class_weight=None, coef0=0.0,
                           decision_function_shape='ovr', degree=3,
                           gamma='scale', kernel='rbf', max_iter=-1,
                           probability=False, random_state=None, shrinking=True,
                           tol=0.001, verbose=False),
             iid='deprecated', n_jobs=None,
             param_grid={'degree': (2, 3), 'kernel': ('rbf', 'poly')},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)

In [15]:
print(clf.score(test_x_vectors, y_test))

0.8406060606060606
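
After a grid search it is usually worth checking which combination won. The sketch below runs a similar search on a tiny made-up corpus (`toy_texts` and `toy_labels` are assumptions) and reads `best_params_` and `best_score_`. It also uses a list of dicts for the grid so that `degree` is only searched for the `poly` kernel, which is a design choice of this sketch and not what the notebook does.

```python
from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV

# Hypothetical mini corpus, for illustration only
toy_texts = ["great book", "loved this book", "really great read", "loved it",
             "bad book", "horrible waste of time", "really bad read", "hated it"]
toy_labels = ["POSITIVE"] * 4 + ["NEGATIVE"] * 4
toy_X = TfidfVectorizer().fit_transform(toy_texts)

# A list of dicts searches each kernel with only the parameters it uses
param_grid = [
    {"kernel": ["rbf"]},
    {"kernel": ["poly"], "degree": [2, 3]},
]

search = GridSearchCV(svm.SVC(), param_grid, cv=2)
search.fit(toy_X, toy_labels)

# best_params_ and best_score_ describe the winning combination; with the
# default refit=True, search.predict uses a model refitted with those settings
n_candidates = len(search.cv_results_["params"])
print(search.best_params_, search.best_score_, n_candidates)
```

With the notebook's dict-style grid, four combinations are fitted, two of which differ only in a `degree` value that the `rbf` kernel does not use.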


## Saving Model

#### Save model

In [16]:
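import pickle

# Saves only the fitted GridSearchCV classifier; the fitted vectorizer is not
# included, so it has to be kept separately to transform new text later.
with open('./models/sentiment_classifier.pkl', 'wb') as f:
    pickle.dump(clf, f)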

#### Load model

In [20]:
with open('./models/sentiment_classifier.pkl', 'rb') as f:
    loaded_clf = pickle.load(f)

In [21]:
print(X_test[0])

loaded_clf.predict(test_x_vectors[0])

was sent an Arc of this book for an honest review and here it is = This is the kind of book that you want to read while sitting in front of the fire with a cup of hot apple cider and a blanket over your legs.I have read many of Jaci Burton's books and have never been disappointed. This first book in her new Hope series does not disappoint either.This is the story of Emma, a new vet who has come back home to open her own practice and Luke McCormack, a police officer in the same town.Both have been previously burned by love so both have issues but, that doesn't stop them from acting on that attraction.This book pulls you in from the first page, wraps you up and doesn't let you go until the end.I loved it!


array(['POSITIVE'], dtype=object)
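
The loaded classifier above is applied to vectors that were already computed in this session. To classify raw text in a fresh session, the fitted vectorizer is needed as well. One possible approach, sketched here on made-up data, is to bundle both steps in a scikit-learn `Pipeline` and pickle that single object; `toy_texts`, `toy_labels`, and `toy_model` are hypothetical names, and the round trip is done in memory instead of through a file.

```python
import pickle
from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical mini corpus, for illustration only
toy_texts = ["great book", "loved this book", "bad book", "horrible waste of time"]
toy_labels = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE"]

# Bundle vectorizer and classifier so they are fitted and saved together
toy_model = make_pipeline(TfidfVectorizer(), svm.SVC(kernel="linear"))
toy_model.fit(toy_texts, toy_labels)

# Round trip through pickle in memory; a file opened with 'wb'/'rb' works the same way
blob = pickle.dumps(toy_model)
restored = pickle.loads(blob)

# The restored pipeline accepts raw strings directly
same = list(restored.predict(["great read"])) == list(toy_model.predict(["great read"]))
print(same)
```

As with any pickle file, a saved model should only be loaded from a trusted source and with a compatible scikit-learn version.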

In [25]:
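from sklearn.neural_network import MLPClassifier

# Multi-layer perceptron with default settings; this rebinds clf,
# replacing the GridSearchCV object fitted earlier
clf = MLPClassifier()
clf.fit(train_x_vectors, y_train)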

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(100,), learning_rate='constant',
              learning_rate_init=0.001, max_fun=15000, max_iter=200,
              momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
              power_t=0.5, random_state=None, shuffle=True, solver='adam',
              tol=0.0001, validation_fraction=0.1, verbose=False,
              warm_start=False)

In [26]:
print(clf.score(test_x_vectors, y_test))

0.8442424242424242
