### Predicting sentiment from product reviews


The goal of this first notebook is to explore logistic regression and feature engineering with existing pandas and scikit-learn functions.

In this notebook you will use product review data from Amazon.com to predict whether the sentiments about a product (from its reviews) are positive or negative.

* Use a pandas DataFrame to do some feature engineering.
* Train a logistic regression model to predict the sentiment of product reviews.
* Inspect the weights (coefficients) of a trained logistic regression model.
* Make a prediction (both class and probability) of sentiment for a new product review.
* Given the logistic regression weights, predictors and ground truth labels, write a function to compute the **accuracy** of the model.
* Inspect the coefficients of the logistic regression model and interpret their meanings.
* Compare multiple logistic regression models.

In [1]:
import numpy as np
import pandas as pd

In [2]:
products = pd.read_csv("amazon_baby.csv")

In [3]:
products.head()

Unnamed: 0,name,review,rating
0,Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5


In [4]:
products.dtypes

name      object
review    object
rating     int64
dtype: object

Now, we will perform 2 simple data transformations:

1. Remove punctuation using [Python's built-in](https://docs.python.org/2/library/string.html) string functionality.
2. Transform the reviews into word-counts.

**Aside**. In this notebook, we remove all punctuation for the sake of simplicity. A smarter approach would preserve contractions such as "I'd", "would've", "hadn't" and so forth. See [this page](https://www.cis.upenn.edu/~treebank/tokenization.html) for an example of smart handling of punctuation.
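
As a rough illustration of the smarter approach mentioned in the aside, the sketch below strips punctuation but keeps apostrophes that sit between two letters, so contractions survive. The function name `remove_punctuation_keep_contractions` and the regular expression are assumptions made for this example; they are not part of this notebook's pipeline, and this is not a full tokenizer.

```python
import re
import string

# Hypothetical helper, introduced only for this illustration.
# Every punctuation character except the apostrophe:
_PUNCT_NO_APOSTROPHE = string.punctuation.replace("'", "")
_TRANSLATOR = str.maketrans('', '', _PUNCT_NO_APOSTROPHE)
# An apostrophe that does NOT have a letter on both sides:
_STRAY_APOSTROPHE = re.compile(r"(?<![A-Za-z])'|'(?![A-Za-z])")

def remove_punctuation_keep_contractions(text):
    # First drop all punctuation other than apostrophes,
    # then drop apostrophes that are not inside a word.
    text = text.translate(_TRANSLATOR)
    return _STRAY_APOSTROPHE.sub('', text)

print(remove_punctuation_keep_contractions("I'd say it's 'great', wouldn't you?"))
```

Quotation-style apostrophes around a word are removed, while "I'd", "it's" and "wouldn't" are kept intact.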

In [5]:
def remove_punctuation(text):
    import string
    translator = str.maketrans('', '', string.punctuation)
    return text.translate(translator)

In [6]:
products["review"] = products["review"].fillna(" ")

In [7]:
products["clean_review"] = products["review"].apply(remove_punctuation)

In [8]:
products.head()

Unnamed: 0,name,review,rating,clean_review
0,Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3,These flannel wipes are OK but in my opinion n...
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5,it came early and was not disappointed i love ...
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5,Very soft and comfortable and warmer than it l...
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5,This is a product well worth the purchase I h...
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5,All of my kids have cried nonstop when I tried...


### Extract sentiments

We will **ignore** all reviews with *rating = 3*, since they tend to have a neutral sentiment.

In [9]:
products = products[products['rating'] != 3]
len(products)

166752

Now, we will assign reviews with a rating of 4 or higher to be *positive* reviews, while the ones with rating of 2 or lower are *negative*. For the sentiment column, we use +1 for the positive class label and -1 for the negative class label.

In [10]:
products["sentiment"] = products["rating"].apply(lambda rating: +1 if rating > 3 else -1)
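
The labeling rule (+1 for ratings above 3, -1 otherwise, after dropping ratings equal to 3) can also be written in vectorized form. The following is a minimal sketch on a made-up toy DataFrame; the `toy` frame and its ratings are invented for illustration. It takes an explicit `.copy()` after filtering, which is one way to avoid pandas' SettingWithCopyWarning when a column is added to a filtered frame.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({"rating": [1, 2, 3, 4, 5]})

# Drop neutral reviews; .copy() makes the result an independent frame
toy = toy[toy["rating"] != 3].copy()

# +1 for ratings above 3, -1 otherwise
toy["sentiment"] = np.where(toy["rating"] > 3, 1, -1)
print(toy)
```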

In [11]:
products

Unnamed: 0,name,review,rating,clean_review,sentiment
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5,it came early and was not disappointed i love ...,1
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5,Very soft and comfortable and warmer than it l...,1
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5,This is a product well worth the purchase I h...,1
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5,All of my kids have cried nonstop when I tried...,1
5,Stop Pacifier Sucking without tears with Thumb...,"When the Binky Fairy came to our house, we did...",5,When the Binky Fairy came to our house we didn...,1
6,A Tale of Baby's Days with Peter Rabbit,"Lovely book, it's bound tightly so you may not...",4,Lovely book its bound tightly so you may not b...,1
7,"Baby Tracker&reg; - Daily Childcare Journal, S...",Perfect for new parents. We were able to keep ...,5,Perfect for new parents We were able to keep t...,1
8,"Baby Tracker&reg; - Daily Childcare Journal, S...",A friend of mine pinned this product on Pinter...,5,A friend of mine pinned this product on Pinter...,1
9,"Baby Tracker&reg; - Daily Childcare Journal, S...",This has been an easy way for my nanny to reco...,4,This has been an easy way for my nanny to reco...,1
10,"Baby Tracker&reg; - Daily Childcare Journal, S...",I love this journal and our nanny uses it ever...,4,I love this journal and our nanny uses it ever...,1


### Split data into training and test sets

In [12]:
from sklearn.model_selection import train_test_split

In [13]:
train_data, test_data = train_test_split(products, train_size = 0.8, random_state = 1)

In [14]:
train_data.shape

(133401, 5)

In [15]:
test_data.shape

(33351, 5)

In [16]:
from sklearn.feature_extraction.text import CountVectorizer

# Use this token pattern to keep single-letter words
vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
# First, learn the vocabulary from the training data and assign a column to each word,
# then convert the training data into a sparse matrix
train_matrix = vectorizer.fit_transform(train_data['clean_review'])
# Second, convert the test data into a sparse matrix, using the same word-column mapping
test_matrix = vectorizer.transform(test_data['clean_review'])

In [17]:
train_matrix

<133401x121505 sparse matrix of type '<class 'numpy.int64'>'
	with 7325453 stored elements in Compressed Sparse Row format>
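
To make the word-column mapping concrete, here is a small sketch on two made-up training sentences and one made-up test sentence (all invented for illustration, as are the `toy_*` names). It suggests that a word never seen during `fit_transform` is simply ignored by `transform`, so the test matrix has the same number of columns as the training matrix.

```python
from sklearn.feature_extraction.text import CountVectorizer

toy_train = ["a great product", "not a great fit"]  # invented examples
toy_test = ["a great surprise"]                      # "surprise" is unseen in training

toy_vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
toy_train_matrix = toy_vectorizer.fit_transform(toy_train)
toy_test_matrix = toy_vectorizer.transform(toy_test)

# vocabulary_ maps each learned word to its column index
print(sorted(toy_vectorizer.vocabulary_))
print(toy_train_matrix.shape, toy_test_matrix.shape)
print(toy_test_matrix.toarray())
```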

### Train a sentiment classifier with logistic regression

In [18]:
from sklearn import datasets, linear_model
from sklearn.linear_model import LogisticRegression

In [19]:
sentiment_model = LogisticRegression()

In [20]:
sentiment_model.fit(train_matrix, train_data['sentiment'])

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

### Making predictions with logistic regression

Now that a model is trained, we can make predictions on the test data. In this section, we will explore this in the context of 3 data points in the test data. Take the 11th, 12th, and 13th data points in the test data and save them to sample_test_data. The following cells extract the three data points from the DataFrame test_data and print their content:

In [21]:
sample_test_data = test_data[10:13]

In [22]:
print (sample_test_data)

                                                     name  \
117165       Lassig Glam Small Messenger Diaper Bag ,navy   
30667   BOB Weather Shield for Single Revolution/Strol...   
60268                Tiny Love Sweet Island Dreams Mobile   

                                                   review  rating  \
117165  While I'm sure this bag is a wonderful diaper ...       5   
30667   This weather shield has been a great accessory...       5   
60268   And we managed to get it to attach to the Grac...       5   

                                             clean_review  sentiment  
117165  While Im sure this bag is a wonderful diaper b...          1  
30667   This weather shield has been a great accessory...          1  
60268   And we managed to get it to attach to the Grac...          1  


In [23]:
type(sample_test_data)

pandas.core.frame.DataFrame

In [24]:
sample_test_data.iloc[0]["review"]

"While I'm sure this bag is a wonderful diaper bag, I purchased it for traveling purposes.  It's just the right size and doesn't look like a traditional diaper bag at all.  Great pockets - inside and out; easy access to everything I need when I need it.  Love this bag!  I originally bought the navy and have since purchased it in brown and black."

We will now make a class prediction for the sample_test_data. The sentiment_model should predict +1 if the sentiment is positive and -1 if the sentiment is negative. Recall from the lecture that the score (sometimes called margin) for the logistic regression model is defined as:

$$\text{score}_i = \mathbf{w}^T h(\mathbf{x}_i)$$

where w is the vector of model coefficients and h(x_i) represents the features for data point i. We will write some code to obtain the scores. For each row, the score (or margin) is a number in the range (-inf, inf). Use a pre-built function in your tool to calculate the score of each data point in sample_test_data. In scikit-learn, you can call the decision_function() method.

In [25]:
sample_test_matrix = vectorizer.transform(sample_test_data['clean_review'])
scores = sentiment_model.decision_function(sample_test_matrix)

### Predicting sentiment

These scores can be used to make class predictions as follows:

$$\hat{y}_i = \begin{cases} +1 & \text{if } \text{score}_i > 0 \\ -1 & \text{if } \text{score}_i \leq 0 \end{cases}$$

Using scores, write code to calculate predicted labels for sample_test_data.

Checkpoint: Make sure your class predictions match the ones obtained from sentiment_model. The logistic regression classifier in scikit-learn comes with the predict function for this purpose.

In [26]:
print(scores)

[  7.97980322  11.15825826   1.09352084]


In [27]:
sentiment_model.predict(sample_test_matrix)

array([1, 1, 1], dtype=int64)
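
As a sketch of the checkpoint, the rule "predict +1 when the score is positive and -1 otherwise" can be applied directly to the three scores printed above. The values are copied by hand from that output, and `sample_scores` and `manual_predictions` are names introduced here for illustration. The result agrees with the `predict` output, which is +1 for all three reviews.

```python
import numpy as np

# Scores copied from the printed output above
sample_scores = np.array([7.97980322, 11.15825826, 1.09352084])

# Predict +1 when the score is positive, -1 otherwise
manual_predictions = np.where(sample_scores > 0, 1, -1)
print(manual_predictions)
```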

### Probability Predictions

In [28]:
sample_test_data.iloc[1]["review"]

'This weather shield has been a great accessory.  It is a little cumbersome at first to put it on the stroller but after the first time it has been very easy.The shield definitely keeps my little girl warm during the windy days with out needing to tuck a blanket around her. Also, dry during the rainy ones.The full-body window provides me with the ability to see that she is buckled up safely at any time during the walk.  It enables her to see things just about as well as when the shield is off.The yellow color is definitely a color contrast to our navy colored stroller but it sure makes us visible to oncoming vehicles.Construction appears to be very good. The material reminds me of a very duty plastic tarp (not like the cheap ones you find at the big discount retailers).  I would imagine this would be repairable with a inner tube patch kit or clear adhesive caulk for the window if it ever needed it.I would definitely purchase this again.'

In [29]:
sample_test_data.iloc[2]["review"]

"And we managed to get it to attach to the Graco Pack 'n Play as well!  We typically turn the mobile on, but without the music going.  We find a nature, classical CD or other 'new agey' music is much better suited than the music on this mobile, but at least there is a variety on it!Our son didn't even get into the mobile until roughly the 6 week mark.  By week 8 he was REALLY into the mobile and loves to watch it go around.  Usually he'll wave his arms, kick his feet and snort snort snort! while watching it go around.  Eventually he falls asleep after 20 minutes of excitement from this mobile.We are very please with the mobile.  We lost the remote to the mobile, so we have to turn it on manually each time, which isn't a big deal if your crib or co-sleeper is next to the bed. The light on the mobile is dim enough so that he can see the  animals going around when we turn off the lights."

In [30]:
print (1/(1+np.exp(-scores)))

[ 0.99965781  0.99998574  0.74904414]
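
The cell above converts scores into probabilities with the sigmoid (logistic) function, P(y_i = +1 | x_i, w) = 1 / (1 + exp(-score_i)). The sketch below wraps this in a helper called `sigmoid`, a name introduced here for illustration, and applies it to the three scores copied by hand from the earlier output. For a binary model like this one, scikit-learn's `predict_proba` method should give matching values in its column for the +1 class.

```python
import numpy as np

def sigmoid(score):
    # Map a real-valued score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-score))

# Scores copied from the earlier printed output
sample_scores = np.array([7.97980322, 11.15825826, 1.09352084])
sample_probabilities = sigmoid(sample_scores)
print(sample_probabilities)
```

A score of 0 maps to a probability of exactly 0.5, which is why the class prediction flips at score 0.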


### Find the most positive (and negative) review

We now turn to examining the full test dataset, test_data, and use sklearn.linear_model.LogisticRegression to form predictions on all of the test data points.

Using the sentiment_model, find the 20 reviews in the entire test_data with the highest probability of being classified as a positive review. We refer to these as the "most positive reviews."

To calculate these top-20 reviews, use the following steps:

1. Make probability predictions on test_data using the sentiment_model.
2. Sort the data according to those predictions and pick the top 20.

In [31]:
scores_test = sentiment_model.decision_function(test_matrix)

In [46]:
test_data["predictions"] = 1/(1+np.exp(-scores_test))

In [34]:
#test_data = test_data.drop("predictions",1)

In [47]:
test_data.head()

Unnamed: 0,name,review,rating,clean_review,sentiment,predictions
178665,15 Plastic Alligator Grip Suspender Pacifier B...,These clips are just what I was looking for. ...,5,These clips are just what I was looking for M...,1,0.921255
158713,"green sprouts 2 Count Cool Hand Teether, Green...","This was a great buy, the baby really loves ch...",5,This was a great buy the baby really loves che...,1,0.996867
11916,Kidkusion Kid Safe Banister Guard,It's a little amusing that this is marketed as...,5,Its a little amusing that this is marketed as ...,1,0.997898
55010,Mommy's Helper Car Seat Sun Shade,I live an area of the US where we get summers ...,5,I live an area of the US where we get summers ...,1,0.994363
44239,Gerber Graduates BPA Free 4 Pack Bunch-A-Bowls...,I ordered these to give to my daughter - she l...,5,I ordered these to give to my daughter she lo...,1,0.999938


In [48]:
test_data.sort_values("predictions", ascending = False).iloc[0:20]

Unnamed: 0,name,review,rating,clean_review,sentiment,predictions
176040,Twist Breastfeeding Gift Set,I went back to work full time just six weeks a...,5,I went back to work full time just six weeks a...,1,1.0
60298,"Ju-Ju-Be Be Right Back Backpack Diaper Bag, Bl...",This review is going to compare 3 Ju-Ju-Be bag...,5,This review is going to compare 3 JuJuBe bags ...,1,1.0
69455,Dr. Brown's BPA Free Polypropylene Natural Flo...,We have been using these Dr. Brown's bottles f...,5,We have been using these Dr Browns bottles for...,1,1.0
93690,The First Years Ignite Stroller,The last thing we wanted was to purchase more ...,5,The last thing we wanted was to purchase more ...,1,1.0
76549,Britax Advocate 65 CS Click &amp; Safe Convert...,The Britax Advocate CS appears similar to the ...,4,The Britax Advocate CS appears similar to the ...,1,1.0
162687,"Joovy Caboose Too Rear Seat, Greenie",We are thrilled with this rear seat. This litt...,5,We are thrilled with this rear seat This littl...,1,1.0
179871,"Thirsties Diaper Cover with Hook and Loop, Aqu...",The Thirsties are really an awesome concept th...,5,The Thirsties are really an awesome concept th...,1,1.0
73725,"Chicco Cortina Keyfit 30 Travel System, Miro",UPDATE 11/20/13 - I went ahead and used a tiny...,4,UPDATE 112013 I went ahead and used a tiny bi...,1,1.0
172946,Spectra Baby USA S2 Hospital Grade Double/sing...,Long Review but worth the read for those who a...,5,Long Review but worth the read for those who a...,1,1.0
103297,"Chicco Cortina Together Double Stroller, Fuego",I was very excited when I heard Chicco was fin...,5,I was very excited when I heard Chicco was fin...,1,1.0


In [49]:
test_data.sort_values("predictions", ascending=True).iloc[0:20]

Unnamed: 0,name,review,rating,clean_review,sentiment,predictions
120707,The European NANNY Baby Movement Monitor - EU ...,"The previous reviewers laud the ""piece of mind...",1,The previous reviewers laud the piece of mind ...,-1,1.063577e-18
120219,Levana Safe N'See Digital Video Baby Monitor w...,I have NEVER written a review before for anyth...,1,I have NEVER written a review before for anyth...,-1,4.960318e-17
89902,"Peg-Perego Aria Twin Stroller, Java",I am so incredibly disappointed with the strol...,1,I am so incredibly disappointed with the strol...,-1,9.394769e-16
121755,The First Years Home and Away Portable Video M...,"With modern technology, I really can't believe...",1,With modern technology I really cant believe w...,-1,5.950522e-15
66359,Levana BABYVIEW20 Interference Free Digital Wi...,where do i even begin? this baby monitor is no...,1,where do i even begin this baby monitor is not...,-1,2.176357e-12
134999,Infant Optics DXR-5 2.4 GHz Digital Video Baby...,Let me begin with the fact that the monitor wo...,1,Let me begin with the fact that the monitor wo...,-1,8.03002e-12
89904,"Peg-Perego Aria Twin Stroller, Java",ahhhh where do I begin. I had such high hopes...,1,ahhhh where do I begin I had such high hopes ...,-1,1.831694e-11
31741,"Regalo My Cot Portable Bed, Royal Blue",If I could give this product zero stars I woul...,1,If I could give this product zero stars I woul...,-1,4.659149e-11
3746,Playtex Diaper Genie - First Refill Included,"Prior to parenthood, I had heard several paren...",1,Prior to parenthood I had heard several parent...,-1,4.955974e-11
172090,Belkin WeMo Wi-Fi Baby Monitor for Apple iPhon...,I read so many reviews saying the Belkin WiFi ...,2,I read so many reviews saying the Belkin WiFi ...,-1,1.473626e-10


### Compute accuracy of the classifier

We will now evaluate the accuracy of the trained classifier. Recall that the accuracy is given by

$$\text{accuracy} = \frac{\text{number of correctly classified data points}}{\text{total number of data points}}$$

This can be computed as follows:

1. Use the sentiment_model to compute class predictions.
2. Count the number of data points where the predicted class labels match the ground truth labels.
3. Divide the total number of correct predictions by the total number of data points in the dataset.

In [50]:
predicted_sentiment = sentiment_model.predict(test_matrix)

In [51]:
test_data.head()

Unnamed: 0,name,review,rating,clean_review,sentiment,predictions
178665,15 Plastic Alligator Grip Suspender Pacifier B...,These clips are just what I was looking for. ...,5,These clips are just what I was looking for M...,1,0.921255
158713,"green sprouts 2 Count Cool Hand Teether, Green...","This was a great buy, the baby really loves ch...",5,This was a great buy the baby really loves che...,1,0.996867
11916,Kidkusion Kid Safe Banister Guard,It's a little amusing that this is marketed as...,5,Its a little amusing that this is marketed as ...,1,0.997898
55010,Mommy's Helper Car Seat Sun Shade,I live an area of the US where we get summers ...,5,I live an area of the US where we get summers ...,1,0.994363
44239,Gerber Graduates BPA Free 4 Pack Bunch-A-Bowls...,I ordered these to give to my daughter - she l...,5,I ordered these to give to my daughter she lo...,1,0.999938


In [52]:
diff = predicted_sentiment - test_data["sentiment"]

In [53]:
print (round(np.sum(diff==0) / len(diff),2))

0.93
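
The three steps can be collected into a small helper. This is a minimal sketch; the function name `compute_accuracy` and the toy labels are assumptions introduced for illustration.

```python
import numpy as np

def compute_accuracy(predicted_labels, true_labels):
    predicted_labels = np.asarray(predicted_labels)
    true_labels = np.asarray(true_labels)
    # Count the predictions that match the ground truth
    num_correct = np.sum(predicted_labels == true_labels)
    # Divide by the total number of data points
    return num_correct / len(true_labels)

# Toy example: 3 of 4 predictions are correct
print(compute_accuracy([1, -1, 1, 1], [1, -1, -1, 1]))
```

In this notebook it could be called as `compute_accuracy(predicted_sentiment, test_data["sentiment"])`.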


### Implement logistic regression from scratch

Apply text cleaning to the review data

In [54]:
import json
with open("important_words.json", "r") as f:
    important_words = json.load(f)

# Make sure every word is a plain str
important_words = [str(r) for r in important_words]

In [55]:
important_words[:10]

['baby',
 'one',
 'great',
 'love',
 'use',
 'would',
 'like',
 'easy',
 'little',
 'seat']

In [56]:
products

Unnamed: 0,name,review,rating,clean_review,sentiment,baby,one,great,love,use,...,loves,stroller,put,months,car,still,back,used,recommend,first
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5,it came early and was not disappointed i love ...,1,0,0,0,1,0,...,0,0,0,0,0,0,0,0,1,0
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5,Very soft and comfortable and warmer than it l...,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5,This is a product well worth the purchase I h...,1,0,0,0,2,0,...,1,0,0,0,0,0,1,0,0,0
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5,All of my kids have cried nonstop when I tried...,1,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Stop Pacifier Sucking without tears with Thumb...,"When the Binky Fairy came to our house, we did...",5,When the Binky Fairy came to our house we didn...,1,0,0,1,0,0,...,0,0,0,0,0,0,0,0,1,0
6,A Tale of Baby's Days with Peter Rabbit,"Lovely book, it's bound tightly so you may not...",4,Lovely book its bound tightly so you may not b...,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,"Baby Tracker&reg; - Daily Childcare Journal, S...",Perfect for new parents. We were able to keep ...,5,Perfect for new parents We were able to keep t...,1,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,1
8,"Baby Tracker&reg; - Daily Childcare Journal, S...",A friend of mine pinned this product on Pinter...,5,A friend of mine pinned this product on Pinter...,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,"Baby Tracker&reg; - Daily Childcare Journal, S...",This has been an easy way for my nanny to reco...,4,This has been an easy way for my nanny to reco...,1,2,1,0,0,0,...,0,0,0,0,0,0,0,0,1,0
10,"Baby Tracker&reg; - Daily Childcare Journal, S...",I love this journal and our nanny uses it ever...,4,I love this journal and our nanny uses it ever...,1,3,0,0,2,2,...,0,0,0,0,0,0,1,0,1,0


For each word in important_words, we count the number of times the word occurs in the review and store this count in a separate column (one for each word).

Note: There are several ways of doing this. One way is to create an anonymous function that counts the occurrences of a particular word and apply it to every element in the clean_review column. Repeat this step for every word in important_words. Your code should be analogous to the following:

In [57]:
for word in important_words:
    products[word] = products["clean_review"].apply(lambda x: x.split().count(word))

In [58]:
products.head()

Unnamed: 0,name,review,rating,clean_review,sentiment,baby,one,great,love,use,...,seems,picture,completely,wish,buying,babies,won,tub,almost,either
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5,it came early and was not disappointed i love ...,1,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5,Very soft and comfortable and warmer than it l...,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5,This is a product well worth the purchase I h...,1,0,0,0,2,0,...,0,0,0,0,0,0,0,0,0,0
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5,All of my kids have cried nonstop when I tried...,1,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Stop Pacifier Sucking without tears with Thumb...,"When the Binky Fairy came to our house, we did...",5,When the Binky Fairy came to our house we didn...,1,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
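
One detail of the counting code is worth noting: `x.split().count(word)` counts whole whitespace-separated tokens, whereas `x.count(word)` would also count substrings. The toy sentence below is invented to show the difference.

```python
toy_review = "I love this lovely glove and my baby loves it"  # invented example

token_count = toy_review.split().count("love")  # whole-word matches only
substring_count = toy_review.count("love")      # also matches inside other words

print(token_count, substring_count)
```

The match is also case-sensitive, so "Love" and "love" are counted separately, because clean_review is not lowercased.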


In [59]:
products["contain_perfect"] = products["perfect"].apply(lambda x : 1 if x>=1 else 0)

In [60]:
sum(products["contain_perfect"] ==1)

13177

In [61]:
products.shape

(166752, 199)

Convert the DataFrame to a multi-dimensional NumPy array

In [62]:
def get_numpy_data(dataframe, features, label):
    # Add a constant column of ones so the first coefficient acts as the intercept
    dataframe["constant"] = 1
    features = ["constant"] + features
    features_frame = dataframe[features]
    # .values replaces DataFrame.as_matrix(), which was removed in pandas 1.0
    feature_matrix = features_frame.values
    label_series = dataframe[label]
    label_array = label_series.values
    return (feature_matrix, label_array)

In [None]:
feature_matrix, label_array = get_numpy_data(products, important_words, 'sentiment')

In [None]:
feature_matrix.shape

In [66]:
def predict_probability(feature_matrix, coefficients):
    score = np.dot(feature_matrix, coefficients)
    prediction = 1/ (1 + np.exp(-score))
    return prediction

In [67]:
def feature_derivative(errors, feature):
    # Derivative of the log likelihood with respect to one coefficient
    derivative = np.dot(errors, feature)
    return derivative

In [69]:
def compute_log_likelihood(feature_matrix, sentiment, coefficients):
    indicator = (sentiment == +1)
    scores = np.dot(feature_matrix, coefficients)
    # Sum the per-example log likelihood terms over all data points
    lp = np.sum((indicator - 1) * scores - np.log(1. + np.exp(-scores)))
    return lp
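
For reference, the quantity this function is meant to compute is the log likelihood of the observed labels, written here with w for the coefficients and h(x_i) for the features of data point i. This is the standard form for logistic regression with labels in {-1, +1}. The second line is its partial derivative with respect to a single coefficient w_j, which is what feature_derivative evaluates as a dot product of the errors with one feature column.

```latex
\ell\ell(\mathbf{w}) = \sum_{i=1}^{N} \Big( \big(\mathbf{1}[y_i = +1] - 1\big)\, \mathbf{w}^T h(\mathbf{x}_i) - \ln\big(1 + \exp(-\mathbf{w}^T h(\mathbf{x}_i))\big) \Big)

\frac{\partial \ell\ell}{\partial w_j} = \sum_{i=1}^{N} h_j(\mathbf{x}_i) \Big( \mathbf{1}[y_i = +1] - P(y_i = +1 \mid \mathbf{x}_i, \mathbf{w}) \Big)
```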

### Taking gradient steps

In [70]:
def logistic_regression(feature_matrix, sentiment, initial_coefficients, step_size, max_iter):
    coefficients = np.array(initial_coefficients) # make sure it's a numpy array
    for itr in range(max_iter):
        # Predict P(y_i = +1|x_i,w) using the predict_probability() function
        predictions = predict_probability(feature_matrix, coefficients)

        # Compute indicator value for (y_i = +1)
        indicator = (sentiment == +1)

        # Compute the errors as indicator - predictions
        errors = indicator - predictions

        for j in range(len(coefficients)): # loop over each coefficient
            # Recall that feature_matrix[:,j] is the feature column associated with coefficients[j]
            # Compute the derivative for coefficients[j]
            derivative = np.dot(errors, feature_matrix[:,j])

            # Add the step size times the derivative to the current coefficient
            coefficients[j] += step_size * derivative

        # Checking whether log likelihood is increasing
        if itr <= 15 or (itr <= 100 and itr % 10 == 0) or (itr <= 1000 and itr % 100 == 0) \
        or (itr <= 10000 and itr % 1000 == 0) or itr % 10000 == 0:
            lp = compute_log_likelihood(feature_matrix, sentiment, coefficients)
            print('iteration %*d: log likelihood of observed labels = %.8f' % \
                (int(np.ceil(np.log10(max_iter))), itr, lp))
    return coefficients
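
The inner loop over coefficients can also be expressed as a single matrix product. The sketch below is an assumed, vectorized variant of the same gradient ascent update, run on a tiny synthetic dataset generated here purely for illustration (`toy_X`, `toy_y`, `log_likelihood` and `gradient_ascent` are hypothetical names, not part of this notebook). It records the log likelihood at every iteration so that one can check it increases when the step size is small.

```python
import numpy as np

def log_likelihood(X, y, w):
    indicator = (y == +1)
    scores = X @ w
    return np.sum((indicator - 1) * scores - np.log(1.0 + np.exp(-scores)))

def gradient_ascent(X, y, step_size, max_iter):
    w = np.zeros(X.shape[1])
    history = []
    for _ in range(max_iter):
        predictions = 1.0 / (1.0 + np.exp(-(X @ w)))
        errors = (y == +1) - predictions
        # X.T @ errors stacks the per-coefficient derivatives into one vector
        w = w + step_size * (X.T @ errors)
        history.append(log_likelihood(X, y, w))
    return w, history

# Synthetic data: a constant column plus one noisy feature
rng = np.random.RandomState(0)
feature = rng.randn(200)
toy_X = np.column_stack([np.ones(200), feature])
toy_y = np.where(feature + 0.5 * rng.randn(200) > 0, 1, -1)

toy_w, toy_history = gradient_ascent(toy_X, toy_y, step_size=1e-3, max_iter=50)
print(toy_history[0], toy_history[-1])
```

Because all derivatives in one iteration are computed from the same errors vector, this update matches the per-coefficient loop while avoiding the Python-level loop over features.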

### Generate prediction

In [None]:
sentiment = label_array
initial_coefficients = np.zeros(194)  # 193 important words + 1 constant term
step_size = 1e-7
max_iter = 301

In [None]:
variable_coefficients = logistic_regression(feature_matrix, sentiment, initial_coefficients, step_size, max_iter)

In [None]:
scores_new = np.dot(feature_matrix, variable_coefficients)

In [None]:
predicted_sentiment = np.array([+1 if s > 0 else -1 for s in scores_new])

In [None]:
float(sum(predicted_sentiment == sentiment))/len(sentiment)