# Implementing logistic regression from scratch

The goal of this notebook is to implement your own logistic regression classifier. You will:

 * Extract features from Amazon product reviews.
 * Convert a DataFrame into a NumPy array.
 * Implement the link function for logistic regression.
 * Write a function to compute the derivative of the log likelihood function with respect to a single coefficient.
 * Implement gradient ascent.
 * Given a set of coefficients, predict sentiments.
 * Compute classification accuracy for the logistic regression model.
 
Let's get started!
    

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## Load review dataset

For this assignment, we will use a subset of the Amazon product review dataset. The subset was chosen to contain similar numbers of positive and negative reviews, as the original dataset consisted primarily of positive reviews.

In [2]:
products = pd.read_csv('amazon_baby.csv')

In [3]:
products = products[:30000]

In [4]:
products[:10]

Unnamed: 0,name,review,rating
0,Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5
5,Stop Pacifier Sucking without tears with Thumb...,"When the Binky Fairy came to our house, we did...",5
6,A Tale of Baby\'s Days with Peter Rabbit,"Lovely book, it\'s bound tightly so you may no...",4
7,"Baby Tracker&reg; - Daily Childcare Journal, S...",Perfect for new parents. We were able to keep ...,5
8,"Baby Tracker&reg; - Daily Childcare Journal, S...",A friend of mine pinned this product on Pinter...,5
9,"Baby Tracker&reg; - Daily Childcare Journal, S...",This has been an easy way for my nanny to reco...,4


We now add a 'sentiment' column to serve as the class label: +1 indicates a review with positive sentiment (rating above 3) and -1 indicates one with negative sentiment (rating of 3 or below).

In [5]:
products['sentiment'] = products['rating'].apply(lambda x:+1 if x>3 else -1)
print(products[:10])

                                                name  \
0                           Planetwise Flannel Wipes   
1                              Planetwise Wipe Pouch   
2                Annas Dream Full Quilt with 2 Shams   
3  Stop Pacifier Sucking without tears with Thumb...   
4  Stop Pacifier Sucking without tears with Thumb...   
5  Stop Pacifier Sucking without tears with Thumb...   
6           A Tale of Baby\'s Days with Peter Rabbit   
7  Baby Tracker&reg; - Daily Childcare Journal, S...   
8  Baby Tracker&reg; - Daily Childcare Journal, S...   
9  Baby Tracker&reg; - Daily Childcare Journal, S...   

                                              review  rating  sentiment  
0  These flannel wipes are OK, but in my opinion ...       3         -1  
1  it came early and was not disappointed. i love...       5          1  
2  Very soft and comfortable and warmer than it l...       5          1  
3  This is a product well worth the purchase.  I ...       5          1  
4  All of my 

Let us quickly explore more of this dataset.  The 'name' column indicates the name of the product.  Here we list the first 10 products in the dataset.  We then count the number of positive and negative reviews.

In [6]:
products.head(10)['name']

0                             Planetwise Flannel Wipes
1                                Planetwise Wipe Pouch
2                  Annas Dream Full Quilt with 2 Shams
3    Stop Pacifier Sucking without tears with Thumb...
4    Stop Pacifier Sucking without tears with Thumb...
5    Stop Pacifier Sucking without tears with Thumb...
6             A Tale of Baby\'s Days with Peter Rabbit
7    Baby Tracker&reg; - Daily Childcare Journal, S...
8    Baby Tracker&reg; - Daily Childcare Journal, S...
9    Baby Tracker&reg; - Daily Childcare Journal, S...
Name: name, dtype: object

In [7]:
print ('# of positive reviews =', len(products[products['sentiment']==1]) )
print ('# of negative reviews =', len(products[products['sentiment']==-1]) )

# of positive reviews = 22107
# of negative reviews = 7893


In [8]:
s=products[products['sentiment']==1]
y=products[products['sentiment']==-1]
print (len(y), len(s) )

7893 22107


**Note:** The assignment was designed around a subset of the data with a similar number of positive and negative reviews. 
The 30,000 reviews loaded here are not balanced: as counted above, 22107 are positive and 7893 are negative. 

## Apply text cleaning on the review data

In this section, we will perform some simple feature cleaning using **DataFrames**. The last assignment used all words in building bag-of-words features, but here we limit ourselves to 193 words (for simplicity). We compiled a list of the 193 most frequent words into a JSON file. 

Now, we will load these words from this JSON file:

In [9]:
import json
with open('important_words.json', 'r') as f: # Reads the list of most frequent words
    important_words = json.load(f)
important_words = [str(s) for s in important_words]

In [10]:
print (important_words)

['baby', 'one', 'great', 'love', 'use', 'would', 'like', 'easy', 'little', 'seat', 'old', 'well', 'get', 'also', 'really', 'son', 'time', 'bought', 'product', 'good', 'daughter', 'much', 'loves', 'stroller', 'put', 'months', 'car', 'still', 'back', 'used', 'recommend', 'first', 'even', 'perfect', 'nice', 'bag', 'two', 'using', 'got', 'fit', 'around', 'diaper', 'enough', 'month', 'price', 'go', 'could', 'soft', 'since', 'buy', 'room', 'works', 'made', 'child', 'keep', 'size', 'small', 'need', 'year', 'big', 'make', 'take', 'easily', 'think', 'crib', 'clean', 'way', 'quality', 'thing', 'better', 'without', 'set', 'new', 'every', 'cute', 'best', 'bottles', 'work', 'purchased', 'right', 'lot', 'side', 'happy', 'comfortable', 'toy', 'able', 'kids', 'bit', 'night', 'long', 'fits', 'see', 'us', 'another', 'play', 'day', 'money', 'monitor', 'tried', 'thought', 'never', 'item', 'hard', 'plastic', 'however', 'disappointed', 'reviews', 'something', 'going', 'pump', 'bottle', 'cup', 'waste', 'retu

Now, we will perform 2 simple data transformations:

1. Remove punctuation using [Python's built-in](https://docs.python.org/3/library/string.html) string functionality.
2. Compute word counts (only for **important_words**)
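
As a small illustration of Step 1 on a made-up string, the sketch below deletes punctuation in Python 3 by building a translation table with `str.maketrans`. The helper name `strip_punctuation` and the sample text are hypothetical and not part of the assignment.

```python
import string

# Translation table that maps every ASCII punctuation character to None (deleted).
PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def strip_punctuation(text):
    # str() guards against non-string entries such as missing reviews (NaN).
    return str(text).translate(PUNCT_TABLE)

sample = "Perfect for new parents, really!"
cleaned = strip_punctuation(sample)
print(cleaned)  # Perfect for new parents really
```

Only the characters listed in `string.punctuation` are removed; letters, digits, and whitespace are left as they are.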

We start with *Step 1* which can be done as follows:

In [11]:
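def remove_punctuation(text):
    import string
    # In Python 3, translate() needs a table; this one deletes every punctuation character.
    return str(text).translate(str.maketrans('', '', string.punctuation))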

products['review_clean'] = products['review'].apply(remove_punctuation)

In [12]:
print(products[:10])


                                                name  \
0                           Planetwise Flannel Wipes   
1                              Planetwise Wipe Pouch   
2                Annas Dream Full Quilt with 2 Shams   
3  Stop Pacifier Sucking without tears with Thumb...   
4  Stop Pacifier Sucking without tears with Thumb...   
5  Stop Pacifier Sucking without tears with Thumb...   
6           A Tale of Baby\'s Days with Peter Rabbit   
7  Baby Tracker&reg; - Daily Childcare Journal, S...   
8  Baby Tracker&reg; - Daily Childcare Journal, S...   
9  Baby Tracker&reg; - Daily Childcare Journal, S...   

                                              review  rating  sentiment  \
0  These flannel wipes are OK, but in my opinion ...       3         -1   
1  it came early and was not disappointed. i love...       5          1   
2  Very soft and comfortable and warmer than it l...       5          1   
3  This is a product well worth the purchase.  I ...       5          1   
4  All o

In [13]:
# .iloc[:, j] selects column j of a DataFrame by position; the same indexing is used
# later to pick one feature column of feature_matrix when computing the derivative.
products.iloc[:,4]

0        These flannel wipes are OK, but in my opinion ...
1        it came early and was not disappointed. i love...
2        Very soft and comfortable and warmer than it l...
3        This is a product well worth the purchase.  I ...
4        All of my kids have cried non-stop when I trie...
5        When the Binky Fairy came to our house, we did...
6        Lovely book, it\'s bound tightly so you may no...
7        Perfect for new parents. We were able to keep ...
8        A friend of mine pinned this product on Pinter...
9        This has been an easy way for my nanny to reco...
10       I love this journal and our nanny uses it ever...
11       This book is perfect!  I\'m a first time new m...
12       I originally just gave the nanny a pad of pape...
13       I thought keeping a simple handwritten journal...
14       Space for monthly photos, info and a lot of us...
15       I bought this calender for myself for my secon...
16       I love this little calender, you can keep trac.

In [14]:
products['review_clean']

0        These flannel wipes are OK, but in my opinion ...
1        it came early and was not disappointed. i love...
2        Very soft and comfortable and warmer than it l...
3        This is a product well worth the purchase.  I ...
4        All of my kids have cried non-stop when I trie...
5        When the Binky Fairy came to our house, we did...
6        Lovely book, it\'s bound tightly so you may no...
7        Perfect for new parents. We were able to keep ...
8        A friend of mine pinned this product on Pinter...
9        This has been an easy way for my nanny to reco...
10       I love this journal and our nanny uses it ever...
11       This book is perfect!  I\'m a first time new m...
12       I originally just gave the nanny a pad of pape...
13       I thought keeping a simple handwritten journal...
14       Space for monthly photos, info and a lot of us...
15       I bought this calender for myself for my secon...
16       I love this little calender, you can keep trac.

Now we proceed with *Step 2*. For each word in **important_words**, we compute a count for the number of times the word occurs in the review. We will store this count in a separate column (one for each word). The result of this feature processing is a single column for each word in **important_words** which keeps a count of the number of times the respective word occurs in the review text.


**Note:** There are several ways of doing this. In this assignment, we use the built-in *count* function for Python lists. Each review string is first split into individual words and the number of occurrences of a given word is counted.
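
To see what the split-and-count approach does, here is a minimal sketch on a made-up review. The matching is exact, so a capitalized word or a word with punctuation attached is not counted. The lowercasing step at the end is an assumption added for illustration; the assignment code does not lowercase.

```python
import string

review = "Perfect fit and the price is perfect! A perfect gift"

# Exact matching: 'Perfect' and 'perfect!' are different tokens from 'perfect'.
raw_count = review.split().count('perfect')
print(raw_count)  # 1

# Illustrative normalization (not part of the assignment): lowercase and strip punctuation.
normalized = review.lower().translate(str.maketrans('', '', string.punctuation))
normalized_count = normalized.split().count('perfect')
print(normalized_count)  # 3
```

This is why a review beginning with "Perfect for new parents" can still show a count of 0 in the **perfect** column.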

In [15]:
# For each important word, split every cleaned review on whitespace and count
# how many times that word appears; the counts go into a new column named after the word.
for word in important_words:
    products[word] = products['review_clean'].apply(lambda s : s.split().count(word))
# Print the first rows to check the new columns.
print (products[:10])

                                                name  \
0                           Planetwise Flannel Wipes   
1                              Planetwise Wipe Pouch   
2                Annas Dream Full Quilt with 2 Shams   
3  Stop Pacifier Sucking without tears with Thumb...   
4  Stop Pacifier Sucking without tears with Thumb...   
5  Stop Pacifier Sucking without tears with Thumb...   
6           A Tale of Baby\'s Days with Peter Rabbit   
7  Baby Tracker&reg; - Daily Childcare Journal, S...   
8  Baby Tracker&reg; - Daily Childcare Journal, S...   
9  Baby Tracker&reg; - Daily Childcare Journal, S...   

                                              review  rating  sentiment  \
0  These flannel wipes are OK, but in my opinion ...       3         -1   
1  it came early and was not disappointed. i love...       5          1   
2  Very soft and comfortable and warmer than it l...       5          1   
3  This is a product well worth the purchase.  I ...       5          1   
4  All o

The DataFrame **products** now contains one column for each of the 193 **important_words**. As an example, the column **perfect** contains a count of the number of times the word **perfect** occurs in each of the reviews.

In [16]:
products['perfect']

0        0
1        0
2        0
3        0
4        0
5        0
6        0
7        0
8        0
9        0
10       0
11       0
12       2
13       1
14       0
15       0
16       0
17       0
18       0
19       0
20       0
21       0
22       1
23       0
24       0
25       0
26       0
27       0
28       1
29       0
        ..
29970    0
29971    0
29972    0
29973    0
29974    0
29975    0
29976    0
29977    0
29978    1
29979    0
29980    0
29981    1
29982    0
29983    0
29984    0
29985    0
29986    0
29987    0
29988    0
29989    0
29990    0
29991    0
29992    0
29993    1
29994    0
29995    0
29996    0
29997    0
29998    0
29999    0
Name: perfect, Length: 30000, dtype: int64

In [17]:
# Inspect the first rows after adding the word-count columns
products[:10]

Unnamed: 0,name,review,rating,sentiment,review_clean,baby,one,great,love,use,...,seems,picture,completely,wish,buying,babies,won,tub,almost,either
0,Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3,-1,"These flannel wipes are OK, but in my opinion ...",0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5,1,it came early and was not disappointed. i love...,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5,1,Very soft and comfortable and warmer than it l...,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5,1,This is a product well worth the purchase. I ...,0,0,0,2,0,...,0,0,0,0,0,0,0,0,0,0
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5,1,All of my kids have cried non-stop when I trie...,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Stop Pacifier Sucking without tears with Thumb...,"When the Binky Fairy came to our house, we did...",5,1,"When the Binky Fairy came to our house, we did...",0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
6,A Tale of Baby\'s Days with Peter Rabbit,"Lovely book, it\'s bound tightly so you may no...",4,1,"Lovely book, it\'s bound tightly so you may no...",0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,"Baby Tracker&reg; - Daily Childcare Journal, S...",Perfect for new parents. We were able to keep ...,5,1,Perfect for new parents. We were able to keep ...,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,"Baby Tracker&reg; - Daily Childcare Journal, S...",A friend of mine pinned this product on Pinter...,5,1,A friend of mine pinned this product on Pinter...,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,"Baby Tracker&reg; - Daily Childcare Journal, S...",This has been an easy way for my nanny to reco...,4,1,This has been an easy way for my nanny to reco...,2,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [18]:
col = products.columns
print(len(col),col)

198 Index(['name', 'review', 'rating', 'sentiment', 'review_clean', 'baby', 'one',
       'great', 'love', 'use',
       ...
       'seems', 'picture', 'completely', 'wish', 'buying', 'babies', 'won',
       'tub', 'almost', 'either'],
      dtype='object', length=198)


In [19]:
col = col[5:]
y = products['sentiment']
x = products[col]
print(x.shape,y.shape)

(30000, 193) (30000,)


Now, write some code to compute the number of product reviews that contain the word **perfect**.

**Hint**: 
* First create a column called `contains_perfect` which is set to 1 if the count of the word **perfect** (stored in column **perfect**) is >= 1.
* Sum the number of 1s in the column `contains_perfect`.

In [20]:
products['contains_perfect']=products['perfect'].apply(lambda s:+1 if s>=1 else 0)
products['contains_perfect'].sum()


1476

**Quiz Question**. How many reviews contain the word **perfect**?

## Convert DataFrame to NumPy array

As you have seen previously, NumPy is a powerful library for doing matrix manipulation. Let us convert our data to matrices and then implement our algorithms with matrices.

First, make sure you can perform the following import.

In [21]:
import numpy as np

We now provide you with a function that extracts columns from a DataFrame for use with NumPy. Two objects are returned: one representing features and another representing class labels. This version returns pandas objects, which NumPy functions such as `np.dot` accept directly. Note that the feature matrix includes an additional column 'intercept' to take account of the intercept term.

In [22]:
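def get_numpy_data(data_frame, features, label):
    # Add a constant column of 1s so the model can learn an intercept term.
    data_frame['intercept'] = 1
    # Put 'intercept' first so that it lines up with coefficients[0].
    feature = ['intercept'] + features
    feature_matrix = data_frame[feature]
    # The labels are passed in directly and returned unchanged.
    label_array = label
    
    return(feature_matrix, label_array)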

Let us convert the data into NumPy arrays.

In [23]:
# Warning: This may take a few minutes...
# x.columns and important_words contain the same 193 names
feature_matrix, sentiment = get_numpy_data(x, important_words, y) 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until


In [24]:
feature_matrix[:10]


Unnamed: 0,intercept,baby,one,great,love,use,would,like,easy,little,...,seems,picture,completely,wish,buying,babies,won,tub,almost,either
0,1,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,1,0,0,0,2,0,0,1,0,1,...,0,0,0,0,0,0,0,0,0,0
4,1,0,0,1,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
5,1,0,0,1,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,1,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,1,2,1,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0


In [25]:
print( feature_matrix.shape , sentiment.shape )

(30000, 194) (30000,)
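
The shapes above come from pandas objects. If plain NumPy arrays are preferred, a variant along the following lines could be used. This is a sketch only: the name `get_numpy_arrays`, the `label_column` parameter, and the toy frame are hypothetical, and `DataFrame.to_numpy` assumes pandas 0.24 or later.

```python
import numpy as np
import pandas as pd

def get_numpy_arrays(data_frame, features, label_column):
    # Work on a copy so the caller's DataFrame is left unchanged.
    frame = data_frame.copy()
    # Constant column of 1s for the intercept term, placed first.
    frame['intercept'] = 1
    feature_matrix = frame[['intercept'] + features].to_numpy()
    label_array = frame[label_column].to_numpy()
    return feature_matrix, label_array

# Toy data standing in for the word-count columns.
toy = pd.DataFrame({'baby': [0, 2], 'great': [1, 0], 'sentiment': [1, -1]})
X, y = get_numpy_arrays(toy, ['baby', 'great'], 'sentiment')
print(X.shape, y.shape)  # (2, 3) (2,)
```

With arrays, a single feature column is selected as `X[:, j]` rather than with `.iloc[:, j]`.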


**Quiz Question:** How many features are there in the **feature_matrix**?

**Quiz Question:** Assuming that the intercept is present, how does the number of features in **feature_matrix** relate to the number of features in the logistic regression model?

Now, let us see what the **sentiment** column looks like:

In [26]:
sentiment[:10]

0   -1
1    1
2    1
3    1
4    1
5    1
6    1
7    1
8    1
9    1
Name: sentiment, dtype: int64

## Estimating conditional probability with link function

Recall from lecture that the link function is given by:
$$
P(y_i = +1 | \mathbf{x}_i,\mathbf{w}) = \frac{1}{1 + \exp(-\mathbf{w}^T h(\mathbf{x}_i))},
$$

where the feature vector $h(\mathbf{x}_i)$ represents the word counts of **important_words** in the review  $\mathbf{x}_i$. Complete the following function that implements the link function:

In [27]:
def predict_probability(feature_matrix, coefficients):
    '''
    Produces a probabilistic estimate for P(y_i = +1 | x_i, w).
    The estimate ranges between 0 and 1.
    '''
    # Take dot product of feature_matrix and coefficients  
    # YOUR CODE HERE
    score = np.dot(feature_matrix , coefficients)
    
    # Compute P(y_i = +1 | x_i, w) using the link function
    # YOUR CODE HERE
    predictions = 1/(1+np.exp(-score))
    
    # return predictions
    return predictions

**Aside**. How the link function works with matrix algebra

Since the word counts are stored as columns in **feature_matrix**, each $i$-th row of the matrix corresponds to the feature vector $h(\mathbf{x}_i)$:
$$
[\text{feature_matrix}] =
\left[
\begin{array}{c}
h(\mathbf{x}_1)^T \\
h(\mathbf{x}_2)^T \\
\vdots \\
h(\mathbf{x}_N)^T
\end{array}
\right] =
\left[
\begin{array}{cccc}
h_0(\mathbf{x}_1) & h_1(\mathbf{x}_1) & \cdots & h_D(\mathbf{x}_1) \\
h_0(\mathbf{x}_2) & h_1(\mathbf{x}_2) & \cdots & h_D(\mathbf{x}_2) \\
\vdots & \vdots & \ddots & \vdots \\
h_0(\mathbf{x}_N) & h_1(\mathbf{x}_N) & \cdots & h_D(\mathbf{x}_N)
\end{array}
\right]
$$

By the rules of matrix multiplication, the score vector containing elements $\mathbf{w}^T h(\mathbf{x}_i)$ is obtained by multiplying **feature_matrix** and the coefficient vector $\mathbf{w}$.
$$
[\text{score}] =
[\text{feature_matrix}]\mathbf{w} =
\left[
\begin{array}{c}
h(\mathbf{x}_1)^T \\
h(\mathbf{x}_2)^T \\
\vdots \\
h(\mathbf{x}_N)^T
\end{array}
\right]
\mathbf{w}
= \left[
\begin{array}{c}
h(\mathbf{x}_1)^T\mathbf{w} \\
h(\mathbf{x}_2)^T\mathbf{w} \\
\vdots \\
h(\mathbf{x}_N)^T\mathbf{w}
\end{array}
\right]
= \left[
\begin{array}{c}
\mathbf{w}^T h(\mathbf{x}_1) \\
\mathbf{w}^T h(\mathbf{x}_2) \\
\vdots \\
\mathbf{w}^T h(\mathbf{x}_N)
\end{array}
\right]
$$
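
The identity above can be checked numerically. The sketch below uses a small made-up matrix `H` and coefficient vector `w` (both names are chosen here for illustration) and compares the single matrix-vector product with the row-by-row dot products.

```python
import numpy as np

H = np.array([[1., 2., 3.], [1., -1., -1.]])  # each row is one h(x_i)^T
w = np.array([1., 3., -1.])                    # coefficient vector

# One matrix-vector product gives every score at once.
scores_matrix = H.dot(w)

# The same scores computed one row at a time as w^T h(x_i).
scores_rowwise = np.array([np.dot(w, H[i]) for i in range(H.shape[0])])

# Both arrays hold the scores 4.0 and -1.0.
print(scores_matrix, scores_rowwise)
```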

**Checkpoint**

Just to make sure you are on the right track, we have provided a few examples. If your `predict_probability` function is implemented correctly, then the outputs will match:

In [28]:
dummy_feature_matrix = np.array([[1.,2.,3.], [1.,-1.,-1]])
dummy_coefficients = np.array([1., 3., -1.])

correct_scores      = np.array( [ 1.*1. + 2.*3. + 3.*(-1.),          1.*1. + (-1.)*3. + (-1.)*(-1.) ] )
correct_predictions = np.array( [ 1./(1+np.exp(-correct_scores[0])), 1./(1+np.exp(-correct_scores[1])) ] )

print ('The following outputs must match ' )
print ('------------------------------------------------' )
print ('correct_predictions           =', correct_predictions )
print ('output of predict_probability =', predict_probability(dummy_feature_matrix, dummy_coefficients) )

The following outputs must match 
------------------------------------------------
correct_predictions           = [0.98201379 0.26894142]
output of predict_probability = [0.98201379 0.26894142]
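
The checkpoint uses small scores. For a very negative score, `np.exp(-score)` overflows to `inf`; the probability still comes out as 0, but NumPy emits an overflow warning. One common way to avoid this is to pick the form of the link function based on the sign of the score. The helper `stable_sigmoid` below is a hypothetical sketch of that idea, not the function the assignment asks for.

```python
import numpy as np

def stable_sigmoid(score):
    # Evaluates 1 / (1 + exp(-score)) without exponentiating a large positive number.
    score = np.atleast_1d(np.asarray(score, dtype=float))
    out = np.empty_like(score)
    pos = score >= 0
    # For score >= 0, exp(-score) is at most 1, so the usual form is safe.
    out[pos] = 1.0 / (1.0 + np.exp(-score[pos]))
    # For score < 0, use the equivalent form exp(score) / (1 + exp(score)).
    exp_s = np.exp(score[~pos])
    out[~pos] = exp_s / (1.0 + exp_s)
    return out

dummy_scores = np.array([4.0, -1.0, -1000.0])
print(stable_sigmoid(dummy_scores))
```

The first two entries agree with the checkpoint values, and the third is 0 without any overflow.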


## Compute derivative of log likelihood with respect to a single coefficient

Recall from lecture:
$$
\frac{\partial\ell}{\partial w_j} = \sum_{i=1}^N h_j(\mathbf{x}_i)\left(\mathbf{1}[y_i = +1] - P(y_i = +1 | \mathbf{x}_i, \mathbf{w})\right)
$$

We will now write a function that computes the derivative of log likelihood with respect to a single coefficient $w_j$. The function accepts two arguments:
* `errors` vector containing $\mathbf{1}[y_i = +1] - P(y_i = +1 | \mathbf{x}_i, \mathbf{w})$ for all $i$.
* `feature` vector containing $h_j(\mathbf{x}_i)$  for all $i$. 

Complete the following code block:

In [29]:
def feature_derivative(errors, feature):     
    # Compute the dot product of errors and feature
    derivative = np.dot(errors , feature)
    #derivative = np.dot(feature ,errors)
    
    # Return the derivative
    return derivative
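
Because each partial derivative is a dot product between the error vector and one feature column, all of them can also be obtained in one step as the transposed feature matrix times the errors. The sketch below checks this on made-up data; the variable names are chosen for illustration only.

```python
import numpy as np

H = np.array([[1., 2., 3.], [1., -1., -1.]])  # dummy feature matrix (2 points, 3 columns)
w = np.array([1., 3., -1.])                    # dummy coefficients
sentiment_demo = np.array([-1, 1])             # dummy labels

predictions_demo = 1.0 / (1.0 + np.exp(-H.dot(w)))
errors_demo = (sentiment_demo == +1).astype(float) - predictions_demo

# One dot product per coefficient, following the formula for a single w_j.
per_coefficient = np.array([np.dot(errors_demo, H[:, j]) for j in range(H.shape[1])])

# All partial derivatives at once: H^T times the error vector.
full_gradient = H.T.dot(errors_demo)

print(np.allclose(per_coefficient, full_gradient))  # True
```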

In the main lecture, our focus was on the likelihood.  In the advanced optional video, however, we introduced a transformation of this likelihood---called the log likelihood---that simplifies the derivation of the gradient and is more numerically stable.  Due to its numerical stability, we will use the log likelihood instead of the likelihood to assess the algorithm.

The log likelihood is computed using the following formula (see the advanced optional video if you are curious about the derivation of this equation):

$$\ell\ell(\mathbf{w}) = \sum_{i=1}^N \Big( (\mathbf{1}[y_i = +1] - 1)\mathbf{w}^T h(\mathbf{x}_i) - \ln\left(1 + \exp(-\mathbf{w}^T h(\mathbf{x}_i))\right) \Big) $$

We provide a function to compute the log likelihood for the entire dataset. 

In [30]:
def compute_log_likelihood(feature_matrix, sentiment, coefficients):
    indicator = ( sentiment ==+1 )
    scores = np.dot(feature_matrix, coefficients)
    logexp = np.log(1. + np.exp(-scores))
    
    # Simple check to prevent overflow
    mask = np.isinf(logexp)
    logexp[mask] = -scores[mask]
    
    lp = np.sum((indicator-1)*scores - logexp)
    return lp

**Checkpoint**

Just to make sure we are on the same page, run the following code block and check that the outputs match.

In [31]:
dummy_feature_matrix = np.array([[1.,2.,3.], [1.,-1.,-1]])
dummy_coefficients = np.array([1., 3., -1.])
dummy_sentiment = np.array([-1, 1])

correct_indicators  = np.array( [ -1==+1,                                       1==+1 ] )
correct_scores      = np.array( [ 1.*1. + 2.*3. + 3.*(-1.),                     1.*1. + (-1.)*3. + (-1.)*(-1.) ] )
correct_first_term  = np.array( [ (correct_indicators[0]-1)*correct_scores[0],  (correct_indicators[1]-1)*correct_scores[1] ] )
correct_second_term = np.array( [ np.log(1. + np.exp(-correct_scores[0])),      np.log(1. + np.exp(-correct_scores[1])) ] )

correct_ll          =      sum( [ correct_first_term[0]-correct_second_term[0], correct_first_term[1]-correct_second_term[1] ] ) 

print ('The following outputs must match ' )
print ('------------------------------------------------' )
print ('correct_log_likelihood           =', correct_ll )
print ('output of compute_log_likelihood =', compute_log_likelihood(dummy_feature_matrix, dummy_sentiment, dummy_coefficients) )

The following outputs must match 
------------------------------------------------
correct_log_likelihood           = -5.331411615436032
output of compute_log_likelihood = -5.331411615436032
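
The provided function guards against overflow by patching infinite values after the fact. As an alternative sketch, NumPy's `np.logaddexp(0, -s)` evaluates $\ln(1 + \exp(-s))$ directly in a numerically stable way. The function name `log_likelihood_logaddexp` is hypothetical; on the dummy data it should reproduce the checkpoint value.

```python
import numpy as np

def log_likelihood_logaddexp(feature_matrix, sentiment, coefficients):
    indicator = (np.asarray(sentiment) == +1)
    scores = np.dot(feature_matrix, coefficients)
    # logaddexp(0, -s) = log(exp(0) + exp(-s)) = log(1 + exp(-s)), computed stably.
    return np.sum((indicator - 1) * scores - np.logaddexp(0.0, -scores))

demo_features = np.array([[1., 2., 3.], [1., -1., -1.]])
demo_coefficients = np.array([1., 3., -1.])
demo_sentiment = np.array([-1, 1])

demo_ll = log_likelihood_logaddexp(demo_features, demo_sentiment, demo_coefficients)
print(demo_ll)
```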


## Taking gradient steps

Now we are ready to implement our own logistic regression. All we have to do is to write a gradient ascent function that takes gradient steps towards the optimum. 

Complete the following function to solve the logistic regression model using gradient ascent:

In [32]:
from math import sqrt

def logistic_regression(feature_matrix, sentiment, initial_coefficients, step_size, max_iter):
    coefficients = np.array(initial_coefficients) # make sure it's a numpy array
    
    for itr in range(max_iter):

        # Predict P(y_i = +1|x_i,w) using your predict_probability() function
        # YOUR CODE HERE
        predictions = predict_probability(feature_matrix, coefficients)
        
        # Compute indicator value for (y_i = +1)
        indicator = (sentiment==+1)
        
        # Compute the errors as indicator - predictions
        errors = indicator - predictions
        for j in range(len(coefficients)): # loop over each coefficient
            
            # Recall that feature_matrix[:,j] is the feature column associated with coefficients[j].
            # Compute the derivative for coefficients[j]. Save it in a variable called derivative
            # YOUR CODE HERE
            # .iloc[:, j] selects column j by position (feature_matrix is a DataFrame here)
            derivative = feature_derivative(errors, feature_matrix.iloc[:,j])
            # add the step size times the derivative to the current coefficient
            ## YOUR CODE HERE
            coefficients[j] = coefficients[j] + step_size * derivative

        
        # Checking whether log likelihood is increasing
        if itr <= 15 or (itr <= 100 and itr % 10 == 0) or (itr <= 1000 and itr % 100 == 0) \
        or (itr <= 10000 and itr % 1000 == 0) or itr % 10000 == 0:
            lp = compute_log_likelihood(feature_matrix, sentiment, coefficients)
            #print (int(lp) )
            print ('iteration %*d: log likelihood of observed labels = %.8f' % \
               (int(np.ceil(np.log10(max_iter))), itr, lp) )
    return coefficients
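
Since the errors are computed once per iteration and then reused for every coefficient, the inner loop over `j` is equivalent to a single vector update. The sketch below shows that form on a tiny made-up dataset. The names `sigmoid`, `gradient_ascent_vectorized`, `X_toy`, and `y_toy` are hypothetical, and the step size and iteration count are arbitrary choices for the toy data.

```python
import numpy as np

def sigmoid(score):
    return 1.0 / (1.0 + np.exp(-score))

def gradient_ascent_vectorized(feature_matrix, sentiment, initial_coefficients,
                               step_size, max_iter):
    X = np.asarray(feature_matrix, dtype=float)
    indicator = (np.asarray(sentiment) == +1).astype(float)
    w = np.array(initial_coefficients, dtype=float)
    for _ in range(max_iter):
        # Errors for all data points under the current coefficients.
        errors = indicator - sigmoid(X.dot(w))
        # Update every coefficient at once: w_j += step_size * sum_i h_j(x_i) * error_i.
        w = w + step_size * X.T.dot(errors)
    return w

# Toy data: first column is the intercept.
X_toy = np.array([[1., 2., 0.], [1., 0., 1.], [1., 1., 1.], [1., 0., 3.]])
y_toy = np.array([1, -1, 1, -1])
w_toy = gradient_ascent_vectorized(X_toy, y_toy, np.zeros(3), step_size=0.1, max_iter=50)
print(w_toy)
```

On the full dataset this form avoids 194 separate dot products per iteration, which is usually noticeably faster.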

Now, let us run the logistic regression solver.

In [33]:
coefficients = logistic_regression(feature_matrix, sentiment, initial_coefficients=np.zeros(194),
                                   step_size=1e-5, max_iter=500)

iteration   0: log likelihood of observed labels = -19545.36719511
iteration   1: log likelihood of observed labels = -18826.91634511
iteration   2: log likelihood of observed labels = -18378.65651523
iteration   3: log likelihood of observed labels = -18075.20665786
iteration   4: log likelihood of observed labels = -17854.07202249
iteration   5: log likelihood of observed labels = -17682.22673236
iteration   6: log likelihood of observed labels = -17541.28712270
iteration   7: log likelihood of observed labels = -17420.57898980
iteration   8: log likelihood of observed labels = -17313.69029264
iteration   9: log likelihood of observed labels = -17216.66015866
iteration  10: log likelihood of observed labels = -17126.98252047
iteration  11: log likelihood of observed labels = -17043.03666830
iteration  12: log likelihood of observed labels = -16963.75151889
iteration  13: log likelihood of observed labels = -16888.40262802
iteration  14: log likelihood of observed labels = -16816.4869

In [35]:
coef_high = logistic_regression(feature_matrix, sentiment, initial_coefficients=np.zeros(194),
                                   step_size=1e-8, max_iter=500)

iteration   0: log likelihood of observed labels = -20792.98541123
iteration   1: log likelihood of observed labels = -20791.55613479
iteration   2: log likelihood of observed labels = -20790.12758709
iteration   3: log likelihood of observed labels = -20788.69976770
iteration   4: log likelihood of observed labels = -20787.27267624
iteration   5: log likelihood of observed labels = -20785.84631227
iteration   6: log likelihood of observed labels = -20784.42067541
iteration   7: log likelihood of observed labels = -20782.99576525
iteration   8: log likelihood of observed labels = -20781.57158137
iteration   9: log likelihood of observed labels = -20780.14812338
iteration  10: log likelihood of observed labels = -20778.72539087
iteration  11: log likelihood of observed labels = -20777.30338342
iteration  12: log likelihood of observed labels = -20775.88210065
iteration  13: log likelihood of observed labels = -20774.46154213
iteration  14: log likelihood of observed labels = -20773.0417

In [36]:
# Optional check: passing the whole feature matrix to feature_derivative
# returns the derivatives for all coefficients at once.

predictions = predict_probability(feature_matrix, coefficients)
print(predictions)
indicator = (sentiment==+1)
print(indicator)
error = indicator - predictions
s = feature_derivative(error , feature_matrix  )
print(s.shape ,s)

[0.78144707 0.92104537 0.88484785 ... 0.56098678 0.70072783 0.9363234 ]
0        False
1         True
2         True
3         True
4         True
5         True
6         True
7         True
8         True
9         True
10        True
11        True
12        True
13       False
14        True
15        True
16        True
17        True
18        True
19        True
20        True
21       False
22        True
23       False
24        True
25        True
26        True
27       False
28        True
29        True
         ...  
29970     True
29971     True
29972     True
29973     True
29974     True
29975     True
29976    False
29977    False
29978     True
29979     True
29980     True
29981     True
29982     True
29983     True
29984     True
29985    False
29986    False
29987     True
29988    False
29989     True
29990     True
29991     True
29992     True
29993     True
29994    False
29995     True
29996     True
29997     True
29998     True
29999     True
Name: sentime

In [37]:
print(coefficients.shape ,coefficients)

(194,) [ 8.44264604e-01  3.90906577e-02  7.89270828e-02  5.59669292e-01
  1.02522247e+00 -1.44217923e-02 -2.30696410e-01 -4.13455279e-02
  8.61789722e-01  3.63730298e-01 -1.28736088e-01  1.30370284e-01
  2.76202765e-01 -1.35780267e-01  1.25872302e-01 -5.31230419e-03
  3.75767490e-02 -1.29385680e-01 -9.19790317e-02 -3.05009965e-01
 -7.75940367e-03  1.70652304e-01 -4.34321379e-02  9.62557232e-01
 -8.86482662e-02  2.24590813e-02  1.21663925e-02  1.73191308e-01
  1.51294270e-01 -1.64795273e-01  2.13394764e-01  5.79015710e-01
  1.04096399e-01 -2.09375905e-01  5.81533447e-01  2.38014735e-01
 -7.20528720e-02 -3.39971495e-02  5.51062802e-02 -1.30624863e-01
 -6.45479842e-02  9.57053024e-02 -6.94901038e-03  6.94793011e-04
 -1.52794571e-01  1.54469256e-01  4.20406466e-02 -1.45190461e-01
  2.31372353e-01  1.00054471e-01 -2.39166322e-01  1.80228560e-01
  3.29498339e-01 -1.36585272e-01 -4.26061288e-02  1.47974675e-01
  1.45596218e-01 -9.53597739e-02  1.89948975e-01 -1.56255075e-02
  2.67251180e-03 -

**Quiz Question:** As each iteration of gradient ascent passes, does the log likelihood increase or decrease?

## Predicting sentiments

Recall from lecture that the class prediction for a data point $\mathbf{x}_i$ can be computed from the coefficients $\mathbf{w}$ using the following formula:
$$
\hat{y}_i = 
\left\{
\begin{array}{ll}
      +1 & \mathbf{x}_i^T\mathbf{w} > 0 \\
      -1 & \mathbf{x}_i^T\mathbf{w} \leq 0 \\
\end{array} 
\right.
$$
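
As a minimal sketch of this rule, the snippet below applies it to a small made-up feature matrix and coefficient vector. The names `toy_features` and `toy_coefficients` and their values are hypothetical and are not taken from the review data. `np.where` does the thresholding in one vectorized step.

```python
import numpy as np

# Hypothetical toy data: 4 data points, an intercept column plus 2 features.
toy_features = np.array([[1.0, 2.0, 0.0],
                         [1.0, 0.0, 3.0],
                         [1.0, 1.0, 1.0],
                         [1.0, 0.0, 0.0]])
toy_coefficients = np.array([0.0, 1.0, -1.0])

# Scores are the dot products x_i^T w, one per data point.
toy_scores = toy_features.dot(toy_coefficients)

# Apply the rule: +1 when the score is strictly positive, -1 otherwise.
toy_predictions = np.where(toy_scores > 0, 1, -1)

print(toy_scores)
print(toy_predictions)
```

Note that a score of exactly 0 falls into the $-1$ branch, matching the $\leq$ in the formula.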

Now, we will write some code to compute class predictions. We will do this in two steps:
* **Step 1**: Compute the **scores** as the dot product of **feature_matrix** and **coefficients**.
* **Step 2**: Using the formula above, compute the class predictions from the scores.

Step 1 can be implemented as follows:

In [38]:
# Compute the scores as a dot product between feature_matrix and coefficients.
scores = np.dot(feature_matrix, coefficients)
print(scores)
# Also compute the predicted probabilities, for comparison with the scores.
pred = predict_probability(coefficients, np.transpose(feature_matrix))
print(pred)


[1.2741192  2.45663586 2.03916138 ... 0.24516783 0.85076613 2.68814384]
[0.78144707 0.92104537 0.88484785 ... 0.56098678 0.70072783 0.9363234 ]
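
The two printed arrays are related through the sigmoid link function: a score above 0 corresponds to a probability above 0.5. The sketch below checks this on the three leading scores printed above. The `sigmoid` helper is written here for illustration only; it is assumed (not confirmed) to be what `predict_probability` computes, which the matching printed values suggest. The `mixed_scores` values are hypothetical.

```python
import numpy as np

def sigmoid(score):
    """Illustrative link function: maps a real-valued score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-score))

# First three scores from the printed output above.
sample_scores = np.array([1.2741192, 2.45663586, 2.03916138])
sample_probabilities = sigmoid(sample_scores)
print(sample_probabilities)

# Thresholding scores at 0 and probabilities at 0.5 gives the same labels.
mixed_scores = np.array([-2.0, -0.1, 0.0, 0.1, 2.0])  # hypothetical values
labels_from_scores = np.where(mixed_scores > 0, 1, -1)
labels_from_probabilities = np.where(sigmoid(mixed_scores) > 0.5, 1, -1)
print(np.array_equal(labels_from_scores, labels_from_probabilities))
```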


Now, complete the following code block for **Step 2** to compute the class predictions using the **scores** obtained above:

In [44]:
# Map each score to a class label (+1 or -1).
# NOTE: the formula above thresholds at 0, but this cell thresholds at 1,
# so scores in (0, 1] are labelled -1 here. Use `score > 0` to follow the formula exactly.
class_pred = [1 if score > 1 else -1 for score in scores]
print(class_pred)

[1, 1, 1, 1, 1, 1, 1, -1, -1, 1, 1, 1, -1, -1, -1, 1, 1, -1, -1, -1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, 1, -1, 1, 1, 1, 1, 1, 1, -1, 1, 1, -1, 1, 1, -1, 1, 1, 1, 1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, 1, -1, 1, 1, 1, 1, -1, 1, 1, -1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, -1, -1, -1, -1, -1, -1, 1, 1, 1, -1, 1, -1, 1, 1, -1, 1, 1, 1, -1, 1, 1, -1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, 1, 1, 1, -1, 1, 1, 1, 1, 1, -1, 1, 1, -1, 1, -1, 1, 1, 1, -1, -1, 1, -1, 1, -1, 1, 1, 1, -1, -1, 1, 1, 1, -1, -1, 1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, -1, 1, 1, -1, 1, 1, 1, -1, 1, 1, -1, 1, -1, 1, -1, 1, 1, 1, -1, 1, -1, -1, -1, 1, 1, 1, 1, -1, 1, 1, 1, 1, 1, -1, -1, -1, 1, 1, 1, 1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, -1, 1, 1, 1, -1, 1, -1, -1, 1, 1, 1, 1, 1

**Quiz Question:** How many reviews were predicted to have positive sentiment?

In [51]:
# Count the reviews predicted to be positive (+1).
num_positive = class_pred.count(1)
print(num_positive)


17239


In [56]:
# Measure accuracy by comparing each prediction above with its true label in y.
total = len(class_pred)
correct = 0
for i in range(total):
    if class_pred[i] == y[i]:
        correct += 1
mistakes = total - correct
acc = correct / total
print(mistakes, acc)

8558 0.7147333333333333


## Measuring accuracy

We will now measure the classification accuracy of the model. Recall from the lecture that the classification accuracy can be computed as follows:

$$
\mbox{accuracy} = \frac{\mbox{# correctly classified data points}}{\mbox{# total data points}}
$$
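
A minimal sketch of this formula on made-up data follows. The arrays `true_labels` and `predicted_labels` are hypothetical and are not part of the review dataset.

```python
import numpy as np

# Hypothetical labels and predictions for five data points.
true_labels = np.array([1, -1, 1, 1, -1])
predicted_labels = np.array([1, 1, 1, -1, -1])

# The element-wise comparison yields a boolean array; summing it counts the matches.
num_correct = int((predicted_labels == true_labels).sum())
accuracy = num_correct / len(true_labels)

print(num_correct, accuracy)
```

Here three of the five predictions match, so the accuracy is 3/5.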

Complete the following code block to compute the accuracy of the model.

In [59]:
# class_pred is a Python list, so convert it to an array for the element-wise comparison.
num_correct = (np.array(class_pred) == y).sum()
num_mistakes = len(products) - num_correct
accuracy = num_correct / len(products)
print("-----------------------------------------------------")
print('# Reviews   correctly classified =', num_correct)
print('# Reviews incorrectly classified =', num_mistakes)
print('# Reviews total                  =', len(products))
print("-----------------------------------------------------")
print('Accuracy = %.2f' % accuracy)

-----------------------------------------------------
# Reviews   correctly classified = 21442
# Reviews incorrectly classified = 8558
# Reviews total                  = 30000
-----------------------------------------------------
Accuracy = 0.71


**Quiz Question**: What is the accuracy of the model on the predictions made above? (Round to 2 decimal places.)

## Which words contribute most to positive & negative sentiments?

Recall that in the Module 2 assignment, we were able to compute the "**most positive words**". These are the words that correspond most strongly with positive reviews. To do this here, we will first do the following:
* Treat each coefficient as a tuple, i.e. (**word**, **coefficient_value**).
* Sort all the (**word**, **coefficient_value**) tuples by **coefficient_value** in descending order.

In [60]:
# Exclude the intercept. A new name leaves `coefficients` intact, so rerunning this cell is safe.
word_coefficients = list(coefficients[1:])
word_coefficient_tuples = list(zip(important_words, word_coefficients))
word_coefficient_tuples = sorted(word_coefficient_tuples, key=lambda x: x[1], reverse=True)

Now, **word_coefficient_tuples** contains a sorted list of (**word**, **coefficient_value**) tuples. The first 10 elements in this list correspond to the words that are most positive.
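
The pairing and sorting steps can be sketched on a small example. The words and values in `toy_words` and `toy_values` are hypothetical and chosen only for illustration. Slicing from the front of the sorted list gives the most positive entries, and slicing from the end gives the most negative ones.

```python
# Hypothetical words and coefficient values, for illustration only.
toy_words = ['good', 'bad', 'fine', 'awful']
toy_values = [0.8, -0.5, 0.1, -0.9]

# Pair each word with its value, then sort by value in descending order.
toy_tuples = sorted(zip(toy_words, toy_values), key=lambda pair: pair[1], reverse=True)

most_positive = toy_tuples[:2]   # first k entries of the sorted list
most_negative = toy_tuples[-2:]  # last k entries of the sorted list

print(most_positive)
print(most_negative)
```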

### Ten "most positive" words

Now, we compute the 10 words that have the most positive coefficient values. These words are associated with positive sentiment.

In [61]:
print(word_coefficient_tuples[:10])


[('love', 1.0252224678198214), ('loves', 0.9625572316787286), ('easy', 0.8617897215325037), ('best', 0.6269144016944348), ('perfect', 0.5815334470696762), ('recommend', 0.5790157099359071), ('great', 0.5596692924481533), ('fits', 0.49855775664093194), ('happy', 0.46557022006809945), ('little', 0.3637302979328063)]


**Quiz Question:** Which word is **not** present in the top 10 "most positive" words?

### Ten "most negative" words

Next, we repeat this exercise for the 10 most negative words. That is, we compute the 10 words that have the most negative coefficient values. These words are associated with negative sentiment.

In [62]:
print(word_coefficient_tuples[-10:])


[('work', -0.36737616170082), ('idea', -0.3733741791744676), ('difficult', -0.4169904979077553), ('broke', -0.4193726550395553), ('thought', -0.49097705223766563), ('money', -0.5117640044104614), ('returned', -0.5482206262952579), ('return', -0.5804211777483826), ('disappointed', -0.7124445870181867), ('waste', -0.7856176723185628)]


**Quiz Question:** Which word is **not** present in the top 10 "most negative" words?