# Implementing Logistic Regression From Scratch

The goal of this notebook is to implement your own logistic regression classifier. You will:

 * Extract features from Amazon product reviews.
 * Convert a data frame into a NumPy array.
 * Implement the link function for logistic regression.
 * Write a function to compute the derivative of the log likelihood function with respect to a single coefficient.
 * Implement gradient ascent.
 * Given a set of coefficients, predict sentiments.
 * Compute classification accuracy for the logistic regression model.
 
Let's get started!

## Import Necessary Libraries

In [1]:
import pandas as pd
import numpy as np
import math
import string
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer

## Load review dataset

For this assignment, we will use a subset of the Amazon product review dataset. The subset was chosen to contain similar numbers of positive and negative reviews, as the original dataset consisted primarily of positive reviews.

Load the dataset into a data frame named products. One column of this dataset is sentiment, corresponding to the class label with +1 indicating a review with positive sentiment and -1 for negative sentiment.

In [2]:
products = pd.read_csv('amazon_baby_subset.csv')
products

|       | name | review | rating | sentiment |
|-------|------|--------|--------|-----------|
| 0 | Stop Pacifier Sucking without tears with Thumb... | All of my kids have cried non-stop when I trie... | 5 | 1 |
| 1 | Nature's Lullabies Second Year Sticker Calendar | We wanted to get something to keep track of ou... | 5 | 1 |
| 2 | Nature's Lullabies Second Year Sticker Calendar | My daughter had her 1st baby over a year ago. ... | 5 | 1 |
| 3 | Lamaze Peekaboo, I Love You | One of baby's first and favorite books, and it... | 4 | 1 |
| 4 | SoftPlay Peek-A-Boo Where's Elmo A Children's ... | Very cute interactive book! My son loves this ... | 5 | 1 |
| ... | ... | ... | ... | ... |
| 53067 | Samsung Baby Care Washer, Stainless Platinum, ... | My infant goes to a really crappy daycare, and... | 1 | -1 |
| 53068 | Mud Pie Milestone Stickers, Boy | Pretty please open and inspect these stickers ... | 1 | -1 |
| 53069 | Best BIB for Baby - Soft Bib (Pink-Elephant) | Great 5-Star Product but An Obvious knock-off ... | 1 | -1 |
| 53070 | Bouncy&reg; Inflatable Real Feel Hopping Cow | When I received the item my initial thought wa... | 2 | -1 |


Let us quickly explore more of this dataset. The name column indicates the name of the product. Try listing the name of the first 10 products in the dataset.

After that, try counting the number of positive and negative reviews.

**Note:** For this assignment, we eliminated class imbalance by choosing a subset of the data with a similar number of positive and negative reviews.

In [3]:
products['name'].iloc[0:10]

0    Stop Pacifier Sucking without tears with Thumb...
1      Nature's Lullabies Second Year Sticker Calendar
2      Nature's Lullabies Second Year Sticker Calendar
3                          Lamaze Peekaboo, I Love You
4    SoftPlay Peek-A-Boo Where's Elmo A Children's ...
5                            Our Baby Girl Memory Book
6    Hunnt&reg; Falling Flowers and Birds Kids Nurs...
7    Blessed By Pope Benedict XVI Divine Mercy Full...
8    Cloth Diaper Pins Stainless Steel Traditional ...
9    Cloth Diaper Pins Stainless Steel Traditional ...
Name: name, dtype: object
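The counting step suggested above has no code cell of its own. A minimal sketch, using a hypothetical toy frame in place of **products** (only the `sentiment` column matters here); in the notebook the same call would be `products['sentiment'].value_counts()`.

```python
import pandas as pd

# Hypothetical stand-in for `products`: +1 = positive review, -1 = negative review
toy = pd.DataFrame({'sentiment': [1, -1, 1, 1, -1]})

# value_counts() returns one row per distinct label with its number of occurrences
counts = toy['sentiment'].value_counts()
print(counts)

# Equivalent explicit counts using boolean masks
num_positive = (toy['sentiment'] == 1).sum()
num_negative = (toy['sentiment'] == -1).sum()
```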

## Apply text cleaning on the review data

In this section, we will perform some simple feature cleaning using **data frames**. The last assignment used all words in building bag-of-words features, but here we limit ourselves to 193 words (for simplicity). We compiled a list of the 193 most frequent words into the JSON file named **important_words.json**. Load the words into a list **important_words**.

In [4]:
important_words = pd.read_json('important_words.json')
important_words = important_words.rename({0: 'words'}, axis=1)
important_words = list(important_words['words'])

Now, we will perform 2 simple data transformations:

1. Remove punctuation using [Python's built-in](https://docs.python.org/2/library/string.html) string functionality.
2. Compute word counts (only for **important_words**)

We start with *Step 1* which can be done as follows:

* If your tool supports it, fill N/A values in the review column with empty strings. The N/A values indicate empty reviews. For instance, the pandas `fillna()` method lets you replace all N/As in the **review** column as follows:

In [5]:
products = products.fillna({'review':''})  # fill in N/A's in the review column

Write a function **remove_punctuation** that takes a line of text and removes all punctuation from that text.

In [6]:
def remove_punctuation(text):
    text = text.translate(str.maketrans('','',string.punctuation)) 
    
    return text
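A quick check of what `remove_punctuation` does on a made-up sentence. `string.punctuation` covers ASCII punctuation only, and `str.translate` deletes those characters rather than replacing them with spaces, so contractions and fractions get merged ("It's" becomes "Its", "10/10" becomes "1010"). As a design choice, this sketch builds the translation table once at module level (the name `PUNCTUATION_TABLE` is an assumption, not part of the assignment) instead of on every call.

```python
import string

# Translation table mapping every character in string.punctuation to None.
# Building it once avoids recreating it for each of the ~53k reviews.
PUNCTUATION_TABLE = str.maketrans('', '', string.punctuation)


def remove_punctuation(text):
    # Delete every ASCII punctuation character; all other characters are kept
    return text.translate(PUNCTUATION_TABLE)


cleaned = remove_punctuation("Great product! It's perfect, 10/10.")
print(cleaned)
```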

Apply the **remove_punctuation** function on every element of the **review** column and assign the result to the new column **review_clean**. **Note:** Many data frame packages support an **apply** operation for this type of task; consult the appropriate documentation.

In [7]:
products['review_clean'] = products['review'].apply(remove_punctuation)

Now we proceed with *Step 2*. For each word in **important_words**, we count the number of times the word occurs in each review and store the count in a separate column, so the result is one column per word in **important_words**.

**Note:** There are several ways of doing this. One way is to create an anonymous function that counts the occurrences of a particular word and apply it to every element in the **review_clean** column. Repeat this step for every word in **important_words**.

In [8]:
for word in important_words:
    products[word] = products['review_clean'].apply(lambda s : s.split().count(word))
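The loop above splits every review once per word, i.e. 193 times per review. A minimal sketch of a one-pass alternative, assuming the same whitespace tokenization: tally each review's tokens with `collections.Counter`, then build all word columns in a single data frame. The helper name `count_important_words` is hypothetical.

```python
from collections import Counter

import pandas as pd


def count_important_words(reviews, vocabulary):
    """Return a DataFrame with one count column per vocabulary word.

    Each review is split on whitespace exactly once; looking up a word
    that the review never used in its Counter returns 0.
    """
    counts = [Counter(text.split()) for text in reviews]
    return pd.DataFrame(
        {word: [c[word] for c in counts] for word in vocabulary},
        index=reviews.index,
    )


reviews = pd.Series(["great product great price", "not great", ""])
counts = count_important_words(reviews, ["great", "price", "bad"])
print(counts)
```

Used instead of the loop, this might look like `products = pd.concat([products, count_important_words(products['review_clean'], important_words)], axis=1)`. Adding the columns in one step may also avoid the pandas `PerformanceWarning` about a fragmented data frame that can appear when many columns are inserted one at a time.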

The data frame **products** should contain one column for each of the 193 **important_words**. As an example, the column **perfect** contains a count of the number of times the word **perfect** occurs in each of the reviews.

In [9]:
products['perfect']

0        0
1        0
2        0
3        1
4        0
        ..
53067    0
53068    0
53069    0
53070    0
53071    0
Name: perfect, Length: 53072, dtype: int64

Now, write some code to compute the number of product reviews that contain the word **perfect**.

**Hint:**

1. Create a column called `contains_perfect` which is set to 1 if the count of the word **perfect** (stored in column **perfect**) is >= 1, and 0 otherwise.
2. Sum the number of 1s in the column `contains_perfect`.

In [10]:
products['contains_perfect'] = products['perfect'].apply(lambda pf: 1 if pf >=1 else 0)
contains_perfect = sum(products['contains_perfect'] == 1)
contains_perfect

2955

<font color='steelblue'><b> Quiz : How many reviews contain the word <i>perfect</i>? </b></font>

<font color='mediumvioletred'><b> Answer : {{contains_perfect}} reviews contain the word <i>perfect</i> </b></font>

## Convert Data Frame to Multi-Dimensional Array

It is now time to convert our data frame to a multi-dimensional array. Look for a package that provides highly optimized matrix operations. In the case of Python, NumPy is a good choice.

Write a function that extracts columns from a data frame and converts them into a multi-dimensional array. We plan to use them throughout the course, so make sure to get this function right.

The function should accept three parameters:

* **dataframe**: a data frame to be converted
* **features**: a list of strings containing the names of the columns that are used as features.
* **label**: a string containing the name of the single column that is used as class labels.


The function should return two values:

* one 2D array for features
* one 1D array for class labels


The function should do the following:

* Prepend a new column **constant** to **dataframe** and fill it with 1s. This column accounts for the intercept term. Make sure that the **constant** column appears first in the data frame.
* Prepend the string `constant` to the list **features**. Make sure the string `constant` appears first in the list.
* Extract columns in **dataframe** whose names appear in the list **features**.
* Convert the extracted columns into a 2D array using a function in the data frame library. For pandas, you would use the `.values` attribute (or the `.to_numpy()` method).
* Extract the single column in **dataframe** whose name corresponds to the string **label**.
* Convert the column into a 1D array.
* Return the 2D array and the 1D array.

In [11]:
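def get_numpy_data(dataframe, features, label):
    # Add a column of 1s for the intercept term (note: this modifies `dataframe` in place)
    dataframe['constant'] = 1
    # Put 'constant' first so the intercept is column 0 of the feature matrix
    features = ['constant'] + features
    # Select the feature columns and convert them to a 2D NumPy array
    feature_matrix = dataframe[features].values
    # Select the label column and convert it to a 1D NumPy array
    label_array = dataframe[label].values

    return feature_matrix, label_array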
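As written, `get_numpy_data` appends `constant` at the end of the caller's data frame; only the reordered `features` list puts it first in the extracted matrix. A sketch of a variant that literally places the column first, on a copy so the input frame is left untouched, using `DataFrame.insert` and `DataFrame.to_numpy`. The name `get_numpy_data_copy` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def get_numpy_data_copy(dataframe, features, label):
    # Work on a copy of the feature columns so the caller's DataFrame is not modified
    frame = dataframe[features].copy()
    # Insert the intercept column at position 0, i.e. first in the data frame
    frame.insert(0, 'constant', 1)
    # 2D array of features and 1D array of labels
    return frame.to_numpy(), dataframe[label].to_numpy()


toy = pd.DataFrame({'great': [2, 0], 'bad': [0, 1], 'sentiment': [1, -1]})
X, y = get_numpy_data_copy(toy, ['great', 'bad'], 'sentiment')
print(X.shape, y)
```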

Using the function written above, extract two arrays **feature_matrix** and **sentiment**. The 2D array **feature_matrix** would contain the `constant` column followed by the columns given by the list **important_words**. The 1D array **sentiment** would contain the content of the column **sentiment**.

In [12]:
feature_matrix, sentiment = get_numpy_data(products, important_words, 'sentiment')

# The code below answers the following quiz questions

# Each column of feature_matrix is one feature: the intercept column plus one column per important word
# (len(feature_matrix) would count rows, i.e. reviews, not features)
num_feats_feature_matrix = feature_matrix.shape[1]

# The logistic regression model has one coefficient per important word plus one for the intercept
num_feats_log_reg = len(important_words) + 1

print('There are', num_feats_feature_matrix, 'features in the feature_matrix\n')

if num_feats_log_reg == num_feats_feature_matrix + 1:
    relation = 'y = x + 1'
elif num_feats_log_reg == num_feats_feature_matrix:
    relation = 'y = x'
elif num_feats_log_reg == num_feats_feature_matrix - 1:
    relation = 'y = x - 1'
else:
    relation = 'None of the above'

print('Relation between the number of features in feature_matrix (x) and the number of features in the logistic regression model (y):\n', relation)

There are 194 features in the feature_matrix

Relation between the number of features in feature_matrix (x) and the number of features in the logistic regression model (y):
 y = x


<font color='steelblue'><b> Quiz 1 : How many features are there in `feature_matrix`? </b></font>

<font color='mediumvioletred'><b> Answer 1 : There are {{num_feats_feature_matrix}} features in the feature_matrix </b></font>

<br/>

<font color='steelblue'><b> Quiz 2 : Assuming that the intercept is present, how does the number of features in `feature_matrix` relate to the number of features in the `logistic regression` model? Let x = number of features in feature_matrix and y = number of features in logistic regression model. </b></font>

<font color='mediumvioletred'><b> Answer 2 : The relation between the number of features in feature_matrix (x) and the number of features in the logistic regression model (y) is : </b></font>
<font color='slategray'><b> &nbsp;&nbsp;&nbsp;&nbsp; {{relation}} </b></font>

## Estimating conditional probability with link function

Recall from lecture that the link function is given by:
$$
P(y_i = +1 | \mathbf{x}_i,\mathbf{w}) = \frac{1}{1 + \exp(-\mathbf{w}^T h(\mathbf{x}_i))},
$$

where the feature vector $h(\mathbf{x}_i)$ represents the word counts of **important_words** in the review $\mathbf{x}_i$. Write a function named **predict_probability** that implements the link function.

* Take two parameters: **feature_matrix** and **coefficients**.
* First compute the dot product of **feature_matrix** and **coefficients**.
* Then compute the link function **$P(y=+1|x,w)$**
* Return the predictions given by the link function.

In [13]:
def predict_probability(feature_matrix, coefficients):
    # Take dot product of feature_matrix and coefficients  
    score = np.dot(feature_matrix, coefficients)
    
    # Compute P(y_i = +1 | x_i, w) using the link function
    predictions = 1 / (1 + np.exp(-score))
    
    return predictions
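For very negative scores, `np.exp(-score)` in `predict_probability` overflows to `inf`; the probability still comes out as 0, but NumPy emits an overflow `RuntimeWarning`. A minimal sketch of a numerically stable variant (the name `stable_sigmoid` is hypothetical, not part of the assignment) that only ever exponentiates non-positive numbers. SciPy's `scipy.special.expit` provides the same function if SciPy is available.

```python
import numpy as np


def stable_sigmoid(score):
    """Logistic function 1 / (1 + exp(-score)) for a 1D array of scores.

    Each branch calls np.exp only on non-positive arguments, so it never overflows.
    """
    score = np.asarray(score, dtype=float)
    out = np.empty_like(score)
    positive = score >= 0
    # For score >= 0, exp(-score) lies in (0, 1]
    out[positive] = 1.0 / (1.0 + np.exp(-score[positive]))
    # For score < 0, use the equivalent form exp(score) / (1 + exp(score))
    exp_score = np.exp(score[~positive])
    out[~positive] = exp_score / (1.0 + exp_score)
    return out


print(stable_sigmoid(np.array([-1000.0, -1.0, 0.0, 2.0, 1000.0])))
```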

**Aside:** How the link function works with matrix algebra

Since the word counts are stored as columns in **feature_matrix**, each $i$-th row of the matrix corresponds to the feature vector $h(\mathbf{x}_i)$:
$$
[\text{feature_matrix}] =
\left[
\begin{array}{c}
h(\mathbf{x}_1)^T \\
h(\mathbf{x}_2)^T \\
\vdots \\
h(\mathbf{x}_N)^T
\end{array}
\right] =
\left[
\begin{array}{cccc}
h_0(\mathbf{x}_1) & h_1(\mathbf{x}_1) & \cdots & h_D(\mathbf{x}_1) \\
h_0(\mathbf{x}_2) & h_1(\mathbf{x}_2) & \cdots & h_D(\mathbf{x}_2) \\
\vdots & \vdots & \ddots & \vdots \\
h_0(\mathbf{x}_N) & h_1(\mathbf{x}_N) & \cdots & h_D(\mathbf{x}_N)
\end{array}
\right]
$$

By the rules of matrix multiplication, the score vector containing elements $\mathbf{w}^T h(\mathbf{x}_i)$ is obtained by multiplying **feature_matrix** and the coefficient vector $\mathbf{w}$.
$$
[\text{score}] =
[\text{feature_matrix}]\mathbf{w} =
\left[
\begin{array}{c}
h(\mathbf{x}_1)^T \\
h(\mathbf{x}_2)^T \\
\vdots \\
h(\mathbf{x}_N)^T
\end{array}
\right]
\mathbf{w}
= \left[
\begin{array}{c}
h(\mathbf{x}_1)^T\mathbf{w} \\
h(\mathbf{x}_2)^T\mathbf{w} \\
\vdots \\
h(\mathbf{x}_N)^T\mathbf{w}
\end{array}
\right]
= \left[
\begin{array}{c}
\mathbf{w}^T h(\mathbf{x}_1) \\
\mathbf{w}^T h(\mathbf{x}_2) \\
\vdots \\
\mathbf{w}^T h(\mathbf{x}_N)
\end{array}
\right]
$$
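The identity above can be checked numerically: one matrix-vector product gives the same scores as computing $h(\mathbf{x}_i)^T\mathbf{w}$ row by row. A small sketch on a hypothetical 3-review feature matrix (intercept column first, then two word counts).

```python
import numpy as np

# Hypothetical toy feature matrix: each row is h(x_i)^T = [1, count_word1, count_word2]
feature_matrix = np.array([[1., 2., 0.],
                           [1., 0., 3.],
                           [1., 1., 1.]])
w = np.array([0.5, 1.0, -0.25])

# One matrix-vector product computes every score at once
scores_matrix = feature_matrix @ w

# The same scores computed one row (one review) at a time
scores_rows = np.array([row @ w for row in feature_matrix])
print(scores_matrix, scores_rows)
```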

**Checkpoint**

The code below checks whether the `predict_probability` function is correctly implemented. If it is, the two outputs match.

In [14]:
dummy_feature_matrix = np.array([[1.,2.,3.], [1.,-1.,-1]])
dummy_coefficients = np.array([1., 3., -1.])

correct_scores      = np.array( [ 1.*1. + 2.*3. + 3.*(-1.),          1.*1. + (-1.)*3. + (-1.)*(-1.) ] )
correct_predictions = np.array( [ 1./(1+np.exp(-correct_scores[0])), 1./(1+np.exp(-correct_scores[1])) ] )

print('The following outputs must match ')
print('------------------------------------------------')
print('correct_predictions           =', correct_predictions)
print('output of predict_probability =', predict_probability(dummy_feature_matrix, dummy_coefficients))

The following outputs must match 
------------------------------------------------
correct_predictions           = [0.98201379 0.26894142]
output of predict_probability = [0.98201379 0.26894142]


## Compute derivative of log likelihood with respect to a single coefficient

Recall from lecture:
$$
\frac{\partial\ell}{\partial w_j} = \sum_{i=1}^N h_j(\mathbf{x}_i)\left(\mathbf{1}[y_i = +1] - P(y_i = +1 | \mathbf{x}_i, \mathbf{w})\right)
$$

We will now write a function that computes the derivative of log likelihood with respect to a single coefficient $w_j$. The function accepts two arguments:
* **errors** vector containing $\mathbf{1}[y_i = +1] - P(y_i = +1 | \mathbf{x}_i, \mathbf{w})$ for all $i$.
* **feature** vector containing $h_j(\mathbf{x}_i)$ for all $i$. This corresponds to the $j$-th column of **feature_matrix**.

The function should do the following:

* Take two parameters: **errors** and **feature**.
* Compute the dot product of **errors** and **feature**.
* Return the dot product. This is the derivative with respect to a single coefficient $w_j$.

In [15]:
def feature_derivative(errors, feature):     
    # Compute the dot product of errors and feature
    derivative = np.dot(errors, feature)
    
    # Return the derivative
    return derivative
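Calling `feature_derivative` once per column is equivalent to a single matrix-vector product: stacking $\partial\ell/\partial w_j$ for all $j$ gives the full gradient, `feature_matrix.T @ errors`. A small sketch with hypothetical toy values (the function is redefined so the block runs on its own).

```python
import numpy as np


def feature_derivative(errors, feature):
    # Same as the notebook's version: dot product of errors and one feature column
    return np.dot(errors, feature)


# Hypothetical toy inputs: 2 data points, 3 features (intercept first)
feature_matrix = np.array([[1., 2., 3.],
                           [1., -1., -1.]])
errors = np.array([0.25, -0.5])

# One call per coefficient, as in the assignment
per_coefficient = np.array([feature_derivative(errors, feature_matrix[:, j])
                            for j in range(feature_matrix.shape[1])])

# All partial derivatives at once
gradient = feature_matrix.T @ errors
print(per_coefficient, gradient)
```

Because every coefficient in one gradient-ascent iteration is updated from the same `errors`, the inner per-coefficient loop could be replaced by `coefficients += step_size * (feature_matrix.T @ errors)` with the same result.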

In the main lecture, our focus was on the likelihood. In the advanced optional video, however, we introduced a transformation of this likelihood, called the log likelihood, that simplifies the derivation of the gradient and is more numerically stable. Due to its numerical stability, we will use the log likelihood instead of the likelihood to assess the algorithm.

The log likelihood is computed using the following formula (see the advanced optional video if you are curious about the derivation of this equation):

$$\ell\ell(\mathbf{w}) = \sum_{i=1}^N \Big( (\mathbf{1}[y_i = +1] - 1)\mathbf{w}^T h(\mathbf{x}_i) - \ln\left(1 + \exp(-\mathbf{w}^T h(\mathbf{x}_i))\right) \Big) $$

Write a function **compute_log_likelihood** that implements the equation above.

In [16]:
def compute_log_likelihood(feature_matrix, sentiment, coefficients):
    indicator = (sentiment==+1)
    scores = np.dot(feature_matrix, coefficients)
    lp = np.sum((indicator-1)*scores - np.log(1. + np.exp(-scores)))
    return lp
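The term `np.log(1. + np.exp(-scores))` overflows when a score is very negative (for example -1000), which turns the log likelihood into `-inf`. A sketch of a stable variant, assuming the same inputs as `compute_log_likelihood`: `np.logaddexp(0, -scores)` computes $\ln(e^0 + e^{-s}) = \ln(1 + e^{-s})$ without forming $e^{-s}$ directly. The function name is hypothetical.

```python
import numpy as np


def compute_log_likelihood_stable(feature_matrix, sentiment, coefficients):
    indicator = (sentiment == +1)
    scores = feature_matrix @ coefficients
    # logaddexp(0, -s) = log(1 + exp(-s)), evaluated without overflow
    return np.sum((indicator - 1) * scores - np.logaddexp(0, -scores))


# Same dummy data as the checkpoint below
dummy_feature_matrix = np.array([[1., 2., 3.], [1., -1., -1.]])
dummy_coefficients = np.array([1., 3., -1.])
dummy_sentiment = np.array([-1, 1])
print(compute_log_likelihood_stable(dummy_feature_matrix, dummy_sentiment, dummy_coefficients))
```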

**Checkpoint**

The code below checks whether the `compute_log_likelihood` function is correctly implemented. If it is, the two outputs match.

In [17]:
dummy_feature_matrix = np.array([[1.,2.,3.], [1.,-1.,-1]])
dummy_coefficients = np.array([1., 3., -1.])
dummy_sentiment = np.array([-1, 1])

correct_indicators  = np.array( [ -1==+1,                                       1==+1 ] )
correct_scores      = np.array( [ 1.*1. + 2.*3. + 3.*(-1.),                     1.*1. + (-1.)*3. + (-1.)*(-1.) ] )
correct_first_term  = np.array( [ (correct_indicators[0]-1)*correct_scores[0],  (correct_indicators[1]-1)*correct_scores[1] ] )
correct_second_term = np.array( [ np.log(1. + np.exp(-correct_scores[0])),      np.log(1. + np.exp(-correct_scores[1])) ] )

correct_ll          =      sum( [ correct_first_term[0]-correct_second_term[0], correct_first_term[1]-correct_second_term[1] ] ) 

print('The following outputs must match ')
print('------------------------------------------------')
print('correct_log_likelihood           =', correct_ll)
print('output of compute_log_likelihood =', compute_log_likelihood(dummy_feature_matrix, dummy_sentiment, dummy_coefficients))

The following outputs must match 
------------------------------------------------
correct_log_likelihood           = -5.331411615436032
output of compute_log_likelihood = -5.331411615436032


## Taking gradient steps

Now we are ready to implement our own logistic regression. All we have to do is write a gradient ascent function that takes gradient steps towards the optimum.

Write a function **logistic_regression** that fits a logistic regression model using gradient ascent.

The function accepts the following parameters:

* **feature_matrix**: 2D array of features
* **sentiment**: 1D array of class labels
* **initial_coefficients**: 1D array containing initial values of coefficients
* **step_size**: a parameter controlling the size of the gradient steps
* **max_iter**: number of iterations to run gradient ascent

The function returns the last set of coefficients after performing gradient ascent.

The function carries out the following steps:

1. Initialize vector coefficients to **initial_coefficients**.
2. Predict the class probability **$P(y=+1|x,w)$** using your predict_probability function and save it to variable **predictions**.
3. Compute the indicator value for **$(y_i=+1)$** by comparing **sentiment** against +1. Save it to variable **indicator**.
4. Compute the errors as the difference between **indicator** and **predictions**. Save the errors to variable **errors**.
5. For each j-th coefficient, compute the per-coefficient derivative by calling **feature_derivative** with the j-th column of **feature_matrix**. Then increment the j-th coefficient by (step_size*derivative).
6. Once in a while, insert code to print out the log likelihood.
7. Repeat steps 2-6 for **max_iter** times.

Now, let us run the logistic regression solver with the parameters below:

* **feature_matrix** = feature_matrix extracted using get_numpy_data (stored by the same name)
* **sentiment** = sentiment extracted using get_numpy_data (stored by the same name)
* **initial_coefficients** = a 194-dimensional vector filled with zeros
* **step_size** = 1e-7
* **max_iter** = 301

Save the returned coefficients to variable **coefficients**.

In [18]:
def logistic_regression(feature_matrix, sentiment, initial_coefficients, step_size, max_iter):
    coefficients = np.array(initial_coefficients, dtype=float) # make sure it's a float NumPy array
    for itr in range(max_iter):
        # Predict P(y_i = +1|x_i,w) using your predict_probability() function
        predictions = predict_probability(feature_matrix, coefficients)

        # Compute indicator value for (y_i = +1)
        indicator = (sentiment==+1)

        # Compute the errors as indicator - predictions
        errors = indicator - predictions

        for j in range(len(coefficients)): # loop over each coefficient
            # Recall that feature_matrix[:,j] is the feature column associated with coefficients[j].
            # Compute the derivative for coefficients[j] and save it in a variable called derivative
            derivative = feature_derivative(errors, feature_matrix[:,j])

            # add the step size times the derivative to the current coefficient
            coefficients[j] += step_size * derivative

        # Checking whether log likelihood is increasing
        if itr <= 15 or (itr <= 100 and itr % 10 == 0) or (itr <= 1000 and itr % 100 == 0) \
        or (itr <= 10000 and itr % 1000 == 0) or itr % 10000 == 0:
            lp = compute_log_likelihood(feature_matrix, sentiment, coefficients)
            print('iteration %*d: log likelihood of observed labels = %.8f' % \
                (int(np.ceil(np.log10(max_iter))), itr, lp))
            
    return coefficients

In [19]:
coefficients = logistic_regression(feature_matrix, sentiment, initial_coefficients=np.zeros(194),
                                  step_size=1e-7, max_iter=301)

iteration   0: log likelihood of observed labels = -36780.91768478
iteration   1: log likelihood of observed labels = -36775.13434712
iteration   2: log likelihood of observed labels = -36769.35713564
iteration   3: log likelihood of observed labels = -36763.58603240
iteration   4: log likelihood of observed labels = -36757.82101962
iteration   5: log likelihood of observed labels = -36752.06207964
iteration   6: log likelihood of observed labels = -36746.30919497
iteration   7: log likelihood of observed labels = -36740.56234821
iteration   8: log likelihood of observed labels = -36734.82152213
iteration   9: log likelihood of observed labels = -36729.08669961
iteration  10: log likelihood of observed labels = -36723.35786366
iteration  11: log likelihood of observed labels = -36717.63499744
iteration  12: log likelihood of observed labels = -36711.91808422
iteration  13: log likelihood of observed labels = -36706.20710739
iteration  14: log likelihood of observed labels = -36700.5020

<font color='steelblue'><b> Quiz : As each iteration of gradient ascent passes, does the log likelihood increase or decrease? </b></font>

<font color='mediumvioletred'><b> Answer : As each iteration of gradient ascent passes, the log likelihood increases </b></font>

## Predicting sentiments

Recall from lecture that class predictions for a data point $\mathbf{x}_i$ can be computed from the coefficients $\mathbf{w}$ using the following formula:
$$
\hat{y}_i = 
\left\{
\begin{array}{ll}
      +1 & \mathbf{w}^T h(\mathbf{x}_i) > 0 \\
      -1 & \mathbf{w}^T h(\mathbf{x}_i) \leq 0 \\
\end{array} 
\right.
$$

Now, we will write some code to compute class predictions. We will do this in two steps:

* First compute the **scores** as the dot product of **feature_matrix** and **coefficients**.
* Using the formula above, compute the class predictions from the scores.

In [20]:
scores = np.dot(feature_matrix, coefficients)
predicted_sentiment = np.where(scores > 0, 1, -1)  # +1 if score > 0, else -1 (matches the formula above)
print('Scores : ', scores, '\n\nPredicted Sentiment : ', predicted_sentiment, '\n')

# The code below is to answer the following quiz question
positive_reviews = sum(predicted_sentiment == 1)
print(positive_reviews, 'reviews were predicted to have positive sentiment')

Scores :  [ 0.05104571 -0.02936473  0.02411584 ... -0.40986295  0.01411436
 -0.06755923] 

Predicted Sentiment :  [ 1 -1  1 ... -1  1 -1] 

25126 reviews were predicted to have positive sentiment
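Because the link function is increasing and equals exactly 0.5 at a score of 0, thresholding the score at 0 gives the same labels as thresholding $P(y_i = +1 | \mathbf{x}_i, \mathbf{w})$ at 0.5. A quick check on hypothetical scores:

```python
import numpy as np

# Hypothetical scores w^T h(x_i), including the boundary case 0
scores = np.array([-2.0, -0.1, 0.0, 0.3, 4.0])
probabilities = 1 / (1 + np.exp(-scores))

# Predict +1 when the score is positive ...
from_scores = np.where(scores > 0, 1, -1)
# ... or, equivalently, when the predicted probability exceeds 0.5
from_probs = np.where(probabilities > 0.5, 1, -1)
print(from_scores, from_probs)
```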


<font color='steelblue'><b> Quiz : How many reviews were predicted to have positive sentiment? </b></font>

<font color='mediumvioletred'><b> Answer : {{positive_reviews}} reviews were predicted to have positive sentiment </b></font>

## Measuring accuracy

We will now measure the classification accuracy of the model. Recall from the lecture that the classification accuracy can be computed as follows:

$$
\mbox{accuracy} = \frac{\mbox{# correctly classified data points}}{\mbox{# total data points}}
$$

In [21]:
accuracy_log_reg = round(sum(predicted_sentiment == sentiment) / len(sentiment),2)

num_mistakes = sum(predicted_sentiment != sentiment)

print('# Reviews correctly classified =', len(products) - num_mistakes)
print('# Reviews incorrectly classified =', num_mistakes)
print('# Reviews total                  =', len(products))

print('\nAccuracy of the model : ', accuracy_log_reg)

# Reviews correctly classified = 39903
# Reviews incorrectly classified = 13169
# Reviews total                  = 53072

Accuracy of the model :  0.75
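The accuracy formula is the mean of an element-wise comparison, since `True` counts as 1 and `False` as 0. A small sketch on hypothetical toy labels (not results from this notebook):

```python
import numpy as np

# Hypothetical toy predictions and true labels (+1 / -1)
predicted = np.array([1, -1, 1, 1, -1])
actual = np.array([1, -1, -1, 1, 1])

# Fraction of positions where prediction and label agree
accuracy = np.mean(predicted == actual)
num_mistakes = np.sum(predicted != actual)
print(accuracy, num_mistakes)
```

Plugging in the counts reported above, 39903 / 53072 ≈ 0.7519, which rounds to the reported 0.75.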


<font color='steelblue'><b> Quiz : What is the accuracy of the model on predictions made above? (round to 2 digits of accuracy) </b></font>

<font color='mediumvioletred'><b> Answer : The accuracy of the model is : {{accuracy_log_reg}} </b></font>

## Which words contribute most to positive & negative sentiments?

Recall that in an earlier assignment, we were able to compute the "**most positive words**". These are words that correspond most strongly with positive reviews. In order to do this, we will first do the following:

* Treat each coefficient as a tuple, i.e. (**word**, **coefficient_value**).
* Sort all the (**word**, **coefficient_value**) tuples by **coefficient_value** in descending order.

In [22]:
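# Use a new name so the trained coefficients (including the intercept) are not overwritten;
# re-running this cell with the old name would silently drop another coefficient each time
coefficients_no_intercept = list(coefficients[1:]) # exclude intercept
word_coefficient_tuples = [(word, coefficient) for word, coefficient in zip(important_words, coefficients_no_intercept)]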
word_coefficient_tuples = sorted(word_coefficient_tuples, key=lambda x:x[1], reverse=True)

Now, **word_coefficient_tuples** contains a sorted list of (**word**, **coefficient_value**) tuples. The first 10 elements in this list correspond to the words that are most positive.
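When only the top or bottom few words are needed, `heapq.nlargest` and `heapq.nsmallest` return them without sorting the whole list; up to ties, they give the same words as slicing the sorted list from the front or the reversed list. A sketch on hypothetical (word, coefficient) pairs whose values are made up for illustration:

```python
import heapq

# Hypothetical (word, coefficient) pairs; the values are illustrative only
toy_pairs = [('great', 0.9), ('would', -0.7), ('love', 0.8), ('money', -0.4), ('old', 0.1)]

# k pairs with the largest / smallest coefficient values
top_pos = heapq.nlargest(2, toy_pairs, key=lambda pair: pair[1])
top_neg = heapq.nsmallest(2, toy_pairs, key=lambda pair: pair[1])

print([word for word, _ in top_pos], [word for word, _ in top_neg])
```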

### Ten "most positive" words

Compute the 10 words that have the most positive coefficient values. These words are associated with positive sentiment.

In [23]:
ten_most_pos_words = [word_coefficient_tuples[i][0] for i in range(10)]
print('The 10 words with most positive coefficients are : ', ten_most_pos_words)

# The code below answers the following quiz question
pos_answers = ['love', 'easy', 'great', 'perfect', 'cheap']
not_pos_ans = np.setdiff1d(pos_answers,ten_most_pos_words)
not_pos_ans = ", ".join(str(e) for e in not_pos_ans)

print('\nThe word', not_pos_ans, 'is not present in the top "most positive words"')

The 10 words with most positive coefficients are :  ['great', 'love', 'easy', 'little', 'loves', 'well', 'perfect', 'old', 'nice', 'daughter']

The word cheap is not present in the top "most positive words"


<font color='steelblue'><b> Quiz : Which word is <u><i>not</i></u> present in the top 10 "most positive" words? </b></font>
<font color='slategray'><b>
- love
- easy
- great
- perfect
- cheap  </b></font>

<font color='mediumvioletred'><b> Answer : The word <i><u>{{not_pos_ans}}</u></i> is not present in the top 10 "most positive" words </b></font>

In [24]:
neg_words = word_coefficient_tuples[::-1]
ten_most_neg_words = [neg_words[i][0] for i in range(10)]
print('The 10 words with most negative coefficients are : ', ten_most_neg_words)

# The code below answers the following quiz question
neg_answers = ['need', 'work', 'disappointed', 'even', 'return']
not_neg_ans = np.setdiff1d(neg_answers,ten_most_neg_words)
not_neg_ans = ", ".join(str(e) for e in not_neg_ans)

print('\nThe word', not_neg_ans, 'is not present in the top "most negative words"')

The 10 words with most negative coefficients are :  ['would', 'product', 'money', 'work', 'even', 'disappointed', 'get', 'back', 'return', 'monitor']

The word need is not present in the top "most negative words"


<font color='steelblue'><b> Quiz : Which word is <u><i>not</i></u> present in the top 10 "most negative" words? </b></font>
<font color='slategray'><b>
- need
- work
- disappointed
- even
- return  </b></font>

<font color='mediumvioletred'><b> Answer : The word <i><u>{{not_neg_ans}}</u></i> is not present in the top 10 "most negative" words </b></font>