Logistic Regression with L2 regularization
The goal of this second notebook is to implement your own logistic regression classifier with L2 regularization. You will do the following:

Extract features from Amazon product reviews.
Convert an SFrame into a NumPy array.
Write a function to compute the derivative of log likelihood function with an L2 penalty with respect to a single coefficient.
Implement gradient ascent with an L2 penalty.
Empirically explore how the L2 penalty can ameliorate overfitting.

In [1]:
import pandas as pd
import numpy as np


Load and process review dataset
For this assignment, we will use the same subset of the Amazon product review dataset that we used in Module 3 assignment. The subset was chosen to contain similar numbers of positive and negative reviews, as the original dataset consisted of mostly positive reviews.

In [2]:
products=pd.read_csv('amazon_baby_subset.csv')
products.head()

Unnamed: 0,name,review,rating,sentiment
0,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5,1
1,Nature's Lullabies Second Year Sticker Calendar,We wanted to get something to keep track of ou...,5,1
2,Nature's Lullabies Second Year Sticker Calendar,My daughter had her 1st baby over a year ago. ...,5,1
3,"Lamaze Peekaboo, I Love You","One of baby's first and favorite books, and it...",4,1
4,SoftPlay Peek-A-Boo Where's Elmo A Children's ...,Very cute interactive book! My son loves this ...,5,1


Just like we did previously, we will work with a hand-curated list of important words extracted from the review data. We will also perform 2 simple data transformations:

Remove punctuation using Python's built-in string functionality.
Compute word counts (only for the important_words)
Refer to Module 3 assignment for more details.

In [3]:
import json
with open('important_words.json', 'r') as f: # Reads the list of most frequent words
    important_words = json.load(f)
important_words = [str(s) for s in important_words]

In [4]:
products = products.fillna({'review':' '})

In [5]:
def remove_punctuation(text):
    import string
    return text.translate( string.punctuation) 

products['review_clean'] = products['review'].apply(remove_punctuation)

In [6]:
for word in important_words:
    products[word] = products['review_clean'].apply(lambda s : s.split().count(word))

Train-Validation split
We split the data into a train-validation split with 80% of the data in the training set and 20% of the data in the validation set. We use seed=2 so that everyone gets the same result.

Note: In previous assignments, we have called this a train-test split. However, the portion of data that we don't train on will be used to help select model parameters. Thus, this portion of data should be called a validation set. Recall that examining performance of various potential models (i.e. models with different parameters) should be on a validation set, while evaluation of selected model should always be on a test set.

In [7]:
with open('module-4-assignment-train-idx.json', 'r') as f: 
    train_data_index = json.load(f)

In [8]:
len(train_data_index)

42361

In [9]:
np.transpose(train_data_index)

array([    0,     1,     3, ..., 53067, 53069, 53070])

In [10]:
type(train_data_index)

list

In [12]:
#train_data_index[-1]
#arr=[0,1,3,2]
#for i in  arr:
#    x=products.iloc[arr]
    

#x=products.iloc[]
#x

In [13]:

train_data=products.iloc[train_data_index]
    


In [14]:

train_data

Unnamed: 0,name,review,rating,sentiment,review_clean,baby,one,great,love,use,...,seems,picture,completely,wish,buying,babies,won,tub,almost,either
0,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5,1,All of my kids have cried non-stop when I trie...,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Nature's Lullabies Second Year Sticker Calendar,We wanted to get something to keep track of ou...,5,1,We wanted to get something to keep track of ou...,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Lamaze Peekaboo, I Love You","One of baby's first and favorite books, and it...",4,1,"One of baby's first and favorite books, and it...",0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,SoftPlay Peek-A-Boo Where's Elmo A Children's ...,Very cute interactive book! My son loves this ...,5,1,Very cute interactive book! My son loves this ...,0,0,1,0,0,...,0,0,0,0,0,1,0,0,0,0
5,Our Baby Girl Memory Book,"Beautiful book, I love it to record cherished ...",5,1,"Beautiful book, I love it to record cherished ...",0,0,1,1,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
53065,Summer Infant Pop 'n Play Portable Playard,Good idea but too dangerous. I really wanted t...,2,-1,Good idea but too dangerous. I really wanted t...,1,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
53066,Freeens Cool Seat Liner Breathing with 3d Mesh...,It doesn't stay input. My daughter was sliding...,1,-1,It doesn't stay input. My daughter was sliding...,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
53067,"Samsung Baby Care Washer, Stainless Platinum, ...","My infant goes to a really crappy daycare, and...",1,-1,"My infant goes to a really crappy daycare, and...",1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
53069,Best BIB for Baby - Soft Bib (Pink-Elephant),Great 5-Star Product but An Obvious knock-off ...,1,-1,Great 5-Star Product but An Obvious knock-off ...,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [15]:
with open('module-4-assignment-validation-idx.json', 'r') as f: 
    validation_data_index = json.load(f)

In [16]:
len(validation_data_index)

10711

In [18]:
np.transpose(validation_data_index)

array([    2,     9,    23, ..., 53063, 53068, 53071])

In [19]:
validation_data=products.iloc[validation_data_index]

In [20]:
validation_data

Unnamed: 0,name,review,rating,sentiment,review_clean,baby,one,great,love,use,...,seems,picture,completely,wish,buying,babies,won,tub,almost,either
2,Nature's Lullabies Second Year Sticker Calendar,My daughter had her 1st baby over a year ago. ...,5,1,My daughter had her 1st baby over a year ago. ...,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Cloth Diaper Pins Stainless Steel Traditional ...,It has been many years since we needed diaper ...,5,1,It has been many years since we needed diaper ...,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
23,Fisher Price Nesting Action Vehicles,For well over a year my son has enjoyed stacki...,5,1,For well over a year my son has enjoyed stacki...,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
26,Sassy Who Loves Baby? Photo Album Book with te...,I bought this for a new granddaughter. I will...,5,1,I bought this for a new granddaughter. I will...,0,0,2,0,0,...,0,0,0,0,0,0,0,0,0,0
27,Earlyears: Earl E. Bird with Teething Rings,We received an Earl E. Bird as a gift when we ...,5,1,We received an Earl E. Bird as a gift when we ...,3,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
53053,Merry Muscles Ergonomic Jumper Exerciser Baby ...,"once in this thing, my 2mo. son loves this... ...",2,-1,"once in this thing, my 2mo. son loves this... ...",0,2,0,0,0,...,0,0,0,0,0,0,0,0,0,0
53059,K&amp;C Baby Bath Seat Support Sling Shower Me...,Absolute rip off!!! Not impressed at all this ...,1,-1,Absolute rip off!!! Not impressed at all this ...,0,0,0,0,0,...,0,0,0,0,0,0,0,3,0,0
53063,Umai Authentic Hazelwood and CHERRY RAW (Unpol...,Made no difference :/,1,-1,Made no difference :/,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
53068,"Mud Pie Milestone Stickers, Boy",Pretty please open and inspect these stickers ...,1,-1,Pretty please open and inspect these stickers ...,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0



Convert DataFrame to NumPy array
Just like in the second assignment of the previous module, we provide you with a function that extracts columns from an SFrame and converts them into a NumPy array. Two arrays are returned: one representing features and another representing class labels.

Note: The feature matrix includes an additional column 'intercept' filled with 1's to take account of the intercept term.

In [25]:
def get_numpy_data(dataframe, features, label):
    dataframe['intercept'] = 1
    features = ['intercept'] + features
    featuresframe = dataframe[features]
    feature_matrix = featuresframe.to_numpy()
    labelarray = dataframe[label]
    label_array = labelarray.to_numpy()
    return(feature_matrix, label_array)

In [26]:
feature_matrix_train, sentiment_train = get_numpy_data(train_data, important_words, 'sentiment')
feature_matrix_valid, sentiment_valid = get_numpy_data(validation_data, important_words, 'sentiment')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  



Building on logistic regression with no L2 penalty assignment
Let us now build on Module 3 assignment. Recall from lecture that the link function for logistic regression can be defined as:

$$
P(y_i = +1 | \mathbf{x}_i,\mathbf{w}) = \frac{1}{1 + \exp(-\mathbf{w}^T h(\mathbf{x}_i))},
$$
where the feature vector $h(\mathbf{x}_i)$ is given by the word counts of important_words in the review $\mathbf{x}_i$.

We will use the same code as in this past assignment to make probability predictions since this part is not affected by the L2 penalty. (Only the way in which the coefficients are learned is affected by the addition of a regularization term.)

In [27]:
def predict_probability(feature_matrix, coefficients):
    # Take dot product of feature_matrix and coefficients 
    
    # YOUR CODE HERE
    ...
    scores = np.dot(feature_matrix, coefficients)
    
    # Compute P(y_i = +1 | x_i, w) using the link function
    # YOUR CODE HERE
    predictions = 1. / (1 + np.exp(-scores))
    
    
    # return predictions
    return predictions


Adding L2 penalty
Let us now work on extending logistic regression with L2 regularization. As discussed in the lectures, the L2 regularization is particularly useful in preventing overfitting. In this assignment, we will explore L2 regularization in detail.

Recall from lecture and the previous assignment that for logistic regression without an L2 penalty, the derivative of the log likelihood function is:$$
\frac{\partial\ell}{\partial w_j} = \sum_{i=1}^N h_j(\mathbf{x}_i)\left(\mathbf{1}[y_i = +1] - P(y_i = +1 | \mathbf{x}_i, \mathbf{w})\right)
$$

Adding L2 penalty to the derivative

It takes only a small modification to add a L2 penalty. All terms indicated in red refer to terms that were added due to an L2 penalty.

Recall from the lecture that the link function is still the sigmoid:$$
P(y_i = +1 | \mathbf{x}_i,\mathbf{w}) = \frac{1}{1 + \exp(-\mathbf{w}^T h(\mathbf{x}_i))},
$$
We add the L2 penalty term to the per-coefficient derivative of log likelihood:$$
\frac{\partial\ell}{\partial w_j} = \sum_{i=1}^N h_j(\mathbf{x}_i)\left(\mathbf{1}[y_i = +1] - P(y_i = +1 | \mathbf{x}_i, \mathbf{w})\right) \color{red}{-2\lambda w_j }
$$
The per-coefficient derivative for logistic regression with an L2 penalty is as follows:$$
\frac{\partial\ell}{\partial w_j} = \sum_{i=1}^N h_j(\mathbf{x}_i)\left(\mathbf{1}[y_i = +1] - P(y_i = +1 | \mathbf{x}_i, \mathbf{w})\right) \color{red}{-2\lambda w_j }
$$and for the intercept term, we have$$
\frac{\partial\ell}{\partial w_0} = \sum_{i=1}^N h_0(\mathbf{x}_i)\left(\mathbf{1}[y_i = +1] - P(y_i = +1 | \mathbf{x}_i, \mathbf{w})\right)
$$

Note: As we did in the Regression course, we do not apply the L2 penalty on the intercept. A large intercept does not necessarily indicate overfitting because the intercept is not associated with any particular feature.

Write a function that computes the derivative of log likelihood with respect to a single coefficient $w_j$. Unlike its counterpart in the last assignment, the function accepts five arguments:

errors vector containing $(\mathbf{1}[y_i = +1] - P(y_i = +1 | \mathbf{x}_i, \mathbf{w}))$ for all $i$
feature vector containing $h_j(\mathbf{x}_i)$ for all $i$
coefficient containing the current value of coefficient $w_j$.
l2_penalty representing the L2 penalty constant $\lambda$
feature_is_constant telling whether the $j$-th feature is constant or not.

In [28]:
def feature_derivative_without_L2(errors, feature):     
    # Compute the dot product of errors and feature
    derivative = np.dot(errors, feature)
    
    # Return the derivative
    return derivative

In [29]:
def feature_derivative_with_L2(errors, feature, coefficient, l2_penalty, feature_is_constant): 
    
    # Compute the dot product of errors and feature
    ## YOUR CODE HERE
    derivative = np.dot(errors, feature)

    # add L2 penalty term for any feature that isn't the intercept.
    if not feature_is_constant: 
        ## YOUR CODE HERE
        derivative -= (2 * l2_penalty * coefficient)
        
    return derivative

In [30]:
def compute_log_likelihood_without_L2(feature_matrix, sentiment, coefficients):
    indicator = (sentiment==+1)
    scores = np.dot(feature_matrix, coefficients)
    lp = np.sum((indicator-1)*scores - np.log(1. + np.exp(-scores)))
    return lp


To verify the correctness of the gradient ascent algorithm, we provide a function for computing log likelihood (which we recall from the last assignment was a topic detailed in an advanced optional video, and used here for its numerical stability).

$$\ell\ell(\mathbf{w}) = \sum_{i=1}^N \Big( (\mathbf{1}[y_i = +1] - 1)\mathbf{w}^T h(\mathbf{x}_i) - \ln\left(1 + \exp(-\mathbf{w}^T h(\mathbf{x}_i))\right) \Big) \color{red}{-\lambda\|\mathbf{w}\|_2^2} $$

In [31]:
def compute_log_likelihood_with_L2(feature_matrix, sentiment, coefficients, l2_penalty):
    indicator = (sentiment==+1)
    scores = np.dot(feature_matrix, coefficients)
    
    lp = np.sum((indicator-1)*scores - np.log(1. + np.exp(-scores))) - l2_penalty*np.sum(coefficients[1:]**2)
    
    return lp

In [39]:
def logistic_regression_with_L2(feature_matrix, sentiment, initial_coefficients, step_size, l2_penalty, max_iter):
    coefficients = np.array(initial_coefficients) # make sure it's a numpy array
    for itr in range(max_iter):
        # Predict P(y_i = +1|x_i,w) using your predict_probability() function
        ## YOUR CODE HERE
        predictions = predict_probability(feature_matrix, coefficients)
        
        # Compute indicator value for (y_i = +1)
        indicator = (sentiment==+1)
        
        # Compute the errors as indicator - predictions
        errors = indicator - predictions
        for j in range(len(coefficients)): # loop over each coefficient
            is_intercept = (j == 0)
            # Recall that feature_matrix[:,j] is the feature column associated with coefficients[j].
            # Compute the derivative for coefficients[j]. Save it in a variable called derivative
            ## YOUR CODE HERE
            derivative = feature_derivative_with_L2(errors, feature_matrix[:,j], coefficients[j], l2_penalty, is_intercept)
            
            # add the step size times the derivative to the current coefficient
            ## YOUR CODE HERE
            coefficients[j] += step_size * derivative
        
        # Checking whether log likelihood is increasing
        if itr <= 15 or (itr <= 100 and itr % 10 == 0) or (itr <= 1000 and itr % 100 == 0) \
        or (itr <= 10000 and itr % 1000 == 0) or itr % 10000 == 0:
            lp = compute_log_likelihood_with_L2(feature_matrix, sentiment, coefficients, l2_penalty)
            print ('iteration %*d: log likelihood of observed labels = %.8f' % \
                (int(np.ceil(np.log10(max_iter))), itr, lp))
    return coefficients


Explore effects of L2 regularization
Now that we have written up all the pieces needed for regularized logistic regression, let's explore the benefits of using L2 regularization in analyzing sentiment for product reviews. As iterations pass, the log likelihood should increase.

Below, we train models with increasing amounts of regularization, starting with no L2 penalty, which is equivalent to our previous logistic regression implementation.

In [40]:
#run with L2=0
coefficients_0_penalty = logistic_regression_with_L2(feature_matrix_train, sentiment_train,
                                                     initial_coefficients=np.zeros(194),
                                                     step_size=5e-6, l2_penalty=0, max_iter=501)

iteration   0: log likelihood of observed labels = -29221.21883537
iteration   1: log likelihood of observed labels = -29084.95004081
iteration   2: log likelihood of observed labels = -28953.16751027
iteration   3: log likelihood of observed labels = -28825.53328562
iteration   4: log likelihood of observed labels = -28701.77555327
iteration   5: log likelihood of observed labels = -28581.66893780
iteration   6: log likelihood of observed labels = -28465.02128437
iteration   7: log likelihood of observed labels = -28351.66467192
iteration   8: log likelihood of observed labels = -28241.44923653
iteration   9: log likelihood of observed labels = -28134.23889767
iteration  10: log likelihood of observed labels = -28029.90838872
iteration  11: log likelihood of observed labels = -27928.34118478
iteration  12: log likelihood of observed labels = -27829.42804628
iteration  13: log likelihood of observed labels = -27733.06598310
iteration  14: log likelihood of observed labels = -27639.1575

In [41]:
# run with L2 = 4
coefficients_4_penalty = logistic_regression_with_L2(feature_matrix_train, sentiment_train,
                                                      initial_coefficients=np.zeros(194),
                                                      step_size=5e-6, l2_penalty=4, max_iter=501)

iteration   0: log likelihood of observed labels = -29221.22168459
iteration   1: log likelihood of observed labels = -29084.96672750
iteration   2: log likelihood of observed labels = -28953.20827762
iteration   3: log likelihood of observed labels = -28825.60772339
iteration   4: log likelihood of observed labels = -28701.89266497
iteration   5: log likelihood of observed labels = -28581.83719214
iteration   6: log likelihood of observed labels = -28465.24865646
iteration   7: log likelihood of observed labels = -28351.95867784
iteration   8: log likelihood of observed labels = -28241.81696333
iteration   9: log likelihood of observed labels = -28134.68703034
iteration  10: log likelihood of observed labels = -28030.44323484
iteration  11: log likelihood of observed labels = -27928.96869736
iteration  12: log likelihood of observed labels = -27830.15384515
iteration  13: log likelihood of observed labels = -27733.89537494
iteration  14: log likelihood of observed labels = -27640.0955

In [42]:
# run with L2 = 10
coefficients_10_penalty = logistic_regression_with_L2(feature_matrix_train, sentiment_train,
                                                      initial_coefficients=np.zeros(194),
                                                      step_size=5e-6, l2_penalty=10, max_iter=501)

iteration   0: log likelihood of observed labels = -29221.22595841
iteration   1: log likelihood of observed labels = -29084.99175584
iteration   2: log likelihood of observed labels = -28953.26942043
iteration   3: log likelihood of observed labels = -28825.71935764
iteration   4: log likelihood of observed labels = -28702.06828566
iteration   5: log likelihood of observed labels = -28582.08948969
iteration   6: log likelihood of observed labels = -28465.58957882
iteration   7: log likelihood of observed labels = -28352.39948240
iteration   8: log likelihood of observed labels = -28242.36826219
iteration   9: log likelihood of observed labels = -28135.35883084
iteration  10: log likelihood of observed labels = -28031.24497671
iteration  11: log likelihood of observed labels = -27929.90928717
iteration  12: log likelihood of observed labels = -27831.24168845
iteration  13: log likelihood of observed labels = -27735.13840639
iteration  14: log likelihood of observed labels = -27641.5012

In [43]:
# run with L2 = 1e2
coefficients_1e2_penalty = logistic_regression_with_L2(feature_matrix_train, sentiment_train,
                                                       initial_coefficients=np.zeros(194),
                                                       step_size=5e-6, l2_penalty=1e2, max_iter=501)

iteration   0: log likelihood of observed labels = -29221.29006569
iteration   1: log likelihood of observed labels = -29085.36693825
iteration   2: log likelihood of observed labels = -28954.18538135
iteration   3: log likelihood of observed labels = -28827.39064810
iteration   4: log likelihood of observed labels = -28704.69585791
iteration   5: log likelihood of observed labels = -28585.86189203
iteration   6: log likelihood of observed labels = -28470.68391666
iteration   7: log likelihood of observed labels = -28358.98222530
iteration   8: log likelihood of observed labels = -28250.59594738
iteration   9: log likelihood of observed labels = -28145.37869457
iteration  10: log likelihood of observed labels = -28043.19553225
iteration  11: log likelihood of observed labels = -27943.92086053
iteration  12: log likelihood of observed labels = -27847.43691827
iteration  13: log likelihood of observed labels = -27753.63271190
iteration  14: log likelihood of observed labels = -27662.4032

In [44]:
# run with L2 = 1e3
coefficients_1e3_penalty = logistic_regression_with_L2(feature_matrix_train, sentiment_train,
                                                       initial_coefficients=np.zeros(194),
                                                       step_size=5e-6, l2_penalty=1e3, max_iter=501)

iteration   0: log likelihood of observed labels = -29221.93113859
iteration   1: log likelihood of observed labels = -29089.09379416
iteration   2: log likelihood of observed labels = -28963.22409126
iteration   3: log likelihood of observed labels = -28843.77529768
iteration   4: log likelihood of observed labels = -28730.28878055
iteration   5: log likelihood of observed labels = -28622.36971946
iteration   6: log likelihood of observed labels = -28519.67084188
iteration   7: log likelihood of observed labels = -28421.88135919
iteration   8: log likelihood of observed labels = -28328.71932563
iteration   9: log likelihood of observed labels = -28239.92628207
iteration  10: log likelihood of observed labels = -28155.26343892
iteration  11: log likelihood of observed labels = -28074.50890139
iteration  12: log likelihood of observed labels = -27997.45560234
iteration  13: log likelihood of observed labels = -27923.90971787
iteration  14: log likelihood of observed labels = -27853.6894

In [45]:
# run with L2 = 1e5
coefficients_1e5_penalty = logistic_regression_with_L2(feature_matrix_train, sentiment_train,
                                                       initial_coefficients=np.zeros(194),
                                                       step_size=5e-6, l2_penalty=1e5, max_iter=501)

iteration   0: log likelihood of observed labels = -29292.44915662
iteration   1: log likelihood of observed labels = -29292.34327181
iteration   2: log likelihood of observed labels = -29292.29816820
iteration   3: log likelihood of observed labels = -29292.25832068
iteration   4: log likelihood of observed labels = -29292.22233186
iteration   5: log likelihood of observed labels = -29292.18980711
iteration   6: log likelihood of observed labels = -29292.16041245
iteration   7: log likelihood of observed labels = -29292.13384662
iteration   8: log likelihood of observed labels = -29292.10983739
iteration   9: log likelihood of observed labels = -29292.08813873
iteration  10: log likelihood of observed labels = -29292.06852827
iteration  11: log likelihood of observed labels = -29292.05080505
iteration  12: log likelihood of observed labels = -29292.03478746
iteration  13: log likelihood of observed labels = -29292.02031133
iteration  14: log likelihood of observed labels = -29292.0072

Compare coefficients
We now compare the coefficients for each of the models that were trained above. We will create a table of features and learned coefficients associated with each of the different L2 penalty values.

Below is a simple helper function that will help us create this table.

In [46]:
table = pd.DataFrame({'word': ['(intercept)'] + important_words})
def add_coefficients_to_table(coefficients, column_name):
    table[column_name] = coefficients
    return table

In [47]:
add_coefficients_to_table(coefficients_0_penalty, 'coefficients [L2=0]')
add_coefficients_to_table(coefficients_4_penalty, 'coefficients [L2=4]')
add_coefficients_to_table(coefficients_10_penalty, 'coefficients [L2=10]')
add_coefficients_to_table(coefficients_1e2_penalty, 'coefficients [L2=1e2]')
add_coefficients_to_table(coefficients_1e3_penalty, 'coefficients [L2=1e3]')
add_coefficients_to_table(coefficients_1e5_penalty, 'coefficients [L2=1e5]')

Unnamed: 0,word,coefficients [L2=0],coefficients [L2=4],coefficients [L2=10],coefficients [L2=1e2],coefficients [L2=1e3],coefficients [L2=1e5]
0,(intercept),-0.070184,-0.069563,-0.068642,-0.056228,-0.002102,0.010482
1,baby,0.091208,0.091004,0.090704,0.086810,0.064944,0.001669
2,one,0.016803,0.016578,0.016247,0.012001,-0.002518,-0.001229
3,great,0.757890,0.753036,0.745885,0.654686,0.324473,0.006737
4,love,1.093872,1.086042,1.074515,0.928547,0.430865,0.008887
...,...,...,...,...,...,...,...
189,babies,0.007430,0.007450,0.007479,0.007913,0.007674,0.000150
190,won,0.006615,0.006549,0.006450,0.005202,0.001490,0.000017
191,tub,-0.172717,-0.171192,-0.168945,-0.140352,-0.048443,-0.000689
192,almost,-0.026058,-0.025806,-0.025435,-0.020669,-0.005412,-0.000115


Using the coefficients trained with L2 penalty 0, find the 5 most positive words (with largest positive coefficients). Save them to positive_words. Similarly, find the 5 most negative words (with largest negative coefficients) and save them to negative_words.

In [50]:
table[['word','coefficients [L2=0]']].sort_values('coefficients [L2=0]', ascending = False)[0:5]

Unnamed: 0,word,coefficients [L2=0]
4,love,1.093872
23,loves,1.082033
8,easy,1.008062
3,great,0.75789
34,perfect,0.728262


In [52]:
positive_words = table.sort_values('coefficients [L2=0]', ascending = False)[0:5]['word']
print (positive_words)

4        love
23      loves
8        easy
3       great
34    perfect
Name: word, dtype: object


In [53]:

negative_words = table.sort_values('coefficients [L2=0]', ascending = True)[0:5]['word']
print( negative_words)

113           waste
106    disappointed
114          return
97            money
169        returned
Name: word, dtype: object



Measuring accuracy
Now, let us compute the accuracy of the classifier model. Recall that the accuracy is given by

$$
\mbox{accuracy} = \frac{\mbox{# correctly classified data points}}{\mbox{# total data points}}
$$
Recall from lecture that that the class prediction is calculated using$$
\hat{y}_i = 
\left\{
\begin{array}{ll}
      +1 &amp; h(\mathbf{x}_i)^T\mathbf{w} &gt; 0 \\
      -1 &amp; h(\mathbf{x}_i)^T\mathbf{w} \leq 0 \\
\end{array} 
\right.
$$

Note: It is important to know that the model prediction code doesn't change even with the addition of an L2 penalty. The only thing that changes is the estimated coefficients used in this prediction.

Based on the above, we will use the same code that was used in Module 3 assignment.

In [77]:
def get_classification_accuracy(feature_matrix, sentiment, coefficients):
    scores = np.dot(feature_matrix, coefficients)
    apply_threshold = np.vectorize(lambda x: 1. if x > 0  else -1.)
    predictions = apply_threshold(scores)
    
    num_correct = (predictions == sentiment).sum()
    accuracy = num_correct / len(feature_matrix)    
    return accuracy

In [78]:
train_accuracy = {}
train_accuracy[0]   = get_classification_accuracy(feature_matrix_train, sentiment_train, coefficients_0_penalty)
train_accuracy[4]   = get_classification_accuracy(feature_matrix_train, sentiment_train, coefficients_4_penalty)
train_accuracy[10]  = get_classification_accuracy(feature_matrix_train, sentiment_train, coefficients_10_penalty)
train_accuracy[1e2] = get_classification_accuracy(feature_matrix_train, sentiment_train, coefficients_1e2_penalty)
train_accuracy[1e3] = get_classification_accuracy(feature_matrix_train, sentiment_train, coefficients_1e3_penalty)
train_accuracy[1e5] = get_classification_accuracy(feature_matrix_train, sentiment_train, coefficients_1e5_penalty)

validation_accuracy = {}
validation_accuracy[0]   = get_classification_accuracy(feature_matrix_valid, sentiment_valid, coefficients_0_penalty)
validation_accuracy[4]   = get_classification_accuracy(feature_matrix_valid, sentiment_valid, coefficients_4_penalty)
validation_accuracy[10]  = get_classification_accuracy(feature_matrix_valid, sentiment_valid, coefficients_10_penalty)
validation_accuracy[1e2] = get_classification_accuracy(feature_matrix_valid, sentiment_valid, coefficients_1e2_penalty)
validation_accuracy[1e3] = get_classification_accuracy(feature_matrix_valid, sentiment_valid, coefficients_1e3_penalty)
validation_accuracy[1e5] = get_classification_accuracy(feature_matrix_valid, sentiment_valid, coefficients_1e5_penalty)

In [79]:
# Build a simple report
for key in sorted(validation_accuracy.keys()):
    print ("L2 penalty = %g" % key)
    print ("train accuracy = %s, validation_accuracy = %s" % (train_accuracy[key], validation_accuracy[key]))
    print ("--------------------------------------------------------------------------------")

L2 penalty = 0
train accuracy = 0.7700715280564671, validation_accuracy = 0.7664083652320045
--------------------------------------------------------------------------------
L2 penalty = 4
train accuracy = 0.7699771015792828, validation_accuracy = 0.7663150032676688
--------------------------------------------------------------------------------
L2 penalty = 10
train accuracy = 0.7698354618635065, validation_accuracy = 0.7664083652320045
--------------------------------------------------------------------------------
L2 penalty = 100
train accuracy = 0.7683246382285593, validation_accuracy = 0.7661282793389973
--------------------------------------------------------------------------------
L2 penalty = 1000
train accuracy = 0.7572295271594155, validation_accuracy = 0.7584725982634675
--------------------------------------------------------------------------------
L2 penalty = 100000
train accuracy = 0.6534784353532731, validation_accuracy = 0.644664363738213
---------------------------