# Regression Phase 8: Ridge Regression Feature Selection and LASSO 

We will use LASSO to select features, building on a pre-implemented solver for LASSO. 

* Run LASSO with different L1 penalties.
* Choose best L1 penalty using a validation set.
* Choose best L1 penalty using a validation set, with additional constraint on the size of subset.



# Import GraphLab

In [64]:
import graphlab
import numpy as np
import pprint

# Load in hotels sales data

In [65]:
hotels = graphlab.SFrame('NY.csv') # Chicago.csv
hotels['price'] = hotels['price'].astype(float)
hotels['rates'] = hotels['rates'].astype(float)
hotels['zipcode'] = hotels['zipcode'].astype(float)
hotels['guests'] = hotels['guests'].astype(float)

#hotels = hotels[hotels['size'] < 1500] 
hotels = hotels[hotels['price'] > 10]
hotels

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[str,str,int,float,float,str,str,str,str,int,int,str,int,str,str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


name,zone,zipcode,star,rating,rates,checkin,checkout
Courtyard New York Downtown Manhattan/World ...,Wall Street - Financial District ...,10006.0,3.0,4.3,46.0,04/21/2017,04/22/2017
Courtyard New York Downtown Manhattan/World ...,Wall Street - Financial District ...,10006.0,3.0,4.3,46.0,04/21/2017,04/22/2017
Courtyard New York Downtown Manhattan/World ...,Wall Street - Financial District ...,10006.0,3.0,4.3,46.0,04/21/2017,04/22/2017
Courtyard New York Downtown Manhattan/World ...,Wall Street - Financial District ...,10006.0,3.0,4.3,46.0,04/21/2017,04/22/2017
Courtyard New York Downtown Manhattan/World ...,Wall Street - Financial District ...,10006.0,3.0,4.3,46.0,04/21/2017,04/22/2017
Courtyard New York Downtown Manhattan/World ...,Wall Street - Financial District ...,10006.0,3.0,4.3,46.0,04/21/2017,04/22/2017
Park Lane Hotel,Central Park,10019.0,4.0,4.0,6.0,04/21/2017,04/22/2017
Park Lane Hotel,Central Park,10019.0,4.0,4.0,6.0,04/21/2017,04/22/2017
Park Lane Hotel,Central Park,10019.0,4.0,4.0,6.0,04/21/2017,04/22/2017
The Belvedere Hotel,Broadway - Times Square,10036.0,3.5,4.2,16.0,04/21/2017,04/22/2017

room,size,price,bed,guests,address
"Room, 1 King Bed",245,220.0,1 King Bed,2.0,Greenwich Street
"Room, 1 King Bed",245,289.0,1 King Bed,2.0,Greenwich Street
"Room, 1 King Bed, Harbor View ...",245,224.0,1 King Bed,2.0,Greenwich Street
"Room, 1 King Bed, Harbor View ...",245,294.0,1 King Bed,2.0,Greenwich Street
"Room, 1 King Bed, View",251,246.0,1 King Bed,2.0,Greenwich Street
"Room, 1 King Bed, View",251,319.0,1 King Bed,2.0,Greenwich Street
"Executive, One Queen Bed, City View ...",312,179.0,1 Queen Bed,2.0,Central Park S
Premier City View Queen/King ...,312,189.0,1 King Bed or 1 Queen Bed,2.0,Central Park S
"Executive, One King Bed, City View ...",331,194.0,1 King Bed,2.0,Central Park S
"Deluxe Room, 1 King Bed",300,159.0,1 King Bed,2.0,W 48th St

link
https://www.expedia.com /Wall-Street-Financial- ...
https://www.expedia.com /Wall-Street-Financial- ...
https://www.expedia.com /Wall-Street-Financial- ...
https://www.expedia.com /Wall-Street-Financial- ...
https://www.expedia.com /Wall-Street-Financial- ...
https://www.expedia.com /Wall-Street-Financial- ...
https://www.expedia.com /New-York-Hotels-Park- ...
https://www.expedia.com /New-York-Hotels-Park- ...
https://www.expedia.com /New-York-Hotels-Park- ...
https://www.expedia.com /New-York-Hotels-The- ...


# Create new features


As in phase 3, we consider features that are some transformations of inputs.

* star_squared = star * star
* rates_rating = rates * rating
* size_sqrt = sqrt(size)


In [66]:
from math import sqrt
hotels['size_sqrt'] = hotels['size'].apply(sqrt)
hotels['star_squared'] = hotels['star']*hotels['star']
hotels['rates_rating'] = hotels['rates']*hotels['rating']
# drop na values
# sf_filter = sf[(sf['carrier'] == 'US')]
#hotels = hotels[hotels['name'] != 'FieldHouse Jones']
hotels

name,zone,zipcode,star,rating,rates,checkin,checkout
Courtyard New York Downtown Manhattan/World ...,Wall Street - Financial District ...,10006.0,3.0,4.3,46.0,04/21/2017,04/22/2017
Courtyard New York Downtown Manhattan/World ...,Wall Street - Financial District ...,10006.0,3.0,4.3,46.0,04/21/2017,04/22/2017
Courtyard New York Downtown Manhattan/World ...,Wall Street - Financial District ...,10006.0,3.0,4.3,46.0,04/21/2017,04/22/2017
Courtyard New York Downtown Manhattan/World ...,Wall Street - Financial District ...,10006.0,3.0,4.3,46.0,04/21/2017,04/22/2017
Courtyard New York Downtown Manhattan/World ...,Wall Street - Financial District ...,10006.0,3.0,4.3,46.0,04/21/2017,04/22/2017
Courtyard New York Downtown Manhattan/World ...,Wall Street - Financial District ...,10006.0,3.0,4.3,46.0,04/21/2017,04/22/2017
Park Lane Hotel,Central Park,10019.0,4.0,4.0,6.0,04/21/2017,04/22/2017
Park Lane Hotel,Central Park,10019.0,4.0,4.0,6.0,04/21/2017,04/22/2017
Park Lane Hotel,Central Park,10019.0,4.0,4.0,6.0,04/21/2017,04/22/2017
The Belvedere Hotel,Broadway - Times Square,10036.0,3.5,4.2,16.0,04/21/2017,04/22/2017

room,size,price,bed,guests,address
"Room, 1 King Bed",245,220.0,1 King Bed,2.0,Greenwich Street
"Room, 1 King Bed",245,289.0,1 King Bed,2.0,Greenwich Street
"Room, 1 King Bed, Harbor View ...",245,224.0,1 King Bed,2.0,Greenwich Street
"Room, 1 King Bed, Harbor View ...",245,294.0,1 King Bed,2.0,Greenwich Street
"Room, 1 King Bed, View",251,246.0,1 King Bed,2.0,Greenwich Street
"Room, 1 King Bed, View",251,319.0,1 King Bed,2.0,Greenwich Street
"Executive, One Queen Bed, City View ...",312,179.0,1 Queen Bed,2.0,Central Park S
Premier City View Queen/King ...,312,189.0,1 King Bed or 1 Queen Bed,2.0,Central Park S
"Executive, One King Bed, City View ...",331,194.0,1 King Bed,2.0,Central Park S
"Deluxe Room, 1 King Bed",300,159.0,1 King Bed,2.0,W 48th St

link,size_sqrt,star_squared,rates_rating
https://www.expedia.com /Wall-Street-Financial- ...,15.6524758425,9.0,197.8
https://www.expedia.com /Wall-Street-Financial- ...,15.6524758425,9.0,197.8
https://www.expedia.com /Wall-Street-Financial- ...,15.6524758425,9.0,197.8
https://www.expedia.com /Wall-Street-Financial- ...,15.6524758425,9.0,197.8
https://www.expedia.com /Wall-Street-Financial- ...,15.8429795178,9.0,197.8
https://www.expedia.com /Wall-Street-Financial- ...,15.8429795178,9.0,197.8
https://www.expedia.com /New-York-Hotels-Park- ...,17.6635217327,16.0,24.0
https://www.expedia.com /New-York-Hotels-Park- ...,17.6635217327,16.0,24.0
https://www.expedia.com /New-York-Hotels-Park- ...,18.1934053987,16.0,24.0
https://www.expedia.com /New-York-Hotels-The- ...,17.3205080757,12.25,67.2


* Squaring star will increase the separation between low-star hotels (e.g. 1 star) and fancy hotels (e.g. 5 star) since 1^2 = 1 but 5^2 = 25. Consequently this variable will mostly affect luxury hotels.

* On the other hand, taking the square root of size will decrease the separation between big and small hotels, because some expensive hotels located in the city center are not very big.
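
The effect of the two transformations can be checked with a few lines of plain Python. This is only an illustrative sketch: the star values are the extremes mentioned above, and the two sizes (245 and 331) are taken from the sample rows printed earlier.

```python
from math import sqrt

# Star ratings at the two extremes mentioned above.
low_star, high_star = 1.0, 5.0
gap_raw = high_star - low_star                 # difference before squaring
gap_squared = high_star ** 2 - low_star ** 2   # difference after squaring

# Two sizes taken from the sample rows shown earlier.
small, large = 245.0, 331.0
ratio_raw = large / small                      # how much bigger, before sqrt
ratio_sqrt = sqrt(large) / sqrt(small)         # how much bigger, after sqrt

print(gap_raw, gap_squared)
print(round(ratio_raw, 3), round(ratio_sqrt, 3))
```

Squaring stretches the gap between the extremes, while the square root pulls the two sizes closer together in relative terms.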

# Learn regression weights with L1 penalty


Let us fit a model with all the features available, plus the features we just created above.


In [67]:
all_features = ['star', 
                'star_squared',
                'size_sqrt',
                'rates_rating', 
                'zipcode',
                'rating',
                'rates',
                'size',
                'guests'
               ]

Applying L1 penalty requires adding an extra parameter (l1_penalty) to the linear regression call in GraphLab Create. (Other tools may have separate implementations of LASSO.) Note that it's important to set l2_penalty=0 to ensure we don't introduce an additional L2 penalty.
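
For reference, a penalized least-squares objective of this kind is usually written as below. This is the textbook form, given here as an assumption; the exact scaling and the treatment of the intercept inside GraphLab Create are not stated in this notebook.

$$
\min_{w}\; \sum_{i=1}^{N} \Bigl(y_i - w_0 - \sum_{j=1}^{D} w_j x_{ij}\Bigr)^2 \;+\; \lambda_1 \sum_{j=1}^{D} |w_j| \;+\; \lambda_2 \sum_{j=1}^{D} w_j^2
$$

Here $\lambda_1$ plays the role of l1_penalty and $\lambda_2$ the role of l2_penalty. With l2_penalty=0 the last term vanishes and only the L1 term remains, which is the LASSO objective.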

In [94]:
model_all = graphlab.linear_regression.create(hotels, target='price', features = all_features,
                                              validation_set=None, 
                                              l2_penalty=0., l1_penalty=5e6)

Find what features had non-zero weight.


In [95]:
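# Get the non-zero weights.
# Note: the `> 0` filter keeps only positive weights; a negative weight is also
# non-zero but would not be listed (use `!= 0` to include it).
non_zero_weight = model_all["coefficients"][model_all["coefficients"]["value"] > 0]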
non_zero_weight.print_rows(num_rows=10)

+--------------+-------+-----------------+--------+
|     name     | index |      value      | stderr |
+--------------+-------+-----------------+--------+
| (intercept)  |  None |  208.368392485  |  None  |
|     star     |  None |   2.4953686027  |  None  |
| star_squared |  None |   1.9352184498  |  None  |
|  size_sqrt   |  None |  0.504811734311 |  None  |
|     size     |  None | 0.0771794925303 |  None  |
+--------------+-------+-----------------+--------+
[5 rows x 4 columns]



Note that a majority of the weights have been set to zero. So by setting an L1 penalty that's large enough, we are performing a subset selection.
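
One way to see why an L1 penalty produces weights that are exactly zero is the soft-thresholding operator that appears in coordinate-descent LASSO solvers. The sketch below is illustrative only and is not GraphLab's solver: `soft_threshold` is a hypothetical helper, and the weights are made-up numbers.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; return exactly 0.0 when |rho| <= lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

# Made-up unpenalized weights, for illustration only.
unpenalized = [13.0, -4.5, 0.8, 0.003]

for lam in (0.0, 1.0, 5.0):
    shrunk = [soft_threshold(w, lam) for w in unpenalized]
    nonzeros = sum(1 for w in shrunk if w != 0)
    print(lam, shrunk, nonzeros)
```

As the threshold grows, more weights fall inside the band around zero and are set to exactly zero, which is the subset-selection behaviour noted above.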


# Selecting an L1 penalty


To find a good L1 penalty, we will explore multiple values using a validation set. We will do a three-way split into train, validation, and test sets:

* Split our sales data into 2 sets: training and test
* Further split our training data into two sets: train, validation


In [98]:
(training_and_validation, testing) = hotels.random_split(.9,seed=1) # initial train/test split
(training, validation) = training_and_validation.random_split(0.5, seed=1) # split training into train and validate
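
As a quick sanity check on the split sizes, the two fractions used above can be multiplied out. This is plain arithmetic; because the splits are random, the actual row counts will only approximately match these proportions.

```python
first_split = 0.9    # fraction kept for training + validation
second_split = 0.5   # fraction of that kept for training

train_fraction = first_split * second_split
validation_fraction = first_split * (1 - second_split)
test_fraction = 1 - first_split

print(train_fraction, validation_fraction, test_fraction)
```

So roughly 45% of the rows are used for training, 45% for validation, and 10% are held out for testing.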

Next, we write a loop that does the following:

* For l1_penalty in [10^1, 10^1.5, 10^2, 10^2.5, ..., 10^7] (that is, np.logspace(1, 7, num=13)):
    * Fit a regression model with a given l1_penalty on TRAIN data. Specify l1_penalty=l1_penalty and l2_penalty=0. in the parameter list.
    * Compute the RSS on VALIDATION data (here we will use .predict()) for that l1_penalty
* Report which l1_penalty produced the lowest RSS on validation data.
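
The penalty grid and the RSS computation in the steps above can be sketched without GraphLab. The helper name `rss` and the predicted values are assumptions made for illustration; the three actual prices are taken from the sample rows printed earlier.

```python
# The 13 candidate penalties 10^1, 10^1.5, ..., 10^7, written out in plain Python.
l1_penalties = [10 ** (1 + 0.5 * k) for k in range(13)]

def rss(actual, predicted):
    """Residual sum of squares between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

actual = [220.0, 289.0, 179.0]      # prices from the sample rows
predicted = [230.0, 280.0, 180.0]   # made-up predictions, for illustration
example_rss = rss(actual, predicted)

print(len(l1_penalties), l1_penalties[0], l1_penalties[-1])
print(example_rss)
```

Each exponent step of 0.5 multiplies the penalty by about 3.16, so the grid covers six orders of magnitude with only 13 model fits.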


In [99]:
# Map each l1_penalty to the RSS it achieves on the validation set
validation_rss = {}
for l1_penalty in np.logspace(1, 7, num=13):
    # Fit on the training set using all features
    model = graphlab.linear_regression.create(training, target='price', features=all_features,
                                              validation_set=None, verbose = False,
                                              l2_penalty=0., l1_penalty=l1_penalty)
    predictions = model.predict(validation)
    residual = validation['price'] - predictions
    rss = sum(residual**2)
    validation_rss[l1_penalty] = rss

pprint.pprint(validation_rss)
# Pick the (l1_penalty, rss) pair with the smallest RSS
print min(validation_rss.items(), key = lambda x:x[1])

{10.0: 196452013.7869342,
 31.622776601683793: 196452432.6221374,
 100.0: 196453757.2097408,
 316.22776601683796: 196457947.0667083,
 1000.0: 196471207.99363214,
 3162.2776601683795: 196513257.07196343,
 10000.0: 196647371.41477448,
 31622.776601683792: 197082912.9748925,
 100000.0: 198607757.32021484,
 316227.76601683791: 204509555.22652483,
 1000000.0: 234980925.506664,
 3162277.6601683795: 366428308.2363785,
 10000000.0: 324470399.1276843}
(10.0, 196452013.7869342)


In [101]:
model_best = graphlab.linear_regression.create(training, target='price', features=all_features,
                                              validation_set=None, verbose = False,
                                              l2_penalty=0., l1_penalty=10.0)
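# Note: `> 0` keeps only positive weights, whereas nnz() counts every non-zero weight
# (including the intercept and any negative weights), so the two can disagree.
non_zero_weight_best = model_best["coefficients"][model_best["coefficients"]["value"] > 0]
print model_best["coefficients"]["value"].nnz()
non_zero_weight_best.print_rows(num_rows=20)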

10
+--------------+-------+------------------+--------+
|     name     | index |      value       | stderr |
+--------------+-------+------------------+--------+
| (intercept)  |  None |  24.5968377813   |  None  |
|     star     |  None |  13.3207564062   |  None  |
| star_squared |  None |  4.69111802351   |  None  |
|  size_sqrt   |  None |  2.67426931867   |  None  |
| rates_rating |  None | 0.00309284743847 |  None  |
|   zipcode    |  None | 0.00241751139928 |  None  |
|    rating    |  None |  8.48700643455   |  None  |
|     size     |  None |  0.19607286394   |  None  |
|    guests    |  None |   9.4605375092   |  None  |
+--------------+-------+------------------+--------+
[9 rows x 4 columns]



# Limit the number of nonzero weights


What if we absolutely wanted to limit ourselves to, say, 5 features? This may be important if we want to derive a "rule of thumb": an interpretable model that has only a few features in it.

In this section, we are going to implement a simple, two-phase procedure to achieve this goal:

1. Explore a large range of l1_penalty values to find a narrow region of l1_penalty values where models are likely to have the desired number of non-zero weights.
2. Further explore the narrow region we found to find a good value for l1_penalty that achieves the desired sparsity. Here, we will again use a validation set to choose the best value for l1_penalty.


In [124]:
max_nonzeros = 6  # nnz() also counts the intercept, so this allows 5 features plus the intercept

# Exploring the larger range of values to find a narrow range with the desired sparsity

Let's define a wide range of possible l1_penalty_values:


In [125]:
l1_penalty_values = np.logspace(5, 7, num=20)

Now, implement a loop that searches through this space of possible l1_penalty values:

* For l1_penalty in np.logspace(5, 7, num=20):
    * Fit a regression model with a given l1_penalty on TRAIN data. Specify l1_penalty=l1_penalty and l2_penalty=0. in the parameter list. When calling linear_regression.create(), make sure to set validation_set = None.
    * Extract the weights of the model and count the number of nonzeros. Save the count for each l1_penalty (the code below stores it in a dictionary keyed by l1_penalty).
        * model['coefficients']['value'] gives an SArray with the parameters we have learned.


In [126]:
coef_dict ={}
for l1_penalty in l1_penalty_values:
    model = graphlab.linear_regression.create(training, target='price', features=all_features,
                                              validation_set=None, verbose=False,
                                              l2_penalty=0., l1_penalty=l1_penalty)
    # nnz() counts the nonzero coefficients, including the intercept
    coef_dict[l1_penalty] = model['coefficients']['value'].nnz()
    
pprint.pprint(coef_dict)

{100000.0: 9,
 127427.49857031347: 9,
 162377.67391887208: 8,
 206913.80811147901: 8,
 263665.08987303555: 8,
 335981.82862837811: 8,
 428133.23987193959: 8,
 545559.47811685142: 8,
 695192.79617756058: 8,
 885866.79041008325: 8,
 1128837.8916846884: 8,
 1438449.888287663: 7,
 1832980.7108324375: 6,
 2335721.4690901213: 5,
 2976351.441631319: 3,
 3792690.1907322537: 1,
 4832930.2385717519: 1,
 6158482.110660254: 1,
 7847599.7035146067: 1,
 10000000.0: 1}


In [127]:
l1_penalty_min = 1438449.888287663
l1_penalty_max = 2335721.4690901213
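
The two bounds above can also be read off programmatically. The sketch below is one plausible reading that reproduces the values chosen by hand: `l1_penalty_min` is the largest penalty that still leaves more than `max_nonzeros` nonzero weights, and `l1_penalty_max` is the smallest penalty that leaves fewer. It assumes the nonzero count does not increase as the penalty grows, and it reuses the counts printed above.

```python
max_nonzeros = 6

# Nonzero counts copied from the output of the wide search above.
coef_dict = {
    100000.0: 9, 127427.49857031347: 9, 162377.67391887208: 8,
    206913.80811147901: 8, 263665.08987303555: 8, 335981.82862837811: 8,
    428133.23987193959: 8, 545559.47811685142: 8, 695192.79617756058: 8,
    885866.79041008325: 8, 1128837.8916846884: 8, 1438449.888287663: 7,
    1832980.7108324375: 6, 2335721.4690901213: 5, 2976351.441631319: 3,
    3792690.1907322537: 1, 4832930.2385717519: 1, 6158482.110660254: 1,
    7847599.7035146067: 1, 10000000.0: 1,
}

# Penalties that are too weak (too many nonzeros) or too strong (too few).
too_dense = [p for p, n in coef_dict.items() if n > max_nonzeros]
too_sparse = [p for p, n in coef_dict.items() if n < max_nonzeros]

l1_penalty_min = max(too_dense)    # largest penalty that is still too dense
l1_penalty_max = min(too_sparse)   # smallest penalty that is already too sparse

print(l1_penalty_min, l1_penalty_max)
```

Any penalty that yields exactly `max_nonzeros` nonzero weights must then lie between these two bounds, which is why the narrow search is restricted to that interval.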

# Exploring the narrow range of values to find the solution with the right number of non-zeros that has lowest RSS on the validation set

We will now explore the narrow region of l1_penalty values we found:


In [128]:
l1_penalty_values = np.linspace(l1_penalty_min,l1_penalty_max,20)

* For l1_penalty in np.linspace(l1_penalty_min,l1_penalty_max,20):
    * Fit a regression model with a given l1_penalty on TRAIN data. Specify l1_penalty=l1_penalty and l2_penalty=0. in the parameter list. When calling linear_regression.create(), make sure to set validation_set = None.
    * Measure the RSS of the learned model on the VALIDATION set

Find the model that has the lowest RSS on the VALIDATION set and has sparsity equal to max_nonzeros.

In [129]:
validation_rss = {}
for l1_penalty in l1_penalty_values:
    model = graphlab.linear_regression.create(training,target='price', features=all_features,
                                             validation_set = None, verbose = False, l2_penalty= 0, l1_penalty=l1_penalty)
    predictions = model.predict(validation)
    residuals = predictions - validation['price']
    rss = sum(residuals**2)
    validation_rss[l1_penalty] = rss, model['coefficients']['value'].nnz()
    
validation_rss

{1438449.888287663: (261230622.88899678, 7),
 1485674.7083298976: (263804976.2219418, 7),
 1532899.5283721322: (265755287.60718963, 6),
 1580124.3484143671: (267305174.79435593, 6),
 1627349.1684566017: (268868705.9951554, 6),
 1674573.9884988363: (271290798.39221954, 6),
 1721798.8085410709: (274142245.4353427, 6),
 1769023.6285833055: (277033085.9048229, 6),
 1816248.4486255401: (279985624.8406279, 6),
 1863473.2686677747: (284277060.55391914, 6),
 1910698.0887100096: (288316477.7230976, 5),
 1957922.9087522442: (291243627.5094769, 5),
 2005147.7287944788: (294205169.7905289, 5),
 2052372.5488367134: (297201088.8883851, 5),
 2099597.368878948: (300278070.0863141, 5),
 2146822.1889211829: (303465152.03691715, 5),
 2194047.0089634173: (306699043.3245677, 5),
 2241271.8290056521: (310091561.12653154, 5),
 2288496.649047887: (314052305.5679448, 5),
 2335721.4690901213: (319262720.2384972, 5)}

In [130]:
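# Among models with exactly max_nonzeros nonzero weights, keep the one with the lowest validation RSS
bestRSS = float('inf')
bestl1 = None
for k, v in validation_rss.iteritems():
    rss, nnz = v
    if nnz == max_nonzeros and rss < bestRSS:
        bestRSS = rss
        bestl1 = k

print bestRSS, bestl1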

265755287.607 1532899.52837


Retrain the model with the best L1 value

In [131]:
model = graphlab.linear_regression.create(training,target='price',features= all_features,
                                         validation_set = None, verbose = False,
                                         l2_penalty=0., l1_penalty=1532899.52837)

In [132]:
non_zero_weight_test = model['coefficients'][model['coefficients']['value']>0]
non_zero_weight_test.print_rows(num_rows=20)
model['coefficients'].print_rows(num_rows=20)

+--------------+-------+----------------+--------+
|     name     | index |     value      | stderr |
+--------------+-------+----------------+--------+
| (intercept)  |  None | 153.531245227  |  None  |
|     star     |  None |  6.443666566   |  None  |
| star_squared |  None | 2.93345739643  |  None  |
|  size_sqrt   |  None | 1.29300746878  |  None  |
|    rating    |  None | 1.86944571357  |  None  |
|     size     |  None | 0.120075102605 |  None  |
+--------------+-------+----------------+--------+
[6 rows x 4 columns]

+--------------+-------+----------------+--------+
|     name     | index |     value      | stderr |
+--------------+-------+----------------+--------+
| (intercept)  |  None | 153.531245227  |  None  |
|     star     |  None |  6.443666566   |  None  |
| star_squared |  None | 2.93345739643  |  None  |
|  size_sqrt   |  None | 1.29300746878  |  None  |
| rates_rating |  None |      0.0       |  None  |
|   zipcode    |  None |      0.0       |  None  |
|    rati

As we can see, the selected features are the ones with non-zero weights, not counting the intercept: star, star_squared, size_sqrt, rating, and size.