# Exploration of Socioeconomic Influences on Cancer Mortality:
# Machine Learning with Unscaled Data

This notebook uses the cleaned DataFrame built from the "cancer-reg.csv" dataset (https://data.world/exercises/linear-regression-exercise-1) to build a series of regression models designed to identify the most salient predictors of cancer mortality at the county level for the year 2015 by looking at the coefficients of the best performing regression algorithm. The cleaned DataFrame contains features native to the "cancer-reg.csv" dataset as well as a series of derived features (as detailed in the Data Cleaning notebook).

The target feature of the model is continuous, so regression is the focus of this report. Regression models are fit below using Ordinary Least Squares (OLS) Regression, Ridge Regression, LASSO, ElasticNet, Stochastic Gradient Descent (SGD) Regressor, Kernel Ridge Regression, and Random Forest algorithms to try to predict cancer mortality rates.

These models are created not only to predict cancer mortality, but also to identify the most salient predictors of cancer mortality by looking at the coefficients of the best performing regression algorithm. By identifying these predictors, policy makers can use this study as a resource to guide public health policy as a component of the fight against cancer. Although these salient predictors cannot be identified as causes of cancer mortality, identifying predictive features can help in understanding the factors that contribute to cancer mortality. Random Forest is also used as a way to nonlinearly predict cancer mortality, but because the Random Forest method does not produce coefficients, it is not used to identify the most salient predictors of cancer mortality.

The best performing regression algorithm for the model is identified by evaluating the accuracy score (the R Squared value returned by each model's `score` method) and root mean squared error (RMSE) of a set of regression algorithms. Generally speaking, these regression algorithms are run on unscaled and scaled data, and utilize different values for the regularization hyperparameter 'alpha' (for all algorithms except simple OLS linear regression), the L1 ratio (for ElasticNet and SGD Regressor), the penalty (L1, L2, or ElasticNet for SGD Regressor), and the number of estimators (for Random Forest). The LASSO and ElasticNet algorithms use their internal normalization setting to scale the data, as they would not converge otherwise. The MinMax scaler is used to scale data for the other algorithms, but only in the next notebook, 'Cancer_ML_scaled.ipynb'. In this notebook, only unscaled data is used (except for the necessary normalization used in the LASSO and ElasticNet algorithms).

The best performing regression algorithm in terms of accuracy score and RMSE is then identified. These regression algorithms’ accuracy and RMSE scores are stored in a hyperparameter tuning table, which is displayed below.
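
As a rough illustration of how one row of such a tuning table could be assembled, the sketch below fits a model and collects its train and test R Squared and RMSE. The helper `score_model`, the column names, and the synthetic data from `make_regression` are assumptions made for this example; they are not the code used to build the spreadsheet loaded later in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def score_model(name, model, X_train, X_test, y_train, y_test):
    """Fit `model` and return one table row (hypothetical helper)."""
    model.fit(X_train, y_train)
    row = {"Model": name}
    for label, X_part, y_part in (("Train", X_train, y_train),
                                  ("Test", X_test, y_test)):
        pred = model.predict(X_part)
        # For regressors, .score() returns R^2 (the "accuracy score" here).
        row[label + "_R2"] = model.score(X_part, y_part)
        row[label + "_RMSE"] = np.sqrt(mean_squared_error(y_part, pred))
    return row


# Synthetic stand-in data; the notebook uses the cancer DataFrame instead.
X_demo, y_demo = make_regression(n_samples=300, n_features=10,
                                 noise=15.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.2, random_state=42)

rows = [
    score_model("OLS Linear Regression", LinearRegression(),
                X_tr, X_te, y_tr, y_te),
    score_model("Ridge (alpha=1)", Ridge(alpha=1.0),
                X_tr, X_te, y_tr, y_te),
]
tuning_table = pd.DataFrame(rows)
print(tuning_table)
```

Collecting the rows in a list and building the DataFrame once at the end keeps every model's metrics in the same column layout.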

In [2]:
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
from sklearn.preprocessing import MinMaxScaler
from sklearn import linear_model
from sklearn.linear_model import Ridge
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn.linear_model import SGDRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn import metrics
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from sklearn.metrics import mean_squared_error
import pickle

In [3]:
df = pd.read_csv('cancer_ml6_ml.csv', index_col=['Geography'])

In [4]:
df.head()

Geography,TARGET_deathRate,avgAnnCount,incidenceRate,medIncome,popEst2015,povertyPercent,studyPerCap,MedianAge,MedianAgeMale,MedianAgeFemale,...,city_min_distsl1_sqrd,sc_min_dists_l1_log,PCT_LACCESS_CHILD10_sqrd,PCT_LACCESS_HHNV10_sqrd,PC_DIRSALES07_sqrd,FMRKT13_sqrd,PCH_FMRKT_09_13_sqrd,PCT_OBESE_ADULTS13_log,PCT_OBESE_ADULTS13_sqrd,CHILDPOVRATE10_log
"Abbeville County, South Carolina",183.7,143.0,430.9,35525,24932,21.4,0.0,43.3,40.7,44.9,...,5.827314,-0.674641,49.425404,36.601854,13.9129,4,0.0,3.456317,1004.89,3.280911
"Acadia Parish, Louisiana",230.5,323.0,492.7,40269,62577,22.0,0.0,35.7,34.7,37.2,...,47.212922,-1.386678,0.243122,3.229274,58.9824,0,0.0,3.499533,1095.61,3.387774
"Accomack County, Virginia",216.2,221.0,479.4,38390,32973,19.4,0.0,45.3,42.7,47.3,...,22.077434,0.153911,0.516719,59.388869,2.9584,4,10000.0,3.303217,739.84,3.356897
"Ada County, Idaho",151.6,1757.0,469.0,57908,434211,11.6,414.545002,35.8,35.0,36.6,...,104.907721,-0.244491,24.459579,0.336674,5.2441,100,123.45679,3.387774,876.16,2.778819
"Adair County, Iowa",178.9,51.0,440.7,48216,7228,10.3,138.350858,45.9,45.0,47.7,...,54.729457,-0.522917,3.281391,3.52072,48.3025,4,0.0,3.443618,979.69,2.646175


In [5]:
df.columns

Index(['TARGET_deathRate', 'avgAnnCount', 'incidenceRate', 'medIncome',
       'popEst2015', 'povertyPercent', 'studyPerCap', 'MedianAge',
       'MedianAgeMale', 'MedianAgeFemale',
       ...
       'city_min_distsl1_sqrd', 'sc_min_dists_l1_log',
       'PCT_LACCESS_CHILD10_sqrd', 'PCT_LACCESS_HHNV10_sqrd',
       'PC_DIRSALES07_sqrd', 'FMRKT13_sqrd', 'PCH_FMRKT_09_13_sqrd',
       'PCT_OBESE_ADULTS13_log', 'PCT_OBESE_ADULTS13_sqrd',
       'CHILDPOVRATE10_log'],
      dtype='object', length=329)

In [6]:
len(df.index.unique())

3047

In [7]:
df.shape

(3047, 329)

In [8]:
df.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Index: 3047 entries, Abbeville County, South Carolina to Zavala County, Texas
Data columns (total 329 columns):
TARGET_deathRate                  float64
avgAnnCount                       float64
incidenceRate                     float64
medIncome                         int64
popEst2015                        int64
povertyPercent                    float64
studyPerCap                       float64
MedianAge                         float64
MedianAgeMale                     float64
MedianAgeFemale                   float64
AvgHouseholdSize                  float64
PercentMarried                    float64
PctNoHS18_24                      float64
PctHS18_24                        float64
PctSomeCol18_24                   float64
PctBachDeg18_24                   float64
PctHS25_Over                      float64
PctBachDeg25_Over                 float64
PctEmployed16_Over                float64
PctUnemployed16_Over              float64
PctPrivateCove

In order for all features in the DataFrame to be compatible with all types of machine learning model algorithms, the set of all Boolean features in the DataFrame is first converted to binary integer features.

In [9]:
boolean_cols = ['PctSomeCol18_24_isnull', 'PctEmployed16_Over_isnull', 'PctPrivateCoverageAlone_isnull', 
               'age_gt_100', 'household_lt_1', 'PCT_LACCESS_POP10_isnull', 'PCT_LACCESS_LOWI10_isnull', 
               'PCT_LACCESS_CHILD10_isnull', 'PCT_LACCESS_SENIORS10_isnull', 'PCT_LACCESS_HHNV10_isnull', 
               'FOODINSEC_00_02_isnull', 'FOODINSEC_07_09_isnull', 'FOODINSEC_10_12_isnull', 
               'CH_FOODINSEC_02_12_isnull', 'CH_FOODINSEC_09_12_isnull', 'VLFOODSEC_00_02_isnull', 
               'VLFOODSEC_07_09_isnull', 'VLFOODSEC_10_12_isnull', 'CH_VLFOODSEC_02_12_isnull', 
               'CH_VLFOODSEC_09_12_isnull', 'FOODINSEC_CHILD_01_07_isnull', 'FOODINSEC_CHILD_03_11_isnull', 
               'PCT_LOCLFARM07_isnull', 'PCT_LOCLSALE07_isnull', 'PC_DIRSALES07_isnull', 'FMRKT09_isnull', 
               'FMRKT13_isnull', 'PCH_FMRKT_09_13_isnull', 'FMRKTPTH09_isnull', 'FMRKTPTH13_isnull', 
               'PCH_FMRKTPTH_09_13_isnull', 'PCT_FMRKT_SNAP13_isnull', 'PCT_FMRKT_WIC13_isnull', 
               'PCT_FMRKT_WICCASH13_isnull', 'PCT_FMRKT_SFMNP13_isnull', 'PCT_FRMKT_FRVEG13_isnull', 
               'PCT_FRMKT_ANMLPROD13_isnull', 'PCT_FMRKT_OTHER13_isnull', 'VEG_FARMS07_isnull', 
               'VEG_ACRES07_isnull', 'VEG_ACRESPTH07_isnull', 'FRESHVEG_FARMS07_isnull', 'FRESHVEG_ACRES07_isnull', 
               'FRESHVEG_ACRESPTH07_isnull', 'ORCHARD_FARMS07_isnull', 'ORCHARD_ACRES07_isnull', 
               'ORCHARD_ACRESPTH07_isnull', 'BERRY_FARMS07_isnull', 'BERRY_ACRES07_isnull', 
               'BERRY_ACRESPTH07_isnull', 'SLHOUSE07_isnull', 'GHVEG_FARMS07_isnull', 'GHVEG_SQFT07_isnull', 
               'GHVEG_SQFTPTH07_isnull', 'FOODHUB12_isnull', 'CSA07_isnull', 'AGRITRSM_OPS07_isnull', 
               'AGRITRSM_RCT07_isnull', 'FARM_TO_SCHOOL_isnull', 'PCT_OBESE_CHILD08_isnull', 
               'PCT_OBESE_CHILD11_isnull', 'PCH_OBESE_CHILD_08_11_isnull', 'PCT_HSPA09_isnull', 
               'PCH_RECFAC_07_12_isnull', 'PCH_RECFACPTH_07_12_isnull', 'NATAMEN_isnull']

In [10]:
for col in boolean_cols:
    df[col] = df[col].astype(int)
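
The conversion above lists every Boolean column by name. A more compact alternative is sketched below on a tiny made-up frame: it selects columns by dtype and converts them in one step. It assumes the flag columns are actually stored with the `bool` dtype; the frame `demo` and its values are illustrative only.

```python
import pandas as pd

# Tiny stand-in frame; two column names mirror the notebook's flag columns.
demo = pd.DataFrame({
    "PctSomeCol18_24_isnull": [True, False, True],
    "age_gt_100": [False, False, True],
    "medIncome": [35525, 40269, 38390],
})

# Pick up every bool-typed column instead of listing the names by hand.
bool_cols = demo.select_dtypes(include="bool").columns

# Convert all selected columns to 0/1 integers in one assignment.
demo[bool_cols] = demo[bool_cols].astype(int)
print(demo.dtypes)
```

Selecting by dtype avoids maintaining a long hand-written list, at the cost of also converting any Boolean column that was not meant to be a model feature.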

In [11]:
df.to_csv('cancer_ml7.csv')

In [12]:
del df

In [13]:
df = pd.read_csv('cancer_ml7.csv', index_col=['Geography'])

# Machine Learning with Unscaled Data

To find the best performing algorithm, unscaled data is tried first; scaled data is then used in the "Cancer_ML_scaled" notebook. Each model is evaluated with a single 80/20 hold-out train/test split rather than k-fold cross-validation.
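
For comparison, the sketch below shows what a k-fold evaluation could look like with `cross_val_score`. It is not part of the analysis in this notebook; the choice of 5 folds and the synthetic data from `make_regression` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data; the notebook would pass X and y instead.
X_demo, y_demo = make_regression(n_samples=300, n_features=10,
                                 noise=15.0, random_state=42)

# Five shuffled folds: each row is used for testing exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

r2_scores = cross_val_score(LinearRegression(), X_demo, y_demo,
                            cv=cv, scoring="r2")

# scikit-learn reports MSE as a negative score, so flip the sign first.
neg_mse = cross_val_score(LinearRegression(), X_demo, y_demo,
                          cv=cv, scoring="neg_mean_squared_error")
rmse_scores = np.sqrt(-neg_mse)

print("Mean R^2: {}".format(r2_scores.mean()))
print("Mean RMSE: {}".format(rmse_scores.mean()))
```

Averaging over folds gives a less split-dependent estimate than a single hold-out set, at the cost of fitting each model several times.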

The hyperparameter tuning table that records the train and test accuracy scores and Root Mean Squared Error (RMSE) of every model is loaded below for reference.

In [19]:
hp_tuning_table_unscaled = pd.read_excel('HP tuning table - 2nd Capstone_Loew.xlsx', 
                                         sheet_name = 'HP_Tuning_Unscaled')
hp_tuning_table_unscaled

Unnamed: 0,Individual Algorithm Summary,LR_#,Train_Accuracy_Score,Train_RMSE,Test_Accuracy_Score,Test_RMSE,Model,Unscaled/Scaled,Non-normalized/Normalized,Alpha,Solver,Penalty,L1 Ratio (for ElasticNet),epsilon (for SGD Regressor),learning_rate (for SGD Regressor),eta0 (for SGD Regressor),power_t (for SGD Regressor),estimators (for Random Forest)
0,"Unscaled, OLS Linear Regression, No Train-Test...",1,0.652889,16.34743,,,OLS Linear Regression,,,,,,,,,,,
1,"Unscaled, OLS Linear Regression",2,0.646626,16.58783,0.640625,16.19956,OLS Linear Regression,Unscaled,Non-normalized,,,,,,,,,
2,"Unscaled, Ridge Regression, Alpha 0.001, auto ...",3,0.646537,16.58992,0.640764,16.19642,Ridge Regression,Unscaled,Non-normalized,0.001,Auto,,,,,,,
3,"Unscaled, Ridge Regression, Alpha 0.01, auto s...",4,0.646118,16.59976,0.639803,16.21808,Ridge Regression,Unscaled,Non-normalized,0.01,Auto,,,,,,,
4,"Unscaled, Ridge Regression, Alpha 0.1, auto so...",5,0.644564,16.63615,0.634241,16.34281,Ridge Regression,Unscaled,Non-normalized,0.1,Auto,,,,,,,
5,"Unscaled, Ridge Regression, Alpha 1, auto solver",6,0.638631,16.77443,0.621876,16.61678,Ridge Regression,Unscaled,Non-normalized,1.0,Auto,,,,,,,
6,"Unscaled, Ridge Regression, Alpha 10, auto solver",7,0.631687,16.93482,0.613007,16.81052,Ridge Regression,Unscaled,Non-normalized,10.0,Auto,,,,,,,
7,"Unscaled, Ridge Regression, Alpha 100, auto so...",8,0.620851,17.18215,0.607718,16.92499,Ridge Regression,Unscaled,Non-normalized,100.0,Auto,,,,,,,
8,"Normalized, LASSO, Alpha 0.001",9,0.620849,17.18219,0.599832,17.09427,LASSO,Unscaled,Normalized,0.001,,,,,,,,
9,"Normalized, LASSO, Alpha 0.01",10,0.580685,18.06936,0.594475,17.20831,LASSO,Unscaled,Normalized,0.01,,,,,,,,


## Linear Regression: Basic OLS with no Hyperparameter Tuning or Train-Test Split

The target variable is set as 'TARGET_deathRate', the cancer mortality rate per 100,000 people.

In [13]:
y = df['TARGET_deathRate']

The predictive feature set X is defined as the rest of the columns in the DataFrame.

In [14]:
target_name = ['TARGET_deathRate']
X = df[[cn for cn in df.columns if cn not in target_name]]

As a starting point, a simple Linear Regression algorithm is run with no scaling, no hyperparameter tuning and no train/test split.

In [15]:
lr_1 = linear_model.LinearRegression()
lr_1

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

In [16]:
lr_1.fit(X, y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

In [17]:
lr_1.score(X, y)

0.6528892461052225

In [18]:
y_pred_1 = lr_1.predict(X)
y_pred_1[0:20]

array([188.49306894, 204.21317534, 208.04168369, 144.35277673,
       172.08713334, 208.03640583, 177.36422249, 211.13721024,
       157.36076519, 164.04873534, 178.13226571, 188.25829047,
       180.1145355 , 209.17514372, 178.23192609, 154.91846179,
       222.78217372, 166.29058527, 148.85989663, 178.30619196])

In [19]:
print("R^2: {}".format(lr_1.score(X, y)))
rmse = np.sqrt(mean_squared_error(y, y_pred_1))
print("Root Mean Squared Error: {}".format(rmse))

R^2: 0.6528892461052225
Root Mean Squared Error: 16.34742660293694
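
To make the two metrics concrete, the sketch below recomputes them by hand on synthetic data and compares them with the scikit-learn values. R Squared is computed as one minus the ratio of the residual sum of squares to the total sum of squares, which is why it can become negative when a model predicts worse than the mean of the target. The data and variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in data; the notebook uses X and y from the DataFrame.
X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=0)
model = LinearRegression().fit(X_demo, y_demo)
pred = model.predict(X_demo)

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_demo - pred) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# RMSE = square root of the mean squared residual
rmse_manual = np.sqrt(np.mean((y_demo - pred) ** 2))

print("R^2 (manual vs. score): {} vs. {}".format(
    r2_manual, model.score(X_demo, y_demo)))
print("RMSE (manual vs. sklearn): {} vs. {}".format(
    rmse_manual, np.sqrt(mean_squared_error(y_demo, pred))))
```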


In [20]:
filename = 'cancer_lr_1.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_1, f)

## Linear Regression: Basic OLS with no Normalization and 80/20 Train-Test Split

In [21]:
lr_2 = linear_model.LinearRegression()
lr_2

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

Next, a train/test split is created and the OLS linear regression algorithm is re-run.

In [22]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

The algorithm is fitted on the training set, and the accuracy is returned on the training and test sets.

In [23]:
lr_2.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

In [24]:
y_pred_2_train = lr_2.predict(X_train)
y_pred_2_train[0:20]

array([193.43076022, 190.79549479, 216.14766197, 139.15685695,
       225.40659532, 196.12564798, 173.12573478, 170.47723738,
       196.51814602, 230.39423481, 197.22011038, 175.54912298,
       202.47658953, 187.68025628, 154.69078527, 188.18074488,
       201.16786053, 156.98379718, 174.44087904, 179.39307531])

In [25]:
print("Training Set R^2: {}".format(lr_2.score(X_train, y_train)))
rmse_2_train = np.sqrt(mean_squared_error(y_train, y_pred_2_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_2_train))

Training Set R^2: 0.6466260171457876
Training Set Root Mean Squared Error: 16.587832150231034


A training set accuracy score/R Squared of 0.647 and a training set RMSE of 16.6 is returned.

In [26]:
y_pred_2_test = lr_2.predict(X_test)
y_pred_2_test[0:20]

array([177.32331018, 174.71390586, 162.20675843, 175.78673799,
       178.88472736, 195.88785672, 173.28360857, 164.24285469,
       175.00771172, 171.91171712, 177.00965019, 206.91234961,
       158.15670143, 157.54265212, 219.88087446, 108.69274019,
       188.3489999 , 205.9336845 , 208.31387718, 183.87287741])

In [27]:
print("Test Set R^2: {}".format(lr_2.score(X_test, y_test)))
rmse_2_test = np.sqrt(mean_squared_error(y_test, y_pred_2_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_2_test))

Test Set R^2: 0.6406253865183126
Test Set Root Mean Squared Error: 16.199560036310984


A test set accuracy score/R Squared of 0.641 and a test set RMSE of 16.2 are returned.

In [28]:
filename = 'cancer_lr_2.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_2, f)

## Linear Regression: Basic OLS with Normalization and 80/20 Train-Test Split

In [29]:
lr_2n = linear_model.LinearRegression(normalize=True)
lr_2n

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=True)

The algorithm is fitted on the training set, and the accuracy is returned on the training and test sets.

In [30]:
lr_2n.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=True)

In [31]:
y_pred_2n_train = lr_2n.predict(X_train)
y_pred_2n_train[0:20]

array([193.5  , 191.25 , 214.5  , 137.   , 225.125, 195.125, 172.   ,
       170.875, 195.   , 228.625, 197.25 , 172.875, 199.625, 189.25 ,
       154.5  , 188.625, 200.75 , 144.5  , 172.875, 178.375])

In [32]:
print("Training Set R^2: {}".format(lr_2n.score(X_train, y_train)))
rmse_2n_train = np.sqrt(mean_squared_error(y_train, y_pred_2n_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_2n_train))

Training Set R^2: 0.6411848638577181
Training Set Root Mean Squared Error: 16.715051687428854


A training set accuracy score/R Squared of 0.641 and a training set RMSE of 16.7 is returned.

In [33]:
y_pred_2n_test = lr_2n.predict(X_test)
y_pred_2n_test[0:20]

array([176.   , 173.25 , 158.   , 175.75 , 179.75 , 197.625, 173.125,
       162.375, 173.75 , 174.625, 176.75 , 206.75 , 158.75 , 156.625,
       220.   , 113.25 , 189.375, 208.125, 206.625, 183.125])

In [34]:
print("Test Set R^2: {}".format(lr_2n.score(X_test, y_test)))
rmse_2n_test = np.sqrt(mean_squared_error(y_test, y_pred_2n_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_2n_test))

Test Set R^2: -1.7688596633522706e+22
Test Set Root Mean Squared Error: 3593984596963.187


A test set accuracy score/R Squared of about -1.77e22 and a test set RMSE of about 3.59e12 are returned. Because these values show that the normalized model fails badly on the test set, the internal normalization parameter in OLS linear regression is abandoned.
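
In exact arithmetic, rescaling the features does not change the predictions of an unpenalized OLS fit, so a result like this more plausibly reflects numerical instability than a genuine effect of scaling. Note also that the `normalize` argument was deprecated in scikit-learn 1.0 and removed in 1.2. The sketch below shows one possible alternative, a `StandardScaler` inside a pipeline, on synthetic data. It is an assumption-laden illustration rather than an exact substitute: `normalize=True` divided each centered column by its L2 norm, while `StandardScaler` divides by the standard deviation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with columns on very different scales.
X_demo, y_demo = make_regression(n_samples=300, n_features=10,
                                 noise=15.0, random_state=42)
X_demo = X_demo * np.logspace(0, 4, 10)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.2, random_state=42)

# Plain OLS on the raw features.
plain = LinearRegression().fit(X_tr, y_tr)

# The scaler is fitted on the training data only, then reused on test data.
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X_tr, y_tr)

print("Test R^2, unscaled: {}".format(plain.score(X_te, y_te)))
print("Test R^2, scaled:   {}".format(scaled.score(X_te, y_te)))
```

On well-conditioned data like this, both fits give essentially the same test predictions.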

In [35]:
filename = 'cancer_lr_2n.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_2n, f)

## Ridge Regression

In [36]:
lr_3 = linear_model.Ridge(alpha=0.001)

The algorithm is fitted on the training set, and the accuracy is returned on the training and test sets.

In [37]:
lr_3.fit(X_train, y_train)

Ridge(alpha=0.001, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [38]:
y_pred_3_train = lr_3.predict(X_train)
y_pred_3_train[0:20]

array([193.6814711 , 190.81999526, 216.23272617, 139.13920381,
       225.16635654, 195.66524864, 172.79393977, 170.42103655,
       196.48290334, 230.26460023, 197.25887577, 175.61009965,
       202.61717549, 187.70993108, 155.03822506, 188.09679189,
       201.12734623, 156.97725576, 174.21388088, 179.27723478])

In [39]:
print("Training Set R^2: {}".format(lr_3.score(X_train, y_train)))
rmse_3_train = np.sqrt(mean_squared_error(y_train, y_pred_3_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_3_train))

Training Set R^2: 0.6465369736893312
Training Set Root Mean Squared Error: 16.58992192614389


A training set accuracy score/R Squared of 0.647 and a training set RMSE of 16.6 is returned.

In [40]:
y_pred_3_test = lr_3.predict(X_test)
y_pred_3_test[0:20]

array([177.35025576, 175.37003439, 162.28118045, 175.61565049,
       178.79949234, 195.82574314, 173.08925325, 164.09948603,
       174.86180399, 172.19905272, 177.03470199, 206.9132008 ,
       157.80007659, 157.37575802, 220.00185197, 108.9025308 ,
       188.29362207, 205.9232775 , 208.2225993 , 183.78457323])

In [41]:
print("Test Set R^2: {}".format(lr_3.score(X_test, y_test)))
rmse_3_test = np.sqrt(mean_squared_error(y_test, y_pred_3_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_3_test))

Test Set R^2: 0.6407644965711742
Test Set Root Mean Squared Error: 16.196424394992476


A test set accuracy score/R Squared of 0.641 and a test set RMSE of 16.2 are returned.

In [42]:
filename = 'cancer_lr_3.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_3, f)

In [43]:
lr_4 = linear_model.Ridge(alpha=0.01)

In [44]:
lr_4.fit(X_train, y_train)

Ridge(alpha=0.01, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [45]:
y_pred_4_train = lr_4.predict(X_train)
y_pred_4_train[0:20]

array([193.97421274, 190.803042  , 216.47744598, 139.18168145,
       224.86107479, 195.28616606, 172.60630417, 170.38031619,
       196.58773879, 229.98546095, 197.3855935 , 175.83052981,
       202.87079374, 187.91668978, 155.3324726 , 187.87667801,
       200.94054981, 157.19996637, 173.92273127, 179.02946537])

In [46]:
print("Training Set R^2: {}".format(lr_4.score(X_train, y_train)))
rmse_4_train = np.sqrt(mean_squared_error(y_train, y_pred_4_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_4_train))

Training Set R^2: 0.6461176562856638
Training Set Root Mean Squared Error: 16.599759420544274


A training set accuracy score/R Squared of 0.646 and a training set RMSE of 16.6 is returned.

In [47]:
y_pred_4_test = lr_4.predict(X_test)
y_pred_4_test[0:20]

array([177.2773751 , 175.84086138, 162.39717784, 175.15845469,
       178.70525044, 195.52244801, 173.33644092, 164.07576807,
       174.89145587, 172.53565528, 176.99916738, 206.93467414,
       157.3491704 , 157.17733013, 220.12558715, 109.72125026,
       188.33461587, 205.93000181, 208.1122704 , 183.61566339])

In [48]:
print("Test Set R^2: {}".format(lr_4.score(X_test, y_test)))
rmse_4_test = np.sqrt(mean_squared_error(y_test, y_pred_4_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_4_test))

Test Set R^2: 0.639803404401255
Test Set Root Mean Squared Error: 16.21807573374081


A test set accuracy score/R Squared of 0.640 and a test set RMSE of 16.2 are returned.

In [49]:
filename = 'cancer_lr_4.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_4, f)

In [50]:
lr_5 = linear_model.Ridge(alpha=0.1)

In [51]:
lr_5.fit(X_train, y_train)

Ridge(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [52]:
y_pred_5_train = lr_5.predict(X_train)
y_pred_5_train[0:20]

array([193.94205107, 190.34004958, 216.98964232, 139.65314151,
       224.57663064, 195.59982137, 173.39302848, 170.04832602,
       197.32928555, 229.38218098, 197.26445118, 176.06077122,
       203.27706487, 188.84950273, 155.12403909, 187.5381329 ,
       200.38345278, 158.04278352, 174.15385082, 177.3109295 ])

In [53]:
print("Training Set R^2: {}".format(lr_5.score(X_train, y_train)))
rmse_5_train = np.sqrt(mean_squared_error(y_train, y_pred_5_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_5_train))

Training Set R^2: 0.6445643776248309
Training Set Root Mean Squared Error: 16.636149793347826


A training set accuracy score/R Squared of 0.645 and a training set RMSE of 16.6 is returned.

In [54]:
y_pred_5_test = lr_5.predict(X_test)
y_pred_5_test[0:20]

array([177.1340568 , 175.00340605, 162.34699421, 173.79786528,
       178.48061032, 194.9497752 , 173.09927602, 164.42108045,
       175.15798394, 173.22456346, 176.72338181, 206.77115802,
       156.94780341, 157.19234886, 220.28363902, 113.02093142,
       188.87890876, 206.64952067, 208.33444128, 182.93581046])

In [55]:
print("Test Set R^2: {}".format(lr_5.score(X_test, y_test)))
rmse_5_test = np.sqrt(mean_squared_error(y_test, y_pred_5_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_5_test))

Test Set R^2: 0.6342414817237682
Test Set Root Mean Squared Error: 16.342810573526208


A test set accuracy score/R Squared of 0.634 and a test set RMSE of 16.3 are returned.

In [56]:
filename = 'cancer_lr_5.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_5, f)

In [57]:
lr_6 = linear_model.Ridge(alpha=1)

In [58]:
lr_6.fit(X_train, y_train)

Ridge(alpha=1, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [59]:
y_pred_6_train = lr_6.predict(X_train)
y_pred_6_train[0:20]

array([193.54968432, 189.2868701 , 217.23528763, 141.35479957,
       224.13278864, 196.1489831 , 174.80309007, 169.01708056,
       197.753776  , 228.33165561, 197.41040226, 176.30335466,
       203.98032825, 190.02798824, 153.41207884, 187.29487857,
       199.85069636, 158.40415461, 175.24353758, 174.09075559])

In [60]:
print("Training Set R^2: {}".format(lr_6.score(X_train, y_train)))
rmse_6_train = np.sqrt(mean_squared_error(y_train, y_pred_6_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_6_train))

Training Set R^2: 0.6386310911070309
Training Set Root Mean Squared Error: 16.77442872580821


A training set accuracy score/R Squared of 0.639 and a training set RMSE of 16.8 is returned.

In [61]:
y_pred_6_test = lr_6.predict(X_test)
y_pred_6_test[0:20]

array([178.28647312, 174.17563057, 162.27984659, 172.30645947,
       177.01143399, 194.83043133, 172.52843565, 164.25192969,
       174.27564908, 174.77172972, 176.31234174, 205.97603674,
       156.19165276, 157.61862641, 221.01656054, 118.96910462,
       190.12973789, 208.20548821, 209.28291182, 181.50484774])

In [62]:
print("Test Set R^2: {}".format(lr_6.score(X_test, y_test)))
rmse_6_test = np.sqrt(mean_squared_error(y_test, y_pred_6_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_6_test))

Test Set R^2: 0.6218756881258749
Test Set Root Mean Squared Error: 16.61677820375241


A test set accuracy score/R Squared of 0.622 and a test set RMSE of 16.6 are returned.

In [63]:
filename = 'cancer_lr_6.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_6, f)

In [64]:
lr_7 = linear_model.Ridge(alpha=10)

In [65]:
lr_7.fit(X_train, y_train)

Ridge(alpha=10, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [66]:
y_pred_7_train = lr_7.predict(X_train)
y_pred_7_train[0:20]

array([194.00905367, 189.12904693, 216.08855183, 142.97653181,
       222.53178716, 196.87411927, 175.60206817, 167.33288297,
       196.07545264, 224.37824004, 200.50080077, 176.89280777,
       204.22857049, 191.9179439 , 150.87817075, 187.30911358,
       200.36238998, 158.04910085, 174.37658874, 173.87917218])

In [67]:
print("Training Set R^2: {}".format(lr_7.score(X_train, y_train)))
rmse_7_train = np.sqrt(mean_squared_error(y_train, y_pred_7_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_7_train))

Training Set R^2: 0.6316873504680414
Training Set Root Mean Squared Error: 16.934823077496894


A training set accuracy score/R Squared of 0.632 and a training set RMSE of 16.9 is returned.

In [68]:
y_pred_7_test = lr_7.predict(X_test)
y_pred_7_test[0:20]

array([181.51286026, 174.1025006 , 162.20529353, 172.25515684,
       174.55663589, 195.29457999, 174.05345022, 164.98582171,
       173.29383253, 175.64665635, 177.42640548, 204.45822473,
       155.87860821, 157.57432716, 221.61193598, 123.83752829,
       192.35713452, 208.18793313, 208.7256466 , 181.32054179])

In [69]:
print("Test Set R^2: {}".format(lr_7.score(X_test, y_test)))
rmse_7_test = np.sqrt(mean_squared_error(y_test, y_pred_7_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_7_test))

Test Set R^2: 0.61300697698984
Test Set Root Mean Squared Error: 16.810517763043546


A test set accuracy score/R Squared of 0.613 and a test set RMSE of 16.8 are returned.

In [70]:
filename = 'cancer_lr_7.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_7, f)

In [71]:
lr_8 = linear_model.Ridge(alpha=100)

In [72]:
lr_8.fit(X_train, y_train)

Ridge(alpha=100, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [73]:
y_pred_8_train = lr_8.predict(X_train)
y_pred_8_train[0:20]

array([194.24892437, 191.60300774, 213.57133849, 144.54719904,
       219.0851009 , 196.35544434, 177.82030462, 164.37242971,
       192.63831902, 218.26143246, 206.01853551, 177.11785672,
       204.41301744, 193.61998569, 150.67888029, 187.32176655,
       202.20898951, 156.60314818, 170.73186896, 175.1930032 ])

In [74]:
print("Training Set R^2: {}".format(lr_8.score(X_train, y_train)))
rmse_8_train = np.sqrt(mean_squared_error(y_train, y_pred_8_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_8_train))

Training Set R^2: 0.6208507015781555
Training Set Root Mean Squared Error: 17.182148761147744


A training set accuracy score/R Squared of 0.621 and a training set RMSE of 17.2 is returned.

In [75]:
y_pred_8_test = lr_8.predict(X_test)
y_pred_8_test[0:20]

array([183.68141199, 173.6074642 , 162.40412632, 173.34668137,
       173.94359093, 196.84462872, 174.83216629, 167.23118828,
       172.329785  , 175.49194851, 181.82877993, 199.6096779 ,
       155.46650297, 157.40525247, 220.99979022, 126.20640218,
       193.99067576, 207.39410635, 204.84855803, 182.33624218])

In [76]:
print("Test Set R^2: {}".format(lr_8.score(X_test, y_test)))
rmse_8_test = np.sqrt(mean_squared_error(y_test, y_pred_8_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_8_test))

Test Set R^2: 0.6077184878002422
Test Set Root Mean Squared Error: 16.924990852305804


A test set accuracy score/R Squared of 0.608 and a test set RMSE of 16.9 are returned.

In [77]:
filename = 'cancer_lr_8.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_8, f)
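
The six Ridge fits above repeat the same steps with a different alpha each time. The sketch below shows how the same sweep could be written as a loop that collects the metrics into one table. It runs on synthetic data from `make_regression`, and the column names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the notebook would reuse its own train/test split.
X_demo, y_demo = make_regression(n_samples=300, n_features=10,
                                 noise=15.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.2, random_state=42)

alphas = [0.001, 0.01, 0.1, 1, 10, 100]  # the values tried above
records = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_tr, y_tr)
    records.append({
        "Alpha": alpha,
        "Train_R2": model.score(X_tr, y_tr),
        "Train_RMSE": np.sqrt(mean_squared_error(y_tr, model.predict(X_tr))),
        "Test_R2": model.score(X_te, y_te),
        "Test_RMSE": np.sqrt(mean_squared_error(y_te, model.predict(X_te))),
    })

ridge_results = pd.DataFrame(records)
print(ridge_results)
```

As in the results above, the training R Squared can only stay the same or fall as alpha grows, because a stronger penalty moves the fit further from the least squares solution.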

## LASSO

In these experiments, LASSO does not converge on this data without normalization. Although the MinMax scaler is tried in conjunction with LASSO later on, LASSO with its internal normalization (`normalize=True`) is tried first below.
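
Since the goal is to read salient predictors off the coefficients, the sketch below shows how a fitted LASSO model's non-zero coefficients could be ranked by magnitude. It uses a `StandardScaler` pipeline as a stand-in for internal normalization and synthetic data from `make_regression`. The feature names and the alpha value are assumptions, and alpha values under this scaling are not directly comparable to those used with `normalize=True`.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 20 features, only 5 of which carry signal.
X_demo, y_demo = make_regression(n_samples=300, n_features=20,
                                 n_informative=5, noise=10.0,
                                 random_state=42)
feature_names = ["feature_{}".format(i) for i in range(X_demo.shape[1])]

# Standardizing first puts all coefficients on a comparable scale.
lasso_pipe = make_pipeline(StandardScaler(), Lasso(alpha=1.0, max_iter=2000))
lasso_pipe.fit(X_demo, y_demo)

# make_pipeline names each step after its lowercased class name.
coefs = pd.Series(lasso_pipe.named_steps["lasso"].coef_, index=feature_names)

# The L1 penalty sets some coefficients exactly to zero; keep the rest.
kept = coefs[coefs != 0]
ranked = kept.abs().sort_values(ascending=False)
print(ranked.head())
```

Coefficients are only comparable across features when the features share a scale, which is one reason to standardize before ranking them.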

In [78]:
lr_9 = linear_model.Lasso(alpha=0.001, normalize=True, max_iter=2000)

In [79]:
lr_9.fit(X_train, y_train)

Lasso(alpha=0.001, copy_X=True, fit_intercept=True, max_iter=2000,
   normalize=True, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [80]:
y_pred_9_train = lr_9.predict(X_train)
y_pred_9_train[0:20]

array([195.39363953, 191.71837334, 210.65349601, 142.63213555,
       221.67547544, 195.06202143, 175.5714302 , 172.04906281,
       193.66268546, 220.96647609, 208.49938573, 180.31202623,
       202.90988258, 190.52271457, 151.86201164, 188.27804431,
       200.6001508 , 154.55684617, 171.15715976, 174.86999563])

In [81]:
print("Training Set R^2: {}".format(lr_9.score(X_train, y_train)))
rmse_9_train = np.sqrt(mean_squared_error(y_train, y_pred_9_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_9_train))

Training Set R^2: 0.6208488778202208
Training Set Root Mean Squared Error: 17.18219008529444


A training set accuracy score/R Squared of 0.621 and a training set RMSE of 17.2 is returned.

In [82]:
y_pred_9_test = lr_9.predict(X_test)
y_pred_9_test[0:20]

array([179.08404156, 176.19810179, 159.42083273, 175.09287334,
       177.27253781, 196.87050706, 173.28475525, 165.38649936,
       170.85577602, 171.49963144, 177.28394713, 203.97845259,
       159.40360776, 156.60794932, 221.34930729, 115.3806991 ,
       187.57933552, 205.12641528, 206.61495516, 183.56065352])

In [83]:
print("Test Set R^2: {}".format(lr_9.score(X_test, y_test)))
rmse_9_test = np.sqrt(mean_squared_error(y_test, y_pred_9_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_9_test))

Test Set R^2: 0.59983204015626
Test Set Root Mean Squared Error: 17.094274705428003


A test set accuracy score/R Squared of 0.600 and a test set RMSE of 17.1 are returned.

In [84]:
filename = 'cancer_lr_9.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_9, f)

In [85]:
lr_10 = linear_model.Lasso(alpha=0.01, normalize=True, max_iter=2000)

In [86]:
lr_10.fit(X_train, y_train)

Lasso(alpha=0.01, copy_X=True, fit_intercept=True, max_iter=2000,
   normalize=True, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [87]:
y_pred_10_train = lr_10.predict(X_train)
y_pred_10_train[0:20]

array([197.55932908, 192.88046139, 207.09423971, 149.80980275,
       213.76130427, 194.55987119, 178.23352096, 165.04493606,
       191.27094394, 217.68123005, 212.36956635, 180.28711277,
       201.31084792, 191.8949879 , 158.95197794, 183.74193488,
       197.89094535, 159.20241699, 169.39084201, 175.98987864])

In [88]:
print("Training Set R^2: {}".format(lr_10.score(X_train, y_train)))
rmse_10_train = np.sqrt(mean_squared_error(y_train, y_pred_10_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_10_train))

Training Set R^2: 0.5806847206343142
Training Set Root Mean Squared Error: 18.06935666540806


The model returns a training set R^2 of 0.581 and a training set RMSE of 18.1.

In [89]:
y_pred_10_test = lr_10.predict(X_test)
y_pred_10_test[0:20]

array([185.11889985, 176.30771271, 166.3214211 , 176.98537907,
       170.28123545, 200.57791348, 173.72497031, 172.01673999,
       172.08606202, 170.06650012, 183.86795521, 194.1600771 ,
       163.7421692 , 161.67062688, 220.46487863, 122.25124148,
       190.42182788, 199.79208002, 201.43295303, 183.92432696])

In [90]:
print("Test Set R^2: {}".format(lr_10.score(X_test, y_test)))
rmse_10_test = np.sqrt(mean_squared_error(y_test, y_pred_10_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_10_test))

Test Set R^2: 0.5944752553234316
Test Set Root Mean Squared Error: 17.20830924435254


The model returns a test set R^2 of 0.594 and a test set RMSE of 17.2.

In [91]:
filename = 'cancer_lr_10.sav'
pickle.dump(lr_10, open(filename, 'wb'))

In [92]:
lr_11 = linear_model.Lasso(alpha=0.1, normalize=True, max_iter=2000)

In [93]:
lr_11.fit(X_train, y_train)

Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=2000,
   normalize=True, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [94]:
y_pred_11_train = lr_11.predict(X_train)
y_pred_11_train[0:20]

array([194.94556298, 189.44173396, 193.59686353, 156.26174291,
       200.40753668, 190.08201545, 179.06111026, 168.16856974,
       180.27350044, 206.03461295, 195.45675894, 174.41740489,
       186.72209989, 190.72352434, 169.15777534, 179.7713397 ,
       189.4072948 , 162.37881562, 174.46865568, 179.84799674])

In [95]:
print("Training Set R^2: {}".format(lr_11.score(X_train, y_train)))
rmse_11_train = np.sqrt(mean_squared_error(y_train, y_pred_11_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_11_train))

Training Set R^2: 0.432862589079535
Training Set Root Mean Squared Error: 21.014375803801023


The model returns a training set R^2 of 0.433 and a training set RMSE of 21.0.

In [96]:
y_pred_11_test = lr_11.predict(X_test)
y_pred_11_test[0:20]

array([187.81731941, 176.3460354 , 171.92094489, 176.59754822,
       172.80994977, 191.12387984, 177.23686082, 181.80709606,
       172.2819364 , 177.99533327, 183.23656609, 185.27581794,
       174.67196715, 170.54334155, 201.03917248, 136.80641452,
       188.87892173, 191.15520665, 188.31907096, 183.94689769])

In [97]:
print("Test Set R^2: {}".format(lr_11.score(X_test, y_test)))
rmse_11_test = np.sqrt(mean_squared_error(y_test, y_pred_11_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_11_test))

Test Set R^2: 0.4554854293828936
Test Set Root Mean Squared Error: 19.940418782158286


The model returns a test set R^2 of 0.455 and a test set RMSE of 19.9.

In [98]:
filename = 'cancer_lr_11.sav'
pickle.dump(lr_11, open(filename, 'wb'))

In [99]:
lr_12 = linear_model.Lasso(alpha=1, normalize=True, max_iter=2000)

In [100]:
lr_12.fit(X_train, y_train)

Lasso(alpha=1, copy_X=True, fit_intercept=True, max_iter=2000, normalize=True,
   positive=False, precompute=False, random_state=None, selection='cyclic',
   tol=0.0001, warm_start=False)

In [101]:
y_pred_12_train = lr_12.predict(X_train)
y_pred_12_train[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [102]:
print("Training Set R^2: {}".format(lr_12.score(X_train, y_train)))
rmse_12_train = np.sqrt(mean_squared_error(y_train, y_pred_12_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_12_train))

Training Set R^2: 0.0
Training Set Root Mean Squared Error: 27.90437800503229


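The model returns a training set R^2 of 0 and a training set RMSE of 27.9. Every prediction equals the same constant (179.15), which indicates that the penalty at alpha=1 has shrunk all coefficients to zero and only the intercept remains.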

In [103]:
y_pred_12_test = lr_12.predict(X_test)
y_pred_12_test[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [104]:
print("Test Set R^2: {}".format(lr_12.score(X_test, y_test)))
rmse_12_test = np.sqrt(mean_squared_error(y_test, y_pred_12_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_12_test))

Test Set R^2: -0.00798750229846834
Test Set Root Mean Squared Error: 27.130456166458586


The model returns a test set R^2 of -0.008 and a test set RMSE of 27.1. The R^2 is slightly negative because the constant prediction is the training set mean rather than the test set mean.
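
The constant predictions above are what a Lasso model produces once the penalty is large enough to zero every coefficient. The sketch below reproduces that behavior on synthetic data (the arrays and the `alpha=1000.0` value are illustrative assumptions): the fitted intercept equals the training mean, the training R^2 is 0, and the test R^2 cannot exceed 0.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a target centered near 180.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 5))
y_demo = (180 + X_demo @ np.array([10.0, -5.0, 3.0, 0.0, 0.0])
          + rng.normal(scale=5.0, size=300))
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# A penalty far larger than needed to remove every feature from the model.
heavy = Lasso(alpha=1000.0).fit(X_tr, y_tr)

all_zero = bool(np.all(heavy.coef_ == 0))
r2_train = heavy.score(X_tr, y_tr)
r2_test = heavy.score(X_te, y_te)

print("all coefficients zero:", all_zero)
print("intercept:", heavy.intercept_, "training mean:", y_tr.mean())
print("train R^2:", r2_train, "test R^2:", r2_test)
```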

In [105]:
filename = 'cancer_lr_12.sav'
pickle.dump(lr_12, open(filename, 'wb'))

In [106]:
lr_13 = linear_model.Lasso(alpha=10, normalize=True, max_iter=2000)

In [107]:
lr_13.fit(X_train, y_train)

Lasso(alpha=10, copy_X=True, fit_intercept=True, max_iter=2000,
   normalize=True, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [108]:
y_pred_13_train = lr_13.predict(X_train)
y_pred_13_train[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [109]:
print("Training Set R^2: {}".format(lr_13.score(X_train, y_train)))
rmse_13_train = np.sqrt(mean_squared_error(y_train, y_pred_13_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_13_train))

Training Set R^2: 0.0
Training Set Root Mean Squared Error: 27.90437800503229


The model returns a training set R^2 of 0 and a training set RMSE of 27.9.

In [110]:
y_pred_13_test = lr_13.predict(X_test)
y_pred_13_test[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [111]:
print("Test Set R^2: {}".format(lr_13.score(X_test, y_test)))
rmse_13_test = np.sqrt(mean_squared_error(y_test, y_pred_13_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_13_test))

Test Set R^2: -0.00798750229846834
Test Set Root Mean Squared Error: 27.130456166458586


The model returns a test set R^2 of -0.008 and a test set RMSE of 27.1.

In [112]:
filename = 'cancer_lr_13.sav'
pickle.dump(lr_13, open(filename, 'wb'))

In [113]:
lr_14 = linear_model.Lasso(alpha=100, normalize=True, max_iter=2000)

In [114]:
lr_14.fit(X_train, y_train)

Lasso(alpha=100, copy_X=True, fit_intercept=True, max_iter=2000,
   normalize=True, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [115]:
y_pred_14_train = lr_14.predict(X_train)
y_pred_14_train[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [116]:
print("Training Set R^2: {}".format(lr_14.score(X_train, y_train)))
rmse_14_train = np.sqrt(mean_squared_error(y_train, y_pred_14_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_14_train))

Training Set R^2: 0.0
Training Set Root Mean Squared Error: 27.90437800503229


The model returns a training set R^2 of 0 and a training set RMSE of 27.9.

In [117]:
y_pred_14_test = lr_14.predict(X_test)
y_pred_14_test[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [118]:
print("Test Set R^2: {}".format(lr_14.score(X_test, y_test)))
rmse_14_test = np.sqrt(mean_squared_error(y_test, y_pred_14_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_14_test))

Test Set R^2: -0.00798750229846834
Test Set Root Mean Squared Error: 27.130456166458586


The model returns a test set R^2 of -0.008 and a test set RMSE of 27.1.

In [119]:
filename = 'cancer_lr_14.sav'
pickle.dump(lr_14, open(filename, 'wb'))
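
The LASSO cells above repeat the same fit, predict, and score steps for each value of alpha. A compact way to run such a sweep is sketched below. It assumes synthetic data in place of `X_train` and `X_test`, and it omits `normalize=True`, so the numbers it prints are not comparable to the results reported in this notebook.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; two of the six features carry no signal.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(400, 6))
y_demo = (180 + X_demo @ np.array([8.0, -6.0, 4.0, 2.0, 0.0, 0.0])
          + rng.normal(scale=5.0, size=400))
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# One row per alpha: (alpha, train R^2, test R^2, test RMSE)
results = []
for alpha in [0.001, 0.01, 0.1, 1, 10, 100]:
    model = Lasso(alpha=alpha, max_iter=2000).fit(X_tr, y_tr)
    rmse_te = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
    results.append((alpha, model.score(X_tr, y_tr), model.score(X_te, y_te), rmse_te))

for alpha, r2_tr, r2_te, rmse_te in results:
    print(f"alpha={alpha:<7} train R^2={r2_tr:.3f} "
          f"test R^2={r2_te:.3f} test RMSE={rmse_te:.1f}")
```

Collecting the scores in one list makes it easier to compare alphas side by side than reading them from separate cells.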

## Elastic Net with L1 Ratio of 0.25
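
For reference, the objective that scikit-learn's `ElasticNet` minimizes can be written as below, where $n$ is the number of samples, $w$ the coefficient vector, $\alpha$ the `alpha` hyperparameter, and $\rho$ the `l1_ratio`. This is the form given in the library's documentation; with `normalize=True` it applies to the normalized features.

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  \;+\; \alpha\,\rho\,\lVert w \rVert_1
  \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
% With \rho = 0.25 the L1 term is weighted by 0.25\,\alpha
% and the squared L2 term by 0.375\,\alpha.
```

Setting $\rho = 1$ recovers the LASSO penalty, and lowering $\rho$ shifts weight from the L1 term toward the ridge-style L2 term.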

In [120]:
lr_15 = linear_model.ElasticNet(alpha=0.001, l1_ratio=0.25, normalize=True)

In [121]:
lr_15.fit(X_train, y_train)

ElasticNet(alpha=0.001, copy_X=True, fit_intercept=True, l1_ratio=0.25,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [122]:
y_pred_15_train = lr_15.predict(X_train)
y_pred_15_train[0:20]

array([202.30629423, 192.19170755, 205.94711832, 151.18398882,
       203.48712571, 189.92578695, 171.24495903, 162.92371268,
       185.24402148, 212.92837923, 205.52227905, 178.01225955,
       198.53637795, 196.65089689, 158.14568024, 180.3461094 ,
       200.17554069, 158.06131007, 164.16481966, 169.2741957 ])

In [123]:
print("Training Set R^2: {}".format(lr_15.score(X_train, y_train)))
rmse_15_train = np.sqrt(mean_squared_error(y_train, y_pred_15_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_15_train))

Training Set R^2: 0.5319134290706964
Training Set Root Mean Squared Error: 19.091297147447065


The model returns a training set R^2 of 0.532 and a training set RMSE of 19.1.

In [124]:
y_pred_15_test = lr_15.predict(X_test)
y_pred_15_test[0:20]

array([170.95437434, 176.36094391, 164.45840771, 175.63669807,
       178.08345981, 193.98510616, 175.32038492, 167.86477987,
       165.50403982, 174.36013448, 181.79767063, 203.00044749,
       162.33978454, 161.07682502, 213.08867172, 127.14256212,
       189.16329271, 201.59158518, 195.99667923, 187.37673869])

In [125]:
print("Test Set R^2: {}".format(lr_15.score(X_test, y_test)))
rmse_15_test = np.sqrt(mean_squared_error(y_test, y_pred_15_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_15_test))

Test Set R^2: 0.4919015523501885
Test Set Root Mean Squared Error: 19.262092152749762


The model returns a test set R^2 of 0.492 and a test set RMSE of 19.3.

In [126]:
filename = 'cancer_lr_15.sav'
pickle.dump(lr_15, open(filename, 'wb'))

In [127]:
lr_16 = linear_model.ElasticNet(alpha=0.01, l1_ratio=0.25, normalize=True)

In [128]:
lr_16.fit(X_train, y_train)

ElasticNet(alpha=0.01, copy_X=True, fit_intercept=True, l1_ratio=0.25,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [129]:
y_pred_16_train = lr_16.predict(X_train)
y_pred_16_train[0:20]

array([194.75269535, 189.36140402, 195.66034545, 165.37308346,
       191.59754022, 185.74419728, 167.31694446, 164.29500225,
       182.57311026, 199.15422408, 195.14467332, 175.20695711,
       189.13632826, 193.10254262, 167.21705072, 179.56349922,
       192.6770344 , 168.61551795, 167.80267808, 171.35981213])

In [130]:
print("Training Set R^2: {}".format(lr_16.score(X_train, y_train)))
rmse_16_train = np.sqrt(mean_squared_error(y_train, y_pred_16_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_16_train))

Training Set R^2: 0.3566989500543001
Training Set Root Mean Squared Error: 22.380999449452453


The model returns a training set R^2 of 0.357 and a training set RMSE of 22.4.

In [131]:
y_pred_16_test = lr_16.predict(X_test)
y_pred_16_test[0:20]

array([170.76237575, 177.20255587, 171.01026219, 177.40422787,
       179.15725048, 188.30935096, 178.11517384, 173.8346168 ,
       168.40437921, 176.21044743, 183.77945311, 194.91755311,
       170.63556204, 169.14381897, 195.87889423, 151.55066664,
       187.90977896, 191.60971668, 188.02412752, 187.55979514])

In [132]:
print("Test Set R^2: {}".format(lr_16.score(X_test, y_test)))
rmse_16_test = np.sqrt(mean_squared_error(y_test, y_pred_16_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_16_test))

Test Set R^2: 0.30787577693585433
Test Set Root Mean Squared Error: 22.481306505506556


The model returns a test set R^2 of 0.308 and a test set RMSE of 22.5.

In [133]:
filename = 'cancer_lr_16.sav'
pickle.dump(lr_16, open(filename, 'wb'))

In [134]:
lr_17 = linear_model.ElasticNet(alpha=0.1, l1_ratio=0.25, normalize=True)

In [135]:
lr_17.fit(X_train, y_train)

ElasticNet(alpha=0.1, copy_X=True, fit_intercept=True, l1_ratio=0.25,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [136]:
y_pred_17_train = lr_17.predict(X_train)
y_pred_17_train[0:20]

array([182.20874739, 181.29585541, 182.44779682, 176.41242996,
       181.48543926, 180.64598001, 175.94898091, 175.88724648,
       179.97822639, 182.92460164, 182.46313674, 178.4095897 ,
       181.06515388, 182.03931785, 176.99659611, 179.05297742,
       181.9172123 , 177.57619387, 176.90146549, 177.56957951])

In [137]:
print("Training Set R^2: {}".format(lr_17.score(X_train, y_train)))
rmse_17_train = np.sqrt(mean_squared_error(y_train, y_pred_17_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_17_train))

Training Set R^2: 0.09029427884975072
Training Set Root Mean Squared Error: 26.614775638471766


The model returns a training set R^2 of 0.090 and a training set RMSE of 26.6.

In [138]:
y_pred_17_test = lr_17.predict(X_test)
y_pred_17_test[0:20]

array([176.81419944, 178.87146455, 177.56952794, 178.71108992,
       178.94841071, 181.06972995, 179.28864234, 178.03382554,
       176.96131869, 178.53663968, 180.39169398, 182.19726188,
       177.57181621, 177.22082932, 182.37613486, 173.81807071,
       181.0517028 , 181.35501076, 180.69809804, 181.11019085])

In [139]:
print("Test Set R^2: {}".format(lr_17.score(X_test, y_test)))
rmse_17_test = np.sqrt(mean_squared_error(y_test, y_pred_17_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_17_test))

Test Set R^2: 0.072535709130781
Test Set Root Mean Squared Error: 26.02424392670071


The model returns a test set R^2 of 0.073 and a test set RMSE of 26.0.

In [140]:
filename = 'cancer_lr_17.sav'
pickle.dump(lr_17, open(filename, 'wb'))

In [141]:
lr_18 = linear_model.ElasticNet(alpha=1, l1_ratio=0.25, normalize=True)

In [142]:
lr_18.fit(X_train, y_train)

ElasticNet(alpha=1, copy_X=True, fit_intercept=True, l1_ratio=0.25,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [143]:
y_pred_18_train = lr_18.predict(X_train)
y_pred_18_train[0:20]

array([179.16101939, 179.15033658, 179.15525323, 179.13489048,
       179.15423898, 179.15384802, 179.14574887, 179.1404956 ,
       179.14803531, 179.16014358, 179.15350573, 179.14349084,
       179.15022152, 179.15393369, 179.14151603, 179.14927245,
       179.1517171 , 179.13309887, 179.1432941 , 179.14635068])

In [144]:
print("Training Set R^2: {}".format(lr_18.score(X_train, y_train)))
rmse_18_train = np.sqrt(mean_squared_error(y_train, y_pred_18_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_18_train))

Training Set R^2: 0.00028779474453977105
Training Set Root Mean Squared Error: 27.900362349420604


The model returns a training set R^2 of 0.0003 and a training set RMSE of 27.9. The predictions vary only in the second decimal place, so the coefficients are close to, but not exactly, zero.

In [145]:
y_pred_18_test = lr_18.predict(X_test)
y_pred_18_test[0:20]

array([179.1480978 , 179.14316307, 179.1467997 , 179.14705374,
       179.1476209 , 179.15174564, 179.14472912, 179.15098806,
       179.1397567 , 179.14746986, 179.14925033, 179.15845061,
       179.14442068, 179.14286662, 179.15587489, 179.1244432 ,
       179.1518528 , 179.15487349, 179.14990126, 179.15109355])

In [146]:
print("Test Set R^2: {}".format(lr_18.score(X_test, y_test)))
rmse_18_test = np.sqrt(mean_squared_error(y_test, y_pred_18_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_18_test))

Test Set R^2: -0.007738882107694689
Test Set Root Mean Squared Error: 27.1271100956251


The model returns a test set R^2 of -0.008 and a test set RMSE of 27.1.

In [147]:
filename = 'cancer_lr_18.sav'
pickle.dump(lr_18, open(filename, 'wb'))

In [148]:
lr_19 = linear_model.ElasticNet(alpha=10, l1_ratio=0.25, normalize=True)

In [149]:
lr_19.fit(X_train, y_train)

ElasticNet(alpha=10, copy_X=True, fit_intercept=True, l1_ratio=0.25,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [150]:
y_pred_19_train = lr_19.predict(X_train)
y_pred_19_train[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [151]:
print("Training Set R^2: {}".format(lr_19.score(X_train, y_train)))
rmse_19_train = np.sqrt(mean_squared_error(y_train, y_pred_19_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_19_train))

Training Set R^2: 0.0
Training Set Root Mean Squared Error: 27.90437800503229


The model returns a training set R^2 of 0 and a training set RMSE of 27.9.

In [152]:
y_pred_19_test = lr_19.predict(X_test)
y_pred_19_test[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [153]:
print("Test Set R^2: {}".format(lr_19.score(X_test, y_test)))
rmse_19_test = np.sqrt(mean_squared_error(y_test, y_pred_19_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_19_test))

Test Set R^2: -0.00798750229846834
Test Set Root Mean Squared Error: 27.130456166458586


The model returns a test set R^2 of -0.008 and a test set RMSE of 27.1.

In [154]:
filename = 'cancer_lr_19.sav'
pickle.dump(lr_19, open(filename, 'wb'))

In [155]:
lr_20 = linear_model.ElasticNet(alpha=100, l1_ratio=0.25, normalize=True)

In [156]:
lr_20.fit(X_train, y_train)

ElasticNet(alpha=100, copy_X=True, fit_intercept=True, l1_ratio=0.25,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [157]:
y_pred_20_train = lr_20.predict(X_train)
y_pred_20_train[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [158]:
print("Training Set R^2: {}".format(lr_20.score(X_train, y_train)))
rmse_20_train = np.sqrt(mean_squared_error(y_train, y_pred_20_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_20_train))

Training Set R^2: 0.0
Training Set Root Mean Squared Error: 27.90437800503229


The model returns a training set R^2 of 0 and a training set RMSE of 27.9.

In [159]:
y_pred_20_test = lr_20.predict(X_test)
y_pred_20_test[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [160]:
print("Test Set R^2: {}".format(lr_20.score(X_test, y_test)))
rmse_20_test = np.sqrt(mean_squared_error(y_test, y_pred_20_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_20_test))

Test Set R^2: -0.00798750229846834
Test Set Root Mean Squared Error: 27.130456166458586


The model returns a test set R^2 of -0.008 and a test set RMSE of 27.1.

In [161]:
filename = 'cancer_lr_20.sav'
pickle.dump(lr_20, open(filename, 'wb'))

## Elastic Net with L1 Ratio of 0.5

In [162]:
lr_21 = linear_model.ElasticNet(alpha=0.001, l1_ratio=0.5, normalize=True)

In [163]:
lr_21.fit(X_train, y_train)

ElasticNet(alpha=0.001, copy_X=True, fit_intercept=True, l1_ratio=0.5,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [164]:
y_pred_21_train = lr_21.predict(X_train)
y_pred_21_train[0:20]

array([201.88619461, 191.96397795, 206.55017843, 149.25946579,
       205.32329944, 190.32697193, 172.38623239, 164.09080309,
       186.17827125, 214.35792764, 206.21241951, 179.09359505,
       199.63516109, 196.08299444, 157.34173426, 181.01506372,
       200.51792446, 157.32653937, 164.56548991, 169.6914623 ])

In [165]:
print("Training Set R^2: {}".format(lr_21.score(X_train, y_train)))
rmse_21_train = np.sqrt(mean_squared_error(y_train, y_pred_21_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_21_train))

Training Set R^2: 0.5514803769651211
Training Set Root Mean Squared Error: 18.68801055287659


The model returns a training set R^2 of 0.551 and a training set RMSE of 18.7.

In [166]:
y_pred_21_test = lr_21.predict(X_test)
y_pred_21_test[0:20]

array([171.66842242, 176.4294506 , 163.30730721, 175.69270898,
       177.84493978, 194.36741686, 175.14652324, 166.88145197,
       165.9978823 , 173.82642174, 181.10044387, 203.15344217,
       161.4461275 , 160.2450128 , 215.53400894, 124.29084818,
       188.89226082, 202.33228132, 197.22356577, 186.65856415])

In [167]:
print("Test Set R^2: {}".format(lr_21.score(X_test, y_test)))
rmse_21_test = np.sqrt(mean_squared_error(y_test, y_pred_21_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_21_test))

Test Set R^2: 0.517138310113884
Test Set Root Mean Squared Error: 18.77763518306763


The model returns a test set R^2 of 0.517 and a test set RMSE of 18.8.

In [168]:
filename = 'cancer_lr_21.sav'
pickle.dump(lr_21, open(filename, 'wb'))

In [169]:
lr_22 = linear_model.ElasticNet(alpha=0.01, l1_ratio=0.5, normalize=True)

In [170]:
lr_22.fit(X_train, y_train)

ElasticNet(alpha=0.01, copy_X=True, fit_intercept=True, l1_ratio=0.5,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [171]:
y_pred_22_train = lr_22.predict(X_train)
y_pred_22_train[0:20]

array([196.80537588, 190.38572002, 197.86748105, 163.2025598 ,
       193.62527469, 186.69863817, 167.22375318, 162.91915809,
       182.84492958, 201.65177635, 197.43692434, 174.86238893,
       190.77735673, 194.52396802, 165.60933656, 179.48278438,
       194.23343559, 166.63860577, 166.42883941, 170.87972551])

In [172]:
print("Training Set R^2: {}".format(lr_22.score(X_train, y_train)))
rmse_22_train = np.sqrt(mean_squared_error(y_train, y_pred_22_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_22_train))

Training Set R^2: 0.39095030222497174
Training Set Root Mean Squared Error: 21.77703316115587


The model returns a training set R^2 of 0.391 and a training set RMSE of 21.8.

In [173]:
y_pred_22_test = lr_22.predict(X_test)
y_pred_22_test[0:20]

array([170.61636163, 176.86268322, 170.04027523, 176.89932787,
       178.93232248, 189.40310438, 177.51211655, 173.52147116,
       167.22292419, 175.887877  , 184.04899179, 196.69059728,
       169.22150973, 167.71674436, 198.73571331, 147.25453886,
       188.59718556, 193.30814215, 189.15295386, 188.28880949])

In [174]:
print("Test Set R^2: {}".format(lr_22.score(X_test, y_test)))
rmse_22_test = np.sqrt(mean_squared_error(y_test, y_pred_22_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_22_test))

Test Set R^2: 0.34045079746713997
Test Set Root Mean Squared Error: 21.94588615828974


The model returns a test set R^2 of 0.340 and a test set RMSE of 21.9.

In [175]:
filename = 'cancer_lr_22.sav'
pickle.dump(lr_22, open(filename, 'wb'))

In [176]:
lr_23 = linear_model.ElasticNet(alpha=0.1, l1_ratio=0.5, normalize=True)

In [177]:
lr_23.fit(X_train, y_train)

ElasticNet(alpha=0.1, copy_X=True, fit_intercept=True, l1_ratio=0.5,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [178]:
y_pred_23_train = lr_23.predict(X_train)
y_pred_23_train[0:20]

array([182.74462316, 181.55174335, 182.97181246, 176.00122447,
       181.90814915, 180.94724017, 175.82088747, 175.43803469,
       180.08224356, 183.43585023, 183.04865361, 178.11288386,
       181.30750304, 182.43558846, 176.69243837, 178.95380425,
       182.27496207, 177.05349347, 176.51970494, 177.62224424])

In [179]:
print("Training Set R^2: {}".format(lr_23.score(X_train, y_train)))
rmse_23_train = np.sqrt(mean_squared_error(y_train, y_pred_23_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_23_train))

Training Set R^2: 0.10275859245447139
Training Set Root Mean Squared Error: 26.43181588074592


The model returns a training set R^2 of 0.103 and a training set RMSE of 26.4.

In [180]:
y_pred_23_test = lr_23.predict(X_test)
y_pred_23_test[0:20]

array([176.74512687, 178.7977992 , 177.30246787, 178.54119787,
       178.89101848, 181.28248327, 179.16331977, 178.25666609,
       176.53706307, 178.44269821, 180.51507686, 182.5840665 ,
       177.34045126, 176.94545028, 182.91069337, 172.89178476,
       181.27084026, 181.56904081, 180.8195803 , 181.3809751 ])

In [181]:
print("Test Set R^2: {}".format(lr_23.score(X_test, y_test)))
rmse_23_test = np.sqrt(mean_squared_error(y_test, y_pred_23_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_23_test))

Test Set R^2: 0.08364511785943718
Test Set Root Mean Squared Error: 25.867911785167912


The model returns a test set R^2 of 0.084 and a test set RMSE of 25.9.

In [182]:
filename = 'cancer_lr_23.sav'
pickle.dump(lr_23, open(filename, 'wb'))

In [183]:
lr_24 = linear_model.ElasticNet(alpha=1, l1_ratio=0.5, normalize=True)

In [184]:
lr_24.fit(X_train, y_train)

ElasticNet(alpha=1, copy_X=True, fit_intercept=True, l1_ratio=0.5,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [185]:
y_pred_24_train = lr_24.predict(X_train)
y_pred_24_train[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [186]:
print("Training Set R^2: {}".format(lr_24.score(X_train, y_train)))
rmse_24_train = np.sqrt(mean_squared_error(y_train, y_pred_24_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_24_train))

Training Set R^2: 0.0
Training Set Root Mean Squared Error: 27.90437800503229


The model returns a training set R^2 of 0 and a training set RMSE of 27.9.

In [187]:
y_pred_24_test = lr_24.predict(X_test)
y_pred_24_test[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [188]:
print("Test Set R^2: {}".format(lr_24.score(X_test, y_test)))
rmse_24_test = np.sqrt(mean_squared_error(y_test, y_pred_24_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_24_test))

Test Set R^2: -0.00798750229846834
Test Set Root Mean Squared Error: 27.130456166458586


The model returns a test set R^2 of -0.008 and a test set RMSE of 27.1.

In [189]:
filename = 'cancer_lr_24.sav'
pickle.dump(lr_24, open(filename, 'wb'))

In [190]:
lr_25 = linear_model.ElasticNet(alpha=10, l1_ratio=0.5, normalize=True)

In [191]:
lr_25.fit(X_train, y_train)

ElasticNet(alpha=10, copy_X=True, fit_intercept=True, l1_ratio=0.5,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [192]:
y_pred_25_train = lr_25.predict(X_train)
y_pred_25_train[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [193]:
print("Training Set R^2: {}".format(lr_25.score(X_train, y_train)))
rmse_25_train = np.sqrt(mean_squared_error(y_train, y_pred_25_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_25_train))

Training Set R^2: 0.0
Training Set Root Mean Squared Error: 27.90437800503229


The model returns a training set R^2 of 0 and a training set RMSE of 27.9.

In [194]:
y_pred_25_test = lr_25.predict(X_test)
y_pred_25_test[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [195]:
print("Test Set R^2: {}".format(lr_25.score(X_test, y_test)))
rmse_25_test = np.sqrt(mean_squared_error(y_test, y_pred_25_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_25_test))

Test Set R^2: -0.00798750229846834
Test Set Root Mean Squared Error: 27.130456166458586


The model returns a test set R^2 of -0.008 and a test set RMSE of 27.1.

In [196]:
filename = 'cancer_lr_25.sav'
pickle.dump(lr_25, open(filename, 'wb'))

In [197]:
lr_26 = linear_model.ElasticNet(alpha=100, l1_ratio=0.5, normalize=True)

In [198]:
lr_26.fit(X_train, y_train)

ElasticNet(alpha=100, copy_X=True, fit_intercept=True, l1_ratio=0.5,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [199]:
y_pred_26_train = lr_26.predict(X_train)
y_pred_26_train[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [200]:
print("Training Set R^2: {}".format(lr_26.score(X_train, y_train)))
rmse_26_train = np.sqrt(mean_squared_error(y_train, y_pred_26_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_26_train))

Training Set R^2: 0.0
Training Set Root Mean Squared Error: 27.90437800503229


The model returns a training set R^2 of 0 and a training set RMSE of 27.9.

In [201]:
y_pred_26_test = lr_26.predict(X_test)
y_pred_26_test[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [202]:
print("Test Set R^2: {}".format(lr_26.score(X_test, y_test)))
rmse_26_test = np.sqrt(mean_squared_error(y_test, y_pred_26_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_26_test))

Test Set R^2: -0.00798750229846834
Test Set Root Mean Squared Error: 27.130456166458586


A test set R-squared score of -0.008 and a test set RMSE of 27.1 are returned.

In [203]:
filename = 'cancer_lr_26.sav'
# Persist the fitted model; the context manager closes the file handle.
with open(filename, 'wb') as f:
    pickle.dump(lr_26, f)

## Elastic Net with L1 Ratio of 0.75
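For reference, the objective that scikit-learn documents for `ElasticNet` is written out below. The symbols are notation chosen here rather than names from the notebook: $n$ is the number of training rows, $w$ the coefficient vector, $\alpha$ the `alpha` argument and $\rho$ the `l1_ratio` argument. With $\rho = 0.75$, the L1 term carries a weight of $0.75\,\alpha$ and the squared L2 term a weight of $0.125\,\alpha$, so this section leans more heavily toward LASSO-style sparsity than the runs with an L1 ratio of 0.5.

```latex
\min_{w}\;
  \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  \;+\; \alpha\,\rho\,\lVert w \rVert_1
  \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

In the scikit-learn versions that accept `normalize=True`, the documentation describes the columns of $X$ as being centred and divided by their L2 norm before fitting, so the penalty acts on coefficients of the rescaled features.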

In [204]:
lr_27 = linear_model.ElasticNet(alpha=0.001, l1_ratio=0.75, normalize=True)

In [205]:
lr_27.fit(X_train, y_train)

ElasticNet(alpha=0.001, copy_X=True, fit_intercept=True, l1_ratio=0.75,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [206]:
y_pred_27_train = lr_27.predict(X_train)
y_pred_27_train[0:20]

array([200.7292074 , 191.41646294, 207.30567058, 146.75513924,
       208.60586548, 191.13413699, 174.19274636, 166.25550164,
       188.06079322, 216.32597704, 206.93716458, 180.50732382,
       201.10039229, 194.92146322, 156.33133869, 182.48634946,
       200.77910046, 156.4510579 , 165.70428572, 170.69116318])

In [207]:
print("Training Set R^2: {}".format(lr_27.score(X_train, y_train)))
rmse_27_train = np.sqrt(mean_squared_error(y_train, y_pred_27_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_27_train))

Training Set R^2: 0.5772959569150442
Training Set Root Mean Squared Error: 18.142224940275913


A training set R-squared score of 0.58 and a training set RMSE of 18.1 are returned.
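The two training metrics are linked. R-squared is 1 minus the ratio of the model's mean squared error to that of a mean-only prediction, so the RMSE can be recovered from R-squared and the mean-only RMSE. The check below uses only numbers printed in this notebook (the mean-only training RMSE from `lr_26` and the training R-squared from `lr_27`); the variable names are chosen here for illustration.

```python
import math

# Values printed earlier in this notebook (training set).
rmse_mean_only = 27.90437800503229  # lr_26: every prediction equals the training mean
r2_lr_27 = 0.5772959569150442       # lr_27: alpha=0.001, l1_ratio=0.75

# R^2 = 1 - MSE_model / MSE_mean_only, hence
# RMSE_model = RMSE_mean_only * sqrt(1 - R^2).
rmse_lr_27 = rmse_mean_only * math.sqrt(1.0 - r2_lr_27)

print(round(rmse_lr_27, 4))  # agrees with the RMSE printed for lr_27 (about 18.14)
```

Because of this relationship, R-squared and RMSE computed on the same data always rank models identically; they differ only in units and scale.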

In [208]:
y_pred_27_test = lr_27.predict(X_test)
y_pred_27_test[0:20]

array([173.05814565, 176.3736161 , 161.55932881, 175.86873185,
       177.33470984, 195.0620429 , 174.89435439, 165.50164657,
       167.08055001, 172.92321233, 180.1114842 , 202.85071936,
       160.35898262, 159.03549412, 218.80713232, 120.58610835,
       188.49627093, 202.86697181, 199.41163971, 185.61934068])

In [209]:
print("Test Set R^2: {}".format(lr_27.score(X_test, y_test)))
rmse_27_test = np.sqrt(mean_squared_error(y_test, y_pred_27_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_27_test))

Test Set R^2: 0.5517773798689217
Test Set Root Mean Squared Error: 18.091576262350873


A test set R-squared score of 0.55 and a test set RMSE of 18.1 are returned.

In [210]:
filename = 'cancer_lr_27.sav'
# Persist the fitted model; the context manager closes the file handle.
with open(filename, 'wb') as f:
    pickle.dump(lr_27, f)

In [211]:
lr_28 = linear_model.ElasticNet(alpha=0.01, l1_ratio=0.75, normalize=True)

In [212]:
lr_28.fit(X_train, y_train)

ElasticNet(alpha=0.01, copy_X=True, fit_intercept=True, l1_ratio=0.75,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [213]:
y_pred_28_train = lr_28.predict(X_train)
y_pred_28_train[0:20]

array([199.43811801, 191.72235026, 200.99538421, 159.22771479,
       196.94010923, 188.06243488, 167.85911956, 161.11869943,
       183.39035021, 205.30032529, 200.89202442, 174.92487702,
       193.51096007, 196.12037042, 163.27520126, 179.44041556,
       196.46561131, 163.5956201 , 164.93947816, 170.612035  ])

In [214]:
print("Training Set R^2: {}".format(lr_28.score(X_train, y_train)))
rmse_28_train = np.sqrt(mean_squared_error(y_train, y_pred_28_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_28_train))

Training Set R^2: 0.4414720681545616
Training Set Root Mean Squared Error: 20.854260533973523


A training set R-squared score of 0.44 and a training set RMSE of 20.9 are returned.

In [215]:
y_pred_28_test = lr_28.predict(X_test)
y_pred_28_test[0:20]

array([171.0469776 , 176.35002549, 168.83794284, 176.35551687,
       178.63255147, 191.31705295, 176.30740184, 173.01952326,
       165.95534085, 175.53181274, 183.9738743 , 198.8669463 ,
       167.19036601, 165.86671323, 203.5200834 , 140.13127039,
       189.48343678, 195.96556675, 191.17258393, 188.73709384])

In [216]:
print("Test Set R^2: {}".format(lr_28.score(X_test, y_test)))
rmse_28_test = np.sqrt(mean_squared_error(y_test, y_pred_28_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_28_test))

Test Set R^2: 0.39299117789921756
Test Set Root Mean Squared Error: 21.053631815570398


A test set R-squared score of 0.39 and a test set RMSE of 21.1 are returned.

In [217]:
filename = 'cancer_lr_28.sav'
# Persist the fitted model; the context manager closes the file handle.
with open(filename, 'wb') as f:
    pickle.dump(lr_28, f)

In [218]:
lr_29 = linear_model.ElasticNet(alpha=0.1, l1_ratio=0.75, normalize=True)

In [219]:
lr_29.fit(X_train, y_train)

ElasticNet(alpha=0.1, copy_X=True, fit_intercept=True, l1_ratio=0.75,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [220]:
y_pred_29_train = lr_29.predict(X_train)
y_pred_29_train[0:20]

array([184.11780986, 182.35179484, 184.36406387, 175.03755628,
       182.90508175, 181.57118983, 175.37760009, 174.26031939,
       180.29955807, 184.84993521, 184.49921574, 177.4177025 ,
       181.9524251 , 183.65042337, 175.78107085, 179.11077999,
       183.3105398 , 175.6370248 , 175.5023602 , 177.14383342])

In [221]:
print("Training Set R^2: {}".format(lr_29.score(X_train, y_train)))
rmse_29_train = np.sqrt(mean_squared_error(y_train, y_pred_29_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_29_train))

Training Set R^2: 0.1366903804401024
Training Set Root Mean Squared Error: 25.927201116393906


A training set R-squared score of 0.14 and a training set RMSE of 25.9 are returned.

In [222]:
y_pred_29_test = lr_29.predict(X_test)
y_pred_29_test[0:20]

array([176.56514742, 178.54369489, 176.56874677, 178.37242339,
       179.01528181, 181.94004916, 178.75331567, 178.51974704,
       175.470963  , 178.22901419, 180.73520631, 183.76332844,
       176.70625696, 176.17858516, 184.31529576, 170.37338747,
       181.91524665, 182.46060596, 181.35170544, 182.06965917])

In [223]:
print("Test Set R^2: {}".format(lr_29.score(X_test, y_test)))
rmse_29_test = np.sqrt(mean_squared_error(y_test, y_pred_29_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_29_test))

Test Set R^2: 0.11405651855818588
Test Set Root Mean Squared Error: 25.43504619681397


A test set R-squared score of 0.11 and a test set RMSE of 25.4 are returned.

In [224]:
filename = 'cancer_lr_29.sav'
# Persist the fitted model; the context manager closes the file handle.
with open(filename, 'wb') as f:
    pickle.dump(lr_29, f)

In [225]:
lr_30 = linear_model.ElasticNet(alpha=1, l1_ratio=0.75, normalize=True)

In [226]:
lr_30.fit(X_train, y_train)

ElasticNet(alpha=1, copy_X=True, fit_intercept=True, l1_ratio=0.75,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [227]:
y_pred_30_train = lr_30.predict(X_train)
y_pred_30_train[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [228]:
print("Training Set R^2: {}".format(lr_30.score(X_train, y_train)))
rmse_30_train = np.sqrt(mean_squared_error(y_train, y_pred_30_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_30_train))

Training Set R^2: 0.0
Training Set Root Mean Squared Error: 27.90437800503229


A training set R-squared score of 0 and a training set RMSE of 27.9 are returned: at alpha=1 the model again predicts the same value (179.15) for every county.

In [229]:
y_pred_30_test = lr_30.predict(X_test)
y_pred_30_test[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [230]:
print("Test Set R^2: {}".format(lr_30.score(X_test, y_test)))
rmse_30_test = np.sqrt(mean_squared_error(y_test, y_pred_30_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_30_test))

Test Set R^2: -0.00798750229846834
Test Set Root Mean Squared Error: 27.130456166458586


A test set R-squared score of -0.008 and a test set RMSE of 27.1 are returned.

In [231]:
filename = 'cancer_lr_30.sav'
# Persist the fitted model; the context manager closes the file handle.
with open(filename, 'wb') as f:
    pickle.dump(lr_30, f)

In [232]:
lr_31 = linear_model.ElasticNet(alpha=10, l1_ratio=0.75, normalize=True)

In [233]:
lr_31.fit(X_train, y_train)

ElasticNet(alpha=10, copy_X=True, fit_intercept=True, l1_ratio=0.75,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [234]:
y_pred_31_train = lr_31.predict(X_train)
y_pred_31_train[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [235]:
print("Training Set R^2: {}".format(lr_31.score(X_train, y_train)))
rmse_31_train = np.sqrt(mean_squared_error(y_train, y_pred_31_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_31_train))

Training Set R^2: 0.0
Training Set Root Mean Squared Error: 27.90437800503229


A training set R-squared score of 0 and a training set RMSE of 27.9 are returned: at alpha=10 the model again predicts the same value (179.15) for every county.

In [236]:
y_pred_31_test = lr_31.predict(X_test)
y_pred_31_test[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [237]:
print("Test Set R^2: {}".format(lr_31.score(X_test, y_test)))
rmse_31_test = np.sqrt(mean_squared_error(y_test, y_pred_31_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_31_test))

Test Set R^2: -0.00798750229846834
Test Set Root Mean Squared Error: 27.130456166458586


A test set R-squared score of -0.008 and a test set RMSE of 27.1 are returned.

In [238]:
filename = 'cancer_lr_31.sav'
# Persist the fitted model; the context manager closes the file handle.
with open(filename, 'wb') as f:
    pickle.dump(lr_31, f)

In [239]:
lr_32 = linear_model.ElasticNet(alpha=100, l1_ratio=0.75, normalize=True)

In [240]:
lr_32.fit(X_train, y_train)

ElasticNet(alpha=100, copy_X=True, fit_intercept=True, l1_ratio=0.75,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [241]:
y_pred_32_train = lr_32.predict(X_train)
y_pred_32_train[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [242]:
print("Training Set R^2: {}".format(lr_32.score(X_train, y_train)))
rmse_32_train = np.sqrt(mean_squared_error(y_train, y_pred_32_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_32_train))

Training Set R^2: 0.0
Training Set Root Mean Squared Error: 27.90437800503229


A training set R-squared score of 0 and a training set RMSE of 27.9 are returned: at alpha=100 the model again predicts the same value (179.15) for every county.

In [243]:
y_pred_32_test = lr_32.predict(X_test)
y_pred_32_test[0:20]

array([179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847,
       179.14755847, 179.14755847, 179.14755847, 179.14755847])

In [244]:
print("Test Set R^2: {}".format(lr_32.score(X_test, y_test)))
rmse_32_test = np.sqrt(mean_squared_error(y_test, y_pred_32_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_32_test))

Test Set R^2: -0.00798750229846834
Test Set Root Mean Squared Error: 27.130456166458586


A test set R-squared score of -0.008 and a test set RMSE of 27.1 are returned.

In [245]:
filename = 'cancer_lr_32.sav'
# Persist the fitted model; the context manager closes the file handle.
with open(filename, 'wb') as f:
    pickle.dump(lr_32, f)

## Stochastic Gradient Descent with L2 Penalty (Ridge Regression)
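The cells in this section pass `loss='huber'`, and the printed estimators show `epsilon=0.1`, so the L2 penalty is combined with a Huber loss, not with the squared loss of classical ridge regression. The sketch below writes out the standard Huber loss and its derivative in plain Python; the function names are chosen here for illustration, and it is not scikit-learn's own code.

```python
def huber_loss(residual, epsilon=0.1):
    """Huber loss: quadratic within +/- epsilon, linear beyond it."""
    abs_r = abs(residual)
    if abs_r <= epsilon:
        return 0.5 * residual ** 2
    return epsilon * abs_r - 0.5 * epsilon ** 2


def huber_gradient(residual, epsilon=0.1):
    """Derivative of the Huber loss with respect to the residual."""
    if abs(residual) <= epsilon:
        return residual
    return epsilon if residual > 0 else -epsilon


# Residuals far larger than epsilon all produce the same capped gradient.
for r in (0.05, 1.0, 20.0):
    print(r, huber_loss(r), huber_gradient(r))
```

With a target on the order of 180 and `epsilon=0.1`, most residuals are likely to fall in the linear regime, where the gradient magnitude is capped at `epsilon` regardless of how large the error is.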

In [246]:
lr_33 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.00001, random_state=42)

In [247]:
lr_33.fit(X_train, y_train)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [248]:
y_pred_33_train = lr_33.predict(X_train)
y_pred_33_train[0:20]

array([  404938.24503574, -1208766.93300601,  6705276.55579317,
        5654048.58377967,  -897491.97211116, 38962505.17960329,
       12416729.36957454, 13562854.00948853, -2429871.71595133,
        -439872.78463801,  2072747.21289994, 22537413.13955067,
        -893314.18900416, -1189419.69109166,  4675759.27001442,
         839213.17127178,  -558173.20808035, 21731071.50820049,
        -407649.00844505, -3989243.80793243])

In [249]:
print("Training Set R^2: {}".format(lr_33.score(X_train, y_train)))
rmse_33_train = np.sqrt(mean_squared_error(y_train, y_pred_33_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_33_train))

Training Set R^2: -7438889288368.192
Training Set Root Mean Squared Error: 76107313.8387275


In [250]:
y_pred_33_test = lr_33.predict(X_test)
y_pred_33_test[0:20]

array([-2986165.35597238, 28260058.79967964, -1196990.43673893,
         580188.11044881, -1829909.25185928,  3051650.43439223,
       29375321.30179185,  4170776.24352542, 36441898.2200378 ,
       -1617693.37115664,  9970311.81083487, -1115925.34240356,
       -1064948.67619251,  8527462.03511787,  9979351.59897834,
        -402756.83178986,   567085.51791467,  1413829.35442049,
        5269276.83014207, 13349077.11162243])

In [251]:
print("Test Set R^2: {}".format(lr_33.score(X_test, y_test)))
rmse_33_test = np.sqrt(mean_squared_error(y_test, y_pred_33_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_33_test))

Test Set R^2: -7782336279022.103
Test Set Root Mean Squared Error: 75384927.01897977
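Predictions in the millions for a target near 180 are consistent with stochastic gradient descent being sensitive to feature scale: each weight update is proportional to the feature value, so a feature measured in the tens of thousands moves its weight by a large amount on every step. The sketch below illustrates this on hypothetical toy data with one feature and no penalty term. The function `huber_sgd`, the data, and the choice of five passes are assumptions made for illustration; this is not the notebook's model.

```python
def huber_sgd(xs, ys, eta=0.01, epsilon=0.1, epochs=5):
    """Plain SGD on a one-feature linear model with Huber loss (no penalty)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            r = (w * x + b) - y
            # Huber gradient: the residual itself inside epsilon, capped outside.
            g = r if abs(r) <= epsilon else (epsilon if r > 0 else -epsilon)
            w -= eta * g * x  # the step size scales with the feature value
            b -= eta * g
    return w, b


# Hypothetical toy data: a feature on the scale of county populations.
xs_raw = [20_000.0, 50_000.0, 100_000.0, 400_000.0]
ys = [190.0, 180.0, 175.0, 165.0]

# Min-max scale the feature to the range [0, 1].
lo, hi = min(xs_raw), max(xs_raw)
xs_scaled = [(x - lo) / (hi - lo) for x in xs_raw]

w_raw, b_raw = huber_sgd(xs_raw, ys)
w_scaled, b_scaled = huber_sgd(xs_scaled, ys)

preds_raw = [w_raw * x + b_raw for x in xs_raw]
preds_scaled = [w_scaled * x + b_scaled for x in xs_scaled]

print(max(abs(p) for p in preds_raw))     # very large: the weight overshoots on each step
print(max(abs(p) for p in preds_scaled))  # small and bounded, though far from converged
```

In this toy run the scaled model is stable but has barely moved toward the target after five passes, so scaling addresses the instability without removing the need for enough iterations.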


In [252]:
filename = 'cancer_lr_33.sav'
# Persist the fitted model; the context manager closes the file handle.
with open(filename, 'wb') as f:
    pickle.dump(lr_33, f)

In [253]:
lr_34 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.0001, random_state=42)

In [254]:
lr_34.fit(X_train, y_train)



SGDRegressor(alpha=0.0001, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [255]:
y_pred_34_train = lr_34.predict(X_train)
y_pred_34_train[0:20]

array([  404570.39739385, -1209305.85883767,  6705688.31437749,
        5653182.0074695 ,  -897991.73710752, 38966694.60862876,
       12417473.65438917, 13563293.37421957, -2430552.43128711,
        -440392.88984106,  2072400.37033463, 22539433.0383914 ,
        -893504.11737511, -1189989.17527026,  4676081.10627719,
         837415.19703247,  -558582.34386167, 21733070.41247602,
        -408202.51604297, -3990092.48480406])

In [256]:
print("Training Set R^2: {}".format(lr_34.score(X_train, y_train)))
rmse_34_train = np.sqrt(mean_squared_error(y_train, y_pred_34_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_34_train))

Training Set R^2: -7440463647073.458
Training Set Root Mean Squared Error: 76115367.04858495


In [257]:
y_pred_34_test = lr_34.predict(X_test)
y_pred_34_test[0:20]

array([-2986816.46614697, 28262892.71764994, -1197544.11694695,
         579842.47445581, -1830464.37950328,  3055703.55245992,
       29378211.96995367,  4170831.92338083, 36445597.10627712,
       -1618148.33302385,  9970926.25232351, -1116227.80534262,
       -1065468.13695325,  8527658.12115797,  9980555.22489668,
        -408290.75160693,   566727.65854204,  1413334.15480753,
        5269434.93660161, 13348514.06822818])

In [258]:
print("Test Set R^2: {}".format(lr_34.score(X_test, y_test)))
rmse_34_test = np.sqrt(mean_squared_error(y_test, y_pred_34_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_34_test))

Test Set R^2: -7783959231784.263
Test Set Root Mean Squared Error: 75392787.11350639


In [259]:
filename = 'cancer_lr_34.sav'
# Persist the fitted model; the context manager closes the file handle.
with open(filename, 'wb') as f:
    pickle.dump(lr_34, f)

In [260]:
lr_35 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.001, random_state=42)

In [261]:
lr_35.fit(X_train, y_train)



SGDRegressor(alpha=0.001, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [262]:
y_pred_35_train = lr_35.predict(X_train)
y_pred_35_train[0:20]

array([  -608383.3058113 ,    875481.63929512,  -6346612.83077702,
        -5434884.46582697,    673826.04954187, -36415786.17668787,
       -12226405.59913513, -12869571.31965439,   2043946.18841568,
          161760.62982556,  -1951884.34032485, -21230348.07144286,
          591473.95142152,    932868.08739853,  -4644518.28499472,
         -942935.96261709,    355236.19221262, -20639070.79268571,
          -89478.21785217,   3456576.50931624])

In [263]:
print("Training Set R^2: {}".format(lr_35.score(X_train, y_train)))
rmse_35_train = np.sqrt(mean_squared_error(y_train, y_pred_35_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_35_train))

Training Set R^2: -6432791181541.995
Training Set Root Mean Squared Error: 70773728.11096293


In [264]:
y_pred_35_test = lr_35.predict(X_test)
y_pred_35_test[0:20]

array([  2413616.54167402, -26480828.17705314,    792324.18238239,
         -868857.53822511,   1401329.0680718 ,  -9457748.73504529,
       -27586974.15387113,  -4202270.87960976, -34012186.77152277,
         1066509.95138837,  -9489207.55243387,    773206.67127287,
          812313.24348802,  -7982768.68395404,  -9403223.61156393,
          178539.83465923,   -687972.04016712,  -1553240.89147694,
        -5158504.54672142, -12484976.88258232])

In [265]:
print("Test Set R^2: {}".format(lr_35.score(X_test, y_test)))
rmse_35_test = np.sqrt(mean_squared_error(y_test, y_pred_35_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_35_test))

Test Set R^2: -6733688972856.393
Test Set Root Mean Squared Error: 70122283.2209382


In [266]:
filename = 'cancer_lr_35.sav'
# Persist the fitted model; the context manager closes the file handle.
with open(filename, 'wb') as f:
    pickle.dump(lr_35, f)

In [267]:
lr_36 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.01, random_state=42)

In [268]:
lr_36.fit(X_train, y_train)



SGDRegressor(alpha=0.01, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [269]:
y_pred_36_train = lr_36.predict(X_train)
y_pred_36_train[0:20]

array([ -790604.04390036, -2467135.86701966,  5219561.94498232,
        1538691.72585423, -2122298.43338145, 36610282.06556002,
       11056983.57062576, 10120457.74576723, -3694767.77364682,
       -1652637.37385329,   240911.65416618, 20172267.66438334,
       -1508334.70116394, -2410716.86687316,  3582490.73631701,
       -3940752.44145362, -1810949.09492217, 19464033.64177566,
       -1978077.7463449 , -5245448.17169849])

In [270]:
print("Training Set R^2: {}".format(lr_36.score(X_train, y_train)))
rmse_36_train = np.sqrt(mean_squared_error(y_train, y_pred_36_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_36_train))

Training Set R^2: -6839058016155.265
Training Set Root Mean Squared Error: 72974392.85980293


In [271]:
y_pred_36_test = lr_36.predict(X_test)
y_pred_36_test[0:20]

array([ -4027109.99230977,  26048184.32885276,  -2532995.36645663,
         -750998.54443155,  -2950700.7548731 ,   7933149.13161564,
        27026269.17644726,   2755067.55672521,  33828604.65072083,
        -2727491.83220564,   8212226.78938858,  -1737301.65458654,
        -2400945.66979162,   5981461.78887547,   9312492.22132008,
       -13459645.61567714,   -760863.75691727,   -402319.46584395,
         3803315.74107964,   7598428.81657802])

In [272]:
print("Test Set R^2: {}".format(lr_36.score(X_test, y_test)))
rmse_36_test = np.sqrt(mean_squared_error(y_test, y_pred_36_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_36_test))

Test Set R^2: -7071077767497.158
Test Set Root Mean Squared Error: 71857537.30931292


In [273]:
filename = 'cancer_lr_36.sav'
# Persist the fitted model; the context manager closes the file handle.
with open(filename, 'wb') as f:
    pickle.dump(lr_36, f)

In [274]:
lr_37 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.1, random_state=42)

In [275]:
lr_37.fit(X_train, y_train)



SGDRegressor(alpha=0.1, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [276]:
y_pred_37_train = lr_37.predict(X_train)
y_pred_37_train[0:20]

array([   325811.50818609,   1779628.68859365,  -4934020.07520218,
        -1954843.86170342,   1485148.17075529, -32455617.09631767,
       -10360951.07248366,  -9403741.88574593,   2838607.41599816,
          986442.96727707,   -690327.08263192, -18225986.16653354,
         1018945.58560635,   1663662.07008684,  -3522755.13760985,
         2746986.28728915,   1223409.84866349, -17670126.41631797,
         1197803.04020342,   4173991.38672846])

In [277]:
print("Training Set R^2: {}".format(lr_37.score(X_train, y_train)))
rmse_37_train = np.sqrt(mean_squared_error(y_train, y_pred_37_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_37_train))

Training Set R^2: -5333291813572.22
Training Set Root Mean Squared Error: 64442149.77001455


In [278]:
y_pred_37_test = lr_37.predict(X_test)
y_pred_37_test[0:20]

array([  3243492.96175158, -23233813.35137711,   1830132.53638953,
          280645.92527092,   2281276.89927215,  -7718923.71639744,
       -24200096.9620602 ,  -2704570.93714281, -30067598.81514797,
         1995520.35353378,  -7653289.62872056,   1254606.670776  ,
         1773957.56165858,  -5654225.20485169,  -8388892.70543871,
        10018967.72408115,    199648.2681415 ,    -86774.24672043,
        -3726218.94046637,  -7503776.80206263])

In [279]:
print("Test Set R^2: {}".format(lr_37.score(X_test, y_test)))
rmse_37_test = np.sqrt(mean_squared_error(y_test, y_pred_37_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_37_test))

Test Set R^2: -5480822480914.056
Test Set Root Mean Squared Error: 63263379.81574492


In [280]:
filename = 'cancer_lr_37.sav'
# Persist the fitted model; the context manager closes the file handle.
with open(filename, 'wb') as f:
    pickle.dump(lr_37, f)

In [281]:
lr_38 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=1, random_state=42)

In [282]:
lr_38.fit(X_train, y_train)



SGDRegressor(alpha=1, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [283]:
y_pred_38_train = lr_38.predict(X_train)
y_pred_38_train[0:20]

array([ -462122.80288141, -2082046.9749741 ,  5446188.82021954,
        2070972.11831799, -1790504.07799684, 36142367.017613  ,
       11292131.01775451, 10553036.34401591, -3276928.72467295,
       -1245404.68959497,   552023.42626119, 20446482.74469508,
       -1240290.65441515, -2058011.99915636,  3885220.63484084,
       -3206682.33039036, -1478951.11818386, 19687156.56961341,
       -1403012.43562216, -4798502.18067251])

In [284]:
print("Training Set R^2: {}".format(lr_38.score(X_train, y_train)))
rmse_38_train = np.sqrt(mean_squared_error(y_train, y_pred_38_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_38_train))

Training Set R^2: -6628368816166.835
Training Set Root Mean Squared Error: 71841547.58373539


In [285]:
y_pred_38_test = lr_38.predict(X_test)
y_pred_38_test[0:20]

array([-3.77014031e+06,  2.59808668e+07, -2.13042946e+06, -3.92242386e+05,
       -2.65895184e+06,  3.93615405e+06,  2.70262985e+07,  3.01146000e+06,
        3.37231018e+07, -2.47767014e+06,  8.55919179e+06, -1.49449546e+06,
       -2.08958631e+06,  6.24393547e+06,  9.32276500e+06, -1.15507526e+07,
       -4.37321208e+05,  5.40832793e+02,  4.11950979e+06,  8.13794631e+06])

In [286]:
print("Test Set R^2: {}".format(lr_38.score(X_test, y_test)))
rmse_38_test = np.sqrt(mean_squared_error(y_test, y_pred_38_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_38_test))

Test Set R^2: -6836227879532.301
Test Set Root Mean Squared Error: 70654168.18493927


In [287]:
filename = 'cancer_lr_38.sav'
# Persist the fitted model; the context manager closes the file handle.
with open(filename, 'wb') as f:
    pickle.dump(lr_38, f)

In [288]:
lr_39 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=10, random_state=42)

In [289]:
lr_39.fit(X_train, y_train)



SGDRegressor(alpha=10, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [290]:
y_pred_39_train = lr_39.predict(X_train)
y_pred_39_train[0:20]

array([-1.09033598e+06, -2.44368713e+06,  3.23403431e+06,  2.38454232e+04,
       -2.09241475e+06,  2.59535117e+07,  7.47737479e+06,  6.31053301e+06,
       -3.34178540e+06, -1.67410395e+06, -4.35044343e+05,  1.37478826e+07,
       -1.67430483e+06, -2.31164538e+06,  1.71278011e+06, -3.91877241e+06,
       -1.90810251e+06,  1.32307181e+07, -2.41503764e+06, -4.53032652e+06])

In [291]:
print("Training Set R^2: {}".format(lr_39.score(X_train, y_train)))
rmse_39_train = np.sqrt(mean_squared_error(y_train, y_pred_39_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_39_train))

Training Set R^2: -3633573883148.8896
Training Set Root Mean Squared Error: 53191145.61214076


In [292]:
y_pred_39_test = lr_39.predict(X_test)
y_pred_39_test[0:20]

array([ -3659947.84445833,  18251269.90359093,  -2661658.78548128,
        -1297010.04116495,  -2800204.60834727,   5734328.74516563,
        18906589.41029056,   1229324.21104274,  23811581.81426162,
        -2688853.46571323,   5278401.09292021,  -1734842.74704664,
        -2635238.14482169,   3345150.3904242 ,   6320579.85095956,
       -11997940.44562588,  -1172248.68395629,   -940315.5663826 ,
         2133779.27752749,   4572511.15769191])

In [293]:
print("Test Set R^2: {}".format(lr_39.score(X_test, y_test)))
rmse_39_test = np.sqrt(mean_squared_error(y_test, y_pred_39_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_39_test))

Test Set R^2: -3736845002167.366
Test Set Root Mean Squared Error: 52237461.54666501


In [294]:
filename = 'cancer_lr_39.sav'
# Persist the fitted model; the context manager closes the file handle.
with open(filename, 'wb') as f:
    pickle.dump(lr_39, f)

In [295]:
lr_40 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=100, random_state=42)

In [296]:
lr_40.fit(X_train, y_train)



SGDRegressor(alpha=100, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [297]:
y_pred_40_train = lr_40.predict(X_train)
y_pred_40_train[0:20]

array([ 197451.0798399 ,  289543.41309973,   59764.31109748,
        263668.83598944,  246916.14043229, -709985.62849083,
        -41006.41659055,   60247.47139123,  327676.13249919,
        214220.68056249,  198570.77768967, -202393.64231992,
        251966.64861662,  259545.68864973,  213414.06577922,
        386921.71762138,  254735.11728043, -145762.57389269,
        397005.85272514,  391855.03786495])

In [298]:
print("Training Set R^2: {}".format(lr_40.score(X_train, y_train)))
rmse_40_train = np.sqrt(mean_squared_error(y_train, y_pred_40_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_40_train))

Training Set R^2: -4870662025.42258
Training Set Root Mean Squared Error: 1947450.1247597954


In [299]:
y_pred_40_test = lr_40.predict(X_test)
y_pred_40_test[0:20]

array([ 337586.33222327, -438329.33629538,  343984.33312313,
        272475.8890534 ,  290926.63794862,  734254.5209055 ,
       -420065.05270218,  178315.89649703, -621566.32673866,
        338203.75071579,   28983.27447518,  210788.89128367,
        338506.19778256,  137041.23043524,  -73995.68049381,
        911275.19617899,  237794.63371455,  221553.75434919,
        118232.24727224,   76035.72294609])

In [300]:
print("Test Set R^2: {}".format(lr_40.score(X_test, y_test)))
rmse_40_test = np.sqrt(mean_squared_error(y_test, y_pred_40_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_40_test))

Test Set R^2: -4765091605.647747
Test Set Root Mean Squared Error: 1865370.6933154766


In [301]:
filename = 'cancer_lr_40.sav'
# Persist the fitted model; the context manager closes the file handle.
with open(filename, 'wb') as f:
    pickle.dump(lr_40, f)

In [302]:
lr_41 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.00001, random_state=42, 
                                  learning_rate='constant')

In [303]:
lr_41.fit(X_train, y_train)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='constant', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [304]:
y_pred_41_train = lr_41.predict(X_train)
y_pred_41_train[0:20]

array([-3.31697037e+06, -2.05106055e+07,  5.84398473e+07,  3.63847380e+07,
       -1.68415326e+07,  3.79377576e+08,  1.17217431e+08,  1.18055530e+08,
       -3.30393893e+07, -1.15916071e+07,  1.07800036e+07,  2.12730267e+08,
       -1.49080132e+07, -1.95182708e+07,  3.73098652e+07, -1.33442563e+07,
       -1.37771825e+07,  2.05272891e+08, -1.53510977e+07, -4.90257664e+07])

In [305]:
print("Training Set R^2: {}".format(lr_41.score(X_train, y_train)))
rmse_41_train = np.sqrt(mean_squared_error(y_train, y_pred_41_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_41_train))

Training Set R^2: -731990046485499.1
Training Set Root Mean Squared Error: 754961724.8083209


In [306]:
y_pred_41_test = lr_41.predict(X_test)
y_pred_41_test[0:20]

array([-3.81785996e+07,  2.72197848e+08, -2.13759580e+07, -3.31472724e+06,
       -2.62818972e+07,  6.67619362e+07,  2.82615009e+08,  3.18370110e+07,
        3.52292538e+08, -2.37917358e+07,  8.97133891e+07, -1.63971947e+07,
       -2.13457092e+07,  6.98678789e+07,  9.46542446e+07, -5.81785759e+07,
       -2.58438333e+06,  4.16181007e+06,  4.39824324e+07,  1.09972006e+08])

In [307]:
print("Test Set R^2: {}".format(lr_41.score(X_test, y_test)))
rmse_41_test = np.sqrt(mean_squared_error(y_test, y_pred_41_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_41_test))

Test Set R^2: -758480624721869.8
Test Set Root Mean Squared Error: 744220738.4000801


In [308]:
filename = 'cancer_lr_41.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_41, f)

In [309]:
lr_42 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.00001, random_state=42, 
                                  learning_rate='optimal')

In [310]:
lr_42.fit(X_train, y_train)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='optimal', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [311]:
y_pred_42_train = lr_42.predict(X_train)
y_pred_42_train[0:20]

array([-1.86342815e+09, -1.15187023e+10,  3.28175869e+10,  2.04313594e+10,
       -9.45842407e+09,  2.13048351e+11,  6.58257644e+10,  6.62958857e+10,
       -1.85545127e+10, -6.51024165e+09,  6.05295691e+09,  1.19463490e+11,
       -8.37233932e+09, -1.09615858e+10,  2.09522248e+10, -7.49530131e+09,
       -7.73749826e+09,  1.15276098e+11, -8.62101316e+09, -2.75322139e+10])

In [312]:
print("Training Set R^2: {}".format(lr_42.score(X_train, y_train)))
rmse_42_train = np.sqrt(mean_squared_error(y_train, y_pred_42_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_42_train))

Training Set R^2: -2.3084377965004037e+20
Training Set Root Mean Squared Error: 423966395351.9561


In [313]:
y_pred_42_test = lr_42.predict(X_test)
y_pred_42_test[0:20]

array([-2.14411588e+10,  1.52858557e+11, -1.20045194e+10, -1.86202049e+09,
       -1.47599807e+10,  3.75009728e+10,  1.58709030e+11,  1.78780294e+10,
        1.97837486e+11, -1.33610662e+10,  5.03803641e+10, -9.20878857e+09,
       -1.19872971e+10,  3.92355237e+10,  5.31547408e+10, -3.26742541e+10,
       -1.45193272e+09,  2.33641569e+09,  2.46987800e+10,  6.17559014e+10])

In [314]:
print("Test Set R^2: {}".format(lr_42.score(X_test, y_test)))
rmse_42_test = np.sqrt(mean_squared_error(y_test, y_pred_42_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_42_test))

Test Set R^2: -2.3919777835096703e+20
Test Set Root Mean Squared Error: 417934374835.78534


In [315]:
filename = 'cancer_lr_42.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_42, f)

In [316]:
lr_43 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.00001, epsilon=0.01, random_state=42)

In [317]:
lr_43.fit(X_train, y_train)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=0.01,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [318]:
y_pred_43_train = lr_43.predict(X_train)
y_pred_43_train[0:20]

array([ -34382.04655817, -205424.89617664,  585988.88711715,
        277136.80697449, -168202.48405109, 3832480.77825172,
       1172602.97409338, 1141476.6322998 , -329583.46959199,
       -116583.14272361,   90119.6724764 , 2156059.93026516,
       -126457.49447655, -191560.88721159,  422408.59671321,
       -270090.56972139, -140933.30083337, 2086177.55736788,
       -137072.16317213, -486219.66149627])

In [319]:
print("Training Set R^2: {}".format(lr_43.score(X_train, y_train)))
rmse_43_train = np.sqrt(mean_squared_error(y_train, y_pred_43_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_43_train))

Training Set R^2: -73349618718.54965
Training Set Root Mean Squared Error: 7557380.292691661


In [320]:
y_pred_43_test = lr_43.predict(X_test)
y_pred_43_test[0:20]

array([ -381745.23648431,  2741086.20853161,  -212624.52864064,
         -29026.64736905,  -265394.66502298,   605310.16877392,
        2856313.36874192,   330310.15022764,  3550385.05015135,
        -241035.86905355,   912731.8457181 ,  -151807.66819232,
        -185626.1202967 ,   704133.70459787,   978555.24456043,
       -1015159.64743894,   -20863.36618534,    22869.44470161,
         445857.88760869,   941576.50332128])

In [321]:
print("Test Set R^2: {}".format(lr_43.score(X_test, y_test)))
rmse_43_test = np.sqrt(mean_squared_error(y_test, y_pred_43_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_43_test))

Test Set R^2: -75816723696.58592
Test Set Root Mean Squared Error: 7440669.745803612


In [322]:
filename = 'cancer_lr_43.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_43, f)

In [323]:
lr_44 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.0001, epsilon=0.01, random_state=42)

In [324]:
lr_44.fit(X_train, y_train)



SGDRegressor(alpha=0.0001, average=False, early_stopping=False, epsilon=0.01,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [325]:
y_pred_44_train = lr_44.predict(X_train)
y_pred_44_train[0:20]

array([ -61867.13709435, -224429.11322135,  542654.98079786,
        199237.49922451, -189590.687595  , 3668073.43206903,
       1072898.80329179, 1053242.25017695, -343132.15000326,
       -144866.55521071,   55062.99240714, 2053255.74883574,
       -137794.29941788, -213423.92254053,  386866.12322991,
       -341621.39572993, -160098.87698618, 1981822.81739407,
       -163706.86472661, -498593.42341759])

In [326]:
print("Training Set R^2: {}".format(lr_44.score(X_train, y_train)))
rmse_44_train = np.sqrt(mean_squared_error(y_train, y_pred_44_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_44_train))

Training Set R^2: -67729837014.42692
Training Set Root Mean Squared Error: 7262102.287410041


In [327]:
y_pred_44_test = lr_44.predict(X_test)
y_pred_44_test[0:20]

array([ -400653.71115004,  2618086.49967817,  -234297.01067444,
         -56365.96358197,  -284529.1779247 ,   398968.18078221,
        2727144.42583286,   291379.29635886,  3400819.98853161,
        -264606.88681443,   854215.00050637,  -166156.25898326,
        -207710.67750172,   640852.51906529,   935015.16805137,
       -1193803.49517595,   -47242.99441442,   -14817.29125785,
         401853.64765352,   818034.08896064])

In [328]:
print("Test Set R^2: {}".format(lr_44.score(X_test, y_test)))
rmse_44_test = np.sqrt(mean_squared_error(y_test, y_pred_44_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_44_test))

Test Set R^2: -70101578353.15921
Test Set Root Mean Squared Error: 7154732.818449184


In [329]:
filename = 'cancer_lr_44.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_44, f)

In [330]:
lr_45 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.001, epsilon=0.01, random_state=42)

In [331]:
lr_45.fit(X_train, y_train)



SGDRegressor(alpha=0.001, average=False, early_stopping=False, epsilon=0.01,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [332]:
y_pred_45_train = lr_45.predict(X_train)
y_pred_45_train[0:20]

array([  -60838.33058113,    87548.16392951,  -634661.2830777 ,
        -543488.4465827 ,    67382.60495419, -3641578.61766876,
       -1222640.5599135 , -1286957.13196543,   204394.61884157,
          16176.06298256,  -195188.43403248, -2123034.80714427,
          59147.39514215,    93286.80873985,  -464451.82849947,
         -94293.59626172,    35523.61922126, -2063907.07926855,
          -8947.82178522,   345657.65093162])

In [333]:
print("Training Set R^2: {}".format(lr_45.score(X_train, y_train)))
rmse_45_train = np.sqrt(mean_squared_error(y_train, y_pred_45_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_45_train))

Training Set R^2: -64328580429.61135
Training Set Root Mean Squared Error: 7077409.591604758


In [334]:
y_pred_45_test = lr_45.predict(X_test)
y_pred_45_test[0:20]

array([  241361.6541674 , -2648082.81770529,    79232.41823824,
         -86885.75382251,   140132.90680718,  -945774.87350452,
       -2758697.41538709,  -420227.08796097, -3401218.67715225,
         106650.99513884,  -948920.75524338,    77320.66712729,
          81231.3243488 ,  -798276.8683954 ,  -940322.36115638,
          17853.9834659 ,   -68797.20401671,  -155324.08914769,
        -515850.45467214, -1248497.68825823])

In [335]:
print("Test Set R^2: {}".format(lr_45.score(X_test, y_test)))
rmse_45_test = np.sqrt(mean_squared_error(y_test, y_pred_45_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_45_test))

Test Set R^2: -67337724013.75197
Test Set Root Mean Squared Error: 7012271.761782448


In [336]:
filename = 'cancer_lr_45.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_45, f)

In [337]:
lr_46 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.01, epsilon=0.01, random_state=42)

In [338]:
lr_46.fit(X_train, y_train)



SGDRegressor(alpha=0.01, average=False, early_stopping=False, epsilon=0.01,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [339]:
y_pred_46_train = lr_46.predict(X_train)
y_pred_46_train[0:20]

array([ -46578.70811167, -191268.80840801,  504255.55284547,
        192443.04062599, -162497.51697787, 3347138.7155044 ,
       1107602.75390544,  961347.01317271, -299576.86944579,
       -127685.56056634,   59733.40532754, 1865800.80111852,
       -108670.88911294, -184811.78849811,  357875.85774278,
       -301468.98907755, -131310.3122709 , 1809901.75970604,
       -135711.09465626, -439304.76420669])

In [340]:
print("Training Set R^2: {}".format(lr_46.score(X_train, y_train)))
rmse_46_train = np.sqrt(mean_squared_error(y_train, y_pred_46_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_46_train))

Training Set R^2: -57199754726.245346
Training Set Root Mean Squared Error: 6673742.252664704


In [341]:
y_pred_46_test = lr_46.predict(X_test)
y_pred_46_test[0:20]

array([ -332747.43953063,  2389203.52123728,  -196190.39480774,
         -34661.77109009,  -237645.7042936 ,   867954.47280599,
        2484426.27509549,   279310.92537116,  3093056.97531809,
        -211545.10144384,   775725.61278396,  -136551.88670887,
        -185944.18567777,   580870.7188665 ,   866986.76559799,
       -1093701.16530743,   -31307.52417787,    -4986.17795117,
         372620.08569729,   754104.60363214])

In [342]:
print("Test Set R^2: {}".format(lr_46.score(X_test, y_test)))
rmse_46_test = np.sqrt(mean_squared_error(y_test, y_pred_46_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_46_test))

Test Set R^2: -58585604217.45802
Test Set Root Mean Squared Error: 6540711.279621655


In [343]:
filename = 'cancer_lr_46.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_46, f)

In [344]:
lr_47 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.1, epsilon=0.01, random_state=42)

In [345]:
lr_47.fit(X_train, y_train)



SGDRegressor(alpha=0.1, average=False, early_stopping=False, epsilon=0.01,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [346]:
y_pred_47_train = lr_47.predict(X_train)
y_pred_47_train[0:20]

array([-3.60271933e+03,  1.44922197e+05, -6.05422455e+05, -3.03270326e+05,
        1.20982150e+05, -3.66256528e+06, -1.25825584e+06, -1.14753631e+06,
        2.61541524e+05,  7.70498168e+04, -1.26338633e+05, -2.11086487e+06,
        6.19837383e+04,  1.49330093e+05, -4.58482619e+05,  2.23828108e+05,
        8.13824678e+04, -2.03772731e+06,  5.98611734e+04,  4.01932726e+05])

In [347]:
print("Training Set R^2: {}".format(lr_47.score(X_train, y_train)))
rmse_47_train = np.sqrt(mean_squared_error(y_train, y_pred_47_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_47_train))

Training Set R^2: -66905907506.31408
Training Set Root Mean Squared Error: 7217795.603136425


In [348]:
y_pred_47_test = lr_47.predict(X_test)
y_pred_47_test[0:20]

array([  301824.8902023 , -2654175.85018881,   136280.42086161,
         -29030.3305346 ,   200429.9545111 ,  -585427.83120059,
       -2756050.28864333,  -369975.05643515, -3432183.82790071,
         169415.48358226,  -907938.8577488 ,   101976.65543134,
         139854.8586354 ,  -701793.06802378,  -986146.70847804,
         970759.66670324,   -21786.22455249,   -58946.43149535,
        -467797.35573066,  -909912.82096553])

In [349]:
print("Test Set R^2: {}".format(lr_47.score(X_test, y_test)))
rmse_47_test = np.sqrt(mean_squared_error(y_test, y_pred_47_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_47_test))

Test Set R^2: -68656441100.695
Test Set Root Mean Squared Error: 7080601.71525738


In [350]:
filename = 'cancer_lr_47.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_47, f)

In [351]:
lr_48 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.001, epsilon=0.5, random_state=42)

In [352]:
lr_48.fit(X_train, y_train)



SGDRegressor(alpha=0.001, average=False, early_stopping=False, epsilon=0.5,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [353]:
y_pred_48_train = lr_48.predict(X_train)
y_pred_48_train[0:20]

array([-3.04191653e+06,  4.37740820e+06, -3.17330642e+07, -2.71744223e+07,
        3.36913025e+06, -1.82078931e+08, -6.11320280e+07, -6.43478566e+07,
        1.02197309e+07,  8.08803149e+05, -9.75942170e+06, -1.06151740e+08,
        2.95736976e+06,  4.66434044e+06, -2.32225914e+07, -4.71467981e+06,
        1.77618096e+06, -1.03195354e+08, -4.47391089e+05,  1.72828825e+07])

In [354]:
print("Training Set R^2: {}".format(lr_48.score(X_train, y_train)))
rmse_48_train = np.sqrt(mean_squared_error(y_train, y_pred_48_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_48_train))

Training Set R^2: -160819630965698.06
Training Set Root Mean Squared Error: 353868477.0945305


In [355]:
y_pred_48_test = lr_48.predict(X_test)
y_pred_48_test[0:20]

array([ 1.20680827e+07, -1.32404141e+08,  3.96162091e+06, -4.34428769e+06,
        7.00664534e+06, -4.72887437e+07, -1.37934871e+08, -2.10113544e+07,
       -1.70060934e+08,  5.33254976e+06, -4.74460378e+07,  3.86603336e+06,
        4.06156622e+06, -3.99138434e+07, -4.70161181e+07,  8.92699173e+05,
       -3.43986020e+06, -7.76620446e+06, -2.57925227e+07, -6.24248844e+07])

In [356]:
print("Test Set R^2: {}".format(lr_48.score(X_test, y_test)))
rmse_48_test = np.sqrt(mean_squared_error(y_test, y_pred_48_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_48_test))

Test Set R^2: -168342038933085.8
Test Set Root Mean Squared Error: 350611223.0476924


In [357]:
filename = 'cancer_lr_48.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_48, f)
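
Every SGD variant above predicts values in the hundreds of thousands or beyond for a target that lies roughly between 100 and 250, which is a common symptom of running stochastic gradient descent on features with very different magnitudes. The scaled runs belong to a separate notebook; purely as an illustration, here is a minimal sketch on synthetic data showing how a scaler and `SGDRegressor` can be chained in a single pipeline. The data, the column stretch factors, and the use of `StandardScaler` in place of the MinMax scaler are all assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the county data (hypothetical; not the notebook's DataFrame).
X, y = make_regression(n_samples=600, n_features=5, noise=5.0, random_state=42)
# Stretch the columns to very different magnitudes, e.g. income-like vs. percentage-like.
X = X * np.array([1.0, 10.0, 1e3, 1e4, 1e5])
y = y + 180.0  # shift the target to a mortality-rate-like level

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# The scaler lives inside the pipeline, so it is fit on the training split only
# and the same transformation is reused automatically at prediction time.
scaled_sgd = make_pipeline(
    StandardScaler(),
    SGDRegressor(penalty='l2', alpha=0.0001, random_state=42),
)
scaled_sgd.fit(X_tr, y_tr)

r2_scaled_test = scaled_sgd.score(X_te, y_te)
print("Scaled SGD test R^2: {}".format(r2_scaled_test))
```

Keeping the scaler in the pipeline also means a pickled model carries its preprocessing with it, so unscaled inputs cannot be passed to it by mistake.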

## Kernel Ridge

In [358]:
lr_49 = KernelRidge(alpha=0.001)

In [359]:
lr_49.fit(X_train, y_train)



KernelRidge(alpha=0.001, coef0=1, degree=3, gamma=None, kernel='linear',
      kernel_params=None)

In [360]:
y_pred_49_train = lr_49.predict(X_train)
y_pred_49_train[0:20]

array([194.328125  , 187.11279297, 218.63476562, 140.09375   ,
       223.91455078, 195.19726562, 174.00390625, 166.44140625,
       203.45800781, 230.49853516, 192.21582031, 174.88867188,
       209.08056641, 188.92724609, 158.68847656, 189.15625   ,
       202.39404297, 158.49707031, 173.74365234, 172.62841797])

In [361]:
print("Training Set R^2: {}".format(lr_49.score(X_train, y_train)))
rmse_49_train = np.sqrt(mean_squared_error(y_train, y_pred_49_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_49_train))

Training Set R^2: 0.6291103084866185
Training Set Root Mean Squared Error: 16.993965326455413


In [362]:
y_pred_49_test = lr_49.predict(X_test)
y_pred_49_test[0:20]

array([180.71337891, 167.59277344, 160.66894531, 169.24316406,
       182.73681641, 195.61767578, 174.82421875, 162.75244141,
       175.49511719, 172.58837891, 175.39160156, 214.3972168 ,
       158.22802734, 158.34765625, 221.13232422, 115.6875    ,
       194.80957031, 211.68847656, 212.22851562, 181.5       ])

In [363]:
print("Test Set R^2: {}".format(lr_49.score(X_test, y_test)))
rmse_49_test = np.sqrt(mean_squared_error(y_test, y_pred_49_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_49_test))

Test Set R^2: 0.603181107989073
Test Set Root Mean Squared Error: 17.022592089033253


In [364]:
filename = 'cancer_lr_49.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_49, f)

In [365]:
lr_50 = KernelRidge(alpha=0.01)

In [366]:
lr_50.fit(X_train, y_train)



KernelRidge(alpha=0.01, coef0=1, degree=3, gamma=None, kernel='linear',
      kernel_params=None)

In [367]:
y_pred_50_train = lr_50.predict(X_train)
y_pred_50_train[0:20]

array([191.40185547, 187.95019531, 222.46899414, 136.38671875,
       227.4375    , 194.4140625 , 173.64819336, 173.65722656,
       194.81347656, 225.89208984, 202.29101562, 172.99316406,
       202.61523438, 187.43847656, 157.39916992, 184.81445312,
       199.26904297, 150.95019531, 173.95458984, 179.80566406])

In [368]:
print("Training Set R^2: {}".format(lr_50.score(X_train, y_train)))
rmse_50_train = np.sqrt(mean_squared_error(y_train, y_pred_50_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_50_train))

Training Set R^2: 0.630038350739973
Training Set Root Mean Squared Error: 16.972690812438007


In [369]:
y_pred_50_test = lr_50.predict(X_test)
y_pred_50_test[0:20]

array([176.32470703, 175.87402344, 159.48779297, 174.99707031,
       180.28564453, 197.8203125 , 173.80566406, 158.22167969,
       176.15332031, 168.86181641, 178.43798828, 206.50048828,
       158.88818359, 155.96289062, 224.34423828, 113.25      ,
       188.57177734, 210.54980469, 202.35888672, 184.76953125])

In [370]:
print("Test Set R^2: {}".format(lr_50.score(X_test, y_test)))
rmse_50_test = np.sqrt(mean_squared_error(y_test, y_pred_50_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_50_test))

Test Set R^2: 0.6181338599567342
Test Set Root Mean Squared Error: 16.69879364248533


In [371]:
filename = 'cancer_lr_50.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_50, f)

In [372]:
lr_51 = KernelRidge(alpha=0.1)

In [373]:
lr_51.fit(X_train, y_train)



KernelRidge(alpha=0.1, coef0=1, degree=3, gamma=None, kernel='linear',
      kernel_params=None)

In [374]:
y_pred_51_train = lr_51.predict(X_train)
y_pred_51_train[0:20]

array([201.875    , 198.671875 , 201.421875 , 145.5      , 242.375    ,
       177.09375  , 167.3125   , 191.78125  , 194.578125 , 215.59375  ,
       180.03125  , 172.875    , 195.59375  , 196.734375 , 154.9609375,
       198.6875   , 218.625    , 138.0625   , 170.125    , 190.75     ])

In [375]:
print("Training Set R^2: {}".format(lr_51.score(X_train, y_train)))
rmse_51_train = np.sqrt(mean_squared_error(y_train, y_pred_51_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_51_train))

Training Set R^2: 0.44655258186024793
Training Set Root Mean Squared Error: 20.759195998773926


In [376]:
y_pred_51_test = lr_51.predict(X_test)
y_pred_51_test[0:20]

array([163.0625   , 152.875    , 160.140625 , 175.34375  , 170.34375  ,
       209.5234375, 179.8125   , 159.40625  , 160.5      , 177.15625  ,
       167.609375 , 206.828125 , 163.765625 , 155.9375   , 221.4921875,
       143.25     , 177.15625  , 201.890625 , 231.28125  , 189.25     ])

In [377]:
print("Test Set R^2: {}".format(lr_51.score(X_test, y_test)))
rmse_51_test = np.sqrt(mean_squared_error(y_test, y_pred_51_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_51_test))

Test Set R^2: 0.37831458951180963
Test Set Root Mean Squared Error: 21.306634701441272


In [378]:
filename = 'cancer_lr_51.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_51, f)

In [379]:
lr_52 = KernelRidge(alpha=1)

In [380]:
lr_52.fit(X_train, y_train)



KernelRidge(alpha=1, coef0=1, degree=3, gamma=None, kernel='linear',
      kernel_params=None)

In [381]:
y_pred_52_train = lr_52.predict(X_train)
y_pred_52_train[0:20]

array([193.625     , 189.25488281, 217.14257812, 141.390625  ,
       224.11328125, 196.20898438, 175.10546875, 168.9375    ,
       197.72167969, 228.29394531, 197.48828125, 176.27734375,
       203.99853516, 190.0546875 , 153.36083984, 187.3515625 ,
       199.84375   , 158.39453125, 175.2578125 , 173.89453125])

In [382]:
print("Training Set R^2: {}".format(lr_52.score(X_train, y_train)))
rmse_52_train = np.sqrt(mean_squared_error(y_train, y_pred_52_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_52_train))

Training Set R^2: 0.6386811725115132
Training Set Root Mean Squared Error: 16.773266318033475


In [383]:
y_pred_52_test = lr_52.predict(X_test)
y_pred_52_test[0:20]

array([178.38867188, 174.3359375 , 162.31347656, 172.25097656,
       176.93554688, 194.81054688, 172.359375  , 164.22070312,
       174.30273438, 174.80761719, 176.32226562, 205.87939453,
       156.2109375 , 157.62890625, 221.03955078, 118.984375  ,
       190.14453125, 208.2734375 , 209.22460938, 181.390625  ])

In [384]:
print("Test Set R^2: {}".format(lr_52.score(X_test, y_test)))
rmse_52_test = np.sqrt(mean_squared_error(y_test, y_pred_52_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_52_test))

Test Set R^2: 0.6214959313334742
Test Set Root Mean Squared Error: 16.625120368740458


In [385]:
filename = 'cancer_lr_52.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_52, f)

In [386]:
lr_53 = KernelRidge(alpha=10)

In [387]:
lr_53.fit(X_train, y_train)

KernelRidge(alpha=10, coef0=1, degree=3, gamma=None, kernel='linear',
      kernel_params=None)

In [388]:
y_pred_53_train = lr_53.predict(X_train)
y_pred_53_train[0:20]

array([194.03503418, 189.13989258, 216.0703125 , 142.99169922,
       222.52612305, 196.88623047, 175.72167969, 167.33496094,
       196.08496094, 224.34521484, 200.53588867, 176.9309082 ,
       204.23400879, 191.92492676, 150.90991211, 187.34570312,
       200.32922363, 157.98046875, 174.4083252 , 173.82983398])

In [389]:
print("Training Set R^2: {}".format(lr_53.score(X_train, y_train)))
rmse_53_train = np.sqrt(mean_squared_error(y_train, y_pred_53_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_53_train))

Training Set R^2: 0.6316864121353101
Training Set Root Mean Squared Error: 16.934844649507372


In [390]:
y_pred_53_test = lr_53.predict(X_test)
y_pred_53_test[0:20]

array([181.58728027, 174.14404297, 162.22131348, 172.22045898,
       174.53308105, 195.29263306, 174.00512695, 165.0201416 ,
       173.32641602, 175.67053223, 177.41589355, 204.43237305,
       155.89208984, 157.5625    , 221.60031128, 123.77734375,
       192.3671875 , 208.20361328, 208.70166016, 181.30078125])

In [391]:
print("Test Set R^2: {}".format(lr_53.score(X_test, y_test)))
rmse_53_test = np.sqrt(mean_squared_error(y_test, y_pred_53_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_53_test))

Test Set R^2: 0.6130379530239716
Test Set Root Mean Squared Error: 16.80984496849494


In [392]:
filename = 'cancer_lr_53.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_53, f)

In [393]:
lr_54 = KernelRidge(alpha=100)

In [394]:
lr_54.fit(X_train, y_train)

KernelRidge(alpha=100, coef0=1, degree=3, gamma=None, kernel='linear',
      kernel_params=None)

In [395]:
y_pred_54_train = lr_54.predict(X_train)
y_pred_54_train[0:20]

array([194.27999878, 191.63922119, 213.5586853 , 144.55993652,
       219.12432861, 196.34516907, 177.9866333 , 164.39651489,
       192.6421051 , 218.21209717, 206.08609009, 177.22161865,
       204.41228867, 193.63691711, 150.74695969, 187.35882568,
       202.11616516, 156.47476196, 170.81845093, 175.20217896])

In [396]:
print("Training Set R^2: {}".format(lr_54.score(X_train, y_train)))
rmse_54_train = np.sqrt(mean_squared_error(y_train, y_pred_54_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_54_train))

Training Set R^2: 0.6208188127114724
Training Set Root Mean Squared Error: 17.182871309934015


In [397]:
y_pred_54_test = lr_54.predict(X_test)
y_pred_54_test[0:20]

array([183.81860352, 173.63378906, 162.44210815, 173.29011536,
       173.89500427, 196.84836197, 174.80953979, 167.34169006,
       172.41021729, 175.53666687, 181.84016418, 199.61958313,
       155.51068115, 157.33041382, 220.95731735, 126.10766602,
       193.97149658, 207.45852661, 204.79031372, 182.30212402])

In [398]:
print("Test Set R^2: {}".format(lr_54.score(X_test, y_test)))
rmse_54_test = np.sqrt(mean_squared_error(y_test, y_pred_54_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_54_test))

Test Set R^2: 0.6075548661582504
Test Set Root Mean Squared Error: 16.928520213234147


In [399]:
filename = 'cancer_lr_54.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_54, f)
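
The six Kernel Ridge runs differ only in `alpha`, so the same comparison can be written as a loop that collects the scores in one place. This is a sketch on synthetic data from `make_regression` (an assumption; substituting the notebook's `X_train`, `X_test`, `y_train`, and `y_test` would reproduce the cells above), using the default linear kernel as those cells do.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical; not the cancer DataFrame).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

alphas = [0.001, 0.01, 0.1, 1, 10, 100]
results = []
for alpha in alphas:
    model = KernelRidge(alpha=alpha).fit(X_tr, y_tr)
    rmse = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
    results.append((alpha, model.score(X_te, y_te), rmse))

for alpha, r2, rmse in results:
    print("alpha={:<7} test R^2={:.4f}  test RMSE={:.4f}".format(alpha, r2, rmse))

# Pick the alpha with the lowest test RMSE.
best_alpha = min(results, key=lambda row: row[2])[0]
print("Lowest test RMSE at alpha={}".format(best_alpha))
```

Laid out this way, a run that breaks the trend is easy to spot, such as `alpha=0.1` above, whose test R^2 of 0.378 sits well below the roughly 0.60 to 0.62 of the other values.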

## Random Forest

In [400]:
rfr_1 = RandomForestRegressor(n_estimators=10, random_state=0)

In [401]:
rfr_1.fit(X_train, y_train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
           oob_score=False, random_state=0, verbose=0, warm_start=False)

In [402]:
rfr_pred_1_train = rfr_1.predict(X_train)

In [403]:
rfr_1.score(X_train, y_train)

0.9154742926585733

In [404]:
print('Mean Absolute Error Train:', metrics.mean_absolute_error(y_train, rfr_pred_1_train))
print('Mean Squared Error Train:', metrics.mean_squared_error(y_train, rfr_pred_1_train))
print('Root Mean Squared Error Train:', np.sqrt(metrics.mean_squared_error(y_train, rfr_pred_1_train)))

Mean Absolute Error Train: 5.717845711940911
Mean Squared Error Train: 65.81630648338121
Root Mean Squared Error Train: 8.11272497274382


In [405]:
rfr_pred_1_test = rfr_1.predict(X_test)

In [406]:
rfr_1.score(X_test, y_test)

0.5165718288315223

In [407]:
print('Mean Absolute Error Test:', metrics.mean_absolute_error(y_test, rfr_pred_1_test))
print('Mean Squared Error Test:', metrics.mean_squared_error(y_test, rfr_pred_1_test))
print('Root Mean Squared Error Test:', np.sqrt(metrics.mean_squared_error(y_test, rfr_pred_1_test)))

Mean Absolute Error Test: 14.058147540983606
Mean Squared Error Test: 353.0132440983606
Root Mean Squared Error Test: 18.7886466808645


In [408]:
rfr_2 = RandomForestRegressor(n_estimators=100, random_state=0)

In [409]:
rfr_2.fit(X_train, y_train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None,
           oob_score=False, random_state=0, verbose=0, warm_start=False)

In [410]:
rfr_pred_2_train = rfr_2.predict(X_train)

In [411]:
rfr_2.score(X_train, y_train)

0.9389100918635918

In [412]:
print('Mean Absolute Error Train:', metrics.mean_absolute_error(y_train, rfr_pred_2_train))
print('Mean Squared Error Train:', metrics.mean_squared_error(y_train, rfr_pred_2_train))
print('Root Mean Squared Error Train:', np.sqrt(metrics.mean_squared_error(y_train, rfr_pred_2_train)))

Mean Absolute Error Train: 5.02971768567911
Mean Squared Error Train: 47.56792038079601
Root Mean Squared Error Train: 6.896950078171946


In [413]:
rfr_pred_2_test = rfr_2.predict(X_test)

In [414]:
rfr_2.score(X_test, y_test)

0.567818289853375

In [415]:
print('Mean Absolute Error Test:', metrics.mean_absolute_error(y_test, rfr_pred_2_test))
print('Mean Squared Error Test:', metrics.mean_squared_error(y_test, rfr_pred_2_test))
print('Root Mean Squared Error Test:', np.sqrt(metrics.mean_squared_error(y_test, rfr_pred_2_test)))

Mean Absolute Error Test: 13.18401639344262
Mean Squared Error Test: 315.59159485901637
Root Mean Squared Error Test: 17.76489782855551


In [416]:
rfr_3 = RandomForestRegressor(n_estimators=1000, random_state=0)

In [417]:
rfr_3.fit(X_train, y_train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=1000, n_jobs=None,
           oob_score=False, random_state=0, verbose=0, warm_start=False)

In [418]:
rfr_pred_3_train = rfr_3.predict(X_train)

In [419]:
rfr_3.score(X_train, y_train)

0.9426262185276223

In [420]:
print('Mean Absolute Error Train:', metrics.mean_absolute_error(y_train, rfr_pred_3_train))
print('Mean Squared Error Train:', metrics.mean_squared_error(y_train, rfr_pred_3_train))
print('Root Mean Squared Error Train:', np.sqrt(metrics.mean_squared_error(y_train, rfr_pred_3_train)))

Mean Absolute Error Train: 4.890899917931911
Mean Squared Error Train: 44.67434233047628
Root Mean Squared Error Train: 6.683886768226724


In [421]:
rfr_pred_3_test = rfr_3.predict(X_test)

In [422]:
rfr_3.score(X_test, y_test)

0.5713508272227407

In [423]:
print('Mean Absolute Error Test:', metrics.mean_absolute_error(y_test, rfr_pred_3_test))
print('Mean Squared Error Test:', metrics.mean_squared_error(y_test, rfr_pred_3_test))
print('Root Mean Squared Error Test:', np.sqrt(metrics.mean_squared_error(y_test, rfr_pred_3_test)))

Mean Absolute Error Test: 13.116432295081951
Mean Squared Error Test: 313.01203381762264
Root Mean Squared Error Test: 17.692146105479196
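
The three forests fit the training set far better than the test set (R^2 of about 0.92 to 0.94 against 0.52 to 0.57), and going from 100 to 1000 trees barely moves the test score. A small helper makes that gap explicit and removes the repeated metric cells. The sketch below uses synthetic data, and `report` is a hypothetical helper introduced for this example rather than a function from the notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split


def report(model, X, y):
    """Return R^2, MAE, MSE and RMSE of a fitted regressor on (X, y)."""
    pred = model.predict(X)
    mse = mean_squared_error(y, pred)
    return {
        "r2": model.score(X, y),
        "mae": mean_absolute_error(y, pred),
        "mse": mse,
        "rmse": np.sqrt(mse),
    }


# Synthetic stand-in data (hypothetical; not the cancer DataFrame).
X, y = make_regression(n_samples=400, n_features=10, noise=20.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

gaps = {}
for n in [10, 100]:
    rfr = RandomForestRegressor(n_estimators=n, random_state=0).fit(X_tr, y_tr)
    train, test = report(rfr, X_tr, y_tr), report(rfr, X_te, y_te)
    gaps[n] = train["r2"] - test["r2"]  # a large positive gap suggests overfitting
    print("n_estimators={}: train R^2={:.3f}, test R^2={:.3f}".format(n, train["r2"], test["r2"]))
```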


## Best Performing Algorithm

The best performing algorithm is Ridge Regression on unscaled data, using the automatic solver with an alpha of 0.001. On the training set it reaches an R^2 of 0.6465 and a root mean squared error (RMSE) of 16.6; on the test set it reaches an R^2 of 0.6408 and an RMSE of 16.2.
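
Because the goal is to read the most salient predictors off the coefficients of the best model, a ranking by absolute coefficient value is the natural next step. The following is a minimal sketch under stated assumptions: the DataFrame, its column names (`feature_a`, `feature_b`, `feature_c`), and the target are synthetic stand-ins, and only the `Ridge(alpha=0.001, solver="auto")` settings are taken from the result above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

# Synthetic stand-in features with hypothetical names (not the notebook's columns).
rng = np.random.RandomState(0)
X = pd.DataFrame({
    "feature_a": rng.normal(0, 1, 200),
    "feature_b": rng.normal(0, 1, 200),
    "feature_c": rng.normal(0, 1, 200),  # has no effect on the target below
})
y = 180 + 5.0 * X["feature_a"] - 12.0 * X["feature_b"] + rng.normal(0, 1, 200)

ridge = Ridge(alpha=0.001, solver="auto").fit(X, y)

# Pair each coefficient with its column name, then order by absolute size.
coefs = pd.Series(ridge.coef_, index=X.columns)
ranked = coefs.reindex(coefs.abs().sort_values(ascending=False).index)
print(ranked)
```

One caveat applies to unscaled data in particular: a coefficient's size depends on the units of its feature, so a raw ranking like this compares effects per unit rather than per comparable change in each predictor.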