# Exploration of Socioeconomic Influences on Cancer Mortality:
# Machine Learning with Scaled Data

This notebook uses the cleaned DataFrame built from the "cancer-reg.csv" dataset (https://data.world/exercises/linear-regression-exercise-1) to build a series of regression models designed to identify the most salient predictors of county-level cancer mortality for the year 2015 by examining the coefficients of the best-performing regression algorithm. The cleaned DataFrame contains features native to the "cancer-reg.csv" dataset as well as a series of derived features (as detailed in the Data Cleaning notebook).

The target feature of the model is continuous, so regression is the focus of this report. Regression models are fitted below using Ordinary Least Squares (OLS) Regression, Ridge Regression, LASSO, ElasticNet, Stochastic Gradient Descent (SGD) Regressor, Kernel Ridge Regression, and Random Forest algorithms to predict cancer mortality rates.

These models are built not only to predict cancer mortality but also to identify its most salient predictors, by examining the coefficients of the best-performing regression algorithm. Policy makers can use these predictors as a resource to guide public health policy as one component of the fight against cancer. Although the salient predictors cannot be interpreted as causes of cancer mortality, identifying them can aid understanding of the factors associated with it. Random Forest is also used to predict cancer mortality nonlinearly, but because the Random Forest method does not produce coefficients, it is not used to identify the most salient predictors.
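As a rough illustration of what "looking at the coefficients" can mean in practice, the sketch below ranks the coefficients of a linear model by absolute value. The feature names and coefficient values are invented placeholders, not results from this notebook; with a fitted scikit-learn linear model, the values would come from its `coef_` array. Comparing magnitudes this way is only meaningful when the features share a common scale, which is one motivation for the MinMax scaling used later.

```python
import numpy as np

# Hypothetical feature names and coefficient values, invented for this sketch.
feature_names = ["feature_a", "feature_b", "feature_c", "feature_d"]
coefs = np.array([12.5, -40.0, 0.0, 3.2])

# Sort by absolute value, largest first; the sign gives the direction of association.
order = np.argsort(-np.abs(coefs))
ranked = [(feature_names[i], float(coefs[i])) for i in order]

for name, value in ranked:
    print(f"{name:>10s} {value:+.2f}")
```

A coefficient of exactly zero (as LASSO can produce) places that feature at the bottom of the ranking.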

The best-performing regression algorithm for the model is identified by evaluating the accuracy score (the R^2 value returned by each model's `score` method) and the root mean squared error (RMSE) of a set of regression algorithms. Generally speaking, these regression algorithms are run on unscaled and scaled data, and use different values for the regularization hyperparameter ‘alpha’ (for all algorithms except simple OLS linear regression), the L1 ratio (for ElasticNet and SGD Regressor), the penalty (L1, L2, or ElasticNet for SGD Regressor), and the number of estimators (for Random Forest). The LASSO and ElasticNet algorithms use their internal normalization setting to scale the data, as they would not converge otherwise. The MinMax scaler is used for scaling data on the other algorithms in this notebook.

Each algorithm's accuracy and RMSE scores are stored in a hyperparameter tuning table, which is displayed below and from which the best-performing algorithm is identified.
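For reference, the two evaluation metrics can be written out directly. The sketch below is a minimal NumPy version of R^2 (the value a scikit-learn regressor's `score` method returns) and RMSE; the helper names `r_squared` and `rmse` and the toy numbers are made up for illustration and do not come from the dataset.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy values, made up for illustration only.
y_true_toy = np.array([180.0, 200.0, 160.0, 220.0])
y_pred_toy = np.array([185.0, 195.0, 165.0, 210.0])

print(r_squared(y_true_toy, y_pred_toy))
print(rmse(y_true_toy, y_pred_toy))
```

Because RMSE is expressed in the units of the target (deaths per 100,000 people here), it is often easier to interpret than R^2 when comparing models on the same data.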

In [1]:
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
from sklearn.preprocessing import MinMaxScaler
from sklearn import linear_model
from sklearn.linear_model import Ridge
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn.linear_model import SGDRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn import metrics
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from sklearn.metrics import mean_squared_error
import pickle

In [2]:
df = pd.read_csv('cancer_ml7.csv', index_col=['Geography'])

In [3]:
df.head()

Geography,TARGET_deathRate,avgAnnCount,incidenceRate,medIncome,popEst2015,povertyPercent,studyPerCap,MedianAge,MedianAgeMale,MedianAgeFemale,...,city_min_distsl1_sqrd,sc_min_dists_l1_log,PCT_LACCESS_CHILD10_sqrd,PCT_LACCESS_HHNV10_sqrd,PC_DIRSALES07_sqrd,FMRKT13_sqrd,PCH_FMRKT_09_13_sqrd,PCT_OBESE_ADULTS13_log,PCT_OBESE_ADULTS13_sqrd,CHILDPOVRATE10_log
"Abbeville County, South Carolina",183.7,143.0,430.9,35525,24932,21.4,0.0,43.3,40.7,44.9,...,5.827314,-0.674641,49.425404,36.601854,13.9129,4,0.0,3.456317,1004.89,3.280911
"Acadia Parish, Louisiana",230.5,323.0,492.7,40269,62577,22.0,0.0,35.7,34.7,37.2,...,47.212922,-1.386678,0.243122,3.229274,58.9824,0,0.0,3.499533,1095.61,3.387774
"Accomack County, Virginia",216.2,221.0,479.4,38390,32973,19.4,0.0,45.3,42.7,47.3,...,22.077434,0.153911,0.516719,59.388869,2.9584,4,10000.0,3.303217,739.84,3.356897
"Ada County, Idaho",151.6,1757.0,469.0,57908,434211,11.6,414.545002,35.8,35.0,36.6,...,104.90772,-0.244491,24.459579,0.336674,5.2441,100,123.45679,3.387774,876.16,2.778819
"Adair County, Iowa",178.9,51.0,440.7,48216,7228,10.3,138.350858,45.9,45.0,47.7,...,54.729457,-0.522917,3.281391,3.52072,48.3025,4,0.0,3.443618,979.69,2.646175


In [4]:
df.columns

Index(['TARGET_deathRate', 'avgAnnCount', 'incidenceRate', 'medIncome',
       'popEst2015', 'povertyPercent', 'studyPerCap', 'MedianAge',
       'MedianAgeMale', 'MedianAgeFemale',
       ...
       'city_min_distsl1_sqrd', 'sc_min_dists_l1_log',
       'PCT_LACCESS_CHILD10_sqrd', 'PCT_LACCESS_HHNV10_sqrd',
       'PC_DIRSALES07_sqrd', 'FMRKT13_sqrd', 'PCH_FMRKT_09_13_sqrd',
       'PCT_OBESE_ADULTS13_log', 'PCT_OBESE_ADULTS13_sqrd',
       'CHILDPOVRATE10_log'],
      dtype='object', length=329)

In [5]:
len(df.index.unique())

3047

In [6]:
df.shape

(3047, 329)

In [7]:
df.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Index: 3047 entries, Abbeville County, South Carolina to Zavala County, Texas
Data columns (total 329 columns):
TARGET_deathRate                  float64
avgAnnCount                       float64
incidenceRate                     float64
medIncome                         int64
popEst2015                        int64
povertyPercent                    float64
studyPerCap                       float64
MedianAge                         float64
MedianAgeMale                     float64
MedianAgeFemale                   float64
AvgHouseholdSize                  float64
PercentMarried                    float64
PctNoHS18_24                      float64
PctHS18_24                        float64
PctSomeCol18_24                   float64
PctBachDeg18_24                   float64
PctHS25_Over                      float64
PctBachDeg25_Over                 float64
PctEmployed16_Over                float64
PctUnemployed16_Over              float64
PctPrivateCove

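A manual train/test split is done below using random Boolean masks (roughly 80% training, 20% test) for better control of the scaling process, which uses the MinMax scaler. The random draw is not seeded, so the exact rows in each set differ from run to run.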

In [8]:
#Boolean masks
is_test = np.random.rand(len(df)) < 0.2
is_train = ~is_test

In [9]:
df_test = df[is_test]

In [10]:
df_train = df[is_train]

In [11]:
type(df_test)

pandas.core.frame.DataFrame

In [12]:
type(df_train)

pandas.core.frame.DataFrame
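Because the mask above is drawn with an unseeded `np.random.rand`, rerunning the notebook yields a different split and therefore slightly different scores. A minimal sketch of a reproducible variant follows; the helper name `make_test_mask` and the seed value are assumptions for illustration, not part of the original notebook.

```python
import numpy as np

def make_test_mask(n_rows, test_fraction=0.2, seed=0):
    """Return a Boolean array marking roughly `test_fraction` of rows as test rows."""
    rng = np.random.default_rng(seed)  # seeded generator gives the same mask every run
    return rng.random(n_rows) < test_fraction

# 3047 matches the number of rows reported by df.shape above.
mask_test = make_test_mask(3047, test_fraction=0.2, seed=42)
mask_train = ~mask_test

print(mask_test.sum(), mask_train.sum())
```

The masks could then be applied as `df[mask_test]` and `df[mask_train]`, in the same way as `is_test` and `is_train`.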

For both training and test sets, the target variable is set as 'TARGET_deathRate', the per capita cancer mortality rate (per 100,000 people). The predictive feature set X is defined as all other columns in the DataFrame, for both training and test sets.

In [13]:
y_train_s = df_train['TARGET_deathRate']

In [14]:
target_names = ['TARGET_deathRate']
X_train_s = df_train[[cn for cn in df_train.columns if cn not in target_names]]

In [15]:
y_test_s = df_test['TARGET_deathRate']

In [16]:
X_test_s = df_test[[cn for cn in df_test.columns if cn not in target_names]]

In [17]:
scaler = MinMaxScaler()

The MinMaxScaler is fitted on the training feature set only. The fitted scaler is then used to transform both the training and test feature sets, so that no information from the test set enters the scaling.

In [18]:
scaler.fit(X_train_s)

MinMaxScaler(copy=True, feature_range=(0, 1))
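To make the fit-on-train idea concrete, the sketch below reproduces min-max scaling by hand with NumPy: each column is mapped to (x - min) / (max - min), where min and max come from the training rows only. The toy arrays are invented for illustration, and this simplified version does not handle constant columns (zero range).

```python
import numpy as np

# Toy feature matrices, made up for illustration (rows = counties, columns = features).
toy_train = np.array([[10.0, 1.0],
                      [20.0, 3.0],
                      [30.0, 5.0]])
toy_test = np.array([[15.0, 6.0],
                     [40.0, 0.0]])

# "Fit": learn the per-column minimum and range from the training rows only.
col_min = toy_train.min(axis=0)
col_range = toy_train.max(axis=0) - col_min

# "Transform": apply the same training statistics to both sets.
toy_train_scaled = (toy_train - col_min) / col_range
toy_test_scaled = (toy_test - col_min) / col_range

print(toy_train_scaled)
print(toy_test_scaled)
```

Training values always land in [0, 1], but test values outside the training range fall below 0 or above 1. That is expected behavior when the scaler is fitted on the training set alone.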

In [19]:
X_train_scaled = pd.DataFrame(scaler.transform(X_train_s), columns=X_train_s.columns, index=X_train_s.index)

In [20]:
X_train_scaled.head(1)

Geography,avgAnnCount,incidenceRate,medIncome,popEst2015,povertyPercent,studyPerCap,MedianAge,MedianAgeMale,MedianAgeFemale,AvgHouseholdSize,...,city_min_distsl1_sqrd,sc_min_dists_l1_log,PCT_LACCESS_CHILD10_sqrd,PCT_LACCESS_HHNV10_sqrd,PC_DIRSALES07_sqrd,FMRKT13_sqrd,PCH_FMRKT_09_13_sqrd,PCT_OBESE_ADULTS13_log,PCT_OBESE_ADULTS13_sqrd,CHILDPOVRATE10_log
"Acadia Parish, Louisiana",0.008311,0.554549,0.171164,0.006072,0.425339,0.0,0.365123,0.325397,0.399464,0.398104,...,0.014487,0.502579,0.00021,0.000872,0.001855,0.0,0.0,0.882545,0.824751,0.767657


In [21]:
X_test_scaled = pd.DataFrame(scaler.transform(X_test_s), columns=X_test_s.columns, index=X_test_s.index)

In [22]:
X_test_scaled.head(1)

Geography,avgAnnCount,incidenceRate,medIncome,popEst2015,povertyPercent,studyPerCap,MedianAge,MedianAgeMale,MedianAgeFemale,AvgHouseholdSize,...,city_min_distsl1_sqrd,sc_min_dists_l1_log,PCT_LACCESS_CHILD10_sqrd,PCT_LACCESS_HHNV10_sqrd,PC_DIRSALES07_sqrd,FMRKT13_sqrd,PCH_FMRKT_09_13_sqrd,PCT_OBESE_ADULTS13_log,PCT_OBESE_ADULTS13_sqrd,CHILDPOVRATE10_log
"Abbeville County, South Carolina",0.003592,0.432848,0.125103,0.00237,0.411765,0.0,0.572207,0.484127,0.605898,0.312796,...,0.001788,0.589661,0.042716,0.009878,0.000438,0.000233,0.0,0.796025,0.708192,0.733398


# Machine Learning with Scaled Data

The hyperparameter tuning table, which records the training and test accuracy (R^2) and root mean squared error (RMSE) of every run, is loaded below for reference.

In [496]:
hp_tuning_table_scaled = pd.read_excel('HP tuning table - 2nd Capstone_Loew.xlsx', 
                                         sheet_name = 'HP_Tuning_Scaled')
hp_tuning_table_scaled

Unnamed: 0,Individual Algorithm Summary,LR_#,Train_Accuracy_Score,Train_RMSE,Test_Accuracy_Score,Test_RMSE,Model,Unscaled/Scaled,Non-normalized/Normalized,Alpha,Solver,Penalty,L1 Ratio (for ElasticNet),epsilon (for SGD Regressor),learning_rate (for SGD Regressor),eta0 (for SGD Regressor),power_t (for SGD Regressor),estimators (for Random Forest)
0,"Scaled, OLS Linear Regression",2s,0.656942,16.114653,0.587403,18.396952,OLS Linear Regression,Scaled w/MinMaxScaler,Non-normalized,,,,,,,,,
1,"Scaled, Ridge Regression, Alpha 0.001, auto so...",3s,0.654067,16.182031,0.586034,18.427462,Ridge Regression,Scaled w/MinMaxScaler,Non-normalized,0.00100,Auto,,,,,,,
2,"Scaled, Ridge Regression, Alpha 0.01, auto solver",4s,0.650823,16.257739,0.589173,18.357456,Ridge Regression,Scaled w/MinMaxScaler,Non-normalized,0.01000,Auto,,,,,,,
3,"Scaled, Ridge Regression, Alpha 0.1, auto solver",5s,0.641879,16.464633,0.579563,18.570916,Ridge Regression,Scaled w/MinMaxScaler,Non-normalized,0.10000,Auto,,,,,,,
4,"Scaled, Ridge Regression, Alpha 1, auto solver",6s,0.627356,16.795160,0.569512,18.791586,Ridge Regression,Scaled w/MinMaxScaler,Non-normalized,1.00000,Auto,,,,,,,
5,"Scaled, Ridge Regression, Alpha 10, auto solver",7s,0.598495,17.433424,0.557742,19.046755,Ridge Regression,Scaled w/MinMaxScaler,Non-normalized,10.00000,Auto,,,,,,,
6,"Scaled, Ridge Regression, Alpha 100, auto solver",8s,0.506801,19.321831,0.448304,21.273216,Ridge Regression,Scaled w/MinMaxScaler,Non-normalized,100.00000,Auto,,,,,,,
7,"Scaled, LASSO, Alpha 0.001",9s,0.643860,16.419036,0.587094,18.403844,LASSO,Scaled w/MinMaxScaler,Non-normalized,0.00100,,,,,,,,
8,"Scaled, LASSO, Alpha 0.01",10s,0.617542,17.014891,0.571804,18.741508,LASSO,Scaled w/MinMaxScaler,Non-normalized,0.01000,,,,,,,,
9,"Scaled, LASSO, Alpha 0.1",11s,0.570733,18.026074,0.546129,19.295200,LASSO,Scaled w/MinMaxScaler,Non-normalized,0.10000,,,,,,,,


## Linear Regression: Basic OLS with 80/20 Train-Test Split

In [23]:
lr_2s = linear_model.LinearRegression()
lr_2s

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

The algorithm is fitted on the training set, and the R^2 score and RMSE are then reported for both the training and test sets.

In [24]:
lr_2s.fit(X_train_scaled, y_train_s)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

In [25]:
y_pred_2s_train = lr_2s.predict(X_train_scaled)
y_pred_2s_train[0:20]

array([205.34965043, 144.40875261, 172.82581495, 207.42879221,
       175.83027318, 205.79965587, 159.92975648, 164.26279351,
       177.70119614, 188.12027613, 181.06853988, 154.31624421,
       164.57319426, 152.9644585 , 181.71956604, 158.86328955,
       165.18700099, 176.80968266, 155.70322452, 165.51491075])

In [26]:
print("Training Set R^2: {}".format(lr_2s.score(X_train_scaled, y_train_s)))
rmse_2s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_2s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_2s_train))

Training Set R^2: 0.6569421223282644
Training Set Root Mean Squared Error: 16.114652945166245


In [27]:
y_pred_2s_test = lr_2s.predict(X_test_scaled)
y_pred_2s_test[0:20]

array([189.51006262, 210.45582131, 209.64045497, 178.19697654,
       223.0685086 , 154.16860381, 176.98182981, 150.17886328,
       178.58665787, 205.42524552, 195.19021305, 192.26975466,
       193.33067064, 198.38772533, 180.66171092, 187.05399446,
       180.84429804, 175.88308909, 185.28043424, 193.40841834])

In [28]:
print("Test Set R^2: {}".format(lr_2s.score(X_test_scaled, y_test_s)))
rmse_2s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_2s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_2s_test))

Test Set R^2: 0.5874033947522772
Test Set Root Mean Squared Error: 18.39695246165718


In [29]:
filename = 'cancer_lr_2s.sav'
with open(filename, 'wb') as f:  # context manager closes the file handle
    pickle.dump(lr_2s, f)
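Each fitted model in this notebook is saved with `pickle` so it can be reloaded later without refitting. The sketch below shows the full save-and-load round trip using context managers, which close the file handle even if an error occurs. A plain dictionary stands in for a fitted estimator, and the temporary file path is hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted estimator; any picklable object behaves the same way.
model_stub = {"name": "lr_2s", "coef": [0.5, -1.25, 3.0]}

# Hypothetical file location in a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "model_stub.sav")

# Save: the file is closed automatically when the block exits.
with open(path, "wb") as f:
    pickle.dump(model_stub, f)

# Load: restores an equal copy of the saved object.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model_stub)
```

Pickle files should only be loaded from trusted sources, and a pickled scikit-learn model is best reloaded with the same library version that produced it.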

## Ridge Regression

In [30]:
lr_3s = linear_model.Ridge(alpha=0.001)

In [31]:
lr_3s.fit(X_train_scaled, y_train_s)

Ridge(alpha=0.001, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [32]:
y_pred_3s_train = lr_3s.predict(X_train_scaled)
y_pred_3s_train[0:20]

array([205.18264653, 142.49530512, 171.81864453, 206.87144113,
       176.4434895 , 205.46001474, 158.99882779, 163.81926516,
       179.18041282, 188.80541397, 181.15206823, 154.80151973,
       164.80302034, 152.19525231, 182.83191044, 160.83949051,
       166.67091248, 176.99585167, 155.19724844, 166.98878919])

In [33]:
print("Training Set R^2: {}".format(lr_3s.score(X_train_scaled, y_train_s)))
rmse_3s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_3s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_3s_train))

Training Set R^2: 0.6540673528075202
Training Set Root Mean Squared Error: 16.18203117743646


In [34]:
y_pred_3s_test = lr_3s.predict(X_test_scaled)
y_pred_3s_test[0:20]

array([192.65408259, 207.43788917, 210.33981918, 179.61404224,
       222.16353068, 153.64841087, 176.00177881, 149.08584092,
       173.39235131, 228.03180165, 199.52292719, 158.40176722,
       193.84301899, 193.6048713 , 181.4249357 , 187.71112517,
       180.89650549, 176.32813709, 185.60477998, 190.7372278 ])

In [35]:
print("Test Set R^2: {}".format(lr_3s.score(X_test_scaled, y_test_s)))
rmse_3s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_3s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_3s_test))

Test Set R^2: 0.5860337685258903
Test Set Root Mean Squared Error: 18.427461768318633


In [36]:
filename = 'cancer_lr_3s.sav'
with open(filename, 'wb') as f:  # context manager closes the file handle
    pickle.dump(lr_3s, f)

In [37]:
lr_4s = linear_model.Ridge(alpha=0.01)

In [38]:
lr_4s.fit(X_train_scaled, y_train_s)

Ridge(alpha=0.01, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [39]:
y_pred_4s_train = lr_4s.predict(X_train_scaled)
y_pred_4s_train[0:20]

array([204.480151  , 142.7420226 , 171.12747102, 206.36127337,
       177.12941166, 206.23010439, 158.58882849, 164.68794744,
       178.84551153, 186.54698151, 182.01959292, 156.46074426,
       165.94714679, 151.67992035, 184.44251801, 161.62134964,
       166.51031729, 177.56669267, 155.41833249, 165.48052568])

In [40]:
print("Training Set R^2: {}".format(lr_4s.score(X_train_scaled, y_train_s)))
rmse_4s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_4s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_4s_train))

Training Set R^2: 0.650822865570293
Training Set Root Mean Squared Error: 16.25773935874027


In [41]:
y_pred_4s_test = lr_4s.predict(X_test_scaled)
y_pred_4s_test[0:20]

array([192.43199413, 201.9786454 , 210.94432554, 177.90724086,
       221.7317404 , 153.64459494, 175.51474697, 146.93578327,
       173.93364202, 196.01201495, 203.25218517, 149.60217406,
       193.83718623, 193.53635505, 181.62659543, 189.61471295,
       182.89457755, 176.78259585, 186.01849627, 191.34026065])

In [42]:
print("Test Set R^2: {}".format(lr_4s.score(X_test_scaled, y_test_s)))
rmse_4s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_4s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_4s_test))

Test Set R^2: 0.589173080262649
Test Set Root Mean Squared Error: 18.357456492630245


In [43]:
filename = 'cancer_lr_4s.sav'
with open(filename, 'wb') as f:  # context manager closes the file handle
    pickle.dump(lr_4s, f)

In [44]:
lr_5s = linear_model.Ridge(alpha=0.1)

In [45]:
lr_5s.fit(X_train_scaled, y_train_s)

Ridge(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [46]:
y_pred_5s_train = lr_5s.predict(X_train_scaled)
y_pred_5s_train[0:20]

array([203.6574146 , 144.66772198, 170.4209919 , 205.73078724,
       178.21017103, 209.36375925, 157.97428572, 163.47779811,
       178.25255248, 183.83597852, 183.49020692, 156.38340407,
       166.36823731, 149.80070408, 184.83431703, 162.38186634,
       167.59664485, 178.21804995, 154.79747255, 164.65589411])

In [47]:
print("Training Set R^2: {}".format(lr_5s.score(X_train_scaled, y_train_s)))
rmse_5s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_5s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_5s_train))

Training Set R^2: 0.6418791411910627
Training Set Root Mean Squared Error: 16.464633472682436


In [48]:
y_pred_5s_test = lr_5s.predict(X_test_scaled)
y_pred_5s_test[0:20]

array([192.10918089, 197.03485461, 210.25770237, 174.6993121 ,
       221.16088094, 154.17307537, 173.68668666, 143.5086355 ,
       174.75816934, 151.17847122, 206.22542267, 148.99643677,
       192.13853939, 193.46681795, 180.50235502, 191.82179934,
       186.18741751, 177.36825324, 188.02398965, 192.99736824])

In [49]:
print("Test Set R^2: {}".format(lr_5s.score(X_test_scaled, y_test_s)))
rmse_5s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_5s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_5s_test))

Test Set R^2: 0.5795633863670691
Test Set Root Mean Squared Error: 18.57091600254896


In [50]:
filename = 'cancer_lr_5s.sav'
with open(filename, 'wb') as f:  # context manager closes the file handle
    pickle.dump(lr_5s, f)

In [51]:
lr_6s = linear_model.Ridge(alpha=1)

In [52]:
lr_6s.fit(X_train_scaled, y_train_s)

Ridge(alpha=1, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [53]:
y_pred_6s_train = lr_6s.predict(X_train_scaled)
y_pred_6s_train[0:20]

array([203.41664674, 147.82958219, 169.93640144, 205.92821388,
       180.72182806, 210.65049046, 157.63171549, 163.40608133,
       177.75854988, 181.83373949, 183.5952137 , 157.23891981,
       166.43950685, 149.99089514, 186.07457666, 164.12240843,
       170.0138751 , 177.91884991, 154.42102851, 165.73709122])

In [54]:
print("Training Set R^2: {}".format(lr_6s.score(X_train_scaled, y_train_s)))
rmse_6s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_6s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_6s_train))

Training Set R^2: 0.6273563244805943
Training Set Root Mean Squared Error: 16.79515962646763


In [55]:
y_pred_6s_test = lr_6s.predict(X_test_scaled)
y_pred_6s_test[0:20]

array([192.69621577, 194.84197716, 209.48106115, 171.70115287,
       220.60503021, 154.28314164, 171.66486688, 142.54282334,
       176.18876346, 133.50828666, 207.9060614 , 156.07284387,
       191.79404114, 192.61583168, 179.0420062 , 191.23556704,
       190.18319521, 178.59566084, 189.43999065, 193.97726946])

In [56]:
print("Test Set R^2: {}".format(lr_6s.score(X_test_scaled, y_test_s)))
rmse_6s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_6s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_6s_test))

Test Set R^2: 0.5695122928590417
Test Set Root Mean Squared Error: 18.791586097954305


In [57]:
filename = 'cancer_lr_6s.sav'
with open(filename, 'wb') as f:  # context manager closes the file handle
    pickle.dump(lr_6s, f)

In [58]:
lr_7s = linear_model.Ridge(alpha=10)

In [59]:
lr_7s.fit(X_train_scaled, y_train_s)

Ridge(alpha=10, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [60]:
y_pred_7s_train = lr_7s.predict(X_train_scaled)
y_pred_7s_train[0:20]

array([201.57626931, 148.77051173, 171.36197474, 208.48213973,
       182.77620998, 208.40007555, 154.40855766, 165.79230438,
       178.28731614, 181.95613455, 180.27232663, 160.4605656 ,
       166.98114928, 158.33893748, 188.04408047, 168.08846405,
       171.53906044, 177.12573607, 154.33775461, 166.13999423])

In [61]:
print("Training Set R^2: {}".format(lr_7s.score(X_train_scaled, y_train_s)))
rmse_7s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_7s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_7s_train))

Training Set R^2: 0.5984951007221488
Training Set Root Mean Squared Error: 17.43342360547904


In [62]:
y_pred_7s_test = lr_7s.predict(X_test_scaled)
y_pred_7s_test[0:20]

array([193.17929981, 191.05465427, 211.76523347, 166.6217507 ,
       216.93200362, 155.9473915 , 167.08665129, 143.79588424,
       180.36699144, 149.4351617 , 204.43597738, 165.14539715,
       198.1774682 , 189.01288239, 179.12026056, 188.48536371,
       194.44902818, 183.87913231, 187.74020085, 193.56762434])

In [63]:
print("Test Set R^2: {}".format(lr_7s.score(X_test_scaled, y_test_s)))
rmse_7s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_7s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_7s_test))

Test Set R^2: 0.5577418307333255
Test Set Root Mean Squared Error: 19.046754885197537


In [64]:
filename = 'cancer_lr_7s.sav'
with open(filename, 'wb') as f:  # context manager closes the file handle
    pickle.dump(lr_7s, f)

In [65]:
lr_8s = linear_model.Ridge(alpha=100)

In [66]:
lr_8s.fit(X_train_scaled, y_train_s)

Ridge(alpha=100, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [67]:
y_pred_8s_train = lr_8s.predict(X_train_scaled)
y_pred_8s_train[0:20]

array([198.22702402, 151.9354218 , 172.93864952, 205.88682878,
       183.72741137, 201.64469563, 152.25499669, 170.24883683,
       176.72422813, 183.79541107, 176.12351397, 163.33686939,
       170.94906762, 168.45066052, 186.80923299, 176.54051333,
       173.71065548, 177.94519493, 159.02088076, 166.4474197 ])

In [68]:
print("Training Set R^2: {}".format(lr_8s.score(X_train_scaled, y_train_s)))
rmse_8s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_8s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_8s_train))

Training Set R^2: 0.5068011426243204
Training Set Root Mean Squared Error: 19.321831451454134


In [69]:
y_pred_8s_test = lr_8s.predict(X_test_scaled)
y_pred_8s_test[0:20]

array([193.65887069, 187.13631114, 212.03642239, 162.24650482,
       208.07790014, 160.13041493, 168.75817452, 145.75235794,
       187.42103187, 165.84476256, 195.25219432, 171.89001131,
       208.49930118, 185.38052719, 179.86502344, 184.90525164,
       195.65483674, 190.06899383, 184.16952211, 193.32627728])

In [70]:
print("Test Set R^2: {}".format(lr_8s.score(X_test_scaled, y_test_s)))
rmse_8s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_8s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_8s_test))

Test Set R^2: 0.4483035525689296
Test Set Root Mean Squared Error: 21.273216237912873


In [71]:
filename = 'cancer_lr_8s.sav'
with open(filename, 'wb') as f:  # context manager closes the file handle
    pickle.dump(lr_8s, f)
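The Ridge runs above repeat the same fit-and-score steps for alpha values from 0.001 to 100, and the training R^2 falls steadily as alpha grows. The sketch below shows that pattern in a single loop. It uses a NumPy closed-form solution of the penalized least-squares problem as a stand-in for scikit-learn's `Ridge` solver, and synthetic data generated only for this example, so the printed scores are not comparable to the results above.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge fit with an unpenalized intercept (data are centered first)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic data, generated only for this sketch (not the cancer dataset).
rng = np.random.default_rng(0)
X_toy = rng.random((200, 5))  # features already in [0, 1], as after MinMax scaling
y_toy = X_toy @ np.array([30.0, -20.0, 10.0, 0.0, 5.0]) + rng.normal(0.0, 2.0, 200)

alphas = [0.001, 0.01, 0.1, 1, 10, 100]
train_r2 = []
for alpha in alphas:
    coef, intercept = ridge_fit(X_toy, y_toy, alpha)
    train_r2.append(r_squared(y_toy, X_toy @ coef + intercept))
    print(alpha, round(train_r2[-1], 4))
```

A larger alpha shrinks the coefficients harder, so the training fit can only get worse. Whether the test score improves depends on how much the unpenalized model was overfitting.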

## LASSO

In [72]:
lr_9s = linear_model.Lasso(alpha=0.001)

In [73]:
lr_9s.fit(X_train_scaled, y_train_s)



Lasso(alpha=0.001, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [74]:
y_pred_9s_train = lr_9s.predict(X_train_scaled)
y_pred_9s_train[0:20]

array([203.87319419, 144.52947401, 170.10318486, 206.94520408,
       179.43016652, 206.84464186, 158.48782974, 165.95163089,
       178.38973278, 184.96507086, 183.23126048, 156.29553526,
       167.17810085, 149.945287  , 184.34111349, 162.2230237 ,
       166.32157473, 176.98155329, 154.59912284, 163.62736683])

In [75]:
print("Training Set R^2: {}".format(lr_9s.score(X_train_scaled, y_train_s)))
rmse_9s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_9s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_9s_train))

Training Set R^2: 0.6438599594494798
Training Set Root Mean Squared Error: 16.419036199621015


In [76]:
y_pred_9s_test = lr_9s.predict(X_test_scaled)
y_pred_9s_test[0:20]

array([191.42189802, 197.17145395, 211.71508468, 176.0477865 ,
       222.34902829, 153.226207  , 175.06177727, 143.76387772,
       174.0623336 , 157.15425922, 205.73843768, 146.62092336,
       193.97152087, 193.01933592, 181.01451025, 191.59947413,
       184.46024326, 176.72526882, 186.28407869, 191.75722704])

In [77]:
print("Test Set R^2: {}".format(lr_9s.score(X_test_scaled, y_test_s)))
rmse_9s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_9s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_9s_test))

Test Set R^2: 0.5870942316260279
Test Set Root Mean Squared Error: 18.40384368931988


In [78]:
filename = 'cancer_lr_9s.sav'
with open(filename, 'wb') as f:  # context manager closes the file handle
    pickle.dump(lr_9s, f)

In [79]:
lr_10s = linear_model.Lasso(alpha=0.01)

In [80]:
lr_10s.fit(X_train_scaled, y_train_s)

Lasso(alpha=0.01, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [81]:
y_pred_10s_train = lr_10s.predict(X_train_scaled)
y_pred_10s_train[0:20]

array([203.3478159 , 150.01491957, 169.21907907, 204.70902948,
       182.27795639, 209.40520333, 157.46653374, 165.20571563,
       178.13422997, 181.28626234, 184.22766658, 157.46035546,
       166.19517011, 150.72327089, 186.08065236, 165.94908734,
       171.30887631, 176.8682076 , 156.4235478 , 166.42172189])

In [82]:
print("Training Set R^2: {}".format(lr_10s.score(X_train_scaled, y_train_s)))
rmse_10s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_10s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_10s_train))

Training Set R^2: 0.617541923000996
Training Set Root Mean Squared Error: 17.01489122376773


In [83]:
y_pred_10s_test = lr_10s.predict(X_test_scaled)
y_pred_10s_test[0:20]

array([193.28335884, 196.38554285, 209.49494433, 171.51808836,
       220.4011101 , 154.45054923, 171.02408262, 141.01012414,
       176.62542541, 137.51422574, 207.35495765, 162.066284  ,
       193.12820105, 194.8026353 , 179.25141666, 189.72944618,
       190.16511445, 178.92989801, 188.62738205, 194.35841834])

In [84]:
print("Test Set R^2: {}".format(lr_10s.score(X_test_scaled, y_test_s)))
rmse_10s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_10s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_10s_test))

Test Set R^2: 0.5718036784987082
Test Set Root Mean Squared Error: 18.741507755687078


In [85]:
filename = 'cancer_lr_10s.sav'
with open(filename, 'wb') as f:  # context manager closes the file handle
    pickle.dump(lr_10s, f)

In [86]:
lr_11s = linear_model.Lasso(alpha=0.1)

In [87]:
lr_11s.fit(X_train_scaled, y_train_s)

Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [88]:
y_pred_11s_train = lr_11s.predict(X_train_scaled)
y_pred_11s_train[0:20]

array([202.07324243, 154.79904861, 169.57823141, 201.68576648,
       176.65703226, 200.77907961, 154.31825584, 171.51864827,
       176.54493176, 177.78903314, 185.77618865, 158.66696414,
       170.87302782, 161.76319059, 187.70229987, 167.4606806 ,
       174.30727434, 178.44621695, 157.67432494, 170.00626014])

In [89]:
print("Training Set R^2: {}".format(lr_11s.score(X_train_scaled, y_train_s)))
rmse_11s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_11s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_11s_train))

Training Set R^2: 0.5707327227410498
Training Set Root Mean Squared Error: 18.02607403180161


In [90]:
y_pred_11s_test = lr_11s.predict(X_test_scaled)
y_pred_11s_test[0:20]

array([191.32911087, 197.04503565, 213.00326335, 168.96672337,
       212.56269598, 157.44751625, 161.80957981, 143.36181389,
       179.56974787, 132.33505204, 194.2915923 , 172.14504391,
       202.83408572, 198.06058318, 178.34270339, 186.20768402,
       197.61039779, 181.39026883, 183.84928791, 198.13887418])

In [91]:
print("Test Set R^2: {}".format(lr_11s.score(X_test_scaled, y_test_s)))
rmse_11s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_11s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_11s_test))

Test Set R^2: 0.5461289722547018
Test Set Root Mean Squared Error: 19.295200308021585


In [92]:
filename = 'cancer_lr_11s.sav'
with open(filename, 'wb') as f:  # context manager closes the file handle
    pickle.dump(lr_11s, f)

In [93]:
lr_12s = linear_model.Lasso(alpha=1)

In [94]:
lr_12s.fit(X_train_scaled, y_train_s)

Lasso(alpha=1, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [95]:
y_pred_12s_train = lr_12s.predict(X_train_scaled)
y_pred_12s_train[0:20]

array([190.94740774, 168.82767199, 179.06218212, 188.56844485,
       174.05484113, 193.90880534, 159.55124151, 175.47495883,
       174.25294504, 178.34189511, 179.74186509, 175.66038134,
       178.63093048, 170.39317066, 178.93602247, 177.46699796,
       170.74432509, 178.04246551, 166.18171127, 167.50938136])

In [96]:
print("Training Set R^2: {}".format(lr_12s.score(X_train_scaled, y_train_s)))
rmse_12s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_12s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_12s_train))

Training Set R^2: 0.31643531497143584
Training Set Root Mean Squared Error: 22.74715397248038


In [97]:
y_pred_12s_test = lr_12s.predict(X_test_scaled)
y_pred_12s_test[0:20]

array([184.48755072, 183.9791089 , 200.83580773, 172.99563416,
       190.61165335, 163.52976315, 165.78402595, 160.57867935,
       184.19127949, 165.14816706, 179.56816323, 173.91142216,
       194.55130529, 183.4437597 , 175.52345879, 175.2259492 ,
       184.67567287, 188.78927101, 175.99181202, 182.200798  ])

In [98]:
print("Test Set R^2: {}".format(lr_12s.score(X_test_scaled, y_test_s)))
rmse_12s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_12s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_12s_test))

Test Set R^2: 0.2688630819685406
Test Set Root Mean Squared Error: 24.489640947183958


In [99]:
filename = 'cancer_lr_12s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_12s, f)

In [100]:
lr_13s = linear_model.Lasso(alpha=10)

In [101]:
lr_13s.fit(X_train_scaled, y_train_s)

Lasso(alpha=10, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [102]:
y_pred_13s_train = lr_13s.predict(X_train_scaled)
y_pred_13s_train[0:20]

array([178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151])

In [103]:
print("Training Set R^2: {}".format(lr_13s.score(X_train_scaled, y_train_s)))
rmse_13s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_13s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_13s_train))

Training Set R^2: 0.0
Training Set Root Mean Squared Error: 27.512956306533045


In [104]:
y_pred_13s_test = lr_13s.predict(X_test_scaled)
y_pred_13s_test[0:20]

array([178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151])

In [105]:
print("Test Set R^2: {}".format(lr_13s.score(X_test_scaled, y_test_s)))
rmse_13s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_13s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_13s_test))

Test Set R^2: -0.001131456212104309
Test Set Root Mean Squared Error: 28.656860834509846


In [106]:
filename = 'cancer_lr_13s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_13s, f)
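
Every prediction from the alpha=10 Lasso equals 178.85851151, and the training R^2 is exactly 0.0. That pattern is what one would expect if the L1 penalty had shrunk every coefficient to zero, leaving only the intercept (the training-set mean of the target). The notebook does not print `coef_`, so this is an inference rather than a confirmed result. A minimal sketch of how it could be checked, using hypothetical synthetic data (`X_demo`, `y_demo`) rather than the notebook's arrays:

```python
import numpy as np
from sklearn import linear_model

# Hypothetical stand-in data: features in [0, 1], target on a scale similar to the notebook's.
rng = np.random.RandomState(0)
X_demo = rng.uniform(0, 1, size=(200, 5))
y_demo = 180 + 20 * X_demo[:, 0] - 10 * X_demo[:, 1] + rng.normal(0, 5, size=200)

lasso_demo = linear_model.Lasso(alpha=10)
lasso_demo.fit(X_demo, y_demo)

# Count coefficients that survived the penalty and compare the intercept to the target mean.
n_nonzero = np.count_nonzero(lasso_demo.coef_)
intercept_is_mean = np.isclose(lasso_demo.intercept_, y_demo.mean())
train_r2 = lasso_demo.score(X_demo, y_demo)

print("Non-zero coefficients:", n_nonzero)
print("Intercept equals mean of y:", intercept_is_mean)
print("Training R^2:", train_r2)
```

Running the same three checks on `lr_13s` and `lr_14s` would show whether those models have collapsed to an intercept-only fit.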

In [107]:
lr_14s = linear_model.Lasso(alpha=100)

In [108]:
lr_14s.fit(X_train_scaled, y_train_s)

Lasso(alpha=100, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [109]:
y_pred_14s_train = lr_14s.predict(X_train_scaled)
y_pred_14s_train[0:20]

array([178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151])

In [110]:
print("Training Set R^2: {}".format(lr_14s.score(X_train_scaled, y_train_s)))
rmse_14s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_14s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_14s_train))

Training Set R^2: 0.0
Training Set Root Mean Squared Error: 27.512956306533045


In [111]:
y_pred_14s_test = lr_14s.predict(X_test_scaled)
y_pred_14s_test[0:20]

array([178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151])

In [112]:
print("Test Set R^2: {}".format(lr_14s.score(X_test_scaled, y_test_s)))
rmse_14s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_14s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_14s_test))

Test Set R^2: -0.001131456212104309
Test Set Root Mean Squared Error: 28.656860834509846


In [113]:
filename = 'cancer_lr_14s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_14s, f)

## Elastic Net with L1 Ratio of 0.25
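
For reference, the objective that scikit-learn documents for `ElasticNet` is shown below. The symbols are notation introduced here: $n$ is the number of training samples, $w$ the coefficient vector, $\alpha$ the `alpha` argument, and $\rho$ the `l1_ratio` argument.

$$
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 \;+\; \alpha\,\rho\,\lVert w \rVert_1 \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
$$

With $\rho = 0.25$, the L1 term is weighted by $0.25\,\alpha$ and the squared L2 term by $0.375\,\alpha$. Setting $\rho = 1$ recovers the Lasso objective.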

In [114]:
lr_15s = linear_model.ElasticNet(alpha=0.001, l1_ratio=0.25)

In [115]:
lr_15s.fit(X_train_scaled, y_train_s)



ElasticNet(alpha=0.001, copy_X=True, fit_intercept=True, l1_ratio=0.25,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [116]:
y_pred_15s_train = lr_15s.predict(X_train_scaled)
y_pred_15s_train[0:20]

array([203.15380778, 148.12474139, 170.09264256, 206.40866158,
       181.24110819, 210.42897947, 157.18915282, 163.85176017,
       177.66676756, 181.53056384, 183.08928713, 157.90404945,
       166.35784545, 151.31707072, 186.73360821, 164.80261128,
       170.56522808, 177.77520895, 154.43770965, 165.72491034])

In [117]:
print("Training Set R^2: {}".format(lr_15s.score(X_train_scaled, y_train_s)))
rmse_15s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_15s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_15s_train))

Training Set R^2: 0.6225968319525462
Training Set Root Mean Squared Error: 16.90207518327997


In [118]:
y_pred_15s_test = lr_15s.predict(X_test_scaled)
y_pred_15s_test[0:20]

array([192.66640531, 194.07825169, 209.81900152, 170.74959653,
       220.37206254, 154.8366827 , 170.6833422 , 142.73816561,
       176.98612046, 134.69426596, 207.87089314, 158.31422176,
       192.55777668, 192.11921705, 178.85907751, 190.45986973,
       191.30240428, 179.64400089, 189.25576196, 193.96042295])

In [119]:
print("Test Set R^2: {}".format(lr_15s.score(X_test_scaled, y_test_s)))
rmse_15s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_15s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_15s_test))

Test Set R^2: 0.5698057362202045
Test Set Root Mean Squared Error: 18.78518033291211


In [120]:
filename = 'cancer_lr_15s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_15s, f)

In [121]:
lr_16s = linear_model.ElasticNet(alpha=0.01, l1_ratio=0.25)

In [122]:
lr_16s.fit(X_train_scaled, y_train_s)

ElasticNet(alpha=0.01, copy_X=True, fit_intercept=True, l1_ratio=0.25,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [123]:
y_pred_16s_train = lr_16s.predict(X_train_scaled)
y_pred_16s_train[0:20]

array([200.68481668, 149.42177211, 171.80058493, 208.78133815,
       183.2591608 , 207.03733405, 153.18150477, 167.11905429,
       178.30226634, 182.46470478, 178.98708122, 161.30821478,
       167.85020307, 162.15596169, 188.02221323, 170.35710148,
       172.17471316, 176.98756686, 154.89736154, 166.3126238 ])

In [124]:
print("Training Set R^2: {}".format(lr_16s.score(X_train_scaled, y_train_s)))
rmse_16s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_16s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_16s_train))

Training Set R^2: 0.5798472275369146
Training Set Root Mean Squared Error: 17.833676128342507


In [125]:
y_pred_16s_test = lr_16s.predict(X_test_scaled)
y_pred_16s_test[0:20]

array([193.59858319, 189.75633655, 212.36465413, 164.58249212,
       214.48118991, 156.76241225, 166.29829959, 144.17312783,
       182.0885475 , 156.17291838, 201.75875753, 167.45000195,
       201.87923021, 187.72916172, 179.44788904, 187.63913584,
       195.40823288, 185.80982407, 186.6568321 , 193.40527894])

In [126]:
print("Test Set R^2: {}".format(lr_16s.score(X_test_scaled, y_test_s)))
rmse_16s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_16s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_16s_test))

Test Set R^2: 0.5388397529233431
Test Set Root Mean Squared Error: 19.44952469389643


In [127]:
filename = 'cancer_lr_16s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_16s, f)

In [128]:
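# NOTE: l1_ratio is not passed here, so ElasticNet uses its default of 0.5 (see the repr below).
# The outputs for lr_17s therefore match lr_23s and do not reflect an L1 ratio of 0.25.
lr_17s = linear_model.ElasticNet(alpha=0.1)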

In [129]:
lr_17s.fit(X_train_scaled, y_train_s)

ElasticNet(alpha=0.1, copy_X=True, fit_intercept=True, l1_ratio=0.5,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [130]:
y_pred_17s_train = lr_17s.predict(X_train_scaled)
y_pred_17s_train[0:20]

array([196.45767432, 153.84411363, 173.20530248, 203.6544058 ,
       181.46699727, 199.66445285, 152.87305899, 171.48248984,
       176.49029083, 182.37112119, 176.6088619 , 164.14375936,
       172.44288128, 168.66958287, 186.58622279, 178.49598983,
       174.76517453, 178.26887881, 161.22528606, 165.95043945])

In [131]:
print("Training Set R^2: {}".format(lr_17s.score(X_train_scaled, y_train_s)))
rmse_17s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_17s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_17s_train))

Training Set R^2: 0.48141988336351615
Training Set Root Mean Squared Error: 19.81276958985461


In [132]:
y_pred_17s_test = lr_17s.predict(X_test_scaled)
y_pred_17s_test[0:20]

array([193.24585928, 186.84998735, 210.84016281, 163.52404305,
       204.06080114, 161.69460487, 168.41170841, 147.57153718,
       188.03530083, 165.8128915 , 192.12156626, 173.45755483,
       209.07386254, 184.93878712, 178.84381817, 183.21056767,
       196.24725553, 190.55330414, 182.54857487, 192.71497492])

In [133]:
print("Test Set R^2: {}".format(lr_17s.score(X_test_scaled, y_test_s)))
rmse_17s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_17s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_17s_test))

Test Set R^2: 0.4225159514960658
Test Set Root Mean Squared Error: 21.764718635860117


In [134]:
filename = 'cancer_lr_17s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_17s, f)
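
The repr printed for `lr_17s` reports `l1_ratio=0.5`, and its R^2 and RMSE values are identical to those of the alpha=0.1 model in the L1 ratio 0.5 section. A short sketch of how the effective setting can be checked before fitting is given below; the names `default_model` and `explicit_model` are hypothetical.

```python
from sklearn import linear_model

# Omitting l1_ratio falls back to the estimator's default value.
default_model = linear_model.ElasticNet(alpha=0.1)
# Passing it explicitly gives the ratio this section is meant to cover.
explicit_model = linear_model.ElasticNet(alpha=0.1, l1_ratio=0.25)

default_ratio = default_model.get_params()["l1_ratio"]
explicit_ratio = explicit_model.get_params()["l1_ratio"]
print("default l1_ratio:", default_ratio)
print("explicit l1_ratio:", explicit_ratio)
```

Refitting with `l1_ratio=0.25` would be needed to obtain the alpha=0.1 results for this section; the numbers shown for `lr_17s` should not be read as such.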

In [135]:
lr_18s = linear_model.ElasticNet(alpha=1, l1_ratio=0.25)

In [136]:
lr_18s.fit(X_train_scaled, y_train_s)

ElasticNet(alpha=1, copy_X=True, fit_intercept=True, l1_ratio=0.25,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [137]:
y_pred_18s_train = lr_18s.predict(X_train_scaled)
y_pred_18s_train[0:20]

array([187.09590721, 166.86746582, 175.51475283, 187.96912719,
       179.47715963, 187.81442734, 168.08572969, 174.55038778,
       176.68002293, 179.89238399, 176.11439754, 171.70816021,
       176.47869179, 173.33958614, 180.70561376, 179.62044288,
       176.99455476, 180.3738892 , 174.84756555, 173.83336565])

In [138]:
print("Training Set R^2: {}".format(lr_18s.score(X_train_scaled, y_train_s)))
rmse_18s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_18s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_18s_train))

Training Set R^2: 0.2514377512484497
Training Set Root Mean Squared Error: 23.804070017202598


In [139]:
y_pred_18s_test = lr_18s.predict(X_test_scaled)
y_pred_18s_test[0:20]

array([184.33005806, 182.78882288, 193.94834071, 171.78309037,
       189.12507385, 173.75493328, 176.79498626, 164.65737806,
       183.67408991, 165.54394648, 182.76316292, 177.36794113,
       194.40008165, 181.06434675, 177.42553559, 179.20586182,
       187.48342038, 187.5123929 , 179.78292979, 184.8405997 ])

In [140]:
print("Test Set R^2: {}".format(lr_18s.score(X_test_scaled, y_test_s)))
rmse_18s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_18s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_18s_test))

Test Set R^2: 0.20844573881662265
Test Set Root Mean Squared Error: 25.481407110311743


In [141]:
filename = 'cancer_lr_18s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_18s, f)

In [142]:
lr_19s = linear_model.ElasticNet(alpha=10, l1_ratio=0.25)

In [143]:
lr_19s.fit(X_train_scaled, y_train_s)

ElasticNet(alpha=10, copy_X=True, fit_intercept=True, l1_ratio=0.25,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [144]:
y_pred_19s_train = lr_19s.predict(X_train_scaled)
y_pred_19s_train[0:20]

array([178.98816875, 178.82178326, 178.82840359, 178.83608183,
       178.82486938, 178.98571708, 178.79282005, 178.82178326,
       178.82101987, 178.83039559, 178.82840359, 178.82721818,
       178.8233198 , 178.81283702, 178.8225499 , 178.82999556,
       178.8067824 , 178.82101987, 178.94531845, 178.80643351])

In [145]:
print("Training Set R^2: {}".format(lr_19s.score(X_train_scaled, y_train_s)))
rmse_19s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_19s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_19s_train))

Training Set R^2: 0.0018399277976327568
Training Set Root Mean Squared Error: 27.48763372668891


In [146]:
y_pred_19s_test = lr_19s.predict(X_test_scaled)
y_pred_19s_test[0:20]

array([178.82999556, 178.96533541, 178.99655376, 178.82178326,
       178.97736777, 178.80401385, 178.80995869, 178.80197122,
       178.82919794, 178.81725169, 178.81687934, 178.8233198 ,
       178.98249395, 178.82919794, 178.81283702, 178.81912562,
       178.82101987, 178.97697916, 178.81283702, 178.82919794])

In [147]:
print("Test Set R^2: {}".format(lr_19s.score(X_test_scaled, y_test_s)))
rmse_19s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_19s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_19s_test))

Test Set R^2: 0.0001283405097105561
Test Set Root Mean Squared Error: 28.638824649724974


In [148]:
filename = 'cancer_lr_19s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_19s, f)

In [149]:
lr_20s = linear_model.ElasticNet(alpha=100, l1_ratio=0.25)

In [150]:
lr_20s.fit(X_train_scaled, y_train_s)

ElasticNet(alpha=100, copy_X=True, fit_intercept=True, l1_ratio=0.25,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [151]:
y_pred_20s_train = lr_20s.predict(X_train_scaled)
y_pred_20s_train[0:20]

array([178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151])

In [152]:
print("Training Set R^2: {}".format(lr_20s.score(X_train_scaled, y_train_s)))
rmse_20s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_20s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_20s_train))

Training Set R^2: 0.0
Training Set Root Mean Squared Error: 27.512956306533045


In [153]:
y_pred_20s_test = lr_20s.predict(X_test_scaled)
y_pred_20s_test[0:20]

array([178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151])

In [154]:
print("Test Set R^2: {}".format(lr_20s.score(X_test_scaled, y_test_s)))
rmse_20s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_20s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_20s_test))

Test Set R^2: -0.001131456212104309
Test Set Root Mean Squared Error: 28.656860834509846


In [155]:
filename = 'cancer_lr_20s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_20s, f)
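
The six fits in this section differ only in `alpha`, so the same fit, predict, and score steps could be expressed as one loop. The sketch below is one possible way to do that, not the notebook's own code. It runs on hypothetical synthetic data (`X_demo`, `y_demo`) with a hypothetical helper `fit_and_score`; in the notebook, `X_train_scaled`, `y_train_s`, `X_test_scaled`, and `y_test_s` would take the place of the demo splits.

```python
import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data with features in [0, 1].
rng = np.random.RandomState(42)
X_demo = rng.uniform(0, 1, size=(300, 6))
y_demo = 180 + 25 * X_demo[:, 0] - 15 * X_demo[:, 1] + rng.normal(0, 5, size=300)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)


def fit_and_score(alpha, l1_ratio):
    """Fit one ElasticNet and return its train/test R^2 and RMSE in a dict."""
    model = linear_model.ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
    model.fit(X_tr, y_tr)
    return {
        "alpha": alpha,
        "l1_ratio": l1_ratio,
        "train_r2": model.score(X_tr, y_tr),
        "test_r2": model.score(X_te, y_te),
        "train_rmse": np.sqrt(mean_squared_error(y_tr, model.predict(X_tr))),
        "test_rmse": np.sqrt(mean_squared_error(y_te, model.predict(X_te))),
    }


# Same alpha grid as the cells above, with l1_ratio always passed explicitly.
alphas = [0.001, 0.01, 0.1, 1, 10, 100]
results = [fit_and_score(a, 0.25) for a in alphas]
for row in results:
    print("alpha={alpha:<6} train R^2={train_r2:.3f} "
          "test R^2={test_r2:.3f} test RMSE={test_rmse:.2f}".format(**row))
```

Passing `l1_ratio` through a single helper also removes the chance of one call silently using the default ratio.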

## Elastic Net with L1 Ratio of 0.5

In [156]:
lr_21s = linear_model.ElasticNet(alpha=0.001, l1_ratio=0.5)

In [157]:
lr_21s.fit(X_train_scaled, y_train_s)



ElasticNet(alpha=0.001, copy_X=True, fit_intercept=True, l1_ratio=0.5,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [158]:
y_pred_21s_train = lr_21s.predict(X_train_scaled)
y_pred_21s_train[0:20]

array([203.42447224, 148.34072263, 169.90128907, 205.92124076,
       181.06879649, 210.46687333, 157.42333158, 163.66234617,
       177.78434195, 181.67967297, 183.53426549, 157.6358598 ,
       166.45726699, 150.35470012, 186.29944319, 164.43379515,
       170.15929697, 177.68050202, 154.40180564, 165.90763231])

In [159]:
print("Training Set R^2: {}".format(lr_21s.score(X_train_scaled, y_train_s)))
rmse_21s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_21s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_21s_train))

Training Set R^2: 0.6248207971636459
Training Set Root Mean Squared Error: 16.852201240384055


In [160]:
y_pred_21s_test = lr_21s.predict(X_test_scaled)
y_pred_21s_test[0:20]

array([192.84657492, 194.57875023, 209.58485011, 171.38291044,
       220.46454139, 154.49568422, 171.32249751, 142.66787425,
       176.57752035, 133.78253829, 207.86577552, 158.01546389,
       192.11450392, 192.51848635, 178.83307004, 190.89629517,
       190.5237148 , 178.83439089, 189.33370516, 194.09897211])

In [161]:
print("Test Set R^2: {}".format(lr_21s.score(X_test_scaled, y_test_s)))
rmse_21s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_21s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_21s_test))

Test Set R^2: 0.5697353324559896
Test Set Root Mean Squared Error: 18.786717421399853


In [162]:
filename = 'cancer_lr_21s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_21s, f)

In [163]:
lr_22s = linear_model.ElasticNet(alpha=0.01, l1_ratio=0.5)

In [164]:
lr_22s.fit(X_train_scaled, y_train_s)

ElasticNet(alpha=0.01, copy_X=True, fit_intercept=True, l1_ratio=0.5,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [165]:
y_pred_22s_train = lr_22s.predict(X_train_scaled)
y_pred_22s_train[0:20]

array([201.16810762, 149.37738571, 171.29486288, 208.27547915,
       182.8776695 , 207.78594666, 153.96534045, 166.71536116,
       178.30778785, 182.0680801 , 180.01028367, 160.92305246,
       167.42664302, 160.5047761 , 187.92173853, 169.22784847,
       172.01324653, 177.00827232, 154.96758111, 166.23747604])

In [166]:
print("Training Set R^2: {}".format(lr_22s.score(X_train_scaled, y_train_s)))
rmse_22s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_22s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_22s_train))

Training Set R^2: 0.5903594361695605
Training Set Root Mean Squared Error: 17.609163913731898


In [167]:
y_pred_22s_test = lr_22s.predict(X_test_scaled)
y_pred_22s_test[0:20]

array([193.45655262, 190.63595285, 211.99395614, 165.72844691,
       215.62697034, 156.46628503, 166.49606305, 143.77359322,
       180.74379283, 152.32139   , 203.04255614, 167.16404067,
       200.39500763, 188.91555189, 179.2993079 , 188.15064408,
       194.99038019, 184.6596727 , 186.95880368, 193.53843276])

In [168]:
print("Test Set R^2: {}".format(lr_22s.score(X_test_scaled, y_test_s)))
rmse_22s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_22s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_22s_test))

Test Set R^2: 0.5517377595582946
Test Set Root Mean Squared Error: 19.17560785246831


In [169]:
filename = 'cancer_lr_22s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_22s, f)

In [170]:
lr_23s = linear_model.ElasticNet(alpha=0.1, l1_ratio=0.5)

In [171]:
lr_23s.fit(X_train_scaled, y_train_s)

ElasticNet(alpha=0.1, copy_X=True, fit_intercept=True, l1_ratio=0.5,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [172]:
y_pred_23s_train = lr_23s.predict(X_train_scaled)
y_pred_23s_train[0:20]

array([196.45767432, 153.84411363, 173.20530248, 203.6544058 ,
       181.46699727, 199.66445285, 152.87305899, 171.48248984,
       176.49029083, 182.37112119, 176.6088619 , 164.14375936,
       172.44288128, 168.66958287, 186.58622279, 178.49598983,
       174.76517453, 178.26887881, 161.22528606, 165.95043945])

In [173]:
print("Training Set R^2: {}".format(lr_23s.score(X_train_scaled, y_train_s)))
rmse_23s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_23s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_23s_train))

Training Set R^2: 0.48141988336351615
Training Set Root Mean Squared Error: 19.81276958985461


In [174]:
y_pred_23s_test = lr_23s.predict(X_test_scaled)
y_pred_23s_test[0:20]

array([193.24585928, 186.84998735, 210.84016281, 163.52404305,
       204.06080114, 161.69460487, 168.41170841, 147.57153718,
       188.03530083, 165.8128915 , 192.12156626, 173.45755483,
       209.07386254, 184.93878712, 178.84381817, 183.21056767,
       196.24725553, 190.55330414, 182.54857487, 192.71497492])

In [175]:
print("Test Set R^2: {}".format(lr_23s.score(X_test_scaled, y_test_s)))
rmse_23s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_23s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_23s_test))

Test Set R^2: 0.4225159514960658
Test Set Root Mean Squared Error: 21.764718635860117


In [176]:
filename = 'cancer_lr_23s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_23s, f)

In [177]:
lr_24s = linear_model.ElasticNet(alpha=1, l1_ratio=0.5)

In [178]:
lr_24s.fit(X_train_scaled, y_train_s)

ElasticNet(alpha=1, copy_X=True, fit_intercept=True, l1_ratio=0.5,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [179]:
y_pred_24s_train = lr_24s.predict(X_train_scaled)
y_pred_24s_train[0:20]

array([186.62103847, 168.24749743, 176.58805952, 187.70297234,
       178.51347297, 188.44437292, 168.16268348, 174.50176272,
       176.8993598 , 179.70303362, 176.81456931, 172.5165359 ,
       176.85824436, 174.2387066 , 180.71191291, 179.97679246,
       176.11710217, 179.86710191, 175.32011083, 173.22949486])

In [180]:
print("Training Set R^2: {}".format(lr_24s.score(X_train_scaled, y_train_s)))
rmse_24s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_24s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_24s_train))

Training Set R^2: 0.24515351710575495
Training Set Root Mean Squared Error: 23.903779630734515


In [181]:
y_pred_24s_test = lr_24s.predict(X_test_scaled)
y_pred_24s_test[0:20]

array([184.04857995, 182.6185709 , 193.67365278, 172.92979104,
       188.33391122, 173.09264083, 176.26921183, 165.00645075,
       183.73648018, 167.7163412 , 181.8673606 , 177.37259798,
       193.96119885, 180.92364743, 177.36639678, 179.17601873,
       186.45638025, 186.8103474 , 178.68646757, 184.16094288])

In [182]:
print("Test Set R^2: {}".format(lr_24s.score(X_test_scaled, y_test_s)))
rmse_24s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_24s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_24s_test))

Test Set R^2: 0.20179174711467218
Test Set Root Mean Squared Error: 25.588284328873076


In [183]:
filename = 'cancer_lr_24s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_24s, f)

In [184]:
lr_25s = linear_model.ElasticNet(alpha=10, l1_ratio=0.5)

In [185]:
lr_25s.fit(X_train_scaled, y_train_s)

ElasticNet(alpha=10, copy_X=True, fit_intercept=True, l1_ratio=0.5,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [186]:
y_pred_25s_train = lr_25s.predict(X_train_scaled)
y_pred_25s_train[0:20]

array([178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151])

In [187]:
print("Training Set R^2: {}".format(lr_25s.score(X_train_scaled, y_train_s)))
rmse_25s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_25s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_25s_train))

Training Set R^2: 0.0
Training Set Root Mean Squared Error: 27.512956306533045


In [188]:
y_pred_25s_test = lr_25s.predict(X_test_scaled)
y_pred_25s_test[0:20]

array([178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151])

In [189]:
print("Test Set R^2: {}".format(lr_25s.score(X_test_scaled, y_test_s)))
rmse_25s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_25s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_25s_test))

Test Set R^2: -0.001131456212104309
Test Set Root Mean Squared Error: 28.656860834509846


In [190]:
filename = 'cancer_lr_25s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_25s, f)

In [191]:
lr_26s = linear_model.ElasticNet(alpha=100, l1_ratio=0.5)

In [192]:
lr_26s.fit(X_train_scaled, y_train_s)

ElasticNet(alpha=100, copy_X=True, fit_intercept=True, l1_ratio=0.5,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [193]:
y_pred_26s_train = lr_26s.predict(X_train_scaled)
y_pred_26s_train[0:20]

array([178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151])

In [194]:
print("Training Set R^2: {}".format(lr_26s.score(X_train_scaled, y_train_s)))
rmse_26s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_26s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_26s_train))

Training Set R^2: 0.0
Training Set Root Mean Squared Error: 27.512956306533045


In [195]:
y_pred_26s_test = lr_26s.predict(X_test_scaled)
y_pred_26s_test[0:20]

array([178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151])

In [196]:
print("Test Set R^2: {}".format(lr_26s.score(X_test_scaled, y_test_s)))
rmse_26s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_26s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_26s_test))

Test Set R^2: -0.001131456212104309
Test Set Root Mean Squared Error: 28.656860834509846


In [197]:
filename = 'cancer_lr_26s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_26s, f)

## Elastic Net with L1 Ratio of 0.75

In [198]:
lr_27s = linear_model.ElasticNet(alpha=0.001, l1_ratio=0.75)

In [199]:
lr_27s.fit(X_train_scaled, y_train_s)



ElasticNet(alpha=0.001, copy_X=True, fit_intercept=True, l1_ratio=0.75,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [200]:
y_pred_27s_train = lr_27s.predict(X_train_scaled)
y_pred_27s_train[0:20]

array([203.45748997, 147.56664616, 169.73043022, 205.50330054,
       180.59790174, 210.53464147, 157.91435672, 163.65829515,
       177.59099826, 181.91576759, 183.72400836, 157.01577456,
       166.5990756 , 149.68984773, 185.63290177, 163.93222733,
       169.82785836, 177.93073012, 155.02797395, 165.80423209])

In [201]:
print("Training Set R^2: {}".format(lr_27s.score(X_train_scaled, y_train_s)))
rmse_27s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_27s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_27s_train))

Training Set R^2: 0.6295987067488629
Training Set Root Mean Squared Error: 16.74455096433371


In [202]:
y_pred_27s_test = lr_27s.predict(X_test_scaled)
y_pred_27s_test[0:20]

array([192.73326893, 195.35135334, 209.5013098 , 172.10122854,
       220.73417512, 154.07963059, 172.13485996, 142.50338721,
       175.87383343, 134.26265195, 207.57696232, 155.66463737,
       191.97807426, 192.72196998, 179.20733799, 191.13574841,
       189.35740848, 178.13419091, 189.31270249, 193.58624686])

In [203]:
print("Test Set R^2: {}".format(lr_27s.score(X_test_scaled, y_test_s)))
rmse_27s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_27s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_27s_test))

Test Set R^2: 0.5701916692170508
Test Set Root Mean Squared Error: 18.776752224659432


In [204]:
filename = 'cancer_lr_27s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_27s, f)

In [205]:
lr_28s = linear_model.ElasticNet(alpha=0.01, l1_ratio=0.75)

In [206]:
lr_28s.fit(X_train_scaled, y_train_s)

ElasticNet(alpha=0.01, copy_X=True, fit_intercept=True, l1_ratio=0.75,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [207]:
y_pred_28s_train = lr_28s.predict(X_train_scaled)
y_pred_28s_train[0:20]

array([202.13038764, 149.4849316 , 170.46834092, 207.17949959,
       182.42491719, 208.67584148, 155.02875669, 166.16182974,
       178.14035755, 181.61429124, 181.7233686 , 160.21530874,
       166.91021593, 157.77685194, 187.82121414, 167.6689607 ,
       171.71978017, 177.01251035, 154.9330748 , 166.13251823])

In [208]:
print("Training Set R^2: {}".format(lr_28s.score(X_train_scaled, y_train_s)))
rmse_28s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_28s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_28s_train))

Training Set R^2: 0.6027946730318808
Training Set Root Mean Squared Error: 17.33982821325987


In [209]:
y_pred_28s_test = lr_28s.predict(X_test_scaled)
y_pred_28s_test[0:20]

array([193.33883058, 192.0564998 , 211.34990388, 167.64380178,
       217.35785596, 155.79125397, 167.33469672, 143.10094694,
       179.00122118, 146.68746504, 204.93011502, 166.64966106,
       198.0652912 , 190.98054538, 179.04674762, 188.91999059,
       194.03406375, 182.66722651, 187.3124315 , 193.89613765])

In [210]:
print("Test Set R^2: {}".format(lr_28s.score(X_test_scaled, y_test_s)))
rmse_28s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_28s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_28s_test))

Test Set R^2: 0.5647308541111382
Test Set Root Mean Squared Error: 18.895657246290014


In [211]:
filename = 'cancer_lr_28s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_28s, f)

In [212]:
lr_29s = linear_model.ElasticNet(alpha=0.1, l1_ratio=0.75)

In [213]:
lr_29s.fit(X_train_scaled, y_train_s)

ElasticNet(alpha=0.1, copy_X=True, fit_intercept=True, l1_ratio=0.75,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [214]:
y_pred_29s_train = lr_29s.predict(X_train_scaled)
y_pred_29s_train[0:20]

array([196.33547144, 153.44303812, 172.88800808, 205.14304382,
       179.82683833, 200.69797097, 152.43320856, 171.89530022,
       176.95132294, 181.09837998, 178.0560797 , 163.14694442,
       172.5272481 , 168.07190042, 187.48014915, 177.0972029 ,
       174.51442224, 177.80051635, 159.25535026, 165.36817695])

In [215]:
print("Training Set R^2: {}".format(lr_29s.score(X_train_scaled, y_train_s)))
rmse_29s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_29s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_29s_train))

Training Set R^2: 0.5051485385136945
Training Set Root Mean Squared Error: 19.35417604588162


In [216]:
y_pred_29s_test = lr_29s.predict(X_test_scaled)
y_pred_29s_test[0:20]

array([193.14881379, 188.08091328, 211.92086979, 163.49916471,
       203.97346063, 160.4983305 , 165.7785689 , 147.45167242,
       187.1500157 , 164.8863221 , 192.28984537, 172.4103628 ,
       208.68642388, 185.52562916, 178.41954041, 183.69230115,
       197.66349994, 189.15572678, 183.2526225 , 193.34912638])

In [217]:
print("Test Set R^2: {}".format(lr_29s.score(X_test_scaled, y_test_s)))
rmse_29s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_29s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_29s_test))

Test Set R^2: 0.45492299424631866
Test Set Root Mean Squared Error: 21.145209468962772


In [218]:
filename = 'cancer_lr_29s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_29s, f)

In [219]:
lr_30s = linear_model.ElasticNet(alpha=1, l1_ratio=0.75)

In [220]:
lr_30s.fit(X_train_scaled, y_train_s)

ElasticNet(alpha=1, copy_X=True, fit_intercept=True, l1_ratio=0.75,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [221]:
y_pred_30s_train = lr_30s.predict(X_train_scaled)
y_pred_30s_train[0:20]

array([187.34356282, 168.82503233, 177.77323033, 187.84508203,
       177.04148432, 190.77573609, 166.29360753, 174.82129245,
       176.6689854 , 179.40700281, 177.87813696, 174.23101481,
       177.18034278, 174.53411923, 180.47526774, 179.92469821,
       175.47652278, 179.08923623, 174.06423295, 171.97961559])

In [222]:
print("Training Set R^2: {}".format(lr_30s.score(X_train_scaled, y_train_s)))
rmse_30s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_30s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_30s_train))

Training Set R^2: 0.25049793742178184
Training Set Root Mean Squared Error: 23.819008238305077


In [223]:
y_pred_30s_test = lr_30s.predict(X_test_scaled)
y_pred_30s_test[0:20]

array([183.58915943, 183.01724314, 194.53553804, 173.71538212,
       188.67421286, 171.59289061, 174.31658101, 164.00675307,
       183.44840565, 170.44530275, 180.53440498, 176.07273281,
       194.03437084, 180.76101717, 177.1190198 , 178.7508666 ,
       185.32544664, 186.75438634, 177.8295776 , 183.22683743])

In [224]:
print("Test Set R^2: {}".format(lr_30s.score(X_test_scaled, y_test_s)))
rmse_30s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_30s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_30s_test))

Test Set R^2: 0.20477116595288125
Test Set Root Mean Squared Error: 25.540483838521595


In [225]:
filename = 'cancer_lr_30s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_30s, f)

In [226]:
lr_31s = linear_model.ElasticNet(alpha=10, l1_ratio=0.75)

In [227]:
lr_31s.fit(X_train_scaled, y_train_s)

ElasticNet(alpha=10, copy_X=True, fit_intercept=True, l1_ratio=0.75,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [228]:
y_pred_31s_train = lr_31s.predict(X_train_scaled)
y_pred_31s_train[0:20]

array([178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151])

In [229]:
print("Training Set R^2: {}".format(lr_31s.score(X_train_scaled, y_train_s)))
rmse_31s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_31s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_31s_train))

Training Set R^2: 0.0
Training Set Root Mean Squared Error: 27.512956306533045


In [230]:
y_pred_31s_test = lr_31s.predict(X_test_scaled)
y_pred_31s_test[0:20]

array([178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151])

In [231]:
print("Test Set R^2: {}".format(lr_31s.score(X_test_scaled, y_test_s)))
rmse_31s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_31s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_31s_test))

Test Set R^2: -0.001131456212104309
Test Set Root Mean Squared Error: 28.656860834509846
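
The identical predictions (178.85851151 for every county) and a training R^2 of exactly 0.0 strongly suggest that at alpha=10 the penalty has shrunk every coefficient to zero, leaving only the intercept, which is then the training-set mean of the target. A mean-only predictor scores R^2 = 0 on the data whose mean it uses and falls slightly below zero on any other sample whose mean differs, which is consistent with the -0.0011 on the test set. The sketch below illustrates this with made-up numbers; `r_squared`, `y_train_demo`, and `y_test_demo` are hypothetical names and values, not taken from the notebook.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up mortality rates, for illustration only.
y_train_demo = [150.0, 170.0, 180.0, 200.0]
y_test_demo = [160.0, 175.0, 190.0, 205.0]

# A model whose coefficients are all zero predicts the training mean everywhere.
train_mean = sum(y_train_demo) / len(y_train_demo)

r2_train = r_squared(y_train_demo, [train_mean] * len(y_train_demo))
r2_test = r_squared(y_test_demo, [train_mean] * len(y_test_demo))

print("Training R^2:", r2_train)  # zero: SS_res equals SS_tot
print("Test R^2:", r2_test)       # negative: the test mean differs from the training mean
```

In the notebook itself, `np.count_nonzero(lr_31s.coef_)` would show how many coefficients survive the penalty.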


In [232]:
filename = 'cancer_lr_31s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_31s, f)

In [233]:
lr_32s = linear_model.ElasticNet(alpha=100, l1_ratio=0.75, normalize=True)

In [234]:
lr_32s.fit(X_train_scaled, y_train_s)

ElasticNet(alpha=100, copy_X=True, fit_intercept=True, l1_ratio=0.75,
      max_iter=1000, normalize=True, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [235]:
y_pred_32s_train = lr_32s.predict(X_train_scaled)
y_pred_32s_train[0:20]

array([178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151])

In [236]:
print("Training Set R^2: {}".format(lr_32s.score(X_train_scaled, y_train_s)))
rmse_32s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_32s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_32s_train))

Training Set R^2: 0.0
Training Set Root Mean Squared Error: 27.512956306533045


In [237]:
y_pred_32s_test = lr_32s.predict(X_test_scaled)
y_pred_32s_test[0:20]

array([178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151,
       178.85851151, 178.85851151, 178.85851151, 178.85851151])

In [238]:
print("Test Set R^2: {}".format(lr_32s.score(X_test_scaled, y_test_s)))
rmse_32s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_32s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_32s_test))

Test Set R^2: -0.001131456212104309
Test Set Root Mean Squared Error: 28.656860834509846


In [239]:
filename = 'cancer_lr_32s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_32s, f)

## Stochastic Gradient Descent with L2 Penalty (Ridge Regression)

In [240]:
lr_33s = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.00001, random_state=42)

In [241]:
lr_33s.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [242]:
y_pred_33s_train = lr_33s.predict(X_train_scaled)
y_pred_33s_train[0:20]

array([56.6241195 , 53.86045923, 51.42825635, 51.49415511, 52.16946793,
       56.14912863, 49.70013916, 59.63721479, 49.94974509, 49.93793021,
       53.25755755, 53.3829297 , 49.51707274, 50.59398765, 52.156329  ,
       50.46635012, 55.65336984, 50.52079978, 49.27595255, 46.96058755])

In [243]:
print("Training Set R^2: {}".format(lr_33s.score(X_train_scaled, y_train_s)))
rmse_33s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_33s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_33s_train))

Training Set R^2: -20.842733289492315
Training Set Root Mean Squared Error: 128.585130477708
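
The strongly negative R^2 here looks like underfitting rather than an effect of the L2 penalty: the predictions sit near 50 while the target averages about 179. One plausible mechanism, offered as an assumption about this run rather than something the output proves, is that with `loss='huber'` and `epsilon=0.1` the gradient of any residual larger than 0.1 is clipped to 0.1, so each update can move the model only a tiny distance, and the printed `max_iter=None, tol=None` meant a small fixed number of passes in scikit-learn releases of that era (five, as far as I know). Note also that with Huber loss the objective is not that of Ridge Regression, even with `penalty='l2'`. The library-free sketch below isolates the effect on the intercept alone; the sample count of 2,000, the helper name `sgd_intercept_only`, and the constant target are illustrative assumptions.

```python
def sgd_intercept_only(target, epsilon, n_steps, eta0=0.01, power_t=0.25,
                       schedule="invscaling"):
    """Run SGD on a single intercept with a Huber-style clipped gradient.

    This is a simplified stand-in for SGDRegressor, not its implementation:
    it has no features, no penalty, and one constant target value.
    """
    intercept = 0.0
    for t in range(1, n_steps + 1):
        # 'invscaling' shrinks the step as eta0 / t**power_t; 'constant' keeps eta0.
        eta = eta0 / (t ** power_t) if schedule == "invscaling" else eta0
        residual = target - intercept
        # Huber gradient: equal to the residual inside [-epsilon, epsilon], clipped outside.
        clipped = max(-epsilon, min(epsilon, residual))
        intercept += eta * clipped
    return intercept


TARGET = 178.86          # roughly the training mean seen in the outputs above
N_STEPS = 5 * 2000       # assumed: 5 passes over about 2,000 training rows

b_invscaling = sgd_intercept_only(TARGET, epsilon=0.1, n_steps=N_STEPS)
b_constant = sgd_intercept_only(TARGET, epsilon=0.1, n_steps=N_STEPS,
                                schedule="constant")
b_wide = sgd_intercept_only(TARGET, epsilon=1.0, n_steps=N_STEPS,
                            schedule="constant")

print("invscaling, epsilon=0.1:", round(b_invscaling, 2))  # stays well under 2
print("constant,   epsilon=0.1:", round(b_constant, 2))    # roughly 10
print("constant,   epsilon=1.0:", round(b_wide, 2))        # roughly 100
```

In every case the intercept is still far from the target after the assumed five passes, and the decaying step size is the slowest of the three. The usual levers are more iterations, a larger `epsilon`, a different learning-rate schedule, or a standardized target; the later cells in this section vary `learning_rate` and `epsilon` along these lines.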


In [244]:
y_pred_33s_test = lr_33s.predict(X_test_scaled)
y_pred_33s_test[0:20]

array([49.5928203 , 49.81344182, 56.75170416, 52.62717543, 55.50688568,
       48.13688581, 50.62767921, 54.22303393, 52.85366783, 67.40544747,
       50.27251257, 47.20907954, 48.10454173, 52.76661863, 47.65942886,
       53.14176286, 51.17824084, 54.38364019, 50.89187593, 52.15331657])

In [245]:
print("Test Set R^2: {}".format(lr_33s.score(X_test_scaled, y_test_s)))
rmse_33s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_33s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_33s_test))

Test Set R^2: -18.938878242454603
Test Set Root Mean Squared Error: 127.88906775737071


In [246]:
filename = 'cancer_lr_33s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_33s, f)

In [247]:
lr_34s = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.0001, random_state=42)

In [248]:
lr_34s.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=0.0001, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [249]:
y_pred_34s_train = lr_34s.predict(X_train_scaled)
y_pred_34s_train[0:20]

array([56.58588102, 53.82415689, 51.39363111, 51.45948176, 52.13433484,
       56.11121452, 49.66671914, 59.59688626, 49.91615848, 49.9043462 ,
       53.22166059, 53.34694467, 49.48377733, 50.55994746, 52.1212083 ,
       50.43238865, 55.61581137, 50.48680272, 49.24283739, 46.92906538])

In [250]:
print("Training Set R^2: {}".format(lr_34s.score(X_train_scaled, y_train_s)))
rmse_34s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_34s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_34s_train))

Training Set R^2: -20.85464238938476
Training Set Root Mean Squared Error: 128.62017930771657


In [251]:
y_pred_34s_test = lr_34s.predict(X_test_scaled)
y_pred_34s_test[0:20]

array([49.55946723, 49.77994739, 56.71338617, 52.59173129, 55.46942169,
       48.10454916, 50.59361564, 54.18648374, 52.81805639, 67.35974161,
       50.23869126, 47.17738392, 48.07221911, 52.73107145, 47.62742626,
       53.10594829, 51.1437913 , 54.34695516, 50.85761774, 52.11819366])

In [252]:
print("Test Set R^2: {}".format(lr_34s.score(X_test_scaled, y_test_s)))
rmse_34s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_34s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_34s_test))

Test Set R^2: -18.94978340653399
Test Set Root Mean Squared Error: 127.92403613946794


In [253]:
filename = 'cancer_lr_34s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_34s, f)

In [254]:
lr_35s = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.001, random_state=42)

In [255]:
lr_35s.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=0.001, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [256]:
y_pred_35s_train = lr_35s.predict(X_train_scaled)
y_pred_35s_train[0:20]

array([56.2054368 , 53.46297477, 51.04913523, 51.11450757, 51.78478613,
       55.73399775, 49.33421415, 59.19564744, 49.58199567, 49.57020953,
       52.86451223, 52.98892027, 49.15251214, 50.22127225, 51.77178253,
       50.09449718, 55.24213231, 50.14855696, 48.91336498, 46.61544276])

In [257]:
print("Training Set R^2: {}".format(lr_35s.score(X_train_scaled, y_train_s)))
rmse_35s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_35s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_35s_train))

Training Set R^2: -20.973314819820978
Training Set Root Mean Squared Error: 128.96891538734758


In [258]:
y_pred_35s_test = lr_35s.predict(X_test_scaled)
y_pred_35s_test[0:20]

array([49.22762876, 49.44670184, 56.33215034, 52.2390873 , 55.09668309,
       47.78282294, 50.25470785, 53.82283541, 52.46374843, 66.90500176,
       49.90219391, 46.8620356 , 47.75063313, 52.37740272, 47.30902348,
       52.74961953, 50.80104369, 53.98196672, 50.51677437, 51.76874624])

In [259]:
print("Test Set R^2: {}".format(lr_35s.score(X_test_scaled, y_test_s)))
rmse_35s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_35s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_35s_test))

Test Set R^2: -19.058453146186928
Test Set Root Mean Squared Error: 128.27197455829867


In [260]:
filename = 'cancer_lr_35s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_35s, f)

In [261]:
lr_36s = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.01, random_state=42)

In [262]:
lr_36s.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=0.01, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [263]:
y_pred_36s_train = lr_36s.predict(X_train_scaled)
y_pred_36s_train[0:20]

array([52.58779044, 50.02839115, 47.77326329, 47.83411146, 48.46085172,
       52.14706466, 46.17234382, 55.38024964, 46.4043267 , 46.39281884,
       49.46833804, 49.58443181, 46.00243747, 47.00073211, 48.44898775,
       46.88145234, 51.68877298, 46.93213099, 45.78028269, 43.63314746])

In [264]:
print("Training Set R^2: {}".format(lr_36s.score(X_train_scaled, y_train_s)))
rmse_36s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_36s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_36s_train))

Training Set R^2: -22.118644745046833
Training Set Root Mean Squared Error: 132.28738882792373


In [265]:
y_pred_36s_test = lr_36s.predict(X_test_scaled)
y_pred_36s_test[0:20]

array([46.07213628, 46.27777152, 52.70693062, 48.88567156, 51.55231533,
       44.72345285, 47.03195551, 50.36477791, 49.09455852, 62.58080367,
       46.70237885, 43.86332679, 44.69265411, 49.01427686, 44.28124512,
       49.36122834, 47.54180034, 50.51130025, 47.27568524, 48.44576764])

In [266]:
print("Test Set R^2: {}".format(lr_36s.score(X_test_scaled, y_test_s)))
rmse_36s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_36s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_36s_test))

Test Set R^2: -20.107370558488515
Test Set Root Mean Squared Error: 131.58310436747848


In [267]:
filename = 'cancer_lr_36s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_36s, f)

In [268]:
lr_37s = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.1, random_state=42)

In [269]:
lr_37s.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=0.1, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [270]:
y_pred_37s_train = lr_37s.predict(X_train_scaled)
y_pred_37s_train[0:20]

array([29.60850761, 28.20701461, 26.96206884, 26.99541851, 27.34355193,
       29.36415055, 26.08434909, 31.14411014, 26.2141614 , 26.20591514,
       27.89347249, 27.95753978, 25.98962866, 26.54023837, 27.33748082,
       26.47065212, 29.11566604, 26.49923921, 25.87267599, 24.68693125])

In [271]:
print("Training Set R^2: {}".format(lr_37s.score(X_train_scaled, y_train_s)))
rmse_37s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_37s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_37s_train))

Training Set R^2: -30.107039895282966
Training Set Root Mean Squared Error: 153.4498971050498


In [272]:
y_pred_37s_test = lr_37s.predict(X_test_scaled)
y_pred_37s_test[0:20]

array([26.02657413, 26.14392958, 29.67727137, 27.57871023, 29.03880455,
       25.2867021 , 26.55728478, 28.39310474, 27.68986111, 35.10980515,
       26.37417674, 24.81265385, 25.26767377, 27.64734118, 25.04465415,
       27.83531466, 26.83644513, 28.46625765, 26.6878887 , 27.33418911])

In [273]:
print("Test Set R^2: {}".format(lr_37s.score(X_test_scaled, y_test_s)))
rmse_37s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_37s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_37s_test))

Test Set R^2: -27.428542007630302
Test Set Root Mean Squared Error: 152.7074954654062


In [274]:
filename = 'cancer_lr_37s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_37s, f)

In [275]:
lr_38s = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=1, random_state=42)

In [276]:
lr_38s.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=1, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [277]:
y_pred_38s_train = lr_38s.predict(X_train_scaled)
y_pred_38s_train[0:20]

array([5.10783921, 4.93804057, 4.77380154, 4.77832755, 4.82492538,
       5.07827229, 4.66581271, 5.30557368, 4.68019449, 4.6790049 ,
       4.89166705, 4.89943015, 4.64964928, 4.72491711, 4.82213021,
       4.71152262, 5.04585428, 4.71474233, 4.63834349, 4.4860846 ])

In [278]:
print("Training Set R^2: {}".format(lr_38s.score(X_train_scaled, y_train_s)))
rmse_38s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_38s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_38s_train))

Training Set R^2: -39.980697494234526
Training Set Root Mean Squared Error: 176.127402966156


In [279]:
y_pred_38s_test = lr_38s.predict(X_test_scaled)
y_pred_38s_test[0:20]

array([4.65401884, 4.6698669 , 5.1205736 , 4.85417627, 5.03649628,
       4.56026974, 4.72383999, 4.96020224, 4.86506103, 5.81500433,
       4.69727377, 4.5003984 , 4.55767004, 4.86083686, 4.53116244,
       4.88474589, 4.75825195, 4.96389713, 4.73715939, 4.8204353 ])

In [280]:
print("Test Set R^2: {}".format(lr_38s.score(X_test_scaled, y_test_s)))
rmse_38s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_38s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_38s_test))

Test Set R^2: -36.48659816150742
Test Set Root Mean Squared Error: 175.3561800813042


In [281]:
filename = 'cancer_lr_38s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_38s, f)

In [282]:
lr_39s = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=10, random_state=42)

In [283]:
lr_39s.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=10, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [284]:
y_pred_39s_train = lr_39s.predict(X_train_scaled)
y_pred_39s_train[0:20]

array([1.89462136, 1.88397568, 1.86556755, 1.86592954, 1.87090127,
       1.89056118, 1.85652939, 1.91563549, 1.85695293, 1.85644544,
       1.877047  , 1.87596407, 1.85361964, 1.86196309, 1.87082975,
       1.8589472 , 1.89015803, 1.85940087, 1.8521145 , 1.83730599])

In [285]:
print("Training Set R^2: {}".format(lr_39s.score(X_train_scaled, y_train_s)))
rmse_39s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_39s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_39s_train))

Training Set R^2: -41.37934178409891
Training Set Root Mean Squared Error: 179.10774333938093


In [286]:
y_pred_39s_test = lr_39s.predict(X_test_scaled)
y_pred_39s_test[0:20]

array([1.85236487, 1.85510996, 1.89959722, 1.87479791, 1.88745761,
       1.84466899, 1.86019279, 1.88592847, 1.8735895 , 1.97009012,
       1.85755377, 1.83802673, 1.84237937, 1.87396185, 1.8417202 ,
       1.87722757, 1.8640267 , 1.87970494, 1.85947278, 1.87030786])

In [287]:
print("Test Set R^2: {}".format(lr_39s.score(X_test_scaled, y_test_s)))
rmse_39s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_39s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_39s_test))

Test Set R^2: -37.770302784032445
Test Set Root Mean Squared Error: 178.33338671392406


In [288]:
filename = 'cancer_lr_39s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_39s, f)

In [289]:
lr_40s = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=100, random_state=42)

In [290]:
lr_40s.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=100, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='huber', max_iter=None,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=42, shuffle=True, tol=None, validation_fraction=0.1,
       verbose=0, warm_start=False)

In [291]:
y_pred_40s_train = lr_40s.predict(X_train_scaled)
y_pred_40s_train[0:20]

array([1.57965911, 1.57640227, 1.57490663, 1.57648073, 1.5756532 ,
       1.5787565 , 1.57386527, 1.58079238, 1.57418523, 1.57450265,
       1.57602673, 1.57676935, 1.57407759, 1.57489424, 1.57564942,
       1.57450916, 1.57829748, 1.57463812, 1.5736695 , 1.57270253])

In [292]:
print("Training Set R^2: {}".format(lr_40s.score(X_train_scaled, y_train_s)))
rmse_40s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_40s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_40s_train))

Training Set R^2: -41.519635785382626
Training Set Root Mean Squared Error: 179.40396054494346


In [293]:
y_pred_40s_test = lr_40s.predict(X_test_scaled)
y_pred_40s_test[0:20]

array([1.5739949 , 1.57531806, 1.57955582, 1.57584766, 1.5786258 ,
       1.57320019, 1.57478853, 1.57690807, 1.5762023 , 1.5865057 ,
       1.57445441, 1.57272291, 1.57444294, 1.57606241, 1.57354476,
       1.57637684, 1.57601556, 1.57806824, 1.57641604, 1.57653847])

In [294]:
print("Test Set R^2: {}".format(lr_40s.score(X_test_scaled, y_test_s)))
rmse_40s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_40s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_40s_test))

Test Set R^2: -37.89909900180667
Test Set Root Mean Squared Error: 178.6293557935772


In [295]:
filename = 'cancer_lr_40s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_40s, f)

In [296]:
lr_41s = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.00001, random_state=42, 
                                   learning_rate='constant')

In [297]:
lr_41s.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='constant', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [298]:
y_pred_41s_train = lr_41s.predict(X_train_scaled)
y_pred_41s_train[0:20]

array([189.00788908, 175.80044257, 172.84200501, 177.14673652,
       175.83016866, 187.13164955, 163.74452818, 192.35251382,
       168.63563193, 170.08719929, 177.7283409 , 173.68036617,
       168.44064808, 167.0178002 , 177.08235147, 171.08730687,
       183.79717056, 172.41022837, 164.47092135, 159.47242606])

In [299]:
print("Training Set R^2: {}".format(lr_41s.score(X_train_scaled, y_train_s)))
rmse_41s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_41s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_41s_train))

Training Set R^2: 0.0869860668793655
Training Set Root Mean Squared Error: 26.28911468740667


In [300]:
y_pred_41s_test = lr_41s.predict(X_test_scaled)
y_pred_41s_test[0:20]

array([169.46883014, 170.96145337, 193.63849543, 174.87955842,
       187.39928314, 163.14123799, 170.86849231, 175.55555965,
       180.70542248, 214.18646212, 172.31201342, 161.0961076 ,
       167.24510349, 179.78933367, 162.61591375, 180.43393624,
       175.99694751, 183.43312736, 170.25848705, 178.87037902])

In [301]:
print("Test Set R^2: {}".format(lr_41s.score(X_test_scaled, y_test_s)))
rmse_41s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_41s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_41s_test))

Test Set R^2: 0.08194950445573113
Test Set Root Mean Squared Error: 27.442037003924593


In [302]:
filename = 'cancer_lr_41s.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_41s, f)

In [303]:
lr_41s2 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.0001, random_state=42, 
                                    learning_rate='constant')

In [304]:
lr_41s2.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=0.0001, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='constant', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [305]:
y_pred_41s2_train = lr_41s2.predict(X_train_scaled)
y_pred_41s2_train[0:20]

array([188.87915728, 175.7134404 , 172.7287393 , 177.00547351,
       175.72009474, 187.00897801, 163.65656186, 192.24421485,
       168.53094977, 169.96817371, 177.61603074, 173.59119917,
       168.3347908 , 166.92435758, 176.95110812, 170.98237872,
       183.67663523, 172.29229794, 164.35182859, 159.37150823])

In [306]:
print("Training Set R^2: {}".format(lr_41s2.score(X_train_scaled, y_train_s)))
rmse_41s2_train = np.sqrt(mean_squared_error(y_train_s, y_pred_41s2_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_41s2_train))

Training Set R^2: 0.08625441872467643
Training Set Root Mean Squared Error: 26.29964603293361


In [307]:
y_pred_41s2_test = lr_41s2.predict(X_test_scaled)
y_pred_41s2_test[0:20]

array([169.35145383, 170.82348958, 193.48626713, 174.7697625 ,
       187.26538642, 163.02528035, 170.75904059, 175.46130919,
       180.56394131, 214.05993743, 172.19031116, 160.98935365,
       167.09771106, 179.66132838, 162.4969326 , 180.32634629,
       175.86140194, 183.29855206, 170.1575938 , 178.72842372])

In [308]:
print("Test Set R^2: {}".format(lr_41s2.score(X_test_scaled, y_test_s)))
rmse_41s2_test = np.sqrt(mean_squared_error(y_test_s, y_pred_41s2_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_41s2_test))

Test Set R^2: 0.08155710218058232
Test Set Root Mean Squared Error: 27.447901151581934


In [309]:
filename = 'cancer_lr_41s2.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_41s2, f)

In [310]:
lr_41s3 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.001, random_state=42, 
                                    learning_rate='constant')

In [311]:
lr_41s3.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=0.001, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='constant', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [312]:
y_pred_41s3_train = lr_41s3.predict(X_train_scaled)
y_pred_41s3_train[0:20]

array([187.79217334, 175.16118114, 171.81128817, 175.88635229,
       174.86729337, 185.88902679, 163.04357263, 191.40572382,
       167.68004923, 169.01316974, 176.70887348, 172.77081492,
       167.44828332, 166.20537042, 175.98015985, 170.02886243,
       182.72393269, 171.30153153, 163.59465625, 158.58515054])

In [313]:
print("Training Set R^2: {}".format(lr_41s3.score(X_train_scaled, y_train_s)))
rmse_41s3_train = np.sqrt(mean_squared_error(y_train_s, y_pred_41s3_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_41s3_train))

Training Set R^2: 0.07874592264980462
Training Set Root Mean Squared Error: 26.407480633437256


In [314]:
y_pred_41s3_test = lr_41s3.predict(X_test_scaled)
y_pred_41s3_test[0:20]

array([168.33801231, 169.81694206, 192.29855036, 174.01915209,
       186.07843064, 162.17749992, 169.95286722, 174.92580551,
       179.49208048, 213.18192435, 171.19922378, 160.15340553,
       165.89222532, 178.6458459 , 161.57482953, 179.32470995,
       174.79737168, 182.15452347, 169.23766865, 177.66723791])

In [315]:
print("Test Set R^2: {}".format(lr_41s3.score(X_test_scaled, y_test_s)))
rmse_41s3_test = np.sqrt(mean_squared_error(y_test_s, y_pred_41s3_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_41s3_test))

Test Set R^2: 0.07667783383591076
Test Set Root Mean Squared Error: 27.520713666619233


In [316]:
filename = 'cancer_lr_41s3.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_41s3, f)

In [317]:
lr_41s4 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.01, random_state=42, 
                                    learning_rate='constant')

In [318]:
lr_41s4.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=0.01, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='constant', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [319]:
y_pred_41s4_train = lr_41s4.predict(X_train_scaled)
y_pred_41s4_train[0:20]

array([173.95237506, 164.18154547, 159.18937642, 161.40444692,
       162.09014144, 172.08138327, 152.51617134, 179.05003364,
       155.54437689, 156.15384462, 164.06816775, 161.59951631,
       154.92824304, 155.01583655, 162.57116778, 157.14871449,
       170.04695884, 157.96970732, 152.27942089, 147.00222809])

In [320]:
print("Training Set R^2: {}".format(lr_41s4.score(X_train_scaled, y_train_s)))
rmse_41s4_train = np.sqrt(mean_squared_error(y_train_s, y_pred_41s4_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_41s4_train))

Training Set R^2: -0.234071454232067
Training Set Root Mean Squared Error: 30.56380440756597


In [321]:
y_pred_41s4_test = lr_41s4.predict(X_test_scaled)
y_pred_41s4_test[0:20]

array([155.08944734, 156.3652976 , 176.75058155, 162.10364986,
       171.81041006, 150.34396356, 157.66989686, 164.47876877,
       165.21230186, 200.20086342, 157.61750211, 148.11099271,
       151.69843189, 164.80588235, 149.22083621, 165.53863896,
       160.64326464, 168.28657602, 157.19885755, 163.36976231])

In [322]:
print("Test Set R^2: {}".format(lr_41s4.score(X_test_scaled, y_test_s)))
rmse_41s4_test = np.sqrt(mean_squared_error(y_test_s, y_pred_41s4_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_41s4_test))

Test Set R^2: -0.1845885919064285
Test Set Root Mean Squared Error: 31.17215548992959


In [323]:
filename = 'cancer_lr_41s4.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_41s4, f)

In [324]:
lr_41s5 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.00001, epsilon= 0.01, random_state=42, 
                                    learning_rate='constant')

In [325]:
lr_41s5.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=0.01,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='constant', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [326]:
y_pred_41s5_train = lr_41s5.predict(X_train_scaled)
y_pred_41s5_train[0:20]

array([44.58490892, 42.42423536, 40.49880117, 40.54877445, 41.08759639,
       44.20647191, 39.14415016, 46.95833743, 39.34586283, 39.33088572,
       41.9378157 , 42.03561499, 38.99863133, 39.84749177, 41.08060212,
       39.7390055 , 43.82495525, 39.78411365, 38.81859806, 36.98413274])

In [327]:
print("Training Set R^2: {}".format(lr_41s5.score(X_train_scaled, y_train_s)))
rmse_41s5_train = np.sqrt(mean_squared_error(y_train_s, y_pred_41s5_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_41s5_train))

Training Set R^2: -24.766489044999997
Training Set Root Mean Squared Error: 139.65769862332934


In [328]:
y_pred_41s5_test = lr_41s5.predict(X_test_scaled)
y_pred_41s5_test[0:20]

array([39.05274253, 39.2359871 , 44.69235525, 41.45417934, 43.7034828 ,
       37.91195641, 39.87449118, 42.71318742, 41.6251052 , 53.08660797,
       39.59069733, 37.17907686, 37.87816582, 41.55929063, 37.53658148,
       41.84959176, 40.30430414, 42.81976984, 40.07175687, 41.07486175])

In [329]:
print("Test Set R^2: {}".format(lr_41s5.score(X_test_scaled, y_test_s)))
rmse_41s5_test = np.sqrt(mean_squared_error(y_test_s, y_pred_41s5_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_41s5_test))

Test Set R^2: -22.533040219189918
Test Set Root Mean Squared Error: 138.938331595043


In [330]:
filename = 'cancer_lr_41s5.sav'
with open(filename, 'wb') as f:
    pickle.dump(lr_41s5, f)

In [331]:
lr_41s6 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.00001, epsilon=1, random_state=42, 
                                    learning_rate='constant')

In [332]:
lr_41s6.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='constant', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [333]:
y_pred_41s6_train = lr_41s6.predict(X_train_scaled)
y_pred_41s6_train[0:20]

array([196.18182301, 152.61544007, 173.66795503, 200.3476208 ,
       176.22921876, 193.26110174, 149.05750317, 170.02270975,
       171.92156287, 179.60373832, 175.85073467, 159.58509667,
       173.07216926, 166.20412604, 183.6362625 , 177.06819374,
       175.01152875, 181.28460373, 159.58909788, 162.06143811])

In [334]:
print("Training Set R^2: {}".format(lr_41s6.score(X_train_scaled, y_train_s)))
rmse_41s6_train = np.sqrt(mean_squared_error(y_train_s, y_pred_41s6_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_41s6_train))

Training Set R^2: 0.4206135388212725
Training Set Root Mean Squared Error: 20.942157899753244


In [335]:
y_pred_41s6_test = lr_41s6.predict(X_test_scaled)
y_pred_41s6_test[0:20]

array([186.03651341, 188.37259424, 213.38207229, 162.27801866,
       200.94374487, 163.63153635, 169.61415951, 146.61901694,
       193.19095413, 179.93044518, 187.57943826, 168.31514081,
       203.28585733, 188.1105665 , 175.15485259, 185.19833949,
       194.73443578, 192.33126724, 177.01663535, 196.14257988])

In [336]:
print("Test Set R^2: {}".format(lr_41s6.score(X_test_scaled, y_test_s)))
rmse_41s6_test = np.sqrt(mean_squared_error(y_test_s, y_pred_41s6_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_41s6_test))

Test Set R^2: 0.3809118166931488
Test Set Root Mean Squared Error: 22.535091108061813


In [337]:
filename = 'cancer_lr_41s6.sav'
with open(filename, 'wb') as f:  # close the file handle once the model is written
    pickle.dump(lr_41s6, f)

In [338]:
# Same configuration with epsilon raised to 2
lr_41s7 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.00001, epsilon=2, random_state=42,
                                    learning_rate='constant')

In [339]:
lr_41s7.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=2,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='constant', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [340]:
y_pred_41s7_train = lr_41s7.predict(X_train_scaled)
y_pred_41s7_train[0:20]

array([198.90097819, 150.51356069, 172.61810509, 202.23454174,
       174.69494165, 195.24695359, 148.08899452, 167.33357459,
       173.46076044, 180.22306508, 174.74931923, 159.99922885,
       172.88734713, 169.91081714, 184.06997724, 175.72594846,
       173.17294702, 179.0313156 , 158.68819296, 162.04702088])

In [341]:
print("Training Set R^2: {}".format(lr_41s7.score(X_train_scaled, y_train_s)))
rmse_41s7_train = np.sqrt(mean_squared_error(y_train_s, y_pred_41s7_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_41s7_train))

Training Set R^2: 0.46165075816685297
Training Set Root Mean Squared Error: 20.186885111025234


In [342]:
y_pred_41s7_test = lr_41s7.predict(X_test_scaled)
y_pred_41s7_test[0:20]

array([187.47785389, 191.95091154, 213.18910252, 159.23409623,
       203.67791188, 160.77287821, 167.0465279 , 144.06030419,
       190.95425565, 180.01173446, 189.38757348, 168.59947973,
       207.94775386, 187.62375474, 178.51720319, 183.30556652,
       195.1888474 , 190.18958358, 182.29590535, 195.59130865])

In [343]:
print("Test Set R^2: {}".format(lr_41s7.score(X_test_scaled, y_test_s)))
rmse_41s7_test = np.sqrt(mean_squared_error(y_test_s, y_pred_41s7_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_41s7_test))

Test Set R^2: 0.4130132037537061
Test Set Root Mean Squared Error: 21.94306184183735


In [344]:
filename = 'cancer_lr_41s7.sav'
with open(filename, 'wb') as f:  # close the file handle once the model is written
    pickle.dump(lr_41s7, f)

In [345]:
# Same configuration with epsilon raised to 4
lr_41s8 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.00001, epsilon=4, random_state=42,
                                    learning_rate='constant')

In [346]:
lr_41s8.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=4,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='constant', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [347]:
y_pred_41s8_train = lr_41s8.predict(X_train_scaled)
y_pred_41s8_train[0:20]

array([200.20357804, 150.01196937, 171.22852052, 204.10369979,
       176.27434022, 199.21978753, 148.50743242, 166.24451359,
       174.67746602, 181.73379552, 173.7613069 , 159.47729496,
       171.61394205, 170.17109202, 183.93444799, 172.37438723,
       171.04231016, 175.71269449, 158.05315841, 163.8410476 ])

In [348]:
print("Training Set R^2: {}".format(lr_41s8.score(X_train_scaled, y_train_s)))
rmse_41s8_train = np.sqrt(mean_squared_error(y_train_s, y_pred_41s8_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_41s8_train))

Training Set R^2: 0.49296482264385066
Training Set Root Mean Squared Error: 19.59098643928991


In [349]:
y_pred_41s8_test = lr_41s8.predict(X_test_scaled)
y_pred_41s8_test[0:20]

array([187.55420366, 192.2662746 , 209.32932113, 157.28477126,
       207.13563297, 157.78631215, 164.34874648, 143.8887776 ,
       186.55793889, 175.01054358, 190.59382506, 167.57650407,
       206.19672682, 186.75084471, 179.07783728, 181.62792944,
       194.24771332, 188.33714196, 184.35851942, 193.26825675])

In [350]:
print("Test Set R^2: {}".format(lr_41s8.score(X_test_scaled, y_test_s)))
rmse_41s8_test = np.sqrt(mean_squared_error(y_test_s, y_pred_41s8_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_41s8_test))

Test Set R^2: 0.44719633267733583
Test Set Root Mean Squared Error: 21.294552538172667


In [351]:
filename = 'cancer_lr_41s8.sav'
with open(filename, 'wb') as f:  # close the file handle once the model is written
    pickle.dump(lr_41s8, f)

In [352]:
# Same configuration with a much larger epsilon of 32
lr_41s9 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.00001, epsilon=32, random_state=42,
                                    learning_rate='constant')

In [353]:
lr_41s9.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=32,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='constant', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [354]:
y_pred_41s9_train = lr_41s9.predict(X_train_scaled)
y_pred_41s9_train[0:20]

array([181.04619442, 137.15588671, 167.74752665, 204.12943707,
       169.49132355, 193.30523241, 139.39724889, 145.64053556,
       175.85370772, 177.70423555, 164.98799653, 148.1584993 ,
       163.20728793, 156.88892419, 184.45859171, 161.20941326,
       157.37694967, 170.84158133, 145.85965214, 162.82957853])

In [355]:
print("Training Set R^2: {}".format(lr_41s9.score(X_train_scaled, y_train_s)))
rmse_41s9_train = np.sqrt(mean_squared_error(y_train_s, y_pred_41s9_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_41s9_train))

Training Set R^2: 0.3875345989188239
Training Set Root Mean Squared Error: 21.53168603014946


In [356]:
y_pred_41s9_test = lr_41s9.predict(X_test_scaled)
y_pred_41s9_test[0:20]

array([186.18516094, 189.00890108, 196.80158989, 155.50792702,
       204.57481786, 153.90339677, 150.97174973, 126.93107537,
       176.98155723, 121.85957568, 188.80692317, 156.65136328,
       195.6094302 , 183.05002903, 173.01828693, 180.42158432,
       191.45418891, 180.09646177, 171.26423422, 189.49450462])

In [357]:
print("Test Set R^2: {}".format(lr_41s9.score(X_test_scaled, y_test_s)))
rmse_41s9_test = np.sqrt(mean_squared_error(y_test_s, y_pred_41s9_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_41s9_test))

Test Set R^2: 0.38006043265959444
Test Set Root Mean Squared Error: 22.55058116711339


In [358]:
filename = 'cancer_lr_41s9.sav'
with open(filename, 'wb') as f:  # close the file handle once the model is written
    pickle.dump(lr_41s9, f)

In [366]:
# Narrow the search between 4 and 32: epsilon=18
lr_41s10 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.00001, epsilon=18, random_state=42,
                                     learning_rate='constant')

In [367]:
lr_41s10.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=18,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='constant', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [368]:
y_pred_41s10_train = lr_41s10.predict(X_train_scaled)
y_pred_41s10_train[0:20]

array([187.10244013, 141.71967954, 169.56652979, 204.27672484,
       172.19934399, 198.44351467, 144.22450857, 153.65255778,
       176.0708845 , 179.6412929 , 167.88069925, 153.56365755,
       165.84541741, 162.63088279, 184.71914849, 164.44860444,
       162.68941231, 171.00413299, 149.94360766, 163.79275696])

In [369]:
print("Training Set R^2: {}".format(lr_41s10.score(X_train_scaled, y_train_s)))
rmse_41s10_train = np.sqrt(mean_squared_error(y_train_s, y_pred_41s10_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_41s10_train))

Training Set R^2: 0.4705582106134505
Training Set Root Mean Squared Error: 20.01918381590762


In [370]:
y_pred_41s10_test = lr_41s10.predict(X_test_scaled)
y_pred_41s10_test[0:20]

array([186.91642668, 189.83124149, 199.45158495, 156.22806614,
       206.87613869, 155.74678228, 153.76214382, 132.69566496,
       178.84726629, 140.07504041, 190.16936954, 160.08252137,
       199.62307257, 183.92143297, 175.93809709, 182.01842011,
       193.50187239, 182.71982159, 176.4270832 , 190.20155366])

In [371]:
print("Test Set R^2: {}".format(lr_41s10.score(X_test_scaled, y_test_s)))
rmse_41s10_test = np.sqrt(mean_squared_error(y_test_s, y_pred_41s10_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_41s10_test))

Test Set R^2: 0.4514946270789109
Test Set Root Mean Squared Error: 21.21160367351703


In [372]:
filename = 'cancer_lr_41s10.sav'
with open(filename, 'wb') as f:  # close the file handle once the model is written
    pickle.dump(lr_41s10, f)

In [373]:
# Continue narrowing the search: epsilon=11
lr_41s11 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.00001, epsilon=11, random_state=42,
                                     learning_rate='constant')

In [374]:
lr_41s11.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=11,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='constant', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [375]:
y_pred_41s11_train = lr_41s11.predict(X_train_scaled)
y_pred_41s11_train[0:20]

array([190.70864788, 142.83149068, 168.27061274, 203.70121891,
       173.08551643, 200.4899486 , 145.27700878, 157.12711699,
       174.53354824, 179.32048766, 167.34867812, 155.32835185,
       166.11406619, 164.75705354, 182.60223507, 165.82667003,
       164.71427041, 169.94212315, 153.14826539, 163.10353653])

In [376]:
print("Training Set R^2: {}".format(lr_41s11.score(X_train_scaled, y_train_s)))
rmse_41s11_train = np.sqrt(mean_squared_error(y_train_s, y_pred_41s11_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_41s11_train))

Training Set R^2: 0.4717489587881277
Training Set Root Mean Squared Error: 19.996658936549938


In [377]:
y_pred_41s11_test = lr_41s11.predict(X_test_scaled)
y_pred_41s11_test[0:20]

array([185.94968412, 189.83860271, 200.71841074, 154.39407758,
       207.45913295, 155.4790971 , 155.83754966, 136.12647704,
       178.86405821, 151.33252796, 189.037271  , 161.24829825,
       202.16334264, 183.13091943, 176.70682253, 180.4902986 ,
       193.03845981, 183.71499763, 179.47562066, 189.21273154])

In [378]:
print("Test Set R^2: {}".format(lr_41s11.score(X_test_scaled, y_test_s)))
rmse_41s11_test = np.sqrt(mean_squared_error(y_test_s, y_pred_41s11_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_41s11_test))

Test Set R^2: 0.44558502710789427
Test Set Root Mean Squared Error: 21.325564510802476


In [379]:
filename = 'cancer_lr_41s11.sav'
with open(filename, 'wb') as f:  # close the file handle once the model is written
    pickle.dump(lr_41s11, f)

In [387]:
# Continue narrowing the search: epsilon=5
lr_41s12 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.00001, epsilon=5, random_state=42,
                                     learning_rate='constant')

In [388]:
lr_41s12.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=5,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='constant', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [389]:
y_pred_41s12_train = lr_41s12.predict(X_train_scaled)
y_pred_41s12_train[0:20]

array([199.60403564, 149.31449397, 170.83052894, 204.90222745,
       176.5006574 , 200.88839296, 148.42048183, 165.90346606,
       174.93191556, 181.86707625, 173.20319124, 159.31582803,
       170.73716174, 169.74262175, 183.77570712, 171.12416518,
       170.54471875, 174.43649684, 157.6922754 , 164.31461178])

In [390]:
print("Training Set R^2: {}".format(lr_41s12.score(X_train_scaled, y_train_s)))
rmse_41s12_train = np.sqrt(mean_squared_error(y_train_s, y_pred_41s12_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_41s12_train))

Training Set R^2: 0.49995016055950314
Training Set Root Mean Squared Error: 19.45556755695575


In [391]:
y_pred_41s12_test = lr_41s12.predict(X_test_scaled)
y_pred_41s12_test[0:20]

array([187.44922645, 191.88714479, 207.85301059, 156.55632914,
       208.36072627, 157.47828672, 163.24631122, 143.49571833,
       185.03202697, 172.15281341, 190.67048716, 166.64707786,
       205.71467466, 186.05463751, 178.93447221, 181.33480878,
       193.95221883, 187.86796299, 184.48593717, 192.55298393])

In [392]:
print("Test Set R^2: {}".format(lr_41s12.score(X_test_scaled, y_test_s)))
rmse_41s12_test = np.sqrt(mean_squared_error(y_test_s, y_pred_41s12_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_41s12_test))

Test Set R^2: 0.456304140786877
Test Set Root Mean Squared Error: 21.118403021289645
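
The epsilon sweep up to this point can be summarized in a few lines. The values below are the test-set R^2 scores printed in this section, rounded to four decimals. The dictionary name `test_r2_by_epsilon` is introduced here for illustration and does not appear elsewhere in the notebook.

```python
# Test-set R^2 copied from the printed outputs (rounded), keyed by Huber epsilon.
# All runs share: L2 penalty, alpha=1e-5, constant learning rate, eta0=0.01.
test_r2_by_epsilon = {
    0.01: -22.5330,
    1: 0.3809,
    2: 0.4130,
    4: 0.4472,
    5: 0.4563,
    11: 0.4456,
    18: 0.4515,
    32: 0.3801,
}

# Pick the epsilon with the highest test-set R^2
best_epsilon = max(test_r2_by_epsilon, key=test_r2_by_epsilon.get)

for eps in sorted(test_r2_by_epsilon):
    print(f"epsilon={eps:<5} test R^2={test_r2_by_epsilon[eps]:.4f}")
print("Best epsilon by test R^2:", best_epsilon)
```

Choosing `epsilon` by test-set score can give an optimistic estimate of performance; a validation split or cross-validation would be a more conservative way to make the same choice.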


In [393]:
filename = 'cancer_lr_41s12.sav'
with open(filename, 'wb') as f:  # close the file handle once the model is written
    pickle.dump(lr_41s12, f)

In [394]:
# Same as lr_41s12 (epsilon=5) but with an L1 penalty instead of L2
lr_41s13 = linear_model.SGDRegressor(loss='huber', penalty='l1', alpha=0.00001, epsilon=5, random_state=42,
                                     learning_rate='constant')

In [395]:
lr_41s13.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=5,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='constant', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l1', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [396]:
y_pred_41s13_train = lr_41s13.predict(X_train_scaled)
y_pred_41s13_train[0:20]

array([199.61823697, 149.31370015, 170.83109317, 204.9123623 ,
       176.50007085, 200.90042478, 148.42151949, 165.91632103,
       174.9329358 , 181.86652305, 173.20465206, 159.32360659,
       170.73627753, 169.74846037, 183.77805766, 171.12445931,
       170.55290191, 174.43476008, 157.68794309, 164.31361626])

In [397]:
print("Training Set R^2: {}".format(lr_41s13.score(X_train_scaled, y_train_s)))
rmse_41s13_train = np.sqrt(mean_squared_error(y_train_s, y_pred_41s13_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_41s13_train))

Training Set R^2: 0.5000075200411056
Training Set Root Mean Squared Error: 19.45445167491215


In [398]:
y_pred_41s13_test = lr_41s13.predict(X_test_scaled)
y_pred_41s13_test[0:20]

array([187.45200438, 191.89272283, 207.8589527 , 156.5536868 ,
       208.37076172, 157.47514462, 163.24150369, 143.49370024,
       185.03273023, 172.16796816, 190.67237648, 166.64449559,
       205.72355719, 186.05435326, 178.93743657, 181.33452813,
       193.95879634, 187.87504993, 184.50094902, 192.55785366])

In [399]:
print("Test Set R^2: {}".format(lr_41s13.score(X_test_scaled, y_test_s)))
rmse_41s13_test = np.sqrt(mean_squared_error(y_test_s, y_pred_41s13_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_41s13_test))

Test Set R^2: 0.4563483326879759
Test Set Root Mean Squared Error: 21.117544746091607


In [400]:
filename = 'cancer_lr_41s13.sav'
with open(filename, 'wb') as f:  # close the file handle once the model is written
    pickle.dump(lr_41s13, f)

In [401]:
# epsilon=5 with stronger regularization: alpha raised from 1e-5 to 0.001
lr_41s14 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.001, epsilon=5, random_state=42,
                                     learning_rate='constant')

In [402]:
lr_41s14.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=0.001, average=False, early_stopping=False, epsilon=5,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='constant', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [403]:
y_pred_41s14_train = lr_41s14.predict(X_train_scaled)
y_pred_41s14_train[0:20]

array([199.22601154, 149.31775344, 170.77202201, 204.19270951,
       176.21696283, 200.15547001, 148.24993818, 165.85202585,
       174.61220419, 181.55561673, 173.15294788, 159.25444961,
       170.8183172 , 169.35880672, 183.50008509, 171.2413688 ,
       170.60071801, 174.60296398, 157.72150084, 164.08870665])

In [404]:
print("Training Set R^2: {}".format(lr_41s14.score(X_train_scaled, y_train_s)))
rmse_41s14_train = np.sqrt(mean_squared_error(y_train_s, y_pred_41s14_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_41s14_train))

Training Set R^2: 0.4958581859237848
Training Set Root Mean Squared Error: 19.53500912200216


In [405]:
y_pred_41s14_test = lr_41s14.predict(X_test_scaled)
y_pred_41s14_test[0:20]

array([187.09724218, 191.51906079, 207.75511492, 156.65603946,
       207.75403869, 157.70794829, 163.49421328, 143.52777852,
       185.21033202, 171.92115421, 190.1093814 , 166.81988971,
       205.22206371, 185.98194792, 178.51128204, 181.17374794,
       193.58536468, 187.96779335, 183.97311622, 192.34548205])

In [406]:
print("Test Set R^2: {}".format(lr_41s14.score(X_test_scaled, y_test_s)))
rmse_41s14_test = np.sqrt(mean_squared_error(y_test_s, y_pred_41s14_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_41s14_test))

Test Set R^2: 0.4517537359753423
Test Set Root Mean Squared Error: 21.206592998396346


In [407]:
filename = 'cancer_lr_41s14.sav'
with open(filename, 'wb') as f:  # close the file handle once the model is written
    pickle.dump(lr_41s14, f)

In [408]:
# epsilon=5 with a larger constant step size: eta0=0.1 instead of the default 0.01
lr_41s15 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.00001, epsilon=5, random_state=42,
                                     learning_rate='constant', eta0=0.1)

In [409]:
lr_41s15.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=5,
       eta0=0.1, fit_intercept=True, l1_ratio=0.15,
       learning_rate='constant', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [410]:
y_pred_41s15_train = lr_41s15.predict(X_train_scaled)
y_pred_41s15_train[0:20]

array([179.31250563, 143.45096653, 170.30488628, 207.75080048,
       170.68919852, 193.39319006, 146.60394487, 154.78413677,
       182.3904968 , 183.98470197, 168.12083981, 143.33742601,
       174.17894114, 168.7617424 , 195.50430597, 170.73090727,
       152.87896054, 188.24099842, 153.22692577, 166.66829484])

In [411]:
print("Training Set R^2: {}".format(lr_41s15.score(X_train_scaled, y_train_s)))
rmse_41s15_train = np.sqrt(mean_squared_error(y_train_s, y_pred_41s15_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_41s15_train))

Training Set R^2: 0.40844217449264153
Training Set Root Mean Squared Error: 21.16098407661788


In [412]:
y_pred_41s15_test = lr_41s15.predict(X_test_scaled)
y_pred_41s15_test[0:20]

array([199.03788619, 194.64357073, 203.35034586, 164.48742413,
       197.95305395, 161.8379677 , 159.65848945, 145.54606542,
       185.25598175,  91.91098988, 192.4143021 , 173.72490504,
       206.48293179, 186.10625048, 186.68782929, 192.69524384,
       199.12872295, 179.2924761 , 166.74674879, 203.02543333])

In [413]:
print("Test Set R^2: {}".format(lr_41s15.score(X_test_scaled, y_test_s)))
rmse_41s15_test = np.sqrt(mean_squared_error(y_test_s, y_pred_41s15_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_41s15_test))

Test Set R^2: 0.40777298839726783
Test Set Root Mean Squared Error: 22.040790512795404


In [414]:
filename = 'cancer_lr_41s15.sav'
with open(filename, 'wb') as f:  # close the file handle once the model is written
    pickle.dump(lr_41s15, f)

In [415]:
# epsilon=5 with a smaller constant step size: eta0=0.001 instead of the default 0.01
lr_41s16 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.00001, epsilon=5, random_state=42,
                                     learning_rate='constant', eta0=0.001)

In [416]:
lr_41s16.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=5,
       eta0=0.001, fit_intercept=True, l1_ratio=0.15,
       learning_rate='constant', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [417]:
y_pred_41s16_train = lr_41s16.predict(X_train_scaled)
y_pred_41s16_train[0:20]

array([191.71112132, 157.65377301, 173.21450996, 193.88306123,
       176.77284622, 189.20782465, 152.13425548, 172.63589209,
       170.60087428, 177.28570789, 175.73667027, 159.9212155 ,
       173.25996257, 162.27186324, 182.17065763, 175.98427372,
       175.941941  , 180.47272833, 159.99387928, 162.58189148])

In [418]:
print("Training Set R^2: {}".format(lr_41s16.score(X_train_scaled, y_train_s)))
rmse_41s16_train = np.sqrt(mean_squared_error(y_train_s, y_pred_41s16_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_41s16_train))

Training Set R^2: 0.3572501539633759
Training Set Root Mean Squared Error: 22.057599608356693


In [419]:
y_pred_41s16_test = lr_41s16.predict(X_test_scaled)
y_pred_41s16_test[0:20]

array([181.21480918, 183.52830951, 208.35954497, 166.04082735,
       195.73412265, 165.32808431, 171.6101815 , 152.95841225,
       191.18256409, 181.79632711, 183.75359974, 167.80950419,
       192.29454798, 186.8104352 , 171.3205155 , 185.15936573,
       190.60306246, 189.93586129, 172.79625516, 192.32289877])

In [420]:
print("Test Set R^2: {}".format(lr_41s16.score(X_test_scaled, y_test_s)))
rmse_41s16_test = np.sqrt(mean_squared_error(y_test_s, y_pred_41s16_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_41s16_test))

Test Set R^2: 0.33388932306969166
Test Set Root Mean Squared Error: 23.375249677804504


In [421]:
filename = 'cancer_lr_41s16.sav'
with open(filename, 'wb') as f:  # close the file handle once the model is written
    pickle.dump(lr_41s16, f)

In [422]:
# Switch to the 'optimal' learning-rate schedule; epsilon is left at its default of 0.1
lr_42 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.00001, random_state=42,
                                  learning_rate='optimal')

In [423]:
lr_42.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=0.1,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='optimal', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [424]:
y_pred_42_train = lr_42.predict(X_train_scaled)
y_pred_42_train[0:20]

array([182.12428934, 139.13563274, 171.55984251, 205.19437454,
       180.73540923, 220.85799221, 139.28884374, 145.1796888 ,
       193.05561395, 190.24138711, 159.83507092, 150.30317402,
       182.51584676, 162.55438221, 207.78771464, 180.79779668,
       164.83630114, 185.85939382, 156.9423591 , 169.90042464])

In [425]:
print("Training Set R^2: {}".format(lr_42.score(X_train_scaled, y_train_s)))
rmse_42_train = np.sqrt(mean_squared_error(y_train_s, y_pred_42_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_42_train))

Training Set R^2: 0.2966841813582005
Training Set Root Mean Squared Error: 23.073445485103896


In [426]:
y_pred_42_test = lr_42.predict(X_test_scaled)
y_pred_42_test[0:20]

array([209.07679652, 207.27374944, 222.33883299, 178.09139607,
       221.73802521, 174.26619287, 169.53715741, 121.18669701,
       196.4269474 ,  86.06086938, 190.5285719 , 174.08350149,
       213.11807551, 200.33509628, 174.44213205, 196.77043451,
       214.1159916 , 203.14927477, 187.00760155, 205.18423139])

In [427]:
print("Test Set R^2: {}".format(lr_42.score(X_test_scaled, y_test_s)))
rmse_42_test = np.sqrt(mean_squared_error(y_test_s, y_pred_42_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_42_test))

Test Set R^2: 0.3214227759464806
Test Set Root Mean Squared Error: 23.592974609812302


In [428]:
filename = 'cancer_lr_42.sav'
with open(filename, 'wb') as f:  # close the file handle once the model is written
    pickle.dump(lr_42, f)

In [429]:
# 'optimal' learning-rate schedule combined with epsilon=5
lr_43 = linear_model.SGDRegressor(loss='huber', penalty='l2', alpha=0.00001, random_state=42,
                                  learning_rate='optimal', epsilon=5)

In [430]:
lr_43.fit(X_train_scaled, y_train_s)



SGDRegressor(alpha=1e-05, average=False, early_stopping=False, epsilon=5,
       eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='optimal', loss='huber', max_iter=None, n_iter=None,
       n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=42,
       shuffle=True, tol=None, validation_fraction=0.1, verbose=0,
       warm_start=False)

In [431]:
y_pred_43_train = lr_43.predict(X_train_scaled)
y_pred_43_train[0:20]

array([ 858.26655854,  569.9251322 ,  259.12691722, -355.7590754 ,
        365.80198221,   89.35603588,  885.34628913,   23.48279064,
        266.02613479,    6.41385379,   64.81045183, -110.17952774,
        -21.85530867,  417.09253842, -132.70767011, -120.24605567,
         27.43929595, -208.9233352 ,  186.01972983,  -78.56375713])

In [432]:
print("Training Set R^2: {}".format(lr_43.score(X_train_scaled, y_train_s)))
rmse_43_train = np.sqrt(mean_squared_error(y_train_s, y_pred_43_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_43_train))

Training Set R^2: -150.35763738026802
Training Set Root Mean Squared Error: 338.48500063317636


In [433]:
y_pred_43_test = lr_43.predict(X_test_scaled)
y_pred_43_test[0:20]

array([-331.81458814,  193.08642184,  719.70953722,  413.32450858,
        241.8102644 , -320.36455158,  212.11861579,  296.71089476,
       -493.63595282, -493.71662024,  775.85450982, -234.14289695,
        148.53469073, -205.36977049,  419.82737919,  -11.38276865,
          4.31661457,   -9.05225907,  603.21527184,  120.90273875])

In [434]:
print("Test Set R^2: {}".format(lr_43.score(X_test_scaled, y_test_s)))
rmse_43_test = np.sqrt(mean_squared_error(y_test_s, y_pred_43_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_43_test))

Test Set R^2: -144.07578834477604
Test Set Root Mean Squared Error: 344.9693660091798
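
An R^2 of about -144 looks alarming but follows directly from the definition R^2 = 1 - SS_res / SS_tot: a model that predicts the mean scores 0, and anything worse than that goes negative without bound. The sketch below illustrates this with made-up numbers. The function `r2_score_manual` and all values are hypothetical and chosen only to resemble the scale of the predictions above.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical targets on roughly the same scale as the notebook's target
y_true = [160.0, 170.0, 180.0, 190.0, 200.0]

mean_pred = [180.0] * 5                           # always predict the mean
wild_pred = [858.0, 570.0, 259.0, -356.0, 366.0]  # unstable, far-off predictions

print("Mean predictor R^2:", r2_score_manual(y_true, mean_pred))
print("Unstable predictor R^2:", r2_score_manual(y_true, wild_pred))
```

Predictions that swing between large negative and large positive values, as in the arrays above, inflate SS_res far beyond SS_tot, which is what a large negative R^2 reports.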


In [435]:
filename = 'cancer_lr_43.sav'
with open(filename, 'wb') as f:  # close the file handle once the model is written
    pickle.dump(lr_43, f)

## Kernel Ridge

In [436]:
# Kernel ridge regression with the default linear kernel and alpha=0.001
lr_49s = KernelRidge(alpha=0.001)

In [437]:
lr_49s.fit(X_train_scaled, y_train_s)

KernelRidge(alpha=0.001, coef0=1, degree=3, gamma=None, kernel='linear',
      kernel_params=None)

In [438]:
y_pred_49s_train = lr_49s.predict(X_train_scaled)
y_pred_49s_train[0:20]

array([205.15832738, 142.50737704, 171.8147401 , 206.88919599,
       176.41730152, 205.47482823, 158.99090677, 163.82973025,
       179.18220161, 188.79643553, 181.14507708, 154.80469348,
       164.80326288, 152.19491291, 182.86458249, 160.84016692,
       166.6540871 , 177.01096054, 155.23195606, 166.95500442])

In [439]:
print("Training Set R^2: {}".format(lr_49s.score(X_train_scaled, y_train_s)))
rmse_49s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_49s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_49s_train))

Training Set R^2: 0.654069162015542
Training Set Root Mean Squared Error: 16.1819888618309


In [440]:
y_pred_49s_test = lr_49s.predict(X_test_scaled)
y_pred_49s_test[0:20]

array([192.66211003, 207.41110375, 210.31846302, 179.61448163,
       222.14595356, 153.67440874, 176.01109949, 149.06089116,
       173.42922597, 227.33025485, 199.50468523, 158.59205065,
       193.85741532, 193.63621748, 181.42942547, 187.85069192,
       180.90023006, 176.3346381 , 185.62078071, 190.73695485])

In [441]:
print("Test Set R^2: {}".format(lr_49s.score(X_test_scaled, y_test_s)))
rmse_49s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_49s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_49s_test))

Test Set R^2: 0.5861867486014155
Test Set Root Mean Squared Error: 18.424056544935134
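
With `kernel='linear'`, kernel ridge regression is ridge regression expressed in its dual form, and scikit-learn's `KernelRidge` does not fit an intercept term. The NumPy sketch below is a minimal illustration on synthetic data, not the notebook's data or scikit-learn's implementation. It checks that solving the dual system (K + alpha I) a = y gives the same fitted values as the primal ridge solution without an intercept. All variable names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 4))  # synthetic stand-in for MinMax-scaled features
y = X @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=0.1, size=50)
alpha = 0.001

# Dual (kernel) form: solve (K + alpha * I) a = y with linear kernel K = X X^T
K = X @ X.T
dual_coef = np.linalg.solve(K + alpha * np.eye(len(y)), y)
pred_dual = K @ dual_coef

# Primal ridge without an intercept: w = (X^T X + alpha * I)^(-1) X^T y
w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
pred_primal = X @ w

# The two formulations should agree up to floating-point error
print(np.allclose(pred_dual, pred_primal, atol=1e-6))
```

The dual form solves an n-by-n system in the number of training rows, so for a linear kernel on a dataset with many rows and few features the primal ridge formulation is usually the cheaper route to the same fit.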


In [442]:
filename = 'cancer_lr_49s.sav'
with open(filename, 'wb') as f:  # close the file handle once the model is written
    pickle.dump(lr_49s, f)

In [443]:
# Same linear-kernel model with alpha raised to 0.01
lr_50s = KernelRidge(alpha=0.01)

In [444]:
lr_50s.fit(X_train_scaled, y_train_s)

KernelRidge(alpha=0.01, coef0=1, degree=3, gamma=None, kernel='linear',
      kernel_params=None)

In [445]:
y_pred_50s_train = lr_50s.predict(X_train_scaled)
y_pred_50s_train[0:20]

array([204.44285618, 142.71061767, 171.13648741, 206.38711949,
       177.10774441, 206.27014023, 158.5664729 , 164.68070649,
       178.85119425, 186.5584191 , 182.02076295, 156.47185924,
       165.95192937, 151.65176545, 184.49642096, 161.62258485,
       166.49141114, 177.58539202, 155.43665894, 165.43848219])

In [446]:
print("Training Set R^2: {}".format(lr_50s.score(X_train_scaled, y_train_s)))
rmse_50s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_50s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_50s_train))

Training Set R^2: 0.6508058202057458
Training Set Root Mean Squared Error: 16.258136171253025


In [447]:
y_pred_50s_test = lr_50s.predict(X_test_scaled)
y_pred_50s_test[0:20]

array([192.48804396, 201.93469465, 210.92009821, 177.91285416,
       221.70358736, 153.70099021, 175.52622886, 146.87494013,
       174.01403722, 195.66790901, 203.24032996, 149.69825943,
       193.84831074, 193.58787886, 181.64684405, 189.8024866 ,
       182.8867622 , 176.78100696, 186.0430941 , 191.35401704])

In [448]:
print("Test Set R^2: {}".format(lr_50s.score(X_test_scaled, y_test_s)))
rmse_50s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_50s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_50s_test))

Test Set R^2: 0.5892973645271045
Test Set Root Mean Squared Error: 18.354679513508017


In [449]:
filename = 'cancer_lr_50s.sav'
with open(filename, 'wb') as f:  # close the file handle once the model is written
    pickle.dump(lr_50s, f)

In [450]:
# Same linear-kernel model with alpha raised to 0.1
lr_51s = KernelRidge(alpha=0.1)

In [451]:
lr_51s.fit(X_train_scaled, y_train_s)

KernelRidge(alpha=0.1, coef0=1, degree=3, gamma=None, kernel='linear',
      kernel_params=None)

In [452]:
y_pred_51s_train = lr_51s.predict(X_train_scaled)
y_pred_51s_train[0:20]

array([203.70066227, 144.77186935, 170.33188967, 205.68901293,
       178.22487048, 209.15330712, 158.13973724, 163.53372095,
       178.19273277, 183.79815052, 183.48539797, 156.39085309,
       166.27081574, 149.999438  , 184.69831298, 162.36462648,
       167.66429394, 178.17631726, 154.71374393, 164.65905502])

In [453]:
print("Training Set R^2: {}".format(lr_51s.score(X_train_scaled, y_train_s)))
rmse_51s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_51s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_51s_train))

Training Set R^2: 0.6419694127578425
Training Set Root Mean Squared Error: 16.46255822085405


In [454]:
y_pred_51s_test = lr_51s.predict(X_test_scaled)
y_pred_51s_test[0:20]

array([191.82148711, 197.12847299, 210.31305638, 174.67812773,
       221.2801264 , 154.04607245, 173.66472068, 143.93647976,
       174.44425962, 151.7299257 , 206.30702088, 148.99446582,
       192.17539855, 193.34528718, 180.42352269, 191.33602086,
       186.23545187, 177.46451537, 187.96407799, 192.92914028])

In [455]:
print("Test Set R^2: {}".format(lr_51s.score(X_test_scaled, y_test_s)))
rmse_51s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_51s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_51s_test))

Test Set R^2: 0.5796691484204308
Test Set Root Mean Squared Error: 18.56858007166218


In [456]:
filename = 'cancer_lr_51s.sav'
# Use a context manager so the file handle is closed after writing.
with open(filename, 'wb') as f:
    pickle.dump(lr_51s, f)

In [457]:
lr_52s = KernelRidge(alpha=1)

In [458]:
lr_52s.fit(X_train_scaled, y_train_s)

KernelRidge(alpha=1, coef0=1, degree=3, gamma=None, kernel='linear',
      kernel_params=None)

In [459]:
y_pred_52s_train = lr_52s.predict(X_train_scaled)
y_pred_52s_train[0:20]

array([203.25987796, 148.13753092, 169.63929541, 206.1088464 ,
       180.62358837, 209.29087679, 158.52362056, 163.46116356,
       177.64472392, 181.65122898, 183.86444512, 157.4335518 ,
       165.8785651 , 150.71373405, 185.77896011, 163.96773308,
       170.55973423, 178.02740379, 153.9265665 , 165.33515368])

In [460]:
print("Training Set R^2: {}".format(lr_52s.score(X_train_scaled, y_train_s)))
rmse_52s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_52s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_52s_train))

Training Set R^2: 0.6270882346399089
Training Set Root Mean Squared Error: 16.80119998409424


In [461]:
y_pred_52s_test = lr_52s.predict(X_test_scaled)
y_pred_52s_test[0:20]

array([191.34867276, 195.34850617, 209.85032553, 171.30230756,
       221.10333766, 153.84640392, 171.00995976, 144.07754616,
       175.26260859, 133.45899774, 207.64870234, 154.48716203,
       192.21694822, 192.47313718, 178.65078235, 189.53890592,
       190.4330414 , 179.47926253, 189.2923031 , 193.97599527])

In [462]:
print("Test Set R^2: {}".format(lr_52s.score(X_test_scaled, y_test_s)))
rmse_52s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_52s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_52s_test))

Test Set R^2: 0.5700249467311902
Test Set Root Mean Squared Error: 18.780393619045448


In [463]:
filename = 'cancer_lr_52s.sav'
# Use a context manager so the file handle is closed after writing.
with open(filename, 'wb') as f:
    pickle.dump(lr_52s, f)

In [464]:
lr_53s = KernelRidge(alpha=10)

In [465]:
lr_53s.fit(X_train_scaled, y_train_s)

KernelRidge(alpha=10, coef0=1, degree=3, gamma=None, kernel='linear',
      kernel_params=None)

In [466]:
y_pred_53s_train = lr_53s.predict(X_train_scaled)
y_pred_53s_train[0:20]

array([201.38146533, 149.26217733, 171.17220709, 209.22547395,
       181.65677281, 206.16342015, 155.24778471, 165.5985388 ,
       178.78508152, 181.03310623, 181.67408429, 160.71299408,
       166.45193278, 158.8655463 , 187.75852905, 167.18959623,
       172.55225393, 177.65410522, 153.94289919, 165.51893874])

In [467]:
print("Training Set R^2: {}".format(lr_53s.score(X_train_scaled, y_train_s)))
rmse_53s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_53s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_53s_train))

Training Set R^2: 0.5966499725153155
Training Set Root Mean Squared Error: 17.47343560828238


In [468]:
y_pred_53s_test = lr_53s.predict(X_test_scaled)
y_pred_53s_test[0:20]

array([190.82265297, 192.97222914, 212.73014579, 165.83142277,
       217.24276672, 155.39108867, 165.08800537, 144.73589234,
       179.88515203, 150.37894763, 202.31079942, 160.36505701,
       198.20491824, 189.52283825, 178.42323046, 187.758011  ,
       194.91127596, 185.35962808, 187.56414253, 194.30786061])

In [469]:
print("Test Set R^2: {}".format(lr_53s.score(X_test_scaled, y_test_s)))
rmse_53s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_53s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_53s_test))

Test Set R^2: 0.5579478472499363
Test Set Root Mean Squared Error: 19.042318106586112


In [470]:
filename = 'cancer_lr_53s.sav'
# Use a context manager so the file handle is closed after writing.
with open(filename, 'wb') as f:
    pickle.dump(lr_53s, f)

In [471]:
lr_54s = KernelRidge(alpha=100)

In [472]:
lr_54s.fit(X_train_scaled, y_train_s)

KernelRidge(alpha=100, coef0=1, degree=3, gamma=None, kernel='linear',
      kernel_params=None)

In [473]:
y_pred_54s_train = lr_54s.predict(X_train_scaled)
y_pred_54s_train[0:20]

array([199.11797859, 152.77840644, 173.02309497, 206.37183297,
       179.71801404, 199.47576047, 150.50982621, 170.53259765,
       176.28809658, 180.65574193, 178.91283506, 163.09767249,
       170.43500071, 166.78886873, 186.29324821, 173.64394791,
       175.29214173, 178.09980452, 155.90994113, 163.5496795 ])

In [474]:
print("Training Set R^2: {}".format(lr_54s.score(X_train_scaled, y_train_s)))
rmse_54s_train = np.sqrt(mean_squared_error(y_train_s, y_pred_54s_train))
print("Training Set Root Mean Squared Error: {}".format(rmse_54s_train))

Training Set R^2: 0.5038273988607713
Training Set Root Mean Squared Error: 19.37999442567625


In [475]:
y_pred_54s_test = lr_54s.predict(X_test_scaled)
y_pred_54s_test[0:20]

array([188.35876556, 188.98523231, 213.86314073, 161.53582933,
       207.35453863, 158.58722025, 165.67515557, 146.77117004,
       188.01391516, 179.08752295, 191.3586575 , 164.99500085,
       203.82317055, 187.14333678, 177.9568548 , 186.76104953,
       194.01565372, 189.9664754 , 182.65385887, 194.23897891])

In [476]:
print("Test Set R^2: {}".format(lr_54s.score(X_test_scaled, y_test_s)))
rmse_54s_test = np.sqrt(mean_squared_error(y_test_s, y_pred_54s_test))
print("Test Set Root Mean Squared Error: {}".format(rmse_54s_test))

Test Set R^2: 0.4568871966510032
Test Set Root Mean Squared Error: 21.107076365553624


In [477]:
filename = 'cancer_lr_54s.sav'
# Use a context manager so the file handle is closed after writing.
with open(filename, 'wb') as f:
    pickle.dump(lr_54s, f)
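
The cells above repeat the same fit, predict, and score steps for alpha values of 0.1, 1, 10, and 100. As a hedged sketch of how that sweep could be written once, the loop below runs `KernelRidge` over a list of alphas and collects R^2 and RMSE for both splits. The data here is synthetic (`make_regression`) and the helper name `evaluate_kernel_ridge` is hypothetical. In the notebook, `X_train_scaled`, `X_test_scaled`, `y_train_s`, and `y_test_s` would take the place of the stand-in arrays.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): the notebook uses the cancer DataFrame.
X, y = make_regression(n_samples=300, n_features=10, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the MinMax scaler on the training split only, then apply it to both.
scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)


def evaluate_kernel_ridge(alphas):
    """Fit one linear-kernel KernelRidge per alpha and collect its metrics."""
    rows = []
    for alpha in alphas:
        model = KernelRidge(alpha=alpha).fit(X_train_scaled, y_train)
        train_pred = model.predict(X_train_scaled)
        test_pred = model.predict(X_test_scaled)
        rows.append({
            "alpha": alpha,
            "train_r2": model.score(X_train_scaled, y_train),
            "train_rmse": np.sqrt(mean_squared_error(y_train, train_pred)),
            "test_r2": model.score(X_test_scaled, y_test),
            "test_rmse": np.sqrt(mean_squared_error(y_test, test_pred)),
        })
    return rows


kr_results = evaluate_kernel_ridge([0.1, 1, 10, 100])
for row in kr_results:
    print("alpha={alpha:<5} train R^2={train_r2:.4f} test R^2={test_r2:.4f} "
          "test RMSE={test_rmse:.4f}".format(**row))
```

Collecting the metrics in one list makes the trend visible in the outputs above easier to inspect: training fit degrades as alpha grows, from an R^2 of 0.642 at alpha=0.1 to 0.504 at alpha=100.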

## Random Forest

In [478]:
rfr_1s = RandomForestRegressor(n_estimators=10, random_state=0)

In [479]:
rfr_1s.fit(X_train_scaled, y_train_s)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
           oob_score=False, random_state=0, verbose=0, warm_start=False)

In [480]:
rfr_pred_1s_train = rfr_1s.predict(X_train_scaled)

In [481]:
rfr_1s.score(X_train_scaled, y_train_s)

0.9154940070088083

In [482]:
print('Mean Absolute Error Train:', metrics.mean_absolute_error(y_train_s, rfr_pred_1s_train))
print('Mean Squared Error Train:', metrics.mean_squared_error(y_train_s, rfr_pred_1s_train))
print('Root Mean Squared Error Train:', np.sqrt(metrics.mean_squared_error(y_train_s, rfr_pred_1s_train)))

Mean Absolute Error Train: 5.612989309210526
Mean Squared Error Train: 63.96789009046053
Root Mean Squared Error Train: 7.997992878870331


In [483]:
rfr_pred_1s_test = rfr_1s.predict(X_test_scaled)

In [484]:
rfr_1s.score(X_test_scaled, y_test_s)

0.5232179327763447

In [485]:
print('Mean Absolute Error Test:', metrics.mean_absolute_error(y_test_s, rfr_pred_1s_test))
print('Mean Squared Error Test:', metrics.mean_squared_error(y_test_s, rfr_pred_1s_test))
print('Root Mean Squared Error Test:', np.sqrt(metrics.mean_squared_error(y_test_s, rfr_pred_1s_test)))

Mean Absolute Error Test: 13.81769105691057
Mean Squared Error Test: 391.0983954471545
Root Mean Squared Error Test: 19.776207812600333


In [486]:
rfr_2s = RandomForestRegressor(n_estimators=100, random_state=0)

In [487]:
rfr_2s.fit(X_train_scaled, y_train_s)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None,
           oob_score=False, random_state=0, verbose=0, warm_start=False)

In [488]:
rfr_pred_2s_train = rfr_2s.predict(X_train_scaled)

In [489]:
rfr_2s.score(X_train_scaled, y_train_s)

0.9382618607826071

In [490]:
print('Mean Absolute Error Train:', metrics.mean_absolute_error(y_train_s, rfr_pred_2s_train))
print('Mean Squared Error Train:', metrics.mean_squared_error(y_train_s, rfr_pred_2s_train))
print('Root Mean Squared Error Train:', np.sqrt(metrics.mean_squared_error(y_train_s, rfr_pred_2s_train)))

Mean Absolute Error Train: 5.048208881578942
Mean Squared Error Train: 46.73347255098679
Root Mean Squared Error Train: 6.836188451980152


In [491]:
rfr_pred_2s_test = rfr_2s.predict(X_test_scaled)

In [492]:
rfr_2s.score(X_test_scaled, y_test_s)

0.5728614803663985

In [493]:
print('Mean Absolute Error Test:', metrics.mean_absolute_error(y_test_s, rfr_pred_2s_test))
print('Mean Squared Error Test:', metrics.mean_squared_error(y_test_s, rfr_pred_2s_test))
print('Root Mean Squared Error Test:', np.sqrt(metrics.mean_squared_error(y_test_s, rfr_pred_2s_test)))

Mean Absolute Error Test: 13.111354471544718
Mean Squared Error Test: 350.37641125040653
Root Mean Squared Error Test: 18.718344244361106


In [428]:
rfr_3s = RandomForestRegressor(n_estimators=1000, random_state=0)

In [429]:
rfr_3s.fit(X_train_scaled, y_train_s)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
           oob_score=False, random_state=0, verbose=0, warm_start=False)

In [430]:
rfr_pred_3s_train = rfr_3s.predict(X_train_scaled)

In [431]:
rfr_3s.score(X_train_scaled, y_train_s)

0.9154742926585733

In [432]:
print('Mean Absolute Error Train:', metrics.mean_absolute_error(y_train_s, rfr_pred_3s_train))
print('Mean Squared Error Train:', metrics.mean_squared_error(y_train_s, rfr_pred_3s_train))
print('Root Mean Squared Error Train:', np.sqrt(metrics.mean_squared_error(y_train_s, rfr_pred_3s_train)))

Mean Absolute Error Train: 5.717845711940911
Mean Squared Error Train: 65.81630648338121
Root Mean Squared Error Train: 8.11272497274382


In [433]:
rfr_pred_3s_test = rfr_3s.predict(X_test_scaled)

In [434]:
rfr_3s.score(X_test_scaled, y_test_s)

0.5165718288315223

In [435]:
print('Mean Absolute Error Test:', metrics.mean_absolute_error(y_test_s, rfr_pred_3s_test))
print('Mean Squared Error Test:', metrics.mean_squared_error(y_test_s, rfr_pred_3s_test))
print('Root Mean Squared Error Test:', np.sqrt(metrics.mean_squared_error(y_test_s, rfr_pred_3s_test)))

Mean Absolute Error Test: 14.058147540983606
Mean Squared Error Test: 353.0132440983606
Root Mean Squared Error Test: 18.7886466808645


## Best Performing Algorithm

The best performing algorithm is Ridge Regression fit on unscaled data, using the automatic solver and an alpha of 0.001. On the training set it has an R^2 of 0.6465 and a Root Mean Squared Error (RMSE) of 16.6; on the test set it has an R^2 of 0.6408 and an RMSE of 16.2. For comparison, the scaled-data models lr_51s through lr_54s and the Random Forest models above reach test R^2 values of at most 0.58 and test RMSE values of at least 18.5.
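
Selecting a best model by test R^2 and RMSE can be made explicit with a small ranking step. The sketch below is illustrative only: it covers just the models whose test metrics are printed in this section (values rounded to four decimals), not the full set of algorithms in the notebook, and the dictionary layout is an assumption.

```python
# Test-set metrics copied from the outputs in this section, rounded to 4 decimals.
# Note: the rfr_3s figures may come from a different run (see its cell numbers),
# so its position in the ranking should be read with caution.
test_metrics = {
    "lr_50s": {"r2": 0.5893, "rmse": 18.3547},
    "lr_51s": {"r2": 0.5797, "rmse": 18.5686},
    "lr_52s": {"r2": 0.5700, "rmse": 18.7804},
    "lr_53s": {"r2": 0.5579, "rmse": 19.0423},
    "lr_54s": {"r2": 0.4569, "rmse": 21.1071},
    "rfr_1s": {"r2": 0.5232, "rmse": 19.7762},
    "rfr_2s": {"r2": 0.5729, "rmse": 18.7183},
    "rfr_3s": {"r2": 0.5166, "rmse": 18.7886},
}

# Rank by test RMSE, lowest (best) first.
ranked = sorted(test_metrics.items(), key=lambda item: item[1]["rmse"])
best_name, best_scores = ranked[0]

for name, scores in ranked:
    print("{:<7} test R^2={:.4f}  test RMSE={:.4f}".format(
        name, scores["r2"], scores["rmse"]))
print("Lowest test RMSE in this section:", best_name)
```

Ranking on test metrics rather than training metrics matters here: the Random Forest models have the highest training R^2 (above 0.91) but do not lead on the test set.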