# Ames Housing Dataset - ElasticNet

> Gianmaria Pizzo - 872966@stud.unive.it

These notebooks represent the project submission for the course [Data and Web Mining](https://www.unive.it/data/course/337525) by Professor [Claudio Lucchese](https://www.unive.it/data/people/5590426) at [Ca' Foscari University of Venice](https://www.unive.it).

---

## Structure of this notebook

This notebook covers the following points
* The idea
* Tuning:
    * Automatic: GridSearchCV Hyperparameters tuning for ElasticNet.
    * Manual
* Model validation
* Results
* Analysis of worst and best predictions.

---

### Before running this notebook

To avoid issues, before running the following notebook it is best to
* Clean previous cell outputs
* Restart the kernel


---

### Environment, Globals and Imports

In [1]:
!pip install mlxtend
!pip install xgboost



In [2]:
# Interactive
%matplotlib notebook
# Static
# %matplotlib inline

# Environment for this notebook
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy
import warnings
import sklearn 
import IPython
import xgboost
from scipy import stats
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import ElasticNet

# Set the style for the plots
sns.set()
plt.style.use('ggplot')
sns.set_style("darkgrid")
# Ignore warnings
warnings.filterwarnings('ignore') 

# Working folder
WORKING_DIR = os.getcwd()
# Resources folder
RESOURCES_DIR = os.path.join(os.getcwd(), 'resources')
# Name of file
IN_LABEL = 'ames_housing_out_21.csv'
IN_LABEL2 = 'ames_housing_out_22.csv'
ORIG_LABEL = 'ames_housing_out_22_orig.csv'

In [3]:
# Utils Module

def sort_alphabetically(dataset, last_label = None):
    """
    Sorts the dataset alphabetically 

    :param dataset: a pd.DataFrame
    :param last_label: a str containing an existing column label in the dataset
    :returns: pd.DataFrame
    """
    # Sort
    dataset = dataset.reindex(sorted(dataset.columns), axis=1)
    # Move target column to last index
    if last_label is not None:
        col = dataset.pop(last_label)
        dataset.insert(dataset.shape[1], last_label, col)
    return dataset

In [4]:
from sklearn.model_selection import train_test_split

# Module for train test split

def get_X_y(dataset, label, ignore=None):
    """
    Returns X and y and ignores labels in ignore
    :param dataset: a pd.DataFrame
    :param label: a str containing an existing target column label in the dataset
    :param ignore: a list of str containing an existing column label in the dataset to ignore
    :returns: tuple of pd.DataFrame
    """
    if ignore is not None:
        # Drop the labels 
        all_columns = list(dataset.columns)
        # Include only columns that are existing 
        to_drop = [i for i in all_columns if i in ignore] +[label]
        return dataset.drop(columns=to_drop), dataset[[label]]
    return dataset.drop(columns=[label]), dataset[[label]]

def get_train_test(X, y, size = 0.2, state = 33):
    """
    Returns X_train_[size], X_test, y_train_[size], y_test
    :param X: a pd.DataFrame without the target column
    :param y: a pd.DataFrame with one column, the target
    :param size: a float representing the fraction for the test size
    :param state: an integer representing the random state for the test
    :returns: 4 pd.DataFrame usually called "X_train_[size], X_test, y_train_[size], y_test"
    """
    return train_test_split(X, y, test_size=size, random_state = state)

def get_train_val_test(X, y, size_t=0.2, size_v=0.25, state_v = 42):
    """
    Returns X_train, X_valid, X_test, y_train, y_valid, y_test
    :param X: a pd.DataFrame without the target column
    :param y: a pd.DataFrame with one column, the target
    :param size_t: a float representing the fraction for the test size
    :param size_v: a float representing the fraction for the validation
    :param state_v: an integer representing the random state for the validation
    :returns: 6 pd.DataFrame usually called X_train, X_valid, X_test, y_train, y_valid, y_test
    """
    X_train_s, X_test, y_train_s, y_test = get_train_test(X, y, size = size_t)
    X_train, X_valid, y_train, y_valid = get_train_test(X_train_s, y_train_s, size = size_v, state = state_v)
    return X_train, X_valid, X_test, y_train, y_valid, y_test

In [32]:
from sklearn.model_selection import LeaveOneOut, GridSearchCV
from sklearn.metrics import mean_squared_error, mean_squared_log_error, mean_absolute_error, r2_score, max_error 
from mlxtend.evaluate import bias_variance_decomp

# Module for traininig and testing
def get_regression_metrics(y_test, y_pred):
    metrics = {
            "RMSE": mean_squared_error(y_true=y_test, y_pred=y_pred, squared=False),
            "MSE": mean_squared_error(y_true=y_test, y_pred=y_pred),
            "MAE": mean_absolute_error(y_true=y_test, y_pred=y_pred),
            "R2": r2_score(y_true=y_test, y_pred=y_pred),
            "MAX_Err": max_error(y_true=y_test, y_pred=y_pred)}
    return metrics


def get_bias_variance_decomp(dataset, model, label, split_size, ignore, 
                             num_rounds=50, random_state=230324945):
    # Get split
    X, y = get_X_y(dataset, label=label, ignore=None)
    X_train, X_test, y_train, y_test = get_train_test(X, y, size = split_size, 
                                                      state = random_state)
    # Only accepts np.arrays
    mse, bias, var = bias_variance_decomp(estimator=model, 
                                          X_train=X_train.values, 
                                          y_train=y_train.values, 
                                          X_test=X_test.values, 
                                          y_test=y_test.values, 
                                          loss='mse', num_rounds=num_rounds, 
                                          random_seed=random_state)
    print('Avg Expected RMSE: %.3f' % np.sqrt(mse))
    print('Avg Expected MSE: %.3f' % mse)
    print('Avg Bias: %.3f' % bias)
    print('Avg Variance: %.3f' % var)
    pass


def LOO_estimator_eval(dataset, target, estimator, params, ignore=None):
    """
    Function used to evaluate estimators, based on Leave One Out process. It adds a 
    column 'Predicted' to the given dataset, and returns the metrics used to evaluate the 
    performances
    
    :param dataset: a pd.DataFrame with the target column
    :param target: a str representing the target
    :param estimator: instance of some estimator (i.e. XGBoostRegressor())
    :param params: a dictionary containing the parameters for the estimator
    :param ignor: a list of strings representing the feature to ignore
    :returns: the pd.DataFrame
    """
    # Splitter
    splitter = LeaveOneOut()
    
    # Add predicted
    dataset['Predicted'] = 0.0
    
    # Ignore
    if ignore is not None:
        ignore = ignore + ['Predicted']
    else:
        ignore = ['Predicted']
    
    # Split X, y
    X, y = get_X_y(dataset, label=target, ignore=ignore)
    
    # For each fold and tuple train, test indices
    for i, (train_index, test_index) in enumerate(splitter.split(X)):
        # Re-Assign
        model = estimator
        
        # Base model initialized with some parameters
        if params is not None: 
            print(params)
            model.set_params(params)
        
        # Get train part
        train = dataset.loc[train_index.tolist()]
        X_train, y_train = get_X_y(train, label=target, ignore=ignore)
       
        # Train 
        model.fit(X_train, y_train)

        # Get test part 
        test = dataset.loc[test_index.tolist()]
        X_test, y_test = get_X_y(test, label=target, ignore=ignore)
        
        # Add predict to dataset
        y_pred = model.predict(X_test)
        dataset.loc[test_index.tolist()[0]]['Predicted'] = y_pred[0]
    return get_regression_metrics(dataset[[target]], dataset[['Predicted']])


def GridSearch_CV_Tuning(dataset, target, estimator, params, ignore=None, n_repeats=4, n_splits=4, 
                random_state=33412):
    """
    Function used to evaluate estimators, based on GridSearchCV process. It evaluates the
    performances through a Repeated K Fold, and returns the results
    
    :param dataset: a pd.DataFrame with the target column
    :param target: a str representing the target
    :param estimator: instance of some estimator (i.e. XGBoostRegressor())
    :param params: a dictionary containing the parameters for the estimator
    :param ignore: a list of strings representing the feature to ignore
    :param n_repeats: a integer
    :param n_splits: a integer
    :returns: the pd.DataFrame containing the results
    """
    # Ignore
    if ignore is not None:
        ignore = ignore + ['Predicted']
    else:
        ignore = ['Predicted']
        
    # RepeatedKFold splitter
    splitter = RepeatedKFold(n_repeats=n_repeats, n_splits=n_splits, random_state=random_state)
    
    # GridSearchCV
    clf = GridSearchCV(estimator=estimator, cv=splitter,
                       param_grid=params, return_train_score = True,
                       scoring =['neg_mean_squared_error', 'neg_root_mean_squared_error', 'r2'],
                       refit=False, n_jobs=-1, verbose=3)
    # X, y
    X, y = get_X_y(dataset, label=target, ignore=(ignore + ['Predicted']))
    # Train, Test split
    X_train, X_test, y_train, y_test = get_train_test(X, y)
    # Fit
    clf.fit(X_train, y_train)
    
    return pd.DataFrame(clf.cv_results_)



## Dataset Overview

The dataset we are going to consider are the following ones:
* The modified dataset, in two different subset versions
* The original dataset

In [33]:
df = pd.read_csv(os.path.join(RESOURCES_DIR, IN_LABEL))
df2 = pd.read_csv(os.path.join(RESOURCES_DIR, IN_LABEL2))
df_orig = pd.read_csv(os.path.join(RESOURCES_DIR, ORIG_LABEL))

df.drop(columns=['Unnamed: 0', 'Latitude', 'Longitude'], inplace=True)
df_orig.drop(columns=['Unnamed: 0', 'Latitude', 'Longitude'], inplace=True)

df = sort_alphabetically(df, 'Sale_Price')
df2 = sort_alphabetically(df2, 'Sale_Price')
df_orig = sort_alphabetically(df_orig, 'Sale_Price')

---

## Hyperparameters Tuning

First of all, let us try to use a Grid Search CV to find the best parameters.

### Automatic Parameters Tuning: Randomized Grid Search

By defining the repetitions, the splits and the parameters, we repeatedly train and test the models. From each one of the model, we obtain three scores which we can use to check the best a parameters.

But first we can choose a parsimonious range of hyperparameters to test

In [8]:
en_params = {
    'alpha': [0.0, 0.5, 1.0, 1.5],
    'l1_ratio': [0.25, 0.5, 0.75],
    'max_iter':[25, 50, 100, 250, 500, 1000],
    'positive': [True, False],
    'selection':['cyclic', 'random'],
}

In [9]:
results = GridSearch_CV_Tuning(dataset=df, target='Sale_Price', estimator=ElasticNet(), params=en_params)

Fitting 16 folds for each of 288 candidates, totalling 4608 fits


From this dataframe we want to obtain the 10 best models for each metric we used. 

In [10]:
results

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_alpha,param_l1_ratio,param_max_iter,param_positive,param_selection,params,...,split8_train_r2,split9_train_r2,split10_train_r2,split11_train_r2,split12_train_r2,split13_train_r2,split14_train_r2,split15_train_r2,mean_train_r2,std_train_r2
0,0.010500,0.002693,0.006187,0.000950,0.0,0.25,25,True,cyclic,"{'alpha': 0.0, 'l1_ratio': 0.25, 'max_iter': 2...",...,0.876239,0.876845,0.882676,0.885675,0.880019,0.885693,0.877323,0.877389,0.880235,0.003324
1,0.006938,0.001519,0.005187,0.000808,0.0,0.25,25,True,random,"{'alpha': 0.0, 'l1_ratio': 0.25, 'max_iter': 2...",...,0.876230,0.876845,0.882673,0.885651,0.880005,0.885675,0.877317,0.877372,0.880214,0.003318
2,0.006188,0.000390,0.005000,0.000353,0.0,0.25,25,False,cyclic,"{'alpha': 0.0, 'l1_ratio': 0.25, 'max_iter': 2...",...,0.903127,0.902657,0.910010,0.912123,0.908710,0.908789,0.906356,0.903523,0.906916,0.002558
3,0.006375,0.000696,0.005000,0.000500,0.0,0.25,25,False,random,"{'alpha': 0.0, 'l1_ratio': 0.25, 'max_iter': 2...",...,0.903187,0.902679,0.909796,0.911840,0.907053,0.908646,0.904644,0.903023,0.906483,0.002491
4,0.006750,0.000829,0.005375,0.000696,0.0,0.25,50,True,cyclic,"{'alpha': 0.0, 'l1_ratio': 0.25, 'max_iter': 5...",...,0.876239,0.876845,0.882676,0.885675,0.880019,0.885693,0.877324,0.877389,0.880235,0.003324
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
283,0.010375,0.001053,0.004812,0.000527,1.5,0.75,500,False,random,"{'alpha': 1.5, 'l1_ratio': 0.75, 'max_iter': 5...",...,0.881881,0.881578,0.888123,0.890923,0.887047,0.889913,0.883371,0.881590,0.885597,0.003042
284,0.005750,0.000559,0.004563,0.000609,1.5,0.75,1000,True,cyclic,"{'alpha': 1.5, 'l1_ratio': 0.75, 'max_iter': 1...",...,0.865740,0.865620,0.871379,0.875543,0.869820,0.874643,0.866871,0.866244,0.869490,0.003395
285,0.006500,0.001173,0.004125,0.000696,1.5,0.75,1000,True,random,"{'alpha': 1.5, 'l1_ratio': 0.75, 'max_iter': 1...",...,0.865740,0.865620,0.871379,0.875543,0.869820,0.874643,0.866871,0.866244,0.869490,0.003395
286,0.006312,0.000768,0.004688,0.000463,1.5,0.75,1000,False,cyclic,"{'alpha': 1.5, 'l1_ratio': 0.75, 'max_iter': 1...",...,0.881881,0.881578,0.888123,0.890923,0.887047,0.889913,0.883371,0.881590,0.885597,0.003042


In [11]:
best_r2 = list(results[['rank_test_r2','mean_train_r2', 'mean_test_r2']][results['rank_test_r2']==1].index)
best_mse = list(results[['rank_test_neg_mean_squared_error','mean_train_neg_mean_squared_error', 'mean_test_neg_mean_squared_error']][results['rank_test_neg_mean_squared_error']==1].index)
best_rmse = list(results[['rank_test_neg_root_mean_squared_error','mean_train_neg_root_mean_squared_error', 'mean_test_neg_root_mean_squared_error']][results['rank_test_neg_root_mean_squared_error']==1].index)

best = list(set(best_r2) | set(best_mse) | set(best_rmse))

In [12]:
best_df = results[['mean_fit_time', 'mean_test_neg_mean_squared_error', 'mean_test_neg_root_mean_squared_error', 'mean_test_r2', 'params',]].loc[best].sort_values(by=['mean_fit_time'])

In [13]:
best_df

Unnamed: 0,mean_fit_time,mean_test_neg_mean_squared_error,mean_test_neg_root_mean_squared_error,mean_test_r2,params
31,0.007187,-552310200.0,-23468.658913,0.900996,"{'alpha': 0.0, 'l1_ratio': 0.5, 'max_iter': 50..."


In [14]:
pd.DataFrame(list(best_df.params))

Unnamed: 0,alpha,l1_ratio,max_iter,positive,selection
0,0.0,0.5,50,False,random


Just to make sure this is the right way I want to re iter this on the original dataset

In [15]:
results_orig = GridSearch_CV_Tuning(dataset=df_orig, target='Sale_Price', estimator=ElasticNet(), params=en_params)

Fitting 16 folds for each of 288 candidates, totalling 4608 fits


In [16]:
results_orig

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_alpha,param_l1_ratio,param_max_iter,param_positive,param_selection,params,...,split8_train_r2,split9_train_r2,split10_train_r2,split11_train_r2,split12_train_r2,split13_train_r2,split14_train_r2,split15_train_r2,mean_train_r2,std_train_r2
0,0.004936,0.000658,0.004250,0.000559,0.0,0.25,25,True,cyclic,"{'alpha': 0.0, 'l1_ratio': 0.25, 'max_iter': 2...",...,0.827896,0.844351,0.843103,0.849535,0.831813,0.830523,0.855216,0.846161,0.841077,0.011032
1,0.004938,0.000747,0.003875,0.000697,0.0,0.25,25,True,random,"{'alpha': 0.0, 'l1_ratio': 0.25, 'max_iter': 2...",...,0.827897,0.844346,0.839901,0.849550,0.831812,0.830540,0.855222,0.846167,0.840784,0.011156
2,0.005438,0.000609,0.004375,0.000484,0.0,0.25,25,False,cyclic,"{'alpha': 0.0, 'l1_ratio': 0.25, 'max_iter': 2...",...,0.848379,0.864293,0.863846,0.869652,0.852786,0.852328,0.875099,0.865511,0.861468,0.009809
3,0.005749,0.000901,0.004250,0.000829,0.0,0.25,25,False,random,"{'alpha': 0.0, 'l1_ratio': 0.25, 'max_iter': 2...",...,0.846699,0.863824,0.861176,0.868976,0.850579,0.852062,0.873262,0.864856,0.859639,0.010174
4,0.006312,0.001648,0.004688,0.000583,0.0,0.25,50,True,cyclic,"{'alpha': 0.0, 'l1_ratio': 0.25, 'max_iter': 5...",...,0.827909,0.844357,0.843110,0.849550,0.831824,0.830541,0.855222,0.846170,0.841088,0.011031
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
283,0.024249,0.005437,0.004001,0.001118,1.5,0.75,500,False,random,"{'alpha': 1.5, 'l1_ratio': 0.75, 'max_iter': 5...",...,0.821309,0.838780,0.836617,0.847087,0.826041,0.824756,0.851406,0.840635,0.835813,0.012011
284,0.005188,0.000807,0.004687,0.000463,1.5,0.75,1000,True,cyclic,"{'alpha': 1.5, 'l1_ratio': 0.75, 'max_iter': 1...",...,0.814849,0.831369,0.830317,0.837860,0.819186,0.818640,0.841070,0.834200,0.828453,0.011447
285,0.006125,0.000781,0.004562,0.000609,1.5,0.75,1000,True,random,"{'alpha': 1.5, 'l1_ratio': 0.75, 'max_iter': 1...",...,0.814849,0.831372,0.830319,0.837860,0.819186,0.818642,0.841072,0.834202,0.828455,0.011447
286,0.017437,0.001540,0.005000,0.000353,1.5,0.75,1000,False,cyclic,"{'alpha': 1.5, 'l1_ratio': 0.75, 'max_iter': 1...",...,0.821309,0.838780,0.836617,0.847087,0.826041,0.824756,0.851406,0.840635,0.835813,0.012011


In [17]:
best_r22 = list(results_orig[['rank_test_r2','mean_train_r2', 'mean_test_r2']][results_orig['rank_test_r2']==1].index)
best_mse2 = list(results_orig[['rank_test_neg_mean_squared_error','mean_train_neg_mean_squared_error', 'mean_test_neg_mean_squared_error']][results_orig['rank_test_neg_mean_squared_error']==1].index)
best_rmse2 = list(results_orig[['rank_test_neg_root_mean_squared_error','mean_train_neg_root_mean_squared_error', 'mean_test_neg_root_mean_squared_error']][results_orig['rank_test_neg_root_mean_squared_error']==1].index)

best2 = list(set(best_r22) | set(best_mse2) | set(best_rmse2))

best_df2 = results_orig[['mean_fit_time', 'mean_test_neg_mean_squared_error', 'mean_test_neg_root_mean_squared_error', 'mean_test_r2', 'params',]].loc[best2].sort_values(by=['mean_fit_time'])

In [18]:
best_df2 

Unnamed: 0,mean_fit_time,mean_test_neg_mean_squared_error,mean_test_neg_root_mean_squared_error,mean_test_r2,params
54,0.006625,-944107900.0,-30523.288386,0.845182,"{'alpha': 0.0, 'l1_ratio': 0.75, 'max_iter': 5..."
30,0.007313,-944107900.0,-30523.288386,0.845182,"{'alpha': 0.0, 'l1_ratio': 0.5, 'max_iter': 50..."
6,0.007375,-944107900.0,-30523.288386,0.845182,"{'alpha': 0.0, 'l1_ratio': 0.25, 'max_iter': 5..."


In [19]:
pd.DataFrame(list(best_df2.params))

Unnamed: 0,alpha,l1_ratio,max_iter,positive,selection
0,0.0,0.75,50,False,cyclic
1,0.0,0.5,50,False,cyclic
2,0.0,0.25,50,False,cyclic


In [20]:
results2 = GridSearch_CV_Tuning(dataset=df2, target='Sale_Price', estimator=ElasticNet(), params=en_params)

Fitting 16 folds for each of 288 candidates, totalling 4608 fits


In [21]:
results2

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_alpha,param_l1_ratio,param_max_iter,param_positive,param_selection,params,...,split8_train_r2,split9_train_r2,split10_train_r2,split11_train_r2,split12_train_r2,split13_train_r2,split14_train_r2,split15_train_r2,mean_train_r2,std_train_r2
0,0.004688,0.000845,0.003626,0.000599,0.0,0.25,25,True,cyclic,"{'alpha': 0.0, 'l1_ratio': 0.25, 'max_iter': 2...",...,0.889512,0.888775,0.894683,0.898249,0.891763,0.895032,0.892279,0.891584,0.892761,0.002881
1,0.004625,0.000696,0.003875,0.000857,0.0,0.25,25,True,random,"{'alpha': 0.0, 'l1_ratio': 0.25, 'max_iter': 2...",...,0.889497,0.888669,0.894605,0.897407,0.891719,0.895026,0.892272,0.891548,0.892166,0.003117
2,0.006000,0.001732,0.004750,0.000559,0.0,0.25,25,False,cyclic,"{'alpha': 0.0, 'l1_ratio': 0.25, 'max_iter': 2...",...,0.909006,0.909584,0.915462,0.917581,0.912927,0.912964,0.913615,0.911707,0.912856,0.002400
3,0.005999,0.001323,0.004687,0.000682,0.0,0.25,25,False,random,"{'alpha': 0.0, 'l1_ratio': 0.25, 'max_iter': 2...",...,0.909095,0.909079,0.915712,0.917574,0.911769,0.913202,0.906572,0.911242,0.912124,0.002755
4,0.006125,0.001053,0.004625,0.000485,0.0,0.25,50,True,cyclic,"{'alpha': 0.0, 'l1_ratio': 0.25, 'max_iter': 5...",...,0.889540,0.888804,0.894752,0.898261,0.891800,0.895070,0.892299,0.891609,0.892795,0.002882
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
283,0.024499,0.006374,0.004312,0.001102,1.5,0.75,500,False,random,"{'alpha': 1.5, 'l1_ratio': 0.75, 'max_iter': 5...",...,0.888938,0.887985,0.894336,0.896833,0.892984,0.893849,0.890956,0.890194,0.892056,0.002661
284,0.009188,0.002855,0.005062,0.000428,1.5,0.75,1000,True,cyclic,"{'alpha': 1.5, 'l1_ratio': 0.75, 'max_iter': 1...",...,0.878881,0.876649,0.881910,0.885645,0.880536,0.882955,0.880481,0.879361,0.880849,0.002787
285,0.010812,0.004202,0.004813,0.000634,1.5,0.75,1000,True,random,"{'alpha': 1.5, 'l1_ratio': 0.75, 'max_iter': 1...",...,0.878881,0.876649,0.881910,0.885638,0.880535,0.882955,0.880481,0.879361,0.880849,0.002786
286,0.016750,0.001854,0.005000,0.000353,1.5,0.75,1000,False,cyclic,"{'alpha': 1.5, 'l1_ratio': 0.75, 'max_iter': 1...",...,0.888938,0.887985,0.894336,0.896833,0.892984,0.893849,0.890956,0.890194,0.892056,0.002661


In [22]:
best_r3 = list(results2[['rank_test_r2','mean_train_r2', 'mean_test_r2']][results2['rank_test_r2']==1].index)
best_mse3 = list(results2[['rank_test_neg_mean_squared_error','mean_train_neg_mean_squared_error', 'mean_test_neg_mean_squared_error']][results2['rank_test_neg_mean_squared_error']==1].index)
best_rmse3 = list(results2[['rank_test_neg_root_mean_squared_error','mean_train_neg_root_mean_squared_error', 'mean_test_neg_root_mean_squared_error']][results2['rank_test_neg_root_mean_squared_error']==1].index)

best3 = list(set(best_r3) | set(best_mse3) | set(best_rmse3))

best_df3 = results_orig[['mean_fit_time', 'mean_test_neg_mean_squared_error', 'mean_test_neg_root_mean_squared_error', 'mean_test_r2', 'params',]].loc[best3].sort_values(by=['mean_fit_time'])

In [23]:
best_df3

Unnamed: 0,mean_fit_time,mean_test_neg_mean_squared_error,mean_test_neg_root_mean_squared_error,mean_test_r2,params
43,0.032,-944913400.0,-30534.527905,0.845051,"{'alpha': 0.0, 'l1_ratio': 0.5, 'max_iter': 50..."


In [24]:
pd.DataFrame(list(best_df3.params))

Unnamed: 0,alpha,l1_ratio,max_iter,positive,selection
0,0.0,0.5,500,False,random


---

## Elastic Net Evaluation - Leave One Out Metrics Computing

Now that we have some good knowledge about hyperparameters for the estimator, we can closely analyze how accurate the model is.

To get the most accurate results on the test, we are going to use the Leave One Out Approach for the instances. The model is going to be evaluated multiple times and the dataset will be changed in place. 
This is going to allow us to find the best and worst predictions.

In [34]:
Enet1 = ElasticNet(alpha=0.0, l1_ratio = 0.5, max_iter=50, positive=True, selection = 'cyclic')
Enet2 = ElasticNet(alpha=0.0, l1_ratio = 0.75, max_iter=50, positive=True, selection = 'random')

In [35]:
test_df1 = df.copy()
LOO_estimator_eval(dataset=test_df1, target='Sale_Price', 
                   estimator=Enet1, 
                   params=None, ignore=None)

{'RMSE': 26700.578915042264,
 'MSE': 712920914.3983995,
 'MAE': 19189.81572759709,
 'R2': 0.8778111867181878,
 'MAX_Err': 227541.11343941686}

In [36]:
test_df2 = df.copy()
LOO_estimator_eval(dataset=test_df2, target='Sale_Price', 
                   estimator=Enet2, 
                   params=None, ignore=None)

{'RMSE': 26701.282899444563,
 'MSE': 712958508.4761707,
 'MAE': 19190.05150862439,
 'R2': 0.8778047434007643,
 'MAX_Err': 227677.74990962807}

We can see that for the first dataset, the most parsimonious model is also the best one!

In [37]:
test_df3 = df2.copy()
LOO_estimator_eval(dataset=test_df3, target='Sale_Price', 
                   estimator=Enet1, 
                   params=None, ignore=None)

{'RMSE': 198156.87459070655,
 'MSE': 39266146947.557,
 'MAE': 182843.00038124286,
 'R2': -5.7298964033338144,
 'MAX_Err': 615000.0}

In [38]:
test_df4 = df2.copy()
LOO_estimator_eval(dataset=test_df4, target='Sale_Price', 
                   estimator=Enet2, 
                   params=None, ignore=None)

{'RMSE': 198156.87459070655,
 'MSE': 39266146947.557,
 'MAE': 182843.00038124286,
 'R2': -5.7298964033338144,
 'MAX_Err': 615000.0}

Curiosly enough, the RMSE is much smaller here, but the RSquared Value is looking very bad.

In [39]:
test_df5 = df_orig.copy()
LOO_estimator_eval(dataset=test_df5, target='Sale_Price', 
                   estimator=Enet1, 
                   params=None, ignore=None)

{'RMSE': 32941.440984012486,
 'MSE': 1085138534.1031773,
 'MAE': 20505.819714176752,
 'R2': 0.8299077387862126,
 'MAX_Err': 564428.9879390353}

In [40]:
test_df6 = df_orig.copy()
LOO_estimator_eval(dataset=test_df6, target='Sale_Price', 
                   estimator=Enet2, 
                   params=None, ignore=None)

{'RMSE': 32946.278884246545,
 'MSE': 1085457292.3185496,
 'MAE': 20507.795693128115,
 'R2': 0.8298577743771264,
 'MAX_Err': 564379.406314693}

## Worst Predictions and Best Predictions

To consider the best and worst predictions, I decided to consider the models with the lowest error.

In [41]:
# Modified First Subset
test_df1['Prediction_Error'] = np.abs(test_df1['Sale_Price']-test_df1['Predicted'])
# Original Second Subset
test_df6['Prediction_Error'] = np.abs(test_df6['Sale_Price']-test_df6['Predicted'])

### Most wrong on df

In [42]:
test_df1.sort_values(by=['Prediction_Error', 'Sale_Price'], ascending=False).head(10)

Unnamed: 0,BC_Bsmt_Unf_SF,Bedroom_AbvGr,Bsmt_Full_Bath,Bsmt_Unf_SF,Central_Air,External_Eval,Fireplace_Gr_Area_Ratio,Fireplace_Qu,Garage_Area,Garage_Cars,...,garage_type_1,hs_style_1,neighborhoods_1,neighborhoods_2,neighborhoods_3,neighborhoods_4,sale_cond_1,Sale_Price,Predicted,Prediction_Error
40,3.016554,2.0,1.0,142.0,0.0,180.9,1182.0,4.0,820.0,3.0,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,611657.0,384115.886561,227541.113439
389,4.213478,4.0,0.0,1734.0,0.0,1117.7999,2822.0,4.0,1020.0,3.0,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,582933.0,389741.360722,193191.639278
2106,4.218352,4.0,0.0,1752.0,0.0,644.0,2868.0,4.0,716.0,3.0,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,556581.0,375847.410645,180733.589355
957,3.341847,1.0,2.0,278.0,0.0,1563.2999,1235.0,4.0,789.0,3.0,...,1.0,1.0,0.0,0.0,0.0,0.0,1.0,615000.0,439400.644016,175599.355984
388,3.441461,2.0,2.0,342.0,0.0,1106.9999,1337.0,4.0,762.0,3.0,...,1.0,1.0,0.0,0.0,0.0,0.0,1.0,610000.0,437285.507367,172714.492633
380,3.840031,2.0,1.0,788.0,0.0,653.39996,1201.0,4.0,672.0,3.0,...,1.0,1.0,0.0,0.0,0.0,0.0,1.0,555000.0,406254.448952,148745.551048
1469,3.597804,2.0,1.0,474.0,0.0,280.0,1419.0,4.0,567.0,2.0,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,392000.0,257044.50416,134955.49584
1601,4.229794,2.0,0.0,1795.0,0.0,98.0,1795.0,4.0,895.0,3.0,...,1.0,1.0,0.0,0.0,1.0,0.0,0.0,147000.0,280521.228665,133521.228665
2104,4.153759,5.0,0.0,1528.0,0.0,642.6,3390.0,5.0,758.0,3.0,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,545224.0,413432.911475,131791.088525
2218,2.482185,4.0,1.0,48.0,0.0,2681.8,3500.0,3.0,959.0,3.0,...,1.0,1.0,0.0,0.0,0.0,0.0,1.0,584500.0,456239.79541,128260.20459


In [43]:
test_df1.sort_values(by=['Prediction_Error', 'Sale_Price'], ascending=False).head(50).describe()

Unnamed: 0,BC_Bsmt_Unf_SF,Bedroom_AbvGr,Bsmt_Full_Bath,Bsmt_Unf_SF,Central_Air,External_Eval,Fireplace_Gr_Area_Ratio,Fireplace_Qu,Garage_Area,Garage_Cars,...,garage_type_1,hs_style_1,neighborhoods_1,neighborhoods_2,neighborhoods_3,neighborhoods_4,sale_cond_1,Sale_Price,Predicted,Prediction_Error
count,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,...,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0
mean,3.334758,2.7,0.86,663.08,0.04,692.375983,1718.48,3.62,698.42,2.44,...,0.92,0.92,0.14,0.02,0.08,0.2,0.48,391140.46,320955.147539,102640.185419
std,1.07688,1.216385,0.606428,610.674396,0.197949,561.19655,887.243432,1.193349,276.691982,0.860944,...,0.274048,0.274048,0.35051,0.141421,0.274048,0.404061,0.504672,147980.18752,93300.710734,36658.517011
min,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,39300.0,-33125.718991,67138.312448
25%,3.293201,2.0,0.25,251.75,0.0,318.0,1129.125,4.0,553.5,2.0,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,312375.0,274177.924017,74674.613881
50%,3.607678,3.0,1.0,484.0,0.0,609.04998,1715.0,4.0,734.0,3.0,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,423935.0,348347.262778,92186.270918
75%,3.832562,3.0,1.0,776.0,0.0,934.5,2463.25,4.0,890.0,3.0,...,1.0,1.0,0.0,0.0,0.0,0.0,1.0,483517.25,386535.60284,119297.106999
max,4.312694,6.0,2.0,2140.0,1.0,2681.8,3500.0,5.0,1174.0,3.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,615000.0,456239.79541,227541.113439


### Best on df

In [44]:
test_df1.sort_values(by=['Prediction_Error', 'Sale_Price']).head(10)

Unnamed: 0,BC_Bsmt_Unf_SF,Bedroom_AbvGr,Bsmt_Full_Bath,Bsmt_Unf_SF,Central_Air,External_Eval,Fireplace_Gr_Area_Ratio,Fireplace_Qu,Garage_Area,Garage_Cars,...,garage_type_1,hs_style_1,neighborhoods_1,neighborhoods_2,neighborhoods_3,neighborhoods_4,sale_cond_1,Sale_Price,Predicted,Prediction_Error
570,3.40043,3.0,0.0,314.0,0.0,300.3,662.0,2.0,486.0,2.0,...,1.0,1.0,1.0,0.0,0.0,0.0,1.0,205000.0,204999.538899,0.461101
832,3.882115,3.0,0.0,861.0,0.0,65.0,1477.0,4.0,216.0,1.0,...,0.0,0.0,1.0,0.0,0.0,0.0,1.0,145000.0,145004.095411,4.095411
439,2.246507,3.0,0.0,30.0,0.0,0.0,0.0,0.0,400.0,2.0,...,1.0,0.0,0.0,0.0,1.0,0.0,1.0,167000.0,167006.353973,6.353973
1849,3.150307,2.0,1.0,187.0,1.0,114.399994,0.0,0.0,240.0,1.0,...,0.0,0.0,1.0,0.0,0.0,0.0,1.0,108500.0,108474.441265,25.558735
21,3.44426,3.0,1.0,344.0,0.0,0.0,1078.0,2.0,500.0,2.0,...,1.0,1.0,1.0,0.0,0.0,0.0,1.0,149900.0,149929.120875,29.120875
881,3.705522,3.0,0.0,594.0,0.0,20.8,1404.0,1.0,504.0,2.0,...,1.0,1.0,0.0,0.0,0.0,1.0,1.0,160000.0,160080.093319,80.093319
196,4.222642,4.0,0.0,1768.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,1.0,0.0,0.0,0.0,1.0,136500.0,136418.823942,81.176058
1756,3.062317,3.0,1.0,156.0,0.0,770.89996,1647.0,2.0,280.0,1.0,...,1.0,1.0,1.0,0.0,0.0,0.0,1.0,153000.0,152898.103199,101.896801
2471,4.080272,3.0,0.0,1308.0,0.0,80.0,0.0,0.0,848.0,3.0,...,1.0,1.0,0.0,0.0,0.0,1.0,0.0,229800.0,229912.844419,112.844419
1856,0.0,3.0,1.0,0.0,0.0,327.59998,0.0,0.0,240.0,1.0,...,0.0,0.0,1.0,0.0,0.0,0.0,1.0,129000.0,129168.278584,168.278584


In [45]:
test_df1.sort_values(by=['Prediction_Error', 'Sale_Price']).head(50).describe()

Unnamed: 0,BC_Bsmt_Unf_SF,Bedroom_AbvGr,Bsmt_Full_Bath,Bsmt_Unf_SF,Central_Air,External_Eval,Fireplace_Gr_Area_Ratio,Fireplace_Qu,Garage_Area,Garage_Cars,...,garage_type_1,hs_style_1,neighborhoods_1,neighborhoods_2,neighborhoods_3,neighborhoods_4,sale_cond_1,Sale_Price,Predicted,Prediction_Error
count,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,...,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0
mean,3.14478,2.96,0.36,457.02,0.08,261.147995,795.05,1.56,469.52,1.8,...,0.78,0.74,0.32,0.02,0.26,0.36,0.92,175619.98,175734.411029,324.391859
std,1.131707,0.604743,0.484873,363.820963,0.274048,229.660805,830.023581,1.618263,182.545185,0.699854,...,0.418452,0.443087,0.471212,0.141421,0.443087,0.484873,0.274048,45615.249433,45588.104602,165.73726
min,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,108500.0,108474.441265,0.461101
25%,3.155381,3.0,0.0,189.0,0.0,86.849996,0.0,0.0,309.0,1.0,...,1.0,0.25,0.0,0.0,0.0,0.0,1.0,141250.0,141385.718449,212.120088
50%,3.488767,3.0,0.0,377.5,0.0,213.19999,653.0,1.0,461.5,2.0,...,1.0,1.0,0.0,0.0,0.0,0.0,1.0,166000.0,166175.283053,355.193865
75%,3.764724,3.0,1.0,673.0,0.0,354.05,1467.75,3.0,588.75,2.0,...,1.0,1.0,1.0,0.0,0.75,1.0,1.0,204937.5,205102.493491,463.079822
max,4.222642,4.0,1.0,1768.0,1.0,1012.69995,2614.0,4.0,869.0,3.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,336000.0,336439.1989,552.738495


### Most wrong on df_orig

In [46]:
test_df6.sort_values(by=['Prediction_Error', 'Sale_Price'], ascending=False).head(10)

Unnamed: 0,Age,BC_Bsmt_Unf_SF,BC_External_SF,Baths,Bedroom_Liv_Area_Ratio,Bsmt_Eval,Bsmt_Unf_SF,Central_Air,Exter_Qual,Fireplace_Eval,...,bldg_type_3,garage_type_1,hs_style_1,neighborhoods_1,neighborhoods_3,neighborhoods_4,sale_cond_1,Sale_Price,Predicted,Prediction_Error
1498,0.0,3.599931,38.78769,4.5,1880.6666,25356.5,466.0,0.0,3.0,3761.3333,...,0.0,1.0,1.0,1.0,0.0,0.0,0.0,160000.0,724379.406315,564379.406315
2180,-1.0,4.003929,39.531784,4.0,2547.5,21144.25,1085.0,0.0,3.0,5095.0,...,0.0,1.0,1.0,1.0,0.0,0.0,0.0,183850.0,676949.090637,493099.090637
2181,0.0,3.903065,31.517199,4.5,1558.6666,13022.7,878.0,0.0,3.0,9352.0,...,0.0,1.0,1.0,1.0,0.0,0.0,0.0,184750.0,547177.406652,362427.406652
1760,11.0,3.387806,35.48137,4.5,1119.0,9464.2,300.0,0.0,2.0,3876.3296,...,0.0,1.0,1.0,0.0,0.0,0.0,0.0,745000.0,421828.835095,323171.164905
1767,13.0,3.959812,26.965897,4.0,1079.0,10142.601,989.0,0.0,3.0,4825.4346,...,0.0,1.0,1.0,0.0,0.0,0.0,1.0,755000.0,474372.771317,280627.228683
44,1.0,3.024274,11.323102,3.5,1182.0,9669.5,142.0,0.0,3.0,2364.0,...,0.0,1.0,1.0,0.0,0.0,0.0,0.0,611657.0,404642.442164,207014.557836
1182,31.0,3.708188,20.123798,3.0,981.3333,3056.5498,584.0,0.0,2.0,5888.0,...,0.0,1.0,1.0,0.0,0.0,1.0,0.0,150000.0,348895.438547,198895.438547
433,1.0,4.22668,26.458273,3.5,705.5,7196.1,1734.0,0.0,3.0,5644.0,...,0.0,1.0,1.0,0.0,0.0,0.0,0.0,582933.0,392703.572447,190229.427553
2332,1.0,4.231577,23.635477,3.5,717.0,7868.4,1752.0,0.0,2.0,5736.0,...,0.0,1.0,1.0,0.0,0.0,0.0,0.0,556581.0,368289.449488,188291.550512
2445,11.0,3.673307,27.104687,4.5,906.75,8009.5,543.0,0.0,2.0,6282.148,...,0.0,1.0,1.0,0.0,0.0,0.0,1.0,625000.0,437567.086794,187432.913206


In [47]:
test_df6.sort_values(by=['Prediction_Error', 'Sale_Price'], ascending=False).head(50).describe()

Unnamed: 0,Age,BC_Bsmt_Unf_SF,BC_External_SF,Baths,Bedroom_Liv_Area_Ratio,Bsmt_Eval,Bsmt_Unf_SF,Central_Air,Exter_Qual,Fireplace_Eval,...,bldg_type_3,garage_type_1,hs_style_1,neighborhoods_1,neighborhoods_3,neighborhoods_4,sale_cond_1,Sale_Price,Predicted,Prediction_Error
count,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,...,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0
mean,18.04,3.494047,22.807556,3.27,987.739667,7450.410504,684.62,0.0,2.3,3955.391764,...,0.0,0.9,0.92,0.2,0.08,0.16,0.34,408134.18,353826.812204,150105.118808
std,30.170238,0.832978,9.97355,0.770701,466.435897,4311.861175,573.7098,0.0,0.735402,1805.720223,...,0.0,0.303046,0.274048,0.404061,0.274048,0.370328,0.478518,175073.130292,113752.14672,99068.385822
min,-1.0,0.0,0.0,2.0,470.33334,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,80400.0,138059.517472,80414.751035
25%,1.0,3.360193,18.962324,3.0,686.875025,4962.268625,283.5,0.0,2.0,2475.0,...,0.0,1.0,1.0,0.0,0.0,0.0,0.0,254969.75,279620.886316,92013.36274
50%,1.5,3.670643,23.485247,3.5,816.125,7122.7375,540.0,0.0,2.0,3818.83145,...,0.0,1.0,1.0,0.0,0.0,0.0,0.0,447500.0,362268.227889,115011.902493
75%,18.25,3.945625,27.838413,3.5,1109.0,9054.05625,961.25,0.0,3.0,5257.75,...,0.0,1.0,1.0,0.0,0.0,0.0,1.0,543418.0,407468.811532,174865.513362
max,114.0,4.304173,41.71983,4.5,2547.5,25356.5,2042.0,0.0,3.0,9352.0,...,0.0,1.0,1.0,1.0,1.0,1.0,1.0,755000.0,724379.406315,564379.406315


### Best on df

In [48]:
test_df6.sort_values(by=['Prediction_Error', 'Sale_Price']).head(10)

Unnamed: 0,Age,BC_Bsmt_Unf_SF,BC_External_SF,Baths,Bedroom_Liv_Area_Ratio,Bsmt_Eval,Bsmt_Unf_SF,Central_Air,Exter_Qual,Fireplace_Eval,...,bldg_type_3,garage_type_1,hs_style_1,neighborhoods_1,neighborhoods_3,neighborhoods_4,sale_cond_1,Sale_Price,Predicted,Prediction_Error
2402,2.0,3.556804,22.718481,3.5,638.0,4366.4995,426.0,0.0,2.0,3315.1453,...,0.0,1.0,1.0,0.0,0.0,0.0,1.0,279000.0,279011.787151,11.787151
216,43.0,4.022427,10.544686,4.0,458.0,4763.1997,1128.0,1.0,0.0,0.0,...,1.0,0.0,1.0,1.0,0.0,0.0,1.0,136000.0,135985.511185,14.488815
1227,55.0,3.638512,16.335052,1.0,345.33334,2693.5999,505.0,0.0,1.0,0.0,...,0.0,1.0,1.0,1.0,0.0,0.0,1.0,112900.0,112917.821853,17.821853
1091,1.0,3.608105,19.794222,3.0,679.0,4151.2495,474.0,0.0,2.0,2716.0,...,0.0,1.0,1.0,0.0,0.0,0.0,0.0,217300.0,217326.790566,26.790566
1287,98.0,3.781024,20.762966,1.0,680.0,1546.9999,680.0,1.0,1.0,0.0,...,0.0,1.0,1.0,1.0,0.0,0.0,1.0,111000.0,111047.183449,47.183449
81,30.0,3.759443,21.226215,2.5,521.3333,1998.7499,650.0,0.0,1.0,2708.9275,...,0.0,1.0,1.0,1.0,0.0,0.0,1.0,171000.0,171055.578695,55.578695
2232,67.0,3.455262,7.532824,1.0,405.0,1871.9999,345.0,0.0,1.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,1.0,141000.0,141055.673238,55.673238
113,32.0,3.625021,11.487687,3.5,630.3333,4008.6,491.0,0.0,1.0,3275.308,...,0.0,1.0,0.0,0.0,1.0,0.0,1.0,212500.0,212440.766693,59.233307
2351,31.0,3.09166,26.543766,3.5,517.3333,2192.4749,163.0,0.0,1.0,2688.1428,...,0.0,1.0,1.0,0.0,1.0,0.0,1.0,179900.0,179836.972408,63.027592
114,22.0,3.709825,27.46122,2.5,571.3333,3013.4998,586.0,0.0,1.0,2968.735,...,0.0,1.0,1.0,0.0,1.0,0.0,1.0,196500.0,196432.555392,67.444608


In [49]:
test_df6.sort_values(by=['Prediction_Error', 'Sale_Price']).head(50).describe()

Unnamed: 0,Age,BC_Bsmt_Unf_SF,BC_External_SF,Baths,Bedroom_Liv_Area_Ratio,Bsmt_Eval,Bsmt_Unf_SF,Central_Air,Exter_Qual,Fireplace_Eval,...,bldg_type_3,garage_type_1,hs_style_1,neighborhoods_1,neighborhoods_3,neighborhoods_4,sale_cond_1,Sale_Price,Predicted,Prediction_Error
count,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,...,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0
mean,36.92,3.353171,16.684932,2.24,467.651663,2921.436896,514.36,0.08,1.2,1556.34035,...,0.04,0.72,0.86,0.6,0.28,0.04,0.88,169408.02,169414.560893,212.034718
std,28.613426,0.920362,7.509941,0.770899,117.712326,962.812833,359.023208,0.274048,0.451754,1567.272741,...,0.197949,0.453557,0.35051,0.494872,0.453557,0.197949,0.328261,49510.929436,49504.483076,134.040695
min,0.0,0.0,0.0,1.0,288.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,95000.0,95099.161213,11.787151
25%,9.75,3.208697,12.101474,2.0,368.83333,2336.1,209.25,0.0,1.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,1.0,131675.0,131942.408786,99.147919
50%,39.5,3.589369,18.279463,2.0,455.16667,2824.3999,456.0,0.0,1.0,1577.89825,...,0.0,1.0,1.0,1.0,0.0,0.0,1.0,156450.0,156599.365027,176.818957
75%,50.0,3.803708,21.830725,2.5,553.833325,3693.11245,713.25,0.0,1.0,2732.696,...,0.0,1.0,1.0,1.0,1.0,0.0,1.0,204416.25,204176.851263,341.947519
max,118.0,4.114147,32.471565,4.0,703.5,4763.1997,1368.0,1.0,2.0,5207.8022,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,280000.0,280328.224381,444.752512


---

### Final Comment

The Elastic Net Regressor tested present a lot of variance in their results, which is hard to account for.

The most probable cause are the number of estimators.