<hr style="height:.9px;border:none;color:#333;background-color:#333;" />
<hr style="height:.9px;border:none;color:#333;background-color:#333;" />

<br><h2>Script 09 | Hyperparameter Tuning</h2>
<br>
Written by Chase Kusterer<br>
<a href="https://github.com/chase-kusterer">GitHub</a> | <a href="https://www.linkedin.com/in/kusterer/">LinkedIn</a>
<br><br><br>

<hr style="height:.9px;border:none;color:#333;background-color:#333;" />
<hr style="height:.9px;border:none;color:#333;background-color:#333;" />

<h2>Part I: Preparation</h2><br>
Run the following code to import necessary packages, load data, and set display options for pandas. 

In [None]:
# importing libraries
import matplotlib.pyplot as plt                        # data visualization
import numpy as np                                     # mathematical essentials
import pandas as pd                                    # data science essentials
from sklearn.model_selection import train_test_split   # train-test split
from sklearn.tree import DecisionTreeRegressor         # regression trees
from sklearn.tree import plot_tree                     # tree plots
from sklearn.model_selection import RandomizedSearchCV # hyperparameter tuning
import warnings                                        # warnings from code



# setting pandas print options
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)


# suppressing warnings
warnings.filterwarnings(action = 'ignore')


# specifying the path and file name
file = './datasets/housing_feature_rich.xlsx'


# reading the file into Python
housing = pd.read_excel(io     = file,
                        header = 0   )


housing.drop(labels  = ['property_id'],
             axis    = 1,
             inplace = True)


# checking housing dataset
housing.head(n = 5)

<br>

In [None]:
#################
## full models ##
#################

# all x-data
x_all = list(housing.drop(labels  = ['Sale_Price', 'log_Sale_Price'],
                          axis    = 1))

# original x-data
x_original = list(housing.loc[ : , 'Lot_Area' : 'Porch_Area' ])



################
## original y ##
################

# best base model 
x_base = ['Mas_Vnr_Area',  'Total_Bsmt_SF', 'First_Flr_SF',
          'Second_Flr_SF', 'Garage_Area']


# best model after feature engineering
x_rich = ['Lot_Area', 'Garage_Cars', 'Overall_Qual', 'Total_Bsmt_SF',
          'NridgHt', 'Kitchen_AbvGr', 'has_Second_Flr',
          'Mas_Vnr_Area', 'has_Garage', 'Porch_Area',
          'NWAmes', 'OldTown', 'Overall_Cond', 'NAmes',
          'Edwards', 'Somerst', 'Fireplaces', 'Second_Flr_SF',
          'First_Flr_SF', 'has_Mas_Vnr', 'CulDSac', 'Total_Bath',
          'Crawfor', 'Garage_Area', 'has_Porch']



###################
## logarithmic y ##
###################

# best model after feature engineering (log y)
x_rich_log_y = ['Lot_Area', 'First_Flr_SF', 'Second_Flr_SF', 'Garage_Cars' ,
                'Overall_Qual', 'Overall_Cond', 'Total_Bsmt_SF', 'OldTown',
                'Kitchen_AbvGr', 'Total_Bath', 'has_Second_Flr', 'NridgHt',
                'Fireplaces', 'Porch_Area', 'Somerst', 'CollgCr', 'Crawfor',
                'CulDSac', 'NWAmes', 'Edwards', 'Gilbert']



########################
## response variables ##
########################
original_y = 'Sale_Price'
log_y      = 'log_Sale_Price'

<br>

In [None]:
# preparing x-data
x_data = housing[ x_original ]

# preparing y-data
y_data = housing[ original_y ]

<br>

In [None]:
# train-test split
x_train, x_test, y_train, y_test = train_test_split(x_data, # x
                                                    y_data, # y
                                                    test_size    = 0.25,
                                                    random_state = 702 )

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

<strong>The Analytics Kitchen</strong><br>
Model selection can be thought of as choosing among the various appliances that can be used for cooking. Hyperparameter tuning can be thought of as an extension of this. For example, if we wanted to cook something in the oven, how hot should the oven be in order to get the best results? How does this compare to using a microwave given its best settings for the job (time, wattage, etc.)?<br><br>
In the same way that we might adjust the temperature of an oven, we can make adjustments to the <strong>hyperparameters</strong> of a machine learning algorithm in order to optimize its results. <a href = "https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)">This Wikipedia page</a> does an excellent job of defining a hyperparameter as: <em>a parameter whose value is set before the learning process begins</em>. In other words, these are arguments that are set before a model is fit and predictions are made. Hyperparameters can be found in the optional arguments of a model object.
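
As a minimal sketch of where hyperparameters live, the cell below lists the optional arguments of a default DecisionTreeRegressor using <code>get_params()</code>. The object name <code>default_tree</code> is illustrative only, and the default values shown will depend on the installed version of scikit-learn.

```python
from sklearn.tree import DecisionTreeRegressor   # regression trees


# INSTANTIATING a tree with default hyperparameters
default_tree = DecisionTreeRegressor()


# get_params() returns a dictionary of hyperparameter names and values
default_params = default_tree.get_params()


# displaying each hyperparameter and its default value
for name, value in default_params.items():
    print(f"{name:<26}: {value}")
```

Any of the names printed above can be used as a key when declaring a hyperparameter grid.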

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

<h2>Part II: Hyperparameter Tuning with RandomizedSearchCV</h2><br>

We could manually analyze each combination of hyperparameter values one by one, but that would take a very long time. Instead, we can automate this process using <a href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html">RandomizedSearchCV</a> from scikit-learn.<br><br>
<strong>Note:</strong> RandomizedSearchCV searches various combinations of hyperparameters, optimizing for a given metric. <font color='red'><strong>This can take a LONG time.</strong></font> To alleviate this, make sure your ranges are reasonably small.
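
To get a rough sense of how long a search might take, it can help to count the candidate combinations before fitting. The sketch below uses <code>ParameterGrid</code> from scikit-learn for this. The ranges, <code>n_iter</code>, and <code>cv</code> values are hypothetical and chosen only for illustration.

```python
import numpy as np                                   # mathematical essentials
from sklearn.model_selection import ParameterGrid   # grid combinations


# hypothetical hyperparameter space (illustrative ranges only)
example_grid = {'splitter'         : ['best', 'random'],
                'max_depth'        : np.arange(1, 11, 1),
                'min_samples_leaf' : np.arange(1, 101, 1)}


# total number of unique hyperparameter combinations
n_combinations = len(ParameterGrid(example_grid))


# approximate number of model fits (excluding the final refit)
n_iter = 100
cv     = 5
n_fits = min(n_iter, n_combinations) * cv


# displaying results
print("Unique combinations:", n_combinations)
print("Model fits         :", n_fits)
```

When every range is a list or array, RandomizedSearchCV samples without replacement, so it will not try more candidates than there are unique combinations. Narrowing a range or lowering <code>n_iter</code> reduces the number of fits directly.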

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

<h4>a) Use the help file to tune hyperparameters for a decision tree regressor model.</h4>

In [None]:
help(DecisionTreeRegressor)

<br>

In [None]:
# declaring a hyperparameter space
criterion_range = ["squared_error", "friedman_mse", "absolute_error", "poisson"]
#splitter_range  = _____
#depth_range     = _____
#leaf_range      = _____


# creating a hyperparameter grid
param_grid = {'criterion' : criterion_range,}
              #'NAME OF HYPERPARAMETER' : HYPERPARAMETER_RANGE,
              #'NAME OF HYPERPARAMETER' : HYPERPARAMETER_RANGE,
              #'NAME OF HYPERPARAMETER' : HYPERPARAMETER_RANGE}


# INSTANTIATING the model object without hyperparameters
tuned_tree = DecisionTreeRegressor(random_state = 219)


# RandomizedSearchCV object
tuned_tree_cv = RandomizedSearchCV(estimator             = tuned_tree,
                                   param_distributions   = param_grid,
                                   cv                    = 5,
                                   n_iter                = 1000,
                                   random_state          = 702)


# FITTING to the FULL DATASET (due to cross-validation)
tuned_tree_cv.fit(x_data, y_data)


# printing the optimal parameters and best score
print("Tuned Parameters  :", tuned_tree_cv.best_params_)
print("Tuned CV R-Square :", tuned_tree_cv.best_score_.round(4))

In [None]:
# declaring a hyperparameter space
criterion_range = ["squared_error", "friedman_mse", "absolute_error", "poisson"]
splitter_range  = ['best', 'random']
depth_range     = np.arange(1, 11, 1)
leaf_range      = np.arange(1, 1001, 1)


# creating a hyperparameter grid
param_grid = {'criterion'        : criterion_range,
              'splitter'         : splitter_range,
              'max_depth'        : depth_range,
              'min_samples_leaf' : leaf_range}


# INSTANTIATING the model object without hyperparameters
tuned_tree = DecisionTreeRegressor(random_state = 219)


# RandomizedSearchCV object
tuned_tree_cv = RandomizedSearchCV(estimator             = tuned_tree,
                                   param_distributions   = param_grid,
                                   cv                    = 5,
                                   n_iter                = 1000,
                                   random_state          = 702)


# FITTING to the FULL DATASET (due to cross-validation)
tuned_tree_cv.fit(x_data, y_data)


# printing the optimal parameters and best score
print("Tuned Parameters  :", tuned_tree_cv.best_params_)
print("Tuned CV R-Square :", tuned_tree_cv.best_score_.round(4))

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

<h4>b) Build a regression tree model based on the hyperparameter tuning results.</h4>

In [None]:
# naming the model
model_name = _____


# INSTANTIATING a regression tree model with tuned values
model = DecisionTreeRegressor(_____)


# FITTING to the TRAINING data
model.fit(x_train, y_train)


# PREDICTING based on the testing set
model.predict(x_test)


# SCORING results
model_train_score = model.score(x_train, y_train).round(4)
model_test_score  = model.score(x_test, y_test).round(4)
model_gap         = abs(model_train_score - model_test_score).round(4)


# displaying results
print('Training Score :', model_train_score)
print('Testing Score  :', model_test_score)
print('Train-Test Gap :', model_gap)

In [None]:
# building a model based on hyperparameter tuning results

# INSTANTIATING a regression tree model with tuned values
model = DecisionTreeRegressor(splitter         = 'best',
                              min_samples_leaf = 19,
                              max_depth        = 10,
                              criterion        = 'absolute_error',
                              random_state     = 702)


# FITTING to the TRAINING data
model_fit = model.fit(x_train, y_train)


# PREDICTING based on the testing set
model_pred = model.predict(x_test)


# SCORING the results
model_train_score = model.score(x_train, y_train).round(4) # using R-square
model_test_score  = model.score(x_test, y_test).round(4)   # using R-square
model_gap         = abs(model_train_score - model_test_score).round(4)


# displaying results
print('Training Score :', model_train_score)
print('Testing Score  :', model_test_score)
print('Train-Test Gap :', model_gap)

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

<h4>c) (Optional) Plot the tree graphically.</h4>

In [None]:
# setting figure size
plt.figure(figsize=(60, 20))


# developing a plotted tree
_____


# rendering the plot
_____

In [None]:
# setting figure size
plt.figure(figsize=(60, 20))


# developing a plotted tree
plot_tree(decision_tree = model, 
          feature_names = x_train.columns,
          filled        = True, 
          rounded       = True, 
          fontsize      = 14)


# rendering the plot
plt.show()

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

<h2>Part III: Analyzing Hyperparameter Results</h2><br>
The following code will help in analyzing the results of hyperparameter tuning.

In [None]:
tuned_tree_cv.cv_results_

<br>

In [None]:
def tuning_results(cv_results, n=5):
    """
    This function will display the top "n" models from hyperparameter tuning,
    based on "rank_test_score".

    PARAMETERS
    ----------
    cv_results = results dictionary from the attribute ".cv_results_"
    n          = number of models to display
    """
    param_lst = []

    for result in cv_results["params"]:
        result = str(result).replace(":", "=")
        param_lst.append(result[1:-1])


    results_df = pd.DataFrame(data = {
        "Model_Rank" : cv_results["rank_test_score"],
        "Mean_Test_Score" : cv_results["mean_test_score"],
        "SD_Test_Score" : cv_results["std_test_score"],
        "Parameters" : param_lst
    })


    results_df = results_df.sort_values(by = "Model_Rank", axis = 0)
    return results_df.head(n = n)

<br>

In [None]:
help(tuning_results)

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

<h4>a) Use the function to see the top ranked models after hyperparameter tuning.</h4>

In [None]:
# run tuning_results() on the hyperparameter tuning results
tuning_results(cv_results = tuned_tree_cv.cv_results_, n = 5)
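
For a different view of the same information, the <code>cv_results_</code> dictionary can also be loaded into a DataFrame directly, which gives each hyperparameter its own column. The sketch below assumes synthetic data and a small illustrative grid. The names <code>demo_cv</code> and <code>keep_cols</code> are hypothetical.

```python
import pandas as pd                                      # data science essentials
from sklearn.datasets import make_regression            # synthetic data
from sklearn.model_selection import RandomizedSearchCV  # hyperparameter tuning
from sklearn.tree import DecisionTreeRegressor          # regression trees


# synthetic data standing in for the housing dataset (assumption)
x_demo, y_demo = make_regression(n_samples    = 150,
                                 n_features   = 4,
                                 noise        = 5.0,
                                 random_state = 702)


# tuning over a small illustrative grid
demo_cv = RandomizedSearchCV(estimator           = DecisionTreeRegressor(random_state = 219),
                             param_distributions = {'max_depth'        : [2, 4, 6],
                                                    'min_samples_leaf' : [1, 10]},
                             cv                  = 3,
                             n_iter              = 4,
                             random_state        = 702)

demo_cv.fit(x_demo, y_demo)


# converting the results dictionary into a DataFrame
results_df = pd.DataFrame(data = demo_cv.cv_results_)


# keeping the ranking, scores, and one column per hyperparameter
keep_cols = ['rank_test_score', 'mean_test_score', 'std_test_score',
             'param_max_depth', 'param_min_samples_leaf']

results_df = results_df[keep_cols].sort_values(by = 'rank_test_score')


# displaying results
print(results_df)
```

Separate hyperparameter columns make it easier to filter or group the results, for example to compare mean scores across values of <code>max_depth</code>.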

<hr style="height:.9px;border:none;color:#333;background-color:#333;" />
<hr style="height:.9px;border:none;color:#333;background-color:#333;" />

~~~
 _____             _                   
/__   \_   _ _ __ (_)_ __   __ _       
  / /\/ | | | '_ \| | '_ \ / _` |      
 / /  | |_| | | | | | | | | (_| |      
 \/    \__,_|_| |_|_|_| |_|\__, |      
                           |___/       
 _____                           _     
/__   \_____      ____ _ _ __ __| |___ 
  / /\/ _ \ \ /\ / / _` | '__/ _` / __|
 / / | (_) \ V  V / (_| | | | (_| \__ \
 \/   \___/ \_/\_/ \__,_|_|  \__,_|___/
                                       
 __                                _   
/ _\_   _  ___ ___ ___  ___ ___   / \  
\ \| | | |/ __/ __/ _ \/ __/ __| /  /  
_\ \ |_| | (_| (_|  __/\__ \__ \/\_/   
\__/\__,_|\___\___\___||___/___/\/     
                                       


~~~


<hr style="height:.9px;border:none;color:#333;background-color:#333;" />
<hr style="height:.9px;border:none;color:#333;background-color:#333;" />

<br> 