### Build Grid Search functions
In data science it is a great idea to try building algorithms, models and processes 'from scratch' so you can really understand what is happening at a deeper level. Of course there are great packages and libraries for this work (and we will get to that very soon!) but building from scratch will give you a great edge in your data science work.  

In this exercise, you will create a function to take in 2 hyperparameters, build models and return results. You will use this function in a future exercise.  

The X_train, X_test, y_train and y_test datasets are available to you.

### Instructions
Build a function that takes two parameters called learn_rate and max_depth for the learning rate and maximum depth.  
Add capability in the function to build a GBM model and fit it to the data with the input hyperparameters.  
Have the function return the results of that model and the chosen hyperparameters (learn_rate and max_depth).

In [None]:
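# Import the model and the metric used below
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Create the function
def gbm_grid_search(learn_rate, max_depth):

    # Create the model
    model = GradientBoostingClassifier(learning_rate=learn_rate, max_depth=max_depth)

    # Fit the model on the training data and predict on the test set
    predictions = model.fit(X_train, y_train).predict(X_test)

    # Return the hyperparameters and the accuracy score
    return [learn_rate, max_depth, accuracy_score(y_test, predictions)]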

### Iteratively tune multiple hyperparameters
In this exercise, you will build on the function you previously created to take in 2 hyperparameters, build a model and return the results. You will now use that to loop through some values and then extend this function and loop with another hyperparameter.   

The function gbm_grid_search(learn_rate, max_depth) is available in this exercise.   

If you need a reminder of the function, you can run print_func(), which has been created for you.

### Instructions 
Write a for-loop that uses the gbm_grid_search function you created to test the values (0.01, 0.1, 0.5) for learn_rate and (2, 4, 6) for max_depth, then print the results.  
Extend the gbm_grid_search function to include the hyperparameter subsample. Name this new function gbm_grid_search_extended.  
Extend your loop to call gbm_grid_search_extended, testing the values [0.4, 0.6] for the subsample hyperparameter, and print the results. max_depth_list & learn_rate_list are available in your environment.

In [None]:
# Create the relevant lists
results_list = []
learn_rate_list = [.01,.1,.5]
max_depth_list = [2,4,6]

# Create the for loop
for learn_rate in learn_rate_list:
    for max_depth in max_depth_list:
        results_list.append(gbm_grid_search(learn_rate,max_depth))

# Print the results
print(results_list)   
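
The cell above covers only the first instruction. As a minimal sketch of the remaining two steps, the block below adds subsample to the function and to the loop. The synthetic data from make_classification, the subsample_list name, and the small n_estimators value are assumptions made so the example runs on its own and finishes quickly; in the exercise you would use the provided X_train, X_test, y_train and y_test instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; the exercise supplies its own splits)
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

def gbm_grid_search_extended(learn_rate, max_depth, subsample):
    # Build the model with all three hyperparameters;
    # n_estimators=20 is only here to keep this sketch quick
    model = GradientBoostingClassifier(
        learning_rate=learn_rate,
        max_depth=max_depth,
        subsample=subsample,
        n_estimators=20,
        random_state=42,
    )

    # Fit on the training data and predict on the test set
    predictions = model.fit(X_train, y_train).predict(X_test)

    # Return the hyperparameters and the accuracy score
    return [learn_rate, max_depth, subsample, accuracy_score(y_test, predictions)]

# Create the relevant lists (subsample_list is a name chosen for this sketch)
results_list = []
learn_rate_list = [0.01, 0.1, 0.5]
max_depth_list = [2, 4, 6]
subsample_list = [0.4, 0.6]

# One extra nested loop per extra hyperparameter
for learn_rate in learn_rate_list:
    for max_depth in max_depth_list:
        for subsample in subsample_list:
            results_list.append(
                gbm_grid_search_extended(learn_rate, max_depth, subsample)
            )

print(results_list)
```

Each added hyperparameter needs another level of nesting and multiplies the number of models to fit, which is one reason a dedicated grid search tool becomes attractive.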

### GridSearchCV with Scikit Learn
The GridSearchCV class from scikit-learn provides many useful features for running a grid search efficiently. You will now put your learning into practice by creating a GridSearchCV object with certain parameters.  
The desired options are:  
A Random Forest Estimator, with the split criterion as 'entropy'  
5-fold cross validation  
The hyperparameters max_depth (2, 4, 8, 15) and max_features ('auto' vs 'sqrt')  
Use roc_auc to score the models  
Use 4 cores for processing in parallel  
Ensure you refit the best model and return training scores  
You will have available X_train, X_test, y_train & y_test datasets.  
### Instructions
Create a Random Forest estimator as specified in the context above.  
Create a parameter grid as specified in the context above.  
Create a GridSearchCV object as outlined in the context above, using the two elements created in the previous two instructions.

In [None]:
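# Import the estimator and the grid search class
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Create a Random Forest Classifier with specified criterion
rf_class = RandomForestClassifier(criterion='entropy')

# Create the parameter grid
# Note: 'auto' was removed in scikit-learn 1.3; on newer versions, swap it for another valid option such as 'log2'
param_grid = {'max_depth': [2, 4, 8, 15], 'max_features': ['auto', 'sqrt']}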

# Create a GridSearchCV object
grid_rf_class = GridSearchCV(
    estimator=rf_class,
    param_grid=param_grid,
    scoring='roc_auc',
    n_jobs=4,
    cv=5,
    refit=True,
    return_train_score=True)
print(grid_rf_class)
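
To get a feel for how much work this object will do once it is fitted, the sketch below enumerates the grid by hand using only the standard library. It reuses the param_grid and cv values from the exercise; the variable names are illustrative, and the order of the combinations is not meant to match the order scikit-learn uses internally.

```python
from itertools import product

# Same grid and fold count as in the exercise
param_grid = {'max_depth': [2, 4, 8, 15], 'max_features': ['auto', 'sqrt']}
cv_folds = 5

# Every combination of the listed values is one 'square' of the grid
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]

# Each combination is fitted once per cross-validation fold
n_cv_fits = len(combinations) * cv_folds

# refit=True adds one final fit of the best combination on all training data
n_total_fits = n_cv_fits + 1

print(len(combinations), n_cv_fits, n_total_fits)
```

With 4 depths and 2 feature settings this gives 8 combinations, 40 cross-validation fits, and 41 fits in total.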

### Exploring the grid search results
You will now explore the cv_results_ property of the GridSearchCV object defined in the video. This is a dictionary that we can read into a pandas DataFrame and contains a lot of useful information about the grid search we just undertook.   
A reminder of the different column types in this property:  
time_ columns  
param_ columns (one for each hyperparameter) and the singular params column (with all hyperparameter settings)  
a train_score column for each cv fold including the mean_train_score and std_train_score columns  
a test_score column for each cv fold including the mean_test_score and std_test_score columns  
a rank_test_score column with a number from 1 to n (number of iterations) ranking the rows based on their mean_test_score  
### Instructions
Read the cv_results_ property of the grid_rf_class GridSearchCV object into a data frame & print the whole thing out to inspect.  
Extract & print the singular column containing a dictionary of all hyperparameters used in each iteration of the grid search.  
Extract & print the row that had the best mean test score by indexing using the rank_test_score column.

In [None]:
# Import pandas, then read the cv_results_ property into a DataFrame & print it out
import pandas as pd
cv_results_df = pd.DataFrame(grid_rf_class.cv_results_)
print(cv_results_df)

# Extract and print the column with a dictionary of hyperparameters used
column = cv_results_df.loc[:, ['params']]
print(column)

# Extract and print the row that had the best mean test score
best_row = cv_results_df[cv_results_df['rank_test_score'] == 1 ]
print(best_row)

### Analyzing the best results
In the end, we primarily care about the best-performing 'square' in a grid search. Luckily, scikit-learn's GridSearchCV objects have a number of properties that provide key information on just the best square (or row in cv_results_).

Three properties you will explore are:

best_score_ – The score (here ROC_AUC) from the best-performing square.   
best_index_ – The index of the row in cv_results_ containing information on the best-performing square.  
best_params_ – A dictionary of the parameters that gave the best score, for example {'max_depth': 10}  
The grid search object grid_rf_class is available.  

A DataFrame (cv_results_df) has been created from cv_results_ for you. This will help you index into the results.

### Instructions
Extract and print out the ROC_AUC score from the best performing square in grid_rf_class.
Create a variable from the best-performing row by indexing into cv_results_df.
Create a variable, best_n_estimators, by extracting the n_estimators parameter from the best-performing square in grid_rf_class, and print it out.
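
One possible way to work through these steps is sketched below. Because the exercise's fitted grid_rf_class is not shown here, the block fits a small stand-in grid search on synthetic data; the data, the reduced param_grid (which includes n_estimators so that it can be read back from best_params_), and cv=3 are all assumptions made to keep the example self-contained.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption; the exercise supplies its own)
X_train, y_train = make_classification(n_samples=300, n_features=10, random_state=0)

# Small stand-in grid; n_estimators is included so it appears in best_params_
param_grid = {'max_depth': [2, 4], 'n_estimators': [10, 20]}
grid_rf_class = GridSearchCV(
    estimator=RandomForestClassifier(criterion='entropy', random_state=0),
    param_grid=param_grid,
    scoring='roc_auc',
    cv=3,
    refit=True,
    return_train_score=True,
)
grid_rf_class.fit(X_train, y_train)
cv_results_df = pd.DataFrame(grid_rf_class.cv_results_)

# Print out the ROC_AUC score from the best-performing square
best_score = grid_rf_class.best_score_
print(best_score)

# Create a variable from the row related to the best-performing square
best_row = cv_results_df.loc[[grid_rf_class.best_index_]]
print(best_row)

# Get the n_estimators parameter from the best-performing square and print it
best_n_estimators = grid_rf_class.best_params_['n_estimators']
print(best_n_estimators)
```

Note that best_params_ only contains hyperparameters that were part of the parameter grid, so looking up a key that was not searched over raises a KeyError.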

### Using the best results
While it is interesting to analyze the results of our grid search, our final goal is practical in nature; we want to make predictions on our test set using our estimator object.

We can access this object through the best_estimator_ property of our grid search object.

In this exercise we will take a look inside the best_estimator_ property and then use this to make predictions on our test set for credit card defaults and generate a variety of scores. Remember to use predict_proba rather than predict since we need probability values rather than class labels for our roc_auc score. We use a slice [:,1] to get probabilities of the positive class.

You have available the X_test and y_test datasets to use and the grid_rf_class object from previous exercises.

### Instructions
Check the type of the best_estimator_ property.  
Use the best_estimator_ property to make predictions on our test set.  
Generate a confusion matrix and ROC_AUC score from our predictions.

In [None]:
# Import the metrics used below
from sklearn.metrics import confusion_matrix, roc_auc_score

# See what type of object the best_estimator_ property is
print(type(grid_rf_class.best_estimator_))

# Create an array of predictions directly using the best_estimator_ property
predictions = grid_rf_class.best_estimator_.predict(X_test)

# Take a look to confirm it worked, this should be an array of 1's and 0's
print(predictions[0:5])

# Now create a confusion matrix 
print("Confusion Matrix \n", confusion_matrix(y_test, predictions))

# Get the ROC-AUC score
predictions_proba = grid_rf_class.best_estimator_.predict_proba(X_test)[:,1]
print("ROC-AUC Score \n", roc_auc_score(y_test, predictions_proba))