## Model Parameter

![model_parameter](model_parameter.png)

### Extracting a Logistic Regression parameter

In [None]:
# Create a list of original variable names from the training DataFrame
original_variables = X_train.columns

# Extract the coefficients of the logistic regression estimator
model_coefficients = log_reg_clf.coef_[0]

# Create a dataframe of the variables and coefficients & print it out
coefficient_df = pd.DataFrame({"Variable" : original_variables, "Coefficient": model_coefficients})
print(coefficient_df.head())

# Print out the top 3 positive variables
top_three_df = coefficient_df.sort_values(by='Coefficient', axis=0, ascending=False)[0:3]
print(top_three_df)

Variable   Coefficient

0     LIMIT_BAL -2.886513e-06

1           AGE -8.231685e-03

2         PAY_0  7.508570e-04

3         PAY_2  3.943751e-04

4         PAY_3  3.794236e-04

  Variable  Coefficient
    
2    PAY_0     0.000751

6    PAY_5     0.000438

5    PAY_4     0.000435

You have successfully extracted and reviewed a very important parameter of the logistic regression model. The coefficients of the model show which variables have a larger or smaller impact on the outcome. Additionally, the sign tells you whether the relationship is positive or negative.
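Coefficient sizes are only directly comparable when the features share a scale, which is not the case for something like LIMIT_BAL next to PAY_0. The following is a minimal sketch, assuming synthetic data from make_classification and hypothetical feature names (f0 to f3), of standardizing the features before ranking the coefficients by absolute value:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the feature names are hypothetical
X, y = make_classification(n_samples=300, n_features=4, n_informative=2,
                           n_redundant=0, random_state=42)
X = pd.DataFrame(X, columns=["f0", "f1", "f2", "f3"])

# Standardize so each coefficient is expressed per standard deviation of its feature
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X, y)

coefficients = pipeline.named_steps["logisticregression"].coef_[0]
scaled_coef_df = pd.DataFrame({"Variable": X.columns, "Coefficient": coefficients})

# Rank by absolute size, keeping the sign for interpretation
scaled_coef_df["AbsCoefficient"] = scaled_coef_df["Coefficient"].abs()
print(scaled_coef_df.sort_values(by="AbsCoefficient", ascending=False))
```

Sorting on the absolute value keeps strongly negative variables near the top, which a plain descending sort would push to the bottom.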

### Extracting a Random Forest parameter

You will now translate the work previously undertaken on the logistic regression model to a random forest model. A parameter of this model is, for a given tree, how it decided to split at each level.

This analysis is not as useful as the coefficients of logistic regression, as you are unlikely to ever explore every split and every tree in a random forest model. However, it is a very useful exercise to peek under the hood at what the model is doing.

In [None]:
# Extract the 7th (index 6) tree from the random forest
chosen_tree = rf_clf.estimators_[6]

# Visualize the graph using the provided image
imgplot = plt.imshow(tree_viz_image)
plt.show()

# Extract the parameters and level of the top (index 0) node
split_column = chosen_tree.tree_.feature[0]
split_column_name = X_train.columns[split_column]
split_value = chosen_tree.tree_.threshold[0]

# Print out the feature and level
print("This node split on feature {}, at a value of {}".format(split_column_name, split_value))

This node split on feature PAY_4, at a value of 1.0

![tree_image](tree_image.svg)
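The same tree_ arrays can be walked node by node rather than read only at index 0. This is a rough sketch on synthetic data (an assumption, since the course data is not included here) that prints every split of one tree in a small forest:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the course data (assumption)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=42)
forest = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=42).fit(X, y)

# Each fitted tree exposes parallel arrays indexed by node id
tree = forest.estimators_[6].tree_
for node_id in range(tree.node_count):
    if tree.children_left[node_id] == tree.children_right[node_id]:
        # Leaves have both children set to -1
        print("node {}: leaf".format(node_id))
    else:
        print("node {}: split on feature {} at {:.3f}".format(
            node_id, tree.feature[node_id], tree.threshold[node_id]))
```

Node 0 is the root, so its feature and threshold correspond to the values extracted in the exercise above.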

## Hyperparameters

![hyperparameter](hyperparameter.png)

![find_hyperparameters_that_matter](find_hyperparameters_that_matter.png)

Understanding what hyperparameters are available and the impact of different hyperparameters is a core skill for any data scientist. As models become more complex, there are many different settings you can set, but only some will have a large impact on your model.

In [None]:
# Print out the old estimator, notice which hyperparameter is badly set
print(rf_clf_old)

# Get confusion matrix & accuracy for the old rf_model
print("Confusion Matrix: \n\n {} \n Accuracy Score: \n\n {}".format(
  	confusion_matrix(y_test, rf_old_predictions),
  	accuracy_score(y_test, rf_old_predictions)))

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',

                max_depth=None, max_features='auto', max_leaf_nodes=None,
                
                min_impurity_decrease=0.0, min_impurity_split=None,
                
                min_samples_leaf=1, min_samples_split=2,
                
                min_weight_fraction_leaf=0.0, n_estimators=5, n_jobs=None,
                
                oob_score=False, random_state=42, verbose=0, warm_start=False)
                
    Confusion Matrix: 
    
     [[276  37]
     
     [ 64  23]] 
     
     Accuracy Score: 
    
     0.7475

The old model uses only 5 trees (n_estimators=5). Improve the above score by increasing n_estimators.

In [None]:
# Create a new random forest classifier with better hyperparameters
rf_clf_new = RandomForestClassifier(n_estimators=500)

# Fit this to the data and obtain predictions
rf_new_predictions = rf_clf_new.fit(X_train, y_train).predict(X_test)

# Assess the new model (using new predictions!)
print("Confusion Matrix: \n\n", confusion_matrix(y_test, rf_new_predictions))
print("Accuracy Score: \n\n", accuracy_score(y_test, rf_new_predictions))

Confusion Matrix: 
    

 [[300  13]
  
 [ 63  24]]

Accuracy Score: 

 0.81
 
We got an accuracy boost of about 6 percentage points (0.7475 to 0.81) just from changing n_estimators.

In [None]:
# Build a knn estimator for each value of n_neighbours
knn_5 = KNeighborsClassifier(n_neighbors=5)
knn_10 = KNeighborsClassifier(n_neighbors=10)
knn_20 = KNeighborsClassifier(n_neighbors=20)

# Fit each to the training data & produce predictions
knn_5_predictions = knn_5.fit(X_train, y_train).predict(X_test)
knn_10_predictions = knn_10.fit(X_train, y_train).predict(X_test)
knn_20_predictions = knn_20.fit(X_train, y_train).predict(X_test)

# Get an accuracy score for each of the models
knn_5_accuracy = accuracy_score(y_test, knn_5_predictions)
knn_10_accuracy = accuracy_score(y_test, knn_10_predictions)
knn_20_accuracy = accuracy_score(y_test, knn_20_predictions)
print("The accuracy of 5, 10, 20 neighbours was {}, {}, {}".format(knn_5_accuracy, knn_10_accuracy, knn_20_accuracy))

The accuracy of 5, 10, 20 neighbours was 0.7125, 0.765, 0.7825

## Hyperparameter Values
![hyperparameter_values](hyperparameter_values.png)

![conflicting_hyperparameter_choices](conflicting_hyperparameter_choices.png)

![silly_hyperparameter_values](silly_hyperparameter_values.png)


### Evaluating different hyperparameter values of a single hyperparameter

Finding the best hyperparameter of interest without writing hundreds of lines of code for hundreds of models is an important efficiency gain that will greatly assist your future machine learning model building.

An important hyperparameter for the GBM algorithm is the learning rate. But which learning rate is best for this problem? By writing a loop to search through a number of possibilities, collating these and viewing them you can find the best one.

Possible learning rates to try include 0.001, 0.01, 0.05, 0.1, 0.2 and 0.5

In [None]:
# Set the learning rates & results storage
learning_rates = [0.001, 0.01, 0.05, 0.1, 0.2, 0.5]
results_list = []

# Create the for loop to evaluate model predictions for each learning rate
for lr in learning_rates:
    model = GradientBoostingClassifier(learning_rate=lr)
    predictions = model.fit(X_train, y_train).predict(X_test)
    # Save the learning rate and accuracy score
    results_list.append([lr, accuracy_score(y_test, predictions)])

# Gather everything into a DataFrame and print it
results_df = pd.DataFrame(results_list, columns=['learning_rate', 'accuracy'])
print(results_df)

learning_rate    accuracy

        0.001    0.7825
    
        0.010    0.8025
        
        0.050    0.8100
        
        0.100    0.7975
        
        0.200    0.7900
        
        0.500    0.7775
        
You efficiently tested a few different values for a single hyperparameter and can easily see which learning rate value was the best. Here, it seems that a learning rate of 0.05 yields the best accuracy.
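Rather than reading the best row by eye, it can be picked programmatically. A small sketch using the learning rates and accuracies from the results shown above:

```python
# Learning rates and accuracies copied from the results shown above
results_list = [[0.001, 0.7825], [0.01, 0.8025], [0.05, 0.81],
                [0.1, 0.7975], [0.2, 0.79], [0.5, 0.7775]]

# Pick the row with the highest accuracy (the second element of each row)
best_learning_rate, best_accuracy = max(results_list, key=lambda row: row[1])
print(best_learning_rate, best_accuracy)  # 0.05 0.81
```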

### Building Learning Curves of different hyperparameter values of a single hyperparameter

If we want to test many different values for a single hyperparameter it can be difficult to easily view that in the form of a DataFrame. Previously you learned about a nice trick to analyze this. A graph called a 'learning curve' can nicely demonstrate the effect of increasing or decreasing a particular hyperparameter on the final result.

In [None]:
# Set the learning rates & accuracies list
learn_rates = np.linspace(0.01 , 2, num=30)
accuracies = []

# Create the for loop
for learn_rate in learn_rates:
  	# Create the model, predictions & save the accuracies as before
    model = GradientBoostingClassifier(learning_rate=learn_rate)
    predictions = model.fit(X_train, y_train).predict(X_test)
    accuracies.append(accuracy_score(y_test, predictions))

# Plot results    
plt.plot(learn_rates, accuracies)
plt.gca().set(xlabel='learning_rate', ylabel='Accuracy', title='Accuracy for different learning_rates')
plt.show()

![learning_curve](learning_curve.svg)

You can see that for low values, you get a pretty good accuracy. However once the learning rate pushes much above 1.5, the accuracy starts to drop. You have learned and practiced a useful skill for visualizing large amounts of results for a single hyperparameter.

## Grid Search

Evaluating two hyperparameters with 5 and 10 values respectively means 5 × 10 = 50 models. Applying cross-validation multiplies this again by the number of folds.
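The arithmetic can be checked with a short helper. This is an illustrative sketch: count_model_fits, the grid values and the choice of 10 folds are all hypothetical.

```python
from itertools import product

def count_model_fits(param_grid, cv_folds=1):
    """Return (number of combinations, number of model fits)."""
    n_combinations = len(list(product(*param_grid.values())))
    return n_combinations, n_combinations * cv_folds

# Hypothetical grid: 5 values for one hyperparameter, 10 for another
param_grid = {"max_depth": [2, 4, 6, 8, 10],
              "n_estimators": list(range(100, 1100, 100))}

n_combinations, n_fits = count_model_fits(param_grid, cv_folds=10)
print(n_combinations, n_fits)  # 50 500
```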

![grid_search_models](grid_search_models.png)

### Build Grid Search functions

In data science it is a great idea to try building algorithms, models and processes 'from scratch' so you can really understand what is happening at a deeper level. Of course there are great packages and libraries for this work (and we will get to that very soon!) but building from scratch will give you a great edge in your data science work.

In [None]:
# Create the function
def gbm_grid_search(learn_rate, max_depth):

	# Create the model
    model = GradientBoostingClassifier(learning_rate=learn_rate, max_depth=max_depth)
    
    # Use the model to make predictions
    predictions = model.fit(X_train, y_train).predict(X_test)
    
    # Return the hyperparameters and score
    return([learn_rate, max_depth, accuracy_score(y_test, predictions)])

You now have a function you can call to test different combinations of two hyperparameters for the GBM algorithm.

In [None]:
# Create the relevant lists
results_list = []
learn_rate_list = [0.01, 0.1, 0.5]
max_depth_list = [2, 4, 6]

# Create the for loop
for learn_rate in learn_rate_list:
    for max_depth in max_depth_list:
        results_list.append(gbm_grid_search(learn_rate,max_depth))

# Print the results
print(results_list) 

[[0.01, 2, 0.78], [0.01, 4, 0.78], [0.01, 6, 0.76], [0.1, 2, 0.74], [0.1, 4, 0.76], [0.1, 6, 0.75], [0.5, 2, 0.73], [0.5, 4, 0.74], [0.5, 6, 0.74]]

Extend the gbm_grid_search function to include the hyperparameter subsample. Name this new function gbm_grid_search_extended.


In [None]:
results_list = []
learn_rate_list = [0.01, 0.1, 0.5]
max_depth_list = [2,4,6]

# Extend the function input
def gbm_grid_search_extended(learn_rate, max_depth, subsample):

	# Extend the model creation section
    model = GradientBoostingClassifier(learning_rate=learn_rate, max_depth=max_depth, subsample=subsample)
    
    predictions = model.fit(X_train, y_train).predict(X_test)
    
    # Extend the return part
    return([learn_rate, max_depth, subsample, accuracy_score(y_test, predictions)])

Extend your loop to call gbm_grid_search_extended (available in your console), then test the values [0.4 , 0.6] for the subsample hyperparameter and print the results. max_depth_list & learn_rate_list are available in your environment.

In [None]:
results_list = []

# Create the new list to test
subsample_list =  [0.4 , 0.6]

for learn_rate in learn_rate_list:
    for max_depth in max_depth_list:
    
    	# Extend the for loop
        for subsample in subsample_list:
        	
            # Extend the results to include the new hyperparameter
            results_list.append(gbm_grid_search_extended(learn_rate, max_depth, subsample))
            
# Print results
print(results_list)    

[[0.01, 2, 0.4, 0.73], [0.01, 2, 0.6, 0.74], [0.01, 4, 0.4, 0.73], [0.01, 4, 0.6, 0.75], [0.01, 6, 0.4, 0.72], [0.01, 6, 0.6, 0.78], [0.1, 2, 0.4, 0.74], [0.1, 2, 0.6, 0.74], [0.1, 4, 0.4, 0.73], [0.1, 4, 0.6, 0.73], [0.1, 6, 0.4, 0.74], [0.1, 6, 0.6, 0.76], [0.5, 2, 0.4, 0.64], [0.5, 2, 0.6, 0.67], [0.5, 4, 0.4, 0.72], [0.5, 4, 0.6, 0.71], [0.5, 6, 0.4, 0.63], [0.5, 6, 0.6, 0.64]]

You have effectively built your own grid search! You went from 2 to 3 hyperparameters and can see how you could extend that to even more values and hyperparameters. That was a lot of effort though. Be warned - we are now entering a world that can get very computationally expensive very fast!
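Adding one nested loop per hyperparameter does not scale well. One possible generalization, sketched here with itertools.product, handles any number of hyperparameters in a single loop. The names manual_grid_search and toy_score are hypothetical, and toy_score is only a placeholder for a function that fits a model and returns its accuracy.

```python
from itertools import product

def manual_grid_search(evaluate, param_grid):
    """Call evaluate(**params) for every combination in param_grid."""
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append({**params, "score": evaluate(**params)})
    return results

# Placeholder scorer (hypothetical): swap in a function that fits a model
# and returns its accuracy, as gbm_grid_search_extended does
def toy_score(learn_rate, max_depth, subsample):
    return round(1 - abs(learn_rate - 0.1) - 0.01 * max_depth + 0.1 * subsample, 4)

grid = {"learn_rate": [0.01, 0.1, 0.5], "max_depth": [2, 4, 6], "subsample": [0.4, 0.6]}
results = manual_grid_search(toy_score, grid)
print(len(results))                                # 18 combinations
print(max(results, key=lambda row: row["score"]))
```

Adding a fourth hyperparameter then only means adding a key to the dictionary, not another level of indentation.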

## Grid Search

![grid_search_steps](grid_search_steps.png)

![grid_search_models](grid_search_models.png)

![grid_search_inputs](grid_search_inputs.png)

![estimator](estimator.png)

![param_grid](param_grid.png)

![cv](cv.png)

![scoring](scoring.png)

![refit](refit.png)

**refit must be set to True for the grid search object to expose best_estimator_, the best model refitted on the full training data, which you can then use for predictions.**

![return_train_score](return_train_score.png)

The GridSearchCV module from Scikit Learn provides many useful features to assist with efficiently undertaking a grid search. You will now put your learning into practice by creating a GridSearchCV object with certain parameters.

The desired options are:

- A Random Forest Estimator, with the split criterion as 'entropy'
- 5-fold cross validation
- The hyperparameters max_depth (2, 4, 8, 15) and max_features ('auto' vs 'sqrt')
- Use roc_auc to score the models
- Use 4 cores for processing in parallel
- Ensure you refit the best model and return training scores

In [None]:
# Create a Random Forest Classifier with specified criterion
rf_class = RandomForestClassifier(criterion='entropy')

# Create the parameter grid
param_grid = {'max_depth': [2, 4, 8, 15], 'max_features': ['auto', 'sqrt']} 

# Create a GridSearchCV object
grid_rf_class = GridSearchCV(
    estimator=rf_class,
    param_grid=param_grid,
    scoring='roc_auc',
    n_jobs=4,
    cv=5,
    refit=True, return_train_score=True)
print(grid_rf_class)

You now understand all the inputs to a GridSearchCV object and can tune many different hyperparameters and many different values for each on a chosen algorithm!
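Creating the object does not train anything; the search only runs when fit() is called. Below is a minimal end-to-end sketch on synthetic data (an assumption), with a smaller grid and forest so it runs quickly. It uses 'sqrt' and 'log2' for max_features because recent scikit-learn releases no longer accept 'auto'.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for X_train / y_train (assumption)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)

grid_demo = GridSearchCV(
    estimator=RandomForestClassifier(criterion='entropy', n_estimators=10, random_state=42),
    param_grid={'max_depth': [2, 4], 'max_features': ['sqrt', 'log2']},
    scoring='roc_auc',
    cv=3,
    refit=True,
    return_train_score=True)

# 4 combinations x 3 folds are fitted here, plus one final refit of the best model
grid_demo.fit(X_demo, y_demo)
print(grid_demo.best_params_)
print(grid_demo.best_score_)
```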

## Grid Search Output

![grid_search_output](grid_search_output.png)

You will now explore the cv_results_ property of the GridSearchCV object defined in the video. This is a dictionary that we can read into a pandas DataFrame and contains a lot of useful information about the grid search we just undertook.

A reminder of the different column types in this property:

- time_ columns
- param_ columns (one for each hyperparameter) and the singular params column (with all hyperparameter settings)
- a train_score column for each cv fold including the mean_train_score and std_train_score columns
- a test_score column for each cv fold including the mean_test_score and std_test_score columns
- a rank_test_score column with a number from 1 to n (number of iterations) ranking the rows based on their mean_test_score

In [None]:
# Read the cv_results property into a dataframe & print it out
cv_results_df = pd.DataFrame(grid_rf_class.cv_results_)
print(cv_results_df)

# Extract and print the column with a dictionary of hyperparameters used
column = cv_results_df.loc[:, ['params']]
print(column)

# Extract and print the row that had the best mean test score
best_row = cv_results_df[cv_results_df['rank_test_score'] == 1 ]
print(best_row)

You have built invaluable skills in looking 'under the hood' at what your grid search is doing by extracting and analysing the cv_results_ property.

### Analyzing the best results
At the end of the day, we primarily care about the best performing 'square' in a grid search. Luckily Scikit Learn's GridSearchCV objects have a number of attributes that provide key information on just the best square (or row in cv_results_).

Three properties you will explore are:

- best_score_ – The score (here ROC_AUC) from the best-performing square.
- best_index_ – The index of the row in cv_results_ containing information on the best-performing square.
- best_params_ – A dictionary of the parameters that gave the best score, for example 'max_depth': 10

In [None]:
# Print out the ROC_AUC score from the best-performing square
best_score = grid_rf_class.best_score_
print(best_score)

# Create a variable from the row related to the best-performing square
cv_results_df = pd.DataFrame(grid_rf_class.cv_results_)
best_row = cv_results_df.loc[[grid_rf_class.best_index_]]
print(best_row)

# Get the n_estimators parameter from the best-performing square and print
best_n_estimators = grid_rf_class.best_params_["n_estimators"]
print(best_n_estimators)

Being able to quickly find and prioritize the huge volume of information given back from machine learning modeling output is a great skill. Here you had great practice doing that with cv_results_ by quickly isolating the key information on the best performing square. This will be very important when your grids grow from 12 squares to many more!
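The three properties are different views of the same row in cv_results_. A short sketch, assuming a small decision tree search on synthetic data so that it runs on its own, confirms that they agree:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Small synthetic search (assumption) so the example is self-contained
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid={'max_depth': [2, 4, 6], 'min_samples_leaf': [1, 5]},
                      scoring='roc_auc', cv=3)
search.fit(X_demo, y_demo)

results = pd.DataFrame(search.cv_results_)
best_row = results.loc[search.best_index_]

# The row at best_index_ is ranked first and matches best_score_ and best_params_
print(best_row['rank_test_score'])
print(best_row['mean_test_score'] == search.best_score_)
print(best_row['params'] == search.best_params_)
```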

In [None]:
# See what type of object the best_estimator_ property is
print(type(grid_rf_class.best_estimator_))

# Create an array of predictions directly using the best_estimator_ property
predictions = grid_rf_class.best_estimator_.predict(X_test)

# Take a look to confirm it worked, this should be an array of 1's and 0's
print(predictions[0:5])

# Now create a confusion matrix 
print("Confusion Matrix \n", confusion_matrix(y_test, predictions))

# Get the ROC-AUC score
predictions_proba = grid_rf_class.best_estimator_.predict_proba(X_test)[:,1]
print("ROC-AUC Score \n", roc_auc_score(y_test, predictions_proba))

[0 0 0 0 1]

Confusion Matrix 

 [[140   8]
 
 [ 36  16]]
 
ROC-AUC Score 

 0.7436330561330562

The .best_estimator_ property is a really powerful property to understand for streamlining your machine learning model building process. You now can run a grid search and seamlessly use the best model from that search to make predictions.
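To see what the refit option changes, the sketch below (synthetic data and a decision tree, both assumptions) runs the same search with refit=True and refit=False:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
param_grid = {'max_depth': [2, 4, 6]}

with_refit = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=3, refit=True)
without_refit = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=3, refit=False)
with_refit.fit(X_demo, y_demo)
without_refit.fit(X_demo, y_demo)

# With refit=True the search object predicts through best_estimator_
same = (with_refit.predict(X_demo) == with_refit.best_estimator_.predict(X_demo)).all()
print(same)

# With refit=False the best settings are still reported, but no final model exists
print(without_refit.best_params_)
print(hasattr(without_refit, 'best_estimator_'))
```

In the refit=False case you would have to build and fit a model from best_params_ yourself before predicting.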

## Random Search

![random_search](random_search.png)

![why_random_search_works](why_random_search_works.png)

![probability_trick](probability_trick.png)

![probability_trick_missing](probability_trick_missing.png)

![probability_trick_getting](probability_trick_getting.png)

![probability_trick_advantage](probability_trick_advantage.png)

![random_search_important_notes](random_search_important_notes.png)


### Randomly Sample Hyperparameters
To undertake a random search, we firstly need to undertake a random sampling of our hyperparameter space.

In this exercise, you will first create some lists of hyperparameters that can be combined into a list of lists. Then you will randomly sample hyperparameter combinations in preparation for running a random search.

In [None]:
from itertools import product

# Create a list of values for the learning_rate hyperparameter
learn_rate_list = list(np.linspace(0.01,1.5,200))

# Create a list of values for the min_samples_leaf hyperparameter
min_samples_list = list(range(10,41))

# Combination list
combinations_list = [list(x) for x in product(learn_rate_list, min_samples_list)]

# Sample hyperparameter combinations for a random search.
random_combinations_index = np.random.choice(range(0, len(combinations_list)), 250, replace=False)
combinations_random_chosen = [combinations_list[x] for x in random_combinations_index]

# Print the result
print(combinations_random_chosen)

You generated some hyperparameter combinations and randomly sampled in that space.

### Randomly Search with Random Forest

- Create lists of the values 'gini' and 'entropy' for criterion & "auto", "sqrt", "log2", None for max_features.
- Create a list of values between 3 and 55 inclusive for the hyperparameter max_depth and assign to the list max_depth_list. Remember that range(N,M) will create a list from N to M-1.
- Combine these lists into a list of lists to sample from using product().
- Randomly sample 150 models from the combined list and print the result.

In [None]:
import random
from itertools import product

# Create lists for criterion and max_features
criterion_list = ['gini', 'entropy']
max_feature_list = ["auto", "sqrt", "log2", None]

# Create a list of values for the max_depth hyperparameter
max_depth_list = list(range(3,56))

# Combination list
combinations_list = [list(x) for x in product(criterion_list, max_feature_list, max_depth_list)]

# Sample hyperparameter combinations for a random search
combinations_random_chosen = random.sample(combinations_list, 150)

# Print the result
print(combinations_random_chosen)

This one was a bit harder but you managed to sample using text options and learned a new function to sample your lists.
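A random sample changes on every run unless the generator is seeded. As a small sketch, using the same three lists and an arbitrary seed of 42, a dedicated random.Random instance makes the draw repeatable:

```python
import random
from itertools import product

criterion_list = ['gini', 'entropy']
max_feature_list = ["auto", "sqrt", "log2", None]
max_depth_list = list(range(3, 56))

combinations_list = [list(x) for x in product(criterion_list, max_feature_list, max_depth_list)]

# A dedicated, seeded generator (the seed value is arbitrary) keeps the draw repeatable
first_draw = random.Random(42).sample(combinations_list, 150)
second_draw = random.Random(42).sample(combinations_list, 150)

print(len(combinations_list))     # 2 * 4 * 53 = 424
print(first_draw == second_draw)  # True
```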

### Visualizing a Random Search
Visualizing the search space of random search allows you to easily see the coverage of this technique and therefore allows you to see the effect of your sampling on the search space.

In [None]:
def sample_and_visualize_hyperparameters(n_samples):

  # If asking for all combinations, just return the entire list.
  if n_samples == len(combinations_list):
    combinations_random_chosen = combinations_list
  else:
    combinations_random_chosen = []
    random_combinations_index = np.random.choice(range(0, len(combinations_list)), n_samples, replace=False)
    combinations_random_chosen = [combinations_list[x] for x in random_combinations_index]
    
  # Pull out the X (learn_rate) and Y (min_samples_leaf) values to plot
  rand_x, rand_y = [x[0] for x in combinations_random_chosen], [x[1] for x in combinations_random_chosen]

  # Plot 
  plt.clf() 
  plt.scatter(rand_x, rand_y, c=['blue']*len(combinations_random_chosen))
  plt.gca().set(xlabel='learn_rate', ylabel='min_samples_leaf', title='Random Search Hyperparameters')
  plt.gca().set_xlim(x_lims)
  plt.gca().set_ylim(y_lims)
  plt.show()

In [None]:
# Confirm how many hyperparameter combinations & print
number_combs = len(combinations_list)
print(number_combs)

# Sample and visualise specified combinations
for x in [50, 500 , 1500 ]:
    sample_and_visualize_hyperparameters(x)
    
# Sample all the hyperparameter combinations & visualise
sample_and_visualize_hyperparameters(number_combs)

![50](50.svg)

![500](500.svg)

![1500](1500.svg)

![2000](2000.svg)

Those were some great visualizations you produced! Notice how the larger the sample in a random search, the more it looks like a grid search?

![random_grid_diff](random_grid_diff.png)

![key_diff](key_diff.png)

## The RandomizedSearchCV Object
Just like the GridSearchCV class from Scikit Learn, RandomizedSearchCV provides many useful features to assist with efficiently undertaking a random search. You're going to create a RandomizedSearchCV object, making the small adjustment needed from the GridSearchCV object.

The desired options are:

- A default Gradient Boosting Classifier Estimator
- 5-fold cross validation
- Use accuracy to score the models
- Use 4 cores for processing in parallel
- Ensure you refit the best model and return training scores
- Randomly sample 10 models
- The hyperparameter grid should be for learning_rate (150 values between 0.1 and 2) and min_samples_leaf (all values between and including 20 and 64).

In [None]:
# Create the parameter grid
param_grid = {'learning_rate': np.linspace(0.1,2,150), 'min_samples_leaf': list(range(20,65))} 

# Create a random search object
random_GBM_class = RandomizedSearchCV(
    estimator = GradientBoostingClassifier(),
    param_distributions = param_grid,
    n_iter = 10,
    scoring='accuracy', n_jobs=4, cv = 5, refit=True, return_train_score = True)

# Fit to the training data
random_GBM_class.fit(X_train, y_train)

# Print the values used for both hyperparameters
print(random_GBM_class.cv_results_['param_learning_rate'])
print(random_GBM_class.cv_results_['param_min_samples_leaf'])

[1.1073825503355705 1.0691275167785235 0.4697986577181208
 1.2476510067114095 1.5664429530201343 1.7577181208053692
 1.859731543624161 1.5791946308724834 0.5463087248322147
 1.7577181208053692]

[47 54 61 30 63 32 60 43 38 27]

In [None]:
# Create the parameter grid
param_grid = {'max_depth': list(range(5,26)), 'max_features': ['auto' , 'sqrt']} 

# Create a random search object
random_rf_class = RandomizedSearchCV(
    estimator = RandomForestClassifier(n_estimators=80),
    param_distributions = param_grid, n_iter = 5,
    scoring='roc_auc', n_jobs=4, cv = 3, refit=True, return_train_score = True )

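# Fit to the training data
random_rf_class.fit(X_train, y_train)

# Print the values used for both hyperparameters
print(random_rf_class.cv_results_['param_max_depth'])
print(random_rf_class.cv_results_['param_min_samples_leaf'] if False else random_rf_class.cv_results_['param_max_features'])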

[18 11 10 22 10]

['sqrt' 'auto' 'sqrt' 'sqrt' 'auto']
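param_distributions also accepts distribution objects, so values do not have to be listed up front. The sketch below assumes synthetic data and a reduced number of trees and iterations to keep it quick, and samples from scipy.stats distributions covering the same ranges as the first example:

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for X_train / y_train (assumption)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)

# uniform(loc, scale) covers [loc, loc + scale]; randint(low, high) excludes high
param_distributions = {'learning_rate': uniform(loc=0.1, scale=1.9),
                       'min_samples_leaf': randint(20, 65)}

random_demo = RandomizedSearchCV(
    estimator=GradientBoostingClassifier(n_estimators=20, random_state=42),
    param_distributions=param_distributions,
    n_iter=5, scoring='accuracy', cv=3, random_state=42)
random_demo.fit(X_demo, y_demo)

print(random_demo.cv_results_['param_learning_rate'])
print(random_demo.cv_results_['param_min_samples_leaf'])
```

With a continuous distribution the learning rate is no longer restricted to 150 pre-computed points.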

## Comparing Grid and Random Search

![random_grid_similarities](random_grid_similarities.png)

![random_grid_differences](random_grid_differences.png)

![random_vs_grid](random_vs_grid.png)

As you saw, random search tests a larger space of values so is more likely to get close to the best score, given the same computational resources as Grid Search.
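A back-of-the-envelope calculation helps explain why a modest number of random samples is often enough. Assuming each sample is an independent, uniform draw from the search space (a simplification), the chance that at least one of n samples lands in the best-performing 5% of the space is 1 - 0.95^n. The function name below is hypothetical.

```python
def chance_of_hitting_top(fraction_top, n_samples):
    """Probability that at least one of n independent uniform draws lands in the top fraction."""
    return 1 - (1 - fraction_top) ** n_samples

for n_samples in [10, 30, 60, 100]:
    print(n_samples, round(chance_of_hitting_top(0.05, n_samples), 3))
```

Under this assumption, around 60 samples already give a better than 95% chance of hitting the top 5% region, regardless of how large the full space is.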

## Grid and Random Search Side by Side
Visualizing the search space of random and grid search together allows you to easily see the coverage that each technique has and therefore brings to life their specific advantages and disadvantages.

In this exercise, you will sample hyperparameter combinations in a grid search way as well as a random search way, then plot these to see the difference.

You will have available:

- combinations_list which is a list of combinations of learn_rate and min_samples_leaf for this algorithm
- The function visualize_search() which will make your hyperparameter combinations into X and Y coordinates and plot both grid and random search combinations on the same graph. It takes as input two lists of hyperparameter combinations.

In [None]:
# Sample grid coordinates
grid_combinations_chosen = combinations_list[0:300]

# Create a list of sample indexes
sample_indexes = list(range(0,len(combinations_list)))

# Randomly sample 300 indexes
random_indexes = np.random.choice(sample_indexes, 300, replace=False)

# Use indexes to create random sample
random_combinations_chosen = [combinations_list[index] for index in random_indexes]

# Call the function to produce the visualization
visualize_search(grid_combinations_chosen, random_combinations_chosen)

![grid_random_image](grid_random_image.svg)

You can really see how a grid search will cover a small area completely whilst random search will cover a much larger area but not completely.
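The difference in coverage can also be counted instead of plotted. This sketch assumes a search space like the earlier one (200 learning rates by 31 min_samples_leaf values) and an arbitrary seed, and counts how many distinct learning rates each sample of 300 touches:

```python
import random
from itertools import product

# Assumed search space: 200 learning rates x 31 min_samples_leaf values
learn_rate_list = [0.01 + i * (1.5 - 0.01) / 199 for i in range(200)]
min_samples_list = list(range(10, 41))
combinations_list = [list(x) for x in product(learn_rate_list, min_samples_list)]

# "Grid" sample: the first 300 combinations in order
grid_chosen = combinations_list[0:300]
# Random sample: 300 combinations drawn without replacement (arbitrary seed)
random_chosen = random.Random(0).sample(combinations_list, 300)

grid_rates = len({combo[0] for combo in grid_chosen})
random_rates = len({combo[0] for combo in random_chosen})
print(grid_rates, random_rates)
```

The ordered slice only reaches the first 10 learning rates, while the random sample spreads across far more of them.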

![uninformed_informed_search](uninformed_informed_search.png)

## Informed Search: Coarse to Fine

### Visualizing Coarse to Fine

You're going to undertake the first part of a Coarse to Fine search. This involves analyzing the results of an initial random search that took place over a large search space, then deciding what would be the next logical step to make your hyperparameter search finer.

You have available:

- combinations_list - a list of the possible hyperparameter combinations the random search was undertaken on.
- results_df - a DataFrame that has each hyperparameter combination and the resulting accuracy of all 500 trials. Each hyperparameter is a column, with the header the hyperparameter name.
- visualize_hyperparameter() - a function that takes in a column of the DataFrame (as a string) and produces a scatter plot of this column's values compared to the accuracy scores. An example call of the function would be visualize_hyperparameter('accuracy')

In [None]:
# Confirm the size of the combinations_list
print(len(combinations_list))

# Sort the results_df by accuracy and print the top 10 rows
print(results_df.sort_values(by='accuracy', ascending=False).head(10))

# Confirm which hyperparameters were used in this search
print(results_df.columns)

# Call visualize_hyperparameter() with each hyperparameter in turn
visualize_hyperparameter('max_depth')
visualize_hyperparameter('min_samples_leaf')
visualize_hyperparameter('learn_rate')

![max_depth](max_depth.svg)

![min_samples_leaf](min_samples_leaf.svg)

![learn_rate](learn_rate.svg)

We have undertaken the first step of a Coarse to Fine search. Results clearly seem better when max_depth is below 20. Values of learn_rate smaller than 1 seem to perform well too. There is not a strong trend for min_samples_leaf though.

### Coarse to Fine Iterations

You will now visualize the first random search undertaken, construct a tighter grid and check the results. You will have available:

- results_df - a DataFrame that has the hyperparameter combination and the resulting accuracy of all 500 trials. Only the hyperparameters that had the strongest visualizations from the previous exercise are included (max_depth and learn_rate)
- visualize_first() - This function takes no arguments but will visualize each of your hyperparameters against accuracy for your first random search.

In [None]:
# Use the provided function to visualize the first results
# visualize_first()

# Create some combinations lists & combine:
max_depth_list = list(range(1,21))
learn_rate_list = np.linspace(0.001,1,50)

# Call the function to visualize the second results
visualize_second()

![finer_max_depth](finer_max_depth.svg)

![finer_learn_rate](finer_learn_rate.svg)

You can see in the second example our results are all generally higher. There also appears to be a bump around max_depths between 5 and 10 as well as learn_rate less than 0.2 so perhaps there is even more room for improvement!
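The narrowing step can also be automated rather than read off the plots. The following is a rough sketch of one way to do that, not the course's method: toy_accuracy is a hypothetical stand-in for fitting a model, and the new bounds are simply the range spanned by the top 10% of coarse trials.

```python
import random

def toy_accuracy(max_depth, learn_rate):
    """Hypothetical stand-in for fitting a model; peaks near max_depth=8, learn_rate=0.1."""
    return 0.8 - 0.002 * abs(max_depth - 8) - 0.05 * abs(learn_rate - 0.1)

def random_search(depth_bounds, rate_bounds, n_trials, rng):
    """Return (score, max_depth, learn_rate) tuples sorted from best to worst."""
    trials = []
    for _ in range(n_trials):
        depth = rng.randint(*depth_bounds)    # inclusive integer bounds
        rate = rng.uniform(*rate_bounds)
        trials.append((toy_accuracy(depth, rate), depth, rate))
    return sorted(trials, reverse=True)

rng = random.Random(42)  # arbitrary seed for repeatability

# Coarse pass over a wide space
coarse = random_search((1, 60), (0.001, 2.0), n_trials=200, rng=rng)

# Narrow each range to the span covered by the top 10% of coarse trials
top = coarse[:20]
fine_depth_bounds = (min(t[1] for t in top), max(t[1] for t in top))
fine_rate_bounds = (min(t[2] for t in top), max(t[2] for t in top))

# Fine pass inside the narrowed space
fine = random_search(fine_depth_bounds, fine_rate_bounds, n_trials=200, rng=rng)
print(fine_depth_bounds, fine_rate_bounds)
print(coarse[0][0], fine[0][0])
```

The same trial budget is spent twice, but the second pass concentrates it where the first pass found good results.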

## Informed Search Bayesian Statistics

![bayes_intro](bayes_intro.png)

![bayes_rule](bayes_rule.png)

![bayes_rule_ctd](bayes_rule_ctd.png)

![bayes_in_medicine](bayes_in_medicine.png)

![bayes_in_medicine_ctd](bayes_in_medicine_ctd.png)

### Bayes Rule in Python
In this exercise you will undertake a practical example of setting up Bayes formula, obtaining new evidence and updating your 'beliefs' in order to get a more accurate result. The example will relate to the likelihood that someone will close their account for your online software product.

These are the probabilities we know:

- 7% (0.07) of people are likely to close their account next month
- 15% (0.15) of people with accounts are unhappy with your product (you don't know who though!)
- 35% (0.35) of people who are likely to close their account are unhappy with your product

In [None]:
# Assign probabilities to variables 
p_unhappy = 0.15
p_unhappy_close = 0.35

# Probability someone will close
p_close = 0.07

# Probability unhappy person will close
p_close_unhappy = (p_unhappy_close * p_close) / p_unhappy
print(p_close_unhappy)

0.16333333333333336

You were able to frame this problem in a Bayesian way and update your beliefs using new evidence. There's a 16.3% chance that a customer, given that they are unhappy, will close their account.
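The result can be cross-checked by counting people instead of multiplying probabilities. In this sketch the bayes_rule helper and the base of 10,000 customers are hypothetical; the probabilities are the ones given above.

```python
def bayes_rule(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Same figures as above: A = "will close", B = "is unhappy"
p_close_given_unhappy = bayes_rule(p_b_given_a=0.35, p_a=0.07, p_b=0.15)

# Cross-check with head counts for a hypothetical base of 10,000 customers
customers = 10_000
closers = customers * 0.07            # about 700 likely to close
unhappy = customers * 0.15            # about 1,500 unhappy
unhappy_closers = closers * 0.35      # about 245 both unhappy and likely to close
print(p_close_given_unhappy, unhappy_closers / unhappy)
```

Both routes give roughly 0.163: of the 1,500 unhappy customers, about 245 are likely to close.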

## Bayesian Hyperparameter Tuning

![bayes_in_hyperparam_tuning](bayes_in_hyperparam_tuning.png)

## Bayesian Hyperparameter tuning with Hyperopt
![bayesian_hyperparam_tuning_hyperopt](bayesian_hyperparam_tuning_hyperopt.png)

In this example you will set up and run a Bayesian hyperparameter optimization process using the package Hyperopt (its hp module, along with fmin and tpe, is already imported for you). You will set up the domain (which is similar to setting up the grid for a grid search), then set up the objective function. Finally, you will run the optimizer over 20 iterations.

You will need to set up the domain using values:

- max_depth using quniform distribution (between 2 and 10, increasing by 2)
- learning_rate using uniform distribution (0.001 to 0.9)

Note that for the purpose of this exercise, this process was reduced in data sample size and hyperopt & GBM iterations. If you are trying out this method by yourself on your own machine, try a larger search space, more trials, more cross-validation folds and a larger dataset size to really see this in action!

In [None]:
# Set up space dictionary with specified hyperparameters
space = {'max_depth': hp.quniform('max_depth', 2, 10, 2),'learning_rate': hp.uniform('learning_rate', 0.001,0.9)}

# Set up objective function
def objective(params):
    params = {'max_depth': int(params['max_depth']),'learning_rate': params['learning_rate']}
    gbm_clf = GradientBoostingClassifier(n_estimators=100, **params) 
    best_score = cross_val_score(gbm_clf, X_train, y_train, scoring='accuracy', cv=2, n_jobs=4).mean()
    loss = 1 - best_score
    return loss

# Run the algorithm and print the best hyperparameters found
best = fmin(fn=objective,space=space, max_evals=20, rstate=np.random.RandomState(42), algo=tpe.suggest)
print(best)

100%|##########| 20/20 [00:15<00:00,  1.29it/s, best loss: 0.24246856171404285]

    {'learning_rate': 0.11310589268581149, 'max_depth': 6.0}

You successfully built your first Bayesian hyperparameter tuning algorithm. This will be a very powerful tool for your machine learning modeling in the future. Bayesian hyperparameter tuning is a new and popular method, so this first taste is valuable experience to gain.
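As a follow-up, the result of the run above can be turned back into usable hyperparameters and an accuracy figure. This is a small sketch: the helper to_gbm_params is a made-up name, and the two values are copied from the output shown above.

```python
# Values copied from the run above
best = {'learning_rate': 0.11310589268581149, 'max_depth': 6.0}
best_loss = 0.24246856171404285

def to_gbm_params(best):
    """Cast the Hyperopt result the same way the objective function does."""
    return {'max_depth': int(best['max_depth']),
            'learning_rate': best['learning_rate']}

final_params = to_gbm_params(best)

# The objective returned loss = 1 - accuracy, so undo that to get the accuracy
best_accuracy = 1 - best_loss

print(final_params)
print(round(best_accuracy, 4))
```

The resulting dictionary could then be passed to GradientBoostingClassifier(n_estimators=100, **final_params) and fitted on the training data, in the same way the objective function builds its classifier.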

## Informed Search: Genetic Algorithms

![genetics_evolution_nature](genetics_evolution_nature.png)

![genetics_in_ML](genetics_in_ML.png)

![genetics_in_ML_advantages](genetics_in_ML_advantages.png)


## TPOT

![tpot_intro](tpot_intro.png)

![tpot_components](tpot_components.png)

![tpot_example](tpot_example.png)

In [None]:
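# Import the TPOT classifier (needed if running this cell on its own)
from tpot import TPOTClassifier

# Assign the values outlined to the inputs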
number_generations = 3
population_size = 4
offspring_size = 3
scoring_function = 'accuracy'

# Create the tpot classifier
tpot_clf = TPOTClassifier(generations=number_generations, population_size=population_size,
                          offspring_size=offspring_size, scoring=scoring_function,
                          verbosity=2, random_state=2, cv=2)

# Fit the classifier to the training data
tpot_clf.fit(X_train, y_train)

# Score on the test set
print(tpot_clf.score(X_test, y_test))

Warning: xgboost.XGBClassifier is not available and will not be used by TPOT.

Generation 1 - Current best internal CV score: 0.7575064376609415

Generation 2 - Current best internal CV score: 0.7750693767344183

Generation 3 - Current best internal CV score: 0.7750693767344183
    
Best pipeline: BernoulliNB(input_matrix, alpha=0.1, fit_prior=True)

0.76

You can see in the output the best internal CV score for each generation, followed by the pipeline that was finally chosen (in this case a version of Naive Bayes) with its hyperparameters, and then its accuracy on the test set. This is a great first example of using TPOT for automated hyperparameter tuning. You can now build on this on your own and create great machine learning models!
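To see what generations, population_size and offspring_size control, here is a toy evolutionary search over a single number. This is only a sketch and not TPOT's implementation: TPOT evolves whole pipelines, while toy_evolution, toy_score and the mutation step size are all invented for illustration.

```python
import random

def toy_evolution(score, generations, population_size, offspring_size, seed):
    """Toy evolutionary search over a single number in [0, 1]."""
    rng = random.Random(seed)
    # Start from random candidates
    population = [rng.random() for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        # Create offspring by mutating randomly chosen parents
        offspring = []
        for _ in range(offspring_size):
            parent = rng.choice(population)
            child = min(1.0, max(0.0, parent + rng.gauss(0, 0.1)))
            offspring.append(child)
        # Keep the best of parents plus offspring, so the best score never gets worse
        pool = population + offspring
        population = sorted(pool, key=score, reverse=True)[:population_size]
        best_per_generation.append(score(population[0]))
    return population[0], best_per_generation

# Hypothetical objective with its peak at x = 0.7
def toy_score(x):
    return 1 - (x - 0.7) ** 2

winner, history = toy_evolution(toy_score, generations=3, population_size=4,
                                offspring_size=3, seed=2)
print(history)
```

Because the survivors are picked from parents and offspring together, the best score per generation can only stay flat or improve, which matches the pattern in the log above.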

### Analysing TPOT's stability

You will now see the random nature of TPOT by constructing the classifier with different random states and seeing which model the algorithm finds to be best. This helps show that TPOT is quite unstable when not run for a reasonable amount of time.

**Create the TPOT classifier, fit to the data and score using a random_state of 42.**

In [None]:
# Create the tpot classifier 
tpot_clf = TPOTClassifier(generations=2, population_size=4, offspring_size=3, scoring='accuracy', cv=2,
                          verbosity=2, random_state=42)

# Fit the classifier to the training data
tpot_clf.fit(X_train, y_train)

# Score on the test set
print(tpot_clf.score(X_test, y_test))

Warning: xgboost.XGBClassifier is not available and will not be used by TPOT.
    
Generation 1 - Current best internal CV score: 0.7549688742218555

Generation 2 - Current best internal CV score: 0.7549688742218555

Best pipeline: DecisionTreeClassifier(input_matrix, criterion=gini, max_depth=7, min_samples_leaf=11, min_samples_split=12)

0.75

**Now try using a random_state of 122. The numbers don't mean anything special, but should produce different results.**

In [None]:
# Create the tpot classifier 
tpot_clf = TPOTClassifier(generations=2, population_size=4, offspring_size=3, scoring='accuracy', cv=2,
                          verbosity=2, random_state=122)

# Fit the classifier to the training data
tpot_clf.fit(X_train, y_train)

# Score on the test set
print(tpot_clf.score(X_test, y_test))

Warning: xgboost.XGBClassifier is not available and will not be used by TPOT.
    
Generation 1 - Current best internal CV score: 0.7675066876671917
    
Generation 2 - Current best internal CV score: 0.7675066876671917
    
Best pipeline: KNeighborsClassifier(MaxAbsScaler(input_matrix), n_neighbors=57, p=1, weights=distance)
    
0.75

**Finally try using the random_state of 99. See how there is a different result again?**

In [None]:
# Create the tpot classifier 
tpot_clf = TPOTClassifier(generations=2, population_size=4, offspring_size=3, scoring='accuracy', cv=2,
                          verbosity=2, random_state=99)

# Fit the classifier to the training data
tpot_clf.fit(X_train, y_train)

# Score on the test set
print(tpot_clf.score(X_test, y_test))

Warning: xgboost.XGBClassifier is not available and will not be used by TPOT.
    
Generation 1 - Current best internal CV score: 0.8075326883172079
    
Generation 2 - Current best internal CV score: 0.8075326883172079

Best pipeline: RandomForestClassifier(SelectFwe(input_matrix, alpha=0.033), bootstrap=False, criterion=gini, max_features=1.0, min_samples_leaf=19, min_samples_split=10, n_estimators=100)
    
0.78

You can see that TPOT is quite unstable when run with only a few generations, a small population size and few offspring. The first model chosen was a Decision Tree, then a K-Nearest Neighbors model and finally a Random Forest. Increasing the generations, population size and offspring and running TPOT for a long time will help produce better models and more stable results. Don't hesitate to try it yourself on your own machine!
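The link between search budget and stability can be illustrated with a toy random search. This is an analogy only, not TPOT: random_search, the 50 numbered candidates and the rule that candidate i scores i are all hypothetical. The seeds are the three random states used above.

```python
import random

def random_search(budget, seed, n_candidates=50):
    """Evaluate `budget` randomly drawn candidates and return the best one seen."""
    rng = random.Random(seed)
    draws = [rng.randrange(n_candidates) for _ in range(budget)]
    # Toy score: candidate i scores i, so the true best is n_candidates - 1
    return max(draws)

seeds = [42, 122, 99]

# A tiny budget sees only a handful of candidates, so the winner depends on the seed
small = [random_search(budget=4, seed=s) for s in seeds]

# A large budget covers most of the search space regardless of the seed
large = [random_search(budget=400, seed=s) for s in seeds]

print("small budget winners:", small)
print("large budget winners:", large)
```

With the same seed, the larger budget starts with the same draws as the smaller one and then keeps going, so its winner is never worse. With enough draws the winners for different seeds tend to converge on the same candidate.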