# Advanced Classification Part 4 - Exercises

## Exercise 1

#### Task 1 
##### Load libraries that are used in this module.

#### Result:

#### Task 2
##### Define the directory settings.
#### Result:

#### Task 3
##### Load the clean pickled dataset `bank_clean.sav` and save as `bank`.
##### Load the pickled `metrics_forest_ex` dataframe and save as `metrics_gbm_ex`.
##### Print the head of the data.
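
The loading pattern can be sketched as below. This is a minimal illustration, assuming the `.sav` file is a standard Python pickle; the temporary directory, the file name `bank_demo.sav`, and the column names are hypothetical placeholders rather than the course's actual data.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the course's data directory and `bank_clean.sav`.
data_dir = Path(tempfile.mkdtemp())
demo_path = data_dir / "bank_demo.sav"

# Write a tiny placeholder frame so the example runs end to end.
demo = pd.DataFrame({"age": [30, 45, 52], "balance": [1200, -50, 310], "y": [0, 1, 0]})
with open(demo_path, "wb") as f:
    pickle.dump(demo, f)

# Loading follows the same pattern for any pickled object.
with open(demo_path, "rb") as f:
    bank_demo = pickle.load(f)

print(bank_demo.head())
```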

#### Result:

#### Task 4
##### Select the predictors by dropping variable `y` and save the result to a dataframe `X_ex`.
##### Save the target column `y` as `y_ex`.
##### Set the seed to 1.
##### Split the data into training and test sets with a 70:30 ratio and save the respective variables as `X_train_ex`, `X_test_ex`, `y_train_ex`, `y_test_ex`.
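
One way to approach the split is sketched below on synthetic data. The frame `bank_demo` and its columns are assumptions made for illustration. Seeding NumPy's global generator works because `train_test_split` falls back to it when `random_state` is not given; passing `random_state=1` directly would be an equally reasonable choice.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for `bank`; the real predictors come from the pickled dataset.
rng = np.random.default_rng(0)
bank_demo = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x1", "x2", "x3"])
bank_demo["y"] = rng.integers(0, 2, size=100)

X_demo = bank_demo.drop("y", axis=1)  # predictors: every column except the target
y_demo = bank_demo["y"]               # target column

np.random.seed(1)  # with random_state unset, the split draws from NumPy's global RNG
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.3
)
print(X_train_demo.shape, X_test_demo.shape)
```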

#### Result:

#### Task 5 
##### Instantiate a vanilla GBM model and name it `gbm_ex`.
##### Fit the model on the training data.
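
A minimal sketch of this step, assuming "vanilla" means scikit-learn's `GradientBoostingClassifier` with every hyperparameter left at its default. The data here is synthetic and the `_demo` names are placeholders, not the exercise's variables.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the bank training set.
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=1)
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)

# No arguments: all hyperparameters stay at their defaults.
gbm_demo = GradientBoostingClassifier()
gbm_demo.fit(X_train_demo, y_train_demo)

# Inspect a few of the defaults the model was fitted with.
print(gbm_demo.n_estimators, gbm_demo.learning_rate, gbm_demo.max_depth)
```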
#### Result:

#### Task 6
##### Generate class predictions and predicted probabilities for the test data and save them as `gbm_y_predict_ex` and `gbm_y_predict_proba_ex`, respectively.
##### Compute the performance scores of the GBM model using the function `get_performance_scores()` defined below. Save as `gbm_scores_ex`.


```python
def get_performance_scores(y_test, y_predict, y_predict_prob, eps=1e-15, beta=0.5):

    import numpy as np
    from sklearn import metrics

    # Score keys.
    metric_keys = ["accuracy", "precision", "recall", "f1", "fbeta", "log_loss", "AUC"]

    # Probability of the positive class, clipped away from 0 and 1 so that the
    # log loss stays finite. The `eps` argument of `metrics.log_loss` was
    # deprecated in scikit-learn 1.3, so the clipping is done here instead.
    positive_prob = np.clip(y_predict_prob[:, 1], eps, 1 - eps)

    # Score values, in the same order as the keys.
    metric_values = [None] * len(metric_keys)

    metric_values[0] = metrics.accuracy_score(y_test, y_predict)
    metric_values[1] = metrics.precision_score(y_test, y_predict)
    metric_values[2] = metrics.recall_score(y_test, y_predict)
    metric_values[3] = metrics.f1_score(y_test, y_predict)
    metric_values[4] = metrics.fbeta_score(y_test, y_predict, beta=beta)
    metric_values[5] = metrics.log_loss(y_test, positive_prob)
    metric_values[6] = metrics.roc_auc_score(y_test, y_predict_prob[:, 1])

    perf_metrics = dict(zip(metric_keys, metric_values))

    return perf_metrics
```
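
The two inputs the function expects from the model can be produced as sketched below. The data is synthetic and the `_demo` names are placeholders; only two of the scores are computed here, directly through `sklearn.metrics`, to show how the label and probability arrays are used.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data and a default model, standing in for the exercise objects.
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=1)
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)
gbm_demo = GradientBoostingClassifier(random_state=1).fit(X_train_demo, y_train_demo)

y_predict_demo = gbm_demo.predict(X_test_demo)              # hard class labels, shape (n,)
y_predict_proba_demo = gbm_demo.predict_proba(X_test_demo)  # class probabilities, shape (n, 2)

# Column 1 holds P(class == 1), which is why the function slices [:, 1].
print(metrics.accuracy_score(y_test_demo, y_predict_demo))
print(metrics.roc_auc_score(y_test_demo, y_predict_proba_demo[:, 1]))
```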

#### Result:

#### Task 7
##### Plot the precision-recall curves of the three models we have run so far.
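
A possible way to overlay several precision-recall curves is sketched below. It assumes the curves are computed with `metrics.precision_recall_curve` and drawn on shared matplotlib axes; the two models and the synthetic data are placeholders for the exercise's three fitted models.

```python
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=1)

# Hypothetical model line-up; the exercise's models come from earlier parts.
models_demo = {
    "forest": RandomForestClassifier(n_estimators=50, random_state=1),
    "gbm": GradientBoostingClassifier(random_state=1),
}

fig, ax = plt.subplots()
for name, model in models_demo.items():
    proba = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    precision, recall, _ = metrics.precision_recall_curve(y_te, proba)
    ax.plot(recall, precision, label=name)  # one curve per model on shared axes

ax.set_xlabel("Recall")
ax.set_ylabel("Precision")
ax.legend()
```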
#### Result:

#### Task 8
##### Similarly, plot the ROC curves of the three models.
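
The ROC version follows the same idea. The sketch below draws a single curve for a placeholder model on synthetic data, using `metrics.roc_curve`; further models would be added with additional `ax.plot` calls on the same axes.

```python
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=1)

gbm_demo = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)
proba = gbm_demo.predict_proba(X_te)[:, 1]

# False positive rate and true positive rate at each probability threshold.
fpr, tpr, _ = metrics.roc_curve(y_te, proba)
auc = metrics.roc_auc_score(y_te, proba)

fig, ax = plt.subplots()
ax.plot(fpr, tpr, label=f"gbm (AUC = {auc:.2f})")
ax.plot([0, 1], [0, 1], linestyle="--", color="grey", label="chance")  # reference diagonal
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.legend()
```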
#### Result:

#### Task 9
##### Update `metrics_gbm_ex` with our GBM scores `gbm_scores_ex`. Print the results.
#### Result:

#### Task 10
##### Instantiate a gbm model again and create a grid of parameter ranges as we did in class.
##### Call the grid `random_grid_ex`.
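
A parameter grid for a randomized search is a dictionary that maps hyperparameter names to candidate values. The ranges below are hypothetical and chosen only for illustration; substitute the ones used in class. The keys are all valid `GradientBoostingClassifier` arguments.

```python
# Hypothetical ranges for illustration; substitute the values used in class.
random_grid_demo = {
    "n_estimators": [100, 200, 300, 400, 500],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "max_depth": [2, 3, 4, 5],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2", None],
}

# Size of the full grid; a randomized search samples only part of it.
n_combinations = 1
for values in random_grid_demo.values():
    n_combinations *= len(values)
print(n_combinations)
```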
#### Result:

#### Task 11
##### Instantiate the randomized model, call it `gbm_random_ex`.
##### Use 3-fold cross-validation with 100 different combinations.
##### Fit the model with `X_train_ex` and `y_train_ex` and take a look at the `best_params_`.
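
A sketch of the randomized search, assuming scikit-learn's `RandomizedSearchCV`. To keep the run short, it uses synthetic data, a deliberately small hypothetical grid, and `n_iter=4`; the exercise itself asks for 100 sampled combinations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X_train_demo, y_train_demo = make_classification(
    n_samples=150, n_features=5, random_state=1
)

# Small hypothetical grid so the sketch finishes quickly.
random_grid_demo = {
    "n_estimators": [20, 40],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

gbm_random_demo = RandomizedSearchCV(
    estimator=GradientBoostingClassifier(random_state=1),
    param_distributions=random_grid_demo,
    n_iter=4,        # the exercise asks for 100 sampled combinations
    cv=3,            # 3-fold cross-validation
    random_state=1,  # makes the sampled combinations reproducible
)
gbm_random_demo.fit(X_train_demo, y_train_demo)

# Best combination found among the sampled candidates.
print(gbm_random_demo.best_params_)
```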
#### Result:

#### Task 12
##### Now use the optimized parameters to implement the optimized gradient boosting model.
##### Name it `optimized_gbm_ex`.
##### Fit the model on the training data.
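
One way to build the tuned model is to unpack the best-parameter dictionary into the constructor, as sketched below. The dictionary values are hypothetical; in the exercise they would come from `gbm_random_ex.best_params_`. The data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X_train_demo, y_train_demo = make_classification(
    n_samples=150, n_features=5, random_state=1
)

# Hypothetical values; in the exercise they would come from gbm_random_ex.best_params_.
best_params_demo = {"n_estimators": 40, "learning_rate": 0.1, "max_depth": 2}

# ** unpacks the dictionary into keyword arguments of the constructor.
optimized_gbm_demo = GradientBoostingClassifier(**best_params_demo, random_state=1)
optimized_gbm_demo.fit(X_train_demo, y_train_demo)

print(optimized_gbm_demo.get_params()["max_depth"])
```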
#### Result:

## Exercise 2

#### Task 1
##### Predict on the test data using the optimized GBM classifier `optimized_gbm_ex`.
##### Save the predictions and their probabilities as `optimized_gbm_y_predict_ex` and `optimized_gbm_y_predict_proba_ex`, respectively.
##### Compute the performance scores of the optimized GBM using the function `get_performance_scores()` defined above. Save as `optimized_gbm_scores_ex`.





#### Result:

#### Task 2
##### Now plot the precision-recall curves of the four models. Save the new GBM plot as `opt_gbm_prec_recall_ex`.

#### Result:

#### Task 3
##### Similarly, plot the ROC curves of the four models. Save the new GBM plot as `opt_gbm_roc_ex`.
#### Result:

#### Task 4
##### Update `metrics_gbm_ex` with our optimized GBM scores `optimized_gbm_scores_ex`. Print the results.
##### Convert `metrics_gbm_ex` into a pandas dataframe and create new column `metric` using the index.
##### Use `pd.melt()` to switch our dataframe to long format.
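
The reshaping steps can be sketched as follows, assuming the metrics object is a dictionary keyed by model name whose values are dictionaries of scores. All numbers and the `_demo` names below are made up for illustration; real values come from `get_performance_scores()`.

```python
import pandas as pd

# Hypothetical scores for illustration only.
metrics_demo = {
    "forest": {"accuracy": 0.90, "precision": 0.60, "recall": 0.40},
}
optimized_scores_demo = {"accuracy": 0.91, "precision": 0.65, "recall": 0.45}

# dict.update adds (or overwrites) the entry for the new model.
metrics_demo.update({"optimized_gbm": optimized_scores_demo})

metrics_df = pd.DataFrame(metrics_demo)    # rows: metrics, columns: models
metrics_df["metric"] = metrics_df.index    # keep metric names as a regular column
metrics_df = metrics_df.reset_index(drop=True)

# Long format: one row per (metric, model) pair.
metrics_long = pd.melt(metrics_df, id_vars="metric", var_name="model")
print(metrics_long)
```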
#### Result:

#### Task 5
##### Now plot the metrics for each model for comparison.
#### Result:

#### Task 6
##### The conversion to a dataframe and the plotting are combined into a single function `compare_metrics()`, defined below.
##### Use the function to perform the above actions on `metrics_gbm_ex`.

```python
def compare_metrics(metrics_dict, color_list = None):

    import pandas as pd
    import matplotlib.pyplot as plt

    # Wide format: one row per metric, one column per model.
    metrics_df = pd.DataFrame(metrics_dict)
    metrics_df["metric"] = metrics_df.index
    metrics_df = metrics_df.reset_index(drop = True)

    # Long format: one row per (metric, model) pair.
    metrics_long = pd.melt(metrics_df,
                           id_vars = "metric",
                           var_name = "model",
                           value_vars = list(metrics_dict.keys()))

    # Default to one color per model from the active matplotlib color cycle.
    if color_list is None:
        cmap = plt.rcParams['axes.prop_cycle'].by_key()['color']
        colors = cmap[:len(metrics_dict.keys())]
    else:
        colors = color_list

    # One bar chart per metric.
    fig, axes = plt.subplots(2, 3, figsize = (15, 8))
    for (metric, group), ax in zip(metrics_long.groupby("metric"), axes.flatten()):
        group.plot(x = 'model',
                   y = 'value',
                   kind = 'bar',
                   color = colors,
                   ax = ax,
                   title = metric,
                   legend = None,
                   sharex = True)

        ax.xaxis.set_tick_params(rotation = 45)

    # `pad` must be passed by keyword in current matplotlib releases.
    plt.tight_layout(pad = 0.5)

    return fig, axes
```

#### Result: