<a href="https://colab.research.google.com/github/aml21/Data-Analytics-Notebooks/blob/main/XGBoost_ML_Exemplar.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Build an XGBoost model with Python

## 1 XGBoost tuning

Throughout the following exercises, you will learn to use Python to construct and interpret an
XGBoost classification model using the XGBoost modeling library. Before starting on this
programming exercise, we strongly recommend watching the video lecture and completing the IVQ for
the associated topics.

All the information you need for solving this assignment is in this notebook, and all the code you
will be implementing will take place within this notebook.

Topics of focus include:

    • Relevant import statements
    • Fitting a model
    • Using GridSearchCV to cross-validate the model and tune the following hyperparameters:
      – max_depth
      – min_child_weight
      – learning_rate
      – n_estimators
    • Model evaluation using precision, recall, and F1 score
    • Examining feature importance

 As we move forward, you can find instructions on how to install required libraries as they arise in
 this notebook.

 ### 1.1 Review

 This notebook is a continuation of the bank churn project. Below is a recap of the considerations
 and decisions that we’ve already made. For detailed discussion of these topics, refer back to the
 fully annotated notebook for the decision tree model.

    Modeling objective: To predict whether a customer will churn—a binary classification task
    Target variable: Exited column—0 or 1
    Class balance: The data is imbalanced 80/20 (not churned/churned), but we will not perform class balancing.
    Primary evaluation metric: F1 score
    Modeling workflow and model selection: The champion model will be the model with the best validation F1 score. Only the champion model will be used to predict on the test data. See the reminder about modeling trade-offs later in this notebook for details and limitations of this approach.

 ### 1.2 Import statements

 Before we begin with the exercises and analyzing the data, we need to import all libraries and
 extensions required for this programming exercise. Throughout the course, we will be using numpy
 and pandas for operations, and matplotlib for plotting.

In [12]:
import numpy as np
import pandas as pd
# This is the classifier
from xgboost import XGBClassifier

# This is the function that helps plot feature importance
from xgboost import plot_importance
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score,\
f1_score, confusion_matrix, ConfusionMatrixDisplay, RocCurveDisplay
import matplotlib.pyplot as plt

# This displays all of the columns, preventing Jupyter from truncating them.
pd.set_option('display.max_columns', None)

# This module lets us save our models once we fit them.
import pickle

 ### 1.3 Read in the data

In [13]:
# Read in data
file = 'Churn_Modelling.csv'
df_original = pd.read_csv(file)
df_original.head()

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


 ## 1.4 Feature engineering

### 1.4.1 Feature selection

 In this step, we’ll prepare the data for modeling. Notice from above that there are a number of
 columns that we wouldn’t expect to offer any predictive signal to the model. These columns include
 RowNumber, CustomerId, and Surname. We’ll drop these columns so they don’t introduce noise to
 our model.

 We’ll also drop the Gender column, because we don’t want our model to make predictions based
 on gender.

In [14]:
# Drop useless and sensitive (Gender) cols
churn_df = df_original.drop(['RowNumber', 'CustomerId', 'Surname', 'Gender'],
                            axis=1)
churn_df.head()

Unnamed: 0,CreditScore,Geography,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,619,France,42,2,0.0,1,1,1,101348.88,1
1,608,Spain,41,1,83807.86,1,0,1,112542.58,0
2,502,France,42,8,159660.8,3,1,0,113931.57,1
3,699,France,39,1,0.0,2,0,0,93826.63,0
4,850,Spain,43,2,125510.82,1,1,1,79084.1,0


### 1.4.2 Feature transformation

 Next, we’ll dummy encode the Geography variable, which is categorical. We do this with the pd.get_dummies() function, setting drop_first=True, which replaces the Geography column with two new Boolean columns called Geography_Germany and Geography_Spain. The remaining category, France, becomes the baseline: a row where both new columns are False.

In [15]:
# Dummy encode categoricals
churn_df2 = pd.get_dummies(churn_df, drop_first=True)
churn_df2.head()

Unnamed: 0,CreditScore,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited,Geography_Germany,Geography_Spain
0,619,42,2,0.0,1,1,1,101348.88,1,False,False
1,608,41,1,83807.86,1,0,1,112542.58,0,False,True
2,502,42,8,159660.8,3,1,0,113931.57,1,False,False
3,699,39,1,0.0,2,0,0,93826.63,0,False,False
4,850,43,2,125510.82,1,1,1,79084.1,0,False,True


## 1.5 Split the data

We’ll split the data into features and target variable, and into training data and test data using the train_test_split() function.

Don’t forget to include the stratify=y parameter, as this is what ensures that the 80/20 class ratio of the target variable is maintained in both the training and test datasets after splitting.

Lastly, we set a random seed so we and others can reproduce our work.
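
As a quick check of what stratify=y does, here is a minimal sketch on made-up labels with the same 80/20 imbalance. The toy arrays `X_toy` and `y_toy` are assumptions for illustration, not the bank data. It confirms that both splits keep the 20% positive rate.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 1,000 samples with an 80/20 class imbalance.
y_toy = np.array([0] * 800 + [1] * 200)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)

# Same arguments as the notebook's split, applied to the toy data.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, stratify=y_toy, random_state=42
)

# With stratification, the share of positives matches in both splits.
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"train positive rate: {train_rate:.2f}, test positive rate: {test_rate:.2f}")
```

Without stratify, the two rates would generally differ slightly from 0.20 and from each other, which matters more as the minority class gets smaller.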

In [16]:
# Define the y (target) variable
y = churn_df2["Exited"]
# Define the X (predictor) variables
X = churn_df2.copy()
X = X.drop("Exited", axis = 1)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, stratify=y, random_state=42)

## 1.6 Modeling

### 1.6.1 Cross-validated hyperparameter tuning

The cross-validation process is the same as it was for the decision tree and random forest models.
 The only difference is that we’re tuning different hyperparameters now. The steps are included
 below as a review.

For details on cross-validating with GridSearchCV, refer back to the decision tree notebook, or to
 the GridSearchCV documentation in scikit-learn.
 1. Instantiate the classifier (and set the random_state). Note here that we’ve included a
 parameter called objective whose value is binary:logistic. This means that the model
 is performing a binary classification task that outputs a logistic probability. The objective
 would be different for different kinds of problems, for instance, if you were trying to predict
 more than two classes or performing a linear regression on continuous data. Refer to the
 XGBoost documentation for more information.
 2. Create a dictionary of hyperparameters to search over.
 3. Create a dictionary of scoring metrics to capture.
 4. Instantiate the GridSearchCV object. Pass as arguments:

    • The classifier (xgb)

    • The dictionary of hyperparameters to search over (cv_params)

    • The dictionary of scoring metrics (scoring)

    • The number of cross-validation folds you want (cv=5)
    
    • The scoring metric that you want GridSearch to use when it selects the “best” model (i.e., the model that performs best on average over all validation folds) (refit='f1')

 5. Fit the data (X_train, y_train) to the GridSearchCV object (xgb_cv)

 Note that we use the %%time magic at the top of the cell where we fit the model. This outputs the
 final runtime of the cell.

In [17]:
xgb = XGBClassifier(objective='binary:logistic', random_state=0)

cv_params = {'max_depth': [4,5,6,7,8],
             'min_child_weight': [1,2,3,4,5],
             'learning_rate': [0.1, 0.2, 0.3],
             'n_estimators': [75, 100, 125]
              }
scoring = {'accuracy': 'accuracy', 'precision': 'precision', 'recall': 'recall', 'f1': 'f1'}

xgb_cv = GridSearchCV(xgb, cv_params, scoring=scoring, cv=5, refit='f1')

 **Note:** The following operation may take over 30 minutes to complete

In [18]:
%%time
xgb_cv.fit(X_train, y_train)

CPU times: user 6min 23s, sys: 3.64 s, total: 6min 26s
Wall time: 3min 53s


 ## 1.7 Pickle

 We’ll pickle the model so we don’t have to refit it every time we run this notebook. Remember,
 there are three steps:
 1. Define the path to the location where it will save
 2. Write the file (i.e., save the model)
 3. Read the model back in

In [19]:
path = '/content/'

In [20]:
# Pickle the model
with open(path + 'xgb_cv_model.pickle', 'wb') as to_write:
     pickle.dump(xgb_cv, to_write)

In [21]:
# Open pickled model
with open(path+'xgb_cv_model.pickle', 'rb') as to_read:
     xgb_cv = pickle.load(to_read)

Don’t forget to go back and comment out the line of code where you fit the model and the code
 that writes the pickle!
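
An alternative to commenting code in and out by hand is to check whether the pickle file already exists. The sketch below is one possible pattern, not part of the original workflow: `load_or_fit` is a hypothetical helper, and a plain dictionary stands in for the fitted model so the example runs without any training.

```python
import pickle
import tempfile
from pathlib import Path

def load_or_fit(pickle_path, fit_fn):
    """Return the pickled object at pickle_path if it exists; otherwise call
    fit_fn(), pickle its result to pickle_path, and return it.
    (Hypothetical helper, not part of the original notebook.)"""
    pickle_path = Path(pickle_path)
    if pickle_path.exists():
        with open(pickle_path, 'rb') as to_read:
            return pickle.load(to_read)
    model = fit_fn()
    with open(pickle_path, 'wb') as to_write:
        pickle.dump(model, to_write)
    return model

# Demonstration with a stand-in "model" (a dict) so the sketch runs anywhere.
fit_calls = []

def fake_fit():
    fit_calls.append(1)          # record that the expensive step ran
    return {'best_score_': 0.5}  # placeholder value, not a real result

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / 'demo_model.pickle'
    first = load_or_fit(demo_path, fake_fit)   # fits and saves
    second = load_or_fit(demo_path, fake_fit)  # loads from disk, no refit

print(len(fit_calls))
```

In this notebook, the equivalent call might look like `load_or_fit(path + 'xgb_cv_model.pickle', lambda: xgb_cv.fit(X_train, y_train))`, since `fit` returns the fitted search object.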

Let’s check our model’s score and compare it to our random forest’s score on the same
cross-validated train data. We’ll have to import the pickled random forest model. This is where pickling
 comes in handy!

In [None]:
# Open pickled random forest model
with open(path +'rf_cv_model.pickle', 'rb') as to_read:
  rf_cv = pickle.load(to_read)

# The unpickled object is already fitted, so there is no need to call rf_cv.fit() again.

print('F1 score random forest CV: ', rf_cv.best_score_)
print('F1 score XGB CV: ', xgb_cv.best_score_)

We’ll use the same helper function we used in previous notebooks to organize our results into a
 dataframe.

In [None]:
def make_results(model_name, model_object):
    '''
    Accepts as arguments a model name (your choice- string) and
    a fit GridSearchCV model object.
    Returns a pandas df with the F1, recall, precision, and accuracy scores
    for the model with the best mean F1 score across all validation folds.
    '''
    # Get all the results from the CV and put them in a df
    cv_results = pd.DataFrame(model_object.cv_results_)

    # Isolate the row of the df with the max(mean f1 score)
    best_estimator_results = cv_results.iloc[cv_results['mean_test_f1'].idxmax(), :]

    # Extract accuracy, precision, recall, and f1 score from that row
    f1 = best_estimator_results.mean_test_f1
    recall = best_estimator_results.mean_test_recall
    precision = best_estimator_results.mean_test_precision
    accuracy = best_estimator_results.mean_test_accuracy

    # Create table of results
    table = pd.DataFrame({'Model': [model_name],
                          'F1': [f1],
                          'Recall': [recall],
                          'Precision': [precision],
                          'Accuracy': [accuracy]
                          }
                         )
    return table

In [None]:
# Create xgb model results table
xgb_cv_results = make_results('XGBoost CV', xgb_cv)
xgb_cv_results

Now we’ll read back in the master results table from the last notebook and concatenate it with the
 results we just created above.

In [None]:
# Read in master results table
results = pd.read_csv(path+'results2.csv')

# Concatenate xgb model results table with master results table
results = pd.concat([xgb_cv_results, results]).sort_values(by=['F1'],
                                                           ascending=False)
results

## 1.8 Model selection and final results


### 1.8.1 Predicting on the test data

We’re ready to select a champion model! Based on the above table, it’s clear that our XGBoost
 model has the top F1 score on the validation data by a small margin.
 Since we won’t be building any more models, we can at last use our champion model (XGBoost)
 to predict on the test data.

In [None]:
# Predict on test data
xgb_cv_preds = xgb_cv.predict(X_test)
print('F1 score final XGB model: ', f1_score(y_test, xgb_cv_preds))
print('Recall score final XGB model: ', recall_score(y_test, xgb_cv_preds))
print('Precision score final XGB model: ', precision_score(y_test,xgb_cv_preds))
print('Accuracy score final XGB model: ', accuracy_score(y_test, xgb_cv_preds))

Wow! The final model performed even better on the test data than it did on the validation data. This is unusual. Typically, performance on test data is a little worse than on validation data. But the difference here is small, so it’s not cause for concern.

Let’s check our confusion matrix.

### 1.8.2 Confusion matrix

In [None]:
# Create helper function to plot confusion matrix
def conf_matrix_plot(model, x_data, y_data):
    '''
    Accepts as argument model object, X data (test or validate), and y data (test or validate).
    Returns a plot of confusion matrix for predictions on y data.
    '''
    model_pred = model.predict(x_data)
    cm = confusion_matrix(y_data, model_pred, labels=model.classes_)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                                  display_labels=model.classes_)
    disp.plot()
    plt.show()

In [None]:
conf_matrix_plot(xgb_cv, X_test, y_test)

From the 2,500 people in our test data, there are 509 customers who left the bank. Of those, our
 model captures 256. The confusion matrix indicates that, when the model makes an error, it’s
 usually a Type II error—it gives a false negative by failing to predict that a customer will leave.
 On the other hand, it makes far fewer Type I errors, which are false positives.

 Ultimately, whether these results are acceptable depends on the costs of the measures taken to
 prevent a customer from leaving versus the value of retaining them. In this case, bank leaders may
 decide that they’d rather have more true positives, even if it means also capturing significantly
 more false positives. If so, perhaps optimizing the models based on their F1 scores is insufficient.
 Maybe we’d prioritize a different evaluation metric.

 One way to modify the decision-making without retraining the model is to adjust the threshold at
 which the model predicts a positive response. In other words, the model determines a probability
 that a given customer will churn. By default, if that probability is ≥ 0.50, then the model will
 label that customer as churned. Probabilities of < 0.50 would label the customer as not churned. But
 it’s possible to adjust this decision threshold. For instance, if we set the threshold to 0.25, then
 the model would label customers with predicted probabilities ≥ 0.25 as churned, and those with
 probabilities < 0.25 as not churned. This would increase the recall of the model, typically at the
 cost of precision and accuracy.
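
The effect of moving the threshold can be sketched with a few made-up probabilities. The arrays below are hypothetical and chosen only to illustrate the mechanics. In this notebook the probabilities would presumably come from `xgb_cv.predict_proba(X_test)[:, 1]`, and `scores_at_threshold` is an illustrative helper name.

```python
import numpy as np

# Hypothetical true labels and predicted churn probabilities (illustrative only).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_prob = np.array([0.90, 0.60, 0.40, 0.30, 0.55, 0.45, 0.35, 0.28, 0.10, 0.05])

def scores_at_threshold(y_true, y_prob, threshold):
    """Label a sample as churned when its probability is >= threshold,
    then return (recall, precision) for the churned class."""
    y_pred = (y_prob >= threshold).astype(int)
    tp = int(((y_pred == 1) & (y_true == 1)).sum())
    fp = int(((y_pred == 1) & (y_true == 0)).sum())
    fn = int(((y_pred == 0) & (y_true == 1)).sum())
    return tp / (tp + fn), tp / (tp + fp)

recall_50, precision_50 = scores_at_threshold(y_true, y_prob, 0.50)
recall_25, precision_25 = scores_at_threshold(y_true, y_prob, 0.25)

print(f"threshold 0.50: recall={recall_50:.2f}, precision={precision_50:.2f}")
print(f"threshold 0.25: recall={recall_25:.2f}, precision={precision_25:.2f}")
```

On this toy data, lowering the threshold captures every churner but also flags more customers who would have stayed, which is the trade-off described above.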

 In any case, what is certain is that our model helps the bank. Consider the results if decision-makers
 had done nothing. In that case, they’d expect to lose 509 customers. Alternatively, they could
 give everybody an incentive to stay. That would cost the bank for each of the 2,500 customers in
 our test set. Finally, the bank could give incentives at random—say, by flipping a coin. Doing this
 would incentivize about the same number of true responders as our model selects. But the bank
 would lose a lot of money offering the incentives to people who aren’t likely to leave, and our model is very good at identifying these customers.
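
The comparison with the coin-flip strategy can be made concrete with the figures quoted above. This is a back-of-the-envelope sketch: it assumes a fair coin and uses only the counts stated in the text. The number of incentives the model itself would trigger is not computed because the false positive count is not given here.

```python
# Figures quoted in the text above.
n_customers = 2500        # customers in the test set
n_churners = 509          # customers who actually left
n_caught_by_model = 256   # churners the model flagged (true positives)

# Share of churners the model captures (its recall on the test set).
model_recall = n_caught_by_model / n_churners

# Coin-flip strategy: each customer gets an incentive with probability 0.5.
coin_incentives = 0.5 * n_customers        # expected incentives handed out
coin_churners_reached = 0.5 * n_churners   # expected churners among them
coin_wasted = coin_incentives - coin_churners_reached  # expected incentives to non-churners

print(f"model recall: {model_recall:.3f}")
print(f"coin flip reaches about {coin_churners_reached:.1f} churners "
      f"using about {coin_incentives:.0f} incentives "
      f"({coin_wasted:.1f} of them go to customers who stay)")
```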

### 1.8.3 Feature importance

The XGBoost library has a function called plot_importance, which we imported at the beginning
 of this notebook. This lets us check the features selected by the model as the most predictive. We
 can create a plot by calling this function and passing to it the best estimator from our grid search.

In [None]:
plot_importance(xgb_cv.best_estimator_);

This tells us that the four most important features used by our model were EstimatedSalary,
 Balance, CreditScore, and Age. This is very useful information. In a full project, we’d go back
 and examine these features very closely to understand how and why they are affecting churn.

 At this point, it would also be a good idea to go back and add the model predictions and Gender
 feature to each sample in our data. Then we could examine how evenly the model distributes its
 error across reported gender identities.
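
A minimal sketch of such a check is shown below. The rows are made up for illustration, and the column name `prediction` is an assumption. In this notebook, the Gender values could presumably be recovered with `df_original.loc[X_test.index, 'Gender']`, because the split preserves the original row index.

```python
import pandas as pd

# Hypothetical rows standing in for the test set: the Gender values, true labels,
# and predictions below are made up for illustration.
audit_df = pd.DataFrame({
    'Gender': ['Female', 'Female', 'Female', 'Female', 'Male', 'Male', 'Male', 'Male'],
    'Exited': [1, 0, 1, 0, 1, 0, 0, 1],
    'prediction': [1, 0, 0, 0, 1, 0, 1, 0],
})

# Flag each row where the prediction disagrees with the true label.
audit_df['error'] = (audit_df['Exited'] != audit_df['prediction']).astype(int)

# Error rate and sample count per reported gender.
error_by_gender = audit_df.groupby('Gender')['error'].agg(['mean', 'count'])
print(error_by_gender)
```

A large gap between the groups' error rates would be a signal to investigate further, even though Gender was never used as a feature.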

 **A reminder about modeling trade-offs**

 Remember, the decision to use only the champion model to predict on the test data comes with
 a trade-off. The benefit is that we get a true idea of how we’d expect the model to perform on
 new, unseen data. The cost of this decision is that, by using the validation scores to both tune
 hyperparameters and select the champion model, we run the risk of selecting the model that most
 overfit the validation data.

 Alternatively, we could have selected our champion model by using all of our tuned models to
 predict on the test data and choosing the one that performed best. That also would have come with a trade-off. There wouldn’t be as much risk of overfitting to the validation data, but by using
 the test data to select our champion model, we wouldn’t get a truly objective idea of how the model
 would perform on new, unseen data. We would need a new dataset for that, which means we would
 have had to set more data aside at the beginning, resulting in less data to use to train the model.

 With sufficient data, a more rigorous approach would be:

 1. Split the data into training, validation, and test sets
 2. Tune hyperparameters using cross-validation on the training set
 3. Use all tuned models to predict on the validation set
 4. Select a champion model based on performance on the validation set
 5. Use champion model alone to predict on test data
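
Step 1 above can be sketched with two successive calls to train_test_split. The 60/20/20 proportions and the toy arrays are assumptions chosen for illustration; the text does not prescribe specific sizes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 1,000 samples with an 80/20 class imbalance.
y_toy = np.array([0] * 800 + [1] * 200)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)

# First split: hold out 20% of all data as the final test set.
X_rest, X_test_toy, y_rest, y_test_toy = train_test_split(
    X_toy, y_toy, test_size=0.20, stratify=y_toy, random_state=0
)

# Second split: divide the remainder 75/25, giving 60% train and 20% validation overall.
X_train_toy, X_val_toy, y_train_toy, y_val_toy = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0
)

print(len(X_train_toy), len(X_val_toy), len(X_test_toy))
```

Cross-validation would then run on the training portion only, the validation portion would pick the champion, and the test portion would be touched once at the very end.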

 Every modeling decision comes with a trade-off. What’s most important is that you’re aware of
 the trade-offs and apply the best reasoning to the task at hand.

# Exemplar: Build an XGBoost model

## 1.1 Introduction

In this activity, you’ll build on the skills and techniques you learned in the decision tree and random
 forest lessons to construct your own XGBoost classification model. The XGBoost model is a very
 powerful extension of decision trees, so having a strong working familiarity with this process will
 strengthen your skills and resume as a data professional.

 This activity is a continuation of the airlines project in which you built decision tree and random
 forest models. You will use the same data, but this time you will train, tune, and evaluate an
 XGBoost model. You’ll then compare the performance of all three models and decide which model
 is best. Finally, you’ll explore the feature importances of your model and identify the features that
 most contribute to customer satisfaction.

## 1.2 Step 1: Imports

### 1.2.1 Import packages
Begin with your import statements. First, import pandas, numpy, and matplotlib for data
 preparation. Next, import scikit-learn (sklearn) for model preparation and evaluation. Then, import
 xgboost, which provides the classification algorithm you’ll implement to formulate your predictive
 model.

In [None]:
# Import relevant libraries and modules.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pickle
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn import metrics
from xgboost import XGBClassifier
from xgboost import plot_importance

### 1.2.2 Load the dataset

 To formulate your model, pandas is used to import a csv of airline passenger satisfaction data
 called Invistico_Airline.csv. This DataFrame is called airline_data. As shown in this cell,
 the dataset has been automatically loaded in for you. You do not need to download the .csv file, or
 provide more code, in order to access the dataset and proceed with this lab. Please continue with
 this activity by completing the following instructions.

In [None]:
# RUN THIS CELL TO IMPORT YOUR DATA.
airline_data = pd.read_csv('Invistico_Airline.csv', on_bad_lines='skip')

### 1.2.3 Display the data

Examine the first 10 rows of data to familiarize yourself with the dataset.

In [None]:
# Display first ten rows of data.

airline_data.head(10)

### 1.2.4 Display the data type for each column

 Next, observe the types of data present within this dataset.

In [None]:
# Display the data type for each column in your DataFrame.

airline_data.dtypes

**Question:** Identify the target (or predicted) variable for passenger satisfaction. What is your
 initial hypothesis about which variables will be valuable in predicting satisfaction?

    • satisfaction represents the classification variable to be predicted.
    • Many of these variables seem like meaningful predictors of satisfaction. In particular, delays (either departure or arrival) may be negatively correlated with satisfaction.

## 1.3 Step 2: Model preparation

Before you proceed with modeling, consider which metrics you will ultimately want to leverage to
 evaluate your model.

**Question:** Which metrics are most suited to evaluating this type of model?

 • As this is a binary classification problem, it will be important to evaluate not just accuracy,
 but the balance of false positives and false negatives that the model’s predictions provide.
 Therefore, precision, recall, and ultimately the F1 score will be excellent metrics to use.

 • The ROC AUC (Area Under the Receiver Operating Characteristic) score is also suited to
 this type of modeling.

### 1.3.1 Prepare your data for predictions

 You may have noticed when previewing your data that there are several non-numerical variables
 (object data types) within the dataset.

 To prepare this DataFrame for modeling, first convert these variables into a numerical format.

In [None]:
# Convert the object predictor variables to numerical dummies.
airline_data_dummies = pd.get_dummies(airline_data,
                                      columns=['satisfaction', 'Customer Type', 'Type of Travel', 'Class'])

### 1.3.2 Isolate your target and predictor variables

 Separately define the target variable (satisfaction) and the features.

In [None]:
# Define the y (target) variable.
y = airline_data_dummies['satisfaction_satisfied']

# Define the X (predictor) variables.
X = airline_data_dummies.drop(['satisfaction_satisfied','satisfaction_dissatisfied'], axis = 1)

### 1.3.3 Divide your data

 Divide your data into a training set (75% of the data) and test set (25% of the data). This is an
 important step in the process, as it allows you to reserve a part of the data that the model has not
 used to test how well the model generalizes (or performs) on new data.

In [None]:
# Perform the split operation on your data.
# Assign the outputs as follows: X_train, X_test, y_train, y_test.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25,random_state = 0)

## 1.4 Step 3: Model building

### 1.4.1 “Instantiate” your XGBClassifier

 Before you fit your model to your airline dataset, first create the XGB Classifier model and define
 its objective. You’ll use this model to fit and score different hyperparameters during the GridSearch
 cross-validation process.

In [None]:
# Define xgb to be your XGBClassifier.
xgb = XGBClassifier(objective='binary:logistic', random_state=0)

### 1.4.2 Define the parameters for hyperparameter tuning

 To identify suitable parameters for your xgboost model, first define the parameters for
 hyperparameter tuning. Specifically, consider tuning max_depth, min_child_weight, learning_rate,
 n_estimators, subsample, and/or colsample_bytree.

 Consider a more limited range for each hyperparameter to allow for timely iteration and model
 training. For example, using a single possible value for each of the six hyperparameters listed
 above will take approximately one minute to run on this platform.

    {
        'max_depth': [4],
        'min_child_weight': [3],
        'learning_rate': [0.1],
        'n_estimators': [5],
        'subsample': [0.7],
        'colsample_bytree': [0.7]
    }

 If you add just one new option, for example by changing max_depth: [4] to max_depth: [3, 6],
 and keep everything else the same, you can expect the run time to approximately double. If you
 use two possibilities for each hyperparameter, the run time would extend to ~1 hour.
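
The arithmetic behind these estimates is the product of the number of options per hyperparameter. The sketch below assumes, as a simplification, that every combination costs about the same one minute quoted above. In practice, deeper trees and more estimators take longer. `estimate_minutes` is a hypothetical helper name.

```python
from math import prod

# Rough per-combination cost quoted above: about one minute (all folds included).
MINUTES_PER_COMBINATION = 1

def estimate_minutes(param_grid):
    """Estimate run time as (number of combinations) x (cost per combination)."""
    n_combinations = prod(len(values) for values in param_grid.values())
    return n_combinations * MINUTES_PER_COMBINATION

single = {'max_depth': [4], 'min_child_weight': [3], 'learning_rate': [0.1],
          'n_estimators': [5], 'subsample': [0.7], 'colsample_bytree': [0.7]}

# Adding one extra option to a single hyperparameter doubles the grid.
one_extra = dict(single, max_depth=[3, 6])

# Two options for every hyperparameter: 2**6 = 64 combinations.
two_each = {name: [0, 1] for name in single}  # placeholder values; only the counts matter

print(estimate_minutes(single), estimate_minutes(one_extra), estimate_minutes(two_each))
```

Under this assumption the three grids come out at roughly 1, 2, and 64 minutes, which matches the "approximately double" and "~1 hour" figures above.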

In [None]:
# Define parameters for tuning as `cv_params`.

# NOTE! This cell will take a long time to run. Only uncomment and run it if you have the processing
# power or patience to wait. Otherwise, scroll to see results.

cv_params = {'max_depth': [4, 6],
             'min_child_weight': [3, 5],
             'learning_rate': [0.1, 0.2, 0.3],
             'n_estimators': [5, 10, 15],
             'subsample': [0.7],
             'colsample_bytree': [0.7]
             }

**Question:** What is the likely effect of adding more estimators to your GridSearch?
 • More estimators will initially improve the model’s performance. However, increasing the
 number of estimators will also considerably increase the time spent during the GridSearch
 process, and there will be diminishing returns as the number of estimators continues to
 increase.

### 1.4.3 Define how the models will be evaluated

 Define how the models will be evaluated for hyperparameter tuning. To yield the best understanding
 of model performance, utilize a suite of metrics.

In [None]:
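# Define your criteria as `scoring`.
# A list is used because GridSearchCV documents a str, callable, list, tuple, or dict here, not a set.
scoring = ['accuracy', 'precision', 'recall', 'f1']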

### 1.4.4 Construct the GridSearch cross-validation

 Construct the GridSearch cross-validation using the model, parameters, and scoring metrics you
 defined. Additionally, define the number of folds and specify which metric from above will guide
 the refit strategy.

In [None]:
# Construct your GridSearch.
xgb_cv = GridSearchCV(xgb,
                      cv_params,
                      scoring = scoring,
                      cv = 5,
                      refit = 'f1'
                      )

### 1.4.5 Fit the GridSearch model to your training data

 If your GridSearch takes too long, revisit the parameter ranges above and consider narrowing the
 range and reducing the number of estimators.
 Note: The following cell might take several minutes to run.

In [None]:
%%time
# fit the GridSearch model to training data

xgb_cv = xgb_cv.fit(X_train, y_train)
xgb_cv

**Question:** Which optimal set of parameters did the GridSearch yield?
 Through accessing the best_params_ attribute of the fitted GridSearch model, the optimal set of
 hyperparameters was:
 {'colsample_bytree': 0.7, 'learning_rate': 0.3, 'max_depth': 6,
 'min_child_weight': 5, 'n_estimators': 15, 'subsample': 0.7}

**Note:** Your results may vary from this example response.

### 1.4.6 Save your model for reference using pickle

 Use the pickle library you’ve already imported to save the output of this model.

In [None]:
# Use `pickle` to save the trained model.
with open('xgb_cv.sav', 'wb') as to_write:
    pickle.dump(xgb_cv, to_write)

## 1.5 Step 4: Results and evaluation

### 1.5.1 Formulate predictions on your test set
 To evaluate the predictions yielded from your model, leverage a series of metrics and evaluation
 techniques from scikit-learn by examining the actual observed values in the test set relative to your
 model’s prediction.
 First, use your trained model to formulate predictions on your test set.

In [None]:
# Apply your model to predict on your test data. Call this output "y_pred".
y_pred = xgb_cv.predict(X_test)

### 1.5.2 Leverage metrics to evaluate your model’s performance

 Apply a series of metrics from scikit-learn to assess your model. Specifically, print the accuracy
 score, precision score, recall score, and f1 score associated with your test data and predicted values.

In [None]:
# 1. Print your accuracy score.
ac_score = metrics.accuracy_score(y_test, y_pred)
print('accuracy score:', ac_score)

# 2. Print your precision score.
pc_score = metrics.precision_score(y_test, y_pred)
print('precision score:', pc_score)

# 3. Print your recall score.
rc_score = metrics.recall_score(y_test, y_pred)
print('recall score:', rc_score)

# 4. Print your f1 score.
f1_score = metrics.f1_score(y_test, y_pred)
print('f1 score:', f1_score)

**Question:** How should you interpret your accuracy score?
 The accuracy score for this model is 0.939, or 93.9% accurate.

**Question:** Is your accuracy score alone sufficient to evaluate your model?

 In classification problems, accuracy is useful to know but may not be the best metric to evaluate this model.

**Question:** When observing the precision and recall scores of your model, how do you interpret
 these values, and is one more accurate than the other?

 Precision and recall scores are both useful to evaluate the correct predictive capability of the model
 because they account for the false positives and false negatives inherent in prediction. The model
 shows a precision score of 0.948, which means that when the model predicts that a passenger is
 satisfied, it is correct about 94.8% of the time. The recall score of 0.940 is also very good. It
 means that the model correctly identifies about 94.0% of the passengers who are actually satisfied.
 These two metrics combined give a better assessment of model performance than the accuracy
 metric does alone.

**Question:** What does your model’s F1 score tell you, beyond what the other metrics provide?
 The F1 score balances the precision and recall performance to give a combined assessment of how
 well this model delivers predictions. In this case, the F1 score is 0.944, which suggests very strong
 predictive power in this model.
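
As a consistency check, the F1 score can be recomputed from the precision and recall values quoted earlier in this section, since F1 is their harmonic mean. The helper name `f1_from` is illustrative.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall values quoted earlier in this section.
f1 = f1_from(0.948, 0.940)
print(round(f1, 3))  # prints 0.944
```

Because the harmonic mean is pulled toward the smaller of the two inputs, a model cannot reach a high F1 score by excelling at only one of precision or recall.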

### 1.5.3 Gain clarity with the confusion matrix

 Recall that a confusion matrix is a graphic that shows a model’s true and false positives and
 true and false negatives. It helps to create a visual representation of the components feeding into
 the metrics above.
 Create a confusion matrix based on your predicted values for the test set.

In [None]:
# Construct and display your confusion matrix.
# Construct the confusion matrix for your predicted and test values.
cm = metrics.confusion_matrix(y_test, y_pred)
# Create the display for your confusion matrix.

# Plot the visual in-line.
disp = metrics.ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=xgb_cv.classes_)
disp.plot()

 **Question:** When observing your confusion matrix, what do you notice? Does this correlate to any
 of your other calculations?

 The top left to bottom right diagonal in the confusion matrix represents the correct predictions,
 and the share of all predictions that fall on this diagonal is the accuracy.

 Additionally, the counts of true positives and true negatives are large relative to the counts of
 false positives and false negatives, respectively. In particular, the small number of false
 positives relative to true positives is why the precision score is so high (0.948).
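
The link between the confusion matrix and the metrics can be sketched with a small made-up matrix. The counts below are hypothetical and are not the lab's results. The layout follows scikit-learn's convention of true labels on the rows and predicted labels on the columns.

```python
import numpy as np

# Hypothetical 2x2 confusion matrix in scikit-learn's layout:
# rows are true labels, columns are predicted labels.
cm_toy = np.array([[50, 10],   # true negatives, false positives
                   [5, 35]])   # false negatives, true positives

tn, fp, fn, tp = cm_toy.ravel()

accuracy = (tn + tp) / cm_toy.sum()  # diagonal over all predictions
precision = tp / (tp + fp)           # correct among predicted positives
recall = tp / (tp + fn)              # captured among actual positives

print(accuracy, precision, recall)
```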

### 1.5.4 Visualize most important features

 xgboost has a built-in function to visualize the relative importance of the features in the model
 using matplotlib. Output and examine the feature importance of your model.

In [None]:
# Plot the relative feature importance of the predictor variables in your model.
plot_importance(xgb_cv.best_estimator_)

**Question:** Examine the feature importances outputted above. What is your assessment of the
 result? Did anything surprise you?

    • By a wide margin, “seat comfort” rated as most important in the model. The type of seating is very different between first class and coach seating. However, the perks of being in first class also go beyond the seating type, so perhaps that is an underlying explanation of this feature’s importance.
    • Surprisingly, delays (both arrival and departure) did not score as highly important.

### 1.5.5 Compare models

 Create a table of results to compare model performance.

In [None]:
# Create a table of results to compare model performance.

table = pd.DataFrame({'Model': ["Tuned Decision Tree", "Tuned Random Forest","Tuned XGBoost"],
                      'F1': [0.945422, 0.947306, f1_score],
                      'Recall': [0.935863, 0.944501, rc_score],
                      'Precision': [0.955197, 0.950128, pc_score],
                      'Accuracy': [0.940864, 0.942450, ac_score]
                      }
                    )
table

**Question:** How does this model compare to the decision tree and random forest models you built
 in previous labs?

 Based on the results shown in the table above, the F1, precision, recall, and accuracy scores of
 the XGBoost model are similar to the corresponding scores of the decision tree and random forest
 models. The random forest model seemed to outperform the decision tree model as well as the
 XGBoost model.

## 1.6 Considerations

 **What are some key takeaways you learned from this lab?**

    • The evaluation of the model is important to inform if the model has delivered accurate predictions.
    • Splitting the data is important for ensuring that there is new data for the model to test its predictive performance.
    • Each metric provides an evaluation from a different standpoint, and accuracy alone is not a strong way to evaluate a model.
    • Effective assessments balance the true/false positives versus true/false negatives through the confusion matrix and F1 score.

**How would you share your findings with your team?**

    • Showcase the data used to create the prediction and the performance of the model overall.
    • Review the sample output of the features and the confusion matrix to reference the model’s performance.
    • Highlight the metric values, emphasizing the F1 score.
    • Visualize the feature importance to showcase what drove the model’s predictions.

**What would you share with and recommend to stakeholders?**

    • The model created is highly effective at predicting passenger satisfaction.
    • The feature importance of seat comfort warrants additional investigation. It will be important to ask domain experts why they believe this feature scores so highly in this model.