In [2]:
%pip install xgboost

Defaulting to user installation because normal site-packages is not writeable
Collecting xgboost
  Downloading xgboost-1.7.3-py3-none-manylinux2014_x86_64.whl (193.6 MB)
Installing collected packages: xgboost
Successfully installed xgboost-1.7.3
Note: you may need to restart the kernel to use updated packages.


In [1]:
import pandas as pd
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [3]:
fwd_data = pd.read_csv("processed_data.csv")

In [4]:
# Split the dataframe into features and target
X = fwd_data.drop('Target', axis=1)
y = fwd_data['Target']

In [5]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [6]:
# Create the XGBoost model
xgb_model = xgb.XGBClassifier()
xgb_model.fit(X_train, y_train)

In [7]:
# Make predictions on the test data
y_pred = xgb_model.predict(X_test)

In [8]:

# Evaluate the model's performance
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)

Accuracy: 0.9994913414735485


In [9]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

# predict() already returns class labels (0 or 1), so no rounding is needed.
# Convert the test labels to a NumPy array before scoring.
y_test = np.array(y_test)

# Calculate accuracy
acc = accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)

# Calculate confusion matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix: \n", cm)

# Calculate precision, recall and F1-score
cr = classification_report(y_test, y_pred)
print("Classification Report: \n", cr)

Accuracy:  0.9994913414735485
Confusion Matrix: 
 [[830823    331]
 [   275 359940]]
Classification Report: 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00    831154
           1       1.00      1.00      1.00    360215

    accuracy                           1.00   1191369
   macro avg       1.00      1.00      1.00   1191369
weighted avg       1.00      1.00      1.00   1191369



The accuracy of the XGBoost model on the test set is 0.99949 (about 99.95%): 1,190,763 of the 1,191,369 test samples are classified correctly and 606 are misclassified.

A confusion matrix is a table used to evaluate the performance of a classifier. Each entry counts the samples with a given actual class (row) and predicted class (column). Here the model produced 830,823 true negatives (actual 0, predicted 0), 331 false positives (actual 0, predicted 1), 275 false negatives (actual 1, predicted 0), and 359,940 true positives (actual 1, predicted 1).
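
As a quick check, the accuracy can be recomputed by hand from the confusion matrix. The sketch below copies the four counts from the output above and assumes scikit-learn's layout (rows are actual classes, columns are predicted classes); the variable names are illustrative only.

```python
# Counts copied from the confusion matrix printed by the notebook.
# Assumed layout: rows are actual classes, columns are predicted classes.
cm = [[830823, 331],
      [275, 359940]]

# Unpack the four cells of the 2x2 matrix.
(tn, fp), (fn, tp) = cm

total = tn + fp + fn + tp
accuracy = (tn + tp) / total  # correct predictions over all predictions

print("True negatives: ", tn)
print("False positives:", fp)
print("False negatives:", fn)
print("True positives: ", tp)
print("Total samples:  ", total)
print("Accuracy:       ", accuracy)
```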

The precision metric is the ratio of true positive predictions (correctly predicted 1's) to the total number of positive predictions made by the classifier. The report shows 1.00, but that is a rounded value: 331 of the 360,271 positive predictions are wrong, so the precision for class 1 is about 0.9991.

The recall metric is the ratio of true positive predictions (correctly predicted 1's) to the total number of actual positive samples. The report again rounds to 1.00: the classifier missed 275 of the 360,215 actual positives, so the recall for class 1 is about 0.9992.

The F1-score is the harmonic mean of precision and recall, so it is high only when both are high. For class 1 it is about 0.9992, which the report rounds to 1.00.
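
The three definitions above can be worked through directly for class 1. This is a minimal sketch that takes the true positive, false positive, and false negative counts from the confusion matrix in the output and prints the unrounded scores; the variable names are assumptions made for illustration.

```python
# Class 1 counts taken from the confusion matrix in the notebook output.
tp = 359940  # actual 1, predicted 1
fp = 331     # actual 0, predicted 1
fn = 275     # actual 1, predicted 0

precision = tp / (tp + fp)  # share of positive predictions that are correct
recall = tp / (tp + fn)     # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"Precision: {precision:.5f}")
print(f"Recall:    {recall:.5f}")
print(f"F1-score:  {f1:.5f}")
```

Printing five decimal places makes the small gap from 1.0 visible, which the two-decimal classification report hides.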

The support metric is the number of test samples that actually belong to each class: 831,154 for class 0 and 360,215 for class 1.

The weighted average is the mean of each metric (precision, recall, and F1-score) across the two classes, with each class weighted by its support. The macro average weights both classes equally instead. Both round to 1.00 here because the per-class scores are nearly identical.
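
To make the difference between the macro and weighted averages concrete, the sketch below recomputes both for precision from the confusion matrix in the output. The helper `per_class_precision` is a hypothetical name introduced for this example, not part of scikit-learn.

```python
# Confusion matrix from the notebook output
# (assumed layout: rows are actual classes, columns are predicted classes).
cm = [[830823, 331],
      [275, 359940]]


def per_class_precision(cm, k):
    """Precision for class k: correct predictions of k over all predictions of k."""
    predicted_k = sum(row[k] for row in cm)
    return cm[k][k] / predicted_k


precisions = [per_class_precision(cm, k) for k in range(len(cm))]
supports = [sum(row) for row in cm]  # actual samples per class

# Macro average: every class counts equally.
macro_avg = sum(precisions) / len(precisions)

# Weighted average: every class counts in proportion to its support.
weighted_avg = sum(p * s for p, s in zip(precisions, supports)) / sum(supports)

print("Per-class precision:", precisions)
print("Support:            ", supports)
print("Macro average:      ", macro_avg)
print("Weighted average:   ", weighted_avg)
```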

In [11]:
import pickle

# Save the model
filename = 'fwd-classifier.pkl'
with open(filename, 'wb') as file:
    pickle.dump(xgb_model, file)
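
Loading the model back is the mirror image of the save step. The sketch below shows the round trip with a hypothetical `load_model` helper and a plain dictionary standing in for the trained classifier, so that it runs without the dataset; in the notebook the object would be `xgb_model` and the path `'fwd-classifier.pkl'`.

```python
import os
import pickle
import tempfile


def load_model(path):
    """Load a pickled object from `path`. Only unpickle files you trust."""
    with open(path, "rb") as file:
        return pickle.load(file)


# Stand-in for the trained classifier (an assumption, so the sketch is self-contained).
stand_in_model = {"name": "fwd-classifier", "n_features": 3}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "fwd-classifier.pkl")

    # Save, exactly as in the cell above.
    with open(path, "wb") as file:
        pickle.dump(stand_in_model, file)

    # Load it back and confirm the object survived the round trip.
    restored = load_model(path)

print("Round trip succeeded:", restored == stand_in_model)
```

With the real model, `load_model('fwd-classifier.pkl').predict(X_test)` should reproduce the earlier predictions, provided the environment has a compatible xgboost version installed. If the file needs to outlive library upgrades, the model's own `save_model` method may be a safer choice than pickle.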