In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
X_train = pd.read_csv('../cleaned_datasets/X_train.csv')
X_test = pd.read_csv('../cleaned_datasets/X_test.csv')
X_valid = pd.read_csv('../cleaned_datasets/X_valid.csv')

y_train = pd.read_csv('../cleaned_datasets/y_train.csv')
y_test = pd.read_csv('../cleaned_datasets/y_test.csv')
y_valid = pd.read_csv('../cleaned_datasets/y_valid.csv')

## Baseline Model:

In [4]:
import xgboost as xgb
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

xgb_baseline_model = xgb.XGBClassifier(random_state=42)
xgb_baseline_model.fit(X_train, y_train.values.ravel())

y_pred_xgb_baseline = xgb_baseline_model.predict(X_valid)

print("XGBoost Baseline Model Accuracy: ", accuracy_score(y_valid, y_pred_xgb_baseline))
print("XGBoost Baseline Confusion Matrix: ")
print(confusion_matrix(y_valid, y_pred_xgb_baseline))
print("\nXGBoost Baseline Classification Report:\n", classification_report(y_valid, y_pred_xgb_baseline))
print("\nXGBoost Baseline Precision Score:", precision_score(y_valid, y_pred_xgb_baseline))
print("\nXGBoost Baseline Recall Score:", recall_score(y_valid, y_pred_xgb_baseline))
print("\nXGBoost Baseline F1 Score:", f1_score(y_valid, y_pred_xgb_baseline))
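# Hard 0/1 predictions make this equal to balanced accuracy; predict_proba(X_valid)[:, 1] gives a threshold-independent AUC
print("\nXGBoost Baseline ROC AUC Score:", roc_auc_score(y_valid, y_pred_xgb_baseline))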


XGBoost Baseline Model Accuracy:  0.8651994497936726
XGBoost Baseline Confusion Matrix: 
[[599  73]
 [ 25  30]]

XGBoost Baseline Classification Report:
               precision    recall  f1-score   support

           0       0.96      0.89      0.92       672
           1       0.29      0.55      0.38        55

    accuracy                           0.87       727
   macro avg       0.63      0.72      0.65       727
weighted avg       0.91      0.87      0.88       727


XGBoost Baseline Precision Score: 0.2912621359223301

XGBoost Baseline Recall Score: 0.5454545454545454

XGBoost Baseline F1 Score: 0.37974683544303794

XGBoost Baseline ROC AUC Score: 0.7184117965367965


Here is an interpretation of the XGBoost baseline model's performance on the validation set:

**Accuracy**: The accuracy score is 86.52%, meaning the model classifies about 86.5% of validation instances correctly. As mentioned before, accuracy can be misleading on imbalanced datasets: a model that always predicted "non-hazardous" would score 672/727 ≈ 92.4% here, higher than this baseline, while detecting no hazardous objects at all. The other metrics are therefore more informative.
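A quick way to check how much accuracy alone says here is to score a trivial classifier that always predicts the majority (non-hazardous) class. This is a minimal sketch: the label array is rebuilt from the validation support counts (672 non-hazardous, 55 hazardous) and stands in for the real `y_valid`; `X_dummy` is a placeholder feature matrix, since the `"most_frequent"` strategy ignores features.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Label vector rebuilt from the validation support counts (assumption: stands in for y_valid)
y_true = np.array([0] * 672 + [1] * 55)
X_dummy = np.zeros((len(y_true), 1))  # placeholder features, ignored by "most_frequent"

# Always predicts the most frequent class seen during fit (here: 0, non-hazardous)
majority = DummyClassifier(strategy="most_frequent").fit(X_dummy, y_true)
majority_acc = accuracy_score(y_true, majority.predict(X_dummy))
print(f"Majority-class accuracy: {majority_acc:.4f}")  # 672 / 727
```

On these counts the trivial model reaches about 92.4% accuracy without ever predicting a hazardous object, which is why the per-class metrics below carry more weight.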

**Confusion Matrix**: The confusion matrix provides a detailed view of the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). In this case:
   1. 599 instances were correctly classified as non-hazardous (TN).
   2. 30 instances were correctly classified as hazardous (TP).
   3. 73 instances were classified as hazardous but were actually non-hazardous (FP).
   4. 25 instances were classified as non-hazardous but were actually hazardous (FN).
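scikit-learn's `confusion_matrix` puts actual classes in rows and predicted classes in columns, so for a binary problem `.ravel()` yields the four counts in the order TN, FP, FN, TP. The sketch below rebuilds label and prediction arrays that reproduce the baseline matrix (these arrays are stand-ins for `y_valid` and `y_pred_xgb_baseline`, not the real data).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Arrays rebuilt to reproduce the baseline matrix (assumption: stand-ins for the real labels/predictions)
y_true = np.array([0] * 672 + [1] * 55)
y_pred = np.array([0] * 599 + [1] * 73 + [0] * 25 + [1] * 30)

# Rows = actual class, columns = predicted class; ravel() flattens in row order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 599 73 25 30
```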

**Classification Report**:

**Precision**: Precision is the ratio of true positives (TP) to the sum of true positives and false positives (FP). It measures the proportion of correct positive predictions out of all positive predictions made by the model.
   1. For class 0 (non-hazardous): The model has a precision of 96%, meaning that 96% of the instances predicted as non-hazardous were indeed non-hazardous.
   2. For class 1 (hazardous): The model has a precision of 29.13%, meaning that around 29.13% of the instances predicted as hazardous were actually hazardous.
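Both per-class precision values follow directly from the baseline confusion matrix. For class 0, the roles swap: a correct non-hazardous prediction (TN) counts as the "hit" and a missed hazardous object (FN) as the "false alarm". A minimal check using the counts reported above:

```python
# Counts from the baseline confusion matrix above
tn, fp, fn, tp = 599, 73, 25, 30

# Precision = correct predictions of a class / all predictions of that class
precision_hazardous = tp / (tp + fp)      # 30 / 103
precision_non_hazardous = tn / (tn + fn)  # 599 / 624
print(round(precision_hazardous, 4), round(precision_non_hazardous, 4))  # 0.2913 0.9599
```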

**Recall**: Recall is the ratio of true positives (TP) to the sum of true positives and false negatives (FN). It measures the proportion of actual positive instances that were correctly identified by the model.
   1. For class 0 (non-hazardous): The model has a recall of 89%, meaning that it correctly identified 89% of the non-hazardous instances.
   2. For class 1 (hazardous): The model has a recall of 54.55%, meaning that it correctly identified 54.55% of the hazardous instances.
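Recall divides by the actual class size instead of the number of predictions. Using the same baseline counts (for class 0, TN plays the role of TP and FP the role of FN):

```python
# Counts from the baseline confusion matrix above
tn, fp, fn, tp = 599, 73, 25, 30

# Recall = correctly identified instances of a class / all actual instances of that class
recall_hazardous = tp / (tp + fn)      # 30 / 55
recall_non_hazardous = tn / (tn + fp)  # 599 / 672
print(round(recall_hazardous, 4), round(recall_non_hazardous, 4))  # 0.5455 0.8914
```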

**F1-score**: The F1-score is the harmonic mean of precision and recall. Because it is high only when both precision and recall are high, it is more informative than accuracy on imbalanced datasets.
   1. For class 0 (non-hazardous): The model has an F1-score of 92%, which indicates a good balance between precision and recall for the non-hazardous class.
   2. For class 1 (hazardous): The model has an F1-score of 37.97%, which is low, pulled down mainly by the poor precision (29.13%).
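The hazardous-class F1 can be reproduced three equivalent ways: as the harmonic mean of precision and recall, in count form as 2TP / (2TP + FP + FN), and with scikit-learn's `f1_score` on label arrays rebuilt from the baseline counts (the arrays are stand-ins, not the real validation data).

```python
import numpy as np
from sklearn.metrics import f1_score

tn, fp, fn, tp = 599, 73, 25, 30
precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Harmonic mean of precision and recall
f1_harmonic = 2 * precision * recall / (precision + recall)
# Equivalent count form: 2TP / (2TP + FP + FN) = 60 / 158
f1_counts = 2 * tp / (2 * tp + fp + fn)

# Cross-check with scikit-learn on arrays rebuilt from the same counts
y_true = np.array([0] * (tn + fp) + [1] * (fn + tp))
y_pred = np.array([0] * tn + [1] * fp + [0] * fn + [1] * tp)
f1_sklearn = f1_score(y_true, y_pred)
print(round(f1_harmonic, 4), round(f1_counts, 4), round(f1_sklearn, 4))  # 0.3797 0.3797 0.3797
```

For comparison, the arithmetic mean of the same precision and recall would be about 0.418; the harmonic mean sits closer to the weaker of the two values.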

**Support**: Support is the number of actual occurrences of each class in the validation set.
   1. For class 0 (non-hazardous): There are 672 instances.
   2. For class 1 (hazardous): There are 55 instances.
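Support is simply a per-class count of the true labels. On the real data, `np.bincount(y_valid.values.ravel())` would produce it, assuming the labels are the integers 0 and 1; the sketch below uses an array rebuilt from the reported counts.

```python
import numpy as np

# Labels rebuilt from the reported support (assumption: stands in for y_valid)
y_true = np.array([0] * 672 + [1] * 55)

support = np.bincount(y_true)   # count of each integer label, indexed by label
share = support / support.sum()  # class proportions
print(support, share.round(3))
```

The hazardous class makes up only about 7.6% of the validation set.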

**Precision Score**: The precision score for the positive (hazardous) class is 29.13%, which indicates that out of all the instances predicted as hazardous, only 29.13% were actually hazardous.

**Recall Score**: The recall score for the positive (hazardous) class is 54.55%, which indicates that out of all the actual hazardous instances, the model correctly identified 54.55% of them.

**F1 Score**: The F1 score for the positive (hazardous) class is 37.97%, the harmonic mean of its precision (29.13%) and recall (54.55%).

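**ROC AUC Score**: The ROC AUC score is 71.84%. ROC AUC measures how well a model ranks positive instances above negative ones: 100% indicates a perfect classifier, while 50% is no better than random guessing. Note, however, that `roc_auc_score` was given hard 0/1 predictions rather than predicted probabilities, so this value reduces to the mean of the two class recalls, (0.891 + 0.545) / 2 ≈ 0.718 (the balanced accuracy), and reflects only the default decision threshold. With that caveat, 71.84% suggests moderate discrimination power.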
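With hard 0/1 predictions the ROC curve has only one point between (0, 0) and (1, 1), at (FPR, TPR), so the area under it works out to (TPR + TNR) / 2, which is exactly balanced accuracy. The sketch below confirms this on arrays rebuilt from the baseline confusion matrix (stand-ins for `y_valid` and `y_pred_xgb_baseline`).

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Arrays rebuilt from the baseline confusion matrix (assumption: stand-ins for the real data)
y_true = np.array([0] * 672 + [1] * 55)
y_pred = np.array([0] * 599 + [1] * 73 + [0] * 25 + [1] * 30)

# Hard labels: a single interior ROC point, so AUC = (TPR + TNR) / 2
auc_hard = roc_auc_score(y_true, y_pred)
bal_acc = balanced_accuracy_score(y_true, y_pred)
print(round(auc_hard, 4), round(bal_acc, 4))  # 0.7184 0.7184
```

For a threshold-independent AUC on the real validation set, the predicted probability of the hazardous class would be passed instead, e.g. `roc_auc_score(y_valid, xgb_baseline_model.predict_proba(X_valid)[:, 1])`. That value was not computed in this notebook, so no figure is reported for it here.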

The XGBoost baseline model performs better than the Random Forest baseline model in terms of accuracy, precision, F1 score, and ROC AUC score. However, it still struggles with the hazardous class: fewer than a third of its hazardous predictions are correct, and it misses 25 of the 55 hazardous objects. This is likely related to the class imbalance (only 55 of the 727 validation instances, about 7.6%, are hazardous), and further optimization may help improve performance on the hazardous class.

## Hyperparameter Tuning using GridSearchCV:

In [None]:
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_child_weight': [1, 2, 4],
    'learning_rate': [0.1, 0.01, 0.001],
    'gamma': [0, 0.1, 0.2],
    'subsample': [0.5, 0.8, 1],
    'colsample_bytree': [0.5, 0.8, 1]
}

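# No `scoring` is given, so candidates are ranked by accuracy (the classifier's default score).
# With cv=5 and a classifier, GridSearchCV uses stratified 5-fold splits.
grid_search_xgb = GridSearchCV(
    estimator=xgb.XGBClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,
    verbose=0,
    n_jobs=-1,
)
grid_search_xgb.fit(X_train, y_train.values.ravel())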
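Grid search is exhaustive, so its cost grows multiplicatively with every value added to the grid. scikit-learn's `ParameterGrid` can count the candidates before any fitting happens; the sketch repeats the grid from the cell above.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the GridSearchCV cell above
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_child_weight': [1, 2, 4],
    'learning_rate': [0.1, 0.01, 0.001],
    'gamma': [0, 0.1, 0.2],
    'subsample': [0.5, 0.8, 1],
    'colsample_bytree': [0.5, 0.8, 1],
}

n_candidates = len(ParameterGrid(param_grid))  # product of the list lengths
n_fits = n_candidates * 5                      # one fit per candidate per fold (cv=5)
print(n_candidates, n_fits)  # 2916 14580
```

At roughly 14,600 model fits, a `RandomizedSearchCV` with a fixed `n_iter` budget is a common cheaper alternative when the grid gets this large.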


In [None]:
best_params = grid_search_xgb.best_params_
print("Best Parameters: ", best_params)
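Because the search above ranks candidates by accuracy, it can favour settings that lean towards the majority class. Passing `scoring="f1"` makes `GridSearchCV` rank by positive-class F1 instead. The sketch below is self-contained: a synthetic imbalanced dataset from `make_classification` and a `DecisionTreeClassifier` stand in for the real training data and `xgb.XGBClassifier`, and the small grid is purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Hypothetical imbalanced toy data (about 8% positives), standing in for X_train / y_train
X_toy, y_toy = make_classification(
    n_samples=1000, n_features=10, weights=[0.92], random_state=42
)

search = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=42),  # stand-in for xgb.XGBClassifier
    param_grid={"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5]},
    scoring="f1",  # rank candidates by positive-class F1 instead of accuracy
    cv=5,          # stratified folds, since the estimator is a classifier
)
search.fit(X_toy, y_toy)
print(search.best_params_, round(search.best_score_, 3))
```

In the notebook, the same change amounts to adding `scoring="f1"` (or `"roc_auc"`, which uses predicted probabilities) to the `GridSearchCV` call.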

In [10]:
tuned_xgb_model = xgb.XGBClassifier(
    colsample_bytree=0.5,
    gamma=0.2,
    learning_rate=0.1,
    max_depth=20,
    min_child_weight=1,
    n_estimators=200,
    subsample=0.8,
    random_state=42
)

tuned_xgb_model.fit(X_train, y_train.values.ravel())

y_pred_tuned_xgb = tuned_xgb_model.predict(X_valid)

print("Tuned XGBoost Model Accuracy: ", accuracy_score(y_valid, y_pred_tuned_xgb))
print("Tuned XGBoost Confusion Matrix: ")
print(confusion_matrix(y_valid, y_pred_tuned_xgb))
print("\nTuned XGBoost Classification Report:\n", classification_report(y_valid, y_pred_tuned_xgb))
print("\nTuned XGBoost Precision Score:", precision_score(y_valid, y_pred_tuned_xgb))
print("\nTuned XGBoost Recall Score:", recall_score(y_valid, y_pred_tuned_xgb))
print("\nTuned XGBoost F1 Score:", f1_score(y_valid, y_pred_tuned_xgb))
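# Hard 0/1 predictions make this equal to balanced accuracy; predict_proba(X_valid)[:, 1] gives a threshold-independent AUC
print("\nTuned XGBoost ROC AUC Score:", roc_auc_score(y_valid, y_pred_tuned_xgb))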


Tuned XGBoost Model Accuracy:  0.8638239339752407
Tuned XGBoost Confusion Matrix: 
[[600  72]
 [ 27  28]]

Tuned XGBoost Classification Report:
               precision    recall  f1-score   support

           0       0.96      0.89      0.92       672
           1       0.28      0.51      0.36        55

    accuracy                           0.86       727
   macro avg       0.62      0.70      0.64       727
weighted avg       0.91      0.86      0.88       727


Tuned XGBoost Precision Score: 0.28

Tuned XGBoost Recall Score: 0.509090909090909

Tuned XGBoost F1 Score: 0.36129032258064514

Tuned XGBoost ROC AUC Score: 0.7009740259740259


Here is the interpretation of the performance metrics for the tuned XGBoost model on the validation dataset:

**Accuracy**: The accuracy score is 86.38%, slightly below the baseline's 86.52%, meaning the model classifies about 86% of validation instances correctly. As with the baseline, accuracy can be misleading on imbalanced datasets, so the other metrics matter more.

**Confusion Matrix**: The confusion matrix provides a detailed view of the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). In this case:
   1. 600 instances were correctly classified as non-hazardous (TN).
   2. 28 instances were correctly classified as hazardous (TP).
   3. 72 instances were classified as hazardous but were actually non-hazardous (FP).
   4. 27 instances were classified as non-hazardous but were actually hazardous (FN).
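Since both confusion matrices are available, the two models can be compared metric by metric straight from the counts. A minimal sketch; `metrics_from_counts` is a hypothetical helper written for this comparison, and "balanced_acc" is the quantity that `roc_auc_score` reported when given hard predictions.

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Accuracy and hazardous-class metrics computed directly from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "balanced_acc": (recall + tn / (tn + fp)) / 2,
    }

baseline = metrics_from_counts(599, 73, 25, 30)
tuned = metrics_from_counts(600, 72, 27, 28)

for name in baseline:
    diff = tuned[name] - baseline[name]
    print(f"{name:>12}: baseline {baseline[name]:.4f}  tuned {tuned[name]:.4f}  diff {diff:+.4f}")
```

Every difference comes out negative: the tuned model trades two hazardous detections for one fewer false alarm.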

**Classification Report**:

**Precision**: Precision is the ratio of true positives (TP) to the sum of true positives and false positives (FP). It measures the proportion of correct positive predictions out of all positive predictions made by the model.
   1. For class 0 (non-hazardous): The model has a precision of 96%, meaning that 96% of the instances predicted as non-hazardous were indeed non-hazardous.
   2. For class 1 (hazardous): The model has a precision of 28%, meaning that only 28% of the instances predicted as hazardous were actually hazardous.

**Recall**: Recall is the ratio of true positives (TP) to the sum of true positives and false negatives (FN). It measures the proportion of actual positive instances that were correctly identified by the model.
   1. For class 0 (non-hazardous): The model has a recall of 89%, meaning that it correctly identified 89% of the non-hazardous instances.
   2. For class 1 (hazardous): The model has a recall of 50.91%, meaning that it correctly identified around 51% of the hazardous instances.

**F1-score**: The F1-score is the harmonic mean of precision and recall. Because it is high only when both precision and recall are high, it is more informative than accuracy on imbalanced datasets.
   1. For class 0 (non-hazardous): The model has an F1-score of 92%, which indicates a good balance between precision and recall for the non-hazardous class.
   2. For class 1 (hazardous): The model has an F1-score of 36.13%, which is low, pulled down mainly by the poor precision (28%).

**Precision Score**: The precision score for the positive (hazardous) class is 28%, which indicates that out of all the instances predicted as hazardous, only 28% were actually hazardous.

**Recall Score**: The recall score for the positive (hazardous) class is 50.91%, which indicates that out of all the actual hazardous instances, the model correctly identified around 51% of them.

**F1 Score**: The F1 score for the positive (hazardous) class is 36.13%, the harmonic mean of its precision (28%) and recall (50.91%).

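**ROC AUC Score**: The ROC AUC score is 70.10%, down from the baseline's 71.84%. ROC AUC measures how well a model ranks positive instances above negative ones: 100% indicates a perfect classifier, while 50% is no better than random guessing. As with the baseline, `roc_auc_score` received hard 0/1 predictions, so the value equals the mean of the two class recalls, (0.893 + 0.509) / 2 ≈ 0.701, and reflects only the default decision threshold. With that caveat, 70% suggests moderate discrimination power.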

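The tuned XGBoost model performs well on the non-hazardous class but still struggles with the hazardous class. Tuning did not improve on the baseline: the tuned model is slightly worse on every hazardous-class metric (precision 28% vs 29.13%, recall 50.91% vs 54.55%, F1 36.13% vs 37.97%, ROC AUC 70.10% vs 71.84%) and on accuracy (86.38% vs 86.52%). The differences are small (two fewer hazardous objects detected), but one plausible reason is that the grid search ranked candidates by accuracy, its default score, which rewards favouring the majority class. Tuning against a metric such as F1, combined with techniques that address the class imbalance (only 55 of 727 validation instances are hazardous), such as oversampling or undersampling the training data, might improve performance on the hazardous class.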
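Random oversampling is the simplest of the resampling techniques mentioned above: minority-class rows are duplicated (sampled with replacement) until both classes are the same size. The sketch below uses `sklearn.utils.resample`; `oversample_minority` is a hypothetical helper, and it should only ever be applied to the training split, so that the validation and test sets keep their true class distribution.

```python
import numpy as np
import pandas as pd
from sklearn.utils import resample


def oversample_minority(X, y, random_state=42):
    """Randomly duplicate minority-class rows until both classes have equal counts.

    Apply to the training split only; validation/test data must stay untouched.
    """
    # Accept y as a Series or a single-column DataFrame (like y_train) and align it with X
    y = pd.Series(np.asarray(y).ravel(), index=X.index, name="target")
    counts = y.value_counts()
    minority, majority = counts.idxmin(), counts.idxmax()

    # Sample minority rows with replacement up to the majority-class size
    X_up, y_up = resample(
        X[y == minority], y[y == minority],
        replace=True, n_samples=int(counts[majority]), random_state=random_state,
    )
    X_bal = pd.concat([X[y == majority], X_up], ignore_index=True)
    y_bal = pd.concat([y[y == majority], y_up], ignore_index=True)
    return X_bal, y_bal


# Toy usage: 17 negatives, 3 positives
X_toy = pd.DataFrame({"feature": np.arange(20.0)})
y_toy = pd.Series([0] * 17 + [1] * 3)
X_bal, y_bal = oversample_minority(X_toy, y_toy)
print(y_bal.value_counts().sort_index().to_dict())
```

In the notebook this would look like `X_train_bal, y_train_bal = oversample_minority(X_train, y_train)` followed by fitting on the balanced data and evaluating on the unchanged `X_valid` / `y_valid`. Because the duplicated rows are exact copies, the model can overfit to them, so the validation metrics remain the real test of whether it helps.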