![Image One](One.png)


**_The key to success in any organization is attracting and retaining top talent. This analysis helps an HR analyst determine which factors keep employees at the company and which prompt others to leave. Knowing these factors, the HR analyst can make changes that prevent the loss of good people._**

<div style="text-align: center; background-color: #856ff8; padding: 10px;">
    <h2 style="font-weight: bold;">OUTLINE</h2>
</div>

- Importing Various Modules
- Loading Dataset
- Train and Test
- Define and run hyperparameter tuning for XGBoost
- Train XGBoost with the best parameters and evaluate

<div style="text-align: center; background-color: yellow; padding: 10px;">
    <h2 style="font-weight: bold;">Importing Various Modules</h2>
</div>

In [1]:
import pandas as pd
import numpy as np
import re
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import precision_recall_curve, roc_curve, auc, accuracy_score
from xgboost import XGBClassifier
import matplotlib.pyplot as plt
from tabulate import tabulate

<div style="text-align: center; background-color: yellow; padding: 10px;">
    <h2 style="font-weight: bold;">Loading Dataset</h2>
</div>

In [2]:
df_balanced = pd.read_csv('final_balanced.csv')

<div style="text-align: center; background-color: yellow; padding: 10px;">
    <h2 style="font-weight: bold;">Train and Test</h2>
</div>

In [3]:
new_columns = [re.sub(r'\W', '_', col) for col in df_balanced.columns]
df_balanced.columns = new_columns

X = df_balanced.drop('Attrition', axis=1)
y = df_balanced['Attrition']

# Encode categorical feature columns as integers
label_encoder = LabelEncoder()
cat_cols = X.select_dtypes(include=['object']).columns
for col in cat_cols:
    X[col] = label_encoder.fit_transform(X[col])

# Encode the target: 'No' -> 0, 'Yes' -> 1
y = label_encoder.fit_transform(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
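
The cell above reuses a single `LabelEncoder` for every column, so after the loop only the last fitted mapping is still available on the object. The following is a minimal sketch, not the notebook's method, of keeping one encoder per column so each category-to-integer mapping can be inspected or inverted later. The toy DataFrame and its column names (`Department`, `OverTime`, `Age`) are assumptions made for illustration; the real columns come from `final_balanced.csv`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame (assumption); not the notebook's data
toy = pd.DataFrame({
    "Department": ["Sales", "HR", "Sales", "R&D"],
    "OverTime": ["Yes", "No", "No", "Yes"],
    "Age": [34, 29, 41, 38],
})

# Categorical columns are listed explicitly to keep the sketch simple
toy_cat_cols = ["Department", "OverTime"]

encoders = {}          # one fitted LabelEncoder per column
encoded = toy.copy()
for col in toy_cat_cols:
    enc = LabelEncoder()
    encoded[col] = enc.fit_transform(toy[col])
    encoders[col] = enc

# LabelEncoder assigns integers in sorted order of the class labels
mapping = {
    col: {str(label): idx for idx, label in enumerate(enc.classes_)}
    for col, enc in encoders.items()
}
print(mapping)
```

With the encoders stored in a dictionary, `encoders["OverTime"].inverse_transform(...)` recovers the original labels for any encoded column.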


<div style="text-align: center; background-color: yellow; padding: 10px;">
    <h2 style="font-weight: bold;">Define and run hyperparameter tuning for XGBoost</h2>
</div>

In [4]:
# Define the parameter grid for XGBoost
param_grid = {
    'learning_rate': [.01,.001],
    'n_estimators': [128,256,1024,2048],
}

# Initialize the XGBoost classifier
xgb_clf = XGBClassifier()

# Initialize GridSearchCV
grid_search = GridSearchCV(estimator=xgb_clf, param_grid=param_grid,
                           scoring='accuracy', cv=3, verbose=1, n_jobs=-1)

# Fit the GridSearchCV on the training data
grid_search.fit(X_train_scaled, y_train)

# Get the best parameters
best_params = grid_search.best_params_
print("Best parameters found: ", best_params)


Fitting 3 folds for each of 8 candidates, totalling 24 fits
Best parameters found:  {'learning_rate': 0.01, 'n_estimators': 2048}
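
The search above reports only the winning combination. A short sketch of how the full set of candidates could be compared through `cv_results_` follows. To keep it fast and dependent on scikit-learn only, it swaps in `GradientBoostingClassifier` and synthetic data from `make_classification`; both are assumptions for illustration, as are the names `X_demo`, `y_demo`, `demo_grid`, and `demo_search`. The same `cv_results_` lookup applies to the notebook's `grid_search` object.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption); the notebook uses X_train_scaled, y_train
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)

# Deliberately tiny grid so the sketch runs in seconds
demo_grid = {"learning_rate": [0.1, 0.01], "n_estimators": [16, 32]}

demo_search = GridSearchCV(
    estimator=GradientBoostingClassifier(random_state=42),
    param_grid=demo_grid,
    scoring="accuracy",
    cv=3,
)
demo_search.fit(X_demo, y_demo)

# cv_results_ holds one entry per candidate; keep the most useful columns
results = (
    pd.DataFrame(demo_search.cv_results_)[
        ["param_learning_rate", "param_n_estimators",
         "mean_test_score", "std_test_score", "rank_test_score"]
    ]
    .sort_values("rank_test_score")
    .reset_index(drop=True)
)
print(results)
```

Looking at `mean_test_score` next to `std_test_score` shows whether the best candidate is clearly ahead or within noise of the runners-up.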


<div style="text-align: center; background-color: yellow; padding: 10px;">
    <h2 style="font-weight: bold;">Train XGBoost with the best parameters and evaluate</h2>
</div>

In [5]:
# GridSearchCV (refit=True by default) has already refit the best model on the
# full training set, so best_estimator_ can be used directly
best_xgb_clf = grid_search.best_estimator_

# Predict probabilities for positive class
y_scores_best = best_xgb_clf.predict_proba(X_test_scaled)[:, 1]

# Compute precision, recall and thresholds for precision-recall curve
precisions_best, recalls_best, thresholds_pr_best = precision_recall_curve(y_test, y_scores_best)

# Compute fpr, tpr and thresholds for ROC curve
fpr_best, tpr_best, thresholds_roc_best = roc_curve(y_test, y_scores_best)

# Compute AUC for ROC curve
roc_auc_best = auc(fpr_best, tpr_best)

# Calculate accuracy
y_pred_best = best_xgb_clf.predict(X_test_scaled)
accuracy_best = accuracy_score(y_test, y_pred_best)

# Print the evaluation metrics in tabular format
table = [["Metric", "Value"],
         ["Accuracy", f"{accuracy_best*100}"],
         ["AUC", f"{roc_auc_best*100}"]]

print(tabulate(table, headers="firstrow", tablefmt="grid"))

+----------+---------+
| Metric   |   Value |
+==========+=========+
| Accuracy | 92.8687 |
+----------+---------+
| AUC      | 97.3814 |
+----------+---------+
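
The evaluation cell computes the precision-recall and ROC arrays but does not draw them. Below is a minimal sketch of one way to plot both curves side by side. The helper name `plot_curves` and the four-sample toy labels and scores are assumptions for illustration, not part of the notebook.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, roc_curve, auc


def plot_curves(y_true, y_scores):
    """Draw precision-recall and ROC curves; return the figure and ROC AUC."""
    precisions, recalls, _ = precision_recall_curve(y_true, y_scores)
    fpr, tpr, _ = roc_curve(y_true, y_scores)
    roc_auc = auc(fpr, tpr)

    fig, (ax_pr, ax_roc) = plt.subplots(1, 2, figsize=(10, 4))

    # Left panel: precision as a function of recall
    ax_pr.plot(recalls, precisions)
    ax_pr.set_xlabel("Recall")
    ax_pr.set_ylabel("Precision")
    ax_pr.set_title("Precision-Recall curve")

    # Right panel: ROC curve with the chance diagonal for reference
    ax_roc.plot(fpr, tpr, label=f"AUC = {roc_auc:.3f}")
    ax_roc.plot([0, 1], [0, 1], linestyle="--", label="Chance")
    ax_roc.set_xlabel("False positive rate")
    ax_roc.set_ylabel("True positive rate")
    ax_roc.set_title("ROC curve")
    ax_roc.legend()

    fig.tight_layout()
    return fig, roc_auc


# Toy labels and scores (assumption), used only to exercise the helper
toy_true = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
fig, toy_auc = plot_curves(toy_true, toy_scores)
print(f"Toy AUC: {toy_auc:.2f}")
```

In the notebook itself, the call would be `plot_curves(y_test, y_scores_best)`, followed by `plt.show()` if the figure does not render automatically.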
