## Income Classifier Evaluation

This notebook reloads the tuned XGBoost model, evaluates it on the held-out test split, and reports headline metrics to assess deployment readiness.


### Load Cleaned Dataset

We load the preprocessed feature matrix so that evaluation runs on the same features the training pipeline produced.


In [1]:
import pandas as pd

# Load the cleaned dataset
data = pd.read_csv('../data/cleaned.csv')

# Check the first few rows
print(data.head())

   age  education  capital-gain  capital-loss  hours-per-week  income  \
0   25          1      0.000000           0.0              40       0   
1   38         11      0.000000           0.0              50       0   
2   28          7      0.000000           0.0              40       1   
3   44         15      8.947546           0.0              40       1   
4   18         15      0.000000           0.0              30       0   

   workclass_Federal-gov  workclass_Local-gov  workclass_Never-worked  \
0                  False                False                   False   
1                  False                False                   False   
2                  False                 True                   False   
3                  False                False                   False   
4                  False                False                   False   

   workclass_Private  ...  native-country_South  native-country_Taiwan  \
0               True  ...                 False 


### Recreate Train/Test Split

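We rebuild the 80/20 split with the same fixed seed (`random_state=42`). Provided training used the same seed and the same row order in `cleaned.csv`, the test rows here are the ones the model never saw, so evaluation metrics stay comparable.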


In [2]:
from sklearn.model_selection import train_test_split

# Features (X) and target (y)
X = data.drop(columns=['income'])  # Assuming 'income' is the target column
y = data['income']  # The target column

# Train-test split (80-20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Check the shape of the split data
print(f"Training data: {X_train.shape}, Test data: {X_test.shape}")

Training data: (25570, 91), Test data: (6393, 91)
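
The printed shapes can be checked by hand. The sketch below assumes that, for a fractional `test_size`, scikit-learn rounds the test count up and assigns the remaining rows to the training set. `split_sizes` is a hypothetical helper written for this check, not part of scikit-learn.

```python
import math
from typing import Tuple


def split_sizes(n_samples: int, test_size: float) -> Tuple[int, int]:
    """Return (n_train, n_test), rounding the test count up (assumed behavior)."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# Total rows implied by the shapes printed above: 25570 + 6393
n_total = 25570 + 6393
n_train, n_test = split_sizes(n_total, 0.2)
print(f"total={n_total}, train={n_train}, test={n_test}")
```

If the numbers from this helper ever disagree with the printed shapes, the most likely cause is that `cleaned.csv` has changed since the model was trained.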


### Load Saved Model

We fetch the persisted XGBoost model so we can evaluate exactly what will be used in production scoring.


In [3]:
import joblib

# Load the saved best model
best_model = joblib.load('../models/best_xgboost_model.pkl')

# Reaching this line means joblib.load returned without raising an exception
print("Model loaded successfully.")

Model loaded successfully.
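
A successful load does not guarantee that the model's expected features match the columns of `X_test`. The sketch below shows one way to compare the two lists. `compare_columns` is a hypothetical helper, and the column lists are shortened examples taken from the `head()` output. In the notebook, the expected list might come from something like `best_model.get_booster().feature_names`, assuming the pickled object is a plain XGBoost scikit-learn estimator.

```python
from typing import List, Sequence, Tuple


def compare_columns(
    expected: Sequence[str], actual: Sequence[str]
) -> Tuple[List[str], List[str], bool]:
    """Return (missing, unexpected, same_order) for two column lists."""
    expected_set, actual_set = set(expected), set(actual)
    missing = [c for c in expected if c not in actual_set]
    unexpected = [c for c in actual if c not in expected_set]
    same_order = list(expected) == list(actual)
    return missing, unexpected, same_order


# Illustrative lists only: same columns, but two of them swapped
expected_cols = ["age", "education", "capital-gain", "capital-loss", "hours-per-week"]
actual_cols = ["age", "education", "capital-loss", "capital-gain", "hours-per-week"]

missing, unexpected, same_order = compare_columns(expected_cols, actual_cols)
print(f"missing={missing}, unexpected={unexpected}, same_order={same_order}")
```

Checking order as well as membership matters because one-hot encoded columns can come out in a different order when the preprocessing step is rerun.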


### Generate Predictions and Metrics

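We compute accuracy along with precision, recall, and F1 score, each averaged with `average='weighted'` (per-class scores weighted by class support), to summarize performance across the imbalanced classes. Weighted recall always equals accuracy, so it is expected that the two values coincide in the output.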


In [4]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Predict on the test set
y_pred = best_model.predict(X_test)

# Evaluate the model's performance
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

# Print the evaluation results
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")


Accuracy: 0.8628
Precision: 0.8582
Recall: 0.8628
F1 Score: 0.8587
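
Accuracy and recall print the same value because support-weighted recall reduces algebraically to accuracy. The from-scratch sketch below illustrates this on a small made-up label set. `weighted_metrics` is a hypothetical helper for illustration only. It is not how scikit-learn computes these scores internally, and the toy labels are not drawn from the dataset.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy and support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label, n_label in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_predicted = sum(p == label for p in y_pred)
        p_k = tp / n_predicted if n_predicted else 0.0  # per-class precision
        r_k = tp / n_label                              # per-class recall
        f_k = 2 * p_k * r_k / (p_k + r_k) if (p_k + r_k) else 0.0
        weight = n_label / n                            # class share of the samples
        precision += weight * p_k
        recall += weight * r_k
        f1 += weight * f_k

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy imbalanced example: six negatives, two positives
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1]

metrics = weighted_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the weighted averages are dominated by the majority class, per-class scores (for example from `sklearn.metrics.classification_report`) are worth inspecting alongside them before judging performance on the minority class.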
