**FraudGuard: Credit Card Transaction Anomaly Detection**

This project addresses a binary classification problem in which credit card transactions are labeled as fraudulent or genuine. A major challenge is the extreme class imbalance: fraud cases represent only about 0.17% of all transactions, which makes accuracy an unsuitable evaluation metric. Missing fraudulent transactions (false negatives) carries a high financial risk, while incorrectly flagging genuine transactions (false positives) hurts customer experience. As a result, model evaluation prioritizes high precision while maintaining strong recall, focusing on the precision–recall tradeoff. The solution uses the industry-standard "creditcard.csv" dataset, which contains anonymized PCA-transformed features (V1–V28) to preserve confidentiality, alongside the raw Time and Amount features, with the target variable Class indicating fraud (1) or genuine (0).
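
To make the accuracy problem concrete, here is a minimal sketch with made-up counts (100,000 transactions at a 0.17% fraud rate; the helper name `basic_metrics` is hypothetical and not part of the notebook). A model that labels every transaction as genuine scores very high accuracy while catching no fraud at all.

```python
def basic_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision and recall are reported as 0.0 here when their denominator is 0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Illustrative numbers only, not taken from creditcard.csv.
n_total = 100_000
n_fraud = 170  # 0.17% of n_total

# A "model" that predicts genuine (0) for every transaction.
accuracy, precision, recall = basic_metrics(
    tp=0, fp=0, fn=n_fraud, tn=n_total - n_fraud
)
print(f"accuracy={accuracy:.4f} precision={precision:.2f} recall={recall:.2f}")
# accuracy=0.9983 precision=0.00 recall=0.00
```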


Data Preprocessing & Feature Engineering

In [22]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import precision_score, recall_score, classification_report
from xgboost import XGBClassifier

In [23]:
df = pd.read_csv("creditcard.csv")


In [24]:
df.head()

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,0.090794,-0.5516,-0.617801,-0.99139,-0.311169,1.468177,-0.470401,0.207971,0.025791,0.403993,0.251412,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0.0
1,0.0,1.191857,0.266151,0.16648,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,-0.166974,1.612727,1.065235,0.489095,-0.143772,0.635558,0.463917,-0.114805,-0.183361,-0.145783,-0.069083,-0.225775,-0.638672,0.101288,-0.339846,0.16717,0.125895,-0.008983,0.014724,2.69,0.0
2,1.0,-1.358354,-1.340163,1.773209,0.37978,-0.503198,1.800499,0.791461,0.247676,-1.514654,0.207643,0.624501,0.066084,0.717293,-0.165946,2.345865,-2.890083,1.109969,-0.121359,-2.261857,0.52498,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0.0
3,1.0,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,-0.054952,-0.226487,0.178228,0.507757,-0.287924,-0.631418,-1.059647,-0.684093,1.965775,-1.232622,-0.208038,-0.1083,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.5,0.0
4,2.0,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,0.753074,-0.822843,0.538196,1.345852,-1.11967,0.175121,-0.451449,-0.237033,-0.038195,0.803487,0.408542,-0.009431,0.798278,-0.137458,0.141267,-0.20601,0.502292,0.219422,0.215153,69.99,0.0


Using StandardScaler to scale the raw features "Time" and "Amount"

In [25]:
scaler = StandardScaler()
df[['Amount', 'Time']] = scaler.fit_transform(df[['Amount', 'Time']])
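
Note that the cell above fits the scaler on the full dataset before any train/test split, so test-set statistics influence the scaled training features. A minimal sketch of the alternative, assuming the split happens first: learn the mean and standard deviation on training values only, then reuse them for the test values. The helper names and the toy amounts are hypothetical; the population standard deviation is used because that is what StandardScaler computes.

```python
from statistics import fmean, pstdev


def fit_standardizer(values):
    """Learn mean and population std from training values only."""
    mean = fmean(values)
    std = pstdev(values)
    # Guard against constant columns to avoid division by zero.
    return mean, (std if std > 0 else 1.0)


def apply_standardizer(values, mean, std):
    """Scale values with statistics learned elsewhere (e.g. on the train split)."""
    return [(v - mean) / std for v in values]


# Toy "Amount" values, purely illustrative.
train_amounts = [10.0, 20.0, 30.0, 40.0]
test_amounts = [25.0, 100.0]

mean, std = fit_standardizer(train_amounts)
train_scaled = apply_standardizer(train_amounts, mean, std)
test_scaled = apply_standardizer(test_amounts, mean, std)  # no refit on test data
print(mean, round(std, 4), [round(v, 4) for v in test_scaled])
```

With scikit-learn, the same idea is `scaler.fit` on the training rows followed by `scaler.transform` on both splits.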


Splitting the features and the target

In [26]:
X = df.drop('Class', axis=1)
y = df['Class']


Remove rows where y contains NaN before splitting

In [27]:
# Create a mask for non-NaN values in y
mask = ~np.isnan(y)
X_clean = X[mask]
y_clean = y[mask]


Splitting with train_test_split, stratified on the target so that both splits keep the same fraud rate

In [28]:
X_train, X_test, y_train, y_test = train_test_split(
    X_clean, y_clean, test_size=0.2, stratify=y_clean, random_state=42
)

MODEL SELECTION: The model I selected for this task is XGBoost, because it handles class imbalance well (for example through scale_pos_weight), allows precision to be tuned through the decision threshold, and performs strongly on tabular data.

After training a single XGBoost classifier on the dataset and evaluating its performance, I got a precision of 0.9054054054054054 and a recall of 0.8933333333333333. I was not satisfied with this, so I took a different approach: a two-stage fraud detection system. This architecture uses two sequential models to detect fraud more effectively by balancing high fraud capture (recall) with few false alarms (precision). Instead of relying on one model to do everything, the system splits the job:

Training and fitting the models

Model 1: High Recall Detector

The first model is trained with slightly deeper trees and uses a lower decision threshold, so it prioritizes recall.

In [29]:
stage1_model = XGBClassifier(
    n_estimators=200,
    max_depth=5,
    learning_rate=0.1,
    subsample=0.9,
    colsample_bytree=0.9,
    scale_pos_weight=len(y_train[y_train==0]) / len(y_train[y_train==1]),
    eval_metric='aucpr',
    random_state=42
)

stage1_model.fit(X_train, y_train)


Parameter,Value
objective,'binary:logistic'
base_score,
booster,
callbacks,
colsample_bylevel,
colsample_bynode,
colsample_bytree,0.9
device,
early_stopping_rounds,
enable_categorical,False
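
The `scale_pos_weight` value passed to Stage 1 is the ratio of genuine to fraudulent rows in the training labels, which tells XGBoost to weight each fraud example more heavily. A small sketch with made-up labels (the helper name `negative_to_positive_ratio` is hypothetical, and the counts are illustrative rather than taken from the dataset):

```python
def negative_to_positive_ratio(labels):
    """Return count(label == 0) / count(label == 1)."""
    n_pos = sum(1 for v in labels if v == 1)
    n_neg = sum(1 for v in labels if v == 0)
    if n_pos == 0:
        raise ValueError("no positive (fraud) examples in labels")
    return n_neg / n_pos


# Toy labels: 9,983 genuine and 17 fraudulent rows (0.17% fraud).
toy_labels = [0] * 9983 + [1] * 17
ratio = negative_to_positive_ratio(toy_labels)
print(round(ratio, 1))
```

A ratio in the hundreds means a missed fraud costs the model roughly that many times more than a misclassified genuine transaction during training, which pushes Stage 1 toward recall.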


Model 1 predictions

In [30]:
stage1_scores = stage1_model.predict_proba(X_test)[:, 1]
stage1_threshold = 0.20  # low threshold to maximize recall
stage1_preds = (stage1_scores >= stage1_threshold).astype(int)
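
To illustrate why a low threshold favors recall, the sketch below applies the two thresholds used in this notebook (0.20 and 0.75) to a handful of invented scores. The scores, labels, and the helper name `precision_recall_at` are hypothetical and only meant to show the direction of the tradeoff.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when predicting 1 for scores >= threshold."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented model scores and true labels (1 = fraud).
scores = [0.05, 0.10, 0.25, 0.30, 0.60, 0.80, 0.90, 0.95]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

low_p, low_r = precision_recall_at(scores, labels, 0.20)
high_p, high_r = precision_recall_at(scores, labels, 0.75)
print(f"threshold 0.20: precision={low_p:.2f} recall={low_r:.2f}")
print(f"threshold 0.75: precision={high_p:.2f} recall={high_r:.2f}")
# threshold 0.20: precision=0.67 recall=1.00
# threshold 0.75: precision=1.00 recall=0.75
```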


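Filtering transactions for Model 2: Stage 2 sees only the transactions that Stage 1 deems suspicious, so it can focus on high-precision fraud confirmation while mirroring real-world deployment. Note that the cell below takes these flagged rows from X_test, so Stage 2 is fit on test data; a leakage-free setup would build the Stage 2 training set from Stage 1's flags on the training split.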

In [31]:
X_stage2 = X_test[stage1_preds == 1]
y_stage2 = y_test[stage1_preds == 1]


Model 2: High Precision Filter

The second model uses shallower trees, stronger regularization (a lower learning rate and more aggressive subsampling), and a higher decision threshold, so it prioritizes precision.

In [32]:
stage2_model = XGBClassifier(
    n_estimators=300,
    max_depth=3,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    scale_pos_weight=1,  # precision-focused
    eval_metric='aucpr',
    random_state=42
)

stage2_model.fit(X_stage2, y_stage2)


Parameter,Value
objective,'binary:logistic'
base_score,
booster,
callbacks,
colsample_bylevel,
colsample_bynode,
colsample_bytree,0.8
device,
early_stopping_rounds,
enable_categorical,False


Model 2 predictions

In [33]:
stage2_scores = stage2_model.predict_proba(X_stage2)[:, 1]
stage2_threshold = 0.75  # high threshold for precision
stage2_preds = (stage2_scores >= stage2_threshold).astype(int)


In [34]:
final_predictions = np.zeros(len(X_test))
final_predictions[stage1_preds == 1] = stage2_preds
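
Because the cells above fit Stage 2 on rows filtered from X_test and then score those same rows, the final metrics are measured on data Stage 2 has already seen. One possible leakage-free arrangement, sketched here under stated assumptions, fits both stages on the training split only and uses the test split just for prediction. The helpers `fit_two_stage` and `predict_two_stage` and the `FeatureCutStub` class are hypothetical: the stub merely stands in for a real classifier such as XGBClassifier so the example runs without extra dependencies, and inputs are assumed to be plain lists of rows rather than DataFrames.

```python
class FeatureCutStub:
    """Hypothetical stand-in for a real classifier; cuts on one feature."""

    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y):
        pos = [row[self.feature] for row, label in zip(X, y) if label == 1]
        neg = [row[self.feature] for row, label in zip(X, y) if label == 0]
        # Decision cut: midpoint between the two class means.
        self.cut_ = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict_proba(self, X):
        # Hard 0/1 "probabilities" in [p(class 0), p(class 1)] row layout.
        return [[0.0, 1.0] if row[self.feature] >= self.cut_ else [1.0, 0.0]
                for row in X]


def fit_two_stage(stage1, stage2, X_train, y_train, t1):
    """Fit Stage 1 on all training rows, Stage 2 on the rows Stage 1 flags."""
    stage1.fit(X_train, y_train)
    s1 = [p[1] for p in stage1.predict_proba(X_train)]
    flagged = [i for i, s in enumerate(s1) if s >= t1]
    stage2.fit([X_train[i] for i in flagged], [y_train[i] for i in flagged])
    return stage1, stage2


def predict_two_stage(stage1, stage2, X, t1, t2):
    """Predict 1 only when Stage 1 flags a row and Stage 2 confirms it."""
    s1 = [p[1] for p in stage1.predict_proba(X)]
    flagged = [i for i, s in enumerate(s1) if s >= t1]
    final = [0] * len(X)
    if flagged:
        s2 = [p[1] for p in stage2.predict_proba([X[i] for i in flagged])]
        for i, s in zip(flagged, s2):
            final[i] = int(s >= t2)
    return final


# Toy data: feature 0 is a noisy fraud signal, feature 1 a cleaner one.
X_train = [[0.1, 0.0], [0.2, 0.1], [0.8, 0.2], [0.9, 0.9], [0.7, 0.8], [0.3, 0.1]]
y_train = [0, 0, 0, 1, 1, 0]
X_test = [[0.85, 0.1], [0.9, 0.95], [0.1, 0.9], [0.2, 0.0]]

stage1, stage2 = fit_two_stage(
    FeatureCutStub(feature=0), FeatureCutStub(feature=1), X_train, y_train, t1=0.20
)
final = predict_two_stage(stage1, stage2, X_test, t1=0.20, t2=0.75)
print(final)  # [0, 1, 0, 0]
```

In practice, Stage 1's scores on its own training rows tend to be optimistic, so out-of-fold Stage 1 predictions may give Stage 2 a more realistic training set.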


Evaluating the model

In [35]:
print("Final Precision:", precision_score(y_test, final_predictions))
print("Final Recall:", recall_score(y_test, final_predictions))
print(classification_report(y_test, final_predictions))

Final Precision: 1.0
Final Recall: 0.8933333333333333
              precision    recall  f1-score   support

         0.0       1.00      1.00      1.00     38193
         1.0       1.00      0.89      0.94        75

    accuracy                           1.00     38268
   macro avg       1.00      0.95      0.97     38268
weighted avg       1.00      1.00      1.00     38268
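
The reported numbers can be cross-checked with a little arithmetic. The sketch below recovers the fraud-class confusion counts from the recall and support printed above, then recomputes F1. The baseline false-positive count additionally assumes that the single-model baseline was evaluated on the same 75 test frauds and caught the same ones, which the text implies but does not state.

```python
support_fraud = 75                        # fraud rows in the test split (report)
final_recall = 0.8933333333333333         # reported two-stage recall
final_precision = 1.0                     # reported two-stage precision
baseline_precision = 0.9054054054054054   # reported single-model precision

tp = round(final_recall * support_fraud)  # frauds caught
fn = support_fraud - tp                   # frauds missed
f1 = 2 * final_precision * final_recall / (final_precision + final_recall)

# Assumption: the baseline caught the same tp frauds on the same test split.
baseline_fp = round(tp / baseline_precision) - tp

print(tp, fn, round(f1, 2), baseline_fp)
# 67 8 0.94 7
```

Under that assumption, the two-stage system catches the same 67 frauds as the single model and removes its 7 false alarms, while 8 frauds are still missed.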

