# Fraud & Cyber-Threat Prediction: End-to-End Notebook
Goal: Predict fraudulent or cyber-risky transactions using transaction metadata, behavioral features, and market stress indicators.

Author: Milani Chikeka  

Seed: 42

---
Sections
1. Setup & imports  
2. Load dataset (Kaggle `creditcard.csv` recommended) or simulate synthetic transactions  
3. Market stress synthetic enrichment (USD/ZAR returns, VIX proxy, repo rate changes)  
4. Feature engineering (behavioral + transaction + stress features)  
5. Train/test split & imbalance handling  
6. Models: Logistic Regression baseline + LightGBM (main)  
7. Evaluation: ROC, PR, confusion matrix, business metrics  
8. Explainability: SHAP plots  
9. Save model & preprocessing pipeline


In [None]:
# Setup and imports
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
import warnings  
warnings.filterwarnings('ignore')
from datetime import datetime, timedelta
# scikit-learn is installed as `scikit-learn` but imported as `sklearn`
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import (roc_auc_score, precision_recall_curve, average_precision_score,
                             confusion_matrix, classification_report)
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Additional libraries: print an install hint, then re-raise if one is missing
try:
    import lightgbm as lgb
except Exception as e:
    print("install lightgbm: pip install lightgbm")
    raise e

try:
    import shap 
except Exception as e:
    print("install shap: pip install shap")
    raise e

# Imbalance handling
try:
    from imblearn.over_sampling import SMOTE
except Exception as e:
    print("Install imbalanced-learn: pip install imbalanced-learn")    
    raise e

# Fix the NumPy seed for reproducibility
RandoomSeed = 42
np.random.seed(RandoomSeed)

# Plot styling
sns.set(style="whitegrid")

ModuleNotFoundError: No module named 'seaborn'
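
The `ModuleNotFoundError` above suggests that `seaborn` was not installed in the environment where the cell ran. As a minimal sketch (the `REQUIRED` mapping and the `missing_packages` helper are illustrative names, not part of the original notebook), the third-party imports of the setup cell could be checked up front so that all missing packages are reported at once:

```python
import importlib.util

# Import name -> pip package name for the third-party imports in the setup cell.
# This mapping is an assumption assembled from the imports shown above.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "joblib": "joblib",
    "sklearn": "scikit-learn",
    "lightgbm": "lightgbm",
    "shap": "shap",
    "imblearn": "imbalanced-learn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages; install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```

Running this before the setup cell gives a single install command instead of one failure per import.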

# 2. Load the data or create synthetic data
- Place the Kaggle dataset `creditcard.csv` at `./data/creditcard.csv` so it can be loaded.
- If the dataset is not present, create a synthetic transaction dataset instead.
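
The load-or-simulate logic described above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's actual implementation: `simulate_transactions` and `load_transactions` are hypothetical helper names, and the synthetic frame only mimics the `Time`, `Amount`, and `Class` columns of the Kaggle file (it omits the anonymized `V1`..`V28` features). The row count, fraud rate, and amount distribution are arbitrary placeholder choices.

```python
from pathlib import Path

import numpy as np
import pandas as pd

DATA_PATH = Path("./data/creditcard.csv")  # location named in the text above


def simulate_transactions(n_rows=10_000, fraud_rate=0.002, seed=42):
    """Build a small synthetic stand-in with Time, Amount and Class columns."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        # Seconds elapsed over two days, sorted to resemble a transaction log
        "Time": np.sort(rng.uniform(0, 172_800, n_rows)),
        # Right-skewed amounts; the lognormal parameters are placeholders
        "Amount": rng.lognormal(mean=3.0, sigma=1.2, size=n_rows).round(2),
        # 1 = fraud, 0 = legitimate, drawn independently at the given rate
        "Class": (rng.random(n_rows) < fraud_rate).astype(int),
    })


def load_transactions(path=DATA_PATH, **sim_kwargs):
    """Return (DataFrame, source), where source is 'kaggle' or 'synthetic'."""
    path = Path(path)
    if path.exists():
        return pd.read_csv(path), "kaggle"
    return simulate_transactions(**sim_kwargs), "synthetic"


df, source = load_transactions()
print(source, df.shape, f"fraud rate = {df['Class'].mean():.4%}")
```

Returning the source label alongside the frame makes it explicit in later cells whether results come from real or simulated data.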