**Credit Card Fraud Detection System**

> **Loading the Dataset**

In \[3\]:

    import pandas as pd

    # Load the dataset
    df = pd.read_csv('creditcard.csv')

    # Show the first few rows
    df.head()

Out\[3\]:

|     | Time | V1        | V2        | V3       | V4        | V5        | V6        | V7        | V8        | V9        | ... | V21       | V22       | V23       | V24       | V25       | V26       | V27       | V28       | Amount | Class |
|-----|------|-----------|-----------|----------|-----------|-----------|-----------|-----------|-----------|-----------|-----|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|--------|-------|
| 0   | 0.0  | -1.359807 | -0.072781 | 2.536347 | 1.378155  | -0.338321 | 0.462388  | 0.239599  | 0.098698  | 0.363787  | ... | -0.018307 | 0.277838  | -0.110474 | 0.066928  | 0.128539  | -0.189115 | 0.133558  | -0.021053 | 149.62 | 0     |
| 1   | 0.0  | 1.191857  | 0.266151  | 0.166480 | 0.448154  | 0.060018  | -0.082361 | -0.078803 | 0.085102  | -0.255425 | ... | -0.225775 | -0.638672 | 0.101288  | -0.339846 | 0.167170  | 0.125895  | -0.008983 | 0.014724  | 2.69   | 0     |
| 2   | 1.0  | -1.358354 | -1.340163 | 1.773209 | 0.379780  | -0.503198 | 1.800499  | 0.791461  | 0.247676  | -1.514654 | ... | 0.247998  | 0.771679  | 0.909412  | -0.689281 | -0.327642 | -0.139097 | -0.055353 | -0.059752 | 378.66 | 0     |
| 3   | 1.0  | -0.966272 | -0.185226 | 1.792993 | -0.863291 | -0.010309 | 1.247203  | 0.237609  | 0.377436  | -1.387024 | ... | -0.108300 | 0.005274  | -0.190321 | -1.175575 | 0.647376  | -0.221929 | 0.062723  | 0.061458  | 123.50 | 0     |
| 4   | 2.0  | -1.158233 | 0.877737  | 1.548718 | 0.403034  | -0.407193 | 0.095921  | 0.592941  | -0.270533 | 0.817739  | ... | -0.009431 | 0.798278  | -0.137458 | 0.141267  | -0.206010 | 0.502292  | 0.219422  | 0.215153  | 69.99  | 0     |

5 rows × 31 columns
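
Before any modelling, it is worth checking how imbalanced the `Class` column is, since that drives the resampling choices later on. The following is a minimal sketch on a small hypothetical frame standing in for `creditcard.csv`; the helper name `class_balance` is an assumption made for illustration and is not part of the notebook.

```python
import pandas as pd


def class_balance(df, target="Class"):
    """Return per-class row counts and shares for the target column."""
    counts = df[target].value_counts().sort_index()
    shares = counts / counts.sum()
    return pd.DataFrame({"count": counts, "share": shares})


# Hypothetical stand-in for the loaded dataset: 9 normal rows, 1 fraud row.
toy = pd.DataFrame({"Amount": [10.0] * 10, "Class": [0] * 9 + [1]})
summary = class_balance(toy)
print(summary)
```

Calling `class_balance(df)` on the real frame should work the same way, assuming the target column is named `Class` as in the table above.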

Key Features:

1.  Data Generation & Preprocessing
    - Generates realistic synthetic credit card transaction data (mimicking the structure of the Credit Card Fraud Dataset, with features V1 to V10, Amount, and Time)
    - Handles severely imbalanced data using SMOTE (Synthetic Minority Oversampling Technique) or random undersampling
    - Scales features and performs a stratified train-test split

2.  Model Training
    - Supports both Random Forest and Gradient Boosting classifiers
    - Preset hyperparameters chosen for fraud detection
    - Feature importance analysis

3.  Comprehensive Evaluation
    - Precision, Recall, and F1-score calculations
    - AUC-ROC score for model performance
    - Detailed classification report
    - Confusion matrix analysis

4.  Interactive Testing Interface

    Command-line interface with multiple testing options:

    - Test with sample transactions
    - Test with custom transaction inputs
    - Batch testing with multiple transactions
    - Feature importance visualization
    - Risk level classification (Low/Medium/High)
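
To make the SMOTE step listed above more concrete, here is a simplified NumPy sketch of the core idea: each synthetic fraud row is placed at a random point on the segment between a minority row and one of its nearest minority neighbours. This is an illustrative approximation, not the `imblearn` implementation the notebook actually calls; the function name `smote_like_oversample` and the toy points are assumptions.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    randomly chosen row and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority rows to interpolate")
    k = min(k, n - 1)

    # Pairwise Euclidean distances within the minority class
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # starting row for each new sample
    pick = rng.integers(0, k, size=n_new)   # which of its k neighbours to use
    nb = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


# Hypothetical minority class with five 2-D points
X_fraud = np.array([[2.0, 1.0], [2.5, 1.5], [3.0, 1.2], [2.2, 0.8], [2.8, 1.7]])
synthetic = smote_like_oversample(X_fraud, n_new=10)
print(synthetic.shape)
```

Because every synthetic row is a convex combination of two real fraud rows, it stays inside the range spanned by the original minority points. As in the notebook, resampling of this kind should be applied to the training split only, so the test set keeps its original class ratio.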

In \[6\]:

    import pandas as pd
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.preprocessing import StandardScaler
    from sklearn.metrics import (
        classification_report, confusion_matrix,
        precision_score, recall_score, f1_score, roc_auc_score,
    )
    from imblearn.over_sampling import SMOTE
    from imblearn.under_sampling import RandomUnderSampler
    import matplotlib.pyplot as plt
    import seaborn as sns
    import warnings
    warnings.filterwarnings('ignore')

    class FraudDetectionSystem:
        def __init__(self):
            self.model = None
            self.scaler = StandardScaler()
            self.feature_names = None
            self.is_trained = False
            
        def generate_synthetic_data(self, n_samples=100000, fraud_rate=0.002):
            """Generate synthetic credit card transaction data"""
            print("Generating synthetic credit card transaction data...")
            
            np.random.seed(42)
            
            # Generate normal transactions
            n_normal = int(n_samples * (1 - fraud_rate))
            n_fraud = n_samples - n_normal
            
            # Normal transactions
            normal_data = {
                'V1': np.random.normal(-0.5, 2, n_normal),
                'V2': np.random.normal(0.2, 1.5, n_normal),
                'V3': np.random.normal(-0.1, 1.8, n_normal),
                'V4': np.random.normal(0.3, 1.2, n_normal),
                'V5': np.random.normal(-0.2, 1.6, n_normal),
                'V6': np.random.normal(0.1, 1.4, n_normal),
                'V7': np.random.normal(-0.1, 1.3, n_normal),
                'V8': np.random.normal(0.0, 1.1, n_normal),
                'V9': np.random.normal(0.1, 1.2, n_normal),
                'V10': np.random.normal(-0.2, 1.5, n_normal),
                'Amount': np.random.lognormal(3, 1.5, n_normal),
                'Time': np.random.uniform(0, 172800, n_normal),  # 2 days in seconds
                'Class': np.zeros(n_normal)
            }
            
            # Fraudulent transactions (different patterns)
            fraud_data = {
                'V1': np.random.normal(2.5, 1.8, n_fraud),  # Different mean
                'V2': np.random.normal(-1.2, 2.1, n_fraud),
                'V3': np.random.normal(1.8, 1.5, n_fraud),
                'V4': np.random.normal(-2.1, 1.8, n_fraud),
                'V5': np.random.normal(1.5, 1.2, n_fraud),
                'V6': np.random.normal(-1.8, 1.6, n_fraud),
                'V7': np.random.normal(2.2, 1.4, n_fraud),
                'V8': np.random.normal(-1.5, 1.3, n_fraud),
                'V9': np.random.normal(1.7, 1.1, n_fraud),
                'V10': np.random.normal(-2.3, 1.2, n_fraud),
                'Amount': np.random.lognormal(2, 2, n_fraud),  # Different amount distribution
                'Time': np.random.uniform(0, 172800, n_fraud),
                'Class': np.ones(n_fraud)
            }
            
            # Combine data
            data = {}
            for key in normal_data.keys():
                data[key] = np.concatenate([normal_data[key], fraud_data[key]])
            
            df = pd.DataFrame(data)
            
            # Shuffle the data
            df = df.sample(frac=1).reset_index(drop=True)
            
            print(f"Generated {len(df)} transactions:")
            print(f"- Normal transactions: {len(df[df['Class'] == 0])}")
            print(f"- Fraudulent transactions: {len(df[df['Class'] == 1])}")
            print(f"- Fraud rate: {df['Class'].mean():.4f}")
            
            return df
        
        def preprocess_data(self, df, sampling_method='smote'):
            """Preprocess data and handle class imbalance"""
            print("\nPreprocessing data...")
            
            # Separate features and target
            X = df.drop('Class', axis=1)
            y = df['Class']
            
            self.feature_names = X.columns.tolist()
            
            # Split data
            X_train, X_test, y_train, y_test = train_test_split(
                X, y, test_size=0.2, random_state=42, stratify=y
            )
            
            # Scale features
            X_train_scaled = self.scaler.fit_transform(X_train)
            X_test_scaled = self.scaler.transform(X_test)
            
            print(f"Original training set distribution:")
            print(f"- Normal: {sum(y_train == 0)}")
            print(f"- Fraud: {sum(y_train == 1)}")
            
            # Handle imbalanced data
            if sampling_method == 'smote':
                print("\nApplying SMOTE...")
                smote = SMOTE(random_state=42)
                X_train_resampled, y_train_resampled = smote.fit_resample(X_train_scaled, y_train)
            elif sampling_method == 'undersample':
                print("\nApplying Random Undersampling...")
                undersampler = RandomUnderSampler(random_state=42)
                X_train_resampled, y_train_resampled = undersampler.fit_resample(X_train_scaled, y_train)
            else:
                X_train_resampled, y_train_resampled = X_train_scaled, y_train
            
            print(f"Resampled training set distribution:")
            print(f"- Normal: {sum(y_train_resampled == 0)}")
            print(f"- Fraud: {sum(y_train_resampled == 1)}")
            
            return X_train_resampled, X_test_scaled, y_train_resampled, y_test
        
        def train_model(self, X_train, y_train, model_type='random_forest'):
            """Train the fraud detection model"""
            print(f"\nTraining {model_type} model...")
            
            if model_type == 'random_forest':
                self.model = RandomForestClassifier(
                    n_estimators=100,
                    max_depth=10,
                    min_samples_split=5,
                    min_samples_leaf=2,
                    random_state=42,
                    n_jobs=-1
                )
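            elif model_type == 'gradient_boosting':
                self.model = GradientBoostingClassifier(
                    n_estimators=100,
                    learning_rate=0.1,
                    max_depth=6,
                    random_state=42
                )
            else:
                # Fail early instead of calling .fit() on None
                raise ValueError(f"Unknown model_type: {model_type!r}")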
            
            self.model.fit(X_train, y_train)
            self.is_trained = True
            print("Model training completed!")
        
        def evaluate_model(self, X_test, y_test):
            """Evaluate the model performance"""
            if not self.is_trained:
                print("Model not trained yet!")
                return
            
            print("\nEvaluating model performance...")
            
            # Predictions
            y_pred = self.model.predict(X_test)
            y_pred_proba = self.model.predict_proba(X_test)[:, 1]
            
            # Calculate metrics
            precision = precision_score(y_test, y_pred)
            recall = recall_score(y_test, y_pred)
            f1 = f1_score(y_test, y_pred)
            auc = roc_auc_score(y_test, y_pred_proba)
            
            print(f"\nModel Performance Metrics:")
            print(f"- Precision: {precision:.4f}")
            print(f"- Recall: {recall:.4f}")
            print(f"- F1-Score: {f1:.4f}")
            print(f"- AUC-ROC: {auc:.4f}")
            
            print(f"\nDetailed Classification Report:")
            print(classification_report(y_test, y_pred, target_names=['Normal', 'Fraud']))
            
            # Confusion Matrix
            cm = confusion_matrix(y_test, y_pred)
            print(f"\nConfusion Matrix:")
            print(f"True Negatives: {cm[0,0]}")
            print(f"False Positives: {cm[0,1]}")
            print(f"False Negatives: {cm[1,0]}")
            print(f"True Positives: {cm[1,1]}")
            
            return {
                'precision': precision,
                'recall': recall,
                'f1_score': f1,
                'auc_roc': auc,
                'confusion_matrix': cm
            }
        
        def plot_feature_importance(self):
            """Plot feature importance"""
            if not self.is_trained:
                print("Model not trained yet!")
                return
            
            importances = self.model.feature_importances_
            indices = np.argsort(importances)[::-1]
            
            plt.figure(figsize=(12, 8))
            plt.title("Feature Importance in Fraud Detection")
            plt.bar(range(len(importances)), importances[indices])
            plt.xticks(range(len(importances)), [self.feature_names[i] for i in indices], rotation=45)
            plt.tight_layout()
            plt.show()
        
        def predict_transaction(self, transaction_data):
            """Predict if a single transaction is fraudulent"""
            if not self.is_trained:
                print("Model not trained yet!")
                return None
            
            # Build a one-row DataFrame whose columns follow the order
            # the scaler and model were fitted on
            if isinstance(transaction_data, dict):
                transaction_df = pd.DataFrame([transaction_data])[self.feature_names]
            else:
                transaction_df = pd.DataFrame([transaction_data], columns=self.feature_names)
            
            # Scale the data
            transaction_scaled = self.scaler.transform(transaction_df)
            
            # Make prediction
            prediction = self.model.predict(transaction_scaled)[0]
            probability = self.model.predict_proba(transaction_scaled)[0, 1]
            
            return {
                'prediction': int(prediction),
                'fraud_probability': probability,
                'risk_level': self.get_risk_level(probability)
            }
        
        def get_risk_level(self, probability):
            """Classify risk level based on fraud probability"""
            if probability < 0.3:
                return "Low"
            elif probability < 0.7:
                return "Medium"
            else:
                return "High"

    def testing_interface():
        """Interactive testing interface for the fraud detection system"""
        print("\n" + "="*60)
        print("CREDIT CARD FRAUD DETECTION SYSTEM - TESTING INTERFACE")
        print("="*60)
        
        # Initialize and train the system
        fraud_detector = FraudDetectionSystem()
        
        # Generate synthetic data
        df = fraud_detector.generate_synthetic_data(n_samples=50000)
        
        # Preprocess data
        X_train, X_test, y_train, y_test = fraud_detector.preprocess_data(df, sampling_method='smote')
        
        # Train model
        fraud_detector.train_model(X_train, y_train, model_type='random_forest')
        
        # Evaluate model
        metrics = fraud_detector.evaluate_model(X_test, y_test)
        
        # Interactive testing loop
        while True:
            print("\n" + "-"*50)
            print("TESTING OPTIONS:")
            print("1. Test with sample transaction")
            print("2. Test with custom transaction")
            print("3. Batch test with random transactions")
            print("4. Show feature importance")
            print("5. Exit")
            
            choice = input("\nEnter your choice (1-5): ").strip()
            
            if choice == '1':
                # Test with sample transaction
                sample_transaction = {
                    'V1': -1.359807,
                    'V2': -0.072781,
                    'V3': 2.536347,
                    'V4': 1.378155,
                    'V5': -0.338321,
                    'V6': 0.462388,
                    'V7': 0.239599,
                    'V8': 0.098698,
                    'V9': 0.363787,
                    'V10': 0.090794,
                    'Amount': 149.62,
                    'Time': 406
                }
                
                result = fraud_detector.predict_transaction(sample_transaction)
                print(f"\nSample Transaction Analysis:")
                print(f"Amount: ${sample_transaction['Amount']:.2f}")
                print(f"Prediction: {'FRAUD' if result['prediction'] == 1 else 'NORMAL'}")
                print(f"Fraud Probability: {result['fraud_probability']:.4f}")
                print(f"Risk Level: {result['risk_level']}")
                
            elif choice == '2':
                # Test with custom transaction
                print("\nEnter transaction details:")
                try:
                    transaction = {}
                    for feature in fraud_detector.feature_names:
                        if feature == 'Amount':
                            value = float(input(f"{feature} ($): "))
                        elif feature == 'Time':
                            value = float(input(f"{feature} (seconds): "))
                        else:
                            value = float(input(f"{feature}: "))
                        transaction[feature] = value
                    
                    result = fraud_detector.predict_transaction(transaction)
                    print(f"\nTransaction Analysis:")
                    print(f"Amount: ${transaction['Amount']:.2f}")
                    print(f"Prediction: {'FRAUD' if result['prediction'] == 1 else 'NORMAL'}")
                    print(f"Fraud Probability: {result['fraud_probability']:.4f}")
                    print(f"Risk Level: {result['risk_level']}")
                    
                except ValueError:
                    print("Invalid input! Please enter numeric values.")
                    
            elif choice == '3':
                # Batch test
                try:
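                    n_tests = int(input("How many random transactions to test? "))
                    if n_tests <= 0:
                        # Avoids a division by zero in the summary below
                        raise ValueError("number of transactions must be positive")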
                    test_df = fraud_detector.generate_synthetic_data(n_samples=n_tests)
                    
                    fraud_count = 0
                    high_risk_count = 0
                    
                    for i in range(min(n_tests, len(test_df))):
                        transaction = test_df.iloc[i].drop('Class').to_dict()
                        result = fraud_detector.predict_transaction(transaction)
                        
                        if result['prediction'] == 1:
                            fraud_count += 1
                        if result['risk_level'] == 'High':
                            high_risk_count += 1
                    
                    print(f"\nBatch Test Results:")
                    print(f"Total transactions tested: {min(n_tests, len(test_df))}")
                    print(f"Predicted as fraud: {fraud_count}")
                    print(f"High risk transactions: {high_risk_count}")
                    print(f"Fraud detection rate: {fraud_count/min(n_tests, len(test_df))*100:.2f}%")
                    
                except ValueError:
                    print("Invalid input! Please enter a valid number.")
                    
            elif choice == '4':
                # Show feature importance (text-based since we can't display plots)
                if fraud_detector.is_trained:
                    importances = fraud_detector.model.feature_importances_
                    feature_importance = list(zip(fraud_detector.feature_names, importances))
                    feature_importance.sort(key=lambda x: x[1], reverse=True)
                    
                    print("\nFeature Importance Ranking:")
                    for i, (feature, importance) in enumerate(feature_importance, 1):
                        print(f"{i:2d}. {feature:8s}: {importance:.4f}")
                else:
                    print("Model not trained yet!")
                    
            elif choice == '5':
                print("\nThank you for using the Fraud Detection System!")
                break
                
            else:
                print("Invalid choice! Please enter 1-5.")

    if __name__ == "__main__":
        testing_interface()

    ============================================================
    CREDIT CARD FRAUD DETECTION SYSTEM - TESTING INTERFACE
    ============================================================
    Generating synthetic credit card transaction data...
    Generated 50000 transactions:
    - Normal transactions: 49900
    - Fraudulent transactions: 100
    - Fraud rate: 0.0020

    Preprocessing data...
    Original training set distribution:
    - Normal: 39920
    - Fraud: 80

    Applying SMOTE...
    Resampled training set distribution:
    - Normal: 39920
    - Fraud: 39920

    Training random_forest model...
    Model training completed!

    Evaluating model performance...

    Model Performance Metrics:
    - Precision: 0.3659
    - Recall: 0.7500
    - F1-Score: 0.4918
    - AUC-ROC: 0.9975

    Detailed Classification Report:
                  precision    recall  f1-score   support

          Normal       1.00      1.00      1.00      9980
           Fraud       0.37      0.75      0.49        20

        accuracy                           1.00     10000
       macro avg       0.68      0.87      0.75     10000
    weighted avg       1.00      1.00      1.00     10000


    Confusion Matrix:
    True Negatives: 9954
    False Positives: 26
    False Negatives: 5
    True Positives: 15

    --------------------------------------------------
    TESTING OPTIONS:
    1. Test with sample transaction
    2. Test with custom transaction
    3. Batch test with random transactions
    4. Show feature importance
    5. Exit

    Sample Transaction Analysis:
    Amount: $149.62
    Prediction: NORMAL
    Fraud Probability: 0.0000
    Risk Level: Low

    --------------------------------------------------
    TESTING OPTIONS:
    1. Test with sample transaction
    2. Test with custom transaction
    3. Batch test with random transactions
    4. Show feature importance
    5. Exit

    Enter transaction details:

    Transaction Analysis:
    Amount: $500.00
    Prediction: NORMAL
    Fraud Probability: 0.0750
    Risk Level: Low

    --------------------------------------------------
    TESTING OPTIONS:
    1. Test with sample transaction
    2. Test with custom transaction
    3. Batch test with random transactions
    4. Show feature importance
    5. Exit

    Generating synthetic credit card transaction data...
    Generated 3 transactions:
    - Normal transactions: 2
    - Fraudulent transactions: 1
    - Fraud rate: 0.3333

    Batch Test Results:
    Total transactions tested: 3
    Predicted as fraud: 1
    High risk transactions: 1
    Fraud detection rate: 33.33%

    --------------------------------------------------
    TESTING OPTIONS:
    1. Test with sample transaction
    2. Test with custom transaction
    3. Batch test with random transactions
    4. Show feature importance
    5. Exit

    Feature Importance Ranking:
     1. V10     : 0.2315
     2. V1      : 0.1623
     3. V7      : 0.1353
     4. V9      : 0.1094
     5. V4      : 0.0898
     6. V5      : 0.0882
     7. V6      : 0.0586
     8. V3      : 0.0541
     9. V8      : 0.0451
    10. V2      : 0.0165
    11. Amount  : 0.0053
    12. Time    : 0.0038

    --------------------------------------------------
    TESTING OPTIONS:
    1. Test with sample transaction
    2. Test with custom transaction
    3. Batch test with random transactions
    4. Show feature importance
    5. Exit

    Thank you for using the Fraud Detection System!
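
The headline metrics in the output above follow directly from the four confusion matrix counts. The short sketch below recomputes them by hand, which also shows why accuracy rounds to 1.00 while fraud precision is only about 0.37: the 9,954 true negatives dominate the total. The helper name `metrics_from_confusion` is an assumption for illustration.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Derive standard binary classification metrics from raw counts."""
    precision = tp / (tp + fp)   # share of fraud alerts that were real fraud
    recall = tp / (tp + fn)      # share of real fraud that was caught
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
        "accuracy": accuracy,
    }


# Counts taken from the confusion matrix printed above
m = metrics_from_confusion(tn=9954, fp=26, fn=5, tp=15)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With only 20 fraud cases in the test set, each additional false negative moves recall by five percentage points, so these figures should be read as rough estimates rather than stable measurements.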