## WOE Binning and Model Training

In this stage we perform WoE (Weight of Evidence) binning and then train three models: Logistic Regression, Random Forest (an ensemble of decision trees), and a Gradient Boosting Machine (GBM). We start with WoE binning.

- **Information Value (IV)**:
  - IV measures the predictive power of a feature in relation to a binary target variable. It quantifies how well a feature separates the classes (e.g., default vs. non-default).
  - IV is computed per feature by summing, over its bins, the difference between the share of goods and the share of bads in each bin, weighted by that bin's WoE. It is particularly useful for screening features in binary classification problems.
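
As a minimal sketch of the calculation (not scorecardpy's internal implementation), the snippet below derives WoE and IV for a feature that has already been binned. The function name `woe_iv`, the toy data, and the sign convention ln(good share / bad share) are illustrative assumptions; some libraries use the opposite sign, which flips WoE but leaves IV unchanged.

```python
import math
from collections import Counter

def woe_iv(bins, target):
    """Return ({bin: woe}, iv) for a binned feature and a 0/1 target.

    Convention assumed here: target 1 = good, 0 = bad, and
    WoE = ln(share of goods in bin / share of bads in bin).
    """
    good = Counter(b for b, t in zip(bins, target) if t == 1)
    bad = Counter(b for b, t in zip(bins, target) if t == 0)
    total_good, total_bad = sum(good.values()), sum(bad.values())

    woe, iv = {}, 0.0
    for b in sorted(set(bins)):
        # Bins with zero goods or zero bads would need smoothing;
        # this sketch assumes every bin contains both classes.
        dist_good = good[b] / total_good
        dist_bad = bad[b] / total_bad
        woe[b] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * woe[b]
    return woe, iv

# Toy example: two hypothetical bins, ten customers.
bins = ["low"] * 5 + ["high"] * 5
target = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
woe, iv = woe_iv(bins, target)
print(woe, iv)
```

A bin whose good share exceeds its bad share gets a positive WoE under this convention, and a feature whose bins all have WoE near zero contributes almost nothing to IV.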

## Overview of ScorecardPy

**ScorecardPy** is a Python library designed for building credit scoring models. It provides tools and functions to help data scientists and analysts create, validate, and deploy scorecards, particularly in the context of binary classification problems, such as predicting loan defaults or credit risk.

### Key Features

1. **Binning**:
   - Automatically bins continuous variables into categories, which helps in transforming numerical data into a format suitable for scorecard modeling.

2. **WoE (Weight of Evidence) Transformation**:
   - Converts binned variables into Weight of Evidence, which is a common transformation used in credit scoring to quantify the predictive power of each category.

3. **Scorecard Development**:
   - Facilitates the creation of a scorecard by allowing users to define points for each feature based on their predictive power and importance.

4. **Model Evaluation**:
   - Provides tools for evaluating the performance of the scorecard through metrics such as KS statistics, AUC (Area Under the Curve), and confusion matrices.

5. **Visualization**:
   - Includes functions for visualizing the distribution of variables, the relationship between features and the target variable, and the overall performance of the scorecard.

6. **Integration**:
   - Can be easily integrated into existing data science workflows and can work with pandas DataFrames, making it user-friendly for those familiar with Python and data manipulation.
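
To make the scorecard development step more concrete, here is a minimal sketch of a commonly used points-scaling convention, in which a fixed number of points corresponds to a doubling of the good:bad odds. It is not scorecardpy's implementation; the function names and the values `base_score=600`, `base_odds=50`, and `pdo=20` are illustrative assumptions.

```python
import math

def scaling_constants(base_score=600, base_odds=50, pdo=20):
    """Return (factor, offset) so that score = offset + factor * ln(odds).

    base_score is the score assigned at base_odds (good:bad), and pdo is
    the number of points that doubles the odds. All values are illustrative.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset

def score_from_probability(p_good, factor, offset):
    """Map a predicted probability of 'good' to a scorecard score."""
    odds = p_good / (1 - p_good)
    return offset + factor * math.log(odds)

factor, offset = scaling_constants()
score_at_base = score_from_probability(50 / 51, factor, offset)      # odds 50:1
score_at_double = score_from_probability(100 / 101, factor, offset)  # odds 100:1
print(round(score_at_base, 2), round(score_at_double, 2))
```

In a WoE-based logistic regression, the same factor and offset are typically distributed across the bins of each feature so that the per-bin points add up to the total score.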

### Use Cases

- **Credit Risk Assessment**: Building models to evaluate the creditworthiness of applicants.
- **Fraud Detection**: Identifying potentially fraudulent transactions or behaviors.
- **Customer Segmentation**: Analyzing customer data to inform marketing strategies or product offerings.


In [80]:
import pandas as pd
import scorecardpy as sc
from monotonic_binning.monotonic_woe_binning import Binning

In [81]:
# Load the data
df = pd.read_csv("../notebooks/data/df_merged.csv")

In [82]:
print(df.columns.tolist())

['CustomerId', 'TransactionId', 'BatchId', 'AccountId', 'SubscriptionId', 'CurrencyCode', 'CountryCode', 'ProviderId', 'ProductId', 'ProductCategory', 'ChannelId', 'Value', 'TransactionStartTime', 'PricingStrategy', 'FraudResult', 'Transaction_Hour', 'Transaction_Day', 'Transaction_Month', 'Transaction_Year', 'Total_Transaction_Amount', 'Average_Transaction_Amount', 'Transaction_Count', 'Std_Transaction_Amount', 'TransactionDate', 'Recency', 'Frequency', 'Monetary', 'Size', 'Risk_Label']


In [83]:
import pandas as pd
import scorecardpy as sc
import matplotlib.pyplot as plt
import seaborn as sns

# Sample DataFrame (replace this with your actual DataFrame)
# df = pd.read_csv("your_file.csv")  # Uncomment this line to load your dataset

# Convert 'Risk_Label' to binary format (1 for 'Good' and 0 for 'Bad')
df['Risk_Label'] = df['Risk_Label'].map({'Good': 1, 'Bad': 0})

# Define the columns to perform WoE binning on, including categorical variables
columns_to_bin = [
    'Total_Transaction_Amount',
    'Average_Transaction_Amount',
    'Transaction_Count',
    'Recency',
    'Frequency',
    'Monetary',
    'ProviderId',        # Added categorical variable
    'ProductId',         # Added categorical variable
    'ProductCategory',    # Added categorical variable
    'ChannelId'          # Added categorical variable
]

# Maximum number of bins per variable; tune with domain knowledge or experimentation
n_threshold = 10

# Perform WoE binning for each column and store the WoE values in df
# Note: woebin's default positive='bad|1' treats the value 1 as the event class,
# so with Good mapped to 1 the WoE values describe 'Good' as the event.
for column in columns_to_bin:
    # bin_num_limit caps the number of bins per variable;
    # count_distr_limit sets the minimum share of rows a bin must hold
    binning_result = sc.woebin(df, y='Risk_Label', x=column,
                                bin_num_limit=n_threshold,
                                count_distr_limit=0.05)

    # Create a new column for WoE values in df
    woe_column_name = f'WoE_{column}'

    # Use woebin_ply to get WoE values and inspect the resulting DataFrame
    woebin_ply_result = sc.woebin_ply(df, binning_result)

    # Print columns for debugging
    print("Columns in woebin_ply_result:", woebin_ply_result.columns)

    # Map the WoE values back to the DataFrame using the correct column name
    woe_value_column = f"{column}_woe"  # Check the naming convention for WoE values
    if woe_value_column in woebin_ply_result.columns:
        df[woe_column_name] = woebin_ply_result[woe_value_column]  # Extract the WoE values
    else:
        print(f"Warning: '{woe_value_column}' not found in the result DataFrame for '{column}'.")

# Display the first few rows of the updated DataFrame
print(df.head())


[INFO] creating woe binning ...
[INFO] converting into woe values ...
Columns in woebin_ply_result: Index(['BatchId', 'Std_Transaction_Amount', 'Monetary', 'TransactionStartTime', 'Transaction_Count', 'Frequency', 'Transaction_Month', 'TransactionId', 'Size', 'ProductCategory', 'Risk_Label', 'CurrencyCode', 'CountryCode', 'FraudResult', 'Transaction_Hour', 'ProductId', 'CustomerId', 'ChannelId', 'SubscriptionId',
       'PricingStrategy', 'Average_Transaction_Amount', 'Recency', 'Transaction_Day', 'Transaction_Year', 'Value', 'ProviderId', 'TransactionDate', 'AccountId', 'Total_Transaction_Amount_woe'],
      dtype='object')
[INFO] creating woe binning ...
[INFO] converting into woe values ...
Columns in woebin_ply_result: Index(['BatchId', 'Std_Transaction_Amount', 'Monetary', 'TransactionStartTime', 'Transaction_Count', 'Frequency', 'Transaction_Month', 'TransactionId', 'Size', 'ProductCategory', 'Risk_Label', 'CurrencyCode', 'CountryCode', 'FraudResult', 'Transaction_Hour', 'Total_T

In [84]:
# Display the first few rows of the updated DataFrame
print(df.head(10))

        CustomerId         TransactionId         BatchId       AccountId       SubscriptionId CurrencyCode  CountryCode  ProviderId  ProductId  ProductCategory  ChannelId     Value TransactionStartTime  PricingStrategy  FraudResult  Transaction_Hour  Transaction_Day  Transaction_Month  Transaction_Year  \
0  CustomerId_4406   TransactionId_76871   BatchId_36123  AccountId_3957   SubscriptionId_887          UGX          256           6         10                0          3  0.142612      11/15/2018 2:18                2            0                 2               15                 11              2018   
1  CustomerId_4406   TransactionId_73770   BatchId_15642  AccountId_4841  SubscriptionId_3829          UGX          256           4          6                2          2  0.002572      11/15/2018 2:19                2            0                 2               15                 11              2018   
2  CustomerId_4683   TransactionId_26203   BatchId_53941  AccountId_4229   Subs

In [85]:
import pandas as pd

# Save the DataFrame as a CSV file
df.to_csv('../notebooks/data/final_woe_bin.csv', index=False)

print("DataFrame saved as 'final_woe_bin.csv'")


DataFrame saved as 'final_woe_bin.csv'


### Model Training and Evaluation

In this phase we train the aforementioned models and evaluate them on a held-out test set.

In [86]:
import os
import pandas as pd
import numpy as np
import pickle
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
from monotonic_binning.monotonic_woe_binning import Binning
import matplotlib.pyplot as plt
import seaborn as sns

# Load your DataFrame
df = pd.read_csv("../notebooks/data/df_merged.csv")

# Convert Risk_Label to binary format (1 for 'Good' and 0 for 'Bad')
df['Risk_Label'] = df['Risk_Label'].map({'Good': 1, 'Bad': 0})

# Define features and target variable
X = df[['Recency', 'Frequency', 'Monetary']]
y = df['Risk_Label']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model Selection and Training
models = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Random Forest': RandomForestClassifier(),
    'Gradient Boosting': GradientBoostingClassifier()
}

# Hyperparameter tuning setup
param_grid = {
    'Logistic Regression': {'C': [0.01, 0.1, 1, 10]},
    'Random Forest': {'n_estimators': [50, 100, 200], 'max_depth': [None, 10, 20]},
    'Gradient Boosting': {'n_estimators': [50, 100, 200], 'learning_rate': [0.01, 0.1, 0.2]}
}

# Train and evaluate models
performance = {}

for model_name, model in models.items():
    # Hyperparameter tuning
    grid_search = GridSearchCV(model, param_grid[model_name], cv=5, scoring='roc_auc')
    grid_search.fit(X_train, y_train)

    # Best model after tuning
    best_model = grid_search.best_estimator_
    
    # Predictions
    y_pred = best_model.predict(X_test)
    y_pred_proba = best_model.predict_proba(X_test)[:, 1]  # Get probability of positive class
    
    # Calculate metrics
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    roc_auc = roc_auc_score(y_test, y_pred_proba)

    # Store performance
    performance[model_name] = {
        'Accuracy': accuracy,
        'Precision': precision,
        'Recall': recall,
        'F1 Score': f1,
        'ROC AUC': roc_auc
    }

    # Create the Models directory if it doesn't exist
    os.makedirs('Models', exist_ok=True)

    # Save the model
    with open(f'Models/{model_name.replace(" ", "_")}.pkl', 'wb') as f:
        pickle.dump(best_model, f)

# Display performance metrics
performance_df = pd.DataFrame(performance).T
print(performance_df)

                     Accuracy  Precision    Recall  F1 Score   ROC AUC
Logistic Regression  0.784734   0.747896  0.753365  0.750621  0.886593
Random Forest        1.000000   1.000000  1.000000  1.000000  1.000000
Gradient Boosting    1.000000   1.000000  1.000000  1.000000  1.000000


### Interpretation of Model Performance

The three models differ sharply in how well they predict credit risk from Recency, Frequency, and Monetary. **Logistic Regression** reached an accuracy of about **78.47%**, with precision (0.748), recall (0.753), and F1 (0.751) close to one another, so the model is not trading one error type heavily for the other. Its **ROC AUC** of **0.887** indicates a good ability to separate the positive and negative classes. As a linear model, however, it may miss non-linear structure in the data that the tree ensembles can capture.
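
The ROC AUC can be read as the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case. The sketch below computes it directly from that definition on made-up scores; the function name `roc_auc_pairwise` and the data are illustrative, and the quadratic loop is meant for explanation only (the notebook itself uses `roc_auc_score` from scikit-learn).

```python
def roc_auc_pairwise(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up labels and predicted probabilities for five cases.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]
auc = roc_auc_pairwise(y_true, scores)
print(auc)
```

Here five of the six positive-negative pairs are ordered correctly; only the positive scored 0.3 falls below the negative scored 0.4.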

In stark contrast, both **Random Forest** and **Gradient Boosting** scored perfectly on every metric, including **100%** accuracy and an ROC AUC of **1.000**. These scores were measured on the 20% test split, so they should be treated as a warning sign rather than as evidence of a finished model. Two explanations are worth checking before trusting them. First, the data is transaction-level and the same `CustomerId` appears on multiple rows; if Recency, Frequency, and Monetary are per-customer values, a random row-level split places near-identical rows in both the training and test sets, and the test score then says little about unseen customers. Second, if `Risk_Label` was itself derived from these RFM features, tree-based models can reproduce the labeling rule exactly, which is target leakage rather than predictive skill.

Either way, the perfect metrics are unlikely to carry over to genuinely new data, and the concern about overfitting stands. Balancing model complexity with generalizability remains essential for developing a robust credit scoring system.
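
One way to probe the generalization concern is a stricter hold-out. The `df.head(10)` output earlier shows the same `CustomerId` on several rows, so a row-level `train_test_split` can put one customer's transactions in both sets. The sketch below holds out whole customers instead. The helper `split_by_group` and the toy IDs are hypothetical; scikit-learn's `GroupShuffleSplit` offers the same idea as a library class.

```python
import random

def split_by_group(groups, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) such that no group appears in both."""
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Hold out at least one group, otherwise roughly test_size of them.
    n_test = max(1, round(len(unique) * test_size))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx

# Toy customer IDs standing in for df['CustomerId'].
customer_ids = ["C1", "C1", "C2", "C3", "C3", "C3", "C4", "C5", "C5", "C6"]
train_idx, test_idx = split_by_group(customer_ids)
train_customers = {customer_ids[i] for i in train_idx}
test_customers = {customer_ids[i] for i in test_idx}
print(sorted(test_customers))
```

If the tree models still score near 1.0 when evaluated on customers they have never seen, the remaining explanation to investigate is how `Risk_Label` was constructed.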