# AI-Driven Detection of Loan Stacking Behavior in Kenya’s Digital Lending Ecosystem

Group Members:
1. Naomi Ngigi
2. Elvis Kiprono
3. Godwine Wasonga
4. Janine Makorre
5. Trevor Maina
6. Caroline Wachira

DSF 12 FT Remote 

July 2025

## Introduction

Access to traditional banking in Kenya was once limited, but the launch of M-Pesa in 2007 transformed financial inclusion by enabling mobile money services across the country. Building on this, digital lending apps like Tala, Branch, and M-Shwari have made short-term, collateral-free loans widely accessible.

But this convenience comes with new risks. One growing concern is loan stacking, where borrowers take multiple loans across different platforms at the same time. As Collins Munyendo (2022) notes, this behavior often creates a cycle of debt: borrowers take on new loans just to repay existing ones, at interest rates that can exceed 30% per month.
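To see how quickly a 30% monthly rate compounds inside such a cycle, consider a purely illustrative case: a borrower takes KES 5,000 and, every month, clears the balance by taking a new loan that covers the old principal plus interest at the same rate. The helper `rollover_balance` below is a hypothetical model of that rollover, not data from any lender; real loan terms and fees vary.

```python
def rollover_balance(principal: float, monthly_rate: float, months: int) -> float:
    """Balance owed after `months` rollovers, assuming each new loan exactly
    covers the previous balance plus one month of interest (hypothetical model)."""
    balance = principal
    for _ in range(months):
        balance *= 1 + monthly_rate  # interest accrues, then a new loan refinances it
    return balance


for m in (1, 3, 6):
    print(m, round(rollover_balance(5_000, 0.30, m), 2))
```

After three rollovers the balance is about KES 10,985, more than double the original loan, because 1.3³ ≈ 2.197.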

Since most lenders operate without shared data, they lack visibility into borrowers’ full credit behavior. To solve this, our project introduces an AI-driven loan stacking detection model, trained on behavioral signals and delivered through a real-time API. The system is designed to help digital lenders make more informed decisions and promote safer, more sustainable lending practices.
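The request/response contract of such a real-time API can be sketched as a plain function, so it runs without a web framework. The field names below are columns from the project dataset, but the weights, bias, threshold, and the function name `score_request` are hypothetical placeholders standing in for the trained model. In a FastAPI service, this body would sit behind a POST route.

```python
import math

# Hypothetical placeholder weights; a real deployment would call the trained model.
WEIGHTS = {
    "apps_installed": 0.15,
    "loan_frequency_last_30_days": 0.10,
    "number_of_active_loans": 0.20,
    "repayment_ratio_overall": -3.0,
}
BIAS = 0.5


def score_request(payload: dict, threshold: float = 0.5) -> dict:
    """Validate a borrower payload and return a stacking-risk response."""
    missing = [field for field in WEIGHTS if field not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not 0.0 <= payload["repayment_ratio_overall"] <= 1.0:
        raise ValueError("repayment_ratio_overall must be in [0, 1]")
    z = BIAS + sum(w * float(payload[f]) for f, w in WEIGHTS.items())
    prob = 1.0 / (1.0 + math.exp(-z))  # logistic link maps the score into (0, 1)
    return {"risk_score": round(prob, 4), "flag_stacking": prob >= threshold}


print(score_request({
    "apps_installed": 11,
    "loan_frequency_last_30_days": 24,
    "number_of_active_loans": 6,
    "repayment_ratio_overall": 0.19,
}))
```

Rejecting malformed input before scoring keeps bad data from silently producing a confident-looking risk score.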

## Problem Statement

While mobile lending apps have made credit more accessible in Kenya, they have also created a hidden risk: borrowers taking out multiple loans across different platforms without oversight. This loan stacking behavior often results in users relying on new loans to repay old ones, pushing them into unsustainable debt cycles.

Digital lenders, operating in isolation, lack the tools to detect this behavior before issuing loans. The absence of real-time risk detection mechanisms makes it difficult to differentiate between reliable and high-risk borrowers—leading to increased default rates, operational losses, and regulatory concerns.

There is a pressing need for a smart, data-driven system that can detect stacking patterns early and support responsible lending decisions.
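Before any learned model, stacking can be written as a transparent rule: flag a borrower who, on some day, holds active loans from at least `k` distinct lenders. The sketch below assumes a hypothetical per-loan ledger of `(borrower_id, lender, start_date, due_date)` tuples and a threshold of `k=3`; both are illustrative choices, and `flag_stackers` is not part of the project code. A rule like this is also a useful baseline for judging whether the neural network adds value.

```python
from collections import defaultdict
from datetime import date


def flag_stackers(loans, k=3):
    """Return borrowers who, on some day, hold active loans from >= k distinct lenders.

    A loan counts as active from start_date through due_date inclusive.
    """
    by_borrower = defaultdict(list)
    for borrower, lender, start, due in loans:
        by_borrower[borrower].append((lender, start, due))

    flagged = set()
    for borrower, items in by_borrower.items():
        # Concurrency can only rise on a loan's start date, so checking start dates suffices.
        # O(n^2) per borrower, which is fine for a sketch.
        for _, check_day, _ in items:
            active = {name for name, s, d in items if s <= check_day <= d}
            if len(active) >= k:
                flagged.add(borrower)
                break
    return flagged


ledger = [
    ("u1", "app_A", date(2025, 7, 1), date(2025, 7, 30)),
    ("u1", "app_B", date(2025, 7, 5), date(2025, 8, 4)),
    ("u1", "app_C", date(2025, 7, 10), date(2025, 8, 9)),
    ("u2", "app_A", date(2025, 6, 1), date(2025, 6, 30)),
    ("u2", "app_B", date(2025, 7, 1), date(2025, 7, 30)),
]
print(flag_stackers(ledger))  # {'u1'}
```

Borrower `u2` also uses two lenders, but never at the same time, so sequential borrowing is not flagged.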

## Objectives

1. Can a neural network detect early signs of loan stacking using borrowing and repayment behavior?

2. Which user behaviors best predict over-indebtedness or default?

3. How does a real-time loan stacking API improve credit decisions and reduce risk?

## Key Stakeholders

1. Financial Institutions and Lenders - Use the system to reduce defaults, assess borrower risk, and improve portfolio quality

2. Data Science & Risk Analytics Teams - Build, train, and monitor the neural network models and risk scoring tools

3. Regulators & Policy Makers - Ensure the solution aligns with lending regulations and protects consumers from predatory practices

4. Borrowers (End Users) - Benefit from fairer credit decisions and protection from over-indebtedness

5. Fintech Developers & Platform Providers - Integrate and maintain the real-time risk prediction API across lending platforms

6. Business & Product Decision-Makers - Drive adoption, measure impact, and align the solution with strategic lending goals

## Description of Dataset

To build our model, we created a synthetic dataset that simulates real-world borrowing behavior in Kenya’s mobile lending ecosystem. Since access to actual user data is limited by privacy concerns, we generated realistic records—including loan amounts, repayment history, and cross-platform borrowing—to train and test our AI model safely and ethically.
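The notebook does not show how the synthetic records were produced. One way such a generator can work, sketched under our own assumptions: draw the behavioral signals first, then draw the default label from a logistic function of those signals, so the label genuinely depends on the behavior the model should learn. Column names follow the loaded CSV; the distributions, coefficients, and the function name `make_synthetic_loans` are hypothetical, not the project's actual generator.

```python
import numpy as np
import pandas as pd


def make_synthetic_loans(n: int, seed: int = 42) -> pd.DataFrame:
    """Hypothetical generator in which default risk is driven by stacking signals."""
    rng = np.random.default_rng(seed)
    apps = rng.integers(1, 17, size=n)   # 1..16 loan apps installed
    freq = rng.poisson(lam=0.8 * apps)   # more apps -> more loans in the last 30 days
    repay = rng.beta(a=5, b=2, size=n)   # repayment ratio in (0, 1), skewed high
    # Log-odds of default rise with stacking signals and fall with good repayment.
    logit = -2.0 + 0.15 * apps + 0.08 * freq - 2.5 * (repay - 0.7)
    p_default = 1.0 / (1.0 + np.exp(-logit))
    return pd.DataFrame({
        "user_id": [f"u{i:05d}" for i in range(n)],
        "apps_installed": apps,
        "loan_frequency_last_30_days": freq,
        "repayment_ratio_overall": repay,
        "is_default": rng.random(n) < p_default,  # one Bernoulli draw per borrower
    })


sample = make_synthetic_loans(1_000)
print(sample.shape)  # (1000, 5)
```

Drawing the label from the features, rather than independently of them, is what makes the task learnable; if every column were drawn independently, a model would have nothing to find.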

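Features (as they appear in `mobile_loan_data.csv`):
+ user_id: Unique identifier for each borrower

+ age: Borrower's age in years

+ income: Borrower's income

+ employment_status: Employment category (e.g., Hourly, Casual, Self-Employed)

+ education_level: Highest level of education (e.g., High School, Bachelors, PhD)

+ region: County of residence (e.g., Nairobi City, Machakos, Garissa)

+ number_of_active_loans: Number of loans currently outstanding

+ apps_installed: Number of known loan apps on the borrower's device

+ loan_frequency_last_30_days: Number of loans taken in the past 30 days

+ repayment_ratio_overall: Portion of past loans successfully repaid

+ credit_limit_utilization: Share of the available credit limit in use

+ device_or_ID_shared: Whether the borrower's device or ID is flagged as shared (True/False)

+ loan_amount: Amount borrowed (KES)

+ interest_rate: Interest rate on the loan

+ loan_grade: Letter grade assigned to the loan (e.g., A, B, C, F)

+ loan_term_days: Agreed repayment duration in days

+ debt_to_income_ratio: Ratio of the borrower's debt to income

+ delinquencies_last_2yrs: Number of delinquencies in the past two years

+ public_records: Number of public records on file

+ revolving_utilization: Share of revolving credit in use

+ total_credit_lines: Total number of credit lines

+ is_default: Binary label indicating whether the borrower defaulted (True) or repaid (False)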
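Before modelling, the identifier and the label need to be separated from the inputs, and the text columns encoded as numbers. A minimal sketch, assuming the column names in the loaded CSV (`user_id` as identifier, `is_default` as target, text columns such as `employment_status`, `region`, and `loan_grade`); one-hot encoding with `pd.get_dummies` is our choice here, not necessarily what the project pipeline uses, and `split_features` is a hypothetical helper.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def split_features(df: pd.DataFrame, target: str = "is_default", id_col: str = "user_id"):
    """Return (X, y): an all-numeric feature matrix and a 0/1 integer label."""
    y = df[target].astype(int)  # True/False -> 1/0
    X = df.drop(columns=[target, id_col])
    # Non-numeric columns (text or category) are one-hot encoded.
    categorical = [c for c in X.columns if not is_numeric_dtype(X[c])]
    X = pd.get_dummies(X, columns=categorical, dtype=int)
    # Boolean inputs such as device_or_ID_shared become 0/1 as well.
    for col in X.columns:
        if is_bool_dtype(X[col]):
            X[col] = X[col].astype(int)
    return X, y
```

With the project data this would be called as `X, y = split_features(loan_df)`.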

## Project Approach

To tackle the growing challenge of loan stacking in Kenya’s digital lending landscape, we follow a complete data science pipeline—from data generation to model deployment—using deep learning and real-time API integration.

1. Synthetic Dataset Creation - Because real-world borrower data is not accessible for privacy reasons, we generated a synthetic dataset that mimics realistic mobile-loan behavior. It includes features such as borrowing frequency, repayment history, app usage, and past delinquencies.

2. Neural Network Modeling - A deep learning classifier is trained with TensorFlow and Keras to identify high-risk borrowers from their behavioral patterns.

3. Model Evaluation & Interpretability - The model is evaluated with accuracy, precision, recall, and ROC-AUC. We also use SHAP values (or a similar attribution method) to interpret predictions and keep risk assessments transparent.

4. API Development & Deployment - After validation, the model is deployed as a real-time API with FastAPI. Lenders submit borrower data and receive instant risk predictions, so the scores can feed credit decisions within existing platforms.

5. System Testing & Validation - We simulate real-world scenarios by sending the API a range of borrower profiles and checking how well it detects loan stacking behavior.
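The evaluation step names accuracy, precision, recall, and ROC-AUC. In practice `sklearn.metrics` provides these as `accuracy_score`, `precision_score`, `recall_score`, and `roc_auc_score`; the dependency-free sketch below spells out what each one measures. ROC-AUC is computed as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter, with ties counting half. The function name and the 0.5 default threshold are our own choices.

```python
def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision and recall at `threshold`, plus threshold-free ROC-AUC."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, preds) if t == p)

    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    # Pairwise comparison, O(|pos| * |neg|): fine for a sketch, not for millions of rows.
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)

    return {
        "accuracy": correct / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "roc_auc": wins / (len(pos) * len(neg)),
    }


print(classification_metrics([0, 0, 1, 1], [0.1, 0.6, 0.4, 0.8]))
```

For a lender, recall is the share of actual defaulters the model catches, while precision is the share of flagged borrowers who really default; moving the threshold trades one against the other, whereas ROC-AUC summarizes ranking quality across all thresholds.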

## Modules and Libraries

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
```

## Data Understanding

### Reading Basic Dataset Information

```python
loan_df = pd.read_csv('mobile_loan_data.csv')
loan_df.head()
```

| | user_id | age | income | employment_status | education_level | region | number_of_active_loans | apps_installed | loan_frequency_last_30_days | repayment_ratio_overall | ... | loan_amount | interest_rate | loan_grade | loan_term_days | debt_to_income_ratio | delinquencies_last_2yrs | public_records | revolving_utilization | total_credit_lines | is_default |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8d6b7bc1-64ea-472f-bcaf-eb2aa3c0ecdb | 48 | 9091.92 | Hourly | High School | Uasin Gishu | 0 | 3 | 0 | 0.955326 | ... | 16036.688815 | 3.722139 | B | 30 | 0.19153 | 0 | 0 | 0.0 | 14 | False |
| 1 | 82a1cd7f-31c7-49b4-94ef-9a9deadcc332 | 57 | 14046.93 | Hourly | High School | Machakos | 3 | 1 | 2 | 0.793824 | ... | 7784.892343 | 14.206618 | C | 90 | 0.247405 | 0 | 0 | 0.219868 | 9 | False |
| 2 | 9a622af5-342f-47d2-bff1-05bd7617dd88 | 60 | 20660.47 | Casual | Bachelors | Garissa | 0 | 2 | 1 | 0.96927 | ... | 53524.136336 | 8.738447 | A | 90 | 0.021086 | 0 | 0 | 0.040268 | 18 | False |
| 3 | 829f4b42-c231-46b0-bfae-574a68350082 | 32 | 79820.64 | Casual | Bachelors | Nairobi City | 2 | 2 | 3 | 0.814373 | ... | 28132.474729 | 18.853501 | B | 60 | 0.125704 | 1 | 0 | 0.25324 | 5 | True |
| 4 | 27172a13-5b15-43f5-b481-4393b965970f | 32 | 38490.32 | Self-Employed | PhD | Kajiado | 6 | 11 | 24 | 0.187815 | ... | 18980.061246 | 28.965193 | F | 14 | 0.439578 | 2 | 1 | 0.837382 | 22 | True |


```python
num_rows, num_columns = loan_df.shape
print(f"Rows: {num_rows}, Columns: {num_columns}")
```

```
Rows: 10000, Columns: 22
```


```python
loan_df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 22 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   user_id                      10000 non-null  object 
 1   age                          10000 non-null  int64  
 2   income                       10000 non-null  float64
 3   employment_status            10000 non-null  object 
 4   education_level              10000 non-null  object 
 5   region                       10000 non-null  object 
 6   number_of_active_loans       10000 non-null  int64  
 7   apps_installed               10000 non-null  int64  
 8   loan_frequency_last_30_days  10000 non-null  int64  
 9   repayment_ratio_overall      10000 non-null  float64
 10  credit_limit_utilization     10000 non-null  float64
 11  device_or_ID_shared          10000 non-null  bool   
 12  loan_amount                  10000 non-null  float64
 13  interest_rate    
```

```python
loan_df.describe()
```

| | age | income | number_of_active_loans | apps_installed | loan_frequency_last_30_days | repayment_ratio_overall | credit_limit_utilization | loan_amount | interest_rate | loan_term_days | debt_to_income_ratio | delinquencies_last_2yrs | public_records | revolving_utilization | total_credit_lines |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 |
| mean | 41.4323 | 41191.97579 | 2.6715 | 4.5248 | 6.3364 | 0.684032 | 0.402382 | 25840.983429 | 20.013697 | 33.8311 | 0.291857 | 1.263 | 0.2822 | 0.327978 | 10.298 |
| std | 13.894932 | 27676.318384 | 2.652827 | 3.815107 | 7.523902 | 0.255767 | 0.277827 | 18550.730471 | 7.935258 | 25.081604 | 0.155448 | 1.452736 | 0.570259 | 0.263337 | 5.464612 |
| min | 18.0 | 1001.16 | 0.0 | 1.0 | 0.0 | 0.051682 | 0.0 | 500.0 | 2.031782 | 7.0 | 0.01 | 0.0 | 0.0 | 0.0 | 1.0 |
| 25% | 29.0 | 18293.7675 | 1.0 | 2.0 | 2.0 | 0.480339 | 0.148561 | 13300.665032 | 14.207233 | 14.0 | 0.174196 | 0.0 | 0.0 | 0.090732 | 6.0 |
| 50% | 41.0 | 34433.83 | 2.0 | 3.0 | 3.0 | 0.745939 | 0.355809 | 20984.824781 | 19.982298 | 28.0 | 0.273293 | 1.0 | 0.0 | 0.26056 | 10.0 |
| 75% | 54.0 | 63485.2975 | 4.0 | 6.0 | 8.0 | 0.909687 | 0.634475 | 32544.261941 | 24.706228 | 60.0 | 0.383386 | 2.0 | 0.0 | 0.533847 | 13.0 |
| max | 65.0 | 99990.2 | 11.0 | 16.0 | 32.0 | 1.0 | 0.991243 | 100620.826501 | 40.860496 | 90.0 | 0.7 | 5.0 | 2.0 | 0.924872 | 27.0 |
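Several columns read as proportions, so the summary statistics double as a sanity check: their minimum and maximum should fall within [0, 1]. The helper below automates that check. The list of proportion columns is our reading of the column names, not a documented schema, and `out_of_unit_range` is a hypothetical helper; `debt_to_income_ratio` is deliberately excluded because a debt-to-income ratio can legitimately exceed 1.

```python
import pandas as pd

# Columns treated as proportions (an assumption based on their names).
UNIT_RANGE_COLS = [
    "repayment_ratio_overall",
    "credit_limit_utilization",
    "revolving_utilization",
]


def out_of_unit_range(df: pd.DataFrame, cols=UNIT_RANGE_COLS) -> dict:
    """Count values outside [0, 1] for each listed column present in df."""
    return {
        col: int(((df[col] < 0) | (df[col] > 1)).sum())
        for col in cols
        if col in df.columns
    }
```

Given the min and max rows of the `describe()` output, `out_of_unit_range(loan_df)` would report 0 for all three columns.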


```python
loan_df.isnull().sum()
```

```
user_id                        0
age                            0
income                         0
employment_status              0
education_level                0
region                         0
number_of_active_loans         0
apps_installed                 0
loan_frequency_last_30_days    0
repayment_ratio_overall        0
credit_limit_utilization       0
device_or_ID_shared            0
loan_amount                    0
interest_rate                  0
loan_grade                     0
loan_term_days                 0
debt_to_income_ratio           0
delinquencies_last_2yrs        0
public_records                 0
revolving_utilization          0
total_credit_lines             0
is_default                     0
dtype: int64
```

```python
loan_df.duplicated().sum()
```

```
0
```
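`loan_df.duplicated()` only flags rows that are identical in every column. If `user_id` is meant to identify one borrower per row (our assumption), a key-based check also catches repeated borrowers whose other fields differ. `key_duplicates` is a hypothetical helper built on the real `DataFrame.duplicated(subset=..., keep=False)` parameters.

```python
import pandas as pd


def key_duplicates(df: pd.DataFrame, key: str = "user_id") -> pd.DataFrame:
    """Return every row whose `key` value occurs more than once."""
    # keep=False marks all occurrences, not only the second and later ones.
    return df[df.duplicated(subset=[key], keep=False)]


# Toy frame: no two rows are identical, but borrower "a" appears twice.
demo = pd.DataFrame({"user_id": ["a", "b", "a"], "age": [30, 40, 31]})
print(demo.duplicated().sum())    # 0
print(len(key_duplicates(demo)))  # 2
```

On the project data, an empty `key_duplicates(loan_df)` (equivalently, `loan_df["user_id"].is_unique` being `True`) would confirm one row per borrower.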