FinFlow Informal Sector Credit Profiling
- Overview
FinFlow is a data-driven project focused on building alternative credit profiling methods for individuals in the informal economy. The goal is to assess the creditworthiness of users who lack traditional financial records but show consistent income patterns.
- Problem Statement
Access to formal credit in many emerging markets is heavily dependent on documented financial histories such as payslips, bank statements, and credit bureau records. However, a large segment of the workforce, including spaza shop owners, street vendors, and gig workers, operates within the informal economy and lacks these traditional financial footprints.
Despite generating consistent and often predictable income, these individuals are systematically excluded from credit systems, not because they have been shown to be high-risk but because their finances are not visible in a measurable form. This creates a structural gap: creditworthiness exists in reality but cannot be quantified by conventional models.
As a result:
- Financial institutions miss out on a large, underserved market
- Informal workers rely on high-interest or predatory lending
- Economic mobility is constrained despite active participation in commerce
FinFlow aims to bridge this gap by developing alternative credit profiling approaches using behavioral, transactional, and proxy financial data to better represent the true financial reliability of informal sector participants.
- Objectives
- Develop a framework for evaluating creditworthiness without traditional credit history
- Identify proxy indicators of financial reliability
- Build a predictive model for credit risk classification
- Provide insights that could support inclusive lending strategies
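As an illustration of the "proxy indicators" objective, the sketch below turns a borrower's weekly inflows into three simple reliability signals: average inflow, income volatility (coefficient of variation), and the share of weeks with any activity. The input format, the function name `proxy_indicators`, and the choice of indicators are all hypothetical; they are not the project's actual 42 features.

```python
import statistics
from dataclasses import dataclass


@dataclass
class ProxyIndicators:
    mean_weekly_inflow: float
    inflow_cv: float          # coefficient of variation; lower means steadier income
    active_week_ratio: float  # share of weeks with any recorded inflow


def proxy_indicators(weekly_inflows: list[float]) -> ProxyIndicators:
    """Summarise weekly inflows into hypothetical reliability proxies."""
    if not weekly_inflows:
        raise ValueError("need at least one week of inflows")
    mean = statistics.fmean(weekly_inflows)
    # pstdev treats the observed weeks as the full history, not a sample
    spread = statistics.pstdev(weekly_inflows)
    cv = spread / mean if mean > 0 else float("inf")
    active = sum(1 for x in weekly_inflows if x > 0) / len(weekly_inflows)
    return ProxyIndicators(mean, cv, active)


steady_vendor = proxy_indicators([1000.0, 1000.0, 1000.0, 1000.0])
irregular_earner = proxy_indicators([0.0, 2000.0, 0.0, 2000.0])
print(steady_vendor, irregular_earner)
```

Both borrowers average 1,000 per week, but the first has zero volatility and is active every week, while the second has a coefficient of variation of 1.0 and is active only half the time. That difference is exactly what a payslip-based model cannot see.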
- Dataset
Source: Synthetic
Size: 500 rows, 42 features
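With only 500 rows and a minority default class, a purely random train/test split can leave very few defaulters in the test set. The sketch below shows a stratified split that preserves the default rate in each part. It assumes binary labels (1 = default) and the ~15% default rate described under Model Performance; `make_labels` is a hypothetical stand-in for the label column of `generate_mock_data.py`, whose actual contents are not shown here.

```python
import random


def make_labels(n: int = 500, default_rate: float = 0.15, seed: int = 42) -> list[int]:
    """Hypothetical stand-in for the label column of generate_mock_data.py.

    Produces exactly round(n * default_rate) defaults (1) among n rows, shuffled.
    """
    rng = random.Random(seed)
    n_default = round(n * default_rate)
    labels = [1] * n_default + [0] * (n - n_default)
    rng.shuffle(labels)
    return labels


def stratified_split(
    labels: list[int], test_frac: float = 0.2, seed: int = 42
) -> tuple[list[int], list[int]]:
    """Split row indices so train and test each keep the overall default rate."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        k = round(len(idx) * test_frac)  # rows of this class sent to the test set
        test.extend(idx[:k])
        train.extend(idx[k:])
    return sorted(train), sorted(test)


labels = make_labels()
train_idx, test_idx = stratified_split(labels)
print(len(test_idx), sum(labels[i] for i in test_idx))  # test size, defaulters in test
```

With these defaults the test set holds 100 rows, 15 of them defaulters. In practice, scikit-learn's `train_test_split(..., stratify=y)` does the same job.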
- Model Performance & Trade-offs
This project was developed under realistic constraints typical of informal sector data:
- ~500 records
- ~15% default rate
- Limited representation of true defaulters
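Because only about 15% of applicants default, accuracy is a weak yardstick: a model that approves everyone is already right about 85% of the time. The sketch below makes that concrete on a hypothetical 100-applicant hold-out set with 15 defaulters. The helper `classification_summary` is illustrative, not part of the project code.

```python
import math


def classification_summary(labels: list[int], preds: list[int]) -> dict[str, float]:
    """Accuracy plus default-class recall and precision (1 = default, 0 = repaid)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(labels),
        # share of actual defaulters that the model flags
        "default_recall": tp / (tp + fn) if tp + fn else math.nan,
        # undefined when nothing is flagged, so report NaN rather than 0
        "default_precision": tp / (tp + fp) if tp + fp else math.nan,
    }


# Hypothetical hold-out set: 100 applicants, 15 of whom default (~15%)
labels = [1] * 15 + [0] * 85
approve_everyone = [0] * len(labels)  # predicts "no default" for every applicant
print(classification_summary(labels, approve_everyone))
```

This trivial model scores 85% accuracy while flagging none of the 15 defaulters, which is why default-class recall is the more decision-relevant metric here.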
The baseline model achieved an ROC-AUC of ~0.70, reflecting the limitations of a small, imbalanced dataset. Rather than optimizing purely for AUC, the modeling strategy focused on decision-relevant performance: the ability to identify high-risk borrowers, measured as recall on the default class (the share of actual defaulters the model flags).
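ROC-AUC has a useful ranking interpretation. It equals the probability that a randomly chosen defaulter receives a higher risk score than a randomly chosen non-defaulter, with ties counted as half. An AUC of ~0.70 therefore means the model orders such a pair correctly about 70% of the time. The pairwise sketch below computes it directly. For binary labels it gives the same value as scikit-learn's `roc_auc_score`, although its O(P×N) loop is only practical for small datasets like this one.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC-AUC as P(random defaulter outranks random non-defaulter).

    Pairwise (Mann-Whitney) form; ties count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defaulter and one non-defaulter")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four defaulter/non-defaulter pairs are ranked correctly
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Note that AUC is threshold-free. It says nothing about how many defaulters are caught at the cut-off actually used for lending decisions, which is why the project also looks at recall.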
- Interpretation
At face value, the optimized model looks worse because both its accuracy and its AUC are lower than the baseline's. However, this reflects a deliberate change of objective rather than a loss of usefulness.
The baseline model achieves high accuracy by predicting that most applicants will not default. As a result, it misses the majority of defaulters, which is costly in lending because each missed defaulter is an approved loan that goes unpaid.
The optimized model corrects this by:
- Increasing default recall from 6.7% to 93.3%
- Maintaining comparable precision
- Accepting lower accuracy as the trade-off for better risk detection
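This section does not say exactly how the optimized model raises recall. One common lever is lowering the decision threshold on the predicted default probability, and the sketch below illustrates that technique under stated assumptions. It assumes the model outputs a per-applicant default probability, and the helpers `precision_recall_at` and `threshold_for_recall` are hypothetical. For scale, the reported figures are consistent with, for example, catching 1 and then 14 of 15 defaulters in a hold-out set. Pick the cut-off on a validation split, not on the test set used to report recall.

```python
def precision_recall_at(
    labels: list[int], scores: list[float], threshold: float
) -> tuple[float, float]:
    """Default-class precision and recall when scores >= threshold are flagged."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    flagged = sum(1 for s in scores if s >= threshold)
    positives = sum(labels)
    precision = tp / flagged if flagged else float("nan")
    recall = tp / positives if positives else float("nan")
    return precision, recall


def threshold_for_recall(labels: list[int], scores: list[float], target: float) -> float:
    """Highest cut-off whose default recall reaches `target`.

    Candidate cut-offs are the observed scores, tried from high to low.
    Recall can only stay level or rise as the cut-off falls, so the first
    hit flags as few applicants as possible.
    """
    if not 0 < target <= 1:
        raise ValueError("target recall must be in (0, 1]")
    if sum(labels) == 0:
        raise ValueError("need at least one defaulter to measure recall")
    for t in sorted(set(scores), reverse=True):
        _, recall = precision_recall_at(labels, scores, t)
        if recall >= target:
            return t
    # The lowest observed score flags everyone (recall 1.0), so this is unreachable.
    raise AssertionError("no threshold reached the target recall")


# Hypothetical validation scores: predicted probability of default per applicant
labels = [0, 0, 0, 1, 0, 1]
scores = [0.05, 0.10, 0.30, 0.35, 0.60, 0.80]
print(precision_recall_at(labels, scores, 0.5))  # conventional 0.5 cut-off
t = threshold_for_recall(labels, scores, target=1.0)
print(t, precision_recall_at(labels, scores, t))
```

Here the cut-off drops from 0.5 to 0.35 to catch both defaulters. On real data, the cost is usually more good applicants flagged for review, which shows up as lower accuracy.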
- Project Structure
finflow/
├── data/
│   └── generate_mock_data.py
├── sql/
│   ├── 01_cohort_analysis.sql
│   ├── 02_repayment_scoring.sql
│   └── 03_risk_segmentation.sql
├── notebooks/
│   ├── 01_eda.ipynb
│   ├── 02_feature_engineering.ipynb
│   ├── 03_credit_scoring.ipynb
│   └── 04_shap_explainability.ipynb
├── exports/
│   └── (exported CSV files)
├── dashboard/
│   ├── FinFlow_Dashboard.pdf
│   ├── page1_portfolio.png
│   ├── page2_geography.png
│   └── page3_narrative.png
└── README.md