GitHub - T-Letuka/Finflow

FinFlow Informal Sector Credit Profiling

Overview

FinFlow is a data-driven project focused on building alternative credit profiling methods for individuals in the informal economy. The goal is to assess creditworthiness for users who lack traditional financial records but demonstrate consistent income patterns.

Problem Statement

Access to formal credit in many emerging markets depends heavily on documented financial histories such as payslips, bank statements, and credit bureau records. However, a large segment of the workforce, including spaza shop owners, street vendors, and gig workers, operates within the informal economy and lacks these traditional financial footprints.

Despite generating consistent and often predictable income, these individuals are systematically excluded from credit systems, not due to risk, but due to lack of measurable visibility. This creates a structural gap where creditworthiness exists in reality but cannot be quantified using conventional models.

As a result:

Financial institutions miss out on a large, underserved market
Informal workers rely on high-interest or predatory lending
Economic mobility is constrained despite active participation in commerce

FinFlow aims to bridge this gap by developing alternative credit profiling approaches using behavioral, transactional, and proxy financial data to better represent the true financial reliability of informal sector participants.

Objectives

Develop a framework for evaluating creditworthiness without traditional credit history
Identify proxy indicators of financial reliability
Build a predictive model for credit risk classification
Provide insights that could support inclusive lending strategies
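To make the "proxy indicators" objective more concrete, the sketch below scores how regular an income stream is. It is illustrative only: the function name `income_stability` and the weekly figures are hypothetical and are not taken from the FinFlow codebase or dataset. It uses the coefficient of variation (standard deviation divided by mean), so lower values indicate steadier income.

```python
from statistics import mean, pstdev


def income_stability(weekly_income):
    """Return the coefficient of variation (std / mean) of an income series.

    Hypothetical proxy feature: lower values mean steadier income.
    Returns None when the series is empty or its mean is zero.
    """
    if not weekly_income:
        return None
    avg = mean(weekly_income)
    if avg == 0:
        return None
    return pstdev(weekly_income) / avg


# Made-up weekly takings for two illustrative traders (not project data).
# Both series have the same mean (950), so only the variability differs.
steady_vendor = [900, 950, 1000, 920, 980]
erratic_vendor = [200, 1800, 0, 1500, 1250]

steady_cv = income_stability(steady_vendor)
erratic_cv = income_stability(erratic_vendor)
print(f"steady: {steady_cv:.2f}, erratic: {erratic_cv:.2f}")
```

Because both traders earn the same on average, a feature like this separates them where a simple income total would not. Whether it predicts repayment is an empirical question for the model.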

Dataset

Source: Synthetic
Size: 500 rows, 42 features

Model Performance & Trade-offs

This project was developed under realistic constraints typical of informal sector data:

~500 records
~15% default rate
Limited representation of true defaulters

The baseline model achieved an ROC-AUC of ~0.70, reflecting the limitations of small, imbalanced datasets. Rather than optimizing purely for AUC, the modeling strategy focused on decision-relevant performance, specifically the ability to identify high-risk borrowers.
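For readers less familiar with the metric, ROC-AUC can be read as the probability that a randomly chosen defaulter receives a higher risk score than a randomly chosen non-defaulter, with ties counted as half. The pure-Python sketch below computes it by pairwise comparison. The function name `roc_auc` and the toy scores are illustrative assumptions, not the project's code or data; the toy values are chosen so that the result is 0.70.

```python
def roc_auc(y_true, scores):
    """ROC-AUC by pairwise comparison.

    Returns the fraction of (defaulter, non-defaulter) pairs in which the
    defaulter has the higher score; ties count as 0.5.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = default) and risk scores, made up for illustration
y_true = [1, 1, 0, 0, 0, 0, 0]
scores = [0.8, 0.3, 0.6, 0.4, 0.2, 0.1, 0.35]

auc = roc_auc(y_true, scores)
print(f"ROC-AUC = {auc:.2f}")  # ROC-AUC = 0.70
```

A score of 0.70 therefore means the model ranks a defaulter above a non-defaulter in about 7 of 10 such pairs. It says nothing about how many defaulters are caught at a given decision threshold, which is why AUC alone is not the target here. In practice a library routine such as scikit-learn's `roc_auc_score` would normally replace this quadratic loop.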

Interpretation

At face value, the optimized model appears to perform worse because its accuracy and AUC are lower. However, this reflects a deliberate shift in objective rather than a degradation in usefulness.

The baseline model achieves high accuracy by predicting most applicants as non-default
This results in missing the majority of defaulters, which is costly in a lending context
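The arithmetic behind this is worth spelling out. With roughly 15% defaulters, a model that labels every applicant as non-default is already about 85% accurate while catching no defaulters at all. The sketch below uses a hypothetical 100-applicant sample (85 non-default, 15 default); the helper name `accuracy_and_recall` is an assumption for illustration and does not come from the repository.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall on the default class), where 1 = default."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(y_true)
    accuracy = correct / len(y_true)
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


# Hypothetical sample with a 15% default rate
y_true = [1] * 15 + [0] * 85

# A "model" that predicts non-default for every applicant
always_non_default = [0] * 100

acc, rec = accuracy_and_recall(y_true, always_non_default)
print(f"accuracy={acc:.2f}, default recall={rec:.2f}")
# accuracy=0.85, default recall=0.00
```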

The optimized model corrects this by:

Increasing default recall from 6.7% → 93.3%
Maintaining comparable precision
Accepting lower accuracy as a trade-off for better risk detection
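This README does not state how the optimized model reaches the higher recall. One common technique, shown here purely as an assumption, is to lower the probability threshold at which an applicant is flagged as high risk. The labels and probabilities below are made up, and `recall_precision_at` is a hypothetical helper rather than part of the project.

```python
def recall_precision_at(threshold, y_true, probs):
    """Flag applicants whose predicted default probability is >= threshold,
    then return (recall, precision) for the default class (1 = default)."""
    flagged = [p >= threshold for p in probs]
    tp = sum(1 for t, f in zip(y_true, flagged) if t == 1 and f)
    fp = sum(1 for t, f in zip(y_true, flagged) if t == 0 and f)
    fn = sum(1 for t, f in zip(y_true, flagged) if t == 1 and not f)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Made-up labels and predicted default probabilities (not project data)
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
probs = [0.55, 0.35, 0.25, 0.40, 0.30, 0.15, 0.10, 0.10, 0.05, 0.05]

for threshold in (0.5, 0.2):
    recall, precision = recall_precision_at(threshold, y_true, probs)
    print(f"threshold={threshold}: recall={recall:.2f}, precision={precision:.2f}")
```

In this toy data, lowering the threshold from 0.5 to 0.2 raises recall from one in three defaulters to all three, while precision falls because two non-defaulters are also flagged. How much precision is given up in practice depends on the model and the data.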

finflow/
├── data/
│   └── generate_mock_data.py
├── sql/
│   ├── 01_cohort_analysis.sql
│   ├── 02_repayment_scoring.sql
│   └── 03_risk_segmentation.sql
├── notebooks/
│   ├── 01_eda.ipynb
│   ├── 02_feature_engineering.ipynb
│   ├── 03_credit_scoring.ipynb
│   └── 04_shap_explainability.ipynb
├── exports/
│   └── (all your CSVs)
├── dashboard/
│   ├── FinFlow_Dashboard.pdf
│   ├── page1_portfolio.png
│   ├── page2_geography.png
│   └── page3_narrative.png
└── README.md

