The financial sector is rapidly evolving, and Non-Banking Financial Institutions (NBFIs) are at the forefront of innovation. As digital lending grows, accurate loan default prediction has become critical for risk management and financial stability.
This repository presents an End-to-End MLOps Architecture for Loan Default Prediction designed to automate the entire machine learning lifecycle—from data ingestion to model deployment and monitoring—leveraging the power of Azure Cloud.
By implementing MLOps best practices, this solution enhances model reliability, scalability, and reproducibility—making it ready for real-world deployment in financial services.
The project follows the Medallion Architecture implemented using Azure SQL Database for structured data processing. The flow involves three main data layers:
- Data is collected from multiple sources and stored in the Azure SQL Database.
- Data is extracted from CSV files and loaded into the `bronze.Bronze_Customer` table (a minimal load sketch follows this list).
- Azure Data Factory (ADF) is used for incremental loading into Azure SQL.
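As an illustration of the bronze load, here is a minimal pandas sketch. The `bronze.Bronze_Customer` table name comes from this project; the connection string, ODBC driver, and CSV file name are placeholders to adapt to your own environment.

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder Azure SQL connection string -- substitute your own server,
# database, and credentials (ideally via environment variables or Key Vault).
CONN_STR = (
    "mssql+pyodbc://<user>:<password>@<server>.database.windows.net:1433/"
    "<database>?driver=ODBC+Driver+18+for+SQL+Server"
)
engine = create_engine(CONN_STR)

# Append the raw CSV extract to bronze.Bronze_Customer unchanged;
# the bronze layer keeps source data as-is.
raw = pd.read_csv("customer_extract.csv")  # placeholder file name
raw.to_sql("Bronze_Customer", engine, schema="bronze",
           if_exists="append", index=False)
```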
- Data is cleaned and transformed locally using Python (pandas).
- Cleaned data is loaded into the `silver.Silver_Customer` table in Azure SQL (see the cleaning sketch after this list).
- Databricks is used for transformations to ensure high-quality, structured data.
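A minimal sketch of that cleaning step, assuming a few illustrative columns (`client_id`, `income`, `signup_date`) that are not taken from the actual schema; `engine` is the Azure SQL connection from the bronze sketch above.

```python
import pandas as pd

# Read the raw bronze data (engine as in the bronze sketch above).
bronze = pd.read_sql_table("Bronze_Customer", engine, schema="bronze")

clean = (
    bronze.drop_duplicates(subset=["client_id"])  # drop duplicate customers
    # illustrative imputation -- the real rules depend on the dataset
    .assign(income=lambda df: df["income"].fillna(df["income"].median()))
)
clean["signup_date"] = pd.to_datetime(clean["signup_date"], errors="coerce")

# Overwrite the silver table with the cleaned snapshot.
clean.to_sql("Silver_Customer", engine, schema="silver",
             if_exists="replace", index=False)
```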
- Feature engineering is performed on the silver layer data.
- Transformed data is stored in the gold layer (`gold.Gold_Customer` table); a feature-engineering sketch follows this list.
- The model training pipeline uses MLflow for experiment tracking and logs the best models.
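A sketch of the feature-engineering step that builds the gold table; the derived columns (`debt_to_income`, `loan_to_value`) and their input columns are placeholders, not the project's actual features.

```python
import numpy as np
import pandas as pd

silver = pd.read_sql_table("Silver_Customer", engine, schema="silver")

# Placeholder engineered features -- substitute the project's real ones.
gold = silver.assign(
    debt_to_income=silver["total_debt"] / silver["income"].replace(0, np.nan),
    loan_to_value=silver["loan_amount"] / silver["collateral_value"],
)

gold.to_sql("Gold_Customer", engine, schema="gold",
            if_exists="replace", index=False)
```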
- Data Extraction: Load the `gold.Gold_Customer` table for model training.
- Preprocessing: Apply feature scaling, encoding, and missing value handling.
- Model Training: Train multiple candidate models: Logistic Regression, XGBoost, and LightGBM.
- MLflow Integration: Store models as artifacts in Azure ML Studio.
- Model Selection: Choose the best-performing model and register it in MLflow (a condensed training sketch follows this list).
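A condensed sketch of that pipeline with MLflow tracking and registration, shown for Logistic Regression only; the experiment name, the registered model name, and the `default_flag` target column are assumptions.

```python
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Gold-layer features; `default_flag` and `client_id` are assumed column names.
df = pd.read_sql_table("Gold_Customer", engine, schema="gold")
X = df.drop(columns=["default_flag", "client_id"])
y = df["default_flag"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

mlflow.set_experiment("loan-default-prediction")  # assumed experiment name

with mlflow.start_run() as run:
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    mlflow.log_metric("test_auc", auc)
    mlflow.sklearn.log_model(model, artifact_path="model")
    # Register so the deployment step can pull the model by name.
    mlflow.register_model(f"runs:/{run.info.run_id}/model", "loan-default-model")
```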
- The best model is retrieved from Azure ML Studio and packaged inside a Flask Web API (a minimal endpoint sketch follows this section).
- The API is containerized using Docker and pushed to Azure Container Registry (ACR).
- The `Dockerfile` ensures that dependencies are installed correctly and the Flask app runs smoothly.
- Azure Service Principal is used to securely access Azure ML & ACR.
- The service principal credentials are stored in a Kubernetes Secret for authentication.
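A minimal sketch of such a Flask endpoint. It assumes the model was registered as `loan-default-model` (as in the training sketch) and that the MLflow tracking URI is injected as an environment variable from the Kubernetes Secret.

```python
import os

import mlflow.sklearn
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)

# Tracking URI comes from an env var populated by the Kubernetes Secret;
# the model name and "Production" stage are assumptions.
mlflow.set_tracking_uri(os.environ["MLFLOW_TRACKING_URI"])
model = mlflow.sklearn.load_model("models:/loan-default-model/Production")

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()        # raw client features as JSON
    features = pd.DataFrame([payload])  # single-row frame for the model
    proba = float(model.predict_proba(features)[0, 1])
    return jsonify({
        "prediction_probability": proba,
        "predicted_class": int(proba >= 0.5),
    })

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```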
- Azure Kubernetes Service (AKS) is used to deploy the containerized model.
- Steps involved:
- Create an AKS Cluster.
- Deploy the Flask API as a Kubernetes pod.
- Use a LoadBalancer service to expose the API.
- Store environment variables securely using Kubernetes Secrets.
- Every model prediction is stored in the `Predictions` table (a logging sketch follows this section). The stored data includes:
  - `run_id` (MLflow Run ID)
  - `client_id` (User Identifier)
  - `input_raw_data` (JSON format)
  - `processed_data` (Final model input)
  - `prediction_probability`
  - `predicted_class`
  - `prediction_timestamp`
- Azure Monitor & Log Analytics track API performance and error logs.
- Application Insights helps in real-time monitoring of API health.
- Prometheus & Grafana are planned for model drift detection.
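A sketch of how each prediction could be written to the `Predictions` table with SQLAlchemy; the function signature and timestamp handling are illustrative, while the columns mirror the list above.

```python
import json
from datetime import datetime, timezone

from sqlalchemy import text

def log_prediction(engine, run_id, client_id, raw, processed, proba, label):
    """Insert one record into the Predictions table (columns as listed above)."""
    with engine.begin() as conn:
        conn.execute(
            text(
                "INSERT INTO Predictions (run_id, client_id, input_raw_data, "
                "processed_data, prediction_probability, predicted_class, "
                "prediction_timestamp) VALUES (:run_id, :client_id, :raw, "
                ":processed, :proba, :label, :ts)"
            ),
            {
                "run_id": run_id,
                "client_id": client_id,
                "raw": json.dumps(raw),              # original request JSON
                "processed": json.dumps(processed),  # final model input
                "proba": proba,
                "label": label,
                "ts": datetime.now(timezone.utc),
            },
        )
```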
The CI/CD pipeline automates model deployment when new code is pushed to GitHub.
- Code Push & Trigger CI/CD: GitHub Actions trigger on `main` branch commits.
- Build & Test: Runs unit tests, linting, and security scans.
- Build Docker Image: Creates a new image with the latest model.
- Push to Azure Container Registry (ACR).
- Deploy to AKS using a Kubernetes YAML configuration.
- Monitor Deployment Status using `kubectl` commands.
- Implement Model Retraining Pipeline with AutoML in Azure ML.
- Enable Drift Detection using Evidently AI & Prometheus (a minimal Evidently sketch follows this list).
- Extend the system to streaming data using Azure Event Hubs.
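For the planned drift detection, a minimal sketch using Evidently's `Report` API (as in Evidently 0.4.x); the parquet file names and the Prometheus hand-off are assumptions.

```python
import pandas as pd
from evidently.metric_preset import DataDriftPreset
from evidently.report import Report

# Assumed inputs: the training-time gold snapshot vs. recently scored
# feature rows with the same columns (file names are placeholders).
reference = pd.read_parquet("gold_training_snapshot.parquet")
current = pd.read_parquet("recent_inference_features.parquet")

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("drift_report.html")  # human-readable summary

# Dataset-level drift flag -- could be exported as a Prometheus gauge.
drifted = report.as_dict()["metrics"][0]["result"]["dataset_drift"]
print("Dataset drift detected:", drifted)
```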
This project demonstrates a fully automated ML pipeline from data ingestion to model deployment with Azure services. It ensures scalability, security, and automation for real-world financial applications.
Author: Rohit Kosamkar
📌 Repository: GitHub Repo