Customer Retention System

A comprehensive machine learning system that handles the entire ML lifecycle from data ingestion to model deployment, focused on predicting and improving customer retention.

Project Overview

This project implements a complete machine learning pipeline for customer retention prediction with:

Data ingestion and processing
Model training and hyperparameter tuning
Model evaluation and comparison
Model deployment as microservices
Model monitoring and performance tracking
User-friendly interfaces for interacting with the system

The system uses a microservices architecture built on Docker containers, so each service can be scaled and deployed independently.

Architecture

customer-retention-system/
├── .github/workflows/      # CI pipeline configuration
├── assets/                 # Project images and resources
├── configs/                # Configuration management
├── data/                   # Input data storage
├── databases/              # Database scripts and migrations
├── docker-compose.yml      # Container orchestration
├── example-env.sh          # Example .env
├── migrations/             # Database migrations
├── notebooks/              # Experimental Jupyter notebooks
├── prepare_db.sh           # Database preparation script
├── requirements.txt        # Dependencies
├── src/
│   ├── core/               # Core functionality
│   │   ├── common/         # Shared utilities
│   │   ├── ingestion/      # Data ingestion and processing
│   │   ├── training/       # Model training pipeline
│   │   └── deployment/     # Model deployment services
│   ├── monitoring/         # Monitoring services (Prefect)
│   ├── streamlit_ui/       # Streamlit user interface
│   └── tmp/                # Temporary files
└── tests/                  # Automated tests

Key Features

End-to-End ML Pipeline: Handles all aspects of the machine learning lifecycle
Microservices Architecture: Components are containerized for scalability and isolation
Model Versioning & Tracking: Integrated with MLflow for experiment tracking
Workflow Orchestration: Uses Prefect to schedule and run pipeline workflows
CI Integration: Automated testing with GitHub Actions
Multiple Interfaces: Flask API endpoints and Streamlit UI for interacting with the system
Extensible Model Framework: Easily add or swap models as needed

Models Implemented

The system includes several regression models for customer retention prediction:

Linear Regression
Decision Tree Regression
Random Forest Regression
XGBoost Regression

The modular design makes it easy to add new models to the pipeline based on your specific needs.

Adaptability

While built for customer retention prediction, this framework can be readily adapted for other machine learning use cases by:

Updating data ingestion components for new data sources
Modifying preprocessing steps for different data types
Adding new machine learning models appropriate for the specific problem
Adjusting evaluation metrics based on the new objectives
Customizing the UI/API endpoints for the new application

The microservices architecture ensures that individual components can be modified without disrupting the entire system.

Workflow Orchestration with Prefect

The project uses Prefect for orchestrating the ML pipeline workflows. Below are visualizations of the flows:

Scheduled Runs

Flow Diagram

Technologies Used

Machine Learning: scikit-learn, XGBoost, TensorFlow/Keras
Workflow Orchestration: Prefect
Model Tracking: MLflow
Containerization: Docker
Database: SQLite
Web Interfaces: Flask, Streamlit
CI: GitHub Actions

Getting Started

Prerequisites

Docker and Docker Compose
Python 3.8+

Installation

Clone the repository

git clone https://github.com/Godwin-T/auto-customer-retention.git customer-retention-system
cd customer-retention-system

Run the system using Docker Compose

chmod +x prepare_db.sh
./prepare_db.sh
docker-compose up

Access the Streamlit UI at http://localhost:8501

Usage

API Endpoints

/predict - Get retention predictions for new customers
/process - Process new data
/train - Train models with specified parameters
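As an illustration of calling the /predict endpoint, here is a minimal client sketch using only the Python standard library. Several details are assumptions, not taken from this project: the port (5000 is Flask's default), the JSON payload shape with a "customers" key, the feature names, and the helper names build_predict_request and send. Check the deployment service for the actual request schema.

```python
import json
from urllib import request

# Assumption: the API listens on Flask's default port.
BASE_URL = "http://localhost:5000"


def build_predict_request(customers, base_url=BASE_URL):
    """Build a POST request for /predict without sending it.

    The {"customers": [...]} payload shape is hypothetical.
    """
    body = json.dumps({"customers": customers}).encode("utf-8")
    return request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req, timeout=10.0):
    """Send a prepared request and decode the JSON response."""
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical feature names, for illustration only.
    req = build_predict_request([{"tenure": 12, "monthly_charges": 29.9}])
    print(send(req))
```

Building the request separately from sending it makes the payload easy to inspect or log before anything goes over the network.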

Streamlit UI

The Streamlit interface provides an interactive way to:

Upload customer data
Process and save data to database
Configure and train models with custom parameters
View model performance metrics and comparisons
Deploy models to production

Future Improvements

Database Migration

The current system uses SQLite, which has limitations:

Concurrency Issues: SQLite allows only one writer at a time, so concurrent write operations can fail with "database is locked" errors
Scalability Constraints: Not suitable for high-volume production environments
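The locking limitation above can be reproduced with Python's standard sqlite3 module. This is a minimal sketch, not code from this project: the table name and the function name second_writer_error are hypothetical, and timeout=0 is chosen so the second writer fails immediately instead of waiting for the lock.

```python
import os
import sqlite3
import tempfile


def second_writer_error():
    """Return the error a second writer sees while a write is in progress.

    Returns None if the second writer unexpectedly gets the lock.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None disables implicit transactions, so the
        # explicit BEGIN statements below control locking directly.
        first = sqlite3.connect(path, timeout=0, isolation_level=None)
        second = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            first.execute(
                "CREATE TABLE customers (id INTEGER PRIMARY KEY, retained INTEGER)"
            )
            # BEGIN IMMEDIATE takes the database write lock up front.
            first.execute("BEGIN IMMEDIATE")
            first.execute("INSERT INTO customers (retained) VALUES (1)")
            try:
                # A second connection cannot start a write transaction
                # while the first one still holds the lock.
                second.execute("BEGIN IMMEDIATE")
            except sqlite3.OperationalError as exc:
                return str(exc)
            second.execute("ROLLBACK")
            return None
        finally:
            if first.in_transaction:
                first.execute("ROLLBACK")
            first.close()
            second.close()


if __name__ == "__main__":
    print(second_writer_error())
```

In a multi-service setup, this is the failure mode to expect when two containers write to the same SQLite file at once. A longer timeout reduces the errors but serializes the writers.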

Planned PostgreSQL Implementation:

Better handling of concurrent operations
Improved performance for larger datasets
More robust transaction support
Better integration with monitoring tools

Model Enhancements

Advanced Hyperparameter Tuning: Implement Grid Search and Random Search for optimal parameter selection
AutoML Integration: Add support for automated model selection and feature engineering
Ensemble Methods: Implement stacking and blending techniques for improved prediction accuracy
Custom Loss Functions: Support for specialized loss functions tailored to retention prediction

Flexibility Improvements

Parameter Customization: Enhanced configuration options for all pipeline stages
Feature Selection Algorithms: Automatic feature importance analysis and selection
Data Drift Detection: Automated detection of data distribution changes
A/B Testing Framework: Compare model versions in production

Development

Adding New Models

Create a new model class in src/core/training/
Register the model in the model registry
Update configuration as needed
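The steps above can be sketched as a small registry pattern. This is an illustrative sketch only: MODEL_REGISTRY, register_model, create_model, and MeanBaseline are hypothetical names, and the project's actual registry in src/core/training/ may look different. The example model uses a scikit-learn style fit/predict interface as an assumption.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a model name to a factory that builds it.
MODEL_REGISTRY: Dict[str, Callable[[], object]] = {}


def register_model(name):
    """Class decorator that adds a model class to the registry."""
    def decorator(cls):
        if name in MODEL_REGISTRY:
            raise ValueError(f"model {name!r} is already registered")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


@register_model("mean_baseline")
class MeanBaseline:
    """Trivial regressor that always predicts the mean training target."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def create_model(name):
    """Instantiate a registered model by name."""
    try:
        return MODEL_REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown model {name!r}") from None


if __name__ == "__main__":
    model = create_model("mean_baseline").fit([[1], [2]], [0.0, 1.0])
    print(model.predict([[3]]))
```

With a registry like this, the training pipeline can select a model by the name given in configuration, so adding a model does not require changes to the pipeline code itself.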

Database Migration Guide

To switch from SQLite to PostgreSQL:

Update database connection settings in config.yaml
Run migration script: python src/core/common/db_migration.py
Update Docker Compose file to include PostgreSQL service
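To illustrate the first step, the sketch below builds a database connection URL from a settings dictionary, such as one loaded from config.yaml. The settings keys (backend, path, user, password, host, port, name) and the function name build_database_url are assumptions made for this example, and the URL format follows the SQLAlchemy convention, which this project may or may not use.

```python
from urllib.parse import quote_plus


def build_database_url(settings):
    """Build a SQLAlchemy-style connection URL from a settings dict.

    The keys used here are hypothetical; adapt them to config.yaml.
    """
    backend = settings.get("backend", "sqlite")
    if backend == "sqlite":
        # Three slashes followed by a relative path to the database file.
        return f"sqlite:///{settings['path']}"
    if backend == "postgresql":
        # Escape credentials so characters like "@" do not break the URL.
        user = quote_plus(settings["user"])
        password = quote_plus(settings["password"])
        host = settings.get("host", "localhost")
        port = settings.get("port", 5432)
        return f"postgresql://{user}:{password}@{host}:{port}/{settings['name']}"
    raise ValueError(f"unsupported database backend: {backend!r}")


if __name__ == "__main__":
    print(build_database_url({"backend": "sqlite", "path": "data/app.db"}))
    print(build_database_url({
        "backend": "postgresql",
        "user": "retention",
        "password": "p@ss",
        "host": "db",
        "name": "retention",
    }))
```

Keeping the backend choice in configuration means the services can switch databases without code changes, provided the host value matches the PostgreSQL service name in the Docker Compose file.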
