E-Commerce Analytics System

This project implements an intelligent e-commerce analytics system that uses machine learning for customer segmentation, churn prediction, and Customer Lifetime Value (CLV) prediction. The system automatically detects changes in the underlying data and retrains its models when necessary.

Features

  • Data Ingestion: Automatic detection of data file changes with hash-based verification (see the sketch after this list)
  • Data Processing: Cleaning and feature engineering (RFM metrics, return rates, etc.)
  • Customer Segmentation: K-Means clustering for customer grouping
  • Churn Prediction: Classification model with GridSearch optimization
  • CLV Prediction: Regression model for lifetime value estimation
  • REST API: FastAPI-based endpoints for predictions and data retrieval
  • Database Integration: SQLite for data storage and retrieval
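
The hash-based change detection could look roughly like the following. This is a minimal sketch, not the actual contents of data_ingestion.py; the state-file layout ({"hash": ...} in data/data_state.json) is an assumption based on the project structure.

import hashlib
import json
from pathlib import Path

STATE_FILE = Path("data/data_state.json")  # change-tracking file from the project layout

def file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def data_changed(csv_path: Path) -> bool:
    """Compare the file's current hash against the last recorded one."""
    current = file_hash(csv_path)
    previous = None
    if STATE_FILE.exists():
        previous = json.loads(STATE_FILE.read_text()).get("hash")
    if current != previous:
        STATE_FILE.write_text(json.dumps({"hash": current}))
        return True
    return False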

Architecture

The system follows a modular pipeline architecture:

  1. Data Ingestion (data_ingestion.py): Loads and validates CSV data
  2. Data Processing (data_processor.py): Cleans data and creates features
  3. Model Training (model_trainer.py): Trains ML models and saves artifacts
  4. API Service (api.py): Provides REST endpoints for predictions
  5. Main Orchestrator (main.py): Coordinates the entire pipeline (sketched below)
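
A minimal sketch of how main.py might wire these stages together. The stage functions below are stubs with hypothetical names; the real logic lives in the src/ modules listed above.

def data_changed(path: str) -> bool:            # real check: src/data_ingestion.py
    return True

def build_features(path: str) -> dict:          # real logic: src/data_processor.py
    return {"features": "..."}

def train_all_models(features: dict) -> None:   # real logic: src/model_trainer.py
    print("training segmentation, churn, and CLV models")

def main() -> None:
    csv = "data/raw/online_retail_II.csv"
    if not data_changed(csv):
        print("Data unchanged; skipping retraining.")
        return
    train_all_models(build_features(csv))
    print("Models saved to artifacts/")

if __name__ == "__main__":
    main()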

Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Setup Steps

  1. Clone or download the project:

    git clone <repository-url>
    cd OnlineRetailMachineLearningProject
  2. Create a virtual environment:

    python -m venv .venv
  3. Activate the virtual environment:

    • Windows: .venv\Scripts\activate
    • Linux/Mac: source .venv/bin/activate
  4. Install dependencies:

    pip install -r requirements.txt
  5. Place your data file:

    • Copy your online_retail_II.csv file to the data/raw/ directory
    • The system supports both 'Invoice' and 'InvoiceNo' column names

Usage

Training Models

Run the main script to process data and train models:

python main.py

The system will:

  • Check for data changes
  • Process and clean data if needed
  • Train models (segmentation, churn, CLV)
  • Save models to artifacts/ directory
  • Display a status report

Starting the API Server

After training, start the FastAPI server:

uvicorn src.api:app --reload

The API will be available at http://localhost:8000

API Documentation

Visit http://localhost:8000/docs for interactive API documentation.

Endpoints

  • GET /: Welcome message
  • GET /customer/{customer_id}: Get predictions for a specific customer
  • POST /predict/live: Make live predictions with custom data
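
A minimal sketch of how these endpoints might be declared with FastAPI. The request schema mirrors the live-prediction example below; the placeholder responses stand in for the real model inference and SQLite lookups in src/api.py.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="E-Commerce Analytics API")

class LiveFeatures(BaseModel):
    recency: float
    frequency: float
    monetary: float
    avg_basket: float
    return_rate: float

@app.get("/")
def root():
    return {"message": "Welcome to the E-Commerce Analytics API"}

@app.get("/customer/{customer_id}")
def get_customer(customer_id: int):
    # Real service: look up stored predictions for this customer in SQLite.
    return {"customer_id": customer_id, "segment": None, "churn_risk": None}

@app.post("/predict/live")
def predict_live(features: LiveFeatures):
    # Real service: feed these features to the saved churn and CLV models.
    return {"churn_prediction": None, "clv_prediction": None, "inputs": dict(features)}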

Example API Usage

Get customer predictions:

curl http://localhost:8000/customer/12345

Live prediction:

curl -X POST "http://localhost:8000/predict/live" \
     -H "Content-Type: application/json" \
     -d '{
       "recency": 30,
       "frequency": 5,
       "monetary": 1500.0,
       "avg_basket": 300.0,
       "return_rate": 0.1
     }'

Data Format

The system expects CSV data with the following columns:

  • Customer ID (numeric)
  • Invoice/InvoiceNo (transaction identifier)
  • Quantity (numeric)
  • Price/UnitPrice (numeric)
  • InvoiceDate (datetime)
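
From these columns, the RFM-style features used by the models can be derived roughly as follows. This is a sketch assuming pandas and the 'Invoice'/'Price' column variants; data_processor.py also handles the 'InvoiceNo'/'UnitPrice' names.

import pandas as pd

df = pd.read_csv("data/raw/online_retail_II.csv", parse_dates=["InvoiceDate"])
df["TotalPrice"] = df["Quantity"] * df["Price"]

# Recency is measured from the day after the last transaction in the data.
snapshot = df["InvoiceDate"].max() + pd.Timedelta(days=1)
rfm = df.groupby("Customer ID").agg(
    Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),
    Frequency=("Invoice", "nunique"),
    Monetary=("TotalPrice", "sum"),
)
rfm["AvgBasketSize"] = rfm["Monetary"] / rfm["Frequency"]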

Model Details

Customer Segmentation

  • Algorithm: K-Means Clustering
  • Features: Recency, Frequency, Monetary, AvgBasketSize
  • Number of segments: 4 (configurable)
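
A sketch of the segmentation step under these settings. Feature scaling is assumed (standard practice for K-Means, since it is distance-based); the tiny frame stands in for the RFM table built above.

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# `rfm` as built in the Data Format sketch; a tiny stand-in here.
rfm = pd.DataFrame({
    "Recency":       [5, 40, 200, 10, 95, 300],
    "Frequency":     [12, 3, 1, 8, 2, 1],
    "Monetary":      [2500.0, 400.0, 50.0, 1800.0, 220.0, 30.0],
    "AvgBasketSize": [208.3, 133.3, 50.0, 225.0, 110.0, 30.0],
})

scaled = StandardScaler().fit_transform(rfm)  # K-Means is scale-sensitive
rfm["Segment"] = KMeans(n_clusters=4, random_state=42, n_init=10).fit_predict(scaled)
print(rfm)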

Churn Prediction

  • Algorithm: Random Forest Classifier
  • Target: Churn flag (customers with Recency > 90 days are labeled churned)
  • Features: Frequency, Monetary, AvgBasketSize, return_rate
  • Optimization: GridSearchCV for hyperparameter tuning
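
A sketch of this training step. The parameter grid and the synthetic data are assumptions for illustration; model_trainer.py defines the actual grid and uses the real customer features.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# X columns: Frequency, Monetary, AvgBasketSize, return_rate
# y: 1 if Recency > 90 days, else 0 (synthetic stand-in here)
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = rng.integers(0, 2, 200)

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10]}  # assumed grid
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)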

CLV Prediction

  • Algorithm: Random Forest Regressor
  • Target: Total Monetary value
  • Features: Recency, Frequency, AvgBasketSize, return_rate
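
A sketch of the regression step with the feature set above; the synthetic data is a stand-in for the real customer features.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# X columns: Recency, Frequency, AvgBasketSize, return_rate
# y: total Monetary value per customer (synthetic stand-in here)
rng = np.random.default_rng(1)
X = rng.random((200, 4))
y = rng.random(200) * 1000

clv_model = RandomForestRegressor(random_state=42).fit(X, y)
print(clv_model.predict(X[:3]))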

Project Structure

OnlineRetailMachineLearningProject/
├── main.py                # Main orchestrator
├── requirements.txt       # Python dependencies
├── .gitignore             # Git ignore rules
├── src/
│   ├── __init__.py
│   ├── config.py          # Configuration settings
│   ├── data_ingestion.py  # Data loading and validation
│   ├── data_processor.py  # Data cleaning and feature engineering
│   ├── model_trainer.py   # ML model training
│   └── api.py             # FastAPI endpoints
├── data/
│   ├── raw/               # Raw data files
│   └── data_state.json    # Data change tracking
├── db/                    # SQLite database files
├── artifacts/             # Trained model files
└── notebooks/             # Jupyter notebooks (for analysis)

Configuration

Modify src/config.py to customize:

  • Database paths
  • Model artifacts location
  • Table names
  • File encodings
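
A config module covering these options might look like the following. All names and values here are illustrative, not the actual contents of src/config.py.

# src/config.py (illustrative sketch)
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent.parent

DB_PATH = BASE_DIR / "db" / "retail.db"        # database path (name assumed)
ARTIFACTS_DIR = BASE_DIR / "artifacts"         # trained model files
RAW_DATA_PATH = BASE_DIR / "data" / "raw" / "online_retail_II.csv"

TABLE_NAME = "customer_features"               # table name (assumed)
CSV_ENCODING = "ISO-8859-1"                    # encoding commonly needed for this dataset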

Logs

The system provides detailed console output for debugging. Check the terminal output for error messages and processing status.
