A comprehensive web-based dashboard for customer segmentation analysis using K-Means clustering, built with FastAPI, Plotly, and Scikit-learn.
- Interactive Web Dashboard: Beautiful, responsive UI with multiple pages for different analyses
- Animated K-Means Visualization: Watch the clustering algorithm in action with step-by-step animations
- Comprehensive Analysis:
- Data exploration and distribution analysis
- Elbow method with multiple metrics (Silhouette, Calinski-Harabasz, Davies-Bouldin)
- 2D and 3D cluster visualizations
- PCA dimensionality reduction
- Quality metrics and statistical validation
- Cluster stability analysis
- Computational efficiency testing
- Business insights and segment profiles
- All Plotly Visualizations: Every chart is interactive with zoom, pan, and hover capabilities
- Complete Test Results: Comprehensive test suite results displayed on the dashboard
- Python 3.10+
- UV package manager
- Clone the repository and navigate to the project directory:
cd customer_segmentation
- Install dependencies using UV:
uv sync
This will automatically:
- Create a virtual environment
- Install all required packages (FastAPI, Uvicorn, Plotly, Pandas, NumPy, Scikit-learn, etc.)
Start the server using UV:
uv run uvicorn app:app --host 127.0.0.1 --port 8000 --reload
The server will:
- Load the customer data from Mall_Customers.csv
- Run comprehensive clustering analysis
- Generate all visualizations and animations
- Start the web server on http://127.0.0.1:8000
Open your browser and navigate to http://127.0.0.1:8000 to access the dashboard.
Welcome page with feature overview and quick navigation
- Dataset statistics and information
- Distribution plots for Age, Income, Spending Score, and Gender
- Elbow method visualization with WCSS
- Silhouette score analysis
- Calinski-Harabasz index
- Davies-Bouldin index
- Optimal K recommendations
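The four metrics listed above can be computed with Scikit-learn roughly as follows. This is a minimal sketch, not the code in analysis.py: the function name `score_k_range`, the seed, and the synthetic three-blob data are assumptions made for illustration, and the real dashboard works on Mall_Customers.csv instead.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def score_k_range(X, k_values, seed=42):
    """Fit K-Means for each k and collect the four selection metrics."""
    rows = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        labels = model.labels_
        rows.append({
            "k": k,
            "wcss": model.inertia_,  # lower is better; look for the elbow
            "silhouette": silhouette_score(X, labels),  # higher is better
            "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
            "davies_bouldin": davies_bouldin_score(X, labels),  # lower is better
        })
    return rows


# Synthetic stand-in data (an assumption): three Gaussian blobs in 2D.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

results = score_k_range(X, range(2, 7))
best = max(results, key=lambda r: r["silhouette"])
print("k with the highest silhouette score:", best["k"])
```

The metrics do not always agree, so it is usually safer to read them together than to trust a single one when picking K.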
Main Feature: Interactive animations showing:
- Step-by-step clustering process
- Centroid movements at each iteration
- Cluster assignments evolution
- Real-time metrics (Inertia, Silhouette score)
- Multiple feature pair visualizations:
- Income vs Spending Score
- Income vs Age
- Spending Score vs Age
- Metrics evolution chart showing convergence
- 2D scatter plots with centroids
- 3D interactive cluster visualization
- PCA visualization
- Color-coded clusters
- Silhouette analysis plot
- Cluster characteristics (sizes, distances, feature importance)
- Statistical validation (ANOVA tests)
- Cluster stability analysis
- Computation time vs number of clusters
- Iterations to convergence
- Performance analysis
- Business segment overview
- Cluster profiles table
- Segment interpretations:
- High Value Customers
- Budget Enthusiasts
- Wealthy but Conservative
- Low Value Customers
- Average Customers
Complete test.py output including:
- Quality metrics validation
- Stability analysis results
- Efficiency measurements
- Statistical significance tests
- Cluster characteristics
- Business validation
- Overall test summary with pass/fail indicators
customer_segmentation/
├── app.py # FastAPI application with all routes
├── analysis.py # Data loading and clustering analysis
├── kmeans_animation.py # K-Means animation generator
├── visualizations.py # Plotly visualization functions
├── Mall_Customers.csv # Dataset
├── pyproject.toml # UV package configuration
├── templates/ # HTML templates
│ ├── home.html
│ ├── overview.html
│ ├── elbow.html
│ ├── animation.html
│ ├── clustering.html
│ ├── quality.html
│ ├── efficiency.html
│ ├── business.html
│ └── test_results.html
├── main.py # (Old file - not used)
└── test.py # (Old file - not used)
- Backend: FastAPI (async web framework)
- Server: Uvicorn (ASGI server)
- Visualizations: Plotly (interactive charts)
- Data Processing: Pandas, NumPy
- Machine Learning: Scikit-learn
- Package Management: UV (fast Python package installer)
- Templates: Jinja2
The animation feature shows how the algorithm works, one step at a time:
- Initialization: Shows random/k-means++ centroid initialization
- Assignment: Points are colored by their nearest centroid
- Update: Centroids move to the mean of their cluster
- Convergence: Process repeats until centroids stabilize
- Metrics: Real-time display of clustering quality at each step
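The steps above can be sketched in plain Python. This is an illustrative version that uses only the standard library and simple random initialization; it is not the code in kmeans_animation.py, and the names `squared_distance` and `kmeans_steps` as well as the sample points are assumptions. Each entry in the returned history corresponds to one animation frame: the centroids, the assignments, and the inertia at that iteration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_steps(points, k, max_iter=100, seed=0):
    """Run K-Means and return (centroids, labels, inertia) for every iteration."""
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as starting centroids
    # (plain random init; k-means++ would weight the picks by distance).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    history = []
    for _ in range(max_iter):
        # Assignment: label each point with the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Inertia: total squared distance from each point to its centroid.
        inertia = sum(
            squared_distance(p, centroids[label])
            for p, label in zip(points, labels)
        )
        history.append((list(centroids), labels, inertia))
        # Update: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        # Convergence: stop once no centroid moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return history


# Made-up sample: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
history = kmeans_steps(points, k=2)
for step, (_, labels, inertia) in enumerate(history):
    print(step, labels, round(inertia, 3))
```

Inertia never increases from one iteration to the next, which is why a metrics-evolution chart of this kind flattens out as the centroids stabilize.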
All test results from the original test.py are integrated into the dashboard:
- Clustering quality metrics with interpretation
- Stability analysis across multiple runs
- Computational efficiency measurements
- Statistical validation (ANOVA tests for feature significance)
- Business validation with segment interpretations
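To make the ANOVA check concrete, here is a minimal sketch of the one-way ANOVA F statistic for a single feature split by cluster. The function name `anova_f_statistic` and the sample values are made up for illustration and are not taken from the project or the dataset. A large F means the cluster means differ by much more than the spread inside each cluster.

```python
from statistics import fmean


def anova_f_statistic(groups):
    """One-way ANOVA F statistic for a list of per-cluster value lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean([v for g in groups for v in g])
    group_means = [fmean(g) for g in groups]
    # Between-group sum of squares: distance of each cluster mean from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of values around their own cluster mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    # Ratio of the two mean squares (degrees of freedom: k - 1 and n - k).
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up feature values for three clusters.
feature_by_cluster = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
f_value = anova_f_statistic(feature_by_cluster)
print(f_value)  # 27.0
```

In practice, `scipy.stats.f_oneway` computes the same statistic and also returns a p-value.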
- GET /: Home page
- GET /overview: Data overview
- GET /elbow-method: Optimal cluster selection
- GET /kmeans-animation: K-Means animation
- GET /clustering-results: Clustering visualizations
- GET /quality-metrics: Quality analysis
- GET /efficiency: Efficiency analysis
- GET /business-insights: Business insights
- GET /test-results: Test results
- GET /api/analysis: JSON API for analysis data
- GET /api/animation: JSON API for animation metadata
This project demonstrates:
- Modern Python web development with FastAPI
- Interactive data visualization with Plotly
- Machine learning with Scikit-learn
- K-Means clustering algorithm
- Data analysis and business intelligence
- Package management with UV
- The analysis runs automatically on server startup (takes ~10-15 seconds)
- All visualizations are interactive - you can zoom, pan, and hover
- The animations use Play/Pause controls and a slider for step-by-step navigation
- With the --reload flag, the server restarts automatically whenever the code changes
- Dataset: Mall Customers Dataset (https://www.kaggle.com/datasets/vjchoudhary7/customer-segmentation-tutorial-in-python/data)
- Libraries: FastAPI, Plotly, Scikit-learn, Pandas, NumPy
- Package Manager: UV (Astral)