Production-quality Hand Gesture Recognition with real-time webcam inference, trained classifiers, and modern web interfaces.
- Real-time Gesture Recognition: Live webcam inference with 6 gesture classes
- Landmark-based Pipeline: MediaPipe Hands → feature engineering → lightweight classifiers
- Multiple Classifiers: Support for SVM, LightGBM, MLP, Random Forest, and Logistic Regression
- Modern Web Interface: FastAPI backend + Streamlit UI with live camera and file upload
- OpenCV Demo: Real-time overlay with landmarks, bounding boxes, and performance metrics
- Production Ready: Docker support, comprehensive testing, logging, and monitoring
- Easy Training: Data collection, feature engineering, and model training scripts
- Clone and Setup

  ```bash
  git clone <repository-url>
  cd Gesture-AI
  make venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  make install
  ```

- Collect Training Data

  ```bash
  make collect  # Press keys 1-5 to label gestures, 'c' to capture, 'n' for next class
  ```

- Build Dataset and Train

  ```bash
  make build-dataset
  make train CLF=svm  # or lgbm, mlp, rf, lr
  make metrics
  ```

- Run Applications

  ```bash
  # Terminal 1: Start API
  make dev

  # Terminal 2: Start UI (in new terminal)
  streamlit run app/ui/app.py

  # Or run OpenCV demo
  make demo
  ```
```bash
# Build and run with Docker Compose
make up

# Access applications
# API: http://localhost:8000
# UI: http://localhost:8501
```

```
Gesture-AI/
├── README.md
├── pyproject.toml
├── Makefile
├── docker-compose.yml
├── .pre-commit-config.yaml
├── app/
│   ├── api/                      # FastAPI backend
│   │   ├── main.py               # API endpoints
│   │   └── schemas.py            # Pydantic models
│   ├── core/                     # Core configuration
│   │   ├── config.py             # Settings management
│   │   └── logging.py            # Structured logging
│   ├── inference/                # Inference pipeline
│   │   ├── mediapipe_wrapper.py  # Hand detection
│   │   ├── features.py           # Feature engineering
│   │   ├── classifier.py         # Model inference
│   │   └── pipeline.py           # End-to-end pipeline
│   ├── training/                 # Training scripts
│   │   ├── collect.py            # Data collection
│   │   ├── build_dataset.py      # Dataset creation
│   │   ├── train_clf.py          # Model training
│   │   ├── metrics.py            # Evaluation metrics
│   │   └── export.py             # Model export
│   ├── webcam/                   # OpenCV demo
│   │   └── demo.py               # Real-time demo
│   └── ui/                       # Streamlit UI
│       └── app.py                # Web interface
├── infra/                        # Docker configuration
│   ├── Dockerfile.api            # API container
│   └── Dockerfile.ui             # UI container
├── models/                       # Trained models
├── data/                         # Data storage
│   ├── raw/                      # Collected frames
│   └── processed/                # Processed datasets
└── tests/                        # Test suite
    ├── test_features.py
    ├── test_pipeline.py
    └── test_api.py
```
The system recognizes 6 gesture classes:
- none - No hand detected
- open_palm - Open palm gesture
- fist - Closed fist
- thumbs_up - Thumbs up gesture
- peace - Peace sign (V)
- okay - OK sign
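The six labels above need a stable index-to-name mapping shared between training and inference (the configuration below points at models/classes_config.json for this). The snippet is purely illustrative of such a mapping; the file the training scripts actually write may be laid out differently.

```python
# Illustrative label mapping only -- the real classes_config.json produced by
# the training scripts may use a different layout.
import json

GESTURE_CLASSES = ["none", "open_palm", "fist", "thumbs_up", "peace", "okay"]

with open("models/classes_config.json", "w") as f:
    json.dump({"classes": GESTURE_CLASSES}, f, indent=2)
```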
```bash
# Start data collection
make collect

# Collect 100 samples per class
python app/training/collect.py --samples-per-class 100

# View collection statistics
python app/training/collect.py --show-stats
```

```bash
# 1. Build dataset from collected frames
make build-dataset

# 2. Train classifier (choose one)
make train CLF=svm   # Support Vector Machine
make train CLF=lgbm  # LightGBM
make train CLF=mlp   # Multi-layer Perceptron
make train CLF=rf    # Random Forest
make train CLF=lr    # Logistic Regression

# 3. Generate metrics and visualizations
make metrics

# 4. Export trained model
python app/training/export.py --model-name my_gesture_model
```
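Once exported, the model artifact can be sanity-checked outside the API. The sketch below assumes a scikit-learn-style pickle at models/gesture_classifier.pkl (the MODEL_PATH default further below); adapt it to whatever format export.py actually produces.

```python
# Quick offline check of an exported model; assumes a scikit-learn-style pickle
# at MODEL_PATH -- adapt if export.py writes a different format.
import joblib
import numpy as np

clf = joblib.load("models/gesture_classifier.pkl")

# Dummy feature vector with the same dimensionality the model was trained on.
n_features = getattr(clf, "n_features_in_", 63)
dummy = np.zeros((1, n_features), dtype=np.float32)

print("Predicted class:", clf.predict(dummy)[0])
if hasattr(clf, "predict_proba"):
    print("Class probabilities:", clf.predict_proba(dummy)[0])
```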
```bash
# Run tests
make test

# Code formatting and linting
make format
make lint

# Type checking
make type-check

# Install pre-commit hooks
pre-commit install
```

Create a `.env` file:
```env
# Model Configuration
MODEL_PATH=models/gesture_classifier.pkl
FEATURE_CONFIG_PATH=models/feature_config.json
CLASSES_CONFIG_PATH=models/classes_config.json

# API Configuration
API_HOST=0.0.0.0
API_PORT=8000

# UI Configuration
UI_HOST=0.0.0.0
UI_PORT=8501

# Inference Configuration
CONFIDENCE_THRESHOLD=0.5
SMOOTHING_WINDOW=5
LANDMARK_CONFIDENCE_THRESHOLD=0.5
```
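These variables are consumed at startup (the project tree lists app/core/config.py as settings management). A minimal way to pull them from the environment is sketched below; the field names mirror the `.env` keys, but the real settings class may look quite different.

```python
# Minimal settings loader mirroring the .env keys above; the real
# app/core/config.py may use pydantic or another mechanism instead.
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    model_path: str = os.getenv("MODEL_PATH", "models/gesture_classifier.pkl")
    api_host: str = os.getenv("API_HOST", "0.0.0.0")
    api_port: int = int(os.getenv("API_PORT", "8000"))
    confidence_threshold: float = float(os.getenv("CONFIDENCE_THRESHOLD", "0.5"))
    smoothing_window: int = int(os.getenv("SMOOTHING_WINDOW", "5"))

settings = Settings()
print(settings)
```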
The system extracts comprehensive features from hand landmarks (a simplified sketch follows this list):

- Normalized Coordinates: 21 landmarks × 3 coordinates = 63 features
- Pairwise Distances: Key point distances (configurable)
- Joint Angles: Finger joint angles using arctan2
- Finger Tip Distances: All combinations of finger tips
- Palm Center Distances: Distances from palm center to finger tips
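The sketch below illustrates those feature groups with plain NumPy, given the 21 landmarks as a (21, 3) array. It is a simplified stand-in for app/inference/features.py, which selects its own point pairs and joints.

```python
# Simplified illustration of the feature groups listed above; the exact
# selection of point pairs and joints lives in app/inference/features.py.
import numpy as np

FINGER_TIPS = [4, 8, 12, 16, 20]  # MediaPipe indices: thumb..pinky tips
WRIST = 0

def extract_features(landmarks: np.ndarray) -> np.ndarray:
    """landmarks: (21, 3) array of normalized x, y, z coordinates."""
    # Normalized coordinates relative to the wrist -> 63 values
    coords = (landmarks - landmarks[WRIST]).flatten()

    # Pairwise distances between all finger tips
    tips = landmarks[FINGER_TIPS]
    tip_dists = [np.linalg.norm(tips[i] - tips[j])
                 for i in range(len(tips)) for j in range(i + 1, len(tips))]

    # Example joint angle (index-finger PIP) via arctan2 on 2D projections
    a, b, c = landmarks[5, :2], landmarks[6, :2], landmarks[7, :2]
    angle = np.arctan2(c[1] - b[1], c[0] - b[0]) - np.arctan2(a[1] - b[1], a[0] - b[0])

    # Distances from an approximate palm center to each finger tip
    palm_center = landmarks[[0, 5, 9, 13, 17]].mean(axis=0)
    palm_dists = np.linalg.norm(tips - palm_center, axis=1)

    return np.concatenate([coords, tip_dists, [angle], palm_dists]).astype(np.float32)
```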
- `GET /health` - Service health check
- `GET /classes` - Available gesture classes
- `GET /model/info` - Model information
- `GET /model/performance` - Performance statistics
- `GET /features/info` - Feature extraction info

- `POST /predict` - Single image prediction (base64)
- `POST /predict/batch` - Batch image prediction
- `POST /predict/upload` - File upload prediction
```python
import requests
import base64

# Load image
with open("hand_image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

# Make prediction
response = requests.post("http://localhost:8000/predict", json={
    "image_data": image_data,
    "confidence_threshold": 0.5,
    "return_landmarks": True,
})

result = response.json()
print(f"Predicted: {result['label']} (confidence: {result['confidence']:.2f})")
```
```bash
# Build and run all services
make up

# View logs
docker-compose logs -f

# Stop services
make down
```

```bash
# Build production images
docker build -f infra/Dockerfile.api -t gesture-ai-api .
docker build -f infra/Dockerfile.ui -t gesture-ai-ui .

# Run with production settings
docker-compose -f docker-compose.yml up -d
```

```bash
# Run all tests
make test

# Run specific test files
pytest tests/test_features.py
pytest tests/test_pipeline.py
pytest tests/test_api.py

# Run with coverage
pytest --cov=app --cov-report=html
```

- Inference Speed: 20-50ms per frame (depending on hardware)
- Accuracy: 85-95% on test datasets
- Memory Usage: <500MB for inference
- Model Size: 1-10MB (depending on classifier)
- Use LightGBM for best speed/accuracy trade-off
- Reduce smoothing window for lower latency (see the sketch after this list)
- Adjust confidence threshold for your use case
- Use GPU acceleration for MediaPipe (if available)
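On the smoothing point above: SMOOTHING_WINDOW trades latency for stability, since a larger window suppresses single-frame flicker but reacts to a new gesture more slowly. A minimal majority-vote smoother could look like this (illustrative only; the repo's pipeline may smooth differently):

```python
# Illustrative majority-vote smoother over the last N predictions; the
# pipeline in the repo may implement smoothing differently.
from collections import Counter, deque

class PredictionSmoother:
    def __init__(self, window: int = 5):
        self.history = deque(maxlen=window)  # smaller window => lower latency

    def update(self, label: str) -> str:
        """Add the latest raw prediction and return the smoothed label."""
        self.history.append(label)
        return Counter(self.history).most_common(1)[0][0]

smoother = PredictionSmoother(window=5)
for raw in ["fist", "fist", "peace", "fist", "fist"]:
    print(smoother.update(raw))  # stays "fist" despite the one-frame flicker
```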
- Camera not detected

  ```bash
  # Check available cameras
  python -c "import cv2; print([i for i in range(10) if cv2.VideoCapture(i).isOpened()])"
  ```

- Model not loading

  ```bash
  # Check model file exists
  ls -la models/
  # Retrain model
  make train CLF=svm
  ```

- API connection issues

  ```bash
  # Check API health
  curl http://localhost:8000/health
  # Check logs
  docker-compose logs api
  ```

- Performance issues

  ```bash
  # Check system resources
  htop
  # Monitor API performance
  curl http://localhost:8000/model/performance
  ```
```bash
# Enable debug logging
export LOG_LEVEL=DEBUG

# Run with verbose output
python app/api/main.py --log-level debug
```

- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Make changes and add tests
- Run quality checks: `make format lint type-check test`
- Commit changes: `git commit -m "Add feature"`
- Push to branch: `git push origin feature-name`
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- MediaPipe for hand landmark detection
- FastAPI for the API framework
- Streamlit for the web interface
- OpenCV for computer vision operations
- scikit-learn for machine learning
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Project Wiki