Production-quality Hand Gesture Recognition with real-time webcam inference, trained classifiers, and modern web interfaces.
- Real-time Gesture Recognition: Live webcam inference with 6 gesture classes
- Landmark-based Pipeline: MediaPipe Hands → feature engineering → lightweight classifiers
- Multiple Classifiers: Support for SVM, LightGBM, MLP, Random Forest, and Logistic Regression
- Modern Web Interface: FastAPI backend + Streamlit UI with live camera and file upload
- OpenCV Demo: Real-time overlay with landmarks, bounding boxes, and performance metrics
- Production Ready: Docker support, comprehensive testing, logging, and monitoring
- Easy Training: Data collection, feature engineering, and model training scripts
- Clone and Setup

  ```bash
  git clone <repository-url>
  cd Gesture-AI
  make venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  make install
  ```

- Collect Training Data

  ```bash
  make collect  # Press keys 1-5 to label gestures, 'c' to capture, 'n' for next class
  ```

- Build Dataset and Train

  ```bash
  make build-dataset
  make train CLF=svm  # or lgbm, mlp, rf, lr
  make metrics
  ```

- Run Applications

  ```bash
  # Terminal 1: Start API
  make dev

  # Terminal 2: Start UI (in new terminal)
  streamlit run app/ui/app.py

  # Or run OpenCV demo
  make demo
  ```
```bash
# Build and run with Docker Compose
make up

# Access applications
# API: http://localhost:8000
# UI: http://localhost:8501
```

```
Gesture-AI/
├── README.md
├── pyproject.toml
├── Makefile
├── docker-compose.yml
├── .pre-commit-config.yaml
├── app/
│   ├── api/                      # FastAPI backend
│   │   ├── main.py               # API endpoints
│   │   └── schemas.py            # Pydantic models
│   ├── core/                     # Core configuration
│   │   ├── config.py             # Settings management
│   │   └── logging.py            # Structured logging
│   ├── inference/                # Inference pipeline
│   │   ├── mediapipe_wrapper.py  # Hand detection
│   │   ├── features.py           # Feature engineering
│   │   ├── classifier.py         # Model inference
│   │   └── pipeline.py           # End-to-end pipeline
│   ├── training/                 # Training scripts
│   │   ├── collect.py            # Data collection
│   │   ├── build_dataset.py      # Dataset creation
│   │   ├── train_clf.py          # Model training
│   │   ├── metrics.py            # Evaluation metrics
│   │   └── export.py             # Model export
│   ├── webcam/                   # OpenCV demo
│   │   └── demo.py               # Real-time demo
│   └── ui/                       # Streamlit UI
│       └── app.py                # Web interface
├── infra/                        # Docker configuration
│   ├── Dockerfile.api            # API container
│   └── Dockerfile.ui             # UI container
├── models/                       # Trained models
├── data/                         # Data storage
│   ├── raw/                      # Collected frames
│   └── processed/                # Processed datasets
└── tests/                        # Test suite
    ├── test_features.py
    ├── test_pipeline.py
    └── test_api.py
```
The system recognizes 6 gesture classes:
- none - No hand detected
- open_palm - Open palm gesture
- fist - Closed fist
- thumbs_up - Thumbs up gesture
- peace - Peace sign (V)
- okay - OK sign
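The six labels above need a stable index-to-name mapping shared between training and inference (the configuration below points at models/classes_config.json for this). The snippet is purely illustrative of such a mapping; the file the training scripts actually write may be laid out differently.

```python
# Illustrative label mapping only -- the real classes_config.json produced by
# the training scripts may use a different layout.
import json

GESTURE_CLASSES = ["none", "open_palm", "fist", "thumbs_up", "peace", "okay"]

with open("models/classes_config.json", "w") as f:
    json.dump({"classes": GESTURE_CLASSES}, f, indent=2)
```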
```bash
# Start data collection
make collect

# Collect 100 samples per class
python app/training/collect.py --samples-per-class 100

# View collection statistics
python app/training/collect.py --show-stats
```

```bash
# 1. Build dataset from collected frames
make build-dataset

# 2. Train classifier (choose one)
make train CLF=svm   # Support Vector Machine
make train CLF=lgbm  # LightGBM
make train CLF=mlp   # Multi-layer Perceptron
make train CLF=rf    # Random Forest
make train CLF=lr    # Logistic Regression

# 3. Generate metrics and visualizations
make metrics

# 4. Export trained model
python app/training/export.py --model-name my_gesture_model
```
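Once exported, the model artifact can be sanity-checked outside the API. The sketch below assumes a scikit-learn-style pickle at models/gesture_classifier.pkl (the MODEL_PATH default further below); adapt it to whatever format export.py actually produces.

```python
# Quick offline check of an exported model; assumes a scikit-learn-style pickle
# at MODEL_PATH -- adapt if export.py writes a different format.
import joblib
import numpy as np

clf = joblib.load("models/gesture_classifier.pkl")

# Dummy feature vector with the same dimensionality the model was trained on.
n_features = getattr(clf, "n_features_in_", 63)
dummy = np.zeros((1, n_features), dtype=np.float32)

print("Predicted class:", clf.predict(dummy)[0])
if hasattr(clf, "predict_proba"):
    print("Class probabilities:", clf.predict_proba(dummy)[0])
```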
```bash
# Run tests
make test

# Code formatting and linting
make format
make lint

# Type checking
make type-check

# Install pre-commit hooks
pre-commit install
```

Create a `.env` file:
```env
# Model Configuration
MODEL_PATH=models/gesture_classifier.pkl
FEATURE_CONFIG_PATH=models/feature_config.json
CLASSES_CONFIG_PATH=models/classes_config.json

# API Configuration
API_HOST=0.0.0.0
API_PORT=8000

# UI Configuration
UI_HOST=0.0.0.0
UI_PORT=8501

# Inference Configuration
CONFIDENCE_THRESHOLD=0.5
SMOOTHING_WINDOW=5
LANDMARK_CONFIDENCE_THRESHOLD=0.5
```
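These variables are consumed at startup (the project tree lists app/core/config.py as settings management). A minimal way to pull them from the environment is sketched below; the field names mirror the `.env` keys, but the real settings class may look quite different.

```python
# Minimal settings loader mirroring the .env keys above; the real
# app/core/config.py may use pydantic or another mechanism instead.
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    model_path: str = os.getenv("MODEL_PATH", "models/gesture_classifier.pkl")
    api_host: str = os.getenv("API_HOST", "0.0.0.0")
    api_port: int = int(os.getenv("API_PORT", "8000"))
    confidence_threshold: float = float(os.getenv("CONFIDENCE_THRESHOLD", "0.5"))
    smoothing_window: int = int(os.getenv("SMOOTHING_WINDOW", "5"))

settings = Settings()
print(settings)
```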
The system extracts comprehensive features from hand landmarks (a simplified sketch follows this list):

- Normalized Coordinates: 21 landmarks × 3 coordinates = 63 features
- Pairwise Distances: Key point distances (configurable)
- Joint Angles: Finger joint angles using arctan2
- Finger Tip Distances: All combinations of finger tips
- Palm Center Distances: Distances from palm center to finger tips
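The sketch below illustrates those feature groups with plain NumPy, given the 21 landmarks as a (21, 3) array. It is a simplified stand-in for app/inference/features.py, which selects its own point pairs and joints.

```python
# Simplified illustration of the feature groups listed above; the exact
# selection of point pairs and joints lives in app/inference/features.py.
import numpy as np

FINGER_TIPS = [4, 8, 12, 16, 20]  # MediaPipe indices: thumb..pinky tips
WRIST = 0

def extract_features(landmarks: np.ndarray) -> np.ndarray:
    """landmarks: (21, 3) array of normalized x, y, z coordinates."""
    # Normalized coordinates relative to the wrist -> 63 values
    coords = (landmarks - landmarks[WRIST]).flatten()

    # Pairwise distances between all finger tips
    tips = landmarks[FINGER_TIPS]
    tip_dists = [np.linalg.norm(tips[i] - tips[j])
                 for i in range(len(tips)) for j in range(i + 1, len(tips))]

    # Example joint angle (index-finger PIP) via arctan2 on 2D projections
    a, b, c = landmarks[5, :2], landmarks[6, :2], landmarks[7, :2]
    angle = np.arctan2(c[1] - b[1], c[0] - b[0]) - np.arctan2(a[1] - b[1], a[0] - b[0])

    # Distances from an approximate palm center to each finger tip
    palm_center = landmarks[[0, 5, 9, 13, 17]].mean(axis=0)
    palm_dists = np.linalg.norm(tips - palm_center, axis=1)

    return np.concatenate([coords, tip_dists, [angle], palm_dists]).astype(np.float32)
```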
- `GET /health` - Service health check
- `GET /classes` - Available gesture classes
- `GET /model/info` - Model information
- `GET /model/performance` - Performance statistics
- `GET /features/info` - Feature extraction info

- `POST /predict` - Single image prediction (base64)
- `POST /predict/batch` - Batch image prediction
- `POST /predict/upload` - File upload prediction
```python
import requests
import base64

# Load image
with open("hand_image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

# Make prediction
response = requests.post("http://localhost:8000/predict", json={
    "image_data": image_data,
    "confidence_threshold": 0.5,
    "return_landmarks": True,
})

result = response.json()
print(f"Predicted: {result['label']} (confidence: {result['confidence']:.2f})")
```
```bash
# Build and run all services
make up

# View logs
docker-compose logs -f

# Stop services
make down
```

```bash
# Build production images
docker build -f infra/Dockerfile.api -t gesture-ai-api .
docker build -f infra/Dockerfile.ui -t gesture-ai-ui .

# Run with production settings
docker-compose -f docker-compose.yml up -d
```

```bash
# Run all tests
make test

# Run specific test files
pytest tests/test_features.py
pytest tests/test_pipeline.py
pytest tests/test_api.py

# Run with coverage
pytest --cov=app --cov-report=html
```

- Inference Speed: 20-50ms per frame (depending on hardware)
- Accuracy: 85-95% on test datasets
- Memory Usage: <500MB for inference
- Model Size: 1-10MB (depending on classifier)
- Use LightGBM for best speed/accuracy trade-off
- Reduce smoothing window for lower latency (see the sketch after this list)
- Adjust confidence threshold for your use case
- Use GPU acceleration for MediaPipe (if available)
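On the smoothing point above: SMOOTHING_WINDOW trades latency for stability, since a larger window suppresses single-frame flicker but reacts to a new gesture more slowly. A minimal majority-vote smoother could look like this (illustrative only; the repo's pipeline may smooth differently):

```python
# Illustrative majority-vote smoother over the last N predictions; the
# pipeline in the repo may implement smoothing differently.
from collections import Counter, deque

class PredictionSmoother:
    def __init__(self, window: int = 5):
        self.history = deque(maxlen=window)  # smaller window => lower latency

    def update(self, label: str) -> str:
        """Add the latest raw prediction and return the smoothed label."""
        self.history.append(label)
        return Counter(self.history).most_common(1)[0][0]

smoother = PredictionSmoother(window=5)
for raw in ["fist", "fist", "peace", "fist", "fist"]:
    print(smoother.update(raw))  # stays "fist" despite the one-frame flicker
```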
- Camera not detected

  ```bash
  # Check available cameras
  python -c "import cv2; print([i for i in range(10) if cv2.VideoCapture(i).isOpened()])"
  ```

- Model not loading

  ```bash
  # Check model file exists
  ls -la models/
  # Retrain model
  make train CLF=svm
  ```

- API connection issues

  ```bash
  # Check API health
  curl http://localhost:8000/health
  # Check logs
  docker-compose logs api
  ```

- Performance issues

  ```bash
  # Check system resources
  htop
  # Monitor API performance
  curl http://localhost:8000/model/performance
  ```
```bash
# Enable debug logging
export LOG_LEVEL=DEBUG

# Run with verbose output
python app/api/main.py --log-level debug
```

- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Make changes and add tests
- Run quality checks: `make format lint type-check test`
- Commit changes: `git commit -m "Add feature"`
- Push to branch: `git push origin feature-name`
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- MediaPipe for hand landmark detection
- FastAPI for the API framework
- Streamlit for the web interface
- OpenCV for computer vision operations
- scikit-learn for machine learning
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Project Wiki