A real-time hand gesture recognition system that uses machine learning to control a ball on screen. Built with Next.js, TensorFlow.js, and MediaPipe.
- Custom Gesture Training: Train your own hand gestures (UP, DOWN, LEFT, RIGHT, FREEZE)
- Real-time Recognition: Low-latency gesture detection using TensorFlow.js
- ML Model: Neural network with enhanced feature extraction (directional angles, displacement vectors)
- Performance Optimized: 10fps video processing, 400ms prediction throttle, model warm-up
- Data Persistence: Auto-save training data to localStorage and server
- Data Optimization: Reduce training samples for better performance
- Clean UI: Minimal developer-friendly design
- Frontend: Next.js 16 (App Router), React 19, TypeScript
- ML/AI: TensorFlow.js, MediaPipe Hand Landmarker
- Styling: CSS with custom properties
- Runtime: Bun (Node.js compatible)
```bash
# Clone the repository
git clone https://github.com/yourusername/gesture-works.git
cd gesture-works

# Install dependencies
bun install

# Run development server
bun run dev
```

Open http://localhost:3000 in your browser.
- Visit the homepage - the game loads immediately
- Use your trained gestures to control the ball:
- UP (↑): Move ball upward
- DOWN (↓): Move ball downward
- LEFT (←): Move ball left
- RIGHT (→): Move ball right
- FREEZE (■): Stop ball movement
- Ball wraps around screen edges
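The movement logic boils down to adding a gesture-driven velocity to the ball each frame and wrapping the position at the edges. A minimal sketch of that idea (the `Ball` type, `updateBall`, and the canvas size constants are illustrative names, not the actual GameScreen.tsx code):

```ts
// Illustrative per-frame ball movement with screen wrapping.
// Names here are hypothetical, not necessarily those in GameScreen.tsx.
interface Ball {
  x: number;
  y: number;
  vx: number; // set from the current gesture, e.g. LEFT => -BALL_SPEED
  vy: number; // FREEZE sets both vx and vy to 0
}

const CANVAS_W = 800;
const CANVAS_H = 600;

function updateBall(ball: Ball): Ball {
  // Wrap around the edges instead of clamping, so the ball re-enters
  // on the opposite side of the screen.
  const x = (ball.x + ball.vx + CANVAS_W) % CANVAS_W;
  const y = (ball.y + ball.vy + CANVAS_H) % CANVAS_H;
  return { ...ball, x, y };
}
```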
- Navigate to `/manage` or click "Manage Gestures" in the header
- Click a gesture button (UP, DOWN, LEFT, RIGHT, or FREEZE)
- Perform your custom gesture in front of the camera
- System captures 15 samples automatically
- Repeat for all 5 gestures
- Training data auto-saves to both localStorage and server
Quick Start Gestures:
- Point index finger in direction for UP/DOWN/LEFT/RIGHT
- Open palm (all fingers) for FREEZE
Optimize Data: Reduces training samples from 30 to 15 per gesture for better performance
Reset Training: Deletes all training data to start fresh
```
gesture-works/
├── app/
│   ├── api/
│   │   └── save-gestures/
│   │       └── route.ts        # Server-side JSON persistence
│   ├── manage/
│   │   └── page.tsx            # Training screen route
│   ├── components.css          # Component styles
│   ├── globals.css             # Global styles
│   ├── layout.tsx              # Root layout
│   └── page.tsx                # Game screen (landing)
├── components/
│   ├── GameScreen.tsx          # Ball control game
│   ├── GestureSnakeGame.tsx    # (deprecated)
│   └── TrainingScreen.tsx      # Gesture training UI
├── hooks/
│   ├── useGestureTraining.ts   # ML model & training logic
│   └── useHandTracking.ts      # MediaPipe hand detection
├── public/
│   ├── default-gestures.json   # Persisted training data
│   └── mediapipe/              # MediaPipe WASM files
├── next.config.js
├── package.json
└── tsconfig.json
```
Feature Extraction:
- 21 hand landmarks (x, y, z coordinates) = 63 base features
- Enhanced with directional angles (sin/cos components)
- Displacement vectors from palm center
- Total: ~80 features per sample
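A rough sketch of what this extraction could look like for MediaPipe's landmark output; the exact feature layout in hooks/useGestureTraining.ts may differ, and the helper below is only illustrative:

```ts
// Illustrative feature extraction from the 21 MediaPipe hand landmarks.
// Mirrors the ideas listed above: raw coordinates, palm-relative
// displacements, and sin/cos of the pointing direction.
type Landmark = { x: number; y: number; z: number };

function extractFeatures(landmarks: Landmark[]): number[] {
  const features: number[] = [];

  // 1) 21 landmarks x 3 coordinates = 63 base features
  for (const lm of landmarks) features.push(lm.x, lm.y, lm.z);

  // 2) Displacement of each fingertip from the palm (wrist, landmark 0)
  const palm = landmarks[0];
  const fingertips = [4, 8, 12, 16, 20]; // thumb..pinky tips
  for (const i of fingertips) {
    features.push(landmarks[i].x - palm.x, landmarks[i].y - palm.y);
  }

  // 3) Index-finger direction encoded as sin/cos, so nearby angles
  //    stay close together in feature space
  const index = landmarks[8];
  const angle = Math.atan2(index.y - palm.y, index.x - palm.x);
  features.push(Math.sin(angle), Math.cos(angle));

  return features; // 63 + 10 + 2 = 75 features in this sketch
}
```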
Neural Network:
- Input layer: 63+ features
- Hidden layer 1: 64 units (ReLU)
- Hidden layer 2: 32 units (ReLU)
- Output layer: 5 units (Softmax)
- Training: 30 epochs, Adam optimizer
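In TensorFlow.js terms, that architecture could be built roughly like this (the feature count and training-call details below are assumptions; layer sizes, activations, and the optimizer follow the list above):

```ts
import * as tf from '@tensorflow/tfjs';

const NUM_FEATURES = 75; // see the extraction sketch above; an assumption
const NUM_GESTURES = 5;  // UP, DOWN, LEFT, RIGHT, FREEZE

function buildModel(): tf.Sequential {
  const model = tf.sequential();
  model.add(tf.layers.dense({ inputShape: [NUM_FEATURES], units: 64, activation: 'relu' }));
  model.add(tf.layers.dense({ units: 32, activation: 'relu' }));
  model.add(tf.layers.dense({ units: NUM_GESTURES, activation: 'softmax' }));
  model.compile({
    optimizer: tf.train.adam(),
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy'],
  });
  return model;
}

// xs: [numSamples, NUM_FEATURES], ys: one-hot [numSamples, NUM_GESTURES]
async function train(model: tf.Sequential, xs: tf.Tensor2D, ys: tf.Tensor2D) {
  await model.fit(xs, ys, { epochs: 30, shuffle: true });
}
```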
Performance Optimizations:
- Model warm-up with 3 dummy predictions (WebGL shader compilation)
- Prediction throttle: 400ms (2.5 predictions/sec)
- Video rendering: 10fps (every 6th frame)
- Confidence threshold: 0.3 (30%)
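The warm-up idea, sketched with made-up helper names: it simply pushes a few zero tensors through the model so the WebGL backend compiles its shaders before the first real gesture arrives.

```ts
import * as tf from '@tensorflow/tfjs';

// Sketch only; the feature count and function name are assumptions.
async function warmUpModel(model: tf.LayersModel, numFeatures = 75) {
  for (let i = 0; i < 3; i++) {
    const dummy = tf.zeros([1, numFeatures]);
    const out = model.predict(dummy) as tf.Tensor;
    await out.data(); // force execution so shaders actually compile
    dummy.dispose();
    out.dispose();
  }
}
```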
Save training data to the server.

Request Body:

```json
{
  "UP": [[...], [...], ...],
  "DOWN": [[...], [...], ...],
  "LEFT": [[...], [...], ...],
  "RIGHT": [[...], [...], ...],
  "FREEZE": [[...], [...], ...]
}
```

Response:

```json
{
  "success": true,
  "message": "Training data saved successfully"
}
```

Load training data from the server.
Response:

```json
{
  "UP": [[...], [...], ...],
  "DOWN": [...],
  ...
}
```
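For reference, a client-side usage sketch: the `/api/save-gestures` path comes from the project structure above, while using POST to save and GET to load is an assumption about the route handler.

```ts
// Usage sketch only; HTTP methods are assumptions, not documented behavior.
type TrainingData = Record<string, number[][]>;

async function saveGestures(data: TrainingData): Promise<void> {
  const res = await fetch('/api/save-gestures', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(data),
  });
  if (!res.ok) throw new Error(`Save failed: ${res.status}`);
}

async function loadGestures(): Promise<TrainingData> {
  const res = await fetch('/api/save-gestures');
  if (!res.ok) throw new Error(`Load failed: ${res.status}`);
  return res.json();
}
```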
The first 1-2 seconds may be slow due to:
- WebGL shader compilation in TensorFlow.js
- MediaPipe model initialization
Solution: a model warm-up pass runs 3 dummy predictions right after the model is built, to pre-compile the shaders.

Other optimizations in place:
- Throttled predictions (400ms intervals)
- Reduced video frame rate (10fps)
- Cached canvas gradients
- Disabled alpha channel on canvas
- Sample rate logging (10% of predictions)
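A sketch of how the throttling and frame skipping could fit together in a render loop; the draw and prediction calls are hypothetical placeholders, not the actual hook code:

```ts
const PREDICTION_THROTTLE_MS = 400; // ~2.5 predictions per second
let lastPredictionAt = 0;
let frameCount = 0;

function onAnimationFrame(now: number) {
  frameCount++;

  // Draw the camera feed at ~10fps by skipping 5 of every 6 frames
  if (frameCount % 6 === 0) {
    // drawVideoFrame(); // hypothetical canvas draw call
  }

  // Only run the classifier every 400ms
  if (now - lastPredictionAt >= PREDICTION_THROTTLE_MS) {
    lastPredictionAt = now;
    // runGesturePrediction(); // hypothetical prediction call
  }

  requestAnimationFrame(onAnimationFrame);
}

requestAnimationFrame(onAnimationFrame);
```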
If default-gestures.json becomes large (>10,000 lines):
- Go to `/manage`
- Click "Optimize Data"
- Reduces samples from 30 to 15 per gesture
- Maintains accuracy while improving performance
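One plausible way "Optimize Data" could thin the samples is even subsampling, as in the sketch below; the actual implementation may choose samples differently.

```ts
// Sketch: keep evenly spaced samples so 30 per gesture become 15.
function optimizeTrainingData(
  data: Record<string, number[][]>,
  keepPerGesture = 15,
): Record<string, number[][]> {
  const optimized: Record<string, number[][]> = {};
  for (const [gesture, samples] of Object.entries(data)) {
    if (samples.length <= keepPerGesture) {
      optimized[gesture] = samples;
      continue;
    }
    // Take evenly spaced samples across the original recording
    const step = samples.length / keepPerGesture;
    optimized[gesture] = Array.from(
      { length: keepPerGesture },
      (_, i) => samples[Math.floor(i * step)],
    );
  }
  return optimized;
}
```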
- Chrome/Edge: Full support (recommended)
- Firefox: Full support
- Safari: Requires MediaPipe WASM polyfill
Requirements:
- WebGL 2.0 support
- Camera access permission
- Modern ES6+ JavaScript support
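A quick, self-contained capability check you could run in the browser console to verify these requirements; it is not part of the app.

```ts
// Returns a list of problems; an empty array means the requirements are met.
async function checkRequirements(): Promise<string[]> {
  const problems: string[] = [];

  // WebGL 2.0
  const canvas = document.createElement('canvas');
  if (!canvas.getContext('webgl2')) problems.push('WebGL 2.0 is not available');

  // Camera permission / availability
  try {
    const stream = await navigator.mediaDevices.getUserMedia({ video: true });
    stream.getTracks().forEach((t) => t.stop());
  } catch {
    problems.push('Camera access was denied or no camera was found');
  }

  return problems;
}
```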
Edit hooks/useGestureTraining.ts:

```ts
const SAMPLES_PER_GESTURE = 15;      // Training samples
const EPOCHS = 30;                   // Training epochs
const PREDICTION_THROTTLE_MS = 400;  // Prediction interval
```

Edit components/GameScreen.tsx:

```ts
const BALL_SPEED = 5;                // Ball movement speed
const PREDICTION_THROTTLE_MS = 400;  // Gesture detection rate
```

To add a new gesture, edit hooks/useGestureTraining.ts:

```ts
const GESTURES = ['UP', 'DOWN', 'LEFT', 'RIGHT', 'FREEZE'];
```

Then add recognition handling for the new gesture in components/GameScreen.tsx.
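For example, a hypothetical sixth gesture could be wired into the game loop like this; the 'BOOST' label and `velocityFor` helper are made up for illustration, and the model's output layer would also need to grow to match the gesture count.

```ts
// Hypothetical example; the actual handling code in GameScreen.tsx differs.
const GESTURES = ['UP', 'DOWN', 'LEFT', 'RIGHT', 'FREEZE', 'BOOST'];

function velocityFor(gesture: string, speed: number): { vx: number; vy: number } {
  switch (gesture) {
    case 'UP':    return { vx: 0, vy: -speed };
    case 'DOWN':  return { vx: 0, vy: speed };
    case 'LEFT':  return { vx: -speed, vy: 0 };
    case 'RIGHT': return { vx: speed, vy: 0 };
    case 'BOOST': return { vx: speed * 2, vy: 0 }; // new gesture's behavior
    case 'FREEZE':
    default:      return { vx: 0, vy: 0 };
  }
}
```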
```bash
bun run dev     # Start development server
bun run build   # Build for production
bun run start   # Start production server
bun run lint    # Run ESLint
```

Enable debug logging in hooks/useGestureTraining.ts:

```ts
// Increase the sample rate from 10% to 100%
if (Math.random() < 1.0) { // Change from 0.1
  console.log('🎯 Predictions:', predictions);
}
```
Camera not working:
- Grant camera permissions in browser settings
- Check if another app is using the camera
- Try Chrome/Edge for best compatibility
Gestures not recognized:
- Ensure all 5 gestures are trained (15 samples each)
- Check camera has good lighting
- Position hand clearly in frame
- Try retraining gestures with exaggerated movements
Game lags or runs slowly:
- Click "Optimize Data" in `/manage`
- Close other browser tabs
- Check that GPU acceleration is enabled in the browser
- Reduce `BALL_SPEED` in GameScreen.tsx
Other errors:
- Check the browser console for errors
- Verify MediaPipe files exist in `/public/mediapipe/`
- Clear the browser cache and reload
MIT
- MediaPipe by Google
- TensorFlow.js by Google
- Next.js by Vercel
Built with ❤️ using Next.js and TensorFlow.js

