A real-time sign language alphabet recognition system using MediaPipe for hand tracking and machine learning for classification.
This project uses MediaPipe's hand tracking solution to detect hand landmarks and a Random Forest classifier to recognize American Sign Language (ASL) alphabet gestures. The system can detect and classify hand signs in real-time through a webcam.
- Real-time hand tracking using MediaPipe
- Gesture recognition for ASL alphabet (A-Z)
- Data collection tool for creating custom training datasets
- Model training pipeline with evaluation metrics
- Live prediction with confidence scores
- Prediction smoothing for stable results
- Visual feedback with hand landmark overlay
- Python 3.8 or higher
- Webcam/Camera
- Windows/Linux/macOS
- Clone or navigate to the project directory:

  ```bash
  cd "Sign language detection"
  ```

- Install required packages:

  ```bash
  pip install -r requirements.txt
  ```
The following packages will be installed:
- opencv-python (for video capture and display)
- mediapipe (for hand tracking)
- numpy (for numerical operations)
- scikit-learn (for machine learning)
- matplotlib (for visualization)
- pandas (for data handling)
The system works in three stages: Data Collection → Model Training → Real-time Detection
Run the data collection script to capture hand gestures for each letter:
```bash
python collect_data.py
```

Instructions:
- The script will prompt you to select which letters to collect (A-Z or custom selection)
- Choose how many samples per letter (default: 100)
- For each letter:
- Position your hand in the ASL sign for that letter
- Press SPACE to start collecting samples
- Hold the gesture steady while samples are collected
- Press q to quit current collection if needed
- Data is automatically saved after each letter in `data/sign_language_data.pkl`
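For reference, the core of capturing a single sample can be sketched as below. This is an illustrative sketch only, not the actual `collect_data.py` internals; in particular, the data layout (a dict of 63-value feature lists keyed by letter) is an assumption:

```python
import pickle
import cv2
import mediapipe as mp

def capture_one_sample(letter, data, cap, hands):
    """Grab one frame, extract 21 landmarks (63 values), and store them under `letter`."""
    ok, frame = cap.read()
    if not ok:
        return False
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return False  # no hand detected in this frame
    landmarks = results.multi_hand_landmarks[0].landmark
    features = [value for lm in landmarks for value in (lm.x, lm.y, lm.z)]  # 21 * 3 = 63
    data.setdefault(letter, []).append(features)
    return True

# Example usage (assumed layout: {"A": [[...63 floats...], ...], ...}):
# hands = mp.solutions.hands.Hands(min_detection_confidence=0.7)
# cap = cv2.VideoCapture(0)
# data = {}
# capture_one_sample("A", data, cap, hands)
# with open("data/sign_language_data.pkl", "wb") as f:
#     pickle.dump(data, f)
```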
Tips for good data collection:
- Use good lighting
- Keep your hand clearly visible
- Vary the hand position slightly during collection (different angles, distances)
- Use a plain background if possible
- Collect data from different people for better generalization
After collecting data, train the classifier:
```bash
python train_model.py
```

The script will:
- Load the collected data
- Display dataset statistics
- Split data into training (80%) and testing (20%) sets
- Train a Random Forest classifier
- Evaluate the model and show accuracy metrics
- Optionally perform cross-validation
- Optionally plot confusion matrix and feature importance
- Save the trained model to `models/sign_language_model.pkl`
Expected output:
- Accuracy score
- Classification report (precision, recall, F1-score)
- Confusion matrix visualization (optional)
- Feature importance plot (optional)
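For orientation, the core of this training pipeline can be sketched as follows. The pickle layout (a dict of feature lists keyed by letter) is an assumption for illustration; the actual `train_model.py` may differ:

```python
import os
import pickle
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Assumed layout: {"A": [[...63 floats...], ...], "B": [...], ...}
with open("data/sign_language_data.pkl", "rb") as f:
    data = pickle.load(f)

X = np.array([sample for samples in data.values() for sample in samples])
y = np.array([letter for letter, samples in data.items() for _ in samples])

# 80/20 train/test split, as described above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, max_depth=20)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

os.makedirs("models", exist_ok=True)
with open("models/sign_language_model.pkl", "wb") as f:
    pickle.dump(model, f)
```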
Run the detection script to recognize signs in real-time:
```bash
python detect_sign.py
```

Controls:
- q: Quit the application
- c: Clear prediction history (useful if predictions get stuck)
Display elements:
- Hand landmarks overlay
- Current predicted letter
- Confidence percentage
- Large letter display in top-right corner
- Color-coded confidence (Green: high, Yellow: medium, Orange: low)
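The color coding can be implemented with a simple threshold mapping; the thresholds below are illustrative assumptions, not the exact values used in `detect_sign.py`:

```python
def confidence_color(confidence):
    """Map a prediction confidence (0.0-1.0) to a BGR color for OpenCV overlays."""
    if confidence >= 0.8:
        return (0, 255, 0)      # green: high confidence
    if confidence >= 0.5:
        return (0, 255, 255)    # yellow: medium confidence
    return (0, 165, 255)        # orange: low confidence
```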
```
Sign language detection/
├── collect_data.py          # Data collection script
├── train_model.py           # Model training script
├── detect_sign.py           # Real-time detection script
├── requirements.txt         # Python dependencies
├── README.md                # This file
├── data/                    # Training data (created automatically)
│   └── sign_language_data.pkl
└── models/                  # Trained models (created automatically)
    ├── sign_language_model.pkl
    ├── confusion_matrix.png
    └── feature_importance.png
```
- MediaPipe detects 21 hand landmarks (fingertips, joints, wrist, etc.)
- Each landmark has 3D coordinates (x, y, z)
- Total: 63 features per hand gesture
- Landmarks are normalized relative to the wrist position
- Scaling is applied to make the features translation and scale-invariant
- This allows the model to recognize signs regardless of hand size or distance from camera
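A minimal sketch of this normalization (landmark 0 is the wrist in MediaPipe's hand model; the exact scaling used by the project may differ):

```python
import numpy as np

def normalize_landmarks(landmarks):
    """Translate landmarks so the wrist is at the origin, then scale to unit size.

    `landmarks` is a (21, 3) array-like of (x, y, z) coordinates from MediaPipe.
    Returns a flat vector of 63 features.
    """
    points = np.asarray(landmarks, dtype=np.float32).reshape(21, 3)
    points -= points[0]                        # wrist-relative coordinates
    scale = np.linalg.norm(points, axis=1).max()
    if scale > 0:
        points /= scale                        # scale-invariant
    return points.flatten()
```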
- Random Forest classifier learns patterns from normalized landmarks
- During detection, new gestures are classified into alphabet letters
- Prediction smoothing uses a sliding window to stabilize results
- Webcam feed is processed frame-by-frame
- Hand landmarks are extracted and normalized
- Model predicts the sign with confidence score
- Results are displayed with visual feedback
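Put together, the detection loop looks roughly like the sketch below. It is a simplified illustration rather than the full `detect_sign.py`: smoothing here is a majority vote over the last five predictions, and the wrist-relative normalization is inlined from the sketch above.

```python
import pickle
from collections import Counter, deque

import cv2
import mediapipe as mp
import numpy as np

with open("models/sign_language_model.pkl", "rb") as f:
    model = pickle.load(f)

hands = mp.solutions.hands.Hands(min_detection_confidence=0.7, min_tracking_confidence=0.5)
history = deque(maxlen=5)  # sliding window for prediction smoothing
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lms = results.multi_hand_landmarks[0].landmark
        pts = np.array([(lm.x, lm.y, lm.z) for lm in lms], dtype=np.float32)
        pts -= pts[0]                                         # wrist-relative
        pts /= max(np.linalg.norm(pts, axis=1).max(), 1e-6)   # scale-invariant
        probs = model.predict_proba([pts.flatten()])[0]
        history.append(model.classes_[np.argmax(probs)])
        letter = Counter(history).most_common(1)[0][0]        # smoothed prediction
        cv2.putText(frame, f"{letter} ({probs.max():.0%})", (10, 40),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
    cv2.imshow("Sign Language Detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```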
The model's performance depends on:
- Data quality: More varied and accurate training data = better results
- Number of samples: 100+ samples per letter recommended
- Lighting conditions: Consistent lighting improves accuracy
- Hand positioning: Clear, unobstructed hand view is essential
Expected accuracy with good training data: 85-95%
In `collect_data.py` and `detect_sign.py`, modify:

```python
self.hands = mp_hands.Hands(
    min_detection_confidence=0.7,  # Lower for easier detection
    min_tracking_confidence=0.5    # Lower for smoother tracking
)
```

In `train_model.py`, adjust the Random Forest parameters:

```python
self.model = RandomForestClassifier(
    n_estimators=100,      # Number of trees
    max_depth=20,          # Maximum tree depth
    min_samples_split=5,   # Minimum samples to split
    min_samples_leaf=2     # Minimum samples per leaf
)
```

In `detect_sign.py`, modify:

```python
self.history_size = 5  # Number of predictions to consider (increase for more smoothing)
```

- Ensure no other application is using the camera
- Try changing the camera index in the code: `cv2.VideoCapture(0)` → `cv2.VideoCapture(1)`
- Collect more training data (200+ samples per letter)
- Ensure consistent hand positioning during data collection
- Use better lighting conditions
- Train with data from multiple people
- Improve lighting
- Move hand closer to camera
- Lower the `min_detection_confidence` parameter
- Ensure hand is fully visible and not obstructed
- Increase `history_size` for more smoothing
- Increase the `min_tracking_confidence` parameter
- Collect more consistent training data
For proper hand signs, refer to ASL alphabet charts:
- ASL Alphabet Chart
- Practice each letter before collecting data
- Note: Letters J and Z involve motion (this system works best with static gestures)
Improvements and suggestions are welcome! Consider:
- Adding more sign language gestures (words, phrases)
- Implementing two-hand gesture recognition
- Adding different classification models (SVM, Neural Networks)
- Supporting different sign languages (BSL, ISL, etc.)
This project is free to use for educational and personal purposes.
- MediaPipe by Google for hand tracking solution
- Scikit-learn for machine learning tools
- ASL community for gesture references
For issues or questions:
- Check the Troubleshooting section
- Verify all dependencies are installed correctly
- Ensure you have followed all steps in order (collect β train β detect)
Happy Sign Language Detection!