A real-time American Sign Language (ASL) alphabet recognition system using computer vision and deep learning. The system uses MediaPipe for hand landmark detection and a custom neural network for gesture classification, achieving 98.46% accuracy on test data.
- Real-time ASL recognition - Recognizes 26 ASL alphabet letters from live webcam input
- High accuracy - 98.46% test accuracy with robust hand landmark extraction
- Efficient processing - Processes video at 15-20 FPS with sub-100ms latency
- Confidence scores - Displays prediction confidence for each letter
- Visual feedback - Shows hand landmarks and predictions overlaid on video
- Python 3.x
- PyTorch - Neural network framework
- MediaPipe - Hand landmark detection
- OpenCV - Webcam capture and image processing
- NumPy - Data processing
- scikit-learn - Data splitting
- Dataset: 78,000+ ASL alphabet images
- Training samples: 60,576 (after quality filtering)
- Test accuracy: 98.46%
- Validation accuracy: 98.39%
- Model architecture: 3-layer feedforward neural network (63 → 128 → 64 → 26)
- Clone the repository

```bash
git clone https://github.com/Aassi1/asl-translator.git
cd asl-translator
```

- Create a virtual environment

```bash
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # Mac/Linux
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Train the model

```bash
python src/train_model.py
```

This will:
- Load and preprocess the dataset
- Train the neural network for 50 epochs
- Save the trained model to `models/asl_model.pth`
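The training recipe (50 epochs, saving weights at the end) can be sketched as below. The synthetic `X`/`y` tensors stand in for the real preprocessed landmarks and labels, and the inline `nn.Sequential` is illustrative, not the repo's actual training code:

```python
import torch
from torch import nn

# Stand-ins for the real data: 63-dim landmark vectors, 26 letter classes.
X = torch.randn(256, 63)
y = torch.randint(0, 26, (256,))

# Illustrative model matching the documented 63 -> 128 -> 64 -> 26 layout.
model = nn.Sequential(
    nn.Linear(63, 128), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(64, 26),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(50):
    optimizer.zero_grad()
    loss = criterion(model(X), y)   # cross-entropy on raw logits
    loss.backward()
    optimizer.step()

torch.save(model.state_dict(), "asl_model.pth")  # weights only
```

Saving `state_dict()` rather than the whole module keeps the checkpoint portable across code changes.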
- Run the real-time demo

```bash
python src/realtime_inference.py
```

- Show your hand to the webcam and make ASL letters
- The predicted letter and confidence score will appear on screen
- Press 'q' to quit
```
asl-translator/
├── data/
│   ├── asl_alphabet_train/    # Training dataset
│   ├── landmarks.npy          # Preprocessed hand landmarks
│   └── labels.npy             # Corresponding labels
├── models/
│   └── asl_model.pth          # Trained model weights
├── src/
│   ├── preprocess_dataset.py  # Dataset preprocessing pipeline
│   ├── train_model.py         # Model training script
│   └── realtime_inference.py  # Real-time webcam demo
├── notebooks/                 # Exploration notebooks
├── requirements.txt
└── README.md
```
- Hand Detection: MediaPipe detects hands in webcam frames and extracts 21 landmark points
- Feature Extraction: Landmarks are converted to 63-dimensional vectors (21 points × 3 coordinates)
- Classification: Neural network predicts which of 26 letters is being signed
- Display: Prediction and confidence score are shown on the video feed
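The feature-extraction step above can be sketched in plain NumPy. The wrist-relative centering shown here is a common normalization and an assumption on my part, not necessarily what `preprocess_dataset.py` does:

```python
import numpy as np

def landmarks_to_features(landmarks: np.ndarray) -> np.ndarray:
    """Flatten 21 hand landmarks (x, y, z) into a 63-dim feature vector.

    `landmarks` is a (21, 3) array, one row per MediaPipe landmark.
    Subtracting the wrist landmark (index 0) is an assumed normalization.
    """
    assert landmarks.shape == (21, 3)
    centered = landmarks - landmarks[0]           # wrist-relative coordinates
    return centered.flatten().astype(np.float32)  # shape (63,)

# Example with random stand-in landmarks
features = landmarks_to_features(np.random.rand(21, 3))
```

A translation-normalized vector like this makes the classifier insensitive to where the hand sits in the frame.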
- Input layer: 63 features (hand landmark coordinates)
- Hidden layer 1: 128 neurons + ReLU + Dropout(0.3)
- Hidden layer 2: 64 neurons + ReLU + Dropout(0.3)
- Output layer: 26 neurons (one per letter)
- Loss function: Cross-entropy
- Optimizer: Adam (lr=0.001)
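The architecture listed above maps directly onto a small PyTorch module. This is a sketch matching the documented layers; the class name `ASLClassifier` is illustrative, not the repo's actual class:

```python
import torch
from torch import nn

class ASLClassifier(nn.Module):
    """Feedforward network: 63 -> 128 -> 64 -> 26 (illustrative name)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(63, 128), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(64, 26),   # raw logits, one per letter
        )

    def forward(self, x):
        return self.net(x)

model = ASLClassifier().eval()           # eval() disables dropout at inference
logits = model(torch.randn(1, 63))       # one 63-dim landmark vector
probs = torch.softmax(logits, dim=1)     # confidence scores shown on screen
```

The output layer emits raw logits because `nn.CrossEntropyLoss` applies the softmax internally during training; softmax is only needed at inference time to display confidences.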
- Expand to recognize full words and phrases (dynamic signs)
- Add support for numbers and common words
- Implement sentence formation with word spacing detection
- Deploy as web application
- Support for multiple sign languages
See requirements.txt for full dependencies. Main requirements:
- torch>=2.0.0
- opencv-python>=4.8.0
- mediapipe>=0.10.0
- numpy>=1.24.0
- scikit-learn>=1.3.0
MIT License
- ASL Alphabet dataset from Kaggle
- MediaPipe by Google for hand tracking
- PyTorch for deep learning framework