ASL Real-Time Translation System

A real-time American Sign Language (ASL) alphabet recognition system using computer vision and deep learning. The system uses MediaPipe for hand landmark detection and a custom neural network for gesture classification, achieving 98.46% accuracy on test data.

Features

  • Real-time ASL recognition - Recognizes 26 ASL alphabet letters from live webcam input
  • High accuracy - 98.46% test accuracy with robust hand landmark extraction
  • Efficient processing - Processes video at 15-20 FPS with sub-100ms latency
  • Confidence scores - Displays prediction confidence for each letter
  • Visual feedback - Shows hand landmarks and predictions overlaid on video

Technical Stack

  • Python 3.x
  • PyTorch - Neural network framework
  • MediaPipe - Hand landmark detection
  • OpenCV - Webcam capture and image processing
  • NumPy - Data processing
  • scikit-learn - Data splitting

Results

  • Dataset: 78,000+ ASL alphabet images
  • Training samples: 60,576 (after quality filtering)
  • Test accuracy: 98.46%
  • Validation accuracy: 98.39%
  • Model architecture: 3-layer feedforward neural network (63 → 128 → 64 → 26)

Installation

  1. Clone the repository
git clone https://github.com/Aassi1/asl-translator.git
cd asl-translator
  2. Create a virtual environment
python -m venv venv
venv\Scripts\activate  # Windows
# source venv/bin/activate  # Mac/Linux
  3. Install dependencies
pip install -r requirements.txt

Usage

Training the Model

python src/train_model.py

This will:

  • Load and preprocess the dataset
  • Train the neural network for 50 epochs
  • Save the trained model to models/asl_model.pth
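
For orientation, the core of the training step can be sketched as the loop below. The random tensors are stand-ins (the real script loads data/landmarks.npy and data/labels.npy), and the layer sizes match the project's 63 → 128 → 64 → 26 architecture; the actual script internals may differ.

```python
import torch
import torch.nn as nn

# Stand-in data; train_model.py loads data/landmarks.npy and data/labels.npy.
X = torch.randn(256, 63)            # 63-dim landmark vectors
y = torch.randint(0, 26, (256,))    # letter labels 0..25

model = nn.Sequential(
    nn.Linear(63, 128), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(64, 26),              # raw logits, one per letter
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(50):             # the script trains for 50 epochs
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()

torch.save(model.state_dict(), "asl_model.pth")  # real path: models/asl_model.pth
```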

Running Real-Time Recognition

python src/realtime_inference.py

  • Show your hand to the webcam and make ASL letters
  • The predicted letter and confidence score will appear on screen
  • Press 'q' to quit
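
The on-screen letter and confidence score can be derived from the model's output roughly as follows (a sketch only — the real script also runs MediaPipe and OpenCV in a webcam loop; the helper name predict is illustrative):

```python
import string
import torch
import torch.nn.functional as F

LETTERS = string.ascii_uppercase  # 26 class labels, A-Z

def predict(model, features):
    """Map a 63-dim landmark vector to (letter, confidence)."""
    model.eval()  # disable dropout for inference
    with torch.no_grad():
        logits = model(features.unsqueeze(0))        # shape (1, 26)
        probs = F.softmax(logits, dim=1).squeeze(0)  # class probabilities
    conf, idx = probs.max(dim=0)
    return LETTERS[idx.item()], conf.item()
```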

Project Structure

asl-translator/
├── data/
│   ├── asl_alphabet_train/     # Training dataset
│   ├── landmarks.npy           # Preprocessed hand landmarks
│   └── labels.npy              # Corresponding labels
├── models/
│   └── asl_model.pth           # Trained model weights
├── src/
│   ├── preprocess_dataset.py   # Dataset preprocessing pipeline
│   ├── train_model.py          # Model training script
│   └── realtime_inference.py   # Real-time webcam demo
├── notebooks/                  # Exploration notebooks
├── requirements.txt
└── README.md

How It Works

  1. Hand Detection: MediaPipe detects hands in webcam frames and extracts 21 landmark points
  2. Feature Extraction: Landmarks are converted to 63-dimensional vectors (21 points × 3 coordinates)
  3. Classification: Neural network predicts which of 26 letters is being signed
  4. Display: Prediction and confidence score are shown on the video feed
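
The feature-extraction step (point 2) can be sketched as a simple flatten; the helper name is ours, but the .x/.y/.z attribute access matches MediaPipe's normalized landmark format:

```python
import numpy as np

def landmarks_to_vector(hand_landmarks):
    """Flatten 21 hand landmarks into the model's 63-dim input:
    (x, y, z) per landmark, kept in landmark order."""
    return np.array(
        [coord for lm in hand_landmarks for coord in (lm.x, lm.y, lm.z)],
        dtype=np.float32,
    )
```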

Model Architecture

  • Input layer: 63 features (hand landmark coordinates)
  • Hidden layer 1: 128 neurons + ReLU + Dropout(0.3)
  • Hidden layer 2: 64 neurons + ReLU + Dropout(0.3)
  • Output layer: 26 neurons (one per letter)
  • Loss function: Cross-entropy
  • Optimizer: Adam (lr=0.001)
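
Expressed as a PyTorch module, the architecture above looks roughly like this (the class name ASLClassifier is illustrative; the repository's actual module may be structured differently):

```python
import torch
import torch.nn as nn

class ASLClassifier(nn.Module):
    """3-layer feedforward network: 63 -> 128 -> 64 -> 26."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(63, 128), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(64, 26),  # raw logits; CrossEntropyLoss applies softmax
        )

    def forward(self, x):
        return self.net(x)

model = ASLClassifier()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```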

Future Improvements

  • Expand to recognize full words and phrases (dynamic signs)
  • Add support for numbers and common words
  • Implement sentence formation with word spacing detection
  • Deploy as web application
  • Support for multiple sign languages

Requirements

See requirements.txt for full dependencies. Main requirements:

  • torch>=2.0.0
  • opencv-python>=4.8.0
  • mediapipe>=0.10.0
  • numpy>=1.24.0
  • scikit-learn>=1.3.0

License

MIT License

Acknowledgments

  • ASL Alphabet dataset from Kaggle
  • MediaPipe by Google for hand tracking
  • PyTorch as the deep learning framework
