This project enables real-time gesture recognition in video streams for sign language interpretation. The Sign Language Action Detection system achieves 92% accuracy using a combination of Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) techniques.

Technologies Used:
- OpenCV
- MediaPipe
- TensorFlow
- Matplotlib
- Scikit-learn
- CNN
- LSTM
Data Collection:
- Used OpenCV and Python's os module to collect training data samples from the webcam (front camera).
- Samples were labeled and organized by folder name for easy management; a minimal collection sketch follows this list.
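A minimal sketch of this collection setup, assuming a hypothetical `MP_Data` root folder and per-gesture sample counts rather than the repository's actual script:

```python
import os
import cv2

ACTIONS = ["hello", "thanks", "iloveyou"]  # the three gestures targeted below
NO_SEQUENCES = 30      # assumed number of recorded sequences per gesture
DATA_PATH = "MP_Data"  # hypothetical root folder for the labeled samples

# One folder per gesture and per sequence keeps samples labeled by name.
for action in ACTIONS:
    for sequence in range(NO_SEQUENCES):
        os.makedirs(os.path.join(DATA_PATH, action, str(sequence)), exist_ok=True)

cap = cv2.VideoCapture(0)  # 0 selects the default (front) camera
ret, frame = cap.read()    # grab a single frame as a quick smoke test
cap.release()
```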
Holistic Model:
- Employed MediaPipe's holistic model to detect face, pose, and hand landmarks and draw them as a mesh over the body in each captured frame; see the sketch below.
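A sketch of the landmark-detection loop using MediaPipe's standard Holistic API; the confidence thresholds and window name are illustrative choices:

```python
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        # MediaPipe expects RGB input; OpenCV captures BGR.
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        # Draw the face mesh plus pose and hand landmarks onto the frame.
        mp_drawing.draw_landmarks(frame, results.face_landmarks,
                                  mp_holistic.FACEMESH_TESSELATION)
        mp_drawing.draw_landmarks(frame, results.pose_landmarks,
                                  mp_holistic.POSE_CONNECTIONS)
        mp_drawing.draw_landmarks(frame, results.left_hand_landmarks,
                                  mp_holistic.HAND_CONNECTIONS)
        mp_drawing.draw_landmarks(frame, results.right_hand_landmarks,
                                  mp_holistic.HAND_CONNECTIONS)
        cv2.imshow("Holistic", frame)
        if cv2.waitKey(10) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```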
Frame Sequencing:
- For each input, 30 frames were used to form a sequence.
- Three different hand gestures were targeted: "hello," "thanks," and "I love you."
- Each frame is stored as a NumPy array; a keypoint-extraction sketch follows this list.
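One common way to store each frame as a NumPy array is to flatten the holistic landmarks into a single feature vector; the landmark counts below are MediaPipe Holistic's, while the helper name and zero-filling are assumptions:

```python
import numpy as np

def extract_keypoints(results):
    # Flatten one frame's holistic landmarks into a single vector,
    # zero-filling any part (e.g. an off-screen hand) that was not detected.
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility]
                      for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[lm.x, lm.y, lm.z]
                      for lm in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    left = (np.array([[lm.x, lm.y, lm.z]
                      for lm in results.left_hand_landmarks.landmark]).flatten()
            if results.left_hand_landmarks else np.zeros(21 * 3))
    right = (np.array([[lm.x, lm.y, lm.z]
                       for lm in results.right_hand_landmarks.landmark]).flatten()
             if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, face, left, right])  # 1662 values per frame
```

Stacking 30 of these vectors yields one (30, 1662) sequence per sample.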
Data Splitting:
- Used scikit-learn to split the data into training and testing sets, as sketched below.
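A minimal split sketch assuming scikit-learn's `train_test_split`; the array shapes, label encoding, and 5% test size are placeholders, not values taken from the repository:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

# Placeholder data: 90 sequences of 30 frames x 1662 keypoint features
# (30 sequences per gesture); real arrays come from the saved frame files.
X = np.random.rand(90, 30, 1662).astype(np.float32)
y = to_categorical(np.repeat(np.arange(3), 30))  # one-hot labels, 3 gestures

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05)
```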
Model Architecture:
- Constructed a sequential model with 3 LSTM layers followed by 3 Dense layers (see the sketch after this list).
- The architecture has 596,675 trainable parameters in total.
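A Keras stack matching this description; the layer widths are assumptions, chosen because they reproduce the 596,675-parameter total for (30, 1662) keypoint-sequence inputs:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    # 3 LSTM layers; the first two return sequences so the next LSTM
    # still sees all 30 time steps.
    LSTM(64, return_sequences=True, activation="relu", input_shape=(30, 1662)),
    LSTM(128, return_sequences=True, activation="relu"),
    LSTM(64, return_sequences=False, activation="relu"),
    # 3 Dense layers, ending in one softmax output per gesture.
    Dense(64, activation="relu"),
    Dense(32, activation="relu"),
    Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["categorical_accuracy"])
model.summary()  # reports 596,675 trainable parameters for these widths
```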
Training:
- Trained the model for 200 epochs.
- The trained weights were saved for later reuse; see the sketch below.
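A training-and-saving sketch; the TensorBoard callback and the `action.h5` weights filename are illustrative assumptions:

```python
from tensorflow.keras.callbacks import TensorBoard

# Train on the split from above; logging to TensorBoard is optional.
model.fit(X_train, y_train, epochs=200,
          callbacks=[TensorBoard(log_dir="Logs")])

model.save_weights("action.h5")  # hypothetical filename
# Later, rebuild the same architecture and call model.load_weights("action.h5").
```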
Real-time Prediction:
- Used the trained model to make predictions in real time with OpenCV.
- Real-time prediction enables interpretation of hand sign language gestures directly within video streams; a sketch of the loop follows.
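A sketch of the real-time loop, reusing `extract_keypoints`, `ACTIONS`, and `model` from the sketches above; the 0.5 confidence threshold and the window name are assumptions:

```python
import cv2
import numpy as np
import mediapipe as mp

mp_holistic = mp.solutions.holistic
sequence = []    # rolling window of the latest 30 frames of keypoints
THRESHOLD = 0.5  # assumed minimum confidence before showing a label

cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        sequence.append(extract_keypoints(results))  # helper sketched above
        sequence = sequence[-30:]                    # keep only 30 frames
        if len(sequence) == 30:
            probs = model.predict(np.expand_dims(sequence, axis=0))[0]
            if probs[np.argmax(probs)] > THRESHOLD:
                cv2.putText(frame, ACTIONS[np.argmax(probs)], (10, 30),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("Prediction", frame)
        if cv2.waitKey(10) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```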
To use the real-time hand sign language detection model, follow these steps:
- Clone the repository.
- Install the required dependencies (OpenCV, MediaPipe, TensorFlow, Matplotlib, scikit-learn).
- Run the provided scripts to capture and process real-time video streams.