SignTales: Storytelling for Hearing Impaired Children Using LSTM-based Temporal Model

Overview

SignTales is a storytelling system designed to enhance communication accessibility for deaf and hard-of-hearing children. It converts English text into stick figure animations representing American Sign Language (ASL) gestures. By leveraging Natural Language Processing (NLP) and deep learning (Seq2Seq LSTM models), the system generates smooth, temporally consistent pose animations that bring stories to life in sign language.

This project bridges the gap between textual storytelling and visual sign language, providing an educational and engaging tool for children.

Features

Text-to-Gloss Conversion: Uses SpaCy NLP pipeline to preprocess text and generate gloss sequences.
Gloss-to-Pose Translation: Employs a Seq2Seq model with BiLSTM encoder and LSTM decoder, enhanced by Bahdanau attention.
Stick Figure Animation Rendering: Real-time visualization in both 2D (Canvas) and 3D (Three.js).
Custom ASL Dataset: 572 gesture videos mapped to gloss tokens for training.
Web-Deployable: Lightweight, accessible, and adaptable to any text input within the trained vocabulary.

Methodology

The system follows a seven-stage pipeline:

Data Collection & Preparation
- ASL gesture dataset (572 videos)
- Domain-specific text corpus for storytelling
Text Preprocessing & Gloss Generation
- Tokenization, stopword removal, lemmatization
- ASL-style grammar ordering (Time → Subject → Noun → Verb → Object → Negation)
Video Data Preprocessing
- Frame extraction via OpenCV
- Landmark detection using MediaPipe Holistic
- Normalization, smoothing, and NPZ conversion
Model Training
- Seq2Seq BiLSTM encoder + LSTM decoder
- Attention mechanism for gloss alignment
- Teacher forcing with scheduled reduction
- Multi-component loss (position, velocity, acceleration, jerk, stop)
Model Tuning
- Optimized hyperparameters (hidden sizes, dropout, learning rates)
- Stop condition tuning for realistic animations
Model Integration
- REST API backend for inference
- Normalized joint coordinates for rendering
Pose Animation Rendering
- Dual rendering modes (2D & 3D)
- Real-time playback with user controls

Results

Final Generation Error: 0.1834
Velocity Error: 0.0674
Acceleration Error: 0.0578
Training Loss: 0.1080
Validation Loss: 0.0494

The system produces smooth, temporally consistent animations suitable for storytelling, outperforming static-image or pre-recorded video approaches.

Applications

Educational Tools: Interactive storytelling for deaf children.
Accessibility Solutions: Bridging communication gaps between signers and non-signers.
Research & Development: Foundation for future sign language animation systems.

Limitations

Vocabulary limited to 71 glosses.
No facial expression generation (critical for full ASL grammar).
Stick figure rendering lacks realism compared to full avatars.

Future Work

Expand gesture dataset for broader vocabulary.
Integrate facial landmark generation.
Replace stick figures with expressive 3D avatars (e.g., Blender-based).
Extend support to Nepali Sign Language (NSL) and other sign languages.

Tech Stack

Languages: Python
Libraries: SpaCy, Pandas, OpenCV, MediaPipe, PyTorch, NumPy
Rendering: Three.js (3D), Canvas (2D)
Model: Seq2Seq BiLSTM Encoder + LSTM Decoder with Attention

👩‍💻 Authors

Bidisha Amatya
Prasanna Shakya
Prinska Maharjan
Sajal Maharjan

Affiliation: Himalaya College of Engineering, Tribhuvan University, Nepal

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
backend		backend
ml		ml
signtale-frontend		signtale-frontend
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SignTales: Storytelling for Hearing Impaired Children Using LSTM-based Temporal Model

Overview

Features

Methodology

Results

Applications

Limitations

Future Work

Tech Stack

👩‍💻 Authors

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SignTales: Storytelling for Hearing Impaired Children Using LSTM-based Temporal Model

Overview

Features

Methodology

Results

Applications

Limitations

Future Work

Tech Stack

👩‍💻 Authors

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages