text-to-asl

A transformer build to take text and display American Sign Language

Summary

Developing a text-to-video transformer for ASL is a challenging and ambitious project, but it has the potential to make a significant impact on the accessibility of communication for the Deaf community. Anyone who wants to help is welcome to.

Strategy

Since sign language is a visual language that relies on movement and facial expressions, a text-to-video transformer would be more suitable for capturing the dynamics of ASL. This would enable the generation of more accurate and natural sign language representations from text input.

To create a text-to-video transformer, roughly:

Data preparation: Use the video datasets (ASL Lexicon Video Dataset, RWTH-BOSTON-104 Database, or others). We will need to preprocess these datasets to extract relevant information such as text labels, video frames, and time information.

Video feature extraction: Use pre-trained models like 3D CNNs (e.g., I3D, C3D, or TSN) to extract spatiotemporal features from the video frames. These features will serve as input for the transformer model.

Text feature extraction: Process the text input using a pre-trained language model, such as BERT or GPT, to obtain the text embeddings. This will help the model understand the context and semantics of the input text.

Transformer architecture: Design a transformer model that takes both video and text features as input and generates video sequences as output. Explore variations of popular transformer architectures, such as the original Transformer, BERT, or GPT, to find a suitable model for the task.

Training and fine-tuning: Train the model on the preprocessed dataset, using an appropriate loss function and optimization strategy. We might need to fine-tune the pre-trained video and text models on the task to achieve better performance.

Evaluation and improvement: Evaluate the model using relevant metrics, such as accuracy, F1-score, or BLEU score, to measure the quality of the generated videos. Use this feedback to iteratively improve the model's architecture, training strategy, or data augmentation techniques.

Training

Read about training here

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
text-to-image		text-to-image
train		train
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

text-to-image

text-to-image

train

train

.gitignore

.gitignore

README.md

README.md

Repository files navigation

text-to-asl

Summary

Strategy

Training

About

Releases

Packages

Languages

container-labs/text-to-asl

Folders and files

Latest commit

History

Repository files navigation

text-to-asl

Summary

Strategy

Training

About

Resources

Stars

Watchers

Forks

Languages