catapulta/attention-speech-recognition

Automatic Speech Recognition (speech-to-text)

Implementation based on Listen, Attend and Spell

The Listener (encoder) is a pyramidal recurrent network encoder that accepts filter bank spectra as inputs. The Speller (decoder) is an attention-based recurrent network decoder that emits characters as outputs. The network produces character sequences without making any independence assumptions between the characters.
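The two ideas in the paragraph above, pyramidal time reduction in the Listener and attention in the Speller, can be illustrated with a minimal standard-library sketch. It is not this repository's code. The names `pyramid_reduce` and `attend`, the toy feature sizes, and the use of plain dot-product scoring are assumptions made for illustration. A real Listener would also run a recurrent layer between reductions, which is omitted here.

```python
import math

def pyramid_reduce(frames):
    """Concatenate each pair of consecutive frames, halving the time axis."""
    usable = len(frames) - len(frames) % 2  # drop a trailing odd frame
    return [frames[t] + frames[t + 1] for t in range(0, usable, 2)]

def attend(query, keys, values):
    """Dot-product attention: softmax over key scores, weighted sum of values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context

# Toy input (hypothetical): 8 frames with 3 filter bank features each.
frames = [[float(t), float(t) + 0.5, 1.0] for t in range(8)]

# Two pyramid levels: 8 frames -> 4 frames -> 2 frames, feature size 3 -> 6 -> 12.
reduced = frames
for _ in range(2):
    reduced = pyramid_reduce(reduced)

# A hypothetical decoder state acts as the query over the reduced frames.
decoder_state = [0.01] * len(reduced[0])
weights, context = attend(decoder_state, reduced, reduced)
```

Each pyramid level halves the number of time steps the decoder has to attend over, which is the main reason for the pyramidal structure: the attention weights are spread over far fewer positions than there are input frames.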

[Figure: las]

Training objective: predict the next character in the transcript, given the utterance (voice recording) and the characters that precede it.
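A next-token objective of this kind is commonly trained by minimizing the average negative log-likelihood of the reference transcript, with the decoder fed the reference tokens shifted by one position (teacher forcing). The sketch below assumes that setup; the repository does not state it. The function `next_token_nll`, the four-symbol vocabulary, and the probability values are hypothetical.

```python
import math

def next_token_nll(step_probs, target_ids):
    """Average negative log-likelihood of the reference token at each step."""
    if len(step_probs) != len(target_ids):
        raise ValueError("need one probability distribution per target token")
    total = 0.0
    for probs, target in zip(step_probs, target_ids):
        total -= math.log(probs[target])
    return total / len(target_ids)

# Hypothetical vocabulary: index 0 starts a sequence, index 1 ends it.
vocab = ["<sos>", "<eos>", "a", "b"]

# Reference transcript "a b" followed by the end marker.
targets = [2, 3, 1]

# Teacher forcing (assumed): the decoder sees the reference shifted right by one.
decoder_inputs = [0] + targets[:-1]

# Made-up decoder outputs: one distribution over the vocabulary per step.
step_probs = [
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.1, 0.7, 0.1, 0.1],
]

loss = next_token_nll(step_probs, targets)
```

Because each step is conditioned on all earlier tokens, the per-step losses multiply out to the probability of the whole sequence, so no independence assumption between output symbols is needed.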

Trained on the WSJ0 dataset
