Feature/Speech to text transcription #495

Merged 6 commits into doccano:master from feature/speech-to-text on Jun 11, 2020



@c-w c-w commented Dec 12, 2019

This pull request is based on the work of @tayciryahmed in #121 and implements speech-to-text transcription in Doccano.

To keep things simple, the implementation currently uses the HTML5 audio element rather than something more sophisticated like wavesurfer. This can be improved later if the need arises. A new alt+p keyboard shortcut plays/pauses the audio player.

Animation showing speech to text transcription

For ease-of-use, speech-to-text data can be imported either by posting audio files (MP3, WAV, etc.) or by uploading a JSONL manifest that encodes the audio as data URIs or URLs to the audio files.

To make it easier to identify and distinguish audio files, the document left-navigation has been updated to display a file name (instead of file content) if the meta.filename attribute is set.
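As a rough illustration of the manifest import and the file-name label described above, here is a minimal sketch. The field names `audio`, `text`, and `meta.filename` on each JSONL line, and the helper names `to_data_uri`, `manifest_line`, and `display_label`, are assumptions made for this example and are not taken from the diff.

```python
import base64
import json
import mimetypes


def to_data_uri(audio_bytes: bytes, filename: str) -> str:
    """Encode raw audio bytes as a base64 data URI (hypothetical helper)."""
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def manifest_line(audio_ref: str, filename: str) -> str:
    """Build one JSONL manifest line.

    `audio_ref` may be a data URI or a URL to the audio file.
    The "audio" and "meta" keys are assumed names for this sketch.
    """
    return json.dumps({"audio": audio_ref, "meta": {"filename": filename}})


def display_label(doc: dict) -> str:
    """Return the file name if meta.filename is set, else the document content."""
    filename = (doc.get("meta") or {}).get("filename")
    return filename if filename else doc.get("text", "")


if __name__ == "__main__":
    # One line referencing a remote file, one line embedding the audio inline.
    print(manifest_line("https://example.com/clip.wav", "clip.wav"))
    print(manifest_line(to_data_uri(b"\x00\x01\x02", "clip.mp3"), "clip.mp3"))
```

Writing one such line per audio file produces a manifest that mixes URLs and embedded audio freely, and setting the file name in `meta` keeps long data URIs out of the navigation label.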

Resolves #95

@c-w c-w force-pushed the feature/speech-to-text branch from c4f6fb2 to 21e1dc9 Compare Dec 12, 2019
@Hironsan Hironsan added this to To do in v1.7.0 Mar 31, 2020

@harmw harmw commented Jun 8, 2020


The first question that comes to mind (I skimmed the code but couldn't find it): does this allow pre-transcribed text to be added as part of the upload?
I'm thinking of a pipeline that does the transcription, so that we can use doccano for human-in-the-loop validation of, and additions to, what our models came up with (a visual representation of the P-value would probably be part of that too).

@Hironsan Hironsan merged commit 60cb341 into doccano:master Jun 11, 2020
2 of 3 checks passed
v1.7.0 automation moved this from To do to Done Jun 11, 2020
@c-w c-w deleted the feature/speech-to-text branch Jun 12, 2020
@Hironsan Hironsan removed this from Done in v1.7.0 Mar 2, 2022