Feature/Speech to text transcription #495

Merged: 6 commits into doccano:master from the feature/speech-to-text branch on Jun 11, 2020

Conversation

c-w
Member

@c-w c-w commented Dec 12, 2019

This pull request builds on the work of @tayciryahmed in #121 and implements speech-to-text transcription in doccano.

To keep things simple, the implementation uses HTML5 audio for now rather than something more sophisticated like wavesurfer. This can be improved later if the need arises. A new Alt+P keyboard shortcut plays and pauses the audio player.

[Animation showing speech-to-text transcription]

For ease-of-use, speech-to-text data can be imported either by posting audio files (MP3, WAV, etc.) or by uploading a JSONL manifest that encodes the audio as data URIs or URLs to the audio files.
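As a rough illustration of the manifest route, the sketch below builds JSONL lines whose audio is either a base64 data URI or a plain URL. The `audio` key, the nested `meta.filename` layout, and the helper names `to_data_uri`, `manifest_line`, and `build_manifest` are assumptions made for this example; the exact schema the importer expects is not spelled out in this description, so check it against the importer before relying on it.

```python
import base64
import json
import mimetypes
from pathlib import Path


def to_data_uri(path):
    """Encode a local audio file as a base64 data URI."""
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None:
        # Fall back to a generic type when the extension is unknown.
        mime_type = "application/octet-stream"
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def manifest_line(audio, filename=None):
    """Serialize one manifest record as a single JSON line.

    `audio` is a data URI or a URL. The "audio" and "meta" keys are
    assumed names, not a confirmed schema.
    """
    record = {"audio": audio}
    if filename is not None:
        record["meta"] = {"filename": filename}
    return json.dumps(record)


def build_manifest(entries):
    """Join (audio, filename) pairs into JSONL text, one record per line."""
    return "".join(manifest_line(audio, name) + "\n" for audio, name in entries)
```

A manifest can mix both forms, for example `build_manifest([(to_data_uri("clip.wav"), "clip.wav"), ("https://example.com/a.mp3", "a.mp3")])`, where the file path and URL are placeholders.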

To make it easier to identify and distinguish audio files, the document left-navigation has been updated to display a file name (instead of file content) if the meta.filename attribute is set.
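The fallback rule described here can be sketched as follows. This is an illustration in Python only, not the actual navigation code from this pull request; the `navigation_label` name, the dictionary shape of `document`, and the `max_length` truncation are all assumptions.

```python
def navigation_label(document, max_length=50):
    """Pick the label shown for a document in the left navigation.

    Prefers meta.filename when it is set, otherwise falls back to the
    (truncated) document content. The document shape is hypothetical.
    """
    meta = document.get("meta") or {}
    filename = meta.get("filename")
    if filename:
        return filename
    text = document.get("text", "")
    return text[:max_length]
```

With this rule, an audio document stored as a long data URI would show up as `clip.wav` instead of the first characters of its encoded content, provided the filename was supplied at import time.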

Resolves #95

@harmw
Contributor

harmw commented Jun 8, 2020

nice!

First question that comes to mind (I glanced through the code but couldn't find it): does this allow pre-transcribed text to be added as part of the upload?
I'm thinking of a pipeline that does the transcription, after which we use doccano for human-in-the-loop validation of, and additions to, what our models came up with (a visual representation of the P-value would probably be part of that too).

@Hironsan Hironsan merged commit 60cb341 into doccano:master Jun 11, 2020
@c-w c-w deleted the feature/speech-to-text branch June 12, 2020 18:50
Development

Successfully merging this pull request may close these issues.

WIP : Annotation of audios for Speech-to-Text parallel data set building