Adding speech to text adapter for Google cloud platform #167

Merged: 6 commits, Jul 19, 2019

@sshniro sshniro commented Jul 18, 2019

Is your Pull Request related to another issue in this repository?
Fix for #152

Describe what the PR does
The PR converts the Google Speech-to-Text response to the Draft.js format.

State whether the PR is ready for review or whether it needs extra work
Completed

Additional Context
Google's STT response is similar to IBM's response format, so this PR follows the same pattern for formatting the text. For example, the text is broken into smaller chunks, each of which mostly resembles a full sentence. The content is therefore not split on punctuation, since punctuation is an additional attribute that has to be specifically requested from the API.
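
A minimal sketch of the chunking idea described above, not the adapter's actual code. It assumes a response shaped like `results[].alternatives[].words[]`, takes the first alternative of each result, and emits one paragraph per result. The helper name `groupByResult` and the sample values are hypothetical.

```javascript
// Sample shaped like a Google STT response; all values are invented.
const sampleResponse = {
  results: [
    {
      alternatives: [
        {
          transcript: 'hello world',
          words: [
            { word: 'hello', startTime: { seconds: '0', nanos: 100000000 } },
            { word: 'world', startTime: { seconds: '0', nanos: 600000000 } }
          ]
        }
      ]
    },
    {
      alternatives: [
        {
          transcript: 'second chunk',
          words: [
            { word: 'second', startTime: { seconds: '1', nanos: 200000000 } },
            { word: 'chunk', startTime: { seconds: '2' } }
          ]
        }
      ]
    }
  ]
};

// Hypothetical helper: one paragraph per result, using the first alternative.
// Results without any alternative are skipped.
const groupByResult = (response) =>
  (response.results || [])
    .filter((result) => result.alternatives && result.alternatives.length > 0)
    .map((result) => {
      const [best] = result.alternatives;
      return {
        text: (best.transcript || '').trim(),
        words: (best.words || []).map((w) => w.word)
      };
    });

const paragraphs = groupByResult(sampleResponse);
console.log(paragraphs.map((p) => p.text)); // [ 'hello world', 'second chunk' ]
```

Grouping by result rather than by punctuation means the output does not depend on whether punctuation was requested from the API.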

* @param nanoSecond
* @returns {number}
*/
const computeTimeInSeconds = (startSecond, nanoSecond) => {

More of a question: I haven't seen the spec of the Google STT schema. In a word like

"startTime": {
  "seconds": "24",
  "nanos": 600000000
},

I am assuming seconds and nanos need to be combined to get, in this case, the startTime?


@sshniro sshniro Jul 19, 2019

Yes @pietrop, GCP provides the time in that format, and if a word starts at exactly 24.00 it returns nothing for the nanos attribute. The method handles this case and computes the exact time.
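
A hedged sketch of how the two fields could be combined; this is not the body of `computeTimeInSeconds` from the PR. It assumes `seconds` may arrive as a string and that either field may be missing. The name `toSeconds` is hypothetical.

```javascript
// Hypothetical helper: combine a { seconds, nanos } timestamp into seconds.
// Assumes `seconds` may be a string and that either field may be absent.
const NANOS_PER_SECOND = 1e9;

const toSeconds = ({ seconds = 0, nanos = 0 } = {}) =>
  Number(seconds) + Number(nanos) / NANOS_PER_SECOND;

// Both fields present, as in the sample quoted in the review.
console.log(toSeconds({ seconds: '24', nanos: 600000000 }));

// `nanos` omitted because the word starts on a whole second.
console.log(toSeconds({ seconds: '24' })); // 24
```

Defaulting a missing field to zero keeps the whole-second case from producing `NaN`.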


@pietrop pietrop left a comment

Thanks for the PR @sshniro
Good shout on breaking up the text using the grouping(?) from the API, given that punctuation is an optional param.

  • I've run the tests and all is good ✅
  • I've also tried it locally, importing the sample JSON ✅

Minor tweaks, I've also left some comments in the code

  • It'd be good to rename gcp to something more consistent with the other adapters, e.g. google-stt, google-cloud-stt, google-cloud-platform, etc.
  • In gcpStt.sample.js, adding draftJs to the name makes it easier to spot at a glance that it's the draftJs data structure for the tests, e.g. something like googleSttToDraftJs.sample.js

Other than that, it's looking good!

sshniro and others added 5 commits July 19, 2019 22:32
Co-Authored-By: Pietro <pietro.passarelli@gmail.com>
Co-Authored-By: Pietro <pietro.passarelli@gmail.com>
Co-Authored-By: Pietro <pietro.passarelli@gmail.com>

sshniro commented Jul 19, 2019

@pietrop I have added the changes requested in the following comment.
#167 (review)

@pietrop pietrop merged commit be93c08 into bbc:master Jul 19, 2019

pietrop commented Jul 19, 2019

Awesome, thanks @sshniro !

Out of curiosity, what's your use case for this component?


sshniro commented Jul 19, 2019

Hi @pietrop :)

I was inspired by the following paper:
https://gfx.cs.princeton.edu/pubs/Jin_2017_VTI/Jin2017-VoCo-paper.pdf

My problem is that I use too many filler words (e.g. so, and, ehh) during screencasts and video tutorials, so I wanted to build an open-source editor for voice. I did a basic search but couldn't find an open-source equivalent, so I decided to create one. From some initial research I found that Google is better at transcribing audio than its open-source counterparts.

So the idea is to automatically transcribe the video content and let the user crop or replace words in the editor. Text that is removed should automatically be removed from the audio content as well.

Replacing or rearranging a word is fairly easy. The paper talks about speech synthesis using phonemes to completely modify the words. If time permits, I'm planning to attempt it and see.
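
As a rough illustration of the word-removal idea (an assumption about one possible approach, not code from any existing editor), the sketch below takes words with start and end times in seconds plus the indices of removed words, and returns the audio ranges to keep. The names `keepRanges`, `start`, and `end`, and the sample timings, are hypothetical.

```javascript
// Hypothetical helper: given timed words (assumed sorted by start time)
// and a Set of removed word indices, return the [start, end] audio ranges
// that remain.
const keepRanges = (words, removed, duration) => {
  const ranges = [];
  let cursor = 0; // start of the next range to keep
  words.forEach((word, index) => {
    if (!removed.has(index)) return;
    // Keep everything between the previous cut and this removed word.
    if (word.start > cursor) ranges.push([cursor, word.start]);
    cursor = Math.max(cursor, word.end);
  });
  // Keep the tail after the last removed word.
  if (cursor < duration) ranges.push([cursor, duration]);
  return ranges;
};

// Invented timings for illustration.
const timedWords = [
  { text: 'so', start: 0, end: 0.4 },
  { text: 'we', start: 0.5, end: 0.8 },
  { text: 'ehh', start: 0.9, end: 1.3 },
  { text: 'start', start: 1.4, end: 2.0 }
];

console.log(keepRanges(timedWords, new Set([0, 2]), 2.5));
// [ [ 0.4, 0.9 ], [ 1.3, 2.5 ] ]
```

The resulting ranges could then be passed to whatever audio tool does the actual cutting; that step is outside this sketch.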


pietrop commented Jul 19, 2019

Very interesting. In a similar domain, we are also working on a tool to edit audio/video interviews. At the moment it is more about generating rough cuts than removing filler words, but it sounds like there might be some overlap.

https://github.com/bbc/digital-paper-edit-client

You can see the demo here: https://bbc.github.io/digital-paper-edit-client

The idea is that:

  1. You create an automatically generated transcript.
  2. You correct it if needed using @bbc/react-transcript-editor (transcript correction example in the demo).
  3. You can then create a programme script and highlight/annotate your material, and/or use text selection to assemble a new programme/story/paper edit (programme script/paper-editing example in the demo).
  4. You can quickly review an audio/video version without needing to export.
  5. When done, you can export to editing software to continue with your edit.

If that makes sense?


sshniro commented Jul 19, 2019

Oh, this is so cool! Thanks for pointing me to this repository. I will go through the issues and see if I can contribute to some features. Going by the issues, it leans more towards AWS, I presume?


pietrop commented Jul 19, 2019

It follows a modular architecture, so there's a React client that in theory is not super opinionated about the backend. The backend can be an API server, or it can be wrapped inside an Electron app and packaged as a desktop app for Mac, Linux, and Windows. The README's Project Architecture section does a better job of describing this. So yeah, it's a larger project, with a variety of different kinds of issues/tickets.
