Transcribe audio to text using API #19

Closed
dylman123 opened this issue Apr 26, 2020 · 7 comments
Labels
refactor Fundamentally change code structure

Comments

@dylman123
Owner

Going to stick with Google Cloud Speech to Text API for now.

dylman123 added the "refactor" label on Apr 26, 2020
@dylman123
Owner Author

dylman123 commented Apr 26, 2020

To integrate the Google Speech API into a Swift app, follow this sample:
https://github.com/GoogleCloudPlatform/ios-docs-samples/tree/master/speech/Swift/Speech-gRPC-Streaming

In the interest of time, I will first use the Rev AI API for development (note that Rev doesn't do diarization well): https://rapidapi.com/Rev.AI/api/rev-ai

@dylman123
Owner Author

dylman123 commented Apr 26, 2020

Either need to:

  1. Learn how to use Rev AI API to accept an upload in the primary API call, rather than a reference to some public URL, or
  2. Swap Rev AI API for Google Speech to Text API (using a similar backbone/approach to what has been set up for Rev AI API)

Either way, I will first need to learn how to pass a file object (e.g. an audio file) in a POST request in Swift.
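
For reference, the usual way to send a file in a POST request is a multipart/form-data body, and the wire format is the same whatever language the client is written in. Below is a minimal sketch in TypeScript (not Swift) of how such a body is laid out. The function name `buildMultipartBody` and the field name, file name and boundary used with it are hypothetical placeholders, not values taken from the Rev AI or Google documentation.

```typescript
// Sketch only: lays out a multipart/form-data body holding one file part.
// All names here are illustrative assumptions, not part of any real API.
interface MultipartPayload {
  contentType: string; // value for the Content-Type request header
  body: Uint8Array;    // raw bytes for the request body
}

function buildMultipartBody(
  fieldName: string,
  fileName: string,
  mimeType: string,
  fileBytes: Uint8Array,
  boundary: string
): MultipartPayload {
  const encoder = new TextEncoder();

  // Part headers: boundary line, disposition, content type, then a blank line.
  const head = encoder.encode(
    `--${boundary}\r\n` +
      `Content-Disposition: form-data; name="${fieldName}"; filename="${fileName}"\r\n` +
      `Content-Type: ${mimeType}\r\n\r\n`
  );

  // Closing boundary: the same marker followed by two extra dashes.
  const tail = encoder.encode(`\r\n--${boundary}--\r\n`);

  // Concatenate header bytes, file bytes and closing boundary.
  const body = new Uint8Array(head.length + fileBytes.length + tail.length);
  body.set(head, 0);
  body.set(fileBytes, head.length);
  body.set(tail, head.length + fileBytes.length);

  return {
    contentType: `multipart/form-data; boundary=${boundary}`,
    body,
  };
}
```

The returned `contentType` goes in the request's Content-Type header and `body` is sent as the request body. The boundary has to be a string that does not occur in the file bytes, so a random value is normally used.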

@dylman123
Owner Author

Watch this tutorial on how to upload files via API in Swift:

https://youtu.be/UMgApUhg7ic

@dylman123
Owner Author

dylman123 commented Apr 27, 2020

On point 2 above, the answer is to use Google Firebase, a suite of APIs for iOS (and, I assume, macOS) development. Firebase Storage is backed by Google Cloud Storage, so the Google Speech-to-Text engine I originally planned to use will be available.

See: https://firebase.google.com/docs/storage

Need to install Firebase with CocoaPods.

@dylman123
Owner Author

  • Once an audio track is uploaded to Google Cloud Storage via Firebase, the next step will be to process that audio file "serverlessly" in GCP.
  • We will use Cloud Functions to do this: https://cloud.google.com/functions
  • In other words, the plan is to set up an event-based rule that runs code to transcribe audio to text (and possibly generate captions too), all in the cloud.
  • Finally, the text (or captions) will be sent back to the client with some sort of asynchronous GET request to a "transcribed" bucket via Firebase. Need to look into this.
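
The steps above can be sketched as a small pure function that a storage-triggered Cloud Function might call. This is only an outline under stated assumptions: the name `planTranscription`, the `transcribed/` output path and the `en-US` default are hypothetical, and the wiring to `firebase-functions` and the Speech-to-Text client is left out. The request shape (`config.languageCode`, `audio.uri` with a `gs://` URI) follows the Speech-to-Text v1 recognize request.

```typescript
// Sketch only: decides whether an uploaded object should be transcribed and,
// if so, builds the Speech-to-Text request and the result location.
// Names and paths are illustrative assumptions.

// Minimal subset of the metadata a Cloud Storage trigger passes to a function.
interface StorageObject {
  bucket: string;
  name?: string;
  contentType?: string;
}

// Shape of a Speech-to-Text v1 recognize request that reads audio from Cloud Storage.
interface RecognizeRequest {
  config: { languageCode: string };
  audio: { uri: string };
}

interface TranscriptionJob {
  request: RecognizeRequest;
  outputPath: string; // hypothetical object name for the transcript
}

function planTranscription(
  object: StorageObject,
  languageCode: string = "en-US" // assumed default language
): TranscriptionJob | null {
  // Ignore anything that is not an audio upload, including our own JSON output.
  if (!object.name || !object.contentType?.startsWith("audio/")) {
    return null;
  }

  // Drop the file extension so the transcript can reuse the object's name.
  const stem = object.name.replace(/\.[^./]+$/, "");

  return {
    request: {
      config: { languageCode },
      audio: { uri: `gs://${object.bucket}/${object.name}` },
    },
    outputPath: `transcribed/${stem}.json`,
  };
}
```

In a deployed function, the trigger handler would pass its object metadata to `planTranscription`, send `request` to the Speech-to-Text client when the result is not `null`, and write the response to `outputPath` for the client to fetch. Skipping non-audio objects also stops the function from retriggering on its own output if the transcript lands in the same bucket.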

@dylman123
Owner Author

Will need to write the Functions code in JavaScript or TypeScript:
https://firebase.google.com/docs/functions/?authuser=0#implementation_paths

@dylman123
Owner Author

Done! Using Node.js with Google Firebase Functions. Big win!
