InfiniteStreaming with soft handover. #8250

Closed
bburli opened this issue Jun 11, 2023 · 3 comments
Labels
priority: p3 Desirable enhancement or fix. May not be included in next release. samples Issues that are directly related to samples. triage me I really want to be triaged. type: question Request for information or clarification. Not an issue.

Comments


bburli commented Jun 11, 2023

Context

This feature suggestion is inspired by the "soft handover" that happens between mobile towers. For details: https://en.wikipedia.org/wiki/Handover#Types

Basically, because Google STT limits a streaming session to 5 minutes, the repo provides an example that keeps streaming by opening a new stream and resending audio. It works out which audio packets to resend from the end time of the last final transcript. Please correct me if my understanding is incorrect.

This uses what is referred to as "break-before-make", or hard handover (as described in the wiki article above). It has some problems:

  1. When we can't establish a new stream (for whatever reason), recognition is unavailable until one can be opened.
  2. It's tightly coupled to word timings, which may not be supported by all ASRs. I understand that's not very relevant here, but for an application that uses Google alongside other ASRs (for example, one trained on a specific corpus), this is critical.
  3. Calculating which audio packets to resend depends on the encoding and the sample rate. That is acceptable as a specification in some cases, but in others it might be a limitation.
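
To make point 3 concrete, here is a minimal sketch of the resend-offset arithmetic, assuming uncompressed PCM audio (such as LINEAR16). The names `ResendOffset` and `bytesFor` are hypothetical and are not taken from the repo sample. The result depends directly on sample rate, sample width, and channel count, and no comparably simple mapping exists for compressed encodings.

```java
import java.time.Duration;

/** Hypothetical helper for illustration; not part of the repo sample. */
final class ResendOffset {

  /**
   * Number of bytes of raw PCM audio that cover the given duration.
   * Assumes uncompressed, interleaved PCM; compressed encodings have no
   * fixed bytes-per-second rate, so this arithmetic does not apply to them.
   */
  static long bytesFor(Duration duration, int sampleRateHertz, int bytesPerSample, int channels) {
    long frameSize = (long) bytesPerSample * channels;      // one sample per channel
    long bytesPerSecond = (long) sampleRateHertz * frameSize;
    long bytes = bytesPerSecond * duration.toMillis() / 1000;
    return bytes - (bytes % frameSize);                      // round down to a whole frame
  }
}
```

For 16 kHz, 16-bit mono audio, 1.5 seconds corresponds to `ResendOffset.bytesFor(Duration.ofMillis(1500), 16000, 2, 1)`, which is 48000 bytes. Changing any of the three audio parameters changes the answer, which is the coupling described above.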

Alternative:

I tried a "soft handover" method in which we open a second stream early (say, 1 minute before the limit) and send audio to both streams until their transcripts align. Once they do, we switch to the new stream.
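
The two decisions in that description (when to open the second stream, and when the transcripts count as aligned) can be sketched as follows. This is an illustration under stated assumptions, not the attached file: `SoftHandoverSketch` and its methods are hypothetical names, and "aligned" is taken here to mean that the last few normalized words of both transcripts match, which is only one possible definition. The sketch does not call the Speech API.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of the soft-handover decisions; no Speech API calls. */
final class SoftHandoverSketch {
  private final int minOverlapWords;

  SoftHandoverSketch(int minOverlapWords) {
    if (minOverlapWords < 1) {
      throw new IllegalArgumentException("minOverlapWords must be at least 1");
    }
    this.minOverlapWords = minOverlapWords;
  }

  /** True once the active stream is within `lead` of the streaming limit. */
  static boolean shouldOpenStandby(Duration streamAge, Duration limit, Duration lead) {
    return streamAge.compareTo(limit.minus(lead)) >= 0;
  }

  /** Lower-cases the transcript, drops punctuation, and splits it into words. */
  static List<String> words(String transcript) {
    String normalized = transcript.toLowerCase(Locale.ROOT)
        .replaceAll("[^\\p{L}\\p{N}' ]+", " ")
        .trim();
    if (normalized.isEmpty()) {
      return Collections.emptyList();
    }
    return Arrays.asList(normalized.split("\\s+"));
  }

  /**
   * Assumed alignment rule: the last minOverlapWords words of the old
   * stream's transcript equal the last minOverlapWords words of the new one.
   */
  boolean aligned(String oldTranscript, String newTranscript) {
    List<String> a = words(oldTranscript);
    List<String> b = words(newTranscript);
    if (a.size() < minOverlapWords || b.size() < minOverlapWords) {
      return false;
    }
    return a.subList(a.size() - minOverlapWords, a.size())
        .equals(b.subList(b.size() - minOverlapWords, b.size()));
  }
}
```

While `shouldOpenStandby` is true and `aligned` is still false, audio goes to both streams and results are taken from the old one. Comparing words rather than timestamps is what keeps this independent of word timings.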

While this has its own problems, I believe it gives better control and is not tied to word timings.
I would like feedback on this approach. I am attaching the code file (in Java) here. For testing purposes, I have set STREAMING_LIMIT to 30 seconds.

I am aware of some areas where I can refine this further, but I am looking for concrete, major concerns from experts or the authors.

Thanks in advance!

InfiniteStreamRecognizeSoftHandOver.java.txt

@bburli bburli added priority: p3 Desirable enhancement or fix. May not be included in next release. triage me I really want to be triaged. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels Jun 11, 2023
@product-auto-label product-auto-label bot added the samples Issues that are directly related to samples. label Jun 11, 2023
@minherz minherz added type: question Request for information or clarification. Not an issue. and removed type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels Jun 11, 2023
Contributor

minherz commented Jun 11, 2023

Hi @bburli, thank you for your suggestion. If I understand correctly, you are saying that the code sample in InfiniteStreamRecognize.java lacks some functionality, including:

  • Error handling (e.g. on failure of establishing a new stream)
  • Universality (e.g. ability to support a variety of input streams or composition of the audio clip that is sent for recognition)

Please note that the samples in this repo demonstrate an opinionated way of using Google APIs. They usually require additional work before they include all the functionality needed for production use. For example, the error handling is very basic and does not demonstrate exponential backoff or other error-handling practices that aren't necessarily related to demonstrating the specific Google API.
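
For reference, exponential backoff around an operation such as opening a new stream could look like the generic sketch below. It is not code from this repo: `Backoff`, `delayMillis`, and `retry` are hypothetical names, the base delay and cap are arbitrary parameters, and the random "full jitter" sleep is one common variant among several.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical retry helper for illustration; not part of the repo samples. */
final class Backoff {

  /** Upper bound on the delay for a zero-based attempt: base * 2^attempt, capped. */
  static long delayMillis(int attempt, long baseMillis, long capMillis) {
    long exponential = baseMillis << Math.min(attempt, 20); // bound the shift to avoid overflow
    return Math.min(capMillis, exponential);
  }

  /** Calls the action up to maxAttempts times, sleeping a random, growing delay between failures. */
  static <T> T retry(Callable<T> action, int maxAttempts, long baseMillis, long capMillis)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    Exception last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return action.call();
      } catch (Exception e) {
        last = e;
        if (attempt + 1 < maxAttempts) {
          long ceiling = delayMillis(attempt, baseMillis, capMillis);
          // Full jitter: sleep a uniformly random time in [0, ceiling].
          Thread.sleep(ThreadLocalRandom.current().nextLong(ceiling + 1));
        }
      }
    }
    throw last; // every attempt failed; surface the last error
  }
}
```

A caller would wrap whatever opens the stream in a `Callable`, for example `Backoff.retry(() -> openStream(), 5, 100, 5000)`, where `openStream` stands in for the application's own method.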

If you find this functionality important and would like to contribute to the collection of samples, we welcome contributions to the code samples.

@minherz minherz assigned bburli and unassigned minherz Jun 11, 2023
Author

bburli commented Jun 12, 2023

@minherz I understand the nature of the repo, and I am aware that these samples are not meant to be used in production as is and would need additional work on operational concerns.

That said, I do think that resetting the stream because of the 5-minute timeout is fundamentally an error-handling problem. Since most consumers use the Streaming Speech API in real time, I think it's essential to provide this example: it shows the only other approach to real-time stream switching that I could think of.

I would be happy to bring in a PR for this to the repo and tag you for review. Please do suggest any other reviewers. Regardless of whether this goes into the repo (for any reason whatsoever) I would welcome feedback of any kind.

I will keep this open until PR is up.

bburli added a commit to bburli/java-docs-samples that referenced this issue Jun 12, 2023
- Adding a sample for Soft Handover in stream switching.

Please refer to GoogleCloudPlatform#8250 for issue background.
Member

anguillanneuf commented Jun 14, 2023

Closing. See PR comment from me for context.

@bburli posted his code sample and the discussion at https://www.googlecloudcommunity.com/gc/AI-ML/Soft-Handover-in-Infinite-streaming/m-p/602877/thread-id/2153. Thanks, Badari.
