InfiniteStreaming with soft handover. #8250

bburli · 2023-06-11T16:35:11Z

Context

This feature suggestion is inspired by the "soft handover" used between mobile towers. For details: https://en.wikipedia.org/wiki/Handover#Types

Google STT has a 5-minute streaming limit. The repo provides an example that keeps streaming past that limit by opening a new stream and resending audio, calculating which audio packets to resend from the end time of the last final transcript. Please correct me if my understanding is incorrect.

This uses what the wiki article above calls "break-before-make" (a hard handover). It has some problems:

If a new stream can't be established (for whatever reason), recognition is unavailable until one can.
It is tightly coupled to word timings, which not all ASRs support. I understand that's not very relevant here, but it is critical for an application that uses Google alongside other ASRs (for example, ones trained on a specific corpus).
Calculating which audio packets to resend depends on the encoding and the sample rate. While this is acceptable as a specification, in some cases it might be a limitation.
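
To illustrate the last point, here is a minimal sketch of the kind of arithmetic involved, assuming uncompressed PCM (for example LINEAR16, 2 bytes per sample). The class and method names are hypothetical and are not taken from the repo sample. For a compressed encoding there is no fixed bytes-per-millisecond ratio, which is where the dependency becomes a limitation.

```java
final class ResendOffset {
    /**
     * Number of raw PCM bytes that cover the given duration.
     * Assumes uncompressed audio with a fixed frame size; the result is
     * rounded down to a whole frame so a resend never splits a sample.
     */
    static long bytesForDuration(long millis, int sampleRateHz, int bytesPerSample, int channels) {
        long frames = millis * sampleRateHz / 1000L;   // whole frames in the interval
        return frames * bytesPerSample * channels;     // bytes per frame = sample size * channels
    }

    public static void main(String[] args) {
        // 2.5 s of 16 kHz, 16-bit, mono audio
        System.out.println(bytesForDuration(2500, 16000, 2, 1)); // prints 80000
    }
}
```

Changing any of the three audio parameters changes the offset, so the resend logic has to know the exact input format.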

Alternative:

I tried a "soft handover" method: open a second stream early (say, 1 minute before the limit) and send audio to both streams until their transcripts align. When they do, switch to the new stream.

While this has its own problems, I believe this gives better control and is not tied to word timings.
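
As a rough illustration of what "the transcripts align" could mean, here is a minimal sketch that compares the last few words of the transcripts from the two streams. It is an assumption for illustration only, not the logic in the attached file: the names `HandoverAlignment`, `tailsAlign`, and `minWords` are hypothetical, and a real check would likely need to tolerate small recognition differences between the streams.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class HandoverAlignment {
    /** Lowercases, strips punctuation, and splits a transcript into words. */
    static List<String> words(String transcript) {
        String cleaned = transcript.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}\\s]", "")
                .trim();
        if (cleaned.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(cleaned.split("\\s+"));
    }

    /**
     * True when the last minWords words of both transcripts are identical.
     * The new stream started later, so only the tails are comparable.
     */
    static boolean tailsAlign(String oldTranscript, String newTranscript, int minWords) {
        List<String> a = words(oldTranscript);
        List<String> b = words(newTranscript);
        if (minWords <= 0 || a.size() < minWords || b.size() < minWords) {
            return false;
        }
        return a.subList(a.size() - minWords, a.size())
                .equals(b.subList(b.size() - minWords, b.size()));
    }

    public static void main(String[] args) {
        String oldStream = "please call me back at the office tomorrow";
        String newStream = "back at the office tomorrow";
        System.out.println(tailsAlign(oldStream, newStream, 3)); // prints true
    }
}
```

Once such a check returns true, audio would go only to the new stream and the old one would be closed. A larger `minWords` lowers the chance of a false match at the cost of a longer overlap.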
I would like feedback on this approach. I am attaching the code file (in Java) here. For testing purposes, I have set STREAMING_LIMIT to 30 seconds.

I am aware of some areas where I can sharpen this further, but I am looking for concrete, major concerns from experts or the authors.

Thanks in advance!

InfiniteStreamRecognizeSoftHandOver.java.txt


minherz · 2023-06-11T19:24:19Z

Hi @bburli, thank you for your suggestion. If I understand correctly, you are saying that the code sample in InfiniteStreamRecognize.java lacks some functionality, including:

Error handling (e.g. on failure of establishing a new stream)
Universality (e.g. ability to support a variety of input streams or composition of the audio clip that is sent for recognition)

Please note that the samples in this repo demonstrate an opinionated way of using Google APIs. The samples usually require additional work to include all the functionality needed to use the code in production. For example, the error handling is very basic and does not demonstrate exponential backoff or other error-handling practices that aren't necessarily related to demonstrating the specific Google API.
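
For reference, exponential backoff can be sketched in a few lines. This is a generic illustration under stated assumptions, not code from the samples: the class `Backoff`, its method names, and the parameters are hypothetical, and production code would normally add random jitter to the delay and retry only on errors known to be transient.

```java
import java.util.concurrent.Callable;

final class Backoff {
    /** Delay before the retry that follows the given zero-based attempt: base * 2^attempt, capped. */
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long delay = baseMillis;
        for (int i = 0; i < attempt && delay < capMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, capMillis);
    }

    /** Runs op until it succeeds or maxAttempts is reached, sleeping between failed attempts. */
    static <T> T retry(Callable<T> op, int maxAttempts, long baseMillis, long capMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(delayMillis(attempt, baseMillis, capMillis));
                }
            }
        }
        throw last; // every attempt failed; surface the most recent error
    }

    public static void main(String[] args) {
        // Delays for the first five retries with a 100 ms base and a 1 s cap
        for (int attempt = 0; attempt < 5; attempt++) {
            System.out.println(delayMillis(attempt, 100, 1000)); // 100, 200, 400, 800, 1000
        }
    }
}
```

In the streaming case, `op` would be whatever call opens the replacement stream.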

If you find this functionality important and would consider contributing to the collection of samples, we welcome contributions to the code samples.

bburli · 2023-06-12T04:11:09Z

@minherz I agree about the nature of the repo, and I am aware that these samples are not meant to be used in production as is and would need work on operational concerns.

With that said, I do think that resetting the stream because of the 5-minute timeout is fundamentally an error-handling problem. Since most consumers use the Streaming Speech API in real time, I think it's essential to provide this example, as it shows the only other approach to real-time stream switching that I could think of.

I would be happy to open a PR for this against the repo and tag you for review. Please suggest any other reviewers. Regardless of whether this goes into the repo (for any reason whatsoever), I would welcome feedback of any kind.

I will keep this open until the PR is up.


anguillanneuf · 2023-06-14T16:08:42Z

Closing. See PR comment from me for context.

@bburli posted his code sample and the discussion at https://www.googlecloudcommunity.com/gc/AI-ML/Soft-Handover-in-Infinite-streaming/m-p/602877/thread-id/2153. Thanks, Badari.

bburli added the priority: p3, triage me, and type: feature request labels Jun 11, 2023

product-auto-label bot added the samples label Jun 11, 2023

blunderbuss-gcf bot assigned minherz Jun 11, 2023

minherz added the type: question label and removed the type: feature request label Jun 11, 2023

minherz assigned bburli and unassigned minherz Jun 11, 2023

bburli added a commit to bburli/java-docs-samples that referenced this issue Jun 12, 2023

Create InfiniteStreamRecognizeSoftHandover.java

d18d228

- Adding a sample for Soft Handover in stream switching. Please refer GoogleCloudPlatform#8250 for issue background.

bburli mentioned this issue Jun 12, 2023

Create InfiniteStreamRecognizeSoftHandover.java #8251

Closed


anguillanneuf closed this as completed Jun 14, 2023
