Long running recognize example code for @google-cloud/speech #5140

Open
SorataBaka opened this issue Mar 15, 2024 · 0 comments
Labels
type: docs Improvement to the documentation for an API.


SorataBaka commented Mar 15, 2024

Is your feature request related to a problem? Please describe.
I was trying to use the Google Speech-to-Text API to transcribe a video file. Because the video is longer than 1 minute, I had to use Google Cloud Storage, as stated in the documentation. However, the provided example code is promise-based:

```javascript
const [operation] = await speechClient.longRunningRecognize(request);
const [response] = await operation.promise();
```

In other words, it waits for the time-consuming operation to finish and only then outputs the transcription and ends the application.

From what I understand, a long-running operation works like this: you submit a transcription request, receive an operation ID, and then repeatedly check whether the operation has completed. That completion check is where the main problem behind this feature request lies.
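The submit-then-poll pattern described above does not depend on the Speech API itself. A minimal sketch, assuming a hypothetical `check()` callback that resolves to an object with a boolean `done` field; the backoff values are arbitrary illustrative defaults, not limits documented by Google Cloud:

```javascript
// Minimal polling sketch. `check` is a hypothetical async callback that
// resolves to an object with a boolean `done` field; it is not a library API.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function pollUntilDone(
  check,
  {initialDelayMs = 1000, maxDelayMs = 30000, maxAttempts = 100, onPending = () => {}} = {}
) {
  let delayMs = initialDelayMs;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await check();
    if (status.done) {
      return status;
    }
    // Report the pending state instead of blocking silently.
    onPending(status, attempt);
    if (attempt < maxAttempts) {
      await sleep(delayMs);
      // Exponential backoff, capped so checks never drift too far apart.
      delayMs = Math.min(delayMs * 2, maxDelayMs);
    }
  }
  throw new Error(`Operation not done after ${maxAttempts} checks`);
}
```

With the Speech client, `check` could wrap `speechClient.checkLongRunningRecognizeProgress(name)`, since the operation it resolves to exposes a `done` field.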

The Google Cloud SDK documentation mentions a getOperation() function that takes the name/ID of the operation. However, its result contains the metadata and the response only as encoded Buffers, which are supposed to be converted into JSON before the result can be viewed:

```
[
  {
    name: '1189218350739101011',
    metadata: {
      type_url: 'type.googleapis.com/google.cloud.speech.v1p1beta1.LongRunningRecognizeMetadata',
      value: <Buffer 08 64 12 0c 08 c2 dd ce af 06 10 d0 b7 91 e2 01 1a 0b 08 c4 dd ce af 06 10 a8 b8 84 1c 22 32 67 73 3a 2f 2f 63 6c 6f 75 64 2d 73 61 6d 70 6c 65 73 2d ... 31 more bytes>
     //Buffer array here
    },
    done: true,
    response: {
      type_url: 'type.googleapis.com/google.cloud.speech.v1p1beta1.LongRunningRecognizeResponse',
      value: <Buffer 12 38 0a 25 0a 1e 68 6f 77 20 6f 6c 64 20 69 73 20 74 68 65 20 42 72 6f 6f 6b 6c 79 6e 20 42 72 69 64 67 65 15 4b 92 7b 3f 22 08 08 01 10 80 89 95 ef ... 22 more bytes>
     //Buffer array here
    },
    result: 'response'
  },
  null,
  null
]
```

This was not clear in the SDK documentation. A toJSON() method is supposedly available on the operation result to convert the buffer into readable JSON, but it is not available on what the @google-cloud/speech module returns. This cost me hours of scouring the documentation for the correct way to do it, and I found nothing.
https://googleapis.dev/nodejs/speech/latest/google.longrunning.Operations.html#getOperation1
https://googleapis.dev/nodejs/speech/latest/google.longrunning.Operation.html

```javascript
const current_result = await speechClient.getOperation({
  name: operation_name,
});
const json_result = current_result.toJSON();
console.log(json_result);
```

```
node .\generated\v1p1beta1\speech.long_running_recognize.js
current_result.toJSON is not a function
```
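For reference, the raw `getOperation()` result can in principle be decoded by hand. This is a sketch that assumes the array shape printed earlier (`[operation, null, null]`, with `{type_url, value}` pairs holding protobuf-encoded Buffers) and assumes the message classes come from the package's exported `protos` namespace, for example `speech.protos.google.cloud.speech.v1p1beta1.LongRunningRecognizeResponse`. The `decodeRawOperation` name and its `types` parameter are hypothetical:

```javascript
// Hand-decodes the raw getOperation() result. Assumes the first array element
// carries `name`, `done`, `error`, and Any-style `{type_url, value}` fields, and
// that `types.Metadata` / `types.Response` expose a static protobufjs decode().
function decodeRawOperation([rawOperation], types) {
  const decodeAny = (any, Type, expectedSuffix) => {
    if (!any) {
      return null;
    }
    // Refuse to decode a payload whose type_url names a different message.
    if (!any.type_url.endsWith(expectedSuffix)) {
      throw new Error(`Unexpected type_url: ${any.type_url}`);
    }
    return Type.decode(any.value);
  };

  if (rawOperation.error) {
    throw new Error(`Operation failed: ${rawOperation.error.message}`);
  }
  return {
    name: rawOperation.name,
    done: Boolean(rawOperation.done),
    metadata: decodeAny(rawOperation.metadata, types.Metadata, '.LongRunningRecognizeMetadata'),
    // The response is only present once the operation has finished.
    response: rawOperation.done
      ? decodeAny(rawOperation.response, types.Response, '.LongRunningRecognizeResponse')
      : null,
  };
}
```

A call would then look like `decodeRawOperation(await speechClient.getOperation({name: operation_name}), {Metadata: LongRunningRecognizeMetadata, Response: LongRunningRecognizeResponse})`. In practice, `checkLongRunningRecognizeProgress()` performs this decoding internally, which makes it the better fit for a sample.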

I eventually found the correct way: the checkLongRunningRecognizeProgress() function, which returns the operation with its data already in a readable (decoded) form. This function is not used anywhere in the samples, and I would love to contribute sample code for it to this repository.
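As a sketch of this approach, a one-off status check for an operation started earlier (for example, one whose `operation.name` was saved by another process) could look like the following. It assumes `client` is a `SpeechClient` and that the operation resolved by `checkLongRunningRecognizeProgress()` exposes `done`, `error`, the decoded `metadata` (with `progressPercent`), and the decoded `result`; `describeOperation` is a hypothetical helper name:

```javascript
// One-off status check for an operation started earlier. Assumes `client` is a
// SpeechClient whose checkLongRunningRecognizeProgress(name) resolves to an
// operation with `done`, `error`, decoded `metadata`, and decoded `result`.
async function describeOperation(client, operationName) {
  const operation = await client.checkLongRunningRecognizeProgress(operationName);
  if (operation.error) {
    return `failed: ${operation.error.message}`;
  }
  if (!operation.done) {
    // progressPercent comes from LongRunningRecognizeMetadata and may be unset early.
    return `running: ${operation.metadata?.progressPercent ?? 0}%`;
  }
  // Join the top alternative of each result segment into one transcript.
  const transcript = (operation.result?.results ?? [])
    .map(result => result.alternatives?.[0]?.transcript ?? '')
    .join('\n');
  return `done:\n${transcript}`;
}
```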

Describe the solution you'd like
I would like to add a new sample that uses both the longRunningRecognize() and checkLongRunningRecognizeProgress() functions. The sample would loop, repeatedly checking whether the operation has completed and reporting its current state to the user, instead of halting until the operation finishes.
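A minimal sketch of the proposed sample, assuming `client` is a `SpeechClient` and `request` is a long-running recognition request whose audio lives in Cloud Storage. The `transcribeWithProgress` name, the `log` option, and the five-second interval are illustrative choices, not part of the library:

```javascript
// Sketch of the proposed sample: submit a long-running recognition, then poll
// checkLongRunningRecognizeProgress() and report progress instead of blocking
// on operation.promise(). `client` is assumed to be a SpeechClient.
const {setTimeout: sleep} = require('node:timers/promises');

async function transcribeWithProgress(client, request, {intervalMs = 5000, log = console.log} = {}) {
  // Submit the job; only the operation name is needed after this point.
  const [operation] = await client.longRunningRecognize(request);
  log(`Started operation ${operation.name}`);

  for (;;) {
    // Fetch the latest state; metadata and result arrive already decoded.
    const current = await client.checkLongRunningRecognizeProgress(operation.name);
    if (current.error) {
      throw new Error(`Recognition failed: ${current.error.message}`);
    }
    if (current.done) {
      // Keep the top alternative of each result segment.
      return (current.result?.results ?? [])
        .map(result => result.alternatives?.[0]?.transcript ?? '')
        .join('\n');
    }
    log(`Still running: ${current.metadata?.progressPercent ?? 0}% complete`);
    await sleep(intervalMs);
  }
}
```

For example, `transcribeWithProgress(speechClient, {audio: {uri: 'gs://BUCKET/FILE.flac'}, config: {encoding: 'FLAC', sampleRateHertz: 16000, languageCode: 'en-US'}})` would log progress every five seconds and resolve to the transcript; the bucket, file, and audio settings are placeholders. A fixed interval keeps the sample simple; a production version might add a timeout or backoff.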

Describe alternatives you've considered
An alternative would be to modify the existing long-running operation example so that it does not block on operation.promise().
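If the goal is only to avoid blocking on `operation.promise()`, the operation returned by `longRunningRecognize()` can also be consumed through events. This sketch assumes the `'progress'`, `'complete'`, and `'error'` events that google-gax long-running operations emit; `startRecognition` and the `onProgress`/`onComplete`/`onError` callbacks are hypothetical names:

```javascript
// Event-based alternative to `await operation.promise()`. Assumes the Operation
// returned by longRunningRecognize() is an EventEmitter emitting 'progress'
// (metadata, ...), 'complete' (result, ...), and 'error' (err).
async function startRecognition(client, request, {onProgress, onComplete, onError}) {
  const [operation] = await client.longRunningRecognize(request);
  operation
    .on('progress', metadata => onProgress(metadata?.progressPercent ?? 0))
    .on('complete', response => {
      // Join the top alternative of each result segment into one transcript.
      const transcript = (response?.results ?? [])
        .map(result => result.alternatives?.[0]?.transcript ?? '')
        .join('\n');
      onComplete(transcript);
    })
    .on('error', onError);
  // Return the name so the caller can store it and check the operation later.
  return operation.name;
}
```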

Context

Package version:

```
"@google-cloud/speech": "^6.3.0",
```

Node version:

```
node -v
v18.18.0
```

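Speech client initialization:

```javascript
const speech = require('@google-cloud/speech');

// speech_key: service account credentials object (client_email, private_key, ...)
const speechClient = new speech.SpeechClient({
  credentials: speech_key,
});
```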
sofisl added the type: docs Improvement to the documentation for an API. label Mar 15, 2024
sofisl self-assigned this Mar 15, 2024