This repository has been archived by the owner on Mar 7, 2024. It is now read-only.

Add Support for Google Cloud Speech-To-Text v2 in mod_google_transcribe #164

Open
wants to merge 16 commits into
base: main


entenschnabel
Contributor

This PR addresses #149 and adds support for v2 of the Speech-To-Text library while continuing to support v1. By default the module uses v1, and everything works exactly as it did in the previous version. To use v2, set the FreeSWITCH variable GOOGLE_SPEECH_CLOUD_SERVICES_VERSION to "v2". Setting it to "v1", or not setting it at all, results in the default behaviour.
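The selection rule described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the enum and the function name `select_speech_api_version` are hypothetical, and falling back to v1 for unrecognised values is an assumption, since the description only covers "v1", "v2" and unset.

```cpp
#include <string>

// Hypothetical type for this sketch: which Speech-To-Text API version to use.
enum class SpeechApiVersion { V1, V2 };

// Hypothetical helper, not taken from the PR.
// `value` is the raw value of GOOGLE_SPEECH_CLOUD_SERVICES_VERSION,
// or nullptr when the channel variable is not set.
inline SpeechApiVersion select_speech_api_version(const char* value) {
    if (value != nullptr && std::string(value) == "v2") {
        return SpeechApiVersion::V2;
    }
    // "v1" and unset both select v1, as described above.
    // Treating any other value as v1 is an assumption of this sketch.
    return SpeechApiVersion::V1;
}
```

In the module the value would presumably come from a channel variable lookup; here it is passed in directly so the rule can be read in isolation.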

When v2 is selected, you must provide a so-called recognizer parent path in the GOOGLE_SPEECH_RECOGNIZER_PARENT FreeSWITCH variable; otherwise construction of the GStreamer class fails. Recognizers allow commonly used streaming recognition parameters to be stored in the cloud. These stored values can be overridden with parameters passed at runtime, but every v2 streaming recognition call must name a recognizer. If you have already created a recognizer in your Google Cloud account, pass its id in the GOOGLE_SPEECH_RECOGNIZER_ID variable. If this is not set, mod_google_transcribe uses the so-called wildcard recognizer id (the "_" character), and a recognizer is created on the fly and not stored for future use. Note that even if you do not need a persistent recognizer, you must still provide the recognizer's parent id in GOOGLE_SPEECH_RECOGNIZER_PARENT, otherwise even the wildcard recognizer cannot be created. The parent id is a path string consisting of the Google Cloud project id that was used to create the Google credentials file, and a geographical location. For more details about recognizers, see https://cloud.google.com/speech-to-text/v2/docs/recognizers
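As a rough sketch of how the two variables combine into the recognizer name sent to the v2 API: the function name `build_recognizer_name` is hypothetical and this is not the code from the PR. It assumes the resource name layout documented by Google, `projects/{project}/locations/{location}/recognizers/{recognizer}`, and that an empty variable is treated the same as an unset one.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, not taken from the PR.
// `parent` is the value of GOOGLE_SPEECH_RECOGNIZER_PARENT, e.g.
// "projects/my-project/locations/global" (example value, an assumption).
// `recognizer_id` is the value of GOOGLE_SPEECH_RECOGNIZER_ID, or nullptr
// when that variable is not set.
inline std::string build_recognizer_name(const char* parent,
                                         const char* recognizer_id) {
    // Without a parent path even the wildcard recognizer cannot be used.
    if (parent == nullptr || *parent == '\0') {
        throw std::invalid_argument(
            "GOOGLE_SPEECH_RECOGNIZER_PARENT must be set when using v2");
    }
    // Fall back to the wildcard recognizer id "_" when no id is given.
    const std::string id =
        (recognizer_id != nullptr && *recognizer_id != '\0') ? recognizer_id
                                                             : "_";
    return std::string(parent) + "/recognizers/" + id;
}
```

Throwing on a missing parent mirrors the constructor failure mentioned above; the module itself may report the error differently.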

As long as GOOGLE_SPEECH_CLOUD_SERVICES_VERSION is set to "v2" and GOOGLE_SPEECH_RECOGNIZER_PARENT is set to a valid recognizer parent id, the v2 library is used and calls to uuid_google_transcribe should work as they did previously. Any configuration parameters provided at runtime override whatever is already defined in a predefined recognizer.

Differences between v1 and v2

There are sure to be many more differences, but these are the main ones I have found so far.

Some Notes on the Code and Building

To avoid code duplication we placed the v1-specific code in google_glue_v1.cpp and the v2-specific code in google_glue_v2.cpp. Generic code used by both libraries now resides in generic_google_glue.h. We use our own Docker image to build the drachtio modules, but our Makefile is based on this one:
https://github.com/drachtio/docker-drachtio-freeswitch-base/blob/main/files/Makefile.am.extra
In order to compile and link the v2 stuff we had to add the following lines to the nodist_libfreeswitch_libgoogleapis_la_SOURCES assignment:

libs/googleapis/gens/google/api/policy.pb.cc \
libs/googleapis/gens/google/cloud/speech/v1/resource.pb.cc \
libs/googleapis/gens/google/cloud/speech/v1/resource.grpc.pb.cc \
libs/googleapis/gens/google/cloud/speech/v2/cloud_speech.pb.cc \
libs/googleapis/gens/google/cloud/speech/v2/cloud_speech.grpc.pb.cc \

If you don't do this, you will most likely run into linker errors.

That's all I can think of for now. It would be really great if you also find this useful and we manage to get it merged. I am of course available for questions.
