# An A-Z of Google Cloud Speech-to-Text

Welcome to my quick guide to audio file transcription. If you are a beginner in the Python or cloud world, you probably know how frustrating it can be to find the information you need to build your application in Python. I have written this with you in mind, and I hope you find it useful for your first speech-to-text project.

<p> To get started you will need the following:<br>
1) Jupyter Notebook<br>
2) <a href="https://cloud.google.com/" target="_blank">A Google Cloud Account</a> - If you are new to this, start with the free account. After creating the account, go to your <a href="https://cloud.google.com/speech-to-text/docs/quickstart-client-libraries?hl=en_US" target="_blank">Google Cloud Console</a>, set up a project, enable the Speech-to-Text API, create a service-account key (a .json file you will need later) and install the client library.</p>

## About Audio Files
<p>
- Google Cloud's Speech-to-Text API accepts several audio encodings, including WAV (LINEAR16) and FLAC, but not .m4a. My .m4a files could not be processed until I converted them to .wav, so save your files as .wav if you can.<br>
- Audio files can be transcribed directly from a local folder or after uploading them into a <a href="https://console.cloud.google.com/storage" target="_blank">bucket</a> (you will need to create a bucket first).<br>
- I also could not transcribe long audio files from a local folder. This is most likely the API's limit on audio sent directly from a local file (about 10 MB per request) rather than anything specific to the free account, so for long recordings you are better off uploading to a bucket. There is no harm in trying a local file first, though, so I have included programs for both scenarios.</p>
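Because the sample rate in your request has to agree with the WAV file itself, it helps to read the file's header before transcribing. Below is a minimal sketch using Python's built-in `wave` module; `describe_wav` is a hypothetical helper, not part of the Google client library. It only reads uncompressed PCM WAV files (other encodings raise `wave.Error`), and 16 bits per sample corresponds to the `LINEAR16` encoding used in the programs below.

```python
import wave


def describe_wav(path):
    """Read the WAV header values that matter for a Speech-to-Text request."""
    with wave.open(path, "rb") as wav:
        frame_rate = wav.getframerate()    # samples per second (Hz)
        channels = wav.getnchannels()      # 1 = mono, 2 = stereo
        sample_width = wav.getsampwidth()  # bytes per sample; 2 means 16-bit
        n_frames = wav.getnframes()        # number of samples per channel
    return {
        "sample_rate_hertz": frame_rate,
        "channels": channels,
        "bits_per_sample": sample_width * 8,
        "duration_seconds": n_frames / frame_rate,
    }


# Example (assumes the file is in the current working directory):
# describe_wav("29_Aug_pt1.wav")
```

Pass the reported sample rate as `sample_rate_hertz`, or leave that field out for WAV files and let the API read it from the header.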

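```python
# (1) Import all the modules you will need.
# These imports target google-cloud-speech 1.x (e.g. pip install "google-cloud-speech<2");
# the `enums` module was removed in version 2.0. Only one `enums` import is needed:
# the enum values are the same in v1 and v1p1beta1.
from google.cloud import speech_v1p1beta1
from google.cloud import speech_v1
from google.cloud.speech_v1 import enums
import io
import os
```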
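The `enums` module used here was removed in version 2.0 of the `google-cloud-speech` package, so these imports only work with a 1.x release. If you get an `ImportError`, this sketch (standard library only; `speech_library_version` is a hypothetical helper name) shows which version is installed.

```python
from importlib.metadata import PackageNotFoundError, version


def speech_library_version(package="google-cloud-speech"):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


installed = speech_library_version()
if installed is None:
    print("google-cloud-speech is not installed")
elif int(installed.split(".")[0]) >= 2:
    # 2.x: the old `enums` import fails.
    print(f"Version {installed}: `from google.cloud.speech_v1 import enums` will fail")
else:
    print(f"Version {installed}: the 1.x imports in this notebook should work")
```

With 2.x you can either pin an older release (`pip install "google-cloud-speech<2"`) or switch to `from google.cloud import speech`, use `speech.RecognitionConfig.AudioEncoding.LINEAR16`, and pass `config=` and `audio=` as keyword arguments.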

```python
# (2) Check your current working directory path and verify that your project .json file is there.
os.getcwd()
```

```
'/Users/ayandadube/Dropbox/Python Real World Applications/Class of 2000 Legacy Meetings'
```

```python
ls
```

```
29_Aug_pt1.wav
29_Aug_pt2.wav
29_August_2020_Zoom_Call_Pt1.ipynb
An A-Z of Google Cloud Speech-to-Text.ipynb
Aug 29 Zoom Call/
My First Project-04d24feaec4b.json
```
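If the listing is long, you can let Python pick out the key file for you. Service-account key files downloaded from the console are JSON with a `"type": "service_account"` field, so this sketch (hypothetical helper `find_service_account_keys`, standard library only) lists the `.json` files in a folder that look like keys.

```python
import json
from pathlib import Path


def find_service_account_keys(directory="."):
    """Return names of .json files in `directory` that look like service-account keys."""
    keys = []
    for path in sorted(Path(directory).glob("*.json")):
        try:
            data = json.loads(path.read_text())
        except (OSError, ValueError):
            continue  # unreadable or not valid JSON: skip it
        # Key files from the Cloud Console contain "type": "service_account".
        if isinstance(data, dict) and data.get("type") == "service_account":
            keys.append(path.name)
    return keys


print(find_service_account_keys())
```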


## Setting up your GOOGLE_APPLICATION_CREDENTIALS environment variable in Jupyter

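```python
# Replace the placeholder with the full path to your own key file, for example:
# '/Users/ayandadube/Dropbox/Python Real World Applications/Class of 2000 Legacy Meetings/My First Project-04d24feaec4b.json'
# Run this before creating a SpeechClient.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your-key-file.json"
```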
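A typo in this path (such as a stray space) only shows up later, when the client fails to find credentials. This minimal sketch checks the file first; `set_google_credentials` is a hypothetical helper. Call it before creating a `SpeechClient`, since the client looks up its credentials when it is constructed.

```python
import os


def set_google_credentials(key_path):
    """Point GOOGLE_APPLICATION_CREDENTIALS at a key file, failing early if it is missing."""
    # Expand "~" and make the path absolute so it still works if the notebook changes directory.
    key_path = os.path.abspath(os.path.expanduser(key_path))
    if not os.path.isfile(key_path):
        raise FileNotFoundError(f"No key file found at {key_path}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_path
    return key_path


# Example, run from the folder that contains the key:
# set_google_credentials("My First Project-04d24feaec4b.json")
```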

# Transcribing files
Google Cloud provides <a href="https://cloud.google.com/speech-to-text/docs/async-recognize?hl=en_US" target="_blank">sample code</a> in many languages, including Python, C#, Java and Node.js. Below I have set up programs for transcribing both local files and Google Cloud Storage files. Personally, I prefer transcribing from Google Cloud Storage, which in my experience is less error-prone.
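The two programs below differ mainly in the `audio` argument: a local file is sent as `{"content": ...}` bytes, while a Cloud Storage file is sent as `{"uri": "gs://..."}`. This sketch picks the right form from the path. `make_audio_source` is a hypothetical helper, and the 10 MB inline limit is an assumption based on Google's published quotas, so check the current limits page before relying on it.

```python
import os

# Assumption: audio sent inline as "content" is capped at about 10 MB per request.
MAX_INLINE_BYTES = 10_000_000


def make_audio_source(source, max_inline_bytes=MAX_INLINE_BYTES):
    """Return the `audio` dict for long_running_recognize from a gs:// URI or a local path."""
    if source.startswith("gs://"):
        # Cloud Storage: only the URI is sent; the API reads the file from the bucket.
        return {"uri": source}
    size = os.path.getsize(source)
    if size > max_inline_bytes:
        raise ValueError(
            f"{source} is {size} bytes; upload it to a bucket and pass its gs:// URI"
        )
    with open(source, "rb") as f:
        # Local file: the raw bytes travel inside the request itself.
        return {"content": f.read()}
```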

## Transcribing long audio files using a local file

```python
def speech_recognize(audio_file):
    """
    Transcribe a local audio file using asynchronous (long-running) speech recognition.

    Args:
      audio_file: Path to the local audio file. I recommend defining it outside the
        function, since you are likely to transcribe several different files, e.g.
        audio_file = "/Users/ayandadube/Dropbox/Python Real World Applications/Class of 2000 Legacy Meetings/29_Aug_pt1.wav"
    """

    client = speech_v1p1beta1.SpeechClient()

    # The language of the supplied audio
    language_code = "en-US"

    # Sample rate in Hertz of the audio data sent. It must match the rate in the
    # WAV header; for WAV and FLAC files you can omit it and let the API read it.
    sample_rate_hertz = 44100

    # Encoding of the audio data sent. This sample sets it explicitly, although
    # the field is optional for FLAC and WAV audio formats.
    encoding = enums.RecognitionConfig.AudioEncoding.LINEAR16
    config = {
        "language_code": language_code,
        "sample_rate_hertz": sample_rate_hertz,
        "encoding": encoding,
    }

    # Read the file as raw bytes; they are sent inline with the request.
    with io.open(audio_file, "rb") as f:
        content = f.read()

    audio = {"content": content}

    # Start an asynchronous job, then block until it finishes.
    operation = client.long_running_recognize(config, audio)

    print("Waiting for operation to complete...")
    response = operation.result()

    for result in response.results:
        # The first alternative is the most probable result.
        alternative = result.alternatives[0]
        print("Transcript: {}".format(alternative.transcript))
```


```python
audio_file = "/Users/ayandadube/Dropbox/Python Real World Applications/Class of 2000 Legacy Meetings/29_Aug_pt1.wav"
speech_recognize(audio_file)
```
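The function above only prints each piece of the transcript. If you would rather keep the whole text, for example to save it to a file, one option is to add `return response` at the end of `speech_recognize` and join the results. Here is a minimal sketch; `join_transcripts` is a hypothetical helper, and the stand-in objects only mimic the shape of `response.results` (each result has a list of `alternatives`, each with a `transcript`).

```python
from types import SimpleNamespace


def join_transcripts(results, separator="\n"):
    """Join the most probable transcript of each result into a single string."""
    return separator.join(
        result.alternatives[0].transcript.strip()
        for result in results
        if result.alternatives  # skip results that came back without alternatives
    )


# Stand-ins shaped like response.results, so the example runs without calling the API.
fake_results = [
    SimpleNamespace(alternatives=[SimpleNamespace(transcript=" hello everyone ")]),
    SimpleNamespace(alternatives=[]),
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="welcome to the call")]),
]
print(join_transcripts(fake_results))  # prints "hello everyone" and "welcome to the call" on two lines

# With a real response (hypothetical output file name):
# with open("29_Aug_pt1_transcript.txt", "w") as f:
#     f.write(join_transcripts(response.results))
```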

## Transcribing long audio files using Google Cloud Storage file

```python
def speech_recognize(storage_uri):
    """
    Transcribe a long audio file from Cloud Storage using asynchronous speech
    recognition.

    Args:
      storage_uri: URI of the audio file in Cloud Storage, e.g. gs://[BUCKET]/[FILE]
    """

    client = speech_v1.SpeechClient()

    # storage_uri = 'gs://cloud-samples-data/speech/brooklyn_bridge.raw'

    # Sample rate in Hertz of the audio data sent. It must match the rate in the
    # WAV header; if you get a sample-rate error, check the file or remove this
    # field, since it is optional for WAV and FLAC files.
    sample_rate_hertz = 32000

    # The language of the supplied audio
    language_code = "en-US"

    # Encoding of the audio data sent. This sample sets it explicitly, although
    # the field is optional for FLAC and WAV audio formats.
    encoding = enums.RecognitionConfig.AudioEncoding.LINEAR16
    config = {
        "sample_rate_hertz": sample_rate_hertz,
        "language_code": language_code,
        "encoding": encoding,
    }
    # Only the URI is sent; the API reads the audio directly from the bucket.
    audio = {"uri": storage_uri}

    # Start an asynchronous job, then block until it finishes.
    operation = client.long_running_recognize(config, audio)

    print("Waiting for operation to complete...")
    response = operation.result()

    for result in response.results:
        # The first alternative is the most probable result.
        alternative = result.alternatives[0]
        print("Transcript: {}".format(alternative.transcript))
```
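Before you can call this function, the audio has to be in your bucket. You can upload it through the console, or from the notebook with the separate `google-cloud-storage` package. Here is a minimal sketch; `upload_to_bucket` and `gcs_uri` are hypothetical helpers, and the key in `GOOGLE_APPLICATION_CREDENTIALS` must have permission to write to the bucket.

```python
import os


def gcs_uri(bucket_name, blob_name):
    """Build the gs://[BUCKET]/[FILE] URI that the Speech-to-Text API expects."""
    return f"gs://{bucket_name}/{blob_name}"


def upload_to_bucket(local_path, bucket_name, blob_name=None):
    """Upload a local file to Cloud Storage and return its gs:// URI."""
    # Imported here so this cell still runs without google-cloud-storage installed
    # until you actually call the function.
    from google.cloud import storage

    # Default to keeping the local file name as the object name in the bucket.
    blob_name = blob_name or os.path.basename(local_path)
    client = storage.Client()  # authenticates with GOOGLE_APPLICATION_CREDENTIALS
    bucket = client.bucket(bucket_name)  # a local handle; no request is made yet
    bucket.blob(blob_name).upload_from_filename(local_path)
    return gcs_uri(bucket_name, blob_name)


# Example (needs network access and write permission on the bucket):
# storage_uri = upload_to_bucket("29_Aug_pt1.wav", "op-2000-legacy")
```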

```python
storage_uri = 'gs://op-2000-legacy/29_Aug_pt1.wav'
speech_recognize(storage_uri)
```