## Google Speech-to-Text API

In this section we use the Google Speech-to-Text API to transcribe every birthday song in our dataset. The path to the API key file has been removed for confidentiality.

In [1]:
# Imports
import io
import os
from google.cloud import speech
from pydub import AudioSegment
import scipy.io.wavfile
import pandas as pd
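# Path to the service-account key file removed for confidentiality; replace the placeholder with your own path
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/keys.json"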

In [2]:
# Function to check if connected to google api
def implicit():
    from google.cloud import storage

    # If you don't specify credentials when constructing the client, the
    # client library will look for credentials in the environment.
    storage_client = storage.Client()

    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)

In [3]:
# Function call
implicit()

[]


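The request returned an empty list without raising an error, so authentication succeeded; the project simply has no Cloud Storage buckets. This confirms the credentials work, but it does not check that the Speech-to-Text API itself is enabled for the project.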
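Listing buckets requires network access and a live project. As a lighter local pre-check, here is a minimal sketch that only confirms `GOOGLE_APPLICATION_CREDENTIALS` is set and points to an existing file before any API call is made. The helper name `credentials_file_configured` is hypothetical and is not part of the Google client libraries.

```python
import os

def credentials_file_configured(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return True if env_var is set and points to an existing file.

    Hypothetical helper: it only inspects the local environment and
    never contacts Google Cloud, so it cannot tell whether the key is valid.
    """
    path = os.environ.get(env_var)
    # An unset or empty variable, a missing path, or a directory all yield False
    return bool(path) and os.path.isfile(path)

print(credentials_file_configured())
```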

In [98]:
# Collect all file names in directory
onlyfiles = [f for f in os.listdir('bdaysongs/output/') if os.path.isfile(os.path.join('bdaysongs/output/', f))]

In [100]:
# Check how many files in directory
len(onlyfiles)

107

In [101]:
# View sampling frequency and file name
for file_name in onlyfiles:
    fs, _ = scipy.io.wavfile.read(os.path.join('bdaysongs/output/', file_name))
    print(fs, file_name)

44100 Abby_and_Chris_-_Happy_Birthday_Now_vocals.wav
44100 Ali_Sternburg_-_So_Glad_Youre_On_This_Earth_vocals.wav
44100 Ali_Sternburg_-_Your_Birthday_vocals.wav
44100 Allen_Riley_and_Fran_Agnone_-_Let_The_Truth_Be_Told_Individual_Version_vocals.wav
44100 arif_usmani_-_hap_hap_happy_birthday_vocals.wav
44100 Armin_Rdiger_Vieweg_-_Happy_Birthday_To_You_Alternative_vocals.wav
44100 Aron_Blue_-_Happy_Birthday_vocals.wav
44100 A_JonesOBF_-_Born_on_this_Day_vocals.wav
44100 B2B_-_Its_a_Very_Happy_Birthday_to_You_3_variations_for_different_occasions_vocals.wav
44100 Backyard_Music_-_A_Bright_Happy_Birthday_vocals.wav
44100 Backyard_Music_-_Badumm_Badumm_Happy_Birthday_to_ya_vocals.wav
44100 Backyard_Music_-_Birthday_blues_vocals.wav
44100 Backyard_Music_-_Blow_out_the_candles_vocals.wav
44100 Backyard_Music_-_Happy_Birthday_is_What_Well_Say_vocals.wav
44100 Backyard_Music_-_Happy_Birthday_to_ya_vocals.wav
44100 Backyard_Music_-_Happy_Birthday_wash_your_blues_away_vocals.wav
44100 Backyard_Mus
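The transcription config below hardcodes `sample_rate_hertz=44100` and `LINEAR16` encoding, which is 16-bit PCM, so sample width and channel count matter as well as the sample rate. A minimal sketch using the standard-library `wave` module (swapped in for `scipy` purely for illustration) reads all three from the WAV header. The helper `wav_format` is hypothetical, and the in-memory WAV stands in for a real song file. Note that `wave` only reads integer PCM, so a float-encoded WAV would raise an error here.

```python
import io
import wave

def wav_format(path_or_file):
    """Return (sample_rate_hz, n_channels, sample_width_bytes) from a WAV header."""
    with wave.open(path_or_file, "rb") as wf:
        return wf.getframerate(), wf.getnchannels(), wf.getsampwidth()

# Hypothetical stand-in: one second of silent 16-bit stereo audio at 44.1 kHz, built in memory
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(2)
    wf.setsampwidth(2)       # 2 bytes per sample = 16-bit, which LINEAR16 expects
    wf.setframerate(44100)
    wf.writeframes(b"\x00\x00" * 2 * 44100)  # 44100 frames x 2 channels x 2 bytes
buf.seek(0)

print(wav_format(buf))  # (44100, 2, 2)
```

A file whose tuple differs from `(44100, 1, 2)` after the mono conversion would not match the request config.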

In [131]:
# Function to call the Google Speech-to-Text API. Returns a list of
# [confidence, word, start_time_s, end_time_s] rows, one per recognized word.
# Note: confidence is reported per transcript segment, not per word.
def transcribe_gcs_with_word_time_offsets(file_name):
    """Transcribe a local audio file with a long-running (asynchronous) request
    and return the word time offsets."""
    client = speech.SpeechClient()

    # Read the whole file and send it inline with the request
    with io.open(file_name, "rb") as audio_file:
        content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=44100,  # every file in the dataset is sampled at 44.1 kHz
        language_code="en-US",
        enable_word_time_offsets=True,
        model="video",
    )

    operation = client.long_running_recognize(
        request={"config": config, "audio": audio}
    )

    print("Waiting for operation to complete...")
    response = operation.result(timeout=90)

    transcription_list = []

    for result in response.results:
        # Keep only the most likely alternative for each segment
        alternative = result.alternatives[0]
        print("Transcript: {}".format(alternative.transcript))
        print("Confidence: {}".format(alternative.confidence))
        print("")

        for word_info in alternative.words:
            # start_time and end_time are datetime.timedelta objects
            transcription_list.append([
                alternative.confidence,
                word_info.word,
                word_info.start_time.total_seconds(),
                word_info.end_time.total_seconds(),
            ])

    return transcription_list
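The function keeps only `alternatives[0]` of each result and repeats that alternative's confidence for every word in it. To show the shape of the returned rows without calling the API, the sketch below runs the same flattening logic on hypothetical stand-in objects built with `SimpleNamespace`; they mimic only the fields the function reads and are not real API output. The `flatten_words` name is also hypothetical. Results with an empty transcript and confidence 0.0 contribute no rows.

```python
from datetime import timedelta
from types import SimpleNamespace as NS

def flatten_words(response):
    """Flatten a recognition response into [confidence, word, start_s, end_s] rows."""
    rows = []
    for result in response.results:
        if not result.alternatives:  # extra guard; the notebook function assumes one exists
            continue
        best = result.alternatives[0]
        for w in best.words:
            rows.append([best.confidence, w.word,
                         w.start_time.total_seconds(), w.end_time.total_seconds()])
    return rows

# Hypothetical stand-ins shaped like results -> alternatives -> words
fake_response = NS(results=[
    NS(alternatives=[NS(transcript="happy birthday", confidence=0.9, words=[
        NS(word="happy", start_time=timedelta(seconds=0.5), end_time=timedelta(seconds=0.9)),
        NS(word="birthday", start_time=timedelta(seconds=0.9), end_time=timedelta(seconds=1.6)),
    ])]),
    NS(alternatives=[NS(transcript="", confidence=0.0, words=[])]),
])

print(flatten_words(fake_response))
# [[0.9, 'happy', 0.5, 0.9], [0.9, 'birthday', 0.9, 1.6]]
```

If a genuinely per-word confidence is wanted, `RecognitionConfig` has an `enable_word_confidence` option, after which each word entry carries its own `confidence` field.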


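### Using the `video` model
We use the `video` model because we found it to be the most accurate model for these vocal tracks. The per-word start and end times we need come from `enable_word_time_offsets=True`, which is a separate setting from the model choice.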


In [138]:
# Iterate over each file in `onlyfiles`: convert the audio to mono, then call the
# Google Speech-to-Text API. Results are re-saved to CSV after every file so that
# progress is kept if the loop is interrupted.
os.makedirs('bdaysongs/output2/', exist_ok=True)
transcription_list = []
for song in onlyfiles:
    # Input and output paths for the stereo-to-mono conversion
    orig_song = os.path.join('bdaysongs/output/', song)
    new_song = os.path.join('bdaysongs/output2/', song)

    # Convert stereo to mono; the recognition config assumes a single channel
    sound = AudioSegment.from_wav(orig_song)
    sound = sound.set_channels(1)
    sound.export(new_song, format="wav")

    try:
        # Google Speech-to-Text API call
        word_time_list = transcribe_gcs_with_word_time_offsets(new_song)
        transcription_list.append([song, word_time_list])
        pd.DataFrame(transcription_list, columns=['filename', 'wordtime']).to_csv('datasets/transcription.csv')
    except Exception as e:
        print(f"Was not able to process {song}: {e}")

Waiting for operation to complete...
Transcript: happy birthday to you Kevin happy birthday to you Kevin happy birthday to you Kevin happy birthday now happy birthday to you Samantha happy birthday to you happy birthday to you happy birthday now
Confidence: 0.912838339805603

Waiting for operation to complete...
Transcript: you yeah you I'm so glad you're on this Earth hey you yeah you it's time to celebrate your birthday you yeah you I'm so glad you're on this Earth hey you yeah you it's time to celebrate your birthday
Confidence: 0.9021165370941162

Waiting for operation to complete...
Transcript: it's time to celebrate your birthday yet there is nowhere else on this Earth hey hey I'd rather be than with you next to me on your birthday it's your date and it's time to celebrate your birthday it is here it only happens once a year happy birthday
Confidence: 0.8672227263450623

Waiting for operation to complete...
Transcript: 
Confidence: 0.0

Transcript: let the truth be told you're 50

Waiting for operation to complete...
Transcript: happy birthday Cody we'd like to share some cheer happy birthday Cody on the state of the year it's your birthday Cody shot a great hooray it's your birthday Cody your life today hey
Confidence: 0.8861423134803772

Waiting for operation to complete...
Transcript: happy birthday and hey it's your day it's time to just say happy birthday Dan or Uncle Fred or fluffy whatever have a good day bye
Confidence: 0.9128384590148926

Waiting for operation to complete...
Transcript: happy happy birthday happy happy birthday happy birthday Thomas happy happy birthday
Confidence: 0.9128386378288269

Waiting for operation to complete...
Transcript: happy birthday happy birthday it's your day it's your day
Confidence: 0.8173288106918335

Waiting for operation to complete...
Transcript: one two one two three four birthday birthday birthday birthday what a surprise what a surprise are you older and younger than me but I didn't know
Confidence: 0.797141194

Waiting for operation to complete...
Transcript: 
Confidence: 0.0

Transcript: but they must be merry birthday birthday cake but they pose a happy today's your special day
Confidence: 0.7003616690635681

Transcript:  birthday birthday Mary birthday birthday birthday birthday happy today is daisies day
Confidence: 0.8632793426513672

Transcript: 
Confidence: 0.0

Waiting for operation to complete...
Transcript: 
Confidence: 0.0

Transcript: it's your birthday yes it is it your birthday yes it is it your birthday so happy birthday to you yes you it's your birthday yes it is it your birthday yes it is introverted so happy birthday to you used to you
Confidence: 0.9086448550224304

Waiting for operation to complete...
Transcript: happy birthday thank you day be fun it's your birthday happy birthday I hope you have a good one
Confidence: 0.8201473355293274

Waiting for operation to complete...
Transcript: happy birthday and many cheers you are today this many years
Confidence: 0.84631419181

Waiting for operation to complete...
Transcript: give me chance for whoever it's your special day give me a chance for whoever is we celebrate your birthday happy birthday come on everyone let's give three cheers for whoever hooray hooray hooray
Confidence: 0.8570423722267151

Waiting for operation to complete...
Transcript: 
Confidence: 0.0

Transcript: this is a song for someone special
Confidence: 0.8744868040084839

Transcript:  this is my song for you
Confidence: 0.8589123487472534

Transcript:  Beautiful memory of this special more years will Bessie
Confidence: 0.7882370352745056

Transcript:  happy birthday happy birthday happy birthday to you happy birthday happy birthday happy birthday to you
Confidence: 0.8256301283836365

Waiting for operation to complete...
Transcript: he birthday to you happy birthday to you we wish you all the best dude Jones Sally Barbra Jean whatever your name is so happy birthday to you
Confidence: 0.8727531433105469

Waiting for operation to complete.
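Because `to_csv` writes each `wordtime` list as its Python string representation, reading `datasets/transcription.csv` back yields strings rather than lists. Below is a minimal sketch of parsing them with `ast.literal_eval`, assuming the layout produced by the transcription loop (pandas' unnamed index column, then `filename` and `wordtime`). The CSV text and the helper name `load_transcriptions` are hypothetical stand-ins.

```python
import ast
import csv
import io

# Hypothetical stand-in for datasets/transcription.csv: pandas writes the index
# as an unnamed first column and each list as its repr string.
sample_csv = (
    ",filename,wordtime\n"
    "0,song_a_vocals.wav,\"[[0.91, 'happy', 0.0, 0.4], [0.91, 'birthday', 0.4, 1.1]]\"\n"
    "1,song_b_vocals.wav,[]\n"
)

def load_transcriptions(f):
    """Return {filename: [[confidence, word, start_s, end_s], ...]} from the saved CSV."""
    transcriptions = {}
    for row in csv.DictReader(f):
        # literal_eval safely turns the repr string back into nested lists
        transcriptions[row["filename"]] = ast.literal_eval(row["wordtime"])
    return transcriptions

data = load_transcriptions(io.StringIO(sample_csv))
print(data["song_a_vocals.wav"][1])  # [0.91, 'birthday', 0.4, 1.1]
print(data["song_b_vocals.wav"])     # []
```

With pandas, the equivalent is reading the file with `pd.read_csv` and applying `ast.literal_eval` to the `wordtime` column.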