# Speech-to-Text API in Collaboration with Google Cloud

A synchronous recognition request is the simplest way to run the Speech-to-Text API on speech audio. In a synchronous request, Speech-to-Text can process audio that is **up to 1 minute** long and **up to 10MB** in size, and the file can be uploaded directly from my own computer. After Speech-to-Text has processed and recognized all of the audio, it returns a response.

I am going to convert to text a recorded session (from TED Talks) between an interviewer and Bill Gates, co-founder of Microsoft. The recording is **longer** than 1 minute and **larger** than 10MB. For that reason I cannot upload it from my computer as a short recording; instead, I store it in a Google Cloud Storage bucket and pass the location (URI) of the recording to the API for the speech-to-text conversion.

A link to the recorded interview can be found [here](https://www.ted.com/talks/bill_gates_the_innovations_we_need_to_avoid_a_climate_disaster/transcript?language=en).
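
The choice between the two request types can be sketched as a small helper. This is only an illustration of the limits quoted above: the function name `choose_recognition_method` and the sample numbers are hypothetical, and treating 10MB as 10 × 1024 × 1024 bytes is an assumption.

```python
# Limits for a synchronous request, as quoted above (1 minute, 10MB).
SYNC_MAX_SECONDS = 60
SYNC_MAX_BYTES = 10 * 1024 * 1024  # assumption: 10MB read as 10 MiB


def choose_recognition_method(duration_seconds: float, size_bytes: int) -> str:
    """Return the name of the SpeechClient method that fits the audio."""
    if duration_seconds <= SYNC_MAX_SECONDS and size_bytes <= SYNC_MAX_BYTES:
        return "recognize"
    return "long_running_recognize"


# Hypothetical numbers: a short local clip and a long interview recording.
print(choose_recognition_method(25.4, 400_000))
print(choose_recognition_method(1800, 30_000_000))
```

Both conditions have to hold for the synchronous path, so a file that is short but very large (or small but long) still goes through the long-running request.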

In [None]:
# Installing the Google Cloud speech API service
!pip install --upgrade google-cloud-speech



In [None]:
# Importing the relevant libraries
import os
from google.cloud import speech

In [None]:
# Setting the environment variable to the path of my JSON key from Google Cloud Credentials,
# so that the client can authenticate and run the speech-to-text conversion.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'key.json'

In [None]:
speech_client = speech.SpeechClient()

First, I would like to test a speech-to-text conversion on a local file, meaning the file is smaller than 10MB and shorter than 1 minute. I recorded myself reading a random paragraph from a book and checked the transcript after the conversion finished.

Below are the steps taken in converting the speech to text using the Google Cloud API:


In [None]:
# Example 1. Transcribing the local media file
# File Size: < 10 MB; Length < 1 Minute

# Step 1. Loading the media file
media_file_name_mp3 = '/content/Dune Recording.mp3'

with open(media_file_name_mp3, 'rb') as f:
  byte_data_mp3 = f.read()

audio_mp3 = speech.RecognitionAudio(content=byte_data_mp3)

In [None]:
# Step 2. Configuring the recognition request for the media file
config_mp3 = speech.RecognitionConfig(
    sample_rate_hertz=48000,
    enable_automatic_punctuation=True,
    language_code='en-US'
)

In [None]:
# Step 3. Transcribing the RecognitionAudio object
response_standard_mp3 = speech_client.recognize(
    config=config_mp3,
    audio=audio_mp3
)

In [None]:
# Printing the transcription 
print(response_standard_mp3)

results {
  alternatives {
    transcript: "Not the blood sir, but all of a men\'s water polo team. Italy belongs to his people to his tribe is a necessary when you live near the great flats or Waters, freshest are in the human body is composed of some 70% Water by weight a dead man. Surely no longer requires that what?"
    confidence: 0.9383620619773865
  }
  result_end_time {
    seconds: 25
    nanos: 360000000
  }
  language_code: "en-us"
}
total_billed_time {
  seconds: 30
}



In [None]:
# Printing only the top transcript of each result
for result in response_standard_mp3.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))

Transcript: Not the blood sir, but all of a men's water polo team. Italy belongs to his people to his tribe is a necessary when you live near the great flats or Waters, freshest are in the human body is composed of some 70% Water by weight a dead man. Surely no longer requires that what?


As seen above, the transcription works 😊  
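
Since every alternative in the response carries a confidence value next to its transcript, the two can be collected together. The sketch below is a minimal illustration: the helper name `top_alternatives` is hypothetical, and the `SimpleNamespace` object is only a stand-in with the same attribute names as the response printed above, so the example runs without credentials.

```python
from types import SimpleNamespace


def top_alternatives(response):
    """Return (transcript, confidence) for the first alternative of each result."""
    pairs = []
    for result in response.results:
        if not result.alternatives:
            continue  # skip results that carry no alternatives
        best = result.alternatives[0]
        pairs.append((best.transcript, best.confidence))
    return pairs


# Stand-in for a real response object (hypothetical values).
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(
            alternatives=[SimpleNamespace(transcript="hello world", confidence=0.94)]
        ),
        SimpleNamespace(alternatives=[]),
    ]
)

for transcript, confidence in top_alternatives(fake_response):
    print(f"{confidence:.2f}  {transcript}")
```

With a real response, the same call would be `top_alternatives(response_standard_mp3)`.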

Now, I will proceed to convert the long recorded interview with Bill Gates:


In [None]:
# Example 2. Transcribing a long media file
# File Size: > 10 MB; Length > 1 Minute

# Location of the audio file that was uploaded to a Google Cloud Storage bucket
media_uri = 'gs://speech_to_text_bucket_new/BillGates_2021T_VO_Intro.mp3'
long_audio_mp3 = speech.RecognitionAudio(uri=media_uri)

config_mp3 = speech.RecognitionConfig(
    sample_rate_hertz=16000,
    language_code='en-US',
    audio_channel_count=2
)
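
Because the long-running request depends entirely on the `gs://` location being well formed, it can help to check the URI before sending it. The following is a minimal sketch; `split_gcs_uri` is a hypothetical helper, not part of the Google Cloud library, and it only checks the shape of the string, not whether the bucket or object exists.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/object' into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"URI must contain a bucket and an object name: {uri!r}")
    return bucket, object_name


bucket, object_name = split_gcs_uri(
    "gs://speech_to_text_bucket_new/BillGates_2021T_VO_Intro.mp3"
)
print(bucket)
print(object_name)
```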

In [None]:
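# Starting the asynchronous (long-running) recognition job
operation = speech_client.long_running_recognize(
    config=config_mp3,
    audio=long_audio_mp3
)

# Waiting for the job to finish and printing the full response
response = operation.result()
print(response)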

results {
  alternatives {
    transcript: "it said that\'s a lie I merely asked you on Today Show the philanthropist and Microsoft co-founder Bill Gates in conversation with Ted Global curator changes for the world to avoid climate disaster he talks or something called the green premium lays out Innovations we need to invest in and shares why younger Generations are the key to getting to net zero emissions and also have his love for burgers is changing the conversation is from March 2021 and part of countdown Ted Global initiative to xcelerate solutions to The Climate Crisis get involved at countdown head.com"
    confidence: 0.9528067111968994
  }
  result_end_time {
    seconds: 40
    nanos: 720000000
  }
  language_code: "en-us"
}
results {
  alternatives {
    transcript: "Bill Gates cousin self an imperfect Messenger on climate because of his high carbon footprint and the lifestyle however he is just made a major contribution to our thinking about confronting climate change a bo

In [None]:
# Printing the top transcript of each result and collecting the transcripts in a list
text = []
for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))
    text.append(result.alternatives[0].transcript)

Transcript: it said that's a lie I merely asked you on Today Show the philanthropist and Microsoft co-founder Bill Gates in conversation with Ted Global curator changes for the world to avoid climate disaster he talks or something called the green premium lays out Innovations we need to invest in and shares why younger Generations are the key to getting to net zero emissions and also have his love for burgers is changing the conversation is from March 2021 and part of countdown Ted Global initiative to xcelerate solutions to The Climate Crisis get involved at countdown head.com
Transcript: Bill Gates cousin self an imperfect Messenger on climate because of his high carbon footprint and the lifestyle however he is just made a major contribution to our thinking about confronting climate change a book A book about decarbonizing our economy and Society it's an optimistic can do kind of book with a strong focus on technological solutions he discusses the things we have such as wind and sola

In [None]:
# Viewing the text of the transcription. 
text

["it said that's a lie I merely asked you on Today Show the philanthropist and Microsoft co-founder Bill Gates in conversation with Ted Global curator changes for the world to avoid climate disaster he talks or something called the green premium lays out Innovations we need to invest in and shares why younger Generations are the key to getting to net zero emissions and also have his love for burgers is changing the conversation is from March 2021 and part of countdown Ted Global initiative to xcelerate solutions to The Climate Crisis get involved at countdown head.com",
 "Bill Gates cousin self an imperfect Messenger on climate because of his high carbon footprint and the lifestyle however he is just made a major contribution to our thinking about confronting climate change a book A book about decarbonizing our economy and Society it's an optimistic can do kind of book with a strong focus on technological solutions he discusses the things we have such as wind and solar power the things

Now, I will save the transcripts to a text file, convert that file into a csv file, and define the columns of the csv file.

In [None]:
# Saving it first as a text file, one transcript per line
with open('tedtalk_corpus.txt', 'w') as out:
  out.write('\n'.join(text))
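
One detail worth knowing when a list of strings is written to a file: `writelines` does not add line separators, so strings without a trailing newline end up on a single line. The sketch below shows the difference using in-memory buffers; the two segments are hypothetical stand-ins for the transcripts.

```python
import io

segments = ["first segment", "second segment"]  # hypothetical stand-ins

# writelines writes the strings back to back, with no separator.
joined = io.StringIO()
joined.writelines(segments)

# Joining with "\n" keeps one segment per line.
separated = io.StringIO()
separated.write("\n".join(segments))

run_together = joined.getvalue()
one_per_line = separated.getvalue().splitlines()
print(repr(run_together))
print(one_per_line)
```

This matters for the next step, because the csv conversion reads the text file line by line and creates one row per line.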

In [None]:
import csv

# Converting the text file into a csv file and defining the columns
with open('tedtalk_corpus.txt', 'r') as file:
    stripped = (line.strip() for line in file)
    lines = ([line] for line in stripped if line)  # one single-column row per line
    with open('tedtalk_corpus.csv', 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('Sentences', 'Sentence#'))
        writer.writerows(lines)

The final step is converting the csv file into a json file. In addition, I will delete the **‘Sentence#’** column, add a **‘Document#’** column in its place, and number the documents from the first to the last.

In [None]:
import pandas as pd

df = pd.read_csv('/content/tedtalk_corpus.csv')
del df['Sentence#']
df.insert(1, 'Document#', range(1, 1 + len(df)))
df.to_json('/content/tedtalk_corpus.json')
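
For comparison, the same kind of structure can be built directly from the list of transcripts with the standard library only. This is a sketch rather than the method used in this notebook: `corpus_as_columns` is a hypothetical helper, the sample strings are placeholders, and the column-oriented layout (column name, then row index as a string key) is my understanding of what pandas writes by default with `to_json`.

```python
import json


def corpus_as_columns(transcripts):
    """Build a column-oriented dict with 'Sentences' and 'Document#' columns."""
    sentences = {str(i): t for i, t in enumerate(transcripts)}
    document_ids = {str(i): i + 1 for i in range(len(transcripts))}
    return {"Sentences": sentences, "Document#": document_ids}


sample = ["first transcript", "second transcript"]  # placeholder transcripts
payload = json.dumps(corpus_as_columns(sample), ensure_ascii=False)
print(payload)
```

Skipping the intermediate text and csv files avoids any dependence on how line breaks and commas are handled along the way.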

Now I am ready to begin exploring the dataset in order to implement different topic modeling techniques and extract the hidden topics of this recorded interview.

This will take place in a separate notebook, for convenience and better organization of the code.
