# Homework 4
## QTM 250
Anna Connolly, Cahill Carusos, Caroline Lee, Emma Joseph, Forrest Martin, Jash Shah

## Introduction 

Services that turn speech into text are now almost everywhere, such as the dictation features on phones and smart TVs. Even so, it is often a struggle to get a device to type what you are saying, because it may mistake your words for something else. In this notebook, we therefore test Google Cloud's Speech-to-Text machine learning API to see how well it transcribes speech.

Data used in this project can be found in this [GitHub Repository](https://github.com/carolinelee78/HW4_Caprese).

## API Overview
### Google Cloud's Speech-to-Text

Speech-to-Text is Google Cloud's machine learning API for converting speech into text using Google's AI technologies. It can transcribe content in real time or from stored files, support voice commands for a better user experience, and help improve service based on customer interactions.

Benefits of the API include state-of-the-art accuracy, global reach, and flexible deployment. Its automatic speech recognition (ASR) is powered by Google's most advanced deep learning neural network algorithms. Recognition supports over 125 languages, and the service can be deployed in the cloud or on premises.

Key features include speech adaptation, which lets you customize recognition so that it transcribes domain-specific terms and rare words. Using classes, you can also automatically convert spoken numbers into addresses, years, and currencies.
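Speech adaptation is requested through the recognition config itself. A minimal sketch, assuming the v1 REST `RecognitionConfig` field `speechContexts` (a list of objects that each carry a `phrases` list); the helper name `with_phrase_hints` and the example phrase are hypothetical, and the function only builds the request dict without calling the API.

```python
import copy


def with_phrase_hints(config, phrases):
    """Return a copy of a v1 RecognitionConfig dict with phrase hints added.

    Hypothetical helper: it builds the request dict only; no API call is made.
    """
    new_config = copy.deepcopy(config)  # leave the caller's config untouched
    # speechContexts is a list of SpeechContext objects, each with a phrases list
    new_config.setdefault("speechContexts", []).append({"phrases": list(phrases)})
    return new_config


base = {"encoding": "FLAC", "sampleRateHertz": 48000, "languageCode": "is-IS"}
adapted = with_phrase_hints(base, ["til hamingju með afmælið"])
```

The adapted dict would be passed as the `config` part of the request body in place of the plain one.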

You can also choose from domain-specific models for voice control, phone calls, and video transcription, each optimized for the quality requirements of its domain.
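A domain-specific model is selected with the `model` field of the same config. A sketch, assuming the v1 model names listed in the code (such as `"phone_call"` and `"video"`); the helper `with_model` and its allow-list are hypothetical, and not every model is available for every language, so check the documentation for the language you use.

```python
def with_model(config, model):
    """Return a copy of a v1 RecognitionConfig dict that requests a specific model."""
    # Hypothetical guard: only the model names this sketch assumes exist in v1
    allowed = {"default", "command_and_search", "phone_call", "video",
               "latest_long", "latest_short"}
    if model not in allowed:
        raise ValueError(f"unexpected model name: {model!r}")
    # Shallow copy plus the extra key; the original dict is not modified
    return {**config, "model": model}


phone_config = with_model({"encoding": "FLAC", "languageCode": "ko-KR"}, "phone_call")
```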

Streaming speech recognition returns results in real time while the API processes audio streamed from your application's microphone.

Lastly, Speech-to-Text On-Prem can be deployed in your own private data centers to ensure full control and protection.

## How it Works
![hw 4 chart.png](https://raw.githubusercontent.com/carolinelee78/HW4_Caprese/main/DrawIo.png)

We downloaded sample recordings of speech in several languages from Omniglot and stored them in Cloud Storage. We then ran the audio files through the Google Speech-to-Text API to get the text output. These results were stored in BigQuery, and the confidence scores were compared to assess how confident the API was in its transcriptions for each language.
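The process above can be sketched as a loop over (phrase, language) pairs. This is a minimal sketch, not the notebook's own code: `run_pipeline` and the row layout are hypothetical, `transcribe` stands in for a function that sends one Cloud Storage URI to the API and returns `(transcript, confidence)`, and the BigQuery load step is left out.

```python
from typing import Callable, Dict, List, Tuple


def run_pipeline(
    jobs: Dict[Tuple[str, str], str],
    transcribe: Callable[[str, str], Tuple[str, float]],
) -> List[dict]:
    """Run every (phrase, languageCode) job and collect one row per recording.

    jobs maps (phrase, languageCode) to the gs:// URI of a stored recording;
    transcribe is a hypothetical callable returning (transcript, confidence).
    """
    rows = []
    for (phrase, language_code), uri in jobs.items():
        transcript, confidence = transcribe(uri, language_code)
        rows.append({
            "phrase": phrase,
            "languageCode": language_code,
            "uri": uri,
            "transcript": transcript,
            "confidence": confidence,
        })
    return rows


# Usage with a stand-in transcriber, so no network call is made
jobs = {("thanks", "ko-KR"): "gs://hw4-data-caprese/thanks1_kr.flac"}
rows = run_pipeline(jobs, lambda uri, code: ("감사합니다", 0.92365956))
```

Collecting flat rows like these keeps the later step simple: they map directly onto a table for BigQuery or a pandas DataFrame.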

# Deployment 


In [None]:
import pandas as pd
import numpy as np

The API key is entered at the prompt below with `getpass`, so it is not written into the notebook.

In [None]:
import getpass

APIKEY = getpass.getpass()

··········


In [None]:
# remove stray whitespace from the pasted key without echoing the key itself
APIKEY = APIKEY.strip()

We used sample audio files in different languages from [Omniglot Sound Files](https://omniglot.com/soundfiles/) to test conversion to text with the Speech-to-Text API.
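Each recording is paired with a BCP-47 language code in its request. A sketch of that pairing as data; the dict `LANGUAGES` and the helper `gcs_uri` are hypothetical, while the file suffixes and codes are the ones used in the requests in this notebook (Icelandic files use the suffix `ic`, but the language code is `is-IS`). The object names in the bucket are not fully uniform (for example `happybithday_sl.flac` and `happybirthday_kr.flac` next to `thanks1_kr.flac`), so an explicit list of URIs may be safer in practice.

```python
# Hypothetical lookup table: language name -> (file suffix in the bucket, languageCode)
LANGUAGES = {
    "Korean": ("kr", "ko-KR"),
    "Slovenian": ("sl", "sl-SI"),
    "Vietnamese": ("vi", "vi-VN"),
    "Icelandic": ("ic", "is-IS"),
    "Persian": ("fa", "fa-IR"),
}


def gcs_uri(bucket, stem, language):
    """Build gs://<bucket>/<stem>_<suffix>.flac for one language."""
    suffix, _language_code = LANGUAGES[language]
    return f"gs://{bucket}/{stem}_{suffix}.flac"


print(gcs_uri("hw4-data-caprese", "thanks1", "Icelandic"))
# gs://hw4-data-caprese/thanks1_ic.flac
```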

In [None]:
# build() creates a client for a Google API from its discovery document
from googleapiclient.discovery import build
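Every recognition cell below sends the same request shape and differs only in the file URI and language code. A sketch of factoring that into one function, assuming the `service` object returned by `build('speech', 'v1', developerKey=APIKEY)`; the name `recognize_gcs` is hypothetical.

```python
def recognize_gcs(service, uri, language_code, sample_rate_hertz=48000):
    """Send a synchronous v1 recognize request for a FLAC file in Cloud Storage.

    Returns the parsed JSON response as a dict.
    """
    body = {
        "config": {
            "encoding": "FLAC",
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        "audio": {"uri": uri},
    }
    # Same call chain the notebook uses: speech().recognize(body=...).execute()
    return service.speech().recognize(body=body).execute()
```

With this helper the service only needs to be built once, for example `service = build('speech', 'v1', developerKey=APIKEY)` followed by `recognize_gcs(service, 'gs://hw4-data-caprese/thanks1_kr.flac', 'ko-KR')`.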

Confidence levels of the Speech-to-Text conversions for three phrases in five languages were recorded and compared. The average confidence level for each phrase was also calculated. 
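Each response holds a `results` list whose entries carry ranked `alternatives`, each with a `transcript` and a `confidence`, so these fields can be read directly instead of slicing the printed string. A minimal sketch; `top_alternatives` is a hypothetical name, and it treats a response without a `results` key (which can happen when no speech is recognized) as an empty list.

```python
def top_alternatives(response):
    """Return (transcript, confidence) for the top alternative of each result.

    The API splits longer audio into sequential results, so there may be
    more than one pair; a response without 'results' yields an empty list.
    """
    pairs = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            best = alternatives[0]  # alternatives are ordered most likely first
            pairs.append((best.get("transcript", ""), best.get("confidence")))
    return pairs


sample_response = {'results': [{'alternatives': [{'transcript': '감사합니다', 'confidence': 0.92365956}]}]}
print(top_alternatives(sample_response))  # [('감사합니다', 0.92365956)]
```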

### Phrase 1: "Thanks"

**Korean**

In [None]:
# sample Korean audio file for "thanks"
sservice = build('speech', 'v1', developerKey=APIKEY)
kr_thanks_response = sservice.speech().recognize(
    body={
        'config': {
            'encoding': 'FLAC',
            'sampleRateHertz': 48000,
            "languageCode": "ko-KR",
        },
        'audio': {
            'uri': 'gs://hw4-data-caprese/thanks1_kr.flac'
            }
        }).execute()
print(kr_thanks_response)

{'results': [{'alternatives': [{'transcript': '감사합니다', 'confidence': 0.92365956}]}]}


In [None]:
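kr_thanks_values = kr_thanks_response['results']
# confidence of the top alternative in the first result (read directly, not sliced from a string)
kr_thanks_conf = kr_thanks_values[0]['alternatives'][0]['confidence']
kr_thanks_conf

0.92365956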

**Slovenian**

In [None]:
# sample Slovenian audio file for "thanks"
sservice = build('speech', 'v1', developerKey=APIKEY)
sl_thanks_response = sservice.speech().recognize(
    body={
        'config': {
            'encoding': 'FLAC',
            'sampleRateHertz': 48000,
            "languageCode": "sl-SI",
        },
        'audio': {
            'uri': 'gs://hw4-data-caprese/thanks1_sl.flac'
            }
        }).execute()
print(sl_thanks_response)

{'results': [{'alternatives': [{'transcript': 'hvala', 'confidence': 0.601508}]}]}


In [None]:
sl_thanks_values = sl_thanks_response['results']
# confidence of the top alternative in the first result
sl_thanks_conf = sl_thanks_values[0]['alternatives'][0]['confidence']
sl_thanks_conf

0.601508

**Vietnamese**

In [None]:
# sample Vietnamese audio file for "thanks"
sservice = build('speech', 'v1', developerKey=APIKEY)
vi_thanks_response = sservice.speech().recognize(
    body={
        'config': {
            'encoding': 'FLAC',
            'sampleRateHertz': 48000,
            "languageCode": "vi-VN",
        },
        'audio': {
            'uri': 'gs://hw4-data-caprese/thanks1_vi.flac'
            }
        }).execute()
print(vi_thanks_response)

{'results': [{'alternatives': [{'transcript': 'cảm ơn', 'confidence': 0.90891194}]}]}


In [None]:
vi_thanks_values = vi_thanks_response['results']
# confidence of the top alternative in the first result
vi_thanks_conf = vi_thanks_values[0]['alternatives'][0]['confidence']
vi_thanks_conf

0.90891194

**Icelandic**

In [None]:
# sample Icelandic audio file for "thanks"
sservice = build('speech', 'v1', developerKey=APIKEY)
ic_thanks_response = sservice.speech().recognize(
    body={
        'config': {
            'encoding': 'FLAC',
            'sampleRateHertz': 48000,
            "languageCode": "is-IS",
        },
        'audio': {
            'uri': 'gs://hw4-data-caprese/thanks1_ic.flac'
            }
        }).execute()
print(ic_thanks_response)

{'results': [{'alternatives': [{'transcript': 'takk', 'confidence': 0.97623503}]}]}


In [None]:
ic_thanks_values = ic_thanks_response['results']
# confidence of the top alternative in the first result
ic_thanks_conf = ic_thanks_values[0]['alternatives'][0]['confidence']
ic_thanks_conf

0.97623503

**Persian**

In [None]:
# sample Persian audio file for "thanks"
sservice = build('speech', 'v1', developerKey=APIKEY)
fa_thanks_response = sservice.speech().recognize(
    body={
        'config': {
            'encoding': 'FLAC',
            'sampleRateHertz': 48000,
            "languageCode": "fa-IR",
        },
        'audio': {
            'uri': 'gs://hw4-data-caprese/thanks1_fa.flac'
            }
        }).execute()
print(fa_thanks_response)

{'results': [{'alternatives': [{'transcript': 'ممنونم', 'confidence': 0.9275573}]}]}


In [None]:
fa_thanks_values = fa_thanks_response['results']
# confidence of the top alternative in the first result
fa_thanks_conf = fa_thanks_values[0]['alternatives'][0]['confidence']
fa_thanks_conf

0.9275573

**Summary**

In [None]:
thanks_data = {'language': ['Korean', 'Slovenian', 'Vietnamese', 'Icelandic', 'Persian'],
                'thanks_conflevel': [kr_thanks_conf, sl_thanks_conf, vi_thanks_conf, ic_thanks_conf, fa_thanks_conf]}
df_thanks = pd.DataFrame(thanks_data, columns = ['language', 'thanks_conflevel'])
df_thanks

|   | language | thanks_conflevel |
|---|---|---|
| 0 | Korean | 0.92365956 |
| 1 | Slovenian | 0.601508 |
| 2 | Vietnamese | 0.90891194 |
| 3 | Icelandic | 0.97623503 |
| 4 | Persian | 0.9275573 |


### Phrase 2: "How are you?"

**Korean**

In [None]:
# sample Korean audio file for "how are you"
sservice = build('speech', 'v1', developerKey=APIKEY)
kr_howareyou_response = sservice.speech().recognize(
    body={
        'config': {
            'encoding': 'FLAC',
            'sampleRateHertz': 48000,
            "languageCode": "ko-KR",
        },
        'audio': {
            'uri': 'gs://hw4-data-caprese/howareyou1_kr.flac'
            }
        }).execute()
print(kr_howareyou_response)

{'results': [{'alternatives': [{'transcript': '어떻게 지내세요', 'confidence': 0.92365956}]}]}


In [None]:
kr_howareyou_values = kr_howareyou_response['results']
# confidence of the top alternative in the first result
kr_howareyou_conf = kr_howareyou_values[0]['alternatives'][0]['confidence']
kr_howareyou_conf

0.92365956

**Slovenian**

In [None]:
# sample Slovenian audio file for "how are you"
sservice = build('speech', 'v1', developerKey=APIKEY)
sl_howareyou_response = sservice.speech().recognize(
    body={
        'config': {
            'encoding': 'FLAC',
            'sampleRateHertz': 48000,
            "languageCode": "sl-SI",
        },
        'audio': {
            'uri': 'gs://hw4-data-caprese/howareyou1_sl.flac'
            }
        }).execute()
print(sl_howareyou_response)

{'results': [{'alternatives': [{'transcript': 'kako se imate', 'confidence': 0.9583776}]}]}


In [None]:
sl_howareyou_values = sl_howareyou_response['results']
# confidence of the top alternative in the first result
sl_howareyou_conf = sl_howareyou_values[0]['alternatives'][0]['confidence']
sl_howareyou_conf

0.9583776

**Vietnamese**

In [None]:
# sample Vietnamese audio file for "how are you"
sservice = build('speech', 'v1', developerKey=APIKEY)
vi_howareyou_response = sservice.speech().recognize(
    body={
        'config': {
            'encoding': 'FLAC',
            'sampleRateHertz': 48000,
            "languageCode": "vi-VN",
        },
        'audio': {
            'uri': 'gs://hw4-data-caprese/howareyou1_vi.flac'
            }
        }).execute()
print(vi_howareyou_response)

{'results': [{'alternatives': [{'transcript': 'ăn quả', 'confidence': 0.8342612}]}]}


In [None]:
vi_howareyou_values = vi_howareyou_response['results']
# confidence of the top alternative in the first result
vi_howareyou_conf = vi_howareyou_values[0]['alternatives'][0]['confidence']
vi_howareyou_conf

0.8342612

**Icelandic**

In [None]:
# sample Icelandic audio file for "how are you"
sservice = build('speech', 'v1', developerKey=APIKEY)
ic_howareyou_response = sservice.speech().recognize(
    body={
        'config': {
            'encoding': 'FLAC',
            'sampleRateHertz': 48000,
            "languageCode": "is-IS",
        },
        'audio': {
            'uri': 'gs://hw4-data-caprese/howareyou1_ic.flac'
            }
        }).execute()
print(ic_howareyou_response)

{'results': [{'alternatives': [{'transcript': 'hvað segir þú', 'confidence': 0.96122074}]}]}


In [None]:
ic_howareyou_values = ic_howareyou_response['results']
# confidence of the top alternative in the first result
ic_howareyou_conf = ic_howareyou_values[0]['alternatives'][0]['confidence']
ic_howareyou_conf

0.96122074

**Persian**

In [None]:
# sample Persian audio file for "how are you"
sservice = build('speech', 'v1', developerKey=APIKEY)
fa_howareyou_response = sservice.speech().recognize(
    body={
        'config': {
            'encoding': 'FLAC',
            'sampleRateHertz': 48000,
            "languageCode": "fa-IR",
        },
        'audio': {
            'uri': 'gs://hw4-data-caprese/howareyou1_fa.flac'
            }
        }).execute()
print(fa_howareyou_response)

{'results': [{'alternatives': [{'transcript': 'حال شما چطور است', 'confidence': 0.9275573}]}]}


In [None]:
fa_howareyou_values = fa_howareyou_response['results']
# confidence of the top alternative in the first result
fa_howareyou_conf = fa_howareyou_values[0]['alternatives'][0]['confidence']
fa_howareyou_conf

0.9275573

**Summary**

In [None]:
howareyou_data = {'language': ['Korean', 'Slovenian', 'Vietnamese', 'Icelandic', 'Persian'],
                'howareyou_conflevel': [kr_howareyou_conf, sl_howareyou_conf, vi_howareyou_conf, ic_howareyou_conf, fa_howareyou_conf]}
df_howareyou = pd.DataFrame(howareyou_data, columns = ['language', 'howareyou_conflevel'])
df_howareyou

|   | language | howareyou_conflevel |
|---|---|---|
| 0 | Korean | 0.92365956 |
| 1 | Slovenian | 0.9583776 |
| 2 | Vietnamese | 0.8342612 |
| 3 | Icelandic | 0.96122074 |
| 4 | Persian | 0.9275573 |


### Phrase 3: "Happy Birthday"

**Korean**

In [None]:
# sample Korean audio file for "happy birthday"
sservice = build('speech', 'v1', developerKey=APIKEY)
kr_hbd_response = sservice.speech().recognize(
    body={
        'config': {
            'encoding': 'FLAC',
            'sampleRateHertz': 48000,
            "languageCode": "ko-KR",
        },
        'audio': {
            'uri': 'gs://hw4-data-caprese/happybirthday_kr.flac'
            }
        }).execute()
print(kr_hbd_response)

{'results': [{'alternatives': [{'transcript': '생일 축하합니다', 'confidence': 0.74648046}]}]}


In [None]:
kr_hbd_values = kr_hbd_response['results']
# confidence of the top alternative in the first result
kr_hbd_conf = kr_hbd_values[0]['alternatives'][0]['confidence']
kr_hbd_conf

0.74648046

**Slovenian**

In [None]:
# sample Slovenian audio file for "happy birthday"
sservice = build('speech', 'v1', developerKey=APIKEY)
sl_hbd_response = sservice.speech().recognize(
    body={
        'config': {
            'encoding': 'FLAC',
            'sampleRateHertz': 48000,
            "languageCode": "sl-SI",
        },
        'audio': {
            'uri': 'gs://hw4-data-caprese/happybithday_sl.flac'
            }
        }).execute()
print(sl_hbd_response)

{'results': [{'alternatives': [{'transcript': 'vse najboljše', 'confidence': 0.9583776}]}]}


In [None]:
sl_hbd_values = sl_hbd_response['results']
# confidence of the top alternative in the first result
sl_hbd_conf = sl_hbd_values[0]['alternatives'][0]['confidence']
sl_hbd_conf

0.9583776

**Vietnamese**

In [None]:
# sample Vietnamese audio file for "happy birthday"
sservice = build('speech', 'v1', developerKey=APIKEY)
vi_hbd_response = sservice.speech().recognize(
    body={
        'config': {
            'encoding': 'FLAC',
            'sampleRateHertz': 48000,
            "languageCode": "vi-VN",
        },
        'audio': {
            'uri': 'gs://hw4-data-caprese/happybirthday_vi.flac'
            }
        }).execute()
print(vi_hbd_response)

{'results': [{'alternatives': [{'transcript': 'chúc mừng sinh', 'confidence': 0.97175497}]}]}


In [None]:
vi_hbd_values = vi_hbd_response['results']
# confidence of the top alternative in the first result
vi_hbd_conf = vi_hbd_values[0]['alternatives'][0]['confidence']
vi_hbd_conf

0.97175497

**Icelandic**

In [None]:
# sample Icelandic audio file for "happy birthday"
sservice = build('speech', 'v1', developerKey=APIKEY)
ic_hbd_response = sservice.speech().recognize(
    body={
        'config': {
            'encoding': 'FLAC',
            'sampleRateHertz': 48000,
            "languageCode": "is-IS",
        },
        'audio': {
            'uri': 'gs://hw4-data-caprese/happybirthday_ic.flac'
            }
        }).execute()
print(ic_hbd_response)

{'results': [{'alternatives': [{'transcript': 'til hamingju með afmælið', 'confidence': 0.97623503}]}]}


In [None]:
ic_hbd_values = ic_hbd_response['results']
# confidence of the top alternative in the first result
ic_hbd_conf = ic_hbd_values[0]['alternatives'][0]['confidence']
ic_hbd_conf

0.97623503

**Persian**

In [None]:
# sample Persian audio file for "happy birthday"
sservice = build('speech', 'v1', developerKey=APIKEY)
fa_hbd_response = sservice.speech().recognize(
    body={
        'config': {
            'encoding': 'FLAC',
            'sampleRateHertz': 48000,
            "languageCode": "fa-IR",
        },
        'audio': {
            'uri': 'gs://hw4-data-caprese/happybirthday_fa.flac'
            }
        }).execute()
print(fa_hbd_response)

{'results': [{'alternatives': [{'transcript': 'تولدت مبارک', 'confidence': 0.9275573}]}]}


In [None]:
fa_hbd_values = fa_hbd_response['results']
# confidence of the top alternative in the first result
fa_hbd_conf = fa_hbd_values[0]['alternatives'][0]['confidence']
fa_hbd_conf

0.9275573

**Summary**

In [None]:
hbd_data = {'language': ['Korean', 'Slovenian', 'Vietnamese', 'Icelandic', 'Persian'],
                'happy_birthday_conflevel': [kr_hbd_conf, sl_hbd_conf, vi_hbd_conf, ic_hbd_conf, fa_hbd_conf]}
df_hbd = pd.DataFrame(hbd_data, columns = ['language', 'happy_birthday_conflevel'])
df_hbd

|   | language | happy_birthday_conflevel |
|---|---|---|
| 0 | Korean | 0.74648046 |
| 1 | Slovenian | 0.9583776 |
| 2 | Vietnamese | 0.97175497 |
| 3 | Icelandic | 0.97623503 |
| 4 | Persian | 0.9275573 |


In [None]:
df1 = pd.merge(df_thanks, df_howareyou, on=["language"])
final_df = pd.merge(df1, df_hbd, on=["language"])
final_df

|   | language | thanks_conflevel | howareyou_conflevel | happy_birthday_conflevel |
|---|---|---|---|---|
| 0 | Korean | 0.92365956 | 0.92365956 | 0.74648046 |
| 1 | Slovenian | 0.601508 | 0.9583776 | 0.9583776 |
| 2 | Vietnamese | 0.90891194 | 0.8342612 | 0.97175497 |
| 3 | Icelandic | 0.97623503 | 0.96122074 | 0.97623503 |
| 4 | Persian | 0.9275573 | 0.9275573 | 0.9275573 |
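The average confidence for each phrase can be computed from the values in this table. A minimal sketch with the standard library, using the confidence scores copied from the API responses printed above; the dict layout is hypothetical. If the confidence columns of `final_df` are numeric (floats rather than strings), `final_df.mean(numeric_only=True)` should give the same per-phrase means in pandas.

```python
from statistics import mean

# Confidence scores copied from the API responses printed above,
# in the order Korean, Slovenian, Vietnamese, Icelandic, Persian
confidences = {
    "thanks": [0.92365956, 0.601508, 0.90891194, 0.97623503, 0.9275573],
    "how_are_you": [0.92365956, 0.9583776, 0.8342612, 0.96122074, 0.9275573],
    "happy_birthday": [0.74648046, 0.9583776, 0.97175497, 0.97623503, 0.9275573],
}

# Arithmetic mean of the five languages for each phrase
averages = {phrase: mean(values) for phrase, values in confidences.items()}
for phrase, avg in averages.items():
    print(f"{phrase}: {avg:.3f}")
```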


In [None]:
# Mounting Google Drive at the specified mountpoint path with authentication
from google.colab import drive
drive.mount('drive')

Mounted at drive


In [None]:
# save the merged dataframe as CSV (without the index column) and copy it to Google Drive
final_df.to_csv('final_df.csv', index=False)
!cp final_df.csv "drive/My Drive/"

### Visualizations in Data Studio
*   https://datastudio.google.com/reporting/10d771f4-7eb0-46f8-abe8-d485dc8c9ab4



## Conclusion

We tested Google's Speech-to-Text API on three phrases in five languages: Korean, Slovenian, Vietnamese, Icelandic, and Persian. Confidence scores varied across both languages and phrases. For "Thanks", confidence ranged from 60.1% in Slovenian to 97.6% in Icelandic. For "How are you?", it ranged from 83.4% in Vietnamese to 96.1% in Icelandic. "Happy Birthday" was notable because Korean confidence fell sharply compared with the other two phrases, reaching only 74.6%, while Slovenian rose to 95.8% from its 60.1% on "Thanks". Icelandic remained strong on the third phrase at 97.6%, suggesting that the API was consistently more confident in its Icelandic transcriptions than in those for the other languages. These scores are the API's own confidence estimates rather than accuracy measured against a reference transcript; for example, the Vietnamese "How are you?" recording was transcribed as "ăn quả" ("eat fruit") with 83.4% confidence.