DSI BOS 11 (May 2020) Project 5

Alex Golden, Jungmoon Ham, Luke Podsiadlo, Zach Tretter

Workbook 4 - Speech to Text Transcription

----------

## Speech Recognition

#### Workflow Steps

1. Import audio segments (.wav files)

2. Transcribe via Google's Cloud Speech-to-Text API

3. Collect the results in a DataFrame and export it as a CSV file
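The three workflow steps can be sketched as one plain-Python function. This is a minimal sketch, not the notebook's code. `transcribe_folder` is a hypothetical name. The speech API call is replaced by an injected `transcribe` callable, assumed to return one dict per recognized segment, so the loop can be tried without credentials.

```python
import os
from typing import Callable, Dict, List


def transcribe_folder(
    input_path: str,
    transcribe: Callable[[bytes], List[Dict[str, object]]],
) -> List[Dict[str, object]]:
    """Step 1: read each .wav file; step 2: transcribe it; step 3: collect rows."""
    rows: List[Dict[str, object]] = []
    for name in sorted(os.listdir(input_path)):
        if not name.endswith('.wav'):
            continue  # skip .DS_Store and other non-audio files
        with open(os.path.join(input_path, name), 'rb') as f:
            content = f.read()
        # `transcribe` stands in for the speech API call (hypothetical interface):
        # it returns one dict per recognized segment
        for segment in transcribe(content):
            rows.append({'file_name': name, **segment})
    return rows
```

Exporting is then `pd.DataFrame(rows)` followed by `to_csv`. Keeping the API call behind a callable makes the file handling testable on its own.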

#### Requirements

* Key for the Google Cloud API (a service-account key file)
* Input path for the audio files to be processed
* Output path for the CSV file (the exported DataFrame)
* Name for the CSV file
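Before spending API calls, the four requirements can be checked up front. `check_requirements` is a hypothetical helper. It only confirms that the paths exist, that the input folder holds `.wav` files, and that the name ends in `.csv`. It does not check whether the key itself is valid.

```python
import os


def check_requirements(key_file: str, input_path: str,
                       output_path: str, file_name: str) -> str:
    """Fail fast if a requirement is missing; return the full CSV path."""
    if not os.path.isfile(key_file):
        raise FileNotFoundError(f'Google API key not found: {key_file}')
    if not os.path.isdir(input_path):
        raise NotADirectoryError(f'Input path is not a directory: {input_path}')
    if not any(n.endswith('.wav') for n in os.listdir(input_path)):
        raise ValueError(f'No .wav files in {input_path}')
    if not os.path.isdir(output_path):
        raise NotADirectoryError(f'Output path is not a directory: {output_path}')
    if not file_name.endswith('.csv'):
        raise ValueError(f'CSV name should end in .csv: {file_name}')
    # os.path.join works whether or not output_path ends with a slash
    return os.path.join(output_path, file_name)
```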

Core code adapted from
* DSI-SF-9 [(Grant Wilson, J. Hall, Gabriel Perez Prieto)](https://github.com/GWilson97/san_francisco_dispatch_audio_mapping/blob/master/code/03a_speech_to_text.ipynb)


In [1]:
# !pip install "google-cloud-speech<2.0"  # this notebook uses the 1.x API; enums and speech.types were removed in 2.0
import os
import io
import pandas as pd
import time

from google.cloud import speech_v1p1beta1 as speech
from google.cloud.speech_v1p1beta1 import enums

### Establish Credentials for Google Application

In [2]:
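path_to_key = '[YOUR PATH HERE]'   # folder that holds the key file

key_name = '[YOUR KEY NAME HERE]'  # the service-account key file name

# The client library reads the key location from this environment variable;
# os.path.join works whether or not path_to_key ends with a slash
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.join(path_to_key, key_name)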

client = speech.SpeechClient()
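A small check can run on the key file before the client library reads it. `set_google_credentials` below is a hypothetical helper, not part of the Google library. It assumes the key is a service-account JSON key, which carries `"type": "service_account"` and a `client_email` field.

```python
import json
import os


def set_google_credentials(path_to_key: str, key_name: str) -> str:
    """Point GOOGLE_APPLICATION_CREDENTIALS at a service-account key file.

    Returns the service account's e-mail so it is easy to see which
    account (and therefore which project) the notebook will bill.
    """
    key_file = os.path.join(path_to_key, key_name)
    with open(key_file) as f:
        key = json.load(f)
    # Service-account keys identify themselves with these two fields
    if key.get('type') != 'service_account' or 'client_email' not in key:
        raise ValueError(f'{key_file} does not look like a service-account key')
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = key_file
    return key['client_email']
```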

### Establish Input Path (Specific to Your Machine)

In [3]:
input_path = './Datasets/dolby_tests/chunks/enhanced/'

os.listdir(input_path)

['sample282-enhanced-25818-20200502-1401.wav',
 'sample76-enhanced-25818-20200502-1332.wav',
 'sample195-enhanced-25818-20200502-1222.wav',
 'sample300-enhanced-25818-20200502-1401.wav',
 'sample249-enhanced-25818-20200502-1401.wav',
 'sample95-enhanced-25818-20200502-1332.wav',
 'sample261-enhanced-25818-20200502-1401.wav',
 'sample256-enhanced-25818-20200502-1401.wav',
 'sample27-enhanced-25818-20200501-1210.wav',
 'sample4-enhanced-25818-20200501-1210.wav',
 'sample169-enhanced-25818-20200502-1431.wav',
 'sample4-enhanced-25818-20200502-1531.wav',
 'sample27-enhanced-25818-20200502-1531.wav',
 'sample236-enhanced-25818-20200502-1222.wav',
 'sample61-enhanced-25818-20200502-1531.wav',
 'sample10-enhanced-25818-20200502-1531.wav',
 'sample201-enhanced-25818-20200502-1222.wav',
 'sample247-enhanced-25818-20200502-1222.wav',
 'sample56-enhanced-25818-20200502-1531.wav',
 'sample104-enhanced-25818-20200502-1332.wav',
 'sample10-enhanced-25818-20200501-1210.wav',
 'sample323-enhanced-2581
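The chunk names follow a regular dash-separated pattern, and a regular expression makes the fields explicit. The field names used here (`sample`, `version`, `source_id`, `date`, `time`) are an interpretation of the pattern, not documented meanings. `parse_chunk_name` is a hypothetical helper. Read this way, the last field (e.g. `1401`) looks like an HHMM time stamp rather than a clip length.

```python
import re
from typing import Dict

# Assumed naming scheme, inferred from the listing above:
# sample<N>-<version>-<id>-<YYYYMMDD>-<HHMM>.wav
CHUNK_NAME = re.compile(
    r'^sample(?P<sample>\d+)-(?P<version>[a-z]+)-(?P<source_id>\d+)'
    r'-(?P<date>\d{8})-(?P<time>\d{4})\.wav$'
)


def parse_chunk_name(file_name: str) -> Dict[str, str]:
    """Split a chunk file name into its named fields (all kept as strings)."""
    match = CHUNK_NAME.match(file_name)
    if match is None:
        raise ValueError(f'Unexpected file name: {file_name}')
    return match.groupdict()
```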

### Establish Output Path (Specific to Your Machine)

In [4]:
output_path = './Datasets/dolby_tests/'

file_name = 'enhanced-test-df.csv'

os.listdir(output_path)

['.DS_Store',
 'enhanced-test-df.csv',
 'raw-test-df.csv',
 'chunks',
 'enhanced',
 'raw']

### Transcribe to Dataframe

In [5]:
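import wave  # stdlib; reads each clip's true duration from its WAV header

start_time = time.time()

# Recognition parameters are the same for every file, so build the config once
config = speech.types.RecognitionConfig(
    encoding = enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz = 22050,
    language_code = 'en-US',
    audio_channel_count = 1,
    enable_separate_recognition_per_channel = True,
    use_enhanced = True,
    model = 'phone_call',
    # boost weights the phrases listed in a speech context; with no phrases it has no effect
    speech_contexts = [{'boost':20.0}]
)

rows = []

for sample_audio in os.listdir(input_path):
    loop_time = time.time()
    
    # Examine wav files only (skips .DS_Store and the like)
    if not sample_audio.endswith('.wav'):
        continue
    
    # Read the raw bytes once; they feed both the duration check and the API call
    with io.open(os.path.join(input_path, sample_audio), 'rb') as audio_to_transcribe:
        content = audio_to_transcribe.read()
    
    # The last field of the file name (e.g. 1401) is an HHMM time stamp, not a length,
    # so compute the duration in seconds as frames / frame rate instead
    with wave.open(io.BytesIO(content), 'rb') as wav:
        audio_length = round(wav.getnframes() / wav.getframerate(), 1)
    
    audio = speech.types.RecognitionAudio(content = content)
    
    # Synchronous recognition: this model's equivalent of fit/predict
    response = client.recognize(config, audio)
    processing_time = round(time.time() - loop_time, 1)
    
    # One row per result; the API can split a clip into several sequential results
    for result in response.results:
        rows.append({
            'transcript': result.alternatives[0].transcript,
            'confidence': result.alternatives[0].confidence,
            'file_name': sample_audio,
            'audio_length': audio_length,
            'transcribe_time': processing_time,
        })
    
    print(f"File {sample_audio} transcribed to df in {round(processing_time,0)} secs")

# DataFrame.append was deprecated and then removed in pandas 2.0; build the frame once instead
df = pd.DataFrame(rows)

print(f'total time of {round(time.time() - start_time, 1)} secs')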

File sample282-enhanced-25818-20200502-1401.wav transcribed to df in 2.0 secs
File sample76-enhanced-25818-20200502-1332.wav transcribed to df in 4.0 secs
File sample195-enhanced-25818-20200502-1222.wav transcribed to df in 3.0 secs
File sample300-enhanced-25818-20200502-1401.wav transcribed to df in 17.0 secs
File sample249-enhanced-25818-20200502-1401.wav transcribed to df in 4.0 secs
File sample95-enhanced-25818-20200502-1332.wav transcribed to df in 4.0 secs
File sample261-enhanced-25818-20200502-1401.wav transcribed to df in 13.0 secs
File sample256-enhanced-25818-20200502-1401.wav transcribed to df in 6.0 secs
File sample27-enhanced-25818-20200501-1210.wav transcribed to df in 11.0 secs
File sample4-enhanced-25818-20200501-1210.wav transcribed to df in 2.0 secs
File sample169-enhanced-25818-20200502-1431.wav transcribed to df in 5.0 secs
File sample4-enhanced-25818-20200502-1531.wav transcribed to df in 5.0 secs
File sample27-enhanced-25818-20200502-1531.wav transcribed to df in 

File sample308-enhanced-25818-20200502-1401.wav transcribed to df in 1.0 secs
File sample82-enhanced-25818-20200502-1332.wav transcribed to df in 7.0 secs
File sample276-enhanced-25818-20200502-1401.wav transcribed to df in 3.0 secs
File sample269-enhanced-25818-20200502-1401.wav transcribed to df in 4.0 secs
File sample295-enhanced-25818-20200502-1401.wav transcribed to df in 7.0 secs
File sample79-enhanced-25818-20200502-1332.wav transcribed to df in 7.0 secs
File sample85-enhanced-25818-20200502-1332.wav transcribed to df in 7.0 secs
File sample271-enhanced-25818-20200502-1401.wav transcribed to df in 7.0 secs
File sample259-enhanced-25818-20200502-1401.wav transcribed to df in 7.0 secs
File sample292-enhanced-25818-20200502-1401.wav transcribed to df in 1.0 secs
File sample151-enhanced-25818-20200502-1431.wav transcribed to df in 8.0 secs
File sample342-enhanced-25818-20200502-1501.wav transcribed to df in 4.0 secs
File sample248-enhanced-25818-20200502-1222.wav transcribed to df i

File sample192-enhanced-25818-20200502-1222.wav transcribed to df in 2.0 secs
File sample307-enhanced-25818-20200502-1401.wav transcribed to df in 31.0 secs
File sample92-enhanced-25818-20200502-1332.wav transcribed to df in 2.0 secs
File sample266-enhanced-25818-20200502-1401.wav transcribed to df in 1.0 secs
File sample251-enhanced-25818-20200502-1401.wav transcribed to df in 9.0 secs
File sample160-enhanced-25818-20200502-1431.wav transcribed to df in 7.0 secs
File sample68-enhanced-25818-20200502-1531.wav transcribed to df in 7.0 secs
File sample373-enhanced-25818-20200502-1501.wav transcribed to df in 3.0 secs
File sample335-enhanced-25818-20200502-1501.wav transcribed to df in 3.0 secs
File sample344-enhanced-25818-20200502-1501.wav transcribed to df in 3.0 secs
File sample19-enhanced-25818-20200502-1531.wav transcribed to df in 8.0 secs
File sample208-enhanced-25818-20200502-1222.wav transcribed to df in 7.0 secs
File sample19-enhanced-25818-20200501-1210.wav transcribed to df i

File sample48-enhanced-25818-20200501-1210.wav transcribed to df in 5.0 secs
File sample177-enhanced-25818-20200502-1431.wav transcribed to df in 8.0 secs
File sample39-enhanced-25818-20200501-1210.wav transcribed to df in 6.0 secs
File sample131-enhanced-25818-20200502-1431.wav transcribed to df in 2.0 secs
File sample364-enhanced-25818-20200502-1501.wav transcribed to df in 12.0 secs
File sample228-enhanced-25818-20200502-1222.wav transcribed to df in 12.0 secs
File sample39-enhanced-25818-20200502-1531.wav transcribed to df in 2.0 secs
File sample322-enhanced-25818-20200502-1501.wav transcribed to df in 5.0 secs
File sample16-enhanced-25818-20200501-1210.wav transcribed to df in 5.0 secs
File sample158-enhanced-25818-20200502-1431.wav transcribed to df in 7.0 secs
File sample102-enhanced-25818-20200502-1332.wav transcribed to df in 13.0 secs
File sample50-enhanced-25818-20200501-1210.wav transcribed to df in 4.0 secs
File sample241-enhanced-25818-20200502-1222.wav transcribed to df 

File sample36-enhanced-25818-20200502-1531.wav transcribed to df in 7.0 secs
File sample178-enhanced-25818-20200502-1431.wav transcribed to df in 6.0 secs
File sample36-enhanced-25818-20200501-1210.wav transcribed to df in 6.0 secs
File sample184-enhanced-25818-20200502-1431.wav transcribed to df in 5.0 secs
File sample122-enhanced-25818-20200502-1332.wav transcribed to df in 2.0 secs
File sample115-enhanced-25818-20200502-1332.wav transcribed to df in 2.0 secs
File sample47-enhanced-25818-20200501-1210.wav transcribed to df in 13.0 secs
File sample47-enhanced-25818-20200502-1531.wav transcribed to df in 17.0 secs
File sample210-enhanced-25818-20200502-1222.wav transcribed to df in 17.0 secs
total time of 2730.8150882720947
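The loop writes one row per recognition result, so a single file can span several rows; `sample76` appears twice in the DataFrame view. One way to get a single transcript per file is sketched below with the standard library. `collapse_by_file` is a hypothetical helper. It averages confidence with equal weight per segment, which is an assumption; weighting by segment length would be a reasonable alternative.

```python
from typing import Dict, List


def collapse_by_file(rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Merge the result rows of each file into one transcript, keeping file order."""
    grouped: Dict[str, List[Dict[str, object]]] = {}  # dicts keep insertion order
    for row in rows:
        grouped.setdefault(str(row['file_name']), []).append(row)
    merged = []
    for file_name, segments in grouped.items():
        merged.append({
            'file_name': file_name,
            # later results can carry a leading space, so strip before joining
            'transcript': ' '.join(str(s['transcript']).strip() for s in segments),
            'confidence': sum(float(s['confidence']) for s in segments) / len(segments),
            'segments': len(segments),
        })
    return merged
```

In pandas, roughly the same result should come from `df.groupby('file_name', sort=False).agg({'transcript': ' '.join, 'confidence': 'mean'})`.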


### View Dataframe

In [6]:
pd.set_option('display.max_columns',None)  

df = df[['file_name',
         'audio_length',
         'transcribe_time',
         'confidence',
         'transcript']]
df

| | file_name | audio_length | transcribe_time | confidence | transcript |
|---|---|---|---|---|---|
| 0 | sample282-enhanced-25818-20200502-1401.wav | 1.4 | 1.7 | 0.780835 | no he is currently outside |
| 1 | sample76-enhanced-25818-20200502-1332.wav | 1.3 | 3.7 | 0.535276 | I think one of the empty. |
| 2 | sample76-enhanced-25818-20200502-1332.wav | 1.3 | 3.7 | 0.801636 | Call. |
| 3 | sample195-enhanced-25818-20200502-1222.wav | 1.2 | 3.0 | 0.713038 | welcome |
| 4 | sample300-enhanced-25818-20200502-1401.wav | 1.4 | 17.0 | 0.766309 | yeah correct 106 Galveston this process your X... |
| ... | ... | ... | ... | ... | ... |
| 481 | sample122-enhanced-25818-20200502-1332.wav | 1.3 | 2.4 | 0.673614 | so forth |
| 482 | sample115-enhanced-25818-20200502-1332.wav | 1.3 | 2.4 | 0.596133 | one we were right now download is that all right |
| 483 | sample47-enhanced-25818-20200501-1210.wav | 1.2 | 12.6 | 0.620793 | check out I can put them on my car. He'll find... |
| 484 | sample47-enhanced-25818-20200501-1210.wav | 1.2 | 12.7 | 0.764385 | yeah I don't think he has a clue he's been ru... |
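The `audio_length` column in the table above matches the last field of each file name divided by 1,000 (`...-1401.wav` gives 1.4). That field looks like an HHMM time stamp, so the column does not hold clip durations. A duration can instead be read from the WAV header as frames divided by frame rate, using the standard library's `wave` module. `wav_duration_seconds` is a hypothetical helper, and it assumes uncompressed PCM WAV files, which the LINEAR16 encoding in the config implies.

```python
import io
import wave


def wav_duration_seconds(content: bytes) -> float:
    """Clip length in seconds = number of frames / frames per second."""
    # wave only reads uncompressed PCM WAV files
    with wave.open(io.BytesIO(content), 'rb') as wav:
        return wav.getnframes() / wav.getframerate()
```

Working from the bytes already read for the API call avoids opening each file a second time.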


### Export CSV

In [7]:
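# index_label=False writes the row index but leaves its header field out
# (pandas suggests this for easier importing in R); index=False would drop the index entirely
df.to_csv(os.path.join(output_path, file_name),
          index_label = False)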
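For a pandas-free variant of the export, the standard library's `csv.DictWriter` can write the same rows in the same column order. This swaps in a different technique and is not what the notebook runs. `write_rows` and `COLUMNS` are hypothetical names. Unlike `index_label = False`, this version writes no index column at all.

```python
import csv
import os
from typing import Dict, List

# Column order matches the "View Dataframe" cell
COLUMNS = ['file_name', 'audio_length', 'transcribe_time', 'confidence', 'transcript']


def write_rows(rows: List[Dict[str, object]], output_path: str, file_name: str) -> str:
    """Write transcription rows to CSV with a fixed column order and no index column."""
    target = os.path.join(output_path, file_name)
    # newline='' lets the csv module control line endings; quoting handles commas in transcripts
    with open(target, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS, extrasaction='ignore')
        writer.writeheader()
        writer.writerows(rows)
    return target
```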