DSI BOS 11 (May 2020) Project 5

Alex Golden, Jungmoon Ham, Luke Podsiadlo, Zach Tretter

Workbook 4 - Speech to Text Transcription

----------

## Speech Recognition

#### Workflow Steps

1. Import audio segments (.wav files)

2. Transcribe via Google's Cloud Speech-to-Text API

3. Collect results in a dataframe and export it as a csv file
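
The three steps above can be sketched as a small pipeline. This is only a minimal sketch, not the notebook's code: `run_pipeline` and `fake_transcribe` are hypothetical names, the transcriber is a stand-in for the API call so the example runs offline, and the standard-library `csv` module stands in for the dataframe export.

```python
import csv
import os


def run_pipeline(input_dir, transcribe, output_csv):
    """Transcribe every .wav file in input_dir and write one csv row per file."""
    rows = []
    # Step 1: import audio segments (sorted for a reproducible order)
    for name in sorted(os.listdir(input_dir)):
        if not name.endswith('.wav'):
            continue
        with open(os.path.join(input_dir, name), 'rb') as f:
            content = f.read()
        # Step 2: transcribe (the notebook calls the Google API at this point)
        rows.append({'file_name': name, 'transcript': transcribe(content)})
    # Step 3: export the results
    with open(output_csv, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['file_name', 'transcript'])
        writer.writeheader()
        writer.writerows(rows)
    return rows


def fake_transcribe(content):
    """Hypothetical stand-in for the API: reports the byte count only."""
    return f'{len(content)} bytes of audio'
```

Passing the transcriber in as an argument keeps the file handling testable without credentials or network access.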

#### Requirements

* Key file (JSON) for the Google Cloud Speech-to-Text API
* Input path for audio files to be processed
* Output path for csv file (the dataframe)
* Name for said csv file
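
The four requirements can be gathered and checked in one place before any audio is sent. A minimal sketch, assuming nothing beyond the list above; `TranscriptionJob` and its `problems` method are hypothetical names that do not appear elsewhere in this workbook.

```python
import os
from dataclasses import dataclass


@dataclass
class TranscriptionJob:
    """Hypothetical container for the four requirements listed above."""
    key_file: str     # path to the Google API key
    input_dir: str    # folder holding the .wav files
    output_dir: str   # folder that will receive the csv file
    csv_name: str     # name of the csv file

    def problems(self):
        """Return a list of readable problems; an empty list means ready to run."""
        found = []
        if not os.path.isfile(self.key_file):
            found.append(f'key file not found: {self.key_file}')
        if not os.path.isdir(self.input_dir):
            found.append(f'input path not found: {self.input_dir}')
        if not os.path.isdir(self.output_dir):
            found.append(f'output path not found: {self.output_dir}')
        if not self.csv_name.endswith('.csv'):
            found.append(f'csv name should end in .csv: {self.csv_name}')
        return found
```

Returning every problem at once, rather than raising on the first, lets all paths be fixed in a single pass.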

Core code adapted from
* DSI-SF-9 [(Grant Wilson, J. Hall, Gabriel Perez Prieto)](https://github.com/GWilson97/san_francisco_dispatch_audio_mapping/blob/master/code/03a_speech_to_text.ipynb)


In [2]:
# !pip install "google-cloud-speech<2"  # the enums/types modules used below were removed in google-cloud-speech 2.0
import os
import io
import pandas as pd
import time

from google.cloud import speech_v1p1beta1 as speech
from google.cloud.speech_v1p1beta1 import enums

### Establish Credentials for Google Application

In [3]:
path_to_key = '/Users/alex/ga-dsi-11/'

key_name = 'police-scanner-speech-to-text-c09b11750e4e.json'

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path_to_key + key_name

client = speech.SpeechClient()
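
Before pointing the environment variable at the key, it can help to confirm that the file exists and looks like a service-account key. A minimal sketch, assuming the key is a standard service-account JSON file (these carry a `type` field set to `service_account`); `check_key_file` is a hypothetical helper, not part of the Google library.

```python
import json
import os


def check_key_file(path):
    """Return the key's project_id if the file looks like a service-account key."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f'No key file at {path}')
    with open(path) as f:
        info = json.load(f)
    # Service-account keys identify themselves through the "type" field
    if info.get('type') != 'service_account':
        raise ValueError(f'{path} is not a service-account key')
    return info.get('project_id')
```

With the names used in this cell, the check would run as `check_key_file(os.path.join(path_to_key, key_name))`. Using `os.path.join` also removes the dependence on `path_to_key` ending in a slash.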

### Establish Input Path (Specific to Your Machine)

In [9]:
input_path = './Datasets/dolby_tests/chunks/enhanced/'

os.listdir(input_path)

['sample27-enhanced-25818-20200501-1210.wav',
 'sample4-enhanced-25818-20200501-1210.wav',
 'sample10-enhanced-25818-20200501-1210.wav',
 'sample38-enhanced-25818-20200501-1210.wav',
 'sample49-enhanced-25818-20200501-1210.wav',
 'sample23-enhanced-25818-20200501-1210.wav',
 'sample14-enhanced-25818-20200501-1210.wav',
 'sample45-enhanced-25818-20200501-1210.wav',
 'sample34-enhanced-25818-20200501-1210.wav',
 'sample8-enhanced-25818-20200501-1210.wav',
 'sample18-enhanced-25818-20200501-1210.wav',
 'sample41-enhanced-25818-20200501-1210.wav',
 'sample30-enhanced-25818-20200501-1210.wav',
 'sample28-enhanced-25818-20200501-1210.wav',
 'sample46-enhanced-25818-20200501-1210.wav',
 'sample37-enhanced-25818-20200501-1210.wav',
 'sample42-enhanced-25818-20200501-1210.wav',
 'sample33-enhanced-25818-20200501-1210.wav',
 'sample24-enhanced-25818-20200501-1210.wav',
 'sample7-enhanced-25818-20200501-1210.wav',
 'sample13-enhanced-25818-20200501-1210.wav',
 'sample3-enhanced-25818-20200501-121
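
`os.listdir` returns entries in arbitrary order, which is why `sample27` precedes `sample4` above. If a stable, numeric order is wanted, a sketch along these lines would work. `sample_number` is a hypothetical helper, and it assumes every file name starts with `sample` followed by digits, as the listing suggests.

```python
import re


def sample_number(file_name):
    """Extract the integer that follows 'sample' at the start of a file name."""
    match = re.match(r'sample(\d+)', file_name)
    if match is None:
        raise ValueError(f'unexpected file name: {file_name}')
    return int(match.group(1))


files = ['sample27-enhanced-25818-20200501-1210.wav',
         'sample4-enhanced-25818-20200501-1210.wav',
         'sample10-enhanced-25818-20200501-1210.wav']

# Plain sorted() compares strings, which would put sample10 before sample4
ordered = sorted(files, key=sample_number)
print([sample_number(f) for f in ordered])  # [4, 10, 27]
```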

### Establish Output Path (Specific to Your Machine)

In [10]:
output_path = './Datasets/dolby_tests/'

file_name = 'enhanced-test-df.csv'

os.listdir(output_path)

['.DS_Store', 'raw-test-df.csv', 'chunks', 'enhanced', 'raw']
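
The listing shows that earlier results (`raw-test-df.csv`) already live in the output folder, so a rerun with an unchanged `file_name` would silently replace a previous export. A minimal sketch of a guard, where `output_file` is a hypothetical helper:

```python
import os


def output_file(output_dir, name, overwrite=False):
    """Join folder and file name, refusing to replace an existing file by default."""
    path = os.path.join(output_dir, name)
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f'{path} already exists; pass overwrite=True to replace it')
    return path
```

The returned path could then be passed to `df.to_csv` in place of `output_path + file_name`.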

### Transcribe to Dataframe

In [11]:
start_time = time.time()

df = pd.DataFrame()

for sample_audio in os.listdir(input_path):
    loop_time = time.time()
    
    # Examine wav files
    if sample_audio.endswith('.wav'):
        
        # Open files
        with io.open(input_path + sample_audio,'rb') as audio_to_transcribe:
            content = audio_to_transcribe.read()
            audio = speech.types.RecognitionAudio(content = content)
            
        # Declare speech recognition parameters
        config = speech.types.RecognitionConfig(
            encoding = enums.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz = 22050,
            language_code = 'en-US',
            audio_channel_count = 1,
            enable_separate_recognition_per_channel = True,
            use_enhanced = True,
            model = 'phone_call',
            # Note: boost weights the phrases listed in a speech context; none are listed here
            speech_contexts = [{'boost':20.0}]
        )
        
        # This model's equivalent of fit/predict
        response = client.recognize(config,audio)
        
        # Build Dictionary that becomes a Dataframe
        for result in response.results:
            d = {}
            d['transcript'] = result.alternatives[0].transcript
            d['confidence'] = result.alternatives[0].confidence
            d['file_name'] = sample_audio
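            # Takes the last hyphen-separated token of the file name as milliseconds;
            # for the files listed above that token is 1210, so every row reads 1.2
            d['audio_length'] = round(int(sample_audio.split("-")[-1].split(".")[0])/1_000,1)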
            processing_time = round(time.time() - loop_time,1)
            d['transcribe_time'] = processing_time
            # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
            df = pd.concat([df, pd.DataFrame([d])], ignore_index=True)
                    
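        # Recompute the elapsed time here so the message still works when a file returns no results
        print(f"File {sample_audio} transcribed to df in {round(time.time() - loop_time,0)} secs")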

print(f'total time of {time.time() - start_time}')

File sample27-enhanced-25818-20200501-1210.wav transcribed to df in 10.0 secs
File sample4-enhanced-25818-20200501-1210.wav transcribed to df in 2.0 secs
File sample10-enhanced-25818-20200501-1210.wav transcribed to df in 13.0 secs
File sample38-enhanced-25818-20200501-1210.wav transcribed to df in 13.0 secs
File sample49-enhanced-25818-20200501-1210.wav transcribed to df in 14.0 secs
File sample23-enhanced-25818-20200501-1210.wav transcribed to df in 10.0 secs
File sample14-enhanced-25818-20200501-1210.wav transcribed to df in 3.0 secs
File sample45-enhanced-25818-20200501-1210.wav transcribed to df in 9.0 secs
File sample34-enhanced-25818-20200501-1210.wav transcribed to df in 9.0 secs
File sample8-enhanced-25818-20200501-1210.wav transcribed to df in 5.0 secs
File sample18-enhanced-25818-20200501-1210.wav transcribed to df in 2.0 secs
File sample41-enhanced-25818-20200501-1210.wav transcribed to df in 4.0 secs
File sample30-enhanced-25818-20200501-1210.wav transcribed to df in 11.0 
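
The loop hard-codes `sample_rate_hertz = 22050` and `audio_channel_count = 1`, and derives `audio_length` from the file name. For uncompressed PCM `.wav` files, the same facts can be read from the file header with the standard-library `wave` module. A minimal sketch, where `wav_info` is a hypothetical helper and the one-second silent clip exists only to make the example runnable:

```python
import os
import tempfile
import wave


def wav_info(path):
    """Return sample rate (Hz), channel count, and duration (s) from a WAV header."""
    with wave.open(path, 'rb') as wav:
        rate = wav.getframerate()
        return {
            'sample_rate_hertz': rate,
            'audio_channel_count': wav.getnchannels(),
            'audio_length': round(wav.getnframes() / rate, 1),
        }


# Build one second of 16-bit mono silence at 22,050 Hz to exercise the helper
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.wav')
with wave.open(demo_path, 'wb') as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)       # 2 bytes per sample, i.e. 16-bit PCM
    wav.setframerate(22050)
    wav.writeframes(b'\x00\x00' * 22050)

print(wav_info(demo_path))
# {'sample_rate_hertz': 22050, 'audio_channel_count': 1, 'audio_length': 1.0}
```

Reading these values per file would catch a chunk whose sample rate differs from the value passed in the recognition config.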

### View Dataframe

In [12]:
pd.set_option('display.max_columns',None)  

df = df[['file_name',
         'audio_length',
         'transcribe_time',
         'confidence',
         'transcript']]
df

| | file_name | audio_length | transcribe_time | confidence | transcript |
|---|---|---|---|---|---|
| 0 | sample27-enhanced-25818-20200501-1210.wav | 1.2 | 10.1 | 0.711187 | 1175 |
| 1 | sample27-enhanced-25818-20200501-1210.wav | 1.2 | 10.1 | 0.748561 | and shoes help to 1433 Washington Street West... |
| 2 | sample27-enhanced-25818-20200501-1210.wav | 1.2 | 10.2 | 0.613585 | for a 3-3 Blog |
| 3 | sample4-enhanced-25818-20200501-1210.wav | 1.2 | 2.0 | 0.779042 | Galaxy S7 so |
| 4 | sample10-enhanced-25818-20200501-1210.wav | 1.2 | 12.7 | 0.808266 | I want to say 78 customer rescue on 70th possi... |
| ... | ... | ... | ... | ... | ... |
| 62 | sample43-enhanced-25818-20200501-1210.wav | 1.2 | 8.3 | 0.635526 | I spoke to the answering service for the sprin... |
| 63 | sample29-enhanced-25818-20200501-1210.wav | 1.2 | 11.2 | 0.752587 | yeah that cost an occupied across McDonald's p... |
| 64 | sample36-enhanced-25818-20200501-1210.wav | 1.2 | 5.2 | 0.678468 | that's all the oil wants to plug up a little |
| 65 | sample47-enhanced-25818-20200501-1210.wav | 1.2 | 13.1 | 0.620793 | check out I can put them on my car. He'll find... |
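
Several rows share a file name because the loop writes one row per result in the API response. If one row per audio file is preferred, the rows can be merged. A minimal sketch in plain Python, where `combine_by_file` is a hypothetical helper and the demo rows are invented for illustration rather than taken from the results:

```python
from statistics import mean


def combine_by_file(rows):
    """Merge rows sharing a file_name: join transcripts, average confidence."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row['file_name'], []).append(row)
    return [
        {
            'file_name': name,
            'confidence': round(mean(r['confidence'] for r in parts), 3),
            'transcript': ' '.join(r['transcript'] for r in parts),
        }
        for name, parts in grouped.items()
    ]


# Invented demo rows, shaped like the dataframe's records
demo_rows = [
    {'file_name': 'a.wav', 'confidence': 0.5, 'transcript': 'first part'},
    {'file_name': 'a.wav', 'confidence': 0.75, 'transcript': 'second part'},
    {'file_name': 'b.wav', 'confidence': 0.25, 'transcript': 'only part'},
]
combined = combine_by_file(demo_rows)
```

The dataframe's rows can be obtained in this shape with `df.to_dict('records')`.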


### Export CSV

In [13]:
df.to_csv(output_path + file_name,
          index_label = False)