# Audio Speech Recognition Experiments for Quiz Bowl

The purpose of this code is to evaluate if using ASR transcriptions, relative to original text, hurts Quiz Bowl performance.  The Analysis of 136 quiz bowl questions, correctly answered by inputs of both ASR and original text, demonstrates that the Quiz Bowl system needs to see 8.95% of an ASR transcribed question, but only 6.57% of the original text.  

In [48]:
from gtts import gTTS
import speech_recognition as sr
import glob
import subprocess
import requests
import csv
import time
import json
import statistics

## Part 1: Generate Audio Data and Process with ASR 

### Used to query Quiz Bowl

In [5]:
def answer_question(text):
    response = requests.post(
        'http://trantor.entilzha.io:5000/api/answer_question',
        data={'text': text}
    ).json()
    return response['guess'], response['score']

### Generate .wav files from Text to Audio 

In [101]:
#stores questions by document
storage = dict()

#loop through each document (only CSV files in the QB data folder)
for each_file in glob.glob('../../../../qb/data/questions/expo/*.csv'):
    
    print (each_file)
    
    with open(each_file) as f:
        file_storage = []
        
        data = csv.reader(f)
        
        #dump header
        header = next(data)
        if "text" in header:
            #find proper index of question text
            correct_col = header.index("text")
            answer_col = header.index("answer")
            #keep track of question number
            counter = 0
            for line in data:
                try:
                    text = (line[correct_col])
                    answer = (line[answer_col])
                    file_dict = {}
                    file_dict['text'] = text
                    file_dict['answer'] = answer
                    file_storage.append(file_dict)
                    
                    #convert into audio with gTTS, save it to mp3, convert it to WAV
                    sentTTS = gTTS(text, lang='en', slow=True)
                    file_name = each_file + "_" +str(counter) 
                    sentTTS.save(file_name+".mp3")
                    subprocess.call(['ffmpeg', '-i', file_name+".mp3",
                    file_name + '.wav'])
                    counter += 1   
                
                except:
                    print ("Issue caused by " + str(line))                      
            storage[each_file] = file_storage

../../../../qb/data/questions/expo/2015_hsnct.csv
../../../../qb/data/questions/expo/2015_jennings.csv
../../../../qb/data/questions/expo/2015_jennings.power.csv
../../../../qb/data/questions/expo/2016_hsnct.csv
../../../../qb/data/questions/expo/2016_naacl.csv
../../../../qb/data/questions/expo/2017_hsnct.csv
../../../../qb/data/questions/expo/2017_hsnct.power.csv


### Transcribe Speech to Text with IBM

In [102]:
#update here - redacted for Github
IBM_USERNAME = ""
IBM_PASSWORD = ""

processed_speech = []
    
record_data = dict()
#Update file path as needed
for each_file in glob.glob('../../../../qb/data/questions/expo/*.wav'):
    r = sr.Recognizer()
    with sr.AudioFile(each_file) as source:              
        audio = r.record(source)
    #PocketSphinx is used locally to decipher the audio
    audio_data = r.recognize_ibm(audio, IBM_USERNAME, IBM_PASSWORD)
    #find the appropriate file and question number.  WAV files contain this information
    #lower for bleu score calculation
    text_data = storage[each_file[0:each_file.rfind('_')]][int(each_file [each_file.rfind('_')+1:each_file.rfind('.')])]['text'].lower() 
    answer = storage[each_file[0:each_file.rfind('_')]][int(each_file [each_file.rfind('_')+1:each_file.rfind('.')])]['answer'].lower()
    print ("Original ", text_data)
    print ("Transcribed ", (audio_data)) 
    print ("Correct Answer", answer)
    final_output = {}
    final_output['answer_original'] = answer_question(text_data)
    final_output['answer_asr'] = answer_question(audio_data)
    final_output['original'] = text_data
    final_output['transcribed'] = audio_data
    final_output['answer'] = answer
    
    processed_speech.append(final_output)
    #add pause to avoid spamming API
    time.sleep(5)

Original  one of the main figures of this book tames a squirrel called "red" while living with undersheriff wendle meier.  in this book's final chapter, "the corner," al dewey meets up with susan kidwell and witnesses an execution.  this book begins in holcomb, kansas, where dick hickock and and perry smith murder the clutter family in the title fashion.  for 10 points--name this "nonfiction novel" by truman capote [kuh-poh-tee].
Transcribed  one of the main figures of this book James a squirrel called red while living with under share a 
Wendell Meyer in this book final chapter the corner L. doing meets up with Susan Kidwell and witnesses in execution this book begins in Holcomb Kansas where Dick Hagen and Perry Smith murder the clatter family in the tidal fashion 
for ten points name this non fiction novel by Truman Capote 
%HESITATION Poteet 
Correct Answer in_cold_blood
Original  inhomogeneities of this vector quantity are reduced by shim coils.  it gives an effective tension that 

### Export JSON For Future Reference

In [103]:
exportable = { 'data': processed_speech}

with open('expo.json', 'w') as fp:
    json.dump(exportable, fp)

## Part 2: Analysis

In [3]:
with open('expo.json') as json_data:
    d = json.load(json_data)
    data = d['data']

### Identfy the cases in which the QB Prediction of ASR == the one on the original text

In [19]:
count = 0
agreement_batch = []

for item in data:
    if item['answer_original'][0].lower()  == item['answer'] == item['answer_asr'][0].lower():
        storage = [item['original'], item['transcribed'], item['answer']] 
        agreement_batch.append(storage)

### Calculate the average words needed for the first accurate QB prediction - the original text

In [54]:
avg_original = 0
avg_asr = 0
original_results = []
original_percentage = []
count = 0

for item in agreement_batch:
    count += 1
    if count%10 == 0:
        print (count)
    
    words = item[0].split()
    for i in range(2, len(item[0])):
        query = answer_question(' '.join(words[0:i]))
        if query[0].lower() == item[2]:
            original_results.append(i)
            original_percentage.append(float(i)/len(item[0]))
            break
        
        if i == len(item[0]):
            original_results.append("Something went wrong")

#Extra Code to see Avg Length of ASR and original text
#avg_original += len(item[0].split())
#avg_asr += len(item[1].split())
#print (avg_original/len(agreement_batch), avg_asr/len(agreement_batch))

magnetic_field
lethal_injection
olmec
war_of_the_spanish_succession
albert_camus
stem_cell
édouard_manet
vladivostok
atlanta
10
japan
fort_ticonderoga
william_blake
miranda_v._arizona
humerus
vietnam
søren_kierkegaard
robert_herrick
yangtze
quasar
20
æthelred_the_unready
mid-atlantic_ridge
alberta
psycho_(1960_film)
toronto
william_makepeace_thackeray
lebanon
the_bald_soprano
saxophone
electrolysis
30
death_valley_national_park
benjamin_disraeli
slaughterhouse-five
thanatopsis
petrarch
spleen
malaria
venus
sikhism
theseus
40
praxiteles
george_bernard_shaw
joseph_mccarthy
battle_of_stalingrad
peroxisome
utah
lucia_di_lammermoor
china
sigmund_freud
magnetic_domain
50
snake
benjamin_britten
to_an_athlete_dying_young
indonesia
the_decameron
lima
prince_henry_the_navigator
pre-raphaelite_brotherhood
ketone
spain
60
merovingian_dynasty
cat's_cradle
poisson_distribution
the_crucible
huntington's_disease
the_tin_drum
wounded_knee_massacre
john_donne
andrew_wyeth
book_of_deuteronomy
70
warren_b

### Calculate the average words needed for the first accurate QB prediction - ASR Transcription

In [42]:
asr_results = []
asr_percentage = []
count = 0

for item in agreement_batch:
    count += 1
    if count%10 == 0:
        print (count)
        
    count = 3
    words = item[1].split()
    for i in range(2, len(item[1])):
        query = answer_question(' '.join(words[0:i]))
        if query[0].lower() == item[2]:
            asr_results.append(i)
            asr_percentage.append(float(i)/len(item[1]))
            break
        
        if i == len(item[1]):
            asr_results.append("Something went wrong")
    
   

In [60]:
print ("Statistics for the original text")
print (statistics.mean(original_results))
print (statistics.mean(original_percentage))
print()
print ("Statistics for the ASR Transcription")
print (statistics.mean(asr_results))
print (statistics.mean(asr_percentage))

Statistics for the original text
17.345588235294116
0.06570769705904247

Statistics for the ASR Transcription
25.794117647058822
0.08956855146818574
