# Experiment: Presidential Campaigns Ads Dataset - Feature Extraction -

This notebook shows how to use cloud services using REST API to convert audio to text, to analyze the extracted text and frames contents. Using the files previously collected (see Experiment: Predict Elections Using Presidential Commercial Campaign - Data Collection - notebook), You are going to use cognitive services, text analytics, speech recognition and optical character recognition that are among the most powerful tools for data augmentation offered by Azure Microsoft public cloud. By using these tools you will increase the availability of data in your possess from a lagely available source: YouTube videos. This notebook ends with a an application of statistical tool with the aim of predicting presidential candidates likelihood of winning the presidential elections using principally campaign Ads from the past elections.

#### Data collection contents are in the notebook DC

We are going to download videos from the webpage the [Ten of the Most Succesfull Presidential Campaign Ads Ever Made](https://www.kqed.org/lowdown/3955/ten-of-the-best-presidential-campaign-commercials-of-all-time). 
[The Living Room Candidate at Museum of the Moving Image: presidential campaign commercials](http://www.livingroomcandidate.org/)

# Table of Contents
* [Experiment: Predict Elections Outcomes Using Presidential Commercial Campaign](#Experiment:-Predict-Elections-Outcomes-Using-Presidential-Commercial-Campaign)
* [Feature engineering: extract data using Microsoft Azure public cloud services](#Paragraph-3)
    * [Set up containers and upload files (audio & image)](#Set-up-containers-and-upload-files(audio-&-image))
    * [Extract speech from audio using Bing Speech Recognition API](#Extract-speech-from-audio-using-Bing-Speech-Recognition-API)
    * [Extract sentiment and key phrases from text using Text Analytics API](#Extract-sentiment-and-key-phrases-from-text-using-Text-Analytics-API)
    * [Extract images contents and text using Vision API](#Paragraph-3)
* [Dataset](#euflfh)
    * [Data Structure](#euflfh)
    * [Data Description](#sehkgi)

## Feature engineering: extract data using Microsoft Azure public cloud services

### Set up containers and upload files: audio and  image (video frames)

To set up containers, follows these steps:

- access Azure Portal using your account [[Link here](https://portal.azure.com)]
- import libraries and run functions we will use to accomplish tasks quickly and without hardcoding
- set directories to import videos and images
- retrieve storage account service credentials from your azure_keys (public_cloud_computing\guides\keys)
- create container, retrive files to download path and upload them. Repeat the task twice using **`upload_files_to_container()`**. First to upload audio files and then to upload the image files. The function call these funtions at once:
    - retrieve files name, path and extensions (use **`get_files()`**
    - set two containers name and create containers: audio and image (use **`make_public_container()`**)
    - upload files to containers (use **`upload_file()`**)

#### Import libraries and functions

In [704]:
#import libraries
import os
import time
import pickle
from azure.storage.blob import BlockBlobService, PublicAccess
from azure.storage.blob import ContentSettings

In [727]:
def get_files(dir_files):
    """"store file name, extension and path """
    
    files_name = []
    files_path = []
    files_extension = []
    
    for root, directories, files in os.walk(dir_files):
        for file in files:
            files_name.append(file)
            files_path.append(os.path.join(root,file))
            files_extension.append(file.split('.')[-1])
            
    print('Data stored from directory):\t {}'.format(dir_files))
          
    return files_name, files_path, files_extension

In [706]:
def retrive_keys(service_name, PATH_TO_KEYS, KEYS_FILE_NAME):
    """"function to retrieve_keys. return name and key for the selected cloud computing service"""
  
    path_to_keys = os.path.join(PATH_TO_KEYS, KEYS_FILE_NAME)

    with open(path_to_keys, 'rb') as handle:
        azure_keys = pickle.load(handle)

    service_key = azure_keys[service_name]
    
    return service_key

In [712]:
def make_public_container(STORAGE_NAME, STORAGE_KEY, NEW_CONTAINER_NAME):
    """"create blob service, blob container and set it to public access. return blob service"""
    
    blob_service = BlockBlobService(account_name= STORAGE_NAME, account_key=STORAGE_KEY)
    new_container_status = blob_service.create_container(NEW_CONTAINER_NAME) 
    blob_service.set_container_acl(NEW_CONTAINER_NAME, public_access=PublicAccess.Container)
    
    if new_container == True:
        print('{} BLOB container has been successfully created: {}'.format(NEW_CONTAINER_NAME, new_container_status))
    else:
        print('{] something went wrong: check parameters and subscription'.format(NEW_CONTAINER_NAME))

In [737]:
def upload_file(STORAGE_NAME, STORAGE_KEY, NEW_CONTAINER_NAME, file, path, extension, content_type):
    """"create blob service, and upload files to container"""
    
    blob_service = BlockBlobService(account_name= STORAGE_NAME, account_key=STORAGE_KEY)
    
    try:
        blob_service.create_blob_from_path(NEW_CONTAINER_NAME, file, path, content_settings=ContentSettings(content_type= content_type+extension))    
        print("{} // BLOB upload status: successful".format(file))

    except:
        print("{} // BLOB upload status: failed".format(file))

In [738]:
def upload_files_to_container(STORAGE_NAME, STORAGE_KEY, NEW_CONTAINER_NAME, DIR_FILES, CONTENT_TYPE):
    """"create container, get files, and upload to storage"""

    #call funtion to make container
    make_public_container(STORAGE_NAME, STORAGE_KEY, NEW_CONTAINER_NAME)

    print('---------------------------------------------------------')

    #find names, paths and extension of the files stored into directory
    files_name, files_path, files_extension = get_files(DIR_FILES)

    #set uploading procedure starting time
    print('---------------------------------------------------------')
    print("Start uploading files")
    print('---------------------------------------------------------')
    start = time.time()


    #upload all files at once to the new container
    count = 0
    for path, file, ext in zip(files_path, files_name, files_extension):
        upload_file(STORAGE_NAME, STORAGE_KEY, NEW_CONTAINER_NAME, file, path, ext, CONTENT_TYPE) #(blob_service, NEW_CONTAINER_NAME, file, path, ext, CONTENT_TYPE) 
        count += 1
        #add print only failed otherwise good to go

    #set procedure ending time
    end = time.time()
    print('---------------------------------------------------------')
    print('Uploading completed')
    print('---------------------------------------------------------')
    print('It took {} seconds to upload {} files'.format(round(end - start, 2), count))

In [734]:
#make a function to delete container instead!!!

##############################################################
#RUN THIS ONLY IF YOU WANT TO DELETE A CONTAINTER            #
#REMEMBER TO DOWNLOAD YOUR DATA BEFORE DELETING THE CONTAINER#
#IMPORTANT: YOU WILL LOOSE YOUR BLOB INTO THE CONTAINER      #
##############################################################

#uncomment this lines below, add container name, and run to delete container

#select container to delete 
CONTAINER_NAME = '' #add container name
#delete container
delete_container = blob_service.delete_container(NEW_CONTAINER_NAME)
print("{} delition status success: {}".format(NEW_CONTAINER_NAME, delete_container))

myaudio delition status success: True


#### Set directories

In [735]:
#set notebook current directory
cur_dir = os.getcwd()

#set directory to the folder to import azure keys
os.chdir('../../guides/keys/')
dir_azure_keys = os.getcwd()

#set directory to the folder to import audio files
os.chdir('../../data/video/audio/')
dir_audio_files = os.getcwd()

#set directory to the folder to import image files
os.chdir(cur_dir)
os.chdir('../../data/image/frames/')
dir_image_files = os.getcwd()

#print your notebook directory 
#print directories where files are goint to be saved
print('---------------------------------------------------------')
print('Your documents directories are:')
print('- notebook:\t', cur_dir)
print('- azure keys:\t', dir_azure_keys)
print('- audio files:\t', dir_audio_files)
print('- image files:\t', dir_image_files)
print('---------------------------------------------------------')

---------------------------------------------------------
Your documents directories are:
- notebook:	 C:\Users\popor\iqss_workshop\workshops\public_cloud_computing\data\image
- azure keys:	 C:\Users\popor\iqss_workshop\workshops\public_cloud_computing\guides\keys
- audio files:	 C:\Users\popor\iqss_workshop\workshops\public_cloud_computing\data\video\audio
- image files:	 C:\Users\popor\iqss_workshop\workshops\public_cloud_computing\data\image\frames
---------------------------------------------------------


#### Retrieve storage account credentials

In [710]:
#ERASE MY PATH BEFORE REALISING THE WORKSHOP MATERIALS
my_path_to_keys = 'C:/Users/popor/Desktop/'

#set service name, path to the keys and keys file name
SERVICE_NAME = 'STORAGE' #add here: STORAGE, FACE, COMPUTER_VISION, SPEECH_RECOGNITION, TEXT_ANALYTICS, ML_STUDIO
PATH_TO_KEYS = my_path_to_keys #add here (use dir_azure_keys)
KEYS_FILE_NAME = 'azure_services_keys_v1.1.json' #add file name (eg 'azure_services_keys.json')

#call function to retrive
storage_keys = retrive_keys(SERVICE_NAME, PATH_TO_KEYS, KEYS_FILE_NAME)

#set storage name and keys
STORAGE_NAME = storage_keys['NAME']
STORAGE_KEY = storage_keys['API_KEY']

#### Create container, get files and upload audio files 

In [733]:
#set a name for a new container
NEW_CONTAINER_NAME ='myaudio'

#set the audio file directory
DIR_FILES = dir_audio_files

#set content type of the file, in this case is a audio .wav
CONTENT_TYPE = 'audio/x-'

upload_files_to_container(STORAGE_NAME, STORAGE_KEY, NEW_CONTAINER_NAME, DIR_FILES, CONTENT_TYPE)

myaudio BLOB container has been successfully created: True
---------------------------------------------------------
Data stored from directory):	 C:\Users\popor\iqss_workshop\workshops\public_cloud_computing\data\video\audio
-------------------
Start uploading files
-------------------
1988_george_bush_sr_revolving_door_attack_ad_campaign_chunck_1.wav BLOB upload status: successful
1988_george_bush_sr_revolving_door_attack_ad_campaign_chunck_2.wav BLOB upload status: successful
1988_george_bush_sr_revolving_door_attack_ad_campaign_chunck_3.wav BLOB upload status: successful
bill_clinton_hope_ad_1992_chunck_1.wav BLOB upload status: successful
bill_clinton_hope_ad_1992_chunck_2.wav BLOB upload status: successful
bill_clinton_hope_ad_1992_chunck_3.wav BLOB upload status: successful
bill_clinton_hope_ad_1992_chunck_4.wav BLOB upload status: successful
bill_clinton_hope_ad_1992_chunck_5.wav BLOB upload status: successful
bill_clinton_hope_ad_1992_chunck_6.wav BLOB upload status: successfu

#### Create container, get files and upload image files

In [736]:
#set a name for a new container
NEW_CONTAINER_NAME ='myimage'

#set the audio file directory
DIR_FILES = dir_image_files

#set content type of the file, in this case is a audio .wav
CONTENT_TYPE = 'image/'

upload_files_to_container(STORAGE_NAME, STORAGE_KEY, NEW_CONTAINER_NAME, DIR_FILES, CONTENT_TYPE)

myimage BLOB container has been successfully created: True
---------------------------------------------------------
Data stored from directory):	 C:\Users\popor\iqss_workshop\workshops\public_cloud_computing\data\image\frames
-------------------
Start uploading files
-------------------
1988_george_bush_sr_revolving_door_attack_ad_campaign_frame0.jpg BLOB upload status: successful
1988_george_bush_sr_revolving_door_attack_ad_campaign_frame100.jpg BLOB upload status: successful
1988_george_bush_sr_revolving_door_attack_ad_campaign_frame200.jpg BLOB upload status: successful
1988_george_bush_sr_revolving_door_attack_ad_campaign_frame300.jpg BLOB upload status: successful
1988_george_bush_sr_revolving_door_attack_ad_campaign_frame400.jpg BLOB upload status: successful
1988_george_bush_sr_revolving_door_attack_ad_campaign_frame500.jpg BLOB upload status: successful
1988_george_bush_sr_revolving_door_attack_ad_campaign_frame600.jpg BLOB upload status: successful
1988_george_bush_sr_revolvi

kennedy_for_me_campaign_jingle_jfk_1960_frame1600.jpg BLOB upload status: successful
kennedy_for_me_campaign_jingle_jfk_1960_frame1700.jpg BLOB upload status: successful
kennedy_for_me_campaign_jingle_jfk_1960_frame200.jpg BLOB upload status: successful
kennedy_for_me_campaign_jingle_jfk_1960_frame300.jpg BLOB upload status: successful
kennedy_for_me_campaign_jingle_jfk_1960_frame400.jpg BLOB upload status: successful
kennedy_for_me_campaign_jingle_jfk_1960_frame500.jpg BLOB upload status: successful
kennedy_for_me_campaign_jingle_jfk_1960_frame600.jpg BLOB upload status: successful
kennedy_for_me_campaign_jingle_jfk_1960_frame700.jpg BLOB upload status: successful
kennedy_for_me_campaign_jingle_jfk_1960_frame800.jpg BLOB upload status: successful
kennedy_for_me_campaign_jingle_jfk_1960_frame900.jpg BLOB upload status: successful
mcgovern_defense_plan_ad_nixon_1972_presidential_campaign_commercial_frame0.jpg BLOB upload status: successful
mcgovern_defense_plan_ad_nixon_1972_presidentia

## Extract text from audio using Bing Speech Recognition API

To extract text from the audio files uploaded to cloud storage previously, follows these steps:

- access Azure Portal using your account [[Link here](https://portal.azure.com)]
- import libraries and run functions
- retrieve speech recognition credentials and configure API to access sevice
- get a list of the files from the cloud
- request speech recognition services to the public cloud
- extract text from response
- recompose the text of each video adding audio chuncks together
- collect results into a dataframe

In [739]:
#import libraries
import requests
import urllib
import uuid
import json
import pandas as pd

In [742]:
def get_list_blob(STORAGE_NAME, STORAGE_KEY, CONTAINER_NAME):
    """"create blob service and return list of blobs in the container"""
    
    blob_service = BlockBlobService(account_name= STORAGE_NAME, account_key=STORAGE_KEY)
    
    uploaded_file = blob_service.list_blobs(CONTAINER_NAME)
    blob_name_list = []
    for blob in uploaded_file:
        blob_name_list.append(blob.name)
        
    return blob_name_list

In [744]:
#set service name
SERVICE_NAME = 'SPEECH_RECOGNITION' #add here: STORAGE, FACE, COMPUTER_VISION, SPEECH_RECOGNITION, TEXT_ANALYTICS, ML_STUDIO

#call function to retrive keys
storage_keys = retrive_keys(SERVICE_NAME, PATH_TO_KEYS, KEYS_FILE_NAME)

#set speech recognition keys
SPEECH_RECOGNITION_KEY = storage_keys['API_KEY']

#configure API access to request speech recognition service
URI_TOKEN_SPEECH = 'https://api.cognitive.microsoft.com/sts/v1.0/issueToken'
URL_SPEECH = 'https://speech.platform.bing.com/recognize'

#set token request REST headers
headers_token = {}
headers_token['Ocp-Apim-Subscription-Key'] = SPEECH_RECOGNITION_KEY
headers_token['Content-Length'] = '0'

#set api request REST headers
headers_api = {}
headers_api['Authorization'] = 'Bearer {0}'.format(access_token)
headers_api['Content-type'] = 'audio/wav'
headers_api['codec'] = 'audio/pcm'
headers_api['samplerate'] = '16000'

#set api request parameters
params_set = {}
params_set['scenarios'] = 'ulm'
params_set['appid'] = 'D4D52672-91D7-4C74-8AD8-42B1D98141A5'
params_set['locale'] = 'en-US'
params_set['device.os'] = 'PC'
params_set['version'] = '3.0'
params_set['format'] = 'json'
params_set['instanceid'] = str(uuid.uuid1())
params_set['requestid'] = str(uuid.uuid1())

In [452]:
#set container to retrieve files from
CONTAINER_NAME = 'myaudio'

#get list of blob
blob_list = get_list_blob(STORAGE_NAME, STORAGE_KEY, CONTAINER_NAME)

#store http response and json file
responses = []
http_responses = []

#set procedure starting time
print('-------------------')
print("Start speech to text conversion")
print('-------------------')
start = time.time()

#run speech recognition on uploaded audio files (i.e. extension .wax)
for blob_name in blob_list:
    if blob_name.split('.')[-1] == 'wav':

        #request for token
#         headers = {'Content-Length': '0', #need this?
#                    'Ocp-Apim-Subscription-Key': SPEECH_RECOGNITION_KEY}
        api_response = requests.post(URI_TOKEN_SPEECH, headers=headers_token)
        access_token = str(api_response.content.decode('utf-8'))

        #convert blob to bytes
        blob = blob_service.get_blob_to_bytes(NEW_CONTAINER_NAME, blob_name)

        #request for speech recognition service
#         headers = {'Authorization': 'Bearer {0}'.format(access_token),
#                    'Content-type': 'audio/wav', 
#                    'codec': 'audio/pcm', 
#                    'samplerate': '16000'}
#         params = urllib.parse.urlencode({'scenarios': 'ulm',
#                                          'appid': 'D4D52672-91D7-4C74-8AD8-42B1D98141A5',
#                                          'locale': 'en-US',
#                                          'device.os': 'PC',
#                                          'version': '3.0',
#                                          'format': 'json',
#                                          'instanceid': str(uuid.uuid1()),
#                                          'requestid': str(uuid.uuid1())})

        params = urllib.parse.urlencode(params_set)

        api_response = requests.post(URL_SPEECH, headers=headers_api, params=params, data=blob.content)
        print('{} had a {} response'.format(blob_name, api_response))

        #extract data from response
        res_json = json.loads(api_response.content.decode('utf-8'))
        http_responses.append(api_response)
        responses.append(res_json)

#load output next cell       
        
#set procedure ending time
end = time.time()
print('-------------------')
print('Conversion completed')
print('-------------------')
print('It took {} seconds to '.format(round(end - start, 2)))

-------------------
Start speech to text conversion
-------------------
1988_george_bush_sr_revolving_door_attack_ad_campaign_chunck_1.wav had a <Response [200]> response
1988_george_bush_sr_revolving_door_attack_ad_campaign_chunck_2.wav had a <Response [200]> response
1988_george_bush_sr_revolving_door_attack_ad_campaign_chunck_3.wav had a <Response [200]> response
bill_clinton_hope_ad_1992_chunck_1.wav had a <Response [200]> response
bill_clinton_hope_ad_1992_chunck_2.wav had a <Response [200]> response
bill_clinton_hope_ad_1992_chunck_3.wav had a <Response [200]> response
bill_clinton_hope_ad_1992_chunck_4.wav had a <Response [200]> response
bill_clinton_hope_ad_1992_chunck_5.wav had a <Response [200]> response
bill_clinton_hope_ad_1992_chunck_6.wav had a <Response [200]> response
bill_clinton_hope_ad_1992_chunck_7.wav had a <Response [200]> response
eisenhower_for_president_1952_chunck_1.wav had a <Response [200]> response
eisenhower_for_president_1952_chunck_2.wav had a <Response 

In [584]:
#organize response output
status = []
name = []
lexical = []
request_id = []
confidence = []

#select variables from output
for i, response in enumerate(responses):
    if responses[i]['header']['status'] == 'success':
        status.append(responses[i]['header']['status'])
        name.append(responses[i]['header']['name'])
        lexical.append(responses[i]['header']['lexical'])
        request_id.append(responses[i]['header']['properties']['requestid'])
        confidence.append(responses[i]['results'][0]['confidence'])
    else:
        status.append('Error')
        name.append('Nan')
        lexical.append('Nan')
        request_id.append('Nan')
        confidence.append('Nan')

#combine output into df
df_log_response = pd.DataFrame({'file_name' : blob_name_list,
                                'stt_http_response' :  http_responses,
                                'stt_id' : request_id,
                                'stt_status' : status,
                                'stt_name' : name,
                                'stt_text' : lexical,
                                'stt_confidence' : confidence})

#display df
df_log_response.head()

## dump to pickle

Unnamed: 0,file_name,stt_http_response,stt_id,stt_status,stt_name,stt_text,stt_confidence
0,1988_george_bush_sr_revolving_door_attack_ad_c...,<Response [200]>,49e7ae57-2d45-423a-afeb-b9e405d28ffa,success,who is governor michael dukakis vitov mandator...,who is governor michael dukakis vitov mandator...,0.8767573
1,1988_george_bush_sr_revolving_door_attack_ad_c...,<Response [200]>,8e327848-32ca-4d66-abd7-636b2c1019f9,success,impolicy gave weekend frontales to first degre...,impolicy gave weekend frontales to first degre...,0.8108339
2,1988_george_bush_sr_revolving_door_attack_ad_c...,<Response [200]>,a9a576e0-4a40-4b84-bb02-188a13cd9baf,success,large how michael dukakis says he wants to do ...,large how michael dukakis says he wants to do ...,0.8396592
3,bill_clinton_hope_ad_1992_chunck_1.wav,<Response [200]>,4fa5c0c3-4c8b-4988-91b3-34e0028915bc,success,I was born a little town called hope arkansas ...,i was born a little town called hope arkansas ...,0.8724898
4,bill_clinton_hope_ad_1992_chunck_2.wav,<Response [200]>,7de8dcc0-b7e3-4ec4-923d-e7f43e8de2d0,success,very limited income it was in 1963 that I went...,very limited income it was in nineteen sixty t...,0.8587455


In [741]:
#recompone text from speech recognition service into a df 
dict_speech_recognition = dict()
candidate_president = ['eisenhower',
                       'george_bush_sr',
                       '1964',
                       'humphrey',
                       'jfk',
                       'nixon',
                       'ronald_reagan',
                       'bill_clinton',
                       'bushcheney',
                       'barack_obama'] 

#extract text for each candidate and join it
for name in candidate_president:    
    dict_name = dict()
    audio_text = []

    for i, entry in enumerate(df_log_response.loc[:,'file_name']):
        if name in entry:
            #uncomment the line below and indent the next
            #if you want to get a ride of Nan
            if df_re.loc[i, 'tts_text'] != 'Nan': 
                audio_text.append(df_log_response.loc[i, 'stt_text'])
                        
    n_words = []
    for words in audio_text: 
        n_words.append(int(len(words.split(' '))))
    words_count = sum(n_words)    
    
    joined_audio = " ".join(audio_text)
    dict_name['stt_text'] = joined_audio
    dict_name['stt_words_count'] = words_count
    dict_speech_recognition[name] = dict_name 

#convert dictionary to df
df_stt = pd.DataFrame.from_dict(dict_speech_recognition , orient='index').reset_index()
df_stt.columns =  'title', 'stt_text', 'stt_words_count'
df_stt

Unnamed: 0,title,stt_text,stt_words_count
0,1964,Nan play hello by standing ben Nan please are ...,50
1,barack_obama,Nan how Nan Nan how what Nan what people of th...,33
2,bill_clinton,i was born a little town called hope arkansas ...,173
3,bushcheney,i'm george W bush and i approve this message i...,78
4,eisenhower,i for president for president i like my comput...,31
5,george_bush_sr,who is governor michael dukakis vitov mandator...,63
6,humphrey,Nan Nan,2
7,jfk,Nan do you wanna man for president who season ...,17
8,nixon,the mcgovern defense plan he would cut the mar...,84
9,ronald_reagan,it's morning again in america today more men a...,112


In [628]:
#output of the text from the audio of a selected Ad
print('Below, the text used in the presidential campaign Ad by candidate {}:\n"{}"'.format(df_stt.loc[2,'title'], df_stt.loc[2,'stt_text'].replace('Nan', '').replace('   ', ' ')))

Below, the text used in the presidential campaign Ad by candidate bill_clinton:
"i was born a little town called hope arkansas three months after my father died i remembered old two story house where i live in the grandparents very limited income it was in nineteen sixty three that i went to washington invent president kennedy at the boys nation program an IRA member just thinking morning kredible country this was it somebody like me you had no money or anything would be given the opportunity to meet the president rocket really do public service 'cause i care so much about people i work my way through law school with part time jobs anything i could find after i graduated i really didn't care about making a lot of money i just wanna go home and see if i can make a difference between work hard and education and healthcare create jobs and we real progress now it's exhilarating tomato think that is present i could help to change all are peoples lives for the better and bring hope back to t

### Extract sentiment and key phrases from text using Text Analytics API

To extract sentiment and key phrases from the text, follows these steps:

- access Azure Portal using your account [[Link here](https://portal.azure.com)]
- retrieve text analytics service credentials and configure API to access service
- use text from the audio files
- request sentiment analysis and extract key phrases services to the public cloud
- recompose the text of each video adding audio chuncks together
- collect results into a dataframe

In [541]:
#set service name
SERVICE_NAME = 'TEXT_ANALYTICS' #add here: STORAGE, FACE, COMPUTER_VISION, SPEECH_RECOGNITION, TEXT_ANALYTICS, ML_STUDIO

#call function to retrive keys
storage_keys = retrive_keys(SERVICE_NAME, PATH_TO_KEYS, KEYS_FILE_NAME)

#set text analytics keys
TEXT_ANALYTICS_KEY = storage_keys['API_KEY']

#configure API access to request text analytics service
URI_SENTIMENT = 'https://eastus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment'
URI_KEY_PHRASES = 'https://eastus.api.cognitive.microsoft.com/text/analytics/v2.0/keyPhrases'

#set REST headers
headers = {}
headers['Ocp-Apim-Subscription-Key'] = TEXT_ANALYTICS_KEY
headers['Content-Type'] = 'application/json'
headers['Accept'] = 'application/json'

In [618]:
#set procedure starting time
print('--------------------------------------')
print("Start text analysis")
start = time.time()

#store text analysis to list
sentiment_text = []
key_phrases = []
sentiment_mean_key_phrases = []   

#perform on text for each audio
for i, entry in enumerate(df_stt.index):
    text = df_stt.loc[i,'stt_text'].replace('Nan', '').replace('   ', ' ')

    #create request to determine sentiment from text
    data = json.dumps({"documents":[{"id":str(uuid.uuid1()), "language":"en", "text":text}]}).encode('utf-8')
    request = urllib.request.Request(URI_SENTIMENT, data, headers)
    response = urllib.request.urlopen(request)
    responsejson = json.loads(response.read().decode('utf-8'))
    try:
        sentiment = responsejson['documents'][0]['score']
    except:
        sentiment = 'Nan'
    sentiment_text.append(sentiment)

    #create request to determine key phrases from text
    data = data
    request = urllib.request.Request(URI_KEY_PHRASES, data, headers)
    response = urllib.request.urlopen(request)
    responsejson = json.loads(response.read().decode('utf-8'))
    try:
        key_phrase = responsejson['documents'][0]['keyPhrases']
    except:
        key_phrase = 'Nan'
    key_phrases.append(key_phrase)
    
    #create request to determine sentiment from key phrases
    sentiment_key_phrases = []
    for key in key_phrase:
        data = json.dumps({"documents":[{"id":str(uuid.uuid1()), "language":"en", "text":key}]}).encode('utf-8')
        request = urllib.request.Request(URI_SENTIMENT, data, headers)
        response = urllib.request.urlopen(request)
        responsejson = json.loads(response.read().decode('utf-8'))
        sentiment = responsejson['documents'][0]['score']
        sentiment_key_phrases.append(round(sentiment, 2))
        time.sleep(1)

    sentiment_mean = sum(sentiment_key_phrases)/len(sentiment_key_phrases)
    sentiment_mean_key_phrases.append(sentiment_mean)

#assign new column to df_stt
df_stt['ta_sentiment_text'] = sentiment_text
df_stt['ta_key_phrases'] = key_phrases
df_stt['ta_sentiment_key_phrases'] = sentiment_mean_key_phrases

#set procedure ending time
end = time.time()
print('Text analysis completed')
print('--------------------------------------')
print('It took {} to perform text analysis'.format(round(end - start, 2)))

-------------------
Start text analysis
-------------------
-------------------
Text analysis completed
-------------------
It took 144.0 to perform text analysis


In [630]:
#display dataset after text analytics
df_text_analytics = df_stt.copy()
df_text_analytics

Unnamed: 0,title,stt_text,stt_words_count,sentiment_text,key_phrases,sentiment_key_phrases
0,1964,Nan play hello by standing ben Nan please are ...,50,0.856453,"[stakes, god, president johnson, s children, v...",0.7625
1,barack_obama,Nan how Nan Nan how what Nan what people of th...,33,0.243644,"[people, nation false hope]",0.815
2,bill_clinton,i was born a little town called hope arkansas ...,173,0.5,"[cause i, present i, graduated i, lot of money...",0.571613
3,bushcheney,i'm george W bush and i approve this message i...,78,0.5,"[john kerry lead carey, john kerry whichever w...",0.677273
4,eisenhower,i for president for president i like my comput...,31,0.918349,"[president i, computer billy graham washingto...",0.7
5,george_bush_sr,who is governor michael dukakis vitov mandator...,63,0.0557907,"[massachusetts america, governor michael dukak...",0.529167
6,humphrey,Nan Nan,2,Nan,Nan,0.4
7,jfk,Nan do you wanna man for president who season ...,17,0.5,"[wanna man, president]",0.38
8,nixon,the mcgovern defense plan he would cut the mar...,84,0.5,"[cut navy personnel, navy fleet, mcgovern defe...",0.602667
9,ronald_reagan,it's morning again in america today more men a...,112,0.5,"[short years, half, young men, rates, leadersh...",0.590588


## Extract images contents and text using Vision API

To extract text from the audio files uploaded to cloud storage previously, follows these steps:

- access Azure Portal using your account [Link here]
- import libraries and run functions
- retrieve computer vision service credentials and configure API to access sevice
- get a list of url blob stored in the cloud
- request vision services to the public cloud
- organize results from response
- recompose the results of all frames for each video
- collect results into a dataframe

In [756]:
#set service name
SERVICE_NAME = 'COMPUTER_VISION' #add here: STORAGE, FACE, COMPUTER_VISION, SPEECH_RECOGNITION, TEXT_ANALYTICS, ML_STUDIO

#call function to retrive keys
storage_keys = retrive_keys(SERVICE_NAME, PATH_TO_KEYS, KEYS_FILE_NAME)

#set text analytics keys
COMPUTER_VISION_KEY = storage_keys['API_KEY']

#configure API access to request text analytics service
URI_ANALYZE = 'https://eastus.api.cognitive.microsoft.com/vision/v1.0/analyze'
URI_OCR = 'https://eastus.api.cognitive.microsoft.com/vision/v1.0/ocr'

#set REST headers
headers = {}
headers['Ocp-Apim-Subscription-Key'] = COMPUTER_VISION_KEY
headers['Content-Type'] = 'application/json'
headers['Accept'] = 'application/json'

#set api request parameters
params_set = {}
params_set['visualFeatures'] = 'Categories,Tags,Description,Faces,ImageType,Color,Adult'

In [761]:
#set container to retrieve files from
CONTAINER_NAME = 'myimage'

#get list of blob
blob_list, blob_url = retrive_blob_list(azure_keys, CONTAINER_NAME)

#store http response and json file
responses = []
http_responses = []

#set procedure starting time
print('-------------------')
print("Start computer vision")
print('-------------------')
start = time.time()

#run analyze image service on video frames (i.e. extension .wax)
for blob_name in blob_url:
    print(blob_name)
    if blob_name.split('.')[-1] == 'jpg':
        
        #convert blob to bytes
        #blob = blob_service.get_blob_to_bytes(NEW_CONTAINER_NAME, blob_name)
        
        params = urllib.parse.urlencode(params_set)
        query_string = '?{0}'.format(params) 
        url = URI_ANALYZE + query_string
        body = '{\'url\':\'' + blob_name + '\'}'
        
        #request for computer vision service   
        api_response = requests.post(url, headers=headers, data=body)
        print('{} had a {} response'.format(blob_name, api_response))

        #extract data from response
        res_json = json.loads(api_response.content.decode('utf-8'))
        http_responses.append(api_response)
        responses.append(res_json)

#load output next cell       
        
#set procedure ending time
end = time.time()
print('-------------------')
print('Conversion completed')
print('-------------------')
print('It took {} seconds to '.format(round(end - start, 2)))

-------------------
Start speech to text conversion
-------------------
https://cloudcomputingplayground.blob.core.windows.net/myimage/1988_george_bush_sr_revolving_door_attack_ad_campaign_frame0.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/1988_george_bush_sr_revolving_door_attack_ad_campaign_frame0.jpg had a <Response [200]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/1988_george_bush_sr_revolving_door_attack_ad_campaign_frame100.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/1988_george_bush_sr_revolving_door_attack_ad_campaign_frame100.jpg had a <Response [200]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/1988_george_bush_sr_revolving_door_attack_ad_campaign_frame200.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/1988_george_bush_sr_revolving_door_attack_ad_campaign_frame200.jpg had a <Response [200]> response
https://cloudcomputingplayground.blob.core.windows.net/m

https://cloudcomputingplayground.blob.core.windows.net/myimage/eisenhower_for_president_1952_frame1200.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/eisenhower_for_president_1952_frame1300.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/eisenhower_for_president_1952_frame1300.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/eisenhower_for_president_1952_frame1400.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/eisenhower_for_president_1952_frame1400.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/eisenhower_for_president_1952_frame1500.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/eisenhower_for_president_1952_frame1500.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/eisenhower_for_president_1952_frame1600.jpg
https://cloudcom

https://cloudcomputingplayground.blob.core.windows.net/myimage/high_quality_famous_daisy_attack_ad_from_1964_presidential_election_frame300.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/high_quality_famous_daisy_attack_ad_from_1964_presidential_election_frame400.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/high_quality_famous_daisy_attack_ad_from_1964_presidential_election_frame400.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/high_quality_famous_daisy_attack_ad_from_1964_presidential_election_frame500.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/high_quality_famous_daisy_attack_ad_from_1964_presidential_election_frame500.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/high_quality_famous_daisy_attack_ad_from_1964_presidential_election_frame600.jpg
https://cloudcomputingplayground.blob.co

https://cloudcomputingplayground.blob.core.windows.net/myimage/kennedy_for_me_campaign_jingle_jfk_1960_frame1600.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/kennedy_for_me_campaign_jingle_jfk_1960_frame1700.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/kennedy_for_me_campaign_jingle_jfk_1960_frame1700.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/kennedy_for_me_campaign_jingle_jfk_1960_frame200.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/kennedy_for_me_campaign_jingle_jfk_1960_frame200.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/kennedy_for_me_campaign_jingle_jfk_1960_frame300.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/kennedy_for_me_campaign_jingle_jfk_1960_frame300.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/my

https://cloudcomputingplayground.blob.core.windows.net/myimage/ronald_reagan_tv_ad_its_morning_in_america_again_frame0.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/ronald_reagan_tv_ad_its_morning_in_america_again_frame100.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/ronald_reagan_tv_ad_its_morning_in_america_again_frame100.jpg had a <Response [200]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/ronald_reagan_tv_ad_its_morning_in_america_again_frame1000.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/ronald_reagan_tv_ad_its_morning_in_america_again_frame1000.jpg had a <Response [200]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/ronald_reagan_tv_ad_its_morning_in_america_again_frame1100.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/ronald_reagan_tv_ad_its_morning_in_america_again_frame1100.jpg had a <Response [200]> resp

https://cloudcomputingplayground.blob.core.windows.net/myimage/yes_we_can__barack_obama_music_video_frame2000.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/yes_we_can__barack_obama_music_video_frame2100.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/yes_we_can__barack_obama_music_video_frame2100.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/yes_we_can__barack_obama_music_video_frame2200.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/yes_we_can__barack_obama_music_video_frame2200.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/yes_we_can__barack_obama_music_video_frame2300.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/yes_we_can__barack_obama_music_video_frame2300.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/yes_we_can_

In [1118]:
#keys available from analyze
print(responses[15].keys())
print('-------------------')
#response example
print(responses[15])
print('-------------------')
#response example
print(responses[15]['color'])

dict_keys(['categories', 'tags', 'description', 'faces', 'adult', 'color', 'imageType', 'requestId', 'metadata'])
-------------------
{'categories': [{'name': 'people_many', 'score': 0.359375, 'detail': {'celebrities': []}}], 'tags': [{'name': 'person', 'confidence': 0.9894850850105286}, {'name': 'indoor', 'confidence': 0.971613883972168}, {'name': 'people', 'confidence': 0.5700908303260803}, {'name': 'crowd', 'confidence': 0.0051325480453670025}], 'description': {'tags': ['person', 'indoor', 'man', 'people', 'looking', 'woman', 'food', 'red', 'group', 'restaurant', 'standing', 'holding', 'room', 'table', 'large', 'train', 'kitchen', 'plate'], 'captions': [{'text': 'a group of people looking at each other', 'confidence': 0.7775052562780648}]}, 'faces': [{'age': 42, 'gender': 'Male', 'faceRectangle': {'top': 103, 'left': 192, 'width': 60, 'height': 60}}], 'adult': {'isAdultContent': False, 'adultScore': 0.01876509189605713, 'isRacyContent': False, 'racyScore': 0.06318016350269318}, 'col

In [1109]:
#organize results from response

response_status = [] #append status good and bad
fr_category = []
fr_category_confidence = []
fr_detail_celebrities = []
fr_detail_celebrities_confidence = []
fr_tag_name = []
fr_tag_confidence = []
fr_tag_description = []
fr_caption = []
fr_caption_confidence = []
fr_face_age = []
fr_face_gender = []



for i, response in enumerate(responses):
    if next(iter(responses[i])) == 'statusCode':
        response_status.append(responses[i]['statusCode'])
        fr_category.append('Nan')
        fr_category_confidence.append('Nan') 
        fr_detail_celebrities.append('Nan')
        fr_detail_celebrities_confidence.append('Nan')
        fr_tag_name.append('Nan')
        fr_tag_confidence.append('Nan')
        fr_tag_description.append('Nan')
        fr_caption.append('Nan')
        fr_caption_confidence.append('Nan')
        fr_face_age.append('Nan')
        fr_face_gender.append('Nan')
        
        #add here other features
    
    else:
        response_status.append('<200>')
        
        #parse over the categories key of the response
        for j, response in enumerate(responses[i]['categories']):
                                
                #get all the category with a relatively high score
                count =  0
                if response['score'] > 0.25:
                    #check for multiple high score category
                    fr_category.append(response['name'].strip('_'))
                    fr_category_confidence.append(response['score']) 
                    
                    #extract celebrities
                    if 'detail' in response.keys():
                        if 'celebrities' in response['detail'].keys():
                            if response['detail']['celebrities'] != []:
                                fr_detail_celebrities.append(response['detail']['celebrities'][0]['name'])
                                fr_detail_celebrities_confidence.append(response['detail']['celebrities'][0]['confidence'])             
                            else:
                                fr_detail_celebrities.append('Nan')
                                fr_detail_celebrities_confidence.append('Nan')        
                        else:
                            fr_detail_celebrities.append('Nan')
                            fr_detail_celebrities_confidence.append('Nan')
                    else:
                        fr_detail_celebrities.append('Nan')
                        fr_detail_celebrities_confidence.append('Nan')
                    
                    break
                
                else:           
                    if count == j:
                        fr_category.append('Nan')
                        fr_category_confidence.append('Nan') 
                        fr_detail_celebrities.append('Nan')
                        fr_detail_celebrities_confidence.append('Nan')
                        break
                        
                    count =+ 1
        
        #parse over the tags key of the response
        tags_name = []
        tags_confidence = []
        for k, response in enumerate(responses[i]['tags']):
            tags_name.append(response['name'])
            tags_confidence.append(response['confidence'])
        fr_tag_name.append(tags_name)
        fr_tag_confidence.append(tags_confidence)
        
        #parse over the description key of the response
        tags_description = []
        for k, response in enumerate(responses[i]['description']['tags']):
            tags_description.append(response)
        fr_tag_description.append(tags_description) 
        
        caption = []
        caption_confidence = []
        for k, response in enumerate(responses[i]['description']['captions']):
            caption.append(response['text'])
            caption_confidence.append(response['confidence'])
        fr_caption.append(caption)
        fr_caption_confidence.append(caption_confidence)
        
        #parse over the faces key of the response
        #print(i)
        face_age = []
        face_gender = []
        for k, response in enumerate(responses[i]['faces']):
            face_age.append(response['age'])
            face_gender.append(response['gender'])
        fr_face_age.append(face_age)
        fr_face_gender.append(face_gender)

In [1142]:
#display results
log_ta_response = {'file_name' : blob_list,
                   'vis_http_response' :  response_status,
                   'vis_fr_caption' : fr_caption,
                   'vis_fr_caption_score[%]' : fr_caption_confidence,
                   'vis_tag_description': fr_tag_description,
                   'vis_tag_name' :  fr_tag_name,
                   'vis_tag_confidence' :  fr_tag_confidence, 
                   'vis_face_gender' : fr_face_gender,
                   'vis_face_age' : fr_face_age,
                   'vis_ocr' : fr_ocr_words,
                   'vis_fr_category' : fr_category,
                   'vis_fr_category_score[%]' : fr_category_confidence,
                   'vis_fr_celebrities' : fr_detail_celebrities,
                   'vis_fr_celebrities_score[%]' : fr_detail_celebrities_confidence}

df_log_ta_response = pd.DataFrame.from_dict(log_ta_response, orient='index')
df_log_ta_response = df_log_ta_response.transpose()
df_log_ta_response[df_log_ta_response['vis_http_response'] == '<200>']

Unnamed: 0,file_name,vis_http_response,vis_fr_caption,vis_fr_caption_score[%],vis_tag_description,vis_tag_name,vis_tag_confidence,vis_face_gender,vis_face_age,vis_ocr,vis_fr_category,vis_fr_category_score[%],vis_fr_celebrities,vis_fr_celebrities_score[%]
0,1988_george_bush_sr_revolving_door_attack_ad_c...,<200>,[a tall glass building],[0.6281513925662964],"[building, photo, sitting, black, large, sign,...",[],[],[],[],"[.11, THE, DUKAKIS, FURLOUGH, PROGRAM]",Nan,Nan,Nan,Nan
1,1988_george_bush_sr_revolving_door_attack_ad_c...,<200>,[a gate in front of a window],[0.5849918523400203],"[building, standing, sitting, window, large, r...",[building],[0.8614632487297058],[],[],[],Nan,Nan,Nan,Nan
2,1988_george_bush_sr_revolving_door_attack_ad_c...,<200>,[a tall glass building],[0.4673530253144375],"[building, window, clock, photo, water, small,...","[building, silhouette, clouds, tower, distance]","[0.8810973763465881, 0.23954088985919952, 0.20...",[],[],[],Nan,Nan,Nan,Nan
3,1988_george_bush_sr_revolving_door_attack_ad_c...,<200>,[a group of people standing in front of a mirr...,[0.5246359866485301],"[man, standing, front, building, mirror, peopl...",[],[],[],[],[],Nan,Nan,Nan,Nan
4,1988_george_bush_sr_revolving_door_attack_ad_c...,<200>,[a group of people standing in front of a mirr...,[0.8146214362046119],"[person, man, standing, looking, photo, people...",[person],[0.9888064861297607],[],[],[],Nan,Nan,Nan,Nan
5,1988_george_bush_sr_revolving_door_attack_ad_c...,<200>,[a group of people in a cage],[0.944788099490945],"[person, people, photo, building, window, man,...","[person, people]","[0.899941086769104, 0.6188361644744873]",[],[],"[268, Escaped.]",Nan,Nan,Nan,Nan
6,1988_george_bush_sr_revolving_door_attack_ad_c...,<200>,[a group of people in a cage],[0.9060217418649054],"[person, building, man, group, people, photo, ...","[person, group, people]","[0.9542752504348755, 0.5842924118041992, 0.551...",[],[],"[Many, are, still, at, large.]",Nan,Nan,Nan,Nan
7,1988_george_bush_sr_revolving_door_attack_ad_c...,<200>,[a group of people standing in front of a fence],[0.9185269151722641],"[person, fence, group, outdoor, people, buildi...","[person, group, people]","[0.9850817918777466, 0.8664748072624207, 0.741...",[],[],[],Nan,Nan,Nan,Nan
8,1988_george_bush_sr_revolving_door_attack_ad_c...,<200>,[a tall building],[0.5896768068792926],"[building, window, table, living, clock, stand...","[building, tower]","[0.8481139540672302, 0.31022948026657104]",[],[],"[van..••, r]",Nan,Nan,Nan,Nan
9,bill_clinton_hope_ad_1992_frame0.jpg,<200>,[a close up of a computer],[0.4195503962965528],"[laptop, computer]",[],[],[],[],[],Nan,Nan,Nan,Nan


In [660]:
### Combine datasets together

#load data collection df and merge with Text analytics
df_data_collection = pd.read_pickle('../Ads_presidential_election')

#make column title look the same to join dataset
df_data_collection['titles'] = df_data_collection['titles'].apply(
    lambda value: 'eisenhower' if value == 'eisenhower_for_president_1952'
    else 'jfk' if value == 'kennedy_for_me_campaign_jingle_jfk_1960'  
    else '1964' if value == 'high_quality_famous_daisy_attack_ad_from_1964_presidential_election'
    else 'humphrey' if value == 'humphrey_laughing_at_spiro_agnew_1968_political_ad'
    else 'nixon' if value == 'mcgovern_defense_plan_ad_nixon_1972_presidential_campaign_commercial'
    else 'ronald_reagan' if value == 'ronald_reagan_tv_ad_its_morning_in_america_again'
    else 'george_bush_sr' if value == '1988_george_bush_sr_revolving_door_attack_ad_campaign'
    else 'bill_clinton' if value == 'bill_clinton_hope_ad_1992'
    else 'bushcheney' if value == 'historical_campaign_ad_windsurfing_bushcheney_04'
    else 'barack_obama' if value == 'yes_we_can__barack_obama_music_video'
    else 'unknown')

#make a copy and display the dataset
df_data_collection = df_data_collection.iloc[:,2:].copy()
df_data_collection

Unnamed: 0,titles,length[sec],year,candidate_name,party,win
0,eisenhower,62.09,1952,IKE,republican,-
1,jfk,60.23,1960,JFK,democratic,-
2,1964,66.9,1964,LBJ,democratic,-
3,humphrey,19.25,1968,HHH,republican,0
4,nixon,60.05,1972,NIX,republican,-
5,ronald_reagan,59.95,1984,---,---,-
6,george_bush_sr,29.88,1988,---,---,-
7,bill_clinton,60.26,1992,---,---,-
8,bushcheney,30.09,2004,---,---,-
9,barack_obama,270.21,2008,---,---,-


In [659]:
#combine two df
df_data_collection.set_index('titles')
df_text_analytics.set_index('title')
df_experiment = pd.concat([df_data_collection, df_text_analytics], axis=1)

#display dataset for experiment
df_experiment

Unnamed: 0,titles,length[sec],year,candidate_name,party,win,title,stt_text,stt_words_count,sentiment_text,key_phrases,sentiment_key_phrases
0,eisenhower,62.09,1952,IKE,republican,-,1964,Nan play hello by standing ben Nan please are ...,50,0.856453,"[stakes, god, president johnson, s children, v...",0.7625
1,jfk,60.23,1960,JFK,democratic,-,barack_obama,Nan how Nan Nan how what Nan what people of th...,33,0.243644,"[people, nation false hope]",0.815
2,1964,66.9,1964,LBJ,democratic,-,bill_clinton,i was born a little town called hope arkansas ...,173,0.5,"[cause i, present i, graduated i, lot of money...",0.571613
3,humphrey,19.25,1968,HHH,republican,0,bushcheney,i'm george W bush and i approve this message i...,78,0.5,"[john kerry lead carey, john kerry whichever w...",0.677273
4,nixon,60.05,1972,NIX,republican,-,eisenhower,i for president for president i like my comput...,31,0.918349,"[president i, computer billy graham washingto...",0.7
5,ronald_reagan,59.95,1984,---,---,-,george_bush_sr,who is governor michael dukakis vitov mandator...,63,0.0557907,"[massachusetts america, governor michael dukak...",0.529167
6,george_bush_sr,29.88,1988,---,---,-,humphrey,Nan Nan,2,Nan,Nan,0.4
7,bill_clinton,60.26,1992,---,---,-,jfk,Nan do you wanna man for president who season ...,17,0.5,"[wanna man, president]",0.38
8,bushcheney,30.09,2004,---,---,-,nixon,the mcgovern defense plan he would cut the mar...,84,0.5,"[cut navy personnel, navy fleet, mcgovern defe...",0.602667
9,barack_obama,270.21,2008,---,---,-,ronald_reagan,it's morning again in america today more men a...,112,0.5,"[short years, half, young men, rates, leadersh...",0.590588


# -- Notebook End --

In [1]:
#dump the dataframe on a file
import pickle
import json

#load dataframe on a file
#df = pd.read_pickle('../Ads_presidential_election')

In [1123]:
#keys available from ocr
print(responses[0].keys())
print('-------------------')
#response example
print(responses[0])
print('-------------------')
#response example
print(responses[0]['regions'])

dict_keys(['language', 'orientation', 'textAngle', 'regions'])
-------------------
{'language': 'en', 'orientation': 'Up', 'textAngle': 0.0, 'regions': [{'boundingBox': '73,155,337,170', 'lines': [{'boundingBox': '266,155,27,21', 'words': [{'boundingBox': '266,155,27,21', 'text': '.11'}]}, {'boundingBox': '131,274,219,23', 'words': [{'boundingBox': '131,274,65,22', 'text': 'THE'}, {'boundingBox': '206,274,144,23', 'text': 'DUKAKIS'}]}, {'boundingBox': '73,301,337,24', 'words': [{'boundingBox': '73,301,170,24', 'text': 'FURLOUGH'}, {'boundingBox': '256,301,154,24', 'text': 'PROGRAM'}]}]}]}
-------------------
[{'boundingBox': '73,155,337,170', 'lines': [{'boundingBox': '266,155,27,21', 'words': [{'boundingBox': '266,155,27,21', 'text': '.11'}]}, {'boundingBox': '131,274,219,23', 'words': [{'boundingBox': '131,274,65,22', 'text': 'THE'}, {'boundingBox': '206,274,144,23', 'text': 'DUKAKIS'}]}, {'boundingBox': '73,301,337,24', 'words': [{'boundingBox': '73,301,170,24', 'text': 'FURLOUGH'},

In [1144]:
## DO NOT DELETE THIS

response_status = []

fr_ocr_words = []


for i, response in enumerate(responses):
    if next(iter(responses[i])) == 'statusCode':
        response_status.append(responses[i]['statusCode'])
        fr_ocr_words.append('Nan')

    else:
        response_status.append('<200>')
        
        words = []
        #parse over the categories key of the response
        for j, response in enumerate(responses[i]['regions']):
           # print(j,response)
            for k, box in enumerate(response['lines']):
                
                for l, word in enumerate(box['words']):
                    
                    words.append(word['text'])
                                        
        fr_ocr_words.append(words)

In [1135]:
#organize response output
status = []
name = []
lexical = []
request_id = []
confidence = []

#select variables from output
for i, response in enumerate(responses):
    if responses[i]['header']['status'] == 'success':
        status.append(responses[i]['header']['status'])
        name.append(responses[i]['header']['name'])
        lexical.append(responses[i]['header']['lexical'])
        request_id.append(responses[i]['header']['properties']['requestid'])
        confidence.append(responses[i]['results'][0]['confidence'])
    else:
        status.append('Error')
        name.append('Nan')
        lexical.append('Nan')
        request_id.append('Nan')
        confidence.append('Nan')

#combine output into df
df_log_response = pd.DataFrame({'file_name' : blob_name_list,
                                'stt_http_response' :  http_responses,
                                'stt_id' : request_id,
                                'stt_status' : status,
                                'stt_name' : name,
                                'stt_text' : lexical,
                                'stt_confidence' : confidence})

#display df
df_log_response.head()

## dump to pickle

KeyError: 'header'

In [759]:
def retrive_blob_list(keys, container_name):
    """ 
    function to get a list of blobs' URLs
    INPUT: - dictionary with storage info. dictionary format: {storage:{storage_name:name,storage_api_key:api_key}}
           - container name      
    OUTPUT: a list of BLoBs' name and URL
    """
    storage_name = keys['STORAGE']['NAME']
    storage_key = keys['STORAGE']['API_KEY']
    blob_service = BlockBlobService(storage_name, storage_key)
    uploaded_file = blob_service.list_blobs(container_name)
    blob_url_format = 'https://{0}.blob.core.windows.net/{1}/{2}'
    #store blobs' name and URLs in list
    blob_name_list = []
    blob_url_list = []
    # retrive each blob name
    for blob in uploaded_file:
        blob_name_list.append(blob.name)
        blob_url_list.append(blob_url_format.format(blob_service.account_name, container_name, blob.name))
    return blob_name_list, blob_url_list

In [1119]:
#set service name
SERVICE_NAME = 'COMPUTER_VISION' #add here: STORAGE, FACE, COMPUTER_VISION, SPEECH_RECOGNITION, TEXT_ANALYTICS, ML_STUDIO

#call function to retrive keys
storage_keys = retrive_keys(SERVICE_NAME, PATH_TO_KEYS, KEYS_FILE_NAME)

#set text analytics keys
COMPUTER_VISION_KEY = storage_keys['API_KEY']

#configure API access to request text analytics service
URI_ANALYZE = 'https://eastus.api.cognitive.microsoft.com/vision/v1.0/analyze'
URI_OCR = 'https://eastus.api.cognitive.microsoft.com/vision/v1.0/ocr'

#set REST headers
headers = {}
headers['Ocp-Apim-Subscription-Key'] = COMPUTER_VISION_KEY
headers['Content-Type'] = 'application/json'
headers['Accept'] = 'application/json'

#set api request parameters
params_set = {}
params_set['visualFeatures'] = 'Categories,Tags,Description,Faces,ImageType,Color,Adult'

In [1120]:
#set container to retrieve files from
CONTAINER_NAME = 'myimage'

#get list of blob
blob_list, blob_url = retrive_blob_list(azure_keys, CONTAINER_NAME)

#store http response and json file
responses = []
http_responses = []

#set procedure starting time
print('-------------------')
print("Start computer vision")
print('-------------------')
start = time.time()

#run analyze image service on video frames (i.e. extension .wax)
for blob_name in blob_url:
    print(blob_name)
    if blob_name.split('.')[-1] == 'jpg':
        
        #convert blob to bytes
        #blob = blob_service.get_blob_to_bytes(NEW_CONTAINER_NAME, blob_name)
        
        params = urllib.parse.urlencode(params_set)
        query_string = '?{0}'.format(params)
        
        url = URI_OCR + query_string
        
        body = '{\'url\':\'' + blob_name + '\'}'
        
        #request for speech recognition service
        
        api_response = requests.post(url, headers=headers, data=body)
        print('{} had a {} response'.format(blob_name, api_response))

        #extract data from response
        res_json = json.loads(api_response.content.decode('utf-8'))
        http_responses.append(api_response)
        responses.append(res_json)

#load output next cell       
        
#set procedure ending time
end = time.time()
print('-------------------')
print('Conversion completed')
print('-------------------')
print('It took {} seconds to '.format(round(end - start, 2)))

-------------------
Start computer vision
-------------------
https://cloudcomputingplayground.blob.core.windows.net/myimage/1988_george_bush_sr_revolving_door_attack_ad_campaign_frame0.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/1988_george_bush_sr_revolving_door_attack_ad_campaign_frame0.jpg had a <Response [200]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/1988_george_bush_sr_revolving_door_attack_ad_campaign_frame100.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/1988_george_bush_sr_revolving_door_attack_ad_campaign_frame100.jpg had a <Response [200]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/1988_george_bush_sr_revolving_door_attack_ad_campaign_frame200.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/1988_george_bush_sr_revolving_door_attack_ad_campaign_frame200.jpg had a <Response [200]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/198

https://cloudcomputingplayground.blob.core.windows.net/myimage/eisenhower_for_president_1952_frame1300.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/eisenhower_for_president_1952_frame1400.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/eisenhower_for_president_1952_frame1400.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/eisenhower_for_president_1952_frame1500.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/eisenhower_for_president_1952_frame1500.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/eisenhower_for_president_1952_frame1600.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/eisenhower_for_president_1952_frame1600.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/eisenhower_for_president_1952_frame1700.jpg
https://cloudcom

https://cloudcomputingplayground.blob.core.windows.net/myimage/high_quality_famous_daisy_attack_ad_from_1964_presidential_election_frame500.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/high_quality_famous_daisy_attack_ad_from_1964_presidential_election_frame600.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/high_quality_famous_daisy_attack_ad_from_1964_presidential_election_frame600.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/high_quality_famous_daisy_attack_ad_from_1964_presidential_election_frame700.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/high_quality_famous_daisy_attack_ad_from_1964_presidential_election_frame700.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/high_quality_famous_daisy_attack_ad_from_1964_presidential_election_frame800.jpg
https://cloudcomputingplayground.blob.co

https://cloudcomputingplayground.blob.core.windows.net/myimage/kennedy_for_me_campaign_jingle_jfk_1960_frame200.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/kennedy_for_me_campaign_jingle_jfk_1960_frame300.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/kennedy_for_me_campaign_jingle_jfk_1960_frame300.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/kennedy_for_me_campaign_jingle_jfk_1960_frame400.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/kennedy_for_me_campaign_jingle_jfk_1960_frame400.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/kennedy_for_me_campaign_jingle_jfk_1960_frame500.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/kennedy_for_me_campaign_jingle_jfk_1960_frame500.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myima

https://cloudcomputingplayground.blob.core.windows.net/myimage/ronald_reagan_tv_ad_its_morning_in_america_again_frame100.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/ronald_reagan_tv_ad_its_morning_in_america_again_frame1000.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/ronald_reagan_tv_ad_its_morning_in_america_again_frame1000.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/ronald_reagan_tv_ad_its_morning_in_america_again_frame1100.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/ronald_reagan_tv_ad_its_morning_in_america_again_frame1100.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/ronald_reagan_tv_ad_its_morning_in_america_again_frame1200.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/ronald_reagan_tv_ad_its_morning_in_america_again_frame1200.jpg had a <Response [429]> 

https://cloudcomputingplayground.blob.core.windows.net/myimage/yes_we_can__barack_obama_music_video_frame2100.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/yes_we_can__barack_obama_music_video_frame2200.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/yes_we_can__barack_obama_music_video_frame2200.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/yes_we_can__barack_obama_music_video_frame2300.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/yes_we_can__barack_obama_music_video_frame2300.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/yes_we_can__barack_obama_music_video_frame2400.jpg
https://cloudcomputingplayground.blob.core.windows.net/myimage/yes_we_can__barack_obama_music_video_frame2400.jpg had a <Response [429]> response
https://cloudcomputingplayground.blob.core.windows.net/myimage/yes_we_can_

# -- Notebook End --

In [555]:
#import library to display notebook as HTML
import os
from IPython.core.display import HTML

#path to .ccs style script
cur_path = os.path.dirname(os.path.abspath("__file__"))
new_path = os.path.relpath('..\\..\\..\\styles\\custom_styles_public_cloud_computing.css', cur_path)

#function to display notebook
def css():
    style = open(new_path, "r").read()
    return HTML(style)

In [556]:
#run this cell to apply HTML style
css()

In [None]:
#create request to determine sentiment from key phrases
sentiment_mean_key_phrases = []   
sentiment_key_phrases = []

for key in key_phrase:
    data = json.dumps({"documents":[{"id":str(uuid.uuid1()), "language":"en", "text":key}]}).encode('utf-8')
    request = urllib.request.Request(uri_sentiment, data, headers)
    response = urllib.request.urlopen(request)
    responsejson = json.loads(response.read().decode('utf-8'))
    sentiment = responsejson['documents'][0]['score']
    sentiment_key_phrases.append(round(sentiment, 2))
    
sentiment_mean = sum(sentiment_key_phrases)/len(sentiment_key_phrases)
sentiment_mean_key_phrases.append(sentiment_mean)

df_stt['sentiment_key_phrases'] = sentiment_mean_key_phrases

In [170]:
azure_storage_account_name = 'cloudcomputingplayground' # add name
azure_storage_account_key = azure_keys['STORAGE']['API_KEY']
blob_container_name = 'cloudcomputingcontainer'
blob_service = BlockBlobService(azure_storage_account_name, azure_storage_account_key)

In [171]:
blob_name = 'Eisenhower_1952.chunk2.wav'
blob = blob_service.get_blob_to_bytes(blob_container_name, blob_name)

In [172]:
uri_token = 'https://api.cognitive.microsoft.com/sts/v1.0/issueToken'

headers = {'Content-Length': '0', 
           'Ocp-Apim-Subscription-Key': speech_key}

api_response = requests.post(uri_token, headers=headers)

access_token = str(api_response.content.decode('utf-8'))

In [136]:
blob.content[0:20]

b'RIFFD((\x00WAVEfmt \x10\x00\x00\x00'

In [173]:
url_stt_api = 'https://speech.platform.bing.com/recognize' # service address 

headers = {
           'Authorization': 'Bearer {0}'.format(access_token),
           'Content-type': 'audio/wav', 'codec': 'audio/pcm', 'samplerate': '16000'}

params = urllib.parse.urlencode({
    'scenarios': 'ulm',
    'appid': 'D4D52672-91D7-4C74-8AD8-42B1D98141A5', # dont change, it is fixed by design
    'locale': 'en-US', # speech in english
    'device.os': 'PC',
    'version': '3.0',
    'format': 'json', # return value in json
    'instanceid': str(uuid.uuid1()), # any guid
    'requestid': str(uuid.uuid1()),
})

api_response2 = requests.post(url_stt_api, headers=headers, params=params, data=blob.content)

In [174]:
api_response2

<Response [200]>

In [178]:
res_json2 = json.loads(api_response2.content.decode('utf-8'))
#text2 = res_json2['results'][0]['lexical']
res_json2

{'version': '3.0',
 'header': {'status': 'error',
  'properties': {'requestid': 'dc0538ff-1755-4f44-8b4b-d94511da15a8',
   'FALSERECO': '1'}}}

In [122]:
blob_service = BlockBlobService(account_name= storage_name, account_key=storage_key)
uploaded_file = blob_service.list_blobs(container_name)
blob_url_format = 'https://{0}.blob.core.windows.net/{1}/{2}'

blob_bytes = blob_service.get_blob_to_bytes(container_name, blob_name)

token = get_token(speech_key)
r = get_text(token, blob_bytes)

In [123]:
r

<Response [403]>

In [118]:
result = []
for blob in uploaded_file:
    print(blob.name)
    blob_url = blob_url_format.format(blob_service.account_name, container_name, blob.name)
    print(blob_url)
    blob_bytes = blob_service.get_blob_to_bytes(container_name, str(blob.name))
    print(blob_bytes.content[0:20])
    r = get_text(token, blob_bytes)
    print(r)
    #result.append(r)

eisenhower_for_president_1952_chunck_1.wav
https://cloudcomputingplayground.blob.core.windows.net/audio/eisenhower_for_president_1952_chunck_1.wav
b'RIFF\xc4\xea\x1a\x00WAVEfmt \x10\x00\x00\x00'
<Response [403]>


In [860]:
#recombine

for i, response in enumerate(responses):
    if next(iter(responses[i])) == 'statusCode':
        print(responses[i]['statusCode'])
    
    else:
        print('hello')
        
        multiple_tags = dict()

        if len(responses[i]['tags']) > 1:
            print('hello>1')
            print(responses[i]['tags'])
            for j, response in enumerate(responses[i]['tags']):
                print(j, response,'loop')
                multiple_tags['name_{}'.format(j)] = response['name']
        
            print(multiple_tags)
        #append multiple_tags

        elif len(responses[i]['tags']) == 1: 
            print('hello==1')
            print(responses[i]['tags'][0]['name'])
            print(round(responses[i]['tags'][0]['confidence'],2))
            #print(tags_name, tags_confidence)

        else:
            print('Nan')

hello
Nan
hello
hello==1
building
0.86
hello
hello>1
[{'name': 'building', 'confidence': 0.8810973763465881}, {'name': 'silhouette', 'confidence': 0.23954088985919952}, {'name': 'clouds', 'confidence': 0.2054312378168106}, {'name': 'tower', 'confidence': 0.19723017513751984}, {'name': 'distance', 'confidence': 0.12399410456418991}]
0 {'name': 'building', 'confidence': 0.8810973763465881} loop
1 {'name': 'silhouette', 'confidence': 0.23954088985919952} loop
2 {'name': 'clouds', 'confidence': 0.2054312378168106} loop
3 {'name': 'tower', 'confidence': 0.19723017513751984} loop
4 {'name': 'distance', 'confidence': 0.12399410456418991} loop
{'name_0': 'building', 'name_1': 'silhouette', 'name_2': 'clouds', 'name_3': 'tower', 'name_4': 'distance'}
hello
Nan
hello
hello==1
person
0.99
hello
hello>1
[{'name': 'person', 'confidence': 0.899941086769104}, {'name': 'people', 'confidence': 0.6188361644744873}]
0 {'name': 'person', 'confidence': 0.899941086769104} loop
1 {'name': 'people', 'confiden

In [114]:
result

[<Response [403]>]

In [186]:
#speech recognition (basic words count / bag of words)

def get_token(speech_key):
    """return an authorization token by making a 
    HTTP POST request to Cognitive Services with a valid API key"""
    
    url = 'https://api.cognitive.microsoft.com/sts/v1.0/issueToken'
    headers = {'Ocp-Apim-Subscription-Key': speech_key}
    
    token = requests.post(uri_token, headers=headers)

    return(token)

In [103]:
def get_text(token, blob):
    """Request that the Bing Speech API convert the audio to text"""

    url =  'https://speech.platform.bing.com/recognize'
    
    headers = {'Authorization': 'Bearer {0}'.format(token),
               'Content-type': 'audio/wav; codec=audio/pcm; samplerate=16000'}
    
    #'Accept': 'application/json',
    
    params = urllib.parse.urlencode({'scenarios': 'ulm',
                                     'appid': 'D4D52672-91D7-4C74-8AD8-42B1D98141A5',
                                     'locale': 'en-US',
                                     'device.os': 'PC',
                                     'version': '3.0',
                                     'format': 'json',
                                     'instanceid': str(uuid.uuid1()),
                                     'requestid': str(uuid.uuid1())})
    
    response = requests.post(url, headers=headers, params=params, data=blob.content)
    return(response)
#     r = requests.post(url, headers=headers, data=stream_audio_file(audio))
#     results = json.loads(r.content)
#     return(results)

In [None]:
def stream_audio_file(speech_file, chunk_size=1024):
    # Chunk audio file
    with open(speech_file, 'rb') as f:
        while 1:
            data = f.read(1024)
            if not data:
                break
            yield data

In [None]:
YOUR_AUDIO_FILE = 'https://cloudcomputingplayground.blob.core.windows.net/cloudcomputingcontainer/Eisenhower_1952.chunk0.wav'


In [None]:



#key phrases
#sentiment analyses
#assemble into a dataframe


#detect face in thumbnails
#analyze image
#character recognition
#add to dataframe

In [None]:
from IPython.display import Audio, display
import requests
import urllib
import uuid

In [None]:
#import library and retrieve keys

#from azure library import methods to use storage 
from azure.storage.blob import BlockBlobService, PublicAccess

#retrive your keys
import pickle
with open('guides/keys/azure_services_keys.json', 'rb') as handle:
    azure_keys = pickle.load(handle)

#select storage account name and API key from keys
storage_name = azure_keys['STORAGE']['NAME']
storage_key = azure_keys['STORAGE']['API_KEY']

In [None]:
# create a new container

#blob service
blob_service = BlockBlobService(storage_name, storage_key)

In [None]:
#set a name for a new container
new_container_name ='videocontainer'

In [None]:
#create a new container and set public access
try:
    container_status = blob_service.create_container(new_container_name, public_access=PublicAccess.Container) 
    print("{} creation success status: {}".format(new_container_name, container_status))
except:
    print("{} creation failed".format(new_container_name))

In [None]:
os.getcwd()+'\\data\\video\\video.wav\\'

In [None]:
blob_service.create_blob_from_path(new_container_name, local_file_name, full_path)

In [None]:
#get the file from local 
local_path = os.getcwd()+'\\data\\video\\video.wav'
local_file_name = "Eisenhower_1952.chunk0.wav"
full_path = os.path.join(local_path, local_file_name)
full_path


# Upload the created file
try:
    blob_service.create_blob_from_path(new_container_name, local_file_name, full_path)    
    print("{} upload status: successful".format(local_file_name))
except:
    print("{} upload status: failed".format(local_file_name))

In [None]:
#get a complete list of images' name and urls
retrive_blob_list(azure_keys,new_container_name)

In [None]:
# load speech file to process
blob_name = 'Eisenhower_1952.chunk1.wav'
blob = blob_service.get_blob_to_bytes(new_container_name, blob_name)

wav_bytes = Audio(data=blob.content)
display(wav_bytes)

In [None]:
url_token_api = 'https://api.cognitive.microsoft.com/sts/v1.0/issueToken'

In [None]:
headers = {'Content-Length': '0', 'Ocp-Apim-Subscription-Key': storage_key}

#"Content-Type", "application/octet-stream"

api_response = requests.post(url_token_api, headers=headers)

In [None]:
api_response

In [None]:
access_token = str(api_response.content.decode('utf-8'))

# Service
# Call Speech to text service
url_stt_api = 'https://speech.platform.bing.com/recognize' # service address 

headers = {'Authorization': 'Bearer {0}'.format(access_token), 
           'Content-type': 'audio/wav', 
           'codec': 'audio/pcm',  
           'samplerate': '16000'}

# 'Content-Length': len(blob.content), \

params = urllib.parse.urlencode({
    'scenarios': 'ulm',
    'appid': 'D4D52672-91D7-4C74-8AD8-42B1D98141A5',
    'locale': 'en-US',
    'device.os': 'PC',
    'version': '3.0',
    'format': 'json',
    'instanceid': str(uuid.uuid1()),
    'requestid': str(uuid.uuid1()),
})

api_response = requests.post(url_stt_api, headers=headers, params=params, data=blob.content)

In [None]:
api_response

In [None]:
import json
res_json = json.loads(api_response.content.decode('utf-8'))

print(json.dumps(res_json, indent=2, sort_keys=True))