# Speech-to-Text, Natural Language Understanding and Text-to-Speech
To convert an audio file to text, we use the [IBM Watson Speech-to-Text service](http://www.ibm.com/watson/developercloud/speech-to-text.html), which can accompany each recognized word with a confidence level and its start and end times. We also rely on the [official Watson Python SDK](https://github.com/watson-developer-cloud/python-sdk) to interact with the APIs.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as md
import time
import json
%matplotlib notebook

## How to get your own API key for [IBM Watson Services](https://cloud.ibm.com/catalog?category=ai):

1. Create a free account at IBM Cloud and get 6 months of free access to it via the [IBM Academic Initiative](https://my15.digitalexperience.ibm.com/b73a5759-c6a6-4033-ab6b-d9d4f9a6d65b/dxsites/151914d1-03d2-48fe-97d9-d21166848e65/home) or via https://cognitiveclass.ai/ibm-cloud-promotion/
2. Log in to IBM Cloud and navigate to the Watson AI Services page in the Catalog: https://cloud.ibm.com/catalog?category=ai
![Watson Services on IBM Cloud](http://analytics.romanko.ca/images/IBM_Cloud_AI1.png "Watson Services on IBM Cloud")
3. In the IBM Cloud catalog, find the **Speech to Text service** under the "AI" category, and then, as in the screenshot below, click "Create Service"
![Speech to Text Watson service on IBM Cloud](http://analytics.romanko.ca/images/Speech_to_Text4.png "Speech to Text Watson service on IBM Cloud")
4. Once you have created the service, you can access it via your dashboard: [https://cloud.ibm.com/dashboard/](https://cloud.ibm.com/dashboard/services?env_id=ibm:yp:us-south)
5. To get your API key, follow the instructions in the screenshot below. Go to "Service credentials", click "New credential" and copy the `apikey` value into this Python notebook.
![Speech to Text Watson service on IBM Cloud](http://analytics.romanko.ca/images/Speech_to_Text3.png "Speech to Text Watson service on IBM Cloud")
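
Instead of pasting the key directly into a cell, one option is to read it from an environment variable so it does not end up in a shared notebook. The sketch below is only an illustration: the variable name `WATSON_STT_APIKEY` and the helper `load_api_key` are hypothetical and not part of the Watson SDK.

```python
import os


def load_api_key(env_var="WATSON_STT_APIKEY"):
    """Return the API key stored in env_var (a hypothetical name), or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your apikey")
    return key
```

The returned string can then be passed wherever the notebook passes a key literal, for example `IAMAuthenticator(load_api_key())`.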

Download and play the input audio file.

In [2]:
try:
    import wget
except ImportError:
    # Install wget only if it is not already available
    !pip install wget
    import wget

Collecting wget
  Downloading https://files.pythonhosted.org/packages/47/6a/62e288da7bcda82b935ff0c6cfe542970f04e29c756b0e147251b2fb251f/wget-3.2.zip
Building wheels for collected packages: wget
  Building wheel for wget (setup.py) ... done
  Stored in directory: /home/dsxuser/.cache/pip/wheels/40/15/30/7d8f7cea2902b4db79e3fea550d7d7b85ecb27ef992b618f3f
Successfully built wget
Installing collected packages: wget
Successfully installed wget-3.2


In [3]:
link_to_audio = 'http://analytics.romanko.ca/data/sample.wav'
filename = wget.download(link_to_audio)

print(filename)

sample.wav


In [5]:
import IPython.display

# Embed an audio player for the downloaded file
IPython.display.Audio(filename)

### Speech-to-Text

In [6]:
try:
    from ibm_watson import SpeechToTextV1
except ImportError:
    # Install the Watson SDK only if it is not already available
    !pip install ibm-watson
    from ibm_watson import SpeechToTextV1

Collecting ibm-watson
  Downloading https://files.pythonhosted.org/packages/d0/30/6e444a420b533b53e6b8ab4318ce1a9b19662067515aca0351403bdb615c/ibm-watson-4.0.1.tar.gz (297kB)
Collecting websocket-client==0.48.0 (from ibm-watson)
  Downloading https://files.pythonhosted.org/packages/8a/a1/72ef9aa26cfe1a75cee09fc1957e4723add9de098c15719416a1ee89386b/websocket_client-0.48.0-py2.py3-none-any.whl (198kB)
Collecting ibm_cloud_sdk_core==1.0.0 (from ibm-watson)
  Downloading https://files.pythonhosted.org/packages/e8/43/a13a5956c69b7becce7a0df6d2340c1e32322df3b39f57a3b33dc4645a34/ibm-cloud-sdk-core-1.0.0.tar.gz
Collecting PyJWT>=1.7.1 (from ibm_cloud_sdk_core==1.0.0->ibm-watson)
  Downloading https://files.pythonhosted.org/packages/87/8b/6a9f14b5f781697e51259d81657e6048fd31a113229cf346880bb7545565/PyJWT-1.7.1-py2.py3-none-any.whl
Building

In [10]:
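from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Authenticate with the apikey copied from "Service credentials" (replace it with your own)
speech_to_text = SpeechToTextV1(
    authenticator=IAMAuthenticator('Pj0eRgxR1GjWdK7znIv5qBZ9S5RZMwc25hdMkdmJM0QC')
)

# Send the WAV file for recognition; the with-block closes the file afterwards
with open(filename, "rb") as audio_file:
    result = speech_to_text.recognize(audio_file, content_type="audio/wav").get_result()

result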

{'results': [{'alternatives': [{'confidence': 0.99,
     'transcript': 'thunderstorms could produce large hail isolated tornadoes and heavy rain '}],
   'final': True}],
 'result_index': 0}
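
The result above carries only a transcript-level confidence. Per-word timings and confidences are returned when the request asks for them, which with the Python SDK means passing `timestamps=True` and `word_confidence=True` to `recognize`. The sketch below shows one way to flatten those fields into rows. The `sample_result` dictionary is hand-written to mirror the response shape and is not real service output, and `word_details` is a hypothetical helper name.

```python
# Hand-written stand-in for a response requested with
# timestamps=True and word_confidence=True (values are illustrative only)
sample_result = {
    "results": [{
        "alternatives": [{
            "confidence": 0.99,
            "transcript": "heavy rain ",
            "timestamps": [["heavy", 0.0, 0.4], ["rain", 0.4, 0.9]],
            "word_confidence": [["heavy", 0.98], ["rain", 0.97]],
        }],
        "final": True,
    }],
    "result_index": 0,
}


def word_details(result):
    """Collect one row per word from the top alternative of every result chunk."""
    rows = []
    for chunk in result.get("results", []):
        best = chunk["alternatives"][0]
        confidences = best.get("word_confidence", [])
        for i, (word, start, end) in enumerate(best.get("timestamps", [])):
            # Confidence entries are matched to timestamp entries by position
            conf = confidences[i][1] if i < len(confidences) else None
            rows.append({"word": word, "start": start, "end": end, "confidence": conf})
    return rows


for row in word_details(sample_result):
    print(row)
```

The list of dictionaries can be passed straight to `pd.DataFrame` if a tabular view is more convenient.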

In [11]:
transcript = result["results"][0]["alternatives"][0]["transcript"]
print(transcript)

thunderstorms could produce large hail isolated tornadoes and heavy rain 


In [12]:
# Save the results locally (as JSON text) for later use
with open("speech_to_text_res.txt", "w") as f:
    json.dump(result, f)
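
A saved result can be reloaded later without calling the service again. The following is a minimal sketch of that round trip. It writes to a temporary directory and uses a shortened copy of the recognition result shown earlier so that it runs on its own; the variable names are illustrative.

```python
import json
import os
import tempfile

# Shortened copy of the recognition result shown earlier
saved_result = {
    "results": [{
        "alternatives": [{
            "confidence": 0.99,
            "transcript": "thunderstorms could produce large hail isolated tornadoes and heavy rain ",
        }],
        "final": True,
    }],
    "result_index": 0,
}

# Write the result as JSON text, as in the cell above
path = os.path.join(tempfile.mkdtemp(), "speech_to_text_res.txt")
with open(path, "w") as f:
    json.dump(saved_result, f)

# Read it back and pull out the transcript again
with open(path) as f:
    reloaded = json.load(f)

transcript_again = reloaded["results"][0]["alternatives"][0]["transcript"]
print(transcript_again)
```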

### Natural Language Understanding of Text

To analyze the transcribed text, we use the [IBM Watson Natural Language Understanding service](http://www.ibm.com/watson/services/natural-language-understanding/).

In [14]:
from ibm_watson import NaturalLanguageUnderstandingV1 as NLU
from ibm_watson.natural_language_understanding_v1 import Features, EntitiesOptions, KeywordsOptions

nlu = NLU(
    authenticator=IAMAuthenticator('RBBVRlCLOQ3mVZVQjXtyTw6cgb6tUcWmac0xrjp0GrZe'),
    version='2018-11-16'
)

In [15]:
response = nlu.analyze(
    text=transcript,
    features=Features(
        entities=EntitiesOptions(emotion=True, sentiment=True, limit=2),
        keywords=KeywordsOptions(emotion=True, sentiment=True, limit=2),
    ),
).get_result()
print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 73,
    "features": 2
  },
  "language": "en",
  "keywords": [
    {
      "text": "large hail",
      "sentiment": {
        "score": -0.779156,
        "label": "negative"
      },
      "relevance": 0.994203,
      "emotion": {
        "sadness": 0.31748,
        "joy": 0.242799,
        "fear": 0.21567,
        "disgust": 0.020892,
        "anger": 0.063758
      },
      "count": 1
    },
    {
      "text": "heavy rain",
      "sentiment": {
        "score": -0.779156,
        "label": "negative"
      },
      "relevance": 0.917787,
      "emotion": {
        "sadness": 0.31748,
        "joy": 0.242799,
        "fear": 0.21567,
        "disgust": 0.020892,
        "anger": 0.063758
      },
      "count": 1
    }
  ],
  "entities": []
}


In [16]:
for keyword in response['keywords']:
    print(keyword['text'], "-", keyword['sentiment']['label'], "sentiment")

large hail - negative sentiment
heavy rain - negative sentiment
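
Beyond the sentiment label, each keyword in the response also carries a relevance score and a set of emotion scores. A small sketch of summarizing them is shown below. The `keywords` list is copied from the response printed earlier so the example runs without calling the service, and `dominant_emotion` is a hypothetical helper.

```python
# Keyword entries copied from the NLU response printed earlier
keywords = [
    {
        "text": "large hail",
        "sentiment": {"score": -0.779156, "label": "negative"},
        "relevance": 0.994203,
        "emotion": {"sadness": 0.31748, "joy": 0.242799, "fear": 0.21567,
                    "disgust": 0.020892, "anger": 0.063758},
    },
    {
        "text": "heavy rain",
        "sentiment": {"score": -0.779156, "label": "negative"},
        "relevance": 0.917787,
        "emotion": {"sadness": 0.31748, "joy": 0.242799, "fear": 0.21567,
                    "disgust": 0.020892, "anger": 0.063758},
    },
]


def dominant_emotion(keyword):
    """Return the (name, score) pair of the highest-scoring emotion."""
    emotions = keyword["emotion"]
    name = max(emotions, key=emotions.get)
    return name, emotions[name]


# List keywords from most to least relevant with their strongest emotion
for kw in sorted(keywords, key=lambda k: k["relevance"], reverse=True):
    name, score = dominant_emotion(kw)
    print(f"{kw['text']}: relevance={kw['relevance']:.3f}, strongest emotion={name} ({score:.3f})")
```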


### Text-to-Speech

To convert text to an audio file, we use the [IBM Watson Text-to-Speech service](http://www.ibm.com/watson/developercloud/text-to-speech.html).

In [17]:
from ibm_watson import TextToSpeechV1

# Text to Speech
text_to_speech = TextToSpeechV1(
    authenticator=IAMAuthenticator('DEBhCnUBakNiM3Kcn0TFiQzSbLrsBK7cdPek2ClIzBWZ')
)

# Describe the second keyword returned by the NLU analysis
keyword = response['keywords'][1]
response_text = 'I detected this keyword "' + keyword['text'] + '" with ' + keyword['sentiment']['label'] + ' sentiment'

# Synthesize the sentence and write the WAV bytes; the with-block closes the file before playback
audio_data = text_to_speech.synthesize(response_text, accept="audio/wav").get_result().content
with open("output.wav", "wb") as output_audio_file:
    output_audio_file.write(audio_data)
print(response_text)

I detected this keyword "heavy rain" with negative sentiment


In [18]:
IPython.display.Audio("output.wav")

The following cells apply speech-to-text to two more recordings, including a longer conversational sample.

In [19]:
link_to_audio1 = 'http://analytics.romanko.ca/data/0001.wav'
filename1 = wget.download(link_to_audio1)
# The with-block closes the audio file once the request has completed
with open(filename1, "rb") as audio_file1:
    result1 = speech_to_text.recognize(audio_file1, content_type="audio/wav").get_result()
print("\n")
print(result1["results"][0]["alternatives"][0]["transcript"])



several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday 


In [20]:
link_to_audio2 = 'http://analytics.romanko.ca/data/en-US_Broadband_sample1.wav'
filename2 = wget.download(link_to_audio2)
# The with-block closes the audio file once the request has completed
with open(filename2, "rb") as audio_file2:
    result2 = speech_to_text.recognize(audio_file2, content_type="audio/wav").get_result()
print("\n")
print(result2["results"][0]["alternatives"][0]["transcript"])



so thank you very much for coming David it's good to have you here good as my pleasure Michael glad to be with you how real is artificial intelligence the question of how real is artificial intelligence is a complex one on I would say %HESITATION if if we define artificial intelligence is the ability of a machine on its own to understand large volumes of data to reason that data with a purpose to it to predict the future and then tell you continue to learn and get better that is happening today in certain fields how far in the continuum is IBM Watson in operability artificial intelligence yes so so first of all once once it's actually intelligent it will no longer be artificial so we're moving to the point that these systems increasingly understand enormous volumes of data 


In [21]:
result2

{'results': [{'alternatives': [{'confidence': 0.98,
     'transcript': "so thank you very much for coming David it's good to have you here good as my pleasure Michael glad to be with you how real is artificial intelligence the question of how real is artificial intelligence is a complex one on I would say %HESITATION if if we define artificial intelligence is the ability of a machine on its own to understand large volumes of data to reason that data with a purpose to it to predict the future and then tell you continue to learn and get better that is happening today in certain fields how far in the continuum is IBM Watson in operability artificial intelligence yes so so first of all once once it's actually intelligent it will no longer be artificial so we're moving to the point that these systems increasingly understand enormous volumes of data "}],
   'final': True}],
 'result_index': 0}
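
The cells above read only `results[0]`, which is enough here because each response contains a single chunk. For longer recordings the service may split the transcript across several entries in `results`, and the text can contain `%HESITATION` markers as seen above. The sketch below joins all chunks and optionally drops those markers. The helper `full_transcript` is hypothetical, and `sample` is hand-written from phrases in the transcript above, not real service output.

```python
def full_transcript(result, drop_hesitations=True):
    """Join the top alternative of every result chunk into one string."""
    parts = [
        chunk["alternatives"][0]["transcript"].strip()
        for chunk in result.get("results", [])
        if chunk.get("alternatives")
    ]
    text = " ".join(parts)
    if drop_hesitations:
        # Remove the placeholder the service emits for filler sounds
        text = " ".join(word for word in text.split() if word != "%HESITATION")
    return text


# Hand-written two-chunk example built from phrases in the transcript above
sample = {
    "results": [
        {"alternatives": [{"transcript": "I would say %HESITATION if if we define "}], "final": True},
        {"alternatives": [{"transcript": "that is happening today "}], "final": True},
    ],
    "result_index": 0,
}

print(full_transcript(sample))
```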