## Speech to Text recognition

#### This notebook demonstrates speech-to-text recognition using the IBM Watson Speech to Text REST API service.
#### The service automatically transcribes audio from 7 languages in real time.
#### It can rapidly identify and transcribe what is being discussed, even from lower-quality audio, across a variety of audio formats and programming interfaces (HTTP REST, WebSocket, asynchronous HTTP).

#### The audio files for the Speech to Text are located here: https://watson-developer-cloud.github.io/doc-tutorial-downloads/speech-to-text/reference/audio-files.zip

#### Let's write some code to use this service


In [1]:
# import required packages
import json
from watson_developer_cloud import SpeechToTextV1
from os.path import join

### IBM Bluemix is used to create the Speech to Text service

In [2]:
# Once the Bluemix service is added, an API key and URL are provided for the call
speech_to_text = SpeechToTextV1(
    iam_apikey='put api key here',
    url='url for api endpoint'
)
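
Hard-coding the API key in a notebook makes it easy to leak when the notebook is shared. The sketch below shows one way to read the two values from environment variables instead. The variable names `STT_APIKEY` and `STT_URL` and the helper `load_credentials` are hypothetical and not part of the Watson SDK.

```python
import os


def load_credentials(env=os.environ):
    """Return (apikey, url) read from environment variables.

    STT_APIKEY and STT_URL are hypothetical names chosen for this sketch.
    """
    apikey = env.get('STT_APIKEY')
    url = env.get('STT_URL')
    if not apikey or not url:
        raise ValueError('Set STT_APIKEY and STT_URL before creating the client')
    return apikey, url
```

The returned pair can then be passed to the constructor shown above, for example `SpeechToTextV1(iam_apikey=apikey, url=url)`.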

### The "en-US_BroadbandModel" model is used for speech-to-text recognition.

In [3]:
# connect to service and use en-US_BroadbandModel model
try:
    speech_model = speech_to_text.get_model('en-US_BroadbandModel').get_result()
    print(json.dumps(speech_model, indent=2)) 
except Exception as e:
    # report any failure while fetching the model description
    print("Error occurred: " + str(e))

{
  "rate": 16000,
  "name": "en-US_BroadbandModel",
  "language": "en-US",
  "sessions": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions?model=en-US_BroadbandModel",
  "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/en-US_BroadbandModel",
  "supported_features": {
    "custom_language_model": true,
    "speaker_labels": true
  },
  "description": "US English broadband model."
}


#### Now our service is ready to be called for any speech to text conversion 

In [4]:
with open('audio-file2.flac','rb') as audio_file:
    speech_recognition_results = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/flac',
        timestamps=True,
        word_alternatives_threshold=0.9,
        keywords=['colorado', 'tornado', 'tornadoes'],
        keywords_threshold=0.5
    ).get_result()
print(json.dumps(speech_recognition_results, indent=2))

{
  "results": [
    {
      "word_alternatives": [
        {
          "start_time": 0.15,
          "alternatives": [
            {
              "confidence": 0.9999,
              "word": "a"
            }
          ],
          "end_time": 0.3
        },
        {
          "start_time": 0.3,
          "alternatives": [
            {
              "confidence": 1.0,
              "word": "line"
            }
          ],
          "end_time": 0.64
        },
        {
          "start_time": 0.64,
          "alternatives": [
            {
              "confidence": 1.0,
              "word": "of"
            }
          ],
          "end_time": 0.73
        },
        {
          "start_time": 0.73,
          "alternatives": [
            {
              "confidence": 1.0,
              "word": "severe"
            }
          ],
          "end_time": 1.08
        },
        {
          "start_time": 1.08,
          "alternatives": [
            {
              "confidence": 1.0,

In [5]:
# convert the output to a JSON string to make it more readable
json_string = json.dumps(speech_recognition_results, indent=2)

In [6]:
# let's check type of the response 
type(speech_recognition_results)

dict

#### From the JSON string, we can see that we have to select the right object to get the transcript. The text is stored in the `transcript` field of the first entry under `alternatives`.

In [7]:
print((speech_recognition_results['results'][0]).keys())

dict_keys(['word_alternatives', 'keywords_result', 'alternatives', 'final'])
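
Besides `alternatives`, each result also carries the `word_alternatives` list seen in the output above, where every entry has a `start_time`, an `end_time`, and a list of candidate words with a `confidence`. A minimal sketch of flattening that structure into rows follows. The helper name `word_timings` is hypothetical, and `sample_result` is a hand-copied fragment of the output above rather than a live response.

```python
def word_timings(result):
    """Return (word, start_time, end_time, confidence) tuples for one result entry."""
    rows = []
    for slot in result.get('word_alternatives', []):
        for alt in slot['alternatives']:
            rows.append((alt['word'], slot['start_time'],
                         slot['end_time'], alt['confidence']))
    return rows


# fragment copied from the printed response above
sample_result = {
    'word_alternatives': [
        {'start_time': 0.15, 'end_time': 0.3,
         'alternatives': [{'confidence': 0.9999, 'word': 'a'}]},
        {'start_time': 0.3, 'end_time': 0.64,
         'alternatives': [{'confidence': 1.0, 'word': 'line'}]},
    ]
}

for row in word_timings(sample_result):
    print(row)
```

With the real response, the same helper would be called as `word_timings(speech_recognition_results['results'][0])`.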


In [9]:
# Extract the final transcript; it matches the audio file.

transcript = speech_recognition_results['results'][0]['alternatives'][0]['transcript']

In [10]:
print(transcript)

a line of severe thunderstorms with several possible tornadoes is approaching Colorado on Sunday 
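
The cell above reads only `results[0]`. Longer recordings may come back with several entries in `results`, so a small helper that joins the top alternative of each entry can be useful. This is a sketch under that assumption: `full_transcript` is a hypothetical name, and `sample_response` is made-up data shaped like the response above.

```python
def full_transcript(response):
    """Join the first alternative's transcript from every entry in 'results'."""
    parts = []
    for result in response.get('results', []):
        alternatives = result.get('alternatives', [])
        if alternatives:
            parts.append(alternatives[0]['transcript'].strip())
    return ' '.join(parts)


# made-up data in the same shape as the service response
sample_response = {
    'results': [
        {'alternatives': [{'transcript': 'a line of severe thunderstorms '}], 'final': True},
        {'alternatives': [{'transcript': 'is approaching Colorado on Sunday '}], 'final': True},
    ]
}

print(full_transcript(sample_response))
```

Calling `full_transcript(speech_recognition_results)` on the response from this notebook should give the same text as the single-result lookup, since that response is indexed at `results[0]` only.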


#### This code demonstrates speech-to-text transcription of audio files using IBM Watson's API. The IBM Speech to Text service transcribes audio to text, enabling speech transcription capabilities in applications.

#### The full API documentation can be found here: https://ng.bluemix.net/apidocs/speech-to-text