# 14.6 Case Study: Traveler’s Companion Translation App
* Use three **IBM Watson services** to quickly **implement** a **traveler’s companion translation app**
    * Enables people who **speak only English** and **speak only Spanish** to **converse**
* Combining services like this is known as creating a **mashup**

## 14.6.1 Before You Run the App 
* You’ll use **Lite (free) tiers** 
* Requires an **IBM Cloud account** and **credentials** for each service
* **Once you have your credentials** (described below), **insert them in our `keys.py` file** (located in the `ch14` examples folder) that we import into the example.
* Never share your credentials. 
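
For orientation, a `keys.py` along the following lines would provide the three variable names that the registration steps below refer to. This is only a sketch: the placeholder strings are assumptions for illustration, and the file shipped in the `ch14` examples folder may contain additional comments or variables.

```python
# keys.py (sketch): one string per Watson service.
# Replace each placeholder with the API key copied from IBM Cloud.
speech_to_text_key = 'your Speech to Text API key'
text_to_speech_key = 'your Text to Speech API key'
translate_key = 'your Language Translator API key'
```

Keeping the keys in a separate module means the main script can be shared without exposing them.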

### Registering for the Speech to Text Service 
* App uses **Watson Speech to Text service** to transcribe English and Spanish audio files to English and Spanish text, respectively. 
* To get credentials:
	1. **Create a Service Instance:** Go to https://console.bluemix.net/catalog/services/speech-to-text and click the **Create** button on the bottom of the page. This auto-generates an API key for you and takes you to a tutorial for working with the Speech to Text service.
	2. **Get Your Service Credentials:** To see your API key, click **Manage** at the top-left of the page. To the right of **Credentials**, click **Show credentials**, then copy the **API Key**, and paste it into the variable `speech_to_text_key`’s string in the `keys.py` file provided in this chapter’s `ch14` examples folder. 

### Registering for the Text to Speech Service
* In this app, you’ll use the **Watson Text to Speech service** to synthesize speech from text. 
* To get credentials:
	1. **Create a Service Instance:** Go to https://console.bluemix.net/catalog/services/text-to-speech and click the **Create** button on the bottom of the page. This auto-generates an API key for you and takes you to a tutorial for working with the Text to Speech service.
	2. **Get Your Service Credentials:** To see your API key, click **Manage** at the top-left of the page. To the right of **Credentials**, click **Show credentials**, then copy the **API Key** and paste it into the variable `text_to_speech_key`’s string in the `keys.py` file provided in this chapter’s `ch14` examples folder. 

### Registering for the Language Translator Service
* In this app, you’ll use the **Watson Language Translator service** to pass text to Watson and receive back the text translated into another language. 
* To get credentials:
	1. **Create a Service Instance:** Go to https://console.bluemix.net/catalog/services/language-translator and click the **Create** button on the bottom of the page. This **auto-generates an API key** for you and takes you to a page to manage your instance of the service.
	2. **Get Your Service Credentials:** To the right of **Credentials**, click **Show credentials**, then copy the **API Key** and paste it into the variable `translate_key`’s string in the `keys.py` file provided in this chapter’s `ch14` examples folder. 

### Retrieving Your Credentials 
* To view your credentials at any time, click the appropriate service instance at: 
>https://console.bluemix.net/dashboard/apps

## 14.6.2 Test-Driving the App
* Once you’ve added your credentials to the script, run it by executing the following command from the `ch14` examples folder:
```python
ipython SimpleLanguageTranslator.py
```
* **NOTE:** The `pydub.playback` module we use in this app may issue a warning about features we don’t use. You can ignore it. To eliminate it, install `ffmpeg` for Windows, macOS or Linux from https://www.ffmpeg.org. 

### Processing the Question
* The app performs **10 steps**, which we point out via comments in the code. 
* **Step 1** prompts for and records a question. 
    * First, the app displays the following text and waits for you to press _Enter_:
    >`Press Enter then ask your question in English`
    * When you do, the app displays:
    >`Recording 5 seconds of audio`
    * Speak your question: We said, “Where is the closest bathroom?” 
    * After five seconds, the app displays:
    >`Recording complete`

### Processing the Question (cont.)
* **Step 2** interacts with **Watson’s Speech to Text service** to **transcribe your audio to text** and displays the result:
>`English: where is the closest bathroom` 
* **Step 3** then uses **Watson’s Language Translator service** to **translate the English text to Spanish** and displays the translated text returned by Watson:
>`Spanish: ¿Dónde está el baño más cercano?` 
* **Step 4** passes this Spanish text to **Watson’s Text to Speech service** to **convert the text to an audio file**. 
* **Step 5** plays the resulting Spanish audio file. 

### Processing the Response
* At this point, we’re ready to **process the Spanish speaker’s response**. 
* **Step 6** displays the following text and waits for you to press _Enter_:
>`Press Enter then speak the Spanish answer`
* When you do, the app displays the following text and the **Spanish speaker records a response**:
>`Recording 5 seconds of audio`
    * **We do not speak Spanish, so we used Watson’s Text to Speech service to _prerecord_ Watson saying the Spanish response “El baño más cercano está en el restaurante,” then played that audio loud enough for our computer’s microphone to record it.**
    * We provide this **prerecorded audio** for you as `SpokenResponse.wav` in the `ch14` folder. 
    * If you use this file, play it quickly after pressing _Enter_ above, because the app records for only 5 seconds.
    * To ensure that the audio loads and plays quickly, you might want to play it once before you press _Enter_ to begin recording. 
    * For simplicity, we set the app to record five seconds of audio. You can control the duration with the variable `SECONDS` in function `record_audio`. It’s possible to create a recorder that begins recording once it detects sound and stops recording after a period of silence, but the code is more complicated.
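
To give a feel for what sound-triggered recording involves, here is a minimal sketch, not part of `SimpleLanguageTranslator.py`. It measures the loudness (root-mean-square level) of each chunk of 16-bit samples, skips quiet chunks until sound begins, and stops after a run of quiet chunks. The helper names `rms` and `keep_until_silence`, the threshold of 500 and the limit of three quiet chunks are all assumptions chosen for illustration; the chunks are simulated rather than read from a microphone.

```python
from array import array
import math

def rms(chunk):
    """Return the root-mean-square level of a chunk of 16-bit samples."""
    samples = array('h', chunk)  # 'h' = signed 16-bit, native byte order
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def keep_until_silence(chunks, threshold=500, max_silent_chunks=3):
    """Skip leading quiet chunks, then keep chunks until
    max_silent_chunks consecutive quiet chunks occur."""
    kept = []
    silent_run = 0
    started = False

    for chunk in chunks:
        loud = rms(chunk) >= threshold

        if not started:
            if not loud:
                continue  # still waiting for sound to begin
            started = True

        kept.append(chunk)
        silent_run = 0 if loud else silent_run + 1

        if silent_run >= max_silent_chunks:
            break  # enough trailing silence, so stop

    return kept

# simulated microphone input: 2 quiet chunks, 3 loud chunks, 4 quiet chunks
quiet_chunk = array('h', [0] * 1024).tobytes()
loud_chunk = array('h', [2000, -2000] * 512).tobytes()
simulated = [quiet_chunk] * 2 + [loud_chunk] * 3 + [quiet_chunk] * 4

kept = keep_until_silence(simulated)
print(len(kept))  # 6: the 3 loud chunks plus 3 trailing quiet chunks
```

In a real recorder, the chunks would presumably come from repeated `audio_stream.read(CHUNK)` calls, and a suitable threshold would depend on the microphone and the room.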


### Processing the Response (cont.)
* After five seconds, the app displays:
>`Recording complete`
* **Step 7** interacts with **Watson’s Speech to Text service** to **transcribe the Spanish audio to text** and displays the result:
>`Spanish response: el baño más cercano está en el restaurante` 
* **Step 8** then uses **Watson’s Language Translator service** to **translate the Spanish text to English** and displays the result:
>`English response: The nearest bathroom is in the restaurant` 
* **Step 9** passes the English text to **Watson’s Text to Speech service** to **convert the text to an audio file**. 
* **Step 10** then plays the resulting English audio. 

## 14.6.3 `SimpleLanguageTranslator.py` Script Walkthrough
* The script is divided into **10 steps**, each marked with comments in the code
* **Processing the English question** 
> **Step 1:** Prompt for then **record English speech** into an audio file  
**Step 2:** **Transcribe** the English speech to **English text**  
**Step 3:** **Translate** the English text into Spanish text  
**Step 4:** **Synthesize** the Spanish text into **Spanish speech** and save it into an audio file  
**Step 5:** **Play** the Spanish **audio** file  
* **Processing the Spanish response**  
> **Step 6:** Prompt for then **record Spanish speech** into an audio file  
**Step 7:** **Transcribe** the Spanish speech to **Spanish text**  
**Step 8:** **Translate** the Spanish text into English text  
**Step 9:** **Synthesize** the English text into **English speech** and save it into an audio file  
**Step 10:** **Play** the English **audio**

### Importing Watson SDK Classes  
* **`SpeechToTextV1`** 
    * Passes an **audio file** to the **Watson Speech to Text service** 
    * Receives a **JSON document** containing the **text transcription**
* **`LanguageTranslatorV3`** 
    * Passes **text** to the **Watson Language Translator service** 
    * Receives a **JSON document** containing the **translated text** 
* **`TextToSpeechV1`** 
    * Passes **text** to the **Watson Text to Speech service** 
    * Receives **audio** of the text **spoken in a specified language**


```python
# SimpleLanguageTranslator.py
"""Use IBM Watson Speech to Text, Language Translator and Text to Speech
   APIs to enable English and Spanish speakers to communicate."""
from ibm_watson import SpeechToTextV1
from ibm_watson import LanguageTranslatorV3
from ibm_watson import TextToSpeechV1
```

### Other Imported Modules
* **`pyaudio`** for **recording audio** 
* **`pydub`** and **`pydub.playback`** to **load and play audio files**
* **`wave`** to save **WAV (Waveform Audio File Format) files**

```python
import keys  # contains your API keys for accessing Watson services
import pyaudio  # used to record from mic
import pydub  # used to load a WAV file
import pydub.playback  # used to play a WAV file
import wave  # used to save a WAV file
```

### Main Program: Function `run_translator` (1 of 6)
* **`run_translator`** invoked when **`SimpleLanguageTranslator.py` executed as a script**

```python
def run_translator():
    """Calls the functions that interact with Watson services."""
    # Step 1: Prompt for then record English speech into an audio file 
    input('Press Enter then ask your question in English')
    record_audio('english.wav')
```

### Main Program: Function `run_translator` (2 of 6)
* **Step 2**: Call **`speech_to_text`**
    * **Speech to Text service** transcribes text using **predefined models**
        * Most languages have **broadband** (**>=16 kHz**) and **narrowband** (**<16 kHz**) models (based on **audio quality**)
        * App **captures** audio at **44.1 kHz**, so we use **`'en-US_BroadbandModel'`**

```python
    # Step 2: Transcribe the English speech to English text
    english = speech_to_text(
        file_name='english.wav', model_id='en-US_BroadbandModel')
    print('English:', english)  # display transcription
```
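
The broadband/narrowband rule above can be expressed as a small helper. This is a sketch only: `choose_model` is a hypothetical function that is not part of the script, and it assumes the `<language>_<Band>Model` naming pattern seen in the two model names this section uses. Check the service's list of models before relying on a generated name.

```python
def choose_model(frame_rate, language='en-US'):
    """Return a Speech to Text model name for the given sampling rate.

    Assumes the naming pattern '<language>_BroadbandModel' or
    '<language>_NarrowbandModel'; not every language has both.
    """
    band = 'Broadband' if frame_rate >= 16000 else 'Narrowband'
    return f'{language}_{band}Model'

print(choose_model(44100))          # en-US_BroadbandModel
print(choose_model(8000, 'es-ES'))  # es-ES_NarrowbandModel
```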

### Main Program: Function `run_translator` (3 of 6)
* **Step 3**: Call **`translate`**
    * **Predefined model `'en-es'`** translates from **English (`en`) to Spanish (`es`)**

```python
    # Step 3: Translate the English text into Spanish text
    spanish = translate(text_to_translate=english, model='en-es')
    print('Spanish:', spanish)  # display translated text
```

### Main Program: Function `run_translator` (4 of 6)
* **Step 4**: Call **`text_to_speech`**
    * **Voice `'es-US_SofiaVoice'`** is for Spanish as spoken in the U.S.

```python    
    # Step 4: Synthesize the Spanish text into Spanish speech 
    text_to_speech(text_to_speak=spanish, voice_to_use='es-US_SofiaVoice',
        file_name='spanish.wav')
```

### Main Program: Function `run_translator` (5 of 6)
* **Step 5**: Call **`play_audio`** to play the file **`'spanish.wav'`**.

```python
    # Step 5: Play the Spanish audio file
    play_audio(file_name='spanish.wav')
```

### Main Program: Function `run_translator` (6 of 6)
* **Steps 6–10** repeat previous steps for **Spanish speech to English speech**: 
    * **Step 6** **records** the Spanish audio
    * **Step 7** **transcribes** the **Spanish audio** to Spanish text using predefined model **`'es-ES_BroadbandModel'`**
    * **Step 8** **translates** the **Spanish text** to English text using predefined model **`'es-en'`** (Spanish-to-English)
    * **Step 9** **creates** the **English audio** using **`'en-US_AllisonVoice'`**
    * **Step 10** **plays** the English **audio**

```python
    # Step 6: Prompt for then record Spanish speech into an audio file
    input('Press Enter then speak the Spanish answer')
    record_audio('spanishresponse.wav')

    # Step 7: Transcribe the Spanish speech to Spanish text
    spanish = speech_to_text(
        file_name='spanishresponse.wav', model_id='es-ES_BroadbandModel')
    print('Spanish response:', spanish)

    # Step 8: Translate the Spanish text to English text
    english = translate(text_to_translate=spanish, model='es-en')
    print('English response:', english)

    # Step 9: Synthesize the English text to English speech
    text_to_speech(text_to_speak=english,
        voice_to_use='en-US_AllisonVoice',
        file_name='englishresponse.wav')

    # Step 10: Play the English audio
    play_audio(file_name='englishresponse.wav')
```
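
Steps 1–5 and Steps 6–10 follow the same pattern (record, transcribe, translate, synthesize, play) with different arguments. As one possible refactoring, the sketch below folds that pattern into a single hypothetical helper named `relay`. It is not how the script is written: the helper, its parameter names and the `services` dictionary are assumptions made so the flow can be exercised with stand-ins instead of a microphone and Watson.

```python
def relay(prompt, stt_model, translation_model, voice, prefix, services):
    """Run one record -> transcribe -> translate -> speak -> play pass.

    services is a dict of callables. In the real script these would be
    input, record_audio, speech_to_text, translate, text_to_speech
    and play_audio.
    """
    recorded, spoken = f'{prefix}_in.wav', f'{prefix}_out.wav'
    services['prompt'](prompt)
    services['record'](recorded)
    heard = services['transcribe'](file_name=recorded, model_id=stt_model)
    translated = services['translate'](
        text_to_translate=heard, model=translation_model)
    services['synthesize'](text_to_speak=translated, voice_to_use=voice,
        file_name=spoken)
    services['play'](file_name=spoken)
    return heard, translated

# stand-ins that log calls instead of touching a mic or Watson
log = []
fakes = {
    'prompt': lambda text: log.append('prompt'),
    'record': lambda file_name: log.append('record'),
    'transcribe': lambda file_name, model_id:
        'where is the closest bathroom',
    'translate': lambda text_to_translate, model:
        '¿Dónde está el baño más cercano?',
    'synthesize': lambda text_to_speak, voice_to_use, file_name:
        log.append('synthesize'),
    'play': lambda file_name: log.append('play'),
}

heard, translated = relay('Press Enter then ask your question in English',
    'en-US_BroadbandModel', 'en-es', 'es-US_SofiaVoice', 'question', fakes)
print(heard, '->', translated)
```

Passing the functions in keeps the helper testable; the script's explicit ten-step version is easier to follow when first reading the code.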

### Function `speech_to_text` (1 of 4) Accesses **Watson’s Speech to Text Service**
```python
def speech_to_text(file_name, model_id):
    """Use Watson Speech to Text to convert audio file to text."""
    # create Watson Speech to Text client 
    stt = SpeechToTextV1(iam_apikey=keys.speech_to_text_key)
```

### Function `speech_to_text` (2 of 4) Accesses **Watson’s Speech to Text Service**
```python
    # open the audio file 
    with open(file_name, 'rb') as audio_file:
        # pass the file to Watson for transcription
        result = stt.recognize(audio=audio_file, 
            content_type='audio/wav', model=model_id).get_result()
```

### Function `speech_to_text` (3 of 4)
* **`recognize`** returns a **`DetailedResponse` object** 
    * Depending on arguments to **`recognize`**, may contain intermediate and final results
    * Useful when transcribing **live audio**, such as a newscast
    * [Method `recognize`’s arguments and JSON response details](https://www.ibm.com/watson/developercloud/speech-to-text/api/v1/python.html?python#recognize-sessionless).
* **`get_result` method** returns **JSON** containing **`transcript`**:
    ![JSON returned from SpeechToTextV1 recognize method](./ch14images/RecognizeDetailedResponse.png "JSON returned from SpeechToTextV1 recognize method")


### Function `speech_to_text` (4 of 4) Accesses **Watson’s Speech to Text Service**
```python
    # Get the 'results' list. This may contain intermediate and final
    # results, depending on method recognize's arguments. We asked 
    # for only final results, so this list contains one element.
    results_list = result['results'] 

    # Get the final speech recognition result--the list's only element.
    speech_recognition_result  = results_list[0]

    # Get the 'alternatives' list. This may contain multiple alternative
    # transcriptions, depending on method recognize's arguments. We did
    # not ask for alternatives, so this list contains one element.
    alternatives_list = speech_recognition_result['alternatives']

    # Get the only alternative transcription from alternatives_list.
    first_alternative = alternatives_list[0]

    # Get the 'transcript' key's value, which contains the audio's 
    # text transcription.
    transcript = first_alternative['transcript']

    return transcript  # return the audio's text transcription
```
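
The nested lookups above can be tried without calling Watson by using a hand-written dictionary. In the sketch below, `sample_result` is an illustrative stand-in whose values are made up; only its nesting follows the walkthrough. The hypothetical helper `first_transcript` also returns an empty string if a list is empty, a case in which the indexing `[0]` used above would raise an `IndexError`.

```python
# Illustrative stand-in for the dictionary returned by get_result();
# the values are made up, only the nesting follows the walkthrough.
sample_result = {
    'results': [
        {
            'alternatives': [
                {'confidence': 0.9,
                 'transcript': 'where is the closest bathroom'}
            ],
            'final': True
        }
    ]
}

def first_transcript(result):
    """Return the first alternative of the first result, or '' if absent."""
    results = result.get('results', [])
    if not results:
        return ''  # nothing was transcribed

    alternatives = results[0].get('alternatives', [])
    if not alternatives:
        return ''

    return alternatives[0]['transcript']

print(first_transcript(sample_result))  # where is the closest bathroom
print(repr(first_transcript({'results': []})))  # ''
```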

### Function `translate` (1 of 4) Accesses the **Watson Language Translator Service**
* Creates a **`LanguageTranslatorV3`**, passing **service version (`'2018-05-01'`)** and **API Key**
    * **Version string (`'2018-05-01'`)** changes only if IBM makes **breaking API changes** 
    * Service still responds using **API version you specify**
    * [More details](https://cloud.ibm.com/apidocs/language-translator?code=python)

### Function `translate` (2 of 4)
```python
def translate(text_to_translate, model):
    """Use Watson Language Translator to translate English to Spanish 
       (en-es) or Spanish to English (es-en) as specified by model."""
    # create Watson Translator client
    language_translator = LanguageTranslatorV3(version='2018-05-01',
        iam_apikey=keys.translate_key)

    # perform the translation
    translated_text = language_translator.translate(
        text=text_to_translate, model_id=model).get_result()
```

### Function `translate` Returns a **`DetailedResponse`** (3 of 4)
* **`get_result` method** returns **JSON** containing **translation**:  
    ![JSON returned from LanguageTranslatorV3 translate method](./ch14images/TranslateDetailedResponse.png "JSON returned from LanguageTranslatorV3 translate method")


### Function `translate` (4 of 4)
```python
    # get 'translations' list. If method translate's text argument has 
    # multiple strings, the list will have multiple entries. We passed
    # one string, so the list contains only one element.
    translations_list = translated_text['translations']
    
    # get translations_list's only element
    first_translation = translations_list[0]

    # get 'translation' key's value, which is the translated text
    translation = first_translation['translation']

    return translation  # return the translated string
```
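
As the comments above note, the `'translations'` list has one entry per string passed to method `translate`. The sketch below shows how all entries could be collected. `sample_response` is a hand-written stand-in, not real service output; it includes only the keys discussed here, and `all_translations` is a hypothetical helper.

```python
# Hand-written stand-in for the dictionary returned by get_result()
# when two strings are translated; other keys are omitted.
sample_response = {
    'translations': [
        {'translation': '¿Dónde está el baño más cercano?'},
        {'translation': 'Gracias'}
    ]
}

def all_translations(response):
    """Return every translated string, in the order Watson listed them."""
    return [entry['translation'] for entry in response['translations']]

for text in all_translations(sample_response):
    print(text)
```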

### Function `text_to_speech` Accesses **Watson Text to Speech Service** (1 of 2)
* Creates a **`TextToSpeechV1` object** named `tts` (short for text-to-speech), passing the **API key**. 
* **`with` statement** opens audio file for writing.  

```python
def text_to_speech(text_to_speak, voice_to_use, file_name):
    """Use Watson Text to Speech to convert text to specified voice
       and save to a WAV file."""
    # create Text to Speech client
    tts = TextToSpeechV1(iam_apikey=keys.text_to_speech_key)

    # open file and write the synthesized audio content into the file
    with open(file_name, 'wb') as audio_file:
        audio_file.write(tts.synthesize(text_to_speak, 
            accept='audio/wav', voice=voice_to_use).get_result().content)
```

### Function `text_to_speech` (2 of 2)
* **`synthesize`** method's **`voice`** argument is a **predefined voice** 
    * **`'en-US_AllisonVoice'`** or **`'es-US_SofiaVoice'`** in this example
    *  [**Voices for various languages**](https://cloud.ibm.com/apidocs/text-to-speech?code=python)
* **`synthesize`** returns a **`DetailedResponse`**; its **`get_result`** method returns the service's response, which contains the **spoken audio as bytes**
    * **`content` attribute** gets the **audio bytes**

### Functions `record_audio` and `play_audio`
* **Instructor Note:** We recommend having students use these two functions as is, rather than presenting them in class

### Function `record_audio` (2 of 4)
```python 
def record_audio(file_name):
    """Use pyaudio to record 5 seconds of audio to a WAV file."""
    FRAME_RATE = 44100  # number of frames per second
    CHUNK = 1024  # number of frames read at a time
    FORMAT = pyaudio.paInt16  # each sample is a 16-bit (2-byte) integer
    CHANNELS = 2  # 2 samples per frame
    SECONDS = 5  # total recording time
```

```python 
    recorder = pyaudio.PyAudio()  # opens/closes audio streams

    # configure and open audio stream for recording (input=True)
    audio_stream = recorder.open(format=FORMAT, channels=CHANNELS, 
        rate=FRAME_RATE, input=True, frames_per_buffer=CHUNK)
    audio_frames = []  # stores raw bytes of mic input
    print('Recording 5 seconds of audio')

    # read 5 seconds of audio in CHUNK-sized pieces
    for i in range(0, int(FRAME_RATE * SECONDS / CHUNK)):
        audio_frames.append(audio_stream.read(CHUNK))
```

```python 
    print('Recording complete')
    audio_stream.stop_stream()  # stop recording
    audio_stream.close()  
    recorder.terminate()  # release underlying resources used by PyAudio

    # save audio_frames to a WAV file
    with wave.open(file_name, 'wb') as output_file:
        output_file.setnchannels(CHANNELS)
        output_file.setsampwidth(recorder.get_sample_size(FORMAT))
        output_file.setframerate(FRAME_RATE)
        output_file.writeframes(b''.join(audio_frames))
```
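
The arithmetic behind the recording loop, and the `wave` calls that save the result, can be checked without a microphone. The sketch below reuses the constants from `record_audio`, works out how many whole chunks the loop reads, then writes that much silence to a temporary WAV file and reads the header back. The names `SAMPLE_WIDTH`, `silence` and `path` are assumptions for this illustration; `SAMPLE_WIDTH = 2` stands in for the value `recorder.get_sample_size(FORMAT)` returns for 16-bit audio.

```python
import os
import tempfile
import wave

FRAME_RATE = 44100  # frames per second
CHUNK = 1024  # frames read at a time
CHANNELS = 2  # samples per frame
SAMPLE_WIDTH = 2  # bytes per sample for 16-bit audio
SECONDS = 5

chunks = int(FRAME_RATE * SECONDS / CHUNK)  # whole chunks only
frames = chunks * CHUNK
print(chunks, frames, round(frames / FRAME_RATE, 3))  # 215 220160 4.992

# write silence with the same parameters record_audio uses
silence = bytes(frames * CHANNELS * SAMPLE_WIDTH)
path = os.path.join(tempfile.mkdtemp(), 'silence.wav')

with wave.open(path, 'wb') as output_file:
    output_file.setnchannels(CHANNELS)
    output_file.setsampwidth(SAMPLE_WIDTH)
    output_file.setframerate(FRAME_RATE)
    output_file.writeframes(silence)

# read the header back to confirm what was stored
with wave.open(path, 'rb') as input_file:
    print(input_file.getnframes(), input_file.getframerate())
```

Because only whole chunks are read, the recording is slightly shorter than five seconds.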

### Function `play_audio` Using Features of **`pydub`** and **`pydub.playback`** Modules 
```python
def play_audio(file_name):
    """Use the pydub module (pip install pydub) to play a WAV file."""
    sound = pydub.AudioSegment.from_wav(file_name)  # load audio
    pydub.playback.play(sound)  # play audio
```

### Executing the `run_translator` Function
* **`run_translator`** called only when **`SimpleLanguageTranslator.py`** executes as a script:

```python
if __name__ == '__main__':
    run_translator()
```

------
&copy;1992&ndash;2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 14 of the book [**Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud**](https://amzn.to/2VvdnxE).

DISCLAIMER: The authors and publisher of this book have used their 
best efforts in preparing the book. These efforts include the 
development, research, and testing of the theories and programs 
to determine their effectiveness. The authors and publisher make 
no warranty of any kind, expressed or implied, with regard to these 
programs or to the documentation contained in these books. The authors 
and publisher shall not be liable in any event for incidental or 
consequential damages in connection with, or arising out of, the 
furnishing, performance, or use of these programs.                  