# Text to Speech and Speech to Text APIs
### Preconditions:
* Installed Google Cloud Speech-to-Text, Text-to-Speech and Storage client libraries:

``` 
pip3 install google-cloud-speech google-cloud-texttospeech google-cloud-storage --user
```

* A configured Google Cloud account with billing enabled
* A created project
* The Speech-to-Text, Text-to-Speech and Cloud Storage APIs [enabled for the project](https://console.cloud.google.com/apis/library) in the Google Cloud Console
* A [service account](https://console.cloud.google.com/apis/credentials) that the client libraries use to access the project
* The service account's authentication key (a JSON file) downloaded, with its path exported as the __GOOGLE_APPLICATION_CREDENTIALS__ environment variable

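Before running the cells below, it can help to confirm that the notebook process actually sees the credentials variable. The helper `credentials_configured` is hypothetical, not part of the Google client libraries; it only checks that `GOOGLE_APPLICATION_CREDENTIALS` names an existing file and does not validate the key itself.

```python
import os


def credentials_configured(env=None) -> bool:
    """Hypothetical helper: True if GOOGLE_APPLICATION_CREDENTIALS names an existing file.

    Only the path is checked; the contents of the JSON key are not validated.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    return bool(path) and os.path.isfile(path)


if not credentials_configured():
    print("GOOGLE_APPLICATION_CREDENTIALS is not set or does not point to a file")
```
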
# Google Text to Speech API

In [1]:
# google cloud text to speech client library
from google.cloud import texttospeech

In [2]:
def text_to_speech(input_path: str,
                   voice: texttospeech.VoiceSelectionParams = None,
                   output_path: str = "output.mp3"):
    # Read the SSML document to synthesize
    with open(input_path, "r") as input_file:
        input_text = input_file.read()

    # The file content is sent as SSML, not as plain text
    synthesis_input = texttospeech.SynthesisInput(ssml=input_text)

    # Example of selecting a specific Ukrainian WaveNet voice:
    # voice = texttospeech.VoiceSelectionParams(
    #     language_code='uk-UA',
    #     name='uk-UA-Wavenet-A',
    #     ssml_gender=texttospeech.SsmlVoiceGender.FEMALE)
    if voice is None:
        # Default: any male US English voice
        voice = texttospeech.VoiceSelectionParams(
            language_code='en-US',
            ssml_gender=texttospeech.SsmlVoiceGender.MALE)

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3)

    client = texttospeech.TextToSpeechClient()

    # Current client versions require keyword arguments here
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config)

    # audio_content holds the binary MP3 data
    with open(output_path, 'wb') as out:
        out.write(response.audio_content)
        print('Audio content written to file "{}"'.format(output_path))

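Since `text_to_speech` sends the file content in the `ssml` field, a malformed document (an unescaped `&`, an unclosed tag) is only reported by the service after a network round trip. A minimal local check, sketched here with the standard library's `xml.etree.ElementTree`, can catch such mistakes earlier. `check_ssml` is a hypothetical helper; passing it does not guarantee that the service accepts every tag or attribute value.

```python
import xml.etree.ElementTree as ET


def check_ssml(ssml: str) -> None:
    """Raise ValueError unless `ssml` parses as XML with a <speak> root element."""
    try:
        root = ET.fromstring(ssml.strip())
    except ET.ParseError as err:
        raise ValueError("SSML is not well-formed XML: {}".format(err)) from err
    # Drop an XML namespace prefix such as "{http://...}speak", if present
    tag = root.tag.rsplit("}", 1)[-1]
    if tag != "speak":
        raise ValueError("SSML root element must be <speak>, got <{}>".format(tag))
```

It could be called as `check_ssml(input_text)` right after the file is read in `text_to_speech`.
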
### Speech synthesis markup language
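The Text-to-Speech API can synthesize plain text, but the quality of the result can be improved by annotating the input with Speech Synthesis Markup Language (SSML), for example to spell out abbreviations, insert pauses or add emphasis. Because `text_to_speech` sends the file content in the `ssml` field, the input file should contain an SSML document such as this one: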
```
<speak>
  This is an introduction to the Google Cloud text to speech and speech to text <say-as interpret-as="characters">APIs</say-as> <break time="1s"/>.

  In order to run this Jupyter Notebook, you have to complete the following prerequisites:<break time="500ms"/>

  <say-as interpret-as="ordinal">1</say-as>. You have to install Google Cloud Python Client libraries.<break time="500ms"/>
  <say-as interpret-as="ordinal">2</say-as>. You have to create both a Google Cloud account with billing configured and a new project.<break time="500ms"/>
  <say-as interpret-as="ordinal">3</say-as>. You have to enable corresponding <say-as interpret-as="characters">APIs</say-as> for your project.<break time="500ms"/>

  Let's make a moderate emphasis. <emphasis level="moderate">You have to complete all of the prerequisites in order to run the code in this presentation</emphasis>. Once again, we will strongly emphasise: <emphasis level="strong"> all of the prerequisites</emphasis>!
</speak>
```

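When the SSML is generated from plain text rather than written by hand, characters such as `&` and `<` must be escaped, or the document stops being valid XML. A sketch using the standard library's `xml.sax.saxutils.escape`, with a hypothetical helper `paragraphs_to_ssml` that adds a `<break>` after each paragraph (the 500 ms default simply mirrors the pauses in the example above):

```python
from xml.sax.saxutils import escape


def paragraphs_to_ssml(paragraphs, pause_ms: int = 500) -> str:
    """Hypothetical helper: wrap plain-text paragraphs in <speak>, escaping
    &, < and > and inserting a <break> after each paragraph."""
    parts = ["<speak>"]
    for paragraph in paragraphs:
        parts.append('  {}<break time="{}ms"/>'.format(escape(paragraph), pause_ms))
    parts.append("</speak>")
    return "\n".join(parts)


print(paragraphs_to_ssml(["Text & speech", "Speech to text"]))
```
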
In [6]:
text_to_speech("introduction.txt")
!xdg-open output.mp3

Audio content written to file "output.mp3"


### Other voices
The Google Cloud Text-to-Speech API supports many voices in different languages. Interestingly, a voice is not restricted to text in its own language: English text can be read, for example, by a Russian or Ukrainian voice.

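Language codes are easy to get wrong (Ukrainian is `uk-UA`, for instance). The client's `list_voices` method returns the available voices, each with a `name` and a list of `language_codes`, and it also accepts a `language_code` argument to filter on the server side. The hypothetical helper below filters such a list locally; to keep it runnable without credentials, it is demonstrated on stand-in objects rather than a real API response.

```python
from types import SimpleNamespace


def voice_names(voices, language_code: str):
    """Return sorted names of voices that list `language_code` among their languages.

    Only the `name` and `language_codes` attributes are used.
    """
    return sorted(v.name for v in voices if language_code in v.language_codes)


# Illustrative stand-ins; with credentials one would pass
# texttospeech.TextToSpeechClient().list_voices().voices instead.
sample = [
    SimpleNamespace(name="uk-UA-Wavenet-A", language_codes=["uk-UA"]),
    SimpleNamespace(name="en-US-Example-Voice", language_codes=["en-US"]),
]
print(voice_names(sample, "uk-UA"))  # ['uk-UA-Wavenet-A']
```
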
In [3]:
text_to_speech("small_part.txt",
               voice=texttospeech.VoiceSelectionParams(
                   language_code="ru-RU",
                   ssml_gender=texttospeech.SsmlVoiceGender.MALE),
               output_path="rus_small_part.mp3")
!xdg-open rus_small_part.mp3

Audio content written to file "rus_small_part.mp3"


In [11]:
text_to_speech("small_part.txt",
               voice=texttospeech.VoiceSelectionParams(
                   language_code="uk-UA",
                   ssml_gender=texttospeech.SsmlVoiceGender.FEMALE),
               output_path="ukr_small_part.mp3")
!xdg-open ukr_small_part.mp3

Audio content written to file "ukr_small_part.mp3"


# Google Speech to Text API

The synchronous `recognize` call below reads the audio from a file stored in Google Cloud Storage and referenced by a `gs://` URI. To upload the generated audio files to Cloud Storage from the notebook, we use the Storage client library.

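Cloud Storage is not the only option: the `audio` argument of `recognize` is a RecognitionAudio message whose `content` (raw bytes) and `uri` (a `gs://` URI) fields are alternatives. Synchronous recognition is limited to short audio either way, so the current quotas page is worth checking. The sketch below, a hypothetical helper, builds the `audio` dict in either form depending on whether a bucket name is given.

```python
from typing import Optional


def recognition_audio(file_path: str, bucket_name: Optional[str] = None) -> dict:
    """Hypothetical helper: build the `audio` dict for SpeechClient.recognize.

    With a bucket name, reference an object uploaded under the same name as the
    local file (as the upload_to_storage cell does); otherwise send the bytes inline.
    """
    if bucket_name:
        return {"uri": "gs://{}/{}".format(bucket_name, file_path)}
    with open(file_path, "rb") as audio_file:
        return {"content": audio_file.read()}
```
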
In [22]:
import io

from google.cloud import storage
from google.cloud import speech_v1p1beta1 as speech

Uploading the input file to Google Cloud Storage

In [4]:
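def upload_to_storage(file_path: str, bucket_name: str = "tym-chud-general-bucket"):
    # Uses Application Default Credentials (here, the GOOGLE_APPLICATION_CREDENTIALS key file)
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)

    # Store the object under its local file name, so it can later be
    # referenced as gs://<bucket_name>/<file_path>
    file_blob = bucket.blob(file_path)
    file_blob.upload_from_filename(file_path)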

In [8]:
text_to_speech("small_part.txt")
upload_to_storage("output.mp3")

Audio content written to file "output.mp3"


In [18]:
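def print_speech_to_text(input_file_name: str, language: str = 'en-US',
                         bucket_name: str = "tym-chud-general-bucket"):
    client = speech.SpeechClient()

    # Must match the sample rate of the synthesized MP3 file
    sample_rate = 24000

    config = {
        "language_code": language,
        "sample_rate_hertz": sample_rate,
        # MP3 input is a beta feature, available in the v1p1beta1 API imported above
        "encoding": speech.RecognitionConfig.AudioEncoding.MP3,
    }

    # Reference the file previously uploaded with upload_to_storage()
    audio = {
        "uri": "gs://{}/{}".format(bucket_name, input_file_name)
    }

    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        print(result)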

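Printing whole result messages is handy for exploration, but usually only the text is needed. Each result holds a list of `alternatives` ordered from most to least probable, so the first one is the recognizer's best guess. A hypothetical helper that collects those transcripts, shown on stand-in objects shaped like `response.results`:

```python
from types import SimpleNamespace


def best_transcripts(results):
    """Return the first (most probable) alternative's transcript from each result,
    skipping results that have no alternatives."""
    return [r.alternatives[0].transcript for r in results if r.alternatives]


# Stand-in objects shaped like response.results from client.recognize(...)
fake_results = [
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="this is an introduction", confidence=0.93)]),
    SimpleNamespace(alternatives=[]),
]
print(" ".join(best_transcripts(fake_results)))  # this is an introduction
```
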
In [19]:
print_speech_to_text("output.mp3")

alternatives {
  transcript: "this is an introduction to the Google Cloud text to speech to text"
  confidence: 0.9250052571296692
}
language_code: "en-us"



### Streaming API
The streaming API can recognize speech from a local file: instead of a Cloud Storage URI, the audio bytes are sent directly in `StreamingRecognizeRequest` messages.

In [29]:
def print_speech_to_text_streaming(input_file_name: str, language: str = 'en-US'):
    client = speech.SpeechClient()

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.MP3,
        language_code=language,
        sample_rate_hertz=24000,
    )

    # Send the whole local file as a single streaming request
    with io.open(input_file_name, "rb") as stream:
        requests = [speech.StreamingRecognizeRequest(audio_content=stream.read())]

    streaming_config = speech.StreamingRecognitionConfig(config=config)
    responses = client.streaming_recognize(streaming_config, requests)
    for response in responses:
        print("=" * 20)
        print(response)

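The function above sends the whole file as one `StreamingRecognizeRequest`. The streaming API limits how much audio each request may carry (see the current Speech-to-Text quotas page for the exact figure), so longer files are normally sent in chunks. A sketch of a chunking generator; the 16 KiB chunk size is an assumption, not a documented value:

```python
import io


def iter_audio_chunks(stream, chunk_size: int = 16 * 1024):
    """Yield successive byte chunks from a binary stream until it is exhausted.

    chunk_size is an assumed value; it only needs to stay below the
    streaming API's per-request audio limit.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# With the real client, each chunk would be wrapped in a request, e.g.:
# requests = (speech.StreamingRecognizeRequest(audio_content=c)
#             for c in iter_audio_chunks(audio_file))
print([len(c) for c in iter_audio_chunks(io.BytesIO(b"x" * 40000))])  # [16384, 16384, 7232]
```

Because `streaming_recognize` consumes the requests lazily, the file has to stay open while the responses are read, so the loop over responses would go inside the `with` block.
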
In [30]:
print_speech_to_text_streaming("output.mp3")

results {
  alternatives {
    transcript: "this is an introduction to the Google Cloud text to speech to text"
    confidence: 0.9419823288917542
  }
  is_final: true
  result_end_time {
    seconds: 5
    nanos: 340000000
  }
  language_code: "en-us"
}

