# **Speech to Text Demo**


---


This notebook demonstrates how OpenDevLibrary can be used to run speech-to-text recognition with OpenVINO. For more speech-related applications, check out the official GitHub page: [Speech-VINO](https://github.com/Speech-VINO). It is inspired by the OpenVINO v2020.1 docs for the [Offline Speech Recognition Demo](https://docs.openvinotoolkit.org/latest/_inference_engine_samples_speech_libs_and_demos_Offline_speech_recognition_demo.html).

To run the notebook on your own files, edit the **TODO** sections in the code cells.

**Note: The OpenVINO version matters. Different versions have been observed to behave slightly differently (for example, the install directory name, such as `/opt/intel/openvino_2020.1.023`, changes with the version).**


## **Install packages and dependencies**

```python
# For audio preprocessing and audio manipulation
import wave
# -y answers the install prompt, which cannot be answered interactively in a notebook
!apt-get install -y libsox-fmt-all libsox-dev sox
```

## **Install OpenVINO toolkit and dependencies**


```python
# Download and run the OpenVINO initialization script
!wget "https://raw.githubusercontent.com/alihussainia/OpenDevLibrary/master/openvino_initialization_script.py"
!python openvino_initialization_script.py
```

## **Initialize the OpenVINO environment and download related files**
The bash script will:
- Download pre-trained Intel models
- Create the configuration file needed to run inference on speech
- Set up the prerequisites for the LibriSpeech model (graph file, etc.)
- Run the **Online** and **Offline** demos to check that all prerequisites were installed properly

**Note: If you are working on Google Colab, the online demo may not work, and the execution may appear to hang in a never-ending loop. To avoid this, replace the "demo_speech_recognition.sh" file with a custom one.**

To work around this issue, uncomment the commands in the next cell. They replace the original file with a modified one maintained in the repository.


```python
# Uncomment to replace the demo script with the modified version from the repository.
# The raw.githubusercontent.com URL serves the file itself; a github.com/.../tree/... URL returns an HTML page.
#!wget -P "/content/" "https://raw.githubusercontent.com/PrashantDandriyal/OpenDevLibrary/master/demo_files/Speech_to_Text/demo_speech_recognition.sh"
#!cp -f "/content/demo_speech_recognition.sh" "/opt/intel/openvino_2020.1.023/deployment_tools/demo/demo_speech_recognition.sh"
```
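A `github.com/<user>/<repo>/tree/...` or `.../blob/...` URL points to GitHub's HTML page for a file, so `wget` on it saves that page rather than the file; the file contents are served from `raw.githubusercontent.com/<user>/<repo>/<branch>/<path>`. The helper below is a hypothetical sketch (`github_to_raw` is not part of OpenDevLibrary) that rewrites such URLs, assuming they have the `https://github.com/...` form used in this notebook.

```python
import re

def github_to_raw(url: str) -> str:
    """Rewrite https://github.com/<user>/<repo>/(blob|tree)/<branch>/<path>
    into the raw.githubusercontent.com URL that serves the file contents."""
    m = re.match(r"https://github\.com/([^/]+)/([^/]+)/(?:blob|tree)/(.+)$", url)
    if m is None:
        raise ValueError("not a github.com blob/tree URL: " + url)
    user, repo, rest = m.groups()  # rest is "<branch>/<path/to/file>"
    return "https://raw.githubusercontent.com/{}/{}/{}".format(user, repo, rest)

print(github_to_raw(
    "https://github.com/PrashantDandriyal/OpenDevLibrary/tree/master/"
    "demo_files/Speech_to_Text/demo_speech_recognition.sh"))
```

The branch and path are kept together as one string, so branch names containing `/` are passed through unchanged.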

```python
!bash /opt/intel/openvino_2020.1.023/deployment_tools/demo/demo_speech_recognition.sh
```

## **Running the Offline Demo**
*(To use it with a custom WAV file, edit the "run_demo.sh" file and add the path to your file.)*

The output generated by *run_demo.sh* is similar to:


>[ INFO ] Using feature transformation /root/openvino_models/ir/intel/lspeech_s5_ext/FP32/lspeech_s5_ext.feature_transform        
[ INFO ] InferenceEngine API ver. 2.1 (build: 37988)        
[ INFO ] Device info:        
[ INFO ] 	CPU: MKLDNNPlugin ver. 2.1        
[ INFO ] Batch size: 8        
[ INFO ] Model loading time: 49.93 ms        
Recognition result:        
**HOW ARE YOU DOING**

Since only the text generated from the speech is needed, we extract it (in a naive way) by piping the console output through `sed`, which deletes every line up to and including the "Recognition result" line.

The remaining output is then saved to a `.txt` file.
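The `sed '1,/Recognition result/d'` filter can be mirrored in plain Python, which may help when post-processing a saved log. This is a minimal sketch; `extract_transcript` is a hypothetical helper, and it assumes the marker text contains no regex metacharacters, so a substring test behaves like sed's pattern match.

```python
def extract_transcript(log: str, marker: str = "Recognition result") -> str:
    """Mimic sed '1,/marker/d': drop line 1 through the first later line
    containing `marker`, and return whatever follows it."""
    lines = log.splitlines()
    # sed's range starts at line 1 and tests the pattern from line 2 onward
    for i in range(1, len(lines)):
        if marker in lines[i]:
            return "\n".join(lines[i + 1:])
    # No match: the sed range runs to the end of input, so nothing is printed
    return ""

sample_log = """[ INFO ] Batch size: 8
[ INFO ] Model loading time: 49.93 ms
Recognition result:
HOW ARE YOU DOING"""
print(extract_transcript(sample_log))  # prints: HOW ARE YOU DOING
```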



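```python
# TODO: Add the path to your WAV file
# The raw.githubusercontent.com URL downloads the WAV itself rather than GitHub's HTML page
!wget -P "/content/" "https://raw.githubusercontent.com/PrashantDandriyal/OpenDevLibrary/master/demo_files/Speech_to_Text/blowup.wav"
wav_path = "/content/blowup.wav"
```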

## **Preprocessing the audio file**
As per the OpenVINO v2020.1 docs [here](https://docs.openvinotoolkit.org/latest/_inference_engine_samples_speech_libs_and_demos_Offline_speech_recognition_demo.html), the WAV file needs to be in the following format: RIFF WAVE PCM, 16-bit, 16 kHz, 1 channel, i.e.:

- Sample size: 16 bit
- Sampling rate: 16 kHz
- Number of channels: 1
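These requirements can be checked without sox by reading the WAV header with Python's standard `wave` module. The sketch below is an illustration only; `wav_format_issues` is a hypothetical helper, and it assumes an uncompressed PCM file, since `wave` refuses most other encodings with `wave.Error`.

```python
import wave
from typing import List

def wav_format_issues(path: str) -> List[str]:
    """List the ways a WAV file deviates from 16-bit, 16 kHz, mono PCM."""
    issues = []
    with wave.open(path, "rb") as wf:
        width = wf.getsampwidth()  # bytes per sample; 2 bytes = 16 bit
        if width != 2:
            issues.append("sample size is {} bit, expected 16".format(8 * width))
        if wf.getframerate() != 16000:
            issues.append("sampling rate is {} Hz, expected 16000".format(wf.getframerate()))
        if wf.getnchannels() != 1:
            issues.append("{} channels, expected 1".format(wf.getnchannels()))
    return issues
```

An empty list means the file already matches the format and needs no conversion.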

The next cell inspects the audio, converts it if needed, and overwrites the original file with the converted one.

```python
import wave

def preprocess(org_aud_path):
  # Read the WAV header, closing the file before it is overwritten below
  with wave.open(org_aud_path, 'rb') as tx:
    n_channels = tx.getnchannels()
    frame_rate = tx.getframerate()
    samp_width = tx.getsampwidth()  # bytes per sample; 2 bytes = 16 bit

  print("Initial parameters:")
  !sox --i "$org_aud_path"

  if n_channels > 1:
    # Convert stereo to mono and replace the original with the new file
    !sox "$org_aud_path" processed.wav channels 1
    !mv -f "processed.wav" "$org_aud_path"
    print("Converted stereo to mono")

  if frame_rate != 16000:
    # Downsample (if > 16 kHz) or upsample (if < 16 kHz) and replace the original
    !sox "$org_aud_path" processed.wav rate 16000
    !mv -f "processed.wav" "$org_aud_path"
    print("Changed sample rate to 16 kHz")

  if samp_width != 2:
    # Convert to 16-bit samples and replace the original
    !sox "$org_aud_path" -b 16 processed.wav
    !mv -f "processed.wav" "$org_aud_path"
    print("Changed sample size to 16 bit")

preprocess(wav_path)
print("Updated file parameters:")
!sox --i "$wav_path"
```
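The conversions in `preprocess` can also be combined into a single sox call, since output-format options (`-r` rate, `-c` channels, `-b` bits) placed before the output file make sox resample, mix down, and change the sample size as needed. A minimal sketch, assuming `sox` is on the `PATH` (installed in the first cell); `sox_convert_command` and `convert_in_place` are hypothetical helpers that use `subprocess` instead of IPython `!` commands.

```python
import shutil
import subprocess
from typing import List

def sox_convert_command(src: str, dst: str) -> List[str]:
    """Build one sox call that writes `src` to `dst` as 16 kHz, mono, 16-bit."""
    # Options placed before `dst` describe the output file's format
    return ["sox", src, "-r", "16000", "-c", "1", "-b", "16", dst]

def convert_in_place(path: str) -> None:
    """Convert `path` and overwrite it, as preprocess() does."""
    tmp = path + ".tmp.wav"  # .wav extension tells sox the output type
    subprocess.run(sox_convert_command(path, tmp), check=True)
    shutil.move(tmp, path)
```

Running sox once avoids re-encoding the file up to three times, at the cost of always rewriting it even when it already matches.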

The demo uses the audio file "how_are_you_doing.wav", stored at `/opt/intel/openvino/deployment_tools/demo/how_are_you_doing.wav`.
This file is fed to the inference engine by the bash script `run_demo.sh`. Instead of editing that script or creating a new one, we rename our WAV file to **how_are_you_doing.wav** and replace the original file with ours.

```python
%cd "/content/"

# Rename the file here OR edit the bash file
!mv "$wav_path" "how_are_you_doing.wav"

# Replace the demo's file with ours: remove the original first, then copy
!rm -f "/opt/intel/openvino/deployment_tools/demo/how_are_you_doing.wav"
!cp "/content/how_are_you_doing.wav" "/opt/intel/openvino/deployment_tools/demo/"
```
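The rename-and-replace step above can also be done from Python with the standard library. This is a sketch under the assumption that overwriting the demo's copy is the goal; `install_as_demo_input` is a hypothetical helper, and unlike the shell cell it leaves the source file in place instead of renaming it.

```python
import shutil
from pathlib import Path

def install_as_demo_input(wav_path: str, demo_dir: str,
                          demo_name: str = "how_are_you_doing.wav") -> Path:
    """Copy `wav_path` into `demo_dir` under the name run_demo.sh reads."""
    target = Path(demo_dir) / demo_name
    # copyfile overwrites an existing target, so no separate rm is needed
    shutil.copyfile(wav_path, str(target))
    return target

# Example (paths from this notebook):
# install_as_demo_input("/content/blowup.wav",
#                       "/opt/intel/openvino/deployment_tools/demo")
```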


## **Perform Inference**
The OpenVINO dependencies have been installed and the environment has been initialized, so it is time to run inference. Because the shell script prints the result to the terminal, we pipe its output through `sed` and write only the recognized text to a text file. The same command is also run once without the pipe to show the status of the inference.

```python
# Run once to show the full status log of the inference
!/opt/intel/openvino/data_processing/audio/speech_recognition/demos/offline_speech_recognition_demo/run_demo.sh
# Run again, keeping only the text after "Recognition result", and save it
!/opt/intel/openvino/data_processing/audio/speech_recognition/demos/offline_speech_recognition_demo/run_demo.sh | sed '1,/Recognition result/d' > /content/out_text.txt
```
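The cell above runs the demo twice: once to show the log and once to save the text. A sketch that runs it only once with `subprocess`, prints the full output, and saves only the part of stdout after the "Recognition result" line (the same rule as the `sed` filter). `run_and_save` is a hypothetical helper; it assumes `run_demo.sh` is executable, as the `!` call above does.

```python
import subprocess
from typing import List

def run_and_save(cmd: List[str], out_path: str,
                 marker: str = "Recognition result") -> str:
    """Run `cmd` once, print its output, and save the text after `marker`."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True, check=True)
    print(result.stdout)  # full status log, like the unpiped run
    print(result.stderr)
    # Like `| sed '1,/marker/d'`: only stdout is filtered, from line 2 onward
    lines = result.stdout.splitlines()
    text = ""
    for i in range(1, len(lines)):
        if marker in lines[i]:
            text = "\n".join(lines[i + 1:])
            break
    with open(out_path, "w") as f:
        f.write(text + "\n" if text else "")
    return text

# Example:
# run_and_save(["/opt/intel/openvino/data_processing/audio/speech_recognition/"
#               "demos/offline_speech_recognition_demo/run_demo.sh"],
#              "/content/out_text.txt")
```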


## **Important file paths**
#### WAV file path:
**"/opt/intel/openvino/deployment_tools/demo/how_are_you_doing.wav"**

#### Configuration file path:
**"/root/openvino_models/ir/intel/lspeech_s5_ext/FP32/speech_lib.cfg"**