Text to Speech

Barry Walker edited this page Mar 2, 2015 · 7 revisions

DEPRECATION NOTICE: This is old documentation relevant to Adhearsion 1.x and will soon be removed. See the main documentation for up-to-date info.

##Overview##

There are several engines and approaches for generating Text to Speech (TTS) sound files for your voice-enabled applications. Database-driven applications can generate sound files on the fly, call by call. You can also create a set of custom sound files that share the same 'brand' throughout your application without having to hire professional voice talent.

This page covers some of the numerous approaches to generating TTS files. Additions to this page are welcome, as developers explore more engines and uses of TTS in their applications.

##Howto##

TTS works by passing a text string to a TTS application and generating a sound file. That sound file is then played back to the caller just like any other sound file. If you're going to generate dynamic sound files on the fly from a database, you will need to use a GUID for unique filenames and a temporary directory to store them in. We recommend creating a sub-directory in the standard Asterisk sound files directory that uses your application name:

/var/lib/asterisk/sounds/your_ahn_app_name

If you're using Asterisk 1.8.x, the sound files directory is located at:

/var/lib/asterisk/sounds/en/

This directory must be readable and writable by the user running your Adhearsion application. Here is an example showing how you can generate the temporary file names used to generate the TTS sound files:

filename = COMPONENTS.your_ahn_app_name[:sound_dir] + new_guid

This builds a filename from the directory set in your component's configuration file and uses Adhearsion's new_guid method to make the name unique. The next step is to invoke the engine you chose; each engine is invoked differently.
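For experimenting outside a running Adhearsion application, the same idea can be sketched with Ruby's standard library. This is only an illustration: `unique_sound_path` is a hypothetical helper, and `SecureRandom.uuid` stands in for Adhearsion's `new_guid`. Using `File.join` rather than `+` means the configured directory does not need to end with a slash.

```ruby
require "securerandom"

# Hypothetical helper (not part of Adhearsion): returns a unique base path,
# without a file extension, inside the given sound directory.
def unique_sound_path(sound_dir)
  # SecureRandom.uuid is used here as a stand-in for Adhearsion's new_guid.
  File.join(sound_dir, SecureRandom.uuid)
end

filename = unique_sound_path("/var/lib/asterisk/sounds/your_ahn_app_name")
```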

One thing to keep in mind when generating your files is the sample rate and format of the audio. Since Asterisk is usually connected to the telephone network, an 8 kHz sampling rate is generally what you want. It's also best to generate files in the same format as the rest of your sound files, although this is not required since Asterisk can transcode between audio formats. The best formats are 'ulaw/alaw', 'gsm' or 'wav'.
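If an engine cannot write 8 kHz audio directly, one option is to convert its output with sox. The sketch below only builds the argument list; `sox_to_ulaw_args` is a hypothetical helper, and forcing mono with `-c 1` is an assumption about what you want for telephony audio.

```ruby
# Hypothetical helper: builds the argv for converting any sox-readable file
# to 8 kHz, mono, headerless u-law audio at "<output_base>.ulaw".
def sox_to_ulaw_args(input_path, output_base)
  [
    "sox", input_path,
    "-r", "8000",            # resample to 8 kHz
    "-c", "1",               # mix down to a single channel (assumption)
    "-t", "ul",              # raw u-law output
    "#{output_base}.ulaw"
  ]
end

# Usage (requires sox to be installed):
# system(*sox_to_ulaw_args("/tmp/prompt.aiff", filename))
```

Passing the arguments to `system` as separate strings bypasses the shell, so paths containing spaces need no quoting.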

##Cepstral##

Cepstral is an easy to install, commercially licensed engine for TTS with good quality voices.

text = "What I would like to say"
system("swift -o #{filename}.wav -p audio/channels=1,audio/sampling-rate=8000 '#{text}'")
play filename

The above shows how to generate an audio file using Cepstral TTS, named after the Ruby variable filename and based on the text stored in the text variable.
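Text that comes from a database may contain apostrophes or other shell metacharacters, which would end the single-quoted argument early. A minimal sketch of one way to guard against that, using Ruby's standard Shellwords module; `swift_command` is a hypothetical helper, and the swift options are the ones from the example above.

```ruby
require "shellwords"

# Hypothetical helper: builds the swift command line with the output path
# and the text escaped for the shell.
def swift_command(text, filename)
  [
    "swift",
    "-o", Shellwords.escape("#{filename}.wav"),
    "-p", "audio/channels=1,audio/sampling-rate=8000",
    Shellwords.escape(text)
  ].join(" ")
end

# Usage (requires Cepstral's swift on the PATH):
# system(swift_command("Don't hang up", filename))
```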

##Festival##

Festival is an open source speech engine that may be freely downloaded and installed. Compared to Cepstral, it's somewhat more complex to install and manage.

text = "What I would like to say"
system("echo '#{text}' | text2wave -o #{filename}.ulaw -otype ulaw")
play filename
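Because text2wave reads its input from standard input here, the shell and `echo` can be avoided altogether by writing the text to the process directly. The following is a sketch under that assumption; `run_with_stdin` is a hypothetical helper built on Ruby's standard Open3 module.

```ruby
require "open3"

# Hypothetical helper: runs a command (given as an argv array) with the
# given text on its standard input and returns whatever it wrote to stdout.
def run_with_stdin(command, text)
  output, status = Open3.capture2(*command, stdin_data: text)
  unless status.success?
    raise "#{command.first} exited with status #{status.exitstatus}"
  end
  output
end

# Usage (requires Festival's text2wave on the PATH):
# run_with_stdin(["text2wave", "-o", "#{filename}.ulaw", "-otype", "ulaw"], text)
```

Since the text never passes through a shell, quotes and other special characters in it need no escaping.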

##PlainTalk##

PlainTalk is the speech engine included with OS X. If you use a Mac for development, this is a great engine to use since it's already installed.

text = "What I would like to say"
# -v expects a voice name, not a directory, so it is omitted and the system default voice is used
system("say -o #{filename}.aiff #{text}")
system("sox #{filename}.aiff -r 8000 -t ul #{filename}.ulaw")
play filename

##VoiceForge##

VoiceForge provides a SOAP/XML web service that accepts a text string and returns an audio file. This is a great approach, given that you do not need to acquire a local TTS engine. VoiceForge also makes a multitude of different voices available.

##Google Translate##

Google Translate provides not only text translation but also the ability to listen to the text. There's a gem called tts that you can use in your application to consume Google Translate TTS.

require 'tts'
text = "Google Translate is purely awesome"
filename = text.downcase.gsub(' ','-')
text.to_file "en", "/tmp/#{filename}.mp3"
system("sox /tmp/#{filename}.mp3 -r 8000 -t ul /var/lib/asterisk/sounds/en/#{filename}.ulaw")
play filename
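Deriving the filename by replacing spaces only works for simple phrases; punctuation such as apostrophes or slashes would end up in the path and in the shell command. A small sketch of a stricter variant follows, where `slug_for` is a hypothetical helper and not part of the tts gem.

```ruby
# Hypothetical helper: turns arbitrary text into a filename-safe slug made
# of lowercase ASCII letters, digits and single hyphens.
def slug_for(text)
  text.downcase
      .gsub(/[^a-z0-9]+/, "-")   # collapse every run of other characters
      .gsub(/\A-+|-+\z/, "")     # trim hyphens from both ends
end

filename = slug_for("Google Translate is purely awesome")
```

Note that different phrases can map to the same slug, so a GUID-based name is still the safer choice for dynamically generated text.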

Usage details coming soon...
