TTS Alternatives

Michael Hallock edited this page Jun 4, 2021 · 4 revisions

The text-to-speech option simply takes the provided text and a few supported options and builds a URL for http://translate.google.com that uses a "preview" endpoint of the Google TTS API to generate an MP3. It then sends a simple media cast command with this URL, equivalent to the following:

{
  payload: {
    type: "MEDIA",
    media: {
      url: "https://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&tl=en&ttsspeed=1&q=Test",
      contentType: "audio/mp3"
    }
  }
}
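
As a rough sketch of how such a message could be assembled: the helper names `buildGoogleTtsUrl` and `buildTtsCast` and the option names are hypothetical, not the node's actual internals. The query parameters are the ones visible in the URL above.

```javascript
// Hypothetical helper: assembles a translate_tts URL like the one shown above.
// The parameter names (ie, client, tl, ttsspeed, q) are taken from that URL;
// the function and option names are illustrative assumptions.
function buildGoogleTtsUrl(text, options = {}) {
    const lang = options.lang || "en";
    const speed = options.speed !== undefined ? options.speed : 1;
    return "https://translate.google.com/translate_tts" +
        "?ie=UTF-8&client=tw-ob" +
        "&tl=" + encodeURIComponent(lang) +
        "&ttsspeed=" + encodeURIComponent(speed) +
        "&q=" + encodeURIComponent(text);
}

// Wraps the URL in the same MEDIA cast message shown above.
function buildTtsCast(text, options) {
    return {
        payload: {
            type: "MEDIA",
            media: {
                url: buildGoogleTtsUrl(text, options),
                contentType: "audio/mp3"
            }
        }
    };
}
```

Seen this way, the only Google-specific part is the URL, which is why the alternatives below can swap it out.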

Unfortunately, this endpoint is not officially supported by Google, and Google has broken this implementation multiple times in the past with things like rate limits, CAPTCHAs, and nonces. This tends to happen more often when Google detects repeated usage from your IP address above some unknown rate.

If this occurs, this page supplies some alternatives.

Alternative 1: Drop-in replacements

Because a TTS cast is just a regular MEDIA cast of a URL that serves an MP3, any provider that can be driven by URL can replace translate.google.com (including your own endpoint; see Alternative 2).

Below are some providers reported to be compatible. Setup for each is out of scope of this document.

  1. VoiceRSS - Requires registering to get an API key; the free tier is limited by number of requests per day.

Once a provider is selected and set up, you can replace a TTS cast with a simple media cast instead. Below is an example of how to build the message in a node-red function node:

let text = "Whatever you want to say";
return {
    payload: {
        type: "MEDIA",
        media: {
            url: "https://api.voicerss.org/?key=InsertYourKeyHere&hl=en-us&c=MP3&src=" + encodeURIComponent(text),
            contentType: "audio/mp3"
        }
    }
};
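
In a real flow the text usually arrives on the incoming message rather than being hard-coded. A minimal sketch, assuming the text to speak is in `msg.payload`; the helper name `buildVoiceRssCast` is hypothetical, and the query parameters are the ones from the example above.

```javascript
// Hypothetical helper; apiKey is your own VoiceRSS key.
function buildVoiceRssCast(text, apiKey) {
    // Nothing to say: in a function node, returning null sends no message.
    if (typeof text !== "string" || text.trim() === "") {
        return null;
    }
    return {
        payload: {
            type: "MEDIA",
            media: {
                url: "https://api.voicerss.org/?key=" + encodeURIComponent(apiKey) +
                    "&hl=en-us&c=MP3&src=" + encodeURIComponent(text),
                contentType: "audio/mp3"
            }
        }
    };
}

// Inside a function node, this could be used as:
//     return buildVoiceRssCast(msg.payload, "InsertYourKeyHere");
```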

Alternative 2: Own endpoint

If your provider does not allow the simple drop-in replacement approach, you can build your own "wrapper" endpoint around a more complicated setup instead. Node-red provides an easy way to do this (assuming you aren't securing the non-admin areas with the built-in authentication).

Here are two different implementations that still utilize Google's services. Both of these options will expose an endpoint from node-red at "/textToSpeech?text=XXXX", which can then be cast to the target with the following message:

let text = "Whatever you want to say";
return {
    payload: {
        type: "MEDIA",
        media: {
            url: "https://urlOfYourNodeRedInstance/textToSpeech?text=" + encodeURIComponent(text),
            contentType: "audio/mp3"
        }
    }
};

Google translate V2

Google has begun to roll out a replacement for their translate.google.com endpoint that works differently. There is no guarantee that this endpoint will not suffer the same rate limit adjustments that Google made to the original, so this option is (a) a little harder to set up, and (b) might still have the same issues eventually.

The following flow provides an implementation of the textToSpeech endpoint that uses this endpoint.

[{"id":"41344e9a.6d9da","type":"function","z":"8b1b11bf.ab7d8","name":"Extract params","func":"let text = msg.payload.text;\nlet lang = 'en';\nlet slow = false;\nlet headers = {\n    \"Referer\": \"http://translate.google.com/\",\n    \"User-Agent\":\n        \"Mozilla/5.0 (Windows NT 10.0; WOW64) \" +\n        \"AppleWebKit/537.36 (KHTML, like Gecko) \" +\n        \"Chrome/47.0.2526.106 Safari/537.36\",\n    \"Content-Type\": \"application/x-www-form-urlencoded;charset=utf-8\"\n};\n\nreturn {\n    origMsg: msg,\n    headers: headers,\n    payload: 'f.req=' +\n        encodeURIComponent(\n            JSON.stringify([\n                [['jQ1olc', JSON.stringify([text, lang, slow ? true : null, 'null']), null, 'generic']],\n            ])\n        )\n};","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"x":1040,"y":820,"wires":[["c5c8001e.6e4f6"]]},{"id":"c5c8001e.6e4f6","type":"http request","z":"8b1b11bf.ab7d8","name":"","method":"POST","ret":"txt","paytoqs":"ignore","url":"https://translate.google.com/_/TranslateWebserverUi/data/batchexecute","tls":"","persist":false,"proxy":"","authType":"","x":1210,"y":820,"wires":[["116b4c4e.de5eb4"]]},{"id":"116b4c4e.de5eb4","type":"function","z":"8b1b11bf.ab7d8","name":"Build response","func":"let origMsg = msg.origMsg;\n\norigMsg.statusCode = 200;\norigMsg.headers = {\n    \"Content-Type\": \"audio/mp3\"\n};\n\nlet result = eval(msg.payload.slice(5))[0][2];\nlet result2 = eval(result)[0];\n\norigMsg.payload = Buffer.from(result2, 'base64');\n\nreturn origMsg;","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"x":1040,"y":860,"wires":[["26541de0.f07992"]]},{"id":"26541de0.f07992","type":"http response","z":"8b1b11bf.ab7d8","name":"","statusCode":"","headers":{},"x":1190,"y":860,"wires":[]},{"id":"c0058fe7.b105a","type":"http in","z":"8b1b11bf.ab7d8","name":"","url":"/textToSpeech","method":"get","upload":false,"swaggerDoc":"","x":810,"y":820,"wires":[["41344e9a.6d9da"]]}]
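
Since the exported flow is hard to read on one line, here is a sketch of what its two function nodes do, written as plain functions. The request format and the response indices (`[0][2]`, then `[0]`) are taken from the flow above. Parsing with `JSON.parse` rather than `eval` is a substitution that assumes the body is valid JSON once the 5-character prefix is removed, and the function names are hypothetical.

```javascript
// Builds the form-encoded body for the batchexecute request,
// mirroring the "Extract params" node in the flow above.
function buildRequestBody(text, lang = "en", slow = false) {
    const inner = JSON.stringify([text, lang, slow ? true : null, "null"]);
    const outer = JSON.stringify([[["jQ1olc", inner, null, "generic"]]]);
    return "f.req=" + encodeURIComponent(outer);
}

// Decodes the response body into an MP3 Buffer, mirroring "Build response".
// Assumption: after dropping the first 5 characters the body is valid JSON.
function extractAudio(body) {
    const envelope = JSON.parse(body.slice(5));
    // The audio sits in a JSON string nested inside the envelope.
    const inner = JSON.parse(envelope[0][2]);
    return Buffer.from(inner[0], "base64");
}
```

Avoiding `eval` here means text returned by a remote server is never executed as code inside your node-red instance.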

Google TTS API Full

Using the same endpoint strategy and the node-red-contrib-google-cloud nodes, you can also use the full Google TTS API to provide text to speech. This has the benefit of not having the same limits applied, but with the additional cost of setting up a full Google Cloud account, which requires the following:

  1. Go to https://cloud.google.com/text-to-speech and walk through the steps to set up a Google Developer account. This requires a credit card attached to the Google account, and usage beyond the "free tier" of any Google service will be charged automatically. If you are uncomfortable with this setup, you will have to find an alternative provider.
  2. Create a "credential" for your new Google Cloud account. When you do, it'll download a .JSON file.
  3. Install node-red-contrib-google-cloud
  4. Import the following flow for the textToSpeech endpoint, and copy and paste the contents of the credentials JSON file into a new "credential" in the Text-to-speech node from the flow.
[{"id":"3bb2af23.8ba42","type":"http in","z":"8b1b11bf.ab7d8","name":"","url":"/textToSpeech","method":"get","upload":false,"swaggerDoc":"","x":880,"y":900,"wires":[["331e2a71.6fb956"]]},{"id":"68700862.931ed8","type":"http response","z":"8b1b11bf.ab7d8","name":"","statusCode":"","headers":{},"x":1160,"y":1020,"wires":[]},{"id":"331e2a71.6fb956","type":"function","z":"8b1b11bf.ab7d8","name":"","func":"let text = msg.payload.text;\n\nmsg.payload = text;\n\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"x":1000,"y":960,"wires":[["2f4fe9aa.b1c766"]]},{"id":"2f4fe9aa.b1c766","type":"google-cloud-text-to-speech","z":"8b1b11bf.ab7d8","account":"","keyFilename":"","name":"","languageCode":"en-US","gender":"FEMALE","encoding":"MP3","rate":1,"pitch":0,"voiceName":"","x":1180,"y":960,"wires":[["344f04e6.6dd11c"]]},{"id":"344f04e6.6dd11c","type":"function","z":"8b1b11bf.ab7d8","name":"","func":"msg.statusCode = 200;\nmsg.headers = {\n    \"content-type\": msg.payload.audio.mime\n};\n\nmsg.payload = msg.payload.audio.data;\n\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"x":1000,"y":1020,"wires":[["68700862.931ed8"]]}]

Other MP3 Providers

Note that the two examples above functionally work the same way, but replace what is generating the MP3. Other providers can be subbed into the same approach. For instance, you could just as easily use a local TTS service external to node-red, as long as you can get an MP3 out of it. The two function nodes in the flow just need to be changed to drive this MP3 generation and to write the MP3 binary Buffer to the HTTP response node. New alternatives may be added here as they are found.
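
As a sketch of the shape such a replacement takes: whatever produces the audio, the last function node only needs to put the MP3 bytes and a content type on the message before it reaches the HTTP response node. The helper name `buildAudioResponse`, the error handling, and the assumption that your generator yields a Buffer are illustrative, not part of the flows above.

```javascript
// Hypothetical helper for the last function node before "http response".
// msg must be the message that originated from the "http in" node;
// mp3 is a Buffer holding the generated audio from whatever provider you use.
function buildAudioResponse(msg, mp3) {
    if (!Buffer.isBuffer(mp3) || mp3.length === 0) {
        // No usable audio: report an error instead of an empty MP3.
        msg.statusCode = 500;
        msg.headers = { "Content-Type": "text/plain" };
        msg.payload = "TTS provider returned no audio";
        return msg;
    }
    msg.statusCode = 200;
    msg.headers = { "Content-Type": "audio/mp3" };
    msg.payload = mp3;
    return msg;
}
```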