Document the Watson speechToText action

apache · Jun 14, 2016 · 5c2c004
1 parent 8cee87b
Showing 1 changed file with 43 additions and 0 deletions.
diff --git a/docs/catalog.md b/docs/catalog.md
@@ -309,6 +309,7 @@ The package includes the following actions.
 | `/whisk.system/watson` | package | username, password | Actions for the Watson analytics APIs |
 | `/whisk.system/watson/translate` | action | translateFrom, translateTo, translateParam, username, password | Translate text |
 | `/whisk.system/watson/languageId` | action | payload, username, password | Identify language |
+| `/whisk.system/watson/speechToText` | action | payload, content_type, encoding, username, password, continuous, inactivity_timeout, interim_results, keywords, keywords_threshold, max_alternatives, model, timestamps, watson-token, word_alternatives_threshold, word_confidence, X-Watson-Learning-Opt-Out | Convert audio into text |
 | `/whisk.system/watson/textToSpeech` | action | payload, voice, accept, encoding, username, password | Convert text into audio |
 
 While not required, it's suggested that you create a package binding with the `username` and `password` values. This way you don't need to specify these credentials every time you invoke the actions in the package.
@@ -405,6 +406,48 @@ Here is an example of creating a package binding and converting some text to spe
   ```
 
 
+### Converting speech to text
+
+The `/whisk.system/watson/speechToText` action converts audio speech into text. The parameters are as follows:
+
+- `username`: The Watson API username.
+- `password`: The Watson API password.
+- `payload`: The encoded binary audio data to convert into text.
+- `content_type`: The MIME type of the audio.
+- `encoding`: The encoding of the binary audio data, for example `base64`.
+- `continuous`: Indicates whether the service returns multiple final results, one for each consecutive phrase separated by a long pause.
+- `inactivity_timeout`: The time in seconds after which, if only silence is detected in submitted audio, the connection is closed.
+- `interim_results`: Indicates whether the service is to return interim results.
+- `keywords`: A list of keywords to spot in the audio.
+- `keywords_threshold`: A confidence value that is the lower bound for spotting a keyword.
+- `max_alternatives`: The maximum number of alternative transcripts to be returned.
+- `model`: The identifier of the model to be used for the recognition request.
+- `timestamps`: Indicates whether time alignment is returned for each word.
+- `watson-token`: Provides an authentication token for the service as an alternative to providing service credentials.
+- `word_alternatives_threshold`: A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative.
+- `word_confidence`: Indicates whether a confidence measure in the range of 0 to 1 is to be returned for each word.
+- `X-Watson-Learning-Opt-Out`: Indicates whether to opt out of data collection for the call.
+
+Here is an example of creating a package binding and converting speech to text.
+
+1. Create a package binding with your Watson credentials.
+
+  ```
+  $ wsk package bind /whisk.system/watson myWatson -p username 'MY_WATSON_USERNAME' -p password 'MY_WATSON_PASSWORD'
+  ```
+
+2. Invoke the `speechToText` action in your package binding to convert the encoded audio.
+
+  ```
+  $ wsk action invoke myWatson/speechToText --blocking --result --param payload '<base64 encoding of a .wav file>' --param content_type 'audio/wav' --param encoding 'base64'
+  ```
+  ```
+  {
+    "data": "Hello Watson"
+  }
+  ```
+
+
 ## Using the Slack package
 
 The `/whisk.system/slack` package offers a convenient way to use the [Slack APIs](https://api.slack.com/).
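
The `speechToText` example in the diff above leaves the `payload` value as a placeholder. One way to produce it is sketched below, assuming a `base64` utility is on the path. The function names `encode_audio` and `transcribe` and the file name `audio.wav` are hypothetical and do not appear in the commit. The `myWatson` binding is the one created in step 1 of the example.

```shell
#!/bin/sh

# encode_audio FILE
# Print the base64 form of FILE as a single line.
# tr strips the line wrapping that some base64 implementations add.
encode_audio() {
  base64 < "$1" | tr -d '\n'
}

# transcribe FILE
# Invoke the speechToText action with FILE as the payload.
# Assumes FILE is WAV audio and that the myWatson package binding exists.
transcribe() {
  wsk action invoke myWatson/speechToText --blocking --result \
    --param payload "$(encode_audio "$1")" \
    --param content_type 'audio/wav' \
    --param encoding 'base64'
}

# Example (hypothetical file name):
#   transcribe audio.wav
```

Because the encoded audio is passed as a command-line argument, a long recording may exceed the argument length limit of the shell or operating system, so this sketch is best suited to short clips.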