Commit

Document the Watson speechToText action
jasonpet authored and Carlos Santana committed Jun 14, 2016
1 parent 8cee87b commit 5c2c004
Showing 1 changed file with 43 additions and 0 deletions.
43 changes: 43 additions & 0 deletions docs/catalog.md
@@ -309,6 +309,7 @@ The package includes the following actions.
| `/whisk.system/watson` | package | username, password | Actions for the Watson analytics APIs |
| `/whisk.system/watson/translate` | action | translateFrom, translateTo, translateParam, username, password | Translate text |
| `/whisk.system/watson/languageId` | action | payload, username, password | Identify language |
| `/whisk.system/watson/speechToText` | action | payload, content_type, encoding, username, password, continuous, inactivity_timeout, interim_results, keywords, keywords_threshold, max_alternatives, model, timestamps, watson-token, word_alternatives_threshold, word_confidence, X-Watson-Learning-Opt-Out | Convert audio into text |
| `/whisk.system/watson/textToSpeech` | action | payload, voice, accept, encoding, username, password | Convert text into audio |

While not required, it's suggested that you create a package binding with the `username` and `password` values. This way you don't need to specify these credentials every time you invoke the actions in the package.
@@ -405,6 +406,48 @@ Here is an example of creating a package binding and converting some text to spe
```


### Converting speech to text

The `/whisk.system/watson/speechToText` action converts audio speech into text. The parameters are as follows:

- `username`: The Watson API username.
- `password`: The Watson API password.
- `payload`: The encoded speech binary data to turn into text.
- `content_type`: The MIME type of the audio.
- `encoding`: The encoding of the speech binary data.
- `continuous`: Indicates whether multiple final results that represent consecutive phrases separated by long pauses are returned.
- `inactivity_timeout`: The time in seconds after which, if only silence is detected in submitted audio, the connection is closed.
- `interim_results`: Indicates whether the service is to return interim results.
- `keywords`: A list of keywords to spot in the audio.
- `keywords_threshold`: A confidence value that is the lower bound for spotting a keyword.
- `max_alternatives`: The maximum number of alternative transcripts to be returned.
- `model`: The identifier of the model to be used for the recognition request.
- `timestamps`: Indicates whether time alignment is returned for each word.
- `watson-token`: Provides an authentication token for the service as an alternative to providing service credentials.
- `word_alternatives_threshold`: A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative.
- `word_confidence`: Indicates whether a confidence measure in the range of 0 to 1 is to be returned for each word.
- `X-Watson-Learning-Opt-Out`: Indicates whether to opt out of data collection for the call.

Here is an example of creating a package binding and converting speech to text.

1. Create a package binding with your Watson credentials.

```
$ wsk package bind /whisk.system/watson myWatson -p username 'MY_WATSON_USERNAME' -p password 'MY_WATSON_PASSWORD'
```

2. Invoke the `speechToText` action in your package binding to convert the encoded audio.

```
$ wsk action invoke myWatson/speechToText --blocking --result --param payload <base64 encoding of a .wav file> --param content_type 'audio/wav' --param encoding 'base64'
```
```
{
  "data": "Hello Watson"
}
```
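
The `payload` passed in the invocation above is the base64 encoding of the audio file. The following is a minimal Python sketch of how that string and its two companion parameters might be prepared. The helper names `make_silent_wav` and `build_speech_to_text_params` are hypothetical, and the generated silent clip is only a stand-in for a real recording; nothing here calls the Watson service or the `wsk` CLI.

```python
import base64
import io
import json
import wave


def make_silent_wav(seconds=0.1, rate=16000):
    """Return the bytes of a short, silent 16-bit mono WAV file (stand-in audio)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


def build_speech_to_text_params(audio_bytes):
    """Assemble the three parameters used in the example invocation."""
    return {
        # base64 text is plain ASCII, so it can be passed as a string parameter
        "payload": base64.b64encode(audio_bytes).decode("ascii"),
        "content_type": "audio/wav",
        "encoding": "base64",
    }


audio = make_silent_wav()
params = build_speech_to_text_params(audio)

# Print a shortened view; a real payload is usually far too long to read.
print(json.dumps({key: value[:40] for key, value in params.items()}, indent=2))
```

With a real recording, the bytes read from the `.wav` file would take the place of `make_silent_wav()`, and the resulting `payload` string is what the `--param payload` argument expects.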


## Using the Slack package

The `/whisk.system/slack` package offers a convenient way to use the [Slack APIs](https://api.slack.com/).