This repository was archived by the owner on Nov 29, 2023. It is now read-only.
VAPI-1390: Real-Time Transcription Doc Proposal #992
Merged
42 commits
b1d68f4
VAPI-1390: Real-Time Transcription Doc Proposal
nashley 7a6f150
Merge remote-tracking branch 'origin/main' into vapi-1390-real-time-t…
nashley a967773
add to sidebar
nashley b6f62af
Fix mediaStreamStopped docs
nashley a6e34a2
more skeleton docs
nashley 4960784
add stopTranscription to sidebar
nashley b76d1b7
add webhooks to sidebar
nashley b8f64b2
replace 'transcription stream' with 'real-time transcription'
nashley 8267e2d
Merge remote-tracking branch 'origin/main' into vapi-1390-real-time-t…
nashley 44ddf68
add stabilized, make destination optional, cleanup, etc
nashley 7bcfda2
Merge remote-tracking branch 'origin/main' into vapi-1390-real-time-t…
nashley 00a2557
Add Code Snippets to Spec Files
DX-Bandwidth 86fc5f5
Add Code Snippets to Spec Files
DX-Bandwidth 491a660
Add Code Snippets to Spec Files
DX-Bandwidth 8d58dac
Add Code Snippets to Spec Files
DX-Bandwidth 8d208a8
Add Code Snippets to Spec Files
DX-Bandwidth ccc5175
Add Code Snippets to Spec Files
DX-Bandwidth 96a90af
Add Code Snippets to Spec Files
DX-Bandwidth bc5f7cc
Add Code Snippets to Spec Files
DX-Bandwidth a357439
Add Code Snippets to Spec Files
DX-Bandwidth 6a88dbb
s/Media/Transcription
nashley 9e0d88c
Merge remote-tracking branch 'origin/vapi-1390-real-time-transcriptio…
nashley 8eca0a4
clean up stopTranscription
nashley d1d8693
clean up realtimeTranscriptionRejected
nashley 006f607
make name option for stoptranscription
nashley 258618a
Merge remote-tracking branch 'origin/main' into vapi-1390-real-time-t…
nashley 9788b63
Merge branch 'main' into vapi-1390-real-time-transcription
nashley 525a52a
Merge branch 'main' into vapi-1390-real-time-transcription
nashley 1b6a7d0
Remove alternatives
nashley 7056649
document default stabilized behavior
nashley a6ae9fa
Change realtime -> realTime
marcelohossomi 29966bb
Add realTimeTranscriptionAvailable
marcelohossomi 8f179d0
Add realTimeTranscriptionAvailable
marcelohossomi 5a784a5
Fixes...
marcelohossomi bb93790
Fixes...
marcelohossomi 7d567cc
Update site/docs/voice/bxml/startTranscription.mdx
marcelohossomi 9997aff
PR comments
marcelohossomi 8d21720
Merge branch 'vapi-1390-real-time-transcription' of marcelohossomi.gi…
marcelohossomi 302fd47
Add startTime to realTimeTranscriptionAvailable.
marcelohossomi 59a6226
Merge branch 'main' into vapi-1390-real-time-transcription
marcelohossomi 55f8a01
Fix typo
marcelohossomi c4b2110
Fix typo
marcelohossomi
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,210 @@ | ||
--- | ||
id: startTranscription | ||
title: Start Transcription | ||
slug: /voice/bxml/startTranscription | ||
description: A general overview of Bandwidth's StartTranscription BXML Verb
keywords: | ||
- bandwidth | ||
- voice | ||
- bxml | ||
- start | ||
- transcribing | ||
hide_title: false | ||
image: ../../static/img/bandwidth-logo.png | ||
--- | ||
|
||
import Tabs from '@theme/Tabs'; | ||
import TabItem from '@theme/TabItem'; | ||
|
||
The `StartTranscription` verb transcribes a segment of a call and can optionally send the live transcription to another destination for additional processing.
The transcription will continue until the call ends or the [`<StopTranscription>`][1] verb is used. | ||
When a `destination` is specified, live transcription updates for one or both sides (tracks) of the call will be sent to the specified destination. | ||
A total of 4 concurrent track transcriptions are allowed on a call. A `<StartTranscription>` request that uses `both` tracks will count as 2 of the permitted 4 concurrent track transcriptions. | ||
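
The track accounting described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: `can_start` and `TRACK_COST` are hypothetical names, and the sketch assumes every active transcription counts toward the limit even when two transcriptions cover the same track.

```python
# Hypothetical helper illustrating the documented limit of 4 concurrent
# track transcriptions per call. Not part of any Bandwidth SDK.
MAX_CONCURRENT_TRACKS = 4

# `both` consumes two of the four permitted track transcriptions.
TRACK_COST = {"inbound": 1, "outbound": 1, "both": 2}


def can_start(active: list[str], requested: str = "inbound") -> bool:
    """Return True if starting `requested` keeps the call within the limit.

    `active` holds the `tracks` value of each transcription already running.
    """
    used = sum(TRACK_COST[tracks] for tracks in active)
    return used + TRACK_COST[requested] <= MAX_CONCURRENT_TRACKS
```

For instance, with one `both` transcription and one `inbound` transcription already running, a second `both` request would exceed the limit, while a single `outbound` request would still fit.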
|
||
A call has only two tracks, which are named after the direction of the media from the perspective of the Programmable Voice platform: | ||
- `inbound`: media received by Programmable Voice from the call executing the BXML; | ||
- `outbound`: media sent by Programmable Voice to the call executing the BXML. | ||
|
||
Note that this has no correlation to the direction of the call itself. For example, if either an inbound or outbound call is being transcribed and executes a `<SpeakSentence>`, the `inbound` track will be the callee's audio and the `outbound` track will be the text-to-speech audio. | ||
|
||
## Text Content | ||
|
||
There is no text content available to be set for the `<StartTranscription>` verb. | ||
|
||
## Attributes | ||
|
||
| Attribute | Description | | ||
|:-------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | ||
| name | (optional) A name to refer to this transcription by. Used when sending [`<StopTranscription>`][1]. If not provided, it will default to the generated transcription id as sent in the [`Real-Time Transcription Started`][2] webhook. | | ||
| tracks | (optional) The part of the call to send a transcription from. `inbound`, `outbound` or `both`. Default is `inbound`. | | ||
| transcriptionEventUrl | (optional) URL to send the associated Webhook events to during this real-time transcription's lifetime. Does not accept BXML. May be a relative URL. | | ||
| transcriptionEventMethod | (optional) The HTTP method to use for the request to `transcriptionEventUrl`. GET or POST. Default value is POST. | | ||
| username | (optional) The username to send in the HTTP request to `transcriptionEventUrl`. If specified, the `transcriptionEventUrl` must be TLS-encrypted (i.e., `https`). | | ||
| password | (optional) The password to send in the HTTP request to `transcriptionEventUrl`. If specified, the `transcriptionEventUrl` must be TLS-encrypted (i.e., `https`). | | ||
| destination | (optional) A websocket URI to send the transcription to. A transcription of the specified tracks will be sent via websocket to this URL as a series of JSON messages. See below for more details on the websocket packet format. | | ||
| stabilized | (optional) Whether to send transcription update events to the specified `destination` only after they have become stable. Requires `destination`. Defaults to `true`. | | ||
|
||
If both the `destination` and `transcriptionEventUrl` attributes are specified, the [Real-Time Transcription Started][2], [Real-Time Transcription Rejected][3], and [Real-Time Transcription Stopped][4] events will be sent to the `transcriptionEventUrl` when the transcription starts, when there is an error starting the transcription, and when the transcription ends, respectively. BXML returned in response to these callbacks will be ignored.
If the `transcriptionEventUrl` attribute is specified, the [Real-Time Transcription Available][5] event will be sent once the transcription has ended, providing a URL from which the transcription can be downloaded. BXML returned in response to this callback will be ignored.
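
As a rough sketch of how these attributes fit together, the Python below assembles a `<StartTranscription>` element with the standard library. The `build_start_transcription` function is hypothetical, and wrapping the verb in a `<Response>` root element is an assumption not stated on this page. The only rule it enforces from the table above is that `stabilized` requires `destination`.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_start_transcription(
    name: Optional[str] = None,
    tracks: str = "inbound",
    destination: Optional[str] = None,
    transcription_event_url: Optional[str] = None,
    stabilized: Optional[bool] = None,
) -> str:
    """Build BXML containing one <StartTranscription> verb (illustrative only)."""
    if tracks not in ("inbound", "outbound", "both"):
        raise ValueError("tracks must be inbound, outbound or both")
    if stabilized is not None and destination is None:
        raise ValueError("stabilized requires destination")

    # Only emit attributes that were actually provided.
    attrs = {"tracks": tracks}
    if name is not None:
        attrs["name"] = name
    if destination is not None:
        attrs["destination"] = destination
    if transcription_event_url is not None:
        attrs["transcriptionEventUrl"] = transcription_event_url
    if stabilized is not None:
        attrs["stabilized"] = "true" if stabilized else "false"

    response = ET.Element("Response")  # assumed BXML root element
    ET.SubElement(response, "StartTranscription", attrs)
    return ET.tostring(response, encoding="unicode")
```

Calling `build_start_transcription(name="live_audience", tracks="both", destination="wss://websocket.myapp.example")` would yield a single-line XML string with those three attributes set.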
|
||
:::note | ||
While multiple real-time transcriptions for the same call are allowed, each real-time transcription MUST have a unique name. Attempting to start a real-time transcription on the same call with the name of an already existing real-time transcription will result in a [Real-Time Transcription Rejected][3] event. | ||
::: | ||
|
||
## Webhooks Received | ||
|
||
| Webhooks | Can reply with more BXML | | ||
|:---------------------------|:-------------------------| | ||
| [Real-Time Transcription Started][2] | No | | ||
| [Real-Time Transcription Rejected][3] | No | | ||
| [Real-Time Transcription Stopped][4] | No | | ||
| [Real-Time Transcription Available][5] | No | | ||
|
||
## Nested Tags | ||
|
||
You may specify up to 12 `<CustomParam/>` elements nested within a `<StartTranscription>` tag. These elements define optional user-specified parameters that will be sent to the destination URL when the real-time transcription is first started.
|
||
### CustomParam Attributes | ||
|
||
| Attribute | Description | | ||
|:----------|:---------------------------------------------------------------| | ||
| name | (required) The name of this parameter, up to 256 characters. | | ||
| value | (required) The value of this parameter, up to 2048 characters. | | ||
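
The limits above lend themselves to a small pre-flight check before generating BXML. The sketch below is a hypothetical helper, not platform behavior: it assumes "characters" can be approximated by Python's `len()` on a `str`, and it says nothing about how the platform itself reacts to oversized parameters.

```python
MAX_CUSTOM_PARAMS = 12   # maximum <CustomParam/> elements per <StartTranscription>
MAX_NAME_LENGTH = 256    # maximum characters in `name`
MAX_VALUE_LENGTH = 2048  # maximum characters in `value`


def validate_custom_params(params: dict[str, str]) -> None:
    """Raise ValueError if `params` exceeds the documented CustomParam limits."""
    if len(params) > MAX_CUSTOM_PARAMS:
        raise ValueError(f"at most {MAX_CUSTOM_PARAMS} custom params are allowed")
    for name, value in params.items():
        if len(name) > MAX_NAME_LENGTH:
            raise ValueError(f"param name longer than {MAX_NAME_LENGTH} characters")
        if len(value) > MAX_VALUE_LENGTH:
            raise ValueError(f"value of {name!r} longer than {MAX_VALUE_LENGTH} characters")
```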
|
||
## Websocket Packet Format | ||
|
||
If a `destination` is specified, it will be sent JSON messages for the duration of the real-time transcription. There will be an initial `start` message when the connection is first established. This will be followed by zero or more `transcription` messages containing transcription updates for the tracks being transcribed. Finally, when a real-time transcription is stopped, a `stop` message will be sent. | ||
|
||
### Start and Stop Message Parameters | ||
|
||
| Parameter | Description |
|:-------------|:------------|
| eventType | The type of message, either `start` or `stop` |
| metadata | Details about the real-time transcription this message is for. See further details below. |
| customParams | (optional) (`start` message only) If any `<CustomParam/>` elements were specified in the `<StartTranscription>` request, they will be copied here as a map of `name : value` pairs |
|
||
#### Metadata Parameters | ||
|
||
| Parameter | Description |
|:------------------------------|:------------|
| accountId | The user account associated with the call |
| callId | The call id associated with the real-time transcription |
| realTimeTranscriptionId | The unique id of the real-time transcription |
| transcriptionName | The user-supplied name of the real-time transcription |
| tracks | A list of one or more tracks being transcribed in real-time |
| tracks.name | The name of the track being transcribed. Used to identify which transcription updates belong to which track |
| stabilized | Whether transcription updates will be sent only after they have become stable |
|
||
### Transcription Message Parameters | ||
|
||
| Parameter | Description |
|:----------|:-----------------------|
| eventType | Will always be `transcription` |
| track | The name of the track this transcription update is for. Will be one of the names specified in the `start` message |
| startTime | The time at which this segment started |
| endTime | The time at which this segment ended |
| isPartial | Indicates whether the segment is partial (`true`) or complete (`false`) |
| language | The detected language of the segment |
| transcript | The transcription of this segment as a flattened string |
| items | The list of items making up this segment |
| items.content | A word or punctuation mark |
| items.startTime | The time at which this item started |
| items.endTime | The time at which this item ended |
| items.confidence | The confidence score associated with a word or phrase in your transcript. |
| items.stable | Indicates whether the specified item is stable (true) or if it may change when the segment is complete (false). |
| items.type | Either `PRONUNCIATION` or `PUNCTUATION` |
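
A receiver at the `destination` can branch on `eventType` to tell the three message kinds apart. The Python below is a minimal sketch of that dispatch using only the fields listed in the tables above. `summarize_message` is a hypothetical name, and the websocket server that would feed it raw text frames is assumed and not shown.

```python
import json


def summarize_message(raw: str) -> str:
    """Turn one websocket JSON message into a one-line summary."""
    msg = json.loads(raw)
    event_type = msg["eventType"]

    if event_type in ("start", "stop"):
        # start and stop messages carry the same metadata object
        meta = msg["metadata"]
        tracks = ", ".join(track["name"] for track in meta["tracks"])
        return f"{event_type}: {meta['realTimeTranscriptionId']} [{tracks}]"

    if event_type == "transcription":
        state = "partial" if msg["isPartial"] else "final"
        return f"{msg['track']} ({state}): {msg['transcript']}"

    raise ValueError(f"unexpected eventType: {event_type!r}")
```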
|
||
## Examples | ||
|
||
### A `start` Websocket Message | ||
|
||
```json
{
  "eventType": "start",
  "metadata": {
    "accountId": "5555555",
    "callId": "c-2a913f94-7fa91773-a426-4118-8b8b-b691ab0a0ae1",
    "realTimeTranscriptionId": "s-2a913f94-93e372e2-60da-4c89-beb0-0d3a219b287c",
    "transcriptionName": "live_audience",
    "tracks": [
      {
        "name": "inbound"
      },
      {
        "name": "outbound"
      }
    ]
  },
  "customParams": {
    "foo": "bar",
    "foos": "bars"
  }
}
```
|
||
### A `transcription` Websocket Message | ||
```json
{
  "eventType": "transcription",
  "track": "inbound",
  "startTime": "2023-03-31T20:05.101Z",
  "endTime": "2023-03-31T20:07.493Z",
  "isPartial": false,
  "language": "en-US",
  "transcript": "hello world!",
  "items": [
    {
      "content": "hello",
      "startTime": "2023-03-31T20:05.101Z",
      "endTime": "2023-03-31T20:06.285Z",
      "confidence": 0.9,
      "stable": true,
      "type": "PRONUNCIATION"
    },
    {
      "content": "world",
      "startTime": "2023-03-31T20:06.984Z",
      "endTime": "2023-03-31T20:07.493Z",
      "confidence": 0.6,
      "stable": true,
      "type": "PRONUNCIATION"
    },
    {
      "content": "!",
      "startTime": "2023-03-31T20:07.493Z",
      "endTime": "2023-03-31T20:07.493Z",
      "confidence": 0.9,
      "stable": false,
      "type": "PUNCTUATION"
    }
  ]
}
```
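
One way a consumer might use the per-item `stable` flag is to display only the items that are no longer expected to change. The following Python sketch assumes a message already parsed into a `dict` with the shape shown above. `stable_text` is a hypothetical helper, and attaching punctuation directly to the preceding word is a presentation choice, not something the format requires.

```python
def stable_text(message: dict) -> str:
    """Join the stable items of a `transcription` message into display text."""
    text = ""
    for item in message["items"]:
        if not item["stable"]:
            continue  # this item may still change, so leave it out
        if item["type"] == "PUNCTUATION" or not text:
            text += item["content"]  # no leading space
        else:
            text += " " + item["content"]
    return text
```

Applied to the example message above, this returns `hello world`, because the trailing `!` item is not yet marked stable.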
|
||
### A `stop` Websocket Message | ||
|
||
```json
{
  "eventType": "stop",
  "metadata": {
    "accountId": "5555555",
    "callId": "c-2a913f94-7fa91773-a426-4118-8b8b-b691ab0a0ae1",
    "realTimeTranscriptionId": "s-2a913f94-93e372e2-60da-4c89-beb0-0d3a219b287c",
    "transcriptionName": "live_audience",
    "tracks": [
      {
        "name": "inbound"
      },
      {
        "name": "outbound"
      }
    ]
  }
}
```
|
||
[1]: /docs/voice/bxml/stopTranscription | ||
[2]: /docs/voice/webhooks/realTimeTranscriptionStarted | ||
[3]: /docs/voice/webhooks/realTimeTranscriptionRejected | ||
[4]: /docs/voice/webhooks/realTimeTranscriptionStopped | ||
[5]: /docs/voice/webhooks/realTimeTranscriptionAvailable |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
--- | ||
id: stopTranscription | ||
title: Stop Transcription | ||
slug: /voice/bxml/stopTranscription | ||
description: A general overview of Bandwidth's StopTranscription BXML Verb | ||
keywords: | ||
- bandwidth | ||
- voice | ||
- bxml | ||
- stop | ||
- transcribing | ||
hide_title: false | ||
image: ../../static/img/bandwidth-logo.png | ||
--- | ||
|
||
import Tabs from '@theme/Tabs'; | ||
import TabItem from '@theme/TabItem'; | ||
|
||
The `StopTranscription` verb is used to stop a real-time transcription that was started with a previous [`<StartTranscription>`][1] verb. | ||
|
||
If there is no real-time transcription with the given name active on the call, this verb has no effect. | ||
If no `name` is specified, all active real-time transcriptions on the call are stopped. This does not include transcribed recordings.
|
||
## Text Content | ||
|
||
There is no text content available to be set for the `<StopTranscription>` verb. | ||
|
||
## Attributes | ||
|
||
| Attribute | Description | | ||
|:-------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | ||
| name | (optional) The name of the real-time transcription to stop. This is either the user selected name when sending the [`<StartTranscription>`][1] verb, or the system generated name returned in the [Real-Time Transcription Started][2] webhook if `<StartTranscription>` was sent with no `name` attribute. If no `name` is specified, then all active call transcriptions will be stopped. | | ||
|
||
## Webhooks Received | ||
|
||
| Webhooks | Can reply with more BXML | | ||
|:---------------------------|:-------------------------| | ||
| [Real-Time Transcription Stopped][3] | No | | ||
| [Real-Time Transcription Available][4] | No | | ||
|
||
## Examples | ||
|
||
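
As an illustration only, the Python sketch below assembles a `<StopTranscription>` verb with the standard library. `build_stop_transcription` is a hypothetical helper, and the `<Response>` root element is an assumption not stated on this page. Omitting `name` produces a verb with no attributes, which per the description above stops all active real-time transcriptions on the call.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_stop_transcription(name: Optional[str] = None) -> str:
    """Build BXML containing one <StopTranscription> verb (illustrative only)."""
    response = ET.Element("Response")  # assumed BXML root element
    attrs = {"name": name} if name is not None else {}
    ET.SubElement(response, "StopTranscription", attrs)
    return ET.tostring(response, encoding="unicode")
```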
[1]: /docs/voice/bxml/startTranscription | ||
[2]: /docs/voice/webhooks/realTimeTranscriptionStarted | ||
[3]: /docs/voice/webhooks/realTimeTranscriptionStopped | ||
[4]: /docs/voice/webhooks/realTimeTranscriptionAvailable |
88 changes: 88 additions & 0 deletions
site/docs/voice/webhooks/realTimeTranscriptionAvailable.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
--- | ||
id: realTimeTranscriptionAvailable | ||
title: Real-Time Transcription Available | ||
slug: /voice/webhooks/realTimeTranscriptionAvailable | ||
description: A general overview of Bandwidth's Real-Time Transcription Available Webhook | ||
keywords: | ||
- bandwidth | ||
- voice | ||
- webhook | ||
- transcribing | ||
- available | ||
hide_title: false | ||
image: ../../static/img/bandwidth-logo.png | ||
--- | ||
|
||
This event may be sent to the URL specified when sending a [`<StartTranscription>`][1] verb.
|
||
## Request Parameters | ||
|
||
| Property | Description |
|:------------------------|:-------------|
| accountId | The user account associated with the call. |
| answerTime | Time the call was answered, in ISO 8601 format. |
| applicationId | The id of the application associated with the call. |
| callId | The call id associated with the event. |
| callUrl | The URL of the call associated with the event. |
| direction | The direction of the call. Either `inbound` or `outbound`. The direction of a call never changes. |
| enqueuedTime | (optional) If [call queueing](/apis/voice#operation/createCall/) is enabled and this is an outbound call, this is the time the call was queued, in ISO 8601 format. Otherwise, this is omitted. |
| eventTime | The approximate UTC date and time when the event was generated by the Bandwidth server, in ISO 8601 format. This may not be exactly the time of event execution. |
| eventType | The event type, value is `realTimeTranscriptionAvailable`. |
| from | The provided identifier of the caller: can be a phone number in E.164 format (e.g. +15555555555) or one of `Private`, `Restricted`, `Unavailable`, or `Anonymous`. |
| realTimeTranscription | Details about the transcription. |
| realTimeTranscription.id | The unique id of the transcription. |
| realTimeTranscription.name | The name of this transcription. If the `name` attribute was specified in the [`StartTranscription`][1] verb, then this will be the value of that attribute, otherwise it will default to the transcription id. |
| realTimeTranscription.startTime | The approximate UTC date and time the transcription was started. |
| realTimeTranscription.tracks | The tracks of the call that were transcribed. Values will be one or both of `inbound` and `outbound`. |
| realTimeTranscription.status | The status of the transcription. Can be either `available`, meaning that the transcription is ready for downloading, or `failed` otherwise. |
| realTimeTranscription.url | The URL of the transcription. |
| realTimeTranscription.completedTime | The time at which the transcription was completed and ready for download. |
| realTimeTranscription.destination | (optional) The destination URL to which transcription updates were sent. |
| realTimeTranscription.stabilized | (optional) Whether to send transcription update events to the specified `destination` only after they have become stable. Requires `destination`. |
| startTime | Time the call was started, in ISO 8601 format. |
| to | The phone number that received the call, in E.164 format (e.g. +15555555555). |
| tag | (optional) The `tag` specified on call creation. If no `tag` was specified or it was previously cleared, this field will not be present. |
|
||
## Expected Response | ||
|
||
```http | ||
HTTP/1.1 204 | ||
``` | ||
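
A receiving endpoint only needs to acknowledge the event with an empty `204` response. The sketch below shows one way to do that with Python's standard `http.server` module. `process_event` and `WebhookHandler` are hypothetical names, the logging is purely illustrative, and a production endpoint would also want authentication and error handling, which are omitted here.

```python
import json
from http.server import BaseHTTPRequestHandler


def process_event(body: bytes) -> int:
    """Inspect a webhook body and return the HTTP status code to reply with."""
    event = json.loads(body)
    if event.get("eventType") == "realTimeTranscriptionAvailable":
        transcription = event["realTimeTranscription"]
        if transcription["status"] == "available":
            print(f"transcription {transcription['id']} ready at {transcription['url']}")
        else:
            print(f"transcription {transcription['id']} failed")
    return 204  # any BXML in the response would be ignored anyway


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status = process_event(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()
```

To try it locally, the handler could be served with `HTTPServer(("", 8080), WebhookHandler).serve_forever()`, where the port is arbitrary.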
|
||
## Examples | ||
|
||
### Real-Time Transcription Available event with destination | ||
|
||
```json
POST http://myapp.example/realTimeTranscriptionEvents
Content-Type: application/json

{
  "accountId" : "55555555",
  "answerTime" : "2022-06-30T18:55:02.080Z",
  "applicationId" : "7fc9698a-b04a-468b-9e8f-91238c0d0086",
  "callId" : "c-95ac912f-68aacdd7-4a8e-4223-a7fd-020e02fa6bf2",
  "callUrl" : "https://voice.bandwidth.com/api/v2/accounts/55555555/calls/c-95ac912f-68aacdd7-4a8e-4223-a7fd-020e02fa6bf2",
  "direction" : "outbound",
  "enqueuedTime" : "2022-06-30T18:54:59.172Z",
  "eventTime" : "2022-06-30T18:55:02.489Z",
  "eventType" : "realTimeTranscriptionAvailable",
  "from" : "+15551112222",
  "realTimeTranscription" : {
    "id" : "t-95ac90b3-bfc81595-35fc-4b64-8265-fab6855b74a2",
    "name" : "example_transcription",
    "startTime" : "2022-06-30T18:55:02.489Z",
    "tracks" : ["inbound", "outbound"],
    "destination" : "wss://websocket.myapp.example",
    "stabilized" : "true",
    "status" : "available",
    "url" : "https://voice.bandwidth.com/api/v2/accounts/55555555/calls/c-95ac912f-68aacdd7-4a8e-4223-a7fd-020e02fa6bf2/transcriptions/t-95ac90b3-bfc81595-35fc-4b64-8265-fab6855b74a2",
    "completedTime" : "2022-06-30T18:55:02.489Z"
  },
  "startTime" : "2022-06-30T18:54:59.175Z",
  "to" : "+15553334444"
}
```
|
||
[1]: /docs/voice/bxml/startTranscription | ||
[2]: /docs/voice/bxml/startTranscription |
Small, but `transcription` should probably be here.
Same for the other places, I guess.
It has "transcribing"