This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Sample no longer works with Custom Speech service after //BUILD 2018 product updates #81

Open
mikebranstein opened this issue May 18, 2018 · 20 comments

Comments

@mikebranstein

mikebranstein commented May 18, 2018

The JavaScript SDK only works with Bing Speech API endpoints. Custom Speech endpoints need to be supported. PR incoming.

@mikebranstein
Author

PR #82 submitted.

@mageshpurpleslate

Hi,

I have used the sample and changed the URI to wss://westus.stt.speech.microsoft.com in the speechConnectionFactory.js. I kept getting error 403 Forbidden.

May I know what should be the URI?

Thanks in Advance.

@mikebranstein
Author

@mageshpurpleslate it depends. If you're using the Bing Speech service, nothing needs to change. If you're going to use the Custom Speech Service, you need to append an endpoint Id to the URI. Check out PR #82 for the details on everything that needs to change.

@mageshpurpleslate

Mike,

Thanks a lot for your help. Works like a charm. Pretty nicely done.

Is it acceptable to send the API subscription key as a query parameter? Are you planning to make any changes to that?

Regards,

Magesh

@mikebranstein
Author

@mageshpurpleslate I'm glad this worked for you - I recall starting out with Bing and Custom Speech ~ a year ago and the samples were pretty rough.

There are two ways to authenticate to the speech services with WebSockets. The first is to pass the subscription key in the query string. That is acceptable because the connection runs over TLS (WSS), so the key is encrypted in transit. The second is to pre-authenticate with an HTTP POST to the Cognitive Services secure token service. This returns a bearer token that is added to the WebSocket connection's Authorization header. Docs on how to do this are here.
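
To make the difference between the two options concrete, here is a minimal sketch of where the credential ends up in each case. The helper name `buildConnection` and the shape of its `auth` argument are hypothetical and not part of the SDK; the `Ocp-Apim-Subscription-Key` parameter name is taken from the URLs posted in this thread.

```javascript
// Hypothetical helper (not part of the SDK): returns the URL and headers
// a client would use for each of the two authentication styles.
function buildConnection(baseUrl, auth) {
  const url = new URL(baseUrl);
  const headers = {};

  if (auth.kind === "subscriptionKey") {
    // Option 1: the subscription key travels in the query string.
    url.searchParams.set("Ocp-Apim-Subscription-Key", auth.key);
  } else if (auth.kind === "bearerToken") {
    // Option 2: a token fetched beforehand goes in the Authorization header.
    headers["Authorization"] = `Bearer ${auth.token}`;
  } else {
    throw new Error(`Unknown auth kind: ${auth.kind}`);
  }

  return { url: url.toString(), headers };
}
```

With option 1 the key becomes part of the URL, so it may show up in places that record URLs; with option 2 only the short-lived token is sent on the socket connection.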

@mraguraman3

@mikebranstein - I tried the Custom Speech implementation with your proposed code changes, but I am getting a "403 Forbidden" error on the WSS call. The path I copied from the F12 dev tools looks like this:

wss://westus.stt.speech.microsoft.com/speech/recognition/interactive/cognitiveservices/v1?cid=https://westus.api.cognitive.microsoft.com/sts/v1.0&format=simple&language=en-US&Ocp-Apim-Subscription-Key=<...... key......>&X-ConnectionId=<..... connection id .... >

Is this a valid path? Have you ever hit a 403 error during your testing?

@mikebranstein
Author

mikebranstein commented Jul 9, 2018

@mraguraman3 I believe you are placing the entire endpoint URL in the "Custom Speech Endpoint ID" textbox. Instead, use the Endpoint ID, which is a GUID. You can find the endpoint ID on the custom speech portal.
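
As an illustration of the difference, here is a minimal sketch that builds the WSS URL from an endpoint ID and rejects anything that is not a GUID, such as a full endpoint URL. The function name `buildCustomSpeechUrl`, its options, and the hyphenated 8-4-4-4-12 GUID pattern are assumptions for illustration; the host, path, and query parameters follow the URL posted above.

```javascript
// Assumption: the endpoint ID is a GUID in hyphenated 8-4-4-4-12 form.
const GUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: builds the Custom Speech WebSocket URL from an endpoint ID.
function buildCustomSpeechUrl(endpointId, options = {}) {
  const {
    region = "westus",
    mode = "interactive",
    language = "en-US",
    format = "simple",
  } = options;

  if (!GUID_PATTERN.test(endpointId)) {
    throw new Error("Expected the endpoint ID (a GUID), not a full endpoint URL.");
  }

  const url = new URL(
    `wss://${region}.stt.speech.microsoft.com/speech/recognition/${mode}/cognitiveservices/v1`
  );
  url.searchParams.set("cid", endpointId); // cid carries the endpoint ID
  url.searchParams.set("format", format);
  url.searchParams.set("language", language);
  return url.toString();
}
```

Passing a value such as `https://westus.api.cognitive.microsoft.com/sts/v1.0` as the endpoint ID throws here instead of producing a URL like the one that returned 403.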

@mraguraman3

Thanks a lot @mikebranstein, it's working now after entering the endpoint ID.

You also mentioned that we can use token-based authentication. In that case, do we not need to pass the endpoint ID in the HTTP POST header? Is the subscription key alone enough to generate the token?

@mikebranstein
Author

@mraguraman3 - yes, token-based auth is also available. I did not use token-based auth because the original solution used the query string parameter auth. I wanted to augment the solution in a specific way for this PR. A different PR would be necessary to change the auth.

@mraguraman3

mraguraman3 commented Jul 12, 2018

Thanks @mikebranstein .

Anyway, I can confirm token-based auth is not working with the Custom Speech implementation.

Though the token is generated using the subscription key, I am getting 401 Unauthorized when connecting to the WebSocket. Below is the WSS call format:

wss://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?cid=#endpointID#&format=simple&language=en-US&Authorization=#token#

@mikebranstein
Author

@mraguraman3 token based auth does work, but it's tricky. I have a C# SDK I had to roll for the Custom Speech Service websocket speech protocol before Microsoft released their own.

@mraguraman3

mraguraman3 commented Jul 13, 2018

Great @mikebranstein, but I am using a JavaScript Node app to generate the token. Anyway, all I want to know is whether this is a valid WSS call format, or am I missing something?

wss://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?cid=#endpointID#&format=simple&language=en-US&Authorization=#token#

@mikebranstein
Author

@mraguraman3 the C# SDK was released after //BUILD this year. You can find it on NuGet: https://www.nuget.org/packages/Microsoft.CognitiveServices.Speech/.

@mraguraman3

Thanks @mikebranstein, but I don't think this SDK has an option to provide the endpoint ID for Custom Speech.

Only EndpointURL is supported, which I believe is the actual HTTP host for the speech service. Here is the documentation of the supported properties in the C# SDK:

https://docs.microsoft.com/en-gb/dotnet/api/microsoft.cognitiveservices.speech.speechfactory?view=azure-dotnet

Do you have any plans to support token auth in "SpeechToText-WebSockets-Javascript" for Custom Speech?

@mikebranstein
Author

@mraguraman3 from what I understand, the EndpointURL property is part of the URL. So, the EndpointURL for custom speech could be wss://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?cid=#endpointID#.

Underneath, the SDK streams audio using the Speech Service WebSocket protocol, outlined here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech/api-reference-rest/websocketprotocol.

If you were going to implement the speech protocol yourself, you'd have to request an auth token using your subscription key (like the code snippet below).

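// Requires: using System; using System.Net.Http; using System.Threading.Tasks;
// Exchanges the subscription key for a bearer token from the token service.
private async Task<string> FetchToken()
{
  using (var client = new HttpClient())
  {
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "<Subscription Key>");
    UriBuilder uriBuilder = new UriBuilder("https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken");

    // POST with an empty body; the response body is the token itself.
    var result = await client.PostAsync(uriBuilder.Uri.AbsoluteUri, null);
    result.EnsureSuccessStatusCode(); // fail early on 401/403 instead of returning an error body as a token
    return await result.Content.ReadAsStringAsync();
  }
}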

When you have that token, you can use a ClientWebSocket and set the Authorization bearer token on the web socket connection. Assuming _cws is the client web socket:

var authToken = await FetchToken();
_cws.Options.SetRequestHeader("Authorization", $"Bearer {authToken}");

Having reviewed the JavaScript SDK, I can see that it supports auth token connections. The sample HTML does not use them, but you can modify the sample code slightly to take advantage of the auth token approach. See the flag referenced in the sample code. I believe you can change its value to true and your solution would use the auth token approach.
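
For a Node app, the same token request can be sketched in JavaScript. This is a rough equivalent of the C# snippet above, not code from the SDK: the names `buildTokenRequest`, `fetchToken`, and `bearerHeader` are hypothetical, and it assumes a runtime with a global `fetch` (Node 18 or later), or a compatible function passed in as `fetchImpl`.

```javascript
// Hypothetical helper: describes the POST to the token service used in this thread.
function buildTokenRequest(subscriptionKey, region = "westus") {
  return {
    url: `https://${region}.api.cognitive.microsoft.com/sts/v1.0/issueToken`,
    init: {
      method: "POST",
      headers: { "Ocp-Apim-Subscription-Key": subscriptionKey },
    },
  };
}

// Fetches the token; fetchImpl is injectable so the call can be stubbed in tests.
async function fetchToken(subscriptionKey, options = {}) {
  const { region = "westus", fetchImpl = globalThis.fetch } = options;
  const { url, init } = buildTokenRequest(subscriptionKey, region);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error(`Token request failed with status ${response.status}`);
  }
  return response.text(); // the response body is the token
}

// Builds the header to set on the WebSocket connection request.
function bearerHeader(token) {
  return { Authorization: `Bearer ${token}` };
}
```

The token goes in the `Authorization` header of the connection request, as in the C# example. The browser `WebSocket` constructor does not accept custom headers, so this header approach applies to clients that can set them, such as a Node or C# client.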

@mageshpurpleslate

Hi @mikebranstein, I am here to check one more item with you. Is there a way to save the audio clip while it is also being sent for recognition? We are trying to save it for auditing purposes.

@mikebranstein
Author

@mageshpurpleslate there's no native SDK way of doing this (to my knowledge), so you'd have to write the code to do this. For example, you could write a middle layer that collects the audio from a microphone, then funnels it to your desired location, then writes the same stream to the Speech SDK. If you don't want to do that client-side with JavaScript, then you could host your own WebSocket app that uses the C# Speech SDK. Your websocket app would act as the middle layer, intercepting the audio stream. I have a solution that does this that is hosted as a Service Fabric Web Socket app in Azure.
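
A minimal sketch of the client-side middle layer idea, assuming audio arrives as byte chunks. The names `createAuditingTee` and `concatChunks` are hypothetical, and `forward` stands in for whatever hands the chunk to the speech client; none of this is SDK API.

```javascript
// Hypothetical tee: records each audio chunk for auditing, then passes it on.
function createAuditingTee(auditSink, forward) {
  return function onAudioChunk(chunk) {
    // Copy the bytes so later reuse of the caller's buffer cannot change the audit record.
    auditSink.push(Uint8Array.from(chunk));
    forward(chunk);
  };
}

// Joins the recorded chunks into one buffer, e.g. before writing them to storage.
function concatChunks(chunks) {
  const total = chunks.reduce((sum, c) => sum + c.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}
```

For example, `const onChunk = createAuditingTee(recorded, sendToRecognizer)` would be called once per captured chunk, where `recorded` is an array and `sendToRecognizer` is your own function. Where the audit copy is stored, and in what audio format, is left open here.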

@mikebranstein
Author

@mageshpurpleslate After thinking for a few more minutes, the C# SDK has a custom audio source/stream you can create. You could create one that audits the audio bytes as they are being fed to the service via the SDK.

@mageshpurpleslate

@mikebranstein thank you. This would help. I will try it out.

@hellowonders

wss://westus.stt.speech.microsoft.com works with the latest Speech API.
