using pronunciation assessment - rest api problem #680

sathayen · 2020-06-09T18:39:27Z

Hello team -- this is related to issue #618.

I am trying to use the pronunciation assessment REST API and am getting an unsupported audio format error, even though the audio is a WAV file with a sample rate of 16000 Hz. Here is my curl command (line breaks added for easier reading):

curl -X POST "https://centralindia.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US" \
  -H "accept: application/json" \
  -H "Ocp-Apim-Subscription-Key: MY_SUBSCRIPTION_KEY" \
  -H "Content-Type: audio/wav; codecs=audio/pcm; samplerate=16000" \
  -H "Pronunciation-Assessment: eyJSZWZlcmVuY2VUZXh0IjoiemVybyIsIkdyYWRpbmdTeXN0ZW0iOiJIdW5kcmVkTWFyayIsIkdyYW51bGFyaXR5IjoiRnVsbFRleHQiLCJEaW1lbnNpb24iOiJDb21wcmVoZW5zaXZlIn0=" \
  -d "{ \"recordingsUrl\": \"https://MYBLOBSTORAGE.blob.core.windows.net/MYCONTAINER/MY_AUDIO.wav\", \"locale\": \"en-US\", \"name\": \"Transcription using locale en-US\"}"

I have uploaded the audio file as a blob to my Blob Storage account on Azure. I have also verified separately that I can create a transcription for this file.
The Pronunciation-Assessment header value is the base64 encoding of this JSON: {"ReferenceText": "zero","GradingSystem": "HundredMark","Granularity":"FullText","Dimension": "Comprehensive"}. I converted this JSON directly to base64 using this tool.
Am I doing anything wrong in the command above? Could you please post sample code that shows how to feed the audio file into the curl command (if it differs from how I am sending the data)?
If there is nothing wrong with this command, can you please share an audio example that works (so that I can create similar audio files)?
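For readers who would rather not depend on an external conversion tool, the header value can be produced locally. The following is a minimal Python sketch, assuming the four parameters quoted above are the only ones needed; the function names `build_pronunciation_header` and `decode_pronunciation_header` are hypothetical helpers, not part of any SDK.

```python
import base64
import json


def build_pronunciation_header(reference_text: str) -> str:
    """Return a base64 string for the Pronunciation-Assessment header.

    The parameter names and values mirror the JSON quoted in this thread.
    """
    params = {
        "ReferenceText": reference_text,
        "GradingSystem": "HundredMark",
        "Granularity": "FullText",
        "Dimension": "Comprehensive",
    }
    # Compact separators produce JSON without spaces between tokens.
    raw = json.dumps(params, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_pronunciation_header(value: str) -> dict:
    """Decode a header value back to a dict, useful for checking an existing value."""
    return json.loads(base64.b64decode(value))


header_value = build_pronunciation_header("zero")
print(header_value)
print(decode_pronunciation_header(header_value))
```

Decoding an existing header value with `decode_pronunciation_header` is a quick way to confirm that the header itself is well formed before looking for problems elsewhere in the request.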

#618
@ram-msft -- see also this issue


pankopon · 2020-06-10T00:05:27Z

@ram-msft or @yinhew Please comment.

yinhew · 2020-06-10T01:44:42Z

@sathayen does your WAV file have a RIFF header? The REST API requires one.
If it does have a RIFF header, would you mind sharing the file so that we can look at the format in detail?
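One way to answer the RIFF header question locally is to inspect the first bytes of the file. The sketch below uses only the Python standard library; `describe_wav` and `make_test_wav` are hypothetical helper names, and the check covers only the RIFF/WAVE magic bytes plus the basic PCM format fields that the `wave` module can read.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Report whether the bytes start with a RIFF/WAVE header and, if so, the basic format."""
    has_riff = len(data) >= 12 and data[0:4] == b"RIFF" and data[8:12] == b"WAVE"
    info = {"has_riff_header": has_riff}
    if has_riff:
        # wave.open raises wave.Error for files it cannot parse (for example non-PCM data).
        with wave.open(io.BytesIO(data), "rb") as wav:
            info["sample_rate"] = wav.getframerate()
            info["channels"] = wav.getnchannels()
            info["bits_per_sample"] = wav.getsampwidth() * 8
    return info


def make_test_wav(sample_rate: int = 16000, n_samples: int = 160) -> bytes:
    """Build a tiny silent 16-bit mono WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_samples)
    return buf.getvalue()


print(describe_wav(make_test_wav()))
print(describe_wav(b"\x00\x00" * 160))  # raw PCM samples without a header
```

For a file on disk, passing `open(path, "rb").read()` to `describe_wav` gives the same report.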

sathayen · 2020-06-10T03:57:50Z

0_jackson_0_16000.wav.zip

@yinhew please find the WAV file attached. The transcription is "zero". I have independently run the STT service against a different endpoint (without pronunciation assessment) and it works as expected. This file is from this open-source repo. You can also try any other file in this repo. The only change I made was to resample the WAV file to a sample rate of 16000 Hz (using Python's soundfile package).

Can you please share a sample WAV file that actually works (including the reference text)? Also, does the command in my initial comment look OK to you?

yinhew · 2020-06-10T04:22:40Z

@sathayen I tried the command below (without Pronunciation-Assessment) and got the same error:

curl -X POST "https://centralindia.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US" \
  -H "accept: application/json" \
  -H "Ocp-Apim-Subscription-Key: MY_SUBSCRIPTION_KEY" \
  -H "Content-Type: audio/wav; codecs=audio/pcm; samplerate=16000" \
  -d "{ \"recordingsUrl\": \"https://MYBLOBSTORAGE.blob.core.windows.net/MYCONTAINER/MY_AUDIO.wav\", \"locale\": \"en-US\", \"name\": \"Transcription using locale en-US\"}"

The problem is with the "-d" parameter: the request body must be the binary audio data, not JSON text.
It worked on my side after replacing the "-d" parameter with the following:
--data-binary @./0_jackson_0_16000.wav
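The same fix can be sketched in Python with the standard library. This is an illustration rather than official sample code: `build_assessment_request` is a hypothetical helper, the URL and headers are copied from the curl commands in this thread, and the `format=detailed` query parameter is taken from the later command in this thread. The point is that the request body is the raw WAV bytes, not a JSON document.

```python
import urllib.request


def build_assessment_request(region, subscription_key, audio_bytes, pronunciation_header=None):
    """Build (but do not send) a POST request whose body is the raw audio."""
    url = (
        f"https://{region}.stt.speech.microsoft.com/speech/recognition/"
        "conversation/cognitiveservices/v1?language=en-US&format=detailed"
    )
    headers = {
        "Accept": "application/json",
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
    }
    if pronunciation_header:
        headers["Pronunciation-Assessment"] = pronunciation_header
    # data must be bytes: the WAV file contents, including its RIFF header.
    return urllib.request.Request(url, data=audio_bytes, headers=headers, method="POST")


# Sending is left to the caller, for example:
#   with open("0_jackson_0_16000.wav", "rb") as f:
#       req = build_assessment_request("centralindia", "MY_SUBSCRIPTION_KEY", f.read())
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the headers and body before any network call is made.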

sathayen · 2020-06-10T05:06:19Z

Hi @yinhew, thanks for the quick response. I replaced the -d parameter with --data-binary exactly as you suggested and ran the curl command from the directory that contains the WAV file. Now it returns the message "Client disconnected with IOException".

Am I missing anything here?

The following is my full command (only removed the token), executed from the path where I have this audio file:

curl -X POST "https://centralindia.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US&format=detailed" -H "accept: application/json" -H "Ocp-Apim-Subscription-Key: XXXXX" -H "Content-Type: audio/wav; codecs=audio/pcm; samplerate=16000" -H "Pronunciation-Assessment: eyJSZWZlcmVuY2VUZXh0IjoiemVybyIsIkdyYWRpbmdTeXN0ZW0iOiJIdW5kcmVkTWFyayIsIkdyYW51bGFyaXR5IjoiRnVsbFRleHQiLCJEaW1lbnNpb24iOiJDb21wcmVoZW5zaXZlIn0=" --data-binary @./0_jackson_0_16000.wav

sathayen · 2020-06-10T06:49:28Z

@yinhew, please disregard my last comment. This appears to be a firewall/proxy issue. I was able to successfully execute this from another server.

Related questions:

1. How do I specify a remote blob URL as the binary data? As far as I understand, curl cannot read the request body from a remote URL. I do not want to download the file locally and then run curl. Basically, I am looking for a seamless solution like the Azure transcription service, which can read directly from the blob.
2. When will this be available in the Python SDK? We are looking at rolling out this service in a production environment and might be making hundreds (if not thousands) of REST API calls on an average day.

Thanks!

yinhew · 2020-06-10T07:00:17Z

@sathayen we have Python sample code here:
https://github.com/Azure-Samples/Cognitive-Speech-TTS/tree/master/PronunciationAssessment/Python

For remote blobs, I don't think our API supports that, and neither does the STT API.
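Since the endpoint expects the audio bytes in the request body, one possible workaround is to read the blob into memory and forward those bytes, without writing a local file. The sketch below is an assumption, not something confirmed in this thread: `fetch_audio_bytes` is a hypothetical helper, and it presumes the blob URL is directly readable over HTTPS (for example a public blob or a URL carrying its own access token).

```python
import urllib.request


def fetch_audio_bytes(blob_url: str, timeout: float = 30.0) -> bytes:
    """Read the whole remote file into memory and return its bytes."""
    with urllib.request.urlopen(blob_url, timeout=timeout) as response:
        return response.read()


# Hypothetical usage; the URL placeholder is the one used earlier in this thread:
#   audio = fetch_audio_bytes(
#       "https://MYBLOBSTORAGE.blob.core.windows.net/MYCONTAINER/MY_AUDIO.wav"
#   )
#   # "audio" can then be used as the binary body of the POST request.
```

This keeps everything in memory, which is reasonable for short utterances; very large recordings would need a streaming approach instead.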

sathayen · 2020-06-10T08:07:02Z

@yinhew , thanks for the Python sample. I have actually developed something similar, but this is useful!

> For remote blobs, I don't think our API supports that, and neither does the STT API.

FYI, the STT batch transcription API does support transcription of a remote blob. I just need to provide the following in the POST request (there are, of course, some post-processing steps after the response comes back). I have successfully used this to get the transcription.

`data = {
    "recordingsUrl": REMOTE_BLOB_URL,
    "locale": "en-US",
    "name": "Transcription using locale en-US",
    "properties": {
        "AddWordLevelTimestamps": "True",
        "AddDiarization": "True"
    }
}`

tuanle07 · 2021-01-01T15:37:37Z

@sathayen: Did you manage to get the pronunciation assessment working by passing the remote blob URL? I tried, but it didn't work for me. :(

pankopon added batch Issues related to batch transcription/synthesis Rest API and removed batch Issues related to batch transcription/synthesis labels Jun 9, 2020

sathayen closed this as completed Jun 10, 2020
