using pronunciation assessment - rest api problem #680

Closed
sathayen opened this issue Jun 9, 2020 · 9 comments

sathayen commented Jun 9, 2020

Hello team, this is related to issue #618.

I am trying to use the pronunciation assessment REST API and am getting an unsupported audio format error, even though the audio is a WAV file with a sample rate of 16000 Hz. Here is my curl command (newlines added for easier reading):

curl -X POST "https://centralindia.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US" \
  -H "accept: application/json" \
  -H "Ocp-Apim-Subscription-Key: MY_SUBSCRIPTION_KEY" \
  -H "Content-Type: audio/wav; codecs=audio/pcm; samplerate=16000" \
  -H "Pronunciation-Assessment: eyJSZWZlcmVuY2VUZXh0IjoiemVybyIsIkdyYWRpbmdTeXN0ZW0iOiJIdW5kcmVkTWFyayIsIkdyYW51bGFyaXR5IjoiRnVsbFRleHQiLCJEaW1lbnNpb24iOiJDb21wcmVoZW5zaXZlIn0=" \
  -d "{ \"recordingsUrl\": \"https://MYBLOBSTORAGE.blob.core.windows.net/MYCONTAINER/MY_AUDIO.wav\", \"locale\": \"en-US\", \"name\": \"Transcription using locale en-US\"}"

  1. I have uploaded the audio file as a blob to my Azure Blob Storage account, and I have verified separately that I can create a transcription for this file.
  2. The Pronunciation-Assessment header is the base64 encoding of this JSON: {"ReferenceText": "zero","GradingSystem": "HundredMark","Granularity":"FullText","Dimension": "Comprehensive"}. I converted this JSON directly to base64 using a base64 conversion tool.
  3. Am I doing anything wrong in the code above? Could you please post sample code that shows how to feed the audio file into the curl command (if it differs from how I am sending the data)?
  4. If there is nothing wrong with this code, could you please share an audio example that works (so that I can create similar audio files)?

@ram-msft, see also this issue: #618
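Item 2 above can be sketched in Python: the header value is the compact JSON, UTF-8 encoded, then base64-encoded. The helper name `assessment_header` and its defaults are illustrative assumptions, not part of any Azure SDK.

```python
import base64
import json


def assessment_header(reference_text: str,
                      grading_system: str = "HundredMark",
                      granularity: str = "FullText",
                      dimension: str = "Comprehensive") -> str:
    """Build a Pronunciation-Assessment header value (hypothetical helper).

    The JSON is serialized without extra whitespace, encoded as UTF-8,
    and then base64-encoded.
    """
    params = {
        "ReferenceText": reference_text,
        "GradingSystem": grading_system,
        "Granularity": granularity,
        "Dimension": dimension,
    }
    # separators=(",", ":") drops the spaces json.dumps inserts by default
    raw = json.dumps(params, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


print(assessment_header("zero"))
```

For the reference text "zero" this yields the same value as the header in the curl command above. With the default `json.dumps` separators the JSON would contain spaces, so the base64 string would differ even though it decodes to equivalent JSON.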

pankopon added the Rest API label on Jun 9, 2020 (the batch label, "Issues related to batch transcription/synthesis", was added and then removed).
pankopon (Contributor) commented

@ram-msft or @yinhew, please comment.

yinhew (Contributor) commented Jun 10, 2020

@sathayen, does your WAV file have a RIFF header? The REST API requires the RIFF header.
If it does have a RIFF header, would you mind sharing the file so that we can look into the details of its format?
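One way to check this locally is to inspect the file with Python's standard `wave` module. This is a minimal sketch that assumes the target format is 16 kHz, mono, 16-bit PCM WAV (consistent with the `Content-Type` header used in this thread); `check_wav` is a hypothetical helper name.

```python
import io
import wave


def check_wav(data: bytes) -> list[str]:
    """Return a list of format problems; an empty list means the file looks usable.

    Assumption: the target is 16 kHz, mono, 16-bit PCM WAV with a RIFF header.
    """
    # A RIFF/WAVE file starts with b"RIFF", a 4-byte little-endian size, then b"WAVE".
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        return ["missing RIFF/WAVE header"]
    try:
        with wave.open(io.BytesIO(data), "rb") as w:
            rate, channels, width = w.getframerate(), w.getnchannels(), w.getsampwidth()
    except (wave.Error, EOFError) as exc:
        # The wave module only reads integer PCM, so e.g. float WAVs end up here.
        return [f"not readable as PCM WAV: {exc}"]
    problems = []
    if rate != 16000:
        problems.append(f"sample rate is {rate} Hz, expected 16000")
    if channels != 1:
        problems.append(f"{channels} channels, expected 1 (mono)")
    if width != 2:
        problems.append(f"{8 * width}-bit samples, expected 16-bit")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python <this script> file1.wav [file2.wav ...]
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, check_wav(f.read()) or "OK")
```

A file saved as raw PCM (no header) or with floating-point samples would be flagged before it ever reaches the service.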

sathayen (Author) commented

0_jackson_0_16000.wav.zip

@yinhew, please find the WAV file attached; its transcription is "zero". I have independently run the STT service on it through a different endpoint (without pronunciation assessment), and it works as expected. This file comes from an open-source repo, and you can also try any other file in that repo. The only change I made was resampling it to 16000 Hz (using Python's soundfile package).

Could you please share a sample WAV file that actually works (including its reference text)? Also, does the code in my initial comment look OK to you?

yinhew (Contributor) commented Jun 10, 2020

@sathayen, I tried the command below (without the Pronunciation-Assessment header) and got the same error:

curl -X POST "https://centralindia.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US" \
  -H "accept: application/json" \
  -H "Ocp-Apim-Subscription-Key: MY_SUBSCRIPTION_KEY" \
  -H "Content-Type: audio/wav; codecs=audio/pcm; samplerate=16000" \
  -d "{ \"recordingsUrl\": \"https://MYBLOBSTORAGE.blob.core.windows.net/MYCONTAINER/MY_AUDIO.wav\", \"locale\": \"en-US\", \"name\": \"Transcription using locale en-US\"}"

The problem is the "-d" parameter: the request body should be the binary audio, not JSON text.
It worked on my side after replacing the "-d" parameter with:
--data-binary @./0_jackson_0_16000.wav
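The corrected call can be sketched with only the Python standard library, sending the WAV bytes as the request body just as `--data-binary @file` does. The endpoint and headers are copied from the curl commands in this thread; the function names are hypothetical, and the subscription key and header value are placeholders you supply.

```python
import urllib.request

# Endpoint and query string taken from the curl commands in this thread;
# adjust the region and language for your own resource.
ENDPOINT = ("https://centralindia.stt.speech.microsoft.com/speech/recognition/"
            "conversation/cognitiveservices/v1?language=en-US&format=detailed")


def build_request(wav_bytes: bytes, subscription_key: str,
                  assessment_header: str) -> urllib.request.Request:
    """Build the POST request; the body is the raw WAV bytes, not JSON."""
    headers = {
        "Accept": "application/json",
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Pronunciation-Assessment": assessment_header,
    }
    return urllib.request.Request(ENDPOINT, data=wav_bytes,
                                  headers=headers, method="POST")


def assess(path: str, subscription_key: str, assessment_header: str) -> str:
    """Send a local WAV file (the equivalent of curl --data-binary @path)."""
    with open(path, "rb") as f:
        req = build_request(f.read(), subscription_key, assessment_header)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

`urllib` stores header names via `str.capitalize()` (e.g. `Content-type`); HTTP header names are case-insensitive, so this does not change what the server sees.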

sathayen (Author) commented

Hi @yinhew, thanks for the quick response. I replaced the -d parameter with --data-binary exactly as you suggested and ran the curl command from the directory that contains the WAV file, but now it returns the message "Client disconnected with IOException".

Am I missing anything here?

Here is my full command (with only the subscription key removed), run from the directory that contains the audio file:

curl -X POST "https://centralindia.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US&format=detailed" \
  -H "accept: application/json" \
  -H "Ocp-Apim-Subscription-Key: XXXXX" \
  -H "Content-Type: audio/wav; codecs=audio/pcm; samplerate=16000" \
  -H "Pronunciation-Assessment: eyJSZWZlcmVuY2VUZXh0IjoiemVybyIsIkdyYWRpbmdTeXN0ZW0iOiJIdW5kcmVkTWFyayIsIkdyYW51bGFyaXR5IjoiRnVsbFRleHQiLCJEaW1lbnNpb24iOiJDb21wcmVoZW5zaXZlIn0=" \
  --data-binary @./0_jackson_0_16000.wav
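The command above requests `format=detailed`. Below is a hedged sketch of pulling the scores out of a successful response. It assumes the JSON contains `RecognitionStatus` and an `NBest` list whose first entry carries `AccuracyScore`, `FluencyScore`, `CompletenessScore` and `PronScore`, possibly nested under a `PronunciationAssessment` object; check these field names against an actual response before relying on them. `pronunciation_scores` is a hypothetical helper.

```python
import json


def pronunciation_scores(response_text: str) -> dict:
    """Extract pronunciation scores from a format=detailed response.

    Field names are assumptions: scores are looked up on the first NBest
    entry, or under a nested "PronunciationAssessment" object if present.
    """
    body = json.loads(response_text)
    status = body.get("RecognitionStatus")
    if status != "Success":
        raise ValueError(f"recognition did not succeed: {status}")
    best = body["NBest"][0]
    # Fall back to the NBest entry itself when there is no nested object.
    scores = best.get("PronunciationAssessment", best)
    keys = ("AccuracyScore", "FluencyScore", "CompletenessScore", "PronScore")
    # Only return the keys that are actually present.
    return {k: scores[k] for k in keys if k in scores}
```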

sathayen (Author) commented

@yinhew, please disregard my last comment. This appears to be a firewall/proxy issue. I was able to successfully execute this from another server.

Related questions:

  1. How do I pass a remote blob URL as the binary data? As far as I understand, cURL cannot send a remote file with --data-binary. I do not want to download the file locally first and then run curl; I am looking for a seamless solution like the Azure batch transcription service, which can read directly from the blob.
  2. When will this be available in the Python SDK? We are planning to roll out this service in a production environment and might make hundreds (if not thousands) of REST API calls on an average day.

Thanks!
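On question 1: the short-audio endpoint expects the audio in the request body, so one possible workaround is to have the client read the blob over HTTP and forward the bytes without saving them to disk. This is a sketch under assumptions: the blob URL is readable without extra authentication (for example, a SAS URL), and the speech endpoint accepts a chunked request body. `iter_chunks` and `relay_blob` are hypothetical names, and `headers` would be the same headers used in the curl command.

```python
import urllib.request
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield successive chunks read from a file-like object until EOF."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def relay_blob(blob_url: str, speech_url: str, headers: dict) -> str:
    """Stream a blob straight into the speech request without writing it to disk.

    Assumption: blob_url is readable without extra auth (e.g. a SAS URL).
    Because the body is an iterable of unknown length, urllib sends it
    with Transfer-Encoding: chunked.
    """
    with urllib.request.urlopen(blob_url) as blob:
        req = urllib.request.Request(speech_url, data=iter_chunks(blob),
                                     headers=headers, method="POST")
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode("utf-8")
```

The audio still flows through the client; this only avoids the intermediate local file. It does not make the service itself read from the blob.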

yinhew (Contributor) commented Jun 10, 2020

@sathayen, we have Python sample code here:
https://github.com/Azure-Samples/Cognitive-Speech-TTS/tree/master/PronunciationAssessment/Python

For remote blobs, I don't think our API supports that, and neither does the STT API.

sathayen (Author) commented

@yinhew, thanks for the Python sample. I had actually developed something similar, but this is useful!

> For remote blobs, I don't think our API supports that, and neither does the STT API.

FYI, the STT batch transcription API does support transcription of a remote blob. I just need to provide the following (of course, there are some post-processing steps after you get the response to the POST request). I have used this successfully to get the transcription:

`data = {
    "recordingsUrl": REMOTE_BLOB_URL,
    "locale": "en-US",
    "name": "Transcription using locale en-US",
    "properties": {
        "AddWordLevelTimestamps": "True",
        "AddDiarization": "True"
    }
}`

tuanle07 commented Jan 1, 2021

@sathayen, did you manage to get pronunciation assessment working by passing the remote blob URL? I tried, but it didn't work for me. :(
