
Issues with the model for PronunciationAssessment #1530

Closed
fabswt opened this issue Jun 13, 2022 · 15 comments
Assignees
Labels
accepted Issue moved to product team backlog. Will be closed when addressed. enhancement New feature or request pronunciation assessment

Comments

@fabswt

fabswt commented Jun 13, 2022

Describe the bug
PronunciationAssessment reports as erroneous pronunciations that are perfectly correct.

To Reproduce
Steps to reproduce the behavior:

  1. Take any of the sentences provided below
  2. Generate TTS for it using en-US, ChristopherNeural
  3. Use PronunciationAssessment to rate the pronunciation
  4. Observe how PronunciationAssessment is giving low ratings to pronunciations that may be deemed perfect.

Expected behavior
I would expect PronunciationAssessment to return a perfect score for the audio produced by the TTS (or for identical pronunciations from native speakers.)

Instead, some phonemes are being rated with very low scores, because the model does not seem to understand that some words accept multiple pronunciations.

Version of the Cognitive Services Speech SDK
azure-cognitiveservices-speech Python 1.21.0

Platform, Operating System, and Programming Language

  • macOS / ARM M1 / Python 3.9.12

Sentences

  • What are you thinking about?
From NBest[0].Words[X].Phonemes: Phoneme and AccuracyScore:
w       ɑ       t            ɑ       r            j       u            θ       ɪ       ŋ       k       ɪ       ŋ            ə       b       aʊ      t           
87      4       69           1       100          100     3            100     100     100     100     100     100          100     100     100     100         

From NBest[0].Words[X].Phonemes.NBestPhonemes[0]: Phoneme and Score:
w       ʌ       t            ə       r            j       ʊ            θ       ɪ       ŋ       k       ɪ       ŋ            ə       b       aʊ      t           
100     100     100          100     100          100     100          100     100     100     100     100     100          100     100     100     100                 

In other words: PronunciationAssessment considers the pronunciation of "what are" as [wɑt ɑr] to be the only one correct, even though pronouncing it as [wʌt ər] is correct – it's correct in the sense that the TTS does it, that Wiktionary gives these transcriptions, or that I perceive it as correct among native speakers of American English.

Likewise, it's considering "you" as [jʊ] to be incorrect, even though it's fine.

  • Hello, world!
From NBest[0].Words[X].Phonemes: Phoneme and AccuracyScore:
 h       ɛ       l       oʊ           w       ɝ       r       l       d           
100     59      100     100          100     100     100     100     100    

From NBest[0].Words[X].Phonemes.NBestPhonemes[0]: Phoneme and Score:
h       ə       l       oʊ           w       ɝ       r       l       d           
100     100     100     100          100     100     100     100     100

In other words: PronunciationAssessment considers "Hello" as [hɛloʊ] to be the only one correct, even though [həloʊ] is common and correct (and, again, given by the TTS API.)

(On the bright side, it confirms the API's accuracy: it did detect the phones realized by the TTS, it just doesn't know that such pronunciations are correct, if not common.)

I'm stopping at these two or three examples, but I know I could produce dozens if not hundreds more. This seems common with words that accept multiple pronunciations.
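The pattern described above (the expected phoneme gets a near-zero AccuracyScore while the recognizer's own top NBestPhonemes hypothesis scores 100) can be detected programmatically from the result JSON. Below is a minimal sketch; the field paths follow the `NBest[0].Words[X].Phonemes` / `NBestPhonemes` structure quoted in this report, but the sample data is hand-built to mirror the "What are you thinking about?" tables, not real API output.

```python
# Sketch: flag phonemes where the expected phoneme's AccuracyScore is very low
# but the recognizer's own top hypothesis (NBestPhonemes[0]) scores high --
# the signature of an accepted variant being treated as a mistake.
# The JSON layout mirrors the NBest[0].Words[X].Phonemes paths in this issue;
# the sample data is a hand-built stand-in, not real API output.

def suspect_variants(words, low=20, high=90):
    """Return (expected, heard, score) triples that look like accepted
    pronunciation variants rather than genuine mistakes."""
    flagged = []
    for word in words:
        for ph in word["Phonemes"]:
            expected = ph["Phoneme"]
            score = ph["PronunciationAssessment"]["AccuracyScore"]
            top = ph["NBestPhonemes"][0]  # recognizer's best guess for this slot
            if score <= low and top["Phoneme"] != expected and top["Score"] >= high:
                flagged.append((expected, top["Phoneme"], score))
    return flagged

# Hand-built sample matching the "what are" phonemes in the tables above.
words = [
    {"Phonemes": [
        {"Phoneme": "ɑ",
         "PronunciationAssessment": {"AccuracyScore": 4},
         "NBestPhonemes": [{"Phoneme": "ʌ", "Score": 100}]},
        {"Phoneme": "ɑ",
         "PronunciationAssessment": {"AccuracyScore": 1},
         "NBestPhonemes": [{"Phoneme": "ə", "Score": 100}]},
    ]},
]

print(suspect_variants(words))  # [('ɑ', 'ʌ', 4), ('ɑ', 'ə', 1)]
```

Run over a full result, this surfaces every slot where the API "heard" a known variant but scored it as wrong.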

@yulin-li
Contributor

@wangkenpu could you take a look?
cc @yinhew

@wangkenpu
Contributor

Thanks, @fabswt. We will look into your feedback.

@yulin-li yulin-li assigned wangkenpu and unassigned yulin-li Jun 22, 2022
@wangkenpu wangkenpu removed their assignment Jun 23, 2022
@pankopon
Contributor

@wangkenpu @yulin-li Please update with status.

@wangkenpu
Contributor

transfer to @yinhew

@yinhew
Contributor

yinhew commented Jul 4, 2022

Hi, @fabswt

This is a known gap in our API.
We currently apply only a single expected pronunciation.
We used to be more tolerant and allowed multiple pronunciations.
But that introduced a problem that a customer complained about.
e.g.: for the word "read", we used to allow both / r i d / and / r ɛ d /.
But in a given context, only one pronunciation should be allowed.
For the sentence "I read a book yesterday.", the customer expected us to give a low score when a kid spoke "read" as / r i d / and a good score when the kid spoke it as / r ɛ d /. But at the time, we gave a good score for both.

We are currently not able to distinguish "multiple allowed pronunciations for the same context" from "multiple allowed pronunciations for different contexts". We will need to improve this capability.
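The distinction above can be illustrated with a toy lexicon: some words (like "caramel") have variants that are all acceptable anywhere, while others (like "read") have variants tied to a grammatical context. This sketch is purely illustrative; the dictionaries, phoneme strings, and the `allowed` function are hypothetical, not part of the real service.

```python
# Illustrative sketch of the gap described above: "caramel"'s variants are fine
# in any context, while "read"'s allowed variant depends on context (tense).
# All names and data here are hypothetical.

CONTEXT_FREE = {            # any listed variant is fine, anywhere
    "caramel": {"k ɛ r ə m ɛ l", "k ɑ r m ə l"},
}
CONTEXT_BOUND = {           # the allowed variant depends on context
    "read": {"present": {"r i d"}, "past": {"r ɛ d"}},
}

def allowed(word, spoken, context=None):
    """Is `spoken` (a space-separated phoneme string) acceptable for `word`?"""
    if word in CONTEXT_FREE:
        return spoken in CONTEXT_FREE[word]
    if word in CONTEXT_BOUND:
        return spoken in CONTEXT_BOUND[word].get(context, set())
    return False

print(allowed("read", "r i d", "past"))      # False: wrong tense
print(allowed("read", "r ɛ d", "past"))      # True
print(allowed("caramel", "k ɑ r m ə l"))     # True: variant, any context
```

The point is that "I read a book yesterday" constrains "read" to one variant, while "caramel" never needs a context at all; the two cases require different handling.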

BTW, are you working on the prototype or product?
How much potential usage do you estimate if this is a product?

Thanks,
Yinhe

@fabswt
Author

fabswt commented Jul 29, 2022

Hi @yinhew,

Function words and strong/weak forms... everywhere

The problem is that function words are everywhere and many of them accept both a weak form that's more common (e.g. ‘as’ as [əz]) and a strong form that is less common, but still would not be considered a mistake (in this case, ‘as’ as [æz].)

I was about to share a demo with my list to start launching the product, until I realized that about every other sentence I tried returned a false positive (really, any sentence with a function word) because of this very issue. Just consider:

[five screenshots of example sentences, with the phonemes the API flagged shown in red]

I just generated the sentences above with TTS (en-US-ChristopherNeural, if that matters) and then fed the audio to the PronunciationAPI. In red are the sounds the PronunciationAPI considered wrong, in most cases with a score close to zero. In most (all?) cases, what happened is that the TTS used a schwa (like a normal speaker would), i.e. the weak form, whereas the PronunciationAPI expected whatever symbol is shown, i.e. the strong form.

(Only the word ‘the’ seems to be unaffected by this issue. Imagine if every occurrence of it were reported as wrong for using unstressed [ðə] in place of stressed [ði]... that is exactly what we have above.)

These are pretty basic sentences and the PronunciationAPI is off.

Other words

Other words that accept multiple pronunciations suffer from the same issue. e.g.:

  • caramel
    • the model expects EH even though a schwa is fine.
    • caramel
    • Wiktionary gives us: [screenshot of the Wiktionary pronunciation entry]
  • data
  • business
    • if using a schwa for the second syllable (which is a thing), the API will complain.
  • chocolate
    • the TTS uses AA, the API expects AO.
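One client-side workaround for these variant words would be to score against every accepted pronunciation and keep the best match, so a weak form ([əz]) is not penalized just because the strong form ([æz]) was expected. The sketch below is illustrative only: the variant lists are hand-picked and the similarity measure (`difflib.SequenceMatcher` over phoneme sequences) stands in for the real acoustic scoring, which it does not replicate.

```python
# Sketch of the tolerance being requested: score a word against every accepted
# pronunciation variant and keep the best match. The variant lists and the
# similarity measure are illustrative, not part of the real API.
from difflib import SequenceMatcher

VARIANTS = {
    "as":    ["æ z", "ə z"],            # strong and weak forms
    "hello": ["h ɛ l oʊ", "h ə l oʊ"],  # both are correct
}

def best_variant_score(word, heard):
    """Best 0-100 similarity between the heard phoneme string and any
    accepted variant of `word` (falls back to a self-match if unknown)."""
    candidates = VARIANTS.get(word, [heard])
    return max(
        round(100 * SequenceMatcher(None, heard.split(), v.split()).ratio())
        for v in candidates
    )

print(best_variant_score("hello", "h ə l oʊ"))  # 100: weak form accepted
print(best_variant_score("hello", "h ɛ l oʊ"))  # 100: strong form accepted
```

Taking the max over variants is exactly the "tolerant" behavior the API used to have; as noted above, it breaks on context-bound words like "read", which is why the two word classes need to be kept separate.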

read, read, read

About the ‘read’ example:

  1. it's an interesting example for sure, but there are far fewer words of this type than there are function words. Function words, in their weak form, are simply everywhere.
  2. The PronunciationAPI still returns a false positive on “I read a lot” in the present tense (it will only accept the past-tense pronunciation, with EH.)

Variable syllable count

  • Catholic.
    • The TTS uses a 2-syllable pronunciation
    • The PronunciationAssessment API expects three syllables, and will not tolerate a 2-syllable pronunciation.
  • broccoli
    • The TTS almost elides the schwa, but the API expects it.
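The variable-syllable-count cases are a slightly different problem: the phoneme sequences differ in length, so a fix would need an alignment that treats elision of an unstressed schwa as acceptable. A minimal sketch of that idea, with hand-transcribed phoneme strings for "Catholic" (the function and data are hypothetical, for illustration only):

```python
# Sketch for the variable-syllable-count cases above: compare phoneme
# sequences while treating deletion of a schwa in the reference as free,
# so a 2-syllable "Catholic" still matches the 3-syllable reference.
# Purely illustrative; transcriptions are hand-written approximations.

def matches_with_schwa_elision(expected, heard):
    """True if `heard` equals `expected` after optionally dropping schwas
    from `expected`. Both are space-separated phoneme strings."""
    e, h = expected.split(), heard.split()

    def align(i, j):
        if i == len(e) and j == len(h):
            return True
        if i < len(e) and e[i] == "ə" and align(i + 1, j):   # elide the schwa
            return True
        if i < len(e) and j < len(h) and e[i] == h[j]:       # exact match
            return align(i + 1, j + 1)
        return False

    return align(0, 0)

# 3-syllable reference vs 2-syllable realization of "Catholic"
print(matches_with_schwa_elision("k æ θ ə l ɪ k", "k æ θ l ɪ k"))  # True
print(matches_with_schwa_elision("k æ θ ə l ɪ k", "k æ l ɪ k"))    # False
```

Only the schwa is optional here; any other missing phoneme still fails, so genuine errors would still be caught.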

I'm close to launching (or at least so I thought) an early version of the product to about 100 paying customers.

This issue is really a big deal. If the PronunciationAPI cannot accurately rate output from the TTS (and the TTS is fine), then how is it supposed to rate learners?

@fabswt
Author

fabswt commented Aug 5, 2022

Hey,
Any news on this? It would help me figure out what I can plan with the API.

@yinhew
Contributor

yinhew commented Aug 5, 2022

Hi, @fabswt

We appreciate your thorough testing of our API.
There are indeed some gaps pending improvement. However, we have limited resources and need to prioritize carefully.

Can you please answer my question above?
Are you working on the prototype or product?
How much potential usage do you estimate if this is a product?

The answer can help us prioritize the work.

Thanks
Yinhe

@fabswt
Author

fabswt commented Aug 5, 2022

Hey @yinhew

I'm close to launching [...] an early version of the product to about 100 paying customers.

Planning to improve the product with the users themselves, in early access.

Got 1,500+ customers who bought my previous product and for whom this would be a good fit.

But I feel handicapped by the false positives.

I'd submit more constructive feedback, but I guess I'll wait until after the above improvements.

Best –

@pankopon pankopon assigned yinhew and unassigned yulin-li Aug 31, 2022
@pankopon
Contributor

@yinhew Can you please comment on the plans for support, based on customer info from fabswt?

@yinhew
Contributor

yinhew commented Sep 2, 2022

We are still doing some internal research to determine which behavior is the best one to apply.

@pankopon
Contributor

@yinhew Is there any feature work expected due to this? If yes please provide a work item id and we can mark this as an accepted enhancement request.

@pankopon pankopon added enhancement New feature or request accepted Issue moved to product team backlog. Will be closed when addressed. pronunciation assessment and removed pronunciation assessment in-review In review labels Feb 28, 2023
@pankopon
Contributor

Internal work item ref. 4930020.

@pankopon
Contributor

Closing the issue as the enhancement request is now being tracked with a task on the team backlog, no ETA. This item will be updated with information on availability after changes have been implemented and deployed.

@wangkenpu
Contributor

This has been fixed. @fabswt
