Callback commands are not handled correctly. #22

Open
mltony opened this issue Oct 6, 2019 · 9 comments · May be fixed by #96

@mltony

mltony commented Oct 6, 2019

Hello, this is mltony. I am working on an NVDA add-on called Phonetic Punctuation:
https://github.com/mltony/nvda-phonetic-punctuation
The IBM TTS driver doesn't seem to handle callback commands correctly in the Python 3 version in some cases. Here are the steps to reproduce:

  1. Install phonetic punctuation add-on:
    https://github.com/mltony/nvda-phonetic-punctuation/releases/download/v0.2dev/phoneticPunctuation-0.2dev.nvda-addon
    Note that it requires NVDA 2019.3 alpha.
  2. Speak the following phrase:
    Test test!!!

Expected behavior: Ding-ding-ding sounds should start playing after "test test"
Actual behavior: ding-ding sounds play at the same time as the "test test" utterance is spoken.

Phonetic punctuation converts this utterance into:

["Test test", <CallBackCommand that actually plays those ding sounds>, <BreakCommand for the duration of the sound>]

So I suspect that your driver triggers the callback before the utterance has been fully spoken.
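The expected ordering can be sketched in plain Python. This is only an illustration under stated assumptions: `CallbackCommand` and `BreakCommand` here are hypothetical stand-in classes, not NVDA's own, and `speak_in_order` is an invented helper that models a synthesizer handling items strictly in sequence. The 300 ms pause is an arbitrary example value.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class CallbackCommand:
    """Hypothetical stand-in: runs a function when the synth reaches it."""
    callback: Callable[[], None]


@dataclass
class BreakCommand:
    """Hypothetical stand-in: a pause of `time_ms` milliseconds."""
    time_ms: int


SpeechItem = Union[str, CallbackCommand, BreakCommand]


def speak_in_order(sequence: List[SpeechItem], log: List[str]) -> None:
    """Handle items strictly in order, so a callback only runs after
    every earlier text item has been logged as spoken."""
    for item in sequence:
        if isinstance(item, str):
            log.append(f"spoken: {item}")
        elif isinstance(item, CallbackCommand):
            item.callback()
        elif isinstance(item, BreakCommand):
            log.append(f"pause: {item.time_ms} ms")


events: List[str] = []
sequence: List[SpeechItem] = [
    "Test test",
    CallbackCommand(lambda: events.append("callback: play ding sounds")),
    BreakCommand(time_ms=300),  # arbitrary example duration
]
speak_in_order(sequence, events)
print(events)
```

In this model the callback entry always lands after the "spoken" entry. The reported behavior corresponds to the callback firing while the text is still being played.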

With other synthesizers phonetic punctuation works correctly; I tested with espeak, SAPI, OneCore and the other version of eloquence.

There is also another, smaller problem: it seems that the pause duration in a break command must be multiplied by some coefficient, which seems to be equal to 3. If you try to speak this phrase with phonetic punctuation:
!!!!!!!!Test
you will hear the word "test" much sooner than the dings end, because of that problem.

@Neurrone

@mltony which other version of Eloquence were you testing with? CodeFactory's?

@davidacm
Owner

Thanks. This is urgent to solve, but I can't work on it right now because I'm outside my country. I'll fix it when I get back. If someone else can fix it, I can review the pull request and accept it.

@mltony
Author

mltony commented Oct 12, 2019

I was testing with this one:
https://github.com/pumper42nickel/eloquence_threshold/
With this one phonetic punctuation works fine.

@davidacm
Owner

Hi, I need your collaboration. Please read this entire (long) comment and let me know if you can find a solution for this issue.

The situation:

  1. The other driver that you mentioned has the same audio crackling issue; I don't want to introduce a new issue in order to solve this one.
  2. The issue: the IBMTTS driver uses a stream to buffer a certain quantity of audio. When that buffer is full, the audio is sent to NVDA's player, and all indexes received so far are sent as well. This way we avoid voice breakage, but index accuracy is lost. In the IBMTTS driver the indexes are sent early. I could change this behavior, but then the indexes would be sent late because of the audio buffer.
  3. The solution I tried: send the audio stream when the buffer is full or when an index is received.
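The approach in point 3 could look roughly like the sketch below. It is not the driver's actual code: `IndexedAudioBuffer`, `feed_audio`, `feed_index`, and the `play` / `on_index` callables are all hypothetical names chosen for illustration, and the sketch treats "handed to the player" as if it meant "heard", which a real asynchronous player does not guarantee.

```python
from typing import Callable, List, Tuple


class IndexedAudioBuffer:
    """Hypothetical buffer that flushes when full OR when an index arrives."""

    def __init__(
        self,
        capacity: int,
        play: Callable[[bytes], None],
        on_index: Callable[[int], None],
    ) -> None:
        self.capacity = capacity      # flush threshold in bytes
        self.play = play              # receives flushed audio
        self.on_index = on_index      # receives index notifications
        self._pending = bytearray()

    def feed_audio(self, chunk: bytes) -> None:
        """Accumulate audio and flush once the threshold is reached."""
        self._pending.extend(chunk)
        if len(self._pending) >= self.capacity:
            self.flush()

    def feed_index(self, index: int) -> None:
        """Flush audio synthesized before the index, then report the index."""
        self.flush()
        self.on_index(index)

    def flush(self) -> None:
        """Hand any pending audio to the player and empty the buffer."""
        if self._pending:
            self.play(bytes(self._pending))
            self._pending.clear()


events: List[Tuple[str, int]] = []
buf = IndexedAudioBuffer(
    capacity=8,
    play=lambda data: events.append(("audio", len(data))),
    on_index=lambda i: events.append(("index", i)),
)
buf.feed_audio(b"\x00" * 5)   # below capacity, stays buffered
buf.feed_index(1)             # forces a short flush before the index
buf.feed_audio(b"\x00" * 10)  # reaches capacity, flushed immediately
buf.feed_index(2)             # nothing pending, only the index is reported
print(events)
```

The first flush in this run is shorter than the buffer capacity. Small chunks of that kind reaching the player are the cost of flushing on every index.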

results:

The issue described in point 2 appeared for many sentences in Spanish; the affected cases differ for each language. I tried to find some English cases for you.

steps to reproduce:

I don't know whether this issue depends on hardware specs; on your computer you may need to use different parameters. Here are the specs of my main computer:

environment:
  • Application: Notepad.
  • Operating system: Windows 10 Pro (1903), 64-bit.
  • Computer brand: MSI.
  • Model: GS65.
  • CPU: Intel i7-8750H.
  • RAM: 32 GB Kingston HyperX.
  • Drive: SSD, 512 GB, Samsung 970 Pro.
  • GPU: NVIDIA 1070.
Steps:

The breaks happen at the end of a string, at specific speeds and with specific sentences. I can name many cases in Spanish (my language), but in English you will need to find them yourself. Here are some that I found in 5 minutes using American English.

  1. Set the Eloquence driver to American English.
  2. Set the Eloquence driver to the specified rate. You can also run the same test with IBMTTS to confirm that the issue is not present in the second driver.
  3. Read the following sentences.
rate at 0%:

rate 0

at 10%:

rate 10

at 15%:

this is the number 20

at 20%:

eco
comma
number
papa
alpha

at 30%:

colon
50

at 50%:

rate 50
rate 0

@davidacm davidacm reopened this Oct 27, 2019
@Mohamed00
Collaborator

I believe I may have found a solution to this on the eloquence_threshold side of things. Adding buffered=True to the nvwave.WavePlayer constructor, and setting nvwave.WavePlayer.MIN_BUFFER_MS to at least 900 seems to fix the issue.

davidacm pushed a commit that referenced this issue Mar 18, 2021
@davidacm
Owner

davidacm commented May 9, 2021

Hi, has this issue been fixed?
I can't find the code fix, but I tried the proposed solution and it introduced another issue for me: with buffered=True, the synth sometimes lags by a few milliseconds.

@Mohamed00
Collaborator

Mohamed00 commented May 9, 2021 via email

@ultrasound1372
Contributor

I saw a commit a ways up that apparently did something related to this. Do we still have this issue? Have we examined how other synthesizers handle accurate indexing without crackling like that? Could it have to do with the fact that NVWave also has to do some internal resampling as the audio is sent to the output? Eloquence runs at 11025 Hz, while most contemporary synthesizers run at 22050 Hz, and some at 16000 Hz. eSpeak might actually run higher. If your system sample rate is set to 44100 Hz, upsampling from 11025 Hz is easy, as integer ratios always are: just some brief interpolation. But if your system is set to 48000 Hz, perhaps it has to do more work? Or does it pass that off to Windows? Have we looked at the DECtalk access32 drivers to see if they have accurate indexing, and if they do, what settings do they use? They are another known set of synths that run at 11025 Hz.
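For reference, the ratios behind the integer versus non-integer point can be checked with a few lines of Python. The helper name `resample_ratio` is made up for this sketch, and it says nothing about how NVWave or Windows actually resample.

```python
from fractions import Fraction


def resample_ratio(source_hz: int, output_hz: int) -> Fraction:
    """Return the exact upsampling ratio output_hz / source_hz."""
    return Fraction(output_hz, source_hz)


# Compare the two common system sample rates for an 11025 Hz source.
for output_hz in (44100, 48000):
    ratio = resample_ratio(11025, output_hz)
    kind = "integer" if ratio.denominator == 1 else "non-integer"
    print(f"11025 Hz -> {output_hz} Hz: ratio {ratio} ({kind})")
```

Going to 44100 Hz is an exact factor of 4, while going to 48000 Hz is 640/147, so the second case cannot be done by simply inserting a fixed number of samples between each pair.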

@ultrasound1372
Contributor

It may be worth revisiting this discussion in relation to the NVDA alphas that add WASAPI support and the accompanying refactor of NVWave, as this might mitigate the crackling altogether. We can then either support both the current method and a new, more accurate method depending on NVDA's version, or, once that NVDA version reaches RC, release a version of the add-on that requires it as a minimum. @davidacm What do you think?
