New speech framework including callbacks, beeps, sounds, profile switches and prioritized queuing #7599

Merged
merged 34 commits on May 15, 2019

Conversation


@jcsteh jcsteh commented Sep 13, 2017

Link to issue number:

Fixes #4877. Fixes #1229.

Summary of the issue:

We want to be able to easily and accurately perform various actions (beep, play sounds, switch profiles, etc.) during speech. We also want to be able to have prioritized speech which interrupts lower priority speech and then have the lower priority speech resume. This is required for a myriad of use cases, including switching to specific synths for specific languages (#279), changing speeds for different languages (#4738), audio indication of spelling errors when reading text (#4233), indication of links using beeps (#905), reading of alerts without losing other speech forever (#3807, #6688) and changing speech rate for math (#7274). Our old speech code simply sends utterances to the synthesizer; there is no ability to do these things. Say all and speak spelling continually poll the last index, but this is ugly and not feasible for other features.

Description of how this pull request fixes the issue:

Enhance nvwave to simplify accurate indexing for speech synthesizers.

  1. Add an onDone argument to WavePlayer.feed which accepts a function to be called when the provided chunk of audio has finished playing. Speech synths can simply feed audio up to an index and use the onDone callback to be accurately notified when the index is reached.
  2. Add a buffered argument to the WavePlayer constructor. If True, small chunks of audio will be buffered to prevent audio glitches. This avoids the need for tricky buffering across calls in the synth driver if the synth provides fixed size chunks and an index lands near the end of a previous chunk. It is also useful for synths which always provide very small chunks.
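
To make that concrete, here is a minimal sketch (not code from this PR) of how a driver might use these two additions; the audio format values and the silent chunk are placeholders:

import nvwave

player = nvwave.WavePlayer(channels=1, samplesPerSec=22050, bitsPerSample=16,
	buffered=True)  # buffer small chunks to prevent audio glitches

def onIndexReached():
	# Called once the audio fed so far has actually finished playing.
	print("index reached")

audioChunk = b"\x00\x00" * 2205  # placeholder: 100 ms of silence
player.feed(audioChunk, onDone=onIndexReached)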

Enhancements to config profile triggers needed for profile switching within speech sequences.

  1. Allow triggers to specify that handlers watching for config profile switches should not be notified. In the case of profile switches during speech sequences, we only want to apply speech settings, not switch braille displays.
  2. Add some debug logging for when profiles are activated and deactivated.

Add support for callbacks, beeps, sounds, profile switches and utterance splits during speech sequences, as well as prioritized queuing.

Changes for synth drivers:

  • SynthDrivers must now accurately notify when the synth reaches an index or finishes speaking using the new synthIndexReached and synthDoneSpeaking extension points in the synthDriverHandler module (see the sketch after this list). The lastIndex property is deprecated. See below regarding backwards compatibility for SynthDrivers which do not support these notifications.
  • SynthDrivers must now support PitchCommand if they wish to support capital pitch change.
  • SynthDrivers now have supportedCommands and supportedNotifications attributes which specify what they support.
  • Because there are some speech commands which trigger behaviour unrelated to synthesizers (e.g. beeps, callbacks and profile switches), commands which are passed to synthesizers are now subclasses of speech.SynthCommand.
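
As a rough sketch (again, not code from this PR), a driver might report these notifications like so; the class and method names are illustrative:

from synthDriverHandler import SynthDriver, synthIndexReached, synthDoneSpeaking

class MySynth(SynthDriver):  # hypothetical driver
	def _onSynthIndex(self, index):
		# Called from the synth's own callback when an index marker is reached.
		synthIndexReached.notify(synth=self, index=index)

	def _onSynthDone(self):
		# Called when the synth has finished speaking everything queued.
		synthDoneSpeaking.notify(synth=self)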

Central speech manager:

  • The core of this new functionality is the speech._SpeechManager class. It is intended for internal use only. It is called by higher level functions such as speech.speak.
  • It manages queuing of speech utterances, calling callbacks at desired points in the speech, profile switching, prioritization, etc. It relies heavily on index reached and done speaking notifications from synths. These notifications alone trigger the next task in the flow.
  • It maintains separate queues (speech._ManagerPriorityQueue) for each priority. As well as holding the pending speech sequences for that priority, each queue holds other information necessary to restore state (profiles, etc.) when that queue is preempted by a higher priority queue.
  • See the docstring for the speech._SpeechManager class for a high level summary of the flow of control.

New/enhanced speech commands:

  • EndUtteranceCommand ends the current utterance at this point in the speech. This allows you to have two utterances in a single speech sequence.
  • CallbackCommand calls a function when speech reaches the command.
  • BeepCommand produces a beep when speech reaches the command.
  • WaveFileCommand plays a wave file when speech reaches the command.
  • The above three commands are all subclasses of BaseCallbackCommand. You can subclass this to implement other commands which run a pre-defined function.
  • ConfigProfileTriggerCommand applies (or stops applying) a configuration profile trigger to subsequent speech. This is the basis for switching profiles (and thus synthesizers, speech rates, etc.) for specific languages, math, etc.
  • PitchCommand, RateCommand and VolumeCommand can now take either a multiplier or an offset. In addition, they can convert between the two on demand, which makes it easier to handle these commands in synth drivers based on the synth's requirements. They also have an isDefault attribute which specifies whether this is returning to the default value (as configured by the user).

Speech priorities:

speech.speak now accepts a priority argument specifying one of three priorities: SPRI_NORMAL (normal priority), SPRI_NEXT (speak after next utterance of lower priority) or SPRI_NOW (speech is very important and should be spoken right now, interrupting lower priority speech). Interrupted lower priority speech resumes after any higher priority speech is complete.

Refactored functionality to use the new framework:

  • Rather than using a polling generator, spelling is now sent as a single speech sequence, including EndUtteranceCommand, BeepCommand and PitchCommand commands as appropriate. This can be created and incorporated elsewhere using the speech.getSpeechForSpelling function.
  • Say all has been completely rewritten to use CallbackCommands instead of a polling generator. The code should also be a lot more readable now, as it is now classes with methods for the various stages in the process.

Backwards compatibility for old synths:

  • For synths that don't support index and done speaking notifications, we don't use the speech manager at all. This means none of the new functionality (callbacks, profile switching, etc.) will work.
  • This means we must fall back to the old code for speak spelling, say all, etc. This code is in the speechCompat module.
  • This compatibility fallback is considered deprecated and will be removed eventually. Synth drivers should be updated ASAP.

Deprecated/removed:

  • speech.getLastIndex is deprecated and will simply return None.
  • IndexCommand should no longer be used in speech sequences passed to speech.speak. Use a subclass of speech.BaseCallbackCommand instead, as sketched after this list.
  • In the speech module, speakMessage, speakText, speakTextInfo, speakObjectProperties and speakObject no longer take an index argument. No add-ons in the official repository use this, so I figured it was safe to just remove it rather than having it do nothing.
  • speech.SpeakWithoutPausesBreakCommand has been removed. Use speech.EndUtteranceCommand instead. No add-ons in the official repository use this.
  • speech.speakWithoutPauses.lastSentIndex has been removed. Instead, speakWithoutPauses returns True if something was actually spoken, False if only buffering occurred.
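
For example, a minimal sketch of such a subclass, assuming subclasses override a run method as the built-in CallbackCommand does:

import speech
from logHandler import log

class LogPointReached(speech.BaseCallbackCommand):
	# Hypothetical example command: logs when speech reaches this point.
	def run(self):
		log.info("Speech reached this point")

speech.speak([u"Before the point. ", LogPointReached(), u"After the point."])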

Update comtypes to version 1.1.3.

  • This is necessary to handle events from SAPI 5, as one of the parameters is a decimal which is not supported by our existing (very outdated) version of comtypes.
  • comtypes has now been added as a separate git submodule.

Updated synth drivers:

The espeak, oneCore and sapi5 synth drivers have all been updated to support the new speech framework.

Testing performed:

Unfortunately, I'm out of time to write unit tests for this, though much of this should be suitable for unit testing. I've been testing with the Python console test cases below. Note that the wx.CallLater is necessary so that speech doesn't get silenced straight away; that's just an artefact of testing with the console.

For the profile tests, you'll need to set up two profiles, one triggered for say all and the other triggered for the notepad app.

Python Console test cases:

# Text, beep, beep, sound, text.
wx.CallLater(500, speech.speak, [u"This is some speech and then comes a", speech.BeepCommand(440, 10), u"beep. If you liked that, let's ", speech.BeepCommand(880, 10), u"beep again. I'll speak the rest of this in a ", speech.PitchCommand(offset=50), u"higher pitch. And for the finale, let's ", speech.WaveFileCommand(r"waves\browseMode.wav"), u"play a sound."])
# Text, end utterance, text.
wx.CallLater(500, speech.speak, [u"This is the first utterance", speech.EndUtteranceCommand(), u"And this is the second"])
# Change pitch, text, end utterance, text. Expected: All should be higher pitch.
wx.CallLater(500, speech.speak, [speech.PitchCommand(offset=50), u"This is the first utterance in a higher pitch", speech.EndUtteranceCommand(), u"And this is the second"])
# Text, pitch, text, enter profile1, enter profile2, text, exit profile1, text. Expected: All text after 1 2 3 4 should be higher pitch. 5 6 7 8 should have profile 1 and 2. 9 10 11 12 should be just profile 2.
import sayAllHandler, appModuleHandler; t1 = sayAllHandler.SayAllProfileTrigger(); t2 = appModuleHandler.AppProfileTrigger("notepad"); wx.CallLater(500, speech.speak, [u"Testing testing ", speech.PitchCommand(offset=100), "1 2 3 4", speech.ConfigProfileTriggerCommand(t1, True), speech.ConfigProfileTriggerCommand(t2, True), u"5 6 7 8", speech.ConfigProfileTriggerCommand(t1, False), u"9 10 11 12"])
# Enter profile, text, exit profile. Expected: 5 6 7 8 in different profile, 9 10 11 12 with base config.
import sayAllHandler; trigger = sayAllHandler.SayAllProfileTrigger(); wx.CallLater(500, speech.speak, [speech.ConfigProfileTriggerCommand(trigger, True), u"5 6 7 8", speech.ConfigProfileTriggerCommand(trigger, False), u"9 10 11 12"])
# Two utterances at SPRI_NORMAL in same sequence. Two separate sequences at SPRI_NEXT. Expected result: numbers in order from 1 to 20.
wx.CallLater(500, speech.speak, [u"1 2 3 ", u"4 5", speech.EndUtteranceCommand(), u"16 17 18 19 20"]); wx.CallLater(510, speech.speak, [u"6 7 8 9 10"], priority=speech.SPRI_NEXT); wx.CallLater(520, speech.speak, [u"11 12 13 14 15"], priority=speech.SPRI_NEXT)
# Utterance at SPRI_NORMAL including a beep. Utterance at SPRI_NOW. Expected: Text before the beep, beep, Text after..., This is an interruption., Text after the beep, text...
wx.CallLater(500, speech.speak, [u"Text before the beep ", speech.BeepCommand(440, 10), u"text after the beep, text, text, text, text"]); wx.CallLater(1500, speech.speak, [u"This is an interruption"], priority=speech.SPRI_NOW)
# Utterance with two sequences at SPRI_NOW. Utterance at SPRI_NOW. Expected result: First utterance, second utterance
wx.CallLater(500, speech.speak, [u"First ", u"utterance"], priority=speech.SPRI_NOW); wx.CallLater(510, speech.speak, [u"Second ", u"utterance"], priority=speech.SPRI_NOW)
# Utterance with two sequences at SPRI_NOW. Utterance at SPRI_NEXT. Expected result: First utterance, second utterance
wx.CallLater(500, speech.speak, [u"First ", u"utterance"], priority=speech.SPRI_NOW); wx.CallLater(501, speech.speak, [u"Second ", u"utterance"], priority=speech.SPRI_NEXT)
# Utterance at SPRI_NORMAL. Utterance at SPRI_NOW with profile switch. Expected: Normal speaks but gets interrupted, interruption with different profile, normal speaks again
import sayAllHandler; trigger = sayAllHandler.SayAllProfileTrigger(); wx.CallLater(500, speech.speak, [u"This is a normal utterance, text, text"]); wx.CallLater(1000, speech.speak, [speech.ConfigProfileTriggerCommand(trigger, True), u"This is an interruption with a different profile"], priority=speech.SPRI_NOW)
# Utterance at SPRI_NORMAL with profile switch. Utterance at SPRI_NOW. Expected: Normal speaks with different profile but gets interrupted, interruption speaks with base config, normal speaks again with different profile
import sayAllHandler; trigger = sayAllHandler.SayAllProfileTrigger(); wx.CallLater(500, speech.speak, [speech.ConfigProfileTriggerCommand(trigger, True), u"This is a normal utterance with a different profile"]); wx.CallLater(1000, speech.speak, [u"This is an interruption"], priority=speech.SPRI_NOW)
# Utterance at SPRI_NORMAL with profile 1. Utterance at SPRI_NOW with profile 2. Expected: Normal speaks with profile 1 but gets interrupted, interruption speaks with profile 2, normal speaks again with profile 1
import sayAllHandler, appModuleHandler; t1 = sayAllHandler.SayAllProfileTrigger(); t2 = appModuleHandler.AppProfileTrigger("notepad"); wx.CallLater(500, speech.speak, [speech.ConfigProfileTriggerCommand(t1, True), u"This is a normal utterance with profile 1"]); wx.CallLater(1000, speech.speak, [speech.ConfigProfileTriggerCommand(t2, True), u"This is an interruption with profile 2"], priority=speech.SPRI_NOW)
# Utterance at SPRI_NORMAL including a pitch change and beep. Utterance at SPRI_NOW. Expected: Text speaks with higher pitch, beep, text gets interrupted, interruption speaks with normal pitch, text after the beep speaks again with higher pitch
wx.CallLater(500, speech.speak, [speech.PitchCommand(offset=100), u"Text before the beep ", speech.BeepCommand(440, 10), u"text after the beep, text, text, text, text"]); wx.CallLater(1500, speech.speak, [u"This is an interruption"], priority=speech.SPRI_NOW)

Known issues with pull request:

No issues with the code that I know of. There are two issues for the project, though:

  1. All third party synth drivers need to be updated in order to support the new functionality. Old drivers will still work for now thanks to the compat code, but they get none of the new functionality. Getting third parties to do this will take some time.
  2. While this PR forms the basis for a lot of functionality, it doesn't provide many user visible changes. That means merging it is risky without immediate benefit. That said, putting anything more in this PR would make it even more insane than it already is.

Change log entry:

Bug Fixes:

- When spelling text, reported tool tips are no longer interjected in the middle of the spelling. Instead, they are reported after spelling finishes. (#1229)

Changes for Developers:

- nvwave has been enhanced to simplify accurate indexing for speech synthesizers: (#4877)
 - `WavePlayer.feed` now takes an `onDone` argument specifying a function to be called when the provided chunk of audio has finished playing. Speech synths can simply feed audio up to an index and use the onDone callback to be accurately notified when the index is reached.
 - The `WavePlayer` constructor now takes a `buffered` argument. If True, small chunks of audio will be buffered to prevent audio glitches. This avoids the need for tricky buffering across calls in the synth driver if the synth provides fixed size chunks and an index lands near the end of a previous chunk. It is also useful for synths which always provide very small chunks.
- Several major changes related to synth drivers: (#4877)
 - SynthDrivers must now accurately notify when the synth reaches an index or finishes speaking using the new `synthIndexReached` and `synthDoneSpeaking` extension points in the `synthDriverHandler` module.
  - The `lastIndex` property is deprecated.
  - For drivers that don't support these, old speech code will be used. However, this means new functionality will be unavailable, including callbacks, beeps, playing audio, profile switching and prioritized speech. This old code will eventually be removed.
 - SynthDrivers must now support `PitchCommand` if they wish to support capital pitch change.
 - SynthDrivers now have `supportedCommands` and `supportedNotifications` attributes which specify what they support.
 - Because there are some speech commands which trigger behaviour unrelated to synthesizers (e.g. beeps, callbacks and profile switches), commands which are passed to synthesizers are now subclasses of `speech.SynthCommand`.
- New/enhanced speech commands: (#4877)
 - `EndUtteranceCommand` ends the current utterance at this point in the speech. This allows you to have two utterances in a single speech sequence.
 - `CallbackCommand` calls a function when speech reaches the command.
 - `BeepCommand` produces a beep when speech reaches the command.
 - `WaveFileCommand` plays a wave file when speech reaches the command.
 - The above three commands are all subclasses of `BaseCallbackCommand`. You can subclass this to implement other commands which run a pre-defined function.
 - `ConfigProfileTriggerCommand` applies (or stops applying) a configuration profile trigger to subsequent speech. This is the basis for switching profiles (and thus synthesizers, speech rates, etc.) for specific languages, math, etc.
 - `PitchCommand`, `RateCommand` and `VolumeCommand` can now take either a multiplier or an offset. In addition, they can convert between the two on demand, which makes it easier to handle these commands in synth drivers based on the synth's requirements. They also have an `isDefault` attribute which specifies whether this is returning to the default value (as configured by the user).
- `speech.speak` now accepts a `priority` argument specifying one of three priorities: `SPRI_NORMAL` (normal priority), `SPRI_NEXT` (speak after next utterance of lower priority) or `SPRI_NOW` (speech is very important and should be spoken right now, interrupting lower priority speech). Interrupted lower priority speech resumes after any higher priority speech is complete. (#4877)
- Deprecated/removed speech functionality: (#4877)
 - `speech.getLastIndex` is deprecated and will simply return None.
 - `IndexCommand` should no longer be used in speech sequences passed to `speech.speak`. Use a subclass of `speech.BaseCallbackCommand` instead.
 - In the `speech` module, `speakMessage`, `speakText`, `speakTextInfo`, `speakObjectProperties` and `speakObject` no longer take an `index` argument.
 - `speech.SpeakWithoutPausesBreakCommand` has been removed. Use `speech.EndUtteranceCommand` instead.
 - `speech.speakWithoutPauses.lastSentIndex` has been removed. Instead, `speakWithoutPauses` returns True if something was actually spoken, False if only buffering occurred.
- Updated comtypes to version 1.1.3. (#4877)


jcsteh commented Sep 13, 2017

Urgh. Accidentally hit submit before I was ready. I've updated the PR description with (very lengthy) details. :)


jcsteh commented Sep 13, 2017

Here's a test I was using with say all to make sure it was moving the cursor correctly and breaking utterances where I expected. The text "New utterance" should literally be at the start of a new utterance when you hear it.

Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10
New utterance  line 11
Line 12
Line 13
Line 14
Line 15
Line 16
Line 17
Line 18
Line 19
Line 20
New utterance line 21
Line 22.
New utterance line 23
Line 24
Line 25
Line 26
Line 27
Line 28
Line 29
Line 30
Line 31
Line 32
New utterance line 33
Last line


jcsteh commented Sep 13, 2017

Some notes re unit testing:

  • I think the Python console test cases above can be used as the basis for unit tests.
  • A lot of this can be unit tested by running various methods and making assertions on the output or based on the state of the queue.
  • However, it's also necessary to check whether being notified about a particular index or done speaking will cause certain text to be sent to the synth, certain profiles to be switched, etc. This is a bit trickier.
  • Some of these (e.g. _switchProfile) can just be replaced with mocks when setting up the speech manager. We could probably replace the calls to send text to the synth and to exit all/restore all profile triggers with tiny functions which we can similarly mock. This is pretty trivial to do, but I didn't want to do this without being certain it was actually helpful.
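
A rough sketch of that mocking idea (hypothetical test scaffolding; constructing _SpeechManager may need more setup than shown):

from unittest import mock
import speech

def test_profileSwitchOnNotification():
	manager = speech._SpeechManager()
	with mock.patch.object(manager, "_switchProfile") as switchProfile:
		# Queue a sequence containing a ConfigProfileTriggerCommand, then
		# simulate the synth's index notification (driving code omitted;
		# it depends on the manager's internals).
		assert switchProfile.called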

@LeonarddeR

Somehow, /source/comInterfaces/_944DE083_8FB8_45CF_BCB7_C477ACB2F897_0_1_0.py ended up in this PR. I thought those were created at build time and covered by gitignore, but it seems this one already existed in the tracked source tree.


jcsteh commented Sep 13, 2017

No, that comInterface is intentionally part of the repo now, since we can't rely on everyone running the latest Windows 10 build and thus having all of the interfaces in their typelib. It had to be re-built for updated comtypes.


jcsteh commented Sep 13, 2017

I posted a brain dump on the wiki with implementation ideas for some of the more tricky use cases. I don't think these should be considered for this PR, but I'm linking it here so we have a reference.

@jcsteh jcsteh mentioned this pull request Sep 13, 2017
@derekriemer

Say all isn't working for me in Firefox with this.
Also, it is possible to make something preempt with priority now and make NVDA read the same thing forever. I'm not sure this needs to be fixed, as it's more of a DoS of the user, but do this in the Python console, then go read some text in Notepad++.
Expected: speech starts where it ended.
Actual: speech starts at the beginning of the line, is preempted at some point, then starts at the beginning of the line again, is preempted at the same place, and so on forever.
Paste this into the Python console:

def a():
	wx.CallLater(4000, speech.speak, ["Interruption! You are in the path of a tornado, evacuate immediately!", speech.CallbackCommand(a)], priority=speech.SPRI_NOW)
a()

As soon as a() is called, press escape and start a say all.

@derekriemer

Recording of the above say all repeating forever:
https://files.derekriemer.com/tornado.flac


michaelDCurran commented Sep 13, 2017 via email

@derekriemer

This fails for me. This is meant to simulate the reporting of a notification.
wx.CallLater(1000, speech.speak, [speech.WaveFileCommand(r"C:\Windows\Media\Windows Notify System Generic.wav"), "You have a meeting in ten minutes!"])

@derekriemer

Does anyone else running this have problems with sayall in browsers?


jcsteh commented Sep 13, 2017

@derekriemer commented on 14 Sep 2017, 08:13 GMT+10:

This fails for me. This is meant to simulate the reporting of a notification.
wx.CallLater(1000, speech.speak, [speech.WaveFileCommand(r"C:\Windows\Media\Windows Notify System Generic.wav"), "You have a meeting in ten minutes!"])

Can you be more specific about how it "fails"? It works just fine for me. I hear the notification sound with the message. Tested with espeak and oneCore.

@derekriemer commented on 14 Sep 2017, 08:14 GMT+10:

Does anyone else running this have problems with sayall in browsers?

Can you be more specific? Again, it works just fine for me. Tested in Firefox with eSpeak.


jcsteh commented Sep 13, 2017

Oh blerg. Both of those things fail with eSpeak if you have automatic language switching turned on. If you use oneCore (or turn off auto language switching with eSpeak), it works as expected.

It looks like eSpeak fails to notify about marks (indexes) if they're immediately followed by a language change. That'll need to be fixed in eSpeak (or worked around somehow).

@derekriemer

@jcsteh commented on Sep 13, 2017, 5:48 PM MDT:

@derekriemer commented on 14 Sep 2017, 08:13 GMT+10:

This fails for me. This is meant to simulate the reporting of a notification.
wx.CallLater(1000, speech.speak, [speech.WaveFileCommand(r"C:\Windows\Media\Windows Notify System Generic.wav"), "You have a meeting in ten minutes!"])

Can you be more specific about how it "fails"? It works just fine for me. I hear the notification sound with the message. Tested with espeak and oneCore.

@derekriemer commented on 14 Sep 2017, 08:14 GMT+10:

Does anyone else running this have problems with sayall in browsers?

Can you be more specific? Again, it works just fine for me. Tested in Firefox with eSpeak.

I hear nothing but the text

@derekriemer

@jcsteh commented on Sep 13, 2017, 5:54 PM MDT:

Oh blerg. Both of those things fail with eSpeak if you have automatic language switching turned on. If you use oneCore (or turn off auto language switching with eSpeak), it works as expected.

It looks like eSpeak fails to notify about marks (indexes) if they're immediately followed by a language change. That'll need to be fixed in eSpeak (or worked around somehow).

confirm


Brian1Gaff commented Sep 14, 2017 via email

#: Handlers are called with these keyword arguments:
#: synth: The L{SynthDriver} which reached the index.
#: index: The number of the index which has just been reached.
synthIndexReached = extensionPoints.Action()

In keeping with the naming scheme for extensionPoints, I think we should change the name here. See the contrib guide or #7606 where the naming convention was initially discussed.


I need to look further at how these are used to understand how to rename these to meet the convention. The extension points used in #7606 were essentially informational: "X is going to happen" or "X has just happened". My initial interpretation of these extension points is that they don't fit this pattern exactly. One thing that is not clear to me yet is whether an index is conceptually tied to another command, or if it is independent.

espeakDLL.espeak_SetSynthCallback(callback)
bgQueue = queue.Queue()
bgThread = BgThread()
bgThread.start()
onIndexReached = indexCallback

Shouldn't onIndexReached be set before espeakDLL.espeak_SetSynthCallback(callback)? callback can call onIndexReached.
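
That is, the suggested ordering would be:

onIndexReached = indexCallback  # set first, so callback can safely use it
espeakDLL.espeak_SetSynthCallback(callback)
bgQueue = queue.Queue()
bgThread = BgThread()
bgThread.start()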

charDesc=characterProcessing.getCharacterDescription(locale,char.lower())
uppercase=char.isupper()
if useCharacterDescriptions and charDesc:
	char=charDesc[0] if textLength>1 else u"\u3001".join(charDesc)

Please change this to the symbol name which I think should be 'IDEOGRAPHIC COMMA'.

At this point, char can actually become a string? Since char is the thing returned, perhaps a better name might be speakCharAs?

yield BeepCommand(2000, 50)
yield char
if uppercase and synth.isSupported("pitch") and synthConfig["capPitchChange"]:
	yield PitchCommand()

Since this PitchCommand needs to be tied to the first one, it would be nice to use some RAII-like idiom to ensure that is the case. At the moment this seems to work a bit like a state machine, and it is vulnerable to later modifications that change the control flow without keeping in mind the same cases as when it was originally developed.
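
For instance, one way to keep the pair together (illustrative only, not code from this PR):

def withCapPitch(items, offset):
	# Bracket the given speech items with a pitch change and its reset, so
	# the opening and closing PitchCommands cannot drift apart.
	return [PitchCommand(offset=offset)] + list(items) + [PitchCommand()]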


ruifontes commented May 16, 2019 via email


ruifontes commented May 16, 2019 via email


michaelDCurran commented May 16, 2019 via email


LeonarddeR commented May 16, 2019 via email

@LeonarddeR

Hmm, OneCore is an interesting one. It tries to send pitch using SSML, but in this case, it shouldn't.

I think it should be fixed as follows:

  1. _convertProsody, convertRateCommand, convertPitchCommand and convertVolumeCommand should be moved from _OcSsmlConverter to _OcPreAPI5SsmlConverter.
  2. I believe convertRateCommand, convertPitchCommand and convertVolumeCommand should all return None in _OcSsmlConverter, in order for these commands not to be handled using SSML.

Even then, I'm afraid things won't work as expected when sending commands, as they aren't processed in the speak function. I'm afraid I'm too unfamiliar with synthesizer drivers to fix this ASAP.


jcsteh commented May 16, 2019 via email

@LeonarddeR

Are you saying OneCore shouldn't use SSML for PitchCommand, etc.? If so, why? SSML is the ideal fit for inline speech prosody commands.

Isn't the problem with SSML that it doesn't support the full rate and pitch range that is supported with the prosody commands? Or am I just misunderstanding something?


jcsteh commented May 16, 2019 via email


LeonarddeR commented May 16, 2019 via email
