
Voice commands #173

Closed
wants to merge 22 commits into from

Conversation

feugy
Contributor

@feugy feugy commented Nov 11, 2015

At last, a working implementation for voice commands.

Users can now trigger commands (mouse clicks, scrolling, the magnifier, calibration, quit, switching between keyboards...) with spoken sentences.

It uses Microsoft's built-in speech recognition engine, and is an alternative to mouse/gaze for triggering FunctionKeys.

After some tests, it became obvious that a prefixed grammar was needed: users must start their command with a keyword (as with Google Now).
When a command is recognized and applied, speech synthesis provides feedback. This helps the user understand edge cases where a command was misunderstood.
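For reference, a prefixed grammar of this kind can be built with System.Speech roughly as follows; the prefix word, command phrases, and semantic values below are illustrative placeholders, not OptiKey's actual values:

```csharp
using System;
using System.Speech.Recognition;

// Minimal sketch of a prefixed grammar: every utterance must start with the
// prefix word, followed by one of the known command phrases.
var commands = new Choices(
    new GrammarBuilder(new SemanticResultValue("click", "MouseLeftClick")),
    new GrammarBuilder(new SemanticResultValue("scroll up", "ScrollUp")));

var builder = new GrammarBuilder("Opti");                    // spoken prefix
builder.Append(new SemanticResultKey("command", commands));

var engine = new SpeechRecognitionEngine();
engine.LoadGrammar(new Grammar(builder));
engine.SetInputToDefaultAudioDevice();
engine.SpeechRecognized += (s, e) =>
    Console.WriteLine("Command: " + e.Result.Semantics["command"].Value);
engine.RecognizeAsync(RecognizeMode.Multiple);
```

Because the prefix is the first token of the grammar itself, phrases without it should not produce a match at all.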

All can be tuned from the Management console with a dedicated panel:

  • recognition can be enabled/disabled (disabled by default)
  • prefix can be customized (Opti by default)
  • audio feedback can be enabled/disabled (disabled by default)
  • all supported command patterns are editable

Default commands are provided. A new Voice.resx file (which can be localized) was included; it is copied to the application folder (AppData\Roaming\JuliusSweetland\OptiKey\Commands), allowing the user to customize it. Whenever you release a new version with new defaults, they will be merged with the user's custom commands for the matching language.

Three things remain open from my point of view:

  1. Write some tests. I started with one service, but it's not enough.
    I suggest using NUnit 3.0.0 (currently RC2), which includes more expressive assertions.
  2. Handle misrecognized patterns in VoiceCommandSource. I need some advice, because I don't know how to "discard" values within Reactive's select.
  3. Handle InputService unsubscribe. Currently, I don't know when to do it, nor whether it's relevant.

Last of all, I wonder if it could be possible/relevant to use voice commands during calibration.
It could make users more autonomous during this delicate process.

Use speech recognition prefix for better accuracy
Add (toggleable) audio feedback
Support the most significant FunctionKeys enumeration values
Allow per-language user customization of voice commands
Display voice prefix on splash screen
Add unit test for ConfigurableCommandService
@@ -124,7 +124,7 @@ private void Load()

 public void ApplyChanges()
 {
-    bool reloadDictionary = Settings.Default.ResourceLanguage != ResourceLanguage;
+    bool reloadDictionary = Settings.Default.KeyboardLanguage != KeyboardLanguage;
Contributor Author

It's not related to this PR, but I had to fix this to make the unit tests work.

@JuliusSweetland
Member

I had a few small pieces of feedback:

There is an empty Commands folder in the project.

When the user puts OptiKey to "sleep", either via the key or a voice command, then voice commands should also sleep if they are active.

Similarly, if voice commands are enabled, it would be very useful to be able to start and stop voice recognition with commands; "Opti Listen"/"Opti Stop Listening", for example. The use case is if you wanted to talk to someone in the room while still using OptiKey, and didn't want the voice commands incorrectly responding to your voice.

I don't think there should be a voice command to clear the scratchpad by default. The commands should exist, but accidentally triggering them would be annoying, as it would undo any typing you'd already entered; so maybe just leave the commands blank, for the user to enter if they want to use them?

I managed to trigger commands without using the prefix. I also triggered actions by saying unrelated words. Have you seen either behaviour? Is the recognition too eager because the set of possible phrases is so small?

Toggle voice recognition global state with voice commands
@feugy
Contributor Author

feugy commented Nov 18, 2015

I've deleted the empty src\JuliusSweetland.OptiKey\Resources\Commands folder, but I'm still wondering why git took it into account (it does not normally track empty folders).

Voice recognition is disabled while OptiKey is sleeping, and the overall recognition system can now be toggled with a voice command (the equivalent of going to the management console and clicking the relevant checkbox).

For that, I added two new FunctionKeys, in case we want a button for it in the future.

Regarding the scratchpad, I've wired the BackOne and BackMany keys, but not Clear. Did you mean BackMany?

It should normally not be possible to trigger anything without the prefix; I'm still investigating this point.

@JuliusSweetland
Member

Hi Damien,

That all sounds great. I'm taking a small break for a couple of weeks and
then I'll be working on OptiKey for a couple of weeks. I'll take a proper
look at this then. Thank you.


@feugy
Contributor Author

feugy commented Dec 6, 2015

Hi Julius.
I hope you're fully recharged after your vacation!

I've backported master to allow automatic pull-request resolution.
There are still open questions about the scratchpad (see above) and the last two points of the pull request's description.

Fields to readonly
Remove unused fields
All user settings should be roaming
Minor refactoring
Introduction of constant string
Reduce nesting
@JuliusSweetland
Member

Hi, I've had a quick look over the code and tested the functionality. I have started making a few (minor) changes which I've published here: https://github.com/OptiKey/OptiKey/tree/feugy-voicerecognition - if you could make your next pull request against that then we can work iteratively to resolve any outstanding issues.

Firstly I should answer your questions:

  1. Scratchpad - I think I was mistaken. If clear isn't configured then great. I think that's fine for now.
  2. To handle patterns that are not recognised - it looks like you kind of do handle this already by returning a new TriggerSignal, rather than one which is correctly populated. I've made a small change here on my "feugy-voicerecognition" branch, which filters out any signals that don't have a KeyValue. Is that what you mean?
  3. InputService.DisposeSelectionSubscriptions() is called when selection subscription (trigger sources) are no longer required. In terms of creating and disposing the voiceCommandSubscription - I have checked in a change which creates this subscription when the first subscriber to Selection or SelectionResult is attached, and disposes the subscription when the last subscriber to these events unsubscribes. I think this is correct.
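The filtering described in point 2 can be sketched with Rx like this; the TriggerSignal shape here is a simplified stand-in for OptiKey's actual type:

```csharp
using System;
using System.Reactive.Linq;

// Simplified stand-in for OptiKey's TriggerSignal; only the KeyValue
// property matters for this sketch.
class TriggerSignal
{
    public string KeyValue { get; set; }
}

class FilterSketch
{
    static void Main()
    {
        var signals = new[]
        {
            new TriggerSignal(),                               // unrecognised pattern
            new TriggerSignal { KeyValue = "MouseLeftClick" }  // recognised command
        };

        // Signals produced for unrecognised patterns carry no KeyValue;
        // Where() drops them before any subscriber sees them.
        signals.ToObservable()
               .Where(signal => signal.KeyValue != null)
               .Subscribe(signal => Console.WriteLine(signal.KeyValue));
    }
}
```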

Secondly, I have some more feedback for you. It's all looking really promising, but I think we have a few gremlins to take care of before this can be released:

  1. VoiceCommandSource.MatchRecognised() - rather than output the cursor position, I think it should output a default(Point), or the class should take in an IPointSource and use that to output real points. At the moment I don't think a point is useful alongside a voice command, but I'm not 100% sure.
  2. ConfigurableCommandService - using Task.Delay to give MainViewModel time to hook up the PublishError handler and display loading problems is not ideal. I think it would be better to call the Load() method externally as part of the postMainViewLoaded() lambda in App.xaml.cs (I think this would work, but I have not tested it). The same could be done for the DictionaryService, as it could throw errors in its own Load method that would also not be displayed to the user correctly.
  3. Prefix is still not required for me - just saying "click" works, despite prefixes being enabled.
  4. Accuracy too low - if I talk at it randomly it will match words I am not saying. Is this potentially the difference between Microsoft.Speech (untrained, lower quality server version) and System.Speech (trained, higher quality, desktop version)? See http://stackoverflow.com/a/2982910/2009878
  5. VoiceCommandSource: call to speechEngine.SetInputToDefaultAudioDevice(); will throw an exception if no recording device is configured. I think this exception should be caught and published as an Error event. I think the Error event handler would be attached at this point so the error will be displayed to the user. The exception is below, but this should be caught around this specific call, logged and converted into a nice error message indicating that no recording/input device could be found and that voice commands will not work until one is configured in Windows and OptiKey restarted.
    2015-12-06 14:30:05,594 [ 1] ERROR JuliusSweetland.OptiKey.App: An UnhandledException has been encountered...
    System.InvalidOperationException: Cannot find the requested data item, such as a data key or value.
       at System.Speech.Recognition.RecognizerBase.SetInputToDefaultAudioDevice()
       at System.Speech.Recognition.SpeechRecognitionEngine.SetInputToDefaultAudioDevice()
       at JuliusSweetland.OptiKey.Observables.TriggerSources.VoiceCommandSource.<get_Sequence>b__11_0() in C:\Users\Julius\Documents\GitHub\OptiKey\src\JuliusSweetland.OptiKey\Observables\TriggerSources\VoiceCommandSource.cs:line 69
       at System.Reactive.Linq.ObservableImpl.Using`2._.Run()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Reactive.PlatformServices.ExceptionServicesImpl.Rethrow(Exception exception)
       at System.Reactive.Stubs.<.cctor>b__1(Exception ex)
       at System.Reactive.AnonymousSafeObserver`1.OnError(Exception error)
       at System.Reactive.Linq.ObservableImpl.Where`1._.OnError(Exception error)
       at System.Reactive.Concurrency.ObserveOn`1.ObserveOnSink.OnErrorPosted(Object error)
       at System.Windows.Threading.ExceptionWrapper.InternalRealCall(Delegate callback, Object args, Int32 numArgs)
       at System.Windows.Threading.ExceptionWrapper.TryCatchWhen(Object source, Delegate callback, Object args, Int32 numArgs, Delegate catchHandler)
       at System.Windows.Threading.DispatcherOperation.InvokeImpl()
       at System.Windows.Threading.DispatcherOperation.InvokeInSecurityContext(Object state)
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Windows.Threading.DispatcherOperation.Invoke()
       at System.Windows.Threading.Dispatcher.ProcessQueue()
       at System.Windows.Threading.Dispatcher.WndProcHook(IntPtr hwnd, Int32 msg, IntPtr wParam, IntPtr lParam, Boolean& handled)
       at MS.Win32.HwndWrapper.WndProc(IntPtr hwnd, Int32 msg, IntPtr wParam, IntPtr lParam, Boolean& handled)
       at MS.Win32.HwndSubclass.DispatcherCallbackOperation(Object o)
       at System.Windows.Threading.ExceptionWrapper.InternalRealCall(Delegate callback, Object args, Int32 numArgs)
       at System.Windows.Threading.ExceptionWrapper.TryCatchWhen(Object source, Delegate callback, Object args, Int32 numArgs, Delegate catchHandler)
       at System.Windows.Threading.Dispatcher.LegacyInvokeImpl(DispatcherPriority priority, TimeSpan timeout, Delegate method, Object args, Int32 numArgs)
       at MS.Win32.HwndSubclass.SubclassWndProc(IntPtr hwnd, Int32 msg, IntPtr wParam, IntPtr lParam)
       at MS.Win32.UnsafeNativeMethods.DispatchMessage(MSG& msg)
       at System.Windows.Threading.Dispatcher.PushFrameImpl(DispatcherFrame frame)
       at System.Windows.Threading.Dispatcher.PushFrame(DispatcherFrame frame)
       at System.Windows.Application.RunDispatcher(Object ignore)
       at System.Windows.Application.RunInternal(Window window)
       at System.Windows.Application.Run(Window window)
       at System.Windows.Application.Run()
       at JuliusSweetland.OptiKey.App.Main() in C:\Users\Julius\Documents\GitHub\OptiKey\src\JuliusSweetland.OptiKey\obj\Debug\App.g.cs:line 0
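The guard suggested in point 5 might look roughly like the following method-level fragment; `Log.Error` and `PublishError` are placeholders standing in for whatever logging and error-publication members the class actually has:

```csharp
// Hypothetical sketch of the suggested guard; Log.Error and PublishError
// stand in for the class's actual logger and error event publication.
private void InitialiseAudioInput(SpeechRecognitionEngine speechEngine)
{
    try
    {
        speechEngine.SetInputToDefaultAudioDevice();
    }
    catch (InvalidOperationException ex)
    {
        // Thrown when no recording/input device is configured in Windows.
        Log.Error("No recording device could be found for voice commands.", ex);
        PublishError(this, new ApplicationException(
            "No recording/input device could be found. Voice commands will "
            + "not work until one is configured in Windows and OptiKey is "
            + "restarted.", ex));
    }
}
```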

@JuliusSweetland
Member

Ignore the close - it was a mis-click.

@feugy
Contributor Author

feugy commented Dec 8, 2015

The changes you made are precisely what I needed for unrecognized patterns (I didn't know about the return false possibility) and for unsubscription. Thank you!

  • EDIT We don't need real points, so I systematically set it to 0,0. It was a vestige of my experiments.
  • I 100% agree, my ignorance of C# and Rx tooling is glaring. My solution was a real pain in tests; I'll do as you suggest.
    EDIT I used the sizeAndPositionInitialised() lambda instead, because during postMainViewLoaded() the ToastNotificationPopup is not fully loaded yet, and no one listens to the event stream.
  • Could you go to the management console and check that the corresponding setting has a proper Opti value?
    I suspect that I added the setting after you gave it a first try, and that your value is empty. That could explain point 3. If so, I could add a check to avoid using an empty prefix.
    EDIT Finally, I got it! System.Speech is the most accurate speech engine, but it's the developer's responsibility to enforce the confidence of recognized input. I divided inputs into prefix and command semantic values, and check their confidence separately. Now a command will trigger only if we are 85% sure the user said the prefix, and 75% sure the following command was properly recognized (commands may be more complex, so we need to be flexible on that part).
  • I'll investigate the Microsoft.Speech engine. But you still need to configure default English commands with values that are not too close. Even with a well-trained engine, I fear we need default commands that are not too similar.
    EDIT Microsoft.Speech is less accurate than System.Speech, and does not support semantic analysis of recognized inputs. So let's carry on with System.Speech.
  • Sorry for that. I'm struggling with the Rx API, but I'll find the proper way to do it. I suggest we toggle VoiceCommandsEnabled off if an exception is caught during initialization: I don't want to bother users every time they start OptiKey because they can't, or don't want to, use a recording device.
    EDIT As you asked, the error is now logged and reported to the user in a ToastNotification. I had to make structural changes to ensure that the notification will be displayed even if the error is raised before the UI is loaded and positioned, and that the notification won't be hidden by the splash screen.
    After the error, the VoiceEnabled setting is disabled to avoid a notification on the next startup.
  • There is an important feature missing: a simple way for users to see the existing commands. It's not acceptable to have to go to the management console to see them, so I'd like to add a help command that shows a popup. What do you think?

(PS: thank you for having spell-checked my code!)
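The two-level confidence check described in the third bullet above could look roughly like this; the semantic key names are assumptions (they depend on how the grammar was built), while the thresholds are the 85%/75% values quoted above:

```csharp
using System.Speech.Recognition;

// Hypothetical sketch of the two-level confidence check; assumes the grammar
// tagged the prefix and the command with SemanticResultKeys named "prefix"
// and "command".
class ConfidenceCheckSketch
{
    const float PrefixConfidence = 0.85f;   // strict on the prefix
    const float CommandConfidence = 0.75f;  // more flexible on the command

    void SpeechRecognised(object sender, SpeechRecognizedEventArgs e)
    {
        SemanticValue semantics = e.Result.Semantics;
        if (semantics["prefix"].Confidence >= PrefixConfidence
            && semantics["command"].Confidence >= CommandConfidence)
        {
            // Both parts recognised with sufficient confidence:
            // safe to trigger the matched command.
        }
    }
}
```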

@JuliusSweetland
Member

I'll check that the prefix is set and re-test tonight.

No problem on the spell check - your words weren't misspelled per se, they were just the US English versions, so I changed them to UK English! :)

I'm not sure Microsoft.Speech would be better as it needs to be trained, i.e. you need to record each phrase. Maybe there is a way in System.Speech of defining how accurate the match must be before it is considered a match?

"toggle the VoiceCommandsEnabled if an exception is caught during initialization" - I think an error event should also be thrown.

@JuliusSweetland
Member

Hi @feugy - let me know when this pull request is ready for another review.

Improve ToastNotification management, displaying notifications one by one, even during startup
Use MVVM validation to prevent an empty Voice recognition prefix
@feugy
Contributor Author

feugy commented Dec 27, 2015

Hi Julius.

I think it's better now: all your remarks were taken into account, and it would be great if you have time to check them.

I've added notable stuff:

  • toast notifications are now queued, to make sure they are displayed and not hidden by others
  • in the management console, I used MVVM validation annotations to enforce that the Voice recognition prefix is not empty.

Regarding command help, I'd like to propose something, but on a separate branch. voicerecognition has been alive for nearly two months, and I'd like to close it to avoid repeated merges from master, unless you think it's better to ship everything in one piece.

@JuliusSweetland
Member

Hi Damien,

Great - I'll check it soon.

Regarding the new branch - would you like me to reject the existing PR and
then you can create a new one? Do you also want a new branch and for me to
delete the current voice recognition branch?

Regards,
Julius


@feugy
Contributor Author

feugy commented Dec 27, 2015

I was thinking about dissociating the help feature from the voice recognition, so no deletion at all, just two different pull requests.

@JuliusSweetland
Member

Ok, sounds good. Is the current PR still OK to be auto-merged?


@feugy
Contributor Author

feugy commented Dec 29, 2015

For me yes.

@JuliusSweetland
Member

Should I wait for the second pull request before I test both together?


@feugy
Contributor Author

feugy commented Dec 29, 2015

IMO, they need to be handled differently.

I've opened #199 to discuss the help system, I think this one will also take some time before landing.

@JuliusSweetland
Member

Ok. I'll take a look at PR #173 in isolation as soon as I can. Probably in
a week as I've got a few things on over the new year.


@d-mojca

d-mojca commented Apr 21, 2016

WOW, voice commands - is it possible to do that in other languages, too? (Slovene/Slovenian)

@feugy
Contributor Author

feugy commented May 8, 2016

Hi @d-mojca.
Technically, all languages supported by the Microsoft Speech engine can be used for voice commands.
Then it's a matter of having OptiKey translated into Slovenian as well.

But @JuliusSweetland hasn't had time to review this PR, and I personally can't dedicate any more time to another huge merge, so it's likely that this feature won't land in master.
