
Voice commands #173

Closed
wants to merge 22 commits into from

Conversation

feugy
Contributor

@feugy feugy commented Nov 11, 2015

At last, a working implementation for voice commands.

Users can now trigger commands (mouse clicks, scrolling, the magnifier, calibration, quit, switching between keyboards...) with spoken sentences.

It uses Microsoft's built-in speech recognition engine, and is an alternative to mouse/gaze for triggering FunctionKeys.

After some tests, it became obvious that a prefixed grammar was needed: users must start their command with a keyword (as with Google Now).
When a command is recognized and applied, speech synthesis provides feedback. This helps the user understand edge cases where a command was misunderstood.
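For reference, a prefixed grammar of this kind can be built with System.Speech roughly as follows; the prefix word, command phrases, and semantic values below are illustrative placeholders, not OptiKey's actual values:

```csharp
using System;
using System.Speech.Recognition;

// Minimal sketch of a prefixed grammar: every utterance must start with the
// prefix word, followed by one of the known command phrases.
var commands = new Choices(
    new GrammarBuilder(new SemanticResultValue("click", "MouseLeftClick")),
    new GrammarBuilder(new SemanticResultValue("scroll up", "ScrollUp")));

var builder = new GrammarBuilder("Opti");                    // spoken prefix
builder.Append(new SemanticResultKey("command", commands));

var engine = new SpeechRecognitionEngine();
engine.LoadGrammar(new Grammar(builder));
engine.SetInputToDefaultAudioDevice();
engine.SpeechRecognized += (s, e) =>
    Console.WriteLine("Command: " + e.Result.Semantics["command"].Value);
engine.RecognizeAsync(RecognizeMode.Multiple);
```

Because the prefix is the first token of the grammar itself, phrases without it should not produce a match at all.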

All can be tuned from the Management console with a dedicated panel:

  • recognition can be enabled/disabled (disabled by default)
  • prefix can be customized (Opti by default)
  • audio feedback can be enabled/disabled (disabled by default)
  • all supported command patterns are editable

Default commands are provided. A new Voice.resx file (which can be localized) was included; it is copied to the application folder (AppData\Roaming\JuliusSweetland\OptiKey\Commands), allowing the user to customize it. Whenever you release a new version with new defaults, they will be merged with the user's custom commands for the matching language.

Three things remain open from my point of view:

  1. Write some tests. I started with one service, but it's not enough.
    I suggest using NUnit 3.0.0 (currently RC2), which includes more expressive assertions.
  2. Handle misrecognized patterns in VoiceCommandSource. I need some advice, because I don't know how to "discard" values within Reactive's select.
  3. Handle InputService unsubscribe. Currently, I don't know when to do it, nor whether it's relevant.

Last of all, I wonder if it could be possible/relevant to use voice commands during calibration.
It could make users more autonomous during this delicate process.

Use speech recognition prefix for better accuracy
Add (toggleable) audio feedback
Support the most significant FunctionKeys enumeration values
Allow per-language user customization of voice commands
Display voice prefix on splash screen
Add unit test for ConfigurableCommandService
@@ -124,7 +124,7 @@ private void Load()

 public void ApplyChanges()
 {
-    bool reloadDictionary = Settings.Default.ResourceLanguage != ResourceLanguage;
+    bool reloadDictionary = Settings.Default.KeyboardLanguage != KeyboardLanguage;
Contributor Author

It's not related to this PR, but I had to fix this to make the unit tests work.

@JuliusSweetland
Member

I had a few small pieces of feedback:

There is an empty Commands folder in the project.

When the user puts OptiKey to "sleep", either via the key or a voice command, then voice commands should also sleep if they are active.

Similarly, if voice commands are enabled, it would be very useful to be able to start and stop voice recognition with commands; "Opti Listen"/"Opti Stop Listening", for example. The use case is if you wanted to talk to someone in the room while still using OptiKey, and didn't want the voice commands incorrectly responding to your voice.

I don't think there should be a voice command to clear the scratchpad by default. The commands should exist, but accidentally triggering them would be annoying, as it would undo any typing you'd already entered; so maybe just leave the commands blank, for the user to enter if they want to use them?

I managed to trigger commands without using the prefix. I also triggered actions by saying unrelated words. Have you seen either behaviour? Is the recognition too eager because the set of possible phrases is so small?

Toggle voice recognition global state with voice commands
@feugy
Contributor Author

feugy commented Nov 18, 2015

I've deleted the empty src\JuliusSweetland.OptiKey\Resources\Commands folder, but I'm still wondering why git took it into account (it does not normally track empty folders).

Voice recognition is disabled while OptiKey is sleeping, and the overall recognition system can now be toggled with a voice command (the equivalent of going to the management console and clicking the relevant checkbox).

For that, I added two new FunctionKeys, in case we want a button for it in the future.

Regarding the scratchpad, I've wired the BackOne and BackMany keys, but not Clear. Did you mean BackMany?

It should normally not be possible to trigger anything without the prefix; I'm still investigating this point.

@JuliusSweetland
Member

Hi Damien,

That all sounds great. I'm taking a small break for a couple of weeks and
then I'll be working on OptiKey for a couple of weeks. I'll take a proper
look at this then. Thank you.


@feugy
Contributor Author

feugy commented Dec 6, 2015

Hi Julius.
I hope you're fully recharged after your vacation!

I've backported master to allow automatic pull-request resolution.
There are still open questions about the scratchpad (see above) and the last two points of the pull request's description.

Fields to readonly
Remove unused fields
All user settings should be roaming
Minor refactoring
Introduction of constant string
Reduce nesting
@JuliusSweetland
Member

Hi, I've had a quick look over the code and tested the functionality. I have started making a few (minor) changes which I've published here: https://github.com/OptiKey/OptiKey/tree/feugy-voicerecognition - if you could make your next pull request against that then we can work iteratively to resolve any outstanding issues.

Firstly I should answer your questions:

  1. Scratchpad - I think I was mistaken. If clear isn't configured then great. I think that's fine for now.
  2. To handle patterns that are not recognised - it looks like you kind of do handle this already by returning a new TriggerSignal, rather than one which is correctly populated. I've made a small change here on my "feugy-voicerecognition" branch, which filters out any signals that don't have a KeyValue. Is that what you mean?
  3. InputService.DisposeSelectionSubscriptions() is called when selection subscription (trigger sources) are no longer required. In terms of creating and disposing the voiceCommandSubscription - I have checked in a change which creates this subscription when the first subscriber to Selection or SelectionResult is attached, and disposes the subscription when the last subscriber to these events unsubscribes. I think this is correct.
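The filtering described in point 2 can be sketched with Rx like this; the TriggerSignal shape here is a simplified stand-in for OptiKey's actual type:

```csharp
using System;
using System.Reactive.Linq;

// Simplified stand-in for OptiKey's TriggerSignal; only the KeyValue
// property matters for this sketch.
class TriggerSignal
{
    public string KeyValue { get; set; }
}

class FilterSketch
{
    static void Main()
    {
        var signals = new[]
        {
            new TriggerSignal(),                               // unrecognised pattern
            new TriggerSignal { KeyValue = "MouseLeftClick" }  // recognised command
        };

        // Signals produced for unrecognised patterns carry no KeyValue;
        // Where() drops them before any subscriber sees them.
        signals.ToObservable()
               .Where(signal => signal.KeyValue != null)
               .Subscribe(signal => Console.WriteLine(signal.KeyValue));
    }
}
```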

Secondly, I have some more feedback for you. It's all looking really promising, but I think we have a few gremlins to take care of before this can be released:

  1. VoiceCommandSource.MatchRecognised() - rather than output the cursor position, I think it should output a default(Point), or the class should take in an IPointSource and use that to output real points. At the moment I don't think a point is useful alongside a voice command, but I'm not 100% sure.
  2. ConfigurableCommandService - using Task.Delay to give MainViewModel time to hook up the PublishError handler and display loading problems is not ideal. I think it would be better to call the Load() method externally as part of the postMainViewLoaded() lambda in App.xaml.cs (I think this would work, but I have not tested it). The same could be done for the DictionaryService, as it could throw errors in its own Load method that would also not be displayed to the user correctly.
  3. Prefix is still not required for me - just saying "click" works, despite prefixes being enabled.
  4. Accuracy too low - if I talk at it randomly it will match words I am not saying. Is this potentially the difference between Microsoft.Speech (untrained, lower quality server version) and System.Speech (trained, higher quality, desktop version)? See http://stackoverflow.com/a/2982910/2009878
  5. VoiceCommandSource: call to speechEngine.SetInputToDefaultAudioDevice(); will throw an exception if no recording device is configured. I think this exception should be caught and published as an Error event. I think the Error event handler would be attached at this point so the error will be displayed to the user. The exception is below, but this should be caught around this specific call, logged and converted into a nice error message indicating that no recording/input device could be found and that voice commands will not work until one is configured in Windows and OptiKey restarted.
    2015-12-06 14:30:05,594 [ 1] ERROR JuliusSweetland.OptiKey.App: An UnhandledException has been encountered...
    System.InvalidOperationException: Cannot find the requested data item, such as a data key or value.
       at System.Speech.Recognition.RecognizerBase.SetInputToDefaultAudioDevice()
       at System.Speech.Recognition.SpeechRecognitionEngine.SetInputToDefaultAudioDevice()
       at JuliusSweetland.OptiKey.Observables.TriggerSources.VoiceCommandSource.<get_Sequence>b__11_0() in C:\Users\Julius\Documents\GitHub\OptiKey\src\JuliusSweetland.OptiKey\Observables\TriggerSources\VoiceCommandSource.cs:line 69
       at System.Reactive.Linq.ObservableImpl.Using`2._.Run()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Reactive.PlatformServices.ExceptionServicesImpl.Rethrow(Exception exception)
       at System.Reactive.Stubs.<.cctor>b__1(Exception ex)
       at System.Reactive.AnonymousSafeObserver`1.OnError(Exception error)
       at System.Reactive.Linq.ObservableImpl.Where`1._.OnError(Exception error)
       at System.Reactive.Concurrency.ObserveOn`1.ObserveOnSink.OnErrorPosted(Object error)
       at System.Windows.Threading.ExceptionWrapper.InternalRealCall(Delegate callback, Object args, Int32 numArgs)
       at System.Windows.Threading.ExceptionWrapper.TryCatchWhen(Object source, Delegate callback, Object args, Int32 numArgs, Delegate catchHandler)
       at System.Windows.Threading.DispatcherOperation.InvokeImpl()
       at System.Windows.Threading.DispatcherOperation.InvokeInSecurityContext(Object state)
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Windows.Threading.DispatcherOperation.Invoke()
       at System.Windows.Threading.Dispatcher.ProcessQueue()
       at System.Windows.Threading.Dispatcher.WndProcHook(IntPtr hwnd, Int32 msg, IntPtr wParam, IntPtr lParam, Boolean& handled)
       at MS.Win32.HwndWrapper.WndProc(IntPtr hwnd, Int32 msg, IntPtr wParam, IntPtr lParam, Boolean& handled)
       at MS.Win32.HwndSubclass.DispatcherCallbackOperation(Object o)
       at System.Windows.Threading.ExceptionWrapper.InternalRealCall(Delegate callback, Object args, Int32 numArgs)
       at System.Windows.Threading.ExceptionWrapper.TryCatchWhen(Object source, Delegate callback, Object args, Int32 numArgs, Delegate catchHandler)
       at System.Windows.Threading.Dispatcher.LegacyInvokeImpl(DispatcherPriority priority, TimeSpan timeout, Delegate method, Object args, Int32 numArgs)
       at MS.Win32.HwndSubclass.SubclassWndProc(IntPtr hwnd, Int32 msg, IntPtr wParam, IntPtr lParam)
       at MS.Win32.UnsafeNativeMethods.DispatchMessage(MSG& msg)
       at System.Windows.Threading.Dispatcher.PushFrameImpl(DispatcherFrame frame)
       at System.Windows.Threading.Dispatcher.PushFrame(DispatcherFrame frame)
       at System.Windows.Application.RunDispatcher(Object ignore)
       at System.Windows.Application.RunInternal(Window window)
       at System.Windows.Application.Run(Window window)
       at System.Windows.Application.Run()
       at JuliusSweetland.OptiKey.App.Main() in C:\Users\Julius\Documents\GitHub\OptiKey\src\JuliusSweetland.OptiKey\obj\Debug\App.g.cs:line 0
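The guard suggested in point 5 might look roughly like the following method-level fragment; `Log.Error` and `PublishError` are placeholders standing in for whatever logging and error-publication members the class actually has:

```csharp
// Hypothetical sketch of the suggested guard; Log.Error and PublishError
// stand in for the class's actual logger and error event publication.
private void InitialiseAudioInput(SpeechRecognitionEngine speechEngine)
{
    try
    {
        speechEngine.SetInputToDefaultAudioDevice();
    }
    catch (InvalidOperationException ex)
    {
        // Thrown when no recording/input device is configured in Windows.
        Log.Error("No recording device could be found for voice commands.", ex);
        PublishError(this, new ApplicationException(
            "No recording/input device could be found. Voice commands will "
            + "not work until one is configured in Windows and OptiKey is "
            + "restarted.", ex));
    }
}
```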

@JuliusSweetland
Member

Ignore the close - it was a mis-click.

@feugy
Contributor Author

feugy commented Dec 8, 2015

The changes you made are precisely what I needed for unrecognized patterns (I didn't know about the return false possibility) and for unsubscription. Thank you!

  • EDIT We don't need real points, so I systematically set it to 0,0. It was a vestige of my experiments.
  • I 100% agree, my ignorance of C# and Rx tooling is glaring. My solution was a real pain in tests; I'll do as you suggest.
    EDIT I used the sizeAndPositionInitialised() lambda instead, because during postMainViewLoaded() the ToastNotificationPopup is not fully loaded yet, and no one listens to the event stream.
  • Could you go to the management console and check that the corresponding setting has a proper Opti value?
    I suspect that I added the setting after you gave it a first try, and that your value is empty. That could explain point 3. If so, I could add a check to avoid using an empty prefix.
    EDIT Finally, I got it! System.Speech is the most accurate speech engine, but it's the developer's responsibility to enforce the confidence of recognized input. I divided inputs into prefix and command semantic values, and check their confidence separately. Now a command will trigger only if we are 85% sure the user said the prefix, and 75% sure the following command was properly recognized (commands may be more complex, so we need to be flexible on that part).
  • I'll investigate the Microsoft.Speech engine. But you still need to configure default English commands with values that are not too close. Even with a well-trained engine, I fear we need default commands that are not too similar.
    EDIT Microsoft.Speech is less accurate than System.Speech, and does not support semantic analysis of recognized inputs. So let's carry on with System.Speech.
  • Sorry for that. I'm struggling with the Rx API, but I'll find the proper way to do it. I suggest we toggle VoiceCommandsEnabled off if an exception is caught during initialization: I don't want to bother users every time they start OptiKey because they can't, or don't want to, use a recording device.
    EDIT As you asked, the error is now logged and reported to the user in a ToastNotification. I had to make structural changes to ensure that the notification will be displayed even if the error is raised before the UI is loaded and positioned, and that the notification won't be hidden by the splash screen.
    After the error, the VoiceEnabled setting is disabled to avoid a notification on the next startup.
  • There is an important feature missing: a simple way for users to see the existing commands. It's not acceptable to have to go to the management console to see them, so I'd like to add a help command that shows a popup. What do you think?

(PS: thank you for having spell-checked my code!)
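The two-level confidence check described in the third bullet above could look roughly like this; the semantic key names are assumptions (they depend on how the grammar was built), while the thresholds are the 85%/75% values quoted above:

```csharp
using System.Speech.Recognition;

// Hypothetical sketch of the two-level confidence check; assumes the grammar
// tagged the prefix and the command with SemanticResultKeys named "prefix"
// and "command".
class ConfidenceCheckSketch
{
    const float PrefixConfidence = 0.85f;   // strict on the prefix
    const float CommandConfidence = 0.75f;  // more flexible on the command

    void SpeechRecognised(object sender, SpeechRecognizedEventArgs e)
    {
        SemanticValue semantics = e.Result.Semantics;
        if (semantics["prefix"].Confidence >= PrefixConfidence
            && semantics["command"].Confidence >= CommandConfidence)
        {
            // Both parts recognised with sufficient confidence:
            // safe to trigger the matched command.
        }
    }
}
```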

@JuliusSweetland
Member

I'll check that the prefix is set and re-test tonight.

No problem on the spell check - your words weren't misspelled per se, they were just the US English versions, so I changed them to UK English! :)

I'm not sure Microsoft.Speech would be better as it needs to be trained, i.e. you need to record each phrase. Maybe there is a way in System.Speech of defining how accurate the match must be before it is considered a match?

"toggle the VoiceCommandsEnabled if an exception is caught during initialization" - I think an error event should also be thrown.

@JuliusSweetland
Member

Hi @feugy - let me know when this pull request is ready for another review.

Improve ToastNotification management, displaying notifications one by one, even during startup
Use MVVM validation to prevent an empty Voice recognition prefix
@feugy
Contributor Author

feugy commented Dec 27, 2015

Hi Julius.

I think it's better now: all your remarks were taken into account, and it would be great if you have time to check them.

I've added notable stuff:

  • toast notifications are now queued, to make sure they are displayed and not hidden by others
  • in the management console, I used MVVM validation annotations to enforce that the Voice recognition prefix is not empty.

Regarding command help, I'd like to propose something, but on a separate branch. voicerecognition has been alive for nearly two months, and I'd like to close it to avoid repeated merges from master, unless you think it's better to ship everything in one piece.

@JuliusSweetland
Member

Hi Damien,

Great - I'll check it soon.

Regarding the new branch - would you like me to reject the existing PR and
then you can create a new one? Do you also want a new branch and for me to
delete the current voice recognition branch?

Regards,
Julius


@feugy
Contributor Author

feugy commented Dec 27, 2015

I was thinking about dissociating the help feature from the voice recognition, so no deletion at all, just two different pull requests.

@JuliusSweetland
Member

Ok, sounds good. Is the current PR still OK to be auto-merged?


@feugy
Contributor Author

feugy commented Dec 29, 2015

For me yes.

@JuliusSweetland
Member

Should I wait for the second pull request before I test both together?


@feugy
Contributor Author

feugy commented Dec 29, 2015

IMO, they need to be handled differently.

I've opened #199 to discuss the help system, I think this one will also take some time before landing.

@JuliusSweetland
Member

Ok. I'll take a look at PR #173 in isolation as soon as I can. Probably in
a week as I've got a few things on over the new year.


@d-mojca

d-mojca commented Apr 21, 2016

WOW, voice commands - is it possible to do that in other languages, too? (Slovene/Slovenian)

@feugy
Contributor Author

feugy commented May 8, 2016

Hi @d-mojca.
Technically, all languages supported by the Microsoft Speech engine can be used for voice commands.
Then it's a matter of having OptiKey translated into Slovenian as well.

But @JuliusSweetland hasn't had time to review this PR, and I personally can't dedicate any more time to another huge merge, so it's likely that this feature won't land in master.
