add a kaldi rule (at least for sleep/wake) #797

Open
4 of 5 tasks
kendonB opened this issue Apr 28, 2020 · 16 comments
Labels
Kaldi Related to Kaldi speech recognition backend New Feature A new feature that is not currently implemented. Windows Speech Recognition Windows Speech Recognition Backend

Comments

@kendonB
Collaborator

kendonB commented Apr 28, 2020

Is your feature request related to a problem? Please describe.
Dragon has a bunch of built-in commands that make it easier to use. It would be nice for switchers to have a Kaldi rule loosely based on the functionality available in base Dragon.

Describe the solution you'd like
A Kaldi grammar with the following features:

  • Sleep/wake
  • An automatic program opener. In Dragon, you say "open " and it automatically finds and opens the program you want. It seems to be very good at automapping these utterances to program .exe's. I'm not sure how it works.
  • An automatic program switcher. In Dragon, you say "switch to ".
  • A universal button presser. I think this is just a matter of adding the buttons that are missing in hit in nav.py.
  • A "maximize window" command that works correctly.

To add sleep/wake we need to translate the grammar in this file into Caster.

@daanzu if you have any pointers or know of someone who has done this please let us know.

@LexiconCode
Member

LexiconCode commented Apr 28, 2020

This potentially isn't too hard to implement, and it would be relevant to all engines, not just Kaldi. Kaldi makes this work with set_exclusiveness(). An exclusive grammar takes precedence over all other active grammars: while any grammar is exclusive, only rules in exclusive grammars are available for recognition.

Note that setting a grammar to exclusive overrides DNS's built-in sleep/wake function. When using DNS, natlink.setMicState("sleeping") can be called so that the mic state matches the state of the grammar.

@LexiconCode LexiconCode added the New Feature A new feature that is not currently implemented. label Apr 28, 2020
@lexxish
Contributor

lexxish commented Apr 29, 2020

This potentially isn't too hard to implement, and it would be relevant to all engines, not just Kaldi. Kaldi makes this work with set_exclusiveness(). An exclusive grammar takes precedence over all other active grammars: while any grammar is exclusive, only rules in exclusive grammars are available for recognition.

Note that setting a grammar to exclusive overrides DNS's built-in sleep/wake function. When using DNS, natlink.setMicState("sleeping") can be called so that the mic state matches the state of the grammar.

A FuncContext with a mapping rule could also work.

Overriding the DNS default commands for sleep/wake would be nice because you can use shorter commands such as "snore" to put the mic to sleep. Only downside is I'm not sure if we could get the taskbar icon toggling from green to blue. I think the Kaldi implementation is more important regardless.

@LexiconCode
Member

LexiconCode commented Apr 29, 2020

Only downside is I'm not sure if we could get the taskbar icon toggling from green to blue.

Fortunately, I believe this can be handled by natlink.setMicState(state). According to the documentation, it controls the mic, where state is 'on', 'off' or 'sleeping', and natlink.getMicState() returns the current state. Therefore the DNS icon could be kept in sync with the exclusive grammar state.
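
Here is a minimal sketch of that syncing idea. It assumes the grammar object exposes set_exclusiveness() as dragonfly grammars do, and that the mic-state setter accepts the same strings as natlink.setMicState. The class name `SleepModeSync` and its parameters are hypothetical, and the setter is passed in so the class can run without DNS. Whether DNS still delivers recognitions to an exclusive grammar while the mic is 'sleeping' would need to be checked.

```python
class SleepModeSync(object):
    """Keep a sleep grammar's exclusiveness and the DNS mic state in step.

    Hypothetical helper, not part of Caster or dragonfly.
    grammar: any object with set_exclusiveness(), e.g. a loaded dragonfly Grammar.
    set_mic_state: callable taking 'on', 'off' or 'sleeping'
                   (e.g. natlink.setMicState); None for engines without one.
    """

    def __init__(self, grammar, set_mic_state=None):
        self._grammar = grammar
        self._set_mic_state = set_mic_state
        self.asleep = False

    def sleep(self):
        self._apply(True)

    def wake(self):
        self._apply(False)

    def _apply(self, asleep):
        # Exclusive while asleep, so only the sleep/wake rule is recognized.
        self._grammar.set_exclusiveness(asleep)
        if self._set_mic_state is not None:
            # Mirror the grammar state in the DNS GUI.
            self._set_mic_state("sleeping" if asleep else "on")
        self.asleep = asleep
```

Under DNS this might be wired up as `SleepModeSync(grammar, natlink.setMicState)`. Under Kaldi or WSR the second argument would simply be left out.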

I agree, though, that the implementation is more important for WSR/Kaldi.

@lexxish
Contributor

lexxish commented Apr 29, 2020

I agree, though, that the implementation is more important for WSR/Kaldi.

Regarding Kaldi, would the implementation involve changing content_loader.py, or does this rule operate independently of how you load the other rules?

@LexiconCode
Member

LexiconCode commented Apr 29, 2020

Fortunately, we don't have to change anything in Caster to make grammars exclusive. It's a simple bool, and it works on any rule. The rule must already be loaded into the engine before it's set to be exclusive. Once it is set, no commands will be recognized except those made exclusive through one or more rules.
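
To make the effect of that bool concrete, here is a simplified model in plain Python. It is only an illustration of the behavior described above, not how dragonfly or any engine implements it. The function name and the dict layout are made up for the example.

```python
def recognizable_rules(grammars):
    """Return rule names that could be recognized, given each grammar's state.

    grammars: list of dicts with keys 'rules' (list of names) and
    'exclusive' (bool). Simplified model; the real filtering happens
    inside the speech engine.
    """
    exclusive = [g for g in grammars if g["exclusive"]]
    # If any grammar is exclusive, only exclusive grammars stay active.
    active = exclusive if exclusive else grammars
    return [name for g in active for name in g["rules"]]


grammars = [
    {"rules": ["navigation", "text editing"], "exclusive": False},
    {"rules": ["sleep/wake"], "exclusive": False},
]
print(recognizable_rules(grammars))  # all three rules are available
grammars[1]["exclusive"] = True      # the "simple bool"
print(recognizable_rules(grammars))  # only 'sleep/wake' remains
```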

@kendonB
Collaborator Author

kendonB commented May 11, 2020

@lexxish did you ever figure out getting sleep to work?

@LexiconCode
Member

LexiconCode commented May 13, 2020

@lexxish

With straight dragonfly this would be pretty easy. With Caster it's a bit different, because we don't know the name of the grammar being used: it's different on every boot. I've been working on programmatically switching DNS modes in preparation for creating a unified mode manager for all engines. The following could be used in the sleep grammar.

from dragonfly import get_engine

grammar_cache = None  # module-level, so the lookup result survives between calls

def find_grammar_name():
    """Return the grammar that holds the exported "Mode Rules" rule, or None."""
    global grammar_cache
    if grammar_cache is None:
        for grammar in get_engine().grammars:
            for rule in grammar.rules:
                if rule.exported and rule.name == "Mode Rules":  # Rule name
                    grammar_cache = grammar
                    return grammar_cache
    return grammar_cache

Then, in another function, you can use grammar_cache.set_exclusiveness(0) or grammar_cache.set_exclusiveness(1) to toggle exclusiveness.

You can also check the running engine type if there are differences that need to be handled based on the engine implementation. For example, with DNS:

if get_engine()._name == 'natlink':
    import natlink
    # Do something

@LexiconCode
Member

LexiconCode commented May 13, 2020

* A "maximize window" command that works correctly.

What's wrong with the current behavior @kendonB?

An automatic program switcher. In Dragon, you say "switch to ".

Besides creating a GUI, the backend information could be obtained by tweaking the function below to use get_all_windows(), so that it returns a list covering all pids instead of only the Window.get_foreground() result.

import re
from pathlib import Path

from dragonfly import Window

# get_active_window_path() is assumed to come from the same Caster module as this function.

def get_active_window_info():
    '''Returns foreground window executable_file, executable_path, title, handle, classname'''
    # Captures "name.ext" that directly follows a path separator.
    FILENAME_PATTERN = re.compile(r"[/\\]([\w_ ]+\.[\w]+)")
    window = Window.get_foreground()
    executable_path = str(Path(get_active_window_path()))
    match_object = FILENAME_PATTERN.findall(window.executable)
    executable_file = None
    if len(match_object) > 0:
        executable_file = match_object[0]
    return [executable_file, executable_path, window.title, window.handle, window.classname]
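
One thing to watch if this is extended to every window: the regex takes the first "name.ext" it finds after a separator, so a folder name containing a dot can be picked up instead of the executable. The sketch below compares that behavior with the standard library's ntpath.basename as a possible alternative. The helper names and the sample paths are made up for illustration.

```python
import ntpath
import re

FILENAME_PATTERN = re.compile(r"[/\\]([\w_ ]+\.[\w]+)")


def executable_file_regex(path):
    """Mirror the snippet above: first regex match, or None."""
    matches = FILENAME_PATTERN.findall(path)
    return matches[0] if matches else None


def executable_file_basename(path):
    """Alternative: last path component, using Windows path rules."""
    return ntpath.basename(path) or None


plain = r"C:\Program Files\Mozilla Firefox\firefox.exe"
dotted = r"C:\Users\john.doe\app\tool.exe"  # made-up path, dot in a folder name

print(executable_file_regex(plain), executable_file_basename(plain))
# firefox.exe firefox.exe
print(executable_file_regex(dotted), executable_file_basename(dotted))
# john.doe tool.exe
```

ntpath is used rather than os.path so the result does not depend on the platform the code runs on.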

@lexxish
Contributor

lexxish commented May 13, 2020

@lexxish did you ever figure out getting sleep to work?

I have not tried yet. Will update you all if I do.

I do have some "switch to" like code I can post if anyone wants it. I use a phonetic distance library to choose the best match based on what is currently running. Also have "open" like command that searches a couple directories (e.g. desktop)...it's not perfect and I think the way "bring" allows you to specify programs is also nice for things you use a lot.
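
As a rough idea of what the "switch to" matching could look like, here is a sketch that is not the code mentioned above. It swaps the phonetic distance library for the standard library's difflib, which compares spelling rather than sound, so it is only a stand-in. The function name, the cutoff value and the list of running programs are all assumptions.

```python
import difflib


def best_window_match(spoken, candidates, cutoff=0.5):
    """Return the candidate most similar to the spoken words, or None.

    candidates: names of running programs or window titles (strings).
    Uses difflib string similarity as a stand-in for phonetic distance.
    """
    lowered = {}
    for name in candidates:
        # Compare case-insensitively but return the original spelling.
        lowered.setdefault(name.lower(), name)
    hits = difflib.get_close_matches(
        spoken.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


running = ["firefox.exe", "Code.exe", "notepad.exe"]  # hypothetical list
print(best_window_match("fire fox", running))  # firefox.exe
print(best_window_match("zzz", running))       # None
```

In practice the candidate list would come from whatever is currently running, and the chosen window would then be brought to the foreground.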

Another item that would be nice would be ability to use Kaldi for commands, but DNS for dictation - similar to how I believe Kaldi can be used with Google Speech Recognition.

The last item that would be nice to have (but deserves its own issue number) is integration with accessibility APIs like DNS has, so you can say things like "Click X" when X is a button in a browser.

@lexxish
Contributor

lexxish commented May 13, 2020

* A "maximize window" command that works correctly.

What's wrong with the current behavior @kendonB?

I could be wrong, but I think Caster's default maximize uses "alt+SPACE, x" to maximize rather than sending the foreground window a maximize message (https://docs.microsoft.com/en-us/windows/win32/learnwin32/window-messages). I don't think "alt+SPACE, x" works for every application, but I can't think of a specific one right now. I believe the same type of scenario exists for closing windows in Caster too, where we could send SIGTERM and/or SIGKILL message equivalents (probably two different voice commands) instead of using keyboard shortcuts, and it would (hopefully) work more consistently.

@LexiconCode
Member

LexiconCode commented May 13, 2020

I could be wrong, but I think Caster's default maximize uses "alt+SPACE, x" to maximize rather than sending the foreground window a maximize message

Back when implementing Kaldi support, I switched it from "alt+SPACE, x" to dragonfly's cross-platform implementation, which uses Win32 on Windows. If something's not behaving correctly with those minimize/maximize commands, let me know.

def maximize_window():

@LexiconCode
Member

LexiconCode commented May 13, 2020

The last item that would be nice to have (but deserves its own issue number) is integration with accessibility APIs like DNS has, so you can say things like "Click X" when X is a button in a browser.

I will open up a new issue. Done #814

@LexiconCode LexiconCode added Kaldi Related to Kaldi speech recognition backend Windows Speech Recognition Windows Speech Recognition Backend labels May 14, 2020
@daanzu
Contributor

daanzu commented May 14, 2020

Another item that would be nice would be ability to use Kaldi for commands, but DNS for dictation - similar to how I believe Kaldi can be used with Google Speech Recognition.

I don't have experience with Natlink, and don't currently have Dragon installed, but I'd be happy to help implementing this. Is there a way with Natlink to just get straight dictation recognition text from audio data passed to it? daanzu/kaldi-active-grammar#23

@LexiconCode
Member

LexiconCode commented May 14, 2020

Perhaps there should be an issue in KaldiAG for working on this?

Agreed

@LexiconCode
Member

@lexxish and @kendonB I will attempt to implement the sleeping grammar and modes for all engines. These modes will override DNS's built-in modes but will be kept in sync with the DNS GUI.

@LexiconCode
Member

#881 addresses the following request.

An automatic program switcher. In Dragon, you say "switch to ".

4 participants