Hotword Detection #100

Open
alanjames1987 opened this issue Feb 17, 2015 · 33 comments

Comments

@alanjames1987

I would like to have hotword detection available in Annyang, basically allowing me to tell Annyang not to "activate" until a certain hotword is spoken. Essentially, speech recognition would be running the entire time, but Annyang would only start caching results returned from webkitSpeechRecognition after the first instance of the spoken hotword.

Similar to how Okay Google works, but without the plugin.
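
One way to picture the request is a small gate that sits between the recognizer and the command matcher and drops every transcript until the hotword has been heard. This is only a sketch in plain JavaScript: `createHotwordGate` and its method names are hypothetical and are not part of annyang.

```javascript
// Hypothetical helper, not part of annyang:
// ignore transcripts until a hotword has been heard.
function createHotwordGate(hotword, onActivate) {
  var active = false;
  var needle = hotword.toLowerCase();

  return {
    isActive: function () { return active; },
    reset: function () { active = false; },
    // Feed every recognized phrase in; returns the text to act on, or null.
    feed: function (transcript) {
      var text = transcript.trim();
      if (active) { return text; }
      var idx = text.toLowerCase().indexOf(needle);
      if (idx === -1) { return null; } // still waiting for the hotword
      active = true;
      if (onActivate) { onActivate(hotword); }
      // Anything spoken after the hotword in the same phrase is kept.
      var rest = text.slice(idx + needle.length).trim();
      return rest.length > 0 ? rest : null;
    }
  };
}
```

The `onActivate` callback is where the page could show a "listening" indicator. Recognition itself still runs the whole time, so audio keeps going to the recognition service while the gate is closed.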

@TalAter
Owner

TalAter commented Feb 18, 2015

Closing as a duplicate of #18

@TalAter TalAter closed this as completed Feb 18, 2015
@alanjames1987
Author

I don't mean a command prefix. I mean a string that, when detected, triggers a callback function so that the user can be visually informed that speech input has been activated.

@TalAter
Owner

TalAter commented Aug 12, 2015

Hello again @alanjames1987, sorry for the late response on this issue.

Here's an idea of how to do this, using the new regular expression support available in v2.0.0

// run this after hotword was detected to register the "real" commands
var hotWordDetected = function() {
  annyang.removeCommands();
  annyang.addCommands({
    'hello': function() { alert('Hello world!'); }
  });
}

// initial command to listen for the hotword
var hotwordCommand = {
  'hotword': {'regexp': /shenanigans/, 'callback': hotWordDetected}
}

annyang.addCommands(hotwordCommand);
annyang.start();

If what you had in mind was that the user would have to always say the hotword before each command, you can do something like:

var hello = function() {
  alert('Hello world!');
}

var goodbye = function() {
  alert('Goodbye world!');
}

annyang.addCommands({
  // (?:^| ) lets the hotword start the phrase or follow other speech
  'hello':    {'regexp': /(?:^| )shenanigans hello/, 'callback': hello},
  'goodbye':  {'regexp': /(?:^| )shenanigans goodbye/, 'callback': goodbye}
});
annyang.start();
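
With more than a couple of commands, writing each regular expression by hand gets repetitive. A minimal sketch of a builder follows; `escapeRegExp` and `hotwordRegexp` are hypothetical helper names, and the only annyang feature assumed is the `{'regexp': ..., 'callback': ...}` command form shown above.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: require the hotword immediately before the phrase.
function hotwordRegexp(hotword, phrase) {
  // (?:^|\s) lets the hotword start the utterance or follow earlier speech.
  return new RegExp(
    '(?:^|\\s)' + escapeRegExp(hotword) + '\\s+' + escapeRegExp(phrase) + '$',
    'i'
  );
}
```

A command could then be registered as `{'regexp': hotwordRegexp('shenanigans', 'hello'), 'callback': hello}`. Unlike the hand-written patterns, this version anchors the phrase to the end of the utterance, which is a design choice rather than a requirement.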

@alanjames1987
Author

I will try that out; it might be a good solution. It still seems slightly like a hack, though.

I was hoping this could be built into annyang. The code to interact with it might look something like this.

function hotwordDetectionHandler() {

    // code to trigger a sound
    // or update interface to show it's listening

}

function hotwordTimeoutHandler() {

    // code to trigger a sound
    // or update interface to show it's not listening

}

var hotwords = {
    '(hey) computer': hotwordDetectionHandler,
    '(hey) hal': hotwordDetectionHandler,
    '(hey) jarvis': hotwordDetectionHandler,
};

var commands = {
    'show me *term': showFlickr,
    'calculate :month stats': calculateStats,
    'say hello (to my little) friend': greeting
};

annyang.hotwords(true);

annyang.addHotwords(hotwords);

annyang.hotwordTimeout(1000); // <-- if a sentence isn't started within the time a deactivation function is called
annyang.hotwordTimeoutHandler(hotwordTimeoutHandler); // <-- function to run after timeout

annyang.addCommands(commands);
annyang.start();
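
Until something like this exists in annyang, the same behaviour can be approximated on top of `addCommands()` and `removeCommands()`. The sketch below is an assumption-laden illustration, not the proposed API: `createHotwordController` and its option names (`hotwords`, `commands`, `timeout`, `onHotword`, `onTimeout`) are hypothetical.

```javascript
// Hypothetical wrapper, not an annyang API. `engine` is any object with
// annyang-style addCommands() and removeCommands() methods.
function createHotwordController(engine, options) {
  var timer = null;

  // Idle state: only the hotword phrases are registered.
  function listenForHotword() {
    clearTimeout(timer);
    var hotwordCommands = {};
    options.hotwords.forEach(function (phrase) {
      hotwordCommands[phrase] = function () { activate(phrase); };
    });
    engine.removeCommands();
    engine.addCommands(hotwordCommands);
  }

  // Active state: register the real commands and start the timeout.
  function activate(phrase) {
    var active = {};
    Object.keys(options.commands).forEach(function (key) {
      active[key] = function () {
        var result = options.commands[key].apply(this, arguments);
        listenForHotword(); // one command per hotword, then go idle again
        return result;
      };
    });
    engine.removeCommands();
    engine.addCommands(active);
    options.onHotword(phrase); // the spoken hotword pattern is passed along
    clearTimeout(timer);
    timer = setTimeout(function () {
      listenForHotword();
      options.onTimeout();
    }, options.timeout);
  }

  listenForHotword();
  return { reset: listenForHotword };
}
```

In a page this would be called as `createHotwordController(annyang, {...})` followed by `annyang.start()`. Because it only swaps command sets, it reacts to final results and does not need interim results.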

@TalAter
Owner

TalAter commented Sep 18, 2015

That's an interesting idea... and a very well thought out API!
I wish all issues were posted like this 👍 Thanks

How do you see the importance of allowing separate hotwordDetectionHandlers? Why allow just one hotwordTimeoutHandler but multiple hotwordDetectionHandlers?

Is there a specific common use case that requires multiple ones?
Would it be good enough to just return the captured hotword as a parameter to the hotwordDetectionHandler?

This would allow us to simplify the API to something like:

var hotwords = [
    '(hey) computer',
    /(hey|hello) hal/
];

@TalAter TalAter reopened this Sep 18, 2015
@alanjames1987
Author

I don't think multiple hotwordDetectionHandlers are very important. I added them because that was in line with how commands are currently added to annyang, and I was trying to keep a similar API.

I think the idea of sending the spoken hotword to the hotword handler is great.
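
To illustrate the simplified shape, here is a rough sketch of matching a transcript against a mixed list of strings and regular expressions and returning the spoken hotword for a single handler. `hotwordToRegExp` and `matchHotword` are hypothetical names, and the handling of `(optional)` words only imitates annyang's command syntax; it is not annyang's own parser.

```javascript
// Hypothetical helper: turn '(hey) computer' into a regexp in which
// the parenthesised word is optional. RegExp entries are used as-is.
function hotwordToRegExp(pattern) {
  if (pattern instanceof RegExp) { return pattern; }
  var source = pattern
    .replace(/[.*+?^${}|[\]\\]/g, '\\$&')               // escape all but ( )
    .replace(/\s*\(([^)]*)\)\s*/g, '(?:\\s*$1\\s*)?');  // "(hey) " is optional
  return new RegExp('^\\s*' + source + '\\s*$', 'i');
}

// Returns the text that matched a hotword, or null if none matched.
function matchHotword(hotwords, transcript) {
  for (var i = 0; i < hotwords.length; i++) {
    var match = hotwordToRegExp(hotwords[i]).exec(transcript);
    if (match) { return match[0].trim(); }
  }
  return null;
}
```

A single detection handler could then receive the return value of `matchHotword(hotwords, phrase)` as its parameter.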

@TalAter
Owner

TalAter commented Sep 27, 2015

Sounds good.

Would you like to give this a shot and send me a pull request?

@alanjames1987
Author

I will look into this as soon as I can, hopefully this weekend. I know I will have to use interim results, so I will be enabling that.

@TalAter
Owner

TalAter commented Sep 29, 2015

Enabling interim results seems like a very drastic change to how annyang works, and doesn't really seem required for hotword detection.

Is there a reason this feature can't be enabled without enabling interim results?

@alanjames1987
Author

I can only see real time hotword detection being added if we have real time results using interim results.

There might be a better way. If you think there is I would love to hear it.
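
For context, this is roughly what scanning interim results looks like with the Web Speech API directly. It is a sketch that bypasses annyang entirely; `findHotword` is a hypothetical helper, while `continuous`, `interimResults`, `onresult`, `resultIndex`, `results` and `isFinal` are standard Web Speech API members.

```javascript
// Scan the new entries of a SpeechRecognitionEvent-like object for a hotword.
// Returns { transcript, isFinal } for the first hit, or null.
function findHotword(event, hotword) {
  var needle = hotword.toLowerCase();
  for (var i = event.resultIndex; i < event.results.length; i++) {
    var result = event.results[i];
    var transcript = result[0].transcript; // best alternative
    if (transcript.toLowerCase().indexOf(needle) !== -1) {
      return { transcript: transcript.trim(), isFinal: result.isFinal };
    }
  }
  return null;
}

// Browser-only wiring; skipped where the Web Speech API is unavailable.
if (typeof window !== 'undefined' &&
    (window.SpeechRecognition || window.webkitSpeechRecognition)) {
  var Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  var recognition = new Recognition();
  recognition.continuous = true;
  recognition.interimResults = true; // deliver partial transcripts as they form
  recognition.onresult = function (event) {
    var hit = findHotword(event, 'computer');
    if (hit) { console.log('Hotword heard', hit); }
  };
  // recognition.start(); // uncomment to begin listening
}
```

With interim results the hotword can be noticed while the user is still speaking. Without them, detection only happens once the recognizer finalizes the phrase.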

@4tee

4tee commented Jan 25, 2016

Hi there, I am wondering if this feature has been added.

@alanjames1987
Author

It hasn't been added. I have had no time to work on this yet.

@unbolt

unbolt commented Apr 6, 2016

Any plans on a timeframe for getting this feature implemented?

@revett

revett commented Apr 16, 2016

+1

1 similar comment
@MariuszT

MariuszT commented Jun 6, 2016

+1

@xuchen

xuchen commented Jun 18, 2016

It looks like the Snowboy hotword detection toolkit is designed for exactly this purpose:

https://github.com/kitt-ai/snowboy

It works offline so no streaming data to Google until you explicitly activate it.

There are currently discussions about a NodeJS module (Kitt-AI/snowboy#4). Does anyone want to give it a try?

@evancohen
Contributor

Now that we've finished the snowboy node module, I can continue with my master plan!

Because annyang is such an awesome library, there have been loads of people (myself included), that have used it for "non-web" (Electron or otherwise) projects. Just to make my point, there are over 700 forks of @TalAter's annyang-electron-demo.

That is why I've started building sonus: a node speech framework that uses snowboy for hotword detection and Google Cloud Speech for accurate recognition. I haven't quite started working on the annyang shim yet, and there are still a few things to iron out, but I'm planning to use it as one of the command recognition systems.

It's probably worth pointing out that it's not ready for prime time just yet, but I am looking for collaborators, so if you're interested hit me up!

🚀

@lynxaegon

I'm currently building a "Jarvis"-like system based on a chromium-browser and an rpi with a 7" screen. At first, when I saw that annyang doesn't use hotwords, it seemed perfect. But after adding a few commands, well.. you can imagine the chaos in the house :)

I'm looking forward to a hotword plugin / update for annyang.

@evancohen
Contributor

After some deliberation I decided to take the "core" of annyang and include it in the project - it wasn't built to run outside of the web browser and there's a lot of logic that Sonus already offers that would take a lot of work to plumb into annyang.

I've included the annyang command registration system out of the box as a part of Sonus. Here's an example:

'use strict'

const Sonus = require('sonus')
const speech = require('@google-cloud/speech')({
  projectId: 'streaming-speech-sample',
  keyFilename: './keyfile.json'
})

const hotwords = [{ file: './resources/sonus.pmdl', hotword: 'sonus' }]
const language = "en-US"
const sonus = Sonus.init({ hotwords, language }, speech)

const commands = {
  'hello': () => {
    console.log('You will obey');
  },
  '(give me) :flavor ice cream': flavor => {
    console.log('Fetching some ' + flavor + ' ice cream for you, yo')
  },
  'turn (the)(lights) :state (the)(lights)': state => {
    console.log('Turning the lights', (state == 'on') ? state : 'off')
  },
  'stop': () => {
    console.log('Stopping...')
  }
}

Sonus.annyang.addCommands(commands)

Sonus.start(sonus)
console.log('Say "' + hotwords[0].hotword + '"...')

sonus.on('hotword', (index, keyword) => console.log("!" + keyword))
sonus.on('partial-result', result => console.log("Partial", result))

sonus.on('final-result', result => {
  console.log("Final", result)
  if (result.includes("stop")) {
    Sonus.stop()
  }
})

As of tonight I've published v0.1.0 which includes annyang and can be installed by following the instructions in the repo: https://github.com/evancohen/sonus

Feedback is welcome and appreciated.

@BetaStacks

BetaStacks commented Jan 3, 2017

Here is how I ended up creating a global command prefix and suffix:
http://codepen.io/BrandonCorlett/pen/mRdMqY

/* SET GLOBAL COMMAND PREFIX */
var globalCommandPrefix = "Computer (please)" + " ";

/* SET GLOBAL COMMAND Suffix */
var globalCommandSuffix = " " + "(please)";

/* SET UNIQUE COMMAND TEXT */
var command1 = "say my name is :name";
var command2 = "I am :name";

(function () {
    var commands, log, sayName;
    log = $('.log');
    sayName = function (name) {
        log.append('<li>Your name is ' + name + '!</li>');
        return console.log(name);
    };
  
  
    /* CONCATENATE COMMANDS IN VARIABLES */
    var command1Con = globalCommandPrefix + command1 + globalCommandSuffix;
    var command2Con = globalCommandPrefix + command2 + globalCommandSuffix;

    /* USE VARIABLE IN BRACKETS AS OBJECT KEY */
    commands = {
        [command1Con]: sayName,
        [command2Con]: sayName
    };
    annyang.addCommands(commands);
    annyang.start();
    annyang.debug();
}.call(this));

$('.globalCommandPrefix').text(globalCommandPrefix);
$('.globalCommandSuffix').text(globalCommandSuffix);
$('.command1').text(command1);
$('.command2').text(command2);

I'm sure it could be a bit cleaner. It works well for my use case, as I am developing a plugin for another piece of software whose API allows me to use a GUI to toggle parts of the code (each command/function) on and off each time I drag a new stack into the IDE.

I set the global commands once per page or use PHP to set it once per site.
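
The same idea can be applied to a whole commands object at once instead of concatenating each phrase by hand. A minimal sketch, assuming nothing beyond plain objects; `withAffixes` is a hypothetical helper name.

```javascript
// Hypothetical helper: wrap every command phrase with a shared prefix and suffix.
function withAffixes(commands, prefix, suffix) {
  var wrapped = {};
  Object.keys(commands).forEach(function (phrase) {
    // Empty prefix or suffix values are skipped so no stray spaces appear.
    var parts = [prefix, phrase, suffix].filter(Boolean);
    wrapped[parts.join(' ')] = commands[phrase];
  });
  return wrapped;
}
```

The result can be passed straight to `annyang.addCommands()`, for example `annyang.addCommands(withAffixes(commands, 'Computer (please)', '(please)'))`.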

@Nixellion

Nixellion commented Jan 5, 2017

I'll +1 this issue. It would definitely be awesome to have some front-end JavaScript-based hotword detection. If I'm correct, snowboy and sonus both require Node.js server-side stuff?

I'm writing my own home assistant bot as well, using Python for command processing, and I only use browser as a UI that recognizes speech and sends text commands to the Python Flask server.

I chose this approach, because this way I can just put a few cheap android or windows tablets around the house, instead of dealing with and mixing a lot of microphones routed to one pc. It also allows me to use my AI when I'm not at home. So it makes it more like Cortana\OkGoogle\Alexa.

So I'm really curious about how to detect hotwords with browser-side JS.
Not feeling like writing an app for this yet :)

@evancohen
Contributor

@Nixellion

Sonus uses Node.js, but it's a bit atypical because it's primarily a "client" library intended for low-powered hardware devices. I'm also looking to create a Python interface: evancohen/sonus#13.

To address your main question: You can run browser based detection with pocketsphinx. An alternative that I really like is JsSpeechRecognizer. You need a reasonably high powered device in order to actually get real-time recognition for both of these. Accuracy is also a big problem, if you have any background noise you are unlikely to get any kind of reasonable detection (and lots of false positives).

I went down the "offline hotword recognition in the browser" path for my smart mirror. After a lot of pain and dead-ends I found snowboy, wrote their Node library, and created sonus.

As an aside (and for inspiration): My current home automation solution right now uses a bunch of $9 CHIPs + $5 PlayStation Eyes + Sonus. Each device is location aware ("turn on the lights" will do something different depending on what room you are in, but "turn on the living room lights" will always turn on the living room lights). Also cool: Next Thing Co also recently released the $16 CHIP Pro which has an on-board microphone (I've yet to receive mine, but it looks promising).

@Nixellion

I don't really need offline recognition; I only need offline hotword detection in the browser, to activate Google's online speech recognition after that. This way I won't spam Google with non-stop speech recognition requests (well, for as long as people are talking), and, speaking of paranoia, Google will only get commands for recognition, not private conversations.

And running offline speech recognition is even harder, because I need it to work with the Russian language, and Sphinx only supports English out of the box.

As for the power, I can record audio in the browser, send it to the home server (powerful DIY NAS), it can recognize whether there is hotword or not, but that would probably take too long.

@ghost

ghost commented Jan 6, 2017

@Nixellion
For hotword detection in the browser I went with annyang, but I made a "Conversation" class. You can have a list of commands which, when triggered, gives you another list of commands, and so on.
You could use that for hotword detection: you just make the first command the "hotword", and in the conversation class you add the rest of the commands. I'll just leave it here for anyone who wants to do something with it. (I'll post the code in a few hours, about 3-4; it's at home :D)

@evancohen
It was a surprise for me to see that the CHIP is so cheap and that the Pro version has a microphone. You can add a lot of CHIPs around the house for recognition, and just a main rpi (like mine) as the brains. The only problem.. I used the Chrome speech recognition (browser based), which doesn't quite work with sounds from sources like mp3/wav.
How do you do speech recognition?

@Nixellion

@andreimavenhut
Oh, I guess I did not express myself correctly. I DO need offline hotword detection, so annyang is not an option. Annyang is using google's recognition, so in a noisy room it will send audio to google basically non-stop. It's bad for a huge number of reasons, starting with network bandwidth and ending with privacy.

Right now I use Annyang with basically just ONE command: the bot name followed by *tag. It just grabs everything after the bot name and sends it to my personal Python server, which then does all the natural language processing, user-specific context, user-specific conversations, and finding the right command and/or falling back to a chatbot. I limit the use of JS to a very simple web UI. This way I can make very simple native apps for other platforms IF needed, without having to rewrite a lot of code. It's also more secure: I can give client access to any number of friends, and they can have fun with my bot while a security clearance restricts them from accessing sensitive commands :D I actually already have the groundwork for speaker recognition. My client can send audio to the server for processing, but I'm still stuck on the actual speaker recognition.

So, I don't think it's a good solution to detect hotword with annyang, then process another command. With annyang it's easier to just use commands with global prefix. Because I don't really see any other reasons other than bandwidth and privacy that you would need a separate hotword, it only makes running all commands in 2 steps instead of one. Instead of just saying without a pause "SuperBot, kill the lights!" you will have to go through a dialog:

  • Superbot! (your own pause)
  • (Little Pause) BleepBlop!?
  • Kill the lights.
  • (Another Pause) Bloop!

With prefix approach it's just:

  • Superbot, (no pause) kill the lights!
  • (little pause) Bloop!

Now, I could use Python's speech recognition for hotword detection, sending audio to the server to process it using some custom matching algorithm, but I don't want to put so much data through my local network all the time. I mean, always sending audio, each time there is SOME sound detected...

Oh, and about your second question. While you're waiting for evancohen's answer, my opinion is that with an RPi or CHIPs you should probably go with Python, using its SpeechRecognition module, which supports online recognition using Google's services, as well as Bing and a number of other online services (for which you have to get an API key, though). It does not support the Russian Yandex recognition service yet, but in fact it's not that hard to write your own online recognition module. It's all about recording audio, sending it as a POST request to their server, and receiving the JSON response.

But SpeechRecognition (or SpeechRecognizer? Not sure what it's called in pip) also supports offline recognition using Sphinx. If you're an English speaker, you're in huge luck. It does a nice job of recognizing English out of the box. It's worse than Google's or any other online service (they're constantly improving; from what I understand they use neural networks to improve recognition over time), but it's still pretty good.

@ghost

ghost commented Jan 6, 2017

I thought about using Sphinx or another offline recognizer, but after a few benchmarks I went with the SpeechRecognition in Chrome. I know you can use their APIs, but hey.. they do cost :) and inside Chrome, speechRecognition has an API key that (from what I know) is unlimited, which means zero cost for me.
I wonder if @evancohen found a better way of detecting speech online or offline without any costs.

@Nixellion

@andreimavenhut , Well, Chrome's speech recognition is actually using Google's servers as well, from what I know, so it's still bandwidth usage and all.

And in Python's speech recognition there is actually an unlimited Google apikey as well. So you get it for free in python too. And sphinx is of course free as well but a pain in the ass :D

@evancohen
Contributor

@andreimavenhut For speech recognition I use Sonus. In terms of audio encoding, it uses 16-bit signed-integer linear pulse-code modulation (LPCM) WAV (no mp3 support today). It's entirely stream based, so you could theoretically stream to it from your web browser to a server instance of Sonus (although I've never actually tried this).
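
If someone does try streaming from a browser, the Web Audio API hands out samples as 32-bit floats in the range -1 to 1, so they would first need converting to 16-bit signed integers. The sketch below shows only that conversion step; `floatTo16BitPCM` is a hypothetical helper, and the sample rate, channel count and transport a Sonus server would expect are not covered here.

```javascript
// Convert Web Audio style Float32 samples (-1..1) to 16-bit signed PCM.
function floatTo16BitPCM(samples) {
  var out = new Int16Array(samples.length);
  for (var i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    var s = Math.max(-1, Math.min(1, samples[i]));
    // Negative and positive halves have different maximum magnitudes.
    out[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
  }
  return out;
}
```

The resulting `Int16Array` has two bytes per sample and its underlying `buffer` can be sent over a binary channel such as a WebSocket.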

One big problem trying to do keyword spotting (aka hotword detection) off-device is latency/lag in detection. That's also a problem in the browser, JS simply isn't really optimized for audio processing... That's not to say it can't be done - it will probably just be a bit slower.

I saw @Nixellion's comment on Kitt-AI/snowboy#98 and would love to see browser compatibility (I would create a browser based version of Sonus in a heartbeat). Based on what you described it's exactly what you are looking for.

Since this conversation is no longer directly related to Annyang (and so we don't spam others) I've created a new issue on the Sonus repo to continue this discussion: evancohen/sonus#28

@gaitat

gaitat commented Nov 9, 2017

Is there an update on this issue? i.e. using annyang along with a hotword? Is the solution to always join the hotword in front of the command?

@Nixellion

Nixellion commented Nov 9, 2017

@gaitat You could try running continuous recognition and checking whether there's a hotword on each update. Once there is, restart and go for the phrase recognition. I did not test this approach, but I'm thinking about doing it some time.

If you're not worried about constant stream of your audio going to google's servers that is.

Alternatively, instead of prepending, I would split the string at the hotword. You may already be talking when recognition starts, and then say your hotword and command in the middle of your speech. The hotword will then be in the middle of the string, not at the front. I did not try this approach either, though :D
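
A rough, untested-in-production sketch of the splitting idea: take whatever follows the last occurrence of the hotword in the recognized string. `commandAfterHotword` is a hypothetical helper name.

```javascript
// Return the text after the LAST occurrence of the hotword, or null
// if the hotword was not spoken at all.
function commandAfterHotword(transcript, hotword) {
  var idx = transcript.toLowerCase().lastIndexOf(hotword.toLowerCase());
  if (idx === -1) { return null; }
  return transcript
    .slice(idx + hotword.length)
    .replace(/^[\s,.!?]+/, '') // drop punctuation the recognizer may insert
    .trim();
}
```

Using the last occurrence means that if the hotword is repeated, only the most recent command is kept. An empty string is returned when the hotword was the final word.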

@LukeMcLachlan

LukeMcLachlan commented Feb 9, 2018

@andreimavenhut Hi Andrei, I was reading with interest about your conversation class. Currently I'm adding items via speech to physical boxes with Annyang, saying for example "add gloves to box number 4" (gloves are then saved to box 4 in MySQL), but I was thinking about the possibility of removing things and Annyang asking me e.g. "are you sure you want to remove the candle from box number 4", then waiting for me to say either "yes" or "no".
The way I was thinking of going was cookies, for example when I say "remove candle from box number 4" a cookie is stored in the browser that lasts e.g 10 seconds, so when Annyang asks "are you sure you want to remove the candle from box number 4" and I answer "yes", the "yes" triggers a function that searches for the cookie and if found removes the candle from box number 4.
It sounds as though you may have a better way of having a conversation with Annyang, if so would you care to share? Thanks / Luke
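
For comparison with the cookie idea, the pending action can also be held in memory with an expiry time. This is a minimal sketch under that assumption; `createConfirmation` is a hypothetical helper, and the optional `now` parameter exists only so the clock can be replaced in tests.

```javascript
// Hypothetical helper: remember one pending action for a limited time.
function createConfirmation(ttlMs, now) {
  now = now || Date.now;
  var pending = null;

  return {
    // Store the action to run if the user confirms in time.
    ask: function (action) {
      pending = { action: action, expires: now() + ttlMs };
    },
    // Returns true only if a still-valid action was confirmed and run.
    answer: function (yes) {
      var current = pending;
      pending = null; // every answer consumes the pending question
      if (!current || now() > current.expires) { return false; }
      if (yes) { current.action(); }
      return Boolean(yes);
    }
  };
}
```

With annyang, the "remove ..." command would call `ask()` with the removal function, and separate 'yes' and 'no' commands would call `answer(true)` and `answer(false)`. A stray "yes" more than `ttlMs` milliseconds later does nothing.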

@lynxaegon

lynxaegon commented Feb 9, 2018

@LukeMcLachlan Hi Luke, it's not that great of a module/class; I just hacked it together quickly.
I wanted to refactor the whole app but haven't had the time to do it. It works, though :)

Here it is:
https://gist.github.com/lynxaegon/a76ae8d2ac30f80dff93027290c9e577

Short Explanation:
Whenever it matches a command, it just resets the annyang commands, and adds the current conversation commands. If you don't answer in 10s, it reverts back to the main commands.

@LukeMcLachlan

Thank you @lynxaegon I'll have a look at it this evening and see what I can do with it, very kind of you to share it with me!
