New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix wake word record length #1207
Conversation
7983c10
to
f66f2f4
Compare
|
I believe the unit was ms (120/1000.0 => seconds). If I recall correctly 120ms was a suggested mean phoneme time for a typical speaker + 20 percent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the definition of that config makes me nervous. I think we should stick with what it was (milliseconds), so switch default value to "200" instead.
mycroft/configuration/mycroft.conf
Outdated
| @@ -125,7 +125,7 @@ | |||
| "user": "precise", | |||
| "folder": "/home/precise/wakewords" | |||
| }, | |||
| "phoneme_duration": 120, | |||
| "phoneme_duration": 0.2, | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment this default. 0.2 what?
What was the reason for changing units? What happens if someone customized this setting? Why not stick with milliseconds?
|
I did some testing on dev and feature/precise and can't seem to reproduce this. Could it be that the num_phonemes is actually 0 (or possibly 1) in your test case? The cast to int when converting to bytes is a really good catch. Is it that or upping the time per phoneme that's the core fix? |
80dc1aa
to
d48351c
Compare
|
Hello @MatthewScholefield! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. Comment last updated on November 09, 2017 at 17:30 Hours UTC |
mycroft/client/speech/mic.py
Outdated
| @@ -335,7 +336,7 @@ def _upload_file(self, filename): | |||
| keyfile + ' ' + fn + ' ' + address | |||
| if os.system(cmd) == 0: | |||
| del self.filenames_to_upload[i] | |||
| os.remove(fn) | |||
| #os.remove(fn) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this sneaked in unintentionally
|
Tested this with the "Hey Victoria" wake word (HH EY . V IH K T AO R IY AH) and for some reason it won't trigger. "Hey Mycroft" works fine and on dev Hey Victoria works. Will continue to investigate. |
mycroft/client/speech/mic.py
Outdated
| @@ -283,7 +284,7 @@ def decrease_noise(level): | |||
|
|
|||
| @staticmethod | |||
| def sec_to_bytes(sec, source): | |||
| return sec * source.SAMPLE_RATE * source.SAMPLE_WIDTH | |||
| return int(sec * source.SAMPLE_RATE * source.SAMPLE_WIDTH) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be rounded up for it to work consistently
either
return int(math.ceil(sec * source.SAMPLE_RATE * source.SAMPLE_WIDTH))
or the c hackers choice
return int(sec * source.SAMPLE_RATE * source.SAMPLE_WIDTH + 0.5)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this is strange because it indicates that changing a value by one sample (1/16000 of second) is making a difference. Could you by chance find which call to sec_to_bytes caused that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The relevant call is at L366 in mic.py
I think it's if the resulting byte size is odd, and since we pull the data from the rear of the byte array every 16 bit sample will be mangled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which means my suggested fix is probably wrong instead we need to check that bytes % sample_width == 0 and then append sample_width - bytes % sample_width padding if it's not?
2e8369f
to
ec03105
Compare
This is used to remove data retroactively from dataset
ec03105
to
0dc3dc1
Compare
|
@penrods Changed back to milliseconds and added a comment in the config to show the unit. |
|
Since at @penrods concerns have been addressed and the PR looks good. I'm merging this. |
Currently, wake word recordings that are used when the user opts in are too short (0.1 seconds or less). This fixes that. This also changes the value in the config to be in seconds. @forslund Let me know if there's a reason we should keep this as 120. I wasn't really sure what unit that was in, so I could be messing something up.
Steps for Testing
Before Patch:
hey mycroft, <anything>, the wav file stored in/tmp/mycroft_wake_words/should be really shortgit checkout bugfix/record-lengthAfter Patch:
hey mycroft, <anything>, the wav file stored in/tmp/mycroft_wake_words/should be of long length and hold all of the wake word plus a bit before in case of multiple attempted activations.