Speaker Recognition #367

Draft
wants to merge 6 commits into
base: naomi-dev

Commits on Sep 15, 2022

  1. Speaker Recognition

    Working on introducing a new speaker recognition plugin type. This
    plugin is meant both to recognize an individual, so that responses
    can be tailored to them, and to improve speech recognition by
    training STT models on individual speakers.
    aaronchantrill committed Sep 15, 2022
    421d062

Commits on Oct 17, 2022

  1. Added a default speaker recognizer

    Added a default speaker recognizer, default_sr, which does not
    attempt to identify the speaker but just responds with the first
    name as stored in the profile.
    
    I am also passing the result from the sr_plugin to the intent
    parser, so that the identity of the speaker can be attached to
    the intent object passed to the speechhandler.
    
    I also simplified the list of parameters being passed to the mic
    object when it is created, storing them instead in the profile.
    aaronchantrill committed Oct 17, 2022
    26a663a
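A minimal sketch of what a pass-through recognizer like default_sr could look like, and of how its result might be attached to an intent. The class name, the `recognize_speaker` method, the `first_name` profile key, and the `speaker` result key are assumptions made for illustration; they are not taken from the Naomi source.

```python
class DefaultSpeakerRecognizer:
    """Hypothetical stand-in for default_sr: it never inspects the audio."""

    def __init__(self, profile):
        # profile is assumed to be a plain dict holding the user's settings
        self._profile = profile

    def recognize_speaker(self, audio_fp):
        # No identification is attempted; the profile owner is always reported
        return {"speaker": self._profile.get("first_name", "")}


def attach_speaker(intent, sr_output):
    """Return a copy of the intent with the speaker stored under 'user'."""
    tagged = dict(intent)
    tagged["user"] = sr_output.get("speaker", "")
    return tagged
```

Copying the intent rather than mutating it keeps the sketch free of side effects; the real code may well update the intent object in place.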

Commits on Nov 6, 2022

  1. Fixed problems with speech recognizer

    There were numerous problems with the speech recognizer class that
    made it not work when using the VOSK_sr plugin.
    
    I am no longer trying to recognize the speaker when listening
    passively for the wake word, only when doing the active listening.
    This is because the passive listening needs to be very fast.
    
    I am now putting the name of the identified user in parentheses
    after the active listening transcript. Plugins can access the
    identity of the speaker as `intent.get('user', '')`. The only
    plugin currently set up to use this is the shutdown plugin. I
    also have an update to the Greetings plugin which greets you by
    name when you greet Naomi.
    
    The setup still assumes en-US when downloading the VOSK models,
    which needs to be fixed to respect the "language" setting in the
    profile.
    
    The VOSK speaker recognition is not terribly accurate. It also
    seems like you need to retrain your speaker recognition database
    from new recordings when you switch to different recording
    hardware.
    aaronchantrill committed Nov 6, 2022
    dab0b66
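The two behaviours described in this commit can be sketched roughly as follows: appending the speaker's name to the transcript, and reading the speaker from a plugin with `intent.get('user', '')`. The function names and reply wording are hypothetical; only the `intent.get('user', '')` access pattern comes from the commit message.

```python
def format_transcript(text, user=""):
    """Append the identified speaker in parentheses when one is known."""
    return f"{text} ({user})" if user else text


def greeting_reply(intent):
    """Greet the speaker by name, falling back to a generic greeting."""
    user = intent.get('user', '')
    return f"Hello, {user}." if user else "Hello."
```

Defaulting to an empty string means a plugin keeps working when no speaker recognizer is configured or nobody was identified.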

Commits on Nov 7, 2022

  1. Fixed some method signature mismatches

    Fixed some method signature mismatches from when I removed the
    parameters from the mic methods. Fixed an issue preventing the
    input device verification from working during initial setup or
    repopulate. Changed the name of the "confidence" result from
    speaker recognition to "distance" (since smaller numbers are
    better). Clarified how to enter multiple email addresses in the
    notification client configuration, although I think that still
    needs to be looked at. I think the safe email list is not being
    stored as a list, and I think Naomi should respond only to email
    addresses in that list, rather than to any email address when
    the list is empty.
    aaronchantrill committed Nov 7, 2022
    1af470c
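To illustrate why "distance" is the more natural name, here is a small sketch that picks the enrolled speaker whose voice vector is closest to a sample. It assumes cosine distance and a cut-off of 0.5; the commit message does not say which metric or threshold the plugin uses, and the function names and the `speaker`/`distance` keys are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def closest_speaker(sample, enrolled, max_distance=0.5):
    """Pick the enrolled speaker with the smallest distance to the sample.

    enrolled maps a name to a reference vector. max_distance is an
    assumed cut-off above which nobody is reported as recognized.
    """
    best_name, best_distance = "", float("inf")
    for name, vector in enrolled.items():
        distance = cosine_distance(sample, vector)
        if distance < best_distance:
            best_name, best_distance = name, distance
    if best_distance > max_distance:
        best_name = ""
    return {"speaker": best_name, "distance": best_distance}
```

The sketch does not guard against all-zero vectors, which would cause a division by zero.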
  2. Download French or German language models

    Added filenames for the default French and German models.
    Constructed URLs and paths from these models to automatically
    download and extract the model that matches the language choice
    in profile.
    
    The only thing it does not currently do is check whether or not
    VOSK is also used as the STT engine. If not, then the audio
    file pointer should be passed to the actual STT engine.
    aaronchantrill committed Nov 7, 2022
    fbc8f46
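Building the download URL and extraction path from the profile language might look something like the sketch below. The base URL, the model names, and the language codes are placeholders chosen for illustration; the real filenames are the ones added in this commit and are not reproduced here.

```python
import os

# Placeholder values; the actual host and model filenames are not shown here
BASE_URL = "https://example.com/models"
MODEL_NAMES = {
    "en-US": "example-model-en-us",
    "fr-FR": "example-model-fr",
    "de-DE": "example-model-de",
}


def model_location(profile, model_dir):
    """Return (download_url, extract_path) for the profile's language."""
    language = profile.get("language", "en-US")
    if language not in MODEL_NAMES:
        raise ValueError(f"No default model for language {language!r}")
    name = MODEL_NAMES[language]
    return f"{BASE_URL}/{name}.zip", os.path.join(model_dir, name)
```

Raising an error for an unknown language, instead of silently falling back to en-US, makes a missing model obvious during setup.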

Commits on Nov 8, 2022

  1. Fix for when mic.listen returns False

    If the mic gets cut off when Naomi is listening, then the
    mic.listen() method will return False rather than an sr_output
    dictionary.
    aaronchantrill committed Nov 8, 2022
    f8c1106
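A caller can guard against the False return value along these lines. This is a sketch only: the `utterance` and `speaker` keys of the sr_output dictionary and the function name are assumptions, not the actual structure used in the PR.

```python
def unpack_listen_result(result):
    """Return (utterances, speaker), tolerating a False result.

    result is whatever mic.listen() returned: either an sr_output
    dictionary or False when the mic was cut off.
    """
    if not isinstance(result, dict):
        # The mic was cut off, so there is nothing to process
        return [], ""
    return result.get("utterance", []), result.get("speaker", "")
```

Checking for a dictionary rather than comparing against False also covers an unexpected None.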