MMM-WhisperGPT

This is a module for the MagicMirror².

How it works 👉 https://nikro.me/articles/professional/crafting-our-ai-assistant/

Goal of the module is to create a custom interactive widget that uses Open AI tools:

Whisper - self-hosted model for voice-to-text transcription.
LangChain - intended to be used with ChatGPT API, to process the requests.
Picovoice -> Porcupine - is used for offline (self-hosted) word trigger (accent on the privacy).
also... mimic3 :)

Idea is the following:

Wake word (Porcupine).
...record query (show a sexy animation, will be done later)
...pass to self-hosted Whisper
...transcribe voice-to-text
Show the question as transcribed rendered-text (in the module render)
...pass through LangChain to ChatGPT
...pass the textual reply back to the module and render on-screen
...use TTS (mimic3) - self-hosted on the network, to throw back a wav file to play.

Using the module

To use this module, add the following configuration block to the modules array in the config/config.js file:

var config = {
    modules: [
        {
            module: 'MMM-WhisperGPT',
            config: {
                // See below for configurable options
                picovoiceKey: 'xxx',
                picovoiceWord: 'JARVIS',
                picovoiceSilenceTime: 3,
                picovoiceSilenceThreshold: 600,
                audioDeviceIndex: 3,
                openAiKey: 'xxx',
                openAiSystemMsg: 'xxx',
                whisperUrl: '192.168.1.5:9000/asr',
                whisperMethod: 'openai-whisper',
                mimic3Url: '192.168.1.6:59125'
            }
        }
    ]
}

Configuration options

Option	Required?	Description
`picovoiceKey`	Required	Picovoice access key - you have to register to obtain it - this is used for trigger word.
`picovoiceWord`	Optional	Picovoice trigger word, i.e. BUMBLEBEE, JARVIS, etc. Defaults to JARVIS.
`picovoiceSilenceTime`	Optional	Silence period - defaults to 3 (3 seconds).
`picovoiceSilenceThreshold`	Optional	This is usually background noise * THIS NUMBER. Default value is 1.1 (aka 10%).
`audioDeviceIndex`	Optional	Audio device - i.e. 3 - those will be printed out when you're using debug mode. Defaults to 0.
`whisperUrl`	Required	URL (or IP?) to self-hosted instance of the Whisper.
`whisperMethod`	Optional	Whisper method: openai-whisper or faster-whisper. Defaults to: faster-whisper.
`whisperLanguage`	Optional	Defaults to: en.
`openAiKey`	Required	API Key of OpenAI.
`openAiSystemMsg`	Optional	System msg - how the AI should behave.
`mimic3Url`	Required	Mimic3 URL (server), with protocol, port, without /api/tts
`mimic3Voice`	Optional	Mimic3 Voice - default: en_US/cmu-arctic_low%23gka
`debug`	Optional	If you want to debug, default is: false.

What is Picovoice / Porcupine

Picovoice / Porcupine is used for the "Trigger" word. It's a self-hosted small AI / Neural Network (NN). Picovoice offers a range of services, including a license for this offline AI. It only sends usage statistics, not the actual audio conversations.

What is Whisper

Whisper is an open-source product from OpenAI. It's a Large Language Model (LLM) AI that handles speech-to-text (transcription). In my personal case, I have it self-hosted on my local network.

I used this: https://github.com/ahmetoner/whisper-asr-webservice

What is ChatGPT

ChatGPT is another product from OpenAI. It's a Large Language Model (LLM) AI. You will need to register and get an API Key to use it.

What is LangChain

LangChain is a library built around LLMs that allows for extra functionality, such as long-term memory.

What is Mimic3 (Mycroft)

Mycroft's Mimic3 is a Text-to-Speech (TTS) system based on a Large Language Model (LLM). It offers realistic TTS that can run on somewhat resource-restricted systems. I initially tried to set it up on my OrangePi, but instead, I installed it on the same machine with Whisper and use it via the network.

I used this docker-compose.yml 😉

version: '3.7'

services:
  mimic3:
    image: mycroftai/mimic3
    ports:
      - 59125:59125
    volumes:
      - .:/home/mimic3/.local/share/mycroft/mimic3
    stdin_open: true
    tty: true

Troubleshooting

If your audio doesn't work - check if you're using alsa or pulseaudio. You might need to install mpg123. You can install it using the command sudo apt-get install mpg123.
You might also need to install lame for audio encoding. You can install it using the command sudo apt-get install lame.

Name		Name	Last commit message	Last commit date
Latest commit History 94 Commits
sounds		sounds
translations		translations
.eslintrc.json		.eslintrc.json
.gitignore		.gitignore
.stylelintrc		.stylelintrc
CHANGELOG.md		CHANGELOG.md
Gruntfile.js		Gruntfile.js
LICENSE.txt		LICENSE.txt
MMM-WhisperGPT.css		MMM-WhisperGPT.css
MMM-WhisperGPT.js		MMM-WhisperGPT.js
README.md		README.md
docker-compose.yml		docker-compose.yml
node_helper.js		node_helper.js
package-lock.json		package-lock.json
package.json		package.json

License

Nikro/MMM-WhisperGPT

Folders and files

Latest commit

History

Repository files navigation

MMM-WhisperGPT

Using the module

Configuration options

What is Picovoice / Porcupine

What is Whisper

What is ChatGPT

What is LangChain

What is Mimic3 (Mycroft)

Troubleshooting

About

Topics

Resources

License

Stars

Watchers

Forks

Languages