ARDings/OpenLexaPi

OpenLexaPi

📖 Full guide & tutorial: xrchris.com/projects/openlexa

A real-time AI voice assistant running on a Raspberry Pi Zero 1.1, powered by the OpenAI Realtime API. Say a wake word to activate it, speak naturally, and it responds through a USB speaker. An animated robot face renders on an HDMI display.

┌─────────────────────────────────────┐
│  ◉ ◉  (eyes)                        │
├─────────────────────────────────────┤
│   Hello! How can I help you today?  │
└─────────────────────────────────────┘

Features

  • Wake-word activation: say "Computer" to wake it; it goes back to sleep automatically after inactivity
  • Fully offline wake-word detection: Porcupine runs locally at ~1% CPU on the Pi Zero
  • Real-time conversation via OpenAI gpt-4o-realtime-preview
  • Multilingual: responds in English or Korean depending on the speaker
  • Retro robot face on the HDMI display: eyes close when sleeping, open when active
  • Echo prevention: the mic is muted while the AI speaks, and the echo buffer is flushed afterwards
  • Auto-reconnect: transparently reconnects if the WebSocket drops
  • Graceful degradation: runs headless (no display) without any code changes

Hardware

| Component | Details | ~Price |
|---|---|---|
| Computer | Raspberry Pi Zero 2 W | ~15 € |
| Audio Option A | Soundcore Mini (Bluetooth speaker + mic) | ~20 € |
| Audio Option B | USB Speaker Bar (plug-and-play) | ~15 € |
| Display (optional) | 3.5" HDMI Display | ~20 € |
| MicroSD + Power | MicroSD 16 GB+ & USB PSU | ~13 € |
| Audio server | PipeWire | n/a |
| OS | Raspberry Pi OS (Bookworm) | n/a |

Affiliate links: buying through these supports the project at no extra cost to you.


Architecture

┌──────────────────────────────────────────────────────────────────┐
│  SLEEPING                                                        │
│    Porcupine (offline) ◄── pacat --record 16kHz ◄── USB Mic      │
│         │ "Computer" detected                                    │
│         ▼                                                        │
│  ACTIVE                                                          │
│    USB Mic ──► pacat --record 24kHz ──► AudioRecorder.queue      │
│                                              │                   │
│                                         send_audio()             │
│                                              │                   │
│                                       OpenAI Realtime API        │
│                                              │                   │
│                                      receive_events()            │
│                                              │                   │
│                              AudioPlayer ◄───┘                   │
│                                  │                               │
│                    pacat --playback 24kHz ──► USB Speaker        │
│                                                                  │
│    [15s inactivity] ──► back to SLEEPING                         │
└──────────────────────────────────────────────────────────────────┘

  • launcher.py: boot-time launcher; waits for the USB mic, shows a countdown, then execs into main.py
  • main.py: wake-word loop, WebSocket session, audio I/O, event handling
  • display.py: Pygame rendering loop (daemon thread)


Design Decisions

Wake-word: why Porcupine?

Running continuously connected to OpenAI is expensive and wasteful. A local wake-word detector lets the device sleep (no WebSocket, no API cost) until the user actually wants to speak.

Porcupine (by Picovoice) was chosen because:

  • It ships a pre-compiled ARM binary that runs on ARMv6 (Pi Zero 1.1)
  • CPU usage is ~1%, which leaves the Pi's single core free for audio and WebSocket I/O
  • It works fully offline, with no network call for wake-word detection
  • Free tier includes built-in keywords: computer, jarvis, porcupine, bumblebee, and more
  • Custom keywords (.ppn files) can be trained for free at console.picovoice.ai
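
As a rough illustration of the sleeping state, the loop below reads fixed-size 16-bit PCM frames and hands each one to a detector. It is a sketch, not the project's code: `wait_for_wake_word` and the `WakeWordDetector` protocol are hypothetical names. The only Porcupine behaviour assumed is that `process()` takes one frame of int16 samples and returns the keyword index, or -1 when nothing was detected.

```python
import struct
from typing import BinaryIO, Protocol, Sequence


class WakeWordDetector(Protocol):
    """Anything with a Porcupine-style process() method (hypothetical interface)."""

    def process(self, pcm: Sequence[int]) -> int: ...


def wait_for_wake_word(detector: WakeWordDetector, stream: BinaryIO,
                       frame_length: int = 512) -> int:
    """Read 16-bit mono PCM frames from `stream` until a keyword is detected.

    Returns the keyword index reported by the detector, or -1 if the
    stream ends before any keyword is heard.
    """
    frame_bytes = frame_length * 2  # 2 bytes per int16 sample
    while True:
        chunk = stream.read(frame_bytes)
        if len(chunk) < frame_bytes:
            return -1  # stream ended (for example, the recorder process exited)
        # Little-endian signed 16-bit samples, as produced by s16le capture.
        frame = struct.unpack(f"<{frame_length}h", chunk)
        index = detector.process(frame)
        if index >= 0:
            return index
```

With the real library, the detector would come from `pvporcupine.create(...)`, the frame size from its `frame_length` attribute, and `stream` would be the stdout pipe of a 16 kHz recording process.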

Why OpenAI Realtime API?

The OpenAI Realtime API provides speech-to-text, language model inference, and text-to-speech in a single persistent WebSocket connection. This eliminates the need to chain three separate services (Whisper → GPT → TTS) and dramatically reduces latency. It also handles voice activity detection (VAD) server-side, so no local VAD library is needed.
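
To make the single-connection idea concrete, here is a sketch of two JSON messages a client might send over that WebSocket: one configuring the session with server-side VAD, one streaming a chunk of microphone audio. The helper names are hypothetical, and the field names follow the preview Realtime API as commonly documented; treat them as assumptions and check the current API reference before relying on them.

```python
import base64
import json


def session_update_event(voice: str, instructions: str,
                         vad_threshold: float = 0.5) -> str:
    """Build a `session.update` message that enables server-side VAD."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "voice": voice,
            "instructions": instructions,
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            # Assumed shape of the VAD settings; lower threshold = more sensitive.
            "turn_detection": {"type": "server_vad", "threshold": vad_threshold},
        },
    })


def audio_append_event(pcm_chunk: bytes) -> str:
    """Wrap raw PCM16 microphone audio as an `input_audio_buffer.append` message."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_chunk).decode("ascii"),
    })
```

Because the server decides when the user has stopped talking, the client only needs to keep appending audio; no explicit "end of utterance" message is built here.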

Why PipeWire instead of ALSA or PulseAudio?

Raspberry Pi OS Bookworm ships PipeWire as the default audio server. It handles resampling (USB devices run at 48 kHz natively; our pipeline uses 16/24 kHz) transparently. pacat (PulseAudio-compatible client) works directly against PipeWire via its PulseAudio compatibility layer.

Important: PipeWire's default source may be set to a .monitor (speaker loopback) rather than the real microphone input. The code explicitly queries pactl list sources short to find the first alsa_input.* device and passes it to pacat via --device=, bypassing this issue.
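
The source lookup described above could look roughly like this. It is a minimal sketch: `pick_input_source` and `find_microphone` are hypothetical names, and the parser assumes the usual column layout of `pactl list sources short`, with the source name in the second column.

```python
import subprocess
from typing import Optional


def pick_input_source(pactl_output: str) -> Optional[str]:
    """Return the first source whose name starts with 'alsa_input.', if any."""
    for line in pactl_output.splitlines():
        fields = line.split()
        # Column 0 is the index, column 1 the source name.
        if len(fields) >= 2 and fields[1].startswith("alsa_input."):
            return fields[1]
    return None


def find_microphone() -> Optional[str]:
    """Ask PipeWire/PulseAudio for its sources and pick a real input device."""
    result = subprocess.run(
        ["pactl", "list", "sources", "short"],
        capture_output=True, text=True, check=False,
    )
    return pick_input_source(result.stdout)
```

Keeping the parsing separate from the `pactl` call means `.monitor` loopback sources are skipped by name, and the logic can be checked without audio hardware.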

Why pacat instead of a Python audio library?

Python audio libraries (PyAudio, sounddevice) require compiled native extensions and often have dependency conflicts on Raspberry Pi OS. pacat is a standard system tool, always available where PipeWire/PulseAudio is installed. It communicates via subprocess stdin/stdout, which is reliable, portable, and adds no Python dependencies.
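
A sketch of how such a pacat command line might be assembled is shown below. The function name and the exact flag set are assumptions rather than the project's actual invocation; the flags themselves (`--record`, `--playback`, `--raw`, `--format`, `--rate`, `--channels`, `--device`) are standard pacat options.

```python
from typing import List, Optional


def pacat_command(mode: str, rate: int, device: Optional[str] = None) -> List[str]:
    """Build a pacat argument list for raw 16-bit mono audio at `rate` Hz."""
    if mode not in ("record", "playback"):
        raise ValueError("mode must be 'record' or 'playback'")
    cmd = [
        "pacat",
        f"--{mode}",
        "--raw",            # headerless PCM on stdin/stdout
        "--format=s16le",   # signed 16-bit little-endian samples
        f"--rate={rate}",
        "--channels=1",
    ]
    if device:
        # Pin the source or sink explicitly instead of trusting the default.
        cmd.append(f"--device={device}")
    return cmd
```

The list can be passed to `subprocess.Popen` with `stdout=subprocess.PIPE` for recording or `stdin=subprocess.PIPE` for playback, so audio moves through ordinary pipes.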

USB audio warm-up silence

USB audio devices suspend themselves when idle to save power. The first ~200 ms of audio after a period of silence gets "eaten" by the device waking up. All sound effects (startup chime, wake acknowledgement) prepend 300 ms of silence before the actual audio, ensuring the device is active before the tone begins.
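
The padding itself is simple byte arithmetic. The sketch below assumes 16-bit mono PCM at 24 kHz, which matches the playback rate mentioned elsewhere in this README; `prepend_silence` is a hypothetical helper name.

```python
def prepend_silence(pcm: bytes, sample_rate: int = 24000, silence_ms: int = 300,
                    sample_width: int = 2, channels: int = 1) -> bytes:
    """Return `pcm` with `silence_ms` of digital silence (zero samples) in front."""
    n_samples = sample_rate * silence_ms // 1000
    silence = b"\x00" * (n_samples * sample_width * channels)
    return silence + pcm
```

At 24 kHz, 300 ms is 7,200 samples, or 14,400 bytes of zeros for 16-bit mono audio.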

Inactivity timeout

After the AI finishes speaking (response.done), a 15-second inactivity timer starts. If the user doesn't speak within that window, the WebSocket is closed and the device returns to the sleeping (wake-word) state. The timer is cancelled while the AI is speaking (so long responses don't trigger a premature sleep) and reset whenever the user starts talking.
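
One way to express this timer with asyncio is sketched below. `InactivityTimer` is a hypothetical class, not the project's implementation: `start()` would be called on response.done, `cancel()` when the AI starts speaking, and either `cancel()` or a fresh `start()` when the user starts talking.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class InactivityTimer:
    """Restartable one-shot timer that awaits a callback after `timeout` seconds."""

    def __init__(self, timeout: float, on_timeout: Callable[[], Awaitable[None]]):
        self._timeout = timeout
        self._on_timeout = on_timeout
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        """(Re)start the countdown; must be called from a running event loop."""
        self.cancel()
        self._task = asyncio.get_running_loop().create_task(self._run())

    def cancel(self) -> None:
        """Stop the countdown if one is pending."""
        if self._task is not None:
            self._task.cancel()
            self._task = None

    async def _run(self) -> None:
        await asyncio.sleep(self._timeout)
        # Clear the handle first so the callback can safely call cancel().
        self._task = None
        await self._on_timeout()
```

The callback would typically close the WebSocket and return to the wake-word loop.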

Echo prevention strategy

The USB Speaker Bar's microphone is physically close to its speaker, making acoustic echo a problem. When the AI starts speaking, the microphone is muted in software (recorder.muted = True). After the AI finishes:

  1. A 2.5-second silence allows the room echo to decay.
  2. The microphone queue is flushed to discard any residual echo already captured.
  3. The microphone is unmuted.
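
The three steps above can be sketched as follows. `Recorder` is a stand-in with only the two attributes the text mentions (`muted` and a queue); the real AudioRecorder is not shown in this README, so its shape here is an assumption. The `sleep` parameter is injectable so that an async program can substitute its own wait.

```python
import queue
import time
from typing import Callable


class Recorder:
    """Stand-in for the project's AudioRecorder (hypothetical minimal shape)."""

    def __init__(self) -> None:
        self.muted = False
        self.queue: "queue.Queue[bytes]" = queue.Queue()


def flush_queue(q: "queue.Queue[bytes]") -> int:
    """Discard everything currently queued; return how many chunks were dropped."""
    dropped = 0
    while True:
        try:
            q.get_nowait()
        except queue.Empty:
            return dropped
        dropped += 1


def resume_listening(recorder: Recorder, decay_seconds: float = 2.5,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Wait for room echo to decay, drop captured echo, then unmute the mic."""
    sleep(decay_seconds)                     # 1. let the echo die down
    dropped = flush_queue(recorder.queue)    # 2. discard residual echo
    recorder.muted = False                   # 3. listen again
    return dropped
```

Unmuting last matters: flushing after unmuting could discard the first words of the user's next utterance.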

Why a 200×120 internal canvas for the display?

This project runs on a Raspberry Pi Zero 1.1: single-core ARMv6 @ 1 GHz, 512 MB RAM, no GPU. Rendering at the full 800×480 every frame would saturate the CPU and starve the audio pipeline. Rendering the animated elements on a 200×120 surface and scaling it 4× with pygame.transform.scale cuts the pixels drawn per frame to 6.25% (1/16) of full resolution. Combined with dirty-flag rendering (redrawing only when something has changed), the display thread consumes a negligible fraction of CPU.

The 4× upscale creates a visible pixel grid that gives the robot face a retro LED-matrix aesthetic.
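
The 6.25% figure follows directly from the two resolutions, as the small sketch below checks. The function names are hypothetical; `pygame.transform.scale` is the real Pygame call named above, imported lazily so the arithmetic runs even where Pygame is not installed.

```python
from typing import Tuple

CANVAS_SIZE = (200, 120)   # internal low-resolution surface
SCREEN_SIZE = (800, 480)   # physical display resolution


def pixel_fraction(canvas: Tuple[int, int], screen: Tuple[int, int]) -> float:
    """Fraction of full-resolution pixels that are drawn on the small canvas."""
    return (canvas[0] * canvas[1]) / (screen[0] * screen[1])


def upscale(canvas_surface, screen_size: Tuple[int, int] = SCREEN_SIZE):
    """Scale the small canvas up to the screen size (nearest-neighbour, blocky)."""
    import pygame  # lazy import: only needed when a display is actually used
    return pygame.transform.scale(canvas_surface, screen_size)
```

200 × 120 = 24,000 pixels against 800 × 480 = 384,000, which is exactly 1/16.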

Display states

| State | Eyes | When |
|---|---|---|
| sleeping | Closed (horizontal lines) | Waiting for wake word |
| idle | Open, pupils wandering | Session active, waiting for user |
| listening | Open | User is speaking |
| speaking | Open | AI is speaking |
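
One possible way to drive these states from Realtime API events is sketched below. The event-to-state mapping is an assumption for illustration: the README only confirms that response.done is handled, and the other event names are taken from the Realtime API's server events rather than from this project's code.

```python
from enum import Enum


class DisplayState(Enum):
    SLEEPING = "sleeping"
    IDLE = "idle"
    LISTENING = "listening"
    SPEAKING = "speaking"


# Assumed mapping from server event type to the state it should trigger.
_EVENT_TO_STATE = {
    "input_audio_buffer.speech_started": DisplayState.LISTENING,
    "response.audio.delta": DisplayState.SPEAKING,
    "response.done": DisplayState.IDLE,
}


def next_state(current: DisplayState, event_type: str) -> DisplayState:
    """Return the display state after `event_type`; unknown events change nothing."""
    return _EVENT_TO_STATE.get(event_type, current)
```

The sleeping state is not in the mapping because it is entered by the inactivity timeout, not by a server event.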

Setup

1. System packages

sudo apt update
sudo apt install -y pipewire pipewire-pulse fonts-nanum python3-pygame

2. Python dependencies

pip3 install --break-system-packages -r requirements.txt

3. Picovoice Access Key (free)

  1. Create a free account at console.picovoice.ai
  2. Copy your Access Key from the dashboard
  3. Paste it into main.py → PORCUPINE_ACCESS_KEY

4. Configuration

Edit main.py and set:

OPENAI_API_KEY       = "sk-..."       # Your OpenAI API key
PORCUPINE_ACCESS_KEY = "..."          # Your Picovoice access key (free)
WAKE_WORD            = "computer"     # Built-in keyword, or "custom"
WAKE_WORD_MODEL_PATH = ""             # Path to .ppn file if WAKE_WORD = "custom"
INACTIVITY_TIMEOUT   = 15            # Seconds of silence before going back to sleep
VOICE                = "verse"        # AI voice (alloy, echo, nova, shimmer, verse, ...)
INSTRUCTIONS         = "..."          # System prompt / personality

Built-in free keywords (no .ppn file needed): computer, jarvis, porcupine, bumblebee, alexa, grasshopper, blueberry, grapefruit, terminator, hey barista, americano, picovoice

Custom keyword (e.g. "Hey Peter"): Go to console.picovoice.ai → Wake Word → create your keyword → download the .ppn file for Raspberry Pi → set WAKE_WORD = "custom" and WAKE_WORD_MODEL_PATH = "/path/to/file.ppn".
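
The choice between a built-in keyword and a custom .ppn file could translate into Porcupine arguments roughly as follows. `porcupine_kwargs` is a hypothetical helper; `access_key`, `keywords` and `keyword_paths` are parameters of `pvporcupine.create`, but how main.py actually wires them up is not shown here.

```python
from typing import Any, Dict


def porcupine_kwargs(access_key: str, wake_word: str,
                     model_path: str = "") -> Dict[str, Any]:
    """Translate the WAKE_WORD / WAKE_WORD_MODEL_PATH settings into create() kwargs."""
    if wake_word == "custom":
        if not model_path:
            raise ValueError("WAKE_WORD_MODEL_PATH must be set when WAKE_WORD is 'custom'")
        # Custom keywords are loaded from a .ppn file on disk.
        return {"access_key": access_key, "keyword_paths": [model_path]}
    # Built-in keywords are selected by name.
    return {"access_key": access_key, "keywords": [wake_word]}
```

The result would be passed as `pvporcupine.create(**porcupine_kwargs(...))`; failing early on a missing path gives a clearer error than a failure inside the library.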

5. Autostart (systemd)

sudo nano /etc/systemd/system/openlexa.service

[Unit]
Description=OpenLexa AI Voice Assistant
After=network-online.target bluetooth.target sound.target
Wants=network-online.target

[Service]
User=pi
WorkingDirectory=/home/pi/ElevenLexa
ExecStart=/usr/bin/python3 /home/pi/ElevenLexa/launcher.py
Restart=on-failure
RestartSec=5
Environment=XDG_RUNTIME_DIR=/run/user/1000
Environment=DISPLAY=:0

[Install]
WantedBy=multi-user.target

sudo systemctl enable openlexa.service
sudo systemctl start openlexa.service

Why launcher.py instead of main.py directly? PipeWire is a user-level service and may not be fully initialised when the system service starts. launcher.py polls pactl list sources until the USB microphone (alsa_input.*) appears, then shows a countdown on the display before handing off to main.py via os.execv. This eliminates the race condition where Porcupine starts with a non-functional audio source and only reacts to very loud sounds.
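
The wait-then-exec pattern might be sketched like this. The function names and the one-second poll interval are assumptions; the README only states that launcher.py polls pactl until an alsa_input.* source appears and then hands off via os.execv. The countdown display is omitted.

```python
import os
import sys
import time
from typing import Callable, Optional


def wait_for_microphone(find_mic: Callable[[], Optional[str]],
                        poll_interval: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Block until `find_mic` returns a source name, polling at a fixed interval."""
    while True:
        name = find_mic()
        if name:
            return name
        sleep(poll_interval)


def hand_off(script: str = "main.py") -> None:
    """Replace the current process with the main application (does not return)."""
    os.execv(sys.executable, [sys.executable, script])
```

Because os.execv replaces the launcher process rather than spawning a child, systemd keeps supervising the same PID once main.py is running.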

6. Run manually

python3 main.py

File Structure

OpenLexaPi/
├── launcher.py       # Boot launcher: waits for USB mic, countdown, then starts main.py
├── main.py           # Main application: wake-word loop, audio pipeline, OpenAI session
├── display.py        # Pygame HDMI display (optional, auto-detected)
├── requirements.txt  # Python dependencies
├── README.md         # This file
└── archive/          # Old debug scripts (not needed for running)

Customisation

| What | Where | How |
|---|---|---|
| Wake word | main.py → WAKE_WORD | Built-in keyword name or "custom" |
| Custom wake word | main.py → WAKE_WORD_MODEL_PATH | Path to .ppn file from Picovoice Console |
| Inactivity timeout | main.py → INACTIVITY_TIMEOUT | Seconds before returning to sleep |
| AI personality | main.py → INSTRUCTIONS | Edit the system prompt string |
| Voice | main.py → VOICE | Any OpenAI Realtime voice name |
| Languages | main.py → INSTRUCTIONS | Change language instructions |
| VAD sensitivity | main.py → turn_detection.threshold | 0.0–1.0, lower = more sensitive |
| Eye colours | display.py → colour constants | RGB tuples at the top of the file |
| Display layout | display.py → EYE_AREA_H, TEXT_AREA_Y | Adjust the eye/text split point |

License

MIT: free to use, modify, and distribute.
