# Intelux

Intelux is an AI-powered wearable prototype that delivers real-time auditory descriptions of the user's environment to help visually impaired people perceive and navigate the world more independently.
## Contents

- Overview
- Features
- How It Works
- Architecture
- Prerequisites
- Installation
- Configuration
- Usage
- Development & Project Structure
- Troubleshooting & Known Issues
- Roadmap
- Acknowledgements
- Contributing
- License
## Overview

Over 1 billion people worldwide live with blindness or significant vision impairment, yet current mobility aids (white canes, guide dogs) provide limited contextual information. Intelux aims to go beyond detection: the system narrates what is around the user in real time and answers direct, image-grounded questions spoken by the user.

The current prototype is a hat-mounted camera paired with a Raspberry Pi that performs capture and inference locally. The system supports two complementary interaction modes: continuous environment narration and interactive, multimodal Q&A.
## Features

- Real-time object detection and spatial narration (left / right / ahead)
- Spoken interactive queries grounded in a live image snapshot
- Multimodal reasoning via Claude (Anthropic) for precise, contextual answers
- Natural-sounding audio output via ElevenLabs TTS
- Hands-free voice input and output (VOSK or other speech-to-text)
## How It Works

### Environment Mode
- Continuously reads frames from the camera and runs YOLO to detect objects.
- Generates short, frequent summaries that indicate object type and approximate position (left/right/ahead).
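The left/right/ahead mapping can be derived from the horizontal centre of each YOLO bounding box. The sketch below illustrates the idea; the thresholds, function names, and detection format are illustrative, not the repository's actual code:

```python
# Sketch: map a detection's horizontal position to a spoken direction.
# Thresholds (thirds of the frame) are illustrative assumptions.

def describe_position(x_center: float, frame_width: int) -> str:
    """Classify a bounding-box centre into 'left', 'ahead', or 'right'."""
    ratio = x_center / frame_width
    if ratio < 1 / 3:
        return "left"
    if ratio > 2 / 3:
        return "right"
    return "ahead"

def narrate(detections, frame_width: int = 640) -> str:
    """Build a short summary like 'person ahead, chair to the left'
    from (label, x_center) pairs."""
    phrases = []
    for label, x_center in detections:
        side = describe_position(x_center, frame_width)
        phrases.append(f"{label} {'ahead' if side == 'ahead' else 'to the ' + side}")
    return ", ".join(phrases)
```

For example, `narrate([("person", 320), ("chair", 100)])` yields `"person ahead, chair to the left"`; the real pipeline would extract the centres from the Ultralytics `Results` boxes.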
### Interactive Mode
- On user request, captures a snapshot and pairs it with the user's spoken question.
- Sends the image and question to Claude (Anthropic API) with a system prompt to provide a concise, image-grounded answer.
- Converts Claude's text response to audio using ElevenLabs and plays it back.
Both modes are voice-driven and play audio responses through the attached speaker, keeping the experience hands-free.
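The Interactive Mode round trip can be sketched as follows, assuming the Anthropic Python SDK's Messages API. The model id, system prompt, and helper name are illustrative assumptions, not the repository's values:

```python
import base64

# Sketch: pair a JPEG snapshot with the user's spoken question in the
# shape the Anthropic Messages API expects. Values are placeholders.

def build_claude_request(image_bytes: bytes, question: str) -> dict:
    """Build the request body for an image-grounded question."""
    return {
        "model": "claude-sonnet-4-5",  # placeholder model id
        "max_tokens": 300,
        "system": "Answer concisely, grounded only in the attached image.",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/jpeg",
                            "data": base64.b64encode(image_bytes).decode()}},
                {"type": "text", "text": question},
            ],
        }],
    }

# The real call would be roughly:
#   client = anthropic.Anthropic()                 # reads ANTHROPIC_API_KEY
#   reply = client.messages.create(**build_claude_request(jpeg, question))
#   answer = reply.content[0].text                 # then hand off to TTS
```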
## Architecture

- Hardware: Raspberry Pi + camera module (prototype: hat-mounted camera), USB or on-board speaker/headphone
- Video capture: OpenCV
- Object detection: YOLO (Ultralytics / YOLOv8) with local model weights
- Multimodal LLM: Claude (Anthropic) — image + prompt + user question
- Text-to-Speech: ElevenLabs API
- Language: Python (scripts orchestrate capture, detection, TTS, and LLM calls)
## Prerequisites

- Python 3.10+ (virtualenv recommended)
- A modern Raspberry Pi (for field deployment), or a laptop for local testing
- Camera accessible to OpenCV
- API keys:
  - Anthropic / Claude API key
  - ElevenLabs API key and voice ID
- Models and weights (place under `models/`):
  - YOLO weights (e.g. `models/yolo26n.pt`)
  - Speech models for offline STT, if used (VOSK or custom)
## Installation

- Clone the repo:

  ```bash
  git clone <your-repo-url>
  cd "intelux copy"
  ```

- Create a virtual environment and install dependencies:

  ```bash
  python3 -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```

- Place model files in the `models/` directory if not already present. An example is provided in this repo: `models/yolo26n.pt`.

- On Raspberry Pi: run `setup.sh` to install system dependencies and optimized builds where needed (this script may require `sudo`).
## Configuration

Set the following environment variables before running the system (example for Bash/zsh):

```bash
export ANTHROPIC_API_KEY="your_anthropic_key"
export ELEVENLABS_API_KEY="your_elevenlabs_key"
export ELEVEN_VOICE_ID="voice-id"
export YOLO_WEIGHTS="models/yolo26n.pt"
```

Check the constants and configuration in the repository under `src/constants.py` and `mac/constants.py` for additional options you can adjust (frame rate, confidence thresholds, TTS settings, etc.).
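A startup helper can make missing keys fail loudly at launch instead of producing silent errors later. This is a sketch under the variable names above; the helper name and default weight path are illustrative:

```python
import os

# Sketch: validate required configuration at startup.
# Variable names match the README; the helper itself is illustrative.

REQUIRED_VARS = ("ANTHROPIC_API_KEY", "ELEVENLABS_API_KEY", "ELEVEN_VOICE_ID")

def load_config() -> dict:
    """Collect required settings, raising a clear error if any are unset."""
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        **{name: os.environ[name] for name in REQUIRED_VARS},
        # YOLO_WEIGHTS is optional, with a sensible default
        "YOLO_WEIGHTS": os.environ.get("YOLO_WEIGHTS", "models/yolo26n.pt"),
    }
```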
## Usage

High-level commands (examples):

- Run the main application (from the project root):

  ```bash
  ./run.sh
  ```

- Or run the Python entry script directly:

  ```bash
  source .venv/bin/activate
  python src/main.py
  ```

- To test macOS-specific code (development only):

  ```bash
  python mac/mac_test.py
  ```

Runtime notes:

- Speak the wake phrase or press the assigned button (if hardware supports it) to toggle modes.
- In Environment Mode, say "describe my environment" to start continuous narration.
- In Interactive Mode, ask specific questions like "What color is the sign?" or "How many chairs are in front of me?"
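The voice-driven mode toggling described in the runtime notes can be sketched as a small dispatcher. The phrase matching and names below are illustrative assumptions, not the repository's actual logic:

```python
# Sketch: route a recognized utterance to a mode and an action.
# Phrase matching is deliberately simplistic for illustration.

ENVIRONMENT, INTERACTIVE = "environment", "interactive"

def route_utterance(mode: str, text: str):
    """Return (new_mode, action) for a recognized utterance."""
    phrase = text.lower().strip()
    if phrase == "describe my environment":
        return ENVIRONMENT, "start_narration"
    if phrase.endswith("?"):                 # treat questions as queries
        return INTERACTIVE, "answer_question"
    return mode, None                        # unrecognized: keep current mode
```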
## Development & Project Structure

Key files and folders:
- src/main.py — primary entry point and orchestrator for modes
- src/navigation.py — navigation helpers and path-planning stubs
- src/runClaudeLLM.py — handles Claude API interactions
- src/elevenlabs_tts.py — ElevenLabs TTS wrapper
- src/SpeechModel.py — speech recognition interface (VOSK)
- src/vosk_stt.py and mac/vosk_stt.py — VOSK integration
- models/ — model weights, ONNX, and related files
- mac/ — macOS-specific development/test code
- run.sh and setup.sh — convenience/run scripts
If you plan to modify or expand the system, start by running the app locally and stepping through src/main.py. Unit tests and CI are not included by default.
## Troubleshooting & Known Issues

- TTS pipeline: has historically failed silently; if audio does not play, verify ElevenLabs API responses, audio-device configuration, and file permissions.
- Ultralytics installation: some macOS setups experience dependency conflicts — consider using a matching Python minor version (Python 3.10) and a clean virtual environment.
- Raspberry Pi: expect extra debugging steps for camera drivers and native library compatibility.
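One way to catch the silent TTS failures noted above is to sanity-check the response bytes before playback: an error from the API often comes back as a short JSON body rather than audio. This is a heuristic sketch, not the repository's code:

```python
# Sketch: heuristic check that TTS response bytes plausibly contain audio
# (MP3 or WAV) rather than an empty body or a JSON error payload.

def looks_like_audio(data: bytes) -> bool:
    """Return True if data plausibly contains MP3 or WAV audio."""
    if not data or len(data) < 128:          # empty or suspiciously tiny
        return False
    if data[:3] == b"ID3":                   # MP3 with ID3 tag
        return True
    if data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:   # raw MP3 frame sync
        return True
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":  # WAV container
        return True
    return False
```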
## Roadmap

- Live turn-by-turn navigation integrated into Interactive Mode
- Smart-glasses hardware form factor (migrating from the hat-mounted camera)
- OCR improvements for signage and reading text
- Face recognition to identify familiar people (privacy-first approach)
- Productize and release within 12 months
## Acknowledgements

This project uses Claude (Anthropic) for multimodal reasoning and ElevenLabs for text-to-speech. Special thanks to the team members and mentors who supported the design and hardware integration efforts.
## Contributing

We welcome contributions. Please open issues or PRs describing changes. For major changes, create an issue first to discuss the approach.
## License

No license file is included in this repository. Add a LICENSE file to indicate project licensing (for example, MIT or Apache-2.0).