# Intelux

Intelux is an AI-powered wearable prototype that delivers real-time auditory descriptions of the user's environment to help visually impaired people perceive and navigate the world more independently.
## Contents

- Overview
- Features
- How It Works
- Architecture
- Prerequisites
- Installation
- Configuration
- Usage
- Development & Project Structure
- Troubleshooting & Known Issues
- Roadmap
- Acknowledgements
- Contributing
- License
## Overview

Over 1 billion people worldwide live with blindness or significant vision impairment, yet current mobility aids (white canes, guide dogs) provide limited contextual information. Intelux aims to go beyond detection: the system narrates what is around the user in real time and answers direct, image-grounded questions spoken by the user.

The current prototype is a hat-mounted camera paired with a Raspberry Pi that performs capture and inference locally. The system supports two complementary interaction modes: continuous environment narration and interactive, multimodal Q&A.
## Features

- Real-time object detection and spatial narration (left / right / ahead)
- Spoken interactive queries grounded in a live image snapshot
- Multimodal reasoning via Claude (Anthropic) for precise, contextual answers
- Natural-sounding audio output via ElevenLabs TTS
- Hands-free voice input and output (VOSK or other speech-to-text)
## How It Works

### Environment Mode
- Continuously reads frames from the camera and runs YOLO to detect objects.
- Generates short, frequent summaries that indicate object type and approximate position (left/right/ahead).
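The left/right/ahead mapping can be derived from the horizontal centre of each YOLO bounding box. The sketch below illustrates the idea; the thresholds, function names, and detection format are illustrative, not the repository's actual code:

```python
# Sketch: map a detection's horizontal position to a spoken direction.
# Thresholds (thirds of the frame) are illustrative assumptions.

def describe_position(x_center: float, frame_width: int) -> str:
    """Classify a bounding-box centre into 'left', 'ahead', or 'right'."""
    ratio = x_center / frame_width
    if ratio < 1 / 3:
        return "left"
    if ratio > 2 / 3:
        return "right"
    return "ahead"

def narrate(detections, frame_width: int = 640) -> str:
    """Build a short summary like 'person ahead, chair to the left'
    from (label, x_center) pairs."""
    phrases = []
    for label, x_center in detections:
        side = describe_position(x_center, frame_width)
        phrases.append(f"{label} {'ahead' if side == 'ahead' else 'to the ' + side}")
    return ", ".join(phrases)
```

For example, `narrate([("person", 320), ("chair", 100)])` yields `"person ahead, chair to the left"`; the real pipeline would extract the centres from the Ultralytics `Results` boxes.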
### Interactive Mode
- On user request, captures a snapshot and pairs it with the user's spoken question.
- Sends the image and question to Claude (Anthropic API) with a system prompt to provide a concise, image-grounded answer.
- Converts Claude's text response to audio using ElevenLabs and plays it back.
Both modes are voice-driven and play audio responses through the attached speaker, keeping the experience hands-free.
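The Interactive Mode round trip can be sketched as follows, assuming the Anthropic Python SDK's Messages API. The model id, system prompt, and helper name are illustrative assumptions, not the repository's values:

```python
import base64

# Sketch: pair a JPEG snapshot with the user's spoken question in the
# shape the Anthropic Messages API expects. Values are placeholders.

def build_claude_request(image_bytes: bytes, question: str) -> dict:
    """Build the request body for an image-grounded question."""
    return {
        "model": "claude-sonnet-4-5",  # placeholder model id
        "max_tokens": 300,
        "system": "Answer concisely, grounded only in the attached image.",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/jpeg",
                            "data": base64.b64encode(image_bytes).decode()}},
                {"type": "text", "text": question},
            ],
        }],
    }

# The real call would be roughly:
#   client = anthropic.Anthropic()                 # reads ANTHROPIC_API_KEY
#   reply = client.messages.create(**build_claude_request(jpeg, question))
#   answer = reply.content[0].text                 # then hand off to TTS
```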
## Architecture

- Hardware: Raspberry Pi + camera module (prototype: hat-mounted camera), USB or on-board speaker/headphone
- Video capture: OpenCV
- Object detection: YOLO (Ultralytics / YOLOv8) with local model weights
- Multimodal LLM: Claude (Anthropic) — image + prompt + user question
- Text-to-Speech: ElevenLabs API
- Language: Python (scripts orchestrate capture, detection, TTS, and LLM calls)
## Prerequisites

- Python 3.10+ (virtualenv recommended)
- A modern Raspberry Pi (for field deployment), or a laptop for local testing
- Camera accessible to OpenCV
- API keys:
  - Anthropic / Claude API key
  - ElevenLabs API key and voice ID
- Models and weights (place under `models/`):
  - YOLO weights (e.g. `models/yolo26n.pt`)
  - Speech models for offline STT, if used (VOSK or custom)
## Installation

- Clone the repo:

  ```bash
  git clone <your-repo-url>
  cd "intelux copy"
  ```

- Create a virtual environment and install dependencies:

  ```bash
  python3 -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```

- Place model files in the `models/` directory if not already present. An example is provided in this repo: `models/yolo26n.pt`.

- On Raspberry Pi: run `setup.sh` to install system dependencies and optimized builds where needed (this script may require `sudo`).
## Configuration

Set the following environment variables before running the system (example for Bash/zsh):

```bash
export ANTHROPIC_API_KEY="your_anthropic_key"
export ELEVENLABS_API_KEY="your_elevenlabs_key"
export ELEVEN_VOICE_ID="voice-id"
export YOLO_WEIGHTS="models/yolo26n.pt"
```

Check the constants and configuration in the repository under `src/constants.py` and `mac/constants.py` for additional options you can adjust (frame rate, confidence thresholds, TTS settings, etc.).
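A startup helper can make missing keys fail loudly at launch instead of producing silent errors later. This is a sketch under the variable names above; the helper name and default weight path are illustrative:

```python
import os

# Sketch: validate required configuration at startup.
# Variable names match the README; the helper itself is illustrative.

REQUIRED_VARS = ("ANTHROPIC_API_KEY", "ELEVENLABS_API_KEY", "ELEVEN_VOICE_ID")

def load_config() -> dict:
    """Collect required settings, raising a clear error if any are unset."""
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        **{name: os.environ[name] for name in REQUIRED_VARS},
        # YOLO_WEIGHTS is optional, with a sensible default
        "YOLO_WEIGHTS": os.environ.get("YOLO_WEIGHTS", "models/yolo26n.pt"),
    }
```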
## Usage

High-level commands (examples):

- Run the main application (from the project root):

  ```bash
  ./run.sh
  ```

- Or run the Python entry script directly:

  ```bash
  source .venv/bin/activate
  python src/main.py
  ```

- To test macOS-specific code (development only):

  ```bash
  python mac/mac_test.py
  ```

Runtime notes:

- Speak the wake phrase or press the assigned button (if hardware supports it) to toggle modes.
- In Environment Mode, say "describe my environment" to start continuous narration.
- In Interactive Mode, ask specific questions like "What color is the sign?" or "How many chairs are in front of me?"
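The voice-driven mode toggling described in the runtime notes can be sketched as a small dispatcher. The phrase matching and names below are illustrative assumptions, not the repository's actual logic:

```python
# Sketch: route a recognized utterance to a mode and an action.
# Phrase matching is deliberately simplistic for illustration.

ENVIRONMENT, INTERACTIVE = "environment", "interactive"

def route_utterance(mode: str, text: str):
    """Return (new_mode, action) for a recognized utterance."""
    phrase = text.lower().strip()
    if phrase == "describe my environment":
        return ENVIRONMENT, "start_narration"
    if phrase.endswith("?"):                 # treat questions as queries
        return INTERACTIVE, "answer_question"
    return mode, None                        # unrecognized: keep current mode
```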
## Development & Project Structure

Key files and folders:
- src/main.py — primary entry point and orchestrator for modes
- src/navigation.py — navigation helpers and path-planning stubs
- src/runClaudeLLM.py — handles Claude API interactions
- src/elevenlabs_tts.py — ElevenLabs TTS wrapper
- src/SpeechModel.py — speech recognition interface (VOSK)
- src/vosk_stt.py and mac/vosk_stt.py — VOSK integration
- models/ — model weights, ONNX, and related files
- mac/ — macOS-specific development/test code
- run.sh and setup.sh — convenience/run scripts
If you plan to modify or expand the system, start by running the app locally and stepping through src/main.py. Unit tests and CI are not included by default.
## Troubleshooting & Known Issues

- TTS pipeline: has historically failed silently; if audio does not play, verify ElevenLabs API responses, audio-device configuration, and file permissions.
- Ultralytics installation: some macOS setups experience dependency conflicts — consider using a matching Python minor version (Python 3.10) and a clean virtual environment.
- Raspberry Pi: expect extra debugging steps for camera drivers and native library compatibility.
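One way to catch the silent TTS failures noted above is to sanity-check the response bytes before playback: an error from the API often comes back as a short JSON body rather than audio. This is a heuristic sketch, not the repository's code:

```python
# Sketch: heuristic check that TTS response bytes plausibly contain audio
# (MP3 or WAV) rather than an empty body or a JSON error payload.

def looks_like_audio(data: bytes) -> bool:
    """Return True if data plausibly contains MP3 or WAV audio."""
    if not data or len(data) < 128:          # empty or suspiciously tiny
        return False
    if data[:3] == b"ID3":                   # MP3 with ID3 tag
        return True
    if data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:   # raw MP3 frame sync
        return True
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":  # WAV container
        return True
    return False
```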
## Roadmap

- Live turn-by-turn navigation integrated into Interactive Mode
- Smart-glasses hardware form factor (migrating from the hat-mounted camera)
- OCR improvements for signage and reading text
- Face recognition to identify familiar people (privacy-first approach)
- Productize and release within 12 months
## Acknowledgements

This project uses Claude (Anthropic) for multimodal reasoning and ElevenLabs for text-to-speech. Special thanks to the team members and mentors who supported the design and hardware integration efforts.
## Contributing

We welcome contributions. Please open issues or PRs describing changes. For major changes, create an issue first to discuss the approach.
## License

No license file is included in this repository. Add a LICENSE file to indicate project licensing (for example, MIT or Apache-2.0).