🤖 Boddi

Boddi is a local-first, open-source, agentic AI companion built in Python. It runs entirely on your machine, uses open-source models, and supports real-time voice interaction, expressive visuals, and interruption-aware conversations — designed as a serious foundation, not a demo.

Boddi is inspired by friendly companions like BMO, but engineered with clean architecture, privacy-first principles, and extensibility in mind.


✨ What is Boddi?

Boddi is:

  • 🧠 Agentic — it reasons, routes intents, and uses tools
  • 🎧 Voice-enabled — speaks and listens locally (offline)
  • ✋ Interruption-aware — you can cut it off mid-sentence
  • 🎭 Expressive — visual states like thinking, smiling, talking
  • 🔒 Privacy-first — no cloud, no telemetry, no data leaves your machine
  • 🧩 Extensible — designed to grow without rewrites

Boddi is not:

  • a cloud chatbot wrapper
  • a fake UI demo
  • a prompt-only toy

🧠 How Boddi Works (High-Level)

At runtime, Boddi behaves like a real conversational system:

  1. Listens through the microphone

  2. Detects speech activity (VAD)

  3. Transcribes speech locally (STT)

  4. Detects wake word

  5. Routes intent through an agentic loop

  6. Uses tools (LLM, web search, tasks)

  7. Responds with:

    • instant micro voice clips ("On it!", "Sorry", etc.)
    • real-time spoken responses (TTS)

  8. Updates visual expressions based on state

All of this happens locally.
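The eight steps above can be sketched as a single pass through a pipeline. All names here are illustrative, not Boddi's actual API:

```python
# Illustrative pipeline skeleton -- component and method names are
# hypothetical; they mirror the runtime steps listed above.

def run_turn(audio_frame, pipeline):
    """Process one chunk of microphone audio through the voice pipeline."""
    if not pipeline.vad.is_speech(audio_frame):   # 2. voice activity detection
        return None
    text = pipeline.stt.transcribe(audio_frame)   # 3. local speech-to-text
    if not pipeline.wake.triggered(text):         # 4. wake word gate
        return None
    intent = pipeline.agent.route(text)           # 5. agentic intent routing
    result = pipeline.tools.run(intent)           # 6. tool execution (LLM, search, tasks)
    pipeline.voice.ack()                          # 7a. instant micro voice clip
    pipeline.voice.speak(result)                  # 7b. real-time TTS
    pipeline.visual.set_state("speaking")         # 8. visual state update
    return result
```

In a real loop this runs per audio chunk, with interruption checks between the later steps.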


🔁 Core Interaction Flow

[ Microphone ]
      ↓
[ Voice Activity Detection (Silero) ]
      ↓
[ Speech-to-Text (Whisper) ]
      ↓
[ Wake Word Detection ]
      ↓
[ Agentic Loop ]
   ├─► Intent Detection
   ├─► Tool Routing
   │     ├─► Ollama (LLM)
   │     ├─► Web Search (DDGS)
   │     └─► User Tasks
   ↓
[ Response Planner ]
   ├─► Micro Voice Clip (instant)
   └─► Piper TTS (real-time speech)
      ↓
[ Visual State Update ]
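The wake word stage in this flow can be as simple as matching the transcript against known phrases. A minimal sketch — the phrases and the keyword-match approach are assumptions, since the README does not specify the detection method:

```python
import re

# Hypothetical wake phrases -- not Boddi's actual configuration.
WAKE_PATTERNS = [r"\bhey[, ]+boddi\b", r"\bboddi\b"]

def wake_word_detected(transcript: str) -> bool:
    """Return True if the STT transcript contains a wake phrase.

    A plain keyword match; a production detector might use fuzzy
    matching or a dedicated wake-word model instead.
    """
    text = transcript.lower().strip()
    return any(re.search(pattern, text) for pattern in WAKE_PATTERNS)
```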

🎧 Voice System Design

Boddi intentionally uses two voice paths.

1️⃣ Micro Voice Clips (Instant Reactions)

Used for:

  • greetings
  • acknowledgements ("On it!", "Okay")
  • apologies
  • appreciation
  • sign-offs

These are:

  • pre-generated WAV clips
  • generated once using Piper TTS
  • played instantly at runtime
  • zero latency

This makes Boddi feel responsive and alive.
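Playing a pre-generated clip with simpleaudio is a one-liner at runtime. The directory layout below is hypothetical; the non-blocking play handle is what keeps reactions instant and interruptible:

```python
import random
from pathlib import Path

# Hypothetical layout: assets/clips/<category>/*.wav, rendered once with Piper.
CLIP_DIR = Path("assets/clips")

def pick_clip(category: str, root: Path = CLIP_DIR):
    """Pick a random pre-generated WAV for a reaction category, or None."""
    clips = sorted((root / category).glob("*.wav"))
    return random.choice(clips) if clips else None

def play_clip(category: str):
    """Play an instant reaction clip without blocking the agent loop."""
    import simpleaudio as sa  # deferred so the sketch loads without audio deps
    clip = pick_clip(category)
    if clip is None:
        return None
    # play() returns immediately; .stop() on the handle allows interruption.
    return sa.WaveObject.from_wave_file(str(clip)).play()
```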

2️⃣ Real-Time Speech (Dynamic Responses)

Used for:

  • answering questions
  • explanations
  • summaries
  • long-form responses

Powered by:

  • Piper TTS
  • Voice: en_GB / semaine / medium
  • Fully offline
  • Interruptible
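One minimal way to drive Piper from Python is to shell out to its CLI with raw-audio output streaming. The flags below come from Piper's command-line interface, but the audio-device wiring is omitted and the model filename is an assumption:

```python
import subprocess

# Assumed model filename for the semaine voice mentioned above.
PIPER_MODEL = "en_GB-semaine-medium.onnx"

def piper_command(model: str = PIPER_MODEL) -> list:
    """Build a Piper CLI invocation that streams raw PCM to stdout."""
    return ["piper", "--model", model, "--output-raw"]

def speak(text: str):
    """Synthesize speech with Piper; requires the piper binary on PATH.

    Returning the Popen handle keeps the response interruptible:
    call proc.terminate() to cut Boddi off mid-sentence.
    """
    proc = subprocess.Popen(
        piper_command(), stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    proc.stdin.write(text.encode())
    proc.stdin.close()
    return proc  # caller pipes proc.stdout to the audio device
```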

🎭 Visual Expression System

Boddi includes a lightweight visual layer implemented with Python UI primitives.

Visuals are state-driven, not decorative.

Supported States

| Agent State | Visual Expression    |
|-------------|----------------------|
| idle        | blinking / neutral   |
| listening   | attentive            |
| thinking    | confused / thinking  |
| speaking    | talking              |
| success     | smiling              |
| error       | confused             |
| sleep       | eyes closed          |
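The state table above maps directly onto an enum plus a lookup. A sketch — Boddi's actual identifiers may differ:

```python
from enum import Enum

class AgentState(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"
    SUCCESS = "success"
    ERROR = "error"
    SLEEP = "sleep"

# One visual expression per agent state, matching the table above.
EXPRESSIONS = {
    AgentState.IDLE: "blinking",
    AgentState.LISTENING: "attentive",
    AgentState.THINKING: "thinking",
    AgentState.SPEAKING: "talking",
    AgentState.SUCCESS: "smiling",
    AgentState.ERROR: "confused",
    AgentState.SLEEP: "eyes_closed",
}

def expression_for(state: AgentState) -> str:
    """Look up the visual expression the UI should render."""
    return EXPRESSIONS[state]
```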

Background Themes

Users can configure soft background colors:

  • yellow
  • green
  • blue
  • red
  • orange
  • black (soft dark)
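Since configuration is YAML (per the tech stack), theme selection might look like this — the key names are hypothetical:

```yaml
# config.yaml -- hypothetical keys; only the theme values come from the list above
visual:
  theme: yellow   # yellow | green | blue | red | orange | black
```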

🧠 Agentic Core

Boddi is built around a custom event-driven agent loop.

Key responsibilities:

  • turn awareness (who is speaking)
  • interruption handling
  • state transitions
  • intent routing
  • tool execution

No heavy frameworks are required in v0.1 — the core logic is transparent and hackable.
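An event-driven loop with those responsibilities can be sketched in a few lines. Event names and the handler contract here are illustrative, not Boddi's actual internals:

```python
import queue

def agent_loop(events, handlers):
    """Minimal event-driven agent loop sketch.

    Each event is a (kind, payload) tuple. An "interrupt" event
    pre-empts the current activity (interruption handling), a
    "shutdown" event exits, and everything else is routed to a
    handler that may return the next state (state transitions).
    """
    state = "idle"
    while True:
        kind, payload = events.get()
        if kind == "shutdown":
            break
        if kind == "interrupt":
            state = "listening"  # cut speech short, return to listening
            continue
        handler = handlers.get(kind)
        if handler:
            state = handler(payload) or state
    return state
```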


🌐 Tooling & Capabilities

🧠 Language Model

  • Ollama
  • User-selected open-source LLMs
  • Fully local inference
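Talking to a locally served model goes through Ollama's HTTP API on its default port. A sketch using only the standard library — the model name is an example, since the model is user-selected:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(prompt: str, model: str = "llama3") -> dict:
    """Build the JSON payload for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt: str, model: str = "llama3") -> str:
    """Query a local model; requires `ollama serve` to be running."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```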

🔍 Web Search

  • DDGS (DuckDuckGo Search)
  • No API keys
  • Privacy-friendly
  • Used only when needed
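A thin wrapper over DDGS might look like this. The injectable `client` parameter is my addition for testability; a live search needs the `ddgs` package and network access:

```python
def web_search(query: str, max_results: int = 5, client=None):
    """Search DuckDuckGo via DDGS -- no API key needed.

    `client` is injectable for offline testing; by default the real
    DDGS client is constructed lazily so nothing runs unless a
    search is actually needed.
    """
    if client is None:
        from ddgs import DDGS  # deferred: only imported for live searches
        client = DDGS()
    results = client.text(query, max_results=max_results)
    # Keep just the fields the agent needs downstream.
    return [(r.get("title", ""), r.get("href", "")) for r in results]
```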

🛠️ Tasks

  • User-defined triggers
  • Configurable actions
  • Designed for automation and repetition
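A user-defined task reduces to a trigger mapped to an action. The field names and matching rule below are illustrative; Boddi's actual task schema may differ:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    """A user-defined task: a trigger phrase mapped to an action."""
    trigger: str               # phrase that fires the task
    action: Callable[[], str]  # configurable action to run

def match_task(utterance: str, tasks: list):
    """Return the first task whose trigger appears in the utterance."""
    text = utterance.lower()
    return next((t for t in tasks if t.trigger in text), None)
```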

🧰 Tech Stack

| Layer          | Technology            |
|----------------|-----------------------|
| Language       | Python 3.9+           |
| LLM Runtime    | Ollama                |
| STT            | Whisper (offline)     |
| VAD            | Silero VAD            |
| TTS            | Piper (semaine voice) |
| Audio Playback | simpleaudio           |
| UI             | Tkinter (v0.1)        |
| Web Search     | DDGS                  |
| Config         | YAML                  |

📁 Project Structure

boddi/
├── core/        # agent loop, state, intent logic
├── audio/       # mic, VAD, STT, TTS, clips
├── llm/         # Ollama integration
├── tools/       # web search, tasks
├── visual/      # expressions & UI
├── wake/        # wake word logic
├── assets/      # audio clips & defaults
├── scripts/     # dev utilities
└── cli.py       # entry point

🚀 Why Boddi Exists

Most assistants today are:

  • cloud-dependent
  • opaque
  • difficult to extend
  • privacy-invasive

Boddi exists to be:

  • local
  • transparent
  • hackable
  • respectful of users

It is meant to grow with the community, not be rewritten every version.


🤝 Contributing

Boddi is intentionally early and open.

Contributions are welcome in:

  • voice improvements
  • visual expressions
  • task templates
  • documentation
  • performance & stability

This project is built to be understood, not just used.


📌 Status

  • Version: v0.1
  • Scope: foundational runtime
  • Focus: correctness, architecture, extensibility

🧠 Final Note

Boddi is not trying to compete with cloud assistants. It is building something different:

A local, expressive, agentic AI companion — engineered properly from day one.
