Dulcet

A privacy-first, local voice-enabled AI assistant SDK. Connect to Claude, Gemini, or ChatGPT using entirely local, open-source components for speech processing.

Features

  • Privacy-first: Audio never leaves your device for STT/TTS processing
  • Low latency: Sub-second response initiation for natural conversation flow
  • Model-agnostic: Voice layer decoupled from LLM backend selection
  • Simple deployment: Minimal dependencies, runs on consumer hardware

Packages

Package         Description
dulcet          Python SDK with FastAPI WebSocket server
@dulcet/client  TypeScript browser client

Quick Start

Using Docker (recommended)

Docker provides a complete development environment with all dependencies pre-installed:

# Start the server (includes FFmpeg, models, everything)
docker compose up server

# Run Python tests
docker compose run --rm python-test

# Run client tests
docker compose run --rm client-test

# Open a Python dev shell
docker compose run --rm python-dev

# Open a client dev shell (watch mode)
docker compose run --rm client-dev

Configure API keys:

cp .env.example .env
# Edit .env with your API keys
docker compose up server
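The .env file holds the API key for whichever provider you select. The actual variable names are defined in .env.example; purely as an illustration (these names are not authoritative), a typical layout might be:

```
# Illustrative only: confirm the real variable names in .env.example
ANTHROPIC_API_KEY=...    # for --provider claude
GEMINI_API_KEY=...       # for --provider gemini
OPENAI_API_KEY=...       # for --provider openai
```

Only the key for the provider you run with is required.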

Manual Setup

Prerequisites

Install FFmpeg (required for speech processing):

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg libavcodec-dev libavformat-dev libavdevice-dev

Python Server

Using uv (recommended):

uv pip install "dulcet[speech]"

Or with pip:

pip install "dulcet[speech]"

Download the speech models (~500MB):

dulcet download

Start the server:

from dulcet import VoicePipeline, run_server

pipeline = VoicePipeline()
run_server(pipeline)

Or via CLI:

dulcet serve --provider claude

Browser Client

npm install @dulcet/client

import { DulcetClient } from "@dulcet/client";

const client = new DulcetClient({ url: "ws://localhost:8000/ws" });

client.on("transcript", ({ text }) => console.log("You:", text));
client.on("response", ({ text }) => console.log("Assistant:", text));

await client.connect();
await client.startListening();

Client Configuration

const client = new DulcetClient({
  url: "ws://localhost:8000/ws",

  // LLM settings (optional, can also be set server-side)
  provider: "claude",           // "claude" | "gemini" | "openai"
  model: "claude-sonnet-4-20250514",
  systemPrompt: "You are a helpful assistant.",

  // TTS voice
  voice: "en_US-lessac-medium",

  // Audio settings
  sampleRate: 16000,            // Default: 16000

  // Reconnection settings
  autoReconnect: true,          // Default: true
  maxReconnectAttempts: 5,      // Default: 5
  reconnectDelay: 1000,         // Default: 1000ms
  maxReconnectDelay: 30000,     // Default: 30000ms

  // Debug mode
  debug: false,                 // Default: false
});
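The reconnection options above suggest a delay that grows between attempts and is capped at maxReconnectDelay. The client's actual internal backoff curve is not documented; a typical exponential-backoff helper consistent with these defaults might look like:

```typescript
// Hypothetical backoff helper: the client's real strategy may differ.
// This only illustrates how reconnectDelay, maxReconnectDelay, and
// maxReconnectAttempts could interact.
function reconnectDelayFor(
  attempt: number,      // 0-based attempt index
  baseDelay = 1000,     // reconnectDelay default
  maxDelay = 30000,     // maxReconnectDelay default
): number {
  // Double the delay each attempt, capped at maxDelay.
  return Math.min(baseDelay * 2 ** attempt, maxDelay);
}

// With maxReconnectAttempts = 5, attempts 0..4 wait:
// 1000, 2000, 4000, 8000, 16000 ms
```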

Client Methods

await client.connect();         // Connect to server
client.disconnect();            // Disconnect from server

await client.startListening();  // Start microphone capture
client.stopListening();         // Stop microphone capture

client.sendText("Hello");       // Send text directly (bypass STT)
client.interrupt();             // Stop current TTS playback

client.configure({              // Update settings at runtime
  provider: "openai",
  model: "gpt-4o",
  voice: "en_GB-alba-medium",
  systemPrompt: "New prompt",
});

// Properties
client.isConnected;             // boolean
client.isListening;             // boolean
client.status;                  // "disconnected" | "connecting" | "connected" | "reconnecting"

Client Events

client.on("connected", () => {});
client.on("disconnected", (reason) => {});
client.on("reconnecting", (attempt) => {});

client.on("transcript", ({ text, isFinal }) => {});  // User speech transcription
client.on("response", ({ text, isFinal }) => {});    // LLM response text

client.on("status", ({ state }) => {});              // "listening" | "processing" | "speaking"
client.on("audioStart", () => {});                   // TTS playback started
client.on("audioEnd", () => {});                     // TTS playback ended

client.on("error", ({ message, code }) => {});
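Both transcript and response carry isFinal, so partial text is replaced until a final version lands. A sketch of accumulating a conversation log from these events, using only the payload shapes documented above (the makeLog helper itself is illustrative, not part of the SDK):

```typescript
interface Turn { role: "user" | "assistant"; text: string }

// Builds a conversation log from transcript/response events,
// overwriting the in-progress turn until isFinal arrives.
function makeLog() {
  const turns: Turn[] = [];
  let pending: Turn | null = null;

  function update(role: Turn["role"], text: string, isFinal: boolean) {
    if (pending && pending.role === role) pending.text = text;
    else { pending = { role, text }; turns.push(pending); }
    if (isFinal) pending = null;  // next event starts a fresh turn
  }

  return {
    onTranscript: (p: { text: string; isFinal: boolean }) =>
      update("user", p.text, p.isFinal),
    onResponse: (p: { text: string; isFinal: boolean }) =>
      update("assistant", p.text, p.isFinal),
    turns,
  };
}
```

Wire it up with client.on("transcript", log.onTranscript) and client.on("response", log.onResponse).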

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Frontend (Browser)                       │
│                                                                 │
│   ┌─────────┐      ┌─────────┐      ┌─────────────────────┐     │
│   │   Mic   │ ───▶ │   VAD   │ ───▶ │  WebSocket Client   │     │
│   └─────────┘      └─────────┘      └──────────┬──────────┘     │
│                                                │                │
│   ┌─────────┐                                  │                │
│   │ Speaker │ ◀────────────────────────────────┘                │
│   └─────────┘                                                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 │ WebSocket
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Backend (Python/FastAPI)                    │
│                                                                 │
│   ┌───────────────┐    ┌───────────────┐    ┌──────────────┐    │
│   │ faster-whisper│───▶│  LLM Router   │───▶│    Piper     │    │
│   │     (STT)     │    │               │    │    (TTS)     │    │
│   └───────────────┘    └───────┬───────┘    └──────────────┘    │
│                                │                                │
│                    ┌───────────┼───────────┐                    │
│                    ▼           ▼           ▼                    │
│                 Claude     Gemini      ChatGPT                  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
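The VAD stage in the frontend decides which microphone frames are worth sending over the WebSocket, so silence never leaves the device. The client ships its own detector; purely as an illustration of the idea, an energy-threshold gate over PCM frames could look like:

```typescript
// Illustrative energy-based voice activity detector. The actual VAD
// used by @dulcet/client is not specified here; this only sketches
// the Mic -> VAD -> WebSocket stage from the diagram above.
function isSpeech(frame: Float32Array, threshold = 0.01): boolean {
  // Root-mean-square energy of the frame.
  let sum = 0;
  for (const s of frame) sum += s * s;
  const rms = Math.sqrt(sum / frame.length);
  return rms > threshold;
}
```

A capture loop would then forward only frames where isSpeech(frame) is true, dropping the rest before they reach the network.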

CLI Commands

# Download speech models (required before first run)
dulcet download
dulcet download --stt-model large-v3    # Use larger Whisper model
dulcet download --tts-voice en_GB-alba-medium  # Different voice

# Start the server
dulcet serve
dulcet serve --provider openai --port 3000
dulcet serve --reload  # Auto-reload for development

# Validate API keys
dulcet validate

Components

  • faster-whisper: local speech-to-text (STT)
  • Piper: local text-to-speech (TTS)
  • Browser VAD: voice activity detection, so only speech is sent to the server
  • LLM Router: dispatches requests to Claude, Gemini, or ChatGPT

Requirements

Minimum (CPU-only)

  • 4-core CPU (Intel i5 / AMD Ryzen 5 or better)
  • 8 GB RAM
  • 2 GB disk space for models

Recommended (GPU)

  • NVIDIA GPU with 6+ GB VRAM (RTX 3060 or better)
  • 16 GB RAM
  • 4 GB disk space for models

License

MIT
