SitoSt/JotaClient

JotaClient

A C++ voice assistant client designed for multi-platform deployment, from desktop to embedded systems (Raspberry Pi, ESP32, Arduino).

Overview

JotaClient is the client component of the Jota ecosystem, an AI-powered voice assistant platform. The current implementation is a desktop prototype with a planned path to embedded deployment. It demonstrates the core architecture for:

  • Real-time audio capture and streaming
  • WebSocket-based communication with transcription server
  • Automatic silence detection
  • Dual-mode operation (file recording + real-time streaming)

The system is designed to eventually run on embedded hardware (Raspberry Pi, ESP32, Arduino), enabling voice-controlled smart home and IoT applications.

Architecture

graph TB
    subgraph Client["JotaClient (C++)"]
        Mic[Microphone] --> AudioListener[AudioListener<br/>PortAudio]
        AudioListener --> FileMode[File Recording]
        AudioListener --> StreamMode[Real-time Streaming]
        FileMode --> WAV[WAV Files]
        StreamMode --> Callback[Audio Callback]
    end
    
    subgraph Server["Transcription Server (Future)"]
        WS[WebSocket Server] --> Whisper[Whisper.cpp<br/>Speech-to-Text]
        Whisper --> LLM[LLM Integration<br/>GPT/Local Model]
        LLM --> TTS[Text-to-Speech]
        TTS --> Response[Audio Response]
    end
    
    Callback -->|Audio Chunks| WS
    Response -->|Audio Stream| Client
    
    style Server fill:#f0f0f0,stroke:#999,stroke-dasharray: 5 5

Features

Current Implementation (Desktop Prototype)

  • Audio Capture: 16 kHz mono audio recording using PortAudio
  • Dual Mode Operation:
    • File-only: Record to WAV file
    • Stream-only: Real-time audio streaming via callbacks
    • Both: Simultaneous recording and streaming
  • Silence Detection: Automatic recording stop after 1 second of silence
  • Real-time Visualization: Volume level meter during recording
  • Exception Handling: Custom exception hierarchy for robust error management
  • Path Management: Organized data storage with automatic directory creation

Planned Features (Arduino Adaptation)

  • 🔄 ESP32/Arduino Support: Port to embedded hardware
  • 🔄 WebSocket Client: Real-time audio streaming to server
  • 🔄 Server Integration: Whisper.cpp transcription service
  • 🔄 LLM Integration: GPT or local model for intelligent responses
  • 🔄 TTS Playback: Audio response playback on device
  • 🔄 Power Management: Optimized for battery-powered operation

Project Structure

JotaClient/
├── src/
│   ├── AudioListener.h/cpp       # Audio capture and streaming
│   ├── TranscriptionWrapper.h/cpp # WebSocket client (in progress)
│   ├── main.cpp                   # Entry point with visualization
│   ├── exceptions/
│   │   ├── JotaException.h/cpp    # Base exception class
│   │   ├── AudioExceptions.h      # Audio-specific exceptions
│   │   └── FileExceptions.h       # File I/O exceptions
│   ├── types/
│   │   ├── DataTypes.h            # Data type enumerations
│   │   └── ErrorCodes.h           # Error code definitions
│   └── utils/
│       ├── PathManager.h/cpp      # Path resolution utilities
│       └── DataManager.h/cpp      # Data organization
├── scripts/
│   └── download_models.sh         # Download Whisper and VAD models
├── CMakeLists.txt                 # Build configuration
└── README.md                      # This file

Dependencies

Required

  • CMake 3.16 or higher
  • C++17 compatible compiler (GCC, Clang, MSVC)
  • PortAudio: Cross-platform audio I/O library
  • libwebsockets: WebSocket client/server library (for future streaming)

macOS Installation

brew install cmake portaudio libwebsockets

Linux Installation

sudo apt-get install cmake libportaudio2 portaudio19-dev libwebsockets-dev

Windows Installation

Use vcpkg or build dependencies from source.

Building

Clone and Build

git clone <repository-url>
cd JotaClient
mkdir build && cd build
cmake ..
make

Build Options

# Debug build with symbols
cmake .. -DCMAKE_BUILD_TYPE=Debug

# Release build with optimizations
cmake .. -DCMAKE_BUILD_TYPE=Release

# Strict warnings (recommended for development; GCC/Clang flags, MSVC would use /W4 /WX instead)
cmake .. -DCMAKE_CXX_FLAGS="-Wall -Wextra -Werror"

Usage

Basic Recording

./JotaClient

The application will:

  1. Start recording audio from the default microphone
  2. Display real-time volume visualization
  3. Automatically stop after 1 second of silence
  4. Save the recording to data/audio_recordings/YYYY-MM-DD_HH-MM-SS.wav

Press q + Enter to stop recording manually.
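The Technical Details section states that audio is held as 32-bit float internally and written as 16-bit PCM WAV. The following is a minimal sketch of that conversion and of a canonical 44-byte WAV header, not the project's actual writer. The function names floatToPcm16, appendLe, and buildWav are hypothetical, and the sketch assumes samples are normalized to [-1.0, 1.0].

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <string>
#include <vector>

// Convert normalized float samples in [-1.0, 1.0] to signed 16-bit PCM.
// Out-of-range input is clamped rather than allowed to wrap around.
std::vector<int16_t> floatToPcm16(const std::vector<float>& samples) {
    std::vector<int16_t> pcm;
    pcm.reserve(samples.size());
    for (float s : samples) {
        const float clamped = std::clamp(s, -1.0f, 1.0f);
        pcm.push_back(static_cast<int16_t>(std::lround(clamped * 32767.0f)));
    }
    return pcm;
}

// Append the low `bytes` bytes of `value` in little-endian order.
static void appendLe(std::string& out, uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
    }
}

// Build a complete in-memory WAV image: 44-byte header followed by PCM data.
// Defaults match the README's audio specification (16 kHz, mono).
std::string buildWav(const std::vector<int16_t>& pcm,
                     uint32_t sampleRate = 16000,
                     uint16_t channels = 1) {
    const uint16_t bitsPerSample = 16;
    const uint16_t blockAlign = static_cast<uint16_t>(channels * bitsPerSample / 8);
    const uint32_t byteRate = sampleRate * blockAlign;
    const uint32_t dataBytes = static_cast<uint32_t>(pcm.size() * sizeof(int16_t));

    std::string wav;
    wav.reserve(44 + dataBytes);
    wav += "RIFF";
    appendLe(wav, 36 + dataBytes, 4);  // size of everything after this field
    wav += "WAVE";
    wav += "fmt ";
    appendLe(wav, 16, 4);              // fmt chunk size for plain PCM
    appendLe(wav, 1, 2);               // audio format 1 = uncompressed PCM
    appendLe(wav, channels, 2);
    appendLe(wav, sampleRate, 4);
    appendLe(wav, byteRate, 4);
    appendLe(wav, blockAlign, 2);
    appendLe(wav, bitsPerSample, 2);
    wav += "data";
    appendLe(wav, dataBytes, 4);
    for (int16_t s : pcm) {
        appendLe(wav, static_cast<uint16_t>(s), 2);
    }
    return wav;
}
```

The returned string can be written to disk with a std::ofstream opened in binary mode. Scaling by 32767 rather than 32768 keeps +1.0 inside the int16_t range, at the cost of never producing -32768.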

Audio Modes

The AudioListener class supports three modes:

AudioListener listener;

// File only - record to WAV file
listener.start(AudioMode::FILE_ONLY);

// Stream only - real-time callback processing
listener.setCallback([](const std::vector<float>& chunk) {
    // Process audio chunk
});
listener.start(AudioMode::STREAM_ONLY);

// Both - record AND stream simultaneously
listener.start(AudioMode::BOTH);
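To illustrate how the three modes could decide where each captured chunk goes, here is a small sketch. Only the AudioMode names come from this README; ChunkRouter, onChunk, and recorded are hypothetical and say nothing about how AudioListener is actually implemented.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Mode names taken from the README; everything else here is hypothetical.
enum class AudioMode { FILE_ONLY, STREAM_ONLY, BOTH };

using ChunkCallback = std::function<void(const std::vector<float>&)>;

// Hypothetical router: decides where each captured chunk is delivered.
class ChunkRouter {
public:
    ChunkRouter(AudioMode mode, ChunkCallback callback)
        : mode_(mode), callback_(std::move(callback)) {}

    // Called once per captured buffer.
    void onChunk(const std::vector<float>& chunk) {
        if (mode_ != AudioMode::STREAM_ONLY) {
            // File path: keep the samples so they can be written out later.
            recorded_.insert(recorded_.end(), chunk.begin(), chunk.end());
        }
        if (mode_ != AudioMode::FILE_ONLY && callback_) {
            // Stream path: hand the chunk to the consumer right away.
            callback_(chunk);
        }
    }

    const std::vector<float>& recorded() const { return recorded_; }

private:
    AudioMode mode_;
    ChunkCallback callback_;
    std::vector<float> recorded_;
};
```

In a real PortAudio callback, growing a vector like this is best avoided because memory allocation can block the audio thread. A preallocated ring buffer drained by a worker thread is the usual alternative.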

Development

Code Style

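This project loosely follows the Google C++ Style Guide, with project-specific naming conventions where the two differ:

  • Class names: PascalCase
  • Function names: camelCase (the Google guide uses PascalCase)
  • Constants: UPPER_SNAKE_CASE (the Google guide uses kPascalCase)
  • Private members: trailing underscore (member_)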

Exception Handling

Custom exception hierarchy based on JotaException:

  • AudioDeviceException: Audio hardware errors
  • AudioStreamException: Audio stream errors
  • FileWriteException: File I/O errors
  • FileReadException: File reading errors

All exceptions include error codes and contextual information.
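As a rough illustration of such a hierarchy, the sketch below shows a base class carrying an error code and context, with two of the derived types listed above. The class names come from this README. The ErrorCode values, constructor signatures, message format, and describeFailure helper are assumptions, and the real definitions in src/exceptions/ and src/types/ErrorCodes.h may differ.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical error codes; the real values live in types/ErrorCodes.h.
enum class ErrorCode { AudioDevice = 100, AudioStream = 101, FileWrite = 200, FileRead = 201 };

// Base class: carries an error code plus free-form context in what().
class JotaException : public std::runtime_error {
public:
    JotaException(ErrorCode code, const std::string& message, const std::string& context)
        : std::runtime_error(message + " [" + context + "]"), code_(code) {}

    ErrorCode code() const noexcept { return code_; }

private:
    ErrorCode code_;
};

class AudioDeviceException : public JotaException {
public:
    explicit AudioDeviceException(const std::string& context)
        : JotaException(ErrorCode::AudioDevice, "Audio device error", context) {}
};

class FileWriteException : public JotaException {
public:
    explicit FileWriteException(const std::string& context)
        : JotaException(ErrorCode::FileWrite, "File write error", context) {}
};

// Example handler: catch by the base type and report the numeric code.
int describeFailure(bool failAudio) {
    try {
        if (failAudio) throw AudioDeviceException("default input device");
        throw FileWriteException("data/audio_recordings");
    } catch (const JotaException& e) {
        return static_cast<int>(e.code());
    }
}
```

Catching by const reference to the base type lets callers handle all project errors in one place while still being able to branch on code().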

Adding New Features

  1. Create feature branch
  2. Follow existing code structure
  3. Add appropriate exception handling
  4. Update documentation
  5. Test thoroughly before merging

Future Roadmap

Phase 1: Audio Streaming & Transcription (Current Priority)

  • Implement audio streaming via WebSocket
  • Send audio chunks in real-time to server
  • Receive transcribed text from server
  • Display transcription results
  • Handle reconnection and error cases
  • Integrate Porcupine Wakeword Detection (Next Step)

Important

Preliminary Version: The current architecture is designed to be wakeword-only. The client starts in a LISTENING state and waits for a wakeword trigger (currently a placeholder) before it connects to the server and streams audio. Because no audio leaves the device before the trigger, this saves bandwidth and protects privacy.
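The wakeword-gated flow described here can be sketched as a two-state machine. Only the LISTENING state is named in this README. The STREAMING state, the event names, and the functions nextState and shouldSendAudio are hypothetical and shown purely for illustration.

```cpp
// LISTENING is named in the README; STREAMING and the events are assumed.
enum class ClientState { LISTENING, STREAMING };
enum class ClientEvent { WAKEWORD_DETECTED, SILENCE_TIMEOUT, CONNECTION_LOST };

// Pure transition function: returns the state that follows `current`
// when `event` occurs. Unhandled events leave the state unchanged.
ClientState nextState(ClientState current, ClientEvent event) {
    switch (current) {
        case ClientState::LISTENING:
            // Only the wakeword opens the connection and starts streaming.
            return event == ClientEvent::WAKEWORD_DETECTED ? ClientState::STREAMING
                                                           : ClientState::LISTENING;
        case ClientState::STREAMING:
            // End of speech or a dropped connection returns to listening.
            if (event == ClientEvent::SILENCE_TIMEOUT ||
                event == ClientEvent::CONNECTION_LOST) {
                return ClientState::LISTENING;
            }
            return ClientState::STREAMING;
    }
    return current;
}

// Audio chunks are forwarded to the server only while streaming.
bool shouldSendAudio(ClientState state) {
    return state == ClientState::STREAMING;
}
```

Keeping the transition logic in a pure function makes it easy to unit test without a microphone or a server.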

Note: The server will handle Whisper transcription. LLM orchestration and TTS will be implemented later in a separate orchestrator component.

Phase 2: Raspberry Pi Port

  • Port to Raspberry Pi platform
  • Optimize for ARM architecture
  • Power management optimization
  • Create installation script

Phase 3: ESP32 Adaptation

  • Port to ESP32 platform
  • Replace PortAudio with I2S microphone
  • Optimize memory usage for embedded systems
  • Implement WiFi configuration
  • Add OTA update support

Phase 4: Arduino Consideration

  • Evaluate feasibility for Arduino Nano/Uno
  • Extreme memory optimization if viable
  • Consider ESP32 as primary embedded target

Phase 5: Smart Home Integration

  • Home automation commands
  • Multi-device coordination
  • Cloud synchronization
  • Mobile app companion

Technical Details

Audio Specifications

  • Sample Rate: 16,000 Hz (optimal for speech recognition)
  • Channels: Mono
  • Format: 32-bit float (internal), 16-bit PCM (WAV output)
  • Buffer Size: 512 frames
  • Silence Threshold: 0.005 RMS
  • Silence Timeout: 1000ms
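With these values, one 512-frame buffer covers 512 / 16,000 = 32 ms of audio, so a 1000 ms timeout corresponds to 16,000 consecutive silent samples. The sketch below shows one way an RMS-based silence detector could use the threshold and timeout listed above. The names computeRms and SilenceDetector are hypothetical, and the project's own detector may be structured differently.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Root-mean-square level of one chunk; returns 0 for an empty chunk.
float computeRms(const std::vector<float>& chunk) {
    if (chunk.empty()) return 0.0f;
    double sumSquares = 0.0;
    for (float s : chunk) {
        sumSquares += static_cast<double>(s) * s;
    }
    return static_cast<float>(std::sqrt(sumSquares / chunk.size()));
}

// Hypothetical detector; defaults mirror the README's audio specification.
class SilenceDetector {
public:
    SilenceDetector(float threshold = 0.005f, int timeoutMs = 1000, int sampleRate = 16000)
        : threshold_(threshold),
          timeoutSamples_(static_cast<std::size_t>(timeoutMs) * sampleRate / 1000) {}

    // Feed one chunk. Returns true once uninterrupted silence has lasted
    // at least the configured timeout; any loud chunk resets the count.
    bool feed(const std::vector<float>& chunk) {
        if (computeRms(chunk) < threshold_) {
            silentSamples_ += chunk.size();
        } else {
            silentSamples_ = 0;
        }
        return silentSamples_ >= timeoutSamples_;
    }

private:
    float threshold_;
    std::size_t timeoutSamples_;
    std::size_t silentSamples_ = 0;
};
```

Because the count advances one whole buffer at a time, this sketch fires on the 32nd consecutive silent buffer (16,384 samples, about 1024 ms) rather than at exactly 1000 ms.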

Performance

  • CPU Usage: ~5-10% on modern desktop (recording + visualization)
  • Memory: ~10MB typical usage
  • Latency: <50ms audio processing latency

Troubleshooting

Audio Device Not Found

Error: Failed to initialize PortAudio

Solution: Check that your microphone is connected and not in use by another application.

Build Errors

fatal error: portaudio.h: No such file or directory

Solution: Install PortAudio development headers (see Dependencies section).

Permission Denied (macOS)

Error: Failed to open audio stream

Solution: Grant microphone permissions in System Preferences → Security & Privacy → Privacy → Microphone (on macOS Ventura and later: System Settings → Privacy & Security → Microphone).

Contributing

This is a personal portfolio project by Sito.

Author

Sito


Note: This is currently a desktop prototype. The architecture is designed to be portable to Raspberry Pi, ESP32, and potentially Arduino. See the Future Roadmap section for the platform-specific changes each port is expected to need.
