JotaClient

A C++ voice assistant client designed for multi-platform deployment, from desktop to embedded systems (Raspberry Pi, ESP32, Arduino).

Overview

JotaClient is the client component of the Jota ecosystem, an AI-powered voice assistant platform. This implementation is a desktop prototype with a clear path to embedded deployment. It demonstrates the core architecture for:

Real-time audio capture and streaming
WebSocket-based communication with a transcription server (in progress)
Automatic silence detection
Dual-mode operation (file recording + real-time streaming)

The system is designed to eventually run on embedded hardware (Raspberry Pi, ESP32, Arduino), enabling voice-controlled smart home and IoT applications.

Architecture

graph TB
    subgraph Client["JotaClient (C++)"]
        Mic[Microphone] --> AudioListener[AudioListener<br/>PortAudio]
        AudioListener --> FileMode[File Recording]
        AudioListener --> StreamMode[Real-time Streaming]
        FileMode --> WAV[WAV Files]
        StreamMode --> Callback[Audio Callback]
    end
    
    subgraph Server["Transcription Server (Future)"]
        WS[WebSocket Server] --> Whisper[Whisper.cpp<br/>Speech-to-Text]
        Whisper --> LLM[LLM Integration<br/>GPT/Local Model]
        LLM --> TTS[Text-to-Speech]
        TTS --> Response[Audio Response]
    end
    
    Callback -->|Audio Chunks| WS
    Response -->|Audio Stream| Client
    
    style Server fill:#f0f0f0,stroke:#999,stroke-dasharray: 5 5

Features

Current Implementation (Desktop Prototype)

✅ Audio Capture: High-quality 16kHz mono audio recording using PortAudio
✅ Dual Mode Operation:
- File-only: Record to WAV file
- Stream-only: Real-time audio streaming via callbacks
- Both: Simultaneous recording and streaming
✅ Silence Detection: Recording stops automatically after 1 second of silence
✅ Real-time Visualization: Volume level meter during recording
✅ Exception Handling: Custom exception hierarchy for robust error management
✅ Path Management: Organized data storage with automatic directory creation
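
As an illustration of the path management feature, the following is a minimal sketch of how a timestamped recording path could be built with automatic directory creation. The function names `formatTimestamp` and `makeRecordingPath` are hypothetical and are not taken from PathManager; only the `YYYY-MM-DD_HH-MM-SS.wav` naming pattern comes from this README.

```cpp
#include <ctime>
#include <filesystem>
#include <iomanip>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Formats a timestamp as YYYY-MM-DD_HH-MM-SS in local time.
std::string formatTimestamp(std::time_t t) {
    std::tm tm{};
#if defined(_WIN32)
    localtime_s(&tm, &t);   // thread-safe variant on Windows
#else
    localtime_r(&t, &tm);   // thread-safe variant on POSIX
#endif
    std::ostringstream out;
    out << std::put_time(&tm, "%Y-%m-%d_%H-%M-%S");
    return out.str();
}

// Hypothetical helper: creates baseDir if needed and returns
// <baseDir>/<timestamp>.wav
fs::path makeRecordingPath(const fs::path& baseDir, std::time_t now) {
    fs::create_directories(baseDir);  // does nothing if it already exists
    return baseDir / (formatTimestamp(now) + ".wav");
}
```

A call such as `makeRecordingPath("data/audio_recordings", std::time(nullptr))` would then yield a path in the documented format.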

Planned Features (Embedded Adaptation)

🔄 ESP32/Arduino Support: Port to embedded hardware
🔄 WebSocket Client: Real-time audio streaming to server
🔄 Server Integration: Whisper.cpp transcription service
🔄 LLM Integration: GPT or local model for intelligent responses
🔄 TTS Playback: Audio response playback on device
🔄 Power Management: Optimized for battery-powered operation

Project Structure

JotaClient/
├── src/
│   ├── AudioListener.h/cpp       # Audio capture and streaming
│   ├── TranscriptionWrapper.h/cpp # WebSocket client (in progress)
│   ├── main.cpp                   # Entry point with visualization
│   ├── exceptions/
│   │   ├── JotaException.h/cpp    # Base exception class
│   │   ├── AudioExceptions.h      # Audio-specific exceptions
│   │   └── FileExceptions.h       # File I/O exceptions
│   ├── types/
│   │   ├── DataTypes.h            # Data type enumerations
│   │   └── ErrorCodes.h           # Error code definitions
│   └── utils/
│       ├── PathManager.h/cpp      # Path resolution utilities
│       └── DataManager.h/cpp      # Data organization
├── scripts/
│   └── download_models.sh         # Download Whisper and VAD models
├── CMakeLists.txt                 # Build configuration
└── README.md                      # This file

Dependencies

Required

CMake 3.16 or higher
C++17 compatible compiler (GCC, Clang, MSVC)
PortAudio: Cross-platform audio I/O library
libwebsockets: WebSocket client/server library (for future streaming)

macOS Installation

brew install cmake portaudio libwebsockets

Linux Installation

sudo apt-get install cmake libportaudio2 portaudio19-dev libwebsockets-dev

Windows Installation

Install the dependencies with vcpkg (ports portaudio and libwebsockets) or build them from source.

Building

Clone and Build

git clone <repository-url>
cd JotaClient
mkdir build && cd build
cmake ..
make

Build Options

# Debug build with symbols
cmake .. -DCMAKE_BUILD_TYPE=Debug

# Release build with optimizations
cmake .. -DCMAKE_BUILD_TYPE=Release

# Strict warnings (recommended for development; these flags are for GCC/Clang)
cmake .. -DCMAKE_CXX_FLAGS="-Wall -Wextra -Werror"

Usage

Basic Recording

./JotaClient

The application will:

Start recording audio from the default microphone
Display real-time volume visualization
Automatically stop after 1 second of silence
Save the recording to data/audio_recordings/YYYY-MM-DD_HH-MM-SS.wav

Press q + Enter to stop recording manually.
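
The automatic stop can be sketched as an RMS check per buffer plus a counter of consecutive silent frames. This is an assumption about how such a detector might look, not the code in AudioListener; the class name `SilenceDetector` is hypothetical. With the values listed under Audio Specifications (0.005 RMS, 16,000 Hz, 1000 ms, 512-frame buffers), one buffer lasts 32 ms, so the timeout is reached on the 32nd consecutive silent buffer.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Root-mean-square level of one buffer of float samples in [-1, 1].
float computeRms(const std::vector<float>& samples) {
    if (samples.empty()) return 0.0f;
    double sumSquares = 0.0;
    for (float s : samples) sumSquares += static_cast<double>(s) * s;
    return static_cast<float>(std::sqrt(sumSquares / samples.size()));
}

// Hypothetical class: tracks how long the input has stayed below the threshold.
class SilenceDetector {
public:
    SilenceDetector(float threshold, int sampleRate, int timeoutMs)
        : threshold_(threshold),
          timeoutFrames_(static_cast<std::size_t>(sampleRate) * timeoutMs / 1000) {}

    // Feed one buffer; returns true once silence has lasted the full timeout.
    bool process(const std::vector<float>& buffer) {
        if (computeRms(buffer) < threshold_) {
            silentFrames_ += buffer.size();
        } else {
            silentFrames_ = 0;  // any loud buffer restarts the countdown
        }
        return silentFrames_ >= timeoutFrames_;
    }

private:
    float threshold_;
    std::size_t timeoutFrames_;
    std::size_t silentFrames_ = 0;
};
```

A real detector may also want to ignore silence before the first speech is heard, so that recording does not stop before the user starts talking; this sketch does not handle that case.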

Audio Modes

The AudioListener class supports three modes:

#include "AudioListener.h"
#include <vector>

AudioListener listener;

// File only - record to WAV file
listener.start(AudioMode::FILE_ONLY);

// Stream only - real-time callback processing
listener.setCallback([](const std::vector<float>& chunk) {
    // Process audio chunk
});
listener.start(AudioMode::STREAM_ONLY);

// Both - record AND stream simultaneously
listener.start(AudioMode::BOTH);
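
If the callback runs on the audio capture thread (an assumption; this README does not say which thread invokes it), heavy work should be moved off that thread. The following is a minimal sketch of a hand-off queue; `ChunkQueue` is a hypothetical helper and not part of JotaClient.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: passes audio chunks from the capture callback
// to a worker thread.
class ChunkQueue {
public:
    void push(std::vector<float> chunk) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            chunks_.push(std::move(chunk));
        }
        ready_.notify_one();
    }

    // Blocks until a chunk is available or close() was called.
    // An empty optional means the queue is closed and fully drained.
    std::optional<std::vector<float>> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !chunks_.empty(); });
        if (chunks_.empty()) return std::nullopt;
        std::vector<float> chunk = std::move(chunks_.front());
        chunks_.pop();
        return chunk;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::vector<float>> chunks_;
    bool closed_ = false;
};
```

The callback would then reduce to `listener.setCallback([&queue](const std::vector<float>& chunk) { queue.push(chunk); });`, with a worker thread looping on `queue.pop()`. Locking a mutex inside a real-time audio callback can cause glitches under load, so a lock-free ring buffer may be a better fit once latency matters.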

Development

Code Style

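This project is based on the Google C++ Style Guide, with the following project-specific naming conventions (camelCase functions and UPPER_SNAKE_CASE constants differ from the guide):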

Class names: PascalCase
Function names: camelCase
Constants: UPPER_SNAKE_CASE
Private members: trailing underscore (member_)

Exception Handling

All custom exceptions derive from JotaException:

AudioDeviceException: Audio hardware errors
AudioStreamException: Audio stream errors
FileWriteException: File write errors
FileReadException: File read errors

All exceptions include error codes and contextual information.
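
A minimal sketch of what such a hierarchy might look like follows. The class names come from the list above, but the constructor signatures, the `code()` accessor, and the `ErrorCode` values are assumptions made for illustration; the real definitions live in src/exceptions and src/types/ErrorCodes.h.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical error codes; the real ones are defined in types/ErrorCodes.h.
enum class ErrorCode {
    AUDIO_DEVICE_ERROR,
    AUDIO_STREAM_ERROR,
    FILE_WRITE_ERROR,
    FILE_READ_ERROR
};

// Base class: carries an error code plus optional context in the message.
class JotaException : public std::runtime_error {
public:
    JotaException(ErrorCode code, const std::string& message,
                  const std::string& context = "")
        : std::runtime_error(context.empty() ? message
                                             : message + " [" + context + "]"),
          code_(code) {}

    ErrorCode code() const noexcept { return code_; }

private:
    ErrorCode code_;
};

class AudioDeviceException : public JotaException {
public:
    explicit AudioDeviceException(const std::string& message,
                                  const std::string& context = "")
        : JotaException(ErrorCode::AUDIO_DEVICE_ERROR, message, context) {}
};

class FileWriteException : public JotaException {
public:
    explicit FileWriteException(const std::string& message,
                                const std::string& context = "")
        : JotaException(ErrorCode::FILE_WRITE_ERROR, message, context) {}
};
```

Callers can catch `const JotaException&` to handle every project error in one place, or catch a derived type when only one failure mode is recoverable.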

Adding New Features

Create feature branch
Follow existing code structure
Add appropriate exception handling
Update documentation
Test thoroughly before merging

Future Roadmap

Phase 1: Audio Streaming & Transcription (Current Priority)

Implement audio streaming via WebSocket
Send audio chunks in real-time to server
Receive transcribed text from server
Display transcription results
Handle reconnection and error cases
Integrate Porcupine Wakeword Detection (Next Step)
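
For the reconnection item above, one common approach is exponential backoff: wait longer after each failed attempt, up to a cap. The sketch below is only an illustration of that policy; `ReconnectBackoff` is a hypothetical class, and the delays are example values, not project settings.

```cpp
#include <algorithm>
#include <chrono>

// Hypothetical policy: doubles the delay after each failed attempt, up to a cap.
class ReconnectBackoff {
public:
    ReconnectBackoff(std::chrono::milliseconds initial,
                     std::chrono::milliseconds max)
        : initial_(initial), max_(max), next_(initial) {}

    // Returns the delay to wait before the next attempt, then doubles it.
    std::chrono::milliseconds nextDelay() {
        std::chrono::milliseconds delay = next_;
        next_ = std::min<std::chrono::milliseconds>(next_ * 2, max_);
        return delay;
    }

    // Call after a successful connection so the next failure starts small again.
    void reset() { next_ = initial_; }

private:
    std::chrono::milliseconds initial_;
    std::chrono::milliseconds max_;
    std::chrono::milliseconds next_;
};
```

With an initial delay of 500 ms and a cap of 4000 ms, successive failures wait 500, 1000, 2000, 4000, 4000 ms. Adding a small random jitter to each delay helps when several clients reconnect at once.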

Important

Preliminary Version: The current architecture is designed to be wakeword-only. The client starts in a LISTENING state and waits for a wakeword trigger (currently a placeholder) before it connects to the server and streams audio. Because no audio leaves the device until the wakeword fires, this saves bandwidth and protects privacy.

Note: The server will handle Whisper transcription. LLM orchestration and TTS will be implemented later in a separate orchestrator component.
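
The wakeword-only flow can be pictured as a small state machine. Only the LISTENING state is named in this README; the CONNECTING and STREAMING states and all event names below are assumptions made for this sketch.

```cpp
// LISTENING is from the README; the other states are hypothetical.
enum class ClientState { LISTENING, CONNECTING, STREAMING };

// Hypothetical events that drive the transitions.
enum class ClientEvent {
    WAKEWORD_DETECTED,
    CONNECTED,
    UTTERANCE_ENDED,
    CONNECTION_FAILED
};

// Returns the state that follows `state` when `event` occurs.
// Events that do not apply to the current state leave it unchanged.
ClientState nextState(ClientState state, ClientEvent event) {
    switch (state) {
        case ClientState::LISTENING:
            if (event == ClientEvent::WAKEWORD_DETECTED) return ClientState::CONNECTING;
            break;
        case ClientState::CONNECTING:
            if (event == ClientEvent::CONNECTED) return ClientState::STREAMING;
            if (event == ClientEvent::CONNECTION_FAILED) return ClientState::LISTENING;
            break;
        case ClientState::STREAMING:
            if (event == ClientEvent::UTTERANCE_ENDED ||
                event == ClientEvent::CONNECTION_FAILED) {
                return ClientState::LISTENING;
            }
            break;
    }
    return state;
}
```

Keeping the transitions in one pure function makes it straightforward to test the flow without audio hardware or a server.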

Phase 2: Raspberry Pi Port

Port to Raspberry Pi platform
Optimize for ARM architecture
Power management optimization
Create installation script

Phase 3: ESP32 Adaptation

Port to ESP32 platform
Replace PortAudio with I2S microphone
Optimize memory usage for embedded systems
Implement WiFi configuration
Add OTA update support
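
Replacing PortAudio with an I2S microphone is easier if the rest of the client depends only on a small capture interface. The sketch below shows one possible shape for that seam; `AudioSource`, `BufferSource`, and `peakLevel` are hypothetical names and do not exist in the current code base. A PortAudio-backed class on desktop and an I2S-backed class on ESP32 would both implement the same interface.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical capture interface shared by all platforms.
class AudioSource {
public:
    virtual ~AudioSource() = default;
    // Writes up to maxFrames mono samples to out; returns the number written.
    virtual std::size_t read(float* out, std::size_t maxFrames) = 0;
};

// Stand-in source that replays a fixed buffer, useful for tests.
class BufferSource : public AudioSource {
public:
    explicit BufferSource(std::vector<float> samples)
        : samples_(std::move(samples)) {}

    std::size_t read(float* out, std::size_t maxFrames) override {
        std::size_t count = std::min(maxFrames, samples_.size() - position_);
        std::copy_n(samples_.begin() + static_cast<std::ptrdiff_t>(position_),
                    count, out);
        position_ += count;
        return count;
    }

private:
    std::vector<float> samples_;
    std::size_t position_ = 0;
};

// Platform-independent code depends only on the interface.
float peakLevel(AudioSource& source, std::size_t frames) {
    std::vector<float> buffer(frames);
    std::size_t count = source.read(buffer.data(), frames);
    float peak = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        peak = std::max(peak, std::fabs(buffer[i]));
    }
    return peak;
}
```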

Phase 4: Arduino Consideration

Evaluate feasibility for Arduino Nano/Uno
Extreme memory optimization if viable
Consider ESP32 as primary embedded target
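
A quick calculation shows why the Uno and classic Nano are hard targets. The ATmega328P on those boards has 2 KB of SRAM, and the current 512-frame buffer (see Audio Specifications) would fill all of it when stored as 32-bit floats. The sketch below only does that arithmetic; the constant and function names are illustrative.

```cpp
#include <cstddef>

constexpr std::size_t FRAMES_PER_BUFFER = 512;  // from Audio Specifications
constexpr std::size_t UNO_SRAM_BYTES = 2048;    // ATmega328P (Uno, classic Nano)

// Size in bytes of one mono buffer.
constexpr std::size_t bufferBytes(std::size_t frames, std::size_t bytesPerSample) {
    return frames * bytesPerSample;
}

// One buffer of 32-bit float samples, the current internal format.
constexpr std::size_t FLOAT_BUFFER_BYTES = bufferBytes(FRAMES_PER_BUFFER, 4);
// One buffer of 16-bit PCM samples.
constexpr std::size_t PCM16_BUFFER_BYTES = bufferBytes(FRAMES_PER_BUFFER, 2);

static_assert(FLOAT_BUFFER_BYTES == UNO_SRAM_BYTES,
              "a single float buffer consumes the entire SRAM");
static_assert(PCM16_BUFFER_BYTES == UNO_SRAM_BYTES / 2,
              "a 16-bit buffer still consumes half of the SRAM");
```

Even with 16-bit samples, one buffer leaves only 1 KB for the stack, networking, and everything else, which supports treating ESP32 as the primary embedded target.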

Phase 5: Smart Home Integration

Home automation commands
Multi-device coordination
Cloud synchronization
Mobile app companion

Technical Details

Audio Specifications

Sample Rate: 16,000 Hz (the input rate Whisper models expect)
Channels: Mono
Format: 32-bit float (internal), 16-bit PCM (WAV output)
Buffer Size: 512 frames
Silence Threshold: 0.005 RMS
Silence Timeout: 1000 ms
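
The conversion from the internal 32-bit float format to 16-bit PCM for WAV output can be sketched as follows. This is one common way to do it (clamp, scale by 32767, round) and is not necessarily what AudioListener does; `floatToPcm16` is a hypothetical name.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Converts float samples in [-1, 1] to signed 16-bit PCM.
// Out-of-range input is clamped rather than allowed to wrap around.
std::vector<std::int16_t> floatToPcm16(const std::vector<float>& samples) {
    std::vector<std::int16_t> pcm;
    pcm.reserve(samples.size());
    for (float s : samples) {
        float clamped = std::clamp(s, -1.0f, 1.0f);
        pcm.push_back(static_cast<std::int16_t>(std::lround(clamped * 32767.0f)));
    }
    return pcm;
}
```

Scaling by 32767 keeps the mapping symmetric around zero, at the cost of never producing -32768.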

Performance

CPU Usage: ~5-10% on modern desktop (recording + visualization)
Memory: ~10MB typical usage
Latency: <50 ms audio processing latency (one 512-frame buffer at 16 kHz lasts 32 ms)

Troubleshooting

Audio Device Not Found

Error: Failed to initialize PortAudio

Solution: Check that your microphone is connected and not in use by another application.

Build Errors

fatal error: portaudio.h: No such file or directory

Solution: Install PortAudio development headers (see Dependencies section).

Permission Denied (macOS)

Error: Failed to open audio stream

Solution: Grant microphone permissions in System Preferences → Security & Privacy → Privacy → Microphone. On macOS 13 and later, the path is System Settings → Privacy & Security → Microphone.

Contributing

This is a personal portfolio project by Sito.

Author

Sito

Note: This is currently a desktop prototype. The architecture is intended to be portable to Raspberry Pi, ESP32, and potentially Arduino. The platform-specific changes this requires, such as replacing PortAudio with an I2S microphone on ESP32, are listed under Future Roadmap.
