A Python monolith for speaker diarization, audio transcription, and LLM-powered conversation analysis. Provides two CLI applications: pawn-diarize for audio processing and embedding management, and pawn-agent for conversational analysis of recorded sessions.
- Speaker Diarization: Identify and separate multiple speakers using pyannote.audio
- Audio Transcription: Convert speech to text via NeMo Parakeet or faster-whisper backends
- Combined Transcription & Diarization: Transcribe and label speakers in one command
- Multi-file Processing: Process multiple files as one conversation with cross-file speaker alignment
- Session Accumulation: Process long conversations in parts across multiple invocations
- Speaker Embeddings: Extract and store 512-dim speaker vectors in PostgreSQL with pgvector
- Conversation Analysis: Generate titles, summaries, key topics, sentiment, and per-speaker highlights
- Knowledge Graph Extraction: Extract semantic triples (subject, relation, object) from transcripts
- RAG Index: Embed transcripts and SiYuan notes for semantic similarity search
- S3 Storage Management: List, filter, and delete objects in S3-compatible storage
- SiYuan Notes Integration: Push analysis documents back to a SiYuan Notes instance
- Background Queue Worker: S3-backed job queue with lease-based concurrency
- GPU Support: Accelerated processing on CUDA-enabled devices
```bash
git clone <repository-url>
cd parakeet
pip install -e ".[dev]"
```

Requirements:

- Python 3.10+
- PostgreSQL with the `pgvector` extension
- CUDA 11.0+ (optional, for GPU acceleration)
- 8 GB+ RAM recommended for ML models
```bash
# Apply all migrations
alembic upgrade head
```

All settings live in `pawnai.yaml` (auto-discovered in the current directory, or passed via `--config`).
Precedence: CLI flags → pawnai.yaml → environment variables → defaults.
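As an illustration, that precedence can be resolved one setting at a time. `resolve` and its arguments below are hypothetical names for sketching the rule, not the actual loader in `core/config.py`:

```python
import os

def resolve(key, cli_args, yaml_cfg, env_var, default):
    """Pick a value for one setting: CLI flag > pawnai.yaml > env var > default."""
    if cli_args.get(key) is not None:   # explicit CLI flag wins
        return cli_args[key]
    if key in yaml_cfg:                 # then the config file
        return yaml_cfg[key]
    if env_var in os.environ:           # then the environment
        return os.environ[env_var]
    return default                      # finally the built-in default
```

So `--db-dsn` on the command line overrides a `db_dsn:` entry in `pawnai.yaml`, which in turn overrides any environment variable.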
```yaml
models:
  hf_token: hf_your_token_here  # HuggingFace token for gated models
  # asr_model: nvidia/parakeet-tdt-0.6b-v3
  # diarization_model: pyannote/speaker-diarization-3.1

db_dsn: postgresql+psycopg://user:pass@localhost:5433/pawnai

paths:
  db_path: speakers_db
  audio_dir: audio/

device: auto  # auto | cuda | cpu

s3:
  bucket: my-audio-bucket
  endpoint_url: https://s3.amazonaws.com
  access_key: AKIAIOSFODNN7EXAMPLE
  secret_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  region: us-east-1
  prefix: ""
  verify_ssl: true
  path_style: true

siyuan:
  url: http://localhost:6806
  token: your-siyuan-token
  notebook: Meetings

agent:
  name: Bob
  anima: anima.md  # optional personality/system-prompt file
  openai:
    model: qwen3:8b
    base_url: http://localhost:11434/v1
    api_key: ollama
  copilot:
    model: gpt-4o

rag:
  embed_model: Qwen/Qwen3-Embedding-0.6B
  embed_dim: 1024
  embed_device: cpu

queue:
  topic: pawn-jobs
  consumer_name: worker-1
  bucket_name: my-audio-bucket
```

```bash
# Check system status
pawn-diarize status

# Transcribe and diarize a meeting
pawn-diarize transcribe-diarize meeting.wav -o transcript.txt

# Label a speaker for future recognition
pawn-diarize label -f audio.wav -s SPEAKER_00 -n "Alice Smith"
```

Perform speaker diarization on one or more audio files. Automatically recognizes known speakers from the database and replaces generic labels with names.
```
pawn-diarize diarize <audio_path>... [OPTIONS]

Options:
  --output, -o TEXT        Output file (.txt or .json)
  --config TEXT            Path to pawnai.yaml
  --db-dsn TEXT            PostgreSQL DSN
  --threshold, -t FLOAT    Similarity threshold for speaker matching (default: 0.7)
  --store-new/--no-store   Store embeddings for unknown speakers (default: enabled)
```

```bash
pawn-diarize diarize meeting.wav -o speakers.json
pawn-diarize diarize part1.wav part2.wav part3.wav -o full_diarization.json
pawn-diarize diarize meeting.wav --no-store
```

Transcribe audio to text. Supports two backends: `nemo` (NVIDIA Parakeet) and `whisper` (faster-whisper).
```
pawn-diarize transcribe <audio_path> [OPTIONS]

Options:
  --output, -o TEXT             Output file (.txt or .json)
  --config TEXT                 Path to pawnai.yaml
  --db-dsn TEXT                 PostgreSQL DSN
  --timestamps/--no-timestamps  Include word-level timestamps (default: enabled)
  --device, -d TEXT             cuda or cpu (default: auto)
  --chunk-duration, -c FLOAT    Split audio into N-second chunks (helps avoid OOM)
  --backend TEXT                nemo or whisper (default: nemo)
```

```bash
pawn-diarize transcribe speech.wav -o transcript.txt
pawn-diarize transcribe large.mp3 --backend whisper --device cpu -c 300
```

Combined transcription + diarization with speaker labels in a single pass. When multiple files are given they are treated as ordered chunks of the same conversation, with speaker labels aligned across files.
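Cross-file alignment (controlled by `--cross-threshold`, default 0.85) merges a later file's local speaker labels into a global speaker set by embedding similarity. A minimal pure-Python sketch of that idea; the real logic lives in `core/combined.py`, and the names here are illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def align_speakers(files, cross_threshold=0.85):
    """Map each file's local labels (SPEAKER_00, ...) to global labels.

    `files` is a list of dicts {local_label: embedding}. A local speaker is
    merged with the best-matching global speaker when cosine similarity meets
    the threshold; otherwise it becomes a new global speaker.
    """
    global_speakers = {}  # global_label -> representative embedding
    mappings = []
    for speakers in files:
        mapping = {}
        for local, emb in speakers.items():
            best, best_sim = None, cross_threshold
            for glabel, gemb in global_speakers.items():
                sim = cosine(emb, gemb)
                if sim >= best_sim:
                    best, best_sim = glabel, sim
            if best is None:  # no global speaker is similar enough
                best = f"SPEAKER_{len(global_speakers):02d}"
                global_speakers[best] = emb
            mapping[local] = best
        mappings.append(mapping)
    return mappings
```

Raising `--cross-threshold` makes the merge stricter (more distinct global speakers); lowering it merges more aggressively.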
```
pawn-diarize transcribe-diarize <audio_path>... [OPTIONS]

Options:
  --output, -o TEXT            Output file (.txt or .json)
  --config TEXT                Path to pawnai.yaml
  --session, -s TEXT           Session JSON file for accumulative processing
  --db-dsn TEXT                PostgreSQL DSN
  --threshold, -t FLOAT        Speaker matching threshold (default: 0.7)
  --store-new/--no-store       Store unknown speaker embeddings (default: enabled)
  --device, -d TEXT            cuda or cpu
  --chunk-duration, -c FLOAT   Split audio into chunks
  --cross-threshold, -x FLOAT  Cross-file speaker alignment threshold (default: 0.85)
  --no-timestamps              Hide timestamps in output
  --backend TEXT               nemo or whisper (default: nemo)
```

```bash
# Single file
pawn-diarize transcribe-diarize meeting.wav -o transcript.txt

# Multiple files (ordered chunks of the same conversation)
pawn-diarize transcribe-diarize part1.wav part2.wav part3.wav -o full.txt

# Accumulate across invocations
pawn-diarize transcribe-diarize part1.wav --session conv.json
pawn-diarize transcribe-diarize part2.wav --session conv.json
pawn-diarize transcribe-diarize part3.wav --session conv.json -o final.txt
```

Extract and store speaker embeddings (512-dim) in PostgreSQL for future recognition.
```
pawn-diarize embed <audio_path> --speaker-id SPEAKER_ID [OPTIONS]

Options:
  --speaker-id, -s TEXT   Unique speaker identifier (required)
  --config TEXT           Path to pawnai.yaml
  --db-dsn TEXT           PostgreSQL DSN
```

```bash
pawn-diarize embed person.wav --speaker-id alice_smith
```

Find speakers with similar voice characteristics via cosine similarity on stored embeddings.
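pgvector performs this ranking in SQL via its `<=>` cosine-distance operator; the same ordering can be sketched in plain Python. `search_similar` is an illustrative name, not the actual implementation:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two speaker embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def search_similar(query_id, embeddings, limit=5):
    """Rank stored speakers by cosine similarity to the query speaker.

    `embeddings` maps speaker_id -> embedding vector; the most similar
    speakers come first, truncated to `limit` results.
    """
    query = embeddings[query_id]
    others = ((sid, cosine_similarity(query, vec))
              for sid, vec in embeddings.items() if sid != query_id)
    return sorted(others, key=lambda p: p[1], reverse=True)[:limit]
```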
```
pawn-diarize search <speaker_id> [OPTIONS]

Options:
  --limit INT     Maximum results (default: 5)
  --db-dsn TEXT   PostgreSQL DSN
```

Assign a human-readable name to a speaker for automatic recognition in future diarizations.
```
pawn-diarize label [OPTIONS]

Options:
  --file, -f TEXT      Audio file containing the speaker
  --speaker, -s TEXT   Speaker label (e.g. SPEAKER_00)
  --name, -n TEXT      Human-readable name
  --session TEXT       Session ID to scope the labeling
  --list, -l           List all speaker name mappings
  --config TEXT        Path to pawnai.yaml
  --db-dsn TEXT        PostgreSQL DSN
```

```bash
pawn-diarize label -f audio.wav -s SPEAKER_00 -n "Alice Smith"
pawn-diarize label --list
```

Typical workflow:

- `transcribe-diarize` → transcript with SPEAKER_00, SPEAKER_01, …
- `label` → assign names to those labels
- Future `transcribe-diarize` runs → names appear automatically
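The last step, substituting saved names into fresh transcripts, amounts to a simple lookup. A hypothetical sketch (`apply_speaker_names` is not the actual function name):

```python
def apply_speaker_names(segments, names):
    """Render transcript segments, replacing generic labels with known names.

    `segments` is a list of (start_seconds, label, text) tuples; `names`
    maps labels such as SPEAKER_00 to names recorded with `label`.
    Unlabeled speakers keep their generic label.
    """
    lines = []
    for start, label, text in segments:
        speaker = names.get(label, label)
        lines.append(f"[{start:>7.2f}] {speaker}: {text}")
    return "\n".join(lines)
```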
Bulk-rename a mis-identified speaker across an entire session.
```
pawn-diarize session-relabel --session SESSION_ID --from OLD_NAME --to NEW_NAME [OPTIONS]

Options:
  --yes, -y       Skip confirmation prompt
  --db-dsn TEXT   PostgreSQL DSN
  --config TEXT   Path to pawnai.yaml
```

Show speakers and their embedding source files for a session.

```
pawn-diarize session-info <session_id> [OPTIONS]
```

List all sessions or inspect a specific one.
```
pawn-diarize sessions [OPTIONS]

Options:
  --session TEXT   Inspect a specific session
  --head INT       Show first N segments
  --tail INT       Show last N segments
  --output TEXT    Save output to file
  --config TEXT    Path to pawnai.yaml
```

Push a session's analysis to a SiYuan Notes instance as a new document.
```
pawn-diarize sync-siyuan [OPTIONS]

Options:
  --session TEXT         Session ID to sync
  --all                  Sync all sessions with stored analysis
  --notebook TEXT        Target SiYuan notebook
  --token TEXT           SiYuan API token
  --url TEXT             SiYuan instance URL
  --path-template TEXT   Document path template
  --daily-note           Insert into today's daily note
  --db-dsn TEXT          PostgreSQL DSN
  --config TEXT          Path to pawnai.yaml
```

List objects in the configured S3 bucket with POSIX-style output (timestamp and size always shown).
```
pawn-diarize s3 ls [PREFIX] [OPTIONS]

Options:
  --config TEXT          Path to pawnai.yaml
  --recursive, -r        Flat listing (no directory grouping)
  --time, -t             Sort by modification time, newest first
  --reverse              Reverse sort order
  --older-than INTEGER   Show only objects older than N days
```

```bash
pawn-diarize s3 ls
pawn-diarize s3 ls recordings/ -t
pawn-diarize s3 ls recordings/ -r -t --reverse
pawn-diarize s3 ls --older-than 30
```

Output format:

```
2024-03-15 14:32:10   45.2 MB   recordings/2024/meeting.wav
                          PRE   recordings/archive/
```
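The `--older-than` filter compares each object's last-modified timestamp against a cutoff. A sketch of that selection, assuming boto3-shaped object dicts (`Key`, `LastModified`, `Size`); the helper name is illustrative:

```python
from datetime import datetime, timedelta, timezone

def filter_older_than(objects, days, prefix=""):
    """Select objects whose LastModified is more than `days` days in the past.

    `objects` is a list of dicts shaped like boto3's list_objects_v2
    Contents entries: {"Key": ..., "LastModified": datetime, "Size": ...}.
    An optional key prefix narrows the selection.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    return [o for o in objects
            if o["Key"].startswith(prefix) and o["LastModified"] < cutoff]
```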
Delete objects from S3 by key or by age. Always preview with `--dry-run` first.
```
pawn-diarize s3 rm [PATH] [OPTIONS]

Options:
  --config TEXT          Path to pawnai.yaml
  --older-than INTEGER   Delete all objects older than N days
  --prefix TEXT          Limit --older-than to this key prefix
  --dry-run, -n          Show what would be deleted without making changes
  --yes, -y              Skip confirmation prompt for bulk deletions
```

```bash
# Delete a specific file
pawn-diarize s3 rm recordings/2024/old.wav

# Preview a bulk deletion
pawn-diarize s3 rm --older-than 30 --dry-run

# Delete all objects older than 90 days under a prefix
pawn-diarize s3 rm --older-than 90 --prefix recordings/ --yes
```

Background worker that polls a pawn-queue topic and executes jobs (transcribe-diarize, transcribe, diarize, embed, analyze, sync-siyuan).
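Lease-based concurrency means a worker claims a job by taking a time-limited lease on it, so several workers can poll the same topic without double-processing; if a worker dies, its lease expires and the job becomes claimable again. A toy in-memory model of the pattern (pawn-queue's actual S3-backed protocol is not shown here):

```python
import time
import uuid

class LeaseQueue:
    """Toy in-memory model of lease-based job claiming.

    A dict stands in for the queue's backing store. A job can be claimed
    only when it has no live lease; completing a job removes it.
    """

    def __init__(self, lease_seconds=60):
        self.jobs = {}    # job_id -> payload
        self.leases = {}  # job_id -> (consumer_name, expiry_timestamp)
        self.lease_seconds = lease_seconds

    def submit(self, payload):
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = payload
        return job_id

    def claim(self, consumer_name, now=None):
        """Return a claimable job id, taking a lease on it, or None."""
        now = time.time() if now is None else now
        for job_id in self.jobs:
            holder = self.leases.get(job_id)
            if holder is None or holder[1] < now:  # unclaimed or lease expired
                self.leases[job_id] = (consumer_name, now + self.lease_seconds)
                return job_id
        return None

    def complete(self, job_id):
        self.jobs.pop(job_id, None)
        self.leases.pop(job_id, None)
```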
```
pawn-diarize listen [OPTIONS]

Options:
  --config TEXT          Path to pawnai.yaml
  --topic TEXT           Queue topic name
  --consumer-name TEXT   Worker identity for lease-based concurrency
```

Show system status: device, CUDA availability, GPU info, configured models.

```
pawn-diarize status [--config TEXT]
```

All audio processing commands accept s3:// URIs directly: files are downloaded to a temp location, processed, then cleaned up automatically.
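That download-process-cleanup pattern can be sketched as a context manager; `materialized` and `download` are hypothetical names standing in for the helpers in `core/s3.py` (which use boto3):

```python
import os
import tempfile
from contextlib import contextmanager
from urllib.parse import urlparse

@contextmanager
def materialized(path, download):
    """Yield a local path for `path`, downloading s3:// URIs to a temp file.

    `download(bucket, key, dest)` stands in for a boto3-style download call;
    plain local paths pass through untouched. The temp file is removed on exit.
    """
    if not path.startswith("s3://"):
        yield path
        return
    parsed = urlparse(path)
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    fd, tmp = tempfile.mkstemp(suffix=os.path.splitext(key)[1])
    os.close(fd)
    try:
        download(bucket, key, tmp)
        yield tmp
    finally:
        os.unlink(tmp)  # clean up even if processing raised
```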
```bash
pawn-diarize transcribe s3://my-bucket/recordings/meeting.wav -o transcript.txt
pawn-diarize transcribe-diarize s3://my-bucket/recordings/*.wav
```

LLM-powered conversational agent for analyzing pawn-diarize sessions. Built on PydanticAI, it supports multiple providers (OpenAI-compatible, GitHub Copilot, Anthropic, Google, Groq, Mistral, and others).
Tools are auto-discovered from pawn_agent/tools/ — any module that exports NAME, DESCRIPTION, and build(cfg) is registered automatically.
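That discovery convention can be sketched with `pkgutil`; `discover_tools` and `is_tool` are illustrative names, not the agent's actual registry code:

```python
import importlib
import pkgutil

REQUIRED = ("NAME", "DESCRIPTION", "build")

def is_tool(module):
    """A module qualifies as a tool if it exports NAME, DESCRIPTION and build."""
    return all(hasattr(module, attr) for attr in REQUIRED)

def discover_tools(package):
    """Import every module in `package` and register the ones that qualify."""
    tools = {}
    for info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package.__name__}.{info.name}")
        if is_tool(module):
            tools[module.NAME] = module
    return tools
```

Dropping a new module into `pawn_agent/tools/` with those three exports is enough to make it available to the agent.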
Start an interactive multi-turn conversation session.
```
pawn-agent chat [OPTIONS]

Options:
  --config TEXT    Path to pawnai.yaml
  --model TEXT     Override the configured LLM model
  --session TEXT   Session hint for context (e.g. a session ID to analyze)
```

Execute a single prompt and exit (useful for scripting).
```
pawn-agent run "<prompt>" [OPTIONS]

Options:
  --config TEXT    Path to pawnai.yaml
  --model TEXT     Override the configured LLM model
  --session TEXT   Session hint
```

```bash
pawn-agent run "Summarize session abc123 and save to SiYuan"
```

List all registered agent tools with their descriptions.

```
pawn-agent tools [--config TEXT]
```

List available Copilot models with billing and reasoning effort details.

```
pawn-agent models [--config TEXT]
```

| Tool | Description |
|---|---|
| `query_conversation` | Fetch the full transcript for a session from the database |
| `analyze_summary` | Run structured analysis (title, summary, topics, sentiment, tags) and persist to DB |
| `get_analysis` | Retrieve the most recent stored analysis for a session |
| `extract_graph` | Extract knowledge-graph triples from a transcript and persist to DB |
| `vectorize` | Embed session transcripts or SiYuan pages into the RAG index |
| `search_knowledge` | Semantic similarity search over transcript chunks and SiYuan notes |
| `save_to_siyuan` | Save Markdown content to SiYuan Notes as a new document |
| `fetch_siyuan_page` | Fetch the text content of a SiYuan page by path |
| `rag_stats` | Show a summary of the RAG vector index (sources and chunk counts) |
```
parakeet/
├── pawn_diarize/
│   ├── __main__.py            # Entry point → pawn-diarize
│   ├── cli/
│   │   ├── commands.py        # All CLI command definitions
│   │   └── utils.py           # Console, helpers
│   ├── core/
│   │   ├── config.py          # pawnai.yaml loader
│   │   ├── diarization.py     # pyannote.audio + DBSCAN clustering
│   │   ├── transcription.py   # NeMo Parakeet / faster-whisper backends
│   │   ├── embeddings.py      # Speaker embeddings (PostgreSQL + pgvector)
│   │   ├── combined.py        # Multi-file transcription+diarization
│   │   ├── database.py        # SQLAlchemy ORM (embeddings, segments, analysis, RAG, graph)
│   │   ├── analysis.py        # Session analysis via Copilot
│   │   ├── s3.py              # S3/MinIO client and transparent download helpers
│   │   ├── siyuan.py          # SiYuan Notes API client
│   │   └── queue_listener.py  # Background job worker
│   └── utils/
├── pawn_agent/
│   ├── __main__.py            # Entry point → pawn-agent
│   ├── cli/commands.py        # run, chat, tools, models
│   ├── core/
│   │   ├── agent.py           # ConversationAgent (Copilot SDK)
│   │   └── pydantic_agent.py  # PydanticAI multi-provider agent
│   ├── tools/                 # Auto-discovered tool modules
│   └── utils/
│       ├── config.py          # AgentConfig loader
│       ├── db.py              # DB session factory + RAG tables
│       ├── transcript.py      # Fetch/format session transcripts
│       ├── siyuan.py          # SiYuan helpers
│       ├── analysis.py        # Analysis runner
│       └── vectorize.py       # RAG vectorization
├── migrations/                # Alembic schema versions
├── pawnai.yaml                # Project configuration
└── alembic.ini
```
| Category | Technology |
|---|---|
| Speaker diarization | pyannote.audio 4.0+ |
| Transcription | NeMo Parakeet (nvidia/parakeet-tdt-0.6b-v3), faster-whisper |
| Speaker embeddings | PostgreSQL + pgvector (512-dim cosine similarity) |
| RAG embeddings | sentence-transformers / Qwen3-Embedding-0.6B (1024-dim) |
| Database ORM | SQLAlchemy 2.0 + Alembic migrations |
| LLM agent | PydanticAI (multi-provider) + GitHub Copilot SDK |
| S3 storage | boto3 / aioboto3 |
| Job queue | pawn-queue (S3-backed) |
| Notes integration | SiYuan Notes API |
| CLI | Typer + Rich |
```bash
# Install with dev dependencies
pip install -e ".[dev]"

# Format
black pawn_diarize pawn_agent tests
isort pawn_diarize pawn_agent tests

# Lint
flake8 pawn_diarize pawn_agent tests
mypy pawn_diarize pawn_agent

# Test
pytest
pytest tests/test_cli.py::test_status
pytest --cov=pawn_diarize --cov-report=html
```