Pinmoli

From Tamil பின்மொழி (pin mozhi) -- "afterword"

An AI-powered CLI for testing SIP and WebRTC voice endpoints. Describe what you want to test in plain English, and Pinmoli handles the protocol details -- INVITE flows, codec negotiation, RTP streaming, failure analysis.

Think "Postman for Voice", but conversational.

$ pinmoli

  Pinmoli - SIP/WebRTC Testing Agent

You: Test sip:+15551234567@trunk.example.com with INVITE, wait 15 seconds for a response

Pinmoli: Running INVITE test against sip:+15551234567@trunk.example.com...

  [sip_test] INVITE sip:+15551234567@trunk.example.com
  ├─ 100 Trying (12ms)
  ├─ 180 Ringing (45ms)
  ├─ 200 OK (1203ms) — codec: PCMU/8000
  ├─ ACK sent
  ├─ RTP: sent 150 packets (voice-hello, 3.0s)
  ├─ RTP: waiting 15s for agent response...
  ├─ RTP: received 1247 packets (15.0s)
  └─ BYE sent, 200 OK

  Call completed successfully. The agent answered after 1.2s and spoke for
  the full 15-second window. Codec negotiated: PCMU/8000 (G.711 u-law).

Features

  • Natural language interface -- describe tests in plain English, the AI agent translates to protocol operations
  • Full SIP call flows -- OPTIONS pings, INVITE with SDP offer/answer, REGISTER with auth, ACK, BYE
  • WebRTC via WHIP -- connect to any WHIP endpoint (LiveKit, Cloudflare, Janus), negotiate ICE/DTLS/SRTP, send and receive audio
  • Bidirectional RTP audio -- send pre-generated or custom speech, receive and measure agent responses
  • DTMF send and receive (RFC 4733) -- send telephone-event RTP packets during active calls, detect incoming DTMF from the remote side
  • Runtime speech synthesis -- generate custom TTS audio on the fly with espeak
  • Failure analysis -- pattern-matched diagnostics with actionable recovery steps
  • Test persistence -- save and reload test configurations (SQLite with FTS5)
  • Works with any SIP or WebRTC endpoint -- LiveKit, Daily.co, Twilio, Cloudflare, Asterisk, FreeSWITCH, or any RFC 3261/WHIP-compliant server
  • Automatic packet capture -- every session captures SIP signaling and RTP media to pcap (Wireshark-ready), fail-fast if volume not mounted
  • Real codec negotiation -- PCMU (G.711 u-law), PCMA (G.711 A-law), G722 (wideband), opus. Transcodes audio at send time to match the negotiated codec
  • Runs in Docker -- all dependencies (ffmpeg, espeak, tcpdump, tini) included, no local setup required

Architecture

Pinmoli is built on pi, the same open-source agent framework that powers OpenClaw. Where OpenClaw uses pi to build a general-purpose personal AI assistant (messaging gateway, file operations, shell commands across 50+ integrations), Pinmoli takes the opposite approach: a domain-restricted agent that does exactly one thing -- SIP/WebRTC testing -- and does it well.

The key difference is scope. OpenClaw embeds pi-coding-agent to give an LLM full access to read, write, edit, and bash tools across an entire system. Pinmoli uses only pi-agent-core and pi-ai with a locked-down tool allowlist of 7 voice-testing tools. The LLM cannot touch the filesystem, run shell commands, or do anything outside voice protocol testing.

Pi libraries

pi-ai                              pi-tui
Multi-provider LLM abstraction     Terminal UI with diff rendering
Anthropic, OpenAI, Google,         Editor, Markdown, Box, Text
Bedrock, Mistral, Groq, ...       Keyboard input, layout engine
         │                                   │
         ▼                                   ▼
pi-agent-core                      Pinmoli TUI (src/ui/tui.ts)
Agent loop, tool execution,        Wraps pi-tui for interactive mode
event subscription, AbortSignal    Falls back to raw Terminal for tests
         │
         ▼
PinmoliAgent (src/agent/runtime.ts)
Domain-restricted system prompt
7-tool allowlist, event routing

Pinmoli uses three pi packages:

| Package | Role in Pinmoli |
| --- | --- |
| @mariozechner/pi-agent-core | Agent loop -- receives user input, calls LLM, executes tools, streams events back |
| @mariozechner/pi-ai | LLM provider abstraction -- swap between Gemini, Claude, GPT with one config change |
| @mariozechner/pi-tui | Terminal rendering -- differential updates, editor with autocomplete, flicker-free output |

How the pieces connect

User input
  │
  ▼
┌──────────────────────────────────────────────────────────────────┐
│  TUI  (src/ui/tui.ts)                                           │
│  pi-tui Editor → reads input → sends to agent                   │
│  Agent events → streamed back → rendered in real time            │
│  Ctrl+C: abort current operation / clear input / quit            │
└──────────────────────┬───────────────────────────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────────────────────────┐
│  Agent  (src/agent/runtime.ts)                                   │
│  pi-agent-core Agent with pi-ai model                            │
│                                                                  │
│  System prompt constrains LLM to SIP testing only:               │
│  "You are Pinmoli, a SIP/WebRTC testing assistant.               │
│   You ONLY help test voice protocols.                            │
│   You CANNOT edit files, run bash, or access the filesystem."    │
│                                                                  │
│  Tool allowlist enforced by registry (src/tools/registry.ts):    │
│  sip_test, webrtc_test, generate_audio, analyze_failure,         │
│  save_test, load_test, list_tests                                │
└──────────────────────┬───────────────────────────────────────────┘
                       │  LLM decides which tool to call
                       ▼
┌──────────────────────────────────────────────────────────────────┐
│  Tools  (src/tools/*.ts)                                         │
│                                                                  │
│  sip_test ─────► SIP Engine (async generator, streams events)    │
│  webrtc_test ──► WebRTC Engine (WHIP signaling, werift stack)    │
│  generate_audio ► ffmpeg/espeak (sine, DTMF, silence, speech)    │
│  analyze_failure ► Pattern matching on event history             │
│  save/load/list ► SQLite with FTS5 (src/storage/db.ts)           │
└──────────────────────┬───────────────────────────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────────────────────────┐
│  SIP Engine  (src/sip/engine.ts)                                 │
│                                                                  │
│  async function* runSipTest(config): AsyncGenerator<SipEvent>    │
│                                                                  │
│  ┌─ protocol.ts ── SIP message builder (INVITE, ACK, BYE)       │
│  ├─ sdp.ts ─────── SDP offer/answer (opus, PCMU, PCMA, G722)    │
│  ├─ rtp-receiver.ts ── RTP/DTMF send/receive on UDP socket      │
│  ├─ dtmf.ts ───── RFC 4733 encode/decode, DtmfDetector          │
│  └─ audio.ts ───── Sample resolution (WAV files, generated)      │
│                                                                  │
│  Yields events as they happen:                                   │
│    SIP messages, RTP stats, DTMF, diagnostics, codec negotiation │
└──────────────────────────────────────────────────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────────────────────────┐
│  WebRTC Engine  (src/webrtc/engine.ts)                           │
│                                                                  │
│  async function* runWebRtcTest(config): AsyncGenerator<TestEvent>│
│                                                                  │
│  ┌─ whip.ts ───── WHIP signaling (RFC 9725: POST offer→answer)  │
│  ├─ audio-frames.ts ── PCM16 frame chunking + WAV save           │
│  └─ werift ────── Pure TS WebRTC stack (ICE/DTLS/SRTP/RTP)      │
│                                                                  │
│  Yields events as they happen:                                   │
│    WHIP signaling, ICE/DTLS, RTP stats, DTMF, agent audio       │
└──────────────────────────────────────────────────────────────────┘
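
The WHIP leg of the handshake is plain HTTP. A minimal sketch of the offer/answer exchange (RFC 9725) -- the function name, parameters, and error handling are illustrative, not Pinmoli's actual whip.ts, and `fetch` is the Node 18+ global:

```typescript
// Sketch of the WHIP offer/answer exchange (RFC 9725).
// whipOffer and its parameters are illustrative names, not Pinmoli's real API.
async function whipOffer(
  endpoint: string,
  sdpOffer: string,
  bearerToken?: string,
): Promise<{ answer: string; sessionUrl: string | null }> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/sdp",
      ...(bearerToken ? { Authorization: `Bearer ${bearerToken}` } : {}),
    },
    body: sdpOffer,
  });
  // WHIP servers answer a successful offer with 201 Created
  if (res.status !== 201) throw new Error(`WHIP offer rejected: HTTP ${res.status}`);
  return {
    answer: await res.text(),                // SDP answer to apply to the peer connection
    sessionUrl: res.headers.get("Location"), // DELETE this URL later to hang up
  };
}
```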

Why async generators

The SIP engine is an async function* that yields events as they happen -- a SIP 100 Trying at 12ms, a 200 OK at 1200ms, RTP packet counts every second. The TUI renders each event the moment it arrives. No buffering, no callbacks, no polling.

// The engine yields events in real time
for await (const event of runSipTest(config)) {
  tui.render(event);  // instant display
}

This design makes the engine usable outside the TUI too -- pipe events to NDJSON, feed them into a test assertion, or stream them over a websocket.
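
Consuming the stream headlessly is the same loop. A sketch of the NDJSON case -- the `SipEvent` shape and the `fakeSipTest` generator are stand-ins, not the engine's real types:

```typescript
// Consume an async generator of test events and collect NDJSON lines.
// SipEvent and fakeSipTest are illustrative stand-ins for the real engine types.
type SipEvent = { type: string; detail?: string; elapsedMs: number };

async function* fakeSipTest(): AsyncGenerator<SipEvent> {
  yield { type: "sip_response", detail: "100 Trying", elapsedMs: 12 };
  yield { type: "sip_response", detail: "200 OK", elapsedMs: 1203 };
}

async function toNdjson(events: AsyncIterable<SipEvent>): Promise<string[]> {
  const lines: string[] = [];
  for await (const event of events) {
    lines.push(JSON.stringify(event)); // one JSON object per line
  }
  return lines;
}
```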

Why domain restriction matters

General-purpose agents (like OpenClaw) give the LLM access to bash, file I/O, and the full system. That power makes sense for a personal assistant. For a SIP testing tool, it's a liability -- you don't want an LLM accidentally rm -rf-ing your project while trying to debug a codec mismatch.

Pinmoli's agent can only call 7 tools, all voice-testing related. The system prompt explicitly forbids filesystem access, and the tool registry enforces the allowlist at runtime. The LLM stays in its lane.
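
Enforcing that boundary can be as simple as a set lookup before dispatch. A sketch of the idea -- the function and constant names are illustrative, not the real registry.ts API:

```typescript
// Sketch of runtime allowlist enforcement before tool dispatch.
// ALLOWED_TOOLS mirrors the 7 tools from the README; assertAllowed is illustrative.
const ALLOWED_TOOLS = new Set([
  "sip_test", "webrtc_test", "generate_audio", "analyze_failure",
  "save_test", "load_test", "list_tests",
]);

function assertAllowed(toolName: string): void {
  if (!ALLOWED_TOOLS.has(toolName)) {
    // Reject anything outside the allowlist, regardless of what the LLM requested
    throw new Error(`Tool "${toolName}" is not in the Pinmoli allowlist`);
  }
}
```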

Quick Start

Prerequisites

  • Docker
  • An LLM provider credential (GCP service account key for Gemini, or an API key for Anthropic/OpenAI)

Quick Start (GHCR)

Pull the published image and run with any supported LLM provider:

docker pull ghcr.io/arakoodev/pinmoli:latest

Anthropic:

docker run --rm -it --network host \
  -v $(pwd)/captures:/app/captures \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  ghcr.io/arakoodev/pinmoli

OpenAI:

docker run --rm -it --network host \
  -v $(pwd)/captures:/app/captures \
  -e OPENAI_API_KEY=sk-... \
  ghcr.io/arakoodev/pinmoli

Google Gemini (API key):

docker run --rm -it --network host \
  -v $(pwd)/captures:/app/captures \
  -e GEMINI_API_KEY=... \
  ghcr.io/arakoodev/pinmoli

Google Vertex AI (service account):

docker run --rm -it --network host \
  -v $(pwd)/captures:/app/captures \
  -v /path/to/key.json:/credentials.json:ro \
  ghcr.io/arakoodev/pinmoli --service-account /credentials.json

Groq:

docker run --rm -it --network host \
  -v $(pwd)/captures:/app/captures \
  -e GROQ_API_KEY=gsk_... \
  ghcr.io/arakoodev/pinmoli

The provider is auto-detected from whichever env var you set. Use --provider to override.
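
Auto-detection of this kind reduces to checking a precedence-ordered list of env vars. A sketch -- the ordering and names here are illustrative, not necessarily what Pinmoli implements:

```typescript
// Sketch of env-var based provider auto-detection.
// The precedence order is illustrative, not Pinmoli's documented behavior.
const PROVIDER_ENV_VARS: [string, string][] = [
  ["ANTHROPIC_API_KEY", "anthropic"],
  ["OPENAI_API_KEY", "openai"],
  ["GEMINI_API_KEY", "google"],
  ["GROQ_API_KEY", "groq"],
  ["OPENROUTER_API_KEY", "openrouter"],
];

function detectProvider(env: Record<string, string | undefined>): string | undefined {
  for (const [envVar, provider] of PROVIDER_ENV_VARS) {
    if (env[envVar]) return provider; // first matching env var wins
  }
  return undefined; // caller falls back to --provider / --service-account
}
```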

The image is published automatically on every push to main via GitHub Actions. Tagged releases (v*) produce versioned images (e.g., ghcr.io/arakoodev/pinmoli:0.2.0).

Development

For contributors building from source:

git clone git@github.com:arakoodev/pinmoli.git
cd pinmoli
docker compose build
docker compose up -d

# Start the TUI
docker compose exec pinmoli npx tsx src/cli.ts

# With a GCP service account
docker compose exec pinmoli npx tsx src/cli.ts \
  --service-account /app/secrets/my-key.json

The source directory is bind-mounted, so code changes are reflected immediately.

Try it

You're in. Type a test request. Every example below has a corresponding integration test in test/integration/readme-prompts.test.ts.

SIP basics:

You: Send OPTIONS to sip:trunk.example.com
You: INVITE sip:+15551234567@sip.livekit.cloud with opus and PCMU
You: Register at sip:pbx.example.com with username admin password secret

Codec negotiation:

You: Test with PCMA codec -- I want to verify A-law support
You: Call the agent using G722 and wait 20 seconds for a response
You: Test sip:pbx.example.com offering only PCMA and PCMU, see which it picks
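
Under the hood, offering a specific codec set comes down to the payload types listed on the SDP `m=audio` line, in preference order. A minimal sketch -- the payload-type table and session template are illustrative, not Pinmoli's actual sdp.ts:

```typescript
// Sketch of an SDP audio offer for a given codec list.
// Static payload types 0/8/9 are standard; 111 for opus is a common dynamic choice.
const PAYLOAD_TYPES: Record<string, { pt: number; rtpmap: string }> = {
  PCMU: { pt: 0, rtpmap: "PCMU/8000" },
  PCMA: { pt: 8, rtpmap: "PCMA/8000" },
  G722: { pt: 9, rtpmap: "G722/8000" }, // G722 advertises 8000 by RTP convention
  opus: { pt: 111, rtpmap: "opus/48000/2" },
};

function buildAudioOffer(host: string, port: number, codecs: string[]): string {
  const entries = codecs.map((c) => {
    const info = PAYLOAD_TYPES[c];
    if (!info) throw new Error(`Unknown codec: ${c}`);
    return info;
  });
  const pts = entries.map((e) => e.pt).join(" ");
  return [
    "v=0",
    `o=- 0 0 IN IP4 ${host}`,
    "s=pinmoli",
    `c=IN IP4 ${host}`,
    "t=0 0",
    `m=audio ${port} RTP/AVP ${pts}`, // codec order expresses preference
    ...entries.map((e) => `a=rtpmap:${e.pt} ${e.rtpmap}`),
  ].join("\r\n") + "\r\n";
}
```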

DTMF and IVR navigation:

You: Call sip:+15551234567@trunk.example.com and press 1-2-3-# after the greeting
You: Call sip:+18005551234@trunk.example.com, press 1 for sales, then 0 for operator
You: Connect via WebRTC to https://agent.example.com/whip and enter PIN 1234#
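
Behind those prompts, each digit travels as an RFC 4733 telephone-event payload rather than as in-band audio. A sketch of the 4-byte payload layout (a real sender also needs RTP headers, timestamps, and end-of-event retransmits):

```typescript
// Sketch of the RFC 4733 telephone-event payload: 1 byte event code,
// 1 byte E-bit + volume, 2 bytes duration (big-endian, in timestamp units).
const DTMF_EVENTS: Record<string, number> = {
  "0": 0, "1": 1, "2": 2, "3": 3, "4": 4, "5": 5, "6": 6, "7": 7,
  "8": 8, "9": 9, "*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15,
};

function encodeDtmfPayload(digit: string, durationTs: number, end = false): Uint8Array {
  const event = DTMF_EVENTS[digit];
  if (event === undefined) throw new Error(`Not a DTMF digit: ${digit}`);
  const payload = new Uint8Array(4);
  payload[0] = event;                    // event code
  payload[1] = (end ? 0x80 : 0) | 10;    // E bit set on final packet; volume -10 dBm0
  payload[2] = (durationTs >> 8) & 0xff; // duration high byte
  payload[3] = durationTs & 0xff;        // duration low byte
  return payload;
}
```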

Speech generation:

You: Generate speech saying "What is the weather today?" then call the agent
You: Generate a 1000Hz sine wave for 5 seconds, then test the endpoint
You: Make the greeting say "Por favor espere" in Spanish, then test

Bidirectional conversations:

You: Call sip:agent@example.com, listen for 5 seconds first, then send my greeting
You: INVITE sip:agent@livekit.cloud, send the greeting, wait 30 seconds for a response

WebRTC:

You: Test the WHIP endpoint at https://my-agent.example.com/whip with bearer token abc123

Failure analysis:

You: Why did it fail?
You: What went wrong? (after a 488 codec mismatch)

Save, load, and batch:

You: Save this test as "production-health-check"
You: Show me all saved tests, then run one
You: Compare sip:trunk-us.example.com and sip:trunk-eu.example.com
You: Test these servers: sip:a.example.com, sip:b.example.com, sip:c.example.com

Troubleshooting:

You: Try again with 15 second timeout

Advanced combos:

You: Generate speech "Hello, I need billing support", call with PCMA, then press 2 for billing
You: Test sip:agent@broken-trunk.com, analyze the failure, fix it with TCP, save the config

Run without the AI agent

If you just want to run SIP tests programmatically without the conversational TUI:

docker compose exec pinmoli npx tsx -e "
  import { runSipTest } from './src/sip/engine.js';
  for await (const event of runSipTest({
    uri: 'sip:trunk.example.com',
    method: 'OPTIONS',
    codecs: ['PCMU']
  })) { console.log(JSON.stringify(event)); }
"

Configuration

LLM Provider

Pinmoli auto-detects your LLM provider from environment variables. Set one and go:

| Provider | --provider | Env var | Default model |
| --- | --- | --- | --- |
| Anthropic | anthropic | ANTHROPIC_API_KEY | claude-sonnet-4-5 |
| OpenAI | openai | OPENAI_API_KEY | gpt-4o |
| Google Gemini | google | GEMINI_API_KEY | gemini-2.5-flash |
| Google Vertex AI | google-vertex | --service-account <path> | gemini-2.5-flash |
| Groq | groq | GROQ_API_KEY | llama-3.3-70b-versatile |
| OpenRouter | openrouter | OPENROUTER_API_KEY | anthropic/claude-sonnet-4.5 |

# Just set the env var — provider is auto-detected
ANTHROPIC_API_KEY=sk-ant-... pinmoli

# Or be explicit
pinmoli --provider openai --model gpt-4o

# Override the default model
pinmoli --provider anthropic --model claude-haiku-4-5

# Vertex AI (service account)
pinmoli --service-account /path/to/key.json

# Switch provider at runtime via slash command
/model anthropic claude-sonnet-4-5
/model google gemini-2.5-pro
/model                              # show current provider/model

CLI Reference

pinmoli [options]

  --provider <name>          LLM provider (anthropic, openai, google, google-vertex, groq, openrouter)
  --model <id>               Model ID (default depends on provider)
  --service-account <path>   GCP service account JSON (implies google-vertex)
  --help                     Show usage

Environment Variables

| Variable | Provider |
| --- | --- |
| ANTHROPIC_API_KEY | Anthropic |
| OPENAI_API_KEY | OpenAI |
| GEMINI_API_KEY | Google Gemini |
| GROQ_API_KEY | Groq |
| OPENROUTER_API_KEY | OpenRouter |
| GOOGLE_APPLICATION_CREDENTIALS | Google Vertex AI (with GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION) |

Docker Compose

The default docker-compose.yml uses network_mode: host so SIP and RTP traffic reaches the network directly. Modify if your setup requires bridged networking with explicit port mapping.

Tools

Pinmoli exposes 7 tools to the AI agent. You don't call these directly -- you describe what you want and the agent picks the right tool. See SKILLS.md for full parameter reference.

| Tool | Purpose |
| --- | --- |
| sip_test | Run OPTIONS, INVITE, or REGISTER against a SIP endpoint. Supports DTMF send/receive via dtmfDigits. |
| webrtc_test | Connect to a WHIP endpoint, negotiate ICE/DTLS/SRTP, send audio, capture agent response. Supports DTMF. |
| generate_audio | Create custom audio samples (sine, DTMF dual-tone, silence, TTS speech) |
| analyze_failure | Diagnose a failed test and suggest fixes |
| save_test | Save a test configuration by name |
| load_test | Reload and run a saved test |
| list_tests | List all saved test configurations |

Audio Samples

Pre-generated (included in the Docker image)

| Sample | Description | Duration |
| --- | --- | --- |
| voice-hello | "Hello, this is a test call from Pinmoli" | ~3s |
| sine-440hz | 440 Hz sine wave | 3s |
| sine-1000hz | 1000 Hz sine wave | 3s |
| dtmf-123 | DTMF tones 1-2-3 | 1.5s |
| silence | Silence | 3s |

All samples are PCMU @ 8kHz mono (G.711 u-law), the standard SIP codec.
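
For reference, G.711 u-law compresses each 16-bit PCM sample into one byte using a sign/exponent/mantissa layout. A sketch of the standard encoder (the textbook algorithm, not necessarily what Pinmoli's codec.ts does internally):

```typescript
// Sketch of the standard G.711 u-law encoder: 16-bit signed PCM in, one byte out.
function linearToUlaw(sample: number): number {
  const BIAS = 0x84;  // standard u-law bias
  const CLIP = 32635; // clip magnitude so bias never overflows 15 bits
  const sign = (sample >> 8) & 0x80;
  if (sign) sample = -sample;
  if (sample > CLIP) sample = CLIP;
  sample += BIAS;
  // Find the segment (exponent): position of the highest set bit above bit 7
  let exponent = 7;
  for (let mask = 0x4000; (sample & mask) === 0 && exponent > 0; mask >>= 1) exponent--;
  const mantissa = (sample >> (exponent + 3)) & 0x0f;
  // u-law bytes are transmitted inverted
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}
```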

Runtime generation

Ask the agent to generate custom speech:

You: Generate speech saying "Please transfer me to billing"
You: Now call sip:+15551234567@trunk.example.com with that audio

Or generate tones:

You: Generate a 1000Hz sine wave for 5 seconds, then test the endpoint

Packet Capture

Every Pinmoli session automatically captures all SIP signaling and RTP media traffic to a pcap file. Open it in Wireshark for protocol-level debugging.

How it works

The entrypoint.sh runs tcpdump in the background for the entire session:

  • Captures port 5060 (SIP) and UDP ports 10000-65535 (RTP/SRTP)
  • Saves to /app/captures/pinmoli-YYYYMMDD-HHMMSS.pcap inside the container
  • Stops automatically when the session ends (EXIT trap)

Saving captures to your local machine

Docker Compose (development): Captures appear at ./captures/ automatically — the source directory is bind-mounted.

Docker Run (GHCR image): Mount a volume so captures persist after the container exits:

# Create the captures directory (first time only)
mkdir -p captures

# Mount it when running Pinmoli
docker run --rm -it --network host \
  -v $(pwd)/captures:/app/captures \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  ghcr.io/arakoodev/pinmoli

After the session, your captures are in ./captures/:

ls captures/
# pinmoli-20260305-143022.pcap

# Open in Wireshark
wireshark captures/pinmoli-20260305-143022.pcap

Previous captures from earlier runs are preserved — new sessions create new pcap files with unique timestamps.

Disable capture

If you don't need packet capture (e.g., CI/CD), set PINMOLI_NO_CAPTURE=1:

docker run --rm -it --network host \
  -e PINMOLI_NO_CAPTURE=1 \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  ghcr.io/arakoodev/pinmoli

Project Structure

pinmoli/
├── src/
│   ├── cli.ts                  # Entry point, REPL loop
│   ├── agent/runtime.ts        # PinmoliAgent wraps pi-agent-core
│   ├── ui/
│   │   ├── tui.ts              # PinmoliTUI wraps pi-tui
│   │   ├── tool-output.ts      # Collapsible tool result rendering
│   │   └── test-terminal.ts    # Test-mode Terminal implementation
│   ├── tools/
│   │   ├── registry.ts         # 7-tool allowlist enforcement
│   │   ├── index.ts            # Tool registration
│   │   ├── sip-test.ts         # SIP test execution (async generator)
│   │   ├── webrtc-test.ts      # WebRTC test execution (WHIP + werift)
│   │   ├── generate-audio.ts   # Audio generation (ffmpeg, espeak)
│   │   ├── analyze-failure.ts  # Diagnostic pattern matching
│   │   └── save/load/list-tests.ts
│   ├── sip/
│   │   ├── engine.ts           # SIP test orchestration (async generator)
│   │   ├── protocol.ts         # SIP message building
│   │   ├── sdp.ts              # SDP offer/answer builder
│   │   ├── rtp-receiver.ts     # RTP/DTMF packet send/receive
│   │   ├── codec.ts            # Codec table, transcoding (PCMU↔PCMA), lookup
│   │   ├── dtmf.ts             # RFC 4733 encode/decode, DtmfDetector
│   │   └── audio.ts            # Audio sample resolution
│   ├── webrtc/
│   │   ├── engine.ts           # WebRTC test orchestration (async generator)
│   │   ├── whip.ts             # WHIP signaling client (RFC 9725)
│   │   └── audio-frames.ts     # PCM16 frame chunking + WAV save
│   ├── storage/db.ts           # SQLite + FTS5 persistence
│   ├── validation/schemas.ts   # TypeBox schemas
│   └── commands/service-account.ts
├── audio-samples/              # Pre-generated PCMU WAV files
├── test/
│   ├── unit/                   # Protocol, SDP, RTP, DTMF, storage, tools, lint, WebRTC
│   ├── integration/            # TUI flows, e2e, bidirectional RTP, speech
│   └── live/                   # Tests against real SIP and WebRTC endpoints
├── eslint-plugin-pinmoli.cjs   # 14 lint rules from real bugs
├── Dockerfile                  # Alpine + Node 20 + ffmpeg + espeak + tcpdump + tini
├── docker-compose.yml
└── entrypoint.sh

Testing

All tests run inside Docker.

# Start the container
docker compose up -d

# Run all tests
docker compose exec pinmoli npx vitest run

# Unit tests only (~1s)
docker compose exec pinmoli npx vitest run test/unit/

# Integration tests
docker compose exec pinmoli npx vitest run test/integration/

# Live tests (hits real SIP endpoints, requires network)
docker compose exec pinmoli npx vitest run test/live/

# Type-check
docker compose exec pinmoli npx tsc --noEmit

# Lint
docker compose exec pinmoli npm run lint

Troubleshooting

Port 5060 already in use

Only one process can bind the SIP port. Kill the conflicting process inside the container:

docker compose exec pinmoli sh -c 'kill $(lsof -ti:5060)'

No RTP packets received

  1. NAT/firewall -- the host must be reachable on the RTP port advertised in SDP. Private IPs (WSL2 172.x, Docker 172.x) are not routable from the internet.
  2. No agent running -- the remote SIP endpoint accepted the call but has no worker to generate audio.

To work around NAT, run from a host with a public IP, or run Docker with network_mode: host.

503 Service Unavailable after 60s

This is usually a synthetic 503 generated by the sip npm library when the remote drops the TCP connection (e.g., LiveKit agent timeout). It's not a real SIP 503. Common causes:

  • AI agent worker not running on the remote side
  • Malformed SDP or unroutable IPs in headers
  • Missing ACK after 200 OK

LLM not responding

Check that your credentials are configured:

# If using a volume-mounted service account, verify it's accessible inside the container
docker compose exec pinmoli ls -la /app/secrets/my-key.json

# Or pass credentials via environment variable
docker run --rm -it --network host \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  ghcr.io/arakoodev/pinmoli

Contributing

# Fork and clone
git clone https://github.com/your-fork/pinmoli.git
cd pinmoli

# Build the container
docker compose build

# Run tests (must pass before submitting a PR)
docker compose up -d
docker compose exec pinmoli npx vitest run
docker compose exec pinmoli npx tsc --noEmit
docker compose exec pinmoli npm run lint

All commands run inside Docker -- the container includes ffmpeg, espeak, and other dependencies that aren't available locally.

License

MIT
