A binary protocol for transferring KV-cache and hidden states between LLM agents, eliminating redundant text re-processing in multi-agent systems. Across 7 benchmarks and 3 model families, AVP reports 73-78% token savings and a 2-4x speedup.
Agent Vector Protocol (AVP) is a binary protocol for LLM agent communication via latent representations. When two agents run the same model, AVP lets them exchange hidden states and KV-cache directly, skipping autoregressive text generation entirely. When agents run different models from the same family (e.g. Qwen2.5-1.5B and Qwen2.5-0.5B), AVP uses vocabulary-mediated projection to bridge between their latent spaces. When models are fully incompatible, agents fall back to JSON.
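The README does not spell out how vocabulary-mediated projection works. The sketch below shows one plausible reading, not AVP's actual Rosetta Stone implementation. It assumes both models share a tokenizer, and so a vocabulary. The source hidden state is pushed through the source model's output head to get a distribution over that shared vocabulary. The result is then the expected input embedding under that distribution in the target model. `lm_head_src`, `embed_tgt`, and `vocab_project` are hypothetical names, and the matrices are tiny stand-ins.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def vocab_project(h_src: Vector, lm_head_src: Matrix, embed_tgt: Matrix) -> Vector:
    """Map a source-model hidden state into the target model's embedding space.

    lm_head_src: vocab_size x d_src  (source output head, hypothetical)
    embed_tgt:   vocab_size x d_tgt  (target input embeddings, hypothetical)
    Assumes both models index the same vocabulary (same tokenizer).
    """
    # Distribution over the shared vocabulary, as seen by the source model.
    probs = softmax(matvec(lm_head_src, h_src))
    d_tgt = len(embed_tgt[0])
    # Expected target embedding: sum over tokens v of p(v) * E_tgt[v].
    return [sum(p * row[j] for p, row in zip(probs, embed_tgt)) for j in range(d_tgt)]
```

With a 3-token toy vocabulary, a source hidden state that strongly favours token 0 maps to almost exactly token 0's target embedding. A zero hidden state maps to the average of all target embeddings.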
AVP is transport-agnostic -- it defines the binary format, handshake, and codec, not the transport. The reference implementation uses HTTP/2, but AVP messages can be carried over A2A, MCP, gRPC, WebSockets, or any channel that supports binary payloads. AVP handles the latent communication layer, not discovery or orchestration.
- Handshake -- Agents exchange model identity (architecture, dimensions, weight hash, tokenizer hash)
- Resolve -- Same model: latent mode. Same family: cross-model projection. Otherwise: JSON fallback.
- Communicate -- Latent mode: binary tensor payloads. Cross-model: projected hidden states. JSON mode: text messages.
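The resolve step can be sketched as a small decision function over the two handshake identities. The fields follow the handshake list above. Two things here are assumptions of this sketch rather than documented AVP behaviour: the extra `family` field, and the rule that cross-model mode needs a matching tokenizer hash (so that a shared vocabulary exists to project through). `ModelIdentity`, `Mode`, and `resolve_mode` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    LATENT = "latent"            # same model: send hidden states / KV-cache
    CROSS_MODEL = "cross_model"  # same family: vocabulary-mediated projection
    JSON = "json"                # incompatible: plain-text fallback


@dataclass(frozen=True)
class ModelIdentity:
    # Mirrors the handshake fields listed above; `family` is an assumed extra field.
    architecture: str
    family: str
    hidden_dim: int
    num_layers: int
    weight_hash: str
    tokenizer_hash: str


def resolve_mode(local: ModelIdentity, remote: ModelIdentity) -> Mode:
    """Pick a communication mode from two handshake identities (hypothetical rules)."""
    if local == remote:
        # Identical architecture, dimensions, weights and tokenizer.
        return Mode.LATENT
    if local.family == remote.family and local.tokenizer_hash == remote.tokenizer_hash:
        # Different weights, but a shared vocabulary to project through.
        return Mode.CROSS_MODEL
    return Mode.JSON
```

For example, two Qwen2.5 agents with different sizes but the same tokenizer hash resolve to `Mode.CROSS_MODEL`. Pairing either one with an agent from another family falls back to `Mode.JSON`. Comparing frozen dataclasses with `==` keeps the same-model check strict: every advertised field must match.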
In a standard agent-to-agent exchange, each message requires full autoregressive generation (token-by-token decoding). For same-model agents, this is redundant -- the receiving agent already operates in the same representation space. AVP eliminates this step by transmitting intermediate hidden states and KV-cache directly.
AVP uses a compact 12-byte header followed by protobuf metadata and raw tensor bytes:
Bytes 0-1: Magic (0x4156 = "AV")
Byte 2: Version (0x01)
Byte 3: Flags (compressed, hybrid, has_map, kv_cache)
Bytes 4-7: Payload length (uint32 LE)
Bytes 8-11: Metadata length (uint32 LE)
Bytes 12..N: Protobuf metadata (N = 12 + metadata length)
Bytes N..: Raw tensor bytes
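The layout above maps directly onto Python's `struct` module. The sketch below encodes and decodes one message, treating the protobuf metadata as opaque bytes. Several details are assumptions because the layout does not pin them down, and are labelled in the code:

- The magic is written as the raw bytes `b"AV"`.
- The bit position of each flag is guessed.
- "payload length" is taken to mean the length of the raw tensor bytes only.

`encode`, `decode`, `AVPMessage`, and the `FLAG_*` constants are hypothetical names, not the SDK's API.

```python
import struct
from typing import NamedTuple

MAGIC = b"AV"   # 0x41 0x56, i.e. 0x4156 read big-endian (assumption)
VERSION = 0x01
# Hypothetical bit assignments: the layout names the flags but not their bits.
FLAG_COMPRESSED = 0x01
FLAG_HYBRID = 0x02
FLAG_HAS_MAP = 0x04
FLAG_KV_CACHE = 0x08

# magic (2 raw bytes), version (u8), flags (u8), payload len (u32 LE), metadata len (u32 LE)
HEADER = struct.Struct("<2sBBII")
assert HEADER.size == 12


class AVPMessage(NamedTuple):
    version: int
    flags: int
    metadata: bytes  # serialized protobuf, kept opaque here
    payload: bytes   # raw tensor bytes


def encode(metadata: bytes, payload: bytes, flags: int = 0) -> bytes:
    """Prepend the 12-byte header to metadata and tensor bytes."""
    header = HEADER.pack(MAGIC, VERSION, flags, len(payload), len(metadata))
    return header + metadata + payload


def decode(buf: bytes) -> AVPMessage:
    """Validate the header and slice out metadata and payload."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer shorter than the 12-byte header")
    magic, version, flags, payload_len, meta_len = HEADER.unpack_from(buf)
    if magic != MAGIC:
        raise ValueError(f"bad magic {magic!r}")
    if version != VERSION:
        raise ValueError(f"unsupported version {version}")
    meta_end = HEADER.size + meta_len   # N in the layout above
    end = meta_end + payload_len
    if len(buf) < end:
        raise ValueError("truncated message")
    return AVPMessage(version, flags, buf[HEADER.size:meta_end], buf[meta_end:end])
```

As a size check, a single 1536-dimensional float32 hidden state carries 1536 × 4 = 6144 payload bytes. The full message is therefore 12 + metadata length + 6144 bytes.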
Version: 0.2.2
Current scope: same-model latent communication and same-family cross-model communication via vocabulary-mediated projection (Rosetta Stone v2). Cross-family communication via learned projection maps is experimental.
- Python SDK -- `pip install avp` (v0.2.2). Easy API (`pack()`/`unpack()`/`generate()`), connector API (`think()`/`generate()`/`AVPContext`), `ContextStore`, observability metrics, codec, handshake, session management, realignment, KV-cache serialization, Rosetta Stone cross-model projection, HuggingFace + vLLM connectors, HTTP/2 transport, 7 benchmark suites (377 tests)
AVP is complementary to existing agent protocols:
- A2A -- AVP provides a transport binding for A2A via `multipart/related` with binary payloads
- MCP -- MCP handles tools and context; AVP handles tensor transfer between agents
- vLLM -- AVP integrates via a `KVConnectorBase_V1` plugin for production serving
- HuggingFace Transformers -- Full hidden state and KV-cache access for development and benchmarking
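The A2A binding is described only as `multipart/related` with binary payloads. The sketch below shows what such a body could look like under RFC 2387, using only the standard library. It assumes a JSON root part holding the A2A message, followed by a single `application/octet-stream` part holding the AVP binary message. The function name and the Content-IDs `<a2a-root>` and `<avp-payload>` are hypothetical, not taken from the A2A or AVP specs.

```python
import json
import uuid
from typing import Tuple


def build_multipart_related(a2a_message: dict, avp_message: bytes) -> Tuple[str, bytes]:
    """Wrap an A2A JSON message and an AVP binary message in multipart/related."""
    boundary = uuid.uuid4().hex
    # The boundary must not occur inside any part; regenerate on the rare collision.
    while boundary.encode("ascii") in avp_message:
        boundary = uuid.uuid4().hex
    parts = [
        ("application/json", "<a2a-root>", json.dumps(a2a_message).encode("utf-8")),
        ("application/octet-stream", "<avp-payload>", avp_message),
    ]
    chunks = []
    for content_type, content_id, data in parts:
        headers = (
            f"--{boundary}\r\n"
            f"Content-Type: {content_type}\r\n"
            f"Content-ID: {content_id}\r\n\r\n"
        )
        # Binary data is appended verbatim; no base64 step is needed over HTTP.
        chunks.append(headers.encode("ascii") + data + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("ascii"))
    # RFC 2387: `type` names the root part's media type, `start` its Content-ID.
    content_type = (
        f'multipart/related; boundary="{boundary}"; '
        f'type="application/json"; start="<a2a-root>"'
    )
    return content_type, b"".join(chunks)
```

The returned string goes in the HTTP `Content-Type` header and the bytes form the request body. Keeping the tensors in their own raw part avoids base64-inflating them inside JSON.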
Based on LatentMAS: Latent Collaboration in Multi-Agent Systems -- same-model latent communication via hidden state transfer and KV-cache sharing, with realignment for untied-weight models.
See CONTRIBUTING.md for contribution guidelines.
Licensed under Apache 2.0.