RelayOne/deeptap

DeepTap

Agent-native web search. Three depth modes on one endpoint. Fixed prices. Transparent firewall. The same search backbone Anthropic chose for Claude.


What Is DeepTap?

DeepTap is a web search API built specifically for AI agents. It lives between shallow SERP wrappers like Tavily and Serper on one end, and slow deep-research products like Perplexity Sonar Deep Research on the other. One endpoint, three depth modes: a fast single-pass lookup when you need speed, a gap-analysis round-trip when you need completeness, and a multi-iteration research loop when you need thorough coverage. All three carry the same response envelope. All three return with a predictable, fixed credit cost. No dynamic pricing, no 4-to-250 credit surprise bills, no context-size multipliers.

Underneath the public search layer sits a compounding fact cache with decay models. Every extracted page is distilled into structured claims with provenance, evidence counts, source diversity, and a freshness decay curve tuned to the fact type (permanent, slow-decay, moderate-decay, fast-decay, volatile). When the next agent asks a question that has already been answered on the public web, DeepTap answers in under fifteen milliseconds for a tenth of a credit instead of triggering a fresh search. The cache is self-healing: organic query traffic constantly re-verifies the most popular facts, and a priority-scored River job queue re-verifies the rest inside a bounded daily budget.

DeepTap is agent-native by construction. Retail developers authenticate with traditional API keys and pay through Stripe subscriptions. Agents with no account pay per-call with x402 crypto micropayments or per-session with Merchant Payments Protocol (MPP) credentials bound to Demonstrating Proof of Possession (DPoP) keys. Portfolio and enterprise customers verify through TrustPlane with a SPIFFE Verifiable Identity Document, which routes billing to a double-entry ledger instead of Stripe. The API speaks Model Context Protocol (MCP) for Claude Desktop and Claude Code, Agent-to-Agent Protocol (A2A) v1.0 for agent marketplaces, and a Tavily-compatible shim for frictionless migration. For customers who cannot send queries to a public cloud, DeepTap ships a local-node Docker Compose deployment with mutual TLS, delta sync, and an offline fact-lookup mode that continues serving when the cloud is unreachable.

The production architecture is a pure Go API binary backed by a Python 3.11 NLP sidecar that exposes seven gRPC remote procedure calls. All machine-learning inference runs in the sidecar, never in the Go process: reranking via ms-marco-MiniLM-L6-v2 INT8, embedding via bge-small-en-v1.5, natural-language-inference grounding via MiniCheck-Flan-T5-Large, entity linking via ReFinED, and prompt-injection scoring via Prompt Guard 2 86M. Bootstrap deployments swap the sidecar for managed inference APIs (Cohere Rerank, Voyage Embed, Cloud Run Trafilatura) behind stable Go interfaces, giving the project scale-to-zero economics from day zero and an interface-compatible migration path to self-hosted infrastructure once volume justifies it.


Why DeepTap?

The agent search market is actively destabilizing. In February 2026 Nebius acquired Tavily for roughly 275 million US dollars, leaving an install base of Tavily customers facing the usual post-acquisition uncertainty: price changes, feature freezes, support regressions, product reprioritization toward the acquirer's roadmap. In December 2025 Google sued SerpApi for Digital Millennium Copyright Act (DMCA) section 1201 violations, marking the first major legal action against Google-SERP-scraping search APIs and creating existential legal risk for every competitor built on the same scraping pattern. Bing's search API was deprecated. Google's Custom Search API remains locked behind low quotas and per-query fees that do not scale for agent workloads. The market is smaller than it looks and the legally clean vendors are smaller still.

DeepTap is built on Brave Search as the default provider. Brave is the same independent index that Anthropic chose for Claude's web search, with confirmed 86.7% result overlap versus Google on evaluated queries. It is licensed, not scraped, and carries no DMCA 1201 exposure. Serper remains available as an opt-in provider_class=fast tier for customers who explicitly acknowledge the legal tradeoff at the API-key level. Defaulting to Brave is the single most important provider decision in the product: it is the one that makes DeepTap safe to integrate into a regulated pipeline without a lawyer conversation.

Pricing is the second wedge. Tavily Research calls can consume anywhere from 4 to 250 credits depending on opaque internal heuristics; Perplexity Sonar Deep Research lists $410 to $1,320 per 1,000 calls and takes 30 seconds to 2 minutes per answer; Exa Auto runs roughly $7 per 1,000 calls. DeepTap charges exactly 1 credit for depth=1 safe, exactly 3 credits for depth=2 safe, exactly 8 credits for depth=3 safe, exactly 0.1 credit for a fact lookup, exactly 0.5 credit for an extract excerpt. Every response includes credits_used. You can budget an agent deployment on a spreadsheet without instrumentation.

The third wedge is transparency on the firewall. Tavily markets an agent-native prompt-injection firewall; its implementation is proprietary, returns no injection score, no reasons array, and no sanitized-content field. The defender cannot audit what was blocked. DeepTap implements a layered firewall and exposes every scoring signal in the response envelope: prompt_injection_score (0.00 to 1.00), unsafe_reasons[] (machine-readable tags), sanitized_content (extracted text with injection vectors stripped), trusted_snippet (provider-supplied metadata that never touched untrusted HTML), and untrusted_content (the raw-extracted text kept separate so security teams can make their own call). The joint OpenAI/Anthropic/Google study from October 2025 (Nasr et al.) confirmed that every published prompt-injection detector was bypassed above 90% under adaptive attack, meaning detectors are scoring signals, not gates. DeepTap layers rule-based pre-extraction stripping in Go (Unicode tag chars, zero-width chars, CSS-hidden text, HTML comments, aria/meta injection vectors, off-screen positioning tricks) before any ML model touches the content, then Prompt Guard 2 86M in the sidecar as a scoring layer, then explicit trusted/untrusted splits in the response so the security team downstream can apply its own policy.


Key Features

Configurable depth, one endpoint (plus a streaming endpoint for the deep tier)

POST /v1/search accepts a depth field of 1, 2, or 3. Depth 1 runs a single RunStage pipeline (Fanout -> per-URL fetch + Layer 1 strip + Trafilatura extract + Layer 2 score -> Rerank) with automatic query decomposition; the 95th-percentile latency target is under 7 seconds and the hard timeout is DEEPTAP_DEPTH1_TIMEOUT (default 7s). Depth 2 adds one reflection round against anthropic/claude-sonnet-4.6 with a JSON-schema structured output that asks "what is missing from these findings?", issues a targeted second stage on the gaps, merges the two rounds with a score-max dedup on normalized URL, and re-sorts by rerank_score; the 95th-percentile target is under 20 seconds and the hard timeout is DEEPTAP_DEPTH2_TIMEOUT (default 18s). Depth 3 is served on a separate streaming endpoint (see below) because a two-minute blocking request is unusable in an agent loop; POST /v1/search with depth=3 returns 400 use_research_endpoint pointing callers at POST /v1/research. The envelope carries depth, rounds_executed, stop_reason, and (when include_ledger=true) the facet ledger.

Depth-3 streaming research endpoint POST /v1/research

The deep-research tier is a Server-Sent Events stream, not a blocking request. POST /v1/research emits a documented sequence of events as each round of the facet-ledger-guided loop completes: round_start ({round, sub_queries[]}), partial_results ({round, results[]} after each stage rerank), facet_update ({round, facet, coverage} as each facet's coverage moves), reflection ({round, gaps[], stop, model, latency_ms} after each reflector call), saturation ({reason} when the loop decides to stop), final (the full envelope including ledger, rounds_executed, stop_reason), error (RFC 7807-shaped), and ping ({}) every DEEPTAP_SSE_HEARTBEAT (default 15s) as a keep-alive heartbeat. The loop runs up to DEEPTAP_DEPTH3_MAX_ROUNDS (default 4) rounds under DEEPTAP_DEPTH3_TIMEOUT (default 110s, SLO 120s p95) and stops on one of five precedence-ordered conditions: hard timeout with 2s grace; marginal-lift saturation (Saturated(DEEPTAP_SATURATION_DELTA default 0.05, consecutive=2)); reflector LLM-stop advisory; max rounds; reflector error. A single writer goroutine fed by a bounded channel owns the flusher so event ordering is deterministic. Client disconnect cancels the orchestrator context. The usage_ledger row (8.0 safe / 4.0 fast) is written AFTER the final event, so a mid-stream error never bills.

Facet ledger: bounded, auditable research coverage

Depth 2 and depth 3 share a facet ledger defined in internal/depth/ledger.go. SeedFacets(query) walks a small keyword table to seed facet names: vs/versus -> comparative, history of/origin of/when was -> historical, what is/define/definition of -> definitional, current/latest/today/now -> current-status, no match -> general. Each sub-query is attributed to a facet at dispatch time (via the decomposer's new facets schema extension, or the seed fallback). After every stage the ledger accumulates each result's rerank_score against its attributed facet, normalizes by DefaultFacetSaturation=5.0, caps at 1.0 per facet, and averages across facets to produce overall Coverage(). The ledger exposes MarginalLift(consecutive) and Saturated(delta, consecutive) so the composer can decide when to stop on evidence-gathering plateau. When include_ledger=true is set on the request and server config allows it, the envelope carries the full {facets[{name, coverage, sub_queries[], rounds[], docs_scored}], rounds, coverage_history[]} JSON view so agent-builders can audit exactly which facets were researched, how thoroughly, and across which rounds.

Automatic query decomposition via OpenRouter

Every POST /v1/search request that does not already carry a caller-supplied sub_queries array is routed through a hand-written stdlib OpenRouter chat-completions client that decomposes the user's question into N diverse sub-queries under JSON-schema structured output. The model slug is chosen by depth: depth=1 uses anthropic/claude-haiku-4.5 for cheap, fast decomposition; depth=2 and depth=3 use anthropic/claude-sonnet-4.6 for the harder research questions. The decomposer runs under a 10-second handler-level timeout. Responses carry a decomposition object with the resolved model, upstream inference provider, generation_id, sub_queries[], tokens_prompt, tokens_completion, cost_usd, and latency_ms so callers can audit and meter the LLM call per request. On any decomposition failure (timeout, breaker open, policy violation, parse error) the handler logs a warning, adds decomposition_failed to warnings[], and falls back to the single-query path so the request still returns a useful answer.

LLM policy enforcement with Secure-SKU Zero Data Retention

Every organization has an organizations.llm_policy JSONB column that the handler loads into a typed LLMPolicy on every request: require_zdr, require_data_collection_deny, providers[] allowlist, models_allowed[] allowlist, max_tokens clamp, optional temperature. When the organization's tier is secure, the loader unconditionally clamps require_zdr=true, require_data_collection_deny=true, and defaults the provider allowlist to ["anthropic"] if empty. The three controls are then materialised directly into the OpenRouter request body as provider.zdr=true, provider.data_collection="deny", and provider.order=<allowlist> so a Secure-tier query cannot route to a log-retaining inference provider even if OpenRouter's default would have picked one. Malformed policy JSON does not fail open; the loader errors, the handler logs it as policy_load_failed in warnings[], and the decomposer runs against a zero-value permissive baseline so the degraded request still makes progress.

Freshness classification on every query

A deterministic Go classifier at internal/freshness/ labels every query as volatile, daily, standard, or stable before it reaches the provider. The classifier applies NFC normalization and lowercasing, extracts 4-digit year tokens to decide historical vs. current-year signals, and walks a fixed sequence of keyword unions and structural patterns (price-or-score live, weather live, status live, breaking, is-still, future-event, current-role, stable-pattern). The result populates freshness_class and freshness_reason on every response, bumps the deeptap_freshness_class_total{class} Prometheus counter, and is read by the S11 caching layer to choose a cache TTL (volatile 5 minutes, daily 1 hour, standard 4 hours, stable 24 hours). No network call; pure Go regex evaluation.

Four-tier Redis result cache with freshness-driven TTLs

S11 wires a four-tier cache behind internal/cache/: full-envelope, per-sub-query, per-URL extract, and a fact-cache hook reserved for S21. Keys use a shared v1: prefix plus a sha256 hash over NFC-normalized + lowercased + whitespace-collapsed inputs (with optional leading-article stripping behind DEEPTAP_CACHE_STRIP_LEADING_ARTICLES). FullKey composes the normalized query plus depth plus provider class plus country plus language plus safe mode so depth-1 safe and depth-2 fast for the same question never collide. TTLs come from DetermineTTL(class) with MinFullTTL=60s and MinExtractTTL=15m floors and a 1-hour sub-query cap. The depth orchestrator calls LookupFull at the top of every depth, falls through to NoopFactCache.Lookup on miss (S21 will replace the noop), then wraps the pipeline in a Singleflighter under DEEPTAP_CACHE_SF_TIMEOUT so concurrent identical requests collapse into one upstream run. ResearchStage inside each round batch-MGets the sub-query tier and writes back on success; the per-URL extract tier short-circuits the fetch and Trafilatura call on hit but still re-scores the cached text through the firewall so a model update defends against stale injection classifications. Envelopes carry cache_hit, cache_hit_type in {full, subquery, extraction, fact, miss}, and cache_keys_hit[]; /v1/research emits a cache_hit SSE event when the full-tier cache short-circuits the loop before round 1. Billing is unchanged: a full-tier hit still writes 1.0 / 0.5 / 3.0 / 1.5 / 8.0 / 4.0 credits at the normal rate (customer value is the answer, not the path).

Cross-instance DMCA invalidation inside one second

Every StoreFull also adds the FullKey to a byurl:{sha256} and a bydomain:{sha256} Redis SET for each result URL and domain the envelope referenced, inside a single Redis pipeline. invalidator.go opens a dedicated pgx.Conn, subscribes to deeptap_cache (single-key eviction) and deeptap_dmca_suppress (URL + domain fan-out), and on a suppress NOTIFY looks up the reverse-index sets and deletes every cache key named there on the local Redis. Every instance of DeepTap runs the same listener, so a single pg_notify publishes to the fleet and suppression lands in under a second end-to-end (tracked as deeptap_cache_invalidation_latency_seconds). The listener is wrapped in a panic-recover so a malformed payload cannot take down the subscription.

Negative-cache tombstones against adversarial pages

When any per-result prompt_injection_score is at or above UnsafeScoreThreshold, the real envelope is NOT stored. A 60-second tombstone lands at the same FullKey carrying an empty-results envelope with cache_hit=true, cache_hit_type="full", and the surfaced unsafe_reasons[]. Hot retries against genuinely adversarial pages hit the tombstone and return immediately instead of re-driving the pipeline. The tombstone expires naturally after 60 seconds; deeptap_cache_evictions_total tracks normal evictions but not tombstone expiries.

Per-API-key GCRA rate limiting with atomic Redis Lua enforcement

S12 wires a Generic Cell Rate Algorithm limiter at internal/ratelimit/ backed by a 40-line atomic Lua script embedded via go:embed. The script reads the bucket's Theoretical Arrival Time (TAT), advances it by rate_period_ms = 1000 / tier.CPS, compares against new_tat - burst * rate_period_ms, and either admits with the new TAT SET-ed under a burst * rate_period_ms * 2 TTL or denies with retry_after_ms and reset_ms. The compare-and-swap happens inside a single Redis script execution, so a 100-goroutine-vs-50-burst test admits exactly 50 and denies exactly 50. Script.Load runs at boot with a WARN-log-on-fail fallback (NOSCRIPT recovery inside Allow re-loads lazily); each Allow runs under a 50 ms per-op deadline with a NOSCRIPT retry once. The middleware is mounted AFTER APIKey and BEFORE the cache layer so a cache hit still counts against the budget. On admission it stamps X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset (Unix seconds) on every response, and stashes the admission *Decision + resolved Tier on context so /v1/search, /v1/extract, and /v1/research envelopes stamp a rate_limit: {limit, remaining, reset_ms} block from the same values. On denial the middleware writes a 429 with an application/problem+json body: type=https://deeptap.ai/errors/rate_limited, retry_after_ms, limits.rate{limit, remaining, reset_ms}, and a Retry-After header (ceiling of retry_after_ms / 1000, floored at 1 second). Tier table from specs/PROJECT-CONTEXT.md: Free 10 cps / 20 burst, Starter 50 / 100, Growth 200 / 400, Scale 500 / 1000, Secure 500 / 1000, Enterprise 1000 / 2000. Per-API-key overrides live on api_keys.rate_limit_override: a positive override replaces CPS and sets Burst = 2 * CPS regardless of DEEPTAP_RATELIMIT_BURST_2X. Redis error fails OPEN: the middleware admits the request, records the fail-open in deeptap_ratelimit_redis_errors_total, and labels the deeptap_ratelimit_requests_total{tier, outcome} counter with redis_error or redis_timeout. 
Four env vars: DEEPTAP_RATELIMIT_ENABLED (default true), DEEPTAP_RATELIMIT_BURST_2X (default true), DEEPTAP_CONCURRENCY_ENABLED (default true), DEEPTAP_CONCURRENCY_DEPTH3_OVERRIDE (default 0).

Per-org concurrency caps with depth=3 bucket protecting OpenRouter quota

S12 also wires a separate per-organization in-flight counter at internal/middleware/concurrency.go. The Redis key is v1:conc:<org_id>:<bucket> with two buckets: depth3 for /v1/research (Free 2, Starter 5, Growth 10, Scale 25, Secure 25, Enterprise 50) and general for every other authenticated /v1/* endpoint (Free 5, Starter 20, Growth 50, Scale 100, Secure 100, Enterprise 200). A single depth=3 research request holds an HTTP connection for up to two minutes and issues multiple OpenRouter calls per round; a runaway agent firing 100 depth=3 requests in parallel can drain an OpenRouter credit pool in minutes. The depth3 bucket stops that before it starts. The middleware INCRs the counter on entry, compares against the tier limit (or DEEPTAP_CONCURRENCY_DEPTH3_OVERRIDE for depth=3 when that env var is positive), and atomically DECRs + returns 503 with type=https://deeptap.ai/errors/concurrency_exhausted + limits.concurrency{bucket, limit, in_flight} on saturation. The release runs inside a defer rdb.Decr(context.Background(), key) so a handler panic still returns the slot: Go's defer semantics run the block during stack unwind, and the Recoverer middleware above the chain catches the panic AFTER the DECR has already fired. A test mounts Concurrency downstream of Recoverer + a panicking handler, runs 50 concurrent requests, and asserts the Redis counter returns to zero. Redis error on INCR fails OPEN. The operational override lets on-call engineers tighten depth=3 during an incident without a code deploy: DEEPTAP_CONCURRENCY_DEPTH3_OVERRIDE=3 drops every tier's depth=3 bucket to 3 until the override is cleared.

Compounding fact cache with decay models

Every extracted page feeds a Kafka-driven fact-extraction worker that produces structured claims (subject, predicate, object) with entity linking to Wikidata QIDs, NLI grounding via MiniCheck, and a per-claim confidence score. Facts are classified into five decay buckets: permanent (zero decay), slow_decay (e.g., historical dates, decay rate 0.001), moderate_decay (e.g., executive biographies, decay rate 0.005), fast_decay (e.g., quarterly financials, decay rate 0.02), and volatile (e.g., stock prices, decay rate 0.1). Effective confidence is base_confidence * exp(-decay_rate * days_since_confirmed). Queries that match a high-confidence cached fact return in under 15 milliseconds for 0.1 credits. This is not response caching; it is a structured, domain-level knowledge base that compounds across every customer's query volume.

Consensus-based trust scoring

Every fetched page is attributed to a domain. Every fact carries evidence from specific source pages with stance markers (supports, contradicts, neutral) and an NLI score. A nightly River batch job computes consensus_ratio = facts_confirmed_by_others / (facts_confirmed_by_others + facts_contradicted_by_others) per domain, assigns a trust tier (authoritative, reliable, mixed, unreliable, adversarial, unknown), and flags suspicious Autonomous System Number (ASN) clusters where three or more newly registered domains publish coordinated content from the same hosting provider. Reranking boosts authoritative domains and penalizes adversarial ones. Unknown domains receive no boost or penalty so the system does not punish legitimate new sites.

Client domain indexing (private docs searchable next to the public web)

Customers upload PDFs, Microsoft Word documents, Markdown, HTML, Comma Separated Values files, and JavaScript Object Notation payloads through a drag-and-drop uploader, an Amazon Simple Storage Service sync connector, a Google Cloud Storage sync connector, or a direct API push. Documents are parsed by a separate document-parser sidecar (avoiding the Affero General Public License on PyMuPDF by using pypdfium2 plus pdfplumber for PDFs and python-docx plus mammoth for Word files), chunked, embedded via bge-small-en-v1.5, and stored with row-level security enforced by set_config('app.current_tenant', $1, true). Client-domain results appear inline with public-web results labeled [PRIVATE]. Customer A's documents are invisible to Customer B at the database tier.

Prompt-injection firewall with transparent scoring

Layer 1 (Go, pre-extraction, at internal/firewall/strip.go + patterns.go): strip 13 documented patterns before Trafilatura sees the HTML: unicode_tag_chars (U+E0000 through U+E007F), zero_width, bidi_override, html_comment, css_display_none, css_visibility_hidden, css_font_size_zero, css_opacity_zero, css_text_indent_offscreen, css_position_offscreen, meta_injection, aria_hidden (gated by DEEPTAP_STRIP_ARIA_HIDDEN), and script_style. Layer 2 (Python sidecar, post-extraction, via the ScoreInjection gRPC RPC at internal/firewall/sidecar_scorer.go): score every extracted document with meta-llama/Prompt-Guard-2-86M and return a numeric score plus heuristic reasons; a NoopScorer bootstrap fallback at noop_scorer.go returns zero so operators can ship without standing up the sidecar. Layer 3 (response mutator at internal/firewall/safe_mode.go): when the request carries safe_mode: "agent", null untrusted_content on every result at the HTTP boundary and stamp safe_mode_applied: "agent" on the envelope. The envelope surfaces prompt_injection_score, unsafe_reasons[], sanitized_content_bytes, untrusted_content_bytes, and a firewall block carrying layer1_stripped_bytes, layer1_patterns_matched[], layer2_model, layer2_implementation, layer2_latency_ms, layer2_docs_scored. Every per-result object carries sanitized_content, untrusted_content, trusted_snippet, prompt_injection_score, and unsafe_reasons. Security teams can audit what was flagged, which patterns fired, which model ran, how long it took, and what it scored. The research is explicit that detectors are signals, not gates (Nasr et al., October 2025, joint OpenAI/Anthropic/Google study: every published detector bypassed above 90% under adaptive attack); the security guarantee DeepTap ships is transparency. Tavily's firewall returns none of these fields; DeepTap tells you what it saw so your agent can decide what to trust.

Agent micropayments via x402 and MPP

No account required. An agent sends a request without authentication, receives a 402 Payment Required response with a payment challenge, settles a United States Dollar Coin (USDC) transaction on Base mainnet at address 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913 (or Base Sepolia for testing at 0x036CbD53842c5426634e7929541eC2318f3dCF7e), and retries the request with the payment signature in the X-PAYMENT-SIGNATURE header. The Coinbase CDP facilitator verifies and settles. Merchant Payments Protocol (MPP) layers on top: its charge scheme is wire-compatible with x402 exact, and a Go server using coinbase/x402/go automatically accepts MPP charge traffic through the Authorization: Payment header. MPP sessions add DPoP-bound session credentials for streaming micropayments in long agent loops.

Protocol-native: MCP plus A2A

DeepTap ships deeptap-mcp, a binary that serves MCP over stdio for Claude Desktop, Claude Code, and Cursor, and over Streamable HyperText Transfer Protocol (HTTP) for remote authenticated agents. Three tools are exposed: deeptap_search, deeptap_extract, deeptap_facts. The deeptap_extract tool carries the same attestation gate as the REST handler: mode=full requires an attestation and is rejected at the MCP layer before any REST call runs. The binary accepts --transport stdio|http; the HTTP transport mounts at /mcp + /mcp/ with /healthz served on a separate path so probes do not hit the authenticated route, and requires an Mcp-Api-Key header enforced by APIKeyMiddleware. Server-Sent Events transport is deprecated in MCP specification 2025-11-25 and is rejected explicitly. Copy-paste configuration snippets live in docs/integrations/claude-desktop.md, docs/integrations/claude-code.md, and docs/integrations/cursor.md. On the A2A side, DeepTap uses a2a-go/v2 v2.0.1 against the v1.0 specification (breaking from v0.3: new .well-known/agent-card.json path, new TASK_STATE_* enums, new google.rpc.Status error shape) to publish an agent card and serve task create, task status, and task stream endpoints.

Tavily compatibility shim

cmd/deeptap-tavily-shim/ listens on port 8082 and accepts Tavily-wire POST /search, POST /extract, and POST /map. A Tavily customer points their SDK's base URL at tavily.deeptap.ai (or the locally-deployed shim) and receives responses projected back into Tavily's exact field names and units: response_time as a decimal number, follow_up_questions always null, images passthrough when requested, and usage.credits on every response. Every field-level translation lands in an X-DeepTap-Compat-Notes response header so integration teams can audit exactly what was adapted. Callers that send X-DeepTap-Compat-Mode: strict receive 409 Conflict with a JSON body explaining which notes would have fired instead of a silently-translated 200 response. An embedded 255-entry country lookup resolves Tavily's country field for 195 ISO-3166 alpha-2 codes plus 60 common aliases (usa, uk). The shim calls the DeepTap REST API at DEEPTAP_INTERNAL_URL with DEEPTAP_SHIM_KEY as its bearer so the shim is accountable separately in usage_ledger.

POST /v1/map URL discovery

POST /v1/map discovers every URL on a starting domain. A two-phase orchestrator at internal/mapsvc/ runs sitemap discovery first (reading robots.txt Sitemap: directives, then root sitemap.xml) and escalates to a bounded HTML crawl when sitemap yield is below DEEPTAP_MAP_HTML_MIN_YIELD=10. The response returns results[] plus a sources breakdown (robots_sitemap_urls, sitemap_urls, html_crawl_urls, pages_fetched) plus dropped plus truncated plus credits_used, so callers see exactly how each URL landed in the response. Configurable max_depth, max_breadth, limit, allow_external, include_subdomains, and exclude[] parameters bound the crawl cost precisely. Credit pricing matches search: 1.0 safe, 0.5 fast, one usage_ledger row per request.

Hybrid local/cloud (enterprise local-node deployment)

The local-node release is a distroless Go binary plus Postgres 17 with pgvector 0.8.2 plus a lightweight Python embedding sidecar, packaged as Docker Compose and Helm. Mutual TLS with a per-customer self-signed Certificate Authority is provisioned during setup. Public facts synchronize from cloud to local node via HTTP with Hash-based Message Authentication Code-signed cursors and idempotency keys (around 10,000 facts per day). Customer embeddings and fact metadata synchronize local to cloud, opt-in and defaulted off for European Union customers. Dual-key Ed25519 envelope signing binds requests to a customer-specific key. Offline mode keeps local fact lookup, pgvector semantic search, and embedding operational; public search and language-model calls fail with an RFC 7807 Problem+JavaScript Object Notation error that the agent can reason about. Telemetry is pushed over OpenTelemetry Protocol on HyperText Transfer Protocol Secure port 443 only, because enterprise networks will not open the OpenTelemetry default port.

Bootstrap-grade cost envelope

The managed-bootstrap deployment runs on Fly.io Machines with scale-to-zero, Neon Postgres with scale-to-zero, Upstash Redis pay-per-request, Upstash Kafka pay-per-message, ClickHouse Cloud serverless, Vercel for the dashboard, and Cloud Run for Trafilatura extraction. Inference is outsourced to Cohere Rerank and Voyage Embed behind the Go Reranker and Embedder interfaces. Cost at zero traffic is approximately 10 cents per month. Cost at 1,000 requests per day is approximately 38 US dollars per month. Gross margin from the first paying customer is 87% or higher. Transition to self-hosted sidecar happens when volume exceeds 50,000 requests per day, at which point self-hosting wins on unit economics.


Quick Start

DeepTap's build is complete: all 28 sessions have shipped. Phase 1 (Core Search Platform, S01-S18), Phase 2 (Fact Cache, S19-S22), and Phase 3 (Knowledge Layer, S23-S28) cover: Foundation; Data Layer + Append-Only Ledger; Search Provider Adapters; Fan-Out + Dedup + URL Normalization; Go-Owned Fetch + Trafilatura Sidecar; Playwright Pool + Domain Strategy Cache; Query Decomposition + LLM Policy; Reranking + Embeddings via Python Sidecar; Prompt Injection Firewall; Depth Modes + Facet Ledger; Caching + Freshness TTLs; Rate Limiting + Concurrency; MCP Server + Tavily Shim; A2A + Payment Middleware with Idempotency; TrustPlane Integration with Fallback; Billing Engine; Postmark Email + DMCA + Dashboard UI; Fact Data Model + Extraction; Staleness Model + Re-Verification; Fact Cache Query Integration; Fact Cache Analytics + Warming; Semantic Source Index + pgvector; Consensus Trust Scoring; Rapid Fact API; Client Domain Indexing; Local Node MVP; and Knowledge Layer Analytics. All twenty-eight session specs and the bootstrap-hosting plan are frozen.

# Clone and boot the full dev stack
git clone https://github.com/RelayOne/deeptap
cd deeptap
make dev                 # default: boots deeptap, nlp-sidecar, playwright-pool, postgres,
                         # redis, clickhouse, prometheus, grafana, otel-collector
                         # (River is the default event bus, no Kafka required)

make dev-kafka           # opt-in: same as make dev plus Redpanda (Kafka profile)
                         # sets DEEPTAP_EVENT_BUS=kafka

# Apply database migrations (requires golang-migrate installed via
# `go install -tags postgres github.com/golang-migrate/migrate/v4/cmd/migrate@latest`)
make migrate-up          # applies all 12 migrations (extensions, orgs, api_keys,
                         # append-only usage_ledger with 24 monthly partitions pre-created,
                         # facts, fact_evidence, source_pages, domain_profiles,
                         # client_domains with row-level security, dmca_requests,
                         # accounts + journal_entries with deferred balanced-sum trigger,
                         # seed chart-of-accounts)

# Verify health
curl http://localhost:8080/v1/health          # returns {"status": "ok"}
curl http://localhost:8080/v1/ready           # default: returns {"status": "ready"} when
                                              #   postgres + redis + river probes pass
                                              # kafka mode: also probes the Kafka broker
curl http://localhost:8080/metrics            # Prometheus exposition format

# Run your first search (depth=1, served by the Brave adapter)
curl -X POST http://localhost:8080/v1/search \
  -H "Authorization: Bearer dt_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{"query": "Tavily acquisition", "depth": 1}'

Authentication modes at a glance (all five are mounted on the same /v1/* surface; the middleware stack branches on the header type):

  • Authorization: Bearer dt_live_xxx for API-key customers billed through Stripe
  • X-PAYMENT-SIGNATURE: <base64 EIP-3009 payload> for x402 agent micropayments
  • Authorization: Payment <MPP charge token> for MPP charge traffic, backward-compatible with x402
  • Authorization: Payment <DPoP-bound session token> for MPP streaming sessions
  • X-TrustPlane-Credential: <SPIFFE SVID> for portfolio and enterprise TrustPlane verification

Architecture Overview

                ┌──────────────────────────────────────────┐
                │                 CLIENTS                  │
                │  SDK (TS/Py/Go) · MCP · A2A · x402/MPP   │
                └─────────────────────┬────────────────────┘
                                      │
                ┌─────────────────────▼────────────────────┐
                │           GO API SERVER (chi)            │
                │  auth · rate limit · firewall · billing  │
                │  depth orchestration · caching · ledger  │
                └───┬────────┬─────────┬──────────┬────────┘
                    │        │         │          │
          ┌─────────▼──┐ ┌───▼────┐ ┌──▼───────┐ ┌▼────────────┐
          │  Brave API │ │ Serper │ │OpenRouter│ │ Python NLP  │
          │  (safe)    │ │ (fast) │ │  (LLM)   │ │ Sidecar     │
          └────────────┘ └────────┘ └──────────┘ │ gRPC:50051  │
                                                 │ 7 RPCs      │
                                                 └─────────────┘
                │            │            │
          ┌─────▼────────────▼────────────▼──────────┐
          │                 POSTGRES                 │
          │  usage_ledger · facts · source_pages     │
          │  domain_profiles · api_keys · orgs       │
          │  pgvector (384d halfvec HNSW)            │
          └─────┬────────────┬────────────┬──────────┘
                │            │            │
          ┌─────▼──────┐ ┌───▼──────┐ ┌───▼──────────┐
          │   Redis    │ │  Kafka   │ │  ClickHouse  │
          │ cache+rate │ │  events  │ │  analytics   │
          └────────────┘ └──────────┘ └──────────────┘

The Go API server is a pure Go binary: no C Foreign Function Interface (CGO), no embedded models, no native libraries. All machine-learning inference routes through the Python NLP sidecar via gRPC on localhost port 50051. The sidecar is a single Python 3.11 process that exposes seven remote procedure calls: Parse (Trafilatura 2.0+), BatchParse, Rerank, Embed, VerifyClaim, LinkEntities (server-streamed), and ScoreInjection. The ThreadPoolExecutor-backed gRPC server achieves real parallelism because ONNX Runtime and PyTorch both release the Global Interpreter Lock in their C++ kernels. The minimum sidecar instance size is c7i.2xlarge (16 GiB of RAM).

Postgres 17 with pgvector 0.8.2 is the canonical data store and the authoritative billing source. The usage_ledger table is append-only, partitioned by month, and protected by triggers that reject UPDATE and DELETE. ClickHouse handles analytics materialized views, never billing. Redis 7 serves the hot cache, the Generic Cell Rate Algorithm (GCRA) rate limiter via a Lua script, cross-instance cache invalidation pub/sub, and idempotency locks. Redpanda (Kafka-compatible) carries usage events in CloudEvents format with idempotent producer semantics. River (Postgres-backed job queue) handles scheduled jobs: fact re-verification, trust-score batches, Wikidata incremental sync, domain-profile aggregation.

For the full technical deep-dive including middleware ordering, data models, protocol details, and every non-obvious decision with its rationale, see docs/ARCHITECTURE.md.


Pricing

Credits per call

| Operation               | Safe (Brave) | Fast (Serper) |
|-------------------------|--------------|---------------|
| depth=1 search          | 1 credit     | 0.5 credit    |
| depth=2 search          | 3 credits    | 1.5 credits   |
| depth=3 search          | 8 credits    | 4 credits     |
| fact lookup             | 0.1 credit   | 0.1 credit    |
| extract excerpt         | 0.5 credit   | 0.5 credit    |
| extract full (attested) | 2 credits    | 2 credits     |
| map                     | 1 credit     | 0.5 credit    |

Subscription tiers

| Tier       | Monthly     | Credits | Overage / credit | Calls per second | Concurrent |
|------------|-------------|---------|------------------|------------------|------------|
| Free       | $0          | 500     | n/a              | 10               | 5          |
| Starter    | $30         | 4,000   | $0.010           | 50               | 20         |
| Growth     | $200        | 30,000  | $0.008           | 200              | 50         |
| Scale      | $1,000      | 200,000 | $0.006           | 500              | 100        |
| Secure     | from $2,500 | custom  | $0.020           | custom           | custom     |
| Enterprise | custom      | custom  | custom           | custom           | custom     |

Agent micropayment pricing (no subscription)

| Operation    | x402 price |
|--------------|------------|
| depth=1 safe | $0.008     |
| depth=1 fast | $0.002     |
| depth=2 safe | $0.024     |
| depth=3 safe | $0.064     |
| fact lookup  | $0.001     |

The Secure tier adds Bring Your Own Key language-model routing, region pinning, fact-cache opt-out (reads and writes), stricter Zero Data Retention defaults, and a dedicated Customer Success contact. Portfolio tier uses a 40%-of-retail wholesale rate card with monthly invoicing to the internal double-entry ledger; portfolio companies that resell DeepTap handle their own end-customer billing off-platform.


Project Status

DeepTap is Building. All 28 sessions are complete on their build branches: Foundation, Data Layer + Append-Only Ledger, Search Provider Adapters, Fan-Out + Dedup + URL Normalization, Go-Owned Fetch + Trafilatura Sidecar, Playwright Pool + Domain Strategy Cache, Query Decomposition + LLM Policy, Reranking + Embeddings via Python Sidecar, Prompt Injection Firewall, Depth Modes + Facet Ledger, Caching + Freshness TTLs, Rate Limiting + Concurrency, MCP Server + Tavily Shim, A2A + Payment Middleware with Idempotency, TrustPlane Integration with Fallback, Billing Engine, Postmark Email + DMCA Compliance + Dashboard UI, Fact Data Model + Extraction Pipeline, Staleness Model + Re-Verification, Fact Cache Query Integration, Fact Cache Analytics + Warming, Semantic Source Index + pgvector, Consensus Trust Scoring, Rapid Fact API, Client Domain Indexing, Local Node MVP, and Knowledge Layer Analytics. All three product phases (Phase 1 Core Search Platform, Phase 2 Fact Cache, Phase 3 Knowledge Layer) are now complete. Twenty-eight session specs are written and frozen. Thirty-two research artifacts have been completed and integrated. Two adversarial review passes (one technical, one security and legal) have been incorporated. River is the default event bus per specs/ADDENDUM-river-default.md; Kafka is an opt-in implementation of the same EventBus interface selected by DEEPTAP_EVENT_BUS=kafka. The bootstrap-hosting plan is frozen.

Done

  • Scoping across the full 18-to-20-month roadmap
  • Master specification at /specs/WORK.md (1,257 lines)
  • Frozen Statement of Work at /specs/deeptap-sow-combined.md (3,715 lines)
  • Bootstrap-hosting plan at /specs/deeptap-bootstrap-hosting.md
  • Thirty-two research artifacts in /specs/research/raw/
  • All key technology versions pinned (Go 1.25+, pgvector 0.8.2, chi v5.2.5, pgx v5.9.1, stripe-go v85, grpc-go v1.80.0 with CVE-2026-33186 patched)

Done (Session 1: Foundation)

  • Monorepo structure, Go workspace (go.work)
  • chi router v5.2.5 with verified 7-layer middleware stack (RequestID, OTel, slog, Recoverer, Prometheus, CORS, Auth)
  • Health endpoints (/v1/health, /v1/ready, /metrics)
  • Prometheus metrics with route-pattern labels (deeptap_http_requests_total, _request_duration_seconds, _requests_inflight, _panics_total)
  • OTel tracing with OTLP or stdout exporter
  • Structured logging via log/slog
  • 7-RPC NLP sidecar gRPC proto + Python stub server with Health service reporting SERVING
  • Docker Compose dev stack
  • GitHub Actions CI (lint, test-go with 80% coverage gate, test-sidecar, build, docker-smoke)
  • Four Go binaries compile (deeptap, deeptap-mcp, deeptap-cli, deeptap-tavily-shim)
  • Coverage above 80% on every foundation package (config 100%, health 100%, logging 100%, version 100%, middleware 98.9%, server 93.8%, tracing 90%)
  • End-to-end verified: compose stack boots, curl /v1/health returns 200, /metrics exposes deeptap_http_requests_total, sidecar Health RPC returns SERVING

Done (Session 2: Data Layer + Append-Only Ledger)

  • 12 golang-migrate migrations: extensions (vector 0.8.2, pgcrypto, btree_gin), organizations, api_keys, usage_ledger, facts, fact_evidence, source_pages, domain_profiles, client_domains, dmca_requests, accounts and journal_entries, seed chart-of-accounts
  • Append-only usage_ledger partitioned by month with 24 monthly partitions pre-created; BEFORE UPDATE and BEFORE DELETE triggers reject modification at the database tier
  • Double-entry accounts + journal_entries tables with a DEFERRABLE INITIALLY DEFERRED CONSTRAINT TRIGGER that enforces SUM(amount_cents) = 0 per txn_id at commit time
  • sqlc v1.30.0 generates typed Go code in internal/db/deeptapdb/ for every .sql query file; pgx v5 driver package
  • pgxpool wrapper (internal/db/pgx.go) with MaxConns = max(4, runtime.NumCPU()), MinConns=2, 30s HealthCheckPeriod, 1h MaxConnLifetime, pgvector type registration on AfterConnect, and statement-mode switch (cache_statement by default, cache_describe when DEEPTAP_POSTGRES_PGBOUNCER=true)
  • go-redis v9.18.0 client with 10 pool size, 2 min idle connections, 5s dial timeout, 500ms read/write timeouts, and a Ping(ctx) probe
  • internal/eventbus/ package with a single Publisher/Subscriber interface implemented by RiverBus (default, Postgres-backed, supports transactional PublishTx) and KafkaBus (opt-in, franz-go v1.20.7 + outbox)
  • LISTEN/NOTIFY cache-invalidation listener on a dedicated pgx.Conn (not the pool) with exponential-backoff reconnect on the deeptap_cache channel
  • /v1/ready runs Postgres, Redis, and the active event-bus probe in parallel via errgroup.WithContext bounded to a 3-second timeout; Kafka probe is gated behind DEEPTAP_EVENT_BUS=kafka
  • Docker Compose default profile drops Redpanda; make dev boots 9 services (deeptap, nlp-sidecar, playwright-pool, postgres pgvector/pgvector:0.8.2-pg17, redis 7-alpine, clickhouse, prometheus, grafana, otel-collector). make dev-kafka activates the kafka compose profile and adds Redpanda as a tenth service
  • 12 integration tests across test/integration/{db,redis,eventbus}_test.go using testcontainers-go against pgvector/pgvector:0.8.2-pg17, redis:7-alpine, and (for the Kafka profile) redpandadata/redpanda:latest; covers migrations apply, pgvector 0.8.2 present, append-only UPDATE/DELETE rejected, accounts seeded, journal balanced-sum trigger, Redis Ping/SetGet/PubSub, River Publish enqueues a job, River PublishTx rollback removes it, PublishTx commit persists it, empty event type rejected
  • End-to-end verification: make migrate-up applies all 12 migrations cleanly; /v1/health returns 200; /v1/ready returns 200 with postgres + redis + river probes green; DEEPTAP_EVENT_BUS=kafka boot path verified against the kafka profile

Done (Session 3: Search Provider Adapters)

  • SearchProvider interface plus a typed Registry at internal/search/ that picks the safe adapter (Brave) for provider_class=safe and the fast adapter (Serper) for provider_class=fast; returns ErrProviderUnavailable when an adapter is not configured
  • Brave adapter hitting https://api.search.brave.com/res/v1/web/search with X-Subscription-Token, Accept: application/json, Cache-Control: no-cache; gobreaker/v2 circuit breaker trips on 5 consecutive failures; 2 retries on 429/5xx with full-jitter backoff and Retry-After respected
  • Serper adapter hitting https://google.serper.dev/search via POST with X-API-KEY; gobreaker/v2 trips on 6 consecutive failures (distinct threshold from Brave); same retry policy
  • URL Normalize plus Dedupe at internal/search/normalize.go: lowercase scheme/host, strip default ports, collapse repeated slashes, alpha-sort query params, strip fragment, drop tracking params (utm_*, gclid, fbclid, mc_cid, mc_eid, msclkid, ref, ref_src), prefer https when an http/https pair collapses to the same (host, path, query)
  • API-key middleware at internal/middleware/apikey.go reads Authorization: Bearer dt_live_*, SHA-256 hashes the full token, resolves the row via sqlc GetAPIKeyWithOrg, and attaches {orgID, apiKeyID, providerClassFromKey, providerClassFromOrg, providerAckAt} to the request context
  • POST /v1/search handler with depth=1 only: validates body (query 1..1024 chars, depth == 1, otherwise 400 unsupported_depth), resolves provider_class per body -> key -> org, returns 403 fast_provider_not_acknowledged when fast is requested and organizations.provider_ack_at IS NULL, returns 503 provider_unavailable when the selected adapter has no configured API key
  • Credit pricing in the handler: depth=1 safe = 1.0 credit, depth=1 fast = 0.5 credit; both write an append-only row to usage_ledger via ledger.Append (unique request_id enforces idempotency at the database tier)
  • Integration tests at test/integration/search_handler_test.go covering the four business paths (safe happy path with Brave mock, fast-without-ack 403, fast-with-ack Serper at 0.5 credits with usage_ledger row asserted, missing auth 401); 14 unit-test packages green; live compose smoke verified for missing auth, unsupported depth, and missing Brave key paths per audit/s03-e2e-verification-2026-04-22.md
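
The normalization rules above can be sketched in a few lines of Go. This is a reduced illustration; the real internal/search/normalize.go also collapses repeated slashes and prefers https when an http/https pair collapses to the same key:

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// trackingParams lists exact-match query params to drop; utm_* is
// handled by prefix below.
var trackingParams = map[string]bool{
	"gclid": true, "fbclid": true, "mc_cid": true, "mc_eid": true,
	"msclkid": true, "ref": true, "ref_src": true,
}

// normalizeURL lowercases scheme and host, strips default ports and
// fragments, drops tracking params, and alpha-sorts the survivors.
func normalizeURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	u.Scheme = strings.ToLower(u.Scheme)
	u.Host = strings.ToLower(u.Host)
	if u.Scheme == "http" {
		u.Host = strings.TrimSuffix(u.Host, ":80")
	}
	if u.Scheme == "https" {
		u.Host = strings.TrimSuffix(u.Host, ":443")
	}
	u.Fragment = ""
	q := u.Query()
	keys := make([]string, 0, len(q))
	for k := range q {
		if trackingParams[k] || strings.HasPrefix(k, "utm_") {
			continue // tracking param: drop from the dedup key
		}
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := url.Values{}
	for _, k := range keys {
		out[k] = q[k]
	}
	u.RawQuery = out.Encode() // Encode emits keys in sorted order
	return u.String(), nil
}

func main() {
	n, _ := normalizeURL("HTTPS://Example.COM:443/a?utm_source=x&b=2&a=1#top")
	fmt.Println(n) // https://example.com/a?a=1&b=2
}
```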

Done (Session 4: Fan-Out + Dedup + URL Normalization)

  • POST /v1/search accepts a sub_queries array (up to DEEPTAP_FANOUT_MAX_SUBS, default 8) alongside the primary query; empty or missing falls back to single-query behaviour
  • search.Fanout runs sub-queries in parallel against the resolved provider adapter with a semaphore-bounded concurrency limit (DEEPTAP_FANOUT_MAX_INFLIGHT, default 4), a per-call timeout (DEEPTAP_FANOUT_PER_CALL_TIMEOUT, default 5s), and an overall request deadline (DEEPTAP_FANOUT_OVERALL_TIMEOUT, default 10s)
  • search.Dedupe merges results across sub-queries using first-seen title and snippet, maximum observed score, and the existing URL Normalize collapse (lowercase scheme/host, tracking-param strip, http/https collapse)
  • One usage_ledger row per user request regardless of sub-query count; idempotency preserved via the existing request_id unique constraint
  • Partial failures (some sub-queries error or time out, at least one succeeds) surface as warnings[] in the 200 response with the failing sub-query index and error class; full failure returns 504 upstream_timeout when the request deadline is exceeded and 502 upstream_error when every sub-query fails with a non-timeout error
  • Integration tests at test/integration/search_fanout_test.go cover parallel fan-out happy path, dedup across overlapping sub-queries, per-call timeout surfacing as warning, overall-deadline 504, all-fail 502; unit tests across internal/search/ packages green

Done (Session 5: Go-Owned Fetch + Trafilatura Sidecar)

  • Go-side fetch client at internal/fetch/ with User-Agent DeepTapBot/1.0 (+https://deeptap.ai/bot), redirect cap 5, size cap 10 MiB, and a content-type gate that admits text/*, application/xhtml+xml, and the application/*xml family
  • In-process sync.Map robots.txt cache with 1-hour positive TTL and 5-minute negative TTL; per-domain probe order walks AI-specific tokens first (DeepTapBot, GPTBot, ClaudeBot, Claude-SearchBot, anthropic-ai) before falling back to the * user-agent rules
  • Extractor interface at internal/extract/ with two implementations: SidecarExtractor (gRPC to the Python Trafilatura sidecar at DEEPTAP_SIDECAR_ADDR) and CloudRunExtractor (HTTPS to a Cloud Run Trafilatura function) behind a factory keyed on DEEPTAP_EXTRACTOR_BACKEND (sidecar, cloudrun, or auto)
  • POST /v1/extract handler fans out per URL with robots -> fetch -> extract -> optional source_pages upsert, bounded by DEEPTAP_EXTRACT_MAX_INFLIGHT (default 4), DEEPTAP_EXTRACT_OVERALL_TIMEOUT (default 20s), and DEEPTAP_EXTRACT_MAX_URLS (default 10); per-URL failures surface as warnings[]; all-robots-deny returns 422; all-timeout returns 504; X-DeepTap-Attestation header required for mode=full
  • Exactly one usage_ledger row per request regardless of URL count; pricing is 0.5 credit per URL for mode=excerpt and 2.0 credits per URL for mode=full; mode defaults to excerpt
  • Python sidecar now ships trafilatura==2.0.0 and the real Parse RPC is wired on services/nlp-sidecar/
  • Integration tests at test/integration/extract_handler_test.go cover excerpt happy path, attestation gate, robots-deny 422, timeout 504, per-URL warning surfacing, and ledger-row accounting across both extractor backends
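
The robots.txt probe order reduces to a first-match walk over user-agent groups. A toy sketch under stated assumptions: `groups` stands in for a parsed robots.txt keyed by user-agent token, and the boolean verdict per group is a simplification of real allow/disallow rule sets:

```go
package main

import "fmt"

// probeOrder lists the AI-specific tokens consulted before the
// wildcard group, in the order the fetch client walks them.
var probeOrder = []string{
	"DeepTapBot", "GPTBot", "ClaudeBot", "Claude-SearchBot", "anthropic-ai", "*",
}

// allowed returns the verdict of the first user-agent group present
// in the parsed robots.txt; absence of any group defaults to allow.
func allowed(groups map[string]bool) bool {
	for _, ua := range probeOrder {
		if verdict, ok := groups[ua]; ok {
			return verdict // first matching group wins
		}
	}
	return true // no applicable group: crawling is permitted
}

func main() {
	groups := map[string]bool{"*": false, "ClaudeBot": true}
	fmt.Println(allowed(groups)) // true: ClaudeBot group outranks *
}
```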

Done (Session 6: Playwright Pool + Domain Strategy Cache)

  • Node.js Playwright pool service at services/playwright-pool/ (Fastify + Chromium via mcr.microsoft.com/playwright:v1.49.0-jammy) exposing POST /render, GET /health, and GET /metrics; shared-secret auth via the X-Internal-Token header rejects unauthenticated callers before any browser work
  • Context pool with configurable size (default 4); overflow requests return 503 pool_exhausted instead of queueing unboundedly; per-render timeout and body-size caps match the Go fetch client
  • Per-domain strategy cache in Postgres (domain_strategies table) with a rolling empty-rate counter over the last DEEPTAP_STRATEGY_WINDOW samples (default 50); flip-on to Tier 2 at 50% empties with a minimum of 5 samples (DEEPTAP_STRATEGY_MIN_SAMPLES), flip-off back to Tier 1 at 20%; empty defined as extracted content shorter than DEEPTAP_STRATEGY_EMPTY_FLOOR_CHARS (default 200)
  • /v1/extract Tier 1/Tier 2 escalation wired in Go: Tier 1 runs the S05 fetch plus Trafilatura path (cheap); if the domain strategy says escalate (or Tier 1 returned an empty page), Tier 2 hits the Playwright pool for a rendered DOM and re-runs Trafilatura on the post-render HTML; successful Tier 2 extractions surface js_rendered in warnings[]; pool-down or pool-timeout surfaces playwright_unavailable and falls back to the Tier 1 result
  • Bootstrap mode (DEEPTAP_MODE=bootstrap) forces PlaywrightEnabled=false at config load; no Tier 2 ever runs in bootstrap, the domain strategy cache records samples but never escalates, and the Cloud Run extractor path handles every URL
  • New env vars: DEEPTAP_PLAYWRIGHT_POOL_URL, DEEPTAP_PLAYWRIGHT_SHARED_SECRET (REQUIRED in production), DEEPTAP_PLAYWRIGHT_TIMEOUT_MS, DEEPTAP_PLAYWRIGHT_ENABLED, DEEPTAP_STRATEGY_EMPTY_FLOOR_CHARS, DEEPTAP_STRATEGY_FLIP_ON, DEEPTAP_STRATEGY_FLIP_OFF, DEEPTAP_STRATEGY_MIN_SAMPLES, DEEPTAP_STRATEGY_WINDOW
  • Unit tests green across internal/extract/ escalation, internal/strategy/ flip thresholds, and the Node pool handlers; the full end-to-end integration test (TASK-13) requires a Playwright testcontainer and is deferred
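
The flip-on/flip-off hysteresis is simple enough to sketch directly. An illustrative reduction in Go using the default thresholds named above (window 50, flip-on at 50% with a 5-sample minimum, flip-off at 20%); the real cache persists this per domain in Postgres:

```go
package main

import "fmt"

// strategy tracks a rolling window of extraction samples for one
// domain; true means the extraction came back empty.
type strategy struct {
	window []bool
	size   int
	tier2  bool
}

// record appends a sample, trims the window, and applies the
// hysteresis: flip on to Tier 2 at >=50% empties (min 5 samples),
// flip off back to Tier 1 at <=20%.
func (s *strategy) record(empty bool) {
	s.window = append(s.window, empty)
	if len(s.window) > s.size {
		s.window = s.window[1:] // keep the rolling window bounded
	}
	empties := 0
	for _, e := range s.window {
		if e {
			empties++
		}
	}
	rate := float64(empties) / float64(len(s.window))
	switch {
	case !s.tier2 && len(s.window) >= 5 && rate >= 0.5:
		s.tier2 = true // escalate to rendered (Playwright) fetches
	case s.tier2 && rate <= 0.2:
		s.tier2 = false // cheap Tier 1 fetches are working again
	}
}

func main() {
	s := &strategy{size: 50}
	for i := 0; i < 5; i++ {
		s.record(true) // five empty extractions in a row
	}
	fmt.Println(s.tier2) // true
}
```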

Done (Session 7: Query Decomposition + LLM Policy)

  • Hand-written stdlib OpenRouter chat-completions client at internal/llm/openrouter/ with Authorization: Bearer, optional HTTP-Referer and X-Title attribution headers, pooled net/http.Transport, gobreaker/v2 circuit breaker (trips on 5 consecutive failures or 60% failure ratio over 20 requests), exponential retry with jitter on 408, 429, 500, 502, 503, 504 via APIError.Retryable(), and X-Generation-Id response-header propagation for audit correlation
  • Startup GET /api/v1/key ping wired in cmd/deeptap/main.go: 200 logs openrouter key verified and boots; 401 fails fast with openrouter key rejected (401); misconfiguration, refusing to boot; 402 logs a warning and continues (LLM-dependent paths return decomposition_failed until credits are topped up); 5xx or network error logs a warning and continues (the circuit breaker handles the next real call); missing DEEPTAP_OPENROUTER_API_KEY disables the decomposer and every /v1/search request runs the single-query path
  • Freshness classifier at internal/freshness/ producing four buckets (volatile, daily, standard, stable) via NFC normalization, 4-digit year extraction (historical vs. current/next-year signal), a volatile keyword union, structural patterns (price-or-score live, weather live, status live, breaking, is-still, future-event, current-role, stable-pattern), a daily keyword union, and a standard default; returns (Class, reason) where reason is a short machine-readable label
  • Per-organization LLM policy loader at internal/policy/ reading organizations.llm_policy JSONB into a typed LLMPolicy (require_zdr, require_data_collection_deny, providers[], models_allowed[], max_tokens, optional temperature); Secure-SKU clamp when organizations.tier == "secure" unconditionally sets require_zdr=true, require_data_collection_deny=true, and defaults providers to ["anthropic"] when empty; malformed JSONB does not fail open (surfaces policy_load_failed in warnings[] and continues with the permissive baseline)
  • OpenRouterDecomposer at internal/decompose/ with JSON-schema structured output ({subqueries: [{query, priority}]}), NFC-lowered dedup on trimmed query string, priority-descending stable sort, and truncation to DEEPTAP_DECOMPOSE_SUBQUERIES_D1 (default 2) for depth=1 or DEEPTAP_DECOMPOSE_SUBQUERIES_D23 (default 6) for depth=2/3
  • Model picker at internal/decompose/picker.go: depth=1 prefers DEEPTAP_DECOMPOSE_MODEL_DEPTH1 (default anthropic/claude-haiku-4.5); depth=2/3 prefers DEEPTAP_DECOMPOSE_MODEL_DEPTH23 (default anthropic/claude-sonnet-4.6); falls back to the first anthropic/* slug in policy.ModelsAllowed when the preferred slug is disallowed; returns ErrLLMPolicyViolation when no allowlisted anthropic model exists
  • Secure-SKU ZDR triple materialises in the OpenRouter request body as provider.zdr=true, provider.data_collection="deny", and provider.order=<allowlist> whenever the loaded policy requires any one of them; the three controls travel together, not separately
  • /v1/search handler at internal/api/search.go wires freshness classification (always runs, never fails), policy load (failure -> policy_load_failed in warnings, continue on zero-value policy), decomposition under a 10-second timeout (failure -> decomposition_failed in warnings, fall back to single-query path), and sub-query fan-out via search.Fanout; caller-supplied sub_queries always wins over decomposer output; depth=2 and depth=3 are rejected 400 unsupported_depth until DEEPTAP_ENABLE_DEPTH_GT1=true unlocks them in S10
  • Response envelope extended with freshness_class, freshness_reason, and nested decomposition {model, provider, sub_queries, tokens_prompt, tokens_completion, cost_usd, latency_ms, generation_id}; decomposition is omitted on the single-query path or when decomposition fails; depth-based credit multipliers (depth=2 = 3x, depth=3 = 7x) wired in the handler, gated until S10 activates deeper depths
  • Prometheus metrics: deeptap_decompose_requests_total{model, outcome} where outcome is one of ok, invalid_json, retry_succeeded, retry_failed, upstream_error, timeout, policy_violation; deeptap_decompose_duration_seconds{model} histogram; deeptap_decompose_subquery_count{depth} histogram; deeptap_decompose_tokens_total{model, kind} counter for prompt and completion tokens; deeptap_freshness_class_total{class} counter
  • OTEL span llm.openrouter.chat_completion with SpanKindClient and attributes llm.model, llm.provider, llm.prompt_tokens, llm.completion_tokens, llm.zdr, llm.latency_ms, llm.generation_id; errors recorded on the span via RecordError + SetStatus(codes.Error)
  • New env vars: DEEPTAP_OPENROUTER_API_KEY (required for decomposition), DEEPTAP_OPENROUTER_BASE_URL (default https://openrouter.ai/api/v1), DEEPTAP_OPENROUTER_REFERER + DEEPTAP_OPENROUTER_TITLE (optional attribution headers), DEEPTAP_OPENROUTER_TIMEOUT (default 30s), DEEPTAP_DECOMPOSE_MODEL_DEPTH1 (default anthropic/claude-haiku-4.5), DEEPTAP_DECOMPOSE_MODEL_DEPTH23 (default anthropic/claude-sonnet-4.6), DEEPTAP_DECOMPOSE_SUBQUERIES_D1 (default 2), DEEPTAP_DECOMPOSE_SUBQUERIES_D23 (default 6), DEEPTAP_ENABLE_DEPTH_GT1 (default false)
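
The four-bucket freshness classifier is, at its core, an ordered keyword walk. A toy version in Go: the bucket names match the list above, but these keyword unions are illustrative and the real classifier adds NFC normalization, year extraction, and the structural patterns:

```go
package main

import (
	"fmt"
	"strings"
)

// classifyFreshness returns one of the four buckets plus a short
// machine-readable reason, checking the most volatile signals first.
func classifyFreshness(query string) (class, reason string) {
	q := strings.ToLower(query)
	for _, kw := range []string{"price", "score", "stock", "weather", "breaking"} {
		if strings.Contains(q, kw) {
			return "volatile", "volatile_keyword"
		}
	}
	for _, kw := range []string{"today", "current", "latest"} {
		if strings.Contains(q, kw) {
			return "daily", "daily_keyword"
		}
	}
	for _, kw := range []string{"history of", "origin of", "definition of"} {
		if strings.Contains(q, kw) {
			return "stable", "stable_pattern"
		}
	}
	return "standard", "default"
}

func main() {
	c, r := classifyFreshness("bitcoin price")
	fmt.Println(c, r) // volatile volatile_keyword
}
```

Ordering matters: a query like "latest bitcoin price" must land in volatile, not daily, which is why the most perishable bucket is checked first.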

Done (Session 8: Reranking + Embeddings via Python Sidecar)

  • Python sidecar services/nlp-sidecar/rerank.py loads ms-marco-MiniLM-L6-v2 INT8 ONNX and serves the gRPC Rerank RPC as a cross-encoder that scores every (query, document) pair; services/nlp-sidecar/embed.py loads bge-small-en-v1.5 ONNX and serves the gRPC Embed RPC producing L2-normalized 384-dim float32 vectors; both are mounted on server.py, and a missing model at boot logs the absence and mounts a stub that returns UNIMPLEMENTED at RPC time rather than crashing the sidecar
  • Go internal/rerank/ package with a Reranker interface (Rerank, Healthz), SidecarReranker gRPC adapter against DEEPTAP_SIDECAR_ADDR, CohereReranker HTTPS POST /v2/rerank adapter against DEEPTAP_COHERE_API_KEY with one retry on 429 (honoring Retry-After) and a sync.Once bootstrap-warn on first use, and a mode-keyed factory that picks sidecar in production and Cohere in bootstrap; both adapters are wrapped in gobreaker/v2 with MaxRequests=3, Interval=60s, Timeout=30s, ReadyToTrip at 5 consecutive failures
  • Go internal/embed/ package with an Embedder interface (Embed, Healthz) exposing ModeQuery and ModePassage, SidecarEmbedder gRPC adapter, VoyageEmbedder HTTPS POST /v1/embeddings adapter against DEEPTAP_VOYAGE_API_KEY that sends output_dimension=384 on every request and errors when the response dimensionality is not exactly 384, and a mode-keyed factory
  • POST /v1/search rerank step between search.Dedupe merge and response write: caps at DEEPTAP_RERANK_MAX_DOCS (default 30), truncates snippets to DEEPTAP_RERANK_MAX_TEXT_CHARS (default 1024), runs under DEEPTAP_RERANK_TIMEOUT (default 1s), reorders results to the reranker's score order, stamps each surviving result with its rerank_score, and attaches a reranker {model, implementation, latency_ms, docs_scored, error} block to the response envelope; disable/nil/error skips silently and still returns 200
  • POST /v1/extract embed step after each successful per-URL extraction: runs under DEEPTAP_EMBED_TIMEOUT (default 500ms), writes the 384-dim pgvector into source_pages.embedding via the sqlc UpdateSourcePageEmbedding query, and is non-fatal on error
  • sqlc query UpdateSourcePageEmbedding plus a sqlc.yaml overrides entry mapping the vector column type to github.com/pgvector/pgvector-go.Vector so pgvector.NewVector([]float32{...}) round-trips cleanly through pgx
  • Prometheus metrics at internal/metrics/nlp.go: deeptap_rerank_requests_total{implementation, outcome}, deeptap_rerank_duration_seconds{implementation}, deeptap_rerank_docs_scored{implementation}, deeptap_rerank_failures_total{implementation, outcome}, and the matching deeptap_embed_{requests,duration,failures}_total family
  • /v1/ready probes add reranker.Healthz and embedder.Healthz under a 1-second timeout each in production mode; bootstrap mode relies on HTTPS reachability at call time instead
  • docker-compose pins nlp-sidecar to mem_limit: 1g and adds a grpc-health healthcheck; deeptap service now declares depends_on: nlp-sidecar: {condition: service_healthy} so the Go API will not start until the sidecar passes its healthcheck
  • New env vars: DEEPTAP_ENABLE_RERANK (default true), DEEPTAP_ENABLE_EMBED (default true), DEEPTAP_RERANK_TIMEOUT (default 1s), DEEPTAP_EMBED_TIMEOUT (default 500ms), DEEPTAP_RERANK_MAX_DOCS (default 30), DEEPTAP_RERANK_MAX_TEXT_CHARS (default 1024), DEEPTAP_COHERE_API_KEY (required in bootstrap mode when rerank is enabled), DEEPTAP_VOYAGE_API_KEY (required in bootstrap mode when embed is enabled); bootstrap-mode Load() rejects missing keys when the matching feature is enabled

Done (Session 9: Prompt Injection Firewall)

  • Go internal/firewall/ package with five load-bearing files: strip.go + patterns.go implement the Layer 1 pre-extraction HTML stripper matching 13 documented patterns (unicode_tag_chars, zero_width, bidi_override, html_comment, css_display_none, css_visibility_hidden, css_font_size_zero, css_opacity_zero, css_text_indent_offscreen, css_position_offscreen, meta_injection, aria_hidden gated by DEEPTAP_STRIP_ARIA_HIDDEN, script_style); scorer.go defines the Scorer interface; sidecar_scorer.go implements SidecarScorer against the ScoreInjection gRPC RPC under DEEPTAP_SCORE_INJECTION_TIMEOUT (default 500ms) with the text truncated to DEEPTAP_SCORE_INJECTION_MAX_CHARS (default 4096); noop_scorer.go implements the bootstrap-mode NoopScorer returning (0.0, nil); factory.go wires a mode-keyed factory picking SidecarScorer in production and NoopScorer in bootstrap; safe_mode.go implements the Layer 3 SafeModeOff | SafeModeAgent response mutator that nulls untrusted_content on every result at the HTTP boundary when agent mode is selected and stamps safe_mode_applied on the envelope
  • Python sidecar services/nlp-sidecar/score_injection.py loads meta-llama/Prompt-Guard-2-86M when the Meta-gated weights are present (baked into the image at build time via the HF_TOKEN Dockerfile build arg or downloaded locally via services/nlp-sidecar/scripts/download-models.sh), emits heuristic reasons alongside the score, and mounts a stub that returns UNIMPLEMENTED at RPC time when weights are absent so a missing model is not a boot crash
  • POST /v1/extract pipeline is now fetch -> Strip -> Extract -> Score -> persist; the sqlc UpdateSourcePageInjection query writes injection_score and the reasons list to source_pages on every extract (non-fatal on error, same pattern as UpdateSourcePageEmbedding)
  • Response envelope extended with prompt_injection_score (max across results), unsafe_reasons[] (deduped), sanitized_content_bytes, untrusted_content_bytes, a firewall block with layer1_stripped_bytes, layer1_patterns_matched[], layer2_model, layer2_implementation, layer2_latency_ms, layer2_docs_scored, and safe_mode_applied; every per-result object carries sanitized_content, untrusted_content, trusted_snippet, prompt_injection_score, and unsafe_reasons
  • Request body accepts safe_mode: "off" | "agent"; default pulled from DEEPTAP_SAFE_MODE_DEFAULT (ships at off); agent mode nulls untrusted_content on every result at the HTTP boundary
  • /v1/search firewall hook is a documented no-op until S10 wires extraction into depth=1 search; today provider snippets from Brave and Serper are not attacker-controlled through our pipeline so Layer 1 and Layer 2 would have nothing to do
  • Prometheus internal/metrics/firewall.go registers deeptap_firewall_l1_strips_total{pattern}, deeptap_firewall_l1_stripped_bytes, deeptap_firewall_l2_requests_total{implementation, outcome}, deeptap_firewall_l2_duration_seconds{implementation}, deeptap_firewall_l2_score_bucket{implementation}, and deeptap_firewall_unsafe_pages_total{provider_class} with the unsafe threshold pulled from DEEPTAP_UNSAFE_SCORE_THRESHOLD (default 0.7)
  • /v1/ready probes add Scorer.Healthz under a 1-second timeout in production mode
  • services/nlp-sidecar/Dockerfile accepts HF_TOKEN as a build argument so CI and local builds can fetch the Meta-gated Prompt-Guard-2-86M weights; services/nlp-sidecar/scripts/download-models.sh is the shared helper for local development
  • New env vars: DEEPTAP_ENABLE_FIREWALL_L1 (default true), DEEPTAP_ENABLE_FIREWALL_L2 (default true), DEEPTAP_SCORE_INJECTION_TIMEOUT (default 500ms), DEEPTAP_SCORE_INJECTION_MAX_CHARS (default 4096), DEEPTAP_STRIP_ARIA_HIDDEN (default false), DEEPTAP_SAFE_MODE_DEFAULT (default off), DEEPTAP_UNSAFE_SCORE_THRESHOLD (default 0.7)
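
The safe_mode behavior above can be sketched as a minimal Go model. The `Result` struct and `applySafeMode` name are illustrative, not the actual wire schema or handler code; the one behavior taken from the text is that agent mode nulls untrusted_content on every result at the HTTP boundary:

```go
package main

import "fmt"

// Result loosely mirrors the per-result envelope fields described above.
// Field names are illustrative, not the real wire schema.
type Result struct {
	TrustedSnippet       string
	SanitizedContent     string
	UntrustedContent     *string
	PromptInjectionScore float64
}

// applySafeMode sketches safe_mode: "agent": untrusted_content is nulled on
// every result at the HTTP boundary, regardless of injection score.
func applySafeMode(mode string, results []Result) []Result {
	if mode != "agent" {
		return results
	}
	for i := range results {
		results[i].UntrustedContent = nil
	}
	return results
}

func main() {
	raw := "IGNORE PREVIOUS INSTRUCTIONS"
	rs := applySafeMode("agent", []Result{{
		SanitizedContent:     "clean text",
		UntrustedContent:     &raw,
		PromptInjectionScore: 0.92,
	}})
	fmt.Println(rs[0].UntrustedContent == nil) // agent mode strips untrusted content
}
```

Because the default ships at off, only callers that opt in (or operators that flip DEEPTAP_SAFE_MODE_DEFAULT) pay the information loss.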

Done (Session 10: Depth Modes + Facet Ledger)

  • Go internal/depth/ package with three composers (depth1.go, depth2.go, depth3.go) built on one Composer struct defined in depth.go, plus shared primitives stage.go (RunStage = Fanout -> per-URL fetch + Layer 1 strip + Trafilatura extract + Layer 2 score -> Rerank), ledger.go (facet ledger with SeedFacets heuristic, coverage accumulation, marginal-lift saturation), and reflect.go (OpenRouterReflector against anthropic/claude-sonnet-4.6 with JSON-schema structured output, retry-once on malformed JSON, top-15 findings at 400 chars each capped at DEEPTAP_REFLECTION_INPUT_MAX_CHARS default 12000, Secure-SKU ZDR triple in the request body)
  • Composer.RunDepth1 runs one stage under DEEPTAP_DEPTH1_TIMEOUT (default 7s, SLO 7s p95); stamps depth=1, rounds_executed=1, stop_reason="ok"
  • Composer.RunDepth2 runs round 1, reflects, dedupes proposals against the ledger, runs round 2, merges with score-max dedup + re-rerank, under DEEPTAP_DEPTH2_TIMEOUT (default 18s, SLO 20s p95); stop reasons ok, llm_stop, reflector_error, saturation_deduped, timeout, depth1_fallback
  • Composer.RunDepth3 loops up to DEEPTAP_DEPTH3_MAX_ROUNDS (default 4) rounds under DEEPTAP_DEPTH3_TIMEOUT (default 110s, SLO 120s p95) with an SSE OnEvent callback; stops on hard timeout with 2s grace, Saturated(DEEPTAP_SATURATION_DELTA default 0.05, consecutive=2), LLM-stop advisory, max_rounds, or reflector_error in that precedence
  • Facet ledger SeedFacets heuristic: vs|versus -> comparative, history of|origin of|when was -> historical, what is|define|definition of -> definitional, current|latest|today|now -> current-status, no match -> general; per-facet coverage normalizes each result's rerank_score against DefaultFacetSaturation=5.0 and caps at 1.0; overall coverage is the unweighted mean; Contains, MarginalLift, Saturated, Export are the public readers
  • /v1/search dispatches depth=1 and depth=2 through the composer; depth=3 returns 400 use_research_endpoint pointing callers at /v1/research; envelope adds depth, rounds_executed, stop_reason, and (when include_ledger=true and IncludeLedgerAllowed) the ledger block; decomposition stays as-is from S07
  • POST /v1/research at internal/api/research.go is a text/event-stream endpoint with a single writer goroutine fed by a bounded channel plus a separate 15-second heartbeat goroutine (DEEPTAP_SSE_HEARTBEAT), emitting round_start, partial_results, facet_update, reflection, saturation, final, error, and ping events; a client disconnect cancels the orchestrator context; the usage_ledger row (8.0 credits safe / 4.0 fast) is written AFTER the final event
  • Decomposer schema extension: optional facets array per sub-query exported via SubQueryFacets map[string]string; the reflector and ledger share facet attribution through this field
  • Prometheus metrics at internal/metrics/depth.go: deeptap_depth_rounds_total{depth}, deeptap_depth_saturation_total{reason}, deeptap_depth_duration_seconds{depth} (SLO-aligned buckets at 7s / 20s / 120s), deeptap_reflection_requests_total{outcome}, deeptap_facet_coverage_average{depth}, deeptap_sse_events_total{event}
  • OTEL spans depth.round (per round), depth.reflect (per reflection), depth.rerank (per rerank pass) with attributes depth, round, sub_query_count, docs_scored, coverage, plan.new_sub_queries, plan.stop, model, latency_ms
  • DEEPTAP_ENABLE_DEPTH_GT1 default flipped to true; config.Load() derives DepthGT1Disabled=true at startup when DEEPTAP_MODE=bootstrap AND DEEPTAP_OPENROUTER_API_KEY=="" so depth>=2 falls back to depth=1 with the reflection_unavailable_bootstrap warning; this is NOT a startup error
  • New env vars: DEEPTAP_DEPTH_MAX_URLS_PER_ROUND (default 10), DEEPTAP_DEPTH_MAX_INFLIGHT_EXTRACT (4), DEEPTAP_DEPTH_MAX_FANOUT (4), DEEPTAP_DEPTH1_TIMEOUT (7s), DEEPTAP_DEPTH2_TIMEOUT (18s), DEEPTAP_DEPTH3_TIMEOUT (110s), DEEPTAP_DEPTH3_MAX_ROUNDS (4), DEEPTAP_SATURATION_DELTA (0.05), DEEPTAP_REFLECTION_INPUT_MAX_CHARS (12000), DEEPTAP_REFLECTION_TIMEOUT (10s), DEEPTAP_SSE_HEARTBEAT (15s)
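
The facet-ledger coverage arithmetic above (normalize accumulated rerank score against DefaultFacetSaturation=5.0, cap at 1.0, unweighted mean across facets) can be sketched as a small Go model. The `Ledger` type and method names here are illustrative, not the internal/depth API:

```go
package main

import "fmt"

const defaultFacetSaturation = 5.0

// Ledger is a toy facet ledger: it accumulates rerank scores per facet and
// reports normalized coverage. Names are illustrative.
type Ledger struct {
	scores map[string]float64
}

func NewLedger() *Ledger { return &Ledger{scores: map[string]float64{}} }

// Add accumulates one result's rerank score against a facet.
func (l *Ledger) Add(facet string, rerankScore float64) {
	l.scores[facet] += rerankScore
}

// FacetCoverage normalizes against the saturation constant and caps at 1.0.
func (l *Ledger) FacetCoverage(facet string) float64 {
	c := l.scores[facet] / defaultFacetSaturation
	if c > 1.0 {
		c = 1.0
	}
	return c
}

// Coverage is the unweighted mean across every facet seen so far.
func (l *Ledger) Coverage() float64 {
	if len(l.scores) == 0 {
		return 0
	}
	var sum float64
	for f := range l.scores {
		sum += l.FacetCoverage(f)
	}
	return sum / float64(len(l.scores))
}

func main() {
	l := NewLedger()
	l.Add("comparative", 3.0)
	l.Add("definitional", 6.0) // over-saturated facet, capped at 1.0
	fmt.Printf("%.2f\n", l.Coverage()) // (0.60 + 1.00) / 2 = 0.80
}
```

The saturation check then reduces to watching the delta of `Coverage()` between rounds: two consecutive rounds with a lift below DEEPTAP_SATURATION_DELTA stop the depth-3 loop.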

Done (Session 11: Caching + Freshness TTLs)

  • Go internal/cache/ package ships nine load-bearing files alongside the existing S02 redis.go and invalidation.go (eleven files total under internal/cache/): keys.go defines FullKey, SubQueryKey, ExtractKey, and FactKey with a v1: version prefix plus sha256 hashes; normalize.go runs NFC + lowercase + whitespace collapse + optional leading-article strip; manager.go is the tiered Manager with msgpack v5 encoding, byurl:{sha256} and bydomain:{sha256} reverse indices written through one Redis pipeline per store, and corrupt-key cleanup that deletes any payload that fails to decode; ttl.go exports DetermineTTL mapping volatile=5m, daily=1h, standard=4h, stable=24h with MinFullTTL=60s and MinExtractTTL=15m floors and a 1-hour sub-query cap; singleflight.go coalesces concurrent identical requests under DEEPTAP_CACHE_SF_TIMEOUT; invalidator.go opens a dedicated pgx.Conn, issues LISTEN deeptap_cache + LISTEN deeptap_dmca_suppress, decodes JSON payloads, and runs handlers under a panic-recover; fact_noop.go defines FactCache + NoopFactCache; tombstone.go writes the 60-second negative-cache tombstone on adversarial pages; metrics_sink.go emits the deeptap_cache_* Prometheus family
  • Depth orchestrator integration: depth1.go/depth2.go/depth3.go compute FullKey from (normalized query, depth, provider_class, country, language, safe_mode), call LookupFull at the top, call NoopFactCache.Lookup after full-miss, singleflight-wrap the pipeline on miss, and call StoreFull on success with the TTL derived from the freshness class; DepthResult now carries CacheHit, CacheHitType, CacheKeysHit
  • ResearchStage.RunStage does a per-sub-query batch MGet and only calls the provider for misses; writes back per success via StoreSubQuery; per-URL extract caching uses LookupExtract to skip fetch + Trafilatura on hit (the firewall still re-scores the cached text so a model update defends against stale injection classifications) and StoreExtract on miss-then-success; a successful extract cache hit emits the extract_cache_hit warning
  • /v1/extract handler integrates the same per-URL extract cache and emits the same warning; /v1/search envelope, /v1/research final event, and a new cache_hit SSE event all carry cache_hit, cache_hit_type in {full, subquery, extraction, fact, miss}, and cache_keys_hit[]
  • Negative-cache tombstone: when any per-result prompt_injection_score is at or above UnsafeScoreThreshold, the envelope is NOT stored; a 60-second tombstone at the same FullKey returns a shaped empty-results envelope with cache_hit=true, cache_hit_type="full", and unsafe_reasons populated so hot retries on adversarial pages never reach the upstream provider
  • Cross-instance DMCA invalidation: pg_notify('deeptap_dmca_suppress', '{"type":"suppress","url":"...","domain":"..."}') fans out to every DeepTap instance, looks up the byurl:{sha256} and bydomain:{sha256} reverse-index sets, and deletes every cache key that referenced the suppressed URL or domain within a 1-second SLO (tracked as deeptap_cache_invalidation_latency_seconds). Single-key eviction rides the deeptap_cache channel with {"type":"invalidate_key","key":"..."}
  • Production-mode and bootstrap-mode /v1/ready add a 100-millisecond Redis PING cache probe that returns 503 on failure
  • Prometheus internal/metrics/cache.go registers deeptap_cache_requests_total{tier, outcome} (tier: full|subquery|extract|fact; outcome: hit|miss|error|bypass), deeptap_cache_lookup_duration_seconds{tier}, deeptap_cache_store_bytes{tier}, deeptap_cache_evictions_total{reason} (reasons: dmca_url|dmca_domain|key|ttl|size_cap), deeptap_cache_singleflight_share_total, deeptap_cache_full_hit_latency_seconds, deeptap_cache_invalidation_latency_seconds
  • Billing unchanged: cache hits write the full usage_ledger credit cost (research artifact 25, Tavily parity) because the customer value is the answer, not the path
  • Redis 7 required for EXPIRE ... NX|XX|GT|LT option flags used by StoreFull
  • New env vars: DEEPTAP_CACHE_ENABLED (default true), DEEPTAP_CACHE_FULL_TIER (true), DEEPTAP_CACHE_SUBQUERY_TIER (true), DEEPTAP_CACHE_EXTRACT_TIER (true), DEEPTAP_CACHE_MIN_FULL_TTL (60s), DEEPTAP_CACHE_MIN_EXTRACT_TTL (15m), DEEPTAP_CACHE_MAX_VALUE_BYTES (262144), DEEPTAP_CACHE_STRIP_LEADING_ARTICLES (false), DEEPTAP_CACHE_SF_TIMEOUT (30s)

Done (Session 12: Rate Limiting + Concurrency)

  • Go internal/ratelimit/ package with three load-bearing files: lua/gcra.lua is a 40-line atomic Generic Cell Rate Algorithm Lua script embedded via go:embed; gcra.go wraps it in redis.NewScript (stable SHA1 across callers), preloads via Script.Load at boot with a WARN-log-on-fail fallback, runs each Allow under a 50 ms per-op timeout with a NOSCRIPT retry once, and fails OPEN on Redis error by returning Decision{Allowed: true} alongside a non-nil error; a MetricsSink interface keeps the package free of a metrics-package import; tier.go holds the canonical Tiers map and ResolveLimit(orgTier, rateLimitOverride) that falls back to Free on unknown tiers and lets a positive per-key api_keys.rate_limit_override replace CPS with Burst = 2 * CPS
  • Tier table (from specs/PROJECT-CONTEXT.md), columns CPS / burst / general concurrency / depth-3 concurrency: Free 10 / 20 / 5 / 2; Starter 50 / 100 / 20 / 5; Growth 200 / 400 / 50 / 10; Scale 500 / 1000 / 100 / 25; Secure 500 / 1000 / 100 / 25; Enterprise 1000 / 2000 / 200 / 50
  • Go internal/middleware/ratelimit.go mounts the GCRA middleware AFTER APIKey and BEFORE the cache layer so a cache hit still counts against the per-key budget; stamps X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset on every admission; renders a 429 RFC 7807 application/problem+json body with type=https://deeptap.ai/errors/rate_limited, a Retry-After header (ceiling of retry_after_ms / 1000, floored at 1), and a limits.rate block on denial; stashes the admission *Decision + resolved Tier on the request context via WithRateLimitDecision so /v1/search, /v1/extract, and /v1/research envelopes stamp a rate_limit: {limit, remaining, reset_ms} block from the same numbers used on the headers
  • Go internal/middleware/concurrency.go implements the per-org in-flight counter keyed on v1:conc:<org_id>:<bucket>: /v1/research maps to the depth3 bucket, every other /v1/* endpoint maps to general; INCR on entry with an atomic DECR rejection on saturation and a defer rdb.Decr(context.Background(), key) release so a handler panic still returns the slot (Recoverer above the chain catches the panic; Go's defer runs during the unwind); 503 RFC 7807 with type=https://deeptap.ai/errors/concurrency_exhausted + limits.concurrency{bucket, limit, in_flight} on saturation; Redis error on INCR fails OPEN
  • Go internal/auth/ctx.go introduces AuthContext + WithAuthContext / AuthContextFrom as a thin context-access shim so the ratelimit middleware does not have to import internal/middleware (the apikey middleware is still the authoritative producer of org / key identifiers)
  • Go internal/api/errors.go adds the Problem RFC 7807 struct, WriteProblem helper, and ErrTypeRateLimit + ErrTypeConcExhausted constants reused by both middlewares (the string literals are duplicated inside internal/middleware/ to avoid an api <- middleware <- api import cycle)
  • cmd/deeptap/main.go boots ratelimit.NewLimiter(redisClient, logger).WithMetrics(rateLimitMetrics), runs Limiter.LoadScript(ctx) with a WARN log on failure, and the chi chain order is now RequestID -> OTEL -> slog -> Recoverer -> Prom -> CORS -> APIKey -> RateLimit -> Concurrency -> handler
  • Prometheus internal/metrics/ratelimit.go registers deeptap_ratelimit_requests_total{tier, outcome} (outcomes allowed|denied|redis_error|redis_timeout), deeptap_ratelimit_decision_duration_seconds, deeptap_ratelimit_retry_after_ms, deeptap_concurrency_inflight{bucket} gauge, deeptap_concurrency_rejections_total{bucket}, and deeptap_ratelimit_redis_errors_total
  • 100-goroutine-vs-50-burst concurrency test against a real Redis container verifies the Lua compare-and-swap atomicity (exactly 50 admissions, exactly 50 denials); a separate test covers the panic-safe DECR path by mounting Concurrency downstream of Recoverer and a handler that panics and asserting the Redis counter returns to zero
  • Fail-OPEN policy is deliberate: rate limiting is an SLA lever, not a correctness gate; a Redis outage must not cause an API outage
  • New env vars: DEEPTAP_RATELIMIT_ENABLED (default true), DEEPTAP_RATELIMIT_BURST_2X (default true), DEEPTAP_CONCURRENCY_ENABLED (default true), DEEPTAP_CONCURRENCY_DEPTH3_OVERRIDE (default 0 = use tier)

Done (Session 13: MCP Server + Tavily Shim)

  • POST /v1/map endpoint wired at the DeepTap API via internal/mapsvc/: a two-phase orchestrator (Phase 1 reads robots.txt Sitemap: directives then falls back to <scheme>://<host>/sitemap.xml; Phase 2 escalates to a bounded HTML crawl when sitemap yield is below DEEPTAP_MAP_HTML_MIN_YIELD=10) plus post-processing that unions + normalizes + dedupes + filters (allow_external, include_subdomains, exclude[]) + truncates to DEEPTAP_MAP_LIMIT=1000; response carries results[], sources breakdown (robots_sitemap_urls, sitemap_urls, html_crawl_urls, pages_fetched), dropped, truncated, credits_used; one usage_ledger row per request at 1.0 credit safe / 0.5 fast
  • cmd/deeptap-mcp/ binary serves the Model Context Protocol with --transport stdio|http (SSE transport is rejected explicitly per MCP 2025-11-25); HTTP mounts the MCP handler at /mcp + /mcp/ with /healthz served separately and requires an Mcp-Api-Key header enforced by APIKeyMiddleware before the MCP handler runs; internal/mcp/ ships types.go (jsonschema-tagged SearchIn/Out, ExtractIn/Out, FactsIn/Out), client.go (APIClient.doJSON + RESTError), handlers.go (three tool handlers with attestation gate on deeptap_extract mode=full), server.go (NewServer), middleware.go (APIKeyMiddleware), and server_test.go (in-memory transport coverage)
  • cmd/deeptap-tavily-shim/ binary listens on :8082 for Tavily-wire POST /search + POST /extract + POST /map; internal/shim/ (translate_search.go, translate_extract.go, translate_map.go, translate_response.go, handler.go, country.go with embedded 255-entry ISO-3166 + alias lookup) translates Tavily requests into DeepTap and projects responses back into Tavily's exact wire shape (decimal response_time, null follow_up_questions, images passthrough, usage.credits); X-DeepTap-Compat-Notes response header lists every field-level translation; callers that send X-DeepTap-Compat-Mode: strict receive 409 Conflict when any compat note would have fired; the shim calls DEEPTAP_INTERNAL_URL with DEEPTAP_SHIM_KEY as its own bearer
  • Three new integration docs under docs/integrations/: claude-desktop.md (macOS/Linux direct binary, Windows double-backslash paths, remote via mcp-remote), claude-code.md (claude mcp add CLI with local/user/project scopes and Streamable HTTP), cursor.md (.cursor/mcp.json for stdio and remote streamable-http, with warning on Cursor's 40-tool global cap)
  • migrations/0014_map_jobs.sql scaffolds an async /v1/map mode for a future session; the current endpoint is synchronous and returns partial results with truncated=true on timeout rather than a 504
  • New env vars: DEEPTAP_MAP_MAX_DEPTH (default 2), DEEPTAP_MAP_MAX_BREADTH (default 200), DEEPTAP_MAP_LIMIT (default 1000), DEEPTAP_MAP_HTML_MAX_BYTES (default 2 MiB), DEEPTAP_MAP_SITEMAP_MAX_BYTES (default 50 MiB), DEEPTAP_MAP_TIMEOUT (default 20s), DEEPTAP_MAP_HTML_CONCURRENCY (default 8), DEEPTAP_MAP_HTML_MIN_YIELD (default 10)
  • Prometheus metrics: deeptap_map_sitemap_phase_seconds, deeptap_map_html_phase_seconds, deeptap_map_total_seconds, deeptap_map_urls_discovered_total{source}, deeptap_map_pages_fetched_total
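
The /v1/map post-processing described above (filter by allow_external, include_subdomains, exclude[]; truncate; count drops) can be sketched as one pure function. This is a simplified model: exclusion here is plain substring matching, and the union/normalize/dedup step is assumed already done:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// mapFilter sketches the /v1/map filter + truncate pass. Flag names follow
// the request options above; matching rules are simplified.
func mapFilter(root string, urls []string, allowExternal, includeSubdomains bool,
	exclude []string, limit int) (kept []string, dropped int) {
	rootHost := mustHost(root)
	for _, raw := range urls {
		h := mustHost(raw)
		external := h != rootHost && !(includeSubdomains && strings.HasSuffix(h, "."+rootHost))
		if external && !allowExternal {
			dropped++
			continue
		}
		excluded := false
		for _, pat := range exclude {
			if strings.Contains(raw, pat) {
				excluded = true
				break
			}
		}
		if excluded {
			dropped++
			continue
		}
		kept = append(kept, raw)
	}
	if len(kept) > limit {
		dropped += len(kept) - limit
		kept = kept[:limit] // truncated=true in the real response
	}
	return kept, dropped
}

func mustHost(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return ""
	}
	return u.Hostname()
}

func main() {
	kept, dropped := mapFilter("https://example.com",
		[]string{
			"https://example.com/a",
			"https://docs.example.com/b", // subdomain: kept when include_subdomains
			"https://other.com/c",        // external: dropped when !allow_external
			"https://example.com/private/x",
		},
		false, true, []string{"/private/"}, 1000)
	fmt.Println(len(kept), dropped) // 2 kept, 2 dropped
}
```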

Done (Session 14: A2A + Payment Middleware with Idempotency)

  • Agent-to-Agent Protocol v1.0 server at internal/a2a/ with BuildAgentCard wiring four skills (web-search, fact-lookup, url-extract, site-map), four security schemes (bearer, dpop, x402, mpp), and two transports (JSON-RPC, HTTP-plus-JSON); AgentExecutor maps TextPart to search, DataPart.urls to extract, DataPart.subject to facts
  • x402 pay-per-call at internal/payments/x402.go: the 402 challenge emits a Base64url v2 JSON body and a WWW-Authenticate: Payment header carrying an HMAC-SHA-256 id and canonical USDC addresses for Base mainnet and Sepolia; VerifyAndSettle orchestrates a facilitator call against /v2/x402/verify + /v2/x402/settle
  • Merchant Payments Protocol at internal/payments/mpp_charge.go + mpp_session.go: Authorization: Payment parser with temporary / Stripe-SPT / Lightning dispatch; MPP sessions use 32-byte opaque access tokens stored as SHA-256 hash in mpp_sessions.access_token_hash with monotonic cumulative accounting
  • DPoP with go-dpop v1.1.2: ES256 / RS256 allow-list enforced pre and post parse, Redis nonce rotation via dpop:nonce:{jkt} under a 5-minute TTL, jti replay guard via dpop:jti:{jkt}:{jti} SET NX, 60-second clock-skew tolerance
  • internal/middleware/payment_dispatch.go 7-branch dispatcher (Bearer > x402 > MPP > DPoP > 402) mounted BEFORE APIKey on /v1 when cfg.X402Enabled || cfg.MPPEnabled
  • internal/middleware/idempotency.go with SHA-256 body hash under a 1 MiB cap, Redis SET NX lock on idemp:lock:{scope}:{key}, replay-cached 2xx bytes carrying Idempotency-Replayed: true, 409 on body-hash mismatch, 409 on in-flight collision, fail-open on Redis unreachable with a degraded counter; panic-safe lock release via deferred Redis DEL
  • 4 new migrations: 0015_payment_attempts with 12 seeded monthly partitions + 60-day retention; 0016_mpp_sessions with DPoP-thumbprint + SHA-256 hashed access-token binding; 0017_dpop_nonces append-only audit; 0018_a2a_tasks JSONB history + artifacts

Done (Session 15: TrustPlane Integration with Fallback)

  • SPIFFE X.509-SVID auth path with live-preferred + local-fallback verification via github.com/spiffe/go-spiffe/v2 v2.6.0; 3 Postgres migrations (0019 trustplane_bundle_cache, 0020 trustplane_verifications monthly-partitioned, 0021 portfolio_accounts with portfolio-revenue seed); 10 TrustPlane env vars with fail-fast boot validation when TRUSTPLANE_ENABLED=true
  • internal/identity/spiffeid.go URI-SAN extractor + trust-domain allow-list; internal/identity/tp_client.go dedicated http.Transport with hard-deny vs. network-error distinction for local fallthrough; internal/identity/bundler.go + metrics.go periodic spiffebundle fetch + Postgres + Redis persistence + 7 Prometheus instruments labeled trust_domain only to avoid SPIFFE-ID cardinality explosion
  • internal/identity/trustplane.go Verifier with live-preferred + local-fallback, deny short-circuit, stale-reject / warn, x509svid.Verify offline, async audit writer with bounded channel and drop counter
  • internal/billing/portfolio.go#PortfolioLedger.PostToLedger balanced double-entry posting with banker-rounded markup and spiffe_id > catch-all account lookup; payment dispatcher TrustPlane branch runs BEFORE Bearer (401 deny, 503 stale / no_bundle, passes through when no client certificate present)
  • internal/identity/tls.go ClientCAProvider + BuildTLSConfig with VerifyClientCertIfGiven + GetConfigForClient memoized at a 10-second minimum; .well-known / health / metrics remain mTLS-optional
  • internal/identity/admin.go chi-mountable admin router at /portfolio/accounts, /trustplane/bundle{,/refresh}, base64 bundle export; deeptap-cli trustplane {verify,bundle-refresh,bundle-status} subcommands; Caddy-based trustplane-mock compose service on port 8089 (dev-default OFF)
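
The URI-SAN extractor plus trust-domain allow-list above can be sketched with plain URL parsing. The real path uses go-spiffe's typed SPIFFE-ID parser; this standalone model and its example trust domain are assumptions for illustration:

```go
package main

import (
	"fmt"
	"net/url"
)

// allowTrustDomain sketches the check: parse the certificate's URI SAN,
// require the spiffe scheme, and gate on the configured trust domains.
func allowTrustDomain(uriSAN string, allowed map[string]bool) (string, error) {
	u, err := url.Parse(uriSAN)
	if err != nil || u.Scheme != "spiffe" {
		return "", fmt.Errorf("not a SPIFFE ID: %q", uriSAN)
	}
	if !allowed[u.Host] {
		return "", fmt.Errorf("trust domain %q not allowed", u.Host)
	}
	return u.Host, nil
}

func main() {
	// hypothetical trust domain for illustration
	allowed := map[string]bool{"trustplane.example": true}
	td, err := allowTrustDomain("spiffe://trustplane.example/workload/agent-7", allowed)
	fmt.Println(td, err)
}
```

Labeling metrics by trust domain rather than full SPIFFE ID, as the section notes, keeps cardinality bounded: the host part of the URI is shared by every workload in the domain.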

Done (Session 16: Billing Engine)

  • 30-task branch on build/S16-billing-engine. Four independent billing surfaces, one unified ledger feed. go.mod bumps toolchain to Go 1.26.1 + adds github.com/stripe/stripe-go/v85 + github.com/johnfercher/maroto/v2 v2.4.0
  • 7 Postgres migrations: 0022_stripe_customers with tier + dunning counters, 0023_stripe_meters, 0024_stripe_credit_grants, 0025_stripe_meter_events_outbox with UUID identifier + partial index on sent_at IS NULL, 0026_stripe_webhooks with event_id dedup, 0027_portfolio_invoices with UNIQUE(account_id, period_start, period_end), 0028_reconciliation_reports with NUMERIC(6,4) variance_pct
  • internal/billing/ package: client.go facade with DryRun kill and Enabled() check; meters.go idempotent EnsureMeters with MeterCreator injection; metrics.go 10 Prometheus instruments (tier / event_name / outcome / endpoint labels only, never customer_id); outbox.go transactional EnqueueMeterEvent with UUID v7 Stripe-dedup identifier + pure validateEnqueueArgs helper; flusher.go FlushOnce drains FOR UPDATE SKIP LOCKED in batches of 100 via V2BillingMeterEventStreams.Create, updates sent_at before commit (prefers duplicate delivery over double-charge); subscriptions.go UpsertSubscriptionForOrg with tier + SubscriptionItem translation + proration_behavior=create_prorations + LoadRateCard JSON; creditgrants.go CreateCreditGrant + SyncCreditGrantFromWebhook (price_type=metered applicability, category=paid)
  • internal/billing/webhooks.go signature verify via webhook.ConstructEvent + ON CONFLICT DO NOTHING dedup + panic recovery + per-type dispatch fans out to webhooks_invoice.go (created / finalized / paid / failed with dunning counter at threshold 3), webhooks_subscription.go (sync subscription_id without flipping tier), webhooks_creditgrant.go (shadow table upsert), webhooks_meter_error.go (log + counter without marking outbox sent); empty-secret hard-fail is deliberate (operator misconfiguration, not a dev fallback)
  • internal/billing/portfolio.go + portfolio_monthend.go month-end aggregator; pdf.go maroto/v2.4.0 A4 portrait renderer with header + customer block + line-item table + totals + footer, byte-stable across re-renders; reconcile.go Classify 3-bucket drift detector (clean under 0.1 percent, variance 0.1 to 5 percent, error above 5 percent) + SumStripeMeterEvents iterator across every active (stripe_customer_id, meter_id) pair + RunPeriodReconcile end-to-end
  • internal/billing/jobs.go + per-account MonthEndWorker in portfolio.go: 4 River workers (FlushMeterEventsWorker, PortfolioMonthEndWorker fan-out, ReconcileStripeWorker with caller-injected StripeTotaler closure, WebhookReplayWorker); internal/jobs/schedules.go PeriodicJobs at 60-second flush, 24-hour reconcile at 04:00 UTC via DailyAtUTC, monthly portfolio run at 03:00 UTC day-one via MonthlyAtUTC, 5-minute webhook replay
  • cmd/deeptap/main.go wires the billing outbox hook into /v1/search + /v1/extract + /v1/map handlers post-ledger-commit; mounts POST /v1/billing/webhooks/stripe at top-level r (outside /v1 so no auth middleware consumes raw body); constructs BillingClient + Metrics + PortalCreator unconditionally so webhook verification and zero-state metrics work even with BillingEnabled=false; River client stops BEFORE http.Server.Shutdown so in-flight billing jobs commit cleanly
  • Handler hooks skip the TrustPlane path. When PaymentDispatchModeTrustPlane rides on the request context, the outbox enqueue fast-exits because portfolio customers are billed via the internal double-entry ledger and their traffic never touches Stripe. TrustPlane detection uses the payment-dispatch mode flag set by the S15 dispatcher branch; the handlers do not inspect api_key_id directly
  • cmd/deeptap-cli/billing.go adds billing reconcile / portfolio-invoice / outbox-status subcommands; config/rate-card.json seed with three tiers (Starter $19 / 1,000 credits, Growth $99 / 10,000 credits, Scale $499 / 100,000 credits) and four meters each with placeholder price_id fields the operator fills post-EnsureMeters
  • 8 new env vars documented under docs/DEPLOYMENT.md: DEEPTAP_STRIPE_SECRET_KEY, DEEPTAP_STRIPE_WEBHOOK_SECRET, DEEPTAP_STRIPE_CLIMATE_ENABLED, DEEPTAP_DASHBOARD_URL, DEEPTAP_BILLING_RATE_CARD_PATH, DEEPTAP_PORTFOLIO_INVOICE_FROM_EMAIL, DEEPTAP_PORTFOLIO_INVOICE_LOGO_PATH, DEEPTAP_BILLING_DRY_RUN
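
The 3-bucket reconciliation classifier above (clean under 0.1 percent, variance 0.1 to 5 percent, error above 5 percent) reduces to one pure function. The signature is illustrative; the thresholds are taken directly from the section:

```go
package main

import (
	"fmt"
	"math"
)

// classify sketches the reconciliation drift detector: compare the internal
// usage_ledger total against the Stripe meter-event total and bucket the
// relative variance.
func classify(ledgerTotal, stripeTotal float64) (bucket string, variancePct float64) {
	if ledgerTotal == 0 {
		if stripeTotal == 0 {
			return "clean", 0
		}
		return "error", 100 // Stripe billed something we never metered
	}
	variancePct = math.Abs(ledgerTotal-stripeTotal) / ledgerTotal * 100
	switch {
	case variancePct < 0.1:
		return "clean", variancePct
	case variancePct <= 5:
		return "variance", variancePct
	default:
		return "error", variancePct
	}
}

func main() {
	b, v := classify(10000, 9980) // 20 credits of drift on 10k
	fmt.Printf("%s %.2f%%\n", b, v)
}
```

Note the asymmetry baked into the flusher above: because sent_at is updated before commit, drift is biased toward duplicate delivery, which the classifier surfaces as a positive variance rather than silent under-billing.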

Done (Sessions 17 + 18: DMCA Compliance + Dashboard UI)

  • Transactional email via Postmark on the primary send.deeptap.ai domain, plus a separate dmca.deeptap.ai reputation domain on Amazon Simple Email Service for the takedown workflow
  • DMCA intake at POST /v1/dmca/report with sworn-statement validation, ticket-creation flow, and cross-instance cache suppression that lands inside one second via the existing deeptap_dmca_suppress LISTEN/NOTIFY channel
  • Counter-notice state machine (received -> actioned -> counter_notice -> resolved) covering the DMCA 512(g) 10-to-14-business-day window
  • Next.js 15 dashboard shell with seven routes (/overview, /usage, /billing, /apikeys, /domains, /facts, /settings), session-cookie middleware, four recharts time-series views, fact-cache analytics panel reading the S19-onward metrics, and a Mintlify docs site with 18 MDX pages plus a single-source-of-truth trust-report renderer

Done (Session 19: Fact Data Model + Extraction Pipeline)

  • internal/facts/ package with the full extraction pipeline: 40 predicate aliases, five-tier classification cascade, NFKC normalization, ROUGE-L recall gate, atomic per-day budget consumer, OpenRouter JSON-schema extractor, sidecar VerifyClaim and LinkEntities callers, three-tier dedup (exact, trigram, insert), conflict flagger, and a Kafka-driven per-page worker
  • cmd/deeptap-fact-worker/ binary with graceful shutdown and Prometheus on port 9091
  • Four migrations land the dual trigram GIN index + partial unique canonical constraint + audit run table + per-day budget counter
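
The ROUGE-L recall gate above can be sketched as a longest-common-subsequence recall over claim tokens: a claim the source text does not actually support scores low and is rejected. The tokenizer and any threshold choice here are illustrative simplifications:

```go
package main

import (
	"fmt"
	"strings"
)

// rougeLRecall computes LCS(claim, source) / len(claim tokens): the fraction
// of the claim's tokens that appear, in order, in the source.
func rougeLRecall(claim, source string) float64 {
	a := strings.Fields(strings.ToLower(claim))
	b := strings.Fields(strings.ToLower(source))
	if len(a) == 0 {
		return 0
	}
	// classic O(len(a)*len(b)) LCS table, rolling one row at a time
	prev := make([]int, len(b)+1)
	for i := 1; i <= len(a); i++ {
		cur := make([]int, len(b)+1)
		for j := 1; j <= len(b); j++ {
			switch {
			case a[i-1] == b[j-1]:
				cur[j] = prev[j-1] + 1
			case prev[j] >= cur[j-1]:
				cur[j] = prev[j]
			default:
				cur[j] = cur[j-1]
			}
		}
		prev = cur
	}
	return float64(prev[len(b)]) / float64(len(a))
}

func main() {
	r := rougeLRecall("paris is the capital of france",
		"the capital of france is paris")
	fmt.Printf("%.2f\n", r) // 4-token LCS over a 6-token claim
}
```

Recall (rather than precision or F1) is the right direction for a hallucination gate: it asks how much of the *claim* is grounded in the page, not how much of the page made it into the claim.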

Done (Session 20: Staleness Model + Re-Verification)

  • Pure-function decay model EffectiveConfidence(base, rate, lastConfirmed, now) = base * exp(-rate * days_since_confirmed) with NeedsReverification and InOpportunisticBand predicates
  • River-backed scheduler with three queues (reverify_priority 10 workers, reverify_scan 1, maintenance 1) and two periodic jobs (hourly scan, monthly partition creator)
  • cmd/deeptap-scheduler/ standalone binary plus a /internal/river UI gated by X-DeepTap-Internal header
  • Contradiction-resolution state machine with MinConfirmingDomains=2 default, fact-supersession audit log, and opportunistic re-verification hook for the band [threshold, threshold + 0.10)

Done (Session 21: Fact Cache Query Integration)

  • POST /v1/facts/query exposes the fact cache with subject or subject_qid lookups, optional predicate, min_confidence floor, include_evidence toggle, and max_results cap (default 10, hard cap 50)
  • Redis read-through cache fact:q:v1:<hash> with snappy-compressed msgpack payloads; TTL driven by the fastest-decaying tier in the result (permanent 24h, slow 12h, moderate 1h, fast 10m, volatile 2m)
  • One usage_ledger row at 0.1 credit per request irrespective of hit or miss; conflict-flagged facts surface with conflict_flag=true; superseded facts never returned
  • Depth=1 search pipeline gains a top-of-RunDepth1 fact probe via Composer.Prober that short-circuits decomposition + search + extract + rerank on a confident fact-cache hit; billing becomes 0.1 credit instead of 1.0

Done (Session 22: Fact Cache Analytics + Warming)

  • Four always-on feed workers under internal/feeds/ (wikidata, cve, ietf, edgar) scheduled by River from internal/jobs/feed_workers.go on independent cadences (Wikidata daily delta + monthly seed dump, CVE/NVD daily, IETF weekly, SEC EDGAR hourly during US trading); workers persist resume state in feed_ingestion_state (last-run summary) and feed_cursor (opaque per-feed resume blob) so a worker restart picks up where the previous run stopped
  • Demand-triggered feed routing keyed off a feed_registry table with topic_pattern glob matching against subject:predicate; cache-miss queries that match a pattern queue a synchronous EnqueueDemandFeed River job behind a single 500-credit-per-day shared budget so a hot pattern cannot starve quiet ones
  • New deeptap.facts Kafka topic produced by the S19 fact-extraction worker, the S21 fact-query handler (on opportunistic-reverify writes), and every S22 feed worker; CloudEvents-shaped envelope per new, confirmed, or contradicted fact with a per-event UUID dedup identifier so worker restarts cannot double-count downstream
  • deploy/clickhouse/schemas/022_fact_events.sql defines the consumer side: a Kafka engine table on deeptap.facts plus four materialized views that drive the dashboard's Fact Cache tab (mv_fact_hit_rate rolling 5-minute hit-vs-miss ratio, mv_facts_by_type 5-tier decay-class histogram, mv_staleness_distribution days_since_confirmed histogram, mv_conflict_rate rolling proportion of conflict_flag=true facts); ClickHouse is read-only from the dashboard's perspective and never feeds back into the Postgres facts row
  • Tiny upstream-shape fixtures land at test/fixtures/feeds/{wikidata-tiny.ndjson, cve-sample.json, rfc-index-tiny.txt, edgar-sample.atom} with a schema README; build-tagged integration test skeletons under integration_feeds and integration_clickhouse reserve test names + paths and skip cleanly until a Postgres + ClickHouse + Kafka testcontainer stack is wired in go.mod
  • Phase 2 (Fact Cache, S19-S22) is now complete
  • Phase 3 (Knowledge Layer, S23-S28) is now complete; all 28 sessions of the build have shipped
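
The demand-triggered routing above hinges on matching a cache-miss query's subject:predicate string against registered topic_pattern globs. A minimal sketch, assuming Go's path.Match glob syntax (the real feed_registry matcher may use different pattern rules), with hypothetical registry entries:

```go
package main

import (
	"fmt"
	"path"
)

// feedRoute is an illustrative stand-in for a feed_registry row.
type feedRoute struct {
	Feed    string
	Pattern string // topic_pattern glob
}

// matchFeed tests subject:predicate against each registered glob and returns
// the first feed whose pattern matches.
func matchFeed(registry []feedRoute, subject, predicate string) (string, bool) {
	key := subject + ":" + predicate
	for _, r := range registry {
		if ok, _ := path.Match(r.Pattern, key); ok {
			return r.Feed, true
		}
	}
	return "", false
}

func main() {
	registry := []feedRoute{
		{Feed: "cve", Pattern: "CVE-*:*"},
		{Feed: "edgar", Pattern: "*:ticker_symbol"},
	}
	feed, ok := matchFeed(registry, "CVE-2024-3094", "severity")
	fmt.Println(feed, ok) // routes the miss to the CVE feed worker
}
```

On a match, the miss enqueues an EnqueueDemandFeed job; the shared 500-credit daily budget then decides whether the job actually runs.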

In Progress

  • No session currently active. Phases 1, 2, and 3 are all complete; the platform is in v1 release-candidate posture pending the Phase 4 hardening surfaces.

Done (Knowledge Layer Analytics, S28 FINAL)

S28 is the read-only analytics surface over the Knowledge Layer.

  • Three new Postgres materialized views (mv_source_index_stats, mv_entity_coverage, mv_topic_coverage) refreshed daily at 06:00 UTC by a River-managed worker
  • New public endpoint GET /v1/trust/domain returns the full domain trust profile (tier, consensus ratio, fact counters, ASN metadata, sample evidence count, last-updated timestamp) at 0.05 credit per call with a 5-minute Redis cache
  • Three new dashboard handlers (source-index, entity-coverage, topic-coverage) read the matching MV with a 5-minute Redis cache and degrade gracefully on a backing-store outage
  • New internal/diversity package adds an Autonomous System Number tiebreaker the reranker uses to prefer cross-network confirmation over single-network confirmation at the same effective confidence
  • Three Prometheus instruments cover lookup outcome, refresh duration per view, and mean diversity per tier
  • OpenAPI 3.1 schema documents the public endpoint; full unit-test coverage on every Go-side surface
  • With S28 shipped, all 28 sessions of the eighteen-to-twenty-month build are complete
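
The Autonomous System Number tiebreaker can be sketched as a distinct-ASN count used only when effective confidence ties. Types and function names here are illustrative, not the internal/diversity API:

```go
package main

import "fmt"

// asnDiversity counts distinct Autonomous Systems among a claim's
// confirming sources.
func asnDiversity(asns []uint32) int {
	seen := map[uint32]struct{}{}
	for _, a := range asns {
		seen[a] = struct{}{}
	}
	return len(seen)
}

// preferByDiversity breaks a confidence tie: at the same effective
// confidence, cross-network confirmation outranks single-network repetition.
func preferByDiversity(aASNs, bASNs []uint32) string {
	if asnDiversity(aASNs) >= asnDiversity(bASNs) {
		return "a"
	}
	return "b"
}

func main() {
	// two confirmations from two networks vs. three from one network
	fmt.Println(preferByDiversity(
		[]uint32{15169, 13335},
		[]uint32{15169, 15169, 15169},
	))
}
```

The intuition: many mirrors on one hosting network are weaker evidence than independent confirmations routed through unrelated networks.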

Scoped

  • All 28 sessions are now Done. Phase 4 hardening surfaces (SOC 2 Type 2, additional protocol adapters, multi-region production topology) remain on the horizon.

Scoping

  • Phase 4 hardening specifics (Service Organization Control (SOC) 2 Type 2, additional protocol adapters, multi-region, long-term residency controls)

Potential / On Horizon

  • Service Organization Control (SOC) 2 Type 2 certification
  • Additional protocol adapters beyond MCP, A2A, x402, MPP, TrustPlane
  • Multi-region production topology beyond the bootstrap iad (us-east) region
  • Additional language-model provider integrations beyond the initial OpenRouter default
  • Stripe Connect reseller integration (explicitly cut from version 1)
  • Voice-agent-grade latency tier with warm Fly.io machines

Documentation

| Document | Audience | What You Will Find |
| --- | --- | --- |
| How It Works | Product and engineering | User journeys for developers, agents, and enterprise; the full search flow, /v1/map orchestrator, MCP server, Tavily shim, fact-cache flow, firewall flow, billing flow, error behavior |
| Architecture | Developers | Tech stack with pinned versions, repository map, system topology, data flow, persistence layer, machine-learning inference, protocols, middleware stack, authentication, billing, non-obvious decisions with rationale |
| Feature Map | Product and stakeholders | Every feature grouped by domain with user-facing benefit, session that delivers it, and status |
| Deployment | DevOps and deployers | Account prerequisites, environment variables, local development setup, bootstrap deployment, production migration triggers, database migrations, proto regeneration, local-node deployment, observability, DMCA domain setup, secrets rotation |
| Business Value | Marketing, investors, executives | The opportunity, the problem, the solution, why us and why now, target customers, how we make money, what is different, traction plan, team, status |
| Claude Desktop Integration | Claude Desktop users | Copy-paste claude_desktop_config.json snippets for macOS/Linux direct binary, Windows double-backslash paths, and remote via mcp-remote |
| Claude Code Integration | Claude Code users | claude mcp add CLI snippets for stdio (local/user/project scopes) and Streamable HTTP |
| Cursor Integration | Cursor users | .cursor/mcp.json snippets for stdio and streamable-http; warning on Cursor's 40-tool global cap |

Contributing

DeepTap is developed in public by RelayOne. Contribution guidelines live in the repository at github.com/RelayOne/deeptap. The monorepo uses a Go workspace (go.work), sqlc for typed database queries, golang-migrate for migrations, and golangci-lint plus ruff for linting.

Branch naming follows session/S<NN>-<short-slug> for session-aligned work and fix/<short-slug> for bugfixes. Documentation changes go in their own commits, per the project rules. Every specification in /specs/ is considered frozen unless explicitly reopened.

License

The license will be declared before Session 1 lands. The current plan is a split-license model: a source-available license for the Go API and dashboard, a permissive license for the SDKs (TypeScript on npm, Python on PyPI, Go as a module), and a commercial license for the local-node binaries. This will be finalized in the Session 1 commit.


Last updated: 2026-04-24 (S22; Phase 2 complete)

About

DeepTap — private-corpus search + deep research with seven-stage pipeline
