RelayOne/deeptap

DeepTap

Agent-native web search. Three depth modes on one endpoint. Fixed prices. Transparent firewall. The same search backbone Anthropic chose for Claude.


What Is DeepTap?

DeepTap is a web search API built specifically for AI agents. It lives between shallow SERP wrappers like Tavily and Serper on one end, and slow deep-research products like Perplexity Sonar Deep Research on the other. One endpoint, three depth modes: a fast single-pass lookup when you need speed, a gap-analysis round-trip when you need completeness, and a multi-iteration research loop when you need thorough coverage. All three carry the same response envelope. All three return with a predictable, fixed credit cost. No dynamic pricing, no 4-to-250 credit surprise bills, no context-size multipliers.

Underneath the public search layer sits a compounding fact cache with decay models. Every extracted page is distilled into structured claims with provenance, evidence counts, source diversity, and a freshness decay curve tuned to the fact type (permanent, slow-decay, moderate-decay, fast-decay, volatile). When the next agent asks a question that has already been answered on the public web, DeepTap answers in under fifteen milliseconds for a tenth of a credit instead of triggering a fresh search. The cache is self-healing: organic query traffic constantly re-verifies the most popular facts, and a priority-scored River job queue re-verifies the rest inside a bounded daily budget.

DeepTap is agent-native by construction. Retail developers authenticate with traditional API keys and pay through Stripe subscriptions. Agents with no account pay per-call with x402 crypto micropayments or per-session with Merchant Payments Protocol (MPP) credentials bound to Demonstrating Proof of Possession (DPoP) keys. Portfolio and enterprise customers verify through TrustPlane with a SPIFFE Verifiable Identity Document, which routes billing to a double-entry ledger instead of Stripe. The API speaks Model Context Protocol (MCP) for Claude Desktop and Claude Code, Agent-to-Agent Protocol (A2A) v1.0 for agent marketplaces, and a Tavily-compatible shim for frictionless migration. For customers who cannot send queries to a public cloud, DeepTap ships a local-node Docker Compose deployment with mutual TLS, delta sync, and an offline fact-lookup mode that continues serving when the cloud is unreachable.

The production architecture is a pure Go API binary backed by a Python 3.11 NLP sidecar that exposes seven gRPC remote procedure calls. All machine-learning inference runs in the sidecar, never in the Go process: reranking via ms-marco-MiniLM-L6-v2 INT8, embedding via bge-small-en-v1.5, natural-language-inference grounding via MiniCheck-Flan-T5-Large, entity linking via ReFinED, and prompt-injection scoring via Prompt Guard 2 86M. Bootstrap deployments swap the sidecar for managed inference APIs (Cohere Rerank, Voyage Embed, Cloud Run Trafilatura) behind stable Go interfaces, giving the project scale-to-zero economics from day zero and an interface-compatible migration path to self-hosted infrastructure once volume justifies it.


Why DeepTap?

The agent search market is actively destabilizing. In February 2026 Nebius acquired Tavily for roughly 275 million US dollars, leaving an install base of Tavily customers facing the usual post-acquisition uncertainty: price changes, feature freezes, support regressions, product reprioritization toward the acquirer's roadmap. In December 2025 Google sued SerpApi for Digital Millennium Copyright Act (DMCA) section 1201 violations, marking the first major legal action against Google-SERP-scraping search APIs and creating existential legal risk for every competitor built on the same scraping pattern. Bing's search API was deprecated. Google's Custom Search API remains locked behind low quotas and per-query fees that do not scale for agent workloads. The market is smaller than it looks and the legally clean vendors are smaller still.

DeepTap is built on Brave Search as the default provider. Brave is the same independent index that Anthropic chose for Claude's web search, with confirmed 86.7% result overlap versus Google on evaluated queries. It is licensed, not scraped, and carries no DMCA 1201 exposure. Serper remains available as an opt-in provider_class=fast tier for customers who explicitly acknowledge the legal tradeoff at the API-key level. Defaulting to Brave is the single most important provider decision in the product: it is the one that makes DeepTap safe to integrate into a regulated pipeline without a lawyer conversation.

Pricing is the second wedge. Tavily Research calls can consume anywhere from 4 to 250 credits depending on opaque internal heuristics; Perplexity Sonar Deep Research lists $410 to $1,320 per 1,000 calls and takes 30 seconds to 2 minutes per answer; Exa Auto runs roughly $7 per 1,000 calls. DeepTap charges exactly 1 credit for depth=1 safe, exactly 3 credits for depth=2 safe, exactly 8 credits for depth=3 safe, exactly 0.1 credit for a fact lookup, exactly 0.5 credit for an extract excerpt. Every response includes credits_used. You can budget an agent deployment on a spreadsheet without instrumentation.

The third wedge is transparency on the firewall. Tavily markets an agent-native prompt-injection firewall; its implementation is proprietary, returns no injection score, no reasons array, and no sanitized-content field. The defender cannot audit what was blocked. DeepTap implements a layered firewall and exposes every scoring signal in the response envelope: prompt_injection_score (0.00 to 1.00), unsafe_reasons[] (machine-readable tags), sanitized_content (extracted text with injection vectors stripped), trusted_snippet (provider-supplied metadata that never touched untrusted HTML), and untrusted_content (the raw-extracted text kept separate so security teams can make their own call). The joint OpenAI/Anthropic/Google study from October 2025 (Nasr et al.) confirmed that every published prompt-injection detector was bypassed above 90% under adaptive attack, meaning detectors are scoring signals, not gates. DeepTap layers rule-based pre-extraction stripping in Go (Unicode tag chars, zero-width chars, CSS-hidden text, HTML comments, aria/meta injection vectors, off-screen positioning tricks) before any ML model touches the content, then Prompt Guard 2 86M in the sidecar as a scoring layer, then explicit trusted/untrusted splits in the response so the security team downstream can apply its own policy.


Key Features

Configurable depth, one endpoint (plus a streaming endpoint for the deep tier)

POST /v1/search accepts a depth field of 1, 2, or 3. Depth 1 runs a single RunStage pipeline (Fanout -> per-URL fetch + Layer 1 strip + Trafilatura extract + Layer 2 score -> Rerank) with automatic query decomposition; the 95th-percentile latency target is under 7 seconds and the hard timeout is DEEPTAP_DEPTH1_TIMEOUT (default 7s). Depth 2 adds one reflection round against anthropic/claude-sonnet-4.6 with a JSON-schema structured output that asks "what is missing from these findings?", issues a targeted second stage on the gaps, merges the two rounds with a score-max dedup on normalized URL, and re-sorts by rerank_score; the 95th-percentile target is under 20 seconds and the hard timeout is DEEPTAP_DEPTH2_TIMEOUT (default 18s). Depth 3 is served on a separate streaming endpoint (see below) because a two-minute blocking request is unusable in an agent loop; POST /v1/search with depth=3 returns 400 use_research_endpoint pointing callers at POST /v1/research. The envelope carries depth, rounds_executed, stop_reason, and (when include_ledger=true) the facet ledger.

Depth-3 streaming research endpoint POST /v1/research

The deep-research tier is a Server-Sent Events stream, not a blocking request. POST /v1/research emits a documented sequence of events as each round of the facet-ledger-guided loop completes: round_start ({round, sub_queries[]}), partial_results ({round, results[]} after each stage rerank), facet_update ({round, facet, coverage} as each facet's coverage moves), reflection ({round, gaps[], stop, model, latency_ms} after each reflector call), saturation ({reason} when the loop decides to stop), final (the full envelope including ledger, rounds_executed, stop_reason), error (RFC 7807-shaped), and ping ({}) every DEEPTAP_SSE_HEARTBEAT (default 15s) as a keep-alive heartbeat. The loop runs up to DEEPTAP_DEPTH3_MAX_ROUNDS (default 4) rounds under DEEPTAP_DEPTH3_TIMEOUT (default 110s, SLO 120s p95) and stops on one of five precedence-ordered conditions: hard timeout with 2s grace; marginal-lift saturation (Saturated(DEEPTAP_SATURATION_DELTA default 0.05, consecutive=2)); reflector LLM-stop advisory; max rounds; reflector error. A single writer goroutine fed by a bounded channel owns the flusher so event ordering is deterministic. Client disconnect cancels the orchestrator context. The usage_ledger row (8.0 safe / 4.0 fast) is written AFTER the final event, so a mid-stream error never bills.

Facet ledger: bounded, auditable research coverage

Depth 2 and depth 3 share a facet ledger defined in internal/depth/ledger.go. SeedFacets(query) walks a small keyword table to seed facet names: vs/versus -> comparative, history of/origin of/when was -> historical, what is/define/definition of -> definitional, current/latest/today/now -> current-status, no match -> general. Each sub-query is attributed to a facet at dispatch time (via the decomposer's new facets schema extension, or the seed fallback). After every stage the ledger accumulates each result's rerank_score against its attributed facet, normalizes by DefaultFacetSaturation=5.0, caps at 1.0 per facet, and averages across facets to produce overall Coverage(). The ledger exposes MarginalLift(consecutive) and Saturated(delta, consecutive) so the composer can decide when to stop on evidence-gathering plateau. When include_ledger=true is set on the request and server config allows it, the envelope carries the full {facets[{name, coverage, sub_queries[], rounds[], docs_scored}], rounds, coverage_history[]} JSON view so agent-builders can audit exactly which facets were researched, how thoroughly, and across which rounds.

Automatic query decomposition via OpenRouter

Every POST /v1/search request that does not already carry a caller-supplied sub_queries array is routed through a hand-written stdlib OpenRouter chat-completions client that decomposes the user's question into N diverse sub-queries under JSON-schema structured output. The model slug is chosen by depth: depth=1 uses anthropic/claude-haiku-4.5 for cheap, fast decomposition; depth=2 and depth=3 use anthropic/claude-sonnet-4.6 for the harder research questions. The decomposer runs under a 10-second handler-level timeout. Responses carry a decomposition object with the resolved model, upstream inference provider, generation_id, sub_queries[], tokens_prompt, tokens_completion, cost_usd, and latency_ms so callers can audit and meter the LLM call per request. On any decomposition failure (timeout, breaker open, policy violation, parse error) the handler logs a warning, adds decomposition_failed to warnings[], and falls back to the single-query path so the request still returns a useful answer.

LLM policy enforcement with Secure-SKU Zero Data Retention

Every organization has an organizations.llm_policy JSONB column that the handler loads into a typed LLMPolicy on every request: require_zdr, require_data_collection_deny, providers[] allowlist, models_allowed[] allowlist, max_tokens clamp, optional temperature. When the organization's tier is secure, the loader unconditionally clamps require_zdr=true, require_data_collection_deny=true, and defaults the provider allowlist to ["anthropic"] if empty. The three controls are then materialised directly into the OpenRouter request body as provider.zdr=true, provider.data_collection="deny", and provider.order=<allowlist> so a Secure-tier query cannot route to a log-retaining inference provider even if OpenRouter's default would have picked one. Malformed policy JSON does not fail open; the loader errors, the handler logs it as policy_load_failed in warnings[], and the decomposer runs against a zero-value permissive baseline so the degraded request still makes progress.

Freshness classification on every query

A deterministic Go classifier at internal/freshness/ labels every query as volatile, daily, standard, or stable before it reaches the provider. The classifier applies NFC normalization and lowercasing, extracts 4-digit year tokens to decide historical vs. current-year signals, and walks a fixed sequence of keyword unions and structural patterns (price-or-score live, weather live, status live, breaking, is-still, future-event, current-role, stable-pattern). The result populates freshness_class and freshness_reason on every response, bumps the deeptap_freshness_class_total{class} Prometheus counter, and is read by the S11 caching layer to choose a cache TTL (volatile 5 minutes, daily 1 hour, standard 4 hours, stable 24 hours). No network call; pure Go regex evaluation.

Four-tier Redis result cache with freshness-driven TTLs

S11 wires a four-tier cache behind internal/cache/: full-envelope, per-sub-query, per-URL extract, and a fact-cache hook reserved for S21. Keys use a shared v1: prefix plus a sha256 hash over NFC-normalized + lowercased + whitespace-collapsed inputs (with optional leading-article stripping behind DEEPTAP_CACHE_STRIP_LEADING_ARTICLES). FullKey composes the normalized query plus depth plus provider class plus country plus language plus safe mode so depth-1 safe and depth-2 fast for the same question never collide. TTLs come from DetermineTTL(class) with MinFullTTL=60s and MinExtractTTL=15m floors and a 1-hour sub-query cap. The depth orchestrator calls LookupFull at the top of every depth, falls through to NoopFactCache.Lookup on miss (S21 will replace the noop), then wraps the pipeline in a Singleflighter under DEEPTAP_CACHE_SF_TIMEOUT so concurrent identical requests collapse into one upstream run. ResearchStage inside each round batch-MGets the sub-query tier and writes back on success; the per-URL extract tier short-circuits the fetch and Trafilatura call on hit but still re-scores the cached text through the firewall so a model update defends against stale injection classifications. Envelopes carry cache_hit, cache_hit_type in {full, subquery, extraction, fact, miss}, and cache_keys_hit[]; /v1/research emits a cache_hit SSE event when the full-tier cache short-circuits the loop before round 1. Billing is unchanged: a full-tier hit still writes 1.0 / 0.5 / 3.0 / 1.5 / 8.0 / 4.0 credits at the normal rate (customer value is the answer, not the path).

Cross-instance DMCA invalidation inside one second

Every StoreFull also adds the FullKey to a byurl:{sha256} and a bydomain:{sha256} Redis SET for each result URL and domain the envelope referenced, inside a single Redis pipeline. invalidator.go opens a dedicated pgx.Conn, subscribes to deeptap_cache (single-key eviction) and deeptap_dmca_suppress (URL + domain fan-out), and on a suppress NOTIFY looks up the reverse-index sets and deletes every cache key named there on the local Redis. Every instance of DeepTap runs the same listener, so a single pg_notify publishes to the fleet and suppression lands in under a second end-to-end (tracked as deeptap_cache_invalidation_latency_seconds). The listener is wrapped in a panic-recover so a malformed payload cannot take down the subscription.

Negative-cache tombstones against adversarial pages

When any per-result prompt_injection_score is at or above UnsafeScoreThreshold, the real envelope is NOT stored. A 60-second tombstone lands at the same FullKey carrying an empty-results envelope with cache_hit=true, cache_hit_type="full", and the surfaced unsafe_reasons[]. Hot retries against genuinely adversarial pages hit the tombstone and return immediately instead of re-driving the pipeline. The tombstone expires naturally after 60 seconds; deeptap_cache_evictions_total tracks normal evictions but not tombstone expiries.

Per-API-key GCRA rate limiting with atomic Redis Lua enforcement

S12 wires a Generic Cell Rate Algorithm limiter at internal/ratelimit/ backed by a 40-line atomic Lua script embedded via go:embed. The script reads the bucket's Theoretical Arrival Time (TAT), advances it by rate_period_ms = 1000 / tier.CPS, compares against new_tat - burst * rate_period_ms, and either admits with the new TAT SET-ed under a burst * rate_period_ms * 2 TTL or denies with retry_after_ms and reset_ms. The compare-and-swap happens inside a single Redis script execution, so a 100-goroutine-vs-50-burst test admits exactly 50 and denies exactly 50. Script.Load runs at boot with a WARN-log-on-fail fallback (NOSCRIPT recovery inside Allow re-loads lazily); each Allow runs under a 50 ms per-op deadline with a NOSCRIPT retry once. The middleware is mounted AFTER APIKey and BEFORE the cache layer so a cache hit still counts against the budget. On admission it stamps X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset (Unix seconds) on every response, and stashes the admission *Decision + resolved Tier on context so /v1/search, /v1/extract, and /v1/research envelopes stamp a rate_limit: {limit, remaining, reset_ms} block from the same values. On denial the middleware writes a 429 with an application/problem+json body: type=https://deeptap.ai/errors/rate_limited, retry_after_ms, limits.rate{limit, remaining, reset_ms}, and a Retry-After header (ceiling of retry_after_ms / 1000, floored at 1 second). Tier table from specs/PROJECT-CONTEXT.md: Free 10 cps / 20 burst, Starter 50 / 100, Growth 200 / 400, Scale 500 / 1000, Secure 500 / 1000, Enterprise 1000 / 2000. Per-API-key overrides live on api_keys.rate_limit_override: a positive override replaces CPS and sets Burst = 2 * CPS regardless of DEEPTAP_RATELIMIT_BURST_2X. Redis error fails OPEN: the middleware admits the request, records the fail-open in deeptap_ratelimit_redis_errors_total, and labels the deeptap_ratelimit_requests_total{tier, outcome} counter with redis_error or redis_timeout. 
Four env vars: DEEPTAP_RATELIMIT_ENABLED (default true), DEEPTAP_RATELIMIT_BURST_2X (default true), DEEPTAP_CONCURRENCY_ENABLED (default true), DEEPTAP_CONCURRENCY_DEPTH3_OVERRIDE (default 0).

Per-org concurrency caps with depth=3 bucket protecting OpenRouter quota

S12 also wires a separate per-organization in-flight counter at internal/middleware/concurrency.go. The Redis key is v1:conc:<org_id>:<bucket> with two buckets: depth3 for /v1/research (Free 2, Starter 5, Growth 10, Scale 25, Secure 25, Enterprise 50) and general for every other authenticated /v1/* endpoint (Free 5, Starter 20, Growth 50, Scale 100, Secure 100, Enterprise 200). A single depth=3 research request holds an HTTP connection for up to two minutes and issues multiple OpenRouter calls per round; a runaway agent firing 100 depth=3 requests in parallel can drain an OpenRouter credit pool in minutes. The depth3 bucket stops that before it starts. The middleware INCRs the counter on entry, compares against the tier limit (or DEEPTAP_CONCURRENCY_DEPTH3_OVERRIDE for depth=3 when that env var is positive), and atomically DECRs + returns 503 with type=https://deeptap.ai/errors/concurrency_exhausted + limits.concurrency{bucket, limit, in_flight} on saturation. The release runs inside a defer rdb.Decr(context.Background(), key) so a handler panic still returns the slot: Go's defer semantics run the block during stack unwind, and the Recoverer middleware above the chain catches the panic AFTER the DECR has already fired. A test mounts Concurrency downstream of Recoverer + a panicking handler, runs 50 concurrent requests, and asserts the Redis counter returns to zero. Redis error on INCR fails OPEN. The operational override lets on-call engineers tighten depth=3 during an incident without a code deploy: DEEPTAP_CONCURRENCY_DEPTH3_OVERRIDE=3 drops every tier's depth=3 bucket to 3 until the override is cleared.

Compounding fact cache with decay models

Every extracted page feeds a Kafka-driven fact-extraction worker that produces structured claims (subject, predicate, object) with entity linking to Wikidata QIDs, NLI grounding via MiniCheck, and a per-claim confidence score. Facts are classified into five decay buckets: permanent (zero decay), slow_decay (e.g., historical dates, decay rate 0.001), moderate_decay (e.g., executive biographies, decay rate 0.005), fast_decay (e.g., quarterly financials, decay rate 0.02), and volatile (e.g., stock prices, decay rate 0.1). Effective confidence is base_confidence * exp(-decay_rate * days_since_confirmed). Queries that match a high-confidence cached fact return in under 15 milliseconds for 0.1 credits. This is not response caching; it is a structured, domain-level knowledge base that compounds across every customer's query volume.

Consensus-based trust scoring

Every fetched page is attributed to a domain. Every fact carries evidence from specific source pages with stance markers (supports, contradicts, neutral) and an NLI score. A nightly River batch job computes consensus_ratio = facts_confirmed_by_others / (facts_confirmed_by_others + facts_contradicted_by_others) per domain, assigns a trust tier (authoritative, reliable, mixed, unreliable, adversarial, unknown), and flags suspicious Autonomous System Number (ASN) clusters where three or more newly registered domains publish coordinated content from the same hosting provider. Reranking boosts authoritative domains and penalizes adversarial ones. Unknown domains receive no boost or penalty so the system does not punish legitimate new sites.

Client domain indexing (private docs searchable next to the public web)

Customers upload PDFs, Microsoft Word documents, Markdown, HTML, Comma Separated Values files, and JavaScript Object Notation payloads through a drag-and-drop uploader, an Amazon Simple Storage Service sync connector, a Google Cloud Storage sync connector, or a direct API push. Documents are parsed by a separate document-parser sidecar (avoiding the Affero General Public License on PyMuPDF by using pypdfium2 plus pdfplumber for PDFs and python-docx plus mammoth for Word files), chunked, embedded via bge-small-en-v1.5, and stored with row-level security enforced by set_config('app.current_tenant', $1, true). Client-domain results appear inline with public-web results labeled [PRIVATE]. Customer A's documents are invisible to Customer B at the database tier.

Prompt-injection firewall with transparent scoring

Layer 1 (Go, pre-extraction, at internal/firewall/strip.go + patterns.go): strip 13 documented patterns before Trafilatura sees the HTML: unicode_tag_chars (U+E0000 through U+E007F), zero_width, bidi_override, html_comment, css_display_none, css_visibility_hidden, css_font_size_zero, css_opacity_zero, css_text_indent_offscreen, css_position_offscreen, meta_injection, aria_hidden (gated by DEEPTAP_STRIP_ARIA_HIDDEN), and script_style. Layer 2 (Python sidecar, post-extraction, via the ScoreInjection gRPC RPC at internal/firewall/sidecar_scorer.go): score every extracted document with meta-llama/Prompt-Guard-2-86M and return a numeric score plus heuristic reasons; a NoopScorer bootstrap fallback at noop_scorer.go returns zero so operators can ship without standing up the sidecar. Layer 3 (response mutator at internal/firewall/safe_mode.go): when the request carries safe_mode: "agent", null untrusted_content on every result at the HTTP boundary and stamp safe_mode_applied: "agent" on the envelope. The envelope surfaces prompt_injection_score, unsafe_reasons[], sanitized_content_bytes, untrusted_content_bytes, and a firewall block carrying layer1_stripped_bytes, layer1_patterns_matched[], layer2_model, layer2_implementation, layer2_latency_ms, layer2_docs_scored. Every per-result object carries sanitized_content, untrusted_content, trusted_snippet, prompt_injection_score, and unsafe_reasons. Security teams can audit what was flagged, which patterns fired, which model ran, how long it took, and what it scored. The research is explicit that detectors are signals, not gates (Nasr et al., October 2025, joint OpenAI/Anthropic/Google study: every published detector bypassed above 90% under adaptive attack); the security guarantee DeepTap ships is transparency. Tavily's firewall returns none of these fields; DeepTap tells you what it saw so your agent can decide what to trust.

Agent micropayments via x402 and MPP

No account required. An agent sends a request without authentication, receives a 402 Payment Required response with a payment challenge, settles a United States Dollar Coin (USDC) transaction on Base mainnet at address 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913 (or Base Sepolia for testing at 0x036CbD53842c5426634e7929541eC2318f3dCF7e), and retries the request with the payment signature in the X-PAYMENT-SIGNATURE header. The Coinbase CDP facilitator verifies and settles. Merchant Payments Protocol (MPP) layers on top: its charge scheme is wire-compatible with x402 exact, and a Go server using coinbase/x402/go automatically accepts MPP charge traffic through the Authorization: Payment header. MPP sessions add DPoP-bound session credentials for streaming micropayments in long agent loops.

Protocol-native: MCP plus A2A

DeepTap ships deeptap-mcp, a binary that serves MCP over stdio for Claude Desktop, Claude Code, and Cursor, and over Streamable HyperText Transfer Protocol (HTTP) for remote authenticated agents. Three tools are exposed: deeptap_search, deeptap_extract, deeptap_facts. The deeptap_extract tool carries the same attestation gate as the REST handler: mode=full requires an attestation and is rejected at the MCP layer before any REST call runs. The binary accepts --transport stdio|http; the HTTP transport mounts at /mcp + /mcp/ with /healthz served on a separate path so probes do not hit the authenticated route, and requires an Mcp-Api-Key header enforced by APIKeyMiddleware. Server-Sent Events transport is deprecated in MCP specification 2025-11-25 and is rejected explicitly. Copy-paste configuration snippets live in docs/integrations/claude-desktop.md, docs/integrations/claude-code.md, and docs/integrations/cursor.md. On the A2A side, DeepTap uses a2a-go/v2 v2.0.1 against the v1.0 specification (breaking from v0.3: new .well-known/agent-card.json path, new TASK_STATE_* enums, new google.rpc.Status error shape) to publish an agent card and serve task create, task status, and task stream endpoints.

Tavily compatibility shim

cmd/deeptap-tavily-shim/ listens on port 8082 and accepts Tavily-wire POST /search, POST /extract, and POST /map. A Tavily customer points their SDK's base URL at tavily.deeptap.ai (or the locally-deployed shim) and receives responses projected back into Tavily's exact field names and units: response_time as a decimal number, follow_up_questions always null, images passthrough when requested, and usage.credits on every response. Every field-level translation lands in an X-DeepTap-Compat-Notes response header so integration teams can audit exactly what was adapted. Callers that send X-DeepTap-Compat-Mode: strict receive 409 Conflict with a JSON body explaining which notes would have fired instead of a silently-translated 200 response. An embedded 255-entry country lookup resolves Tavily's country field for 195 ISO-3166 alpha-2 codes plus 60 common aliases (usa, uk). The shim calls the DeepTap REST API at DEEPTAP_INTERNAL_URL with DEEPTAP_SHIM_KEY as its bearer so the shim is accountable separately in usage_ledger.

POST /v1/map URL discovery

POST /v1/map discovers every URL on a starting domain. A two-phase orchestrator at internal/mapsvc/ runs sitemap discovery first (reading robots.txt Sitemap: directives, then root sitemap.xml) and escalates to a bounded HTML crawl when sitemap yield is below DEEPTAP_MAP_HTML_MIN_YIELD=10. The response returns results[] plus a sources breakdown (robots_sitemap_urls, sitemap_urls, html_crawl_urls, pages_fetched) plus dropped plus truncated plus credits_used, so callers see exactly how each URL landed in the response. Configurable max_depth, max_breadth, limit, allow_external, include_subdomains, and exclude[] parameters bound the crawl cost precisely. Credit pricing matches search: 1.0 safe, 0.5 fast, one usage_ledger row per request.

Hybrid local/cloud (enterprise local-node deployment)

The local-node release is a distroless Go binary plus Postgres 17 with pgvector 0.8.2 plus a lightweight Python embedding sidecar, packaged as Docker Compose and Helm. Mutual TLS with a per-customer self-signed Certificate Authority is provisioned during setup. Public facts synchronize from cloud to local node via HTTP with Hash-based Message Authentication Code-signed cursors and idempotency keys (around 10,000 facts per day). Customer embeddings and fact metadata synchronize local to cloud, opt-in and defaulted off for European Union customers. Dual-key Ed25519 envelope signing binds requests to a customer-specific key. Offline mode keeps local fact lookup, pgvector semantic search, and embedding operational; public search and language-model calls fail with an RFC 7807 Problem+JavaScript Object Notation error that the agent can reason about. Telemetry is pushed over OpenTelemetry Protocol on HyperText Transfer Protocol Secure port 443 only, because enterprise networks will not open the OpenTelemetry default port.

Bootstrap-grade cost envelope

The managed-bootstrap deployment runs on Fly.io Machines with scale-to-zero, Neon Postgres with scale-to-zero, Upstash Redis pay-per-request, Upstash Kafka pay-per-message, ClickHouse Cloud serverless, Vercel for the dashboard, and Cloud Run for Trafilatura extraction. Inference is outsourced to Cohere Rerank and Voyage Embed behind the Go Reranker and Embedder interfaces. Cost at zero traffic is approximately 10 cents per month. Cost at 1,000 requests per day is approximately 38 US dollars per month. Gross margin from the first paying customer is 87% or higher. Transition to self-hosted sidecar happens when volume exceeds 50,000 requests per day, at which point self-hosting wins on unit economics.


Quick Start

DeepTap's build is complete: all 28 sessions have shipped. Phase 1 (Core Search Platform, S01-S18), Phase 2 (Fact Cache, S19-S22), and Phase 3 (Knowledge Layer, S23-S28) cover: Foundation; Data Layer + Append-Only Ledger; Search Provider Adapters; Fan-Out + Dedup + URL Normalization; Go-Owned Fetch + Trafilatura Sidecar; Playwright Pool + Domain Strategy Cache; Query Decomposition + LLM Policy; Reranking + Embeddings via Python Sidecar; Prompt Injection Firewall; Depth Modes + Facet Ledger; Caching + Freshness TTLs; Rate Limiting + Concurrency; MCP Server + Tavily Shim; A2A + Payment Middleware with Idempotency; TrustPlane Integration with Fallback; Billing Engine; Postmark Email + DMCA + Dashboard UI; Fact Data Model + Extraction; Staleness Model + Re-Verification; Fact Cache Query Integration; Fact Cache Analytics + Warming; Semantic Source Index + pgvector; Consensus Trust Scoring; Rapid Fact API; Client Domain Indexing; Local Node MVP; and Knowledge Layer Analytics. All twenty-eight session specs and the bootstrap-hosting plan are frozen.

# Clone and boot the full dev stack
git clone https://github.com/RelayOne/deeptap
cd deeptap
make dev                 # default: boots deeptap, nlp-sidecar, playwright-pool, postgres,
                         # redis, clickhouse, prometheus, grafana, otel-collector
                         # (River is the default event bus, no Kafka required)

make dev-kafka           # opt-in: same as make dev plus Redpanda (Kafka profile)
                         # sets DEEPTAP_EVENT_BUS=kafka

# Apply database migrations (requires golang-migrate installed via
# `go install -tags postgres github.com/golang-migrate/migrate/v4/cmd/migrate@latest`)
make migrate-up          # applies all 12 migrations (extensions, orgs, api_keys,
                         # append-only usage_ledger with 24 monthly partitions pre-created,
                         # facts, fact_evidence, source_pages, domain_profiles,
                         # client_domains with row-level security, dmca_requests,
                         # accounts + journal_entries with deferred balanced-sum trigger,
                         # seed chart-of-accounts)

# Verify health
curl http://localhost:8080/v1/health          # returns {"status": "ok"}
curl http://localhost:8080/v1/ready           # default: returns {"status": "ready"} when
                                              #   postgres + redis + river probes pass
                                              # kafka mode: also probes the Kafka broker
curl http://localhost:8080/metrics            # Prometheus exposition format

# Run your first search (depth=1, served by the Brave adapter)
curl -X POST http://localhost:8080/v1/search \
  -H "Authorization: Bearer dt_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{"query": "Tavily acquisition", "depth": 1}'

Authentication modes at a glance (all five are mounted on the same /v1/* surface; the middleware stack branches on the header type):

  • Authorization: Bearer dt_live_xxx for API-key customers billed through Stripe
  • X-PAYMENT-SIGNATURE: <base64 EIP-3009 payload> for x402 agent micropayments
  • Authorization: Payment <MPP charge token> for MPP charge traffic, backward-compatible with x402
  • Authorization: Payment <DPoP-bound session token> for MPP streaming sessions
  • X-TrustPlane-Credential: <SPIFFE SVID> for portfolio and enterprise TrustPlane verification

Architecture Overview

                ┌──────────────────────────────────────────┐
                │                 CLIENTS                  │
                │  SDK (TS/Py/Go) · MCP · A2A · x402/MPP   │
                └─────────────────────┬────────────────────┘
                                      │
                ┌─────────────────────▼────────────────────┐
                │           GO API SERVER (chi)            │
                │  auth · rate limit · firewall · billing  │
                │  depth orchestration · caching · ledger  │
                └───┬────────┬─────────┬──────────┬────────┘
                    │        │         │          │
          ┌─────────▼──┐ ┌───▼────┐ ┌──▼───────┐ ┌▼────────────┐
          │  Brave API │ │ Serper │ │OpenRouter│ │ Python NLP  │
          │  (safe)    │ │ (fast) │ │  (LLM)   │ │ Sidecar     │
          └────────────┘ └────────┘ └──────────┘ │ gRPC:50051  │
                                                 │ 7 RPCs      │
                                                 └─────────────┘
                │            │            │
          ┌─────▼────────────▼────────────▼──────────┐
          │                 POSTGRES                 │
          │  usage_ledger · facts · source_pages     │
          │  domain_profiles · api_keys · orgs       │
          │  pgvector (384d halfvec HNSW)            │
          └─────┬────────────┬────────────┬──────────┘
                │            │            │
          ┌─────▼──────┐ ┌───▼──────┐ ┌───▼──────────┐
          │   Redis    │ │  Kafka   │ │  ClickHouse  │
          │ cache+rate │ │  events  │ │  analytics   │
          └────────────┘ └──────────┘ └──────────────┘

The Go API server is a pure Go binary: no C Foreign Function Interface (CGO), no embedded models, no native libraries. All machine-learning inference routes through the Python NLP sidecar via gRPC on localhost port 50051. The sidecar is a single Python 3.11 process that exposes seven remote procedure calls: Parse (Trafilatura 2.0+), BatchParse, Rerank, Embed, VerifyClaim, LinkEntities (server-streamed), and ScoreInjection. The ThreadPoolExecutor-backed gRPC server achieves real parallelism because ONNX Runtime and PyTorch both release the Global Interpreter Lock in their C++ kernels. The minimum sidecar instance size is c7i.2xlarge (16 GiB of RAM).

Postgres 17 with pgvector 0.8.2 is the canonical data store and the authoritative billing source. The usage_ledger table is append-only, partitioned by month, and protected by triggers that reject UPDATE and DELETE. ClickHouse handles analytics materialized views, never billing. Redis 7 serves the hot cache, the Generic Cell Rate Algorithm (GCRA) rate limiter via a Lua script, cross-instance cache invalidation pub/sub, and idempotency locks. Redpanda (Kafka-compatible) carries usage events in CloudEvents format with idempotent producer semantics. River (Postgres-backed job queue) handles scheduled jobs: fact re-verification, trust-score batches, Wikidata incremental sync, domain-profile aggregation.

For the full technical deep-dive including middleware ordering, data models, protocol details, and every non-obvious decision with its rationale, see docs/ARCHITECTURE.md.


Pricing

Credits per call

| Operation               | Safe (Brave) | Fast (Serper) |
|-------------------------|--------------|---------------|
| depth=1 search          | 1 credit     | 0.5 credit    |
| depth=2 search          | 3 credits    | 1.5 credits   |
| depth=3 search          | 8 credits    | 4 credits     |
| fact lookup             | 0.1 credit   | 0.1 credit    |
| extract excerpt         | 0.5 credit   | 0.5 credit    |
| extract full (attested) | 2 credits    | 2 credits     |
| map                     | 1 credit     | 0.5 credit    |

Subscription tiers

| Tier       | Monthly     | Credits | Overage / credit | Calls per second | Concurrent |
|------------|-------------|---------|------------------|------------------|------------|
| Free       | $0          | 500     | n/a              | 10               | 5          |
| Starter    | $30         | 4,000   | $0.010           | 50               | 20         |
| Growth     | $200        | 30,000  | $0.008           | 200              | 50         |
| Scale      | $1,000      | 200,000 | $0.006           | 500              | 100        |
| Secure     | from $2,500 | custom  | $0.020           | custom           | custom     |
| Enterprise | custom      | custom  | custom           | custom           | custom     |

Agent micropayment pricing (no subscription)

| Operation    | x402 price |
|--------------|------------|
| depth=1 safe | $0.008     |
| depth=1 fast | $0.002     |
| depth=2 safe | $0.024     |
| depth=3 safe | $0.064     |
| fact lookup  | $0.001     |

The Secure tier adds Bring Your Own Key language-model routing, region pinning, fact-cache opt-out (reads and writes), stricter Zero Data Retention defaults, and a dedicated Customer Success contact. Portfolio tier uses a 40%-of-retail wholesale rate card with monthly invoicing to the internal double-entry ledger; portfolio companies that resell DeepTap handle their own end-customer billing off-platform.


Project Status

DeepTap is Building. All 28 sessions are complete on their build branches: Foundation, Data Layer + Append-Only Ledger, Search Provider Adapters, Fan-Out + Dedup + URL Normalization, Go-Owned Fetch + Trafilatura Sidecar, Playwright Pool + Domain Strategy Cache, Query Decomposition + LLM Policy, Reranking + Embeddings via Python Sidecar, Prompt Injection Firewall, Depth Modes + Facet Ledger, Caching + Freshness TTLs, Rate Limiting + Concurrency, MCP Server + Tavily Shim, A2A + Payment Middleware with Idempotency, TrustPlane Integration with Fallback, Billing Engine, Postmark Email + DMCA Compliance + Dashboard UI, Fact Data Model + Extraction Pipeline, Staleness Model + Re-Verification, Fact Cache Query Integration, Fact Cache Analytics + Warming, Semantic Source Index + pgvector, Consensus Trust Scoring, Rapid Fact API, Client Domain Indexing, Local Node MVP, and Knowledge Layer Analytics. All three product phases (Phase 1 Core Search Platform, Phase 2 Fact Cache, Phase 3 Knowledge Layer) are now complete. Twenty-eight session specs are written and frozen. Thirty-two research artifacts have been completed and integrated. Two adversarial review passes (one technical, one security and legal) have been incorporated. River is the default event bus per specs/ADDENDUM-river-default.md; Kafka is an opt-in implementation of the same EventBus interface selected by DEEPTAP_EVENT_BUS=kafka. The bootstrap-hosting plan is frozen.

Done

  • Scoping across the full 18-to-20-month roadmap
  • Master specification at /specs/WORK.md (1,257 lines)
  • Frozen Statement of Work at /specs/deeptap-sow-combined.md (3,715 lines)
  • Bootstrap-hosting plan at /specs/deeptap-bootstrap-hosting.md
  • Thirty-two research artifacts in /specs/research/raw/
  • All key technology versions pinned (Go 1.25+, pgvector 0.8.2, chi v5.2.5, pgx v5.9.1, stripe-go v85, grpc-go v1.80.0 with CVE-2026-33186 patched)

Done (Session 1: Foundation)

  • Monorepo structure, Go workspace (go.work)
  • chi router v5.2.5 with verified 7-layer middleware stack (RequestID, OTel, slog, Recoverer, Prometheus, CORS, Auth)
  • Health endpoints (/v1/health, /v1/ready, /metrics)
  • Prometheus metrics with route-pattern labels (deeptap_http_requests_total, _request_duration_seconds, _requests_inflight, _panics_total)
  • OTel tracing with OTLP or stdout exporter
  • Structured logging via log/slog
  • 7-RPC NLP sidecar gRPC proto + Python stub server with Health service reporting SERVING
  • Docker Compose dev stack
  • GitHub Actions CI (lint, test-go with 80% coverage gate, test-sidecar, build, docker-smoke)
  • Four Go binaries compile (deeptap, deeptap-mcp, deeptap-cli, deeptap-tavily-shim)
  • Coverage above 80% on every foundation package (config 100%, health 100%, logging 100%, version 100%, middleware 98.9%, server 93.8%, tracing 90%)
  • End-to-end verified: compose stack boots, curl /v1/health returns 200, /metrics exposes deeptap_http_requests_total, sidecar Health RPC returns SERVING

Done (Session 2: Data Layer + Append-Only Ledger)

  • 12 golang-migrate migrations: extensions (vector 0.8.2, pgcrypto, btree_gin), organizations, api_keys, usage_ledger, facts, fact_evidence, source_pages, domain_profiles, client_domains, dmca_requests, accounts and journal_entries, seed chart-of-accounts
  • Append-only usage_ledger partitioned by month with 24 monthly partitions pre-created; BEFORE UPDATE and BEFORE DELETE triggers reject modification at the database tier
  • Double-entry accounts + journal_entries tables with a DEFERRABLE INITIALLY DEFERRED CONSTRAINT TRIGGER that enforces SUM(amount_cents) = 0 per txn_id at commit time
  • sqlc v1.30.0 generates typed Go code in internal/db/deeptapdb/ for every .sql query file; pgx v5 driver package
  • pgxpool wrapper (internal/db/pgx.go) with MaxConns = max(4, runtime.NumCPU()), MinConns=2, 30s HealthCheckPeriod, 1h MaxConnLifetime, pgvector type registration on AfterConnect, and statement-mode switch (cache_statement by default, cache_describe when DEEPTAP_POSTGRES_PGBOUNCER=true)
  • go-redis v9.18.0 client with 10 pool size, 2 min idle connections, 5s dial timeout, 500ms read/write timeouts, and a Ping(ctx) probe
  • internal/eventbus/ package with a single Publisher/Subscriber interface implemented by RiverBus (default, Postgres-backed, supports transactional PublishTx) and KafkaBus (opt-in, franz-go v1.20.7 + outbox)
  • LISTEN/NOTIFY cache-invalidation listener on a dedicated pgx.Conn (not the pool) with exponential-backoff reconnect on the deeptap_cache channel
  • /v1/ready runs Postgres, Redis, and the active event-bus probe in parallel via errgroup.WithContext bounded to a 3-second timeout; Kafka probe is gated behind DEEPTAP_EVENT_BUS=kafka
  • Docker Compose default profile drops Redpanda; make dev boots 9 services (deeptap, nlp-sidecar, playwright-pool, postgres pgvector/pgvector:0.8.2-pg17, redis 7-alpine, clickhouse, prometheus, grafana, otel-collector). make dev-kafka activates the kafka compose profile and adds Redpanda as a tenth service
  • 12 integration tests across test/integration/{db,redis,eventbus}_test.go using testcontainers-go against pgvector/pgvector:0.8.2-pg17, redis:7-alpine, and (for the Kafka profile) redpandadata/redpanda:latest; covers migrations apply, pgvector 0.8.2 present, append-only UPDATE/DELETE rejected, accounts seeded, journal balanced-sum trigger, Redis Ping/SetGet/PubSub, River Publish enqueues a job, River PublishTx rollback removes it, PublishTx commit persists it, empty event type rejected
  • End-to-end verification: make migrate-up applies all 12 migrations cleanly; /v1/health returns 200; /v1/ready returns 200 with postgres + redis + river probes green; DEEPTAP_EVENT_BUS=kafka boot path verified against the kafka profile

Done (Session 3: Search Provider Adapters)

  • SearchProvider interface plus a typed Registry at internal/search/ that picks the safe adapter (Brave) for provider_class=safe and the fast adapter (Serper) for provider_class=fast; returns ErrProviderUnavailable when an adapter is not configured
  • Brave adapter hitting https://api.search.brave.com/res/v1/web/search with X-Subscription-Token, Accept: application/json, Cache-Control: no-cache; gobreaker/v2 circuit breaker trips on 5 consecutive failures; 2 retries on 429/5xx with full-jitter backoff and Retry-After respected
  • Serper adapter hitting https://google.serper.dev/search via POST with X-API-KEY; gobreaker/v2 trips on 6 consecutive failures (distinct threshold from Brave); same retry policy
  • URL Normalize plus Dedupe at internal/search/normalize.go: lowercase scheme/host, strip default ports, collapse repeated slashes, alpha-sort query params, strip fragment, drop tracking params (utm_*, gclid, fbclid, mc_cid, mc_eid, msclkid, ref, ref_src), prefer https when an http/https pair collapses to the same (host, path, query)
  • API-key middleware at internal/middleware/apikey.go reads Authorization: Bearer dt_live_*, SHA-256 hashes the full token, resolves the row via sqlc GetAPIKeyWithOrg, and attaches {orgID, apiKeyID, providerClassFromKey, providerClassFromOrg, providerAckAt} to the request context
  • POST /v1/search handler with depth=1 only: validates body (query 1..1024 chars, depth == 1, otherwise 400 unsupported_depth), resolves provider_class per body -> key -> org, returns 403 fast_provider_not_acknowledged when fast is requested and organizations.provider_ack_at IS NULL, returns 503 provider_unavailable when the selected adapter has no configured API key
  • Credit pricing in the handler: depth=1 safe = 1.0 credit, depth=1 fast = 0.5 credit; both write an append-only row to usage_ledger via ledger.Append (unique request_id enforces idempotency at the database tier)
  • Integration tests at test/integration/search_handler_test.go covering the four business paths (safe happy path with Brave mock, fast-without-ack 403, fast-with-ack Serper at 0.5 credits with usage_ledger row asserted, missing auth 401); 14 unit-test packages green; live compose smoke verified for missing auth, unsupported depth, and missing Brave key paths per audit/s03-e2e-verification-2026-04-22.md
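
The normalization rules above can be sketched in a few lines of Go. This is a reduced illustration; the real internal/search/normalize.go also collapses repeated slashes and prefers https when an http/https pair collapses to the same key:

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// trackingParams lists exact-match query params to drop; utm_* is
// handled by prefix below.
var trackingParams = map[string]bool{
	"gclid": true, "fbclid": true, "mc_cid": true, "mc_eid": true,
	"msclkid": true, "ref": true, "ref_src": true,
}

// normalizeURL lowercases scheme and host, strips default ports and
// fragments, drops tracking params, and alpha-sorts the survivors.
func normalizeURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	u.Scheme = strings.ToLower(u.Scheme)
	u.Host = strings.ToLower(u.Host)
	if u.Scheme == "http" {
		u.Host = strings.TrimSuffix(u.Host, ":80")
	}
	if u.Scheme == "https" {
		u.Host = strings.TrimSuffix(u.Host, ":443")
	}
	u.Fragment = ""
	q := u.Query()
	keys := make([]string, 0, len(q))
	for k := range q {
		if trackingParams[k] || strings.HasPrefix(k, "utm_") {
			continue // tracking param: drop from the dedup key
		}
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := url.Values{}
	for _, k := range keys {
		out[k] = q[k]
	}
	u.RawQuery = out.Encode() // Encode emits keys in sorted order
	return u.String(), nil
}

func main() {
	n, _ := normalizeURL("HTTPS://Example.COM:443/a?utm_source=x&b=2&a=1#top")
	fmt.Println(n) // https://example.com/a?a=1&b=2
}
```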

Done (Session 4: Fan-Out + Dedup + URL Normalization)

  • POST /v1/search accepts a sub_queries array (up to DEEPTAP_FANOUT_MAX_SUBS, default 8) alongside the primary query; empty or missing falls back to single-query behaviour
  • search.Fanout runs sub-queries in parallel against the resolved provider adapter with a semaphore-bounded concurrency limit (DEEPTAP_FANOUT_MAX_INFLIGHT, default 4), a per-call timeout (DEEPTAP_FANOUT_PER_CALL_TIMEOUT, default 5s), and an overall request deadline (DEEPTAP_FANOUT_OVERALL_TIMEOUT, default 10s)
  • search.Dedupe merges results across sub-queries using first-seen title and snippet, maximum observed score, and the existing URL Normalize collapse (lowercase scheme/host, tracking-param strip, http/https collapse)
  • One usage_ledger row per user request regardless of sub-query count; idempotency preserved via the existing request_id unique constraint
  • Partial failures (some sub-queries error or time out, at least one succeeds) surface as warnings[] in the 200 response with the failing sub-query index and error class; full failure returns 504 upstream_timeout when the request deadline is exceeded and 502 upstream_error when every sub-query fails with a non-timeout error
  • Integration tests at test/integration/search_fanout_test.go cover parallel fan-out happy path, dedup across overlapping sub-queries, per-call timeout surfacing as warning, overall-deadline 504, all-fail 502; unit tests across internal/search/ packages green

Done (Session 5: Go-Owned Fetch + Trafilatura Sidecar)

  • Go-side fetch client at internal/fetch/ with User-Agent DeepTapBot/1.0 (+https://deeptap.ai/bot), redirect cap 5, size cap 10 MiB, and a content-type gate that admits text/*, application/xhtml+xml, and the application/*xml family
  • In-process sync.Map robots.txt cache with 1-hour positive TTL and 5-minute negative TTL; per-domain probe order walks AI-specific tokens first (DeepTapBot, GPTBot, ClaudeBot, Claude-SearchBot, anthropic-ai) before falling back to the * user-agent rules
  • Extractor interface at internal/extract/ with two implementations: SidecarExtractor (gRPC to the Python Trafilatura sidecar at DEEPTAP_SIDECAR_ADDR) and CloudRunExtractor (HTTPS to a Cloud Run Trafilatura function) behind a factory keyed on DEEPTAP_EXTRACTOR_BACKEND (sidecar, cloudrun, or auto)
  • POST /v1/extract handler fans out per URL with robots -> fetch -> extract -> optional source_pages upsert, bounded by DEEPTAP_EXTRACT_MAX_INFLIGHT (default 4), DEEPTAP_EXTRACT_OVERALL_TIMEOUT (default 20s), and DEEPTAP_EXTRACT_MAX_URLS (default 10); per-URL failures surface as warnings[]; all-robots-deny returns 422; all-timeout returns 504; X-DeepTap-Attestation header required for mode=full
  • Exactly one usage_ledger row per request regardless of URL count; pricing is 0.5 credit per URL for mode=excerpt and 2.0 credits per URL for mode=full; mode defaults to excerpt
  • Python sidecar now ships trafilatura==2.0.0 and the real Parse RPC is wired on services/nlp-sidecar/
  • Integration tests at test/integration/extract_handler_test.go cover excerpt happy path, attestation gate, robots-deny 422, timeout 504, per-URL warning surfacing, and ledger-row accounting across both extractor backends
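
The robots.txt probe order reduces to a first-match walk over user-agent groups. A toy sketch under stated assumptions: `groups` stands in for a parsed robots.txt keyed by user-agent token, and the boolean verdict per group is a simplification of real allow/disallow rule sets:

```go
package main

import "fmt"

// probeOrder lists the AI-specific tokens consulted before the
// wildcard group, in the order the fetch client walks them.
var probeOrder = []string{
	"DeepTapBot", "GPTBot", "ClaudeBot", "Claude-SearchBot", "anthropic-ai", "*",
}

// allowed returns the verdict of the first user-agent group present
// in the parsed robots.txt; absence of any group defaults to allow.
func allowed(groups map[string]bool) bool {
	for _, ua := range probeOrder {
		if verdict, ok := groups[ua]; ok {
			return verdict // first matching group wins
		}
	}
	return true // no applicable group: crawling is permitted
}

func main() {
	groups := map[string]bool{"*": false, "ClaudeBot": true}
	fmt.Println(allowed(groups)) // true: ClaudeBot group outranks *
}
```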

Done (Session 6: Playwright Pool + Domain Strategy Cache)

  • Node.js Playwright pool service at services/playwright-pool/ (Fastify + Chromium via mcr.microsoft.com/playwright:v1.49.0-jammy) exposing POST /render, GET /health, and GET /metrics; shared-secret auth via the X-Internal-Token header rejects unauthenticated callers before any browser work
  • Context pool with configurable size (default 4); overflow requests return 503 pool_exhausted instead of queueing unboundedly; per-render timeout and body-size caps match the Go fetch client
  • Per-domain strategy cache in Postgres (domain_strategies table) with a rolling empty-rate counter over the last DEEPTAP_STRATEGY_WINDOW samples (default 50); flip-on to Tier 2 at 50% empties with a minimum of 5 samples (DEEPTAP_STRATEGY_MIN_SAMPLES), flip-off back to Tier 1 at 20%; empty defined as extracted content shorter than DEEPTAP_STRATEGY_EMPTY_FLOOR_CHARS (default 200)
  • /v1/extract Tier 1/Tier 2 escalation wired in Go: Tier 1 runs the S05 fetch plus Trafilatura path (cheap); if the domain strategy says escalate (or Tier 1 returned an empty page), Tier 2 hits the Playwright pool for a rendered DOM and re-runs Trafilatura on the post-render HTML; successful Tier 2 extractions surface js_rendered in warnings[]; pool-down or pool-timeout surfaces playwright_unavailable and falls back to the Tier 1 result
  • Bootstrap mode (DEEPTAP_MODE=bootstrap) forces PlaywrightEnabled=false at config load; no Tier 2 ever runs in bootstrap, the domain strategy cache records samples but never escalates, and the Cloud Run extractor path handles every URL
  • New env vars: DEEPTAP_PLAYWRIGHT_POOL_URL, DEEPTAP_PLAYWRIGHT_SHARED_SECRET (REQUIRED in production), DEEPTAP_PLAYWRIGHT_TIMEOUT_MS, DEEPTAP_PLAYWRIGHT_ENABLED, DEEPTAP_STRATEGY_EMPTY_FLOOR_CHARS, DEEPTAP_STRATEGY_FLIP_ON, DEEPTAP_STRATEGY_FLIP_OFF, DEEPTAP_STRATEGY_MIN_SAMPLES, DEEPTAP_STRATEGY_WINDOW
  • Unit tests green across internal/extract/ escalation, internal/strategy/ flip thresholds, and the Node pool handlers; the full end-to-end integration test (TASK-13) requires a Playwright testcontainer and is deferred
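
The flip-on/flip-off hysteresis is simple enough to sketch directly. An illustrative reduction in Go using the default thresholds named above (window 50, flip-on at 50% with a 5-sample minimum, flip-off at 20%); the real cache persists this per domain in Postgres:

```go
package main

import "fmt"

// strategy tracks a rolling window of extraction samples for one
// domain; true means the extraction came back empty.
type strategy struct {
	window []bool
	size   int
	tier2  bool
}

// record appends a sample, trims the window, and applies the
// hysteresis: flip on to Tier 2 at >=50% empties (min 5 samples),
// flip off back to Tier 1 at <=20%.
func (s *strategy) record(empty bool) {
	s.window = append(s.window, empty)
	if len(s.window) > s.size {
		s.window = s.window[1:] // keep the rolling window bounded
	}
	empties := 0
	for _, e := range s.window {
		if e {
			empties++
		}
	}
	rate := float64(empties) / float64(len(s.window))
	switch {
	case !s.tier2 && len(s.window) >= 5 && rate >= 0.5:
		s.tier2 = true // escalate to rendered (Playwright) fetches
	case s.tier2 && rate <= 0.2:
		s.tier2 = false // cheap Tier 1 fetches are working again
	}
}

func main() {
	s := &strategy{size: 50}
	for i := 0; i < 5; i++ {
		s.record(true) // five empty extractions in a row
	}
	fmt.Println(s.tier2) // true
}
```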

Done (Session 7: Query Decomposition + LLM Policy)

  • Hand-written stdlib OpenRouter chat-completions client at internal/llm/openrouter/ with Authorization: Bearer, optional HTTP-Referer and X-Title attribution headers, pooled net/http.Transport, gobreaker/v2 circuit breaker (trips on 5 consecutive failures or 60% failure ratio over 20 requests), exponential retry with jitter on 408, 429, 500, 502, 503, 504 via APIError.Retryable(), and X-Generation-Id response-header propagation for audit correlation
  • Startup GET /api/v1/key ping wired in cmd/deeptap/main.go: 200 logs openrouter key verified and boots; 401 fails fast with openrouter key rejected (401); misconfiguration, refusing to boot; 402 logs a warning and continues (LLM-dependent paths return decomposition_failed until credits are topped up); 5xx or network error logs a warning and continues (the circuit breaker handles the next real call); missing DEEPTAP_OPENROUTER_API_KEY disables the decomposer and every /v1/search request runs the single-query path
  • Freshness classifier at internal/freshness/ producing four buckets (volatile, daily, standard, stable) via NFC normalization, 4-digit year extraction (historical vs. current/next-year signal), a volatile keyword union, structural patterns (price-or-score live, weather live, status live, breaking, is-still, future-event, current-role, stable-pattern), a daily keyword union, and a standard default; returns (Class, reason) where reason is a short machine-readable label
  • Per-organization LLM policy loader at internal/policy/ reading organizations.llm_policy JSONB into a typed LLMPolicy (require_zdr, require_data_collection_deny, providers[], models_allowed[], max_tokens, optional temperature); Secure-SKU clamp when organizations.tier == "secure" unconditionally sets require_zdr=true, require_data_collection_deny=true, and defaults providers to ["anthropic"] when empty; malformed JSONB does not fail open (surfaces policy_load_failed in warnings[] and continues with the permissive baseline)
  • OpenRouterDecomposer at internal/decompose/ with JSON-schema structured output ({subqueries: [{query, priority}]}), NFC-lowered dedup on trimmed query string, priority-descending stable sort, and truncation to DEEPTAP_DECOMPOSE_SUBQUERIES_D1 (default 2) for depth=1 or DEEPTAP_DECOMPOSE_SUBQUERIES_D23 (default 6) for depth=2/3
  • Model picker at internal/decompose/picker.go: depth=1 prefers DEEPTAP_DECOMPOSE_MODEL_DEPTH1 (default anthropic/claude-haiku-4.5); depth=2/3 prefers DEEPTAP_DECOMPOSE_MODEL_DEPTH23 (default anthropic/claude-sonnet-4.6); falls back to the first anthropic/* slug in policy.ModelsAllowed when the preferred slug is disallowed; returns ErrLLMPolicyViolation when no allowlisted anthropic model exists
  • Secure-SKU ZDR triple materialises in the OpenRouter request body as provider.zdr=true, provider.data_collection="deny", and provider.order=<allowlist> whenever the loaded policy requires any one of them; the three controls travel together, not separately
  • /v1/search handler at internal/api/search.go wires freshness classification (always runs, never fails), policy load (failure -> policy_load_failed in warnings, continue on zero-value policy), decomposition under a 10-second timeout (failure -> decomposition_failed in warnings, fall back to single-query path), and sub-query fan-out via search.Fanout; caller-supplied sub_queries always wins over decomposer output; depth=2 and depth=3 are rejected 400 unsupported_depth until DEEPTAP_ENABLE_DEPTH_GT1=true unlocks them in S10
  • Response envelope extended with freshness_class, freshness_reason, and nested decomposition {model, provider, sub_queries, tokens_prompt, tokens_completion, cost_usd, latency_ms, generation_id}; decomposition is omitted on the single-query path or when decomposition fails; depth-based credit multipliers (depth=2 = 3x, depth=3 = 7x) wired in the handler, gated until S10 activates deeper depths
  • Prometheus metrics: deeptap_decompose_requests_total{model, outcome} where outcome is one of ok, invalid_json, retry_succeeded, retry_failed, upstream_error, timeout, policy_violation; deeptap_decompose_duration_seconds{model} histogram; deeptap_decompose_subquery_count{depth} histogram; deeptap_decompose_tokens_total{model, kind} counter for prompt and completion tokens; deeptap_freshness_class_total{class} counter
  • OTEL span llm.openrouter.chat_completion with SpanKindClient and attributes llm.model, llm.provider, llm.prompt_tokens, llm.completion_tokens, llm.zdr, llm.latency_ms, llm.generation_id; errors recorded on the span via RecordError + SetStatus(codes.Error)
  • New env vars: DEEPTAP_OPENROUTER_API_KEY (required for decomposition), DEEPTAP_OPENROUTER_BASE_URL (default https://openrouter.ai/api/v1), DEEPTAP_OPENROUTER_REFERER + DEEPTAP_OPENROUTER_TITLE (optional attribution headers), DEEPTAP_OPENROUTER_TIMEOUT (default 30s), DEEPTAP_DECOMPOSE_MODEL_DEPTH1 (default anthropic/claude-haiku-4.5), DEEPTAP_DECOMPOSE_MODEL_DEPTH23 (default anthropic/claude-sonnet-4.6), DEEPTAP_DECOMPOSE_SUBQUERIES_D1 (default 2), DEEPTAP_DECOMPOSE_SUBQUERIES_D23 (default 6), DEEPTAP_ENABLE_DEPTH_GT1 (default false)
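
The four-bucket freshness classifier is, at its core, an ordered keyword walk. A toy version in Go: the bucket names match the list above, but these keyword unions are illustrative and the real classifier adds NFC normalization, year extraction, and the structural patterns:

```go
package main

import (
	"fmt"
	"strings"
)

// classifyFreshness returns one of the four buckets plus a short
// machine-readable reason, checking the most volatile signals first.
func classifyFreshness(query string) (class, reason string) {
	q := strings.ToLower(query)
	for _, kw := range []string{"price", "score", "stock", "weather", "breaking"} {
		if strings.Contains(q, kw) {
			return "volatile", "volatile_keyword"
		}
	}
	for _, kw := range []string{"today", "current", "latest"} {
		if strings.Contains(q, kw) {
			return "daily", "daily_keyword"
		}
	}
	for _, kw := range []string{"history of", "origin of", "definition of"} {
		if strings.Contains(q, kw) {
			return "stable", "stable_pattern"
		}
	}
	return "standard", "default"
}

func main() {
	c, r := classifyFreshness("bitcoin price")
	fmt.Println(c, r) // volatile volatile_keyword
}
```

Ordering matters: a query like "latest bitcoin price" must land in volatile, not daily, which is why the most perishable bucket is checked first.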

Done (Session 8: Reranking + Embeddings via Python Sidecar)

  • Python sidecar services/nlp-sidecar/rerank.py loads ms-marco-MiniLM-L6-v2 INT8 ONNX and serves the gRPC Rerank RPC as a cross-encoder that scores every (query, document) pair; services/nlp-sidecar/embed.py loads bge-small-en-v1.5 ONNX and serves the gRPC Embed RPC producing L2-normalized 384-dim float32 vectors; both are mounted on server.py, and a missing model at boot logs the absence and mounts a stub that returns UNIMPLEMENTED at RPC time rather than crashing the sidecar
  • Go internal/rerank/ package with a Reranker interface (Rerank, Healthz), SidecarReranker gRPC adapter against DEEPTAP_SIDECAR_ADDR, CohereReranker HTTPS POST /v2/rerank adapter against DEEPTAP_COHERE_API_KEY with one retry on 429 (honoring Retry-After) and a sync.Once bootstrap-warn on first use, and a mode-keyed factory that picks sidecar in production and Cohere in bootstrap; both adapters are wrapped in gobreaker/v2 with MaxRequests=3, Interval=60s, Timeout=30s, ReadyToTrip at 5 consecutive failures
  • Go internal/embed/ package with an Embedder interface (Embed, Healthz) exposing ModeQuery and ModePassage, SidecarEmbedder gRPC adapter, VoyageEmbedder HTTPS POST /v1/embeddings adapter against DEEPTAP_VOYAGE_API_KEY that sends output_dimension=384 on every request and errors when the response dimensionality is not exactly 384, and a mode-keyed factory
  • POST /v1/search rerank step between search.Dedupe merge and response write: caps at DEEPTAP_RERANK_MAX_DOCS (default 30), truncates snippets to DEEPTAP_RERANK_MAX_TEXT_CHARS (default 1024), runs under DEEPTAP_RERANK_TIMEOUT (default 1s), reorders results to the reranker's score order, stamps each surviving result with its rerank_score, and attaches a reranker {model, implementation, latency_ms, docs_scored, error} block to the response envelope; disable/nil/error skips silently and still returns 200
  • POST /v1/extract embed step after each successful per-URL extraction: runs under DEEPTAP_EMBED_TIMEOUT (default 500ms), writes the 384-dim pgvector into source_pages.embedding via the sqlc UpdateSourcePageEmbedding query, and is non-fatal on error
  • sqlc query UpdateSourcePageEmbedding plus a sqlc.yaml overrides entry mapping the vector column type to github.com/pgvector/pgvector-go.Vector so pgvector.NewVector([]float32{...}) round-trips cleanly through pgx
  • Prometheus metrics at internal/metrics/nlp.go: deeptap_rerank_requests_total{implementation, outcome}, deeptap_rerank_duration_seconds{implementation}, deeptap_rerank_docs_scored{implementation}, deeptap_rerank_failures_total{implementation, outcome}, and the matching deeptap_embed_{requests,duration,failures}_total family
  • /v1/ready probes add reranker.Healthz and embedder.Healthz under a 1-second timeout each in production mode; bootstrap mode relies on HTTPS reachability at call time instead
  • docker-compose pins nlp-sidecar to mem_limit: 1g and adds a grpc-health healthcheck; deeptap service now declares depends_on: nlp-sidecar: {condition: service_healthy} so the Go API will not start until the sidecar passes its healthcheck
  • New env vars: DEEPTAP_ENABLE_RERANK (default true), DEEPTAP_ENABLE_EMBED (default true), DEEPTAP_RERANK_TIMEOUT (default 1s), DEEPTAP_EMBED_TIMEOUT (default 500ms), DEEPTAP_RERANK_MAX_DOCS (default 30), DEEPTAP_RERANK_MAX_TEXT_CHARS (default 1024), DEEPTAP_COHERE_API_KEY (required in bootstrap mode when rerank is enabled), DEEPTAP_VOYAGE_API_KEY (required in bootstrap mode when embed is enabled); bootstrap-mode Load() rejects missing keys when the matching feature is enabled

Done (Session 9: Prompt Injection Firewall)

  • Go internal/firewall/ package with five load-bearing files: strip.go + patterns.go implement the Layer 1 pre-extraction HTML stripper matching 13 documented patterns (unicode_tag_chars, zero_width, bidi_override, html_comment, css_display_none, css_visibility_hidden, css_font_size_zero, css_opacity_zero, css_text_indent_offscreen, css_position_offscreen, meta_injection, aria_hidden gated by DEEPTAP_STRIP_ARIA_HIDDEN, script_style); scorer.go defines the Scorer interface; sidecar_scorer.go implements SidecarScorer against the ScoreInjection gRPC RPC under DEEPTAP_SCORE_INJECTION_TIMEOUT (default 500ms) with the text truncated to DEEPTAP_SCORE_INJECTION_MAX_CHARS (default 4096); noop_scorer.go implements the bootstrap-mode NoopScorer returning (0.0, nil); factory.go wires a mode-keyed factory picking SidecarScorer in production and NoopScorer in bootstrap; safe_mode.go implements the Layer 3 SafeModeOff | SafeModeAgent response mutator that nulls untrusted_content on every result at the HTTP boundary when agent mode is selected and stamps safe_mode_applied on the envelope
  • Python sidecar services/nlp-sidecar/score_injection.py loads meta-llama/Prompt-Guard-2-86M when the Meta-gated weights are present (baked into the image at build time via the HF_TOKEN Dockerfile build arg or downloaded locally via services/nlp-sidecar/scripts/download-models.sh), emits heuristic reasons alongside the score, and mounts a stub that returns UNIMPLEMENTED at RPC time when weights are absent so a missing model is not a boot crash
  • POST /v1/extract pipeline is now fetch -> Strip -> Extract -> Score -> persist; the sqlc UpdateSourcePageInjection query writes injection_score and the reasons list to source_pages on every extract (non-fatal on error, same pattern as UpdateSourcePageEmbedding)
  • Response envelope extended with prompt_injection_score (max across results), unsafe_reasons[] (deduped), sanitized_content_bytes, untrusted_content_bytes, a firewall block with layer1_stripped_bytes, layer1_patterns_matched[], layer2_model, layer2_implementation, layer2_latency_ms, layer2_docs_scored, and safe_mode_applied; every per-result object carries sanitized_content, untrusted_content, trusted_snippet, prompt_injection_score, and unsafe_reasons
  • Request body accepts safe_mode: "off" | "agent"; default pulled from DEEPTAP_SAFE_MODE_DEFAULT (ships at off); agent mode nulls untrusted_content on every result at the HTTP boundary
  • /v1/search firewall hook is a documented no-op until S10 wires extraction into depth=1 search; today provider snippets from Brave and Serper are not attacker-controlled through our pipeline so Layer 1 and Layer 2 would have nothing to do
  • Prometheus internal/metrics/firewall.go registers deeptap_firewall_l1_strips_total{pattern}, deeptap_firewall_l1_stripped_bytes, deeptap_firewall_l2_requests_total{implementation, outcome}, deeptap_firewall_l2_duration_seconds{implementation}, deeptap_firewall_l2_score_bucket{implementation}, and deeptap_firewall_unsafe_pages_total{provider_class} with the unsafe threshold pulled from DEEPTAP_UNSAFE_SCORE_THRESHOLD (default 0.7)
  • /v1/ready probes add Scorer.Healthz under a 1-second timeout in production mode
  • services/nlp-sidecar/Dockerfile accepts HF_TOKEN as a build argument so CI and local builds can fetch the Meta-gated Prompt-Guard-2-86M weights; services/nlp-sidecar/scripts/download-models.sh is the shared helper for local development
  • New env vars: DEEPTAP_ENABLE_FIREWALL_L1 (default true), DEEPTAP_ENABLE_FIREWALL_L2 (default true), DEEPTAP_SCORE_INJECTION_TIMEOUT (default 500ms), DEEPTAP_SCORE_INJECTION_MAX_CHARS (default 4096), DEEPTAP_STRIP_ARIA_HIDDEN (default false), DEEPTAP_SAFE_MODE_DEFAULT (default off), DEEPTAP_UNSAFE_SCORE_THRESHOLD (default 0.7)
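
The safe_mode behavior above can be sketched as a minimal Go model. The `Result` struct and `applySafeMode` name are illustrative, not the actual wire schema or handler code; the one behavior taken from the text is that agent mode nulls untrusted_content on every result at the HTTP boundary:

```go
package main

import "fmt"

// Result loosely mirrors the per-result envelope fields described above.
// Field names are illustrative, not the real wire schema.
type Result struct {
	TrustedSnippet       string
	SanitizedContent     string
	UntrustedContent     *string
	PromptInjectionScore float64
}

// applySafeMode sketches safe_mode: "agent": untrusted_content is nulled on
// every result at the HTTP boundary, regardless of injection score.
func applySafeMode(mode string, results []Result) []Result {
	if mode != "agent" {
		return results
	}
	for i := range results {
		results[i].UntrustedContent = nil
	}
	return results
}

func main() {
	raw := "IGNORE PREVIOUS INSTRUCTIONS"
	rs := applySafeMode("agent", []Result{{
		SanitizedContent:     "clean text",
		UntrustedContent:     &raw,
		PromptInjectionScore: 0.92,
	}})
	fmt.Println(rs[0].UntrustedContent == nil) // agent mode strips untrusted content
}
```

Because the default ships at off, only callers that opt in (or operators that flip DEEPTAP_SAFE_MODE_DEFAULT) pay the information loss.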

Done (Session 10: Depth Modes + Facet Ledger)

  • Go internal/depth/ package with three composers (depth1.go, depth2.go, depth3.go) built on one Composer struct defined in depth.go, plus shared primitives stage.go (RunStage = Fanout -> per-URL fetch + Layer 1 strip + Trafilatura extract + Layer 2 score -> Rerank), ledger.go (facet ledger with SeedFacets heuristic, coverage accumulation, marginal-lift saturation), and reflect.go (OpenRouterReflector against anthropic/claude-sonnet-4.6 with JSON-schema structured output, retry-once on malformed JSON, top-15 findings at 400 chars each capped at DEEPTAP_REFLECTION_INPUT_MAX_CHARS default 12000, Secure-SKU ZDR triple in the request body)
  • Composer.RunDepth1 runs one stage under DEEPTAP_DEPTH1_TIMEOUT (default 7s, SLO 7s p95); stamps depth=1, rounds_executed=1, stop_reason="ok"
  • Composer.RunDepth2 runs round 1, reflects, dedupes proposals against the ledger, runs round 2, merges with score-max dedup + re-rerank, under DEEPTAP_DEPTH2_TIMEOUT (default 18s, SLO 20s p95); stop reasons ok, llm_stop, reflector_error, saturation_deduped, timeout, depth1_fallback
  • Composer.RunDepth3 loops up to DEEPTAP_DEPTH3_MAX_ROUNDS (default 4) rounds under DEEPTAP_DEPTH3_TIMEOUT (default 110s, SLO 120s p95) with an SSE OnEvent callback; stops on hard timeout with 2s grace, Saturated(DEEPTAP_SATURATION_DELTA default 0.05, consecutive=2), LLM-stop advisory, max_rounds, or reflector_error in that precedence
  • Facet ledger SeedFacets heuristic: vs|versus -> comparative, history of|origin of|when was -> historical, what is|define|definition of -> definitional, current|latest|today|now -> current-status, no match -> general; per-facet coverage normalizes each result's rerank_score against DefaultFacetSaturation=5.0 and caps at 1.0; overall coverage is the unweighted mean; Contains, MarginalLift, Saturated, Export are the public readers
  • /v1/search dispatches depth=1 and depth=2 through the composer; depth=3 returns 400 use_research_endpoint pointing callers at /v1/research; envelope adds depth, rounds_executed, stop_reason, and (when include_ledger=true and IncludeLedgerAllowed) the ledger block; decomposition stays as-is from S07
  • POST /v1/research at internal/api/research.go is a text/event-stream endpoint with a single writer goroutine fed by a bounded channel plus a separate 15-second heartbeat goroutine (DEEPTAP_SSE_HEARTBEAT), emitting round_start, partial_results, facet_update, reflection, saturation, final, error, and ping events; a client disconnect cancels the orchestrator context; the usage_ledger row (8.0 credits safe / 4.0 fast) is written AFTER the final event
  • Decomposer schema extension: optional facets array per sub-query exported via SubQueryFacets map[string]string; the reflector and ledger share facet attribution through this field
  • Prometheus metrics at internal/metrics/depth.go: deeptap_depth_rounds_total{depth}, deeptap_depth_saturation_total{reason}, deeptap_depth_duration_seconds{depth} (SLO-aligned buckets at 7s / 20s / 120s), deeptap_reflection_requests_total{outcome}, deeptap_facet_coverage_average{depth}, deeptap_sse_events_total{event}
  • OTEL spans depth.round (per round), depth.reflect (per reflection), depth.rerank (per rerank pass) with attributes depth, round, sub_query_count, docs_scored, coverage, plan.new_sub_queries, plan.stop, model, latency_ms
  • DEEPTAP_ENABLE_DEPTH_GT1 default flipped to true; config.Load() derives DepthGT1Disabled=true at startup when DEEPTAP_MODE=bootstrap AND DEEPTAP_OPENROUTER_API_KEY=="" so depth>=2 falls back to depth=1 with the reflection_unavailable_bootstrap warning; this is NOT a startup error
  • New env vars: DEEPTAP_DEPTH_MAX_URLS_PER_ROUND (default 10), DEEPTAP_DEPTH_MAX_INFLIGHT_EXTRACT (4), DEEPTAP_DEPTH_MAX_FANOUT (4), DEEPTAP_DEPTH1_TIMEOUT (7s), DEEPTAP_DEPTH2_TIMEOUT (18s), DEEPTAP_DEPTH3_TIMEOUT (110s), DEEPTAP_DEPTH3_MAX_ROUNDS (4), DEEPTAP_SATURATION_DELTA (0.05), DEEPTAP_REFLECTION_INPUT_MAX_CHARS (12000), DEEPTAP_REFLECTION_TIMEOUT (10s), DEEPTAP_SSE_HEARTBEAT (15s)
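
The facet-ledger coverage arithmetic above (normalize accumulated rerank score against DefaultFacetSaturation=5.0, cap at 1.0, unweighted mean across facets) can be sketched as a small Go model. The `Ledger` type and method names here are illustrative, not the internal/depth API:

```go
package main

import "fmt"

const defaultFacetSaturation = 5.0

// Ledger is a toy facet ledger: it accumulates rerank scores per facet and
// reports normalized coverage. Names are illustrative.
type Ledger struct {
	scores map[string]float64
}

func NewLedger() *Ledger { return &Ledger{scores: map[string]float64{}} }

// Add accumulates one result's rerank score against a facet.
func (l *Ledger) Add(facet string, rerankScore float64) {
	l.scores[facet] += rerankScore
}

// FacetCoverage normalizes against the saturation constant and caps at 1.0.
func (l *Ledger) FacetCoverage(facet string) float64 {
	c := l.scores[facet] / defaultFacetSaturation
	if c > 1.0 {
		c = 1.0
	}
	return c
}

// Coverage is the unweighted mean across every facet seen so far.
func (l *Ledger) Coverage() float64 {
	if len(l.scores) == 0 {
		return 0
	}
	var sum float64
	for f := range l.scores {
		sum += l.FacetCoverage(f)
	}
	return sum / float64(len(l.scores))
}

func main() {
	l := NewLedger()
	l.Add("comparative", 3.0)
	l.Add("definitional", 6.0) // over-saturated facet, capped at 1.0
	fmt.Printf("%.2f\n", l.Coverage()) // (0.60 + 1.00) / 2 = 0.80
}
```

The saturation check then reduces to watching the delta of `Coverage()` between rounds: two consecutive rounds with a lift below DEEPTAP_SATURATION_DELTA stop the depth-3 loop.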

Done (Session 11: Caching + Freshness TTLs)

  • Go internal/cache/ package ships nine load-bearing files alongside the existing S02 redis.go and invalidation.go (eleven files total under internal/cache/): keys.go defines FullKey, SubQueryKey, ExtractKey, and FactKey with a v1: version prefix plus sha256 hashes; normalize.go runs NFC + lowercase + whitespace collapse + optional leading-article strip; manager.go is the tiered Manager with msgpack v5 encoding, byurl:{sha256} and bydomain:{sha256} reverse indices written through one Redis pipeline per store, and corrupt-key cleanup that deletes any payload that fails to decode; ttl.go exports DetermineTTL mapping volatile=5m, daily=1h, standard=4h, stable=24h with MinFullTTL=60s and MinExtractTTL=15m floors and a 1-hour sub-query cap; singleflight.go coalesces concurrent identical requests under DEEPTAP_CACHE_SF_TIMEOUT; invalidator.go opens a dedicated pgx.Conn, issues LISTEN deeptap_cache + LISTEN deeptap_dmca_suppress, decodes JSON payloads, and runs handlers under a panic-recover; fact_noop.go defines FactCache + NoopFactCache; tombstone.go writes the 60-second negative-cache tombstone on adversarial pages; metrics_sink.go emits the deeptap_cache_* Prometheus family
  • Depth orchestrator integration: depth1.go/depth2.go/depth3.go compute FullKey from (normalized query, depth, provider_class, country, language, safe_mode), call LookupFull at the top, call NoopFactCache.Lookup after full-miss, singleflight-wrap the pipeline on miss, and call StoreFull on success with the TTL derived from the freshness class; DepthResult now carries CacheHit, CacheHitType, CacheKeysHit
  • ResearchStage.RunStage does a per-sub-query batch MGet and only calls the provider for misses; writes back per success via StoreSubQuery; per-URL extract caching uses LookupExtract to skip fetch + Trafilatura on hit (the firewall still re-scores the cached text so a model update defends against stale injection classifications) and StoreExtract on miss-then-success; a successful extract cache hit emits the extract_cache_hit warning
  • /v1/extract handler integrates the same per-URL extract cache and emits the same warning; /v1/search envelope, /v1/research final event, and a new cache_hit SSE event all carry cache_hit, cache_hit_type in {full, subquery, extraction, fact, miss}, and cache_keys_hit[]
  • Negative-cache tombstone: when any per-result prompt_injection_score is at or above UnsafeScoreThreshold, the envelope is NOT stored; a 60-second tombstone at the same FullKey returns a shaped empty-results envelope with cache_hit=true, cache_hit_type="full", and unsafe_reasons populated so hot retries on adversarial pages never reach the upstream provider
  • Cross-instance DMCA invalidation: pg_notify('deeptap_dmca_suppress', '{"type":"suppress","url":"...","domain":"..."}') fans out to every DeepTap instance, looks up the byurl:{sha256} and bydomain:{sha256} reverse-index sets, and deletes every cache key that referenced the suppressed URL or domain within a 1-second SLO (tracked as deeptap_cache_invalidation_latency_seconds). Single-key eviction rides the deeptap_cache channel with {"type":"invalidate_key","key":"..."}
  • Production-mode and bootstrap-mode /v1/ready add a 100-millisecond Redis PING cache probe that returns 503 on failure
  • Prometheus internal/metrics/cache.go registers deeptap_cache_requests_total{tier, outcome} (tier: full|subquery|extract|fact; outcome: hit|miss|error|bypass), deeptap_cache_lookup_duration_seconds{tier}, deeptap_cache_store_bytes{tier}, deeptap_cache_evictions_total{reason} (reasons: dmca_url|dmca_domain|key|ttl|size_cap), deeptap_cache_singleflight_share_total, deeptap_cache_full_hit_latency_seconds, deeptap_cache_invalidation_latency_seconds
  • Billing unchanged: cache hits write the full usage_ledger credit cost (research artifact 25, Tavily parity) because the customer value is the answer, not the path
  • Redis 7 required for EXPIRE ... NX|XX|GT|LT option flags used by StoreFull
  • New env vars: DEEPTAP_CACHE_ENABLED (default true), DEEPTAP_CACHE_FULL_TIER (true), DEEPTAP_CACHE_SUBQUERY_TIER (true), DEEPTAP_CACHE_EXTRACT_TIER (true), DEEPTAP_CACHE_MIN_FULL_TTL (60s), DEEPTAP_CACHE_MIN_EXTRACT_TTL (15m), DEEPTAP_CACHE_MAX_VALUE_BYTES (262144), DEEPTAP_CACHE_STRIP_LEADING_ARTICLES (false), DEEPTAP_CACHE_SF_TIMEOUT (30s)

Done (Session 12: Rate Limiting + Concurrency)

  • Go internal/ratelimit/ package with three load-bearing files: lua/gcra.lua is a 40-line atomic Generic Cell Rate Algorithm Lua script embedded via go:embed; gcra.go wraps it in redis.NewScript (stable SHA1 across callers), preloads via Script.Load at boot with a WARN-log-on-fail fallback, runs each Allow under a 50 ms per-op timeout with a NOSCRIPT retry once, and fails OPEN on Redis error by returning Decision{Allowed: true} alongside a non-nil error; a MetricsSink interface keeps the package free of a metrics-package import; tier.go holds the canonical Tiers map and ResolveLimit(orgTier, rateLimitOverride) that falls back to Free on unknown tiers and lets a positive per-key api_keys.rate_limit_override replace CPS with Burst = 2 * CPS
  • Tier table (from specs/PROJECT-CONTEXT.md), columns CPS / burst / general concurrency / depth-3 concurrency: Free 10 / 20 / 5 / 2; Starter 50 / 100 / 20 / 5; Growth 200 / 400 / 50 / 10; Scale 500 / 1000 / 100 / 25; Secure 500 / 1000 / 100 / 25; Enterprise 1000 / 2000 / 200 / 50
  • Go internal/middleware/ratelimit.go mounts the GCRA middleware AFTER APIKey and BEFORE the cache layer so a cache hit still counts against the per-key budget; stamps X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset on every admission; renders a 429 RFC 7807 application/problem+json body with type=https://deeptap.ai/errors/rate_limited, a Retry-After header (ceiling of retry_after_ms / 1000, floored at 1), and a limits.rate block on denial; stashes the admission *Decision + resolved Tier on the request context via WithRateLimitDecision so /v1/search, /v1/extract, and /v1/research envelopes stamp a rate_limit: {limit, remaining, reset_ms} block from the same numbers used on the headers
  • Go internal/middleware/concurrency.go implements the per-org in-flight counter keyed on v1:conc:<org_id>:<bucket>: /v1/research maps to the depth3 bucket, every other /v1/* endpoint maps to general; INCR on entry with an atomic DECR rejection on saturation and a defer rdb.Decr(context.Background(), key) release so a handler panic still returns the slot (Recoverer above the chain catches the panic; Go's defer runs during the unwind); 503 RFC 7807 with type=https://deeptap.ai/errors/concurrency_exhausted + limits.concurrency{bucket, limit, in_flight} on saturation; Redis error on INCR fails OPEN
  • Go internal/auth/ctx.go introduces AuthContext + WithAuthContext / AuthContextFrom as a thin context-access shim so the ratelimit middleware does not have to import internal/middleware (the apikey middleware is still the authoritative producer of org / key identifiers)
  • Go internal/api/errors.go adds the Problem RFC 7807 struct, WriteProblem helper, and ErrTypeRateLimit + ErrTypeConcExhausted constants reused by both middlewares (the string literals are duplicated inside internal/middleware/ to avoid an api <- middleware <- api import cycle)
  • cmd/deeptap/main.go boots ratelimit.NewLimiter(redisClient, logger).WithMetrics(rateLimitMetrics), runs Limiter.LoadScript(ctx) with a WARN log on failure, and the chi chain order is now RequestID -> OTEL -> slog -> Recoverer -> Prom -> CORS -> APIKey -> RateLimit -> Concurrency -> handler
  • Prometheus internal/metrics/ratelimit.go registers deeptap_ratelimit_requests_total{tier, outcome} (outcomes allowed|denied|redis_error|redis_timeout), deeptap_ratelimit_decision_duration_seconds, deeptap_ratelimit_retry_after_ms, deeptap_concurrency_inflight{bucket} gauge, deeptap_concurrency_rejections_total{bucket}, and deeptap_ratelimit_redis_errors_total
  • 100-goroutine-vs-50-burst concurrency test against a real Redis container verifies the Lua compare-and-swap atomicity (exactly 50 admissions, exactly 50 denials); a separate test covers the panic-safe DECR path by mounting Concurrency downstream of Recoverer and a handler that panics and asserting the Redis counter returns to zero
  • Fail-OPEN policy is deliberate: rate limiting is an SLA lever, not a correctness gate; a Redis outage must not cause an API outage
  • New env vars: DEEPTAP_RATELIMIT_ENABLED (default true), DEEPTAP_RATELIMIT_BURST_2X (default true), DEEPTAP_CONCURRENCY_ENABLED (default true), DEEPTAP_CONCURRENCY_DEPTH3_OVERRIDE (default 0 = use tier)

Done (Session 13: MCP Server + Tavily Shim)

  • POST /v1/map endpoint wired at the DeepTap API via internal/mapsvc/: a two-phase orchestrator (Phase 1 reads robots.txt Sitemap: directives then falls back to <scheme>://<host>/sitemap.xml; Phase 2 escalates to a bounded HTML crawl when sitemap yield is below DEEPTAP_MAP_HTML_MIN_YIELD=10) plus post-processing that unions + normalizes + dedupes + filters (allow_external, include_subdomains, exclude[]) + truncates to DEEPTAP_MAP_LIMIT=1000; response carries results[], sources breakdown (robots_sitemap_urls, sitemap_urls, html_crawl_urls, pages_fetched), dropped, truncated, credits_used; one usage_ledger row per request at 1.0 credit safe / 0.5 fast
  • cmd/deeptap-mcp/ binary serves the Model Context Protocol with --transport stdio|http (SSE transport is rejected explicitly per MCP 2025-11-25); HTTP mounts the MCP handler at /mcp + /mcp/ with /healthz served separately and requires an Mcp-Api-Key header enforced by APIKeyMiddleware before the MCP handler runs; internal/mcp/ ships types.go (jsonschema-tagged SearchIn/Out, ExtractIn/Out, FactsIn/Out), client.go (APIClient.doJSON + RESTError), handlers.go (three tool handlers with attestation gate on deeptap_extract mode=full), server.go (NewServer), middleware.go (APIKeyMiddleware), and server_test.go (in-memory transport coverage)
  • cmd/deeptap-tavily-shim/ binary listens on :8082 for Tavily-wire POST /search + POST /extract + POST /map; internal/shim/ (translate_search.go, translate_extract.go, translate_map.go, translate_response.go, handler.go, country.go with embedded 255-entry ISO-3166 + alias lookup) translates Tavily requests into DeepTap and projects responses back into Tavily's exact wire shape (decimal response_time, null follow_up_questions, images passthrough, usage.credits); X-DeepTap-Compat-Notes response header lists every field-level translation; callers that send X-DeepTap-Compat-Mode: strict receive 409 Conflict when any compat note would have fired; the shim calls DEEPTAP_INTERNAL_URL with DEEPTAP_SHIM_KEY as its own bearer
  • Three new integration docs under docs/integrations/: claude-desktop.md (macOS/Linux direct binary, Windows double-backslash paths, remote via mcp-remote), claude-code.md (claude mcp add CLI with local/user/project scopes and Streamable HTTP), cursor.md (.cursor/mcp.json for stdio and remote streamable-http, with warning on Cursor's 40-tool global cap)
  • migrations/0014_map_jobs.sql scaffolds an async /v1/map mode for a future session; the current endpoint is synchronous and returns partial results with truncated=true on timeout rather than a 504
  • New env vars: DEEPTAP_MAP_MAX_DEPTH (default 2), DEEPTAP_MAP_MAX_BREADTH (default 200), DEEPTAP_MAP_LIMIT (default 1000), DEEPTAP_MAP_HTML_MAX_BYTES (default 2 MiB), DEEPTAP_MAP_SITEMAP_MAX_BYTES (default 50 MiB), DEEPTAP_MAP_TIMEOUT (default 20s), DEEPTAP_MAP_HTML_CONCURRENCY (default 8), DEEPTAP_MAP_HTML_MIN_YIELD (default 10)
  • Prometheus metrics: deeptap_map_sitemap_phase_seconds, deeptap_map_html_phase_seconds, deeptap_map_total_seconds, deeptap_map_urls_discovered_total{source}, deeptap_map_pages_fetched_total
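
The /v1/map post-processing described above (filter by allow_external, include_subdomains, exclude[]; truncate; count drops) can be sketched as one pure function. This is a simplified model: exclusion here is plain substring matching, and the union/normalize/dedup step is assumed already done:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// mapFilter sketches the /v1/map filter + truncate pass. Flag names follow
// the request options above; matching rules are simplified.
func mapFilter(root string, urls []string, allowExternal, includeSubdomains bool,
	exclude []string, limit int) (kept []string, dropped int) {
	rootHost := mustHost(root)
	for _, raw := range urls {
		h := mustHost(raw)
		external := h != rootHost && !(includeSubdomains && strings.HasSuffix(h, "."+rootHost))
		if external && !allowExternal {
			dropped++
			continue
		}
		excluded := false
		for _, pat := range exclude {
			if strings.Contains(raw, pat) {
				excluded = true
				break
			}
		}
		if excluded {
			dropped++
			continue
		}
		kept = append(kept, raw)
	}
	if len(kept) > limit {
		dropped += len(kept) - limit
		kept = kept[:limit] // truncated=true in the real response
	}
	return kept, dropped
}

func mustHost(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return ""
	}
	return u.Hostname()
}

func main() {
	kept, dropped := mapFilter("https://example.com",
		[]string{
			"https://example.com/a",
			"https://docs.example.com/b", // subdomain: kept when include_subdomains
			"https://other.com/c",        // external: dropped when !allow_external
			"https://example.com/private/x",
		},
		false, true, []string{"/private/"}, 1000)
	fmt.Println(len(kept), dropped) // 2 kept, 2 dropped
}
```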

Done (Session 14: A2A + Payment Middleware with Idempotency)

  • Agent-to-Agent Protocol v1.0 server at internal/a2a/ with BuildAgentCard wiring four skills (web-search, fact-lookup, url-extract, site-map), four security schemes (bearer, dpop, x402, mpp), and two transports (JSON-RPC, HTTP-plus-JSON); AgentExecutor maps TextPart to search, DataPart.urls to extract, DataPart.subject to facts
  • x402 pay-per-call at internal/payments/x402.go: the 402 challenge emits a Base64url v2 JSON body and a WWW-Authenticate: Payment header carrying an HMAC-SHA-256 id and canonical USDC addresses for Base mainnet and Sepolia; VerifyAndSettle orchestrates a facilitator call against /v2/x402/verify + /v2/x402/settle
  • Merchant Payments Protocol at internal/payments/mpp_charge.go + mpp_session.go: Authorization: Payment parser with temporary / Stripe-SPT / Lightning dispatch; MPP sessions use 32-byte opaque access tokens stored as SHA-256 hash in mpp_sessions.access_token_hash with monotonic cumulative accounting
  • DPoP with go-dpop v1.1.2: ES256 / RS256 allow-list enforced pre and post parse, Redis nonce rotation via dpop:nonce:{jkt} under a 5-minute TTL, jti replay guard via dpop:jti:{jkt}:{jti} SET NX, 60-second clock-skew tolerance
  • internal/middleware/payment_dispatch.go 7-branch dispatcher (Bearer > x402 > MPP > DPoP > 402) mounted BEFORE APIKey on /v1 when cfg.X402Enabled || cfg.MPPEnabled
  • internal/middleware/idempotency.go with SHA-256 body hash under a 1 MiB cap, Redis SET NX lock on idemp:lock:{scope}:{key}, replay-cached 2xx bytes carrying Idempotency-Replayed: true, 409 on body-hash mismatch, 409 on in-flight collision, fail-open on Redis unreachable with a degraded counter; panic-safe lock release via deferred Redis DEL
  • 4 new migrations: 0015_payment_attempts with 12 seeded monthly partitions + 60-day retention; 0016_mpp_sessions with DPoP-thumbprint + SHA-256 hashed access-token binding; 0017_dpop_nonces append-only audit; 0018_a2a_tasks JSONB history + artifacts

Done (Session 15: TrustPlane Integration with Fallback)

  • SPIFFE X.509-SVID auth path with live-preferred + local-fallback verification via github.com/spiffe/go-spiffe/v2 v2.6.0; 3 Postgres migrations (0019 trustplane_bundle_cache, 0020 trustplane_verifications monthly-partitioned, 0021 portfolio_accounts with portfolio-revenue seed); 10 TrustPlane env vars with fail-fast boot validation when TRUSTPLANE_ENABLED=true
  • internal/identity/spiffeid.go URI-SAN extractor + trust-domain allow-list; internal/identity/tp_client.go dedicated http.Transport with hard-deny vs. network-error distinction for local fallthrough; internal/identity/bundler.go + metrics.go periodic spiffebundle fetch + Postgres + Redis persistence + 7 Prometheus instruments labeled trust_domain only to avoid SPIFFE-ID cardinality explosion
  • internal/identity/trustplane.go Verifier with live-preferred + local-fallback, deny short-circuit, stale-reject / warn, x509svid.Verify offline, async audit writer with bounded channel and drop counter
  • internal/billing/portfolio.go#PortfolioLedger.PostToLedger balanced double-entry posting with banker-rounded markup and spiffe_id > catch-all account lookup; payment dispatcher TrustPlane branch runs BEFORE Bearer (401 deny, 503 stale / no_bundle, passes through when no client certificate present)
  • internal/identity/tls.go ClientCAProvider + BuildTLSConfig with VerifyClientCertIfGiven + GetConfigForClient memoized at a 10-second minimum; .well-known / health / metrics remain mTLS-optional
  • internal/identity/admin.go chi-mountable admin router at /portfolio/accounts, /trustplane/bundle{,/refresh}, base64 bundle export; deeptap-cli trustplane {verify,bundle-refresh,bundle-status} subcommands; Caddy-based trustplane-mock compose service on port 8089 (dev-default OFF)
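
The URI-SAN extractor plus trust-domain allow-list above can be sketched with plain URL parsing. The real path uses go-spiffe's typed SPIFFE-ID parser; this standalone model and its example trust domain are assumptions for illustration:

```go
package main

import (
	"fmt"
	"net/url"
)

// allowTrustDomain sketches the check: parse the certificate's URI SAN,
// require the spiffe scheme, and gate on the configured trust domains.
func allowTrustDomain(uriSAN string, allowed map[string]bool) (string, error) {
	u, err := url.Parse(uriSAN)
	if err != nil || u.Scheme != "spiffe" {
		return "", fmt.Errorf("not a SPIFFE ID: %q", uriSAN)
	}
	if !allowed[u.Host] {
		return "", fmt.Errorf("trust domain %q not allowed", u.Host)
	}
	return u.Host, nil
}

func main() {
	// hypothetical trust domain for illustration
	allowed := map[string]bool{"trustplane.example": true}
	td, err := allowTrustDomain("spiffe://trustplane.example/workload/agent-7", allowed)
	fmt.Println(td, err)
}
```

Labeling metrics by trust domain rather than full SPIFFE ID, as the section notes, keeps cardinality bounded: the host part of the URI is shared by every workload in the domain.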

Done (Session 16: Billing Engine)

  • 30-task branch on build/S16-billing-engine. Four independent billing surfaces, one unified ledger feed. go.mod bumps toolchain to Go 1.26.1 + adds github.com/stripe/stripe-go/v85 + github.com/johnfercher/maroto/v2 v2.4.0
  • 7 Postgres migrations: 0022_stripe_customers with tier + dunning counters, 0023_stripe_meters, 0024_stripe_credit_grants, 0025_stripe_meter_events_outbox with UUID identifier + partial index on sent_at IS NULL, 0026_stripe_webhooks with event_id dedup, 0027_portfolio_invoices with UNIQUE(account_id, period_start, period_end), 0028_reconciliation_reports with NUMERIC(6,4) variance_pct
  • internal/billing/ package: client.go facade with DryRun kill and Enabled() check; meters.go idempotent EnsureMeters with MeterCreator injection; metrics.go 10 Prometheus instruments (tier / event_name / outcome / endpoint labels only, never customer_id); outbox.go transactional EnqueueMeterEvent with UUID v7 Stripe-dedup identifier + pure validateEnqueueArgs helper; flusher.go FlushOnce drains FOR UPDATE SKIP LOCKED in batches of 100 via V2BillingMeterEventStreams.Create, updates sent_at before commit (prefers duplicate delivery over double-charge); subscriptions.go UpsertSubscriptionForOrg with tier + SubscriptionItem translation + proration_behavior=create_prorations + LoadRateCard JSON; creditgrants.go CreateCreditGrant + SyncCreditGrantFromWebhook (price_type=metered applicability, category=paid)
  • internal/billing/webhooks.go signature verify via webhook.ConstructEvent + ON CONFLICT DO NOTHING dedup + panic recovery + per-type dispatch fans out to webhooks_invoice.go (created / finalized / paid / failed with dunning counter at threshold 3), webhooks_subscription.go (sync subscription_id without flipping tier), webhooks_creditgrant.go (shadow table upsert), webhooks_meter_error.go (log + counter without marking outbox sent); empty-secret hard-fail is deliberate (operator misconfiguration, not a dev fallback)
  • internal/billing/portfolio.go + portfolio_monthend.go month-end aggregator; pdf.go maroto/v2.4.0 A4 portrait renderer with header + customer block + line-item table + totals + footer, byte-stable across re-renders; reconcile.go Classify 3-bucket drift detector (clean under 0.1 percent, variance 0.1 to 5 percent, error above 5 percent) + SumStripeMeterEvents iterator across every active (stripe_customer_id, meter_id) pair + RunPeriodReconcile end-to-end
  • internal/billing/jobs.go + per-account MonthEndWorker in portfolio.go: 4 River workers (FlushMeterEventsWorker, PortfolioMonthEndWorker fan-out, ReconcileStripeWorker with caller-injected StripeTotaler closure, WebhookReplayWorker); internal/jobs/schedules.go PeriodicJobs at 60-second flush, 24-hour reconcile at 04:00 UTC via DailyAtUTC, monthly portfolio run at 03:00 UTC day-one via MonthlyAtUTC, 5-minute webhook replay
  • cmd/deeptap/main.go wires the billing outbox hook into /v1/search + /v1/extract + /v1/map handlers post-ledger-commit; mounts POST /v1/billing/webhooks/stripe at top-level r (outside /v1 so no auth middleware consumes raw body); constructs BillingClient + Metrics + PortalCreator unconditionally so webhook verification and zero-state metrics work even with BillingEnabled=false; River client stops BEFORE http.Server.Shutdown so in-flight billing jobs commit cleanly
  • Handler hooks skip the TrustPlane path. When PaymentDispatchModeTrustPlane rides on the request context, the outbox enqueue fast-exits because portfolio customers are billed via the internal double-entry ledger and their traffic never touches Stripe. TrustPlane detection uses the payment-dispatch mode flag set by the S15 dispatcher branch; the handlers do not inspect api_key_id directly
  • cmd/deeptap-cli/billing.go adds billing reconcile / portfolio-invoice / outbox-status subcommands; config/rate-card.json seed with three tiers (Starter $19 / 1,000 credits, Growth $99 / 10,000 credits, Scale $499 / 100,000 credits) and four meters each with placeholder price_id fields the operator fills post-EnsureMeters
  • 8 new env vars documented under docs/DEPLOYMENT.md: DEEPTAP_STRIPE_SECRET_KEY, DEEPTAP_STRIPE_WEBHOOK_SECRET, DEEPTAP_STRIPE_CLIMATE_ENABLED, DEEPTAP_DASHBOARD_URL, DEEPTAP_BILLING_RATE_CARD_PATH, DEEPTAP_PORTFOLIO_INVOICE_FROM_EMAIL, DEEPTAP_PORTFOLIO_INVOICE_LOGO_PATH, DEEPTAP_BILLING_DRY_RUN
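
The 3-bucket reconciliation classifier above (clean under 0.1 percent, variance 0.1 to 5 percent, error above 5 percent) reduces to one pure function. The signature is illustrative; the thresholds are taken directly from the section:

```go
package main

import (
	"fmt"
	"math"
)

// classify sketches the reconciliation drift detector: compare the internal
// usage_ledger total against the Stripe meter-event total and bucket the
// relative variance.
func classify(ledgerTotal, stripeTotal float64) (bucket string, variancePct float64) {
	if ledgerTotal == 0 {
		if stripeTotal == 0 {
			return "clean", 0
		}
		return "error", 100 // Stripe billed something we never metered
	}
	variancePct = math.Abs(ledgerTotal-stripeTotal) / ledgerTotal * 100
	switch {
	case variancePct < 0.1:
		return "clean", variancePct
	case variancePct <= 5:
		return "variance", variancePct
	default:
		return "error", variancePct
	}
}

func main() {
	b, v := classify(10000, 9980) // 20 credits of drift on 10k
	fmt.Printf("%s %.2f%%\n", b, v)
}
```

Note the asymmetry baked into the flusher above: because sent_at is updated before commit, drift is biased toward duplicate delivery, which the classifier surfaces as a positive variance rather than silent under-billing.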

Done (Sessions 17 + 18: DMCA Compliance + Dashboard UI)

  • Transactional email via Postmark on the primary send.deeptap.ai domain, plus a separate dmca.deeptap.ai reputation domain on Amazon Simple Email Service for the takedown workflow
  • DMCA intake at POST /v1/dmca/report with sworn-statement validation, ticket-creation flow, and cross-instance cache suppression that lands inside one second via the existing deeptap_dmca_suppress LISTEN/NOTIFY channel
  • Counter-notice state machine (received -> actioned -> counter_notice -> resolved) covering the DMCA 512(g) 10-to-14-business-day window
  • Next.js 15 dashboard shell with seven routes (/overview, /usage, /billing, /apikeys, /domains, /facts, /settings), session-cookie middleware, four recharts time-series views, fact-cache analytics panel reading the S19-onward metrics, and a Mintlify docs site with 18 MDX pages plus a single-source-of-truth trust-report renderer

Done (Session 19: Fact Data Model + Extraction Pipeline)

  • internal/facts/ package with the full extraction pipeline: 40 predicate aliases, five-tier classification cascade, NFKC normalization, ROUGE-L recall gate, atomic per-day budget consumer, OpenRouter JSON-schema extractor, sidecar VerifyClaim and LinkEntities callers, three-tier dedup (exact, trigram, insert), conflict flagger, and a Kafka-driven per-page worker
  • cmd/deeptap-fact-worker/ binary with graceful shutdown and Prometheus on port 9091
  • Four migrations land the dual trigram GIN index + partial unique canonical constraint + audit run table + per-day budget counter
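
The ROUGE-L recall gate above can be sketched as a longest-common-subsequence recall over claim tokens: a claim the source text does not actually support scores low and is rejected. The tokenizer and any threshold choice here are illustrative simplifications:

```go
package main

import (
	"fmt"
	"strings"
)

// rougeLRecall computes LCS(claim, source) / len(claim tokens): the fraction
// of the claim's tokens that appear, in order, in the source.
func rougeLRecall(claim, source string) float64 {
	a := strings.Fields(strings.ToLower(claim))
	b := strings.Fields(strings.ToLower(source))
	if len(a) == 0 {
		return 0
	}
	// classic O(len(a)*len(b)) LCS table, rolling one row at a time
	prev := make([]int, len(b)+1)
	for i := 1; i <= len(a); i++ {
		cur := make([]int, len(b)+1)
		for j := 1; j <= len(b); j++ {
			switch {
			case a[i-1] == b[j-1]:
				cur[j] = prev[j-1] + 1
			case prev[j] >= cur[j-1]:
				cur[j] = prev[j]
			default:
				cur[j] = cur[j-1]
			}
		}
		prev = cur
	}
	return float64(prev[len(b)]) / float64(len(a))
}

func main() {
	r := rougeLRecall("paris is the capital of france",
		"the capital of france is paris")
	fmt.Printf("%.2f\n", r) // 4-token LCS over a 6-token claim
}
```

Recall (rather than precision or F1) is the right direction for a hallucination gate: it asks how much of the *claim* is grounded in the page, not how much of the page made it into the claim.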

Done (Session 20: Staleness Model + Re-Verification)

  • Pure-function decay model EffectiveConfidence(base, rate, lastConfirmed, now) = base * exp(-rate * days_since_confirmed) with NeedsReverification and InOpportunisticBand predicates
  • River-backed scheduler with three queues (reverify_priority 10 workers, reverify_scan 1, maintenance 1) and two periodic jobs (hourly scan, monthly partition creator)
  • cmd/deeptap-scheduler/ standalone binary plus a /internal/river UI gated by X-DeepTap-Internal header
  • Contradiction-resolution state machine with MinConfirmingDomains=2 default, fact-supersession audit log, and opportunistic re-verification hook for the band [threshold, threshold + 0.10)

Done (Session 21: Fact Cache Query Integration)

  • POST /v1/facts/query exposes the fact cache with subject or subject_qid lookups, optional predicate, min_confidence floor, include_evidence toggle, and max_results cap (default 10, hard cap 50)
  • Redis read-through cache fact:q:v1:<hash> with snappy-compressed msgpack payloads; TTL driven by the fastest-decaying tier in the result (permanent 24h, slow 12h, moderate 1h, fast 10m, volatile 2m)
  • One usage_ledger row at 0.1 credit per request irrespective of hit or miss; conflict-flagged facts surface with conflict_flag=true; superseded facts never returned
  • Depth=1 search pipeline gains a top-of-RunDepth1 fact probe via Composer.Prober that short-circuits decomposition + search + extract + rerank on a confident fact-cache hit; billing becomes 0.1 credit instead of 1.0

Done (Session 22: Fact Cache Analytics + Warming)

  • Four always-on feed workers under internal/feeds/ (wikidata, cve, ietf, edgar) scheduled by River from internal/jobs/feed_workers.go on independent cadences (Wikidata daily delta + monthly seed dump, CVE/NVD daily, IETF weekly, SEC EDGAR hourly during US trading); workers persist resume state in feed_ingestion_state (last-run summary) and feed_cursor (opaque per-feed resume blob) so a worker restart picks up where the previous run stopped
  • Demand-triggered feed routing keyed off a feed_registry table with topic_pattern glob matching against subject:predicate; cache-miss queries that match a pattern queue a synchronous EnqueueDemandFeed River job behind a single 500-credit-per-day shared budget so a hot pattern cannot starve quiet ones
  • New deeptap.facts Kafka topic produced by the S19 fact-extraction worker, the S21 fact-query handler (on opportunistic-reverify writes), and every S22 feed worker; CloudEvents-shaped envelope per new, confirmed, or contradicted fact with a per-event UUID dedup identifier so worker restarts cannot double-count downstream
  • deploy/clickhouse/schemas/022_fact_events.sql defines the consumer side: a Kafka engine table on deeptap.facts plus four materialized views that drive the dashboard's Fact Cache tab (mv_fact_hit_rate rolling 5-minute hit-vs-miss ratio, mv_facts_by_type 5-tier decay-class histogram, mv_staleness_distribution days_since_confirmed histogram, mv_conflict_rate rolling proportion of conflict_flag=true facts); ClickHouse is read-only from the dashboard's perspective and never feeds back into the Postgres facts row
  • Tiny upstream-shape fixtures land at test/fixtures/feeds/{wikidata-tiny.ndjson, cve-sample.json, rfc-index-tiny.txt, edgar-sample.atom} with a schema README; build-tagged integration test skeletons under integration_feeds and integration_clickhouse reserve test names + paths and skip cleanly until a Postgres + ClickHouse + Kafka testcontainer stack is wired in go.mod
  • Phase 2 (Fact Cache, S19-S22) is now complete
  • Phase 3 (Knowledge Layer, S23-S28) is now complete; all 28 sessions of the build have shipped
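
The demand-triggered routing above hinges on matching a cache-miss query's subject:predicate string against registered topic_pattern globs. A minimal sketch, assuming Go's path.Match glob syntax (the real feed_registry matcher may use different pattern rules), with hypothetical registry entries:

```go
package main

import (
	"fmt"
	"path"
)

// feedRoute is an illustrative stand-in for a feed_registry row.
type feedRoute struct {
	Feed    string
	Pattern string // topic_pattern glob
}

// matchFeed tests subject:predicate against each registered glob and returns
// the first feed whose pattern matches.
func matchFeed(registry []feedRoute, subject, predicate string) (string, bool) {
	key := subject + ":" + predicate
	for _, r := range registry {
		if ok, _ := path.Match(r.Pattern, key); ok {
			return r.Feed, true
		}
	}
	return "", false
}

func main() {
	registry := []feedRoute{
		{Feed: "cve", Pattern: "CVE-*:*"},
		{Feed: "edgar", Pattern: "*:ticker_symbol"},
	}
	feed, ok := matchFeed(registry, "CVE-2024-3094", "severity")
	fmt.Println(feed, ok) // routes the miss to the CVE feed worker
}
```

On a match, the miss enqueues an EnqueueDemandFeed job; the shared 500-credit daily budget then decides whether the job actually runs.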

In Progress

  • No session currently active. Phases 1, 2, and 3 are all complete; the platform is in v1 release-candidate posture pending the Phase 4 hardening surfaces.

Done (Knowledge Layer Analytics, S28 FINAL)

S28 is the read-only analytics surface over the Knowledge Layer.

  • Three new Postgres materialized views (mv_source_index_stats, mv_entity_coverage, mv_topic_coverage) refreshed daily at 06:00 UTC by a River-managed worker
  • New public endpoint GET /v1/trust/domain returns the full domain trust profile (tier, consensus ratio, fact counters, ASN metadata, sample evidence count, last-updated timestamp) at 0.05 credit per call with a 5-minute Redis cache
  • Three new dashboard handlers (source-index, entity-coverage, topic-coverage) read the matching MV with a 5-minute Redis cache and degrade gracefully on a backing-store outage
  • New internal/diversity package adds an Autonomous System Number tiebreaker the reranker uses to prefer cross-network confirmation over single-network confirmation at the same effective confidence
  • Three Prometheus instruments cover lookup outcome, refresh duration per view, and mean diversity per tier
  • OpenAPI 3.1 schema documents the public endpoint; full unit-test coverage on every Go-side surface
  • With S28 shipped, all 28 sessions of the eighteen-to-twenty-month build are complete
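
The Autonomous System Number tiebreaker can be sketched as a distinct-ASN count used only when effective confidence ties. Types and function names here are illustrative, not the internal/diversity API:

```go
package main

import "fmt"

// asnDiversity counts distinct Autonomous Systems among a claim's
// confirming sources.
func asnDiversity(asns []uint32) int {
	seen := map[uint32]struct{}{}
	for _, a := range asns {
		seen[a] = struct{}{}
	}
	return len(seen)
}

// preferByDiversity breaks a confidence tie: at the same effective
// confidence, cross-network confirmation outranks single-network repetition.
func preferByDiversity(aASNs, bASNs []uint32) string {
	if asnDiversity(aASNs) >= asnDiversity(bASNs) {
		return "a"
	}
	return "b"
}

func main() {
	// two confirmations from two networks vs. three from one network
	fmt.Println(preferByDiversity(
		[]uint32{15169, 13335},
		[]uint32{15169, 15169, 15169},
	))
}
```

The intuition: many mirrors on one hosting network are weaker evidence than independent confirmations routed through unrelated networks.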

Scoped

  • All 28 sessions are now Done. Phase 4 hardening surfaces (SOC 2 Type 2, additional protocol adapters, multi-region production topology) remain on the horizon.

Scoping

  • Phase 4 hardening specifics (Service Organization Control (SOC) 2 Type 2, additional protocol adapters, multi-region, long-term residency controls)

Potential / On Horizon

  • Service Organization Control (SOC) 2 Type 2 certification
  • Additional protocol adapters beyond MCP, A2A, x402, MPP, TrustPlane
  • Multi-region production topology beyond the bootstrap iad (us-east) region
  • Additional language-model provider integrations beyond the initial OpenRouter default
  • Stripe Connect reseller integration (explicitly cut from version 1)
  • Voice-agent-grade latency tier with warm Fly.io machines

Documentation

| Document | Audience | What You Will Find |
| --- | --- | --- |
| How It Works | Product and engineering | User journeys for developers, agents, and enterprise; the full search flow, /v1/map orchestrator, MCP server, Tavily shim, fact-cache flow, firewall flow, billing flow, error behavior |
| Architecture | Developers | Tech stack with pinned versions, repository map, system topology, data flow, persistence layer, machine-learning inference, protocols, middleware stack, authentication, billing, non-obvious decisions with rationale |
| Feature Map | Product and stakeholders | Every feature grouped by domain with user-facing benefit, session that delivers it, and status |
| Deployment | DevOps and deployers | Account prerequisites, environment variables, local development setup, bootstrap deployment, production migration triggers, database migrations, proto regeneration, local-node deployment, observability, DMCA domain setup, secrets rotation |
| Business Value | Marketing, investors, executives | The opportunity, the problem, the solution, why us and why now, target customers, how we make money, what is different, traction plan, team, status |
| Claude Desktop Integration | Claude Desktop users | Copy-paste claude_desktop_config.json snippets for macOS/Linux direct binary, Windows double-backslash paths, and remote via mcp-remote |
| Claude Code Integration | Claude Code users | claude mcp add CLI snippets for stdio (local/user/project scopes) and Streamable HTTP |
| Cursor Integration | Cursor users | .cursor/mcp.json snippets for stdio and streamable-http; warning on Cursor's 40-tool global cap |

Contributing

DeepTap is developed in public by RelayOne. Contribution guidelines live in the repository at github.com/RelayOne/deeptap. The monorepo uses a Go workspace (go.work), sqlc for typed database queries, golang-migrate for migrations, and golangci-lint plus ruff for linting.

Branch naming follows session/S<NN>-<short-slug> for session-aligned work and fix/<short-slug> for bugfixes. Documentation changes go in their own commits, per the project rules. Every specification in /specs/ is considered frozen unless explicitly reopened.

License

The license will be declared before Session 1 lands. The current plan is a split-license model: a source-available license for the Go API and dashboard, a permissive license for the SDKs (TypeScript on npm, Python on PyPI, Go as a module), and a commercial license for the local-node binaries. This will be finalized in the Session 1 commit.


Last updated: 2026-04-24 (S22; Phase 2 complete)

About

DeepTap — private-corpus search + deep research with seven-stage pipeline
