Agent-native web search. Three depth modes on one endpoint. Fixed prices. Transparent firewall. The same search backbone Anthropic chose for Claude.
DeepTap is a web search API built specifically for AI agents. It lives between shallow SERP wrappers like Tavily and Serper on one end, and slow deep-research products like Perplexity Sonar Deep Research on the other. One endpoint, three depth modes: a fast single-pass lookup when you need speed, a gap-analysis round-trip when you need completeness, and a multi-iteration research loop when you need thorough coverage. All three carry the same response envelope. All three return with a predictable, fixed credit cost. No dynamic pricing, no 4-to-250 credit surprise bills, no context-size multipliers.
Underneath the public search layer sits a compounding fact cache with decay models. Every extracted page is distilled into structured claims with provenance, evidence counts, source diversity, and a freshness decay curve tuned to the fact type (permanent, slow-decay, moderate-decay, fast-decay, volatile). When the next agent asks a question that has already been answered on the public web, DeepTap answers in under fifteen milliseconds for a tenth of a credit instead of triggering a fresh search. The cache is self-healing: organic query traffic constantly re-verifies the most popular facts, and a priority-scored River job queue re-verifies the rest inside a bounded daily budget.
DeepTap is agent-native by construction. Retail developers authenticate with traditional API keys and pay through Stripe subscriptions. Agents with no account pay per-call with x402 crypto micropayments or per-session with Merchant Payments Protocol (MPP) credentials bound to Demonstrating Proof of Possession (DPoP) keys. Portfolio and enterprise customers verify through TrustPlane with a SPIFFE Verifiable Identity Document, which routes billing to a double-entry ledger instead of Stripe. The API speaks Model Context Protocol (MCP) for Claude Desktop and Claude Code, Agent-to-Agent Protocol (A2A) v1.0 for agent marketplaces, and a Tavily-compatible shim for frictionless migration. For customers who cannot send queries to a public cloud, DeepTap ships a local-node Docker Compose deployment with mutual TLS, delta sync, and an offline fact-lookup mode that continues serving when the cloud is unreachable.
The production architecture is a pure Go API binary backed by a Python 3.11 NLP sidecar that exposes seven gRPC remote procedure calls. All machine-learning inference runs in the sidecar, never in the Go process: reranking via ms-marco-MiniLM-L6-v2 INT8, embedding via bge-small-en-v1.5, natural-language-inference grounding via MiniCheck-Flan-T5-Large, entity linking via ReFinED, and prompt-injection scoring via Prompt Guard 2 86M. Bootstrap deployments swap the sidecar for managed inference APIs (Cohere Rerank, Voyage Embed, Cloud Run Trafilatura) behind stable Go interfaces, giving the project scale-to-zero economics from day zero and an interface-compatible migration path to self-hosted infrastructure once volume justifies it.
The agent search market is actively destabilizing. In February 2026 Nebius acquired Tavily for roughly 275 million US dollars, leaving an install base of Tavily customers facing the usual post-acquisition uncertainty: price changes, feature freezes, support regressions, product reprioritization toward the acquirer's roadmap. In December 2025 Google sued SerpApi for Digital Millennium Copyright Act (DMCA) section 1201 violations, marking the first major legal action against Google-SERP-scraping search APIs and creating existential legal risk for every competitor built on the same scraping pattern. Bing's search API was deprecated. Google's Custom Search API remains locked behind low quotas and per-query fees that do not scale for agent workloads. The market is smaller than it looks and the legally clean vendors are smaller still.
DeepTap is built on Brave Search as the default provider. Brave is the same independent index that Anthropic chose for Claude's web search, with confirmed 86.7% result overlap versus Google on evaluated queries. It is licensed, not scraped, and carries no DMCA 1201 exposure. Serper remains available as an opt-in provider_class=fast tier for customers who explicitly acknowledge the legal tradeoff at the API-key level. Defaulting to Brave is the single most important provider decision in the product: it is the one that makes DeepTap safe to integrate into a regulated pipeline without a lawyer conversation.
Pricing is the second wedge. Tavily Research calls can consume anywhere from 4 to 250 credits depending on opaque internal heuristics; Perplexity Sonar Deep Research lists $410 to $1,320 per 1,000 calls and takes 30 seconds to 2 minutes per answer; Exa Auto runs roughly $7 per 1,000 calls. DeepTap charges exactly 1 credit for depth=1 safe, exactly 3 credits for depth=2 safe, exactly 8 credits for depth=3 safe, exactly 0.1 credit for a fact lookup, exactly 0.5 credit for an extract excerpt. Every response includes credits_used. You can budget an agent deployment on a spreadsheet without instrumentation.
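Because every rate is fixed, the spreadsheet claim is literal: total cost is a dot product of call counts and the published credit schedule. A minimal sketch (the Budget helper is hypothetical; the rates are the ones listed above):

```go
package main

import "fmt"

// Fixed credit schedule from the pricing table above.
var creditCost = map[string]float64{
	"depth1_safe": 1.0,
	"depth2_safe": 3.0,
	"depth3_safe": 8.0,
	"fact_lookup": 0.1,
	"extract":     0.5,
}

// Budget returns the exact credit spend for a planned mix of calls.
func Budget(calls map[string]int) float64 {
	total := 0.0
	for op, n := range calls {
		total += creditCost[op] * float64(n)
	}
	return total
}

func main() {
	// 1,000 depth-1 searches, 200 depth-2 searches, 5,000 fact lookups:
	fmt.Println(Budget(map[string]int{
		"depth1_safe": 1000,
		"depth2_safe": 200,
		"fact_lookup": 5000,
	})) // 1000*1 + 200*3 + 5000*0.1 = 2100 credits
}
```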
The third wedge is transparency on the firewall. Tavily markets an agent-native prompt-injection firewall; its implementation is proprietary, returns no injection score, no reasons array, and no sanitized-content field. The defender cannot audit what was blocked. DeepTap implements a layered firewall and exposes every scoring signal in the response envelope: prompt_injection_score (0.00 to 1.00), unsafe_reasons[] (machine-readable tags), sanitized_content (extracted text with injection vectors stripped), trusted_snippet (provider-supplied metadata that never touched untrusted HTML), and untrusted_content (the raw-extracted text kept separate so security teams can make their own call). The joint OpenAI/Anthropic/Google study from October 2025 (Nasr et al.) confirmed that every published prompt-injection detector was bypassed above 90% under adaptive attack, meaning detectors are scoring signals, not gates. DeepTap layers rule-based pre-extraction stripping in Go (Unicode tag chars, zero-width chars, CSS-hidden text, HTML comments, aria/meta injection vectors, off-screen positioning tricks) before any ML model touches the content, then Prompt Guard 2 86M in the sidecar as a scoring layer, then explicit trusted/untrusted splits in the response so the security team downstream can apply its own policy.
POST /v1/search takes a depth field of 1 or 2. Depth 1 runs a single RunStage pipeline (Fanout -> per-URL fetch + Layer 1 strip + Trafilatura extract + Layer 2 score -> Rerank) with automatic query decomposition; the 95th-percentile latency target is under 7 seconds and the hard timeout is DEEPTAP_DEPTH1_TIMEOUT (default 7s). Depth 2 adds one reflection round against anthropic/claude-sonnet-4.6 with a JSON-schema structured output that asks "what is missing from these findings?", issues a targeted second stage on the gaps, merges the two rounds with a score-max dedup on normalized URL, and re-sorts by rerank_score; the 95th-percentile target is under 20 seconds and the hard timeout is DEEPTAP_DEPTH2_TIMEOUT (default 18s). Depth 3 is served on a separate streaming endpoint (see below) because a two-minute blocking request is unusable in an agent loop; POST /v1/search with depth=3 returns 400 use_research_endpoint pointing callers at POST /v1/research. The envelope carries depth, rounds_executed, stop_reason, and (when include_ledger=true) the facet ledger.
The deep-research tier is a Server-Sent Events stream, not a blocking request. POST /v1/research emits a documented sequence of events as each round of the facet-ledger-guided loop completes: round_start ({round, sub_queries[]}), partial_results ({round, results[]} after each stage rerank), facet_update ({round, facet, coverage} as each facet's coverage moves), reflection ({round, gaps[], stop, model, latency_ms} after each reflector call), saturation ({reason} when the loop decides to stop), final (the full envelope including ledger, rounds_executed, stop_reason), error (RFC 7807-shaped), and ping ({}) every DEEPTAP_SSE_HEARTBEAT (default 15s) as a keep-alive heartbeat. The loop runs up to DEEPTAP_DEPTH3_MAX_ROUNDS (default 4) rounds under DEEPTAP_DEPTH3_TIMEOUT (default 110s, SLO 120s p95) and stops on one of five precedence-ordered conditions: hard timeout with 2s grace; marginal-lift saturation (Saturated(DEEPTAP_SATURATION_DELTA default 0.05, consecutive=2)); reflector LLM-stop advisory; max rounds; reflector error. A single writer goroutine fed by a bounded channel owns the flusher so event ordering is deterministic. Client disconnect cancels the orchestrator context. The usage_ledger row (8.0 safe / 4.0 fast) is written AFTER the final event, so a mid-stream error never bills.
Depth 2 and depth 3 share a facet ledger defined in internal/depth/ledger.go. SeedFacets(query) walks a small keyword table to seed facet names: vs/versus -> comparative, history of/origin of/when was -> historical, what is/define/definition of -> definitional, current/latest/today/now -> current-status, no match -> general. Each sub-query is attributed to a facet at dispatch time (via the decomposer's new facets schema extension, or the seed fallback). After every stage the ledger accumulates each result's rerank_score against its attributed facet, normalizes by DefaultFacetSaturation=5.0, caps at 1.0 per facet, and averages across facets to produce overall Coverage(). The ledger exposes MarginalLift(consecutive) and Saturated(delta, consecutive) so the composer can decide when to stop on evidence-gathering plateau. When include_ledger=true is set on the request and server config allows it, the envelope carries the full {facets[{name, coverage, sub_queries[], rounds[], docs_scored}], rounds, coverage_history[]} JSON view so agent-builders can audit exactly which facets were researched, how thoroughly, and across which rounds.
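The accumulation math above can be sketched in a few lines (this mirrors the description, not the real internal/depth/ledger.go):

```go
package main

import "fmt"

const DefaultFacetSaturation = 5.0

// Coverage: each facet's accumulated rerank scores are normalized by
// DefaultFacetSaturation, capped at 1.0, then averaged across facets.
func Coverage(facetScores map[string][]float64) float64 {
	if len(facetScores) == 0 {
		return 0
	}
	total := 0.0
	for _, scores := range facetScores {
		sum := 0.0
		for _, s := range scores {
			sum += s
		}
		c := sum / DefaultFacetSaturation
		if c > 1.0 {
			c = 1.0
		}
		total += c
	}
	return total / float64(len(facetScores))
}

func main() {
	// comparative saturates (7.5/5 caps at 1.0); historical sits at 2.5/5 = 0.5
	fmt.Println(Coverage(map[string][]float64{
		"comparative": {2.5, 2.5, 2.5},
		"historical":  {2.5},
	})) // (1.0 + 0.5) / 2 = 0.75
}
```

MarginalLift then reduces to the difference between consecutive Coverage() readings, which is why two consecutive rounds under DEEPTAP_SATURATION_DELTA can stop the loop.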
Every POST /v1/search request that does not already carry a caller-supplied sub_queries array is routed through a hand-written stdlib OpenRouter chat-completions client that decomposes the user's question into N diverse sub-queries under JSON-schema structured output. The model slug is chosen by depth: depth=1 uses anthropic/claude-haiku-4.5 for cheap, fast decomposition; depth=2 and depth=3 use anthropic/claude-sonnet-4.6 for the harder research questions. The decomposer runs under a 10-second handler-level timeout. Responses carry a decomposition object with the resolved model, upstream inference provider, generation_id, sub_queries[], tokens_prompt, tokens_completion, cost_usd, and latency_ms so callers can audit and meter the LLM call per request. On any decomposition failure (timeout, breaker open, policy violation, parse error) the handler logs a warning, adds decomposition_failed to warnings[], and falls back to the single-query path so the request still returns a useful answer.
Every organization has an organizations.llm_policy JSONB column that the handler loads into a typed LLMPolicy on every request: require_zdr, require_data_collection_deny, providers[] allowlist, models_allowed[] allowlist, max_tokens clamp, optional temperature. When the organization's tier is secure, the loader unconditionally clamps require_zdr=true, require_data_collection_deny=true, and defaults the provider allowlist to ["anthropic"] if empty. The three controls are then materialised directly into the OpenRouter request body as provider.zdr=true, provider.data_collection="deny", and provider.order=<allowlist> so a Secure-tier query cannot route to a log-retaining inference provider even if OpenRouter's default would have picked one. Malformed policy JSON does not fail open; the loader errors, the handler logs it as policy_load_failed in warnings[], and the decomposer runs against a zero-value permissive baseline so the degraded request still makes progress.
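A sketch of the clamp-and-materialize path (field names here are assumptions, not the real internal struct):

```go
package main

import "fmt"

// LLMPolicy is a simplified stand-in for the typed policy described above.
type LLMPolicy struct {
	RequireZDR                bool
	RequireDataCollectionDeny bool
	Providers                 []string
}

// ClampSecure applies the Secure-tier override: ZDR and data-collection
// denial forced on, empty provider allowlist defaulted to ["anthropic"].
func ClampSecure(p LLMPolicy) LLMPolicy {
	p.RequireZDR = true
	p.RequireDataCollectionDeny = true
	if len(p.Providers) == 0 {
		p.Providers = []string{"anthropic"}
	}
	return p
}

// ToOpenRouterProvider materializes the policy into the provider block
// of an OpenRouter chat-completions request body.
func ToOpenRouterProvider(p LLMPolicy) map[string]any {
	body := map[string]any{}
	if p.RequireZDR {
		body["zdr"] = true
	}
	if p.RequireDataCollectionDeny {
		body["data_collection"] = "deny"
	}
	if len(p.Providers) > 0 {
		body["order"] = p.Providers
	}
	return body
}

func main() {
	clamped := ClampSecure(LLMPolicy{}) // empty policy on a Secure-tier org
	fmt.Println(ToOpenRouterProvider(clamped))
}
```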
A deterministic Go classifier at internal/freshness/ labels every query as volatile, daily, standard, or stable before it reaches the provider. The classifier applies NFC normalization and lowercasing, extracts 4-digit year tokens to decide historical vs. current-year signals, and walks a fixed sequence of keyword unions and structural patterns (price-or-score live, weather live, status live, breaking, is-still, future-event, current-role, stable-pattern). The result populates freshness_class and freshness_reason on every response, bumps the deeptap_freshness_class_total{class} Prometheus counter, and is read by the S11 caching layer to choose a cache TTL (volatile 5 minutes, daily 1 hour, standard 4 hours, stable 24 hours). No network call; pure Go regex evaluation.
S11 wires a four-tier cache behind internal/cache/: full-envelope, per-sub-query, per-URL extract, and a fact-cache hook reserved for S21. Keys use a shared v1: prefix plus a sha256 hash over NFC-normalized + lowercased + whitespace-collapsed inputs (with optional leading-article stripping behind DEEPTAP_CACHE_STRIP_LEADING_ARTICLES). FullKey composes the normalized query plus depth plus provider class plus country plus language plus safe mode so depth-1 safe and depth-2 fast for the same question never collide. TTLs come from DetermineTTL(class) with MinFullTTL=60s and MinExtractTTL=15m floors and a 1-hour sub-query cap. The depth orchestrator calls LookupFull at the top of every depth, falls through to NoopFactCache.Lookup on miss (S21 will replace the noop), then wraps the pipeline in a Singleflighter under DEEPTAP_CACHE_SF_TIMEOUT so concurrent identical requests collapse into one upstream run. ResearchStage inside each round batch-MGets the sub-query tier and writes back on success; the per-URL extract tier short-circuits the fetch and Trafilatura call on hit but still re-scores the cached text through the firewall so a model update defends against stale injection classifications. Envelopes carry cache_hit, cache_hit_type in {full, subquery, extraction, fact, miss}, and cache_keys_hit[]; /v1/research emits a new cache_hit SSE event when the full-tier cache short-circuits the loop before round 1. Billing is unchanged: a full-tier hit still writes 1.0 / 0.5 / 3.0 / 1.5 / 8.0 / 4.0 credits at the normal rate (customer value is the answer, not the path).
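The key construction can be sketched as follows (the field order and v1:full: prefix here are assumptions; the real code also applies NFC normalization and the optional leading-article stripping):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// FullKey hashes the normalized query plus every dimension that must not
// collide: depth, provider class, country, language, safe mode.
func FullKey(query string, depth int, providerClass, country, lang string, safe bool) string {
	// lowercase + whitespace-collapse (NFC normalization omitted in this sketch)
	norm := strings.Join(strings.Fields(strings.ToLower(query)), " ")
	payload := fmt.Sprintf("%s|%d|%s|%s|%s|%t", norm, depth, providerClass, country, lang, safe)
	sum := sha256.Sum256([]byte(payload))
	return "v1:full:" + hex.EncodeToString(sum[:])
}

func main() {
	a := FullKey("Tavily  Acquisition", 1, "safe", "us", "en", true)
	b := FullKey("tavily acquisition", 1, "safe", "us", "en", true)
	c := FullKey("tavily acquisition", 2, "fast", "us", "en", true)
	fmt.Println(a == b) // true: normalization collapses case and whitespace
	fmt.Println(a == c) // false: depth and provider class are part of the key
}
```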
Every StoreFull also adds the FullKey to a byurl:{sha256} and a bydomain:{sha256} Redis SET for each result URL and domain the envelope referenced, inside a single Redis pipeline. invalidator.go opens a dedicated pgx.Conn, subscribes to deeptap_cache (single-key eviction) and deeptap_dmca_suppress (URL + domain fan-out), and on a suppress NOTIFY looks up the reverse-index sets and deletes every cache key named there on the local Redis. Every instance of DeepTap runs the same listener, so a single pg_notify publishes to the fleet and suppression lands in under a second end-to-end (tracked as deeptap_cache_invalidation_latency_seconds). The listener is wrapped in a panic-recover so a malformed payload cannot take down the subscription.
When any per-result prompt_injection_score is at or above UnsafeScoreThreshold, the real envelope is NOT stored. A 60-second tombstone lands at the same FullKey carrying an empty-results envelope with cache_hit=true, cache_hit_type="full", and the surfaced unsafe_reasons[]. Hot retries against genuinely adversarial pages hit the tombstone and return immediately instead of re-driving the pipeline. The tombstone expires naturally after 60 seconds; deeptap_cache_evictions_total tracks normal evictions but not tombstone expiries.
S12 wires a Generic Cell Rate Algorithm limiter at internal/ratelimit/ backed by a 40-line atomic Lua script embedded via go:embed. The script reads the bucket's Theoretical Arrival Time (TAT), advances it by rate_period_ms = 1000 / tier.CPS, compares against new_tat - burst * rate_period_ms, and either admits with the new TAT SET-ed under a burst * rate_period_ms * 2 TTL or denies with retry_after_ms and reset_ms. The compare-and-swap happens inside a single Redis script execution, so a 100-goroutine-vs-50-burst test admits exactly 50 and denies exactly 50. Script.Load runs at boot with a WARN-log-on-fail fallback (NOSCRIPT recovery inside Allow re-loads lazily); each Allow runs under a 50 ms per-op deadline with a NOSCRIPT retry once. The middleware is mounted AFTER APIKey and BEFORE the cache layer so a cache hit still counts against the budget. On admission it stamps X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset (Unix seconds) on every response, and stashes the admission *Decision + resolved Tier on context so /v1/search, /v1/extract, and /v1/research envelopes stamp a rate_limit: {limit, remaining, reset_ms} block from the same values. On denial the middleware writes a 429 with an application/problem+json body: type=https://deeptap.ai/errors/rate_limited, retry_after_ms, limits.rate{limit, remaining, reset_ms}, and a Retry-After header (ceiling of retry_after_ms / 1000, floored at 1 second). Tier table from specs/PROJECT-CONTEXT.md: Free 10 cps / 20 burst, Starter 50 / 100, Growth 200 / 400, Scale 500 / 1000, Secure 500 / 1000, Enterprise 1000 / 2000. Per-API-key overrides live on api_keys.rate_limit_override: a positive override replaces CPS and sets Burst = 2 * CPS regardless of DEEPTAP_RATELIMIT_BURST_2X. Redis error fails OPEN: the middleware admits the request, records the fail-open in deeptap_ratelimit_redis_errors_total, and labels the deeptap_ratelimit_requests_total{tier, outcome} counter with redis_error or redis_timeout. 
Four env vars: DEEPTAP_RATELIMIT_ENABLED (default true), DEEPTAP_RATELIMIT_BURST_2X (default true), DEEPTAP_CONCURRENCY_ENABLED (default true), DEEPTAP_CONCURRENCY_DEPTH3_OVERRIDE (default 0).
S12 also wires a separate per-organization in-flight counter at internal/middleware/concurrency.go. The Redis key is v1:conc:<org_id>:<bucket> with two buckets: depth3 for /v1/research (Free 2, Starter 5, Growth 10, Scale 25, Secure 25, Enterprise 50) and general for every other authenticated /v1/* endpoint (Free 5, Starter 20, Growth 50, Scale 100, Secure 100, Enterprise 200). A single depth=3 research request holds an HTTP connection for up to two minutes and issues multiple OpenRouter calls per round; a runaway agent firing 100 depth=3 requests in parallel can drain an OpenRouter credit pool in minutes. The depth3 bucket stops that before it starts. The middleware INCRs the counter on entry, compares against the tier limit (or DEEPTAP_CONCURRENCY_DEPTH3_OVERRIDE for depth=3 when that env var is positive), and atomically DECRs + returns 503 with type=https://deeptap.ai/errors/concurrency_exhausted + limits.concurrency{bucket, limit, in_flight} on saturation. The release runs inside a defer rdb.Decr(context.Background(), key) so a handler panic still returns the slot: Go's defer semantics run the block during stack unwind, and the Recoverer middleware above the chain catches the panic AFTER the DECR has already fired. A test mounts Concurrency downstream of Recoverer + a panicking handler, runs 50 concurrent requests, and asserts the Redis counter returns to zero. Redis error on INCR fails OPEN. The operational override lets on-call engineers tighten depth=3 during an incident without a code deploy: DEEPTAP_CONCURRENCY_DEPTH3_OVERRIDE=3 drops every tier's depth=3 bucket to 3 until the override is cleared.
Every extracted page feeds a Kafka-driven fact-extraction worker that produces structured claims (subject, predicate, object) with entity linking to Wikidata QIDs, NLI grounding via MiniCheck, and a per-claim confidence score. Facts are classified into five decay buckets: permanent (zero decay), slow_decay (e.g., historical dates, decay rate 0.001), moderate_decay (e.g., executive biographies, decay rate 0.005), fast_decay (e.g., quarterly financials, decay rate 0.02), and volatile (e.g., stock prices, decay rate 0.1). Effective confidence is base_confidence * exp(-decay_rate * days_since_confirmed). Queries that match a high-confidence cached fact return in under 15 milliseconds for 0.1 credits. This is not response caching; it is a structured, domain-level knowledge base that compounds across every customer's query volume.
Every fetched page is attributed to a domain. Every fact carries evidence from specific source pages with stance markers (supports, contradicts, neutral) and an NLI score. A nightly River batch job computes consensus_ratio = facts_confirmed_by_others / (facts_confirmed_by_others + facts_contradicted_by_others) per domain, assigns a trust tier (authoritative, reliable, mixed, unreliable, adversarial, unknown), and flags suspicious Autonomous System Number (ASN) clusters where three or more newly registered domains publish coordinated content from the same hosting provider. Reranking boosts authoritative domains and penalizes adversarial ones. Unknown domains receive no boost or penalty so the system does not punish legitimate new sites.
Customers upload PDFs, Microsoft Word documents, Markdown, HTML, Comma Separated Values files, and JavaScript Object Notation payloads through a drag-and-drop uploader, an Amazon Simple Storage Service sync connector, a Google Cloud Storage sync connector, or a direct API push. Documents are parsed by a separate document-parser sidecar (avoiding the Affero General Public License on PyMuPDF by using pypdfium2 plus pdfplumber for PDFs and python-docx plus mammoth for Word files), chunked, embedded via bge-small-en-v1.5, and stored with row-level security enforced by set_config('app.current_tenant', $1, true). Client-domain results appear inline with public-web results labeled [PRIVATE]. Customer A's documents are invisible to Customer B at the database tier.
Layer 1 (Go, pre-extraction, at internal/firewall/strip.go + patterns.go): strip 13 documented patterns before Trafilatura sees the HTML: unicode_tag_chars (U+E0000 through U+E007F), zero_width, bidi_override, html_comment, css_display_none, css_visibility_hidden, css_font_size_zero, css_opacity_zero, css_text_indent_offscreen, css_position_offscreen, meta_injection, aria_hidden (gated by DEEPTAP_STRIP_ARIA_HIDDEN), and script_style. Layer 2 (Python sidecar, post-extraction, via the ScoreInjection gRPC RPC at internal/firewall/sidecar_scorer.go): score every extracted document with meta-llama/Prompt-Guard-2-86M and return a numeric score plus heuristic reasons; a NoopScorer bootstrap fallback at noop_scorer.go returns zero so operators can ship without standing up the sidecar. Layer 3 (response mutator at internal/firewall/safe_mode.go): when the request carries safe_mode: "agent", null untrusted_content on every result at the HTTP boundary and stamp safe_mode_applied: "agent" on the envelope. The envelope surfaces prompt_injection_score, unsafe_reasons[], sanitized_content_bytes, untrusted_content_bytes, and a firewall block carrying layer1_stripped_bytes, layer1_patterns_matched[], layer2_model, layer2_implementation, layer2_latency_ms, layer2_docs_scored. Every per-result object carries sanitized_content, untrusted_content, trusted_snippet, prompt_injection_score, and unsafe_reasons. Security teams can audit what was flagged, which patterns fired, which model ran, how long it took, and what it scored. The research is explicit that detectors are signals, not gates (Nasr et al., October 2025, joint OpenAI/Anthropic/Google study: every published detector bypassed above 90% under adaptive attack); the security guarantee DeepTap ships is transparency. Tavily's firewall returns none of these fields; DeepTap tells you what it saw so your agent can decide what to trust.
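Two of the Layer 1 patterns are easy to demonstrate in isolation. A sketch covering only unicode_tag_chars and zero_width (the real internal/firewall/strip.go also handles the CSS and HTML vectors listed above):

```go
package main

import (
	"fmt"
	"strings"
)

// stripInvisible removes Unicode tag characters (an invisible mirror of
// ASCII used to smuggle instructions) and common zero-width characters.
func stripInvisible(s string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 0xE0000 && r <= 0xE007F: // unicode_tag_chars
			return -1
		case r == 0x200B || r == 0x200C || r == 0x200D || r == 0x2060 || r == 0xFEFF: // zero_width
			return -1
		}
		return r
	}, s)
}

func main() {
	// Invisible payload characters vanish before any extractor or model sees the text.
	poisoned := "normal text" + string(rune(0xE0069)) + string(rune(0xE0067)) + "\u200Bmore text"
	fmt.Printf("%q\n", stripInvisible(poisoned)) // "normal textmore text"
}
```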
No account required. An agent sends a request without authentication, receives a 402 Payment Required response with a payment challenge, settles a United States Dollar Coin (USDC) transaction on Base mainnet at address 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913 (or Base Sepolia for testing at 0x036CbD53842c5426634e7929541eC2318f3dCF7e), and retries the request with the payment signature in the X-PAYMENT-SIGNATURE header. The Coinbase CDP facilitator verifies and settles. Merchant Payments Protocol (MPP) layers on top: its charge scheme is wire-compatible with x402 exact, and a Go server using coinbase/x402/go automatically accepts MPP charge traffic through the Authorization: Payment header. MPP sessions add DPoP-bound session credentials for streaming micropayments in long agent loops.
DeepTap ships deeptap-mcp, a binary that serves MCP over stdio for Claude Desktop and Claude Code and Cursor and over Streamable HyperText Transfer Protocol (HTTP) for remote authenticated agents. Three tools are exposed: deeptap_search, deeptap_extract, deeptap_facts. The deeptap_extract tool carries the same attestation gate as the REST handler: mode=full requires an attestation and is rejected at the MCP layer before any REST call runs. The binary accepts --transport stdio|http; the HTTP transport mounts at /mcp + /mcp/ with /healthz served on a separate path so probes do not hit the authenticated route, and requires an Mcp-Api-Key header enforced by APIKeyMiddleware. Server-Sent Events transport is deprecated in MCP specification 2025-11-25 and is rejected explicitly. Copy-paste configuration snippets live in docs/integrations/claude-desktop.md, docs/integrations/claude-code.md, and docs/integrations/cursor.md. On the A2A side, DeepTap uses a2a-go/v2 v2.0.1 against the v1.0 specification (breaking from v0.3: new .well-known/agent-card.json path, new TASK_STATE_* enums, new google.rpc.Status error shape) to publish an agent card and serve task create, task status, and task stream endpoints.
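For orientation, a Claude Desktop entry for the stdio transport might look like the following. This is an illustrative sketch, not the shipped snippet: the server key name and the DEEPTAP_API_KEY env var are assumptions, and the authoritative copy-paste versions live in docs/integrations/claude-desktop.md.

```json
{
  "mcpServers": {
    "deeptap": {
      "command": "deeptap-mcp",
      "args": ["--transport", "stdio"],
      "env": { "DEEPTAP_API_KEY": "dt_live_xxx" }
    }
  }
}
```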
cmd/deeptap-tavily-shim/ listens on port 8082 and accepts Tavily-wire POST /search, POST /extract, and POST /map. A Tavily customer points their SDK's base URL at tavily.deeptap.ai (or the locally-deployed shim) and receives responses projected back into Tavily's exact field names and units: response_time as a decimal number, follow_up_questions always null, images passthrough when requested, and usage.credits on every response. Every field-level translation lands in an X-DeepTap-Compat-Notes response header so integration teams can audit exactly what was adapted. Callers that send X-DeepTap-Compat-Mode: strict receive 409 Conflict with a JSON body explaining which notes would have fired instead of a silently-translated 200 response. An embedded 255-entry country lookup resolves Tavily's country field for 195 ISO-3166 alpha-2 codes plus 60 common aliases (usa, uk). The shim calls the DeepTap REST API at DEEPTAP_INTERNAL_URL with DEEPTAP_SHIM_KEY as its bearer so the shim is accountable separately in usage_ledger.
POST /v1/map discovers every URL on a starting domain. A two-phase orchestrator at internal/mapsvc/ runs sitemap discovery first (reading robots.txt Sitemap: directives, then root sitemap.xml) and escalates to a bounded HTML crawl when sitemap yield is below DEEPTAP_MAP_HTML_MIN_YIELD=10. The response returns results[] plus a sources breakdown (robots_sitemap_urls, sitemap_urls, html_crawl_urls, pages_fetched) plus dropped plus truncated plus credits_used, so callers see exactly how each URL landed in the response. Configurable max_depth, max_breadth, limit, allow_external, include_subdomains, and exclude[] parameters bound the crawl cost precisely. Credit pricing matches search: 1.0 safe, 0.5 fast, one usage_ledger row per request.
The local-node release is a distroless Go binary plus Postgres 17 with pgvector 0.8.2 plus a lightweight Python embedding sidecar, packaged as Docker Compose and Helm. Mutual TLS with a per-customer self-signed Certificate Authority is provisioned during setup. Public facts synchronize from cloud to local node via HTTP with Hash-based Message Authentication Code-signed cursors and idempotency keys (around 10,000 facts per day). Customer embeddings and fact metadata synchronize local to cloud, opt-in and defaulted off for European Union customers. Dual-key Ed25519 envelope signing binds requests to a customer-specific key. Offline mode keeps local fact lookup, pgvector semantic search, and embedding operational; public search and language-model calls fail with an RFC 7807 Problem+JavaScript Object Notation error that the agent can reason about. Telemetry is pushed over OpenTelemetry Protocol on HyperText Transfer Protocol Secure port 443 only, because enterprise networks will not open the OpenTelemetry default port.
The managed-bootstrap deployment runs on Fly.io Machines with scale-to-zero, Neon Postgres with scale-to-zero, Upstash Redis pay-per-request, Upstash Kafka pay-per-message, ClickHouse Cloud serverless, Vercel for the dashboard, and Cloud Run for Trafilatura extraction. Inference is outsourced to Cohere Rerank and Voyage Embed behind the Go Reranker and Embedder interfaces. Cost at zero traffic is approximately 10 cents per month. Cost at 1,000 requests per day is approximately 38 US dollars per month. Gross margin from the first paying customer is 87% or higher. Transition to self-hosted sidecar happens when volume exceeds 50,000 requests per day, at which point self-hosting wins on unit economics.
DeepTap is fully built. All 28 sessions are complete. Phase 1 (Core Search Platform, S01-S18), Phase 2 (Fact Cache, S19-S22), and Phase 3 (Knowledge Layer, S23-S28) all shipped: Foundation, Data Layer + Append-Only Ledger, Search Provider Adapters, Fan-Out + Dedup + URL Normalization, Go-Owned Fetch + Trafilatura Sidecar, Playwright Pool + Domain Strategy Cache, Query Decomposition + LLM Policy, Reranking + Embeddings via Python Sidecar, Prompt Injection Firewall, Depth Modes + Facet Ledger, Caching + Freshness TTLs, Rate Limiting + Concurrency, MCP Server + Tavily Shim, A2A + Payment Middleware with Idempotency, TrustPlane Integration with Fallback, Billing Engine, Postmark Email + DMCA + Dashboard UI, Fact Data Model + Extraction, Staleness Model + Re-Verification, Fact Cache Query Integration, Fact Cache Analytics + Warming, Semantic Source Index + pgvector, Consensus Trust Scoring, Rapid Fact API, Client Domain Indexing, Local Node MVP, and Knowledge Layer Analytics. Twenty-eight session specs are frozen, and the bootstrap-hosting plan is frozen. Items in {braces} become real once the referenced session lands.
# Clone and boot the full dev stack
git clone https://github.com/RelayOne/deeptap
cd deeptap
make dev # default: boots deeptap, nlp-sidecar, playwright-pool, postgres,
# redis, clickhouse, prometheus, grafana, otel-collector
# (River is the default event bus, no Kafka required)
make dev-kafka # opt-in: same as make dev plus Redpanda (Kafka profile)
# sets DEEPTAP_EVENT_BUS=kafka
# Apply database migrations (requires golang-migrate installed via
# `go install -tags postgres github.com/golang-migrate/migrate/v4/cmd/migrate@latest`)
make migrate-up # applies all 12 migrations (extensions, orgs, api_keys,
# append-only usage_ledger with 24 monthly partitions pre-created,
# facts, fact_evidence, source_pages, domain_profiles,
# client_domains with row-level security, dmca_requests,
# accounts + journal_entries with deferred balanced-sum trigger,
# seed chart-of-accounts)
# Verify health
curl http://localhost:8080/v1/health # returns {"status": "ok"}
curl http://localhost:8080/v1/ready # default: returns {"status": "ready"} when
# postgres + redis + river probes pass
# kafka mode: also probes the Kafka broker
curl http://localhost:8080/metrics # Prometheus exposition format
# Run your first search
curl -X POST http://localhost:8080/v1/search \
-H "Authorization: Bearer dt_live_xxx" \
-H "Content-Type: application/json" \
-d '{"query": "Tavily acquisition", "depth": 1}'

Authentication modes at a glance (all five are mounted on the same /v1/* surface; the middleware stack branches on the header type):
- Authorization: Bearer dt_live_xxx for API-key customers billed through Stripe
- X-PAYMENT-SIGNATURE: <base64 EIP-3009 payload> for x402 agent micropayments
- Authorization: Payment <MPP charge token> for MPP charge traffic, backward-compatible with x402
- Authorization: Payment <DPoP-bound session token> for MPP streaming sessions
- X-TrustPlane-Credential: <SPIFFE SVID> for portfolio and enterprise TrustPlane verification
```
┌─────────────────────────────────────────┐
│                 CLIENTS                 │
│  SDK (TS/Py/Go) · MCP · A2A · x402/MPP  │
└────────────────────┬────────────────────┘
                     │
┌────────────────────▼────────────────────┐
│           GO API SERVER (chi)           │
│ auth · rate limit · firewall · billing  │
│ depth orchestration · caching · ledger  │
└───┬──────────┬──────────┬──────────┬────┘
    │          │          │          │
┌───▼─────┐ ┌──▼─────┐ ┌──▼─────┐ ┌──▼─────────┐
│Brave API│ │ Serper │ │ OpenRtr│ │ Python NLP │
│ (safe)  │ │ (fast) │ │ (LLM)  │ │  Sidecar   │
└────┬────┘ └───┬────┘ └───┬────┘ │ gRPC:50051 │
     │          │          │      │   7 RPCs   │
     │          │          │      └────────────┘
┌────▼──────────▼──────────▼───────────────┐
│                 POSTGRES                 │
│    usage_ledger · facts · source_pages   │
│     domain_profiles · api_keys · orgs    │
│       pgvector (384d halfvec HNSW)       │
└────┬────────────┬──────────────┬─────────┘
     │            │              │
┌────▼──────┐ ┌───▼─────────┐ ┌──▼─────────┐
│   Redis   │ │ River/Kafka │ │ ClickHouse │
│ cache+rate│ │   events    │ │  analytics │
└───────────┘ └─────────────┘ └────────────┘
```
The Go API server is a pure Go binary: no C Foreign Function Interface (CGO), no embedded models, no native libraries. All machine-learning inference routes through the Python NLP sidecar via gRPC on localhost port 50051. The sidecar is a single Python 3.11 process that exposes seven remote procedure calls: Parse (Trafilatura 2.0+), BatchParse, Rerank, Embed, VerifyClaim, LinkEntities (server-streamed), and ScoreInjection. ThreadPoolExecutor-backed gRPC server parallelism is real because ONNX Runtime and PyTorch both release the Global Interpreter Lock in their C++ kernels. Minimum sidecar instance size is c7i.2xlarge (16 GiB of RAM).
Postgres 17 with pgvector 0.8.2 is the canonical data store and the authoritative billing source. The usage_ledger table is append-only, partitioned by month, and protected by triggers that reject UPDATE and DELETE. ClickHouse handles analytics materialized views, never billing. Redis 7 serves the hot cache, the Generic Cell Rate Algorithm (GCRA) rate limiter via a Lua script, cross-instance cache invalidation pub/sub, and idempotency locks. Redpanda (Kafka-compatible) carries usage events in CloudEvents format with idempotent producer semantics. River (Postgres-backed job queue) handles scheduled jobs: fact re-verification, trust-score batches, Wikidata incremental sync, domain-profile aggregation.
For the full technical deep-dive including middleware ordering, data models, protocol details, and every non-obvious decision with its rationale, see docs/ARCHITECTURE.md.
| Operation | Safe (Brave) | Fast (Serper) |
|---|---|---|
| depth=1 search | 1 credit | 0.5 credit |
| depth=2 search | 3 credits | 1.5 credits |
| depth=3 search | 8 credits | 4 credits |
| fact lookup | 0.1 credit | 0.1 credit |
| extract excerpt | 0.5 credit | 0.5 credit |
| extract full (attested) | 2 credits | 2 credits |
| map | 1 credit | 0.5 credit |
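The depth-mode rows of the rate card above collapse into one pricing function, since fast-class pricing is exactly half of safe-class at every depth. A sketch with an illustrative function name, assuming only the numbers in the table:

```go
package main

import "fmt"

// creditCost mirrors the fixed rate card above: safe-class search credits
// per depth, with the fast provider class at half price. The function name
// is illustrative, not the shipped billing code.
func creditCost(depth int, providerClass string) float64 {
	base := map[int]float64{1: 1, 2: 3, 3: 8} // safe-class credits per depth
	c, ok := base[depth]
	if !ok {
		return 0 // unknown depth; real handler rejects with 400
	}
	if providerClass == "fast" {
		return c / 2
	}
	return c
}

func main() {
	fmt.Println(creditCost(3, "fast")) // depth=3 on the fast provider
}
```

Fixed credits per (depth, provider class) is the whole point of the pricing model: an agent can compute its spend before issuing the call, with no context-size multipliers.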
| Tier | Monthly | Credits | Overage / credit | Calls per second | Concurrent |
|---|---|---|---|---|---|
| Free | $0 | 500 | n/a | 10 | 5 |
| Starter | $30 | 4,000 | $0.010 | 50 | 20 |
| Growth | $200 | 30,000 | $0.008 | 200 | 50 |
| Scale | $1,000 | 200,000 | $0.006 | 500 | 100 |
| Secure | from $2,500 | custom | $0.020 | custom | custom |
| Enterprise | custom | custom | custom | custom | custom |
| Operation | x402 price |
|---|---|
| depth=1 safe | $0.008 |
| depth=1 fast | $0.002 |
| depth=2 safe | $0.024 |
| depth=3 safe | $0.064 |
| fact lookup | $0.001 |
The Secure tier adds Bring Your Own Key language-model routing, region pinning, fact-cache opt-out (reads and writes), stricter Zero Data Retention defaults, and a dedicated Customer Success contact. Portfolio tier uses a 40%-of-retail wholesale rate card with monthly invoicing to the internal double-entry ledger; portfolio companies that resell DeepTap handle their own end-customer billing off-platform.
DeepTap is Building. All 28 sessions are complete on their build branches, and all three product phases (Phase 1 Core Search Platform, Phase 2 Fact Cache, Phase 3 Knowledge Layer) are complete; the full session list appears at the top of this document. Twenty-eight session specs are written and frozen. Thirty-two research artifacts have been completed and integrated. Two adversarial review passes (one technical, one security and legal) have been incorporated. River is the default event bus per specs/ADDENDUM-river-default.md; Kafka is an opt-in implementation of the same EventBus interface selected by DEEPTAP_EVENT_BUS=kafka. The bootstrap-hosting plan is frozen.
- Scoping across the full 18-to-20-month roadmap
- Master specification at `/specs/WORK.md` (1,257 lines)
- Frozen Statement of Work at `/specs/deeptap-sow-combined.md` (3,715 lines)
- Bootstrap-hosting plan at `/specs/deeptap-bootstrap-hosting.md`
- Thirty-two research artifacts in `/specs/research/raw/`
- All key technology versions pinned (Go 1.25+, pgvector 0.8.2, chi v5.2.5, pgx v5.9.1, stripe-go v85, grpc-go v1.80.0 with CVE-2026-33186 patched)
- Monorepo structure, Go workspace (`go.work`)
- chi router v5.2.5 with verified 7-layer middleware stack (RequestID, OTel, slog, Recoverer, Prometheus, CORS, Auth)
- Health endpoints (`/v1/health`, `/v1/ready`, `/metrics`)
- Prometheus metrics with route-pattern labels (`deeptap_http_requests_total`, `_request_duration_seconds`, `_requests_inflight`, `_panics_total`)
- OTel tracing with OTLP or stdout exporter
- Structured logging via `log/slog`
- 7-RPC NLP sidecar gRPC proto + Python stub server with Health service reporting SERVING
- Docker Compose dev stack
- GitHub Actions CI (lint, test-go with 80% coverage gate, test-sidecar, build, docker-smoke)
- Four Go binaries compile (`deeptap`, `deeptap-mcp`, `deeptap-cli`, `deeptap-tavily-shim`)
- Coverage above 80% on every foundation package (config 100%, health 100%, logging 100%, version 100%, middleware 98.9%, server 93.8%, tracing 90%)
- End-to-end verified: compose stack boots, curl `/v1/health` returns 200, `/metrics` exposes `deeptap_http_requests_total`, sidecar Health RPC returns SERVING
- 12 `golang-migrate` migrations: extensions (vector 0.8.2, pgcrypto, btree_gin), organizations, api_keys, usage_ledger, facts, fact_evidence, source_pages, domain_profiles, client_domains, dmca_requests, accounts and journal_entries, seed chart-of-accounts
- Append-only `usage_ledger` partitioned by month with 24 monthly partitions pre-created; `BEFORE UPDATE` and `BEFORE DELETE` triggers reject modification at the database tier
- Double-entry `accounts` + `journal_entries` tables with a `DEFERRABLE INITIALLY DEFERRED CONSTRAINT TRIGGER` that enforces `SUM(amount_cents) = 0` per `txn_id` at commit time
- sqlc v1.30.0 generates typed Go code in `internal/db/deeptapdb/` for every `.sql` query file; pgx v5 driver package
- pgxpool wrapper (`internal/db/pgx.go`) with `MaxConns = max(4, runtime.NumCPU())`, `MinConns=2`, 30s HealthCheckPeriod, 1h MaxConnLifetime, pgvector type registration on `AfterConnect`, and statement-mode switch (`cache_statement` by default, `cache_describe` when `DEEPTAP_POSTGRES_PGBOUNCER=true`)
- go-redis v9.18.0 client with 10 pool size, 2 min idle connections, 5s dial timeout, 500ms read/write timeouts, and a `Ping(ctx)` probe
- `internal/eventbus/` package with a single `Publisher`/`Subscriber` interface implemented by `RiverBus` (default, Postgres-backed, supports transactional `PublishTx`) and `KafkaBus` (opt-in, franz-go v1.20.7 + outbox)
- LISTEN/NOTIFY cache-invalidation listener on a dedicated `pgx.Conn` (not the pool) with exponential-backoff reconnect on the `deeptap_cache` channel
- `/v1/ready` runs Postgres, Redis, and the active event-bus probe in parallel via `errgroup.WithContext` bounded to a 3-second timeout; Kafka probe is gated behind `DEEPTAP_EVENT_BUS=kafka`
- Docker Compose default profile drops Redpanda; `make dev` boots 9 services (deeptap, nlp-sidecar, playwright-pool, postgres `pgvector/pgvector:0.8.2-pg17`, redis 7-alpine, clickhouse, prometheus, grafana, otel-collector); `make dev-kafka` activates the `kafka` compose profile and adds Redpanda as a tenth service
- 12 integration tests across `test/integration/{db,redis,eventbus}_test.go` using testcontainers-go against `pgvector/pgvector:0.8.2-pg17`, `redis:7-alpine`, and (for the Kafka profile) `redpandadata/redpanda:latest`; covers migrations apply, pgvector 0.8.2 present, append-only UPDATE/DELETE rejected, accounts seeded, journal balanced-sum trigger, Redis Ping/SetGet/PubSub, River Publish enqueues a job, River PublishTx rollback removes it, PublishTx commit persists it, empty event type rejected
- End-to-end verification: `make migrate-up` applies all 12 migrations cleanly; `/v1/health` returns 200; `/v1/ready` returns 200 with postgres + redis + river probes green; `DEEPTAP_EVENT_BUS=kafka` boot path verified against the `kafka` profile
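The balanced-sum invariant the deferred constraint trigger enforces is simple to state in code: within one `txn_id`, journal entries must net to zero. This in-memory check is illustrative only (Postgres enforces the real invariant at commit time; the type and function names are assumptions):

```go
package main

import "fmt"

// entry models one journal_entries row: a signed amount attributed to a
// transaction. Double-entry bookkeeping requires each txn_id to balance.
type entry struct {
	txnID       string
	amountCents int64
}

// journalBalanced mirrors the SUM(amount_cents) = 0 per txn_id rule that
// the deferred constraint trigger checks at commit.
func journalBalanced(entries []entry) bool {
	sums := map[string]int64{}
	for _, e := range entries {
		sums[e.txnID] += e.amountCents
	}
	for _, s := range sums {
		if s != 0 {
			return false
		}
	}
	return true
}

func main() {
	// A charge: debit the customer account, credit revenue.
	txn := []entry{{"t1", -800}, {"t1", 800}}
	fmt.Println(journalBalanced(txn)) // true
}
```

Deferring the trigger to commit time is what lets a transaction insert the debit and credit rows one at a time without tripping the check between inserts.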
- `SearchProvider` interface plus a typed `Registry` at `internal/search/` that picks the safe adapter (Brave) for `provider_class=safe` and the fast adapter (Serper) for `provider_class=fast`; returns `ErrProviderUnavailable` when an adapter is not configured
- Brave adapter hitting `https://api.search.brave.com/res/v1/web/search` with `X-Subscription-Token`, `Accept: application/json`, `Cache-Control: no-cache`; gobreaker/v2 circuit breaker trips on 5 consecutive failures; 2 retries on 429/5xx with full-jitter backoff and `Retry-After` respected
- Serper adapter hitting `https://google.serper.dev/search` via POST with `X-API-KEY`; gobreaker/v2 trips on 6 consecutive failures (distinct threshold from Brave); same retry policy
- URL `Normalize` plus `Dedupe` at `internal/search/normalize.go`: lowercase scheme/host, strip default ports, collapse repeated slashes, alpha-sort query params, strip fragment, drop tracking params (`utm_*`, `gclid`, `fbclid`, `mc_cid`, `mc_eid`, `msclkid`, `ref`, `ref_src`), prefer https when an http/https pair collapses to the same `(host, path, query)`
- API-key middleware at `internal/middleware/apikey.go` reads `Authorization: Bearer dt_live_*`, SHA-256 hashes the full token, resolves the row via sqlc `GetAPIKeyWithOrg`, and attaches `{orgID, apiKeyID, providerClassFromKey, providerClassFromOrg, providerAckAt}` to the request context
- `POST /v1/search` handler with `depth=1` only: validates body (query 1..1024 chars, `depth == 1`, otherwise 400 `unsupported_depth`), resolves `provider_class` per body -> key -> org, returns 403 `fast_provider_not_acknowledged` when `fast` is requested and `organizations.provider_ack_at IS NULL`, returns 503 `provider_unavailable` when the selected adapter has no configured API key
- Credit pricing in the handler: depth=1 safe = 1.0 credit, depth=1 fast = 0.5 credit; both write an append-only row to `usage_ledger` via `ledger.Append` (unique `request_id` enforces idempotency at the database tier)
- Integration tests at `test/integration/search_handler_test.go` covering the four business paths (safe happy path with Brave mock, fast-without-ack 403, fast-with-ack Serper at 0.5 credits with `usage_ledger` row asserted, missing auth 401); 14 unit-test packages green; live compose smoke verified for missing auth, unsupported depth, and missing Brave key paths per `audit/s03-e2e-verification-2026-04-22.md`
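The normalization rules above are what make cross-provider dedup work: two providers returning the same page with different tracking baggage must collapse to one key. A cut-down sketch using the stdlib (`normalizeURL` is an illustrative name; the shipped `normalize.go` also strips default ports and collapses repeated slashes, omitted here for brevity):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// tracking params dropped outright, per the list above; utm_* is handled
// as a prefix match.
var tracking = map[string]bool{
	"gclid": true, "fbclid": true, "mc_cid": true,
	"mc_eid": true, "msclkid": true, "ref": true, "ref_src": true,
}

func normalizeURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	u.Scheme = strings.ToLower(u.Scheme)
	u.Host = strings.ToLower(u.Host)
	u.Fragment = "" // strip fragment
	q := u.Query()
	for k := range q {
		if tracking[k] || strings.HasPrefix(k, "utm_") {
			q.Del(k)
		}
	}
	u.RawQuery = q.Encode() // url.Values.Encode alpha-sorts keys
	return u.String(), nil
}

func main() {
	s, _ := normalizeURL("HTTPS://Example.COM/a?utm_source=x&b=2&a=1#frag")
	fmt.Println(s) // https://example.com/a?a=1&b=2
}
```

Alpha-sorting the surviving query params is the step that makes the key order-insensitive, so `?a=1&b=2` and `?b=2&a=1` dedupe to the same result.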
- `POST /v1/search` accepts a `sub_queries` array (up to `DEEPTAP_FANOUT_MAX_SUBS`, default 8) alongside the primary query; empty or missing falls back to single-query behaviour
- `search.Fanout` runs sub-queries in parallel against the resolved provider adapter with a semaphore-bounded concurrency limit (`DEEPTAP_FANOUT_MAX_INFLIGHT`, default 4), a per-call timeout (`DEEPTAP_FANOUT_PER_CALL_TIMEOUT`, default 5s), and an overall request deadline (`DEEPTAP_FANOUT_OVERALL_TIMEOUT`, default 10s)
- `search.Dedupe` merges results across sub-queries using first-seen title and snippet, maximum observed score, and the existing URL `Normalize` collapse (lowercase scheme/host, tracking-param strip, http/https collapse)
- One `usage_ledger` row per user request regardless of sub-query count; idempotency preserved via the existing `request_id` unique constraint
- Partial failures (some sub-queries error or time out, at least one succeeds) surface as `warnings[]` in the 200 response with the failing sub-query index and error class; full failure returns 504 `upstream_timeout` when the request deadline is exceeded and 502 `upstream_error` when every sub-query fails with a non-timeout error
- Integration tests at `test/integration/search_fanout_test.go` cover parallel fan-out happy path, dedup across overlapping sub-queries, per-call timeout surfacing as warning, overall-deadline 504, all-fail 502; unit tests across `internal/search/` packages green
- Go-side fetch client at `internal/fetch/` with User-Agent `DeepTapBot/1.0 (+https://deeptap.ai/bot)`, redirect cap 5, size cap 10 MiB, and a content-type gate that admits `text/*`, `application/xhtml+xml`, and the `application/*xml` family
- In-process `sync.Map` robots.txt cache with 1-hour positive TTL and 5-minute negative TTL; per-domain probe order walks AI-specific tokens first (`DeepTapBot`, `GPTBot`, `ClaudeBot`, `Claude-SearchBot`, `anthropic-ai`) before falling back to the `*` user-agent rules
- `Extractor` interface at `internal/extract/` with two implementations: `SidecarExtractor` (gRPC to the Python Trafilatura sidecar at `DEEPTAP_SIDECAR_ADDR`) and `CloudRunExtractor` (HTTPS to a Cloud Run Trafilatura function) behind a factory keyed on `DEEPTAP_EXTRACTOR_BACKEND` (`sidecar`, `cloudrun`, or `auto`)
- `POST /v1/extract` handler fans out per URL with robots -> fetch -> extract -> optional source_pages upsert, bounded by `DEEPTAP_EXTRACT_MAX_INFLIGHT` (default 4), `DEEPTAP_EXTRACT_OVERALL_TIMEOUT` (default 20s), and `DEEPTAP_EXTRACT_MAX_URLS` (default 10); per-URL failures surface as `warnings[]`; all-robots-deny returns 422; all-timeout returns 504; `X-DeepTap-Attestation` header required for `mode=full`
- Exactly one `usage_ledger` row per request regardless of URL count; pricing is 0.5 credit per URL for `mode=excerpt` and 2.0 credits per URL for `mode=full`; `mode` defaults to `excerpt`
- Python sidecar now ships `trafilatura==2.0.0` and the real `Parse` RPC is wired on `services/nlp-sidecar/`
- Integration tests at `test/integration/extract_handler_test.go` cover excerpt happy path, attestation gate, robots-deny 422, timeout 504, per-URL warning surfacing, and ledger-row accounting across both extractor backends
- Node.js Playwright pool service at `services/playwright-pool/` (Fastify + Chromium via `mcr.microsoft.com/playwright:v1.49.0-jammy`) exposing `POST /render`, `GET /health`, and `GET /metrics`; shared-secret auth via the `X-Internal-Token` header rejects unauthenticated callers before any browser work
- Context pool with configurable size (default 4); overflow requests return 503 `pool_exhausted` instead of queueing unboundedly; per-render timeout and body-size caps match the Go fetch client
- Per-domain strategy cache in Postgres (`domain_strategies` table) with a rolling empty-rate counter over the last `DEEPTAP_STRATEGY_WINDOW` samples (default 50); flip-on to Tier 2 at 50% empties with a minimum of 5 samples (`DEEPTAP_STRATEGY_MIN_SAMPLES`), flip-off back to Tier 1 at 20%; empty defined as extracted content shorter than `DEEPTAP_STRATEGY_EMPTY_FLOOR_CHARS` (default 200)
- `/v1/extract` Tier 1/Tier 2 escalation wired in Go: Tier 1 runs the S05 fetch plus Trafilatura path (cheap); if the domain strategy says escalate (or Tier 1 returned an empty page), Tier 2 hits the Playwright pool for a rendered DOM and re-runs Trafilatura on the post-render HTML; successful Tier 2 extractions surface `js_rendered` in `warnings[]`; pool-down or pool-timeout surfaces `playwright_unavailable` and falls back to the Tier 1 result
- Bootstrap mode (`DEEPTAP_MODE=bootstrap`) forces `PlaywrightEnabled=false` at config load; no Tier 2 ever runs in bootstrap, the domain strategy cache records samples but never escalates, and the Cloud Run extractor path handles every URL
- New env vars: `DEEPTAP_PLAYWRIGHT_POOL_URL`, `DEEPTAP_PLAYWRIGHT_SHARED_SECRET` (REQUIRED in production), `DEEPTAP_PLAYWRIGHT_TIMEOUT_MS`, `DEEPTAP_PLAYWRIGHT_ENABLED`, `DEEPTAP_STRATEGY_EMPTY_FLOOR_CHARS`, `DEEPTAP_STRATEGY_FLIP_ON`, `DEEPTAP_STRATEGY_FLIP_OFF`, `DEEPTAP_STRATEGY_MIN_SAMPLES`, `DEEPTAP_STRATEGY_WINDOW`
- Unit tests green across `internal/extract/` escalation, `internal/strategy/` flip thresholds, and the Node pool handlers; the full end-to-end integration test (TASK-13) requires a Playwright testcontainer and is deferred
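The asymmetric flip-on/flip-off thresholds above (50% up, 20% down) are hysteresis: a domain must get markedly better before dropping back to Tier 1, which prevents oscillation around a single threshold. A hypothetical in-memory model of that rule, using the documented defaults (the type and method names are illustrative; the shipped cache lives in Postgres):

```go
package main

import "fmt"

// strategy tracks a rolling window of empty/non-empty extraction outcomes
// for one domain and flips between Tier 1 (plain fetch) and Tier 2
// (Playwright render) with hysteresis.
type strategy struct {
	window     []bool // true = empty extraction, newest last
	size       int    // rolling window size (doc default 50)
	minSamples int    // doc default 5
	tier2      bool
}

func (s *strategy) record(empty bool) {
	s.window = append(s.window, empty)
	if len(s.window) > s.size {
		s.window = s.window[1:]
	}
	if len(s.window) < s.minSamples {
		return // not enough evidence to flip either way
	}
	empties := 0
	for _, e := range s.window {
		if e {
			empties++
		}
	}
	rate := float64(empties) / float64(len(s.window))
	if !s.tier2 && rate >= 0.5 {
		s.tier2 = true // flip-on: domain needs JS rendering
	} else if s.tier2 && rate <= 0.2 {
		s.tier2 = false // flip-off: plain fetch works again
	}
}

func main() {
	s := &strategy{size: 50, minSamples: 5}
	for i := 0; i < 5; i++ {
		s.record(true) // five empty pages in a row
	}
	fmt.Println(s.tier2) // true
}
```

The `minSamples` floor stops one or two empty pages from dragging a whole domain onto the expensive Playwright path.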
- Hand-written stdlib OpenRouter chat-completions client at `internal/llm/openrouter/` with `Authorization: Bearer`, optional `HTTP-Referer` and `X-Title` attribution headers, pooled `net/http.Transport`, `gobreaker/v2` circuit breaker (trips on 5 consecutive failures or 60% failure ratio over 20 requests), exponential retry with jitter on 408, 429, 500, 502, 503, 504 via `APIError.Retryable()`, and `X-Generation-Id` response-header propagation for audit correlation
- Startup `GET /api/v1/key` ping wired in `cmd/deeptap/main.go`: 200 logs `openrouter key verified` and boots; 401 fails fast with `openrouter key rejected (401); misconfiguration, refusing to boot`; 402 logs a warning and continues (LLM-dependent paths return `decomposition_failed` until credits are topped up); 5xx or network error logs a warning and continues (the circuit breaker handles the next real call); missing `DEEPTAP_OPENROUTER_API_KEY` disables the decomposer and every `/v1/search` request runs the single-query path
- Freshness classifier at `internal/freshness/` producing four buckets (`volatile`, `daily`, `standard`, `stable`) via NFC normalization, 4-digit year extraction (historical vs. current/next-year signal), a volatile keyword union, structural patterns (price-or-score live, weather live, status live, breaking, is-still, future-event, current-role, stable-pattern), a daily keyword union, and a `standard` default; returns `(Class, reason)` where reason is a short machine-readable label
- Per-organization LLM policy loader at `internal/policy/` reading `organizations.llm_policy` JSONB into a typed `LLMPolicy` (`require_zdr`, `require_data_collection_deny`, `providers[]`, `models_allowed[]`, `max_tokens`, optional `temperature`); Secure-SKU clamp when `organizations.tier == "secure"` unconditionally sets `require_zdr=true`, `require_data_collection_deny=true`, and defaults `providers` to `["anthropic"]` when empty; malformed JSONB does not fail open (surfaces `policy_load_failed` in `warnings[]` and continues with the permissive baseline)
- `OpenRouterDecomposer` at `internal/decompose/` with JSON-schema structured output (`{subqueries: [{query, priority}]}`), NFC-lowered dedup on trimmed query string, priority-descending stable sort, and truncation to `DEEPTAP_DECOMPOSE_SUBQUERIES_D1` (default 2) for depth=1 or `DEEPTAP_DECOMPOSE_SUBQUERIES_D23` (default 6) for depth=2/3
- Model picker at `internal/decompose/picker.go`: depth=1 prefers `DEEPTAP_DECOMPOSE_MODEL_DEPTH1` (default `anthropic/claude-haiku-4.5`); depth=2/3 prefers `DEEPTAP_DECOMPOSE_MODEL_DEPTH23` (default `anthropic/claude-sonnet-4.6`); falls back to the first `anthropic/*` slug in `policy.ModelsAllowed` when the preferred slug is disallowed; returns `ErrLLMPolicyViolation` when no allowlisted anthropic model exists
- Secure-SKU ZDR triple materialises in the OpenRouter request body as `provider.zdr=true`, `provider.data_collection="deny"`, and `provider.order=<allowlist>` whenever the loaded policy requires any one of them; the three controls travel together, not separately
- `/v1/search` handler at `internal/api/search.go` wires freshness classification (always runs, never fails), policy load (failure -> `policy_load_failed` in warnings, continue on zero-value policy), decomposition under a 10-second timeout (failure -> `decomposition_failed` in warnings, fall back to single-query path), and sub-query fan-out via `search.Fanout`; caller-supplied `sub_queries` always wins over decomposer output; depth=2 and depth=3 are rejected 400 `unsupported_depth` until `DEEPTAP_ENABLE_DEPTH_GT1=true` unlocks them in S10
- Response envelope extended with `freshness_class`, `freshness_reason`, and nested `decomposition {model, provider, sub_queries, tokens_prompt, tokens_completion, cost_usd, latency_ms, generation_id}`; `decomposition` is omitted on the single-query path or when decomposition fails; depth-based credit multipliers (depth=2 = 3x, depth=3 = 8x) wired in the handler, gated until S10 activates deeper depths
- Prometheus metrics: `deeptap_decompose_requests_total{model, outcome}` where outcome is one of `ok`, `invalid_json`, `retry_succeeded`, `retry_failed`, `upstream_error`, `timeout`, `policy_violation`; `deeptap_decompose_duration_seconds{model}` histogram; `deeptap_decompose_subquery_count{depth}` histogram; `deeptap_decompose_tokens_total{model, kind}` counter for prompt and completion tokens; `deeptap_freshness_class_total{class}` counter
- OTEL span `llm.openrouter.chat_completion` with `SpanKindClient` and attributes `llm.model`, `llm.provider`, `llm.prompt_tokens`, `llm.completion_tokens`, `llm.zdr`, `llm.latency_ms`, `llm.generation_id`; errors recorded on the span via `RecordError` + `SetStatus(codes.Error)`
- New env vars: `DEEPTAP_OPENROUTER_API_KEY` (required for decomposition), `DEEPTAP_OPENROUTER_BASE_URL` (default `https://openrouter.ai/api/v1`), `DEEPTAP_OPENROUTER_REFERER` + `DEEPTAP_OPENROUTER_TITLE` (optional attribution headers), `DEEPTAP_OPENROUTER_TIMEOUT` (default `30s`), `DEEPTAP_DECOMPOSE_MODEL_DEPTH1` (default `anthropic/claude-haiku-4.5`), `DEEPTAP_DECOMPOSE_MODEL_DEPTH23` (default `anthropic/claude-sonnet-4.6`), `DEEPTAP_DECOMPOSE_SUBQUERIES_D1` (default 2), `DEEPTAP_DECOMPOSE_SUBQUERIES_D23` (default 6), `DEEPTAP_ENABLE_DEPTH_GT1` (default `false`)
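The freshness classifier's shape (keyword unions checked in priority order, with `standard` as the fall-through) can be sketched in a few lines. This is a toy reduction: the keyword lists below are illustrative, not the shipped unions, and the real `internal/freshness/` also does NFC normalization, year extraction, and the structural patterns listed above:

```go
package main

import (
	"fmt"
	"strings"
)

// classifyFreshness returns (class, reason) like the real classifier, with
// reason as a short machine-readable label. Keyword lists here are
// illustrative assumptions, not the shipped ones.
func classifyFreshness(query string) (class, reason string) {
	q := strings.ToLower(query)
	for _, kw := range []string{"price", "score", "stock", "weather", "breaking"} {
		if strings.Contains(q, kw) {
			return "volatile", "kw:" + kw
		}
	}
	for _, kw := range []string{"today", "news", "schedule"} {
		if strings.Contains(q, kw) {
			return "daily", "kw:" + kw
		}
	}
	for _, kw := range []string{"history of", "definition of", "what is"} {
		if strings.Contains(q, kw) {
			return "stable", "kw:" + kw
		}
	}
	return "standard", "default"
}

func main() {
	c, r := classifyFreshness("BTC price right now")
	fmt.Println(c, r) // volatile kw:price
}
```

The class is more than response metadata: S11 maps it straight to a cache TTL, so a misclassified `volatile` query would be served stale for hours.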
- Python sidecar `services/nlp-sidecar/rerank.py` loads `ms-marco-MiniLM-L6-v2` INT8 ONNX and serves the gRPC `Rerank` RPC as a cross-encoder that scores every `(query, document)` pair; `services/nlp-sidecar/embed.py` loads `bge-small-en-v1.5` ONNX and serves the gRPC `Embed` RPC producing L2-normalized 384-dim float32 vectors; both are mounted on `server.py`, and a missing model at boot logs the absence and mounts a stub that returns `UNIMPLEMENTED` at RPC time rather than crashing the sidecar
- Go `internal/rerank/` package with a `Reranker` interface (`Rerank`, `Healthz`), `SidecarReranker` gRPC adapter against `DEEPTAP_SIDECAR_ADDR`, `CohereReranker` HTTPS `POST /v2/rerank` adapter against `DEEPTAP_COHERE_API_KEY` with one retry on 429 (honoring `Retry-After`) and a `sync.Once` bootstrap-warn on first use, and a mode-keyed factory that picks sidecar in production and Cohere in bootstrap; both adapters are wrapped in `gobreaker/v2` with `MaxRequests=3`, `Interval=60s`, `Timeout=30s`, `ReadyToTrip` at 5 consecutive failures
- Go `internal/embed/` package with an `Embedder` interface (`Embed`, `Healthz`) exposing `ModeQuery` and `ModePassage`, `SidecarEmbedder` gRPC adapter, `VoyageEmbedder` HTTPS `POST /v1/embeddings` adapter against `DEEPTAP_VOYAGE_API_KEY` that sends `output_dimension=384` on every request and errors when the response dimensionality is not exactly 384, and a mode-keyed factory
- `POST /v1/search` rerank step between `search.Dedupe` merge and response write: caps at `DEEPTAP_RERANK_MAX_DOCS` (default 30), truncates snippets to `DEEPTAP_RERANK_MAX_TEXT_CHARS` (default 1024), runs under `DEEPTAP_RERANK_TIMEOUT` (default 1s), reorders results to the reranker's score order, stamps each surviving result with its `rerank_score`, and attaches a `reranker {model, implementation, latency_ms, docs_scored, error}` block to the response envelope; disable/nil/error skips silently and still returns 200
- `POST /v1/extract` embed step after each successful per-URL extraction: runs under `DEEPTAP_EMBED_TIMEOUT` (default 500ms), writes the 384-dim pgvector into `source_pages.embedding` via the sqlc `UpdateSourcePageEmbedding` query, and is non-fatal on error
- sqlc query `UpdateSourcePageEmbedding` plus a sqlc.yaml `overrides` entry mapping the `vector` column type to `github.com/pgvector/pgvector-go.Vector` so `pgvector.NewVector([]float32{...})` round-trips cleanly through pgx
- Prometheus metrics at `internal/metrics/nlp.go`: `deeptap_rerank_requests_total{implementation, outcome}`, `deeptap_rerank_duration_seconds{implementation}`, `deeptap_rerank_docs_scored{implementation}`, `deeptap_rerank_failures_total{implementation, outcome}`, and the matching `deeptap_embed_{requests,duration,failures}_total` family
- `/v1/ready` probes add `reranker.Healthz` and `embedder.Healthz` under a 1-second timeout each in production mode; bootstrap mode relies on HTTPS reachability at call time instead
- `docker-compose` pins `nlp-sidecar` to `mem_limit: 1g` and adds a grpc-health healthcheck; the `deeptap` service now declares `depends_on: nlp-sidecar: {condition: service_healthy}` so the Go API will not start until the sidecar passes its healthcheck
- New env vars: `DEEPTAP_ENABLE_RERANK` (default `true`), `DEEPTAP_ENABLE_EMBED` (default `true`), `DEEPTAP_RERANK_TIMEOUT` (default `1s`), `DEEPTAP_EMBED_TIMEOUT` (default `500ms`), `DEEPTAP_RERANK_MAX_DOCS` (default `30`), `DEEPTAP_RERANK_MAX_TEXT_CHARS` (default `1024`), `DEEPTAP_COHERE_API_KEY` (required in bootstrap mode when rerank is enabled), `DEEPTAP_VOYAGE_API_KEY` (required in bootstrap mode when embed is enabled); bootstrap-mode `Load()` rejects missing keys when the matching feature is enabled
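L2-normalizing the embedding vectors at the sidecar is not cosmetic: once vectors are unit-length, cosine similarity reduces to a plain dot product, which is the cheap path for nearest-neighbour search. A small sketch of that equivalence (function names are illustrative):

```go
package main

import (
	"fmt"
	"math"
)

// normalize scales a vector to unit L2 length, matching what the Embed RPC
// promises for its 384-dim outputs.
func normalize(v []float32) []float32 {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	n := float32(math.Sqrt(sum))
	out := make([]float32, len(v))
	for i, x := range v {
		out[i] = x / n
	}
	return out
}

// dot is cosine similarity for unit-length inputs.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

func main() {
	q := normalize([]float32{3, 4})
	p := normalize([]float32{3, 4})
	fmt.Println(dot(q, p)) // ~1.0 for identical directions
}
```

This is also why the `VoyageEmbedder` hard-errors on any response that is not exactly 384 dimensions: a mixed-dimension index would silently corrupt every similarity score.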
- Go `internal/firewall/` package with five load-bearing files: `strip.go` + `patterns.go` implement the Layer 1 pre-extraction HTML stripper matching 13 documented patterns (`unicode_tag_chars`, `zero_width`, `bidi_override`, `html_comment`, `css_display_none`, `css_visibility_hidden`, `css_font_size_zero`, `css_opacity_zero`, `css_text_indent_offscreen`, `css_position_offscreen`, `meta_injection`, `aria_hidden` gated by `DEEPTAP_STRIP_ARIA_HIDDEN`, `script_style`); `scorer.go` defines the `Scorer` interface; `sidecar_scorer.go` implements `SidecarScorer` against the `ScoreInjection` gRPC RPC under `DEEPTAP_SCORE_INJECTION_TIMEOUT` (default 500ms) with the text truncated to `DEEPTAP_SCORE_INJECTION_MAX_CHARS` (default 4096); `noop_scorer.go` implements the bootstrap-mode `NoopScorer` returning `(0.0, nil)`; `factory.go` wires a mode-keyed factory picking `SidecarScorer` in production and `NoopScorer` in bootstrap; `safe_mode.go` implements the Layer 3 `SafeModeOff | SafeModeAgent` response mutator that nulls `untrusted_content` on every result at the HTTP boundary when agent mode is selected and stamps `safe_mode_applied` on the envelope
- Python sidecar `services/nlp-sidecar/score_injection.py` loads meta-llama/Prompt-Guard-2-86M when the Meta-gated weights are present (baked into the image at build time via the `HF_TOKEN` Dockerfile build arg or downloaded locally via `services/nlp-sidecar/scripts/download-models.sh`), emits heuristic reasons alongside the score, and mounts a stub that returns `UNIMPLEMENTED` at RPC time when weights are absent so a missing model is not a boot crash
- `POST /v1/extract` pipeline is now fetch -> Strip -> Extract -> Score -> persist; the sqlc `UpdateSourcePageInjection` query writes `injection_score` and the reasons list to `source_pages` on every extract (non-fatal on error, same pattern as `UpdateSourcePageEmbedding`)
- Response envelope extended with `prompt_injection_score` (max across results), `unsafe_reasons[]` (deduped), `sanitized_content_bytes`, `untrusted_content_bytes`, a `firewall` block with `layer1_stripped_bytes`, `layer1_patterns_matched[]`, `layer2_model`, `layer2_implementation`, `layer2_latency_ms`, `layer2_docs_scored`, and `safe_mode_applied`; every per-result object carries `sanitized_content`, `untrusted_content`, `trusted_snippet`, `prompt_injection_score`, and `unsafe_reasons`
- Request body accepts `safe_mode: "off" | "agent"`; default pulled from `DEEPTAP_SAFE_MODE_DEFAULT` (ships at `off`); `agent` mode nulls `untrusted_content` on every result at the HTTP boundary
- `/v1/search` firewall hook is a documented no-op until S10 wires extraction into depth=1 search; today provider snippets from Brave and Serper are not attacker-controlled through our pipeline, so Layer 1 and Layer 2 would have nothing to do
- Prometheus `internal/metrics/firewall.go` registers `deeptap_firewall_l1_strips_total{pattern}`, `deeptap_firewall_l1_stripped_bytes`, `deeptap_firewall_l2_requests_total{implementation, outcome}`, `deeptap_firewall_l2_duration_seconds{implementation}`, `deeptap_firewall_l2_score_bucket{implementation}`, and `deeptap_firewall_unsafe_pages_total{provider_class}` with the unsafe threshold pulled from `DEEPTAP_UNSAFE_SCORE_THRESHOLD` (default 0.7)
- `/v1/ready` probes add `Scorer.Healthz` under a 1-second timeout in production mode
- `services/nlp-sidecar/Dockerfile` accepts `HF_TOKEN` as a build argument so CI and local builds can fetch the Meta-gated Prompt-Guard-2-86M weights; `services/nlp-sidecar/scripts/download-models.sh` is the shared helper for local development
- New env vars: `DEEPTAP_ENABLE_FIREWALL_L1` (default `true`), `DEEPTAP_ENABLE_FIREWALL_L2` (default `true`), `DEEPTAP_SCORE_INJECTION_TIMEOUT` (default `500ms`), `DEEPTAP_SCORE_INJECTION_MAX_CHARS` (default `4096`), `DEEPTAP_STRIP_ARIA_HIDDEN` (default `false`), `DEEPTAP_SAFE_MODE_DEFAULT` (default `off`), `DEEPTAP_UNSAFE_SCORE_THRESHOLD` (default `0.7`)
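Two of the thirteen Layer 1 patterns, `zero_width` and `bidi_override`, target characters that hide injected instructions from human reviewers while remaining fully visible to a model. A two-pattern excerpt of the stripper idea (the function name and rune list are illustrative; the shipped `strip.go` covers all 13 patterns and works on HTML, not plain strings):

```go
package main

import (
	"fmt"
	"strings"
)

// invisible lists zero-width characters and bidi control characters that
// commonly smuggle hidden text. Illustrative subset, not the shipped list.
var invisible = []rune{
	'\u200B', '\u200C', '\u200D', '\uFEFF', // zero_width
	'\u202A', '\u202B', '\u202C', '\u202D', '\u202E', // bidi_override
	'\u2066', '\u2067', '\u2068', '\u2069',
}

// stripInvisible drops the runes above and reports how many were removed,
// mirroring the layer1_stripped_bytes accounting in the envelope.
func stripInvisible(s string) (string, int) {
	stripped := 0
	out := strings.Map(func(r rune) rune {
		for _, bad := range invisible {
			if r == bad {
				stripped++
				return -1 // strings.Map drops runes mapped to -1
			}
		}
		return r
	}, s)
	return out, stripped
}

func main() {
	clean, n := stripInvisible("ignore\u200B previous\u202E instructions")
	fmt.Println(clean, n)
}
```

Stripping runs before extraction on purpose: Trafilatura and the Layer 2 scorer then only ever see text a human reviewer could also see.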
- Go
internal/depth/package with three composers (depth1.go,depth2.go,depth3.go) built on oneComposerstruct defined indepth.go, plus shared primitivesstage.go(RunStage= Fanout -> per-URL fetch + Layer 1 strip + Trafilatura extract + Layer 2 score -> Rerank),ledger.go(facet ledger withSeedFacetsheuristic, coverage accumulation, marginal-lift saturation), andreflect.go(OpenRouterReflectoragainst anthropic/claude-sonnet-4.6 with JSON-schema structured output, retry-once on malformed JSON, top-15 findings at 400 chars each capped atDEEPTAP_REFLECTION_INPUT_MAX_CHARSdefault 12000, Secure-SKU ZDR triple in the request body) Composer.RunDepth1runs one stage underDEEPTAP_DEPTH1_TIMEOUT(default 7s, SLO 7s p95); stampsdepth=1,rounds_executed=1,stop_reason="ok"Composer.RunDepth2runs round 1, reflects, dedupes proposals against the ledger, runs round 2, merges with score-max dedup + re-rerank, underDEEPTAP_DEPTH2_TIMEOUT(default 18s, SLO 20s p95); stop reasonsok,llm_stop,reflector_error,saturation_deduped,timeout,depth1_fallbackComposer.RunDepth3loops up toDEEPTAP_DEPTH3_MAX_ROUNDS(default 4) rounds underDEEPTAP_DEPTH3_TIMEOUT(default 110s, SLO 120s p95) with an SSEOnEventcallback; stops on hard timeout with 2s grace,Saturated(DEEPTAP_SATURATION_DELTA default 0.05, consecutive=2), LLM-stop advisory,max_rounds, orreflector_errorin that precedence- Facet ledger
- `SeedFacets` heuristic: `vs|versus` -> comparative, `history of|origin of|when was` -> historical, `what is|define|definition of` -> definitional, `current|latest|today|now` -> current-status, no match -> general; per-facet coverage normalizes each result's `rerank_score` against `DefaultFacetSaturation=5.0` and caps at 1.0; overall coverage is the unweighted mean; `Contains`, `MarginalLift`, `Saturated`, `Export` are the public readers
- `/v1/search` dispatches depth=1 and depth=2 through the composer; depth=3 returns 400 `use_research_endpoint` pointing callers at `/v1/research`; the envelope adds `depth`, `rounds_executed`, `stop_reason`, and (when `include_ledger=true` and `IncludeLedgerAllowed`) the `ledger` block; `decomposition` stays as-is from S07
- `POST /v1/research` at `internal/api/research.go` is a POST `text/event-stream` endpoint with a single writer goroutine fed by a bounded channel plus a separate 15-second heartbeat goroutine (`DEEPTAP_SSE_HEARTBEAT`), emitting `round_start`, `partial_results`, `facet_update`, `reflection`, `saturation`, `final`, `error`, and `ping` events; client disconnect cancels the orchestrator context; the `usage_ledger` row (8.0 safe / 4.0 fast) is written AFTER the `final` event
- Decomposer schema extension: an optional `facets` array per sub-query exported via `SubQueryFacets map[string]string`; the reflector and ledger share facet attribution through this field
- Prometheus metrics at `internal/metrics/depth.go`: `deeptap_depth_rounds_total{depth}`, `deeptap_depth_saturation_total{reason}`, `deeptap_depth_duration_seconds{depth}` (SLO-aligned buckets at 7s / 20s / 120s), `deeptap_reflection_requests_total{outcome}`, `deeptap_facet_coverage_average{depth}`, `deeptap_sse_events_total{event}`
- OTEL spans `depth.round` (per round), `depth.reflect` (per reflection), `depth.rerank` (per rerank pass) with attributes `depth`, `round`, `sub_query_count`, `docs_scored`, `coverage`, `plan.new_sub_queries`, `plan.stop`, `model`, `latency_ms`
- `DEEPTAP_ENABLE_DEPTH_GT1` default flipped to `true`; `config.Load()` derives `DepthGT1Disabled=true` at startup when `DEEPTAP_MODE=bootstrap` AND `DEEPTAP_OPENROUTER_API_KEY==""`, so depth>=2 falls back to depth=1 with the `reflection_unavailable_bootstrap` warning; this is NOT a startup error
- New env vars: `DEEPTAP_DEPTH_MAX_URLS_PER_ROUND` (default 10), `DEEPTAP_DEPTH_MAX_INFLIGHT_EXTRACT` (4), `DEEPTAP_DEPTH_MAX_FANOUT` (4), `DEEPTAP_DEPTH1_TIMEOUT` (7s), `DEEPTAP_DEPTH2_TIMEOUT` (18s), `DEEPTAP_DEPTH3_TIMEOUT` (110s), `DEEPTAP_DEPTH3_MAX_ROUNDS` (4), `DEEPTAP_SATURATION_DELTA` (0.05), `DEEPTAP_REFLECTION_INPUT_MAX_CHARS` (12000), `DEEPTAP_REFLECTION_TIMEOUT` (10s), `DEEPTAP_SSE_HEARTBEAT` (15s)
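The per-facet coverage rule above can be sketched as a small pure function. This is an illustrative reconstruction from the description (sum of `rerank_score` normalized against `DefaultFacetSaturation`, capped at 1.0, unweighted mean across facets); the function names are assumptions, not the shipped API.

```go
package main

import "fmt"

// defaultFacetSaturation mirrors the DefaultFacetSaturation=5.0 constant
// described in the text.
const defaultFacetSaturation = 5.0

// facetCoverage sums a facet's rerank scores, normalizes against the
// saturation constant, and caps at 1.0.
func facetCoverage(scores []float64) float64 {
	var sum float64
	for _, s := range scores {
		sum += s
	}
	c := sum / defaultFacetSaturation
	if c > 1.0 {
		c = 1.0
	}
	return c
}

// overallCoverage is the unweighted mean of per-facet coverage.
func overallCoverage(byFacet map[string][]float64) float64 {
	if len(byFacet) == 0 {
		return 0
	}
	var total float64
	for _, scores := range byFacet {
		total += facetCoverage(scores)
	}
	return total / float64(len(byFacet))
}

func main() {
	facets := map[string][]float64{
		"comparative": {2.0, 2.0, 2.0}, // 6.0/5.0 saturates -> capped at 1.0
		"historical":  {1.0, 1.5},      // 2.5/5.0 -> 0.5
	}
	fmt.Println(overallCoverage(facets)) // (1.0 + 0.5) / 2 = 0.75
}
```

The cap is what makes `Saturated` meaningful: once a facet's score mass exceeds the saturation constant, additional documents contribute no marginal coverage.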
- Go `internal/cache/` package ships nine load-bearing files alongside the existing S02 `redis.go` and `invalidation.go` (eleven files total under `internal/cache/`): `keys.go` defines `FullKey`, `SubQueryKey`, `ExtractKey`, and `FactKey` with a `v1:` version prefix plus sha256 hashes; `normalize.go` runs NFC + lowercase + whitespace collapse + optional leading-article strip; `manager.go` is the tiered `Manager` with msgpack v5 encoding, `byurl:{sha256}` and `bydomain:{sha256}` reverse indices written through one Redis pipeline per store, and corrupt-key cleanup that deletes any payload that fails to decode; `ttl.go` exports `DetermineTTL` mapping `volatile=5m`, `daily=1h`, `standard=4h`, `stable=24h` with `MinFullTTL=60s` and `MinExtractTTL=15m` floors and a 1-hour sub-query cap; `singleflight.go` coalesces concurrent identical requests under `DEEPTAP_CACHE_SF_TIMEOUT`; `invalidator.go` opens a dedicated `pgx.Conn`, issues `LISTEN deeptap_cache` + `LISTEN deeptap_dmca_suppress`, decodes JSON payloads, and runs handlers under a panic-recover; `fact_noop.go` defines `FactCache` + `NoopFactCache`; `tombstone.go` writes the 60-second negative-cache tombstone on adversarial pages; `metrics_sink.go` emits the `deeptap_cache_*` Prometheus family
- Depth orchestrator integration: `depth1.go` / `depth2.go` / `depth3.go` compute `FullKey` from (normalized query, depth, provider_class, country, language, safe_mode), call `LookupFull` at the top, call `NoopFactCache.Lookup` after full-miss, singleflight-wrap the pipeline on miss, and call `StoreFull` on success with the TTL derived from the freshness class; `DepthResult` now carries `CacheHit`, `CacheHitType`, `CacheKeysHit`
- `ResearchStage.RunStage` does a per-sub-query batch `MGet` and only calls the provider for misses, writing back per success via `StoreSubQuery`; per-URL extract caching uses `LookupExtract` to skip fetch + Trafilatura on hit (the firewall still re-scores the cached text so a model update defends against stale injection classifications) and `StoreExtract` on miss-then-success; a successful extract cache hit emits the `extract_cache_hit` warning
- The `/v1/extract` handler integrates the same per-URL extract cache and emits the same warning; the `/v1/search` envelope, the `/v1/research` `final` event, and a new `cache_hit` SSE event all carry `cache_hit`, `cache_hit_type` in `{full, subquery, extraction, fact, miss}`, and `cache_keys_hit[]`
- Negative-cache tombstone: when any per-result `prompt_injection_score` is at or above `UnsafeScoreThreshold`, the envelope is NOT stored; a 60-second tombstone at the same `FullKey` returns a shaped empty-results envelope with `cache_hit=true`, `cache_hit_type="full"`, and `unsafe_reasons` populated, so hot retries on adversarial pages never reach the upstream provider
- Cross-instance DMCA invalidation: `pg_notify('deeptap_dmca_suppress', '{"type":"suppress","url":"...","domain":"..."}')` fans out to every DeepTap instance, looks up the `byurl:{sha256}` and `bydomain:{sha256}` reverse-index sets, and deletes every cache key that referenced the suppressed URL or domain within a 1-second SLO (tracked as `deeptap_cache_invalidation_latency_seconds`). Single-key eviction rides the `deeptap_cache` channel with `{"type":"invalidate_key","key":"..."}`
- Production-mode and bootstrap-mode `/v1/ready` add a 100-millisecond Redis `PING` cache probe that returns 503 on failure
- Prometheus `internal/metrics/cache.go` registers `deeptap_cache_requests_total{tier, outcome}` (tier: `full|subquery|extract|fact`; outcome: `hit|miss|error|bypass`), `deeptap_cache_lookup_duration_seconds{tier}`, `deeptap_cache_store_bytes{tier}`, `deeptap_cache_evictions_total{reason}` (reasons: `dmca_url|dmca_domain|key|ttl|size_cap`), `deeptap_cache_singleflight_share_total`, `deeptap_cache_full_hit_latency_seconds`, `deeptap_cache_invalidation_latency_seconds`
- Billing unchanged: cache hits write the full `usage_ledger` credit cost (research artifact 25, Tavily parity) because the customer value is the answer, not the path
- Redis 7 required for the `EXPIRE ... NX|XX|GT|LT` option flags used by `StoreFull`
- New env vars: `DEEPTAP_CACHE_ENABLED` (default `true`), `DEEPTAP_CACHE_FULL_TIER` (`true`), `DEEPTAP_CACHE_SUBQUERY_TIER` (`true`), `DEEPTAP_CACHE_EXTRACT_TIER` (`true`), `DEEPTAP_CACHE_MIN_FULL_TTL` (60s), `DEEPTAP_CACHE_MIN_EXTRACT_TTL` (15m), `DEEPTAP_CACHE_MAX_VALUE_BYTES` (262144), `DEEPTAP_CACHE_STRIP_LEADING_ARTICLES` (`false`), `DEEPTAP_CACHE_SF_TIMEOUT` (30s)
- Go `internal/ratelimit/` package with three load-bearing files: `lua/gcra.lua` is a 40-line atomic Generic Cell Rate Algorithm Lua script embedded via `go:embed`; `gcra.go` wraps it in `redis.NewScript` (stable SHA1 across callers), preloads via `Script.Load` at boot with a WARN-log-on-fail fallback, runs each `Allow` under a 50 ms per-op timeout with one NOSCRIPT retry, and fails OPEN on Redis error by returning `Decision{Allowed: true}` alongside a non-nil error; a `MetricsSink` interface keeps the package free of a metrics-package import; `tier.go` holds the canonical `Tiers` map and `ResolveLimit(orgTier, rateLimitOverride)`, which falls back to Free on unknown tiers and lets a positive per-key `api_keys.rate_limit_override` replace CPS with `Burst = 2 * CPS`
- Tier table (from `specs/PROJECT-CONTEXT.md`): Free 10 cps / 20 burst / 5 general / 2 depth3; Starter 50 / 100 / 20 / 5; Growth 200 / 400 / 50 / 10; Scale 500 / 1000 / 100 / 25; Secure 500 / 1000 / 100 / 25; Enterprise 1000 / 2000 / 200 / 50
- Go `internal/middleware/ratelimit.go` mounts the GCRA middleware AFTER `APIKey` and BEFORE the cache layer so a cache hit still counts against the per-key budget; stamps `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset` on every admission; renders a 429 RFC 7807 `application/problem+json` body with `type=https://deeptap.ai/errors/rate_limited`, a `Retry-After` header (ceiling of `retry_after_ms / 1000`, floored at 1), and a `limits.rate` block on denial; stashes the admission `*Decision` + resolved `Tier` on the request context via `WithRateLimitDecision` so the `/v1/search`, `/v1/extract`, and `/v1/research` envelopes stamp a `rate_limit: {limit, remaining, reset_ms}` block from the same numbers used on the headers
- Go `internal/middleware/concurrency.go` implements the per-org in-flight counter keyed on `v1:conc:<org_id>:<bucket>`: `/v1/research` maps to the `depth3` bucket, every other `/v1/*` endpoint maps to `general`; `INCR` on entry with an atomic `DECR` rejection on saturation and a `defer rdb.Decr(context.Background(), key)` release so a handler panic still returns the slot (Recoverer above the chain catches the panic; Go's defer runs during the unwind); 503 RFC 7807 with `type=https://deeptap.ai/errors/concurrency_exhausted` + `limits.concurrency {bucket, limit, in_flight}` on saturation; Redis error on `INCR` fails OPEN
- Go `internal/auth/ctx.go` introduces `AuthContext` + `WithAuthContext` / `AuthContextFrom` as a thin context-access shim so the ratelimit middleware does not have to import `internal/middleware` (the apikey middleware is still the authoritative producer of org / key identifiers)
- Go `internal/api/errors.go` adds the `Problem` RFC 7807 struct, the `WriteProblem` helper, and the `ErrTypeRateLimit` + `ErrTypeConcExhausted` constants reused by both middlewares (the string literals are duplicated inside `internal/middleware/` to avoid an `api <- middleware <- api` import cycle)
- `cmd/deeptap/main.go` boots `ratelimit.NewLimiter(redisClient, logger).WithMetrics(rateLimitMetrics)`, runs `Limiter.LoadScript(ctx)` with a WARN log on failure, and the chi chain order is now `RequestID -> OTEL -> slog -> Recoverer -> Prom -> CORS -> APIKey -> RateLimit -> Concurrency -> handler`
- Prometheus `internal/metrics/ratelimit.go` registers `deeptap_ratelimit_requests_total{tier, outcome}` (outcomes `allowed|denied|redis_error|redis_timeout`), `deeptap_ratelimit_decision_duration_seconds`, `deeptap_ratelimit_retry_after_ms`, the `deeptap_concurrency_inflight{bucket}` gauge, `deeptap_concurrency_rejections_total{bucket}`, and `deeptap_ratelimit_redis_errors_total`
- A 100-goroutine-vs-50-burst concurrency test against a real Redis container verifies the Lua compare-and-swap atomicity (exactly 50 admissions, exactly 50 denials); a separate test covers the panic-safe DECR path by mounting Concurrency downstream of Recoverer with a handler that panics, asserting the Redis counter returns to zero
- Fail-OPEN policy is deliberate: rate limiting is an SLA lever, not a correctness gate; a Redis outage must not cause an API outage
- New env vars: `DEEPTAP_RATELIMIT_ENABLED` (default `true`), `DEEPTAP_RATELIMIT_BURST_2X` (default `true`), `DEEPTAP_CONCURRENCY_ENABLED` (default `true`), `DEEPTAP_CONCURRENCY_DEPTH3_OVERRIDE` (default `0` = use tier)
- `POST /v1/map` endpoint wired into the DeepTap API via `internal/mapsvc/`: a two-phase orchestrator (Phase 1 reads `robots.txt` `Sitemap:` directives then falls back to `<scheme>://<host>/sitemap.xml`; Phase 2 escalates to a bounded HTML crawl when sitemap yield is below `DEEPTAP_MAP_HTML_MIN_YIELD=10`) plus post-processing that unions + normalizes + dedupes + filters (`allow_external`, `include_subdomains`, `exclude[]`) + truncates to `DEEPTAP_MAP_LIMIT=1000`; the response carries `results[]`, a `sources` breakdown (`robots_sitemap_urls`, `sitemap_urls`, `html_crawl_urls`, `pages_fetched`), `dropped`, `truncated`, `credits_used`; one `usage_ledger` row per request at 1.0 credit safe / 0.5 fast
- The `cmd/deeptap-mcp/` binary serves the Model Context Protocol with `--transport stdio|http` (SSE transport is rejected explicitly per MCP 2025-11-25); HTTP mounts the MCP handler at `/mcp` + `/mcp/` with `/healthz` served separately, and requires an `Mcp-Api-Key` header enforced by `APIKeyMiddleware` before the MCP handler runs; `internal/mcp/` ships `types.go` (jsonschema-tagged `SearchIn/Out`, `ExtractIn/Out`, `FactsIn/Out`), `client.go` (`APIClient.doJSON` + `RESTError`), `handlers.go` (three tool handlers with an attestation gate on `deeptap_extract` mode=full), `server.go` (`NewServer`), `middleware.go` (`APIKeyMiddleware`), and `server_test.go` (in-memory transport coverage)
- The `cmd/deeptap-tavily-shim/` binary listens on `:8082` for Tavily-wire `POST /search` + `POST /extract` + `POST /map`; `internal/shim/` (`translate_search.go`, `translate_extract.go`, `translate_map.go`, `translate_response.go`, `handler.go`, and `country.go` with an embedded 255-entry ISO-3166 + alias lookup) translates Tavily requests into DeepTap and projects responses back into Tavily's exact wire shape (decimal `response_time`, null `follow_up_questions`, `images` passthrough, `usage.credits`); the `X-DeepTap-Compat-Notes` response header lists every field-level translation; callers that send `X-DeepTap-Compat-Mode: strict` receive `409 Conflict` when any compat note would have fired; the shim calls `DEEPTAP_INTERNAL_URL` with `DEEPTAP_SHIM_KEY` as its own bearer
- Three new integration docs under `docs/integrations/`: `claude-desktop.md` (macOS/Linux direct binary, Windows double-backslash paths, remote via `mcp-remote`), `claude-code.md` (`claude mcp add` CLI with local/user/project scopes and Streamable HTTP), `cursor.md` (`.cursor/mcp.json` for stdio and remote `streamable-http`, with a warning on Cursor's 40-tool global cap)
- `migrations/0014_map_jobs.sql` scaffolds an async `/v1/map` mode for a future session; the current endpoint is synchronous and returns partial results with `truncated=true` on timeout rather than a 504
- New env vars: `DEEPTAP_MAP_MAX_DEPTH` (default `2`), `DEEPTAP_MAP_MAX_BREADTH` (default `200`), `DEEPTAP_MAP_LIMIT` (default `1000`), `DEEPTAP_MAP_HTML_MAX_BYTES` (default 2 MiB), `DEEPTAP_MAP_SITEMAP_MAX_BYTES` (default 50 MiB), `DEEPTAP_MAP_TIMEOUT` (default `20s`), `DEEPTAP_MAP_HTML_CONCURRENCY` (default `8`), `DEEPTAP_MAP_HTML_MIN_YIELD` (default `10`)
- Prometheus metrics: `deeptap_map_sitemap_phase_seconds`, `deeptap_map_html_phase_seconds`, `deeptap_map_total_seconds`, `deeptap_map_urls_discovered_total{source}`, `deeptap_map_pages_fetched_total`
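Phase 1's `robots.txt` scan can be sketched as follows: collect every `Sitemap:` directive (the key is case-insensitive by convention) and fall back to `/sitemap.xml` when none are declared. `parseRobotsSitemaps` is a hypothetical helper for illustration, not the shipped `internal/mapsvc/` code.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseRobotsSitemaps scans a robots.txt body for Sitemap: directives
// and returns the fallback URL when none are present.
func parseRobotsSitemaps(robots, fallback string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(robots))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Split on the first colon: "Sitemap: https://..." -> key, value.
		if k, v, ok := strings.Cut(line, ":"); ok &&
			strings.EqualFold(strings.TrimSpace(k), "sitemap") {
			if u := strings.TrimSpace(v); u != "" {
				out = append(out, u)
			}
		}
	}
	if len(out) == 0 {
		out = append(out, fallback) // Phase 1 fallback: <scheme>://<host>/sitemap.xml
	}
	return out
}

func main() {
	robots := "User-agent: *\nDisallow: /admin\n" +
		"Sitemap: https://example.com/a.xml\nsitemap: https://example.com/b.xml\n"
	fmt.Println(parseRobotsSitemaps(robots, "https://example.com/sitemap.xml")) // both declared sitemaps
	fmt.Println(parseRobotsSitemaps("User-agent: *\n", "https://example.com/sitemap.xml")) // fallback only
}
```

Note the subtlety the split handles: the sitemap URL itself contains a colon, so only the first colon separates key from value.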
- Agent-to-Agent Protocol v1.0 server at `internal/a2a/` with `BuildAgentCard` wiring four skills (`web-search`, `fact-lookup`, `url-extract`, `site-map`), four security schemes (`bearer`, `dpop`, `x402`, `mpp`), and two transports (JSON-RPC, HTTP-plus-JSON); `AgentExecutor` maps `TextPart` to search, `DataPart.urls` to extract, and `DataPart.subject` to facts
- x402 pay-per-call at `internal/payments/x402.go` with a 402 challenge emitting Base64url v2 JSON plus `WWW-Authenticate: Payment` with an HMAC-SHA-256 id and canonical USDC addresses for Base mainnet and Sepolia; `VerifyAndSettle` orchestrates a facilitator call against `/v2/x402/verify` + `/v2/x402/settle`
- Merchant Payments Protocol at `internal/payments/mpp_charge.go` + `mpp_session.go`: an `Authorization: Payment` parser with temporary / Stripe-SPT / Lightning dispatch; MPP sessions use 32-byte opaque access tokens stored as a SHA-256 hash in `mpp_sessions.access_token_hash` with monotonic cumulative accounting
- DPoP with go-dpop v1.1.2: ES256 / RS256 allow-list enforced pre and post parse, Redis nonce rotation via `dpop:nonce:{jkt}` under a 5-minute TTL, jti replay guard via `dpop:jti:{jkt}:{jti}` `SET NX`, 60-second clock-skew tolerance
- `internal/middleware/payment_dispatch.go` is the 7-branch dispatcher (Bearer > x402 > MPP > DPoP > 402) mounted BEFORE `APIKey` on `/v1` when `cfg.X402Enabled || cfg.MPPEnabled`
- `internal/middleware/idempotency.go` with a SHA-256 body hash under a 1 MiB cap, a Redis `SET NX` lock on `idemp:lock:{scope}:{key}`, replay-cached 2xx bytes carrying `Idempotency-Replayed: true`, 409 on body-hash mismatch, 409 on in-flight collision, and fail-open on Redis unreachable with a degraded counter; panic-safe lock release via a deferred Redis `DEL`
- 4 new migrations: `0015_payment_attempts` with 12 seeded monthly partitions + 60-day retention; `0016_mpp_sessions` with DPoP-thumbprint + SHA-256 hashed access-token binding; `0017_dpop_nonces` append-only audit; `0018_a2a_tasks` JSONB history + artifacts
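The idempotency middleware's core decision (first request locks the key with a SHA-256 body hash; same-body retries replay; different-body retries conflict) can be sketched with an in-memory map standing in for the Redis `SET NX` lock. Names here are illustrative, not the shipped `idempotency.go` API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// idemStore maps idempotency key -> SHA-256 hash of the first body seen.
type idemStore map[string]string

// admit returns one of three dispositions:
//   "execute"  - first sighting of the key; caller runs the request
//   "replay"   - same key, same body; serve the cached 2xx bytes
//   "conflict" - same key, different body; 409
func (s idemStore) admit(key string, body []byte) string {
	h := sha256.Sum256(body)
	hash := hex.EncodeToString(h[:])
	prev, ok := s[key]
	switch {
	case !ok:
		s[key] = hash // in Redis this is the atomic SET NX lock
		return "execute"
	case prev == hash:
		return "replay"
	default:
		return "conflict"
	}
}

func main() {
	s := idemStore{}
	fmt.Println(s.admit("k1", []byte(`{"q":"a"}`))) // execute
	fmt.Println(s.admit("k1", []byte(`{"q":"a"}`))) // replay
	fmt.Println(s.admit("k1", []byte(`{"q":"b"}`))) // conflict
}
```

Hashing the body rather than storing it keeps the lock entry small and makes the 1 MiB cap a read-side concern only.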
- SPIFFE X.509-SVID auth path with live-preferred + local-fallback verification via `github.com/spiffe/go-spiffe/v2 v2.6.0`; 3 Postgres migrations (`0019_trustplane_bundle_cache`, `0020_trustplane_verifications` monthly-partitioned, `0021_portfolio_accounts` with portfolio-revenue seed); 10 TrustPlane env vars with fail-fast boot validation when `TRUSTPLANE_ENABLED=true`
- `internal/identity/spiffeid.go` URI-SAN extractor + trust-domain allow-list; `internal/identity/tp_client.go` dedicated `http.Transport` with a hard-deny vs. network-error distinction for local fallthrough; `internal/identity/bundler.go` + `metrics.go` periodic `spiffebundle` fetch + Postgres + Redis persistence + 7 Prometheus instruments labeled `trust_domain` only, to avoid SPIFFE-ID cardinality explosion
- `internal/identity/trustplane.go` `Verifier` with live-preferred + local-fallback, deny short-circuit, stale-reject / warn, offline `x509svid.Verify`, and an async audit writer with a bounded channel and drop counter
- `internal/billing/portfolio.go#PortfolioLedger.PostToLedger` balanced double-entry posting with banker-rounded markup and `spiffe_id > catch-all` account lookup; the payment dispatcher's TrustPlane branch runs BEFORE Bearer (401 deny, 503 stale / no_bundle, passes through when no client certificate is present)
- `internal/identity/tls.go` `ClientCAProvider` + `BuildTLSConfig` with `VerifyClientCertIfGiven` + `GetConfigForClient` memoized at a 10-second minimum; `.well-known`, `health`, and `metrics` remain mTLS-optional
- `internal/identity/admin.go` chi-mountable admin router at `/portfolio/accounts`, `/trustplane/bundle{,/refresh}`, and base64 bundle export; `deeptap-cli trustplane {verify,bundle-refresh,bundle-status}` subcommands; a Caddy-based `trustplane-mock` compose service on port 8089 (dev-default OFF)
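The URI-SAN extractor's trust-domain allow-list check can be sketched from the SPIFFE ID shape (`spiffe://<trust-domain>/<path>`). This is an illustrative helper, not the shipped `spiffeid.go` API.

```go
package main

import (
	"fmt"
	"net/url"
)

// trustDomainAllowed parses a SPIFFE ID and checks its trust domain
// (the URI host) against an allow-list.
func trustDomainAllowed(spiffeID string, allowed map[string]bool) bool {
	u, err := url.Parse(spiffeID)
	if err != nil || u.Scheme != "spiffe" || u.Host == "" {
		return false // malformed or non-SPIFFE URI-SAN
	}
	return allowed[u.Host]
}

func main() {
	allow := map[string]bool{"portfolio.example.org": true}
	fmt.Println(trustDomainAllowed("spiffe://portfolio.example.org/agent/search", allow)) // true
	fmt.Println(trustDomainAllowed("spiffe://evil.example.net/agent", allow))             // false
}
```

In production the ID would come from the verified client certificate's URI SAN after `x509svid.Verify`, not from an untrusted string.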
- 30-task branch on `build/S16-billing-engine`. Four independent billing surfaces, one unified ledger feed. `go.mod` bumps the toolchain to Go 1.26.1 and adds `github.com/stripe/stripe-go/v85` + `github.com/johnfercher/maroto/v2 v2.4.0`
- 7 Postgres migrations: `0022_stripe_customers` with tier + dunning counters, `0023_stripe_meters`, `0024_stripe_credit_grants`, `0025_stripe_meter_events_outbox` with UUID identifier + partial index on `sent_at IS NULL`, `0026_stripe_webhooks` with `event_id` dedup, `0027_portfolio_invoices` with `UNIQUE(account_id, period_start, period_end)`, `0028_reconciliation_reports` with `NUMERIC(6,4) variance_pct`
- `internal/billing/` package: `client.go` facade with `DryRun` kill and `Enabled()` check; `meters.go` idempotent `EnsureMeters` with `MeterCreator` injection; `metrics.go` 10 Prometheus instruments (tier / event_name / outcome / endpoint labels only, never customer_id); `outbox.go` transactional `EnqueueMeterEvent` with a UUID v7 Stripe-dedup identifier + pure `validateEnqueueArgs` helper; `flusher.go` `FlushOnce` drains `FOR UPDATE SKIP LOCKED` in batches of 100 via `V2BillingMeterEventStreams.Create`, updating `sent_at` before commit (prefers duplicate delivery over double-charge); `subscriptions.go` `UpsertSubscriptionForOrg` with tier + `SubscriptionItem` translation + `proration_behavior=create_prorations` + `LoadRateCardJSON`; `creditgrants.go` `CreateCreditGrant` + `SyncCreditGrantFromWebhook` (price_type=metered applicability, category=paid)
- `internal/billing/webhooks.go` signature verify via `webhook.ConstructEvent` + `ON CONFLICT DO NOTHING` dedup + panic recovery + per-type dispatch fanning out to `webhooks_invoice.go` (created / finalized / paid / failed, with a dunning counter at threshold 3), `webhooks_subscription.go` (sync subscription_id without flipping tier), `webhooks_creditgrant.go` (shadow table upsert), and `webhooks_meter_error.go` (log + counter without marking the outbox row sent); the empty-secret hard-fail is deliberate (operator misconfiguration, not a dev fallback)
- `internal/billing/portfolio.go` + `portfolio_monthend.go` month-end aggregator; `pdf.go` maroto v2.4.0 A4 portrait renderer with header + customer block + line-item table + totals + footer, byte-stable across re-renders; `reconcile.go` `Classify` 3-bucket drift detector (clean under 0.1 percent, variance 0.1 to 5 percent, error above 5 percent) + `SumStripeMeterEvents` iterator across every active `(stripe_customer_id, meter_id)` pair + `RunPeriodReconcile` end-to-end
- `internal/billing/jobs.go` + per-account `MonthEndWorker` in `portfolio.go`: 4 River workers (`FlushMeterEventsWorker`, `PortfolioMonthEndWorker` fan-out, `ReconcileStripeWorker` with a caller-injected `StripeTotaler` closure, `WebhookReplayWorker`); `internal/jobs/schedules.go` `PeriodicJobs` at a 60-second flush, a 24-hour reconcile at 04:00 UTC via `DailyAtUTC`, a monthly portfolio run at 03:00 UTC on day one via `MonthlyAtUTC`, and a 5-minute webhook replay
- `cmd/deeptap/main.go` wires the billing outbox hook into the `/v1/search` + `/v1/extract` + `/v1/map` handlers post-ledger-commit; mounts `POST /v1/billing/webhooks/stripe` at top-level `r` (outside `/v1` so no auth middleware consumes the raw body); constructs `BillingClient` + `Metrics` + `PortalCreator` unconditionally so webhook verification and zero-state metrics work even with `BillingEnabled=false`; the River client stops BEFORE `http.Server.Shutdown` so in-flight billing jobs commit cleanly
- Handler hooks skip the TrustPlane path. When `PaymentDispatchModeTrustPlane` rides on the request context, the outbox enqueue fast-exits because portfolio customers are billed via the internal double-entry ledger and their traffic never touches Stripe. TrustPlane detection uses the payment-dispatch mode flag set by the S15 dispatcher branch; the handlers do not inspect `api_key_id` directly
- `cmd/deeptap-cli/billing.go` adds `billing reconcile / portfolio-invoice / outbox-status` subcommands; `config/rate-card.json` seeds three tiers (Starter $19 / 1,000 credits, Growth $99 / 10,000 credits, Scale $499 / 100,000 credits) with four meters each and placeholder `price_id` fields the operator fills post-`EnsureMeters`
- 8 new env vars documented under `docs/DEPLOYMENT.md`: `DEEPTAP_STRIPE_SECRET_KEY`, `DEEPTAP_STRIPE_WEBHOOK_SECRET`, `DEEPTAP_STRIPE_CLIMATE_ENABLED`, `DEEPTAP_DASHBOARD_URL`, `DEEPTAP_BILLING_RATE_CARD_PATH`, `DEEPTAP_PORTFOLIO_INVOICE_FROM_EMAIL`, `DEEPTAP_PORTFOLIO_INVOICE_LOGO_PATH`, `DEEPTAP_BILLING_DRY_RUN`
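The `Classify` 3-bucket drift detector described above can be sketched directly from its stated thresholds. The exact boundary handling (inclusive vs. exclusive at 0.1 and 5 percent) is an assumption; only the three buckets are from the text.

```go
package main

import "fmt"

// classify maps an internal-ledger-vs-Stripe variance percentage to the
// three reconciliation buckets: clean (< 0.1%), variance (0.1%–5%),
// error (> 5%).
func classify(variancePct float64) string {
	if variancePct < 0 {
		variancePct = -variancePct // drift direction does not matter
	}
	switch {
	case variancePct < 0.1:
		return "clean"
	case variancePct <= 5.0:
		return "variance"
	default:
		return "error"
	}
}

func main() {
	fmt.Println(classify(0.02)) // clean
	fmt.Println(classify(1.3))  // variance
	fmt.Println(classify(7.5))  // error
}
```

Storing `variance_pct` as `NUMERIC(6,4)` (as migration `0028` does) keeps four decimal places, which comfortably distinguishes the 0.1 percent clean boundary.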
- Postmark transactional email delivery on the primary `send.deeptap.ai` domain, plus a separate `dmca.deeptap.ai` reputation domain via Amazon Simple Email Service for the takedown workflow
- DMCA intake at `POST /v1/dmca/report` with sworn-statement validation, a ticket-creation flow, and cross-instance cache suppression that lands inside one second via the existing `deeptap_dmca_suppress` LISTEN/NOTIFY channel
- Counter-notice state machine (received -> actioned -> counter_notice -> resolved) covering the DMCA 512(g) 10-to-14-business-day window
- Next.js 15 dashboard shell with seven routes (`/overview`, `/usage`, `/billing`, `/apikeys`, `/domains`, `/facts`, `/settings`), session-cookie middleware, four recharts time-series views, a fact-cache analytics panel reading the S19-onward metrics, and a Mintlify docs site with 18 MDX pages plus a single-source-of-truth trust-report renderer
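The counter-notice state machine can be sketched as a transition table. The edges below are inferred from the linear state list (received -> actioned -> counter_notice -> resolved) plus the natural "resolved without counter-notice" path; the shipped code may allow different edges.

```go
package main

import "fmt"

// transitions is an assumed adjacency list for the DMCA ticket states.
var transitions = map[string][]string{
	"received":       {"actioned"},
	"actioned":       {"counter_notice", "resolved"},
	"counter_notice": {"resolved"}, // after the 512(g) waiting window
}

// canTransition reports whether an edge exists from -> to.
func canTransition(from, to string) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("received", "actioned")) // true
	fmt.Println(canTransition("received", "resolved")) // false: must be actioned first
}
```

Encoding the machine as data rather than scattered `if` chains makes the 512(g) audit trail easy: every state write can log the edge it took and reject edges not in the table.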
- `internal/facts/` package with the full extraction pipeline: 40 predicate aliases, a five-tier classification cascade, NFKC normalization, a ROUGE-L recall gate, an atomic per-day budget consumer, an OpenRouter JSON-schema extractor, sidecar `VerifyClaim` and `LinkEntities` callers, three-tier dedup (exact, trigram, insert), a conflict flagger, and a Kafka-driven per-page worker
- `cmd/deeptap-fact-worker/` binary with graceful shutdown and Prometheus on port 9091
- Four migrations land the dual trigram GIN index + partial unique canonical constraint + audit run table + per-day budget counter
- Pure-function decay model `EffectiveConfidence(base, rate, lastConfirmed, now) = base * exp(-rate * days_since_confirmed)` with `NeedsReverification` and `InOpportunisticBand` predicates
- River-backed scheduler with three queues (`reverify_priority` 10 workers, `reverify_scan` 1, `maintenance` 1) and two periodic jobs (hourly scan, monthly partition creator)
- `cmd/deeptap-scheduler/` standalone binary plus a `/internal/river` UI gated by the `X-DeepTap-Internal` header
- Contradiction-resolution state machine with a `MinConfirmingDomains=2` default, a fact-supersession audit log, and an opportunistic re-verification hook for the band `[threshold, threshold + 0.10)`
- `POST /v1/facts/query` exposes the fact cache with subject or subject_qid lookups, an optional predicate, a `min_confidence` floor, an `include_evidence` toggle, and a `max_results` cap (default 10, hard cap 50)
- Redis read-through cache `fact:q:v1:<hash>` with snappy-compressed msgpack payloads; TTL driven by the fastest-decaying tier in the result (permanent 24h, slow 12h, moderate 1h, fast 10m, volatile 2m)
- One `usage_ledger` row at 0.1 credit per request irrespective of hit or miss; conflict-flagged facts surface with `conflict_flag=true`; superseded facts are never returned
- Depth=1 search pipeline gains a top-of-`RunDepth1` fact probe via `Composer.Prober` that short-circuits decomposition + search + extract + rerank on a confident fact-cache hit; billing becomes 0.1 credit instead of 1.0
- Four always-on feed workers under `internal/feeds/` (`wikidata`, `cve`, `ietf`, `edgar`) scheduled by River from `internal/jobs/feed_workers.go` on independent cadences (Wikidata daily delta + monthly seed dump, CVE/NVD daily, IETF weekly, SEC EDGAR hourly during US trading); workers persist resume state in `feed_ingestion_state` (last-run summary) and `feed_cursor` (opaque per-feed resume blob) so a worker restart picks up where the previous run stopped
- Demand-triggered feed routing keyed off a `feed_registry` table with `topic_pattern` glob matching against `subject:predicate`; cache-miss queries that match a pattern queue a synchronous `EnqueueDemandFeed` River job behind a single 500-credit-per-day shared budget so a hot pattern cannot starve quiet ones
- New `deeptap.facts` Kafka topic produced by the S19 fact-extraction worker, the S21 fact-query handler (on opportunistic-reverify writes), and every S22 feed worker; a CloudEvents-shaped envelope per new, confirmed, or contradicted fact carries a per-event UUID dedup identifier so worker restarts cannot double-count downstream
- `deploy/clickhouse/schemas/022_fact_events.sql` defines the consumer side: a Kafka engine table on `deeptap.facts` plus four materialized views that drive the dashboard's Fact Cache tab (`mv_fact_hit_rate` rolling 5-minute hit-vs-miss ratio, `mv_facts_by_type` 5-tier decay-class histogram, `mv_staleness_distribution` `days_since_confirmed` histogram, `mv_conflict_rate` rolling proportion of `conflict_flag=true` facts); ClickHouse is read-only from the dashboard's perspective and never feeds back into the Postgres `facts` rows
- Tiny upstream-shape fixtures land at `test/fixtures/feeds/{wikidata-tiny.ndjson, cve-sample.json, rfc-index-tiny.txt, edgar-sample.atom}` with a schema README; build-tagged integration test skeletons under `integration_feeds` and `integration_clickhouse` reserve test names + paths and skip cleanly until a Postgres + ClickHouse + Kafka testcontainer stack is wired into `go.mod`
- Phase 2 (Fact Cache, S19-S22) is now complete
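The demand-triggered routing match above (a `feed_registry` glob against `subject:predicate`) can be sketched with the standard library's glob matcher. `path.Match` is a stand-in: the registry's actual glob dialect is not specified in the text.

```go
package main

import (
	"fmt"
	"path"
)

// matchFeed reports whether a registry topic_pattern glob matches the
// "subject:predicate" key of a cache-miss query.
func matchFeed(pattern, subject, predicate string) bool {
	ok, err := path.Match(pattern, subject+":"+predicate)
	return err == nil && ok
}

func main() {
	// A CVE registry row routes any CVE subject to the cve feed worker.
	fmt.Println(matchFeed("CVE-*:*", "CVE-2026-1234", "severity")) // true
	fmt.Println(matchFeed("CVE-*:*", "golang", "release_date"))    // false
}
```

Note that `path.Match`'s `*` stops only at `/`, so it happily crosses the `:` separator; a production matcher would probably match subject and predicate segments independently.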
- Phase 3 (Knowledge Layer, S23-S28) is now complete; all 28 sessions of the build have shipped
- No session currently active. Phases 1, 2, and 3 are all complete; the platform is in v1 release-candidate posture pending the Phase 4 hardening surfaces.
S28 is the read-only analytics surface over the Knowledge Layer. Three new Postgres materialized views (`mv_source_index_stats`, `mv_entity_coverage`, `mv_topic_coverage`) are refreshed daily at 06:00 UTC by a River-managed worker. The new public endpoint `GET /v1/trust/domain` returns the full domain trust profile (tier, consensus ratio, fact counters, ASN metadata, sample evidence count, last-updated timestamp) at 0.05 credit per call with a 5-minute Redis cache. Three new dashboard handlers (source-index, entity-coverage, topic-coverage) read the matching materialized view with a 5-minute Redis cache and degrade gracefully on a backing-store outage. The new `internal/diversity` package adds an Autonomous System Number tiebreaker the reranker uses to prefer cross-network confirmation over single-network confirmation at the same effective confidence. Three Prometheus instruments cover lookup outcome, refresh duration per view, and mean diversity per tier. An OpenAPI 3.1 schema documents the public endpoint, with full unit-test coverage on every Go-side surface. With S28 shipped, all 28 sessions of the eighteen-to-twenty-month build are complete.
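The ASN tiebreaker's ordering rule can be sketched as a two-key sort: effective confidence first, distinct-ASN count as the tiebreak. The struct and field names are illustrative, not the shipped `internal/diversity` API.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical reranker item: a claim with its effective
// confidence and the number of distinct Autonomous Systems that
// confirmed it.
type candidate struct {
	ID          string
	Confidence  float64
	DistinctASN int
}

// rank sorts by confidence descending, breaking ties in favor of
// cross-network confirmation (more distinct ASNs).
func rank(cs []candidate) {
	sort.SliceStable(cs, func(i, j int) bool {
		if cs[i].Confidence != cs[j].Confidence {
			return cs[i].Confidence > cs[j].Confidence
		}
		return cs[i].DistinctASN > cs[j].DistinctASN // ASN tiebreak
	})
}

func main() {
	cs := []candidate{
		{"single-network", 0.90, 1}, // three mirrors on one hosting ASN
		{"cross-network", 0.90, 3},  // confirmations from three networks
	}
	rank(cs)
	fmt.Println(cs[0].ID) // cross-network
}
```

The intuition: three confirmations from one hosting provider may be one syndicated source; three confirmations across three networks are likelier to be independent.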
- Phase 4 hardening specifics (Service Organization Control (SOC) 2 Type 2, additional protocol adapters, multi-region, long-term residency controls)
- Service Organization Control (SOC) 2 Type 2 certification
- Additional protocol adapters beyond MCP, A2A, x402, MPP, TrustPlane
- Multi-region production topology beyond the bootstrap iad (us-east) region
- Additional language-model provider integrations beyond the initial OpenRouter default
- Stripe Connect reseller integration (explicitly cut from version 1)
- Voice-agent-grade latency tier with warm Fly.io machines
| Document | Audience | What You Will Find |
|---|---|---|
| How It Works | Product and engineering | User journeys for developers, agents, and enterprise; the full search flow, /v1/map orchestrator, MCP server, Tavily shim, fact-cache flow, firewall flow, billing flow, error behavior |
| Architecture | Developers | Tech stack with pinned versions, repository map, system topology, data flow, persistence layer, machine-learning inference, protocols, middleware stack, authentication, billing, non-obvious decisions with rationale |
| Feature Map | Product and stakeholders | Every feature grouped by domain with user-facing benefit, session that delivers it, and status |
| Deployment | DevOps and deployers | Account prerequisites, environment variables, local development setup, bootstrap deployment, production migration triggers, database migrations, proto regeneration, local-node deployment, observability, DMCA domain setup, secrets rotation |
| Business Value | Marketing, investors, executives | The opportunity, the problem, the solution, why us and why now, target customers, how we make money, what is different, traction plan, team, status |
| Claude Desktop Integration | Claude Desktop users | Copy-paste claude_desktop_config.json snippets for macOS/Linux direct binary, Windows double-backslash paths, and remote via mcp-remote |
| Claude Code Integration | Claude Code users | claude mcp add CLI snippets for stdio (local/user/project scopes) and Streamable HTTP |
| Cursor Integration | Cursor users | .cursor/mcp.json snippets for stdio and streamable-http; warning on Cursor's 40-tool global cap |
DeepTap is developed in public by RelayOne. Contribution guidelines are available via the repository at github.com/RelayOne/deeptap. The monorepo uses a Go workspace (go.work), sqlc for typed database queries, golang-migrate for migrations, and golangci-lint plus ruff for linting.
Branch naming follows session/S<NN>-<short-slug> for session-aligned work and fix/<short-slug> for bugfixes. All commits that touch documentation go in their own commit per the project rules. Every specification in /specs/ is considered frozen unless explicitly reopened.
License will be declared before Session 1 lands. The current plan is dual licensing: source-available for the Go API and dashboard, a permissive license for the SDKs (TypeScript on npm, Python on PyPI, Go as a module), and a commercial license for the local-node binaries. This will be finalized in the Session 1 commit.
Last updated: 2026-04-24 (S22; Phase 2 complete)