
feat(connectors): add generic HTTP sink connector #2925

Open
mlevkov wants to merge 23 commits into apache:master from mlevkov:feat/http-sink-connector

mlevkov commented Mar 12, 2026

Summary

Adds a generic HTTP sink connector that delivers consumed Iggy stream messages to any HTTP endpoint — webhooks, Lambda functions, REST APIs, or SaaS integrations.

  • 4 batch modes: individual (one request per message), ndjson (newline-delimited), json_array (single array), raw (bytes)
  • Exponential backoff retry with configurable delay, multiplier, max delay, and Retry-After header support
  • Flexible metadata: optional Iggy envelope (stream/topic/offset/timestamp), checksum, origin timestamp
  • Connection pooling: reqwest client with TCP keep-alive (30s), pool idle timeout (90s), configurable max connections
  • Health checks: opt-in startup probe (HEAD/GET/OPTIONS) with graceful degradation
  • TLS: optional danger_accept_invalid_certs for dev environments
  • Custom headers: arbitrary HTTP headers for auth tokens, API keys, routing

Files Added (4,233 lines)

| File | Lines | Purpose |
|------|-------|---------|
| sinks/http_sink/src/lib.rs | 2,062 | Core implementation: types, config, Sink trait, retry, batch modes, 60 unit tests |
| sinks/http_sink/README.md | 810 | Usage guide, config reference, runtime model, deployment patterns, message flow |
| sinks/http_sink/Cargo.toml | 48 | Crate manifest |
| sinks/http_sink/config.toml | 90 | Example connector configuration |
| fixtures/http/container.rs | 238 | Docker test container with WireMock |
| fixtures/http/sink.rs | 216 | Test fixtures (single-topic, multi-topic, batch mode variants) |
| http/http_sink.rs | 662 | 7 integration tests (delivery, metadata, batch modes, multi-topic) |
| Other fixtures/config | 64 | Module files, WireMock mappings, test config |

Architecture

Iggy Stream → [Runtime polls topic] → consume(messages) → [batch mode] → HTTP endpoint
                                            │
                                    ┌───────┴────────┐
                                    │  individual    │ → 1 request per message
                                    │  ndjson        │ → all messages, newline-delimited
                                    │  json_array    │ → all messages, JSON array
                                    │  raw           │ → 1 request per message, raw bytes
                                    └────────────────┘
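
As a rough illustration of the diagram above, the four batch modes could be framed as follows. This is a minimal sketch, not the connector's actual code: `frame_bodies` is a hypothetical helper, the Content-Type values are assumptions, and each input message is assumed to be already serialized (one valid JSON document per message for the JSON modes).

```rust
/// Batch modes named in the PR description; Individual is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum BatchMode {
    #[default]
    Individual,
    Ndjson,
    JsonArray,
    Raw,
}

impl BatchMode {
    /// Content-Type per mode. These values are assumed, not taken from the PR.
    pub fn content_type(self) -> &'static str {
        match self {
            BatchMode::Individual | BatchMode::JsonArray => "application/json",
            BatchMode::Ndjson => "application/x-ndjson",
            BatchMode::Raw => "application/octet-stream",
        }
    }
}

/// Hypothetical helper: turns serialized messages into HTTP request bodies.
/// Individual and Raw yield one body per message; Ndjson and JsonArray
/// yield a single body for the whole batch (or none for an empty batch).
pub fn frame_bodies(mode: BatchMode, messages: &[Vec<u8>]) -> Vec<Vec<u8>> {
    if messages.is_empty() {
        return Vec::new();
    }
    match mode {
        BatchMode::Individual | BatchMode::Raw => messages.to_vec(),
        BatchMode::Ndjson => {
            let mut body = Vec::new();
            for message in messages {
                body.extend_from_slice(message);
                // Every record, including the last, ends with a newline.
                body.push(b'\n');
            }
            vec![body]
        }
        BatchMode::JsonArray => {
            let mut body = vec![b'['];
            for (index, message) in messages.iter().enumerate() {
                if index > 0 {
                    body.push(b',');
                }
                body.extend_from_slice(message);
            }
            body.push(b']');
            vec![body]
        }
    }
}
```

The number of returned bodies equals the number of HTTP requests the sink would issue for one `consume()` call under these assumptions.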

Code Review History

4 rounds of automated review with 4 specialized agents each:

| Round | Agents | Findings | Fixed |
|-------|--------|----------|-------|
| Round 1 | code-reviewer, silent-failure-hunter, comment-analyzer, code-simplifier | 12 | 12 |
| Round 2 | Same 4 agents (follow-up) | 7 | 7 |
| Round 3 | Same 4 agents (post-feature additions) | 17 | 15 (2 deferred) |
| Round 4 | Same 4 agents (double-review follow-up) | 6 | 6 |
| Total | | 42 | 40 |

Key fixes across rounds:

  • Error accounting correctness (errors_count + messages_delivered = total for all code paths)
  • Status code validation (200-599 range, rejects 1xx informational codes)
  • Overflow protection in retry delay computation
  • Non-UTF-8 Retry-After header warning
  • Unused dependency removal (dashmap, once_cell — re-exported by SDK)
  • Shared send_batch_body() helper eliminating duplication
  • README: function name references (not line numbers), accurate retry math, runtime model docs
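
To make the status code validation fix concrete, here is a small sketch of what such a check might look like. The function name `validate_success_status_codes` and the error strings are hypothetical; only the non-empty requirement and the 200-599 range come from the review notes.

```rust
/// Hypothetical validator for the configured success status codes.
/// Rejects an empty list (which would make every response look like a
/// failure) and any code outside 200-599, including 1xx informational codes.
pub fn validate_success_status_codes(codes: &[u16]) -> Result<(), String> {
    if codes.is_empty() {
        return Err("success_status_codes must not be empty".to_string());
    }
    for &code in codes {
        if !(200..=599).contains(&code) {
            return Err(format!("invalid status code {code}: expected 200-599"));
        }
    }
    Ok(())
}
```

Running a check like this once at startup, rather than per request, would surface a bad configuration before any message is consumed.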

Deferred (tracked in issues)

  • L4: Structured error type enum replacing string-based Error::HttpRequestFailed (#2927)
  • D1: Expose internal metrics (errors_count, retries_count) via runtime health API (#2928)

Test Plan

  • 60 unit tests covering config parsing, validation, serialization, retry logic, batch modes, edge cases
  • 7 integration tests with WireMock in Docker:
    • single_json_message_delivered_with_metadata — basic delivery + envelope verification
    • metadata_fields_respect_config — include_checksum, include_origin_timestamp toggles
    • ndjson_batch_delivers_all_messages — NDJSON batch mode
    • json_array_batch_delivers_all_messages — JSON array batch mode
    • individual_mode_sends_separate_requests — per-message delivery
    • raw_mode_delivers_bytes — raw byte passthrough
    • multi_topic_messages_delivered_with_correct_topic_metadata — 2 topics, metadata accuracy
  • cargo clippy -p iggy_connector_http_sink -- -D warnings — 0 warnings
  • cargo clippy -p integration -- -D warnings — 0 warnings

Related

🤖 Generated with Claude Code

mlevkov and others added 8 commits March 12, 2026 13:21
…nk impl

Add generic HTTP sink connector for delivering consumed messages to any
HTTP endpoint (webhooks, REST APIs, serverless functions). This commit
establishes the crate structure, config types, and stub trait implementation.

- HttpMethod enum (Get, Head, Post, Put, Patch, Delete) with Default=Post
- BatchMode enum (Individual, Ndjson, JsonArray, Raw) with Default=Individual
- HttpSinkConfig with 20 fields covering retry, TLS, batching, metadata
- HttpSink struct with Option<Client> (built in open(), not new())
- Stub Sink trait impl (open/consume/close) with TODO markers for Commit 2
- Document runtime consume() Result discard (upstream sink.rs:585 bug)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Full implementation of the HTTP sink connector's Sink trait:

open(): Build reqwest::Client from config (timeout, TLS, pool size),
  optional health check with configurable HTTP method.

consume(): Four batch modes — individual (partial delivery on failure),
  ndjson (newline-delimited), json_array (single array), raw (bytes).
  Metadata envelope wrapping with UUID-formatted u128 IDs, base64 for
  binary payloads (Raw/Proto/FlatBuffer). Configurable success status
  codes, checksum and origin timestamp inclusion.

Retry: Exponential backoff with configurable multiplier and cap.
  Transient errors (429/500/502/503/504) and network errors retry;
  non-transient errors fail immediately. Respects Retry-After header
  on HTTP 429.

close(): Log cumulative stats (requests, delivered, errors, retries).

Config resolution: All Option fields resolved to concrete values in
new() following MongoDB sink pattern. Duration strings parsed with
humantime. UTF-8-safe response truncation in logs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
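
The retry rules in the commit above (transient status classification, exponential backoff with a cap) can be sketched as follows. The signatures are assumptions made for illustration; the name `compute_retry_delay` appears in later review notes, but its real parameters are not shown in this PR page.

```rust
use std::time::Duration;

/// Statuses the commit message lists as transient (retryable).
pub fn is_transient_status(status: u16) -> bool {
    matches!(status, 429 | 500 | 502 | 503 | 504)
}

/// Delay before retry number `retry` (1-based):
/// base * multiplier^(retry - 1), capped at `max`.
/// The signature is assumed; the overflow guard mirrors the "prevent panic
/// on f64 overflow" fix mentioned in the review notes.
pub fn compute_retry_delay(base: Duration, multiplier: f64, max: Duration, retry: u32) -> Duration {
    let exponent = retry.saturating_sub(1).min(i32::MAX as u32) as i32;
    let secs = base.as_secs_f64() * multiplier.powi(exponent);
    // NaN, infinity, or anything at or above the cap collapses to the cap,
    // so Duration::from_secs_f64 never sees a value it would panic on.
    if !secs.is_finite() || secs >= max.as_secs_f64() {
        return max;
    }
    Duration::from_secs_f64(secs.max(0.0))
}
```

With a 1 s base and a multiplier of 2.0, the first three retries under this sketch would wait 1 s, 2 s, and 4 s, and any retry whose computed delay reaches the cap waits exactly the cap.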
Addresses all findings from 4-agent code review:
- Cap Retry-After to max_retry_delay, use reqwest::header::RETRY_AFTER
- Health check uses configured success_status_codes, applies custom headers
- NDJSON trailing newline for spec compliance
- Skip-and-continue on per-message serialization failure (ndjson/json_array)
- MAX_CONSECUTIVE_FAILURES=3 threshold in individual/raw modes
- Direct simd_json→serde_json structural conversion (ported from ES sink)
- Verbose consume() log downgraded to debug level
- Explicit error on response body read failure
- Empty URL validation with Error::InitError
- UUID format documented as non-RFC-4122
- Contradictory config warnings (Raw+metadata, GET/HEAD+batch)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
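
A sketch of the Retry-After handling described above, capped at the maximum retry delay. It also folds in the warnings that later review rounds mention for non-integer and non-UTF-8 values. `parse_retry_after` is a hypothetical name, the header is passed as raw bytes rather than a reqwest type, and `eprintln!` stands in for whatever logging the connector uses.

```rust
use std::time::Duration;

/// Hypothetical parser for a Retry-After header given as integer seconds.
/// Returns None (after a warning) for non-UTF-8 or non-integer values,
/// such as the HTTP-date form, which the PR lists as deferred.
pub fn parse_retry_after(raw: &[u8], max: Duration) -> Option<Duration> {
    let text = match std::str::from_utf8(raw) {
        Ok(text) => text.trim(),
        Err(_) => {
            eprintln!("warn: Retry-After header is not valid UTF-8, ignoring");
            return None;
        }
    };
    match text.parse::<u64>() {
        // Never wait longer than the configured maximum retry delay.
        Ok(secs) => Some(Duration::from_secs(secs).min(max)),
        Err(_) => {
            eprintln!("warn: Retry-After value {text:?} is not an integer, ignoring");
            None
        }
    }
}
```

When this returns None, a caller would presumably fall back to the computed backoff delay.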
…CR round 2)

Round 2 double-review findings:
- CRITICAL: JSON array batch serialization error now logs batch size context
- HIGH: success_status_codes validated non-empty in open() (prevents retry storms)
- HIGH: Partial delivery logs separate HTTP failures vs serialization errors
- HIGH: saturating_sub prevents usize underflow in remaining-messages calc
- MEDIUM: Skip count logged on ndjson/json_array failure path (not just success)
- MEDIUM: payload_to_json documented as defensive (all current variants infallible)
- LOW: Raw/FlatBuffer match arms merged in payload_to_json

Deferred (documented, not bugs):
- Retry-After HTTP-date format (needs httpdate dependency, out of scope for v1)
- Payload::Proto raw mode semantic inconsistency (follows SDK try_into_vec behavior)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Example configuration with all plugin_config fields documented.
Follows the MongoDB/PostgreSQL sink config.toml pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…known limitations

Follows MongoDB sink README structure: Try It, Quick Start, Configuration,
Batch Modes, Retry Strategy, Example Configs, Known Limitations.

Documents 3 deferred review findings and 2 runtime issues as known limitations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests cover:
- Config resolution (defaults, overrides, backoff clamp, invalid duration fallback)
- Duration parsing (valid strings, None fallback)
- HttpMethod serde (uppercase serialize/deserialize, invalid rejection)
- BatchMode serde (snake_case serialization)
- Content-type mapping for all 4 batch modes
- UUID formatting (zero, max, specific grouping)
- UTF-8-safe truncation (short, long, multibyte)
- Payload conversion (JSON, Text, Raw, FlatBuffer, Proto)
- Metadata envelope (with/without metadata, checksum, origin_timestamp)
- Retry delay computation (base, exponential backoff, max cap)
- Transient status classification (429/5xx vs 4xx)
- owned_value_to_serde_json (null, bool, int, f64, NaN, infinity, nested)
- TOML config deserialization (minimal, full, invalid method/batch_mode)
- open() validation (empty URL, invalid URL, empty success_status_codes, valid)

Adds toml as dev-dependency for config deserialization tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
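
For readers unfamiliar with the "UUID formatting" tests listed above, one plausible way to render a u128 message ID in 8-4-4-4-12 grouping is shown below. The function name, lowercase hex, and most-significant-digit-first ordering are assumptions; as the review notes say, the result is UUID-shaped but not an RFC 4122 UUID.

```rust
/// Hypothetical formatter: renders a u128 as 8-4-4-4-12 lowercase hex groups.
/// The output only looks like a UUID; no version or variant bits are set.
pub fn format_u128_as_uuid(id: u128) -> String {
    // Zero-padded to exactly 32 hex digits, so the slices below are in range.
    let hex = format!("{id:032x}");
    format!(
        "{}-{}-{}-{}-{}",
        &hex[0..8],
        &hex[8..12],
        &hex[12..16],
        &hex[16..20],
        &hex[20..32]
    )
}
```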
Tests:
- Add iggy_timestamp assertion to metadata envelope test
- Add negative assertions for absent checksum/origin_timestamp by default
- Strengthen multibyte truncation test with concrete expected value
- Add raw mode + include_metadata invariant test (47 tests total)

Docs:
- Fix README retry sequence (attempt 1 is retry_delay, not immediate)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
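
The multibyte truncation test mentioned above guards against slicing a string in the middle of a code point. A minimal sketch of UTF-8-safe truncation, using a hypothetical `truncate_utf8` helper rather than the connector's own function:

```rust
/// Hypothetical helper: returns the longest prefix of `s` that is at most
/// `max_bytes` long and ends on a character boundary.
pub fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

Slicing with `&s[..max_bytes]` directly would panic whenever `max_bytes` lands inside a multibyte character, which is why the boundary walk is needed when logging truncated response bodies.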
mlevkov and others added 15 commits March 12, 2026 14:34
Add 6 end-to-end integration tests covering all batch modes and metadata
behavior of the HTTP sink connector. Tests use WireMock container as a
programmable HTTP endpoint and verify received requests via admin API.

Tests:
- individual_json_messages_delivered_as_separate_posts
- ndjson_messages_delivered_as_single_request
- json_array_messages_delivered_as_single_request
- raw_binary_messages_delivered_without_envelope
- metadata_disabled_sends_bare_payload
- individual_messages_have_sequential_offsets

Fixture variants: Individual, NDJSON, JsonArray, Raw, NoMetadata
Following MongoDB sink integration test patterns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CRITICAL fixes:
- C1: SSRF prevention — URL scheme validation (http/https only) in open()
- C2: Header validation — reject invalid header names/values at init, not per-request
- C3: O(1) retry clones — send_with_retry takes bytes::Bytes instead of Vec<u8>

HIGH fixes:
- H1: Content-Type deduplication — filter user-supplied Content-Type in request_builder()
- H3: Skipped message accounting — abort path now records skipped messages in errors_count

TEST fixes:
- T1: Content-Type assertions use expect() instead of silent if-let skip
- T2: Exact count assertions (==) instead of >= that masks over-delivery
- T3: Offset test checks contiguous ordering, not absolute base-0 assumption
- T4: New test for consume() before open() returns InitError

DOCS fixes:
- D1: Disambiguate sink.rs:585 → runtime/src/sink.rs:585
- D2: send_individual doc mentions MAX_CONSECUTIVE_FAILURES abort behavior

9 new unit tests (47 → 56), all passing, zero clippy warnings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
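
The scheme check in C1 can be illustrated with a std-only sketch. This is a simplification under stated assumptions: the real connector presumably relies on a proper URL parser, and `validate_url_scheme` with its error strings is hypothetical.

```rust
/// Hypothetical std-only check that a URL uses http or https.
/// A real implementation would parse the URL fully; this only inspects
/// the text before "://".
pub fn validate_url_scheme(url: &str) -> Result<(), String> {
    let url = url.trim();
    if url.is_empty() {
        return Err("url must not be empty".to_string());
    }
    let Some((scheme, rest)) = url.split_once("://") else {
        return Err(format!("url {url:?} has no scheme"));
    };
    if rest.is_empty() {
        return Err(format!("url {url:?} has no host"));
    }
    if scheme.eq_ignore_ascii_case("http") || scheme.eq_ignore_ascii_case("https") {
        Ok(())
    } else {
        Err(format!("unsupported URL scheme {scheme:?}: only http and https are allowed"))
    }
}
```

Rejecting schemes such as `file` or `ftp` at initialization keeps a misconfigured or malicious config from pointing the sink at non-HTTP resources.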
7 findings from 4-agent double-review:

R2-1 (HIGH): WireMockRequest::header() now actually case-insensitive per RFC 7230
R2-2 (HIGH): Offset test uses explicit unwrap_or_else instead of silent filter_map
R2-3 (MEDIUM): URL parse error now includes the actual parse error message
R2-4 (MEDIUM): Abort accounting uses saturating_sub + debug_assert for defensive safety
R2-5 (MEDIUM): open() warns when user Content-Type header will be overridden by batch_mode
R2-6 (MEDIUM): Batch modes (ndjson/json_array) now count all undelivered messages in errors_count
R2-7 (LOW): Content-Type test improved with set-based assertion and documented limitation

Deferred (pre-existing, not regressions):
- parse_duration silent fallback (requires SDK contract change)
- Runtime discards consume() errors (upstream issue apache#2927)
- Retry-After HTTP-date format (nice-to-have)
- NaN/Infinity to null (documented, matches ES sink)

56 unit tests passing, zero clippy warnings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
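
The abort accounting in R2-4, together with the MAX_CONSECUTIVE_FAILURES threshold from an earlier commit, can be sketched like this. `DeliveryStats` and the closure-based `send_individual` signature are hypothetical; the invariant that delivered plus errors equals the batch size is the one the PR description states.

```rust
pub const MAX_CONSECUTIVE_FAILURES: usize = 3;

/// Hypothetical counters mirroring the names used in the PR description.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct DeliveryStats {
    pub messages_delivered: usize,
    pub errors_count: usize,
}

/// Sends messages one at a time through `send` (true means delivered).
/// After MAX_CONSECUTIVE_FAILURES failures in a row the rest of the batch
/// is skipped, and the skipped messages are counted as errors.
pub fn send_individual<F>(messages: &[Vec<u8>], mut send: F) -> DeliveryStats
where
    F: FnMut(&[u8]) -> bool,
{
    let total = messages.len();
    let mut stats = DeliveryStats::default();
    let mut consecutive_failures = 0usize;

    for (index, message) in messages.iter().enumerate() {
        if send(message) {
            stats.messages_delivered += 1;
            consecutive_failures = 0;
            continue;
        }
        stats.errors_count += 1;
        consecutive_failures += 1;
        if consecutive_failures >= MAX_CONSECUTIVE_FAILURES {
            // saturating_sub guards against underflow if the bookkeeping
            // above were ever changed incorrectly.
            let skipped = total.saturating_sub(index + 1);
            stats.errors_count += skipped;
            break;
        }
    }

    // Invariant from the PR: every message is either delivered or an error.
    debug_assert_eq!(stats.messages_delivered + stats.errors_count, total);
    stats
}
```

For a batch of 10 where only the first two sends succeed, this sketch makes five send attempts, then reports 2 delivered and 8 errors (3 failed attempts plus 5 skipped).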
…yment patterns

New sections:
- Use Cases: webhook delivery, REST API ingestion, serverless triggers,
  IoT relay, multi-service fan-out, observability pipeline
- Authentication: Bearer, API key, Basic auth, multi-header, limitations
  (no OAuth2 refresh, no SigV4, no mTLS)
- Deployment Patterns: single destination/multi-topic, multi-destination
  (one connector per destination), fan-out (same topic to multiple
  endpoints via separate consumer groups), Docker/container deployment,
  environment variable overrides for secrets
- Updated Known Limitations: added per-topic routing, OAuth2, env var
  expansion; linked upstream issues apache#2927 and apache#2928

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Configure reqwest client with tcp_keepalive(30s) and pool_idle_timeout(90s)
to detect dead connections behind cloud load balancers and clean up stale
idle connections. Add Performance Considerations section to README covering
batch mode selection, memory implications, connection pooling, and retry impact.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add connector_multi_topic_stream seed function that creates one stream with
two topics. Add HttpSinkMultiTopicFixture that subscribes to both topics via
the STREAMS_0_TOPICS env var. The test sends messages to each topic and verifies
all arrive at WireMock with correct iggy_topic metadata, demonstrating the
multi-topic single-connector deployment pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Explain what "deploying multiple instances" means tactically — each instance
is a separate OS process with its own config directory, not a config option
within one process. Add a clear table showing which deployment patterns are
achievable today vs. not, and annotate each deployment pattern section with
its achievability status.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tails

Add links to runtime source code (sink.rs, sdk/src/sink.rs) explaining how
the connector runtime spawns one task per topic, uses DashMap for plugin
instance multiplexing, and calls consume() sequentially. Expand connection
pooling section with reqwest client sharing semantics, TCP keep-alive
rationale for cloud LB idle timeouts, and cross-process pool isolation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…structure

Clarify that the connector does not require any particular message structure
on input — it receives raw bytes from the Iggy runtime. The metadata envelope
is added by the sink on the way out, not expected on the way in. Includes
ASCII flow diagram, schema interpretation table, and guidance for publishing
existing structs in any serialization format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lidation, docs

Address 15 findings from 4-agent code review (CR round 3):

CRITICAL:
- C1: Add errors_count for payload-size-exceeded in ndjson/json_array batch modes

HIGH:
- H1: Remove HTTP-sink-specific constants from shared harness (seeds.rs),
  create second topic inline in multi-topic integration test
- H2: Add errors_count for json_array whole-batch serialization failure
- H3: Replace fragile line-number references with function names in README

MEDIUM:
- M1: Prevent panic in compute_retry_delay on f64 overflow (extreme backoff)
- M2: Validate status codes in open() — reject codes outside 100-599
- M3: Fix retry math in README (3 attempts not 4, include timeout)
- M4: Fix GCP timeout comment (60-350s -> AWS ALB ~60s, GCP ~600s)
- M5: Remove specific RSS claim from README
- M6: Clarify FFI boundary in consume() error log and README
- M7: Warn on non-integer Retry-After header instead of silently ignoring
- M8: Remove unused dashmap/once_cell direct dependencies
- M9: Replace magic string match arms with constants in integration test

LOW:
- L1: Extract shared send_batch_body() helper from ndjson/json_array
- L2: Add last_success_timestamp to close() stats log
- L3: Add credential placeholder warning comment in config.toml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…clarity

Change send_batch_body parameter from Vec<u8> to Bytes — makes the
zero-copy intent explicit and idiomatic. Callers wrap with Bytes::from()
at the call site after payload size checks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address 6 findings from double-review round 4:

F1 (HIGH): Narrow status code validation from 100-599 to 200-599, rejecting
    HTTP 1xx informational codes that are not valid terminal response codes.
F2 (HIGH): Warn on non-UTF-8 Retry-After header values instead of
    silently dropping them via .to_str().ok().
F3 (HIGH): Add debug_assert!(count > 0) in send_batch_body() for
    defense-in-depth against empty batch calls.
F4 (MEDIUM): Replace line number reference (runtime/src/sink.rs:585)
    with function name (process_messages()) in consume() doc comment.
F5 (MEDIUM): Clarify README retry labels — "Initial request" + "Retry 1/2/3"
    instead of ambiguous "Attempt 1/2/3".
F6 (MEDIUM): Warn in constructor when retry_delay > max_retry_delay,
    since all delays will be silently capped.

New test: given_informational_status_code_should_fail_open (60 total).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Apply rigorous test documentation standards to all 7 integration tests:

Module-level documentation (~130 lines):
- Connector architecture diagram (test code → runtime → sink → WireMock)
- Runtime model explanation (1 process = 1 config = 1 plugin, per-topic tasks)
- What each test validates (7-test summary)
- Full-stack infrastructure details (iggy-server, runtime, WireMock, fixtures)
- Fixture architecture and env var override pattern
- Running instructions with prerequisites
- Success criteria, known limitations, related documentation
- Test history with code review changes

Per-test documentation (40-65 lines each):
- Purpose, Behavior Under Test, Why This Matters
- Numbered Test Flow steps
- Key Validations with rationale
- Related Code with function names (not line numbers)
- Test History where applicable (multi-topic H1/M9 changes)

Inline commentary:
- Step comments explaining each phase of the test
- Assertion messages with expected vs actual context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix missing HttpSinkMultiTopicFixture re-export in fixtures/mod.rs that
caused E0432 + cascading E0282 type inference errors. Remove dead
re-exports (HttpSinkWireMockContainer, WireMockRequest) from http/mod.rs.
Add #[allow(dead_code)] to reset_requests() test utility. Apply rustfmt
across lib.rs and http_sink.rs integration tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

codecov bot commented Mar 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.07%. Comparing base (261d255) to head (ec179a9).

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2925      +/-   ##
============================================
- Coverage     70.08%   70.07%   -0.01%     
  Complexity      776      776              
============================================
  Files          1028     1028              
  Lines         85279    85279              
  Branches      62653    62663      +10     
============================================
- Hits          59771    59763       -8     
+ Misses        22980    22979       -1     
- Partials       2528     2537       +9     
| Flag | Coverage Δ |
|------|------------|
| csharp | 67.47% <ø> (-0.15%) ⬇️ |
| go | 36.37% <ø> (ø) |
| java | 56.26% <ø> (ø) |
| node | 91.28% <ø> (-0.17%) ⬇️ |
| python | 81.43% <ø> (ø) |

see 6 files with indirect coverage changes
