feat(connectors): add generic HTTP sink connector by mlevkov · Pull Request #2925 · apache/iggy

mlevkov · 2026-03-12T20:22:19Z

Summary

Adds a generic HTTP sink connector that delivers consumed Iggy stream messages to any HTTP endpoint — webhooks, Lambda functions, REST APIs, or SaaS integrations.

4 batch modes: individual (one request per message), ndjson (newline-delimited), json_array (single array), raw (bytes)
Exponential backoff retry with configurable delay, multiplier, max delay, and Retry-After header support
Flexible metadata: optional Iggy envelope (stream/topic/offset/timestamp), checksum, origin timestamp
Connection pooling: reqwest client with TCP keep-alive (30s), pool idle timeout (90s), configurable max connections
Health checks: opt-in startup probe (HEAD/GET/OPTIONS) with graceful degradation
TLS: optional danger_accept_invalid_certs for dev environments
Custom headers: arbitrary HTTP headers for auth tokens, API keys, routing

Files Added (4,233 lines)

File	Lines	Purpose
`sinks/http_sink/src/lib.rs`	2,062	Core implementation — types, config, Sink trait, retry, batch modes, 60 unit tests
`sinks/http_sink/README.md`	810	Usage guide, config reference, runtime model, deployment patterns, message flow
`sinks/http_sink/Cargo.toml`	48	Crate manifest
`sinks/http_sink/config.toml`	90	Example connector configuration
`fixtures/http/container.rs`	238	Docker test container with WireMock
`fixtures/http/sink.rs`	216	Test fixtures (single-topic, multi-topic, batch mode variants)
`http/http_sink.rs`	662	7 integration tests (delivery, metadata, batch modes, multi-topic)
Other fixtures/config	64	Module files, WireMock mappings, test config

Architecture

Iggy Stream → [Runtime polls topic] → consume(messages) → [batch mode] → HTTP endpoint
                                            │
                                    ┌───────┴────────┐
                                    │  individual    │ → 1 request per message
                                    │  ndjson        │ → all messages, newline-delimited
                                    │  json_array    │ → all messages, JSON array
                                    │  raw           │ → 1 request per message, raw bytes
                                    └────────────────┘
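
The four branches in the diagram can be sketched as a single body-building function. This is a minimal illustration, not the connector's code: `build_bodies` is a hypothetical helper, and it assumes every message has already been serialized to a compact JSON string (or, for raw mode, that the string stands in for the raw bytes).

```rust
/// Batch modes named in the diagram above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BatchMode {
    Individual,
    Ndjson,
    JsonArray,
    Raw,
}

/// Hypothetical helper: returns one request body per HTTP request that
/// would be sent for the given batch. `messages` are assumed to be
/// already-serialized payloads, one per consumed message.
pub fn build_bodies(mode: BatchMode, messages: &[String]) -> Vec<Vec<u8>> {
    if messages.is_empty() {
        return Vec::new();
    }
    match mode {
        // One request per message.
        BatchMode::Individual | BatchMode::Raw => {
            messages.iter().map(|m| m.as_bytes().to_vec()).collect()
        }
        // One request: each message on its own line, trailing newline included.
        BatchMode::Ndjson => {
            let mut body = String::new();
            for m in messages {
                body.push_str(m);
                body.push('\n');
            }
            vec![body.into_bytes()]
        }
        // One request: all messages wrapped in a single JSON array.
        BatchMode::JsonArray => {
            let body = format!("[{}]", messages.join(","));
            vec![body.into_bytes()]
        }
    }
}
```

The number of returned bodies equals the number of HTTP requests per batch: one per message for individual and raw, one in total for ndjson and json_array.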

Code Review History

4 rounds of automated review with 4 specialized agents each:

Round	Agents	Findings	Fixed
Round 1	code-reviewer, silent-failure-hunter, comment-analyzer, code-simplifier	12	12
Round 2	Same 4 agents (follow-up)	7	7
Round 3	Same 4 agents (post-feature additions)	17	15 (2 deferred)
Round 4	Same 4 agents (double-review follow-up)	6	6
Total		42	40

Key fixes across rounds:

Error accounting correctness (errors_count + messages_delivered = total for all code paths)
Status code validation (200-599 range, rejects 1xx informational codes)
Overflow protection in retry delay computation
Non-UTF-8 Retry-After header warning
Unused dependency removal (dashmap, once_cell — re-exported by SDK)
Shared send_batch_body() helper eliminating duplication
README: function name references (not line numbers), accurate retry math, runtime model docs
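
The error accounting invariant listed above (errors_count + messages_delivered = total) can be illustrated with a small sketch of per-message delivery that aborts after repeated failures. The `DeliveryStats` struct and the `send_individual` signature are assumptions made for illustration; the threshold of 3 is taken from the MAX_CONSECUTIVE_FAILURES note in the commit history.

```rust
/// Threshold assumed from the MAX_CONSECUTIVE_FAILURES=3 note in the commits.
const MAX_CONSECUTIVE_FAILURES: usize = 3;

/// Hypothetical counters mirroring the names used in the PR description.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct DeliveryStats {
    pub messages_delivered: usize,
    pub errors_count: usize,
}

/// Sends `total` messages one by one through `send` (true = delivered).
/// Aborts after MAX_CONSECUTIVE_FAILURES failures in a row, and counts the
/// messages that were never attempted as errors so the totals still add up.
pub fn send_individual<F>(total: usize, mut send: F) -> DeliveryStats
where
    F: FnMut(usize) -> bool,
{
    let mut stats = DeliveryStats::default();
    let mut consecutive_failures = 0;
    for index in 0..total {
        if send(index) {
            stats.messages_delivered += 1;
            consecutive_failures = 0;
        } else {
            stats.errors_count += 1;
            consecutive_failures += 1;
            if consecutive_failures >= MAX_CONSECUTIVE_FAILURES {
                // saturating_sub guards against underflow in the remainder.
                let attempted = index + 1;
                stats.errors_count += total.saturating_sub(attempted);
                break;
            }
        }
    }
    stats
}
```

Whichever path is taken, `messages_delivered + errors_count` equals `total` when the function returns.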

Deferred (tracked in issues)

L4: Structured error type enum replacing string-based Error::HttpRequestFailed — #2927
D1: Expose internal metrics (errors_count, retries_count) via runtime health API — #2928

Test Plan

60 unit tests covering config parsing, validation, serialization, retry logic, batch modes, edge cases
7 integration tests with WireMock in Docker:
- single_json_message_delivered_with_metadata — basic delivery + envelope verification
- metadata_fields_respect_config — include_checksum, include_origin_timestamp toggles
- ndjson_batch_delivers_all_messages — NDJSON batch mode
- json_array_batch_delivers_all_messages — JSON array batch mode
- individual_mode_sends_separate_requests — per-message delivery
- raw_mode_delivers_bytes — raw byte passthrough
- multi_topic_messages_delivered_with_correct_topic_metadata — 2 topics, metadata accuracy
cargo clippy -p iggy_connector_http_sink -- -D warnings — 0 warnings
cargo clippy -p integration -- -D warnings — 0 warnings

…nk impl
Add generic HTTP sink connector for delivering consumed messages to any HTTP endpoint (webhooks, REST APIs, serverless functions). This commit establishes the crate structure, config types, and stub trait implementation.
- HttpMethod enum (Get, Head, Post, Put, Patch, Delete) with Default=Post
- BatchMode enum (Individual, Ndjson, JsonArray, Raw) with Default=Individual
- HttpSinkConfig with 20 fields covering retry, TLS, batching, metadata
- HttpSink struct with Option<Client> (built in open(), not new())
- Stub Sink trait impl (open/consume/close) with TODO markers for Commit 2
- Document runtime consume() Result discard (upstream sink.rs:585 bug)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Full implementation of the HTTP sink connector's Sink trait:
- open(): Build reqwest::Client from config (timeout, TLS, pool size), optional health check with configurable HTTP method.
- consume(): Four batch modes: individual (partial delivery on failure), ndjson (newline-delimited), json_array (single array), raw (bytes). Metadata envelope wrapping with UUID-formatted u128 IDs, base64 for binary payloads (Raw/Proto/FlatBuffer). Configurable success status codes, checksum and origin timestamp inclusion.
- Retry: Exponential backoff with configurable multiplier and cap. Transient errors (429/500/502/503/504) and network errors retry; non-transient errors fail immediately. Respects Retry-After header on HTTP 429.
- close(): Log cumulative stats (requests, delivered, errors, retries).
- Config resolution: All Option fields resolved to concrete values in new() following MongoDB sink pattern. Duration strings parsed with humantime.
- UTF-8-safe response truncation in logs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
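
The retry policy described in this commit can be sketched with the standard library alone. The function names and signatures below are illustrative assumptions, not the connector's API. The sketch treats retry 1 as waiting the base delay, matching the README fix noted in a later commit, and it falls back to the cap when the floating-point math overflows.

```rust
use std::time::Duration;

/// Status codes the commit message lists as transient (retryable).
pub fn is_transient_status(status: u16) -> bool {
    matches!(status, 429 | 500 | 502 | 503 | 504)
}

/// Delay before retry number `retry` (1-based):
/// base * multiplier^(retry - 1), capped at `max_delay`.
pub fn compute_retry_delay(
    base: Duration,
    multiplier: f64,
    max_delay: Duration,
    retry: u32,
) -> Duration {
    let exponent = retry.saturating_sub(1).min(i32::MAX as u32) as i32;
    let secs = base.as_secs_f64() * multiplier.powi(exponent);
    // NaN or infinity from extreme settings falls back to the cap instead of
    // reaching Duration::from_secs_f64, which panics on such input.
    if !secs.is_finite() || secs >= max_delay.as_secs_f64() {
        return max_delay;
    }
    Duration::from_secs_f64(secs.max(0.0))
}
```

With a 1 s base, a multiplier of 2.0, and a 30 s cap, the first three retries wait 1 s, 2 s, and 4 s, and every retry from the sixth onward waits 30 s.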

Addresses all findings from 4-agent code review:
- Cap Retry-After to max_retry_delay, use reqwest::header::RETRY_AFTER
- Health check uses configured success_status_codes, applies custom headers
- NDJSON trailing newline for spec compliance
- Skip-and-continue on per-message serialization failure (ndjson/json_array)
- MAX_CONSECUTIVE_FAILURES=3 threshold in individual/raw modes
- Direct simd_json→serde_json structural conversion (ported from ES sink)
- Verbose consume() log downgraded to debug level
- Explicit error on response body read failure
- Empty URL validation with Error::InitError
- UUID format documented as non-RFC-4122
- Contradictory config warnings (Raw+metadata, GET/HEAD+batch)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
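
The first item, capping Retry-After at max_retry_delay, might look like the sketch below. `retry_after_delay` is a hypothetical helper name. Only the integer-seconds form of the header is handled, since the PR lists the HTTP-date form as deferred.

```rust
use std::time::Duration;

/// Parses a Retry-After value given as whole seconds and caps it at
/// `max_retry_delay`. Returns None for any other form, so the caller can
/// fall back to its computed backoff delay.
pub fn retry_after_delay(header_value: &str, max_retry_delay: Duration) -> Option<Duration> {
    let seconds: u64 = header_value.trim().parse().ok()?;
    Some(Duration::from_secs(seconds).min(max_retry_delay))
}
```

Capping keeps a misbehaving or hostile endpoint from stalling the sink with an arbitrarily large Retry-After value.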

…CR round 2)
Round 2 double-review findings:
- CRITICAL: JSON array batch serialization error now logs batch size context
- HIGH: success_status_codes validated non-empty in open() (prevents retry storms)
- HIGH: Partial delivery logs separate HTTP failures vs serialization errors
- HIGH: saturating_sub prevents usize underflow in remaining-messages calc
- MEDIUM: Skip count logged on ndjson/json_array failure path (not just success)
- MEDIUM: payload_to_json documented as defensive (all current variants infallible)
- LOW: Raw/FlatBuffer match arms merged in payload_to_json
Deferred (documented, not bugs):
- Retry-After HTTP-date format (needs httpdate dependency, out of scope for v1)
- Payload::Proto raw mode semantic inconsistency (follows SDK try_into_vec behavior)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Example configuration with all plugin_config fields documented. Follows the MongoDB/PostgreSQL sink config.toml pattern. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…known limitations Follows MongoDB sink README structure: Try It, Quick Start, Configuration, Batch Modes, Retry Strategy, Example Configs, Known Limitations. Documents 3 deferred review findings and 2 runtime issues as known limitations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Tests cover:
- Config resolution (defaults, overrides, backoff clamp, invalid duration fallback)
- Duration parsing (valid strings, None fallback)
- HttpMethod serde (uppercase serialize/deserialize, invalid rejection)
- BatchMode serde (snake_case serialization)
- Content-type mapping for all 4 batch modes
- UUID formatting (zero, max, specific grouping)
- UTF-8-safe truncation (short, long, multibyte)
- Payload conversion (JSON, Text, Raw, FlatBuffer, Proto)
- Metadata envelope (with/without metadata, checksum, origin_timestamp)
- Retry delay computation (base, exponential backoff, max cap)
- Transient status classification (429/5xx vs 4xx)
- owned_value_to_serde_json (null, bool, int, f64, NaN, infinity, nested)
- TOML config deserialization (minimal, full, invalid method/batch_mode)
- open() validation (empty URL, invalid URL, empty success_status_codes, valid)
Adds toml as dev-dependency for config deserialization tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
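
As an illustration of the UUID formatting case, a u128 message ID can be rendered in the familiar 8-4-4-4-12 hex grouping. The function name and the lowercase output are assumptions. As the commits note, the result only looks like a UUID and is not RFC 4122 compliant, because no version or variant bits are set.

```rust
/// Renders a u128 as 32 zero-padded hex digits grouped 8-4-4-4-12.
/// Lowercase output is an assumption of this sketch.
pub fn format_u128_as_uuid(id: u128) -> String {
    let hex = format!("{:032x}", id);
    format!(
        "{}-{}-{}-{}-{}",
        &hex[0..8],
        &hex[8..12],
        &hex[12..16],
        &hex[16..20],
        &hex[20..32]
    )
}
```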

Tests:
- Add iggy_timestamp assertion to metadata envelope test
- Add negative assertions for absent checksum/origin_timestamp by default
- Strengthen multibyte truncation test with concrete expected value
- Add raw mode + include_metadata invariant test (47 tests total)
Docs:
- Fix README retry sequence (attempt 1 is retry_delay, not immediate)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
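
The multibyte truncation case matters because slicing a Rust string in the middle of a multi-byte character panics. A minimal sketch of a UTF-8-safe truncation follows, assuming the limit is expressed in bytes; `truncate_utf8` is a hypothetical name.

```rust
/// Returns the longest prefix of `text` that is at most `max_bytes` long
/// and ends on a character boundary.
pub fn truncate_utf8(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}
```

Asserting a concrete expected value, such as `"日"` for `truncate_utf8("日本語", 4)`, catches off-by-one errors that a length-only check would miss.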

Add 6 end-to-end integration tests covering all batch modes and metadata behavior of the HTTP sink connector. Tests use a WireMock container as a programmable HTTP endpoint and verify received requests via the admin API.
Tests:
- individual_json_messages_delivered_as_separate_posts
- ndjson_messages_delivered_as_single_request
- json_array_messages_delivered_as_single_request
- raw_binary_messages_delivered_without_envelope
- metadata_disabled_sends_bare_payload
- individual_messages_have_sequential_offsets
Fixture variants: Individual, NDJSON, JsonArray, Raw, NoMetadata
Following MongoDB sink integration test patterns.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CRITICAL fixes:
- C1: SSRF prevention via URL scheme validation (http/https only) in open()
- C2: Header validation that rejects invalid header names/values at init, not per-request
- C3: O(1) retry clones, since send_with_retry takes bytes::Bytes instead of Vec<u8>
HIGH fixes:
- H1: Content-Type deduplication by filtering user-supplied Content-Type in request_builder()
- H3: Skipped message accounting, where the abort path now records skipped messages in errors_count
TEST fixes:
- T1: Content-Type assertions use expect() instead of silent if-let skip
- T2: Exact count assertions (==) instead of >= that masks over-delivery
- T3: Offset test checks contiguous ordering, not absolute base-0 assumption
- T4: New test for consume() before open() returns InitError
DOCS fixes:
- D1: Disambiguate sink.rs:585 → runtime/src/sink.rs:585
- D2: send_individual doc mentions MAX_CONSECUTIVE_FAILURES abort behavior
9 new unit tests (47 → 56), all passing, zero clippy warnings.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
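
The scheme check in C1 can be approximated with the standard library alone. This is a deliberately simplified sketch with a hypothetical function name and error strings. It only inspects the text before `://`, and a real implementation would more likely delegate to a proper URL parser.

```rust
/// Accepts only http:// and https:// URLs (scheme compared case-insensitively).
/// Simplified: no host, port, or path validation is attempted.
pub fn validate_url_scheme(url: &str) -> Result<(), String> {
    let trimmed = url.trim();
    if trimmed.is_empty() {
        return Err(String::from("url must not be empty"));
    }
    let (scheme, rest) = trimmed
        .split_once("://")
        .ok_or_else(|| format!("url '{}' has no scheme", trimmed))?;
    if rest.is_empty() {
        return Err(format!("url '{}' has no host", trimmed));
    }
    if scheme.eq_ignore_ascii_case("http") || scheme.eq_ignore_ascii_case("https") {
        Ok(())
    } else {
        Err(format!("unsupported URL scheme '{}': expected http or https", scheme))
    }
}
```

Rejecting schemes such as `file` or `ftp` when the sink is opened means a bad configuration fails once at startup rather than on every request.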

7 findings from 4-agent double-review:
- R2-1 (HIGH): WireMockRequest::header() now actually case-insensitive per RFC 7230
- R2-2 (HIGH): Offset test uses explicit unwrap_or_else instead of silent filter_map
- R2-3 (MEDIUM): URL parse error now includes the actual parse error message
- R2-4 (MEDIUM): Abort accounting uses saturating_sub + debug_assert for defensive safety
- R2-5 (MEDIUM): open() warns when user Content-Type header will be overridden by batch_mode
- R2-6 (MEDIUM): Batch modes (ndjson/json_array) now count all undelivered messages in errors_count
- R2-7 (LOW): Content-Type test improved with set-based assertion and documented limitation
Deferred (pre-existing, not regressions):
- parse_duration silent fallback (requires SDK contract change)
- Runtime discards consume() errors (upstream issue apache#2927)
- Retry-After HTTP-date format (nice-to-have)
- NaN/Infinity to null (documented, matches ES sink)
56 unit tests passing, zero clippy warnings.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
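
The R2-1 fix concerns header lookup in the test fixture. A case-insensitive lookup can be sketched as below. The slice-of-pairs representation and the `find_header` name are assumptions, and the actual WireMockRequest type is not shown in this PR page.

```rust
/// Looks up a header value by name, ignoring ASCII case, because HTTP
/// header field names are case-insensitive.
pub fn find_header<'a>(headers: &'a [(String, String)], name: &str) -> Option<&'a str> {
    headers
        .iter()
        .find(|(key, _)| key.eq_ignore_ascii_case(name))
        .map(|(_, value)| value.as_str())
}
```

The same comparison is useful for H1-style Content-Type deduplication, where a user-supplied `content-type` must be recognized regardless of how it is capitalized.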

…yment patterns
New sections:
- Use Cases: webhook delivery, REST API ingestion, serverless triggers, IoT relay, multi-service fan-out, observability pipeline
- Authentication: Bearer, API key, Basic auth, multi-header, limitations (no OAuth2 refresh, no SigV4, no mTLS)
- Deployment Patterns: single destination/multi-topic, multi-destination (one connector per destination), fan-out (same topic to multiple endpoints via separate consumer groups), Docker/container deployment, environment variable overrides for secrets
- Updated Known Limitations: added per-topic routing, OAuth2, env var expansion; linked upstream issues apache#2927 and apache#2928
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Configure reqwest client with tcp_keepalive(30s) and pool_idle_timeout(90s) to detect dead connections behind cloud load balancers and clean up stale idle connections. Add Performance Considerations section to README covering batch mode selection, memory implications, connection pooling, and retry impact. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add connector_multi_topic_stream seed function that creates one stream with two topics. Add HttpSinkMultiTopicFixture that subscribes to both topics via the STREAMS_0_TOPICS env var. The test sends messages to each topic and verifies all arrive at WireMock with correct iggy_topic metadata, demonstrating the multi-topic single-connector deployment pattern. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Explain what "deploying multiple instances" means tactically — each instance is a separate OS process with its own config directory, not a config option within one process. Add a clear table showing which deployment patterns are achievable today vs. not, and annotate each deployment pattern section with its achievability status. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…tails Add links to runtime source code (sink.rs, sdk/src/sink.rs) explaining how the connector runtime spawns one task per topic, uses DashMap for plugin instance multiplexing, and calls consume() sequentially. Expand connection pooling section with reqwest client sharing semantics, TCP keep-alive rationale for cloud LB idle timeouts, and cross-process pool isolation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…structure Clarify that the connector does not require any particular message structure on input — it receives raw bytes from the Iggy runtime. The metadata envelope is added by the sink on the way out, not expected on the way in. Includes ASCII flow diagram, schema interpretation table, and guidance for publishing existing structs in any serialization format. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…lidation, docs
Address 15 findings from 4-agent code review (CR round 3):
CRITICAL:
- C1: Add errors_count for payload-size-exceeded in ndjson/json_array batch modes
HIGH:
- H1: Remove HTTP-sink-specific constants from shared harness (seeds.rs), create second topic inline in multi-topic integration test
- H2: Add errors_count for json_array whole-batch serialization failure
- H3: Replace fragile line-number references with function names in README
MEDIUM:
- M1: Prevent panic in compute_retry_delay on f64 overflow (extreme backoff)
- M2: Validate status codes in open() by rejecting codes outside 100-599
- M3: Fix retry math in README (3 attempts not 4, include timeout)
- M4: Fix GCP timeout comment (60-350s -> AWS ALB ~60s, GCP ~600s)
- M5: Remove specific RSS claim from README
- M6: Clarify FFI boundary in consume() error log and README
- M7: Warn on non-integer Retry-After header instead of silently ignoring
- M8: Remove unused dashmap/once_cell direct dependencies
- M9: Replace magic string match arms with constants in integration test
LOW:
- L1: Extract shared send_batch_body() helper from ndjson/json_array
- L2: Add last_success_timestamp to close() stats log
- L3: Add credential placeholder warning comment in config.toml
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…clarity Change send_batch_body parameter from Vec<u8> to Bytes — makes the zero-copy intent explicit and idiomatic. Callers wrap with Bytes::from() at the call site after payload size checks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Address 6 findings from double-review round 4:
- F1 (HIGH): Narrow status code validation from 100-599 to 200-599, rejecting HTTP 1xx informational codes that are not valid terminal response codes.
- F2 (HIGH): Warn on non-UTF-8 Retry-After header values instead of silently dropping them via .to_str().ok().
- F3 (HIGH): Add debug_assert!(count > 0) in send_batch_body() for defense-in-depth against empty batch calls.
- F4 (MEDIUM): Replace line number reference (runtime/src/sink.rs:585) with function name (process_messages()) in consume() doc comment.
- F5 (MEDIUM): Clarify README retry labels: "Initial request" + "Retry 1/2/3" instead of ambiguous "Attempt 1/2/3".
- F6 (MEDIUM): Warn in constructor when retry_delay > max_retry_delay, since all delays will be silently capped.
New test: given_informational_status_code_should_fail_open (60 total).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
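
The narrowed validation from F1, combined with the earlier non-empty check on success_status_codes, could be sketched as follows. The function name and error messages are hypothetical.

```rust
/// Validates the configured success status codes: the list must be non-empty
/// and every code must be a terminal response code in 200-599. 1xx
/// informational codes are rejected.
pub fn validate_success_status_codes(codes: &[u16]) -> Result<(), String> {
    if codes.is_empty() {
        return Err(String::from("success_status_codes must not be empty"));
    }
    for &code in codes {
        if !(200..=599).contains(&code) {
            return Err(format!("invalid status code {}: expected 200-599", code));
        }
    }
    Ok(())
}
```

An empty list is rejected because no response could ever count as a success, which the round 2 notes describe as a source of retry storms.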

Apply rigorous test documentation standards to all 7 integration tests:
Module-level documentation (~130 lines):
- Connector architecture diagram (test code → runtime → sink → WireMock)
- Runtime model explanation (1 process = 1 config = 1 plugin, per-topic tasks)
- What each test validates (7-test summary)
- Full-stack infrastructure details (iggy-server, runtime, WireMock, fixtures)
- Fixture architecture and env var override pattern
- Running instructions with prerequisites
- Success criteria, known limitations, related documentation
- Test history with code review changes
Per-test documentation (40-65 lines each):
- Purpose, Behavior Under Test, Why This Matters
- Numbered Test Flow steps
- Key Validations with rationale
- Related Code with function names (not line numbers)
- Test History where applicable (multi-topic H1/M9 changes)
Inline commentary:
- Step comments explaining each phase of the test
- Assertion messages with expected vs actual context
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix missing HttpSinkMultiTopicFixture re-export in fixtures/mod.rs that caused E0432 + cascading E0282 type inference errors. Remove dead re-exports (HttpSinkWireMockContainer, WireMockRequest) from http/mod.rs. Add #[allow(dead_code)] to reset_requests() test utility. Apply rustfmt across lib.rs and http_sink.rs integration tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

codecov · 2026-03-13T06:07:35Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.07%. Comparing base (261d255) to head (ec179a9).

Additional details and impacted files

@@             Coverage Diff              @@
##             master    #2925      +/-   ##
============================================
- Coverage     70.08%   70.07%   -0.01%     
  Complexity      776      776              
============================================
  Files          1028     1028              
  Lines         85279    85279              
  Branches      62653    62663      +10     
============================================
- Hits          59771    59763       -8     
+ Misses        22980    22979       -1     
- Partials       2528     2537       +9

Flag	Coverage Δ
csharp	`67.47% <ø> (-0.15%)`	⬇️
go	`36.37% <ø> (ø)`
java	`56.26% <ø> (ø)`
node	`91.28% <ø> (-0.17%)`	⬇️
python	`81.43% <ø> (ø)`

Flags with carried forward coverage won't be shown.
see 6 files with indirect coverage changes


mlevkov and others added 8 commits March 12, 2026 13:21

docs(connectors): add HTTP sink example config.toml

81036b0


This was referenced Mar 12, 2026

bug(connectors): consume() return value discarded in sink runtime #2927

Open

bug(connectors): PollingMessages auto-commit commits offsets before sink processing #2928

Open

mlevkov and others added 15 commits March 12, 2026 14:34

style(http-sink): align ASCII architecture diagram in test docs

ec179a9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mlevkov wants to merge 23 commits into apache:master from mlevkov:feat/http-sink-connector
