docs(connectors): add Generic HTTP Sink page and update connectors table #41

Open
mlevkov wants to merge 1 commit into apache:main from mlevkov:add-generic-http-sink-docs

mlevkov commented Apr 26, 2026

Summary

  • Adds content/docs/connectors/sinks/http.mdx (~179 lines) — a curated subset of the upstream core/connectors/sinks/http_sink/README.md, with a <Callout> at the top pointing readers back to the canonical README for the full surface.
  • Inserts "http" in content/docs/connectors/sinks/meta.json between iceberg and stdout so the sidebar reflects the new page.
  • Adds Generic HTTP to the Sink row of the Available Connectors table in content/docs/connectors/introduction.mdx.

Naming and content-scope decisions (file slug, page title, table cell wording, what to include vs. summarize-and-link) are documented in #39, including the rationale for Generic HTTP (vs. plain HTTP) — the qualifier disambiguates a transport-level connector from the several sinks that already speak HTTP under the hood (Elasticsearch, Quickwit).

Test plan

  • npm run build passes locally — 77/77 static pages generated, no MDX/TS errors
  • /docs/connectors/sinks/http renders at HTTP 200 with all expected section headings (Configuration, Configuration Options, Batch Modes, Metadata Envelope, Authentication, Retry & Delivery Semantics, Example Configurations, Deployment & Performance, Known Limitations)
  • /docs/connectors/introduction#available-connectors shows Apache Iceberg, Generic HTTP, Stdout in the Sink row
  • Sidebar on any sinks/* page lists the new entry between iceberg and stdout
  • Fumadocs <Callout type="info"> renders at the top of the new page

Closes #39

Surfaces the Generic HTTP Sink connector (shipped in 0.8.0 via apache/iggy#2925)
on the docs site as a curated reference subset of the upstream README. Adds the
new sinks/http page with a Callout linking to the canonical README, inserts it
into the sidebar between iceberg and stdout, and lists "Generic HTTP" in the
Available Connectors table.

Closes apache#39

 |--------|----------------------------------------------------------------|
 | Source | PostgreSQL, Elasticsearch, Random |
-| Sink | PostgreSQL, MongoDB, Elasticsearch, Quickwit, Apache Iceberg, Stdout |
+| Sink | PostgreSQL, MongoDB, Elasticsearch, Quickwit, Apache Iceberg, Generic HTTP, Stdout |

Contributor

the table cell here says "Generic HTTP" but the page it links to has frontmatter title "HTTP Sink" and never uses the word "Generic" anywhere in the body. clicking through gives the reader two different names for the same thing. either rename the new page's frontmatter to something like "HTTP Sink (Generic)" and add a one-liner in the opening paragraph noting it's the transport-level connector (vs Elasticsearch/Quickwit which speak HTTP under the hood), or drop "Generic" from this row.


### Configuration Options

| Option | Type | Default | Description |

Contributor

this whole config table is a verbatim copy of the upstream core/connectors/sinks/http_sink/README.md. defaults, types, and the transient retry codes (429/500/502/503/504 a few sections down) are all hardcoded constants in lib.rs and will silently drift the moment that file changes. options: trim this to the 5-6 most-used knobs plus a clear pointer to the upstream README for the full list, or add a CI step that diffs this table against the upstream README on each build. note MAX_CONSECUTIVE_FAILURES = 3 is also a const but only surfaced in prose - if it changes upstream nothing flags it here.

personally, i prefer linking to the upstream README.
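
a rough sketch of what the CI-diff option could look like, in Python for brevity. `extract_option_rows` and `find_drift` are hypothetical helper names, and the sketch assumes both files keep the four-column `| Option | Type | Default | Description |` layout with the option name in backticks. it only compares Type and Default for options the docs page actually lists, so a curated subset still passes.

```python
import re

# Matches table rows whose first cell is a backticked option name,
# e.g. "| `retry_delay` | string | `1s` | Base delay between retries |".
ROW = re.compile(r"^\|\s*`(?P<name>[^`]+)`\s*\|(?P<rest>.*)\|\s*$")


def extract_option_rows(markdown: str) -> dict[str, list[str]]:
    """Map option name -> remaining cells (type, default, description)."""
    rows = {}
    for line in markdown.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows[match.group("name")] = [
                cell.strip() for cell in match.group("rest").split("|")
            ]
    return rows


def find_drift(docs_md: str, upstream_md: str) -> list[str]:
    """Report options whose type or default differs from the upstream table."""
    docs = extract_option_rows(docs_md)
    upstream = extract_option_rows(upstream_md)
    problems = []
    for name, cells in sorted(docs.items()):
        if name not in upstream:
            problems.append(f"{name}: not found upstream")
        elif cells[:2] != upstream[name][:2]:
            problems.append(
                f"{name}: docs {cells[:2]} != upstream {upstream[name][:2]}"
            )
    return problems
```

a build script could read both files, call `find_drift`, and exit non-zero when the list is non-empty. this would not catch constants that are only surfaced in prose, such as MAX_CONSECUTIVE_FAILURES.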

| Option | Type | Default | Description |
| ------ | ---- | ------- | ----------- |
| `url` | string | **required** | Target URL for HTTP requests |
| `method` | string | `POST` | HTTP method: `GET`, `HEAD`, `POST`, `PUT`, `PATCH`, `DELETE` |

Contributor

the method list includes GET and HEAD without flagging that those methods with non-individual batch modes produce a warning at runtime ("may be rejected by the server") since GET/HEAD don't conventionally carry bodies. either drop GET/HEAD from this list (rarely useful for a sink) or add a one-liner about the batch-mode interaction.
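
to make the interaction concrete, here is a minimal Python sketch of the check as described in this comment. `body_warning` is a hypothetical name and the message text is illustrative; only the "may be rejected by the server" wording comes from the runtime warning mentioned above.

```python
# Methods that conventionally do not carry a request body.
BODYLESS_METHODS = frozenset({"GET", "HEAD"})


def body_warning(method: str, batch_mode: str):
    """Return a warning string for risky method/batch-mode pairs, else None."""
    method = method.upper()
    if method in BODYLESS_METHODS and batch_mode != "individual":
        return (
            f"{method} with batch_mode={batch_mode!r} sends a request body "
            "that may be rejected by the server"
        )
    return None
```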

| `retry_delay` | string | `1s` | Base delay between retries |
| `retry_backoff_multiplier` | u32 | `2` | Exponential backoff multiplier (min 1) |
| `max_retry_delay` | string | `30s` | Maximum retry delay cap |
| `success_status_codes` | [u16] | `[200, 201, 202, 204]` | Status codes considered successful |

Contributor

the description is fine but misses a useful behavior: codes in this set are also never retried, even normally-transient ones like 429. so users who want to treat 429 as "queued/accepted" can put 429 here and it will short-circuit retries. worth a sentence on this - it's a non-obvious knob.
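
a small Python sketch of the ordering this implies: the success set is checked before the transient set. `classify` is a hypothetical name, the transient codes and defaults are the ones quoted in this review, and the "fail" branch for everything else is an assumption about how non-transient errors are handled.

```python
# Transient codes and default success codes as quoted in this review.
TRANSIENT_CODES = frozenset({429, 500, 502, 503, 504})
DEFAULT_SUCCESS_CODES = frozenset({200, 201, 202, 204})


def classify(status: int, success_codes=DEFAULT_SUCCESS_CODES) -> str:
    """Classify a response status as 'success', 'retry', or 'fail'."""
    # Success is checked first, so a code listed in success_status_codes
    # is never retried, even if it is normally transient.
    if status in success_codes:
        return "success"
    if status in TRANSIENT_CODES:
        return "retry"
    # Assumption: anything else is treated as a non-retryable failure.
    return "fail"
```

with the defaults, `classify(429)` is a retry; with 429 added to the success set, the same call reports success and no retry happens.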

- **`json_array`**: all messages as a single JSON array. Best for APIs that expect array payloads. `Content-Type: application/json`.
- **`raw`**: raw bytes, one request per message. For non-JSON payloads (Protobuf, FlatBuffers, binary). The metadata envelope is not applied. `Content-Type: application/octet-stream`.

For production throughput, prefer `ndjson` or `json_array` over `individual` — they collapse N round trips per poll cycle into one.

Contributor

raw has the same N-round-trips problem as individual (one HTTP request per message - confirmed at send_raw in lib.rs) but isn't called out alongside in this throughput note. either include raw here or be explicit that raw shares the per-message cost.
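
the cost difference can be sketched as a request count per poll cycle. this is illustrative Python, not the sink's code: `requests_per_poll` is a hypothetical name, and sending nothing for an empty poll is an assumption.

```python
PER_MESSAGE_MODES = frozenset({"individual", "raw"})
BATCHED_MODES = frozenset({"ndjson", "json_array"})


def requests_per_poll(batch_mode: str, message_count: int) -> int:
    """Number of HTTP requests needed to deliver one poll cycle's messages."""
    if batch_mode in PER_MESSAGE_MODES:
        # One round trip per message.
        return message_count
    if batch_mode in BATCHED_MODES:
        # All messages share a single request body.
        # Assumption: an empty poll sends no request at all.
        return 1 if message_count > 0 else 0
    raise ValueError(f"unknown batch_mode: {batch_mode!r}")
```

for a poll of 500 messages that is 500 requests for `individual` or `raw` versus 1 for `ndjson` or `json_array`.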

```

- `iggy_id` is a 32-character lowercase hex string (no dashes).
- For non-JSON payloads (`raw`, `flatbuffer`, `proto` schemas), the payload is base64-encoded and an `iggy_payload_encoding: "base64"` field is added.

Contributor

the prose says an `iggy_payload_encoding: "base64"` field is added, but never shows the actual JSON shape. for raw/flatbuffer/proto schemas the sink emits the payload as `{"data": "<base64>", "iggy_payload_encoding": "base64"}` (see EncodedPayload struct). worth a 4-line example block - non-obvious that the bytes live in a `data` field.
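
for illustration, a Python sketch that builds the shape described above and shows how a receiver would get the bytes back. `encode_binary_payload` and `decode_binary_payload` are hypothetical names; only the two field names come from this comment.

```python
import base64


def encode_binary_payload(raw: bytes) -> dict:
    """Wrap non-JSON bytes in the shape described in this comment."""
    return {
        "data": base64.b64encode(raw).decode("ascii"),
        "iggy_payload_encoding": "base64",
    }


def decode_binary_payload(payload: dict) -> bytes:
    """Receiver side: recover the original bytes from the `data` field."""
    if payload.get("iggy_payload_encoding") != "base64":
        raise ValueError("payload is not base64-encoded")
    return base64.b64decode(payload["data"])
```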

[plugin_config]
url = "https://hooks.slack.com/services/T00/B00/xxx"
batch_mode = "individual"
include_metadata = false # Slack expects bare JSON payload

Contributor

the comment "Slack expects bare JSON payload" is true but easy to misread. flipping include_metadata = false doesn't transform arbitrary payloads into Slack's {"text": "..."} shape - the sink does no payload transformation on outbound. add a one-liner: "your producer must publish Slack-compatible JSON; the sink does not transform payloads."
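
in other words, the message has to be Slack-shaped before it reaches the stream. a minimal producer-side sketch in Python (`slack_message_payload` is a hypothetical helper, and how the bytes are published to Iggy is left out on purpose):

```python
import json


def slack_message_payload(text: str) -> bytes:
    """Build the JSON bytes a producer would publish for a Slack webhook."""
    # Slack incoming webhooks expect an object with a "text" field.
    # The sink forwards these bytes as-is when include_metadata = false.
    return json.dumps({"text": text}).encode("utf-8")
```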

Development

Successfully merging this pull request may close these issues.

docs: add Generic HTTP Sink connector page and update connectors table
