Skip to content

fix(connectors): bring quickwit_sink up to convention#3523

Open
mfyuce wants to merge 1 commit into
apache:masterfrom
mfyuce:fix/quickwit-sink-convention
Open

fix(connectors): bring quickwit_sink up to convention#3523
mfyuce wants to merge 1 commit into
apache:masterfrom
mfyuce:fix/quickwit-sink-convention

Conversation

@mfyuce

@mfyuce mfyuce commented Jun 21, 2026

Copy link
Copy Markdown

Summary

  • Defer index_id YAML parse to open() — return InvalidConfigValue instead of panicking via expect()
  • Use ClientWithMiddleware (build_retry_client) for ingest retries with exponential backoff on 5xx / connection errors
  • check_connectivity_with_retry in open() against /health/livez
  • Map 4xx responses (except 429) to PermanentHttpError so the circuit breaker is not tripped by bad payloads
  • Add verbose_logging / max_retries / retry_delay / max_retry_delay / max_open_retries / open_retry_max_delay to QuickwitSinkConfig (all optional, forward-compatible)
  • Downgrade per-batch consume() log to debug! (upgraded to info! when verbose_logging = true)
  • Set Content-Type: application/x-ndjson on ingest requests
  • Drop unused dashmap / once_cell deps; add reqwest-middleware
  • 5 unit tests covering verbose flag, client init state, and index_id init state

Test plan

  • cargo clippy -p iggy_connector_quickwit_sink --all-targets -- -D warnings — clean
  • cargo test -p iggy_connector_quickwit_sink — 5 tests pass
  • Integration tests: cargo test -p integration -- connectors::quickwit

🤖 Generated with Claude Code

https://claude.ai/code/session_01QBxbPbdKXzoMdvLBeugNBX

- Defer index_id YAML parse to open(); return InvalidConfigValue instead
  of panicking via expect()
- Use ClientWithMiddleware (build_retry_client) for ingest retries with
  exponential backoff on 5xx / connection errors
- check_connectivity_with_retry on open() against /health/livez
- Map 4xx responses (except 429) to PermanentHttpError so the circuit
  breaker is not tripped by bad payloads
- Add verbose_logging / max_retries / retry_delay / max_retry_delay /
  max_open_retries / open_retry_max_delay to QuickwitSinkConfig
- Downgrade per-batch consume() log to debug! (info! when verbose)
- Set Content-Type: application/x-ndjson on ingest requests
- Drop unused dashmap / once_cell deps; add reqwest-middleware
- 5 unit tests: verbose flag, client init, index_id init

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QBxbPbdKXzoMdvLBeugNBX
@github-actions

Copy link
Copy Markdown

Thanks for the PR. It is labeled S-waiting-on-review and queued for review.

Slash commands (own line, regular comment) move it around the queue:

  • /ready - back to S-waiting-on-review after addressing feedback
  • /author - flip to S-waiting-on-author while you finish changes
  • /request-review @user-or-team - request a reviewer

See CONTRIBUTING.md for details.

@github-actions github-actions Bot added the S-waiting-on-review PR is waiting on a reviewer label Jun 21, 2026
mfyuce added a commit to mfyuce/iggy that referenced this pull request Jun 21, 2026
- AGENTS.md: 104→75 lines. Removed redundant repo structure (derivable
  by ls), collapsed principles to iggy-specific rules only, merged
  Jenkins/QW infra into Infra section, updated handover block.
- TODO.md: replaced stale checked items with 4 open PRs (apache#3516 apache#3517
  apache#3523 apache#3525) + QW 0.9 upgrade task.
- DONE.md: added sessions 5-10 block (QW sink pipeline, collector
  cutover, InvalidOffset bug + fix).
- quickwit_sink/src/lib.rs: cargo fmt reformatting only.
info!("Created index: {}", self.index_id);
Ok(())
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lib.rs:ingest() — create_index() treats 409 Conflict as InitError; concurrent open() calls (multi-instance, restart race) both see has_index()=false, both POST, second gets 4xx → InitError → connector
never opens. Fix: absorb 409 (and 400 "already exists") as Ok(()) in create_index(). Retry middleware also retries a 5xx create that succeeded server-side; on retry, server returns 409 → same path. Same fix
covers both.

self.config.open_retry_max_delay.as_deref(),
DEFAULT_OPEN_RETRY_MAX_DELAY,
);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reqwest::Client::new() has no timeout; health probe and has_index()/create_index() can hang indefinitely under network partition or slow-starting Quickwit. Fix: reqwest::Client::builder().timeout(...) with configurable or sensible default (e.g. 30s)

@ryerraguntla

Copy link
Copy Markdown
Contributor

@mfyuce - all the above are minors. I am not sure of quickwit s production data set distribution to mention about the need for circuit breakers (if there are huge number of records/documents for the same cursor key for a given batch size) . Please make a judgement call about the need for circuit breaker implementation . otherwise it is all set for second reviewer's comments before merging.

@ryerraguntla

Copy link
Copy Markdown
Contributor

/author

@github-actions github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels Jun 21, 2026
mfyuce added a commit to mfyuce/iggy that referenced this pull request Jun 21, 2026
…t timeout

409 CONFLICT (and 400 "already exists" for older QW) returned by
create_index() no longer fails connector open. This covers two races:
concurrent open() calls where both see has_index()=false, and retry
middleware retrying a 5xx create that already succeeded server-side.

Add request_timeout (default 30s) on the underlying reqwest Client so
health probes and index management calls time out under network partition
instead of hanging indefinitely.

Fixes review feedback from ryerraguntla on PR apache#3523.
@mfyuce

mfyuce commented Jun 21, 2026

Copy link
Copy Markdown
Author

Thanks for the thorough review @ryerraguntla! Both issues addressed in the latest push:

409 / "already exists" on create_index()create_index() now absorbs 409 CONFLICT and 400 BAD_REQUEST whose body contains "already exists" as Ok(()) with an info! log. Covers the concurrent-open race and the retry-after-succeeded-5xx path.

No timeout on reqwest::Client — Added request_timeout: Option<String> to QuickwitSinkConfig (default 30s, configurable via TOML). The raw client is now built with Client::builder().timeout(request_timeout), bounding health probes, has_index(), and create_index() under network partition.

/ready

@github-actions github-actions Bot added S-waiting-on-review PR is waiting on a reviewer and removed S-waiting-on-author PR is waiting on author response labels Jun 21, 2026
@mfyuce

mfyuce commented Jun 21, 2026

Copy link
Copy Markdown
Author

@ryerraguntla — judgment call on the circuit breaker:

The existing build_retry_client wraps a HttpRetryMiddleware that already retries 429 and 5xx with exponential backoff, and the PermanentHttpError mapping cuts retries for permanent 4xx errors (bad data won't hammer the backend). Ingest throughput is also naturally rate-bounded by batch_length and poll_interval.

QuickWit is typically an internal service in the same cluster, so prolonged partition is rare, and the runtime already isolates failures per connector. A full half-open / trip-threshold circuit breaker on top of this would add complexity without clear benefit for this specific sink. Happy to revisit if the second reviewer sees a concrete failure mode that the existing retry policy doesn't cover.

mfyuce added a commit to mfyuce/iggy that referenced this pull request Jun 21, 2026
apache#3523 review addressed (409 absorb + request_timeout). Four PRs all
S-waiting-on-review; pipeline live and clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-review PR is waiting on a reviewer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants