Fix infinite loop in skipPrefixBeforeHeader when file has fewer lines than skip_first_lines#101111
Conversation
…es than `skip_first_lines` When `input_format_csv_skip_first_lines` or `input_format_tsv_skip_first_lines` is set to a value larger than the number of lines in the file, `skipPrefixBeforeHeader` enters an infinite loop doing `pread` at EOF. Each call to `readRow` succeeds with an empty result even at EOF, so the loop never terminates. Add an EOF check before each `readRow` call to break out of the loop when the file is exhausted. Found via AST fuzzer in Stress test (amd_msan) on PR #100947. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Workflow [PR], commit [8e368ed] Summary: ✅ AI ReviewSummaryThis PR fixes an infinite-loop-at-EOF behavior in ClickHouse Rules
Final VerdictStatus: ✅ Approve |
| @@ -0,0 +1,7 @@ | |||
| -- Verify that skip_first_lines doesn't cause an infinite loop when the file has fewer lines than requested. | |||
There was a problem hiding this comment.
Does not reproduce.
… loop The `format()` function with inline data uses `ReadBufferFromMemory`, which doesn't trigger the pread-based infinite loop. Use `file()` table function instead, which reads via `ReadBufferFromFileDescriptorPRead`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
With `skip_first_lines = 100`, the loop over `readRow` at EOF completes
quickly even on unfixed builds. Bumping to 1 billion makes the loop take
long enough for the test-runner timeout to detect the hang.
This addresses the bugfix-validation failure ("does not reproduce").
…rst-lines-infinite-loop
The .sql test does not reproduce the bug in bugfix validation because the loop in `skipPrefixBeforeHeader` (1B iterations of a fast no-op at EOF) completes within the test timeout, producing the same empty output on both old and new code. Convert to a .sh test that wraps the query in `timeout 10`. On unfixed builds, the 1B-iteration loop takes much longer than 10 seconds, so `timeout` kills the process with a non-zero exit code. On fixed builds, the EOF check breaks the loop instantly and the query succeeds with exit code 0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rst-lines-infinite-loop
Test number 04068 was already taken by `04068_constant_fold_union_intersect` on master. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| @@ -0,0 +1,20 @@ | |||
| #!/usr/bin/env bash | |||
| # Tags: no-parallel | |||
There was a problem hiding this comment.
# Tags: no-parallel looks unnecessary here and will reduce stateless test parallelism in CI. This test already uses ${CLICKHOUSE_DATABASE}-scoped file names, so it should be isolated across parallel runs.
Please remove the no-parallel tag unless there is a concrete cross-test interference that requires serialization.
…rst-lines-infinite-loop
The test uses `${CLICKHOUSE_DATABASE}`-scoped file names, so it is
already isolated across parallel runs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rst-lines-infinite-loop
|
The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix. |
|
The flaky check failure is fixed in #102148, let's update the branch. |
…rst-lines-infinite-loop
…rst-lines-infinite-loop
…rst-lines-infinite-loop
…rst-lines-infinite-loop
…rst-lines-infinite-loop
…rst-lines-infinite-loop
LLVM Coverage Report
Changed lines: 100.00% (8/8) · Uncovered code |
|
@groeneai, you missed my comment above. |
|
@alexey-milovidov sorry for the delay — the fix is already up at #103547 (filed 2026-04-26 ~04:30 UTC) and Root cause:
Fix: in Verification:
|
|
Hi — this PR may need backporting to Affected code: Why: If this should be backported, consider adding |
When
input_format_csv_skip_first_linesorinput_format_tsv_skip_first_linesis set to a value larger than the number of lines in the file,skipPrefixBeforeHeaderenters an infinite loop doingpreadat EOF. Each call toreadRowsucceeds with an empty result even at EOF, so the loop never terminates.Add an EOF check before each
readRowcall to break out of the loop when the file is exhausted.Found via AST fuzzer in Stress test (amd_msan) on PR #100947.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix infinite loop when
input_format_csv_skip_first_linesorinput_format_tsv_skip_first_linesexceeds the number of lines in the file.Documentation entry for user-facing changes