Skip to content

Fix infinite loop in skipPrefixBeforeHeader when file has fewer lines than skip_first_lines#101111

Merged
alexey-milovidov merged 19 commits into
masterfrom
fix/csv-tsv-skip-first-lines-infinite-loop
Apr 27, 2026
Merged

Fix infinite loop in skipPrefixBeforeHeader when file has fewer lines than skip_first_lines#101111
alexey-milovidov merged 19 commits into
masterfrom
fix/csv-tsv-skip-first-lines-infinite-loop

Conversation

@alexey-milovidov
Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov commented Mar 29, 2026

When input_format_csv_skip_first_lines or input_format_tsv_skip_first_lines is set to a value larger than the number of lines in the file, skipPrefixBeforeHeader enters an infinite loop doing pread at EOF. Each call to readRow succeeds with an empty result even at EOF, so the loop never terminates.

Add an EOF check before each readRow call to break out of the loop when the file is exhausted.

Found via AST fuzzer in Stress test (amd_msan) on PR #100947.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix infinite loop when input_format_csv_skip_first_lines or input_format_tsv_skip_first_lines exceeds the number of lines in the file.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

…es than `skip_first_lines`

When `input_format_csv_skip_first_lines` or `input_format_tsv_skip_first_lines`
is set to a value larger than the number of lines in the file,
`skipPrefixBeforeHeader` enters an infinite loop doing `pread` at EOF.
Each call to `readRow` succeeds with an empty result even at EOF, so the
loop never terminates.

Add an EOF check before each `readRow` call to break out of the loop
when the file is exhausted.

Found via AST fuzzer in Stress test (amd_msan) on PR #100947.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Mar 29, 2026

Workflow [PR], commit [8e368ed]

Summary:


AI Review

Summary

This PR fixes an infinite-loop-at-EOF behavior in CSV/TSV skip_first_lines handling by breaking the skip loop when the buffer reaches EOF, and adds a regression test that reproduces the issue with file-backed reads. The fix is targeted, covers both affected readers, and the test validates the user-visible behavior.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 29, 2026
@@ -0,0 +1,7 @@
-- Verify that skip_first_lines doesn't cause an infinite loop when the file has fewer lines than requested.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not reproduce.

alexey-milovidov and others added 6 commits March 29, 2026 20:17
… loop

The `format()` function with inline data uses `ReadBufferFromMemory`,
which doesn't trigger the pread-based infinite loop. Use `file()` table
function instead, which reads via `ReadBufferFromFileDescriptorPRead`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
With `skip_first_lines = 100`, the loop over `readRow` at EOF completes
quickly even on unfixed builds.  Bumping to 1 billion makes the loop take
long enough for the test-runner timeout to detect the hang.

This addresses the bugfix-validation failure ("does not reproduce").
The .sql test does not reproduce the bug in bugfix validation because the
loop in `skipPrefixBeforeHeader` (1B iterations of a fast no-op at EOF)
completes within the test timeout, producing the same empty output on both
old and new code.

Convert to a .sh test that wraps the query in `timeout 10`. On unfixed
builds, the 1B-iteration loop takes much longer than 10 seconds, so
`timeout` kills the process with a non-zero exit code. On fixed builds,
the EOF check breaks the loop instantly and the query succeeds with exit
code 0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Test number 04068 was already taken by `04068_constant_fold_union_intersect`
on master.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,20 @@
#!/usr/bin/env bash
# Tags: no-parallel
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# Tags: no-parallel looks unnecessary here and will reduce stateless test parallelism in CI. This test already uses ${CLICKHOUSE_DATABASE}-scoped file names, so it should be isolated across parallel runs.

Please remove the no-parallel tag unless there is a concrete cross-test interference that requires serialization.

alexey-milovidov and others added 4 commits March 30, 2026 15:45
The test uses `${CLICKHOUSE_DATABASE}`-scoped file names, so it is
already isolated across parallel runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@alexey-milovidov
Copy link
Copy Markdown
Member Author

The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.

Copy link
Copy Markdown
Member Author

@alexey-milovidov alexey-milovidov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good change!

@alexey-milovidov
Copy link
Copy Markdown
Member Author

The flaky check failure is fixed in #102148, let's update the branch.

@alexey-milovidov
Copy link
Copy Markdown
Member Author

@groeneai, fix the flaky test 03203_system_query_metric_log, send a separate PR with the fix, and request review from @pamarcos, because he is the author of that test.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Apr 26, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.00% 84.00% +0.00%
Functions 91.10% 91.10% +0.00%
Branches 76.50% 76.50% +0.00%

Changed lines: 100.00% (8/8) · Uncovered code

Full report · Diff report

@alexey-milovidov alexey-milovidov self-assigned this Apr 27, 2026
@alexey-milovidov alexey-milovidov added this pull request to the merge queue Apr 27, 2026
@alexey-milovidov
Copy link
Copy Markdown
Member Author

@groeneai, you missed my comment above.

Merged via the queue into master with commit c007b43 Apr 27, 2026
165 checks passed
@alexey-milovidov alexey-milovidov deleted the fix/csv-tsv-skip-first-lines-infinite-loop branch April 27, 2026 05:28
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 27, 2026
@groeneai
Copy link
Copy Markdown
Contributor

@alexey-milovidov sorry for the delay — the fix is already up at #103547 (filed 2026-04-26 ~04:30 UTC) and @pamarcos was pinged for review. Posting this reply to make sure you see it.

Root cause: QueryMetricLog was losing rows when BackgroundSchedulePool fell behind:

  1. Periodic events scheduled-but-fired-after-finishQuery were silently dropped (the periodic task hits getQueryInfo() returning null on a finished query).
  2. The final "query finished" row was dropped on timestamp-collision with the last periodic (the <= dedup guard at QueryMetricLog.cpp:266).

Fix: in finishQuery, backfill any periodics whose next_collect_time < finish_time before emitting the final event, and pass is_final=true so the final event bypasses the dedup guard. last_collect_time is updated via std::max so it never moves backwards.

Verification:

  • 03203_system_query_metric_log: 50/50 OK (pre-existing test now more deterministic)
  • 03203 under stress-ng --cpu 96 CPU stress: 30/30 OK
  • Sibling 03549_system_query_metric_log_with_queries_with_same_query_id: 10/10 OK
  • New regression test 04121_query_metric_log_backfill_missed_periodics.sh: 30/30 OK

PR #103547 ready for review when @pamarcos has a moment.

@clickgapai
Copy link
Copy Markdown
Contributor

Hi — this PR may need backporting to 26.3 (LTS), 26.2, 26.1, 25.8 (LTS), but no backport label was found.

Affected code: CSVFormatReader::skipPrefixBeforeHeader and TabSeparatedFormatReader::skipPrefixBeforeHeader — core code present in all supported branches.

Why: skip_first_lines setting and skipPrefixBeforeHeader loop were added in commit 4c9812d (2022-05-25, 'Allow to skip some of the first rows in CSV/TSV formats'), predating all supported release branches. Bug affects any user passing a skip_first_lines value larger than file lines — clear user-visible hang. Fix is small (3 lines per format, EOF check) and low-risk.

If this should be backported, consider adding pr-must-backport or a version-specific label (e.g. v26.3-must-backport). Ignore this if backporting is not applicable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants