ci: fix waiting for queries in stress check #82562

Merged
azat merged 1 commit into ClickHouse:master from azat:ci/stress-query-wait-fix on Jun 25, 2025
@azat azat commented Jun 25, 2025

Right now it always fails with timeout:

   Received exception from server (version 25.7.1):
   Code: 159. DB::Exception: Received from localhost:9000. DB::Exception: Timeout exceeded: elapsed 60033.770956 ms, maximum: 60000 ms. (TIMEOUT_EXCEEDED)
   (query: SELECT sleepEachRow((
           SELECT maxOrDefault(300 - elapsed) + 1
           FROM system.processes
           WHERE query NOT LIKE '%FROM system.processes%' AND elapsed < 300
       ) / 300)
       FROM numbers(300)
       FORMAT Null
       SETTINGS function_sleep_max_microseconds_per_block = 0
       )
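
For illustration only (this is not code from the patch): a small model of how long the old query asks to sleep. It assumes `sleepEachRow(s)` pauses `s` seconds for every row, so 300 rows of `x / 300` seconds add up to `x` seconds in total, and that `maxOrDefault` yields 0 when no row matches. The function name and constants are hypothetical; the 60 s limit is taken from "maximum: 60000 ms" in the error above.

```python
WINDOW_S = 300   # the constant 300 used in the old query
LIMIT_S = 60     # "maximum: 60000 ms" from the error message


def old_total_sleep_seconds(elapsed_of_running_queries):
    """Total seconds the old query would sleep, given elapsed times of running queries."""
    # Mirrors: WHERE elapsed < 300, then maxOrDefault(300 - elapsed) + 1.
    remaining = [WINDOW_S - e for e in elapsed_of_running_queries if e < WINDOW_S]
    x = max(remaining, default=0) + 1
    # Each of the 300 rows sleeps x / 300 seconds, so the total is x seconds.
    return x


if __name__ == "__main__":
    for elapsed in ([], [250], [10]):
        total = old_total_sleep_seconds(elapsed)
        print(elapsed, total, "exceeds limit" if total > LIMIT_S else "fits")
```

Under these assumptions, any query that has been running for less than about 240 seconds makes the requested sleep longer than the 60 s limit, which would explain the timeout.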

Rewrite it to sleep only while there are still running queries.
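
A minimal sketch of the idea, not the actual change in this PR: poll in short steps and stop as soon as nothing is running, instead of issuing one long sleep. `wait_for_queries` and `count_running` are hypothetical names. In a real check, `count_running` would presumably count rows in `system.processes` while excluding the monitoring query itself, as the old query did. The 300 s default is borrowed from the constant in the old query.

```python
import time


def wait_for_queries(count_running, timeout_s=300.0, poll_interval_s=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until count_running() returns 0 or timeout_s elapses.

    Returns True if no queries remain, False if the timeout was reached.
    """
    deadline = clock() + timeout_s
    while True:
        if count_running() == 0:
            return True   # nothing left to wait for, stop immediately
        if clock() >= deadline:
            return False  # some queries are still running after the timeout
        sleep(poll_interval_s)
```

Each poll is a short, separate step, so no single statement has to outlive a per-query time limit, and the wait ends as soon as the last query finishes.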

CI: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=82364&sha=latest&name_0=PR&name_1=Stress+test+%28amd_debug%29

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

clickhouse-gh Bot commented Jun 25, 2025

Workflow [PR], commit [0ac3781]

Summary:

job_name                test_name                                                          status   info  comment
Stress test (amd_tsan)                                                                     failure
                        Server died                                                        FAIL
                        Hung check failed, possible deadlock found (see hung_check.log)    FAIL
                        Killed by signal (in clickhouse-server.log)                        FAIL
                        Fatal message in clickhouse-server.log (see fatal_messages.txt)    FAIL
                        Killed by signal (output files)                                    FAIL
                        Found signal in gdb.log                                            FAIL

@clickhouse-gh clickhouse-gh Bot added the pr-ci label Jun 25, 2025
@azat azat force-pushed the ci/stress-query-wait-fix branch from d38c9d6 to 0ac3781 Compare June 25, 2025 11:12
@Algunenano Algunenano self-assigned this Jun 25, 2025
@Algunenano Algunenano left a comment

Any idea why we don't KILL all queries before waiting? It'd be faster in most cases (and when it isn't, that's a misbehaviour).

azat commented Jun 25, 2025

> Any idea why we don't KILL all queries before waiting? It'd be faster in most cases (and when it isn't, that's a misbehaviour).

No clue, but if it works now without hung queries, then it makes sense to preserve the wait, to make sure that all queries can finish even after stressing.

@azat azat added this pull request to the merge queue Jun 25, 2025
Merged via the queue into ClickHouse:master with commit 24b5ead Jun 25, 2025
119 of 122 checks passed
@azat azat deleted the ci/stress-query-wait-fix branch June 25, 2025 12:42
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 25, 2025