Re-instate failing fast on row limits#62804
alexey-milovidov merged 1 commit into ClickHouse:master
Conversation
Workflow [PR], commit [96ee1c5] Summary: ❌
@alexey-milovidov Hey! I'm so sorry for the delay with updating this one to get the projection tests passing. I've tested the changes I've just pushed against the master branch and confirmed that projections now handle this optimisation correctly. All projection tests now pass as well. PTAL if you have the time, or let me know if there is someone else we can speak with. Thank you as always!
This is great! Although I don't like catching exceptions during normal control flow. How practical would it be to remove the try/catch in this case?
Thanks Alexey! This is a fair point, and I went back and forth between using exceptions and finding a way of marking an analysis result as invalid. The main reason I went for exceptions is that I was worried about other use cases being introduced that call this code path without respecting the limits. I will try again, though, to see if I can find a slightly different approach that allows us to avoid exception handling. Thank you for this feedback.
@alexey-milovidov thanks again for the feedback. I've removed the use of exceptions in the projections code as part of normal control flow by marking the affected analysis result as invalid instead. All projection and row limit tests still pass when running against master with these changes. PTAL 🙏
@alexey-milovidov all tests have passed except for one test type, which is the source of all failures: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=62804&sha=e3e4422488c81fc9463242b82018a858c29f0395&name_0=PR&name_1=BuzzHouse%20%28amd_debug%29 It looks unrelated, and I can see it on some other PRs too. Would it be OK to get some help getting this merged, please?
Hey @alexey-milovidov, sorry to ping you again, but I just wondered whether you were happy with this change now, or whether you wished for me to check anything else. I'm still only seeing 2 unrelated test failures after rebasing.
In v22.3, we evaluated both max_rows_to_read and max_rows_to_read_leaf in MergeTreeDataSelectExecutor code when scanning part ranges. This meant that the moment we exceeded the row counts while processing ranges, we returned an error to clients, saving them from waiting for queries to finish that were going to fail anyway. This was removed in: ClickHouse@f524dae

It was removed because projection code also looks at the ranges and the total number of mark files that would be used for an "ordinary" query as well as for all projections. So even when query analysis resulted in us using a projection, we hit the row limits when enumerating mark files and saw exceptions. This re-adds the limits, alongside making them compatible with projections.
I'm happy with this change, already reviewed the code... |
Thanks very much, @alexey-milovidov!
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fail fast when queries reach row limits. Resolves #61872
Documentation entry for user-facing changes
No documentation required
Details
In ClickHouse <= v23.3, we evaluated the row limit settings max_rows_to_read and max_rows_to_read_leaf in MergeTreeDataSelectExecutor code when processing parts. This worked by using a row counter that was updated as threads called process_part. This optimization was added here: #13677
The benefit of this optimization was that if a user executed a particularly expensive query, such as one involving a large volume of part processing or range scans, we would fail almost immediately (on the order of hundreds of milliseconds) rather than taking a significant amount of time (potentially minutes) before presenting the user with a "Row limit exceeded" exception anyway.
So it gave users feedback very quickly, and saved ClickHouse from doing a lot of wasted work.
The optimization was removed here as part of work on projections.
I believe it was removed because projection code also looks at the ranges and the total number of mark files that would be used for an "ordinary" query as well as for all projections. So even though projection analysis could result in ClickHouse selecting a projection that uses fewer marks, we still did the range scan of the "ordinary" parts, which meant the projection code saw exceptions.
The consequence of this optimization not being in place can be observed on the below ReplicatedMergeTree table that contains ~870 billion rows:
Note the execution time of 84 seconds and the number of rows processed so far.
Here's a flamegraph showing where we spent most of the time:
As we can see, we spend most of the time doing range scans via MergeTreeDataSelectExecutor::filterPartsByPrimaryKeyAndSkipIndexes
With the changes in this PR applied on our clusters, here's the same query that I executed before that took 87 seconds:
This PR also includes an optimization that lets projections utilize this "failing fast" on row limits.
When doing analysis for projections, I noticed that they could benefit from failing quickly on these row limits too. For example, I noticed that we do a full range scan on the ordinary table even if we end up determining that a projection uses fewer marks and selecting it.
So I've included a change whereby, if we encounter row limits while doing a range scan on the ordinary table, we dismiss it as a candidate. This should be safe: if the ordinary table were the candidate that selects the least data, the analysis for the other projections would also hit the row limits and return an error to the client. And since dismissing the ordinary table and moving on to projections takes on the order of milliseconds for queries that touch a lot of data, projections should be significantly faster whenever users would hit row limits on the ordinary table.
In all honesty, we (Cloudflare) don't use projections in production, so it would be great to get some opinions on this to make sure I haven't missed anything else.