Describe what's wrong
In ClickHouse versions < 23.3, there was an optimisation when reading from MergeTree tables that evaluated the settings max_rows_to_read and max_rows_to_read_leaf very early in the query pipeline, during the range scan and part processing.
This optimisation was added here: #13677
The benefit was that if users tried to select data on the order of billions of rows, queries that would exceed the row limits failed very quickly (on the order of milliseconds).
This code in MergeTreeDataSelectExecutor.cpp was removed here (February 2023): (f524dae#diff-a2e0909d75f772d3b9b704923917d686f10aa2167547e7ce8c5cec5f54c11f5d)
This means that in ClickHouse versions >= v23.3, we see queries taking seconds (or longer) before they return row limit exceptions.
The limits now appear to be evaluated when the AnalysisResult is obtained: https://github.com/ClickHouse/ClickHouse/blob/24.2/src/Processors/QueryPlan/ReadFromMergeTree.cpp#L1883
    ReadFromMergeTree::AnalysisResult ReadFromMergeTree::getAnalysisResult() const
    {
        auto result_ptr = analyzed_result_ptr ? analyzed_result_ptr : selectRangesToRead(prepared_parts);
        if (std::holds_alternative<std::exception_ptr>(result_ptr->result))
            std::rethrow_exception(std::get<std::exception_ptr>(result_ptr->result));
        return std::get<AnalysisResult>(result_ptr->result);
    }
From what we have seen so far, obtaining this result over a large number of parts takes significantly longer, as it has to do the part processing for all ranges that need to be read before it returns the result.
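To illustrate the difference we mean, here is a minimal, self-contained sketch. It is not ClickHouse code: PartRanges, TooManyRows, countRowsEarlyCheck and countRowsLateCheck are hypothetical names invented for this example, and treating a limit of 0 as "no limit" is an assumption of the sketch. It only contrasts checking the row limit while iterating over parts with checking it once after every part has been processed.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a data part: only the number of rows
// its selected ranges would read.
struct PartRanges
{
    std::uint64_t rows_in_selected_ranges;
};

// Hypothetical exception type standing in for TOO_MANY_ROWS.
struct TooManyRows : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Shared helper: throws if the limit is set (non-zero) and exceeded.
inline void checkLimit(std::uint64_t total, std::uint64_t max_rows_to_read)
{
    if (max_rows_to_read != 0 && total > max_rows_to_read)
        throw TooManyRows(
            "Limit for rows exceeded, max rows: " + std::to_string(max_rows_to_read)
            + ", current rows: " + std::to_string(total));
}

// Early check: the limit is tested after each part, so the scan stops
// at the first part that pushes the running total over the limit.
std::uint64_t countRowsEarlyCheck(
    const std::vector<PartRanges> & parts, std::uint64_t max_rows_to_read, std::size_t & parts_examined)
{
    std::uint64_t total = 0;
    parts_examined = 0;
    for (const auto & part : parts)
    {
        ++parts_examined;
        total += part.rows_in_selected_ranges;
        checkLimit(total, max_rows_to_read);
    }
    return total;
}

// Late check: every part is processed first, and the limit is only
// tested once the full total is known.
std::uint64_t countRowsLateCheck(
    const std::vector<PartRanges> & parts, std::uint64_t max_rows_to_read, std::size_t & parts_examined)
{
    std::uint64_t total = 0;
    parts_examined = 0;
    for (const auto & part : parts)
    {
        ++parts_examined;
        total += part.rows_in_selected_ranges;
    }
    checkLimit(total, max_rows_to_read);
    return total;
}
```

With 1000 parts of 1 million rows each and a limit of 10 million, the early variant throws after examining 11 parts, while the late variant examines all 1000 before throwing. In the real server the per-part work is far more expensive than an addition, which is where we believe the extra seconds come from.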
Means of reproducing:
Run any query that would select billions (or trillions) of rows without memory limits kicking in, but with max_rows_to_read set to a value like 10 million:
    SELECT
        (intDiv(toUInt32(timestamp), 900) * 900) * 1000 AS t,
        sum(_sample_interval) / 900,
        foo
    FROM default.example_table
    WHERE timestamp >= toDateTime(1710759116)
    GROUP BY
        t,
        foo
    ORDER BY t ASC
    SETTINGS max_rows_to_read = 10000000
We consistently see queries take >= 5 seconds before they return exceptions:
0 rows in set. Elapsed: 5.114 sec.
Received exception from server (version 23.3.9):
Code: 158. DB::Exception: Received from hostname:port. DB::Exception: Received from hostname:port. DB::Exception: Limit for rows (controlled by 'max_rows_to_read' setting) exceeded, max rows: 10.00 million, current rows: 180.02 billion. (TOO_MANY_ROWS)
Does it reproduce on the most recent release?
Yes.
We're happy to reinstate this with some help and feedback. It's important to us because we use a "ladder" strategy to step down through a set of sampled tables of varying resolutions.
Thanks as always for the amazing work and help!