Replies: 1 comment
-
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Hello and thank you for DuckDB. I have a question about the use of OFFSET+LIMIT pagination:
I have a few dynamic queries that have WHERE conditions and an ORDER BY, that have a total count of around 1-5 million rows.
Going in LIMIT chunks of 10,000 rows, we can successfully query using OFFSET to get maybe around the first half of the results.
Any further, and the memory starts to really balloon up, eventually resulting in the container getting OOM killed (4GB - 8GB containers).
Now, I do realize that using OFFSET+LIMIT has bad performance the further the OFFSET increases. However, I expected the bad performance just to mean that the time to finish the query took longer. Not that it would use more memory.
Whatever sorting/order-by has to be done, needs to be done regardless of whether we are getting the first 10k rows or the last 10k rows.
Then to retrieve, say, the last 10k rows, duckdb should just be discarding the the first million or so rows as it scans through them, without keeping them in memory.
Like for example, if you had a table with 10 rows, and you queried for OFFSET 6 LIMIT 2, the db would order the results, then scan through until it reached the 8th row, keeping 2 rows in memory while discarding the rest as it scanned (ie, worst case, scan 2 rows, keep them, scan two more, replace the existing 2, discard the old 2, then scan 2 more, replacing those 2, etc). Naive maybe, but it also wouldn't OOM.
Can anyone tell me, is massive seemingly exponential memory use growth to be expected when using OFFSET?
Beta Was this translation helpful? Give feedback.
All reactions