R: ERROR in very long vector in IN () statement #686
Comments
Confirmed, will investigate.

@hannesmuehleisen Hello! Is there any news about this issue?
This is triggered because the SQL statement is too long, which exceeds a memory allocation limit in the parser. The limit is currently 100 MB. We could increase it, but I am hesitant to do that; I think having this limit is a good thing. One real issue here is that the error message is unclear and unhelpful, and I will have a go at fixing that.

In general, if you want to perform an IN clause with hundreds of thousands of elements, it is much more efficient to write the values to a temporary table and perform the IN clause against that, or to run the IN clause on the data of an R data frame directly. For example:

```sql
CREATE TEMPORARY TABLE temp_table AS SELECT * FROM range(212325) tbl(i);
SELECT "Species" FROM iris WHERE "Petal.Width" IN (SELECT * FROM temp_table);
```

Here is a microbenchmark that illustrates the drastic performance difference:

```sql
-- giant IN-clause
SELECT * FROM range(1000) tbl(i) WHERE i IN (1, 2, 3, 4, 5, ..., 100000);

-- temporary table
CREATE TEMPORARY TABLE temp_table AS SELECT * FROM range(100000) tbl(i);
SELECT * FROM range(1000) tbl(i) WHERE i IN (SELECT * FROM temp_table);
```
The time spent constructing a string in R, then parsing it in DuckDB, and then handling all the symbols that come out of the parse tree becomes very significant when dealing with multi-megabyte SQL.
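The temp-table rewrite above is not DuckDB-specific. As a minimal sketch of the same pattern from a client language, here is the idea using Python's standard-library sqlite3 (the table and column names are illustrative, not from this thread): the values are loaded with parameterized inserts instead of being interpolated into one giant SQL string, so the statement the parser sees stays tiny.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# A small table to filter, standing in for the user's data.
con.execute("CREATE TABLE data (i INTEGER)")
con.executemany("INSERT INTO data VALUES (?)", [(v,) for v in range(1000)])

# Instead of building "IN (1, 2, 3, ..., 100000)" as SQL text,
# load the 100k values into a temporary table via bound parameters.
con.execute("CREATE TEMP TABLE temp_table (i INTEGER)")
con.executemany("INSERT INTO temp_table VALUES (?)",
                [(v,) for v in range(100000)])

# The IN clause then runs as a subquery; the SQL string stays constant-size.
rows = con.execute(
    "SELECT i FROM data WHERE i IN (SELECT i FROM temp_table) ORDER BY i"
).fetchall()
```

Because the query text no longer grows with the number of elements, parsing cost is constant and the engine can use a proper semi-join instead of evaluating an enormous expression list.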
After some more investigation into other systems, I have decided to remove the memory limit in the parser in #2648 after all. Neither Postgres nor SQLite has this limit, and hitting it leads to unexpected behavior from the user's perspective. In the future we will want to integrate this with our buffer manager/memory allocator, so that memory usage by the parser can be tracked and is subject to the same memory limits as the rest of the system. In general, though, the point above still holds: constructing giant IN clauses remains far less efficient than using a temporary table.
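Even with the parser limit removed, a client that cannot easily create a temporary table can keep individual statements small by batching the IN list, a common workaround not taken from this thread. A minimal sketch, again using stdlib sqlite3 with hypothetical names (`select_in_chunks`, table `data`):

```python
import sqlite3

def select_in_chunks(con, values, chunk_size=500):
    """Filter data.i by a large value list without one giant SQL string.

    Issues one parameterized query per chunk, so each statement stays
    well under any parser or bind-variable limit.
    """
    out = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        out.extend(con.execute(
            f"SELECT i FROM data WHERE i IN ({placeholders}) ORDER BY i",
            chunk,
        ).fetchall())
    return out

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (i INTEGER)")
con.executemany("INSERT INTO data VALUES (?)", [(v,) for v in range(1000)])

# 100k candidate values, sent in bounded batches of 500 placeholders.
rows = select_in_chunks(con, list(range(100000)))
```

This trades one huge statement for many small round trips; it bounds per-statement memory, but a temporary table plus a single subquery is still usually faster.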
Fix #686: remove hard-coded memory limit in parser and fix error message propagation from exceptions thrown in parser
@Mytherin Thank you so much!
There is an error when using a long vector (larger than 212324 elements) in an `IN ()` statement.

Successful example (vector length = 212324):

Example with error (vector length = 212325):