
C stack usage is too close to the limit with an Arrow source and parallelism #2100

Closed
jonkeane opened this issue Aug 4, 2021 · 4 comments


@jonkeane
Contributor

jonkeane commented Aug 4, 2021

We first noticed this in Arrow's CI after merging our DuckDB integration, where it failed with: Error: INTERNAL Error: Failed to eval R expression Error: C stack usage 768360237712 is too close to the limit.

I've since been able to reproduce this locally on my Mac as well (6-core, 12-thread Intel):

> library(duckdb)
> library(arrow)
>
> con <- DBI::dbConnect(duckdb::duckdb())
> 
> ds <- arrow::InMemoryDataset$create(mtcars)
> duckdb::duckdb_register_arrow(con, "mtcars_arrow", ds)
> # Uses all available threads here, but any number greater than 1 eventually
> # triggers this; the more threads, the more likely the error.
> DBI::dbExecute(con, paste0("PRAGMA threads=", arrow::cpu_count()))
[1] NA
> 
> DBI::dbGetQuery(con, "SELECT cyl, COUNT(mpg) FROM mtcars_arrow GROUP BY cyl")
Error: C stack usage  17587177494784 is too close to the limit
Error in duckdb_execute(res) : duckdb_execute_R: Failed to run query
Error: INTERNAL Error: Failed to eval R expression Error: C stack usage  17587177494784 is too close to the limit
> DBI::dbGetQuery(con, "SELECT cyl, SUM(mpg) FROM mtcars_arrow GROUP BY cyl")
Error: C stack usage  17587177494784 is too close to the limit
Error: C stack usage  17587179641088 is too close to the limit
Error: C stack usage  17587180177664 is too close to the limit
Error in duckdb_execute(res) : duckdb_execute_R: Failed to run query
Error: INTERNAL Error: Failed to eval R expression Error: C stack usage  17587177494784 is too close to the limit
> DBI::dbGetQuery(con, "SELECT cyl, AVG(mpg) FROM mtcars_arrow GROUP BY cyl")
Error: C stack usage  17587178567936 is too close to the limit
Error: C stack usage  17587180177664 is too close to the limit
Error: C stack usage  17587179104512 is too close to the limit
Error: C stack usage  17587179641088 is too close to the limit
Error in duckdb_execute(res) : duckdb_execute_R: Failed to run query
Error: INTERNAL Error: Failed to eval R expression Error: C stack usage  17587178567936 is too close to the limit
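Since the session comments note that any thread count above 1 eventually triggers the error, one plausible stopgap is to pin DuckDB to a single thread before querying the registered Arrow source. This is a minimal sketch under that assumption (inferred from the report, not a confirmed fix); it reuses the table name and data from the reproduction and gives up parallelism in exchange.

```r
library(DBI)
library(duckdb)
library(arrow)

con <- DBI::dbConnect(duckdb::duckdb())
ds <- arrow::InMemoryDataset$create(mtcars)
duckdb::duckdb_register_arrow(con, "mtcars_arrow", ds)

# Assumption: a single thread avoids the error, because the report only
# observes it with thread counts greater than 1.
DBI::dbExecute(con, "PRAGMA threads=1")

# Same kind of aggregate as in the reproduction, ordered for stable output.
res <- DBI::dbGetQuery(
  con,
  "SELECT cyl, COUNT(mpg) AS n FROM mtcars_arrow GROUP BY cyl ORDER BY cyl"
)
print(res)

DBI::dbDisconnect(con)
```

If this runs cleanly while the multi-threaded version fails, that further narrows the problem to the interaction between parallel scans and the registered Arrow source.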

The error also occurs when using the Arrow wrapper to_duckdb(), but I was able to reproduce it by calling duckdb_register_arrow directly, so I don't think anything in the wrapper is causing it.

I also tried creating a fully materialized table with DBI::dbWriteTable(con, "mtcars_table", mtcars), and the same queries work fine against it, so the problem appears to be specific to the registered Arrow source.
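The control experiment described above can be sketched as follows: copy mtcars into a regular DuckDB table with DBI::dbWriteTable() and run one of the failing aggregates against it with parallelism enabled. The thread count of 4 is an arbitrary choice for illustration (any value above 1 should exercise the parallel path); per the report, this version is expected to succeed.

```r
library(DBI)
library(duckdb)

con <- DBI::dbConnect(duckdb::duckdb())

# Materialize mtcars as an ordinary DuckDB table instead of registering
# an Arrow source.
DBI::dbWriteTable(con, "mtcars_table", mtcars)

# Enable multiple threads (4 is an arbitrary illustrative value > 1).
DBI::dbExecute(con, "PRAGMA threads=4")

res <- DBI::dbGetQuery(
  con,
  "SELECT cyl, AVG(mpg) AS avg_mpg FROM mtcars_table GROUP BY cyl ORDER BY cyl"
)
print(res)

DBI::dbDisconnect(con)
```

Running the same query against mtcars_arrow in the same configuration is what isolates the registered source as the variable.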

@hannes
Member

hannes commented Aug 4, 2021

This is odd but probably something benign.

@pdet
Contributor

pdet commented Aug 5, 2021

Ha, I just came across this on #2077
Will have a look.

pdet added a commit to pdet/duckdb that referenced this issue Aug 6, 2021
@Mytherin
Collaborator

Mytherin commented Aug 6, 2021

Should be fixed by #2077

Mytherin closed this as completed Aug 6, 2021
@david-cortes
Contributor

I'm getting this same error with data.table objects. I haven't been able to create a minimal reproducer, but it seems to be connected with certain column types that were originally Arrow types before being converted.

The error seems to happen at random, though: about half the time the query succeeds, and the other half it tries to allocate several GB on the stack.
