[C++][Acero] Engine fails on long joins #36995

Closed
vkhodygo opened this issue Aug 2, 2023 · 5 comments

Comments

@vkhodygo

vkhodygo commented Aug 2, 2023

Describe the bug, including details regarding any error messages, version, and platform.

I tried to run some R code, and it fails with the following error:

Error in `compute.arrow_dplyr_query()`:
! Invalid: There are more than 2^32 bytes of key data.  Acero cannot process a join of this magnitude

I suspect this problem lies in the engine itself rather than in R specifically. Is there any way to get it fixed?

Component(s)

C++

@thisisnic
Member

Thanks for reporting this @vkhodygo. It looks like this join is simply too large for Acero to support.

There's more information in the comment on this issue here, which explains what determines the amount of key data: the number of rows, the number of join key columns, and the data types of the join key columns.
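As a rough illustration of how those three factors combine, the sketch below multiplies the row count by an assumed per-row key width and compares the result with the 2^32-byte limit from the error message. The type widths in the hypothetical `FIXED_WIDTHS` table and the `avg_string_len` parameter are assumptions for illustration only. Acero's actual row encoding also involves things like null bits, padding, and offsets for variable-length keys, so treat the result as an approximation, probably on the low side.

```python
# Hypothetical back-of-the-envelope estimate of join key data size.
# Assumption: each row contributes the byte width of each key column;
# Acero's real encoding (null bits, padding, offsets) is NOT modeled.

KEY_DATA_LIMIT = 2 ** 32  # bytes, the limit quoted in the error message

# Assumed byte widths for common fixed-width key types (illustrative only).
FIXED_WIDTHS = {"int32": 4, "int64": 8, "float64": 8, "date32": 4, "timestamp": 8}


def estimate_key_bytes(num_rows, key_columns, avg_string_len=0):
    """Approximate key bytes as num_rows * (sum of per-row key widths).

    key_columns is a list of type names; "string" columns are counted
    using the caller-supplied average string length.
    """
    per_row = 0
    for col_type in key_columns:
        if col_type == "string":
            per_row += avg_string_len
        else:
            per_row += FIXED_WIDTHS[col_type]
    return num_rows * per_row


def exceeds_limit(num_rows, key_columns, avg_string_len=0):
    """True if the rough estimate is above the 2^32-byte limit."""
    return estimate_key_bytes(num_rows, key_columns, avg_string_len) > KEY_DATA_LIMIT
```

For example, with two int64 key columns (16 bytes per row under these assumptions), about 268 million rows (2^32 / 16) already reach the limit, so adding key columns or wide string keys lowers the row count at which a join fails.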

@vkhodygo
Author

vkhodygo commented Aug 2, 2023

@thisisnic Thanks for getting back to me. I did some reading, and it looks like a common problem. Do you think it'll be resolved any time soon?

@thisisnic
Member

I think it would depend on #31769 being resolved, and I don't think that anyone is working on this at the moment, so I'm afraid the answer is probably "no".

@vkhodygo
Author

vkhodygo commented Aug 3, 2023

Well, it looks like manually splitting the data into batches is the only reliable workaround at the moment.

I'll keep an eye on #31769; if you think this issue should be closed, please do so.
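One way to split the data into batches without changing the result of an inner join is to partition both inputs on the join key, so that matching keys always end up in the same batch, and then join batch by batch. The plain-Python sketch below illustrates that idea on lists of dicts; the function names are hypothetical and nothing here is an Arrow or Acero API. With Arrow data, the equivalent would be to filter both inputs to the same key partition before each join and then combine the per-partition results, so that each individual join carries only a fraction of the key data.

```python
from collections import defaultdict


def partition(rows, key, n_parts):
    """Split rows (a list of dicts) into n_parts buckets by hash of the join key."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts


def hash_join(left, right, key):
    """Inner join of two lists of dicts on `key` with an in-memory hash table."""
    table = defaultdict(list)
    for r in right:  # build side: index right rows by key
        table[r[key]].append(r)
    out = []
    for l in left:  # probe side: look up each left row's key
        for r in table.get(l[key], []):
            out.append({**r, **l})
    return out


def partitioned_join(left, right, key, n_parts):
    """Join partition by partition. Equal keys hash to the same partition,
    so the union of per-partition results equals the full inner join."""
    left_parts = partition(left, key, n_parts)
    right_parts = partition(right, key, n_parts)
    result = []
    for lp, rp in zip(left_parts, right_parts):
        result.extend(hash_join(lp, rp, key))
    return result
```

The design choice that matters is partitioning *both* sides by the same function of the key: splitting only one input into batches would leave the other side's key data untouched.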

@thisisnic
Member

I'll close this, as it'll be covered in that other ticket, but if you encounter any other problems around this, feel free to open another issue!
