Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for #8440 #8879

Merged
merged 1 commit into from
Sep 12, 2023
Merged

Fix for #8440 #8879

merged 1 commit into from
Sep 12, 2023

Conversation

pdet
Copy link
Contributor

@pdet pdet commented Sep 11, 2023

No description provided.

@Tishj
Copy link
Contributor

Tishj commented Sep 11, 2023

I never thought that this caused a bug but I did think it was dirty to preserve the arrow data if direct conversion could not be performed 👍

Actually, can we only populate this map inside of DirectConversion? That would solve both issues

@pdet
Copy link
Contributor Author

pdet commented Sep 11, 2023

Strings don't really go through the DirectConversion method, or do I misunderstand your comment?

@Tishj
Copy link
Contributor

Tishj commented Sep 11, 2023

Strings don't really go through the DirectConversion method, or do I misunderstand your comment?

Currently we create the AuxiliaryData to preserve the arrow array before we scan the column, we only need to preserve this arrow array though if DirectConversion was used, otherwise we will have made a copy of the input anyways.

In cases where we don't need to preserve the arrow array, we currently still do

@pdet
Copy link
Contributor Author

pdet commented Sep 11, 2023

Strings don't really go through the DirectConversion method, or do I misunderstand your comment?

Currently we create the AuxiliaryData to preserve the arrow array before we scan the column, we only need to preserve this arrow array though if DirectConversion was used, otherwise we will have made a copy of the input anyways.

In cases where we don't need to preserve the arrow array, we currently still do

That's not true for strings. I could move it to direct conversion and string conversion, but I think that's dirtier, no?

@Tishj
Copy link
Contributor

Tishj commented Sep 11, 2023

Hmm true, it's not hurting anyone but it is a little wasteful to keep this memory alive needlessly

@Mytherin Mytherin merged commit 44057aa into duckdb:main Sep 12, 2023
47 checks passed
@Mytherin
Copy link
Collaborator

Thanks!

@Tishj
Copy link
Contributor

Tishj commented Oct 25, 2023

The more I look at it I'm afraid this is a bandaid for the actual problem?

Creating the ArrowAuxiliaryData from a shared_ptr<ArrowArrayWrapper> held in the ArrowScanLocalState should have the exact same behavior as this new change.
The ArrowScanLocalState will overwrite it with a new chunk, decreasing the refcount, but we stored it inside the auxiliary data so it shouldn't go out of scope.

The only difference this makes is that the first chunk (ONLY the first chunk) is kept alive longer, for the duration that the ArrowScanLocalState stays alive.

That would indicate that the ArrowAuxiliaryData is not being kept alive long enough, which I think is the real problem


Another issue with this change is that we give ownership over the chunk to the first column, for the other columns:
arrow_data->arrow_array = scan_state.chunk->arrow_array; this will give an already released array, so there is no ownership there.

I haven't been able to produce a query that triggers this issue, but I suspect it could happen because that's definitely not intended behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants