Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ArrowArray::try_from_raw should not assume the pointers are from Arc #1334

Merged
merged 1 commit into from
Feb 20, 2022

Conversation

viirya
Copy link
Member

@viirya viirya commented Feb 18, 2022

Which issue does this PR close?

Closes #1333.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Feb 18, 2022
@codecov-commenter
Copy link

Codecov Report

Merging #1334 (133cb01) into master (f4c7102) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1334   +/-   ##
=======================================
  Coverage   83.03%   83.03%           
=======================================
  Files         181      181           
  Lines       52949    52951    +2     
=======================================
+ Hits        43965    43968    +3     
+ Misses       8984     8983    -1     
Impacted Files Coverage Δ
arrow/src/ffi.rs 84.61% <100.00%> (+0.08%) ⬆️
arrow/src/datatypes/field.rs 53.79% <0.00%> (-0.31%) ⬇️
arrow/src/array/transform/mod.rs 84.52% <0.00%> (+0.13%) ⬆️
parquet/src/encodings/encoding.rs 93.71% <0.00%> (+0.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f4c7102...133cb01. Read the comment docs.

Copy link
Member

@sunchao sunchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. An Arrow array or schema can be allocated by other languages like Java (see here), so we shouldn't assume an Arc here.

cc @jorgecarleitao @alamb

@sunchao sunchao merged commit f84c436 into apache:master Feb 20, 2022
@sunchao
Copy link
Member

sunchao commented Feb 20, 2022

Merged, thanks @viirya !

@viirya
Copy link
Member Author

viirya commented Feb 20, 2022

Thanks @sunchao !

@@ -721,9 +721,11 @@ impl ArrowArray {
.to_string(),
));
};
let ffi_array = (*array).clone();
Copy link
Contributor

@alamb alamb Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not super familiar with this code but it makes sense to me. The clone here seems to just be a clone of the FFI_ArrowArray struct itself (which is some ints and pointers) which seems reasonable enough to me.

If someone seems performance issues from this code, we can always add a try_from_raw_arc or something, but this looks good to me for now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clone here seems to just be a clone of the FFI_ArrowArray struct itself (which is some ints and pointers) which seems reasonable enough to me.

That's right.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb

@wangfenjin
Copy link
Contributor

Guys, not sure if my understanding is right, but I think this commit will break the design and create memory leak.

If we clone the FFI struct, then it means we need to free the pointer by ourself, but if we free FFI_ArrowArray, then the data in this Array will also be free? Which means we can't free the pointer(until the data are used and ready to free, but in reality we can't hold this useless pointer in a big project for such a long time), which create memory leak.

As to the question @viirya raised in #1333 , when manage memory, the one who allocate it should free it, which means in our case, we need to alloc the struct in rust and pass the pointer to java and then also free the memory in rust.

I suggest we revert this commit. cc @alamb @sunchao

@viirya
Copy link
Member Author

viirya commented Mar 11, 2022

As to the question @viirya raised in #1333 , when manage memory, the one who allocate it should free it, which means in our case, we need to alloc the struct in rust and pass the pointer to java and then also free the memory in rust.

Simply said, I don't think this is correct. That is the whole point of C data interface. The release callback is designed to be called by not only the one allocates the C data interface structure.

@viirya
Copy link
Member Author

viirya commented Mar 11, 2022

If we clone the FFI struct, then it means we need to free the pointer by ourself, but if we free FFI_ArrowArray, then the data in this Array will also be free? Which means we can't free the pointer(until the data are used and ready to free, but in reality we can't hold this useless pointer in a big project for such a long time), which create memory leak.

As the raw pointers are converted to Arc, they will be released correctly eventually. But the data is still there, so of course you should not release the pointers.

wangfenjin added a commit to wangfenjin/arrow-rs that referenced this pull request Mar 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ArrowArray::try_from_raw should not assume the pointers are from Arc
5 participants