ARROW-12052: [Rust] Add Child Data to Arrow's C FFI implementation. … #9778

Closed
ritchie46 wants to merge 12 commits into master from ritchie46/ffi_types

Conversation

ritchie46
Contributor

This PR adds child data to Arrow's C FFI implementation and implements it for List and LargeList datatypes.

@ritchie46 ritchie46 changed the title Add Child Data to Arrow's C FFI implementation. … ARROW-12052: [Rust] Add Child Data to Arrow's C FFI implementation. … Mar 23, 2021
@alamb
Contributor

alamb commented Mar 24, 2021

I am not an expert in this level of code -- perhaps @jorgecarleitao has time to take a look at this PR?

Member

@jorgecarleitao jorgecarleitao left a comment

Thanks a lot for this, @ritchie46. It has the right ideas and looks great so far. 💯

I left some comments throughout the code.

My last general comment would be to add this type to the pyarrow-integration-tests crate, which contains real tests against pyarrow that allow us to validate the behavior against the C++ implementation.

@ritchie46
Contributor Author

> Thanks a lot for this, @ritchie46. It has the right ideas and looks great so far. 💯
>
> I left some comments throughout the code.
>
> My last general comment would be to add this type to the pyarrow-integration-tests crate, which contains real tests against pyarrow that allow us to validate the behavior against the C++ implementation.

Cool! I think I've tackled all your comments, @jorgecarleitao. I also added a test to the pyarrow-integration-tests crate. That was a good call, because the test did not succeed.

It turns out that we need to provide a name in the FFI_ArrowSchema when we provide child data. It was a null pointer, so I replaced it with an empty string (""), which seemed easiest to me. If you want to tackle that differently, I am open to suggestions.
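
As an illustration of the workaround described in this comment, here is a minimal sketch, assuming the producer owns the name string it exports. The helpers `export_name` and `release_name` are hypothetical and are not taken from the PR; only `CString::into_raw` and `CString::from_raw` are standard library calls.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical helper: turn a field name into a non-null, NUL-terminated
/// pointer suitable for the `name` member of an exported schema.
fn export_name(name: &str) -> *mut c_char {
    // `CString::new` fails only on interior NUL bytes; fall back to "" then.
    CString::new(name).unwrap_or_default().into_raw()
}

/// Hypothetical helper: reclaim a pointer produced by `export_name`.
///
/// # Safety
/// `ptr` must come from `export_name` and must not be used afterwards.
unsafe fn release_name(ptr: *mut c_char) {
    if !ptr.is_null() {
        // Valid only because the pointer came from `CString::into_raw`.
        drop(unsafe { CString::from_raw(ptr) });
    }
}

fn main() {
    // An empty name is still a valid, non-null C string.
    let ptr = export_name("");
    assert!(!ptr.is_null());
    let len = unsafe { CStr::from_ptr(ptr) }.to_bytes().len();
    println!("exported name has {} bytes", len);
    unsafe { release_name(ptr) };
}
```

The pointer must be reclaimed by the same side that created it, typically from the schema's release callback, otherwise the string leaks.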

Contributor

@nevi-me nevi-me left a comment

Some minor comments, I'm happy with the implementation though. Thanks @ritchie46, and I apologise for taking so long to review this.

@ritchie46
Contributor Author

> Some minor comments, I'm happy with the implementation though. Thanks @ritchie46, and I apologise for taking so long to review this.

No worries! 😄

@codecov-io

Codecov Report

Merging #9778 (ba51b55) into master (29feea0) will decrease coverage by 0.16%.
The diff coverage is 77.71%.

@@            Coverage Diff             @@
##           master    #9778      +/-   ##
==========================================
- Coverage   82.59%   82.43%   -0.17%     
==========================================
  Files         248      252       +4     
  Lines       58294    59024     +730     
==========================================
+ Hits        48149    48655     +506     
- Misses      10145    10369     +224     
Impacted Files                                        Coverage Δ
rust/arrow-pyarrow-integration-testing/src/lib.rs     0.00% <ø> (ø)
rust/arrow/src/compute/kernels/sort.rs                94.37% <ø> (+0.80%) ⬆️
rust/arrow/src/compute/kernels/substring.rs           98.29% <ø> (ø)
rust/arrow/src/compute/kernels/take.rs                96.06% <ø> (-0.01%) ⬇️
rust/arrow/src/compute/kernels/window.rs              100.00% <ø> (ø)
rust/arrow/src/compute/kernels/zip.rs                 82.14% <ø> (ø)
rust/arrow/src/compute/util.rs                        98.92% <ø> (ø)
rust/arrow/src/datatypes/field.rs                     55.47% <ø> (ø)
rust/arrow/src/ffi.rs                                 80.23% <ø> (+4.00%) ⬆️
rust/arrow/src/ipc/reader.rs                          84.36% <ø> (ø)
... and 90 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ritchie46
Contributor Author

@nevi-me there are some new clippy warnings caused by the new Rust version that are unrelated to this PR. Do they need to be fixed?

@nevi-me
Contributor

nevi-me commented Mar 28, 2021

> @nevi-me there are some new clippy warnings caused by the new Rust version that are unrelated to this PR. Do they need to be fixed?

@ritchie46 I fixed them last night (depending on where in the world one is lol). CI's fine now. The integration failures are a known issue, tracked in ARROW-12112.

Contributor

@nevi-me nevi-me left a comment

LGTM from looking at what you're doing to construct the lists.

@jorgecarleitao can do the honours of the PR blessing approval :)

@nevi-me
Contributor

nevi-me commented Mar 28, 2021

It doesn't look like anything needs to be updated in docs/source/status.rst

@alamb
Contributor

alamb commented Apr 1, 2021

@ritchie46 / @jorgecarleitao / @nevi-me is this one ready to go? There is one small clippy lint left, which I can fix up, but I didn't want to ram this PR through just to keep the queue down if it wasn't actually ready.

Member

@jorgecarleitao jorgecarleitao left a comment

LGTM. Thanks a lot @ritchie46 for going the extra mile to have the equality done right :) 💯

Also, thanks a lot @pitrou for your help here. 👍

@ritchie46
Contributor Author

@alamb I fixed the clippy issue, so I think we're good to go. :)

@alamb
Contributor

alamb commented Apr 2, 2021

Integration test failure seems due to https://github.com/apache/arrow/pull/9778/checks?check_run_id=2253660088 (not related to this PR):

-----------------------
/
/arrow/js /
npm WARN tar ENOSPC: no space left on device, write
npm WARN tar ENOSPC: no space left on device, write
npm ERR! cb() never called!

@alamb alamb closed this in 2f3ed3a Apr 2, 2021
@alamb
Contributor

alamb commented Apr 2, 2021

Thanks again @ritchie46 👍

@ritchie46
Contributor Author

ritchie46 commented Apr 4, 2021

Hmm.. This is sadly a bit too late, but the current implementation does an invalid read/write. I get a SIGILL if I run this test 1000 times.

Current thread 0x00007f3bd56f4740 (most recent call first):
  File "/home/ritchie46/code/arrow/rust/arrow-pyarrow-integration-testing/tests/test_sql.py", line 89 in test_list_array
  File "/opt/miniconda3/lib/python3.8/unittest/case.py", line 633 in _callTestMethod
  File "/opt/miniconda3/lib/python3.8/unittest/case.py", line 676 in run
  File "/opt/miniconda3/lib/python3.8/unittest/case.py", line 736 in __call__
  File "/opt/miniconda3/lib/python3.8/site-packages/_pytest/unittest.py", line 207 in runtest
  File "/opt/miniconda3/lib/python3.8/site-packages/_pytest/runner.py", line 131 in pytest_runtest_call
  File "/opt/miniconda3/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/opt/miniconda3/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/opt/miniconda3/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/opt/miniconda3/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/opt/miniconda3/lib/python3.8/site-packages/_pytest/runner.py", line 207 in <lambda>
  File "/opt/miniconda3/lib/python3.8/site-packages/_pytest/runner.py", line 234 in from_call
  File "/opt/miniconda3/lib/python3.8/site-packages/_pytest/runner.py", line 206 in call_runtest_hook
  File "/opt/miniconda3/lib/python3.8/site-packages/_pytest/runner.py", line 182 in call_and_report
  File "/opt/miniconda3/lib/python3.8/site-packages/_pytest/runner.py", line 96 in runtestprotocol
  File "/opt/miniconda3/lib/python3.8/site-packages/_pytest/runner.py", line 81 in pytest_runtest_protocol
  File "/opt/miniconda3/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/opt/miniconda3/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/opt/miniconda3/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/opt/miniconda3/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/opt/miniconda3/lib/python3.8/site-packages/_pytest/main.py", line 270 in pytest_runtestloop
  File "/opt/miniconda3/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/opt/miniconda3/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/opt/miniconda3/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/opt/miniconda3/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/opt/miniconda3/lib/python3.8/site-packages/_pytest/main.py", line 246 in _main
  File "/opt/miniconda3/lib/python3.8/site-packages/_pytest/main.py", line 196 in wrap_session
  File "/opt/miniconda3/lib/python3.8/site-packages/_pytest/main.py", line 239 in pytest_cmdline_main
  File "/opt/miniconda3/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/opt/miniconda3/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/opt/miniconda3/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/opt/miniconda3/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/opt/miniconda3/lib/python3.8/site-packages/_pytest/config/__init__.py", line 91 in main
  File "/opt/miniconda3/bin/pytest", line 8 in <module>
Illegal instruction (core dumped)

I hope I can find the location of this invalid read/write. If anybody has an idea of where this could occur, that would be very much appreciated.

Some additional info/ thoughts:

  • I cannot reproduce this in the Rust tests. Is there a way to interact with C++ Arrow from Rust? Using valgrind on Python creates a lot of noise from the Python VM.

  • Location of the SIGILL: during the second traversal of create children (called by the child arrays), the Arc::clone of self.array triggers the SIGILL.

@ritchie46
Contributor Author

ritchie46 commented Apr 5, 2021

If I prevent the drop in release array, this issue is resolved but we leak data.

TBH, I am stuck. @pitrou @jorgecarleitao have you got any idea how this can be resolved?

could this be related?

@jorgecarleitao
Member

I am really sorry, this was sloppiness on my part: I should have checked out the code and gone through it more carefully, as FFI is always risky stuff. If you think it would take some pressure off, we can revert this PR until we find and fix this.

Regardless, could you run the memory-check to see if we find the problem in our internal roundtrips? Something like

cargo test --lib --features memory-check -- --test-threads=1

on the rust crate. This counts every alloc/realloc/dealloc over all buffers over all tests and verifies that the sum is zero. The test-threads must be 1 so that tests run sequentially and the last test is the memory check.
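
The idea behind such a check can be sketched as follows. This is a simplified illustration and not the arrow crate's implementation: the `TrackedBuffer` type, the `ALLOCATED` counter, and `net_allocated` are hypothetical names, and the sketch tracks only allocation and deallocation, not reallocation.

```rust
use std::sync::atomic::{AtomicIsize, Ordering};

/// Net number of bytes currently held by tracked buffers (hypothetical).
static ALLOCATED: AtomicIsize = AtomicIsize::new(0);

/// Hypothetical buffer type that reports its allocation and deallocation.
struct TrackedBuffer {
    data: Vec<u8>,
}

impl TrackedBuffer {
    fn new(len: usize) -> Self {
        // Record the allocation before handing the buffer out.
        ALLOCATED.fetch_add(len as isize, Ordering::SeqCst);
        TrackedBuffer { data: vec![0; len] }
    }
}

impl Drop for TrackedBuffer {
    fn drop(&mut self) {
        // Record the deallocation when the buffer goes away.
        ALLOCATED.fetch_sub(self.data.len() as isize, Ordering::SeqCst);
    }
}

/// Sum of all allocations minus all deallocations so far.
fn net_allocated() -> isize {
    ALLOCATED.load(Ordering::SeqCst)
}

fn main() {
    {
        let _a = TrackedBuffer::new(64);
        let _b = TrackedBuffer::new(128);
        println!("while alive: {}", net_allocated());
    }
    println!("after drop: {}", net_allocated());
    // A buffer that is never dropped leaves the sum non-zero.
    std::mem::forget(TrackedBuffer::new(8));
    println!("after leak: {}", net_allocated());
}
```

Because the counter is global, a check of this kind is only meaningful when it runs after every other test has finished, which is why the tests have to run sequentially.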

I would try to run this before this PR's commit just to make sure that things work as expected since we do not run this as part of our CI. If it passes, then I would try again after this PR's commit.

@pitrou
Member

pitrou commented Apr 5, 2021

@ritchie46 The guidelines for implementers of a release callback are here. I would suggest following the example:
https://arrow.apache.org/docs/format/CDataInterface.html#release-callback-semantics-for-producers
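
A minimal Rust sketch of those release semantics follows, assuming the producer owns its child structs through `private_data`. The `FfiArray` struct is a cut-down, hypothetical stand-in for `ArrowArray` (buffers and most members are omitted), and `PrivateData`, `release_array`, and `new_array` are illustrative names rather than the code in this PR.

```rust
use std::os::raw::c_void;
use std::ptr;

/// Simplified, hypothetical stand-in for the C Data Interface `ArrowArray`.
#[repr(C)]
pub struct FfiArray {
    pub n_children: i64,
    pub children: *mut *mut FfiArray,
    pub release: Option<unsafe extern "C" fn(*mut FfiArray)>,
    pub private_data: *mut c_void,
}

/// Hypothetical producer-side state. It owns the child structs, so they
/// stay alive until the parent's release callback runs.
struct PrivateData {
    children: Box<[*mut FfiArray]>,
}

/// Release the children, free owned state, then mark the struct as
/// released by clearing `release`.
unsafe extern "C" fn release_array(array: *mut FfiArray) {
    if array.is_null() {
        return;
    }
    let array = unsafe { &mut *array };
    if array.release.is_none() {
        return; // already released
    }
    // Take back ownership of the state handed out in `new_array`.
    let private = unsafe { Box::from_raw(array.private_data as *mut PrivateData) };
    for &child in private.children.iter() {
        unsafe {
            if let Some(release_child) = (*child).release {
                release_child(child);
            }
            // The parent allocated the child struct, so the parent frees it.
            drop(Box::from_raw(child));
        }
    }
    drop(private);
    array.n_children = 0;
    array.children = ptr::null_mut();
    array.private_data = ptr::null_mut();
    array.release = None;
}

/// Build an exported array that takes ownership of `children`.
pub fn new_array(children: Vec<FfiArray>) -> FfiArray {
    let pointers: Vec<*mut FfiArray> = children
        .into_iter()
        .map(|child| Box::into_raw(Box::new(child)))
        .collect();
    let mut private = Box::new(PrivateData { children: pointers.into_boxed_slice() });
    FfiArray {
        n_children: private.children.len() as i64,
        children: private.children.as_mut_ptr(),
        release: Some(release_array),
        private_data: Box::into_raw(private) as *mut c_void,
    }
}

fn main() {
    let child = new_array(Vec::new());
    let mut parent = new_array(vec![child]);
    unsafe { (parent.release.unwrap())(&mut parent) };
    println!("released: {}", parent.release.is_none());
}
```

The key design choice in this sketch is that everything the exported struct points to is owned by `private_data`, so nothing is dropped before the consumer calls `release`.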

@ritchie46
Contributor Author

> @ritchie46 The guidelines for implementers of a release callback are here. I would suggest following the example:
> https://arrow.apache.org/docs/format/CDataInterface.html#release-callback-semantics-for-producers

Yes, thank you. I will be going through that.

@ritchie46
Contributor Author

> I am really sorry, this was sloppiness on my part: I should have checked out the code and gone through it more carefully, as FFI is always risky stuff. If you think it would take some pressure off, we can revert this PR until we find and fix this.

Yes, in that case we panic instead of hitting UB, which clearly is better.

> Regardless, could you run the memory-check to see if we find the problem in our internal roundtrips? Something like

Will do that.

Some extra info: I realize that we don't have any owned child data in private_data, so maybe the child_data is already dropped.

Member

@pitrou pitrou left a comment

Given that this seems a bit delicate to get right, I think you should first add tests for roundtripping schemas:

  • roundtrip a schema as is, Rust->Python->Rust
  • roundtrip a schema as is, Python->Rust->Python
  • create a primitive type in Rust, return pa.list_(primitive_type) from Python
  • create a list type in Rust, return list_type.value_type from Python

For each case, verify the expected result, also check for allocation/deallocation/leaks.

Once you've got that right, you can tackle the array roundtrip issue.

.child_data()
.iter()
.map(|arr| {
let len = arr.len();
Member

I don't understand: why isn't try_from called recursively?

Contributor Author

This is because I otherwise cannot pass the nullable: bool information from the parent. I should split this up into a function separate from try_from to make this more explicit.

Member

Well, you'll need to handle recursive types more generally anyway. Think list(list(int8)).

Contributor Author

Yes, that would not work like this. I will fix that.
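
One way to handle both concerns in this thread, recursion and passing nullability down, is to convert from a field rather than from a bare data type. The sketch below assumes a simplified type model: `DataType`, `Field`, `Schema`, and `schema_from_field` are hypothetical stand-ins, not the arrow crate's types. The format strings `c` and `+l` and the nullable flag value 2 come from the C Data Interface specification.

```rust
/// Hypothetical, simplified type model; not the arrow crate's `DataType`.
#[derive(Debug)]
enum DataType {
    Int8,
    List(Box<Field>),
}

#[derive(Debug)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Hypothetical owned mirror of the C `ArrowSchema` struct.
#[derive(Debug)]
struct Schema {
    format: String,
    name: String,
    flags: i64,
    children: Vec<Schema>,
}

const ARROW_FLAG_NULLABLE: i64 = 2;

/// Convert a field recursively, so each level carries its own name and
/// nullability instead of inheriting them from a special-cased parent.
fn schema_from_field(field: &Field) -> Schema {
    let (format, children) = match &field.data_type {
        DataType::Int8 => ("c", Vec::new()),
        DataType::List(item) => ("+l", vec![schema_from_field(item)]),
    };
    Schema {
        format: format.to_string(),
        name: field.name.clone(),
        flags: if field.nullable { ARROW_FLAG_NULLABLE } else { 0 },
        children,
    }
}

fn main() {
    // list(list(int8)), with a nullable middle level.
    let inner = Field { name: "item".into(), data_type: DataType::Int8, nullable: false };
    let middle = Field {
        name: "item".into(),
        data_type: DataType::List(Box::new(inner)),
        nullable: true,
    };
    let outer = Field {
        name: String::new(),
        data_type: DataType::List(Box::new(middle)),
        nullable: false,
    };
    println!("{:#?}", schema_from_field(&outer));
}
```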

a = pyarrow.array([[], None, [1, 2], [4, 5, 6]], pyarrow.list_(pyarrow.int64()))
b = arrow_pyarrow_integration_testing.round_trip(a)

b.validate(full=True)
Member

It would be interesting to add del a above this, to make sure that b keeps the data alive.

// <https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowSchema>
FFI_ArrowSchema {
format: CString::new(format).unwrap().into_raw(),
name: std::ptr::null_mut(),
// For child data a non null string is expected and is called item
Member

This is only for lists, though. For more general nested types you'll have to take the actual field name.

// at that point the child data is not yet known, but it is also not required to determine
// the buffer length of the list arrays.
"+l" => {
let nullable = schema.flags == 2;
Member

@pitrou pitrou Apr 5, 2021

Should be (schema.flags & 2) != 0. Also I'm surprised you're sprinkling magic numbers in the code instead of defining a constant.
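
The difference between the two checks can be shown with a short sketch. The flag values are the ones defined by the C Data Interface; the function name `is_nullable` is only illustrative.

```rust
// Flag bits defined by the Arrow C Data Interface for `ArrowSchema.flags`.
const ARROW_FLAG_DICTIONARY_ORDERED: i64 = 1;
const ARROW_FLAG_NULLABLE: i64 = 2;
const ARROW_FLAG_MAP_KEYS_SORTED: i64 = 4;

/// Test the nullable bit without assuming that no other flag is set.
fn is_nullable(flags: i64) -> bool {
    (flags & ARROW_FLAG_NULLABLE) != 0
}

fn main() {
    let flags = ARROW_FLAG_NULLABLE | ARROW_FLAG_DICTIONARY_ORDERED;
    // An equality test misses the nullable bit once another flag is set.
    println!("flags == 2: {}", flags == ARROW_FLAG_NULLABLE);
    println!("masked:     {}", is_nullable(flags));
    println!("map keys sorted only: {}", is_nullable(ARROW_FLAG_MAP_KEYS_SORTED));
}
```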

// Safety
// Should be set as this is expected from the C FFI definition
debug_assert!(!schema.name.is_null());
let name = unsafe { CString::from_raw(schema.name as *mut c_char) }
Member

This doesn't seem right. The doc for from_raw says:

This should only ever be called with a pointer that was earlier obtained by calling CString::into_raw. Other usage (e.g., trying to take ownership of a string that was allocated by foreign code) is likely to lead to undefined behavior or allocator corruption.

But we are exactly in the case where schema.name may have been allocated by C++, Python, or anything else. It seems you should use CStr instead: "Representation of a borrowed C string".
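
A minimal sketch of the borrowed approach, assuming the pointer is either null or a valid NUL-terminated string for the duration of the call. The helper `copy_name` is hypothetical; `CStr::from_ptr` and `to_string_lossy` are standard library calls.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical helper: copy a foreign-owned C string into a Rust `String`
/// without taking ownership of the original allocation.
///
/// # Safety
/// `ptr` must be null or point to a valid NUL-terminated string that stays
/// alive for the duration of the call.
unsafe fn copy_name(ptr: *const c_char) -> Option<String> {
    if ptr.is_null() {
        return None;
    }
    // Borrow only; the foreign allocator remains responsible for freeing.
    let borrowed = unsafe { CStr::from_ptr(ptr) };
    Some(borrowed.to_string_lossy().into_owned())
}

fn main() {
    // Stands in for a string allocated by C++ or Python.
    let foreign = CString::new("item").unwrap();
    let name = unsafe { copy_name(foreign.as_ptr()) };
    println!("{:?}", name);
    // `foreign` is still valid here because it was only borrowed.
    println!("{:?}", foreign);
}
```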

(DataType::Utf8, 2) | (DataType::Binary, 2) => size_of::<u8>() * 8,
(DataType::Utf8, _) | (DataType::Binary, _) => {
(DataType::Utf8, 1) | (DataType::Binary, 1) | (DataType::List(_), 1) => size_of::<i32>() * 8,
(DataType::Utf8, 2) | (DataType::Binary, 2) | (DataType::List(_), 2) => size_of::<u8>() * 8,
Member

Lists don't have a buffer number 2.

Contributor Author

Hmm.. I thought it was a validity bitmap and an offset buffer and that the child data was counted differently.

What should be the correct number?

Member

The validity bitmap is buffer 0 and the offsets are buffer 1. You are defining a buffer 2 (of u8) which doesn't exist.

Member

The validity bitmap is buffer 0 and the offsets are buffer 1. A List has no buffer 2. If someone requests buffer 2 from a List array, we should return an error instead.
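
The buffer layout described in this thread could be encoded roughly as below. This is a sketch under assumptions: the `DataType` enum is a hypothetical simplification, and `buffer_bit_width` is an illustrative name, not the function in ffi.rs.

```rust
use std::mem::size_of;

/// Hypothetical, simplified set of types for illustration.
#[derive(Debug)]
enum DataType {
    Int64,
    List,
    LargeList,
}

/// Bit width of buffer `index`, or an error if the type has no such buffer.
fn buffer_bit_width(data_type: &DataType, index: usize) -> Result<usize, String> {
    match (data_type, index) {
        // Buffer 0 is the validity bitmap: one bit per slot.
        (_, 0) => Ok(1),
        // Buffer 1 holds the values of a primitive array.
        (DataType::Int64, 1) => Ok(size_of::<i64>() * 8),
        // Buffer 1 holds the offsets of a list: i32 for List, i64 for LargeList.
        (DataType::List, 1) => Ok(size_of::<i32>() * 8),
        (DataType::LargeList, 1) => Ok(size_of::<i64>() * 8),
        // Anything else does not exist; child data is not a buffer.
        (dt, i) => Err(format!("{:?} has no buffer {}", dt, i)),
    }
}

fn main() {
    println!("{:?}", buffer_bit_width(&DataType::List, 1));
    println!("{:?}", buffer_bit_width(&DataType::List, 2));
}
```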

Contributor Author

Right, Thanks!

GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
This PR adds child data to Arrow's C FFI implementation and implements it for `List` and `LargeList` datatypes.

Closes apache#9778 from ritchie46/ffi_types

Authored-by: Ritchie Vink <ritchie46@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>