Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-14374: [Java] Integration tests for the C data Interface implementation for Java #11543

Conversation

tomersolomon1
Copy link
Contributor

@tomersolomon1 tomersolomon1 commented Oct 26, 2021

Integration tests for the C data Interface implementation for Java.

…entation for Java

Lead-authored-by: tomersolomon1 <tomer.solomon1@gmail.com>
Co-authored-by: tomersolomon1 <tomer.solomon1@gmail.com>
Co-authored-by: roee88 <roee88@gmail.com>
Signed-off-by: tomersolomon1 <tomer.solomon1@gmail.com>
@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

- ${DOCKER_VOLUME_PREFIX}debian-ccache:/ccache:delegated
command:
[ "/arrow/ci/scripts/cpp_build.sh /arrow /build &&
/arrow/ci/scripts/python_build.sh /arrow /build &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to build Arrow C++ and PyArrow from scratch. You should be able to just pip install a recent binary wheel. Did you try to do that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pitrou the reason was to do the integration test against the code in the main branch for cases where related fixes are made to the python/C++ implementation. It seems like this is what conda-python-jpype does. If undesired then we can switch to just pip install (which was @tomersolomon1's original intent, so my bad)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you prefer, though this certainly makes the CI and docker run slower than installing a binary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer to keep it this way b/c of the reasons @roee88 mentioned, and also this lowers the risk of introducing new inconsistencies between the implementations.

@pitrou
Copy link
Member

pitrou commented Nov 3, 2021

@amol-
Copy link
Member

amol- commented Nov 4, 2021

@amol- This should interest you, especially the bridge here: https://github.com/apache/arrow/pull/11543/files#diff-386b108c1492f76426f3ea17399c681593a8cc86b34daeb1b451d4da600863e8R56

Yes, I was reading it, there are some useful helpers. Some of those should probably live in pyarrow.jvm so that you have to deal with the whole ffi.* machinery manually? Also those helpers could probably take care of ffi.release for you using context managers.

def __init__(self):
self.allocator = jpype.JPackage(
"org").apache.arrow.memory.RootAllocator(sys.maxsize)
self.jc = jpype.JPackage("org").apache.arrow.c
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestion, I feel it might help us when we will come back to this codebase in 6 months to be a bit more explicit about when we are invoking Python things and when we are invoking Java things. Given that both of them appear as Python codebase as jpype is proxying Java calls too it's not as explicit as it could be.

Probably naming self.java_allocator and self.java_c would make a bit more obvious that we are dealing with Java methods there and not with Python things.

Signed-off-by: tomersolomon1 <tomer.solomon1@gmail.com>
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @roee88 Do you want to add anything?

Copy link
Contributor

@roee88 roee88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Eventually this could reuse code from ARROW-14319 but there is no agreement on how a new jvm module should look like.


python integration_tests.py

popd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting (missing new line at end of file)


if __name__ == '__main__':
setup_jvm()
unittest.main(verbosity=2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same formatting issue here

Signed-off-by: tomersolomon1 <tomer.solomon1@gmail.com>
@pitrou pitrou closed this in bb776d8 Nov 10, 2021
@ursabot
Copy link

ursabot commented Nov 10, 2021

Benchmark runs are scheduled for baseline = 3c1f702 and contender = bb776d8. bb776d8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.22% ⬆️0.0%] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants