New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-16340: [C++][Python] Move all Python related code into PyArrow #13311
Conversation
|
arrow_find_package(ARROW_PYTHON | ||
"${ARROW_HOME}" | ||
"${CPYARROW_HOME}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This smells a bit hackish. Feels a bit like we are building two totally different projects.
On long term we should probably have libarrow_python.so
just be one of the shared objects constituting pyarrow. libarrow_python.so
should probably be possible to integrate with the rest of the built files using add_subdirectory
or something equivalent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On long term we should probably have libarrow_python.so just be one of the shared objects constituting pyarrow.
That's actually what is already happening, I think. The libarrow_python.so gets copied into the pyarrow directory (either the repo itself for an inplace build, or the build directory, before that one gets copied to site-packages for a normal install).
But so for building the cython extensions, we need to point to where libarrow_python.so can be found (which is now a different location as where libarrow.so can be found). And doing that with a argument to cmake seems a good way to do that? (alternatively we might need to edit the arrow_find_package
to work with this new situation?)
abbdece
to
b2f150c
Compare
@kou there is one more issue I am struggling with and would be happy to get your opinion. On the CI I get linker errors for python flight module (not locally on M1): I am facing similar issue locally when trying to build the tests for C++ part of PyArrow with GTest:
What could be the setup work I am missing so that the libraries would be linked correctly? |
It seems that this link is wrong. (This doesn't have the error message.)
We need |
Maybe this way would be easier: Error with make build of C PyArrow
|
We need |
Thank you @kou! |
I would like to put this PR ready for review. There are still two CI failures I am investigating (see ⬇️ ) plus
|
Hmm, we should try to make sure the Python Flight bindings can compile without including some internal headers. |
Are you thinking in terms of this PR or in general? |
I mean in general, though that could be done as part of this PR (unless I'm missing something that makes internal headers mandatory here, but that would be surprising). |
I think it should be easy enough to remove the need for |
Thanks for the comments! |
Is there something similar I can do with this part: arrow/cpp/src/arrow/python/flight.cc Lines 385 to 393 in 38918ef
that also uses I have successfully changed the construction of |
Hmm, looks like we need to add a factory method for |
I have rebased the PR and corrected the |
Yes. Thanks. |
…apache-arrow/yum/*/Dockerfile
@github-actions crossbow submit -g nightly-tests -g nightly-packaging -g nightly-release |
Revision: 0860306 Submitted crossbow builds: ursacomputing/crossbow @ actions-58857300ac |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
The centos-9-stream-amd64 failure is a new failure but I think that it's not related to this pull request. We can fix it as a follow-up task.
https://github.com/ursacomputing/crossbow/runs/8027904112?check_suite_focus=true#step:6:4839
FAILED: gandiva-glib/Gandiva-1.0.gir
/usr/bin/meson --internal exe --unpickle /build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/meson-private/meson_exe_g-ir-scanner_f62efc76b62635482d998451ca9762c4e34e9776.dat
while executing ['/usr/bin/g-ir-scanner', '--no-libtool', '--namespace=Gandiva', '--nsversion=1.0', '--warn-all', '--output', 'gandiva-glib/Gandiva-1.0.gir', '--c-include=gandiva-glib/gandiva-glib.h', '--warn-all', '--include-uninstalled=./arrow-glib/Arrow-1.0.gir', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/gandiva-glib', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/gandiva-glib', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/.', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/.', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/../cpp/redhat-linux-build/src', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/../cpp/redhat-linux-build/src', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/../cpp/src', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/../cpp/src', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/.', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/.', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/../cpp/redhat-linux-build/src', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/../cpp/redhat-linux-build/src', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/../cpp/src', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/../cpp/src', '--filelist=/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/gandiva-glib/libgandiva-glib.so.1000.0.0.p/Gandiva_1.0_gir_filelist', '--include=Arrow-1.0', '--symbol-prefix=ggandiva', '--identifier-prefix=GGandiva', '--pkg-export=gandiva-glib', '--cflags-begin', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/.', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/.', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/../cpp/redhat-linux-build/src', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/../cpp/redhat-linux-build/src', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/../cpp/src', '-I/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/../cpp/src', '-I/usr/include/glib-2.0', '-I/usr/lib64/glib-2.0/include', '-I/usr/include/sysprof-4', '-I/usr/include/gobject-introspection-1.0', '--cflags-end', '--add-include-path=/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/arrow-glib', '--add-include-path=/usr/share/gir-1.0', '-L/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/gandiva-glib', '--library', 'gandiva-glib', '-L/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/arrow-glib', '-L/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/../../cpp/redhat-linux-build/release', '--extra-library=gobject-2.0', '--extra-library=glib-2.0', '--extra-library=girepository-1.0', '--sources-top-dirs', '/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/subprojects/', '--sources-top-dirs', '/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/subprojects/', '--warn-error']
--- stdout ---
Package arrow was not found in the pkg-config search path.
Perhaps you should add the directory containing `arrow.pc'
to the PKG_CONFIG_PATH environment variable
Package 'arrow', required by 'arrow-glib-uninstalled', not found
g-ir-scanner: link: gcc -o /build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/tmp-introspecttrutbm6g/Gandiva-1.0 -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection /build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/tmp-introspecttrutbm6g/Gandiva-1.0.o -L. -Wl,-rpath,. -Wl,--no-as-needed -L/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/gandiva-glib -Wl,-rpath,/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/gandiva-glib -L/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/arrow-glib -Wl,-rpath,/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/arrow-glib -L/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/../../cpp/redhat-linux-build/release -Wl,-rpath,/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/../../cpp/redhat-linux-build/release -lgandiva-glib -lgobject-2.0 -lglib-2.0 -lgirepository-1.0 -lgio-2.0 -lgobject-2.0 -Wl,--export-dynamic -lgmodule-2.0 -pthread -lglib-2.0 -lglib-2.0 -Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1
--- stderr ---
/usr/bin/ld: /build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/../../cpp/redhat-linux-build/release/libgandiva.so.1000: undefined reference to `std::__glibcxx_assert_fail(char const*, int, char const*, char const*)'
collect2: error: ld returned 1 exit status
linking of temporary binary failed: Command '['gcc', '-o', '/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/tmp-introspecttrutbm6g/Gandiva-1.0', '-O2', '-flto=auto', '-ffat-lto-objects', '-fexceptions', '-g', '-grecord-gcc-switches', '-pipe', '-Wall', '-Werror=format-security', '-Wp,-D_FORTIFY_SOURCE=2', '-Wp,-D_GLIBCXX_ASSERTIONS', '-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1', '-fstack-protector-strong', '-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1', '-m64', '-march=x86-64-v2', '-mtune=generic', '-fasynchronous-unwind-tables', '-fstack-clash-protection', '-fcf-protection', '/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/tmp-introspecttrutbm6g/Gandiva-1.0.o', '-L.', '-Wl,-rpath,.', '-Wl,--no-as-needed', '-L/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/gandiva-glib', '-Wl,-rpath,/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/gandiva-glib', '-L/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/arrow-glib', '-Wl,-rpath,/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/arrow-glib', '-L/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/../../cpp/redhat-linux-build/release', '-Wl,-rpath,/build/rpmbuild/BUILD/apache-arrow-9.0.0.dev783/c_glib/build/../../cpp/redhat-linux-build/release', '-lgandiva-glib', '-lgobject-2.0', '-lglib-2.0', '-lgirepository-1.0', '-lgio-2.0', '-lgobject-2.0', '-Wl,--export-dynamic', '-lgmodule-2.0', '-pthread', '-lglib-2.0', '-lglib-2.0', '-Wl,-z,relro', '-Wl,--as-needed', '-Wl,-z,now', '-specs=/usr/lib/rpm/redhat/redhat-hardened-ld', '-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1']' returned non-zero exit status 1.
Could you update the description of this pull request before we merge this? |
Will also make a JIRA for centos-9-stream-amd64 failure. |
@kou Description is updated and the JIRA issue for the |
Thanks! |
Benchmark runs are scheduled for baseline = 7e7b8e1 and contender = b832853. b832853 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
['Python', 'R'] benchmarks have high level of regressions. |
…pache#13311) This PR moves `src/arrow/python` directory into `pyarrow` and arranges PyArrow to build it. The build on the Python side is made in two steps: 1. `_run_cmake_pyarrow_cpp()` where the C++ part of the pyarrow is build first (the part that was moved in the refactoring) 2. `_run_cmake()` where pyarrow is built as before No changes are needed in the build process from the user side to successfully build pyarrow after this refactoring. The test for PyArrow CPP will however be moved into Cython and can currently be run with: ```shell >>> pushd python/build/dist/temp >>> ctest ``` Lead-authored-by: Alenka Frim <frim.alenka@gmail.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…pache#13311) This PR moves `src/arrow/python` directory into `pyarrow` and arranges PyArrow to build it. The build on the Python side is made in two steps: 1. `_run_cmake_pyarrow_cpp()` where the C++ part of the pyarrow is build first (the part that was moved in the refactoring) 2. `_run_cmake()` where pyarrow is built as before No changes are needed in the build process from the user side to successfully build pyarrow after this refactoring. The test for PyArrow CPP will however be moved into Cython and can currently be run with: ```shell >>> pushd python/build/dist/temp >>> ctest ``` Lead-authored-by: Alenka Frim <frim.alenka@gmail.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…es (#14275) This PR is a follow-up of #13311 where it was decided to change the base directory for PyArrow C++ headers to avoid top level inclusions. See #13311 (comment) Authored-by: Alenka Frim <frim.alenka@gmail.com> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
…pache#13311) This PR moves `src/arrow/python` directory into `pyarrow` and arranges PyArrow to build it. The build on the Python side is made in two steps: 1. `_run_cmake_pyarrow_cpp()` where the C++ part of the pyarrow is build first (the part that was moved in the refactoring) 2. `_run_cmake()` where pyarrow is built as before No changes are needed in the build process from the user side to successfully build pyarrow after this refactoring. The test for PyArrow CPP will however be moved into Cython and can currently be run with: ```shell >>> pushd python/build/dist/temp >>> ctest ``` Lead-authored-by: Alenka Frim <frim.alenka@gmail.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…es (apache#14275) This PR is a follow-up of apache#13311 where it was decided to change the base directory for PyArrow C++ headers to avoid top level inclusions. See apache#13311 (comment) Authored-by: Alenka Frim <frim.alenka@gmail.com> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
This PR moves
src/arrow/python
directory intopyarrow
and arranges PyArrow to build it. The build on the Python side is made in two steps:_run_cmake_pyarrow_cpp()
where the C++ part of the pyarrow is build first (the part that was moved in the refactoring)_run_cmake()
where pyarrow is built as beforeNo changes are needed in the build process from the user side to successfully build pyarrow after this refactoring. The test for PyArrow CPP will however be moved into Cython and can currently be run with:
>>> pushd python/build/dist/temp >>> ctest