Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmake: add submodule for Apache Arrow at v6.0.1 #44696

Merged
merged 8 commits into from Mar 23, 2022

Conversation

cbodley
Copy link
Contributor

@cbodley cbodley commented Jan 20, 2022

adds an arrow submodule. when WITH_RADOSGW_SELECT_PARQUET is enabled, the submodule is built as an external project and rgw links against its imported Arrow::Parquet target

TODO:

  • enable ARROW_WITH_SNAPPY, and maybe other compression libs
  • figure out how to expose the include directories (though a local build somehow worked without?)
  • add WITH_SYSTEM_ARROW option and use find_package() for arrow and parquet
  • add utf8proc submodule for centos builds
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@cbodley
Copy link
Contributor Author

cbodley commented Jan 20, 2022

jenkins test make check


# only build static library
list(APPEND arrow_CMAKE_ARGS -DARROW_BUILD_SHARED=OFF)
list(APPEND arrow_CMAKE_ARGS -DARROW_BUILD_STATIC=ON)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the static linkage?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, more broadly, what is built statically, with this option? does it leave the arrow and parquet targets as static libs? if so, yeah, I am skeptical.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i just wanted to avoid packaging those shared libraries with radosgw. it seems like that would lead to conflicts if/when distros provide them. is shared linkage worth this risk?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, more broadly, what is built statically, with this option? does it leave the arrow and parquet targets as static libs? if so, yeah, I am skeptical.

the intent was to build arrow and parquet statically, but leave their dependencies as shared. we'll just need to make sure that arrow's cmake finds the same libraries ceph is using, which might be tricky if ceph is building them from submodules

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'd like to optimize later, but do whatever is cleanest for now

@cbodley cbodley force-pushed the wip-arrow-submodule-ext branch 4 times, most recently from 75e48f6 to 1204d05 Compare January 21, 2022 14:24
@cbodley
Copy link
Contributor Author

cbodley commented Jan 21, 2022

i bumped arrow to v6.0.1, because 4.0.1 doesn't build on recent compilers. once i add a WITH_SYSTEM_ARROW, that should still be able to link against Kaleb's centos package for arrow-4.0.1

the jenkins builds (ubuntu) are working pretty well now

i'm still working through some shaman issues, one due to our builders setting CMAKE_BUILD_TYPE=None, and another when arrow looks for our system boost

i haven't figured out whether or not libre2 and libutf8proc are runtime dependencies yet. centos doesn't have a utf8proc, so the builds will fail there until we add Kaleb's repo everywhere

@cbodley
Copy link
Contributor Author

cbodley commented Jan 26, 2022

it looks like i've made it past the boost issues for ubuntu, but now the arrow submodule is failing to find headers from its own xsimd submodule:

[ 12%] Building CXX object src/arrow/CMakeFiles/arrow_objlib.dir/array/validate.cc.o
cd /build/ceph-17.0.0-10335-g20f97bb9/obj-x86_64-linux-gnu/src/arrow/cpp/src/arrow && /usr/bin/c++  -DARROW_EXPORTING -DARROW_HAVE_RUNTIME_AVX2 -DARROW_HAVE_RUNTIME_AVX512 -DARROW_HAVE_RUNTIME_BMI2 -DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_HAVE_SSE4_2 -DARROW_WITH_BACKTRACE -DARROW_WITH_LZ4 -DARROW_WITH_RE2 -DARROW_WITH_SNAPPY -DARROW_WITH_TIMING_TESTS -DARROW_WITH_UTF8PROC -DARROW_WITH_ZLIB -DURI_STATIC_BUILD -I/build/ceph-17.0.0-10335-g20f97bb9/obj-x86_64-linux-gnu/src/arrow/cpp/src -I/build/ceph-17.0.0-10335-g20f97bb9/src/arrow/cpp/src -I/build/ceph-17.0.0-10335-g20f97bb9/src/arrow/cpp/src/generated -isystem /build/ceph-17.0.0-10335-g20f97bb9/src/arrow/cpp/thirdparty/flatbuffers/include -isystem /build/ceph-17.0.0-10335-g20f97bb9/obj-x86_64-linux-gnu/boost/include -isystem /build/ceph-17.0.0-10335-g20f97bb9/obj-x86_64-linux-gnu/src/arrow/cpp/xsimd_ep/src/xsimd_ep-install/include -isystem /build/ceph-17.0.0-10335-g20f97bb9/src/arrow/cpp/thirdparty/hadoop/include  -Wno-noexcept-type -g -O2 -fdebug-prefix-map=/build/ceph-17.0.0-10335-g20f97bb9=. -fstack-protector-strong -Wformat -Werror=format-security -fdiagnostics-color=always -O3 -DNDEBUG  -Wall -fno-semantic-interposition -msse4.2  -O3 -DNDEBUG -fPIC   -std=c++11 -o CMakeFiles/arrow_objlib.dir/array/validate.cc.o -c /build/ceph-17.0.0-10335-g20f97bb9/src/arrow/cpp/src/arrow/array/validate.cc
In file included from /build/ceph-17.0.0-10335-g20f97bb9/src/arrow/cpp/src/arrow/array/validate.cc:32:
/build/ceph-17.0.0-10335-g20f97bb9/src/arrow/cpp/src/arrow/util/utf8.h:27:10: fatal error: xsimd/xsimd.hpp: No such file or directory
   27 | #include <xsimd/xsimd.hpp>
      |          ^~~~~~~~~~~~~~~~~

the command line shows the right include dir for it:
-isystem /build/ceph-17.0.0-10335-g20f97bb9/obj-x86_64-linux-gnu/src/arrow/cpp/xsimd_ep/src/xsimd_ep-install/include
and the install step for the xsimd_ep external project had already run. this works fine on my local builds

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@cbodley cbodley changed the title cmake: add submodule for Apache Arrow at v4.0.1 cmake: add submodule for Apache Arrow at v6.0.1 Jan 27, 2022
@cbodley
Copy link
Contributor Author

cbodley commented Jan 28, 2022

it looks like we can't get utf8proc into EPEL for centos, so i submoduled that too. the centos build in shaman succeeded without requiring any extra repos

@cbodley
Copy link
Contributor Author

cbodley commented Jan 31, 2022

it looks like i've made it past the boost issues for ubuntu, but now the arrow submodule is failing to find headers from its own xsimd submodule:

i have not been able to reproduce this on a ubuntu focal vm with the exact same cmake options

@cbodley
Copy link
Contributor Author

cbodley commented Feb 3, 2022

still trying to debug the ubuntu build. i forked arrow and added a commit in https://github.com/cbodley/arrow/commits/ceph-6.0.1-xsimd-debug that lists everything under this include directory after the xsimd install step

@cbodley
Copy link
Contributor Author

cbodley commented Feb 3, 2022

wow, okay. so when arrow's cmake builds the xsimd library as a submodule, it passes a -DCMAKE_INSTALL_PREFIX of:

/build/ceph-17.0.0-10512-g353e9e1e/obj-x86_64-linux-gnu/src/arrow/cpp/xsimd_ep/src/xsimd_ep-install

however, xsimd's install step installs the header to:

/build/ceph-17.0.0-10512-g353e9e1e/debian/tmp/build/ceph-17.0.0-10512-g353e9e1e/obj-x86_64-linux-gnu/src/arrow/cpp/xsimd_ep/src/xsimd_ep-install/include/xsimd/xsimd.hpp

this path has an extra prefix of /build/ceph-17.0.0-10512-g353e9e1e/debian/tmp, which is coming from ceph's debian/rules:

export DESTDIR=$(CURDIR)/debian/tmp

so we need a way to hide that environment variable from arrow's build

@cbodley cbodley force-pushed the wip-arrow-submodule-ext branch 3 times, most recently from 7f4a8b6 to c73e1fd Compare February 4, 2022 04:13
@cbodley
Copy link
Contributor Author

cbodley commented Feb 4, 2022

@galsalomon66 do you know whether we need this flag enabled?

--   PARQUET_REQUIRE_ENCRYPTION=OFF [default=OFF]
--       Build support for encryption. Fail if OpenSSL is not found

@galsalomon66
Copy link
Contributor

galsalomon66 commented Feb 4, 2022

@galsalomon66 do you know whether we need this flag enabled?

--   PARQUET_REQUIRE_ENCRYPTION=OFF [default=OFF]
--       Build support for encryption. Fail if OpenSSL is not found

this flag was OFF (with ORC=OFF)

@cbodley cbodley force-pushed the wip-arrow-submodule-ext branch 2 times, most recently from 02a74bb to a9e805b Compare February 7, 2022 19:17
@kalebskeithley
Copy link
Contributor

i bumped arrow to v6.0.1, because 4.0.1 doesn't build on recent compilers. once i add a WITH_SYSTEM_ARROW, that should still be able to link against Kaleb's centos package for arrow-4.0.1

the centos package is 4.0.1 because that's what was in the apache repo that Gal was using. I will update to 6.0.1 at some point.

@ivancich
Copy link
Member

ivancich commented Feb 9, 2022

I tried building it. Got the following:

CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:218 (message):
  Could NOT find utf8proc (missing: utf8proc_LIB) (found suitable version
  "2.6.1", minimum required is "2.2.0")
Call Stack (most recent call first):
  /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:582 (_FPHSA_FAILURE_MESSAGE)
  cmake/modules/Findutf8proc.cmake:87 (find_package_handle_standard_args)
  src/CMakeLists.txt:866 (find_package)

I did install utf8proc-devel. I also tried setting utf8proc_ROOT to /usr. It ultimately worked when I set utf8proc_LIB to "FILEPATH=/usr/lib/libutf8proc.so"/

@cbodley
Copy link
Contributor Author

cbodley commented Feb 10, 2022

thanks @ivancich, i updated the Findutf8proc.cmake module that i took from arrow, and built successfully on fedora with the default WITH_SYSTEM_UTF8PROC=ON. the jenkins builds still succeed, as do the shaman builds in https://shaman.ceph.com/builds/ceph/wip-arrow-submodule-utf8proc/

adds an arrow submodule. when WITH_RADOSGW_SELECT_PARQUET is enabled,
the submodule is built as an external project and rgw links against its
imported Arrow::Parquet target

Signed-off-by: Casey Bodley <cbodley@redhat.com>
adds utf8proc submodule, needed by the arrow submodule in centos. add a
WITH_SYSTEM_UTF8PROC option that controls whether or not utf8proc is
built from submodule

non-system utf8proc is built as a static library to avoid conflicts with
system-provided libraries

ceph.spec.in sets WITH_SYSTEM_UTF8PROC=OFF until it's available in
centos

Signed-off-by: Casey Bodley <cbodley@redhat.com>
the arrow submodule builds some C sources that trip up on _FORTIFY_SOURCE in debug builds

[ 79%] Building C object src/arrow/CMakeFiles/arrow_objlib.dir/vendored/musl/strptime.c.o
In file included from /usr/include/time.h:25,
                 from /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10531-gc73e1fda/rpm/el8/BUILD/ceph-17.0.0-10531-gc73e1fda/src/arrow/cpp/src/arrow/vendored/strptime.h:20,
                 from /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10531-gc73e1fda/rpm/el8/BUILD/ceph-17.0.0-10531-gc73e1fda/src/arrow/cpp/src/arrow/vendored/musl/strptime.c:4:
/usr/include/features.h:381:4: error: #warning _FORTIFY_SOURCE requires compiling with optimization (-O) [-Werror=cpp]
  381 | #  warning _FORTIFY_SOURCE requires compiling with optimization (-O)
      |    ^~~~~~~
cc1: all warnings being treated as errors
make[5]: *** [src/arrow/CMakeFiles/arrow_objlib.dir/build.make:2543: src/arrow/CMakeFiles/arrow_objlib.dir/vendored/musl/strptime.c.o] Error 1

Signed-off-by: Casey Bodley <cbodley@redhat.com>
relies on a hack to find the installed ParquetConfig.cmake

Signed-off-by: Casey <cbodley@redhat.com>
Signed-off-by: Casey <cbodley@redhat.com>
Signed-off-by: Casey <cbodley@redhat.com>
Signed-off-by: Casey <cbodley@redhat.com>
Signed-off-by: Casey <cbodley@redhat.com>
@cbodley
Copy link
Contributor Author

cbodley commented Mar 22, 2022

jenkins test make check

@cbodley
Copy link
Contributor Author

cbodley commented Mar 22, 2022

@galsalomon66 if you think this is ready for merge and quincy backport, could please approve?

approved

Copy link
Member

@ivancich ivancich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm requesting support for Arrow Flight be added.

CMakeLists.txt Show resolved Hide resolved
cmake/modules/BuildArrow.cmake Show resolved Hide resolved
@cbodley
Copy link
Contributor Author

cbodley commented Mar 23, 2022

hi @ivancich, this PR is a blocker for s3select in quincy so i'd really like to keep Arrow Flight out of scope for now. we're forcing the whole ceph project to build this submodule, so should only enable the features that we actually depend on. i'm happy to work with you to enable it for testing in the meantime

@ivancich
Copy link
Member

hi @ivancich, this PR is a blocker for s3select in quincy so i'd really like to keep Arrow Flight out of scope for now. we're forcing the whole ceph project to build this submodule, so should only enable the features that we actually depend on. i'm happy to work with you to enable it for testing in the meantime

I understand. I'm only asking that it be built conditionally.

@ivancich ivancich self-requested a review March 23, 2022 15:42
Copy link
Member

@ivancich ivancich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking forward to getting Arrow Flight built through this more general Arrow enablement.

@ivancich ivancich dismissed their stale review March 23, 2022 15:46

Further conversation with @cbodley.

@cbodley
Copy link
Contributor Author

cbodley commented Mar 23, 2022

sure, thanks Eric! this PR still needs an approval to merge, cc @galsalomon66

@galsalomon66 galsalomon66 self-assigned this Mar 23, 2022
@ljflores
Copy link
Contributor

@cbodley this PR seems to have introduced a regression for CentOS / RHEL operating systems: https://tracker.ceph.com/issues/55114

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build/ops needs-quincy-backport backport required for quincy rgw
Projects
None yet
6 participants