Skip to content

Conversation

@nealrichardson
Copy link
Member

@nealrichardson nealrichardson commented Mar 26, 2020

This patch switches our bundled boost ep to use a slimmer version of boost, which was built with the script added at cpp/build_support/trim-boost.sh. It uses the official boostorg big tarball as a fallback URL if for some reason ours is unavailable (as we've seen with other ep's, sometimes the download hosts are subject to rate limiting or other downtime, so having redundancy would more generally be good).

The resulting tarball is 10mb, much less than 113mb of the full boost but larger than the 800k I suggested in the ticket. This is because in order to build regex and filesystem, we need config build boost_install headers, which add some weight. Boost.Build also seems to require log, and predef is needed by log apparently.

In addition to slimming the boost tarball, this patch also refines the cmake logic that determines when boost is required. Due to recent-ish efforts to reduce usage of boost, we only need boost for tests and Gandiva, and for Parquet but only on gcc < 4.9. When building thrift_ep, we also need boost. By narrowing the condition involving Parquet to just gcc < 4.9, we are able to remove boost entirely from the R macOS and Windows packages.

Outstanding questions/to-dos that I'm aware of:

  • Put the boost bundle somewhere official. I plan to put it at https://dl.bintray.com/ursalabs/arrow-boost/ but am open to suggestion if someone has a better idea.
  • Add a crossbow job to build the boost bundle: to update the boost version or change what's included in the bundle, edit cpp/build_support/trim-boost.sh and run an on-demand build via PR comment. On further reflection, crossbow isn't appropriate because it would mean that anyone could edit the script and overwrite the bundle that gets pulled into all source builds. Instead, I added a script that one can run locally, just requiring a bintray user and token in the ursalabs organization (available to any committer on request).
  • Something with namespacing? I'm not sure how much of a concern this is since Arrow itself doesn't really require boost anymore.

@github-actions
Copy link

@nealrichardson
Copy link
Member Author

Hmm, thrift appears to need more of boost. Will see if I can figure out what it needs and add that.

@nealrichardson
Copy link
Member Author

Looks like Gandiva may require a little more, and grepping through the files, some tests import other boost headers (which we may by chance already have included, IDK). I'll explore some more tomorrow (unless someone comments here why this is a bad idea); fallback could be to change which URL to download boost from depending on which cmake args are set (i.e. if gandiva is being built, and/or if tests are being built, get the full boost).

@nealrichardson
Copy link
Member Author

@github-actions crossbow submit macos-r-autobrew

@github-actions
Copy link

Revision: ad657cc6b6c50e525d6f31186096b3a389a4d24a

Submitted crossbow builds: ursa-labs/crossbow @ actions-49

Task Status
macos-r-autobrew TravisCI

@nealrichardson
Copy link
Member Author

After poking at this some more, I haven't been able to pare back the heavier components at all. Whether for regex or locale, we need boost.build to build them, and that seems to require the bigger pieces, including log. Even so, we've reduced the size of the tarball and the expanded package by an order of magnitude, so it should be a noticeable improvement.

I'll work on the crossbow job for updating this bundle and then consider this done, pending review/approval. Will also want to run the all the nightlies against this.

@nealrichardson
Copy link
Member Author

I'm going to need some help again with cmake-format (when I run it locally it rewrites the entire file) but otherwise I think this is about ready to go, pending final review/approval.

@nealrichardson
Copy link
Member Author

@github-actions crossbow submit -g r

@github-actions
Copy link

Revision: 4869bf999412e91e1b77289c08f4f9ac219d3d2a

Submitted crossbow builds: ursa-labs/crossbow @ actions-51

Task Status
macos-r-autobrew TravisCI
test-conda-r-3.6 CircleCI
test-r-linux-as-cran Github Actions
test-r-rhub-ubuntu-gcc-release Azure
test-r-rocker-r-base-latest Azure
test-r-rstudio-r-base-3.6-bionic Azure
test-r-rstudio-r-base-3.6-centos6 Azure
test-r-rstudio-r-base-3.6-opensuse15 Azure
test-r-rstudio-r-base-3.6-opensuse42 Azure
test-ubuntu-18.04-r-sanitizer CircleCI

@kou
Copy link
Member

kou commented Mar 29, 2020

$ python3 -m pip install -r dev/archery/requirements-lint.txt
$ python3 run-cmake-format.py

will work.

@nealrichardson
Copy link
Member Author

I don't understand the Ruby failure, or at least don't see how it's related: https://github.com/apache/arrow/pull/6734/checks?check_run_id=543791775#step:6:1181

@kou
Copy link
Member

kou commented Mar 29, 2020

Oh, Meson 0.54.0 released today and it requires Ninja 1.7 or later https://mesonbuild.com/Release-notes-for-0-54-0.html#ninja-version-requirement-bumped-to-17 but Ubuntu 16.04 ships Ninja 1.5.1.
I'll take over this.

@kou
Copy link
Member

kou commented Mar 29, 2020

#6756

@nealrichardson
Copy link
Member Author

@github-actions crossbow submit -g r

@github-actions
Copy link

Revision: dec6be8

Submitted crossbow builds: ursa-labs/crossbow @ actions-52

Task Status
macos-r-autobrew TravisCI
test-conda-r-3.6 CircleCI
test-r-linux-as-cran Github Actions
test-r-rhub-ubuntu-gcc-release Azure
test-r-rocker-r-base-latest Azure
test-r-rstudio-r-base-3.6-bionic Azure
test-r-rstudio-r-base-3.6-centos6 Azure
test-r-rstudio-r-base-3.6-opensuse15 Azure
test-r-rstudio-r-base-3.6-opensuse42 Azure
test-ubuntu-18.04-r-sanitizer CircleCI

nealrichardson and others added 4 commits March 29, 2020 14:35
Because Meson 0.54.0 requires Ninja 1.7 but Ubuntu 16.04 provides
Ninja 1.5.1 by default.
@nealrichardson
Copy link
Member Author

@github-actions crossbow submit -g r

@github-actions
Copy link

Revision: 3277c77

Submitted crossbow builds: ursa-labs/crossbow @ actions-53

Task Status
macos-r-autobrew TravisCI
test-conda-r-3.6 CircleCI
test-r-linux-as-cran Github Actions
test-r-rhub-ubuntu-gcc-release Azure
test-r-rocker-r-base-latest Azure
test-r-rstudio-r-base-3.6-bionic Azure
test-r-rstudio-r-base-3.6-centos6 Azure
test-r-rstudio-r-base-3.6-opensuse15 Azure
test-r-rstudio-r-base-3.6-opensuse42 Azure
test-ubuntu-18.04-r-sanitizer CircleCI

@nealrichardson
Copy link
Member Author

Okay, now this is done IMO, for real this time. Review appreciated. Note that this will also fix the widespread failures on master due to whatever is up with bintray.com/boostorg.

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Looks good here. Removing a bunch of Boost a good idea

# to upload is at ${BOOST_FILE}/${BOOST_FILE}.tar.gz
#
# Also, you must have a bintray account on the "ursalabs" organization and
# set the BINTRAY_USER and BINTRAY_APIKEY env vars.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about hosting these tarballs as downloads from tagged GitHub releases (based on date or something)? Can do after this. Seems GitHub will be more reliable than Bintray

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with whatever works; it was just quickest for me to use bintray because it's familiar. If someone wants to modify the script to put a tarball somewhere else, that's cool.

They're not mutually exclusive, even: we can have multiple URLs as backups in the external project definition. So if you're worried about reliability, we can add redundancy.

@wesm
Copy link
Member

wesm commented Mar 30, 2020

Merging this. Thanks @nealrichardson for leading the charge on this

@wesm wesm closed this in 9621719 Mar 30, 2020
"https://dl.bintray.com/ursalabs/arrow-boost/boost_${ARROW_BOOST_BUILD_VERSION_UNDERSCORES}.tar.gz"
)
if(NOT CMAKE_VERSION VERSION_LESS 3.7)
# Append as a backup URL the full source from boostorg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants