[CI] Reduce required dependencies on macOS builds #33087

Closed
asfimport opened this issue Sep 28, 2022 · 16 comments

@asfimport

asfimport commented Sep 28, 2022

Our macOS CI builds on GitHub Actions usually spend at least 10 minutes installing dependencies from Homebrew (because of compiling from source?). It would be nice to cache those, especially as they probably don't change often.

Reporter: Antoine Pitrou / @pitrou
Assignee: Jin Shang / @js8544

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-17872. Please see the migration documentation for further details.

@asfimport

Antoine Pitrou / @pitrou:
@assignUser Do you think that's reasonably doable?

@asfimport

Antoine Pitrou / @pitrou:
Here is an example job which timed out due to an overlong dependency step:
https://github.com/pitrou/arrow/actions/runs/3141950727/jobs/5104979517

@asfimport

Jacob Wujciak / @assignUser:
I have set up a test job with debug output to see what exactly is taking so long: https://github.com/assignUser/test-repo-a/actions/runs/3142905685/jobs/5107502078#step:4:392

If you turn on timestamps you can see that the time goes into extracting the archives (e.g. llvm, ~1.5G), not downloading them, so caching the directory reported by brew --cache would not save significant time. As the cache is also tar'd, extracting the cache might become the new bottleneck...
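To make the "extraction, not download" observation measurable outside of CI logs, here is a minimal sketch that times the extraction of a gzipped tarball with Python's standard `tarfile` module. It is not how Homebrew pours bottles (the thread suggests Homebrew shells out to the system tar); the helper names `make_test_archive` and `time_extraction` and the tiny synthetic archive are assumptions made purely for illustration.

```python
import io
import tarfile
import tempfile
import time
from pathlib import Path


def make_test_archive(path: Path, n_files: int = 50, size: int = 64 * 1024) -> int:
    """Write a small gzipped tarball and return its uncompressed payload size in bytes."""
    payload = bytes(range(256)) * (size // 256)
    with tarfile.open(path, "w:gz") as tar:
        for i in range(n_files):
            info = tarfile.TarInfo(name=f"file_{i}.bin")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return n_files * len(payload)


def time_extraction(archive: Path, dest: Path) -> float:
    """Return the wall-clock seconds spent extracting `archive` into `dest`."""
    start = time.perf_counter()
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "bottle.tar.gz"  # stand-in for a real bottle
    dest = Path(tmp) / "out"
    dest.mkdir()
    total_bytes = make_test_archive(archive)
    seconds = time_extraction(archive, dest)
    extracted = sorted(p.name for p in dest.iterdir())
    print(f"extracted {total_bytes / 1e6:.1f} MB in {seconds:.3f} s")
```

Pointing `time_extraction` at a real downloaded bottle instead of the synthetic archive would give a number comparable to the timestamps in the job log.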

@asfimport

Jacob Wujciak / @assignUser:
It looks like Homebrew is using the system tar to extract the gzipped bottles; maybe we can speed it up by symlinking in pigz to make use of the 3 cores the Mac runners have...

@asfimport

Antoine Pitrou / @pitrou:
Ouch, LLVM can be heavy but 1.5GB sounds really outlandish.
(for comparison, the combined unpacked size for the conda-forge packages libllvm, llvm-tools and llvmdev is 500MB)

@asfimport

Antoine Pitrou / @pitrou:
(and even 10 minutes for extracting 1.5GB seems quite unexpected: that's only 2.5 MB/s... so it's not a gzip problem but probably an IO/memory issue)
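The 2.5 MB/s figure follows directly from the two numbers quoted in the thread. A small sketch of the arithmetic, assuming 1 GB = 1000 MB (the function name is made up for illustration):

```python
def throughput_mb_per_s(size_mb: float, minutes: float) -> float:
    """Average throughput in MB/s for `size_mb` megabytes handled in `minutes`."""
    return size_mb / (minutes * 60)


# 1.5 GB (taken as 1500 MB) extracted in 10 minutes
rate = throughput_mb_per_s(1500, 10)
print(f"{rate:.1f} MB/s")  # prints "2.5 MB/s"
```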

@asfimport

Jacob Wujciak / @assignUser:
And we have both LLVM 12 and 15, each of similar size (do we need both?); the AWS SDK is 800M...

@asfimport

Jacob Wujciak / @assignUser:
relevant homebrew issue: Homebrew/brew#13621

@asfimport

Antoine Pitrou / @pitrou:
We may perhaps want to disable some Arrow components on those macOS builds, unless there's another package manager that we can use?

@kou Do you know why we decided to use Homebrew for dependencies on macOS?

@asfimport

Jacob Wujciak / @assignUser:

> 10 minutes for extracting 1.5GB seems quite unexpected

I have checked in detail and each of the bigger dependencies (aws, llvm, boost) takes 2-3 minutes to "pour", so OK speeds I would say. Overall it adds up to a lot, but still nothing I see the cache really speeding up.

The timeout is set to 60 minutes, so we could just raise that limit if it no longer fits the current build complexity (or, as you said, remove features). The build should already be using all 3 available cores.

@asfimport

Antoine Pitrou / @pitrou:
A build that takes 60 minutes or more is horrible for developer experience. So I would suggest disabling Gandiva and S3 support on all our PR-based macOS builds (and updating the brew files to remove/disable the corresponding third-party deps).

Do you want to take this @assignUser?

@asfimport

Kouhei Sutou / @kou:

> Do you know why we decided to use Homebrew for dependencies on macOS?

Because Homebrew is one of the major package managers used by macOS users. For CI, we should use an environment similar to the one our users have, so that we find bugs before we release.

Anyway, I'm OK with disabling some features for PRs.

@asfimport

Jin Shang / @js8544:
FYI: the Mac runner we use for GitHub CI (macos-latest) has LLVM@14 and LLVM@13 preinstalled (https://github.com/actions/runner-images/blob/main/images/macos/macos-11-Readme.md). If we can use these two versions for both Gandiva and clang-format, then the build costs for LLVM can be saved entirely. cc @pitrou @kou

EDIT: llvm@13 seems to only include the clang compiler. So to save build time we have to use llvm@14 for Gandiva and clang-format.

@asfimport

Antoine Pitrou / @pitrou:
clang-format is only used for linting, so that shouldn't be an issue.

@asfimport

Kouhei Sutou / @kou:
Thanks for the info!

Let's try one of the preinstalled LLVMs.

If we upgrade our clang-format version to 14 from 12, we can use cpp/Brewfile without changes on CI:


diff --git a/.env b/.env
index 761642506f..e124264335 100644
--- a/.env
+++ b/.env
@@ -53,7 +53,7 @@ FEDORA=35
 UBUNTU=20.04
 
 # Default versions for various dependencies
-CLANG_TOOLS=12
+CLANG_TOOLS=14
 CUDA=11.0.3
 DASK=latest
 DOTNET=6.0
diff --git a/cpp/Brewfile b/cpp/Brewfile
index 61fb619dc6..c6afd5dc76 100644
--- a/cpp/Brewfile
+++ b/cpp/Brewfile
@@ -28,8 +28,9 @@ brew "git"
 brew "glog"
 brew "googletest"
 brew "grpc"
-brew "llvm"
-brew "llvm@12"
+# GitHub Actions' macos-11 runner includes llvm@14.
+# We can use it to save CI time.
+brew "llvm@14"
 brew "lz4"
 brew "ninja"
 brew "numpy"

If we don't want to change our required clang-format version, we need to change cpp/Brewfile in each CI job.
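For that second option, one way a CI job could adjust cpp/Brewfile on the fly is to drop the unwanted formula lines before the Brewfile is installed. The following is only a sketch under that assumption: `filter_brewfile` is a hypothetical helper, not something in the Arrow repository, and the sample lines are taken from the diff above.

```python
import re

# Matches lines such as: brew "llvm@12"
BREW_LINE = re.compile(r'^\s*brew\s+"([^"]+)"')


def filter_brewfile(text: str, skip: set[str]) -> str:
    """Return Brewfile text without the `brew "<name>"` lines whose name is in `skip`."""
    kept = []
    for line in text.splitlines():
        match = BREW_LINE.match(line)
        if match and match.group(1) in skip:
            continue  # drop formulas the job does not want to install
        kept.append(line)
    return "\n".join(kept) + "\n"


brewfile = '''brew "grpc"
brew "llvm"
brew "llvm@12"
brew "lz4"
'''

filtered = filter_brewfile(brewfile, skip={"llvm", "llvm@12"})
print(filtered, end="")
```

Comments and any other non-matching lines pass through untouched, so the filtered text stays a valid Brewfile.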

@asfimport

Kouhei Sutou / @kou:
Issue resolved by pull request 14310
#14310
