Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cmake): Use FetchContent and ExternalProject for local deps #1744

Merged

Conversation

fdintino
Copy link
Contributor

This replaces the ad-hoc scripts in the ext folder with modules that use FetchContent and ExternalProject to fetch and build the dependencies as part of the overall project.

This is a fairly radical change, so I would not take offense if it is rejected out of hand. That said, I find the developer experience for this approach very appealing. I think #1683 and #1708 were maybe trying to solve the wrong problem: static linking is hard, and trying to solve for it generally (by, for instance, trying to account for all the variations of system static libraries) is maybe not worth the time and complexity required. Better to make the build process easier and more robust for library consumers.

There is a lot of duplicated code in these local build modules that could be refactored into functions and macros, but I held off on doing any of that work—in part because I've already invested a lot of time in an approach that might not get any traction, but also because I think the mechanics are clearer without additional abstraction.

I tested out this approach with my python library here. A lot of workarounds that would have to be implemented separately (supporting arm64 cross-builds, for instance, or incorporating @wantehchang's aom alpine patch) can instead be shared and maintained with libavif itself.

I feel pretty confident in the code here and have done a fair bit of testing, but I'm marking this as draft while I await general feedback. At the very least the build documentation would need to be updated before this was ready to merge.

@@ -53,9 +53,9 @@ TEST_P(AvmTest, EncodeDecode) {
decoder->codecChoice = av1_codec;
// AVIF_RESULT_NO_CODEC_AVAILABLE is expected because av1_codec is not
// enabled or because we are trying to decode an AV2 file with an AV1 codec.
ASSERT_EQ(avifDecoderReadMemory(decoder.get(), decoded.get(), encoded.data,
ASSERT_NE(avifDecoderReadMemory(decoder.get(), decoded.get(), encoded.data,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are broken in the main branch, but because of a typo in the github actions workflow it was never being executed. dav1d in particular was returning AVIF_RESULT_DECODE_COLOR_FAILED, not AVIF_RESULT_NO_CODEC_AVAILABLE.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should also be its own PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now in #1748

@@ -110,6 +100,8 @@ jobs:
-DAVIF_BUILD_EXAMPLES=ON -DAVIF_BUILD_APPS=ON
-DAVIF_BUILD_TESTS=ON -DAVIF_ENABLE_GTEST=ON -DAVIF_LOCAL_GTEST=ON
-DAVIF_BUILD_GDK_PIXBUF=ON -DCMAKE_INSTALL_PREFIX=./install
-DENABLE_AVX2=OFF
Copy link
Contributor Author

@fdintino fdintino Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small percentage of macOS github runners don't have avx2 support but advertise that they do, causing random "Illegal instruction: 4" errors in AOM tests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should probably be its own PR right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for diagnosing this. These random errors were a mystery to me and a real pain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't come up with a way to conditionally set -DENABLE_AVX2 for macOS in aom.cmd in a way that's also compatible with sh and cmd.exe...short of creating a separate aom.sh file. So this might need to wait until aom is built as a subdirectory.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to factor this out into a separate small PR? These "ILLEGAL" errors are really annoying. I tried creating a PR that just adds these two lines to the yml files but it seems that ENABLE_AVX2 is not recognized by cmake. I guess it's tied to other changes in this PR?

And a few more questions on this topic:

  • This sounds like an issue with github runners that we're working around. Do you know if there's an issue open somewhere?
  • Why is the CMAKE_OSX_DEPLOYMENT_TARGET flag needed?
  • In practice, I'm fairly sure I've only ever seen the ci-unix-static one fail. I doubt it's just a fluke. Maybe we don't need the changes in the other two workflows?

Thanks!

Copy link
Contributor Author

@fdintino fdintino Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ENABLE_AVX2 flag is actually in AOM (see here). In this pull request, I am adding libaom with add_subdirectory to incorporate it into the larger cmake project, which allows one to pass -DENABLE_AVX2=OFF to libavif's cmake build config. Whereas on the libavif main branch it gets built in a separate step by running aom.cmd. I can't think of a way to pass arguments to that script that would work for windows and unix, and we presumably don't want to disable avx2 for everyone just to get around this issue.

As for your observation: I have seen this error in all of the workflows that use macos, but it does seem to be much more frequent for ci-unix-static. And there is a pattern I've noticed: the builds seem to fail for macOS 12.7 but not for 12.6.9 (you can see the version by expanding the "Operating system" output in the "Set up job" step). If you use tmate to open a shell and run sysctl machdep.cpu, you'll see that a 12.6.9 runner always has AVX2 and a 12.7 runner does not. Anecdotally it seems like the ci-unix-static gets assigned the 12.7 worker more often. I can only speculate as to why that is.

I found an issue opened a year ago on the subject of AVX2 in macos runners: actions/runner-images#6845. The github employee who closed that issue references a comment on a separate issue actions/runner-images#5609 (comment), which reads:

Unfortunately, at this time out agents are not guaranteed to have AVX2 CPU instruction, and only the way to enable the feature for you on permanent basis is to use self-hosted agents(where it is possible to control the underlying hardware).

I don't know if it's worth raising a new issue, since it seems like the 12.6.9 runners have AVX2 pretty reliably, and the 12.7 runners represent a regression of sorts.

FWIW, AOM actually does cpu feature detection before it uses its own avx2 optimizations, but when building it will add the -mavx2 compile flag so long as the compiler supports it, and that enables avx2 optimizations in glibc. The failures in these tests all seem to be caused by a call to the round function, which has a vpbroadcastq instruction in the SIMD-optimized version.

Copy link
Contributor Author

@fdintino fdintino Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CMAKE_OSX_DEPLOYMENT_TARGET is a red herring. Before I figured out that the hardware was to blame, I had hoped that targeting a specific OS X version might resolve this apparent discrepancy between 12.6.9 and 12.7.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the extra context, I understand the issue much better now.
Can we maybe use a sed command to add  -DENABLE_AVX2=OFF to aom.cmd in the CI workflows, similar to what we used to do for shared libs here? 12e0666#diff-cf7515a55275bc6077268c78cd03a0f56422f2e21e35040f907cb7965aed4f14L45
It's not pretty but I'll take any fix at this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I think that ought to work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make sure to do it before the actions/cache call

Copy link
Collaborator

@vrabaud vrabaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so here are a few comments:

  • first, thank you, this is very welcome (the ext folder is definitely not CMake friendly and has grown too much). Using FetchContent will definitely clean our CMake, make it more future proof, more compatible with static libraries and cross compilation
  • there are unrelated fixes in this PR, you should make other PRs to get those fixes in first
  • the Local*.cmake files could be made more user friendly: have comments that explain what you should change if you want to tweak those (e.g. have libaom with specific flags), set a version number variable on top so that it is easy to update. I believe some variables are useless define (e.g. *_INCLUDE_DIRS). Maybe we should focus on having proper targets that we can then use in the libavif CMake.
  • I believe we should first have PR with Local*.cmake for dependencies we do not want to tweak: definitely gtest, libxml2, zlibpng, jpeg, maybe libsharpyuv, libyuv, rav1e, dav1d. We can then focus on the ones we usually tweak as developers (libaom, gav1)

@@ -110,6 +100,8 @@ jobs:
-DAVIF_BUILD_EXAMPLES=ON -DAVIF_BUILD_APPS=ON
-DAVIF_BUILD_TESTS=ON -DAVIF_ENABLE_GTEST=ON -DAVIF_LOCAL_GTEST=ON
-DAVIF_BUILD_GDK_PIXBUF=ON -DCMAKE_INSTALL_PREFIX=./install
-DENABLE_AVX2=OFF
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should probably be its own PR right?

.github/workflows/ci-disable-gtest.yml Show resolved Hide resolved
@@ -5,13 +5,19 @@ plugins {
android {
namespace 'org.aomedia.avif.android'
compileSdk 30
ndkVersion "25.2.9519653"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should probably be its own PR

@@ -53,9 +53,9 @@ TEST_P(AvmTest, EncodeDecode) {
decoder->codecChoice = av1_codec;
// AVIF_RESULT_NO_CODEC_AVAILABLE is expected because av1_codec is not
// enabled or because we are trying to decode an AV2 file with an AV1 codec.
ASSERT_EQ(avifDecoderReadMemory(decoder.get(), decoded.get(), encoded.data,
ASSERT_NE(avifDecoderReadMemory(decoder.get(), decoded.get(), encoded.data,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should also be its own PR

add_library(GTest::gtest ALIAS gtest)
add_library(GTest::gtest_main ALIAS gtest_main)

set(GTEST_INCLUDE_DIRS ${googletest_SOURCE_DIR}/googletest/include)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All those variables are useless right? Because we use the targets GTest::gtest in the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. I was aiming to provide the same variables as find_package but I agree that it's unnecessary.

cmake/Modules/LocalDav1d.cmake Outdated Show resolved Hide resolved

set_property(TARGET aom PROPERTY AVIF_LOCAL ON)

set(AOM_INCLUDE_DIR "${libaom_SOURCE_DIR}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is also unused right? Because we use the target.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I'm not using any include_dir variables, everything is based on interface include directories. Likewise for transitive library dependencies.

endif()

function(avif_build_local_dav1d)
set(source_dir "${AVIF_SOURCE_DIR}/ext/dav1d")
Copy link
Contributor Author

@fdintino fdintino Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this PR I opted to put the source dirs in the same place as they are now (the source ext directory), but the default behavior is for the source to also be in the build directory. The rules for determining the directories are in the cmake docs at the end of this section. Is there a preference for these directory options?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say the only reason to keep it in ext would be for legacy. ext makes it clearer to:

  • tweak build parameters (but now that will be in the Local*cmake files)
  • patch the repos
    If we modify the source of a FetchContent, do they get reset anyway by CMake? Or is the only way to patch the sources through the CMake file?

Copy link
Contributor Author

@fdintino fdintino Dec 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few ways to prevent cmake from overwriting local changes to source repos. One option is to set FETCHCONTENT_SOURCE_DIR_<PackageName>; it allows you to override the source directory, and FetchContent will use whatever is there instead of fetching it. You could also make DOWNLOAD_COMMAND a no-op if(EXISTS), either by default or based on a user option. If we ended up going that route maybe we would only put SOURCE_DIR in ext if that flag is present.

From what I can tell, it's strongly recommended to keep fetchcontent sources outside of the source tree and to treat them as build artifacts, and to only make exceptions when the user has the intention of modifying sources.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so as long as it is documented, I think we do not care. My preference would be to check if there is anything in ext: if so, take it, otherwise FetchContent it in build.
A big use case we have is to find bugs in libaom/dav1d: we might need to go down those repos and modify them and/or use a specific tag.

@fdintino fdintino force-pushed the feat/cmake-fetchcontent-externalproject branch 2 times, most recently from 4656174 to a802fa3 Compare November 10, 2023 21:17
@fdintino fdintino force-pushed the feat/cmake-fetchcontent-externalproject branch from a802fa3 to dadea96 Compare November 15, 2023 02:45
maryla-uc added a commit to maryla-uc/libavif that referenced this pull request Nov 15, 2023
Author: Frankie Dintino in AOMediaCodec#1744

"A small percentage of macOS github runners don't have avx2 support but
advertise that they do, causing random "Illegal instruction: 4" errors in AOM tests."
maryla-uc added a commit to maryla-uc/libavif that referenced this pull request Nov 15, 2023
Author: Frankie Dintino in AOMediaCodec#1744

"A small percentage of macOS github runners don't have avx2 support but
advertise that they do, causing random "Illegal instruction: 4" errors in AOM tests."
maryla-uc added a commit to maryla-uc/libavif that referenced this pull request Nov 16, 2023
Author: Frankie Dintino in AOMediaCodec#1744

"A small percentage of macOS github runners don't have avx2 support but
advertise that they do, causing random "Illegal instruction: 4" errors in AOM tests."
@fdintino fdintino force-pushed the feat/cmake-fetchcontent-externalproject branch 10 times, most recently from 2487437 to 8de50ea Compare November 21, 2023 20:38
@fdintino fdintino force-pushed the feat/cmake-fetchcontent-externalproject branch 2 times, most recently from 2cc9df6 to a2cdce3 Compare December 4, 2023 14:23
@vrabaud
Copy link
Collaborator

vrabaud commented Dec 4, 2023

I guess some of the files in ext need to be removed now no? And the main README updated too.

@fdintino
Copy link
Contributor Author

fdintino commented Dec 4, 2023

Yes, I think so. What are your thoughts on my question above about source_dir? #1744 (comment)

@fdintino fdintino force-pushed the feat/cmake-fetchcontent-externalproject branch 3 times, most recently from 3b59738 to 1f93498 Compare December 5, 2023 18:51
@fdintino fdintino force-pushed the feat/cmake-fetchcontent-externalproject branch 3 times, most recently from ac139b2 to fdf2e93 Compare February 23, 2024 05:02
@fdintino
Copy link
Contributor Author

fdintino commented Feb 23, 2024

@vrabaud this PR is ready for another review. Sorry for the long gaps in my responses; I'm out on leave from work with a newborn at home, so my attention is a bit divided.

I've reworked the LocalPackage.cmake modules so that they're compatible with the existing ext scripts: if the library artifact is found in the expected location, the behavior is unchanged from what it is now. I've also set it up so that FetchContent will use the source directory in ext if it exists (overriding any downloads or git clones). I'm agnostic about how best to handle the transition from the ext scripts (if at all); I leave it up to you and the other maintainers to determine what the desired behavior should be. One motivation for keeping compatibility with the existing ext scripts was to avoid having to integrating the CI fuzz build flags into the fetchcontent builds in this PR, and also to put off the work of incorporating changes like #2034 into the rebase. In the case of #2034, I wasn't clear whether we should prefer clang-cl generally on windows for libyuv, or whether the override should only happen in GHA.

@fdintino fdintino force-pushed the feat/cmake-fetchcontent-externalproject branch 3 times, most recently from 554e4be to ec57d7b Compare February 23, 2024 18:56
@fdintino fdintino marked this pull request as ready for review February 24, 2024 22:57
endif()

set(CONFIG_PIC 1 CACHE INTERNAL "")
if(libyuv_FOUND)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that means libyuv needs to be built before that, it should be commented somewhere.

if(NOT EXISTS "${LIB_FILENAME}")
message(FATAL_ERROR "libavif(AVIF_LIBSHARPYUV=LOCAL): ${LIB_FILENAME} is missing, bailing out")
endif()
if(EXISTS "${LIB_FILENAME}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All those checks for a static lib are for backward compatibility right? In case somebody has built those libs locally. We could decide that this is not supported which would result in less code right ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly.

message(CHECK_START "libavif(AVIF_CODEC_RAV1E=LOCAL): fetching and configuring rav1e")
endif()

find_program(CARGO_CINSTALL cargo-cinstall HINTS "$ENV{HOME}/.cargo/bin")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe those 5 lines can be moved to below with if(NOT TARGET cargo-cinstall)

CMakeLists.txt Outdated
@@ -716,8 +733,8 @@ if(AVIF_BUILD_APPS)
)
endif()
if(AVIF_ENABLE_EXPERIMENTAL_GAIN_MAP)
include(LocalLibargparse)
if(TARGET libargparse)
if(AVIF_LOCAL_LIBARGPARSE)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not do include(LocalLibargparse) and assume AVIF_LOCAL_LIBARGPARSE is ON by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I suppose it's implied by AVIF_ENABLE_EXPERIMENTAL_GAIN_MAP.

Copy link
Collaborator

@vrabaud vrabaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly ok, please rebase. And congrats on the new kid !!!

- name: Setup cmake
uses: jwlawson/actions-setup-cmake@d06b37b47cfd043ec794ffa3e40e0b6b5858a7ec # v1.14.2
with:
# CMake version 3.17 is required to build libwebp (which libsharpyuv is part of) on macOS.
cmake-version: '3.17.x'
# CMake version 3.18 is required to build libxml2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to bump here, libxml2 is not built in this CI (it is not set as a CMake argument)

@fdintino
Copy link
Contributor Author

fdintino commented Mar 7, 2024

The CI Windows build is failing, but I don't think that was introduced by my changes. Judging by this output from a recently run build on the main branch, it looks like the avifgainmaputil test was just being skipped previously:

CMake Warning at cmake/Modules/LocalLibargparse.cmake:9 (message):
  D:/a/libavif/libavif/ext/libargparse/build/argparse.lib is missing, not
  building avifgainmaputil, please run ext/libargparse.cmd

@fdintino
Copy link
Contributor Author

@vrabaud Any updates on this? I see there are some version bumps on the main branch that need to be incorporated here, but I've held off on including that in a rebase while awaiting further feedback. I'm happy to push that if it's helpful for review.

@vrabaud
Copy link
Collaborator

vrabaud commented Mar 28, 2024

@fdintino , sorry for the late review. I believe it is good as is. We decided to keep the legacy way of building things through ext scripts for now. We can remove that code later. Please sync the d'EPS to their latest version. Thx again for this tremendous work !

@fdintino fdintino force-pushed the feat/cmake-fetchcontent-externalproject branch from 39a3c5c to a8b946d Compare April 9, 2024 14:27
@fdintino fdintino force-pushed the feat/cmake-fetchcontent-externalproject branch from a8b946d to 500bfd8 Compare April 9, 2024 17:16
@vrabaud vrabaud merged commit f1b936d into AOMediaCodec:main Apr 11, 2024
25 checks passed
@vrabaud
Copy link
Collaborator

vrabaud commented Apr 11, 2024

Thx again ! Let's see how it goes with the core developers. We will also revisit all the static lib open bugs. I guess you can also close the other CMake related PR.

@y-guyon
Copy link
Collaborator

y-guyon commented Apr 12, 2024

It seems the CI workflows have been slow to complete since this PR have been merged.
I believe it is due to the "Build libavif (ninja)" step that runs rustc now.

Example before/after this PR:
https://github.com/AOMediaCodec/libavif/actions/runs/8619643709/job/23624784607 took 47 seconds.
https://github.com/AOMediaCodec/libavif/actions/runs/8661344586/job/23751067732 took 6 minutes.

@vrabaud
Copy link
Collaborator

vrabaud commented Apr 12, 2024

I am enabling cache in build/_deps in #2103, let's see how that goes.

@y-guyon
Copy link
Collaborator

y-guyon commented Apr 12, 2024

I also have weird issues on the CI in my fork:

https://github.com/y-guyon/libavif/actions/runs/8662720594/job/23755243272#step:15:47

Running `rustc --crate-name crossbeam_utils ...`
error: error parsing manifest: TOML parse error at line 18908, column 26
      |
18908 | [[pkg.rust.target.x86_64-
      |                          ^
invalid table header
expected `.`, `]]`

error: could not compile `crossbeam-utils` (lib)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants