Vendoring dependencies presents maintenance issues #1991

Open · ChrisThrasher opened this issue Jan 31, 2022 · 20 comments

@ChrisThrasher
Member

SFML currently vendors a number of binaries and headers for external dependencies. See the extlibs directory. Vendoring provides a more convenient developer experience by removing the need to install dependencies, but it comes at the cost of a few problems:

  1. Increased maintenance burden. Rebuilding binaries for new architectures or OSes must be reproducible, documented, and manually performed by someone.
  2. Potential issues with dependency graphs. If someone depends on SFML and has another direct dependency on a different version of one of SFML's dependencies then it can be ambiguous which version of the library to use.
  3. Potential ABI issues with different compilers being used to build SFML versus the vendored binaries.
  4. Binaries cannot be diffed efficiently by Git, causing the repository to bloat as binaries are updated.
  5. External headers add tens of thousands of lines of code to the repo.
  6. Separating 3rd party code from 1st party code is hard when it lives in the same repo. We must be sure tools like clang-format are never applied to vendored code.

SFML uses vendored dependencies for only a select few operating systems. For example, Linux users end up using find_package, CMake's canonical dependency retrieval mechanism. Users on Windows instead get the vendored dependencies. The key discriminator between whether a given OS uses find_package or vendored dependencies is the presence or absence of an official package manager. We want to keep this behavior.


What I propose as a first step is that we remove all binaries from extlibs and replace them with the source code for a given dependency, retrieved by CMake's FetchContent module, but only on OSes that aren't already using find_package. FetchContent will automatically download and build the project and provide the SFML build tree with access to all of the CMake targets from a given dependency.
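A minimal sketch of what this could look like for a single dependency, using FLAC as an example; the SFML_USE_SYSTEM_DEPS switch, repository URL, tag, and layout are illustrative assumptions rather than settled decisions:

include(FetchContent)

if(SFML_USE_SYSTEM_DEPS)
  # Platforms with a package manager keep the existing find_package path.
  find_package(FLAC REQUIRED)
else()
  # Platforms without one download and build the dependency at configure time.
  FetchContent_Declare(FLAC
    GIT_REPOSITORY https://github.com/xiph/flac.git
    GIT_TAG 1.4.2 # pin a tag (or better, a commit hash) rather than a branch
  )
  FetchContent_MakeAvailable(FLAC)
endif()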

This also entails a fully target-based approach for expressing dependencies: no more modifying CMAKE_INCLUDE_PATH or CMAKE_LIBRARY_PATH. No matter how a dependency is retrieved, we must depend on a target from that project, which means that users of a given dependency don't have to know anything about how that dependency is retrieved. Separating the retrieval of dependencies from their usage opens the door for easier future changes to how dependencies are handled and brings with it the other benefits of modern target-based CMake code.
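In practice that means the usage side looks identical regardless of where the dependency came from; a sketch, with the imported target name assumed for illustration:

# Whether FLAC came from find_package or FetchContent, the audio module
# only ever links against the target the dependency exports.
target_link_libraries(sfml-audio PRIVATE FLAC::FLAC)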

I still have more research to do into the existing dependency management scheme and the specifics of SFML's particular dependencies, but I hope this issue helps start the conversation so we can take steps towards improving this situation.

@Bromeon
Member

Bromeon commented Jan 31, 2022

Those are some very good points.
See also: https://en.sfml-dev.org/forums/index.php?topic=25053

FetchContent is nice, but probably we should use it manually once and then check in the sources (that's what you suggest, if I understood correctly). On the other hand, FetchContent at build time would make us rely on several third-party resources being available, or the build breaks. C++ doesn't have a centralized, stable, always-up dependency repository like crates.io, so we would have to trust each provider individually.

find_package on Linux seems to generally work well, but can we always detect incompatible versions? What if there's a slight discrepancy in behavior of a C library that SFML uses and is tested against, vs. one the user has installed?

@ChrisThrasher
Member Author

FetchContent is nice, but probably we should use it manually once and then check in the sources (that's what you suggest, if I understood correctly).

I was referring to FetchContent at configuration time. I want to avoid having 3rd party code in our repo since that presents its own maintenance problems.

On the other hand, FetchContent at build time would make us rely on several third-party resources to be available, or the build is broken.

If every dependency is hosted on GitHub (or GitLab or Bitbucket) then it's safe to assume that someone can build SFML. If a dependency is only available on a server running in someone's basement then I understand the concern but I don't believe that's the case for any of our dependencies. Certainly I want to verify that all dependencies are hosted publicly on reputable websites without downtime issues.

If someone really cares about avoiding this network dependency, we can still let them set a CMake variable on the command line that switches their build over to find_package, so they can install the dependencies locally themselves. For users who can't be bothered to install the dependencies, we provide this convenient fallback.
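Such a switch could be as small as a single cache option; the name below mirrors the SFML_USE_SYSTEM_DEPS variable mentioned later in this thread, but take it as a sketch rather than a final interface:

# Lets users opt out of automatic fetching, e.g. by configuring with
#   cmake -S . -B build -DSFML_USE_SYSTEM_DEPS=ON
option(SFML_USE_SYSTEM_DEPS "Use system-installed dependencies via find_package instead of FetchContent" OFF)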

find_package on Linux seems to generally work well, but can we always detect incompatible versions? What if there's a slight discrepancy in behavior of a C library that SFML uses and is tested against, vs. one the user has installed?

find_package will find a library with the same major version number that is as new as or newer than what is specified. If someone looks for SFML 2.4 but SFML 2.5.1 is installed, CMake will happily let them use the newer version they have installed, but it will fail if only SFML 3 is installed. If a library changes behavior in a breaking way without bumping the major version number, then it has violated the contract of semantic versioning; we have to hope that doesn't happen. The alternative, locking down exact version numbers, is overly restrictive. It makes life harder for package maintainers who might want to bundle SFML alongside a slightly newer version of one of SFML's dependencies, and it gets tricky across the many Linux distros that install slightly different versions of a given dependency.
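As a concrete illustration of that behaviour (assuming the package's version file was generated with SameMajorVersion compatibility, which is what the description above implies):

# Accepts any SFML 2.x that is at least 2.4 (e.g. 2.5.1),
# but rejects an installation that only provides SFML 3.
find_package(SFML 2.4 REQUIRED COMPONENTS graphics window system)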

@Bromeon
Member

Bromeon commented Jan 31, 2022

If every dependency is hosted on GitHub (or GitLab or Bitbucket) then it's safe to assume that someone can build SFML. [...] Certainly I want to verify that all dependencies are hosted publicly on reputable websites without downtime issues.

Downtime is not the only problem. GitHub repos change, are renamed, deleted, transferred, ...

I don't like the thought of people's builds breaking because one of the dependency maintainers decides to change something or makes a mistake. You have probably heard of the left-pad disaster, or, just a few weeks ago, colors and faker. The first case broke every single dependent build; the second still broke everything that used a "use latest version" policy. And those are published on npm, likely the most widespread package manager, which at least has some basic stability, unlike a GitHub repo, which can change entirely through a git push or two clicks.

And those are just some of the issues; security is another one. If any contributor with push access to any of these repos decides to go rogue (or is hacked), even if it only remains undetected for a few hours, an SFML user can end up downloading malware.

If we don't want to version external code in SFML's own repo, we can always create a separate SFML/extlibs repo and let the user pull dependencies from there. That way it remains in our organization, under our control, and dependency updates are a conscious decision.


If a library changes behavior in a breaking way without a new major version number then they've violated the contract of semantic versioning. We have to hope that doesn't happen.

I agree here, but does the C++ ecosystem actually take semver seriously? I haven't had the impression that semver is something that's particularly well-known or taught to library writers.

More of a curiosity question -- as you correctly say, we can't do more than hope for the best.

@ChrisThrasher
Member Author

The first case broke every single dependent build; the second still broke everything that used a "use latest version" policy.

I'm not proposing we use FetchContent to track the latest version. I want to use FetchContent to track a specific tag in that repo, or maybe even a commit hash to be more specific, since tags can move even if it's unlikely they will.

And those are just some of the issues; security is another one. If any contributor with push access to any of these repos decides to go rogue (or is hacked), even if it only remains undetected for a few hours, an SFML user can end up downloading malware.

This is addressed by pulling a specific commit hash instead of dynamically fetching whatever the latest version of the trunk branch is.
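Concretely, pinning could look like this; the repository is one of SFML's real dependencies, but the hash shown is a placeholder for whichever reviewed commit we would actually pick:

FetchContent_Declare(ogg
  GIT_REPOSITORY https://github.com/xiph/ogg.git
  # Pinning a full commit hash means a force-pushed branch or a moved tag
  # upstream cannot silently change what we download and build.
  GIT_TAG 0123456789abcdef0123456789abcdef01234567 # placeholder hash
)
FetchContent_MakeAvailable(ogg)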

I agree here, but does the C++ ecosystem actually take semver seriously? I haven't had the impression that semver is something that's particularly well-known or taught to library writers.

I think it does. Any popular C++ project I've seen or worked with takes semver seriously (or doesn't use semver as is the case with Boost).


@ChrisThrasher
Member Author

I'm going to close this issue for now because I need to figure out more about a viable implementation. I don't want this issue cluttering up the issue tracker if I myself don't really know what changes I want to make.

@Bromeon
Member

Bromeon commented Feb 10, 2022

I think it's legitimate to keep issues open, even if there is no clearly agreed design yet (otherwise it might already be a PR).
This in particular is a problem we likely have to address sooner or later 🙂

@ChrisThrasher
Member Author

Okay then I'll reopen this so we can keep this as our designated spot to discuss dependency management. I won't personally have much to share about this in the short term but hope to take another deep look at this problem in a few weeks.

@ethindp

ethindp commented Mar 11, 2022

Why can't we just use vcpkg? From what I can tell, all the dependencies that SFML has in extlibs are in vcpkg, and that's practically a universal package manager for all OSes and all architectures that SFML supports (and could support in the future).

@ChrisThrasher
Member Author

That's a totally reasonable solution. I don't think anyone on the team or other frequent contributors have much experience with it but I'm sure we'd all love to see what a solution using vcpkg would look like.

@eXpl0it3r
Member

There's no reason to lock in on a specific package manager when one can just use FetchContent in CMake; it also enables intermixing with other package managers.

Additionally, I've seen quite a few horrible hacks in how vcpkg achieves its recipe packaging, and I quite dislike its library "discovery", which can't actually handle system library dependencies...

Overall, I don't want to force SFML users to install vcpkg, just to build SFML.

@ChrisThrasher
Member Author

I support a solution where the CMake code exclusively uses find_package, which has a number of benefits, while we maintain a vcpkg manifest that users can optionally use if they don't want to use their system package manager or don't have one (like Windows users). I don't support a solution that involves any reference to a specific package manager in the CMake code, because I don't want to force users into any particular package manager, or into using one at all.
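A rough sketch of how that separation could look; the imported target names are assumptions, and the toolchain path follows vcpkg's usual convention rather than anything SFML-specific:

# SFML's CMake never mentions vcpkg; it only asks for what it needs.
find_package(FLAC REQUIRED)
find_package(Vorbis REQUIRED)
target_link_libraries(sfml-audio PRIVATE FLAC::FLAC Vorbis::vorbisfile)

# A vcpkg user satisfies these by passing vcpkg's toolchain file, e.g.
#   cmake -S . -B build -DCMAKE_TOOLCHAIN_FILE=<vcpkg-root>/scripts/buildsystems/vcpkg.cmake
# while a Linux user just installs the development packages from their distro.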

@eXpl0it3r
Member

eXpl0it3r commented Oct 18, 2022

I don't think we have to concern ourselves too much with supply chain attacks, and by pinning a specific version we ensure that things shouldn't randomly break.

The topic of availability can't be fully dismissed, but at the same time we're not using very niche libraries, so they won't suddenly be gone and unrecoverable. Moving things to our own repo doesn't really seem to provide a huge benefit, while actually costing us maintenance resources and making it harder for people to update to some other version on their own.


Instead of listing pros and cons, there's the method of lining up all the requirements against the solutions and seeing which one actually fulfills them best. Here's my attempt at that; it might be biased, so feel free to point out how it could be adjusted.

Requirements:

  • Easy to use: A user shouldn't have to do a lot of manual steps to get everything to build
  • Low maintenance effort: We maintainers shouldn't be piling on unnecessary work tasks
  • Doesn't inflate repository size: We want to stop inflating our repository size, whenever we update a dependency
  • Offline usage: Clone the repo once or grab a zip file and you should be able to build SFML without an internet connection
  • Not affected by Supply Chain Issues: If a third-party repository has issues, the build shouldn't fail

Solutions:

  • Binaries in extlibs/: Status Quo
  • FetchContent: Fetch the source code during the first CMake run and build it alongside SFML
  • FetchContent + Our Own Repo: Same as above, but have the source code cached in one of our repositories
  • Include Source Code in Repo: Instead of having the source code cached somewhere, we include the whole third-party source code
  • Submodules of Source: Use git submodules to third-party repositories
  • Submodules of Source + Our Own Repo: Instead of using third-party repositories directly, we use our own cached source code versions
  • Submodules of Binaries: We essentially move the extlibs/ folder to an external repository

| Solution | Easy to use | Low maintenance effort | Doesn't inflate repository size | Offline Usage | Not affected by Supply Chain Issues |
| --- | --- | --- | --- | --- | --- |
| Binaries in extlibs/ | x | | | x | x |
| FetchContent | x | x | x | | |
| FetchContent + Our Own Repo | x | | x | | x |
| Include Source Code in Repo | x | ~ | | x | x |
| Submodules of Source | ~ | x | x | | |
| Submodules of Source + Our Own Repo | ~ | ~ | x | | x |
| Submodules of Binaries | ~ | | ~ | | x |

As I'm not too worried about "Offline Usage" and "Not affected by Supply Chain Issues", I see plain FetchContent from third-party repositories as the winner.

Side note: I didn't include "Using system dependencies" as a requirement, since this can be achieved with any of the options; it's just that some may make it easier than others.

@Bromeon
Member

Bromeon commented Oct 18, 2022

Thanks a lot, really appreciate the effort to put this all together in a structured way! 👍

A question for my understanding:

Not affected by Supply Chain Issues: If a third-party repository has issues, the build shouldn't fail

| Solution | Easy to use | Low maintenance effort | Doesn't inflate repository size | Offline Usage | Not affected by Supply Chain Issues |
| --- | --- | --- | --- | --- | --- |
| FetchContent | x | x | x | | x |
| FetchContent + Our Own Repo | x | | x | | |

Shouldn't the first row not have an x in the last column?

@eXpl0it3r
Member

Shouldn't the first row not have an x in the last column?

Whoops, yes. That x moved down a row, then I fixed it in the markdown but regenerated the table and forgot to update it in the "source".
Fixed now 😄

@ChrisThrasher
Member Author

Side-note: I didn't include "Using system dependencies" as requirement, since this can be achieved with either option, just that some may make it easier than others.

Does this imply that you're only considering solutions that provide users a choice between automatic dependency resolution or manual dependency resolution? I'm surprised you didn't include a pure find_package solution in your table.

@eXpl0it3r
Member

I will add it for completeness (I don't want to edit the table on my phone), but it wouldn't fall under "easy to use" and would immediately be disqualified for me. It might be okay for a Linux distro, where your OS package manager provides all our libraries, but if you look at how much people struggle just getting SFML to link, imagine how many would simply give up if you told them they also needed to build all the dependencies on their own, or figure out how to use a package manager.

@ChrisThrasher
Member Author

ChrisThrasher@3978980

I was curious how much code we could remove if we dropped x86 support so I did it. We can remove a fair few binaries and a few dozen lines of CMake. Getting rid of extlibs will be an iterative process so it's reasonable to consider how we might make incremental progress towards that goal. This is a step in that direction.

@eXpl0it3r
Member

There's no point in removing binaries without actually having the new dependency handling in place. An incremental approach is fine, but that increment should be one dependency at a time, not one architecture at a time.

We already know that some things can be "removed" if we drop 32-bit support, as @vittorioromeo has already created a PR for this in the past (#1931), but we rejected that for the same reasons we already discussed on Discord.

@Mizux

Mizux commented Nov 10, 2022

FYI, CMake 3.24 introduces find_package() integration in FetchContent().
ref: https://cmake.org/cmake/help/latest/module/FetchContent.html#integrating-with-find-package
note: this performs a find_package() first and falls back to FetchContent() if the dependency is unavailable.
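For reference, the 3.24-style usage looks roughly like this; the dependency, URL, and tag are purely illustrative:

include(FetchContent)
FetchContent_Declare(Freetype
  GIT_REPOSITORY https://gitlab.freedesktop.org/freetype/freetype.git
  GIT_TAG VER-2-13-2
  FIND_PACKAGE_ARGS # try find_package(Freetype) first, fetch only as a fallback
)
FetchContent_MakeAvailable(Freetype)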

Alternative

If you want finer control, e.g. to use FetchContent() first (and build the deps statically?) but be able to disable this default behaviour, e.g. for distro maintainers (who prefer, or are required, to use system-wide installed deps), you can introduce an option() so the user has finer control...

small example

CMakeLists.txt:

# Force build of all deps (convenient "super option" for a hermetic build)
option(BUILD_DEPS "Build all dependencies" OFF)

# Foo dependency
# default: OFF, i.e. rely on find_package(Foo), BUT forced to ON if BUILD_DEPS=ON
include(CMakeDependentOption)
cmake_dependent_option(BUILD_foo "Build the Foo dependency library" OFF "NOT BUILD_DEPS" ON)
...

# Requested dependencies are built in a "subfolder"
# note: CMake creates a variable scope per directory, so we can force a few options for our dependencies
add_subdirectory(cmake/dependencies dependencies)

# Check whether we find or build our deps
# set(CMAKE_FIND_PACKAGE_PREFER_CONFIG TRUE)
# dev note: the second test is for super-builds that may have already called add_subdirectory(foo) etc.,
# which find_package(Foo) can't detect...
# see: https://gitlab.kitware.com/cmake/cmake/-/issues/17735
if(NOT BUILD_foo AND NOT TARGET foo::foo)
  find_package(Foo REQUIRED)
endif()

# Optional since find_package has REQUIRED...
if(NOT TARGET foo::foo)
  message(FATAL_ERROR "Target foo::foo not available.")
endif()

cmake/dependencies/CMakeLists.txt:

# Here we can set options only relevant for the dependency build,
# e.g. forcing BUILD_SHARED_LIBS=OFF, disabling tests of the deps, etc.
# Feel free to adapt to your needs.
include(FetchContent)
set(FETCHCONTENT_QUIET OFF)
set(FETCHCONTENT_UPDATES_DISCONNECTED ON)

set(BUILD_SHARED_LIBS OFF)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(BUILD_TESTING OFF) # disable dependency tests
set(CMAKE_Fortran_COMPILER OFF)

if(BUILD_foo)
  message(CHECK_START "Fetching Foo")
  list(APPEND CMAKE_MESSAGE_INDENT "  ")
  set(FOO_OPTION_1 ON) # requires CMP0077 so the dependency honours these variables
  set(FOO_OPTION_2 ON)
  FetchContent_Declare(
    foo
    GIT_REPOSITORY "https://github.com/orga/foo.git"
    GIT_TAG "v1.0"
    # PATCH_COMMAND git apply --ignore-whitespace "${CMAKE_CURRENT_LIST_DIR}/foo.patch"
  )
  FetchContent_MakeAvailable(foo)
  list(POP_BACK CMAKE_MESSAGE_INDENT)
  message(CHECK_PASS "fetched")
endif()
...

@ChrisThrasher
Member Author

FYI CMake 3.24 introduce find_package() integration in FetchContent()

Sadly that's out of the question, since Ubuntu 22.04 only ships with CMake 3.22 and we value ensuring that users on that platform don't need to install a newer version of CMake.

If you want finer control, e.g. to use FetchContent() first (and build the deps statically?) but be able to disable this default behaviour, e.g. for distro maintainers (who prefer, or are required, to use system-wide installed deps), you can introduce an option() so the user has finer control...

This option already exists in the form of the SFML_USE_SYSTEM_DEPS option.
