RFC: Introduce "base-sets" for vendored dependencies #35

Open
slyon opened this issue Aug 8, 2023 · 9 comments

Comments

@slyon
Contributor

slyon commented Aug 8, 2023

Modern languages like Rust or Golang make heavy use of vendoring for their dependencies. As a result, we cannot easily link applications against libraries from the Ubuntu archive, which are supported and covered by the security team. (cpaelzer#3)

Different applications might pull in different versions of the same vendored dependency, which need to be tracked and updated individually (https://wiki.ubuntu.com/RustCodeInMain).

I wonder if we could define some kind of base-sets ("base-crates"/"base-packages") similar to the nature of "base-snaps". Those would describe a bundle of specific crates/packages/dependencies using specific versions and might be uploaded to crates.io / pkg.go.dev or implemented as a .deb package in the Ubuntu archive and maintained & supported by the corresponding toolchain team / security team.

For Rust, one "base-set"/"base-crate" might for example contain dependencies on specific versions of very common crates, such as:

  • clap
    A simple to use, efficient, and full-featured Command Line Argument Parser
  • curl
    Rust bindings to libcurl for making HTTP requests
  • libc
    Raw FFI bindings to platform libraries like libc.
  • openssl
    OpenSSL bindings
  • serde
    A generic serialization/deserialization framework

Those dependencies (and probably more) are heavily used by many applications. When packaging Rust/Golang applications for "main" we could change their dependencies to make use of a base-set supported by Canonical, which should reduce the vendoring burden by a lot, as only additional dependencies (not part of the base-set in use) would need to be tracked individually.
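
A minimal sketch of the split described above, assuming a base-set is nothing more than a mapping from crate name to a pinned version. The version numbers, the `tiny-helper` crate and the `split_dependencies` helper are all hypothetical and only serve as illustration.

```python
# Hypothetical base-set: crate name -> pinned version (versions are made up).
BASE_SET = {
    "clap": "4.3.0",
    "curl": "0.4.44",
    "libc": "0.2.147",
    "openssl": "0.10.55",
    "serde": "1.0.180",
}


def split_dependencies(app_deps, base_set):
    """Split an application's dependencies into three groups.

    covered:  same name and version as the base-set, nothing extra to track
    mismatch: in the base-set but at another version, needs patching or vendoring
    extra:    not in the base-set at all, must be tracked individually
    """
    covered, mismatch, extra = {}, {}, {}
    for name, version in app_deps.items():
        if name not in base_set:
            extra[name] = version
        elif base_set[name] == version:
            covered[name] = version
        else:
            mismatch[name] = version
    return covered, mismatch, extra


# Hypothetical application dependency list.
app = {"clap": "4.3.0", "serde": "1.0.150", "tiny-helper": "0.1.2"}
covered, mismatch, extra = split_dependencies(app, BASE_SET)
print("covered by base-set:", sorted(covered))
print("version mismatch:", sorted(mismatch))
print("tracked individually:", sorted(extra))
```

Only the last two groups would add to the per-application vendoring burden; how a version mismatch should be handled is left open here.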

I'd like to gauge your input on this thought. Also CC @liushuyu @zhsj @schopin-pro @samkamer

@slyon slyon changed the title RFC: Should we introduce "base-sets" for vendored dependencies? RFC: Introduce "base-sets" for vendored dependencies Aug 8, 2023
@zhsj

zhsj commented Aug 8, 2023

What's the difference with using the packaged libraries (rust-*-dev and golang-*-dev packages)?

If they can use pinned versions in the base-sets, they can also use the packaged libraries. We just need to ensure these packaged libraries are aligned with the base-sets definition.

I think Go and Rust packages are able to use a mix of libraries from both the vendor directory and system packages.

@slyon
Contributor Author

slyon commented Aug 8, 2023

I think base-sets could work pretty similarly to the {rust,golang}-*-dev packaged libraries, just for cases where the dependency is not yet provided as a -*-dev package.

IMO, the final goal is to have properly packaged shared libraries, built from Rust or Golang sources, so we can consume them as we do with normal C libraries. But that's not yet fully supported by those ecosystems.

The next-best approach would be using the -*-dev packaged libraries, as you suggested, pulling in the dependency sources for a static build. This already defines a common version of a dependency that can be consumed by multiple applications.

Next, having a base-set would allow the toolchain maintainers to provide a set of common dependencies, using specific versions for their ecosystem. Those base-set .debs could even depend on already packaged -*-dev libraries, if available, but add further dependencies through "vendoring", without the need to package each of them individually (some of those are pretty tiny). It's effectively a shortcut to stay ahead of the fast-moving ecosystems.

The last resort would be for an application to vendor its own dependencies if they are not available otherwise.

All of those stages try to provide a common set of dependencies + specific versions to be used across the archive, to ease long-term maintenance. I.e. we'd need to track and update only that single version. That is not a given if many applications pull in different versions of the same dependency.
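
The preference order of those stages can be sketched as a simple lookup. This is only an illustration: the stage names, the `pick_source` helper and the example archive state are assumptions, not an existing tool.

```python
# Preference order from the comment above, most preferred first.
STAGES = ["shared-library", "dev-package", "base-set", "vendored"]


def pick_source(dep, available):
    """Return the most preferred stage that provides `dep`.

    `available` maps a stage name to the set of dependencies it provides.
    Vendoring is the last resort and is assumed to be always possible.
    """
    for stage in STAGES[:-1]:
        if dep in available.get(stage, set()):
            return stage
    return "vendored"


# Hypothetical archive state; all contents are placeholders.
available = {
    "shared-library": set(),           # not yet fully supported by the ecosystems
    "dev-package": {"libc", "serde"},  # e.g. rust-*-dev packages
    "base-set": {"clap", "serde"},
}

for dep in ["serde", "clap", "tiny-helper"]:
    print(dep, "->", pick_source(dep, available))
```

With this state, `serde` comes from the -dev package even though the base-set also carries it, `clap` comes from the base-set, and the unknown `tiny-helper` falls back to vendoring.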

@cpaelzer
Collaborator

cpaelzer commented Aug 9, 2023

AFAIR one of the biggest reasons for generally recommending vendoring was the immaturity and churn of Rust dependencies, where updates usually led to breakage. I remember discussions about cases like:

PKG A1 -> dep C1
PKG B1 -> dep C1
An update from A1 to A2 requires C2, and C2 is not compatible with C1, so we can either break A2 or B1.
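
The A/B/C case can be written down as a small check, assuming that only one version of a shared dependency can live in the archive at a time. The `find_conflicts` helper and the data layout are hypothetical.

```python
from collections import defaultdict


def find_conflicts(requirements):
    """Return shared dependencies that packages need in different versions.

    `requirements` maps package -> {dependency: required version}.
    Assumes only one version of each shared dependency is in the archive.
    """
    wanted = defaultdict(dict)  # dependency -> {version: [packages]}
    for pkg, deps in requirements.items():
        for dep, version in deps.items():
            wanted[dep].setdefault(version, []).append(pkg)
    return {dep: dict(v) for dep, v in wanted.items() if len(v) > 1}


# Before the update both packages agree on C1.
before = {"A1": {"C": "1"}, "B1": {"C": "1"}}
# After updating A1 to A2, A2 wants C2 while B1 still wants C1.
after = {"A2": {"C": "2"}, "B1": {"C": "1"}}

print(find_conflicts(before))
print(find_conflicts(after))
```

The first call reports no conflict; the second reports that `C` is wanted in two versions, which is the point where one of the packages has to fall back to a vendored copy.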

Most likely (well, hopefully) things have evolved and I think what you mention as base-sets might very well be the more established libraries nowadays.
Maybe those have gotten more stable and less likely to break API and thereby their dependencies.
If that is true (I can't say) then I'd still follow @zhsj's suggestion and use the rust-*-dev and golang-*-dev packages instead of inventing yet another thing. Your "base-set" might just be a section in the supported-development-common seed marking a few packages as being in main after an MIR process. Those then should be used from rust-*-dev whenever possible.

In the Go world we already go that way: one can use golang-*-dev packages from main or vendored dependencies. If such mismatches occur on updates/backports, the former package dependencies are converted to vendored ones.

Hint: no one likes to own packages because of the effort and responsibility it brings, so I'd expect everyone to say "nah, let me vendor all mine, they are not stable". That does not lead to good re-use of code and maintenance effort. I'm afraid the initial base-set would need to be owned by the toolchain team that owns Rust itself, and not by the first person that comes by with a kernel-cmdline-parser that needs curl for an optional feature.

This certainly needs more discussion and deeper analysis, thanks for bringing it up @slyon

@setharnold
Contributor

See also https://bugs.debian.org/1049413 for a question about how strict we want to be with "rust dependencies must be vendored".

@slyon
Contributor Author

slyon commented Nov 9, 2023

In order to bootstrap the new ecosystems and reduce heavy vendoring, I think it would be a good idea if the Foundations Toolchain squad could research/identify the most used/common/important (top 10?) dependencies of their Rust/Go/.NET/Java/... ecosystem. Those could then be packaged(?), MIRed (incl. transitive dependencies) and owned by the corresponding toolchain maintainers.
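
One way such a "top N" list could be derived is by counting in how many applications each dependency appears. This is a sketch under the assumption that per-application dependency lists (direct and transitive) are already available; the application names, the data and the `most_common_dependencies` helper are hypothetical.

```python
from collections import Counter


def most_common_dependencies(apps, top_n=10):
    """Rank dependencies by the number of applications that use them.

    `apps` maps application name -> iterable of dependency names
    (direct and transitive). Ties are broken alphabetically so the
    result is deterministic.
    """
    counts = Counter()
    for deps in apps.values():
        counts.update(set(deps))  # count each dependency once per application
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical applications and their dependency lists.
apps = {
    "tool-a": ["serde", "libc", "clap"],
    "tool-b": ["serde", "libc"],
    "tool-c": ["serde", "tiny-helper"],
}

for name, count in most_common_dependencies(apps, top_n=2):
    print(f"{name}: used by {count} applications")
```

Usage count alone says nothing about importance or review cost, so the result would only be a starting point for the selection.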

This should cover a big & important part of new dependencies. So once a non-toolchain team wants to MIR a new tool, they do not need to be afraid of owning a huge tree of vendored core/base dependencies. Those would be covered by the Toolchain squad and remaining, less common dependencies could still be vendored.

This topic is less important for more mature ecosystems (GCC/Python/Perl/...), as the most important dependencies got into main already over the years and are owned by one team or another. It still might make sense to identify the relevant base-set and move ownership to the toolchain squad.

A "base-set" could either be implemented as a single package containing an ecosystem's most important (+transitive) dependencies (vendored), or be maintained as a set of normal packages, tracked inside the "supported" seed.

@eslerm
Member

eslerm commented Nov 24, 2023

At the in-person MIR meeting, there was discussion about teams owning (or co-owning) specific vendored packages.

Could someone explain what this would mean?

@eslerm
Member

eslerm commented Nov 24, 2023

Each package added to a base-set should receive a main-inclusion-like review.

There are 564 vendored packages in the rustc package which include clap, curl, libc, openssl, and serde (3 major versions of clap even). Since these vendored packages must be maintained for Rust, they will likely inform base-set selection. This will require a lot of labor to review properly.

Possibly, packages which do not require unsafe stanzas and do not use crypto could get a lighter (quicker) review, but vulnerabilities could still exist. @liushuyu had more suggestions on what needs to be checked to allow a light review.
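
A rough triage for the "lighter review" idea could look like the sketch below. It is only a heuristic under stated assumptions: the list of crypto crate names is a hypothetical placeholder, the `needs_full_review` helper does not exist anywhere, and a plain text search is not a substitute for the checks a real review would need.

```python
import re
import tempfile
from pathlib import Path

UNSAFE_RE = re.compile(r"\bunsafe\b")
# Hypothetical list of crate names treated as "crypto" for triage purposes.
CRYPTO_CRATES = {"openssl", "ring", "rustls"}


def needs_full_review(crate_dir, dependency_names):
    """Heuristic triage: True if the crate mentions `unsafe` or depends on crypto.

    This is a plain text search, so `unsafe` inside a comment or string
    also counts; that errs on the side of a full review.
    """
    uses_unsafe = any(
        UNSAFE_RE.search(path.read_text(encoding="utf-8", errors="replace"))
        for path in Path(crate_dir).rglob("*.rs")
    )
    uses_crypto = bool(CRYPTO_CRATES & set(dependency_names))
    return uses_unsafe or uses_crypto


# Demo on a throwaway crate layout.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    (src / "lib.rs").write_text("pub fn add(a: i32, b: i32) -> i32 { a + b }\n")
    print("full review needed:", needs_full_review(tmp, ["libc"]))
    (src / "ffi.rs").write_text("pub fn f() { unsafe { } }\n")
    print("full review needed:", needs_full_review(tmp, ["libc"]))
```

A crate that passes this check could still contain vulnerabilities, so the result would at most decide which review queue a crate lands in.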

@eslerm
Member

eslerm commented Dec 17, 2023

If you make a simple hello world package with a dependency on clap, running cargo vendor will pull in a bunch of clap's dependencies that the hello world base package won't actually use.

This issue seems to be described on https://wiki.ubuntu.com/RustCodeInMain, which says:

"It’s a simple matter of running cargo vendor where your on the top-level directory. Sadly, it’s not possible to exclude irrelevant dependencies during vendoring yet, so you might want to automate that step and add some post-processing to remove voluminous, unused dependencies, and/or the C code for some system libraries that could be statically linked."

Removing unnecessary vendored packages should be a high priority for quality and maintainability in main.

Possibly an AST could determine which dependencies the base package actually uses, to debloat vendored packages.

See rust-lang/cargo#11929 and rust-lang/cargo#7058
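
As an illustration of the debloating idea, the sketch below works on a dependency graph rather than on an AST, which is a coarser technique than the one suggested above: it only follows dependency edges whose condition (a platform or feature) is enabled. The graph format, the crate names and the `used_crates` helper are hypothetical.

```python
def used_crates(graph, root, enabled):
    """Return crates reachable from `root` over edges whose condition is enabled.

    `graph` maps crate -> list of (dependency, condition); a condition of
    None means the dependency is unconditional.
    """
    seen = set()
    stack = [root]
    while stack:
        crate = stack.pop()
        if crate in seen:
            continue
        seen.add(crate)
        for dep, condition in graph.get(crate, []):
            if condition is None or condition in enabled:
                stack.append(dep)
    return seen - {root}


# Hypothetical graph: a hello world tool with one argument-parsing dependency.
graph = {
    "hello": [("argparser", None)],
    "argparser": [
        ("lexer", None),
        ("win-console", "windows"),
        ("wrap-help", "feature:wrap"),
    ],
    "win-console": [("win-sys", "windows")],
}

# Everything a vendoring step without filtering would pull in.
vendored = {dep for deps in graph.values() for dep, _ in deps}
used = used_crates(graph, "hello", enabled={"linux"})
print("used:", sorted(used))
print("removable:", sorted(vendored - used))
```

This finds crates that are never built for the enabled platforms and features; it cannot tell whether code inside a crate that is built is actually called.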

@didrocks

To replicate what I wrote on https://bugs.launchpad.net/ubuntu/+source/authd/+bug/2048781/comments/20:

I continued exploring this topic myself last week and was able to rely on a tool developed for this: https://github.com/coreos/cargo-vendor-filterer/.

This tool is not ideal in the sense that:

  • it vendors the whole content
  • then, it filters by replacing entire crates based on some filtering rules, like arch, platform or file exclusion. Each filtered crate is replaced by an empty module and rechecksummed.

So basically, cargo and rustc still think the crate is available; it just happens to be empty. Consequently, we wouldn’t know whether we are impacted by a security issue without manual checking.
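
Part of that manual checking could be scripted by listing which vendored crates still carry real code. The sketch below ASSUMES that a filtered crate shows up as a directory whose `src/lib.rs` is empty or whitespace-only; the actual on-disk result of the tool may differ, and the `split_vendored` helper is hypothetical.

```python
import tempfile
from pathlib import Path


def split_vendored(vendor_dir):
    """Return (real, stubbed) crate directory names under `vendor_dir`.

    ASSUMPTION: a stubbed crate has a `src/lib.rs` that is empty or
    contains only whitespace. Everything else is treated as real code.
    """
    real, stubbed = [], []
    for crate in sorted(p for p in Path(vendor_dir).iterdir() if p.is_dir()):
        lib = crate / "src" / "lib.rs"
        if lib.is_file() and not lib.read_text(encoding="utf-8", errors="replace").strip():
            stubbed.append(crate.name)
        else:
            real.append(crate.name)
    return real, stubbed


# Demo on a throwaway vendor directory with placeholder crate names.
with tempfile.TemporaryDirectory() as tmp:
    for name, content in [("kept-crate", "pub fn f() {}\n"), ("stubbed-crate", "\n")]:
        src = Path(tmp) / name / "src"
        src.mkdir(parents=True)
        (src / "lib.rs").write_text(content)
    real, stubbed = split_vendored(tmp)
    print("real:", real)
    print("stubbed:", stubbed)
```

Only the crates in the first list would then need to be matched against security advisories.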

However, I see this as a step in the right direction, so I implemented this in authd: https://github.com/ubuntu/authd/pull/270/files. Here, we are filtering to keep only the Linux platform, on all our supported architectures (which are tier 1 and 2 in the Rust world).
The benefit is that we are now able to drop our manual rechecksumming after purging the binary library archive files which are part of some crates.

This tool runs during the package source build. I would feel better if it was packaged and maintained in Ubuntu (as it potentially injects some code) and was part of our standard tooling. I will reach out to the Rust maintainer at the upcoming engineering sprint. I think we can still trust this repository, as it’s part of a well-known organization with well-known maintainers who have decades of open source experience.

6 participants