Add strip_components to extract/download_and_extract `http_arch…#29281

Open
willstranton wants to merge 3 commits into bazelbuild:master from willstranton:strip

Conversation

@willstranton
Contributor

Add strip_components to extract/download_and_extract http_archive

Description

The strip_components attribute functions similarly to tar --strip-components:

Strip NUMBER leading components from file names on extraction.

This is an alternative to the existing strip_prefix attribute, which required knowing the exact prefix to be stripped. Only one of the two attributes (strip_prefix, strip_components) can be set at one time.
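For illustration only, the semantics described above can be modeled in a few lines of Python (a close relative of Starlark). This is a sketch, not the code in this pull request: the helper names `strip_leading_components` and `check_strip_attrs` are hypothetical, and skipping entries that have too few components is an assumption borrowed from tar's behavior rather than something confirmed here.

```python
from typing import Optional


def strip_leading_components(path: str, count: int) -> Optional[str]:
    """Drop the first `count` components of an archive entry path.

    Returns None when no components remain. Treating that case as
    "skip the entry" is an assumption modeled on tar.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    parts = [p for p in path.split("/") if p]
    if len(parts) <= count:
        return None
    return "/".join(parts[count:])


def check_strip_attrs(strip_prefix: str = "", strip_components: int = 0) -> None:
    """Reject setting both attributes, as the description requires."""
    if strip_prefix and strip_components:
        raise ValueError(
            "only one of strip_prefix and strip_components may be set"
        )


# The caller does not need to know the name of the top-level directory.
print(strip_leading_components("foo-1.2.3/src/main.c", 1))  # src/main.c
print(strip_leading_components("foo-1.2.3/", 1))            # None
```

With strip_prefix, the caller would have to spell out `foo-1.2.3` exactly; with a component count, the same call keeps working when the version in the directory name changes.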

Motivation

See #28879

Build API Changes

  1. Has this been discussed in a design doc or issue? (Please link it)

See #28879

  2. Is the change backward compatible?

Yes

  3. If it's a breaking change, what is the migration plan?

N/A - this is not a breaking change.

Checklist

  • I have added tests for the new use cases (if any).
  • I have updated the documentation (if applicable).

Release Notes

RELNOTES[NEW]: Adds the strip_components attribute to extract/download_and_extract/http_archive to allow stripping of path components when extracting files.

@willstranton force-pushed the strip branch 2 times, most recently from da36a7a to 613bd88 April 13, 2026 22:11
@willstranton marked this pull request as ready for review April 13, 2026 22:12
@github-actions bot added the team-ExternalDeps, team-Core, and awaiting-review labels Apr 13, 2026
@meteorcloudy
Member

> which required knowing the exact prefix to be stripped.

If the source archive URL is deterministic, the exact prefix should be known?

@willstranton
Contributor Author

> If the source archive URL is deterministic, the exact prefix should be known?

Yes, that's true, but it's inconvenient to have to examine an archive to determine that exact prefix. This pull request is a "quality of life" improvement. As you point out, it's not a "must have".

Summarizing from the community:

  1. Copying the inconvenience expressed by the original issue filer in http_archive (also repository_ctx.extract) strip_components #28879 and why it's useful to have:

> Archives often have a containing directory.
>
> Sometimes, this is long or not easily memorable -- a version number, or a commit hash
> Sometimes, this is not readily known. E.g. npm packages usually use a package/ prefix, but not always.
> Usually, users don't actually care what the leading component is, they just want to remove it.
> ...
> This feature is in both BSD and GNU tar; it's very useful.
> While not mentioned in my original comment, it would also be very useful for archive_override (bzlmod).

> I remember having to update dependencies manually before BCR. You had to update the tar archive AND the prefix that was stripped.

  2. Feature request: download_and_extract(strip_prefix="*") #13960 is an earlier request from 2021 that expresses similar friction.

> When first adding a http_archive (or alternative) to your workspace, it's easy enough to find what the top level directory is called... but with many archives it requires a bit more effort...
> ...with dependencies that change... this can get very tiresome....
> My particular use case is a custom build definition that provides a simpler interface to private repositories... I don't know of any justification for requiring strip_prefix to be specified manually.

  3. Issue #28879 has at least 2 members commenting on or agreeing with this proposal. With me being the author of this pull request, that makes 3. The second issue, #13960, has two members commenting as well. So perhaps 5 people want this solved somehow. I'll admit that counting users can be disingenuous, since they could all be from the same company or friends rallying each other on. I have no relation to any of the folks mentioned.

@meteorcloudy
Member

OK, thanks for the context! If we do this, we should also backport this to Bazel 8 & 9, so that modules can keep the compatibility with multiple LTS releases when using this feature.

Comment thread tools/build_defs/repo/http.bzl Outdated

Copilot AI left a comment


Pull request overview

Adds a new strip_components integer attribute/parameter (similar to tar --strip-components) to http_archive, repository_ctx.download_and_extract, and repository_ctx.extract, enabling prefix stripping without knowing an exact directory name.

Changes:

  • Introduces strip_components plumbing from Starlark (http_archive, download_and_extract, extract) down to the Java decompressor layer.
  • Implements component stripping during extraction for .zip, .7z, and tar-based archives.
  • Adds/updates integration + unit tests covering component stripping and rename-ordering behavior.
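As a rough model of what "component stripping during extraction" means for a tar-based archive, the sketch below uses Python's standard tarfile module rather than Bazel's Java decompressors. The function names `make_tar` and `extract_stripped` are hypothetical, and dropping entries that have no components left after stripping is an assumption, not a confirmed detail of this change.

```python
import io
import tarfile
from typing import Dict


def make_tar(files: Dict[str, bytes]) -> bytes:
    """Build a small .tar.gz in memory so the example is self-contained."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def extract_stripped(archive: bytes, count: int) -> Dict[str, bytes]:
    """Return {stripped path: contents} for each regular file in the archive."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            parts = [p for p in member.name.split("/") if p]
            if len(parts) <= count:
                # Assumption: entries with nothing left are skipped.
                continue
            out["/".join(parts[count:])] = tar.extractfile(member).read()
    return out


archive = make_tar({
    "pkg-abc123/README": b"hello",
    "pkg-abc123/src/lib.c": b"int x;",
    "top.txt": b"dropped when count is 1",
})
print(sorted(extract_stripped(archive, 1)))  # ['README', 'src/lib.c']
```

The entry path is rewritten before anything is written out, which is the same ordering the per-file summary describes for the zip, 7z, and tar decompressors.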

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| tools/build_defs/repo/http.bzl | Adds strip_components attr, enforces mutual exclusivity with strip_prefix, passes through to download_and_extract. |
| src/main/java/com/google/devtools/build/lib/vfs/PathFragment.java | Adds PathFragment.stripComponents(int) utility used by decompressors. |
| src/main/java/com/google/devtools/build/lib/bazel/repository/starlark/StarlarkBaseExternalContext.java | Adds strip_components params to download_and_extract/extract and wires into decompression. |
| src/main/java/com/google/devtools/build/lib/bazel/repository/decompressor/DecompressorDescriptor.java | Adds stripComponents field + builder validation for mutual exclusivity with prefix. |
| src/main/java/com/google/devtools/build/lib/bazel/repository/decompressor/ZipDecompressor.java | Applies component stripping to zip entry paths before extraction. |
| src/main/java/com/google/devtools/build/lib/bazel/repository/decompressor/SevenZDecompressor.java | Applies component stripping to 7z entry paths before extraction. |
| src/main/java/com/google/devtools/build/lib/bazel/repository/decompressor/CompressedTarFunction.java | Applies component stripping to tar entry paths before extraction. |
| src/main/java/com/google/devtools/build/lib/bazel/repository/decompressor/CompressedFunction.java | Updates docs to note stripComponents is ignored for single-file compressor formats. |
| src/test/shell/bazel/external_integration_test.sh | Adds http_archive integration coverage for strip_components (tar/zip + add_prefix). |
| src/test/java/com/google/devtools/build/lib/vfs/PathFragmentTest.java | Adds unit tests for PathFragment.stripComponents. |
| src/test/java/com/google/devtools/build/lib/bazel/repository/starlark/StarlarkBaseExternalContextTest.java | Updates test calls for new downloadAndExtract signature. |
| src/test/java/com/google/devtools/build/lib/bazel/repository/decompressor/ZipDecompressorTest.java | Adds zip decompression tests for strip_components (+ rename ordering). |
| src/test/java/com/google/devtools/build/lib/bazel/repository/decompressor/SevenZDecompressorTest.java | Adds 7z decompression tests for strip_components (+ rename ordering + strip-all). |
| src/test/java/com/google/devtools/build/lib/bazel/repository/decompressor/CompressedTarFunctionTest.java | Adds tar.gz decompression tests for strip_components (+ rename ordering). |
| src/test/tools/bzlmod/MODULE.bazel.lock | Updates lockfile digests due to test/module changes. |


Comment thread src/main/java/com/google/devtools/build/lib/vfs/PathFragment.java Outdated
willstranton added a commit to willstranton/bazel that referenced this pull request Apr 16, 2026
@meteorcloudy
Member

Thanks, please run "bazel run //src/test/tools/bzlmod:update_default_lock_file" to address the CI failure.

@meteorcloudy
Member

Let me know when this is fixed, and I will add the import label.

…ive`

The `strip_components` attribute functions similarly to tar --strip-components:

> Strip NUMBER leading components from file names on extraction.

This is an alternative to the existing `strip_prefix` attribute, which required
knowing the exact prefix to be stripped. Only one of the two attributes
(`strip_prefix`, `strip_components`) can be set at one time.

Fixes bazelbuild#28879

RELNOTES[NEW]: Adds the `strip_components` attribute to `extract`/`download_and_extract`/`http_archive` to allow stripping of path components when extracting files.
@willstranton
Contributor Author

> Let me know when this is fixed, and I will add the import label.

CI now passing.

@meteorcloudy added the awaiting-PR-merge label and removed the awaiting-review label Apr 17, 2026

Labels

awaiting-PR-merge, team-Core, team-ExternalDeps

3 participants