Change of --output_base causes subsequent builds to fail #10653

Closed
konste opened this issue Jan 24, 2020 · 18 comments
Labels
P2: We'll consider working on this in future. (Assignee optional)
team-Core: Skyframe, bazel query, BEP, options parsing, bazelrc
team-OSS: Issues for the Bazel OSS team: installation, release process, Bazel packaging, website
type: bug

Comments

@konste
Contributor

konste commented Jan 24, 2020

Description of the problem / feature request:

Changing the --output_base startup option causes subsequent builds to fail with very obscure error messages.

Feature requests: what underlying problem are you trying to solve with this feature?

When a build is run from some IDEs (VSCode or IntelliJ), they tend to set their own --output_base, different from the one the user configured for command-line builds. This should not cause any problems, but unfortunately, once --output_base changes, the build breaks.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Take the simplest possible Bazel project. Build it with bazel --output_base=C:/blah1 build ... so that it builds successfully. Then try to build it with bazel --output_base=C:/blah2 build ... and this time it fails with error messages that don't make any sense.

What operating system are you running Bazel on?

The problem does not seem to be OS dependent.

What's the output of bazel info release?

2.0.0

Any other information, logs, or outputs that you want to share?

I figured that the problem is most probably caused by the stale "courtesy" symlink in the WORKSPACE folder. After the first build, Bazel creates a "courtesy" symlink such as bazel-<workspace_name> in the workspace folder, and it points inside output_base. When we issue a second build command with a different output_base, Bazel is smart enough to realize that and spawn a second build server process, but unfortunately the stale bazel-<workspace_name> symlink stays in place and still points inside the old output_base, which seems to confuse Bazel. Running bazel clean between builds, or simply deleting that symlink, fixes the problem. It looks like when Bazel discovers a change in startup options that warrants spawning a new build server, it should at the same time remove the existing courtesy symlinks, as they are no longer valid and cause the build to fail.
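A minimal sketch of the cleanup proposed above, written in Java because that is the language Bazel itself uses. It assumes the courtesy symlinks are the workspace entries whose names start with `bazel-`, and it treats a link as stale when its target does not lie under the current output base. `StaleSymlinkCleaner` and `removeStaleSymlinks` are hypothetical names; this is not code from Bazel.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: deletes "bazel-*" symlinks that point outside the current output base. */
public final class StaleSymlinkCleaner {
  private StaleSymlinkCleaner() {}

  /** Returns the names of the symlinks that were deleted. */
  public static List<String> removeStaleSymlinks(Path workspace, Path outputBase)
      throws IOException {
    Path ws = workspace.toAbsolutePath().normalize();
    Path base = outputBase.toAbsolutePath().normalize();
    List<Path> stale = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(ws, "bazel-*")) {
      for (Path entry : entries) {
        // Only symlinks are candidates; real files and directories are left alone.
        if (!Files.isSymbolicLink(entry)) {
          continue;
        }
        // A relative link target is interpreted relative to the workspace directory.
        Path target = ws.resolve(Files.readSymbolicLink(entry)).normalize();
        if (!target.startsWith(base)) {
          stale.add(entry);
        }
      }
    }
    // Delete after the directory stream is closed, so iteration is not disturbed.
    List<String> removed = new ArrayList<>();
    for (Path link : stale) {
      Files.delete(link); // removes the link itself, not the directory it points to
      removed.add(link.getFileName().toString());
    }
    return removed;
  }

  public static void main(String[] args) throws IOException {
    // Usage: java StaleSymlinkCleaner <workspace_dir> <current_output_base>
    System.out.println(removeStaleSymlinks(Paths.get(args[0]), Paths.get(args[1])));
  }
}
```

`Files.delete` on a symlink removes the link rather than its target, so the old output base itself is left untouched.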

@jin
Member

jin commented Jan 27, 2020

Reproduced with Bazel 2.0:

03:30:25 /tmp/ws
$ cat BUILD
genrule(
    name = "g",
    outs = ["g.txt"],
    cmd = "touch $@",
)
03:30:28 /tmp/ws
$ cat WORKSPACE
03:30:31 /tmp/ws
$ bazel --output_base=/tmp/one build //... && bazel --output_base=/tmp/two build //...
INFO: Analyzed target //:g (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //:g up-to-date:
  bazel-bin/g.txt
INFO: Elapsed time: 0.119s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
ERROR: error loading package 'bazel-ws/external/bazel_tools/tools/build_defs/pkg': Label '//tools/python:private/defs.bzl' is invalid because 'tools/python' is not a package; perhaps you meant to put the colon here: '//:tools/python/private/defs.bzl'?
INFO: Elapsed time: 0.219s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (12 packages loaded)
    currently loading: bazel-ws/external/bazel_tools/tools/test/CoverageOutputGenerator/java/com/google/devtools/coverageoutputgenerator ... (2 packages)
    Fetching @rules_java; fetching

@jin jin added the team-OSS, untriaged, and type: bug labels Jan 27, 2020
@elklein
Contributor

elklein commented Feb 5, 2020

I'm seeing something very similar, but even more basic, when using the latest pre-release of vscode-bazel (which sets --output_base by default). vscode-bazel runs this command:

bazel --output_base=/tmp/ee79067f914abe58284ab7a8abdc7f7d query ...:* --output=package

It fails with the following output:

Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Call stack for the definition of repository 'rules_cc' which is a http_archive (rule definition at /tmp/ee79067f914abe58284ab7a8abdc7f7d/external/bazel_tools/tools/build_defs/repo/http.bzl:292:16):
 - /tmp/ee79067f914abe58284ab7a8abdc7f7d/external/bazel_tools/tools/build_defs/repo/utils.bzl:205:9
 - /DEFAULT.WORKSPACE.SUFFIX:302:1
ERROR: error loading package 'bazel-sdk/external/bazel_tools/third_party/jarjar': Label '//tools/jdk:remote_java_tools_aliases.bzl' is invalid because 'tools/jdk' is not a package; perhaps you meant to put the colon here: '//:tools/jdk/remote_java_tools_aliases.bzl'?

If I remove --output_base, the query succeeds just fine. If I change --output_base to --output_user_root, the query also succeeds. It's almost as if Bazel cannot modify things in install_base when output_base has been specified like that. Perhaps the query is sandboxed and overriding output_base causes install_base to fall outside the sandbox? Just wild-guessing.

For reference, I've seen a number of different but very similar errors, and they all involve downloading http_archive rules (or similar download rules) to set up a workspace prior to running a query. I think the intent of output_base was to not disturb the main workspace environment (and thus be able to run concurrently), but maybe the install_base is write-only in that case? For what it's worth, the install_base HAD been populated with the downloads prior to running the query with --output_base, so I'm guessing the need to re-download is because changing output_base invalidated something. Maybe that's a clue?

@hanneskaeufler
Contributor

Is anyone actively working on this? This is a pretty annoying bug that affects how, e.g., Jenkins build nodes have to be spawned, because they cannot handle multiple executors isolated with output_base. I'd be willing to investigate a fix but am not familiar enough with Bazel's codebase to even know where to start looking. Any pointers would be appreciated :)

@yev3

yev3 commented Aug 12, 2020

I found a workaround, but I don't know why it works:

  • Let $WORKSPACE=app
  • Remove the existing symlink bazel-app
  • Replace it with a dummy file: echo "dummy" >| bazel-app
  • The builds no longer fail, but you get a warning: failed to create one or more convenience symlinks for prefix 'bazel-'

I tried setting --symlink_prefix and --experimental_use_sandboxfs, but neither worked. It's as if 'bazel-' is hardcoded somewhere.

I hope that helps.

@laurentlb laurentlb added the team-Core label Aug 12, 2020
@janakdr
Contributor

janakdr commented Sep 17, 2020

The basic problem here is that the bazel-$WORKSPACE convenience symlink is blindly traversed by Bazel: you can even do bazel build //bazel-$WORKSPACE/... and it will load the labels without any issues, and even build the targets there if you're lucky. When you switch output bases, the symlinks are broken, but the fundamental issue is visiting that convenience symlink in the first place.

My suggestion for a workaround for now is to do echo "bazel-$WORKSPACE" >> .bazelignore. The .bazelignore file in the root of the workspace tells Bazel not to consider those directories. Of course, Bazel should be smart enough to not consider them on its own.

cc @mhy1992
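The .bazelignore workaround suggested above can be applied idempotently with a small helper like the one below. This is a hypothetical sketch (`BazelIgnoreUpdater` is not part of Bazel). It assumes `.bazelignore` holds one path per line, relative to the workspace root, and that the convenience symlink is named `bazel-` followed by the workspace directory name, as in the examples in this thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper that applies the .bazelignore workaround idempotently. */
public final class BazelIgnoreUpdater {
  private BazelIgnoreUpdater() {}

  /** Appends {@code entry} to .bazelignore; returns false if it was already listed. */
  public static boolean ensureIgnored(Path workspace, String entry) throws IOException {
    Path ignoreFile = workspace.resolve(".bazelignore");
    String content =
        Files.exists(ignoreFile)
            ? new String(Files.readAllBytes(ignoreFile), StandardCharsets.UTF_8)
            : "";
    for (String line : content.split("\\R")) {
      if (line.trim().equals(entry)) {
        return false; // already present; do not add a duplicate line
      }
    }
    // Start on a fresh line if the existing file lacks a trailing newline.
    String prefix = content.isEmpty() || content.endsWith("\n") ? "" : "\n";
    Files.write(
        ignoreFile,
        (prefix + entry + "\n").getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE,
        StandardOpenOption.APPEND);
    return true;
  }

  public static void main(String[] args) throws IOException {
    // Usage: java BazelIgnoreUpdater <workspace_dir> <workspace_dir_name>
    System.out.println(ensureIgnored(Paths.get(args[0]), "bazel-" + args[1]));
  }
}
```

Checking for an existing line first keeps repeated runs (for example, from a CI setup script) from growing the file.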

@janakdr janakdr added the P2 label and removed the untriaged label Sep 17, 2020
@372046933

I found that janakdr's solution works on a simple Bazel project with very few external dependencies, but bazel build TARGET still fails on complex projects like serving.

@janakdr
Contributor

janakdr commented Feb 6, 2021

Basic question for people experiencing this issue: can the "alternate" output_root invocation just pass --symlink_prefix=/? If the invocation is being done by an automated process, it shouldn't need those convenience symlinks at all, and it means that your build outputs would remain easily accessible even without doing another build.

@phst
Contributor

phst commented May 18, 2021

At least for me, neither --symlink_prefix=/ nor --experimental_convenience_symlinks=ignore improve anything.

@janakdr
Contributor

janakdr commented May 19, 2021

Does passing --experimental_no_product_name_out_symlink help?

@phst
Contributor

phst commented May 23, 2021

Not for me. In fact, with that option, I get even more error messages.

@RajivKurian

RajivKurian commented Jun 15, 2021

The workarounds don't work for me either. Deleting the symlinks did not help, nor did passing in the flags mentioned. I get error messages of the form:

ERROR: error loading package 'my_bazel_output/external/bazel_skylib': cannot load '//:bzl_library.bzl': no such file

I am wondering what state is causing Bazel to get confused even when the symlinks have all been deleted.

I also see errors of the form:

ERROR: error loading package 'my_old_bazel_output/external/rules_python/python/runfiles': Label '//python:defs.bzl' is invalid because 'python' is not a package; perhaps you meant to put the colon here: '//:python/defs.bzl'?

This suggests that somehow the new output folder is looking for files in the old one through the symlinks. Perhaps I missed deleting some symlinks. Deleting my_old_bazel_output and restarting the bazel server seems to fix the problem but that prevents the ability to have two or more concurrent build folders that the user can switch between.

This issue was surprising to me, given the Bazel output directory layout page explicitly mentions:

The symlinks for “bazel-”, “bazel-out”, “bazel-testlogs”, and “bazel-bin” are put in the workspace directory; these symlinks point to some directories inside a target-specific directory inside the output directory. These symlinks are only for the user’s convenience, as Bazel itself does not use them. Also, this is done only if the workspace directory is writable.

EDIT: Ignore my comment. I triaged my particular issue to #13601.

@OscarVanL

For those who are having this failure with the VSCode Bazel Extension with an error along the lines of:

Command failed: bazel --output_base=/var/folders/z4/tjbsqpbj5pz3_mh9jzb4vhxw0000gn/T/5dc9f0e710b2578b352c533d7f70060e query ...:* --output=package
Loading: 0 packages loaded
ERROR: error loading package '': at /Users/--------/go/src/gitlab.com/-----/-----/build/go/-----.bzl:3:6: Every .bzl file must have a corresponding package, but '@bazel_gazelle//:deps.bzl' does not have one. Please create a BUILD file in the same or any parent directory. Note that this BUILD file does not need to do anything except exist.
Loading: 0 packages loaded
Loading: 0 packages loaded

For me, this problem was rectified by opening the VSCode Bazel extension settings, finding the "Queries Share Server" checkbox, and ensuring it is ticked (unticked by default).

The box has the description: "Use the same Bazel server for queries and builds. By default, vscode-bazel uses a separate server for queries so that they can be executed in parallel with builds. You can enable this setting if running multiple Bazel servers has a negative performance impact on your system, but you may experience degraded performance in Visual Studio Code for operations that require queries."

I think the root cause is probably related to this issue, because (I think) the extension passes --output_base to get the separate server, which then runs into the issue described here.

@ywrt

ywrt commented Jul 3, 2022

For those coming from the future: You'll also get this sort of error if the workspace name in your WORKSPACE file differs from the enclosing directory name.

e.g. if you have foo/WORKSPACE and that file contains 'workspace(name="bar")' you will get weird and confusing errors.

@cameron-martin
Contributor

FYI this is the most commonly reported issue on the vscode bazel extension, and there is no clear workaround for it while using a distinct output base.

I initially thought we could pass --deleted_packages, but this doesn't take package prefixes, so the only way of making this work would be to enumerate all packages in the bazel-* directories.
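For illustration, the enumeration that --deleted_packages would require might look like the sketch below. It assumes a package is any directory containing a `BUILD` or `BUILD.bazel` file, and it follows the `bazel-*` links into the output base while skipping symlink cycles. `DeletedPackagesLister` is a hypothetical name; walking an entire output base this way can be slow, which is part of why this is not a practical workaround.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystemLoopException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.EnumSet;
import java.util.TreeSet;

/** Hypothetical helper: lists every package reachable under the workspace's bazel-* entries. */
public final class DeletedPackagesLister {
  private DeletedPackagesLister() {}

  /** Returns a comma-separated, sorted list of package paths relative to the workspace. */
  public static String listPackages(Path workspace) throws IOException {
    Path ws = workspace.toAbsolutePath().normalize();
    TreeSet<String> packages = new TreeSet<>(); // sorted for stable output
    try (DirectoryStream<Path> roots = Files.newDirectoryStream(ws, "bazel-*")) {
      for (Path root : roots) {
        Files.walkFileTree(
            root,
            EnumSet.of(FileVisitOption.FOLLOW_LINKS), // descend through the convenience symlinks
            Integer.MAX_VALUE,
            new SimpleFileVisitor<Path>() {
              @Override
              public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                if (Files.isRegularFile(dir.resolve("BUILD"))
                    || Files.isRegularFile(dir.resolve("BUILD.bazel"))) {
                  // Package names use '/' separators relative to the workspace root.
                  packages.add(ws.relativize(dir).toString().replace('\\', '/'));
                }
                return FileVisitResult.CONTINUE;
              }

              @Override
              public FileVisitResult visitFileFailed(Path file, IOException exc)
                  throws IOException {
                if (exc instanceof FileSystemLoopException) {
                  return FileVisitResult.CONTINUE; // symlink cycle: skip this entry
                }
                throw exc;
              }
            });
      }
    }
    return String.join(",", packages);
  }

  public static void main(String[] args) throws IOException {
    // Usage: java DeletedPackagesLister <workspace_dir>
    System.out.println("--deleted_packages=" + listPackages(Paths.get(args[0])));
  }
}
```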

@cameron-martin
Contributor

Possibly the issue lies here:

if (packageId.getRepository().isMain()
    && fileValue.isSymlink()
    && fileValue
        .getUnresolvedLinkTarget()
        .startsWith(directories.getExecRootBase().asFragment())) {
  // Symlinks back to the execroot are not traversed so that we avoid convenience symlinks.
  // Note that it's not enough to just check for the convenience symlinks themselves,
  // because if the value of --symlink_prefix changes, the old symlinks are left in place. This
  // algorithm also covers more creative use cases where people create convenience symlinks
  // somewhere in the directory tree manually.
  return ProcessPackageDirectoryResult.EMPTY_RESULT;
}
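To see why the check quoted above misses stale links, the sketch below reproduces only its prefix comparison, using `java.nio.file.Path` in place of Bazel's internal path types, with the /tmp/one and /tmp/two output bases from the reproduction earlier in this thread. It assumes the exec root base is `<output_base>/execroot`; `ExecRootPrefixCheck` is a hypothetical name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrates why a prefix check against the *current* exec root misses stale links. */
public final class ExecRootPrefixCheck {
  private ExecRootPrefixCheck() {}

  /** Same shape as the Bazel condition: skip a link whose raw target is under execRootBase. */
  public static boolean isSkipped(Path unresolvedLinkTarget, Path execRootBase) {
    return unresolvedLinkTarget.startsWith(execRootBase);
  }

  public static void main(String[] args) {
    // Link written by the first build, run with --output_base=/tmp/one (assumed layout).
    Path linkTarget = Paths.get("/tmp/one/execroot/ws");
    Path execRootOne = Paths.get("/tmp/one/execroot");
    Path execRootTwo = Paths.get("/tmp/two/execroot");
    System.out.println("skipped with first output base:  " + isSkipped(linkTarget, execRootOne));
    System.out.println("skipped with second output base: " + isSkipped(linkTarget, execRootTwo));
  }
}
```

`main` prints `true` for the first output base and `false` for the second: once the server runs with /tmp/two, the old link no longer matches and its directory is traversed as if it held ordinary packages. `Path.startsWith` compares whole name elements, so /tmp/one/execroot2 would not count as being under /tmp/one/execroot.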

cameron-martin added a commit to cameron-martin/bazel that referenced this issue Jan 23, 2024
When the output base is changed after the convenience symlinks, queries such as `//...` try to traverse into them and load them as packages since they are only excluded based on whether the resolved location is in the output base. This adds an additional check to see whether they have the symlink prefix. This will not catch both the symlink prefix and the output base changing, but I think this case is rare.

Fixes bazelbuild#10653
cameron-martin added a commit to bazelbuild/vscode-bazel that referenced this issue Jan 26, 2024
The bazel query used to introspect the targets available incorrectly traverses convenience symlinks when the output base has changed. This is a bug in Bazel (see bazelbuild/bazel#10653). A lot of people are hitting this, since it happens with the default setup of bazel and this plugin. Until it is fixed upstream, this works around this issue by enabling the `queriesShareServer` option by default.

Workaround for #216

Co-authored-by: Cameron Martin <cameronmartin123@gmail.com>
cameron-martin added a commit to cameron-martin/bazel that referenced this issue Feb 15, 2024
When the output base is changed after the convenience symlinks have been created, queries such as `//...` try to traverse into them and load them as packages since they are only excluded based on whether the resolved location is in the output base. This adds some heuristics to determine if a symlink is a convenience symlink:

1. The symlink name has an appropriate suffix.
2. An ancestor of the symlink target at the appropriate level is called `execroot`.
3. The `execroot` directory contains a file called `DO_NOT_BUILD_HERE`.

These heuristics should work if both the output base and the symlink prefix change while being quite robust to false positives.

Fixes bazelbuild#10653
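A rough Java rendering of the three heuristics listed above, for illustration only. The suffix-to-depth table is an assumption based on the usual output layout (`bazel-<ws>` pointing at `<output_base>/execroot/<ws>`, `bazel-out` at `.../execroot/<ws>/bazel-out`, and `bazel-bin`, `bazel-genfiles`, `bazel-testlogs` at `.../bazel-out/<config>/<dir>`); the actual change in Bazel may differ in detail. `ConvenienceSymlinkHeuristic` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the three convenience-symlink heuristics from the commit message. */
public final class ConvenienceSymlinkHeuristic {
  private ConvenienceSymlinkHeuristic() {}

  // Assumed mapping: link-name suffix -> levels above the link target where "execroot" sits.
  private static Map<String, Integer> levelsBySuffix(String workspaceName) {
    Map<String, Integer> levels = new LinkedHashMap<>();
    levels.put("bin", 4); // <output_base>/execroot/<ws>/bazel-out/<config>/bin
    levels.put("genfiles", 4);
    levels.put("testlogs", 4);
    levels.put("out", 2); // <output_base>/execroot/<ws>/bazel-out
    levels.putIfAbsent(workspaceName, 1); // <output_base>/execroot/<ws>
    return levels;
  }

  public static boolean looksLikeConvenienceSymlink(Path link, String workspaceName)
      throws IOException {
    if (!Files.isSymbolicLink(link)) {
      return false;
    }
    String name = link.getFileName().toString();
    // Interpret a relative link target relative to the directory containing the link.
    Path target =
        link.toAbsolutePath().getParent().resolve(Files.readSymbolicLink(link)).normalize();
    for (Map.Entry<String, Integer> e : levelsBySuffix(workspaceName).entrySet()) {
      // Heuristic 1: the name ends with a known suffix; any symlink prefix is accepted.
      if (!name.endsWith(e.getKey())) {
        continue;
      }
      // Heuristic 2: the ancestor of the target at the expected level is named "execroot".
      Path execroot = target;
      for (int i = 0; i < e.getValue() && execroot != null; i++) {
        execroot = execroot.getParent();
      }
      if (execroot == null
          || execroot.getFileName() == null
          || !execroot.getFileName().toString().equals("execroot")) {
        continue;
      }
      // Heuristic 3: that execroot directory contains a DO_NOT_BUILD_HERE file.
      if (Files.exists(execroot.resolve("DO_NOT_BUILD_HERE"))) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) throws IOException {
    // Usage: java ConvenienceSymlinkHeuristic <symlink> <workspace_name>
    System.out.println(looksLikeConvenienceSymlink(Paths.get(args[0]), args[1]));
  }
}
```

Because heuristic 1 only inspects the suffix, a link such as `custom-bin` is still recognized after --symlink_prefix changes. Because heuristics 2 and 3 inspect the target's own ancestors instead of comparing against the current output base, links into an old output base are recognized too, as long as that output base still exists on disk.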
cameron-martin added a commit to cameron-martin/bazel that referenced this issue Feb 16, 2024
When the output base is changed after the convenience symlinks have been created, queries such as `//...` try to traverse into them and load them as packages since they are only excluded based on whether the resolved location is in the output base. This adds some heuristics to determine if a symlink is a convenience symlink:

1. The symlink name has an appropriate suffix.
2. An ancestor of the symlink target at the appropriate level is called `execroot`.
3. The `execroot` directory contains a file called `DO_NOT_BUILD_HERE`.

These heuristics should work if both the output base and the symlink prefix change while being quite robust to false positives.

Fixes bazelbuild#10653
bazel-io pushed a commit to bazel-io/bazel that referenced this issue Feb 27, 2024
When the output base is changed after the convenience symlinks have been created, queries such as `//...` try to traverse into them and load them as packages since they are only excluded based on whether the resolved location is in the output base. This adds some heuristics to determine if a symlink is a convenience symlink:

1. The symlink name has an appropriate suffix.
2. An ancestor of the symlink target at the appropriate level is called `execroot`.
3. The `execroot` directory contains a file called `DO_NOT_BUILD_HERE`.

These heuristics should work if both the output base and the symlink prefix change while being quite robust to false positives.

This is important for IDE integration, where the [output base is often changed](https://bazel.build/run/scripts#output-base-option) for queries, to prevent concurrent builds from blocking them. An example of this is the vscode-bazel extension.

Fixes bazelbuild#10653
Fixes bazelbuild#13951

Closes bazelbuild#21005.

PiperOrigin-RevId: 610667735
Change-Id: I1869c9a2063f7f526950e48c0b1ee6efa89fd202
phst added a commit to phst/rules_elisp that referenced this issue Feb 27, 2024
github-merge-queue bot pushed a commit that referenced this issue Mar 4, 2024
…21505)

Commit 8f74ace

PiperOrigin-RevId: 610667735
Change-Id: I1869c9a2063f7f526950e48c0b1ee6efa89fd202

Co-authored-by: Cameron Martin <cameronmartin123@gmail.com>
Co-authored-by: Yun Peng <pcloudy@google.com>
@iancha1992
Member

A fix for this issue has been included in Bazel 7.1.0 RC2. Please test out the release candidate and report any issues as soon as possible.
If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=7.1.0rc2. Thanks!

@LittleCuteBug

Hi, I found that this issue is still not properly handled: it still misses the case where the output_base is inside the workspace.
To reproduce, just set output_base to bazel-cache and build the //... target.
Please also support this case if you don't mind.

@iancha1992
Member

Hi, I found this issue is still not properly handled, it still misses a case when the output_base is inside the workspace To reproduce, just config output_base as bazel-cache, and build the //... target Please help to also support this case if you don't might

@LittleCuteBug do you have a reproducible code or repo?
