Skip to content

Conversation

@mering
Copy link
Contributor

@mering mering commented Apr 28, 2025

The runfiles.bash file was added in #13 but was never used. This is probably the reason why it was never adapted to the new location. This PR fixes the script and starts using it. Note that it keeps the old runfiles script as a dependency to avoid breaking bootstrap snippets (tests with old bootstrap still pass). See #22 for two potential follow-ups (1. adapt tests to use runfiles script from this repo and 2. remove dependency to script from bazel_tools).

@mering mering force-pushed the fix-runfiles branch 7 times, most recently from 9dee93c to 03a0db2 Compare April 28, 2025 15:06
Comment on lines +93 to 98
elif [[ -f "$0.runfiles/rules_shell/shell/runfiles/runfiles.bash" ]]; then
export RUNFILES_DIR="$0.runfiles"
elif ls "$0.runfiles/rules_shell~"*"/shell/runfiles/runfiles.bash" 1> /dev/null 2>&1; then
export RUNFILES_DIR="$0.runfiles"
elif [[ -f "$0.runfiles/rules_shell+/shell/runfiles/runfiles.bash" ]]; then
export RUNFILES_DIR="$0.runfiles"
Copy link
Contributor Author

@mering mering Apr 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we need to handle 3 cases

  1. WORKSPACE: rules_shell
  2. MODULE.bazel <=7.x: rules_shell~VERSION and rules_shell~ (solved by using a glob with ls and interpreting the return code)
  3. MODULE.bazel >=8.x: rules_shell+

@fmeum
Copy link
Collaborator

fmeum commented Apr 28, 2025

Could you explain why you want to make it? #13 should have already made it so that the new runfiles library is used on Bazel 8 without requiring any change to the copy-pasted snippet.

CI coverage for Bazel 8 was missing, but everything seems to work: #23

@mering
Copy link
Contributor Author

mering commented Apr 28, 2025

Using a path in bazel_tools instead of rules_shell is really confusing and makes it even more difficult to figure out what is going on to debug failures. New users should use the location in rules_shell directly.

Furthermore, I think it would be better if the script checks for its own existence instead of going through the indirection of the symlink.

@fmeum
Copy link
Collaborator

fmeum commented Apr 28, 2025

I can see that this is slightly confusing if you study the snippet in detail and are also aware of the old location, but is that really a problem in practice? The snippet is kind of code golfed anyway and really just something people copy & paste everywhere.

I fully agree that having a path under rules_shell would be nicer, but given that we need to keep the symlink around for backwards compatibility anyway and that the rules_shell path isn't static and actually a Bazel implementation detail, I just haven't been able to convince myself the migration would be worth it.

Could you elaborate on the problem with the file checking its own existence through a symlink? Is this more of a stylistic concern?

@mering
Copy link
Contributor Author

mering commented Apr 28, 2025

I can see that this is slightly confusing if you study the snippet in detail and are also aware of the old location, but is that really a problem in practice? The snippet is kind of code golfed anyway and really just something people copy & paste everywhere.

We regularly have to spent days debugging runfiles as things are quite fragile (changing with different Bazel versions, packaging in docker container layers, transferring to a different machine via tarballs, multiple layers of runfiles like an sh_binary calling a go_binary calling a py_binary each with their set of runfiles, ...). Having every colleague debugging such a failure spending several hours trying to understand what is happening is a big waste of time.

I fully agree that having a path under rules_shell would be nicer, but given that we need to keep the symlink around for backwards compatibility anyway and that the rules_shell path isn't static and actually a Bazel implementation detail, I just haven't been able to convince myself the migration would be worth it.

Why don't we start adding support with the current Bazel 8 path, hope it won't change again in the next versions. Then in Bazel 10, we can just remove the old path. Changing the snippet via search&replace should also be an easy task.

Could you elaborate on the problem with the file checking its own existence through a symlink? Is this more of a stylistic concern?

Things being much more fragile. For example, this might only work when used via bazel run and bazel test and might break when packaged into a .tar file or container layer as the symlink points outside of the runfiles boundary.

@fmeum
Copy link
Collaborator

fmeum commented Apr 28, 2025

Things being much more fragile. For example, this might only work when used via bazel run and bazel test and might break when packaged into a .tar file or container layer as the symlink points outside of the runfiles boundary.

I don't think that's true: Despite its name, root_symlinks results in runfiles that look just like ordinary runfiles - there is no additional level of symlink indirection. They just happen to have a runfiles path that isn't derives from their execpath.

Changing the snippet via search&replace should also be an easy task.

That's true for a monorepo, but the difficulty with this kind of breaking change lies in the long tail of external dependencies that rely on this runfiles library.

We regularly have to spent days debugging runfiles as things are quite fragile (changing with different Bazel versions, packaging in docker container layers, transferring to a different machine via tarballs, multiple layers of runfiles like an sh_binary calling a go_binary calling a py_binary each with their set of runfiles, ...).

Did this PR arise from such a debugging session? If so, could you explain the root cause? I would like to avoid an XY problem and first understand the full design space we have for improving usability.

Generally speaking, runfiles tend to work well if 1) rules properly forward them (which isn't obvious, even filegroup is broken in this sense and _repo_mapping is often ignored) and 2) the top-level process sets the runfiles env variables (which unfortunately can't be verified reliably).

@mering
Copy link
Contributor Author

mering commented Apr 29, 2025

Changing the snippet via search&replace should also be an easy task.

That's true for a monorepo, but the difficulty with this kind of breaking change lies in the long tail of external dependencies that rely on this runfiles library.

This is why one gives users some versions to migrate. Especially since this is a proper module now, people can use single_version_override to keep using the old functionality even longer.

We regularly have to spent days debugging runfiles as things are quite fragile (changing with different Bazel versions, packaging in docker container layers, transferring to a different machine via tarballs, multiple layers of runfiles like an sh_binary calling a go_binary calling a py_binary each with their set of runfiles, ...).

Did this PR arise from such a debugging session? If so, could you explain the root cause? I would like to avoid an XY problem and first understand the full design space we have for improving usability.

It came up in multiple of such debugging sessions. Sometimes it arises from our internal rules, sometimes from public rules (runfiles path changing on disk with Bazel versions, rules_go providing two different runfiles library implementations with different functionality, Python runfiles not falling back to $0.runfiles (see my open PR), the behavior change from bazel_tools to rules_shell between Bazel versions, ...

So 1. Making it more obvious what is happening to facilitate debugging and 2. Making the copyable snipper shorter and simpler to understand it faster and facilitate regex based migrations.

Generally speaking, runfiles tend to work well if 1) rules properly forward them (which isn't obvious, even filegroup is broken in this sense and _repo_mapping is often ignored) and 2) the top-level process sets the runfiles env variables (which unfortunately can't be verified reliably).

As someone has to work to make existing rules and processes runtime environments compliant to these rules, this usually requires deep understanding and a lot of debugging.

@fmeum
Copy link
Collaborator

fmeum commented Apr 30, 2025

This is why one gives users some versions to migrate. Especially since this is a proper module now, people can use single_version_override to keep using the old functionality even longer.

If this was my personal ruleset, I would just do this. But rules_shell is a core ruleset and I don't think there is even a policy for how to handle incompatible changes in starlarkified core rulesets yet.

CC @meteorcloudy who may have thought about this already

Python runfiles not falling back to $0.runfiles (see my open PR)

I would like to help fix your use case, but I'm pretty sure that that PR is not the right way.

the behavior change from bazel_tools to rules_shell between Bazel versions

I still don't see what this change in behavior is. Do you have an example that used to work when consuming the runfiles library from bazel_tools but no longer does when upgrading to a Bazel version that uses rules_shell internally?

As someone has to work to make existing rules and processes runtime environments compliant to these rules, this usually requires deep understanding and a lot of debugging.

It should only require a deep understanding if you actually work on a runfiles library. For users, assuming no bugs in runfiles libraries, forwarding the environment variables returned by the runfiles library should be the only thing they need to take care of. I don't see any way around this minimal requirement.

@mering
Copy link
Contributor Author

mering commented Apr 30, 2025

the behavior change from bazel_tools to rules_shell between Bazel versions

I still don't see what this change in behavior is. Do you have an example that used to work when consuming the runfiles library from bazel_tools but no longer does when upgrading to a Bazel version that uses rules_shell internally?

This paragraph was about debugging, you need to understand which repo to override for debugging and at which file you need to look in the first place.

As someone has to work to make existing rules and processes runtime environments compliant to these rules, this usually requires deep understanding and a lot of debugging.

It should only require a deep understanding if you actually work on a runfiles library. For users, assuming no bugs in runfiles libraries, forwarding the environment variables returned by the runfiles library should be the only thing they need to take care of. I don't see any way around this minimal requirement.

  • "assuming no bugs": this assumption is regularly violated
  • missing documentation: you often don't even know what is the supposed behavior, so you are only left with debugging the implementation
  • runfiles libraries for different languages behaving differently

@meteorcloudy
Copy link
Member

If this was my personal ruleset, I would just do this. But rules_shell is a core ruleset and I don't think there is even a policy for how to handle incompatible changes in starlarkified core rulesets yet.

I think we need to be careful about making incompatible change that'll impact everyone. For such changes, I would at least expect an issue to document it with a similar template like: https://github.com/bazelbuild/bazel/issues/new?template=incompatible_change.yml

Sounds like the main issue is some confusion caused by runfiles library while debugging. Can we mitigate this with better documentation instead of an incompatible change?

@mering mering closed this Aug 21, 2025
@mering mering deleted the fix-runfiles branch August 21, 2025 11:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants