Fix runfiles.bash and start using it #21

mering · 2025-04-28T13:33:51Z

The runfiles.bash file was added in #13 but was never used. This is probably the reason why it was never adapted to the new location. This PR fixes the script and starts using it. Note that it keeps the old runfiles script as a dependency to avoid breaking bootstrap snippets (tests with old bootstrap still pass). See #22 for two potential follow-ups (1. adapt tests to use runfiles script from this repo and 2. remove dependency to script from bazel_tools).

mering · 2025-04-28T15:08:08Z

shell/runfiles/runfiles.bash

+  elif [[ -f "$0.runfiles/rules_shell/shell/runfiles/runfiles.bash" ]]; then
+    export RUNFILES_DIR="$0.runfiles"
+  elif ls "$0.runfiles/rules_shell~"*"/shell/runfiles/runfiles.bash" 1> /dev/null 2>&1; then
+    export RUNFILES_DIR="$0.runfiles"
+  elif [[ -f "$0.runfiles/rules_shell+/shell/runfiles/runfiles.bash" ]]; then
    export RUNFILES_DIR="$0.runfiles"


Note that we need to handle 3 cases

WORKSPACE: rules_shell

MODULE.bazel <=7.x: rules_shell~VERSION and rules_shell~ (solved by using a glob with ls and interpreting the return code)

MODULE.bazel >=8.x: rules_shell+

fmeum · 2025-04-28T16:15:29Z

Could you explain why you want to make it? #13 should have already made it so that the new runfiles library is used on Bazel 8 without requiring any change to the copy-pasted snippet.

CI coverage for Bazel 8 was missing, but everything seems to work: #23

mering · 2025-04-28T17:05:44Z

Using a path in bazel_tools instead of rules_shell is really confusing and makes it even more difficult to figure out what is going on to debug failures. New users should use the location in rules_shell directly.

Furthermore, I think it would be better if the script checks for its own existence instead of going through the indirection of the symlink.

fmeum · 2025-04-28T18:39:14Z

I can see that this is slightly confusing if you study the snippet in detail and are also aware of the old location, but is that really a problem in practice? The snippet is kind of code golfed anyway and really just something people copy & paste everywhere.

I fully agree that having a path under rules_shell would be nicer, but given that we need to keep the symlink around for backwards compatibility anyway and that the rules_shell path isn't static and actually a Bazel implementation detail, I just haven't been able to convince myself the migration would be worth it.

Could you elaborate on the problem with the file checking its own existence through a symlink? Is this more of a stylistic concern?

mering · 2025-04-28T19:12:33Z

I can see that this is slightly confusing if you study the snippet in detail and are also aware of the old location, but is that really a problem in practice? The snippet is kind of code golfed anyway and really just something people copy & paste everywhere.

We regularly have to spent days debugging runfiles as things are quite fragile (changing with different Bazel versions, packaging in docker container layers, transferring to a different machine via tarballs, multiple layers of runfiles like an sh_binary calling a go_binary calling a py_binary each with their set of runfiles, ...). Having every colleague debugging such a failure spending several hours trying to understand what is happening is a big waste of time.

I fully agree that having a path under rules_shell would be nicer, but given that we need to keep the symlink around for backwards compatibility anyway and that the rules_shell path isn't static and actually a Bazel implementation detail, I just haven't been able to convince myself the migration would be worth it.

Why don't we start adding support with the current Bazel 8 path, hope it won't change again in the next versions. Then in Bazel 10, we can just remove the old path. Changing the snippet via search&replace should also be an easy task.

Could you elaborate on the problem with the file checking its own existence through a symlink? Is this more of a stylistic concern?

Things being much more fragile. For example, this might only work when used via bazel run and bazel test and might break when packaged into a .tar file or container layer as the symlink points outside of the runfiles boundary.

fmeum · 2025-04-28T21:59:48Z

Things being much more fragile. For example, this might only work when used via bazel run and bazel test and might break when packaged into a .tar file or container layer as the symlink points outside of the runfiles boundary.

I don't think that's true: Despite its name, root_symlinks results in runfiles that look just like ordinary runfiles - there is no additional level of symlink indirection. They just happen to have a runfiles path that isn't derives from their execpath.

Changing the snippet via search&replace should also be an easy task.

That's true for a monorepo, but the difficulty with this kind of breaking change lies in the long tail of external dependencies that rely on this runfiles library.

We regularly have to spent days debugging runfiles as things are quite fragile (changing with different Bazel versions, packaging in docker container layers, transferring to a different machine via tarballs, multiple layers of runfiles like an sh_binary calling a go_binary calling a py_binary each with their set of runfiles, ...).

Did this PR arise from such a debugging session? If so, could you explain the root cause? I would like to avoid an XY problem and first understand the full design space we have for improving usability.

Generally speaking, runfiles tend to work well if 1) rules properly forward them (which isn't obvious, even filegroup is broken in this sense and _repo_mapping is often ignored) and 2) the top-level process sets the runfiles env variables (which unfortunately can't be verified reliably).

mering · 2025-04-29T06:06:16Z

Changing the snippet via search&replace should also be an easy task.

That's true for a monorepo, but the difficulty with this kind of breaking change lies in the long tail of external dependencies that rely on this runfiles library.

This is why one gives users some versions to migrate. Especially since this is a proper module now, people can use single_version_override to keep using the old functionality even longer.

We regularly have to spent days debugging runfiles as things are quite fragile (changing with different Bazel versions, packaging in docker container layers, transferring to a different machine via tarballs, multiple layers of runfiles like an sh_binary calling a go_binary calling a py_binary each with their set of runfiles, ...).

Did this PR arise from such a debugging session? If so, could you explain the root cause? I would like to avoid an XY problem and first understand the full design space we have for improving usability.

It came up in multiple of such debugging sessions. Sometimes it arises from our internal rules, sometimes from public rules (runfiles path changing on disk with Bazel versions, rules_go providing two different runfiles library implementations with different functionality, Python runfiles not falling back to $0.runfiles (see my open PR), the behavior change from bazel_tools to rules_shell between Bazel versions, ...

So 1. Making it more obvious what is happening to facilitate debugging and 2. Making the copyable snipper shorter and simpler to understand it faster and facilitate regex based migrations.

Generally speaking, runfiles tend to work well if 1) rules properly forward them (which isn't obvious, even filegroup is broken in this sense and _repo_mapping is often ignored) and 2) the top-level process sets the runfiles env variables (which unfortunately can't be verified reliably).

As someone has to work to make existing rules and processes runtime environments compliant to these rules, this usually requires deep understanding and a lot of debugging.

fmeum · 2025-04-30T08:20:16Z

This is why one gives users some versions to migrate. Especially since this is a proper module now, people can use single_version_override to keep using the old functionality even longer.

If this was my personal ruleset, I would just do this. But rules_shell is a core ruleset and I don't think there is even a policy for how to handle incompatible changes in starlarkified core rulesets yet.

CC @meteorcloudy who may have thought about this already

Python runfiles not falling back to $0.runfiles (see my open PR)

I would like to help fix your use case, but I'm pretty sure that that PR is not the right way.

the behavior change from bazel_tools to rules_shell between Bazel versions

I still don't see what this change in behavior is. Do you have an example that used to work when consuming the runfiles library from bazel_tools but no longer does when upgrading to a Bazel version that uses rules_shell internally?

As someone has to work to make existing rules and processes runtime environments compliant to these rules, this usually requires deep understanding and a lot of debugging.

It should only require a deep understanding if you actually work on a runfiles library. For users, assuming no bugs in runfiles libraries, forwarding the environment variables returned by the runfiles library should be the only thing they need to take care of. I don't see any way around this minimal requirement.

mering · 2025-04-30T13:26:51Z

the behavior change from bazel_tools to rules_shell between Bazel versions

I still don't see what this change in behavior is. Do you have an example that used to work when consuming the runfiles library from bazel_tools but no longer does when upgrading to a Bazel version that uses rules_shell internally?

This paragraph was about debugging, you need to understand which repo to override for debugging and at which file you need to look in the first place.

As someone has to work to make existing rules and processes runtime environments compliant to these rules, this usually requires deep understanding and a lot of debugging.

It should only require a deep understanding if you actually work on a runfiles library. For users, assuming no bugs in runfiles libraries, forwarding the environment variables returned by the runfiles library should be the only thing they need to take care of. I don't see any way around this minimal requirement.

"assuming no bugs": this assumption is regularly violated
missing documentation: you often don't even know what is the supposed behavior, so you are only left with debugging the implementation
runfiles libraries for different languages behaving differently

meteorcloudy · 2025-04-30T14:18:18Z

If this was my personal ruleset, I would just do this. But rules_shell is a core ruleset and I don't think there is even a policy for how to handle incompatible changes in starlarkified core rulesets yet.

I think we need to be careful about making incompatible change that'll impact everyone. For such changes, I would at least expect an issue to document it with a similar template like: https://github.com/bazelbuild/bazel/issues/new?template=incompatible_change.yml

Sounds like the main issue is some confusion caused by runfiles library while debugging. Can we mitigate this with better documentation instead of an incompatible change?

mering force-pushed the fix-runfiles branch 7 times, most recently from 9dee93c to 03a0db2 Compare April 28, 2025 15:06

mering commented Apr 28, 2025

View reviewed changes

mering mentioned this pull request Apr 30, 2025

Update runfiles preamble to refer to rules_shell #25

Open

mering force-pushed the fix-runfiles branch from 03a0db2 to c8f0d70 Compare August 12, 2025 19:16

Fix runfiles.bash and start using it

02cfe3f

mering force-pushed the fix-runfiles branch from c8f0d70 to 02cfe3f Compare August 12, 2025 19:26

mering closed this Aug 21, 2025

mering deleted the fix-runfiles branch August 21, 2025 11:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix runfiles.bash and start using it #21

Fix runfiles.bash and start using it #21

mering commented Apr 28, 2025 •

edited

Loading

Uh oh!

mering Apr 28, 2025 •

edited

Loading

Uh oh!

fmeum commented Apr 28, 2025

Uh oh!

mering commented Apr 28, 2025

Uh oh!

fmeum commented Apr 28, 2025

Uh oh!

mering commented Apr 28, 2025

Uh oh!

fmeum commented Apr 28, 2025

Uh oh!

mering commented Apr 29, 2025 •

edited

Loading

Uh oh!

fmeum commented Apr 30, 2025

Uh oh!

mering commented Apr 30, 2025

Uh oh!

meteorcloudy commented Apr 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fix runfiles.bash and start using it #21

Fix runfiles.bash and start using it #21

Conversation

mering commented Apr 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mering Apr 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fmeum commented Apr 28, 2025

Uh oh!

mering commented Apr 28, 2025

Uh oh!

fmeum commented Apr 28, 2025

Uh oh!

mering commented Apr 28, 2025

Uh oh!

fmeum commented Apr 28, 2025

Uh oh!

mering commented Apr 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fmeum commented Apr 30, 2025

Uh oh!

mering commented Apr 30, 2025

Uh oh!

meteorcloudy commented Apr 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mering commented Apr 28, 2025 •

edited

Loading

mering Apr 28, 2025 •

edited

Loading

mering commented Apr 29, 2025 •

edited

Loading