Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: copy_to_directory execution-time performance issues #301

Closed
1 task
matthewjh opened this issue Dec 22, 2022 · 1 comment
Closed
1 task

[Bug]: copy_to_directory execution-time performance issues #301

matthewjh opened this issue Dec 22, 2022 · 1 comment
Labels
bug Something isn't working

Comments

@matthewjh
Copy link

matthewjh commented Dec 22, 2022

What happened?

Hi,

This is a follow-up to #258. The analysis-time performance issues were addressed by 350408b (thank you!), but the execution-time performance is still problematic. I wanted to raise this to track that.

The problem is that the copy_to_directory is generally slow and particularly slow on MacOS. The wall time on our build went up after moving from rules_nodejs to rules_js, mainly because of copy_to_directory. It seems that any non-trivial folder (i.e. a library) is going to block for anything from 2s to 10s on this action (and more on MacOS). This seems problematic for rules_js users in general as copy_to_directory is in the critical path of any change in a linked npm_package, which seems like a path that needs to be as fast as possible for the UX around a developer making local changes and ibazel rebuilding those to work.

copy_to_directory generates a bash script which is invoked through ctx.actions.run_shell. The script, in my case looks like:

if [[ ! -e "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/src/lib/user-pref-service/user-pref-service.js" ]]; then echo "file 'bazel-out/darwin_arm64-fastbuild/bin/libs/generic/src/lib/user-pref-service/user-pref-service.js' does not exist"; exit 1; fi
if [[ -f "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/src/lib/user-pref-service/user-pref-service.js" ]]; then
    mkdir -p "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/generic/src/lib/user-pref-service"
    cp -v -n "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/src/lib/user-pref-service/user-pref-service.js" "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/generic/src/lib/user-pref-service/user-pref-service.js" >> "$OUT_CAPTURE" 2>>"$OUT_CAPTURE"
else
    if [[ -d "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/generic/src/lib/user-pref-service/user-pref-service.js" ]]; then
        # When running outside the sandbox, then an earlier copy will create the dst folder
        # with nested read-only folders, so our copy operation will fail to write there.
        # Make sure the output folders are writeable.
        find "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/generic/src/lib/user-pref-service/user-pref-service.js" -type d -print0 | xargs -0 chmod a+w
    fi
    mkdir -p "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/generic/src/lib/user-pref-service/user-pref-service.js"
    cp -v -R -n "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/src/lib/user-pref-service/user-pref-service.js/." "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/generic/src/lib/user-pref-service/user-pref-service.js" >> "$OUT_CAPTURE" 2>>"$OUT_CAPTURE"
fi


if [[ ! -e "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/src/lib/user-service/index.js" ]]; then echo "file 'bazel-out/darwin_arm64-fastbuild/bin/libs/generic/src/lib/user-service/index.js' does not exist"; exit 1; fi
if [[ -f "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/src/lib/user-service/index.js" ]]; then
    mkdir -p "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/generic/src/lib/user-service"
    cp -v -n "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/src/lib/user-service/index.js" "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/generic/src/lib/user-service/index.js" >> "$OUT_CAPTURE" 2>>"$OUT_CAPTURE"
else
    if [[ -d "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/generic/src/lib/user-service/index.js" ]]; then
        # When running outside the sandbox, then an earlier copy will create the dst folder
        # with nested read-only folders, so our copy operation will fail to write there.
        # Make sure the output folders are writeable.
        find "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/generic/src/lib/user-service/index.js" -type d -print0 | xargs -0 chmod a+w
    fi
    mkdir -p "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/generic/src/lib/user-service/index.js"
    cp -v -R -n "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/src/lib/user-service/index.js/." "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/generic/src/lib/user-service/index.js" >> "$OUT_CAPTURE" 2>>"$OUT_CAPTURE"
fi


if [[ ! -e "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/src/lib/user-service/mock-user-service-providers.js" ]]; then echo "file 'bazel-out/darwin_arm64-fastbuild/bin/libs/generic/src/lib/user-service/mock-user-service-providers.js' does not exist"; exit 1; fi
if [[ -f "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/src/lib/user-service/mock-user-service-providers.js" ]]; then
    mkdir -p "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/generic/src/lib/user-service"
    cp -v -n "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/src/lib/user-service/mock-user-service-providers.js" "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/generic/src/lib/user-service/mock-user-service-providers.js" >> "$OUT_CAPTURE" 2>>"$OUT_CAPTURE"
else
    if [[ -d "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/generic/src/lib/user-service/mock-user-service-providers.js" ]]; then
        # When running outside the sandbox, then an earlier copy will create the dst folder
        # with nested read-only folders, so our copy operation will fail to write there.
        # Make sure the output folders are writeable.
        find "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/generic/src/lib/user-service/mock-user-service-providers.js" -type d -print0 | xargs -0 chmod a+w
    fi
    mkdir -p "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/generic/src/lib/user-service/mock-user-service-providers.js"
    cp -v -R -n "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/src/lib/user-service/mock-user-service-providers.js/." "bazel-out/darwin_arm64-fastbuild/bin/libs/generic/generic/src/lib/user-service/mock-user-service-providers.js" >> "$OUT_CAPTURE" 2>>"$OUT_CAPTURE"
fi

You can see how dong all these checks and bash program invocations serially, and each blocking, and spawning a subprocesses, would get very slow for non-trivial directories. I'm fairly confident that if, rather than using bash, copy_to_directory called out to a go or js program that used non-blocking I/O and could do the work in parallel (not to mention not spawning subprocesses), the performance would be on a different level.

Of course, another question is whether any of this logic can be short-circuited and circumvented. It looks like some of the logic is long the lines of "if the src is a directory, do this, if the src is a file, do that", but doesn't bazel know at analysis time whether these inputs are files or directories (at least assuming no TreeArtifact inputs, which itself feels like an edge case to me)?

I'm keen to help with the implementation of this. It feels like the first step is just figuring out what direction(s) to take.

Version

Development (host) and target OS/architectures:

Output of bazel --version:
6.0.0
Version of the Aspect rules, or other relevant rules from your
WORKSPACE or MODULE.bazel file:
1.18.0

Language(s) and/or frameworks involved:

How to reproduce

No response

Any other information?

No response

Fund our work

  • Sponsor our open source work by donating a bug bounty
@gregmagolan
Copy link
Member

Performance improvements all landed and released in the v1.23.3. release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants