Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

symlink_to_dir linux performance improvement #949

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

DanielEliad
Copy link

Hi :)
I am building a linux kernel headers repo (https://github.com/DanielEliad/rules_linux_kernel_headers) using rules_foreign_cc

I noticed my build times were really long and mostly from the bash function symlink_to_dir.
I found this out by editing the build_script.sh and printing out how much each part of the script took.

This happened because symlink_to_dir is recursive and the kernel has a lot of sources to copy (even after only reducing them to the necessary ones).

I've rewrote the function while still preserving all of its features:

  1. creating directories and not symlinking them
  2. symlinking all files unless they have a certain suffix (e.g. *.mk) for replacement
  3. excluding *.ext_build_deps directories in the final copy

this made a huge difference for me
before the change: Elapsed time: 57.557s
after the change: Elapsed time: 29.276s

The performance increase comes from not using recursion in bash to copy all the subdirectories and files
This PR uses the cp flag -s to create symlinks. This actually mkdir's all the directories (even if they are a symlink themselves) and only symlinks the actual files.
This also symlinks the unwanted *.ext_build_deps directories but this is such a good performance increase that the extra copies that are then deleted are a good trade off in my opinion

Thank you for taking a look :)

@DanielEliad
Copy link
Author

There still seem to be some problems with the behaviour of symlink to dir so I will keep working on it

Copy link
Member

@jsharpe jsharpe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good; this has definitely been one of the slower parts of these rules, and the recursive symlinking is particularly expensive on windows so fixing it there is a win if it can be made to work.
Thanks for the contributions so far

Comment on lines 133 to 138
IFS=$'\n'
local bad_directories=($(find -L "$target" -type d -name "*.ext_build_deps" -prune))
IFS=$SAVEIFS
for b in "${bad_directories[@]}"; do
rm -r "$b"
done
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be done in a single call to find by passing an exec argument? This will likely fix the failure in CI too?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to find exec I hope this will fix the CI
I'm having a hard time debugging this because I need a working setup to run the failed tests (ndk etc)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the ndk thing is annoying as with older bazel versions it wasn't a hard fail like it is now in 5.x. Most of us just push regularly against the CI to test changes that are hard to test locally due to lack of access to a specific system. All the changes get squashed on merge so intermediate commits don't matter.


# In order to be able to use `replace_in_files`, we ensure that we create copies of specfieid
# files so updating them is possible.
SAVEIFS=$IFS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need saving again?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes you're right I fixed it

@DanielEliad
Copy link
Author

DanielEliad commented Aug 11, 2022

I fixed most of the issues but still some tests in windows and spawn_strategy=standalone are failing.
Windows won't allow cp to create relative symlinks (which linux does as well) so I used readlink -f to get the absolute path but windows still says the same thing.

cp -prsL C:/b/qvkez5bj/execroot/rules_foreign_cc_examples/bazel-out/x64_windows-opt-exec-2B5CBBC6/bin/external/rules_foreign_cc/toolchains C:\b\qvkez5bj\execroot\rules_foreign_cc_examples/bazel-out/x64_windows-fastbuild/bin/configure_modify_input_source/simple.ext_build_deps/bin/
--
  | cp: 'C:\b\qvkez5bj\execroot\rules_foreign_cc_examples/bazel-out/x64_windows-fastbuild/bin/configure_modify_input_source/simple.ext_build_deps/bin/toolchains/make_tool_foreign_cc/build_script.sh': can make relative symbolic links only in current directory

Also spawn_strategy=standalone ruins my approach of using readlink it seems like we don't have permission to do a lot of what I am doing there.
I think we can get the same improvement in performance with the previous algorithm just in python because it's the recursion that's killing the performance in my opinion

Have you considered doing this in python?
I feel like this would be a lot easier.
Also I didn't quite understand the need for copying and symlinking everything I tried to recreate the logic exactly but it would be a lot simpler if we didn't have to.
What do you think?

@DanielEliad DanielEliad force-pushed the feature/symlink_to_dir_performance branch from dac59f6 to d717888 Compare August 18, 2022 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants