Memoize git shell commands #223
Conversation
That is actually a recommended best practice, and having each cookbook as its own repo is a widely used setup.
Understood. Any ideas for how we might reap this perf win without forking Chef?
The only thing that comes to mind is adding a property to the knife config file that declares whether cookbooks live in a mono-repo or in multiple repos, and then using that to decide whether to run your optimization or the old loop. Another approach I just thought of is to run git once initially and check whether all cookbooks live in subdirectories of that repo; if so, use your memoize approach and break out of the loop, otherwise continue through the loop for the other cookbooks. Just ideas; I haven't fully looked at the code for viability or complexity of this kind of change.
@adsr another question I have is how you're storing all your cookbooks. Are you using a central artifact store like Artifactory, an internal Chef Supermarket, or chef-server? I have found dep resolution is a lot faster with one of these options, as it can download a single index and depsolve against that instead of fetching each cookbook from git and parsing files for dependency information.
OK, I have a few ideas as well after looking at things more carefully. The simplest move to support repo-per-cookbook would be to include repo path as part of the memo key. The other idea I had was to parallelize the commands. That would be a bigger change.
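A minimal sketch of what "repo path as part of the memo key" could look like. `GitMemo` and its injected runner block are hypothetical names for illustration, not ChefCLI API; the runner stands in for the real shell-out so the example runs without a git checkout.

```ruby
# Hypothetical sketch: GitMemo and the runner block are illustrative names,
# not part of ChefCLI.
class GitMemo
  # The block is any callable taking (repo_path, args) and returning output.
  def initialize(&runner)
    @runner = runner
    @cache = {}
  end

  # The cache key combines the repo path with the full argument list, so two
  # cookbooks that live in different repos never share a cached result.
  def call(repo_path, *args)
    key = [repo_path, args]
    return @cache[key] if @cache.key?(key)

    @cache[key] = @runner.call(repo_path, args)
  end
end

calls = []
memo = GitMemo.new do |repo, args|
  calls << [repo, args]        # record each real invocation
  "#{repo}:#{args.join(' ')}"  # stand-in for shelling out to git
end

memo.call("/repos/a", "rev-parse", "HEAD")
memo.call("/repos/a", "rev-parse", "HEAD") # served from the cache
memo.call("/repos/b", "rev-parse", "HEAD") # different repo, runs again
puts calls.length
```

With a mono-repo every cookbook resolves to the same key prefix and shares results; with repo-per-cookbook each repo gets its own entries, so correctness is preserved at the cost of fewer cache hits.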
In dev context, which is what I'm concerned with here, the cookbook store is the file system. I profiled the code a bit more carefully and found the dep solving code (starting from Here are rbspy flamegraphs of Alright, I just pushed a modified solution that keys the memo cache by repo path ( |
nice work
@adsr looks like there are a couple of cookstyle fixes needed. See buildkite job.
We have a shell script that invokes `chef install` and `chef export` against a setup with ~30 cookbooks. It was annoyingly slow so we dug into why that was the case. We found that these commands call `ChefCLI::CookbookProfiler::Git` in a loop which ends up shelling out the same handful of git commands repeatedly. For example, one run produced the following duplicate git commands (sorted by dupe count):

```
74 execve("/usr/bin/git", ["git", "rev-parse", "HEAD"],
46 execve("/usr/bin/git", ["git", "rev-parse", "HEAD"],
41 execve("/usr/bin/git", ["git", "rev-parse", "--abbrev-ref", "HEAD"],
39 execve("/usr/bin/git", ["git", "diff-files", "--quiet"],
25 execve("/usr/bin/git", ["git", "branch", "-r", "--contains", "<snip>"],
23 execve("/usr/lib/git-core/git", ["/usr/lib/git-core/git", "status", "--porcelain=2", "-uno"],
23 execve("/usr/bin/git", ["git", "config", "--get", "branch.HEAD.remote"],
21 execve("/usr/bin/git", ["git", "diff-files", "--quiet"],
19 execve("/usr/bin/git", ["git", "rev-parse", "--abbrev-ref", "HEAD"],
17 execve("/usr/bin/git", ["git", "config", "--get", "branch.<snip>.remote"],
16 execve("/usr/bin/git", ["git", "config", "--get", "remote.origin.url"],
14 execve("/usr/bin/git", ["git", "branch", "-r", "--contains", "<snip>"],
14 execve("/usr/bin/git", ["git", "branch", "-r", "--contains", "<snip>"],
12 execve("/usr/bin/git", ["git", "config", "--get", "remote.origin.url"],
11 execve("/usr/bin/git", ["git", "config", "--get", "branch.<snip>.remote"],
9 execve("/usr/bin/git", ["git", "config", "--get", "branch.HEAD.remote"],
7 execve("/usr/bin/git", ["git", "branch", "-r", "--contains", "<snip>"],
5 execve("/usr/lib/git-core/git", ["/usr/lib/git-core/git", "status", "--porcelain=2", "-uno"],
```

Adding this memoization brings our run from ~45s down to ~8s. The memo cache is keyed by repo path (`rev-parse --show-toplevel`) which should support both cookbook-mono-repo and repo-per-cookbook.

Signed-off-by: Adam Saponara <as@php.net>
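For anyone wanting to reproduce a dupe count like the one above, here is a rough sketch of the tallying step. It assumes the lines come from a syscall trace (the `execve(...)` format suggests something like strace, but the thread does not say how the listing was collected). `duplicate_commands` and the sample lines are hypothetical.

```ruby
# Hypothetical sketch: duplicate_commands and the sample data are made up
# for illustration; they are not taken from the PR.
def duplicate_commands(trace_lines)
  trace_lines
    .map(&:strip)
    .reject(&:empty?)
    .tally                                # => { line => count } (Ruby 2.7+)
    .select { |_line, count| count > 1 }  # keep only repeated commands
    .sort_by { |_line, count| -count }    # most duplicated first
end

sample = [
  'execve("/usr/bin/git", ["git", "rev-parse", "HEAD"],',
  'execve("/usr/bin/git", ["git", "diff-files", "--quiet"],',
  'execve("/usr/bin/git", ["git", "rev-parse", "HEAD"],',
  'execve("/usr/bin/git", ["git", "rev-parse", "HEAD"],',
  'execve("/usr/bin/git", ["git", "diff-files", "--quiet"],',
  'execve("/usr/bin/git", ["git", "config", "--get", "remote.origin.url"],',
]

duplicate_commands(sample).each { |line, count| puts "#{count} #{line}" }
```

This is the Ruby equivalent of piping the trace through a sort-and-count pipeline; commands that ran only once are dropped because they gain nothing from memoization.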
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information
@Stromweld Sorry, missed those. Fixed.
@vkarve-chef came across this PR and looks like someone from Chef Workstation needs to weigh in to merge.
sure @tpowell-progress, we'll take a look in a week or two. thanks @adsr for creating the PR!
Hey @vkarve-chef, did your team have a chance to look at this? This would make a huge difference in the iteration speed of our local development process.
@ericnorris @adsr we intend to get to this by around mid-September. Sorry, it's taking us so long!
Description
We have a shell script that invokes `chef install` and `chef export` against a setup with ~30 cookbooks. It was annoyingly slow so we dug into why that was the case. We found that these commands call `ChefCLI::CookbookProfiler::Git` in a loop which ends up shelling out the same handful of git commands repeatedly. For example, one run produced the following duplicate git commands (sorted by dupe count):

These all seem to be cacheable at the repo-level. Adding a simple memoization brought down our run from ~45s to ~1s.

To be safer, we could wrap this in a `git_memo` method to make the memoization behavior more explicit. I thought I'd collect some initial feedback on the general idea first. For example, this would break things if each cookbook were its own repo. I'm not sure if that's an expected / supported use-case.

I modified the code to memoize at the repo-level, so cookbook-per-repo should continue to work.
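One way the explicit `git_memo` idea might look, as a sketch only. `GitProfiler` is a hypothetical stand-in for `ChefCLI::CookbookProfiler::Git`, and its `git` method is stubbed rather than shelling out, so the example runs anywhere. Callers opt in to caching by calling `git_memo`; plain `git` still executes every time.

```ruby
# Hypothetical sketch: GitProfiler is an illustrative stand-in, not the real
# ChefCLI::CookbookProfiler::Git class.
class GitProfiler
  attr_reader :run_count

  # Shared across instances so cookbooks in the same repo reuse results.
  def self.memo
    @memo ||= {}
  end

  def initialize(repo_root)
    @repo_root = repo_root
    @run_count = 0
  end

  # Always executes. Stubbed here instead of shelling out to git.
  def git(*args)
    @run_count += 1
    "output of git #{args.join(' ')} in #{@repo_root}"
  end

  # Explicit opt-in to caching, keyed by repo root plus arguments.
  def git_memo(*args)
    key = [@repo_root, args]
    memo = self.class.memo
    return memo[key] if memo.key?(key)

    memo[key] = git(*args)
  end
end

a = GitProfiler.new("/repo")
b = GitProfiler.new("/repo")
a.git_memo("rev-parse", "HEAD")
b.git_memo("rev-parse", "HEAD") # cache hit: b does not run the command
b.git("status")                 # un-memoized path still runs every time
puts a.run_count + b.run_count
```

Keeping the two methods separate makes it obvious at each call site whether a result may be stale, which matters for anything that could change during a run.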
Related Issue
n/a