Optimize brew update by batching git cat-file operations #13244

Closed
boblail wants to merge 2 commits from lail/batch-git-cat-file

boblail
Contributor

@boblail boblail commented May 4, 2022

This approach is almost as optimal as omitting this filtering altogether.

It relies on git cat-file --batch, which can receive a list of blobs to look up on standard input.
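
As a rough illustration of the batching idea (not the code in this PR), the sketch below pipes several `<revision>:<path>` names to a single `git cat-file --batch` process and splits the reply. The helper names `batch_cat_files` and `parse_cat_file_batch` are hypothetical; the reply layout it assumes (a `<sha> <type> <size>` header, the contents, a newline, or `<name> missing` for unknown objects) is the one git documents for `--batch`.

```ruby
require "open3"
require "stringio"

# Hypothetical helper: split the stream printed by `git cat-file --batch`.
# Found objects arrive as "<sha> <type> <size>\n<contents>\n";
# unknown objects arrive as "<name> missing\n".
def parse_cat_file_batch(io, specs)
  specs.each_with_object({}) do |spec, result|
    header = io.gets
    raise "output ended before #{spec} was answered" if header.nil?

    header = header.chomp
    if header.end_with?(" missing")
      result[spec] = nil
      next
    end

    _sha, _type, size = header.split(" ")
    result[spec] = io.read(Integer(size))
    io.read(1) # consume the newline that follows the contents
  end
end

# Hypothetical helper: look up every spec with one git process.
def batch_cat_files(repo_path, specs)
  stdout, status = Open3.capture2(
    "git", "-C", repo_path, "cat-file", "--batch",
    stdin_data: "#{specs.join("\n")}\n", binmode: true
  )
  raise "git cat-file --batch failed" unless status.success?

  parse_cat_file_batch(StringIO.new(stdout), specs)
end
```

However many names are passed, `batch_cat_files` starts one git process, which is presumably where the saving over one lookup per formula comes from.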

| Homebrew Branch | Command | Commits Rewound | Time Elapsed | Updated Formulae |
| --- | --- | --- | --- | --- |
| `master` | `git -C "$(brew --repo homebrew/core)" reset --hard aa1b3d5df78; time sh -c "brew update &> output-2k.txt"` | 2,000 | 35.6s | 666 |
| `master` | `git -C "$(brew --repo homebrew/core)" reset --hard 11e6919661c; time sh -c "brew update &> output-20k.txt"` | 20,000 | 129.8s | 2411 |
| #13234 | `git -C "$(brew --repo homebrew/core)" reset --hard aa1b3d5df78; time sh -c "brew update &> output-2k-unfiltered.txt"` | 2,000 | 6.6s | 819 |
| #13234 | `git -C "$(brew --repo homebrew/core)" reset --hard 11e6919661c; time sh -c "brew update &> output-20k-unfiltered.txt"` | 20,000 | 10.4s | 3532 |
| _this one_ | `git -C "$(brew --repo homebrew/core)" reset --hard aa1b3d5df78; time sh -c "brew update &> output-2k-optimized.txt"` | 2,000 | 7.4s | 666 |
| _this one_ | `git -C "$(brew --repo homebrew/core)" reset --hard 11e6919661c; time sh -c "brew update &> output-20k-optimized.txt"` | 20,000 | 17.6s | 2411 |

Fixes #13224

Alternatives


  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

@boblail boblail force-pushed the lail/batch-git-cat-file branch 2 times, most recently from 5499353 to cdc4d78 Compare May 4, 2022 21:33
@boblail boblail marked this pull request as ready for review May 4, 2022 21:34
| Homebrew Branch | Command | Commits Rewound | Time Elapsed | Updated Formulae |
| --- | --- | --- | --- | --- |
| `master` | `git -C "$(brew --repo homebrew/core)" reset --hard aa1b3d5; time sh -c "brew update &> output-2k.txt"` | 2,000 | 35.6s | 666 |
| `master` | `git -C "$(brew --repo homebrew/core)" reset --hard 11e6919; time sh -c "brew update &> output-20k.txt"` | 20,000 | 129.8s | 2411 |
| Homebrew#13234 | `git -C "$(brew --repo homebrew/core)" reset --hard aa1b3d5; time sh -c "brew update &> output-2k-unfiltered.txt"` | 2,000 | 6.6s | 819 |
| Homebrew#13234 | `git -C "$(brew --repo homebrew/core)" reset --hard 11e6919; time sh -c "brew update &> output-20k-unfiltered.txt"` | 20,000 | 10.4s | 3532 |
| _this one_ | `git -C "$(brew --repo homebrew/core)" reset --hard aa1b3d5; time sh -c "brew update &> output-2k-optimized.txt"` | 2,000 | 7.1s | 657† |
| _this one_ | `git -C "$(brew --repo homebrew/core)" reset --hard 11e6919; time sh -c "brew update &> output-20k-optimized.txt"` | 20,000 | 17.6s | 2411 |

† I haven't determined why there's a discrepancy between "Updated Formulae" on row 5 and row 1.

Fixes Homebrew#13224
Member

@MikeMcQuaid MikeMcQuaid left a comment

Appreciate the work here but would love to avoid having a new class just for this lookback and instead look at improving FormulaVersions to be more performant instead.

@boblail
Contributor Author

boblail commented May 5, 2022

@MikeMcQuaid, would you be receptive to a PR that kept git cat-file --batch as a strategy but implemented it in FormulaVersions instead of FormulaeAtRevision?

@MikeMcQuaid
Member

@boblail Potentially: depends on how much the complexity increases. I'm not really convinced that the solution here is in git cat-file --batch usage but more:

  • checking fewer revisions at all
  • checking fewer formulae
  • not checking formulae at all in some cases
  • caching versions

@boblail
Contributor Author

boblail commented May 5, 2022

@MikeMcQuaid, what do you think of this refactor?

It

  • discards FormulaeAtRevision
  • extracts GitRepositoryExtension#git_cat_files
    (which could also be used to optimize formula-auditor — where, instead of catting many formulae at one revision, we're catting one formula at many revisions — but I don't want to explode the scope of this PR 😬)
  • implements file_contents_at_revision in terms of repository.git_cat_file and formula_at_revision in terms of FormulaVersions.formula_from_contents so
    1. there's not a duplication of logic between the one-off and bulk approaches for fetching previous versions of formulae
    2. the new strategy is better covered by other specs that were leveraging FormulaVersions#formula_at_revision
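
The two lookup shapes mentioned above (many formulae at one revision, one formula at many revisions) differ only in how the list of object names is built. A minimal sketch, assuming git's `<revision>:<path>` naming; both helper names and the example paths are hypothetical, not taken from this branch:

```ruby
# Hypothetical helper: many files at one revision (the `brew update` shape).
def specs_for_paths(revision, paths)
  paths.map { |path| "#{revision}:#{path}" }
end

# Hypothetical helper: one file at many revisions (the formula-auditor shape).
def specs_for_revisions(revisions, path)
  revisions.map { |revision| "#{revision}:#{path}" }
end

puts specs_for_paths("aa1b3d5", ["Formula/a.rb", "Formula/b.rb"])
puts specs_for_revisions(["aa1b3d5", "11e6919"], "Formula/a.rb")
```

Either list could be written to the standard input of one `git cat-file --batch` process, one name per line.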

  contents = next_entry ? s.scan_until(/^(?=#{Regexp.escape(snip)}|#{next_entry} missing)/) : s.rest
  [entry, contents]
else
  raise "that didn't work"
Contributor Author

  • Better error message

@MikeMcQuaid
Member

> @MikeMcQuaid, what do you think of this refactor?

I don't love it, I'm afraid. What's the performance benefit here for the typical and pathological cases?

Would love to see what the same answers are for #13243 (comment)

This feels like it makes the code significantly more complex and hard to follow. That's perhaps worth it but I'd need numbers to decide.

@boblail
Contributor Author

boblail commented May 5, 2022

> What's the performance benefit here for the typical and pathological cases?

@MikeMcQuaid, would you consider running brew update once a week typical and once every 3 months pathological?

(At Square, we nudge engineers to do this monthly and we regularly see outliers who haven't done it in several months.)

If so, this branch makes brew update

  • 3.7× faster for the typical use-case (22.6s → 6.1s)
  • 9.3× faster for the pathological use-case (135s → 14.5s)
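
The factors above are the old elapsed time divided by the new one. A small check of that arithmetic, using only the timings quoted in this comment (the `speedup` helper is named here for illustration):

```ruby
# Speedup factor: how many times faster the new timing is than the old one.
def speedup(before_seconds, after_seconds)
  before_seconds.to_f / after_seconds
end

puts format("typical: %.1fx", speedup(22.6, 6.1))        # prints "typical: 3.7x"
puts format("pathological: %.1fx", speedup(135, 14.5))   # prints "pathological: 9.3x"
```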

@MikeMcQuaid
Member

@boblail Yeh, that's a decent speedup, thanks. I'm going to want to iterate a lot on the implementation here so I'd rather if we can land something like #13243 (comment) to simplify the need for this instead.

@MikeMcQuaid
Member

I think #13299 may remove the need for this but interested in thoughts.

@github-actions

github-actions bot commented Jun 9, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale No recent activity label Jun 9, 2022
@MikeMcQuaid
Member

Passing on this given recent brew update changes. Feel free to reopen if still needed/desired.

@MikeMcQuaid MikeMcQuaid closed this Jun 9, 2022
@github-actions github-actions bot added the outdated PR was locked due to age label Jul 10, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 10, 2022
Labels
outdated PR was locked due to age stale No recent activity
Development

Successfully merging this pull request may close these issues.

brew update can be terribly slow for stale taps