Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop filtering formulae with unchanged versions from the update report #13243

Closed
wants to merge 1 commit into from

Conversation

boblail
Copy link
Contributor

@boblail boblail commented May 4, 2022

brew update fast-forwards each tap from one commit to another. update-report lists all the formulae that were modified in that commit range. The clause this commit removes was filtering out modified formulae whose pkg_version had not changed.

The outcome of this filtering:

  • If you rewind homebrew/core 2,000 commits (or 2 weeks), 819 formulae were modified. This clause would have taken 29.0 seconds to filter out 153 (18.7%) of them.

  • If you rewind homebrew/core 20,000 commits (or 5 months), 3532 formulae were modified. This clause would have taken 119.4 seconds to filter out 1121 (31.7%) of them.

Data

Homebrew Branch Command Commits Rewound Time Elapsed Updated Formulae
master git -C "$(brew --repo homebrew/core)" reset --hard aa1b3d5df78; time sh -c "brew update &> output-2k.txt" 2,000 35.6s 666
master git -C "$(brew --repo homebrew/core)" reset --hard 11e6919661c; time sh -c "brew update &> output-20k.txt" 20,000 129.8s 2411
this one git -C "$(brew --repo homebrew/core)" reset --hard aa1b3d5df78; time sh -c "brew update &> output-2k.txt" 2,000 6.6s 819
this one git -C "$(brew --repo homebrew/core)" reset --hard 11e6919661c; time sh -c "brew update &> output-20k.txt" 20,000 10.4s 3532

Root Cause

This operation was so expensive because it shelled out to git cat-file for each formula that had been touched in the commit window.

Fixes #13224

Alternatives


  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

`brew update` fast-forwards each tap from one commit to another. `update-report` lists all the formulae that were modified in that commit range. The clause this commit removes was filtering out modified formulae _whose `pkg_version` had not changed._

The outcome of this filtering:

- If you rewind homebrew/core **2,000 commits** (or 2 weeks), **819** formulae were modified. This clause would have taken **29.0 seconds** to filter out **153** (18.7%) of them.

- If you rewind homebrew/core **20,000 commits** (or 5 months), 3532 formulae were modified. This clause would have taken **119.4 seconds** to filter out **1121** (31.7%) of them.

Data
=========
| Homebrew Branch | Command | Commits Rewound | Time Elapsed | Updated Formulae |
| --- | --- | --- | --- | --- |
| `master` | `git -C "$(brew --repo homebrew/core)" reset --hard aa1b3d5; time sh -c "brew update &> output-2k.txt"` | 2,000 | 35.6s | 666 |
| `master` | `git -C "$(brew --repo homebrew/core)" reset --hard 11e6919; time sh -c "brew update &> output-20k.txt"` | 20,000 | 129.8s | 2411 |
| _this one_ | `git -C "$(brew --repo homebrew/core)" reset --hard aa1b3d5; time sh -c "brew update &> output-2k.txt"` | 2,000 | 6.6s | 819 |
| _this one_ | `git -C "$(brew --repo homebrew/core)" reset --hard 11e6919; time sh -c "brew update &> output-20k.txt"` | 20,000 | 10.4s | 3532 |

Root Cause
=========
This operation was so expensive because it shelled out to `git cat-file` for each formula that had been touched in the commit window.

Fixes Homebrew#13224
Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! This is good with me but I'd like to see at least one other @Homebrew/brew maintainer agree that this is the right move.

@boblail
Copy link
Contributor Author

boblail commented May 4, 2022

Hey! I think this PR is a better approach (it's simpler, less brittle, and increases consistency between brew update and brew update --preinstall); but I'm also open to #13244 — It'll still need some tests and hardening, but it sketches an approach for keeping the current behavior but optimizing it by shelling out to git cat-file in batches.

@carlocab
Copy link
Member

carlocab commented May 4, 2022

Not a fan of showing a formula as updated when the pkg_version doesn't change. This could be quite confusing when a user sees foo as having been updated but brew upgrade foo being a no-op.

@samford
Copy link
Member

samford commented May 4, 2022

Not a fan of showing a formula as updated when the pkg_version doesn't change. This could be quite confusing when a user sees foo as having been updated but brew upgrade foo being a no-op.

As a maintainer who frequently makes changes to formulae/casks without modifying the version, I find the existing filtering behavior valuable. I agree with Carlo that displaying a formula as updated when the version hasn't changed isn't likely to be beneficial to most users.

For comparison, the same filtering doesn't apply to casks (i.e., any change leads to a cask showing as updated). I update some installed casks using brew upgrade --greedy someapp but there are others where I notice the cask was updated and manually update the related app by opening it, checking for updates, etc. When a cask shows as updated but the version hasn't changed, I end up wasting a bit of time checking for a new version for no reason, so I would argue that we should consider extending the formula behavior to casks (not the other way around).

Ignoring performance for a moment, I think filtering is the better default for most users (i.e., don't surface non-actionable changes). If there's an alternative to this PR that will improve performance while maintaining the filtering behavior (e.g., #13244), I'm all for it. However, if we go down a path where we remove this filtering behavior in the name of performance, it sounds like there are some of us who may prefer to continue using it (e.g., opting in using an environment variable?) despite any performance hit.

@MikeMcQuaid
Copy link
Member

Given feedback above: I think we shouldn't merge this as-is. Some thoughts:

  • This could be a good optimisation for when homebrew/core hasn't been updated in a long time. In that case, I'd suggest we probably output nothing and/or a specific message explaining why rather than dropping filtering and outputting even more.
  • How does Homebrew::EnvConfig.update_report_only_installed? affect the performance and/or output here? I use it personally and think it's a way better default but reverted it when people objected.
  • Perhaps this is something we could better accomplish with a JSON cache of versions that we write from update-report and compare this with the current version rather than parsing the Git history?

@boblail
Copy link
Contributor Author

boblail commented May 5, 2022

How does Homebrew::EnvConfig.update_report_only_installed? affect the performance and/or output here? I use it personally and think it's a way better default but reverted it when people objected.

Oh, +1! I think that's a better default too!

It does not really help the performance here because it's applied as a filter in dump_formula_report, after we've assembled (and filtered) the full update report.

Could we rearrange that? Or does anything else consume the report object ({ A: [...], AC: [...], M: [...], ... })?

@MikeMcQuaid
Copy link
Member

Oh, +1! I think that's a better default too!

It does not really help the performance here because it's applied as a filter in dump_formula_report, after we've assembled (and filtered) the full update report.

Could we rearrange that?

Yeh, that seems like a good idea. Makes sense to filter the expensive operation before rather than after, right?

For your personal use-case: would that address things sufficiently, assuming the speedup is large?

@MikeMcQuaid
Copy link
Member

Not a fan of showing a formula as updated when the pkg_version doesn't change. This could be quite confusing when a user sees foo as having been updated but brew upgrade foo being a no-op.

As a maintainer who frequently makes changes to formulae/casks without modifying the version, I find the existing filtering behavior valuable.

@carlocab @samford Sorry to disagree when I've asked for your feedback but something just occurred to me: do you still think this behaviour is useful and/or necessary given that it only applies to manual brew update runs and not automatically invoked brew update --preinstall runs?

Much like brew cleanup: I basically consider that, other that debugging, there's no real reason for users to run brew update explicitly (unless they've disabled auto-update). With that in mind: I really don't think the (vast) majority of our users are seeing this behaviour at all.

@MikeMcQuaid
Copy link
Member

@RandomDSdevel @neersighted can you explain your 👎🏻, please?

@MikeMcQuaid
Copy link
Member

@carlocab @samford gentle bump on the comment above!

@neersighted
Copy link
Contributor

@RandomDSdevel @neersighted can you explain your 👎🏻, please?

Despite the fact that the default settings/upstream's presets do not require the user to manually run brew update, I don't think it's worth rejecting a performance improvement because it's not the preferred way of doing things. It's part of the workflow of Homebrew developers and more advanced Homebrew users, and it's worth catering to them as well if the costs are not too high.

With this set of PRs, you can either trade moderate code complexity for significantly more idiomatic usage of the git plumbing, or trade more verbose/less focused output for a significant reduction in code complexity and expensive operations.

Either way, @boblail has two very solid approaches to a 'free' performance improvement depending on what complexity/output compromises the project wants to make -- I don't think it's worth rejecting them because the suggested way of using Homebrew doesn't regularly traverse this codepath.

@carlocab
Copy link
Member

I personally consider the existing behaviour of brew update useful, and would hate to see it changed. I'd at least like to see where optimising it goes instead of getting rid of it entirely.

That said, with my maintainer hat on: if it really is the case that most users don't do brew update, I'm not against just getting rid of the filtering behaviour. My concern is that I genuinely don't know whether most users run brew update or not. It feels to me that a sizeable number of users do run it, but I don't have any data to back that up so I'm willing to defer to experience here.

@samford
Copy link
Member

samford commented May 16, 2022

do you still think this behaviour is useful and/or necessary given that it only applies to manual brew update runs and not automatically invoked brew update --preinstall runs?

I probably shouldn't have said "most users", as I wouldn't be surprised if brew update usage is low. I don't have any insight into how the average user interacts with brew and my usage may be unrepresentative, so I'll defer to your judgment in terms of what's common.

With my previous comments, I was thinking about any users like myself who manually run brew update (however small that number may be). If these users are anything like me, they may run the command specifically to see which formulae have a new version (not just installed formulae like brew upgrade --dry-run), so dropping the existing filtering behavior would remove support for this use case and I don't think we offer an alternative.

That said, my opinion isn't very strong and I can probably adapt my workflow to any changes here. If, in the end, the performance gains would benefit most users (and/or increasing consistency of brew update behavior in different scenarios is appealing), I can understand removing the filtering behavior. If we do remove the filtering behavior as the default, it would be nice if we could keep it around in an opt-in fashion for users who do benefit from it but it's fine regardless.

However, if we can find performance gains that don't require removing the filtering behavior, I would prefer that, of course. As mentioned above, I would prefer for this filtering behavior to be extended to casks rather than removed but I'm biased.

@MikeMcQuaid
Copy link
Member

Despite the fact that the default settings/upstream's presets do not require the user to manually run brew update, I don't think it's worth rejecting a performance improvement because it's not the preferred way of doing things.

@neersighted You misunderstand me here: I'm suggesting that the slow code, instead of being optimised, can be avoided being called at all.

I personally consider the existing behaviour of brew update useful, and would hate to see it changed. I'd at least like to see where optimising it goes instead of getting rid of it entirely.
If these users are anything like me, they may run the command specifically to see which formulae have a new version

@carlocab @samford I guess my question is: why do you not care about this output being missing when brew update is run automatically, only manually? Do you have auto-update disabled or are you just careful to only ever run brew update before the auto-update happens? Are/were you aware that, unless I've misunderstood, an auto-update followed by a manual update will silently discard this information in the auto-update case and not display it later in the manual update case?

@samford
Copy link
Member

samford commented May 17, 2022

I guess my question is: why do you not care about this output being missing when brew update is run automatically, only manually? Do you have auto-update disabled or are you just careful to only ever run brew update before the auto-update happens?

It's situational for me. There are times when I care about brew update output and times when I don't. When I care about the output, I carefully manage situations that would trigger an auto-update (brew install, brew upgrade) and/or I run update before related commands in those cases (so I can see the output before it would be lost). If I don't care about update output, I simply run install/upgrade and let auto-update run. If I change my mind after the fact, I look through the Git log but that naturally takes longer (part of what's nice about brew update for formulae is that it only shows version changes and in a compact fashion, so I can skim it very quickly).

In practical terms, removing the filtering behavior would mean A) I can't assume that an updated formula is a version change and B) there would simply be more output to look through (PRs that bulk modify formulae/casks are a problem for unfiltered output). From my perspective, the change would save my computer some time but would increase the amount of time I have to spend thinking, which isn't a great tradeoff in my case.

To be clear, I'm not against the idea of modifying the default behavior if it benefits most users. I just think it would be helpful to have some way to continue to opt-in to the filtering behavior for those who want it (e.g., a HOMEBREW_UPDATE_ONLY_VERSION_CHANGES environment variable). I imagine changing the mental model may trip up some users and it would be nice to have a solution to offer if/when they come knocking.

@carlocab
Copy link
Member

I do have HOMEBREW_NO_AUTO_UPDATE set, but I also have auto-update running once a day.

Despite that, I do still find myself running brew update manually, typically through an OhMyZsh-supplied alias (e.g. bubo, which is brew update && brew outdated). Which, come to think of it, might be another possible set of users who are running brew update manually.

@MikeMcQuaid
Copy link
Member

To be clear, I'm not against the idea of modifying the default behavior if it benefits most users. I just think it would be helpful to have some way to continue to opt-in to the filtering behavior for those who want it (e.g., a HOMEBREW_UPDATE_ONLY_VERSION_CHANGES environment variable). I imagine changing the mental model may trip up some users and it would be nice to have a solution to offer if/when they come knocking.

Yeh, this is what I'm leaning towards. I also think we should combine this change with defaulting to only listing updates to installed formulae by default. This was changed before but reverted after backlash but I think we/I should do so again but with an opt-out this time.

@samford
Copy link
Member

samford commented May 17, 2022

Yeh, this is what I'm leaning towards. I also think we should combine this change with defaulting to only listing updates to installed formulae by default. This was changed before but reverted after backlash but I think we/I should do so again but with an opt-out this time.

That sounds reasonable to me, so long as affected users have ways to control related behavior (i.e., one environment variable for only displaying version updates and another for displaying updates for all formulae (not just installed)). It may trip up some users (as it goes against their existing mental model) but it should only be a temporary inconvenience if there's a way for them to get back to their desired configuration.

@carlocab
Copy link
Member

To be clear, I'm not against the idea of modifying the default behavior if it benefits most users. I just think it would be helpful to have some way to continue to opt-in to the filtering behavior for those who want it (e.g., a HOMEBREW_UPDATE_ONLY_VERSION_CHANGES environment variable). I imagine changing the mental model may trip up some users and it would be nice to have a solution to offer if/when they come knocking.

Yeh, this is what I'm leaning towards. I also think we should combine this change with defaulting to only listing updates to installed formulae by default. This was changed before but reverted after backlash but I think we/I should do so again but with an opt-out this time.

Sounds good to me.

@MikeMcQuaid
Copy link
Member

I think #13299 may remove the need for this but interested in thoughts.

@github-actions
Copy link

github-actions bot commented Jun 9, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale No recent activity label Jun 9, 2022
@MikeMcQuaid
Copy link
Member

Passing on this given recent brew update changes. Feel free to reopen if still needed/desired.

@MikeMcQuaid MikeMcQuaid closed this Jun 9, 2022
@github-actions github-actions bot added the outdated PR was locked due to age label Jul 10, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 10, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age stale No recent activity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

brew update can be terribly slow for stale taps
5 participants