Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memoize installed tap loading v2 #16863

Merged

Conversation

apainintheneck
Copy link
Contributor

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

This is an alternative approach to #16806. The difference here is that do not use a registry but just cache the results directly on the Tap class and then purge the cache whenever we install or uninstall a tap which happens rarely. I also took the time to memoize the Tap.reverse_tap_migrations_renames method and rename it to Tap.tap_migration_oldnames which hopefully is more intuitive.

I added two new methods to cache both installed and all taps.
All taps includes core taps no matter if they're installed locally
since they're always provided by the API anyway.

This makes it easier to cache `Tap.each` while making the code
easier to reason about. It also will be useful because we'll
be able to avoid the `Tap.select(&:installed?` pattern that has
recently invaded the codebase.

Note: I also stopped clearing all tap instance caches before
tests. Running `Tap.each` would cache existing taps which would
lead to unexpected behavior since the only existing tap before
each test is the core tap. This is the only tap whose directory
is not cleaned up between tests so we just clear it's cache directly.
We also now clear all tap instances after tests as well regardless
of whether the API was used that time.
.map(&method(:from_path))
# All locally installed taps.
sig { returns(T::Array[Tap]) }
def self.installed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep these a private implementation detail. I don't really want to have three different ways to do the same thing, e.g. Tap.select(&:installed?) vs. Tap.all.select(&:installed?) vs. Tap.installed, and Tap.all vs. Tap.to_a.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's preferable to use Tap.installed vs. Tap.select(&:installed?) which is now used widely throughout the codebase if we're already caching that value directly. Tap.all is currently only used in tests so I think that we can make more of an argument that it can be made private. Keep in mind that Tap.all and Tap.to_a are not always equivalent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, this is an implementation detail to make each faster. In order to have the same performance benefits without exposing a new method, we can do:

  def self.select(&block)
    return installed if block == proc(&:installed?)

    super
  end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that now that Tap.each returns different results depending on whether the API is being used the code becomes a bit more confusing. Why do we need to check that taps are installed or not? That is also an implementation detail. Tap.installed is simpler to understand when reading the codebase. It was not me intent to add a new method here; that just flowed logically from the two states that Tap.each can represent within the lifetime of the program. But now that it's here I think we should use it.

Is this disagreement a blocker for this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, this is an implementation detail to make each faster.

To me, this is a nice helper function that you removed due to your personal preference and is nicer to add back to the way it has been the last ~10 years.

Is this disagreement a blocker for this PR?

No.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, this is a nice helper function that you removed due to your personal preference

@MikeMcQuaid, exactly when did I remove Tap::installed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reitermarkus You made #16710 which required changes like Homebrew/homebrew-bundle#1317.

Copy link
Member

@reitermarkus reitermarkus Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean, Tap::each was basically Tap::installed before. Still, that wasn't a personal preference but necessary to improve Tap.each(&:clear_cache) in tests and fix Tap::reverse_tap_migrations_renames.

This gets used by `Tap.reverse_tap_migrations_renames` and reduces
the amount of information that needs to be calculated on the fly
every time.
- Add tests for:
  - `Tap.each`
  - `Tap.installed`
  - `Tap.all`
  - `Tap#reverse_tap_migrations_renames`
  - `Tap.reverse_tap_migrations_renames`
@apainintheneck apainintheneck force-pushed the memoize-installed-tap-loading-v2 branch from 2821a73 to 0844273 Compare March 9, 2024 18:32
Comment on lines +511 to +519
it "includes the core tap with the api" do
ENV.delete("HOMEBREW_NO_INSTALL_FROM_API")
expect(described_class.to_a).to include(CoreTap.instance)
end

it "omits the core tap without the api" do
ENV["HOMEBREW_NO_INSTALL_FROM_API"] = "1"
expect(described_class.to_a).not_to include(CoreTap.instance)
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
it "includes the core tap with the api" do
ENV.delete("HOMEBREW_NO_INSTALL_FROM_API")
expect(described_class.to_a).to include(CoreTap.instance)
end
it "omits the core tap without the api" do
ENV["HOMEBREW_NO_INSTALL_FROM_API"] = "1"
expect(described_class.to_a).not_to include(CoreTap.instance)
end
it "includes the core tap with the api", :with_api do
expect(described_class.to_a).to include(CoreTap.instance)
end
it "omits the core tap without the api", :without_api do
expect(described_class.to_a).not_to include(CoreTap.instance)
end

It might make sense to add before spec tags for :with_api and :without_api so that we can avoid this boilerplate and make tests simpler. I also know that specifying ENV["HOMEBREW_NO_INSTALL_FROM_API"] = "1" is technically unnecessary but it does improve readability.

I this makes sense to everyone it can be handled as a follow-up in another PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apainintheneck makes sense to me! Might be nicer still to have e.g. with_api be implicit and only without_api is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the test suite would break in that case I believe. We already automatically set HOMEBREW_NO_INSTALL_FROM_API before running tests. I'd rather be explicit even if it adds a small amount redundancy.

I guess another option would be to do something like api: true and api: false.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already automatically set HOMEBREW_NO_INSTALL_FROM_API before running tests. I'd rather be explicit even if it adds a small amount redundancy.

We should flip this and unset HOMEBREW_NO_INSTALL_FROM_API from the environment (like we do for a bunch of other variables) and add without_api as the non-default case.

This is similar to what we do for other environment variables and seems nicer to have tests assume the default environment unless specified otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should flip this and unset HOMEBREW_NO_INSTALL_FROM_API

HOMEBREW_NO_INSTALL_FROM_API is currently only set because a bunch of tests would break otherwise, so yes, the goal should be to unset it for tests eventually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be ideal to not have to set HOMEBREW_NO_INSTALL_FROM_API before running tests. Now that I'm looking at the test suite a bit more I realize that we don't have any way of ensuring that we're not making network requests when the :needs_network flag is not provided. I ran the test suite locally with the API and some of the failures seemed to be related to making unexpected network requests.

If we had some way of being reasonably sure that we weren't making unexpected network requests, I'd feel more comfortable unsetting HOMEBREW_NO_INSTALL_FROM_API by default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be ideal to not have to set HOMEBREW_NO_INSTALL_FROM_API before running tests.

Agreed.

If we had some way of being reasonably sure that we weren't making unexpected network requests, I'd feel more comfortable unsetting HOMEBREW_NO_INSTALL_FROM_API by default.

I think at some point we need to just bite the bullet and switch over. It's mildly concerning to me that the vast majority of our users do not have HOMEBREW_NO_INSTALL_FROM_API set but it's what our test suite does by default.

Addressing this seems higher priority than ensuring we don't accidentally add network requests as a result (which can be done after).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressing this seems higher priority than ensuring we don't accidentally add network requests as a result (which can be done after).

I'm worried that this will end up making a bunch of tests flaky because of hard to predict network requests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flaky tests can be fixed later. I'm more concerned about having tests right now that provide false confidence because they are running in a minority/legacy configuration by default.

@apainintheneck apainintheneck marked this pull request as ready for review March 9, 2024 18:41
@apainintheneck apainintheneck changed the title [WIP] Memoize installed tap loading v2 Memoize installed tap loading v2 Mar 9, 2024
@apainintheneck
Copy link
Contributor Author

Here are some benchmarks. This can get called for each cask and formula depending on the command so 5,000 isn't that unusual of a number.

Before:

brew(main):001:0> require "benchmark"
=> true
brew(main):002:0> key = "#{CoreCaskTap.instance}/schism-tracker"
=> "homebrew/cask/schism-tracker"
brew(main):003:1* Benchmark.realtime do
brew(main):004:2*   5_000.times do
brew(main):005:2*     Tap.reverse_tap_migrations_renames.fetch(key)
brew(main):006:1*   end
brew(main):007:0> end
=> 3.96402266000041
brew(main):008:0> Tap.reverse_tap_migrations_renames.fetch(key)
=> ["schismtracker"]

After:

brew(main):001:0> require "benchmark"
=> true
brew(main):002:1* Benchmark.realtime do
brew(main):003:2*   5_000.times do
brew(main):004:2*     Tap.tap_migration_oldnames(CoreCaskTap.instance, "schism-tracker")
brew(main):005:1*   end
brew(main):006:0> end
=> 0.58806838000055
brew(main):007:0> Tap.tap_migration_oldnames(CoreCaskTap.instance, "schism-tracker")
=> ["schismtracker"]

Comment on lines +511 to +519
it "includes the core tap with the api" do
ENV.delete("HOMEBREW_NO_INSTALL_FROM_API")
expect(described_class.to_a).to include(CoreTap.instance)
end

it "omits the core tap without the api" do
ENV["HOMEBREW_NO_INSTALL_FROM_API"] = "1"
expect(described_class.to_a).not_to include(CoreTap.instance)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apainintheneck makes sense to me! Might be nicer still to have e.g. with_api be implicit and only without_api is needed.

@MikeMcQuaid
Copy link
Member

Thanks again @apainintheneck!

@MikeMcQuaid MikeMcQuaid merged commit b1990ed into Homebrew:master Mar 12, 2024
33 checks passed
@github-actions github-actions bot added the outdated PR was locked due to age label Apr 14, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants