Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Cached module to simplify caching code. #16671

Closed
wants to merge 1 commit into from

Conversation

reitermarkus
Copy link
Member

@reitermarkus reitermarkus commented Feb 15, 2024

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

Also added a separate version of Cachable#cache which takes a block and automatically freezes the cached value.

@reitermarkus reitermarkus force-pushed the tap-cache branch 4 times, most recently from 5425d28 to 49f33a7 Compare February 15, 2024 21:58
@apainintheneck
Copy link
Contributor

I'm not sure that it's intuitive to include and extend the same module in different places in the codebase especially if we're planning on doing both in the same class. It might be worth thinking up some way of differentiating the instance cache from the class/module cache.

Interestingly enough we do have this pattern in a few places in the codebase but it's never truly an instance cache.

$ cd $(brew --repo)
$ git grep --no-index -C 4 'include Cachable'
Library/Homebrew/api/cask.rb-    #
Library/Homebrew/api/cask.rb-    # @api private
Library/Homebrew/api/cask.rb-    module Cask
Library/Homebrew/api/cask.rb-      class << self
Library/Homebrew/api/cask.rb:        include Cachable
Library/Homebrew/api/cask.rb-
Library/Homebrew/api/cask.rb-        private :cache
Library/Homebrew/api/cask.rb-
Library/Homebrew/api/cask.rb-        sig { params(token: String).returns(Hash) }
--
Library/Homebrew/api/formula.rb-    #
Library/Homebrew/api/formula.rb-    # @api private
Library/Homebrew/api/formula.rb-    module Formula
Library/Homebrew/api/formula.rb-      class << self
Library/Homebrew/api/formula.rb:        include Cachable
Library/Homebrew/api/formula.rb-
Library/Homebrew/api/formula.rb-        private :cache
Library/Homebrew/api/formula.rb-
Library/Homebrew/api/formula.rb-        sig { params(name: String).returns(Hash) }
--
Library/Homebrew/readall.rb-#
Library/Homebrew/readall.rb-# @api private
Library/Homebrew/readall.rb-module Readall
Library/Homebrew/readall.rb-  class << self
Library/Homebrew/readall.rb:    include Cachable
Library/Homebrew/readall.rb-    include SystemCommand::Mixin
Library/Homebrew/readall.rb-
Library/Homebrew/readall.rb-    # TODO: remove this once the `MacOS` module is undefined on Linux
Library/Homebrew/readall.rb-    MACOS_MODULE_REGEX = /\b(MacOS|OS::Mac)(\.|::)\b/

@reitermarkus
Copy link
Member Author

not sure that it's intuitive to include and extend the same module

I disagree, that's just how Ruby works, but it is actually quite surprising that all existing usages so far are class-level caches.

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good other than BracesRequiredMethods change.

Relatedly: I appreciate the renewed enthusiasm @reitermarkus but would appreciate if you can prioritise e.g. fixing the newly flaky tests from previous refactors (see BuildPulse or Slack). Thanks!

Library/.rubocop.yml Outdated Show resolved Hide resolved
Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many breaking refactors recently. Let's hold off on this until after the next stable tag, please.

@apainintheneck
Copy link
Contributor

not sure that it's intuitive to include and extend the same module

I disagree, that's just how Ruby works, but it is actually quite surprising that all existing usages so far are class-level caches.

Fair enough. Maybe I'm overthinking this.

@apainintheneck
Copy link
Contributor

There are a few places in the code that use cache.fetch(:key) { logic } and those can probably be cleaned up after this change too.

MikeMcQuaid
MikeMcQuaid previously approved these changes Feb 19, 2024
Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied the one change I cared about (which will likely break syntax job). Once syntax fixed up: 👍🏻 to merge (although, as mentioned elsewhere, please do so when around for 2-4 hours afterwards to fix up any reported regressions).

@reitermarkus reitermarkus force-pushed the tap-cache branch 11 times, most recently from fe332f8 to c28005f Compare February 23, 2024 01:20
Copy link
Contributor

@apainintheneck apainintheneck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reitermarkus You're crazy for trying to solve this with meta programming but I really like the result so I guess you had the right approach. I think that this can be simplified a bit be restricting the arity of methods that can be cached. It'd be really simple to add a runtime check to restrict this to methods without arguments with method.arity.zero?; doing it purely in tests is also possible but a little trickier.

Comment on lines +18 to +20
extend Cached
include Cached::Clear
extend Cached::Clear
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to use the included or extended callback to simplify this a little bit?

Ideally all that'd be necessary here would be to add include Cached or extend Cached and everything would be set up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be possible, but does Sorbet support following this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea. @dduugg do you happen to know if this works as expected? We do use the included callback in one part of the code but the methods are generated so they are not typed by Sorbet.

sig { params(_base: Class).void }
def self.included(_base)
raise "Do not include `OnSystem` directly. Instead, include `OnSystem::MacOSAndLinux` or `OnSystem::MacOSOnly`"
end
module MacOSAndLinux
sig { params(base: Class).void }
def self.included(base)
OnSystem.setup_arch_methods(base)
OnSystem.setup_base_os_methods(base)
OnSystem.setup_macos_methods(base)
end
end
module MacOSOnly
sig { params(base: Class).void }
def self.included(base)
OnSystem.setup_arch_methods(base)
OnSystem.setup_macos_methods(base)
end
end

Copy link
Sponsor Member

@dduugg dduugg Feb 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't have the bandwidth to look at this specific code in detail anytime soon, but often the answer to making included hook functionality sorbet-compatible is to use mixes_in_class_methods: https://sorbet.org/docs/abstract#interfaces-and-the-included-hook

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dduugg! That's very helpful.

Comment on lines +16 to +48
sig { params(method: Symbol).returns(Symbol) }
def cached(method)
uncached_instance_method = instance_method(method)

define_method(method) do |*args, **options, &block|
@cached_method_calls ||= T.let({}, T.nilable(T::Hash[Symbol, T::Hash[T.untyped, T.untyped]]))
cache = @cached_method_calls[method] ||= {}

key = [args, options, block]
if cache.key?(key)
cache.fetch(key)
else
cache[key] = uncached_instance_method.bind(self).call(*args, **options, &block)
end
end
end

sig { params(method: Symbol).returns(Symbol) }
def cached_class_method(method)
uncached_singleton_method = singleton_method(method)

define_singleton_method(method) do |*args, **options, &block|
@cached_method_calls ||= T.let({}, T.nilable(T::Hash[Symbol, T::Hash[T.untyped, T.untyped]]))
cache = @cached_method_calls[method] ||= {}

key = [args, options, block]
if cache.key?(key)
cache[key]
else
cache[key] = uncached_singleton_method.call(*args, **options, &block)
end
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it really makes sense for cached methods to take arguments so this can probably be simplified a bit more. None of the cache methods in Tap take arguments right now. At the very least, I think it's a bad idea to allow cached methods to take blocks. I don't think it's be terribly difficult to add tests to make sure that methods are defined that way too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm a bit surprised that you can use a block as a hash key. I guess I'd just never considered even attempting that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it really makes sense for cached methods to take arguments so this can probably be simplified a bit more. None of the cache methods in Tap take arguments right now. At the very least, I think it's a bad idea to allow cached methods to take blocks.

Agreed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it does make sense, for example Tap::_fetch will cache it based on user and repo.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The block may not be needed though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this functionality unless it's needed in 3+ places.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also a pattern in the code to create complex hash keys by concatenating strings with dashes. I'm not sure if this is preferable though to just using arrays of objects. Does anyone have strong feelings either way?

broken_tap(core_cask_tap) || examine_git_origin(core_cask_tap.git_repo, core_cask_tap.remote)
broken_tap(core_cask_tap) || examine_git_origin(core_cask_tap.git_repo, T.must(core_cask_tap.remote))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What changed to make this necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually returning super in CoreTap#remote when HOMEBREW_NO_INSTALL_FROM_API is set.

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given (more) recent user regressions around refactoring and how this PR is adding more code than it's removing I want to step back and have a discussion about the motivation here.

I am not convinced this makes the code significantly easier to follow or that this is not going to cause more bugs.

# frozen_string_literal: true

module Cached
module Clear
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is a new module needed now? Is this only used once?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is a new module needed now?

To make the code more readable and easier to maintain, i.e. not having to manually keep track of which cached variables need to be cleared.

Is this only used once?

So far only in Tap, but it should be possible to replace Cachable with this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be good to use this in 2+ places if it's added here to verify the API is sufficiently generic and better test the edges.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far only in Tap, but it should be possible to replace Cachable with this.

I think having both Cachable and Cached is a bit confusing especially if they are used in the same module. The long-term goal should be to only have one. I prefer the simplicity of Cachable in most cases. Cached is not as flexible and that means once we try to cache methods that take arguments the interface is not as clean and it's possible that we might need to add helper methods which actually add complexity.

I like the approach and I think the code is cool but I'm not sure it's worth the cost here from a maintainability point of view.

As a quick comparison of Cachable and Cached, I'll show some examples of things that would be easier and things that would be harder.

# Now (on master)
def git_head
  raise TapUnavailableError, name unless installed?

  @git_head ||= git_repo.head_ref
end

# Cached
cached def git_head
  raise TapUnavailableError, name unless installed?

  git_repo.head_ref
end

# Cachable
def git_head
  raise TapUnavailableError, name unless installed?

  cache[:git_head] ||= git_repo.head_ref
end

These functions are all similar complexity. The advantage of caching is being able to clear the cache with one easy method which would be provided by both Cached and Cachable.

Note: Cached and Cachable are not exactly equivalent here. Cached will only be evaluated once while Cachable will check if the tap is installed each time though the head ref will be cached.

Here's another example.

# Now (on master)
def audit_exceptions
  @audit_exceptions ||= begin
    ensure_installed!
    super
  end
end

# Cached
cached def audit_exceptions
  ensure_installed!
  super
end

# Cachable
def audit_exceptions
  cache[:audit_exceptions] ||= begin
    ensure_installed!
    super
  end
end

The complexity is not much different in this case either. I would argue that it's also easier to understand at a glance since it uses basic Ruby patterns as well but that's very subjective.

One fine example, that's a bit different.

# Cachable (on master)
def self.fetch(user, repo)
  ...

  cache_key = "#{user}/#{repo}".downcase
  cache.fetch(cache_key) { |key| cache[key] = Tap.new(user, repo) }
end

# Cached
def self.fetch(user, repo)
  ...

  _fetch(user, repo)
end

private_class_method cached_class_method def self._fetch(user, repo)
  new(user, repo)
end

The Tap.fetch method shows that Cached is a not as flexible and in this case that means we need to create an additional method. One method is not the end of the world, of course, but it does suggest that this doesn't always simplify the code.

I feel like the simplest approach would just be to use Cachable without any modifications.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The long-term goal should be to only have one.

I agree with most of the above except: I don't think this should be "long-term", I would really like to avoid having both existing separately for any time/at all.

@MikeMcQuaid MikeMcQuaid dismissed their stale review February 23, 2024 08:56

No longer applies

@reitermarkus
Copy link
Member Author

restrict this to methods without arguments with method.arity.zero?

I tried this in the beginning, but that actually only works when not using Sorbet at runtime because Sorbet will always wrap all methods with *args.

@reitermarkus reitermarkus changed the title Use Cachable in Tap. Add Cached module to simplify caching code. Feb 23, 2024
@reitermarkus reitermarkus mentioned this pull request Feb 23, 2024
7 tasks
@apainintheneck
Copy link
Contributor

restrict this to methods without arguments with method.arity.zero?

I tried this in the beginning, but that actually only works when not using Sorbet at runtime because Sorbet will always wrap all methods with *args.

Lovely... it sometimes feels like I'm fighting Sorbet when I propose generic solutions.

Copy link
Contributor

@apainintheneck apainintheneck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you can get the arity and parameters for a method wrapped by Sorbet by using the T::Utils.signature_for_method helper. Not sure if that's really needed here but at least it's interesting.

brew(main):011:0> Formulary.method(:factory).arity
=> -1
brew(main):012:0> Formulary.method(:factory).parameters
=> [[:rest, :args], [:keyrest, :**], [:block, :blk]]
brew(main):013:0> T::Utils.signature_for_method(Formulary.method(:factory)).req_arg_count
=> 1
brew(main):014:0> T::Utils.signature_for_method(Formulary.method(:factory)).parameters
=>
[[:req, :ref],
 [:opt, :spec],
 [:key, :alias_path],
 [:key, :from],
 [:key, :warn],
 [:key, :force_bottle],
 [:key, :flags],
 [:key, :ignore_errors]]

Another way to get around it is by disabling runtime checks but I doubt we want to do that.

I found that by looking here: https://sorbet.org/docs/runtime#tsigwithoutruntimesig

Edit: There is also a T::Utils.arity(method) helper as well.

Comment on lines +60 to 63
sig { params(user: String, repo: String).returns(T.attached_class) }
private_class_method cached_class_method def self._fetch(user, repo)
new(user, repo)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, I think we should avoid making this a pattern. Ideally the original method would be able to be cached directly.

@apainintheneck apainintheneck mentioned this pull request Feb 25, 2024
7 tasks
@apainintheneck
Copy link
Contributor

I'll throw another tangentially related idea out there. Maybe it'd be good to somehow programmatically collect all classes and modules that mix in Cachable and then clear those caches in-between tests. That would make it harder to forget to clear the cache of some module. This probably doesn't need to happen at the instance level though. To be clear, I'm not suggesting that this should be included in this PR. I'm just thinking out loud.

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale No recent activity label Mar 18, 2024
@MikeMcQuaid
Copy link
Member

Passing on this, sorry.

@github-actions github-actions bot added the outdated PR was locked due to age label Apr 18, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age stale No recent activity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants