Improve consistency between Git and API formula handling #12936

Merged
Rylan12 merged 19 commits into Homebrew:master from api-offline on Jun 16, 2022

Conversation

Bo98
Member

@Bo98 Bo98 commented Feb 28, 2022

I intended to open a PR for this a while back but never did.

It's WIP and I'll hopefully revisit it within the next month. There have been minimal changes since I last presented this (the main change is that keg_only_reason is now supported).

@Bo98 Bo98 added the in progress Maintainers are working on this label Feb 28, 2022
@BrewTestBot
Member

Review period will end on 2022-03-01 at 14:36:56 UTC.

@BrewTestBot BrewTestBot added the waiting for feedback Merging is blocked until sufficient time has passed for review label Feb 28, 2022
Member

@MikeMcQuaid MikeMcQuaid left a comment

This is amazing. A bunch of comments but would really like to land this sooner rather than later.

@@ -758,7 +758,7 @@ then
   export HOMEBREW_DEVELOPER_MODE="1"
 fi

-if [[ -n "${HOMEBREW_INSTALL_FROM_API}" && -n "${HOMEBREW_DEVELOPER_COMMAND}" ]]
+if [[ -n "${HOMEBREW_INSTALL_FROM_API}" && -n "${HOMEBREW_DEVELOPER_COMMAND}" && "${HOMEBREW_COMMAND}" != "irb" ]]
Member

Wondering if we should turn this into a more explicit denylist or just remove this entirely?

Member Author

Could probably remove this, since I want to allow brew ruby too. @Rylan12 will have a better idea about this.

Member

Yeh, that makes sense to me. bump, command, dispatch-build-bottle, generate-man-completions, install-bundler-gems, irb, linkage, pr-publish, prof, release, rubocop, ruby, sh, sponsors, style, tap-new, tests, typecheck, unpack, update-license-data, update-maintainers, update-test, vendor-gems all look like they should work without a tapped homebrew/core.

Member

Yeah, there's no reason to exclude HOMEBREW_INSTALL_FROM_API from devs for those commands (except maybe brew style formula and unpack), it's just easier to maintain that developers shouldn't use it. Otherwise, we need to maintain this list to make sure that if commands get an online component they are removed from the list (and vice versa).

Another, better option, is probably just to complain on a per-command basis. We can remove the restriction on developers and just fail immediately in brew bump-formula-pr and friends if the user has HOMEBREW_INSTALL_FROM_API.

Member

brew style formula could still work for non-core formulae and brew unpack could still be useful to unpack even without a local formula IMO.

> We can remove the restriction on developers and just fail immediately in brew bump-formula-pr and friends if the user has HOMEBREW_INSTALL_FROM_API.

This would work for me 👍🏻
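
The per-command approach discussed above could look roughly like the following. This is only a sketch: `InstallFromApiError`, `fail_if_install_from_api!` and the stand-in `bump_formula_pr` method are hypothetical names invented for illustration, not Homebrew's actual code.

```ruby
# Hypothetical error type, not part of Homebrew.
class InstallFromApiError < StandardError; end

# Hypothetical guard: a command that needs a full homebrew/core clone
# calls this first and fails immediately when the variable is set.
def fail_if_install_from_api!(command, env = ENV)
  return if env.fetch("HOMEBREW_INSTALL_FROM_API", "").empty?

  raise InstallFromApiError,
        "`brew #{command}` needs a local homebrew/core clone; " \
        "unset HOMEBREW_INSTALL_FROM_API and try again."
end

# Stand-in for a command body; the real command does far more than this.
def bump_formula_pr(env = ENV)
  fail_if_install_from_api!("bump-formula-pr", env)
  :ok
end
```

With this shape there is no central list to maintain: each command that needs the tap declares so itself, and every other developer command keeps working with the variable set.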

Library/Homebrew/cmd/update.sh (outdated, resolved)
Library/Homebrew/formulary.rb (outdated, resolved)
Comment on lines +658 to +674
if !CoreTap.instance.installed? &&
Homebrew::EnvConfig.install_from_api? &&
Member

I wonder whether we want a testing mode where you can always install from the API, even if the CoreTap is installed?

Member Author

This makes sense, though I'm not sure if it should be a part of this PR or not.

Member

Yeh, can punt on that being part of this PR if necessary. Just think it'd be a nice decoupling at some point.

Library/Homebrew/api/formula.rb (outdated, resolved)
@BrewTestBot
Member

Review period ended.

@BrewTestBot BrewTestBot removed the waiting for feedback Merging is blocked until sufficient time has passed for review label Mar 1, 2022
Member

@Rylan12 Rylan12 left a comment

This looks really good to me!

Have you tested this with an install yet? I feel like something is missing still. Before, the bottles were downloaded using Homebrew::API::Bottle.fetch_bottles. This added some caching thing to Formulary so that BottleLoader would recognize the ref (I'm pretty sure) and load there. The API load happens last in Formulary::loader_for, so I worry that there will be some unexpected consequences of that.

Also, we can probably remove Formulary.map_formula_name_to_local_bottle_path and the methods in Homebrew::API::Bottle (and maybe the entire API), right?

@Bo98 Bo98 changed the title from "Support offline usage under HOMEBREW_INSTALL_FROM_API" to "Improve consistency between Git and API formula handling" Jun 13, 2022
Member

@Rylan12 Rylan12 left a comment

I've spent today taking a much more thorough look at this PR for the API project. Here are a few questions.

One thing that I'm still looking into is caching in Formulary. Currently, when you load a formula from a path, the formula class is cached in Formulary#cache under the path name. However, when loading from the API (as is currently set up), the class is still cached but in a different way. Instead of caching in Formulary#cache, we simply check the Formulary::FormulaNamespaceAPI module to see if the class is already defined and, if so, return it (although only after re-updating the build flags). I think this has the same effect, but I wonder if we should be consistent and cache in Formulary#cache like we do "normally."
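
To make the namespace check described above concrete, here is a minimal sketch of a module acting as an implicit cache. `FormulaNamespaceDemo` and `load_class_from_namespace` are hypothetical stand-ins for Formulary::FormulaNamespaceAPI and the API loading code; the real code builds a formula class from API data and re-applies build flags, which this sketch omits.

```ruby
# Hypothetical stand-in for Formulary::FormulaNamespaceAPI.
module FormulaNamespaceDemo; end

# Return the class for class_name, defining it only on first use.
def load_class_from_namespace(namespace, class_name)
  # If the constant already exists, reuse it: the namespace behaves like a cache.
  if namespace.const_defined?(class_name, false)
    return namespace.const_get(class_name, false)
  end

  # Stand-in for constructing a formula class from API JSON.
  klass = Class.new
  namespace.const_set(class_name, klass)
end

first = load_class_from_namespace(FormulaNamespaceDemo, :Wget)
second = load_class_from_namespace(FormulaNamespaceDemo, :Wget)
puts first.equal?(second)
```

The second call returns the very same class object, which is why the namespace check has the same effect as an explicit cache lookup even though nothing is stored in Formulary#cache.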

Library/Homebrew/formulary.rb (outdated, resolved)
Library/Homebrew/formulary.rb (resolved)
Library/Homebrew/formulary.rb (outdated, resolved)
Library/Homebrew/api/formula.rb (outdated, resolved)
@Bo98
Member Author

Bo98 commented Jun 13, 2022

> Currently, when you load a formula from a path, the formula class is cached in Formulary#cache under the path name. However, when loading from the API (as is currently set up), the class is still cached but in a different way. Instead of caching in Formulary#cache, we simply check the Formulary::FormulaNamespaceAPI module to see if the class is already defined and, if so, return it (although only after re-updating the build flags). I think this has the same effect, but I wonder if we should be consistent and cache in Formulary#cache like we do "normally."

Caching is something I've largely ignored. I feel like we should probably investigate what we have, as we currently have three caching mechanisms: factory caching, Formulary#cache and the namespace management. The latter is needed for marshalling reasons etc. and wasn't introduced for caching (but is able to work as one), so it's worth investigating whether the former two are actually adding much on top of that.

One thing to remember: we should make sure a formula loaded from file and a formula created via the API are not seen as the same cache entry. The scenario where this might happen is a build-from-source flow (theoretically, as this doesn't actually exist yet).

@MikeMcQuaid
Member

> Caching is something I've largely ignored. I feel like we should probably investigate what we have, as we currently have three caching mechanisms: factory caching, Formulary#cache and the namespace management. The latter is needed for marshalling reasons etc. and wasn't introduced for caching (but is able to work as one), so it's worth investigating whether the former two are actually adding much on top of that.

Agreed that this is worth investigating 👍🏻

@Rylan12
Member

Rylan12 commented Jun 14, 2022

I've looked a bit more into the caching and it looks like we have two caching mechanisms and one gap:

  • We always cache when loading a formula from a file path. When loading a formula from a file, we first check to see if the path is in the cache and if so simply return the class that is cached. Here, the formula class (e.g. Formulary::FormulaNamespaceb684604c8244a5905bc797f4e22cc31f::Wget) is cached
  • We sometimes cache all formulae, regardless of how they're loaded. This is the "factory cache" and needs to be explicitly enabled (which is only done in uses, deps, and unbottled at the moment). When enabled, we create a cache key from the parameters passed to ::factory and compare that with the factory cache. If there's a match, we return that formula and skip the rest of the loading process. Here, the formula instance is cached
  • If we load from the API or a formula's contents (i.e. from a bottle), we don't have any caching

I'd suggest that we scope the cache to be type-dependent. Meaning, having a separate cache for loading from path and from the API. That way, there's no risk of accidental overlap if we somehow try to load the same formula from the API and a file.

We also could add caching when loading from a bottle, potentially using the bottle path as a cache key. We could also use e.g. a hash of the contents as a cache key. This might help speed up loading from a bottle since we won't need to read the file contents each time, but is also probably outside the scope of this PR.
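
A type-scoped cache along the lines suggested above might look like this. It is a sketch under assumptions: `TypedFormulaCache` and the `:path` / `:api` type names are hypothetical and do not describe how Formulary actually stores its cache.

```ruby
# Hypothetical cache with one bucket per load type, so an entry loaded
# from a file can never be returned for an API lookup (or vice versa).
class TypedFormulaCache
  def initialize
    # Buckets are created lazily, one Hash per type.
    @store = Hash.new { |hash, type| hash[type] = {} }
  end

  # Return the cached value for (type, key); on a miss, run the block
  # and remember its result.
  def fetch(type, key)
    bucket = @store[type]
    return bucket[key] if bucket.key?(key)

    bucket[key] = yield
  end
end

cache = TypedFormulaCache.new
from_file = cache.fetch(:path, "wget") { :class_loaded_from_file }
from_api = cache.fetch(:api, "wget") { :class_built_from_api }
puts from_file != from_api
```

The same name maps to two independent entries here, which covers the build-from-source scenario where one formula could be loaded both ways in a single run.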

@Rylan12
Member

Rylan12 commented Jun 14, 2022

Okay, I think I'm done with this for today. At the moment, loading from the API does work. I was successfully able to uninstall and reinstall formulae.

For consistency, one important thing to note is that loading formulae and casks from the API will take precedence over loading from an installed keg/cask. This is intentional since it mimics the way things work without the API: the most recent version is loaded, even if an installed version is older. Doing this will allow lots of if Homebrew::EnvConfig.install_from_api? calls to be removed since the formula that's loaded will always be assumed to be the latest version.
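
The precedence rule described above can be illustrated with a small sketch. Everything in it is hypothetical: `Candidate`, `pick_formula`, and the version numbers are made up for illustration, and the real loader selection in Formulary considers more sources than these two.

```ruby
# Hypothetical result type: where the formula came from and which version.
Candidate = Struct.new(:source, :version)

# Prefer API data over an installed keg when the API is in use, mirroring
# the idea that the loaded formula should always be the latest known version.
def pick_formula(name, api_index:, installed_kegs:, use_api:)
  if use_api && (version = api_index[name])
    return Candidate.new(:api, version)
  end

  if (version = installed_kegs[name])
    return Candidate.new(:keg, version)
  end

  nil
end

api_index = { "wget" => "1.21.3" }       # illustrative data only
installed_kegs = { "wget" => "1.21.2" }  # an older, installed version

chosen = pick_formula("wget", api_index: api_index,
                      installed_kegs: installed_kegs, use_api: true)
puts "#{chosen.source} #{chosen.version}"
```

Because callers always receive the newest definition, they need no API-specific branch of their own to find out whether an installed formula is outdated.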

I'm still working through all of those changes, and I'll mark this PR as "ready" once I've made those and have done more testing. But, I'll gladly accept feedback on what's been done so far since I've made some more substantial changes to the original commits.

With these changes, I've also been able to remove the Homebrew::API::Bottle code since we don't really need the bottle API anymore. Eventually, I think it may make sense to remove that API altogether since it has several flaws (e.g. it doesn't include build/test dependencies, doesn't know that different OSes can have different information, etc.) and I don't think it really provides any information that can't be found using our other APIs. That can be a conversation for the future, though.

Overall, I'm very pleased with how this approach is looking since I've been able to remove a ton of those conditionals. It makes everything feel much more integrated and less like an add-on.

@Bo98
Member Author

Bo98 commented Jun 15, 2022

> For consistency, one important thing to note is that loading formulae and casks from the API will take precedence over loading from an installed keg/cask. This is intentional since it mimics the way things work without the API: the most recent version is loaded, even if an installed version is older.

Yeah I agree. Formula files in kegs can get stale. Casks use their equivalent a lot more than the formula side and there have been countless bugs caused by that due to our deprecation turnover. One of the goals here was to avoid needing to use them.

+ we need the latest information anyway for brew outdated etc

> Overall, I'm very pleased with how this approach is looking since I've been able to remove a ton of those conditionals. It makes everything feel much more integrated and less like an add-on.

Excellent. That's exactly what I wanted to see the API code become. It's a lot easier to maintain not having to have two different code paths everywhere. The idea was to fix brew info etc without actually touching cmd/info.rb.

@Rylan12 Rylan12 marked this pull request as ready for review June 15, 2022 21:12
Member

@Rylan12 Rylan12 left a comment

The PR is now ready to move out of the draft stage. I've done some testing locally, and haven't encountered any issues. It feels super seamless and much more stable. Plus, a ton of code was able to be removed so that the only places where we need to check whether HOMEBREW_INSTALL_FROM_API is set are in Formulary, Cask::CaskLoader, Caskroom, and a few places only to make sure that having homebrew/core and homebrew/cask untapped isn't an issue.

There are still a few things that I want to work on that should happen in separate PRs:

  • Removing the restrictions on HOMEBREW_INSTALL_FROM_API for developers (except for certain commands that need full clones)
  • The new brew update process will need to be looked at to make sure that things like tap/formula migrations still are noticed, and potentially failing brew update if the cached formula.json file can't be downloaded
  • There are certain commands (e.g. brew update) that feel like they run slower now since they need to parse the huge formula.json file. I'm not sure yet what the best solution is, but I wonder if we can further improve performance for some of these commands.
  • Adding a way to test without needing to move homebrew/core and homebrew/cask so that they aren't installed

@Bo98
Member Author

Bo98 commented Jun 15, 2022

> • The new brew update process will need to be looked at to make sure that things like tap/formula migrations still are noticed

Is there even an API endpoint for that yet?

> • There are certain commands (e.g. brew update) that feel like they run slower now since they need to parse the huge formula.json file.

How long does parsing the file take?


One thing to address at some point (not now) is download integrity.

This could be using standards like JWS, or potentially something more custom if we really want to shoehorn it into existing endpoints.
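
As a rough illustration of the JWS idea, the sketch below signs and verifies a payload in JWS compact form using HS256. It rests on several assumptions: the helper names are hypothetical, the shared secret is used only to keep the example self-contained (a real distribution scheme would more likely use an asymmetric algorithm so clients hold only a public key), and nothing here reflects an actual Homebrew endpoint.

```ruby
require "json"
require "openssl"

# base64url without padding, as used by JWS compact serialization.
def b64url_encode(bytes)
  [bytes].pack("m0").tr("+/", "-_").delete("=")
end

def b64url_decode(text)
  padded = text + ("=" * ((4 - (text.size % 4)) % 4))
  padded.tr("-_", "+/").unpack1("m0")
end

def hs256(key, data)
  OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA256"), key, data)
end

# Comparison whose running time does not depend on where the strings differ.
def secure_equal?(left, right)
  return false unless left.bytesize == right.bytesize

  left.bytes.zip(right.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Hypothetical helper: wrap a JSON payload in a signed compact token.
def jws_sign(payload_json, key)
  header = b64url_encode(JSON.generate({ "alg" => "HS256" }))
  body = b64url_encode(payload_json)
  "#{header}.#{body}.#{b64url_encode(hs256(key, "#{header}.#{body}"))}"
end

# Hypothetical helper: return the payload if the signature checks out, else nil.
def jws_verify(token, key)
  header, body, signature = token.split(".", 3)
  return nil if header.nil? || body.nil? || signature.nil?
  return nil unless JSON.parse(b64url_decode(header))["alg"] == "HS256"

  expected = b64url_encode(hs256(key, "#{header}.#{body}"))
  secure_equal?(expected, signature) ? b64url_decode(body) : nil
rescue ArgumentError, TypeError, JSON::ParserError
  nil
end

token = jws_sign(JSON.generate({ "formula" => "wget" }), "demo-secret")
puts jws_verify(token, "demo-secret").nil? ? "rejected" : "verified"
```

A client would refuse to use a downloaded file whenever verification returns nil, which covers both corruption in transit and deliberate tampering.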

@Rylan12
Member

Rylan12 commented Jun 15, 2022

> Is there even an API endpoint for that yet?

Nope

> How long does parsing the file take?

Here are the results of some tests I just ran using hyperfine:

| Command | Average Time Without API | Average Time With API |
| --- | --- | --- |
| brew outdated | 702.9 ms | 5.076 s |
| brew ruby -e 'Formulary.factory("abcde")' | 935.7 ms | 1.416 s |

I wonder if there's something else going on in brew update that explains why it takes 70 times longer. More testing can definitely be done.
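
The hyperfine numbers above measure whole commands, so they do not isolate JSON parsing. One way to time just the parse step is sketched below; `time_json_parse` is a hypothetical helper and the payload is synthetic data generated for illustration, not the real formula.json.

```ruby
require "json"

# Hypothetical helper: mean wall-clock seconds for JSON.parse over a few runs.
def time_json_parse(json_text, runs: 5)
  timings = Array.new(runs) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    JSON.parse(json_text)
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
  timings.sum / timings.size
end

# Synthetic stand-in for a large formula index.
sample = JSON.generate(
  Array.new(5_000) do |i|
    { "name" => "formula#{i}",
      "versions" => { "stable" => "1.0.#{i}" },
      "dependencies" => %w[a b c] }
  end,
)

puts format("mean parse time: %.1f ms", time_json_parse(sample) * 1000)
```

Pointing the helper at the cached formula.json instead of the synthetic sample would show how much of the extra time is parsing and how much comes from elsewhere in the command.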

> One thing to address at some point (not now) is download integrity.
>
> This could be using standards like JWS, or potentially something more custom if we really want to shoehorn it into existing endpoints.

Good point, thanks for bringing it up. I don't really know anything about this kind of thing so I'll have to look into it more in the future

@Bo98
Member Author

Bo98 commented Jun 16, 2022

> Good point, thanks for bringing it up. I don't really know anything about this kind of thing so I'll have to look into it more in the future

I know about JWS at least so if we go that route feel free to ask me about it at the time.

@MikeMcQuaid
Member

> > For consistency, one important thing to note is that loading formulae and casks from the API will take precedence over loading from an installed keg/cask. This is intentional since it mimics the way things work without the API: the most recent version is loaded, even if an installed version is older.
>
> Yeah I agree. Formula files in kegs can get stale. Casks use their equivalent a lot more than the formula side and there have been countless bugs caused by that due to our deprecation turnover. One of the goals here was to avoid needing to use them.
>
> • we need the latest information anyway for brew outdated etc

Also agreed 👍🏻

> I wonder if there's something else going on in brew update that explains why it takes 70 times longer. More testing can definitely be done.

Tried playing with brew prof here? If the answer is "I'm not sure how to do that": shout and I'll give you a hand.

@@ -764,7 +764,7 @@ then
   export HOMEBREW_DEVELOPER_MODE="1"
 fi

-if [[ -n "${HOMEBREW_INSTALL_FROM_API}" && -n "${HOMEBREW_DEVELOPER_COMMAND}" ]]
+if [[ -n "${HOMEBREW_INSTALL_FROM_API}" && -n "${HOMEBREW_DEVELOPER_COMMAND}" && "${HOMEBREW_COMMAND}" != "irb" ]]
Member

👍🏻 for now. I'm thinking we may want to have a longer list of commands we allow here.

Member

I already have a PR in the works for this

Library/Homebrew/cmd/update.sh (outdated, resolved)
Library/Homebrew/formulary.rb (resolved)
Library/Homebrew/formulary.rb (outdated, resolved)
Library/Homebrew/formulary.rb (outdated, resolved)
@MikeMcQuaid
Member

Great work @Rylan12 and @Bo98. Happy to see this merged as-is and we can iterate further!

@Rylan12 Rylan12 merged commit d23dba6 into Homebrew:master Jun 16, 2022
@Rylan12
Member

Rylan12 commented Jun 16, 2022

Great! Thanks for getting this started and helping out, @Bo98!

@Bo98 Bo98 deleted the api-offline branch June 16, 2022 20:12
@github-actions github-actions bot added the outdated PR was locked due to age label Jul 17, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 17, 2022
Labels
in progress Maintainers are working on this outdated PR was locked due to age