
resource_auditor: audit PyPI resources that exist as formulae#21731

Open
botantony wants to merge 1 commit into main from audit-pypi-packages

Conversation

@botantony
Member

@botantony botantony commented Mar 13, 2026

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew lgtm (style, typechecking and tests) with your changes locally?

  • AI was used to generate or assist with generating this PR. Please specify below how you used AI to help you, and what steps you have taken to manually verify the changes.

Some widely used PyPI packages are available in Homebrew as formulae that other Python-based formulae can depend on. We encourage using them either because they take a long time to build (e.g. pydantic or scipy) or because we don't want to do hundreds of revision bumps when security updates come out (e.g. cryptography or certifi). The problem I see is that new contributors don't know this. Often they read the cookbook, create a Python-based formula, and it passes audit and tests. They did nothing wrong, but a maintainer still has to point out, "Hey, numpy takes a lot of time to build, and it exists as a formula, let's use it instead." I'd rather add an audit for such cases and make exceptions for formulae where the dependency cannot be used.

I'd also like to take a look at Python for Formula Authors (https://docs.brew.sh/Python-for-Formula-Authors), but that should be revised in another PR.

Copilot AI review requested due to automatic review settings March 13, 2026 20:00
Contributor

Copilot AI left a comment


Pull request overview

Adds/renames a PyPI resource audit in ResourceAuditor and integrates it into FormulaAuditor to (a) validate resource names against PyPI filenames and (b) flag certain PyPI resources that should be replaced by Homebrew dependencies, with tap-level exceptions.

Changes:

  • Rename audit_resource_name_matches_pypi_package_name_in_url to audit_pypi_resources and extend it to flag dependency-replacement candidates.
  • Add pypi_resources_allowlist tap audit exception handling to skip the PyPI resource audit for specific resources.
  • Extend formula_auditor_spec with new examples covering dependency-replacement reporting, allowlist behavior, and skipping top-level formula PyPI URLs.
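A minimal Ruby sketch of the two checks described in the overview, for illustration only: the method names, the shortened candidate list, and the PEP 503-style name normalization are assumptions, not the PR's actual code. It also assumes sdist filenames of the form `<name>-<version>.tar.gz` (or `.zip`).

```ruby
require "set"

# Illustrative stand-in for the dependency candidate list (the PR's is longer).
DEPENDENCY_CANDIDATES = Set.new(%w[certifi cryptography numpy]).freeze

# Normalize a name as PEP 503 does: lowercase, runs of "-", "_", "." become "-".
def normalize_pypi_name(name)
  name.downcase.gsub(/[-_.]+/, "-")
end

# Derive the package name from an sdist URL such as ".../numpy-2.1.0.tar.gz".
def pypi_name_from_url(url)
  stem = File.basename(url).sub(/\.(?:tar\.gz|zip)\z/, "")
  # Drop the trailing "-<version>" part of the filename.
  normalize_pypi_name(stem.sub(/-[^-]+\z/, ""))
end

# Collect the problems both checks would report for a single resource.
def pypi_resource_problems(resource_name, url)
  package = pypi_name_from_url(url)
  problems = []
  if normalize_pypi_name(resource_name) != package
    problems << "`resource` name should be '#{package}' to match the PyPI package name"
  end
  if DEPENDENCY_CANDIDATES.include?(package)
    problems << "PyPI package should be replaced with a Homebrew dependency"
  end
  problems
end
```

For example, a `numpy` resource pointing at a numpy sdist reports only the dependency problem, while a `six` resource reports nothing.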

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File | Description
Library/Homebrew/test/formula_auditor_spec.rb | Adds spec coverage for dependency-replacement auditing and allowlist behavior.
Library/Homebrew/resource_auditor.rb | Implements audit_pypi_resources, adds a dependency candidate set, and emits a new audit message.
Library/Homebrew/formula_auditor.rb | Wires tap exceptions into resource auditing by conditionally extending the except list.


@botantony botantony force-pushed the audit-pypi-packages branch 2 times, most recently from 27650b5 to c445820 March 13, 2026 20:10
@botantony
Member Author

The formula audit failure is expected. If we agree to merge this, I'll review the failing formulae and either add them to the exception list or replace their resources with dependencies.

Signed-off-by: botantony <antonsm21@gmail.com>
@botantony botantony force-pushed the audit-pypi-packages branch from c445820 to 67ccd0d March 13, 2026 20:15
@botantony botantony changed the title resource_auditor: audit PyPI resources that exist as dependencies resource_auditor: audit PyPI resources that exist as formulae Mar 14, 2026
Member

@MikeMcQuaid MikeMcQuaid left a comment


Thanks! Looks good so far, a few suggestions.

Comment on lines +734 to +741
allowed_pypi_packages = formula.tap&.audit_exception(:pypi_resources_allowlist, formula.name)
allowed_pypi_packages = case allowed_pypi_packages
when String
  allowed_pypi_packages.split(/\s+/i).to_set
else
  Set.new
end

Member


The `case` here is a bit weird; an `if`/`else` would make more sense and can use `.blank?` or `.presence` to avoid calling `split` on an empty string. Also, I'd just take the nil/empty array/full array and use `Set.new` for all of them to avoid branching.
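A minimal sketch of the branch-free version the reviewer describes, assuming (hypothetically) that the allowlist stores an Array of package names and that the lookup yields `nil` or `false` when a formula has no entry. The method name is illustrative.

```ruby
require "set"

# Hypothetical normalization of the allowlist lookup result into a Set.
def allowed_pypi_packages(raw)
  # Set.new(nil) is an empty Set, so the only special case is mapping
  # false to nil; nil, [] and ["numpy", ...] all go through Set.new.
  Set.new(raw || nil)
end
```

Storing arrays rather than whitespace-separated strings is what removes the need to `split` at all.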

Comment on lines +764 to +768
except = if allowed_pypi_packages.include?(resource.name)
@except.to_a + ["pypi_resources"]
else
@except
end
Member


Suggested change
-except = if allowed_pypi_packages.include?(resource.name)
-  @except.to_a + ["pypi_resources"]
-else
-  @except
-end
+except = @except
+except = [*Array(except), "pypi_resources"] if allowed_pypi_packages.include?(resource.name)

Or similar. This can be simplified further if `@except` is already an array.
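A small sketch of how the suggested two-line form behaves, wrapped in a hypothetical helper so it can be run on its own. `Array()` turns `nil` into `[]`, and the splat builds a new Array, so the original exclusion list is never mutated.

```ruby
# Hypothetical helper mirroring the suggested change: skip the
# "pypi_resources" audit only for allowlisted resources.
def except_for(base_except, allowed, resource_name)
  except = base_except
  except = [*Array(except), "pypi_resources"] if allowed.include?(resource_name)
  except
end
```

If `@except` is guaranteed to be an Array, the `Array()` call can be dropped in favor of `@except + ["pypi_resources"]`.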

class ResourceAuditor
  include Utils::Curl

  DEPENDENCY_PACKAGES = Set.new(%w[certifi cffi cryptography numpy pillow pydantic rpds-py scipy torch]).freeze
Member


We should avoid a hardcoded list here. A new JSON file in homebrew/core would be better.
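A rough sketch of loading the candidate list from a data file instead of a hardcoded constant. The file's name, location, and format (a JSON array of PyPI package names) are assumptions for illustration, not an existing homebrew/core convention.

```ruby
require "json"
require "set"

# Hypothetical loader: read a JSON array of PyPI package names.
def load_dependency_packages(path)
  # Treat a missing file as "no candidates" so the audit becomes a no-op.
  return Set.new.freeze unless File.exist?(path)

  names = JSON.parse(File.read(path))
  Set.new(names.map { |name| name.to_s.downcase }).freeze
end
```

Returning an empty Set for a missing file keeps taps that do not ship such a file from failing the audit.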

return if DEPENDENCY_PACKAGES.exclude?(pypi_package_name.to_s.downcase)

problem "`resource` name should be '#{pypi_package_name}' to match the PyPI package name"
problem "PyPI package should be replaced with Homebrew dependency and excluded using `pypi_package` method"
Member


It would be good to spell out the specific `depends_on` line.
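One way the message could name the exact line, sketched as a hypothetical helper. It assumes the formula shares the lowercased PyPI name unless an override is passed in for packages whose formula is named differently; the wording otherwise follows the PR's existing message.

```ruby
# Hypothetical message builder that spells out the exact `depends_on` line.
def dependency_problem(pypi_package_name, formula_overrides = {})
  package = pypi_package_name.to_s.downcase
  formula = formula_overrides.fetch(package, package)
  "PyPI package '#{package}' should be replaced with `depends_on \"#{formula}\"` " \
    "and excluded using the `pypi_package` method"
end
```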


Labels

None yet

Projects

None yet


3 participants