-
-
Notifications
You must be signed in to change notification settings - Fork 631
fix: venv site packages with pkgutil packages #3268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: venv site packages with pkgutil packages #3268
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @rickeylev, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses a critical issue where Bazel's virtual environment creation fails when dealing with pkgutil
-style namespace packages that attempt to install files into overlapping directories. The changes introduce a sophisticated file merging strategy that identifies conflicting paths and manually combines the relevant files, thereby preventing build errors and ensuring that such packages are correctly integrated into the virtual environment. This enhancement significantly improves the robustness of Python virtual environment support within Bazel.
Highlights
- Namespace Package Support: Implements a new mechanism to correctly handle
pkgutil
-style namespace packages within virtual environments, resolving Bazel build errors caused by overlapping file paths. - File Merging Logic: Introduces logic to detect and manually merge files from different distributions that target overlapping paths within the virtual environment.
- Refactored Venv Creation: Extracts and refactors virtual environment file creation logic into a new module,
venv_runfiles.bzl
, improving modularity and maintainability. - Enhanced
PyInfo
Provider: Adds afiles
field to theVenvSymlinkEntry
provider to track individual files for more granular merging of overlapping paths. - Comprehensive Testing: Adds new test cases specifically for
pkgutil
packages and the new file merging logic to ensure correctness.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
6b37ff5
to
df77fa5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a fix for handling pkgutil
-style namespace packages in venv site-packages, which can cause overlapping file paths. The approach of detecting and manually merging files from overlapping distributions is sound. The implementation is mostly solid, with good refactoring of venv creation logic into a new venv_runfiles.bzl
file and accompanying analysis tests.
However, I've found a critical bug in python/private/py_library.bzl
where the files
depset for a VenvSymlinkEntry
is populated with far too many files, which will likely cause performance issues and incorrect behavior. I've also included a couple of medium-severity suggestions to improve code clarity and correct a docstring. Once the critical issue is addressed, this should be a great improvement.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…lev/rules_python into fix.site.packages.overlap
Good call to factor out the code so it can be unit tested. Waaaay easier to generate test cases and verify behavior. |
So in the code right now, I didn't implement the "link_to_path=None means remove" logic, mostly because I forgot to. However, when I went back to add it, the semantics are more complicated than when we originally added it: there is now package, version, and kind to account for. Maybe we should just drop that functionality for the time being? Otherwise: I think the original removal logic has a weird edge case when multiple kinds are involved: only the corresponding kind is removed, not all of them. Which strikes me as a bit weird? e.g given
The result is a Maybe a more sensible behavior is to match by package first? IDK. There's 4 fields, (kind, venv_path, package, and version), which seems like quite a few to come up with intuitive, sensible, and reasonable behavior. One could write a custom rule to filter the PyInfo objects and propagate them. I'm also thinking, with better options for selecting what dependencies are selected, the reason for omitting entries goes away (multiple versions of something in the dependency closure). |
I think with the new transition API we can leave the removal of symlinks unimplemented for now on. People will be able to transition to the right hub easily enough. Otherwise, if we want to do it, we should do the matching by package first. |
This reverts commit fa95803.
…into fix.site.packages.overlap
Ok, removed the tests and logic for link removal. PTAL @gemini-code-assist /review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the venv creation logic to correctly handle pkgutil
-style namespace packages, where multiple distributions might install files into overlapping directory structures. This is a solid improvement that moves venv file creation into a new venv_runfiles.bzl
module and implements a file-merging strategy for these overlapping paths. The relative_path
function has also been helpfully centralized into common.bzl
. The changes are well-structured and supported by new analysis and integration tests. My feedback includes a few suggestions to improve the efficiency of some of the new path manipulation logic.
@gemini-code-assist Give a changelog description for this PR in markdown format, wrapped to 80 columns. Review CHANGELOG.md for existing style and format. |
Bug Fixes
|
Neat. It identified and suggested some improvements to the looping we have to do to identify paths and prefixes. I'm gonna hold off on making those changes, though; our test coverage is somewhat modest with these code paths and I'd like to get this fix in. |
Currently, an error occurs if one packages files are intended to go into a
sub-directory of another package's directory. This can happen when pkgutil-style
namespace packages are used, which results in multiple distributions wanting
to install the same files (pkgutil
__init__.py
files) into the same top-leveldirectories. This eventually results in a Bazel error because Bazel detects that
the one output is the prefix of another.
To fix, detect when distributions overlap in their paths and merge their files
manually. Internally, entries are sorted from shorted venv path to longest,
however, that's just an implementation detail.
Along the way, give agents better advice for bzl_library targets.
Fixes #3204