Skip to content

Conversation

rickeylev
Copy link
Collaborator

@rickeylev rickeylev commented Sep 14, 2025

Currently, an error occurs if one packages files are intended to go into a
sub-directory of another package's directory. This can happen when pkgutil-style
namespace packages are used, which results in multiple distributions wanting
to install the same files (pkgutil __init__.py files) into the same top-level
directories. This eventually results in a Bazel error because Bazel detects that
the one output is the prefix of another.

To fix, detect when distributions overlap in their paths and merge their files
manually. Internally, entries are sorted from shorted venv path to longest,
however, that's just an implementation detail.

Along the way, give agents better advice for bzl_library targets.

Fixes #3204

@rickeylev rickeylev requested a review from aignas as a code owner September 14, 2025 20:47
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @rickeylev, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where Bazel's virtual environment creation fails when dealing with pkgutil-style namespace packages that attempt to install files into overlapping directories. The changes introduce a sophisticated file merging strategy that identifies conflicting paths and manually combines the relevant files, thereby preventing build errors and ensuring that such packages are correctly integrated into the virtual environment. This enhancement significantly improves the robustness of Python virtual environment support within Bazel.

Highlights

  • Namespace Package Support: Implements a new mechanism to correctly handle pkgutil-style namespace packages within virtual environments, resolving Bazel build errors caused by overlapping file paths.
  • File Merging Logic: Introduces logic to detect and manually merge files from different distributions that target overlapping paths within the virtual environment.
  • Refactored Venv Creation: Extracts and refactors virtual environment file creation logic into a new module, venv_runfiles.bzl, improving modularity and maintainability.
  • Enhanced PyInfo Provider: Adds a files field to the VenvSymlinkEntry provider to track individual files for more granular merging of overlapping paths.
  • Comprehensive Testing: Adds new test cases specifically for pkgutil packages and the new file merging logic to ensure correctness.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@rickeylev rickeylev force-pushed the fix.site.packages.overlap branch from 6b37ff5 to df77fa5 Compare September 14, 2025 20:48
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a fix for handling pkgutil-style namespace packages in venv site-packages, which can cause overlapping file paths. The approach of detecting and manually merging files from overlapping distributions is sound. The implementation is mostly solid, with good refactoring of venv creation logic into a new venv_runfiles.bzl file and accompanying analysis tests.

However, I've found a critical bug in python/private/py_library.bzl where the files depset for a VenvSymlinkEntry is populated with far too many files, which will likely cause performance issues and incorrect behavior. I've also included a couple of medium-severity suggestions to improve code clarity and correct a docstring. Once the critical issue is addressed, this should be a great improvement.

@rickeylev
Copy link
Collaborator Author

Good call to factor out the code so it can be unit tested. Waaaay easier to generate test cases and verify behavior.

@rickeylev
Copy link
Collaborator Author

So in the code right now, I didn't implement the "link_to_path=None means remove" logic, mostly because I forgot to.

However, when I went back to add it, the semantics are more complicated than when we originally added it: there is now package, version, and kind to account for. Maybe we should just drop that functionality for the time being?

Otherwise: I think the original removal logic has a weird edge case when multiple kinds are involved: only the corresponding kind is removed, not all of them. Which strikes me as a bit weird? e.g given

kind=LIB, venv_path=foo, link_to_path=None
kind=BIN venv_path=foo-cli, package=foo, version=1.0, ...
kind=LIB, venv_path=foo, package=foo, version=1.0, ...

The result is a bin/foo-cli still gets populated, but site-packages/foo doesn't. Which strikes me as surprising.

Maybe a more sensible behavior is to match by package first? IDK. There's 4 fields, (kind, venv_path, package, and version), which seems like quite a few to come up with intuitive, sensible, and reasonable behavior. One could write a custom rule to filter the PyInfo objects and propagate them. I'm also thinking, with better options for selecting what dependencies are selected, the reason for omitting entries goes away (multiple versions of something in the dependency closure).

@aignas
Copy link
Collaborator

aignas commented Sep 15, 2025

I think with the new transition API we can leave the removal of symlinks unimplemented for now on. People will be able to transition to the right hub easily enough.

Otherwise, if we want to do it, we should do the matching by package first.

@rickeylev
Copy link
Collaborator Author

Ok, removed the tests and logic for link removal. PTAL

@gemini-code-assist /review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the venv creation logic to correctly handle pkgutil-style namespace packages, where multiple distributions might install files into overlapping directory structures. This is a solid improvement that moves venv file creation into a new venv_runfiles.bzl module and implements a file-merging strategy for these overlapping paths. The relative_path function has also been helpfully centralized into common.bzl. The changes are well-structured and supported by new analysis and integration tests. My feedback includes a few suggestions to improve the efficiency of some of the new path manipulation logic.

@rickeylev
Copy link
Collaborator Author

@gemini-code-assist Give a changelog description for this PR in markdown format, wrapped to 80 columns. Review CHANGELOG.md for existing style and format.

Copy link
Contributor

Bug Fixes

  • venv: Correctly handle pkgutil-style namespace packages in venv
    site packages. Previously, overlapping paths from different distributions
    could lead to Bazel errors. The venv creation logic now detects and
    merges these overlapping files, ensuring proper installation. This
    also improves the handling of bzl_library targets.

@rickeylev
Copy link
Collaborator Author

Neat. It identified and suggested some improvements to the looping we have to do to identify paths and prefixes. I'm gonna hold off on making those changes, though; our test coverage is somewhat modest with these code paths and I'd like to get this fix in.

@rickeylev rickeylev added this pull request to the merge queue Sep 16, 2025
Merged via the queue into bazel-contrib:main with commit cbc8774 Sep 16, 2025
4 checks passed
@rickeylev rickeylev deleted the fix.site.packages.overlap branch September 16, 2025 06:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using venvs_site_packages=yes results in overlapping action output paths error
2 participants