Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bzlmod pip extension generates a different repo if the python.toolchain version is specified in X.Y.Z format #1339

Closed
aignas opened this issue Jul 24, 2023 · 3 comments

Comments

@aignas
Copy link
Collaborator

aignas commented Jul 24, 2023

I would expect the following change to be benign:

diff --git a/examples/bzlmod/MODULE.bazel b/examples/bzlmod/MODULE.bazel
index df88ae8..2810592 100644
--- a/examples/bzlmod/MODULE.bazel
+++ b/examples/bzlmod/MODULE.bazel
@@ -32,7 +32,7 @@ python.toolchain(
 # work in progress.
 python.toolchain(
     configure_coverage_tool = True,
-    python_version = "3.10",
+    python_version = "3.10.9",
 )

 # You only need to load this repositories if you are using multiple Python versions.

However, it seems that it is not the case. The issue is present on both, main and 0.24.0 releases. We may be able to use #1328 in order to fix this.

aignas added a commit to aignas/rules_python that referenced this issue Jul 24, 2023
Before this PR the `bzlmod` example would only work if the Python
interpreter version is specified in the `X.Y` format. However, users may
desire to use the full version (`X.Y.Z`) for reproducibility reasons and
expect everything to continue to work.

This fixes the versions used everywhere by leveraging functionality
introduced in bazelbuild#1328 and fixes the `python_versions` to always contain
`@python_versions//X.Y:defs.bzl` imports. NOTE, bazelbuild#1328 also fixed the
case where the Python-version specific `pip` hub repos would be named
`<hub_name>_XYZ` as opposed to the `<hub_name>_XY`, which is expected
across the codebase.

Breaking for `bzlmod` users who previously used
`@python_versions//X.Y.Z:defs.bzl` in their BUILD.bazel files and/or
depended on the `@pip_XYZ//:requirements.bzl`.

This essentially limits the users of the `@python_versions` to a single
`X.Y` toolchain version, whilst the full functionality should still be
available by using the `python_version` parameter in the `py_binary` and
`py_library` rules.

Fixes bazelbuild#1339.
@aignas
Copy link
Collaborator Author

aignas commented Jul 29, 2023

I am thinking about this a little more and I am wondering if it is actually a little more difficult than I expected, so here is my brain dump.

Right now in rules_python each version of the py_toolchain have separate config_settings and it means that we can differentiate between different X.Y.Z version in the select statements and we can have 3.9.5 and 3.9.6 registered along side each other. In turn, if we do pip.parse with 3.9.5 and generate a select statement for the pkg target using whl_library_alias or similar, then we will only be able to use that in our resultant targets. However, in reality it may be that a .whl built using 3.9.5 is compatible either with any Python or with cp39 depending on the underlying code itself. I am using here cp39 because that is the ABI of the hermetic toolchain that we are using and I think that it is important to distinguish the flavour of the python interpreter, in this case it is much more important to know that it is a CPython rather than RustPython or something similar.

So technically one could argue that it could always be possible to make a 3.9.6 toolchain work with a pip.parse target built with 3.9.5, however in our implementation this won't be the case, because the select statement will ensure that there is no match when we try to access a target from pip.parse using a 3.9.6 toolchain when the pip.parse specifically depends on 3.9.5.

I am only familiar with basic select usage and maybe there could be a way to make the toolchain expose 2 config settings and use those to do the matching, but I would like to hear more from more experienced people. I am thinking that we should have the render_pkg_alias or whl_library_alias implementation use:

select({
    "@rules_python//python/config_settings:cp39": "@pypi_cp39_foo//:pkg,
    "@rules_python//python/config_settings:cp38": "@pypi_cp38_foo//:pkg,
})

That means that we should do the following with our toolchains:

  • Instead of printing warning for any toolchain that gets registered twice, we print warning for any toolchain that has the same ABI and is registered twice. This means that only one of 3.9.5 and 3.9.6 would be actually registered and the first one would win in this case. That way whatever the root module has setup, will be the most important.
  • If any toolchain with cp39 ABI is required by any dependency or a submodule, then it should still reasonably work.
  • We then have no issue with documenting and ensuring that only a single @python_3_9 repo exists.

All of this view could be consistent with sysconfig, which returns .get_python_version as X.Y only.

@rickeylev, @chrislovecnm, what do you think about all of it?

aignas added a commit to aignas/rules_python that referenced this issue Jul 29, 2023
Before this PR the `bzlmod` example would only work if the Python
interpreter version is specified in the `X.Y` format. However, users may
desire to use the full version (`X.Y.Z`) for reproducibility reasons and
expect everything to continue to work.

This fixes the versions used everywhere by leveraging functionality
introduced in bazelbuild#1328 and fixes the `python_versions` to always contain
`@python_versions//X.Y:defs.bzl` imports. NOTE, bazelbuild#1328 also fixed the
case where the Python-version specific `pip` hub repos would be named
`<hub_name>_XYZ` as opposed to the `<hub_name>_XY`, which is expected
across the codebase.

Breaking for `bzlmod` users who previously used
`@python_versions//X.Y.Z:defs.bzl` in their BUILD.bazel files and/or
depended on the `@pip_XYZ//:requirements.bzl`.

This essentially limits the users of the `@python_versions` to a single
`X.Y` toolchain version, whilst the full functionality should still be
available by using the `python_version` parameter in the `py_binary` and
`py_library` rules.

Fixes bazelbuild#1339.
aignas added a commit to aignas/rules_python that referenced this issue Aug 5, 2023
Before this PR the `bzlmod` and legacy setups would only expose the
multi-version python toolchain aliases as the string that is passed to
the `python_register_multi_toolchains` function. This meant that if the
user decided to pass the full version (e.g. `3.11.1`) then they had to
import the version aware `py_*` defs via `@foo//3.11.1:defs.bzl`. This
PR changes it such that the user can still do the old way of importing
but we print a deprecation warning and ask the user to use
`3.11:defs.bzl` instead which better reflects how the multi-version
toolchain should be used. This also means that the PRs bumping the minor
versions like in bazelbuild#1352 won't need updating Python code elsewhere.

Whilst at it, we introduce a validation that disallows multiple Python
toolchains of the same `X.Y` version. This is handled in the `bzlmod` by
printing warnings that the toolchains are ignored, whilst the non-bzlmod
users will get validation errors, which is a potentially breaking
change.

Fixes bazelbuild#1339
aignas added a commit to aignas/rules_python that referenced this issue Aug 5, 2023
Before this PR the `bzlmod` and legacy setups would only expose the
multi-version python toolchain aliases as the string that is passed to
the `python_register_multi_toolchains` function. This meant that if the
user decided to pass the full version (e.g. `3.11.1`) then they had to
import the version aware `py_*` defs via `@foo//3.11.1:defs.bzl`. This
PR changes it such that the user can still do the old way of importing
but we print a deprecation warning and ask the user to use
`3.11:defs.bzl` instead which better reflects how the multi-version
toolchain should be used. This also means that the PRs bumping the minor
versions like in bazelbuild#1352 won't need updating Python code elsewhere.

Whilst at it, we introduce a validation that disallows multiple Python
toolchains of the same `X.Y` version. This is handled in the `bzlmod` by
printing warnings that the toolchains are ignored, whilst the non-bzlmod
users will get validation errors, which is a potentially breaking
change.

Fixes bazelbuild#1339
aignas added a commit to aignas/rules_python that referenced this issue Aug 11, 2023
Before this PR the `bzlmod` and legacy setups would only expose the
multi-version python toolchain aliases as the string that is passed to
the `python_register_multi_toolchains` function. This meant that if the
user decided to pass the full version (e.g. `3.11.1`) then they had to
import the version aware `py_*` defs via `@foo//3.11.1:defs.bzl`. This
PR changes it such that the user can still do the old way of importing
but we print a deprecation warning and ask the user to use
`3.11:defs.bzl` instead which better reflects how the multi-version
toolchain should be used. This also means that the PRs bumping the minor
versions like in bazelbuild#1352 won't need updating Python code elsewhere.

Whilst at it, we introduce a validation that disallows multiple Python
toolchains of the same `X.Y` version. This is handled in the `bzlmod` by
printing warnings that the toolchains are ignored, whilst the non-bzlmod
users will get validation errors, which is a potentially breaking
change.

Fixes bazelbuild#1339
@aignas
Copy link
Collaborator Author

aignas commented Aug 12, 2023

So whilst working on #1340 I realized that the desired behaviour might be the following:

Allow the user not using a bzlmod to use one X.Y toolchain (e.g. no 3.9.1 and 3.9.3) per single multi-version toolchain registration. Reason for this is that in pip installations we would should treat them as identical versions because they have the same Python ABI. Requirements are also usually locked per X.Y python version and this assumption that there is only one X.Y version of python in the multi-version setup allows us to ensure that bumping a patch version of the toolchain in the user's repo does not require any other source code changes no matter how they request the toolchain version.

However being able to define the toolchain with the patch version can be desirable because the toolchain sometimes differs in the way it is packaged and that may have regressions and not having this convenience would mean that users will not be able to use toolchains not defined in MINOR_MAPPING without patching rules_python.

Without this assumption we cannot guarantee that a @hub//3.11:defs.bzl will be allways available irrespective whether the user specifies 3.11 or 3.11.4 in their config.

In the bzlmod land there is an additional constraint where the root module decides what version of the toolchain we use and any submodules declaring a particular version of the toolchain will be ignored. Even without explicitly declaring a toolchain, the root module will default particular versions of the toolchain based on the MINOR_MAPPING, so if the submodule asks for 3.11.1, but MINOR_MAPPING defines 3.11.4, then 3.11.4 will bue used for the root and other modules.

This has to be the case because we can only have a single python toolchain hub.

If we don't merge the change proposed in the #1340, we will have the current behaviour where the hub repo structure depends on exactly what is passed to the repository rule. Whilst it is consistent behaviour to some degree, it makes things a little bit hard to maintain and there is a chance to be able to register two toolchains of the same version (3.11 which expands to 3.11.4 and 3.11.4) if I remember the code logic correctly.

TLDR:

  • Users should not have a need to have to patch versions of the same X.Y toolchain at the same time.
  • That means that users should allways import things from the hub repo by specifying X.Y no matter how the toolchain was registered.
  • This means that there can be no two toolchains registered which differ only in the patch version.
  • In bzlmod this means that root module decides which patch versions of the toolchain will be used.
  • Users should be able to define a toolchain with the patch level version to be able to use an older toolchain version if there is a packaging buck in the latest one that is mentioned in the MINOR_MAPPING without waiting for a bugfix release of rules_python.

cc: @rickeylev, let me know if there is a whole in my logic somewhere.

@aignas
Copy link
Collaborator Author

aignas commented Feb 5, 2024

#1696 fixed this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant