
Switch Python rules to use toolchains #7375

Open
brandjon opened this Issue Feb 7, 2019 · 17 comments

8 participants
@brandjon (Member) commented Feb 7, 2019

This bug tracks designing, implementing, and migrating built-in Python rules to use the toolchain mechanism for resolving the Python runtime. This replaces the legacy mechanism of specifying --python_top to globally make all Python targets look at a specific py_runtime, or omitting --python_top to use the system default "python" command.

This is prioritized. Design doc to follow.
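For context, the difference between the two mechanisms can be sketched roughly as follows (all labels are illustrative, and the toolchain side reflects the design discussed later in this thread):

```python
# Legacy mechanism: one global runtime for every Python target,
# selected by flag (or the system "python" command if the flag is omitted):
#
#   bazel build //app:bin --python_top=//tools/my_python:my_runtime
#
# Toolchain mechanism: runtimes are registered in the WORKSPACE and
# resolved per target against platform constraints:
register_toolchains("//tools/my_python:my_py_toolchain")
```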

@brandjon (Member, Author) commented Feb 7, 2019

(Plan is to use this work to also address #4815, rather than implement an ad hoc solution to that bug.)

@brandjon (Member, Author) commented Feb 12, 2019

Design doc here.

Discussion can proceed in this thread. Initial reviewers are @katre (Bazel, Toolchains), @mrovner (Python), and @nlopezgi (Remote Execution).

@katre (Member) commented Feb 12, 2019

Overall, this looks good. A few points:

  1. Get rid of the host_python_toolchain; just use the autodetecting version. I don't think the time spent running the autodetection script will really hurt Python performance, will it?
  2. In the Backwards Compatibility section, you mention deleting several flags. I do not think flags can be conditionally deleted based on another flag. Ignoring their values should be sufficient, and they can be deleted in a post-migration cleanup.
@brandjon (Member, Author) commented Feb 12, 2019

  1. Ok, I'll remove host_python_toolchain. I don't have a strong opinion about its impact on performance, but it's worth remembering that it will still be superseded by any other applicable toolchain that the user registers.

  2. By "delete" I really mean "fail-fast if the user tries to specify them while the incompatible flag is enabled". This can be implemented by checking the flag value for null, and logging an error in the fragment's reportInvalidOptions() (see example). The true deletion would happen around the time the old code paths and incompatible flag are removed.

@katre (Member) commented Feb 12, 2019

I think the distinction between "delete" and "ignore with a warning" is worth calling out.

@brandjon (Member, Author) commented Feb 12, 2019

(See also Nicolas's comments here.)

@brandjon (Member, Author) commented Feb 12, 2019

Replying to @nlopezgi's concern about toolchains and extensibility here:

I think you need something similar as there are rules that execute the native py_binary (e.g., rules_docker does here: https://github.com/bazelbuild/rules_docker/blob/master/python/image.bzl#L84) and there might be something special that these type of users need to do to work with the new py toolchains properly.

Users who operate at the level of BUILD targets and macros only need to worry about having definitions for their platforms and python toolchains, i.e. the py_runtime_pair target and an associated toolchain target referencing it, with appropriate constraints. Whatever the build graph looks like, it should Just Work.

Users who are writing new rules that interoperate with the Python rules may need to manipulate Python toolchains. To receive a Python toolchain, they should declare @bazel_tools//tools/python:toolchain_type as a required toolchain type in their rule and receive it with ctx.toolchains[<same label>]. This would give them back the ToolchainInfo with the schema described in the doc. This would be necessary for instance to write a replica of py_binary in Starlark. Conversely, users can write a replacement for py_runtime or py_runtime_pair by simply returning an appropriate PyRuntimeInfo / ToolchainInfo provider.

I didn't go into too much detail about how to write rules that produce or consume Python toolchains (aside from a description of the provider schema) because this is already covered by the general toolchain documentation linked in the doc's abstract.
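To make the rule-author side concrete, a minimal sketch of a Starlark rule that consumes the Python toolchain might look like this (the toolchain type label and the py3_runtime field follow the design doc's schema; the rule itself is hypothetical):

```python
# my_py_tool.bzl -- hypothetical rule that needs a Python 3 runtime.
_PY_TOOLCHAIN_TYPE = "@bazel_tools//tools/python:toolchain_type"

def _my_py_tool_impl(ctx):
    # ctx.toolchains gives back the ToolchainInfo returned by the
    # resolved py_runtime_pair (or any compatible Starlark replacement).
    py_toolchain = ctx.toolchains[_PY_TOOLCHAIN_TYPE]
    runtime = py_toolchain.py3_runtime  # a PyRuntimeInfo, or None
    if runtime == None:
        fail("no Python 3 runtime registered for this platform")
    # An in-build runtime exposes `interpreter` (a File); a platform
    # runtime exposes `interpreter_path` (a string).
    return []

my_py_tool = rule(
    implementation = _my_py_tool_impl,
    toolchains = [_PY_TOOLCHAIN_TYPE],
)
```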

@dslomov (Contributor) commented Feb 15, 2019

(please do not assign issues to more than one team-*)

bazel-io pushed a commit that referenced this issue Feb 15, 2019

Refactor some py_runtime code
Moved its configured target factory, provider, and test to the ...lib/rules/python directory instead of ...lib/bazel/rules/python. Simplified the test. Also moved BazelPythonConfigurationTest into the bazel dir.

The `files` attribute of py_runtime is no longer mandatory. Since a non-empty value is not permitted when `interpreter_path` is used, it makes more sense to omit it rather than explicitly set it to empty.

The error messages and rule logic are simplified. The provider now has an invariant that either interpreter is null or else interpreterPath and files are null; this will be clarified in a follow-up CL that exposes PyRuntimeProvider to Starlark.

Work toward #7375.

PiperOrigin-RevId: 234174755
@JIoJIaJIu commented Feb 21, 2019

Let me add my two cents as an end user.
I would like to see the same naming convention and signatures for the Python toolchain types as for all other toolchains:

py_runtime_pair -> py_toolchain

Also, regarding this section:

Why not split Python 2 and Python 3 into two separate toolchain types?
The naming convention for toolchain types is to literally name the target "toolchain_type",

Based on the JDK toolchain implementation (tools/jdk/BUILD), it doesn't look like a problem. Also,

@bazel_tools//tools/python2:toolchain_type
@bazel_tools//tools/python3:toolchain_type

doesn't break the convention, helps avoid problems with a missing py_runtime attribute in the pair, and provides an opportunity to force a version, as in:

--python3_toolchain .../python:remote3.7.15 --python2_toolchain ../python:remote2.7.13

It would also allow removing the python_version attribute from py_binary/py_test and selecting the runtime based on the provided toolchain type.

@brandjon (Member, Author) commented Feb 21, 2019

Hi Guro, thanks for the comments. There are a few reasons I didn't organize it that way.

py_runtime_pair -> py_toolchain

It's not clear what the ideal way to deal with multiple Python runtimes will be long-term. Currently the rules are organized around this PY2/PY3 distinction, but in the future that will become less important and there may be more emphasis on choosing minor versions. Building the current schema into the rule name gives us a little more room to redesign it in the future without extra breaking changes.

Why not split Python 2 and Python 3 into two separate toolchain types?

Based on jdk toolchain implementation it doesn't look like a problem

Again, future-proofing is part of the reason. It's easier to migrate the schema of a ToolchainInfo away from the py2_runtime = ..., py3_runtime = ... layout, or to migrate a toolchain rule to have different attributes, than it is to migrate the Python rules to accept different toolchain types.

For platforms that do not provide an implementation for one of these toolchains, we'd still have to define a dummy toolchain that fails at analysis time, since even if a target does not use the toolchain its rule type will still require both toolchain types.

Also, creating .../python[2|3]:toolchain_type implies the creation of two additional directories and BUILD files whose only purpose is to provide naming disambiguation for the toolchain types. I'd sooner just break convention and name them .../python:py[2|3]_toolchain_type.

provides an opportunity to force a version

I don't believe there's anything in the toolchain framework that would allow you to directly force the toolchain implementation to use for a given toolchain type, as opposed to just letting Bazel select one based on constraints, though I could be wrong.

It would also allow removing the python_version attribute from py_binary/py_test and selecting the runtime based on the provided toolchain type

I checked with @katre before starting the design doc, and it turns out that currently toolchains can really only differentiate themselves based on platform constraints, not other rule logic constraints. You can't easily express that a given py_binary has a constraint of "I want Python 2".

@mrovner commented Feb 21, 2019

Looks good to me.

@nlopezgi (Member) commented Feb 21, 2019

Thanks for addressing my question about extending python rules. It would be ideal to get this info as part of user docs whenever the right time for that comes.

Looks good to me.

@Globegitter commented Feb 21, 2019

@brandjon is it really not possible to express a version constraint for e.g. a py_binary using toolchains? There is an example at https://docs.bazel.build/versions/master/platforms.html#defining-constraints-and-platforms that talks about defining Version as a constraint. Could that not be used?

This is sidetracking a bit, but I'm also curious about this, as I was planning to add a constraint to the nodejs toolchains so that it would be possible to use different versions of nodejs in the same workspace and choose one at the nodejs_binary level.

@brandjon (Member, Author) commented Feb 21, 2019

In that example the platform either supports one version or the other, and targets build/run on whichever platform has the version they require. Here the same platform is generally expected to provide both Python versions. Currently all values for a constraint setting are considered mutually exclusive.

I think the bigger issue is that Python version is a property of the target, not the platform. @katre could comment more.
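The mutual exclusivity described above can be illustrated with a sketch (all names here are made up): a platform may declare at most one value per constraint_setting, so "this platform supports both versions" is not expressible this way.

```python
# BUILD: a constraint_setting with two values; any given platform can
# declare at most one of them.
constraint_setting(name = "python_version")

constraint_value(
    name = "v2",
    constraint_setting = ":python_version",
)

constraint_value(
    name = "v3",
    constraint_setting = ":python_version",
)

platform(
    name = "my_platform",
    # Listing both ":v2" and ":v3" here would be an error.
    constraint_values = [":v3"],
)
```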

bazel-io pushed a commit that referenced this issue Feb 22, 2019

Expose PyRuntimeProvider to Starlark as PyRuntimeInfo
This creates a top-level symbol `PyRuntimeInfo` representing the provider created by the `py_runtime` rule and consumed by `py_binary` and `py_test`.

PyRuntimeProvider is refactored to not implement TransitiveInfoProvider, and to now be split into separate classes for the provider instance vs provider type. Fakes are created for Stardoc and friends.

Tests are added for the API of `PyRuntimeInfo` and for the ability to sandwich a Starlark rule between a py_runtime and its consuming py_binary.

As drive-by cleanups, the provider is now instantiated in Java code via factory methods instead of its constructor, and the isHermetic method is renamed to isInBuild to be consistent with the terminology in the design doc for #7375. Rule documentation is also updated.

Work toward #7375.

RELNOTES: (Python rules) PyRuntimeInfo is exposed to Starlark, making it possible for Starlark rules to depend on or imitate `py_runtime`. The `files` attribute of `py_runtime` is no longer mandatory.
PiperOrigin-RevId: 235224774

bazel-io pushed a commit that referenced this issue Feb 22, 2019

Refactor py_runtime rule class definition
Renamed BazelPyRuntimeRule to PyRuntimeRule, and moved it to lib/rules/python instead of lib/bazel/rules/python. Improved documentation, and eliminated an unneeded declaration of configuration fragments it does not actually use.

Work toward #7375.

RELNOTES: None
PiperOrigin-RevId: 235237085
@Globegitter commented Feb 22, 2019

Ah yeah, I see: I can say a platform is compatible with a certain version, but I can't say that a platform is compatible with multiple versions and then just choose the version on a per-target level. It would be nice to hear more details from @katre about how to make such a use case possible in a generic way, as I imagine this will come up again and again. I'm also happy to pull this into a separate issue or bazel-discuss, since it's only tangentially related to this issue.

I also wonder how rules_go manages to allow setting toolchain attributes on a per-target basis, e.g. I can set go_binary(os = "windows", ...) and then that will compile for Windows.

@katre (Member) commented Feb 25, 2019

We don't yet have a good story for handling constraints on specific toolchain versions. I know that rules_scala also has problems with this, but I am not sure what the best UX would be (i.e., what is the easiest way for users to express the idea that "this target needs nodejs X.Y to build").

I don't think there's an existing issue in Github to track this, so feel free to open one and make any suggestions as to how you think it should be handled.

irengrig added commits to irengrig/bazel that referenced this issue Feb 26, 2019

bazel-io pushed a commit that referenced this issue Feb 26, 2019

Add a python_version attribute to py_runtime and its associated provider
This is used to declare whether a runtime is for Python 2 or Python 3. When we move to toolchains, the runtime's declared version will be validated against the py_binary's version. For compatibility, validation will not occur for the old --python_top way of specifying runtimes.

Initially the attribute is optional on the py_runtime rule and mandatory on the PyRuntimeInfo provider. At some point we'll add an incompatible change to make the attribute mandatory on the rule too. In the meantime it defaults to a version in the same way as py_binary, i.e. with --incompatible_py3_is_default.

Work toward #7375.

RELNOTES: `py_runtime` gains a `python_version` attribute for specifying whether it represents a Python 2 or 3 interpreter.
PiperOrigin-RevId: 235774497
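Per the commit message above, declaring a runtime's version would look roughly like this (the labels and interpreter paths are illustrative):

```python
# BUILD
py_runtime(
    name = "system_py2",
    interpreter_path = "/usr/bin/python2",
    python_version = "PY2",
)

py_runtime(
    name = "system_py3",
    interpreter_path = "/usr/bin/python3",
    python_version = "PY3",
)
```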

bazel-io pushed a commit that referenced this issue Feb 26, 2019

Add Python toolchain definitions to @bazel_tools
This adds the toolchain type `@bazel_tools//tools/python:toolchain_type`, which is the toolchain type of Python runtimes. It also adds a toolchain rule for this type, `py_runtime_pair`, loadable from `@bazel_tools//tools/python:toolchain.bzl`.

Two constraint settings are also added, `@bazel_tools//tools/python:py[2|3]_interpreter_path`. These will be used to indicate that a platform has a Python interpreter located at a specific path.

Work toward #7375.

RELNOTES: None
PiperOrigin-RevId: 235785173
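Putting the pieces from this commit together, a workspace would wire its runtimes into the new toolchain type roughly like this (the runtime labels are illustrative; the load path, rule name, and toolchain type come from the commit message above):

```python
# BUILD
load("@bazel_tools//tools/python:toolchain.bzl", "py_runtime_pair")

py_runtime_pair(
    name = "my_py_runtime_pair",
    py2_runtime = ":system_py2",  # py_runtime targets (illustrative)
    py3_runtime = ":system_py3",
)

toolchain(
    name = "my_py_toolchain",
    toolchain = ":my_py_runtime_pair",
    toolchain_type = "@bazel_tools//tools/python:toolchain_type",
)
```

The toolchain would then be registered in the WORKSPACE via register_toolchains("//path/to:my_py_toolchain").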

bazel-io pushed a commit that referenced this issue Feb 27, 2019

Add unit tests that check what interpreter py_binary invokes
This adds analysis-time test cases that assert on the Python interpreter path that gets emitted to the stub script of a py_binary or py_test target. This will be useful as we start modifying this behavior using Python toolchains (#7375).

Also renamed PyRuntimeTest for consistency with other "ConfiguredTargetTest"s.

Work toward #7375.

RELNOTES: None
PiperOrigin-RevId: 235970443
@brandjon (Member, Author) commented Feb 28, 2019

I'm running into some bootstrapping issues with the new provider definition that I won't have time to address before we cut the baseline for 0.24. So this feature and the associated --incompatible change to preview it will be available in 0.25.

bazel-io pushed a commit that referenced this issue Mar 5, 2019

Make Python rules require the new Python toolchain type
This makes py_binary / py_test require the new Python toolchain type (@bazel_tools//tools/python:toolchain_type). It does *not* make the rules actually use Python runtimes obtained via the toolchain; that happens in a follow-up CL.

A default toolchain is defined and registered automatically in order to prevent analysis of Python rules from failing frivolously. This default is simply a stub that fails at execution time, but a follow-up CL will change it to autodetect the Python interpreter at runtime. Registration of the toolchain occurs via a WORKSPACE suffix file, so it happens automatically for every workspace.

Note that workspace suffixes are disabled in BazelAnalysisMock#createRuleClassProvider, so we need to add it to the analysis mock's boilerplate workspace content. Tests that want to register a specific toolchain (and have it take precedence over the boilerplate) should use --extra_toolchains=... in the configuration, rather than manipulate the WORKSPACE file.

tools/python/BUILD is refactored into tools/python/BUILD.tools so that the newly added toolchain definitions can be bootstrapped with Bazel 0.23, which does not have the required PyRuntimeInfo provider type.

As a drive-by cleanup, the ":py_interpreter" attribute is pushed down from PyBaseRule to PyBinaryBaseRule.

Work toward #7375.

RELNOTES: None
PiperOrigin-RevId: 236897042

bazel-io pushed a commit that referenced this issue Mar 6, 2019

Add flag for getting the Python runtime from the toolchain
This adds --experimental_use_python_toolchains (to be renamed --incompatible...). When enabled, py_binary and py_test will obtain the Python runtime from the resolved Python toolchain, rather than from the legacy mechanism (--python_top / --python_path).

Work toward #7375.

RELNOTES: None
PiperOrigin-RevId: 237131032