Support bzlmod #870

Merged
merged 3 commits into from
Nov 7, 2022

Conversation

kormide
Contributor

@kormide kormide commented Oct 28, 2022

PR Checklist

Please check if your PR fulfills the following requirements:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

PR Type

What kind of change does this PR introduce?

  • Bugfix
  • Feature (please, look at the "Scope of the project" section in the README.md file)
  • Code style update (formatting, local variables)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Other... Please describe:

What is the current behavior?

The requirements file is parsed in a python script that generates requirements.bzl and passes information from the file into whl_library called by install_deps within the generated file. In order to declare the wheel repositories in a bzlmod extension, we need to be able to parse that same information in Starlark.

What is the new behavior?

Can parse a requirements file in Starlark, which will be used in a pip_parse bzlmod extension. This is in a similar vein to how rules_js parses a pnpm lockfile in Starlark for the npm_translate_lock extension.
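
For readers unfamiliar with the file format, the kind of parsing described above can be sketched in plain Python. This is an illustrative approximation, not the Starlark parser added in this PR: the function name `parse_requirements`, the sample input, and the shortened hash values are all hypothetical. It joins backslash-continued lines, skips comments, collects global pip options, and returns `(name, requirement_line)` pairs in the same shape that gets handed to whl_library.

```python
# Hypothetical sample in the style of a pip-compile lock file (hashes shortened).
SAMPLE = "\n".join([
    "# This file is autogenerated",
    "--extra-index-url https://example.invalid/simple",
    "certifi==2021.10.8 \\",
    "    --hash=sha256:aaa \\",
    "    --hash=sha256:bbb",
    "    # via requests",
    'Requests[security]==2.28.1 ; python_version >= "3.7"',
])


def parse_requirements(text):
    """Return ([(name, requirement_line), ...], [pip_option, ...])."""
    # Pass 1: join backslash-continued physical lines into logical lines.
    logical, pending = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            pending += line[:-1].strip() + " "
            continue
        logical.append((pending + line).strip())
        pending = ""
    if pending:
        logical.append(pending.strip())

    # Pass 2: classify each logical line.
    requirements, options = [], []
    for line in logical:
        line = line.split(" #")[0].strip()  # drop trailing comments
        if not line or line.startswith("#"):
            continue  # blank line or full-line comment
        if line.startswith("-"):
            options.append(line)  # global pip option, e.g. --extra-index-url
            continue
        # The package name ends at the first version, extras, or marker delimiter.
        end = len(line)
        for delim in "=<>!~[; ":
            pos = line.find(delim)
            if pos != -1:
                end = min(end, pos)
        requirements.append((line[:end].lower(), line))
    return requirements, options


requirements, options = parse_requirements(SAMPLE)
for name, requirement_line in requirements:
    print(name, "->", requirement_line)
```

A real requirements file has more edge cases than this handles (nested includes, editable installs, URLs containing ` #`), which is part of why reusing an existing parser comes up later in the discussion.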

Does this PR introduce a breaking change?

  • Yes
  • No

Other information

This isn't currently used, but it's a nice reviewable chunk for the bzlmod work.

@kormide
Contributor Author

kormide commented Oct 28, 2022

fyi: @alexeagle, @Wyverald

@kormide kormide force-pushed the requirements-parser branch 2 times, most recently from 28f9f64 to 68f4841 Compare October 28, 2022 01:50
@alexeagle
Collaborator

My understanding is that something was added after you wrote the yaml parser, which permits one module extension to see a workspace that was created by another, and so perhaps that means it's possible to fetch some tool (like yq) and then execute it from a module extension.

Perhaps that means we could use an existing requirements.txt parser like from pip or poetry that's likely to handle all the edge cases correctly. I doubt it, but might be worth a bit of poking.

Collaborator

@alexeagle alexeagle left a comment

Maybe @hrfuller can help with reviews as well, he has a good memory and wrote the pip_parse stuff.

env = unittest.begin(ctx)

asserts.equals(env, [
    ("certifi", "certifi==2021.10.8 --hash=sha256:78884e7c1d4b00ce3cea67b44566851c4343c120abd683433ce934a68ea58872 --hash=sha256:d62a0163eb4c2344ac042ab2bdf75399a71a2d8c7d47eac2e2ee91b9d6339569"),
Collaborator

do you only need the name for the module extension implementation, and not the hashes? I thought you'd need all the data required to produce a requirements.bzl file, like the hashes for downloading the packages.

OH I remember - the pip_parse rule doesn't actually know how to download anything, it will still end up calling pip install on individual packages which are required by the build...

Contributor Author

Yeah this is the same requirement info that gets passed to whl_library.

@kormide
Contributor Author

kormide commented Oct 28, 2022

> My understanding is that something was added after you wrote the yaml parser, which permits one module extension to see a workspace that was created by another, and so perhaps that means it's possible to fetch some tool (like yq) and then execute it from a module extension.
>
> Perhaps that means we could use an existing requirements.txt parser like from pip or poetry that's likely to handle all the edge cases correctly. I doubt it, but might be worth a bit of poking.

Yeah originally I wanted to call pip_parse in the module extension but have it output an extra JSON file containing the result of parsing requirements that I could easily load in starlark. But where I got stuck was finding some way to load that file from the pip_parse repo within the extension. I'm not sure that's possible. But maybe I could call module_ctx.execute to execute a python script that uses the same parsing library? Can a module extension create and own a file like that?

@Wyverald
Member

> Yeah originally I wanted to call pip_parse in the module extension but have it output an extra JSON file containing the result of parsing requirements that I could easily load in starlark.

That's not directly possible (you can't declare a repo in an extension and immediately read from that repo), but you can use the trick mentioned by Alex: have an extension declare a pip_parse repo, use_repo that repo in your module, and then load from it in another extension.

> But maybe I could call module_ctx.execute to execute a python script that uses the same parsing library? Can a module extension create and own a file like that?

Alternatively, this works too. A module extension can create files, but these files are not addressable after the extension impl function finishes.

@kormide
Contributor Author

kormide commented Oct 28, 2022

> > Yeah originally I wanted to call pip_parse in the module extension but have it output an extra JSON file containing the result of parsing requirements that I could easily load in starlark.
>
> That's not directly possible (you can't declare a repo in an extension and immediately read from that repo), but you can use the trick mentioned by Alex: have an extension declare a pip_parse repo, use_repo that repo in your module, and then load from it in another extension.
>
> > But maybe I could call module_ctx.execute to execute a python script that uses the same parsing library? Can a module extension create and own a file like that?
>
> Alternatively, this works too. A module extension can create files, but these files are not addressable after the extension impl function finishes.

Thanks for the clarification. That's interesting that a module context can create temporary files. I think one extension is a better developer experience than two, so I'll look into that approach and see where it leads me.

Contributor

@hrfuller hrfuller left a comment

Cool use of a state machine, thanks for working on this.

@groodt
Collaborator

groodt commented Oct 28, 2022

Looks good. I've got a different proposal to consider and wondered what you might think.

  1. Update rules_python to consume a pip "installation report" instead of a transitively locked requirements.txt https://pip.pypa.io/en/stable/reference/installation-report/
  2. Parse the locked json report with https://bazel.build/rules/lib/json#decode

The JSON installation report has a few benefits:

  • Easier to parse
  • It has the direct urls
  • It captures the environment (sys info) of the locking platform
  • While it isn't presently an agreed "pip lockfile", it is easier to parse and has a clearer spec than a requirements.txt file.

The direct url info in particular will be helpful in future, because it would enable us to parse the format into .bzl http_file or http_archive for retrieval in repository rules using the bazel machinery without having to invoke pip.
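
To make the proposal concrete, here is a hedged Python sketch of reading such a report. The embedded document is a fabricated, heavily trimmed sample (the URL and hash are placeholders), and `wheels_from_report` is a hypothetical helper name. The field names (`install`, `download_info.url`, `download_info.archive_info.hash`, `metadata.name`, `metadata.version`, `environment`) follow my reading of the pip installation report documentation linked above and should be checked against it before being relied on.

```python
import json

# Fabricated sample for illustration only; not real pip output.
SAMPLE_REPORT = """
{
  "version": "1",
  "install": [
    {
      "download_info": {
        "url": "https://files.example.invalid/certifi-2021.10.8-py2.py3-none-any.whl",
        "archive_info": {"hash": "sha256=0000"}
      },
      "metadata": {"name": "certifi", "version": "2021.10.8"}
    }
  ],
  "environment": {"python_version": "3.9"}
}
"""


def wheels_from_report(report_text):
    """Extract name, version, direct URL, and sha256 for each installed item."""
    report = json.loads(report_text)
    wheels = []
    for item in report["install"]:
        info = item["download_info"]
        # The hash is recorded as "<algorithm>=<hex digest>".
        raw_hash = info.get("archive_info", {}).get("hash", "")
        algorithm, _, digest = raw_hash.partition("=")
        wheels.append({
            "name": item["metadata"]["name"],
            "version": item["metadata"]["version"],
            "url": info["url"],
            "sha256": digest if algorithm == "sha256" else None,
        })
    return wheels


for wheel in wheels_from_report(SAMPLE_REPORT):
    print(wheel["name"], wheel["version"], wheel["url"])
```

Each resulting entry carries the pieces an http_file or http_archive declaration would need (a URL and a sha256), which is the point made above about fetching without invoking pip.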

@Wyverald
Member

> The direct url info in particular will be helpful in future, because it would enable us to parse the format into .bzl http_file or http_archive for retrieval in repository rules using the bazel machinery without having to invoke pip.

This might be built into Bzlmod in the future, without you guys having to do anything. With the planned lockfiles feature, Bzlmod will store the results of module extension resolution in the lockfile; said result is actually just the names and definitions of the repos that the extension generated.

@groodt
Collaborator

groodt commented Oct 29, 2022

> > The direct url info in particular will be helpful in future, because it would enable us to parse the format into .bzl http_file or http_archive for retrieval in repository rules using the bazel machinery without having to invoke pip.
>
> This might be built into Bzlmod in the future, without you guys having to do anything. With the planned lockfiles feature, Bzlmod will store the results of module extension resolution in the lockfile; said result is actually just the names and definitions of the repos that the extension generated.

That would be something different from what I mention above. That would be for locking the bazel rules dependency graph, not the dependency graph(s) of resolved python packages from external sources. Bzlmod has no mechanism to do that at present because we aren’t capturing that information. Hence my line of questioning.

@alexeagle
Collaborator

I don't think it's ever a good idea for Bazel to manage the lockfile when a language-idiomatic package manager manages the dependencies and constraints. Then you get a different result outside of Bazel and the language tooling gets confused.

Anyhow this PR is just to make a bzlmod module extension for the existing repository rules, not to redesign them.

@kormide
Contributor Author

kormide commented Oct 29, 2022

I'm going to hold off on this for now and see if I can just execute a python script in the module extension to do the parsing (using the same library that pip_parse uses).

@fmeum
Contributor

fmeum commented Oct 31, 2022

> Thanks for the clarification. That's interesting that a module context can create temporary files. I think one extension is a better developer experience than two, so I'll look into that approach and see where it leads me.

@kormide Note that even though rules_python would have to define two module extensions, end users would not need to interact with the one that provides the tools (such as yq) for the other.

@kormide
Contributor Author

kormide commented Oct 31, 2022

> > Thanks for the clarification. That's interesting that a module context can create temporary files. I think one extension is a better developer experience than two, so I'll look into that approach and see where it leads me.
>
> @kormide Note that even though rules_python would have to define two module extensions, end users would not need to interact with the one that provides the tools (such as yq) for the other.

That's good to know. I had ruled that out because I thought that users would have to declare both extensions.

I'm going to put up a solution with a single module extension first because I'm close to getting that working, and if it feels like there's too much duplication I can go back to the two extension approach.

@kormide
Contributor Author

kormide commented Nov 1, 2022

I repurposed this PR to include the full bzlmod solution.

Commit 1: requirements starlark parser (added support for parsing pip options)
Commit 2: bzlmod extension
Commit 3: bzlmod release support (workspace snippet, BCR app config)

I ended up going with the starlark parsing of requirements as an initial solution here. The alternative is to add an additional module extension that parses the requirements file and outputs something like JSON that starlark can read. In the end, (I think) we need to have the whl_library declarations inside of the pip_parse extension, hence the need for either a parser or another extension to load the wheel arguments.

Comment on lines 3 to 14
"maintainers": [
    {
        "name": "Richard Levasseur",
        "email": "rlevasseur@google.com",
        "github": "rickeylev"
    },
    {
        "name": "Greg Roodt",
        "email": "groodt@gmail.com",
        "github": "groodt"
    }
],
Contributor Author

This is a template for an entry that will appear in the Bazel Central Registry (example).

I wasn't sure who the main maintainers of rules_python are. @rickeylev @groodt are you okay with being listed here?

Contributor

SGTM

Collaborator

@kormide you can also add me.

@kormide kormide changed the title from "Create a requirements file parser in stalark for bzlmod" to "Support bzlmod" Nov 1, 2022
@kormide kormide force-pushed the requirements-parser branch 2 times, most recently from 12367fd to c91bb3c Compare November 1, 2022 19:16
@kormide kormide removed the request for review from brandjon November 1, 2022 19:16
@kormide kormide requested review from alexeagle and removed request for lberki and thundergolfer November 1, 2022 19:16
Comment on lines 161 to +163
def requirement(name):
    if _bzlmod:
        return "@@{repo}//:" + _clean_name(name) + "_{py_library_label}"
Contributor Author

This points calls to xyz_requirement(...) to the aliases inside the pip_parse repo, so that we don't need to declare a bunch of use_repos in MODULE.bazel.
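
As a rough illustration of what that template produces once its placeholders are filled in, here is a small Python sketch. The normalization inside `clean_name`, the repo name `pip`, and the `pkg` label suffix are assumptions made for the example; the diff excerpt above shows only the `{repo}` and `{py_library_label}` placeholders and does not include the body of `_clean_name`.

```python
def clean_name(name):
    # Assumed normalization: lowercase, with "-" and "." mapped to "_".
    return name.replace("-", "_").replace(".", "_").lower()


def requirement(name, repo="pip", py_library_label="pkg"):
    # "@@" addresses the repo by its canonical name, so the label points at an
    # alias inside the single hub repo rather than at a per-wheel repo.
    return "@@" + repo + "//:" + clean_name(name) + "_" + py_library_label


print(requirement("Flask-Login"))
```

Because every label targets the one hub repo, a root module would only need to bring that repo into scope instead of listing each wheel repo in use_repo.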

@kormide
Contributor Author

kormide commented Nov 1, 2022

Just realized that I still need to support registering the python toolchain via bzlmod. The tests are failing on CI because the bzlmod example doesn't use a consistent python version on all platforms (I think it's using whatever the executor has installed), and it's outputting the python version to the vendored requirements file, causing a diff_test to fail.

@kormide
Contributor Author

kormide commented Nov 1, 2022

Added a module extension for registering a python toolchain.


python = use_extension("@rules_python//python:extensions.bzl", "python")

python.toolchain(
Contributor

Does every root module have to manually register a toolchain or is the host Python used if nothing else is specified?

Contributor Author

It's optional and will use the host if not specified. I added a comment indicating that in the release workspace_snippet.

Contributor Author

In the case of this example, I had to use it to fix the python version because the generated requirements_lock.txt file stamps the python version in a comment, and on CI the different platforms had different versions installed, so the diff_test was failing.

@kormide
Contributor Author

kormide commented Nov 2, 2022

This PR needs another review since adding the full bzlmod solution, but it's not clear to me who I should ping for that.

@f0rmiga
Collaborator

f0rmiga commented Nov 3, 2022

@alexeagle is probably the best person to do the final review.

@kormide kormide requested review from alexeagle and removed request for hrfuller November 3, 2022 05:30
Collaborator

@alexeagle alexeagle left a comment

nice! sorry it took me a while to get to this

"pypi__click",
"pypi__colorama",
Collaborator

for later: it's gross how this repository rule depends on packages from pypi, it should be self-contained and require only the python interpreter like what we did in rules_js


8 participants