Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crate_universe rule #598

Merged
merged 7 commits into from
Mar 25, 2021
Merged

Conversation

illicitonion
Copy link
Collaborator

This rule dynamically looks up crate dependencies as a repository rule.

See #2 for context.

Copy link
Collaborator

@UebelAndre UebelAndre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super excited about this rule! Some comments for you though 😅


Some areas have unit testing, there are a few (very brittle) integration tests, and some examples.

## How to test local changes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this all be put in a script?

I made https://github.com/google/cargo-raze/blob/master/tools/bootstrap.sh and it works quite nicely IMO. I'd imagine you could do something similar where you write a small shell script to build the resolver using cargo and use an environment variable to to toggle whether or not a bootstrapped binary is used. Having to specify the path I think is unnecessarily cumbersome and would like to tighten the iteration loop for developers working on this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will depend on how we decide to approach bootstrapping, so I'm hesitant to automate things before we know what they'll look like

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few options I can immediately think of for solving this

  1. Commit pre-built binaries to the repo
    This is my favorite option because it means if a user has already downloaded rules_rust then they've already got all the requirements to fetch dependencies. This would also likely setup a nice developer workflow for recompiling the binary for their host and replacing it for testing. I think there may be something we can do with Github Actions to build the binaries and commit them back to the repo, I can look into that if there's interest.

  2. Build artifacts that are pinned in the repository and downloaded
    I think this is the expected solution, it requires something to be hosting (and likely building) the files which is probably quite doable with the current CI but would also probably mean a release cadence is adopted.

  3. Setup some boostrapping mechanism that builds the target on the host before it's needed... somehow
    This would likely require some magic with tools/bazel which sounds not ideal (awful, in-fact), but might be a path forward.

Copy link
Collaborator

@UebelAndre UebelAndre Mar 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having dug into this a bit more, I feel option 1 is likely the most effective thing to do. rules_python has done something similar with .pem files (@rules_python//tools:piptool.par, @rules_python//tools:whltool.par) and have added notes to their pull requests and contributing documentation with some guidelines for testing this. Perhaps we could write a rule to test, byte-for-byte that a compiled version of the binaries that are checked in matches what would otherwise get built? There may also be a way to have a github action check for changes to the resolver and create a unique commit and merge it back. Though this then raises the question of whether or not the rules support cross compilation. If not then a github action might be the appropriate course of action.

Would love to also get thoughts from @hlopko

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry this took me this long.

I don't have any great ideas how to solve bootstrapping :( If I had to get inspiration from the ecosystem, I'd look at and talk with folks at https://github.com/bazelbuild/bazel-gazelle.

attrs = {
"packages": attr.string_list(allow_empty = True),
"cargo_toml_files": attr.label_list(),
"overrides": attr.string_dict(doc = """Mapping of crate name to crate.override(...) entries)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't going to render well when generated in docs, could instead a lot of the complex descriptions go in the top level doc attribute?

Additional nits:

  • Begin docstrings like this with """\ and have content start on the next line
  • Add doc field for all attributes
  • Add doc field for rule

cargo/workspace.bzl Outdated Show resolved Hide resolved
"_resolver_script_linux": attr.label(
default = "@crate_universe_resolver_linux//file:downloaded",
executable = True,
cfg = "host",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should be exec

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repository rules run on the host machine where bazel is running, not on an exec platform

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, there is this remotable attribute (https://docs.bazel.build/versions/master/skylark/lib/globals.html#repository_rule) that tells the repository rule to execute on the remote execution worker. It's useful for detecting toolchains and configuring repos for the exec platform.

"""),
"repository_template": attr.string(),
"supported_targets": attr.string_list(allow_empty = False),
"lockfile": attr.label(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it's written now, this is misleading and should either be renamed or have a doc explaining that it's not actually a Cargo.lock file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a better name? Lockfile is the general term of art for what this is, and its name contrasts with "cargo_toml_files" - I'd name it "cargo_lock_file" were it a Cargo.lock...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's two much mixed functionality setting different expectations in this initial approach. My primary use case is to be able to take a Cargo.toml file and a Cargo.lock file and generate all the same dependencies Cargo would. But that doesn't seem to be the case here, unless I missed something.

I also struggle to think of why someone would mix and match cargo_toml_files and packages. It seems that if you're already using Cargo.toml files, it's not an unfamiliar process to go update or make a new one to get more dependencies. Only in a case where I have a pure Bazel repo would I directly add packages. I suspect having both means you can no longer trust the Cargo.lock file because you're constantly dealing with a modified Cargo.toml file internally.

(Controversial statement inbound 😅) I think lockfile is fine, but I'd say instead of allow_single_file = True, it'd be allow_single_file = [".lock", ".bzl"] and the use of Cargo.toml files be restricted from mixing with directly named packages so it's clear where what type of file should be used (unless you want to have multiple attributes but I think it's fine to have a single lockfile attribute).

features = features if features else [],
)

def _override(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a docstring for this?

local = True,
)

def crate_universe(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now that this is a wrapper for the rule. The bulk of the documentation should likely go on this macro and doesn't necessarily need to be on the rule (though doc files should still have brief descriptions imo)

if lockfile_format_version_line != "# rules_rust crate_universe file format 1\n" {
return Err(anyhow!("Unrecognized lockfile format"));
}
if let Some(lockfile_hash) = lockfile_hash_line.strip_prefix("# config hash ") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My biggest gripe with the current implementation is the "lockfiles" and the need for this hash. If the rule was not marked as local then it wouldn't trigger every build and could rely on just the Cargo.toml tree and Cargo.lock files. Typically (at least in my experience with cargo-raze) regenerating BUILD files is not terribly expensive and is pretty quick. I think it'd be a lot simpler and effective to allow Bazel to run the rule when it believes it should be (not using local) and allow users to bazel sync when they need new dependencies. Typically though, if the Cargo.lock file is a field on the rule then all they would need to do is update that file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My biggest gripe with the current implementation is the "lockfiles" and the need for this hash.

If the rule was not marked as local then it wouldn't trigger every build and could rely on just the Cargo.toml tree and Cargo.lock files. Typically (at least in my experience with cargo-raze) regenerating BUILD files is not terribly expensive and is pretty quick.

I have regularly seen resolves take double-digit seconds, which is far too much time to do every time you git pull a new change in.

I think it'd be a lot simpler and effective to allow Bazel to run the rule when it believes it should be (not using local) and allow users to bazel sync when they need new dependencies. Typically though, if the Cargo.lock file is a field on the rule then all they would need to do is update that file.

I agree with removing the local, but a few really important UX points when using a lockfile are:

  1. If I git pull and do a build, I get the same versions of dependencies as the person who committed without needing to do a resolve.
  2. If I change any of the input to the rule, I don't silently get the old versions.
  3. There's an easy way for me to say "I want new versions, subject to the same version constraints".

I think making the lockfile actually be a Cargo.lock file would be fine if it contains all of the information needed to guarantee the above properties, and would be happy to review (or work on) a patch to make that happen. But I'd rather not block on that re-work, if we can avoid it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I git pull and do a build, I get the same versions of dependencies as the person who committed without needing to do a resolve.

This sounds like magic to me. You'd have to be able to trust the "lockfile" that's there and simply not do the resolve. But you can't know that the lockfile is up to date without first resolving it yourself. Right?

If I change any of the input to the rule, I don't silently get the old versions.

I don't see how this would be the case if all inputs were tracked. But this also assumes the rule would resolve each time to ensure all dependencies were tracked

There's an easy way for me to say "I want new versions, subject to the same version constraints".

Is this analogous to cargo update?

I have regularly seen resolves take double-digit seconds, which is far too much time to do every time you git pull a new change in.

I think because the rule was marked as local and doesn't return any repository info (see http.bzl) it'd always retrigger.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I git pull and do a build, I get the same versions of dependencies as the person who committed without needing to do a resolve.

This sounds like magic to me. You'd have to be able to trust the "lockfile" that's there and simply not do the resolve. But you can't know that the lockfile is up to date without first resolving it yourself. Right?

This is what the rules offer today, if you use the current lockfile mechanism. We hash every input into the resolution process (including the resolver binary itself, the cargo binary, and all arguments to everything), and check whether they're up-to-date.

There's an easy way for me to say "I want new versions, subject to the same version constraints".

Is this analogous to cargo update?

Yes

I have regularly seen resolves take double-digit seconds, which is far too much time to do every time you git pull a new change in.

I think because the rule was marked as local and doesn't return any repository info (see http.bzl) it'd always retrigger.

I don't mean "I see it doing double-digit second resolves every build", I mean "A single resolve can take double-digit seconds" - I'm not saying "it triggers too often", I'm saying "we should minimise triggering, because any single trigger can be very expensive", but I am totally cool with achieving this by fixing up how the rule runs (e.g. by properly setting up params). I do, however, think the "If I git pull, and am using a lockfile, I don't need to pay for a resolve" is really important.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mean "I see it doing double-digit second resolves every build", I mean "A single resolve can take double-digit seconds" - I'm not saying "it triggers too often", I'm saying "we should minimise triggering, because any single trigger can be very expensive", but I am totally cool with achieving this by fixing up how the rule runs (e.g. by properly setting up params). I do, however, think the "If I git pull, and am using a lockfile, I don't need to pay for a resolve" is really important.

I 100% agree. The thing I care most about is being able to take an existing Cargo files and slot them into this Rule as the source of truth without requiring anything additional to be generated. Though, I like the idea that users can specify a hash value to indicate whether or not resolution needs to be done (if that's what you mean by)

This is what the rules offer today, if you use the current lockfile mechanism. We hash every input into the resolution process (including the resolver binary itself, the cargo binary, and all arguments to everything), and check whether they're up-to-date.

@@ -0,0 +1,251 @@
DEFAULT_REPOSITORY_TEMPLATE = "https://crates.io/api/v1/crates/{crate}/{version}/download"
Copy link
Collaborator

@UebelAndre UebelAndre Feb 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this, repositories_bin.bzl and repositories.bzl go in cargo/cargo_universe? I'd also imagined the resolver to be in a sub-directory there

.
└── crate_universe
    ├── repositories.bzl
    ├── resolver
    │   ├── Cargo.lock
    │   ├── Cargo.toml
    │   ├── src
    │   └── test
    └── workspace.bzl

something to this effect.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I feel this shouldn't live in cargo anymore as it's not tied directly to cargo (at least on the user's end). Should be it's own top level package.

@@ -8,6 +8,11 @@ load("@bazel_tools//tools/build_defs/repo:utils.bzl", "maybe")
load("@build_bazel_rules_nodejs//:index.bzl", "node_repositories")
load("@rules_proto//proto:repositories.bzl", "rules_proto_dependencies", "rules_proto_toolchains")

load("@crate_universe_basic_rust_deps//:defs.bzl", basic_pinned_rust_install = "pinned_rust_install")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the cargo_universe be it's own workspace that's .bazelignore'd by the top level examples workspace? I think it'd make the examples clearer for users, though don't feel too strongly about this one, up to you.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I struggled with this - I think it's not super clear now, but also, a lot of effort has gone into making both @rules_rust//examples/... and @examples/... work, and it felt like adding a third repo makes this harder still... That said, I'm happy to re-work however folks think is clearest

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing I care most about is having the repository definitions colocated with the examples. It's not immediately clear that the definitions are in examples_deps.bzl.

Bazel's ability (or inability) to handle transitive workspaces definitely makes me sad, but I think it's important to isolate examples as much as possible. I find they become less valuable the more they become a dropbox of random code. Not saying that's the case now, just my fear that things escalate to that point.

Copy link
Collaborator

@UebelAndre UebelAndre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think because there's nothing indicating these rules are "experimental" or subject to breaking changes, It's important to make sure the initial rollout is done after a larger review. I don't think this is something that can be a "merge it now, change it later" sort of thing because users will likely adopt all ways the rule can be used and from that point on, any alterations to the design would break someone.

@hlopko
Copy link
Member

hlopko commented Feb 24, 2021

(I just want to apologize for not reviewing yet, it would help me if this could wait till the next week?)

@illicitonion
Copy link
Collaborator Author

(I just want to apologize for not reviewing yet, it would help me if this could wait till the next week?)

Absolutely, no rush at all - I know there's a lot here! :)

@illicitonion
Copy link
Collaborator Author

I think because there's nothing indicating these rules are "experimental" or subject to breaking changes, It's important to make sure the initial rollout is done after a larger review. I don't think this is something that can be a "merge it now, change it later" sort of thing because users will likely adopt all ways the rule can be used and from that point on, any alterations to the design would break someone.

I would rather solve the problem of socialising that this is experimental, so that we can get some real-world feedback on how things are working, rather than try to solve (particularly lockfile-related) issues up-front. There are a lot of unknowns around usage (e..g you raised issues of when would folks want to mix Cargo.toml and package definitions), and I think we could make better decisions based on multiple people trying things out in the wild.

That said, we could just remove the lockfile support for now if we really dislike it (I quite dislike it, it definitely needs work), but it's currently useful, and it's built with changeability in mind (e.g. it has a version number so we could conditionally handle different formats differently).

@UebelAndre
Copy link
Collaborator

I would rather solve the problem of socialising that this is experimental, so that we can get some real-world feedback on how things are working, rather than try to solve (particularly lockfile-related) issues up-front. There are a lot of unknowns around usage (e..g you raised issues of when would folks want to mix Cargo.toml and package definitions), and I think we could make better decisions based on multiple people trying things out in the wild.

That sounds good to me, though I would still say the rules first roleout should accept Cargo.lock files if it's going to accept Cargo.toml files. This is probably the thing I care about most. I'd like to be able to setup a Bazel WORKSPACE in a previously pure Cargo environment and allow for the Cargo files to be the source of truth until everything can be migrated over. Though you say this is supported? Can you add an example that does exactly, this? It doesn't seem like any example is doing this.

@illicitonion
Copy link
Collaborator Author

I would rather solve the problem of socialising that this is experimental, so that we can get some real-world feedback on how things are working, rather than try to solve (particularly lockfile-related) issues up-front. There are a lot of unknowns around usage (e..g you raised issues of when would folks want to mix Cargo.toml and package definitions), and I think we could make better decisions based on multiple people trying things out in the wild.

That sounds good to me, though I would still say the rules first roleout should accept Cargo.lock files if it's going to accept Cargo.toml files. This is probably the thing I care about most. I'd like to be able to setup a Bazel WORKSPACE in a previously pure Cargo environment and allow for the Cargo files to be the source of truth until everything can be migrated over. Though you say this is supported? Can you add an example that does exactly, this? It doesn't seem like any example is doing this.

You can't just point at a Cargo.lock file right now, the reason being that if you're specifying any of the attributes to the rule other than cargo_toml_files (or more than one value to cargo_toml_files), the Cargo.lock file that corresponds with an existing Cargo.toml file won't be correct, as it will be missing information. I guess we could provide a special-cased way to say "If your rule specifies exactly one cargo_toml_files and nothing else, you can specify a corresponding Cargo.lock and we'll use it", but it would be nice to solve the more general problem in our approach too.

@UebelAndre
Copy link
Collaborator

You can't just point at a Cargo.lock file right now, the reason being that if you're specifying any of the attributes to the rule other than cargo_toml_files (or more than one value to cargo_toml_files), the Cargo.lock file that corresponds with an existing Cargo.toml file won't be correct, as it will be missing information. I guess we could provide a special-cased way to say "If your rule specifies exactly one cargo_toml_files and nothing else, you can specify a corresponding Cargo.lock and we'll use it", but it would be nice to solve the more general problem in our approach too.

Can you explain the desire to mix Cargo.toml files? I initially thought this field was for the manifests in a "cargo workspace", not various packages that happen to be committed to the repo. Is this the case? Are the toml files merged and resolved?

@illicitonion
Copy link
Collaborator Author

You can't just point at a Cargo.lock file right now, the reason being that if you're specifying any of the attributes to the rule other than cargo_toml_files (or more than one value to cargo_toml_files), the Cargo.lock file that corresponds with an existing Cargo.toml file won't be correct, as it will be missing information. I guess we could provide a special-cased way to say "If your rule specifies exactly one cargo_toml_files and nothing else, you can specify a corresponding Cargo.lock and we'll use it", but it would be nice to solve the more general problem in our approach too.

Can you explain the desire to mix Cargo.toml files? I initially thought this field was for the manifests in a "cargo workspace", not various packages that happen to be committed to the repo. Is this the case? Are the toml files merged and resolved?

Yes, the dependencies from the files are merged (and errors if there are hard incompatibilities). The idea is if you have multiple projects developed in a Cargo world, but don't want to re-build their deps in a Bazel world, you can choose to do so (at the cost of manually ensuring they don't conflict). Basically workspace-ifying crates which may not otherwise actually live in a workspace.

@UebelAndre
Copy link
Collaborator

Yes, the dependencies from the files are merged (and errors if there are hard incompatibilities). The idea is if you have multiple projects developed in a Cargo world, but don't want to re-build their deps in a Bazel world, you can choose to do so (at the cost of manually ensuring they don't conflict). Basically workspace-ifying crates which may not otherwise actually live in a workspace.

This feels like it comes at the cost of being able to use a simple ("traditional") Cargo project in Bazel. I think if a Cargo.toml file is to be treated as a source of truth, then it's corresponding Cargo.lock file should be usable as the primary source of truth. I also find it odd because if I wanted to use the same setup without using Cargo.toml files, I effectively couldn't, right? How would a user achieve the same behavior with just using the packages attribute here?

@UebelAndre
Copy link
Collaborator

Thinking a bit more, just wanna be clear on what I'm hoping to be able to do

  1. I have a cargo project that I want to build in Bazel. The Cargo.toml and Cargo.lock files are the source of truth as there is still development expected to be done with Cargo for some time.
  2. I have a project entirely in Bazel and want to grab dependencies without needing to author any Cargo files

In the first case, if the Cargo.lock file is not used, it breaks the consistency in the dependencies between Bazel and Cargo which makes transitioning or interfacing with Bazel hard.

Whatever other things the rule can do is just icing on the cake to me so long as I can do both of those things.

@illicitonion
Copy link
Collaborator Author

Yes, the dependencies from the files are merged (and errors if there are hard incompatibilities). The idea is if you have multiple projects developed in a Cargo world, but don't want to re-build their deps in a Bazel world, you can choose to do so (at the cost of manually ensuring they don't conflict). Basically workspace-ifying crates which may not otherwise actually live in a workspace.

This feels like it comes at the cost of being able to use a simple ("traditional") Cargo project in Bazel. I think if a Cargo.toml file is to be treated as a source of truth, then it's corresponding Cargo.lock file should be usable as the primary source of truth. I also find it odd because if I wanted to use the same setup without using Cargo.toml files, I effectively couldn't, right? How would a user achieve the same behavior with just using the packages attribute here?

Really, both the Cargo.toml files and the packages attribute are just sources for specifying semver constraints which are and'd together, and additional config (e.g. extra deps).

One approach that could be interesting here is additionally allowing Cargo.lock files in the cargo_toml_files attribute, where for each entry in there we'd treat the version as an exact equality constraint (just as if you'd put a Cargo.toml in there with every transitive crate with an exact equality constraint). So if you wanted to exactly reflect one Cargo.toml file you could either put just the Cargo.lock file in that attribute, or both the Cargo.toml and the Cargo.lock, and as long as they stay in sync, everything will Just Work.

I think that would provide the behaviours and guarantees you're looking for?

@UebelAndre
Copy link
Collaborator

One approach that could be interesting here is additionally allowing Cargo.lock files in the cargo_toml_files attribute, where for each entry in there we'd treat the version as an exact equality constraint (just as if you'd put a Cargo.toml in there with every transitive crate with an exact equality constraint). So if you wanted to exactly reflect one Cargo.toml file you could either put just the Cargo.lock file in that attribute, or both the Cargo.toml and the Cargo.lock, and as long as they stay in sync, everything will Just Work.

I think that would provide the behaviours and guarantees you're looking for?

Yup, that'd work for me! Though I still feel like there's a lot of functionality here with hypothetical use cases. I'd definitely appreciate thorough documentation in the initial release 😅

But also, just wanna thank you for putting this together and very much appreciate the work! ❤️ 🙏

@gibfahn
Copy link
Contributor

gibfahn commented Feb 25, 2021

Yup, that'd work for me! Though I still feel like there's a lot of functionality here with hypothetical use cases. I'd definitely appreciate thorough documentation in the initial release 😅

Just for the record, everything that's in these rules is what we needed for our initial (small) set of users internally when we wrote this. This is a 20% project for us, and we very much don't have time to cover hypothetical use-cases 😁 .

@UebelAndre
Copy link
Collaborator

UebelAndre commented Feb 25, 2021

Hypothetical was the wrong word. Some of the use cases here sound odd or less than ideal me being someone who doesn't know exactly what your projects look like and why.

Maybe the only thing I don't understand (which could be based on a bad assumption) is why someone couldn't use a Cargo Workspace in a monorepo scenario, or perhaps why someone would be downloading a crate via some repository rule and injecting it's dependencies into a pool for the current workspace.

@UebelAndre
Copy link
Collaborator

UebelAndre commented Feb 25, 2021

Sorry, just one more comment about #598 (comment) 😅

edit: I've done a bunch of digging and have revised my view of the world.

I think turning cargo_toml_files into cargo_files and allowing the resolver to detect a pair of Cargo.lock and Cargo.toml files is the better thing to do, 100% on board.

The only outstanding thing for me is the generated lockfile. I think this should definitely not be a dump of the BUILD file but instead a raw json dump of the metadata used to generate it. And the resolver should generate unique BUILD.*.bazel files in the repository's directory. I don't think regenerating the BUILD files is an expensive step and It sets up the possibility of consuming transitive rules_rust.lock (Bazel.lock?) files as a future feature request where users can author their dependencies entirely in Bazel and be able to generate a shared universe with the dependencies of transitive workspaces. If this is the case than I'm 100% on board!

Again, super excited! 😄

if lockfile_path != None:
args.append("--lockfile")
str(args.append(lockfile_path))
if repository_ctx.os.environ.get("RULES_RUST_UPDATE_CRATE_UNIVERSE_LOCKFILE", "false") == "true":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rules_jvm_external just added similar functionality, and they're using REPIN=1: bazelbuild/rules_jvm_external#509 (comment)

https://github.com/bazelbuild/rules_jvm_external/blob/ec2c5617b339844312d4adef4400dcc2ccb73c4f/coursier.bzl#L336-L340

How about we just check that the env var is set (not necessarily to true), and use REPIN as well, that way the user can set a single env var to repin everything.

Optionally we could also check for a specific env var like RULES_RUST_REPIN so users can opt-in on a per-language basis, but that's also a non-breaking change to add later, so less important.

Copy link
Collaborator

@UebelAndre UebelAndre Mar 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it'd make more sense to make updating an explicit run action. How else are users expected to generate a lockfile? Perhaps there's also something I'm missing about this workflow though, can you elaborate a bit?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand the use of RULES_RUST_UPDATE_CRATE_UNIVERSE_LOCKFILE now. If the user makes a change to their dependencies, then it won't match the digest in the lockfile and the repository rule will fail. So users will be blocked until they're able to update it by either using this environment variable or commenting out the lockfile. What I still don't understand is how users are expected to extract the lockfile after it's generated. I think there needs to be some run action that will extract and install it, such that users who perform an update would run something like

RULES_RUST_REPIN=1 bazel run @my_crate_universe//:repin

And this would appropriately update the lockfile related to that repository.

@UebelAndre
Copy link
Collaborator

Digging deeper into this. One other concern I have is with cross project code reuse. If a user defines all their dependencies in Bazel, how can those same dependencies be used in another repo that fetches it via some repository rule. What's the mechanism for supporting this?

Copy link
Member

@hlopko hlopko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is some solid work, and hats off, the usability (well modulo lockfiles but I get that) seems really nice. I have no objections to merging this in.

Internally we don't use external repositories, just plain old monorepo, so we won't be able to provide a good feedback.

Personally ever since my Ruby days I always liked to vendor my deps, that's the workflow I use with raze as well. External repos make it annoying to find where the sources are both for me and for editors. Just for the discussion (I'm not asking to reimplement everything :) what do you think about the vendoring workflow? IIUC gazelle supports a vendoring workflow with a trivial bootstrapping (just edit your //:BUILD, add gazelle(name = "gazelle") and bazel run //:gazelle will get you new BUILD files and sources.

"_resolver_script_linux": attr.label(
default = "@crate_universe_resolver_linux//file:downloaded",
executable = True,
cfg = "host",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, there is this remotable attribute (https://docs.bazel.build/versions/master/skylark/lib/globals.html#repository_rule) that tells the repository rule to execute on the remote execution worker. It's useful for detecting toolchains and configuring repos for the exec platform.


Some areas have unit testing, there are a few (very brittle) integration tests, and some examples.

## How to test local changes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry this took me this long.

I don't have any great ideas how to solve bootstrapping :( If I had to get inspiration from the ecosystem, I'd look at and talk with folks at https://github.com/bazelbuild/bazel-gazelle.

environment = {
"RUST_LOG": "info",

# The resolver invokes `cargo metadata` which relies on `rustc` being on the $PATH
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does cargo metadata actually invoke rustc? What for?

I worry that having to use host platform to run cargo metadata will somehow spoil the generated repositories and there will be incompatibilities when host platform is not exactly exec platform. Do you use remote execution with rules rust and crate_universe? Were there ever problems with this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does cargo metadata actually invoke rustc? What for?

It specifically invokes rustc -vV which on my current system outputs:

% rustc -vV
rustc 1.49.0 (e1884a8e3 2020-12-29)
binary: rustc
commit-hash: e1884a8e3c3e813aada8254edfa120e85bf5ffca
commit-date: 2020-12-29
host: x86_64-apple-darwin
release: 1.49.0

I did some tracing through cargo, and it doesn't actually use this output for anything, it just passes it to some methods which conditionally use it in other codepaths. I filed rust-lang/cargo#9295 to try to remove this requirement in cargo.

I worry that having to use host platform to run cargo metadata will somehow spoil the generated repositories and there will be incompatibilities when host platform is not exactly exec platform. Do you use remote execution with rules rust and crate_universe? Were there ever problems with this?

For what we're using it for, cargo resolves the superset of all platforms (which is why you end up with dependencies on things like winapi on all platforms in your Cargo.lock), and only narrows down when it gets around to the actual fetching and building. We are indeed using remote execution with crate_universe and have not run into any issues :)

@@ -0,0 +1,40 @@
# buildifier: disable=load
load(
"@rules_rust//rust:rust.bzl",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about using the "more focused" rules from //rust:defs.bzl?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely - I'll make that change now, these didn't exist when I put together this branch :)

@UebelAndre
Copy link
Collaborator

I'm very concerned about merging something that then can't be change without a large amount of effort and coordination because some users might depend on particular behavior.

Things I feel should be addressed before merging (all equal priority):

  • The lock file should be a pre-rendered json file
  • Using only a cargo.toml + lock file where the lock file is the authority should be supported
  • Bootstrapping should be setup (even if @illicitonion just commits binaries into the repo for the initial release)
  • Documentation is written and generated into the existing site.

@UebelAndre
Copy link
Collaborator

Is there a way this could maybe go into an isolated repo first and allow pull requests to it before merging into the rules?

@tschuett
Copy link

You can add a note: This is still experimental and the API and the implementation may change without notice.

@illicitonion
Copy link
Collaborator Author

Personally ever since my Ruby days I always liked to vendor my deps, that's the workflow I use with raze as well. External repos make it annoying to find where the sources are both for me and for editors. Just for the discussion (I'm not asking to reimplement everything :) what do you think about the vendoring workflow? IIUC gazelle supports a vendoring workflow with a trivial bootstrapping (just edit your //:BUILD, add gazelle(name = "gazelle") and bazel run //:gazelle will get you new BUILD files and sources.

I haven't thought about it much, but I suspect adding a vendor_root path argument to the crate_universe rule could work well here. Whether it shoves information into the lockfile to decide whether it needs to re-vendor, or we introduce a specific runnable target to perform vendoring, the internals could be reasonably easy to make work without ~any API differences. It would end up emitting the generated BUILD files in slightly different locations, and the implementation of the crates_from and similar macros would end up using different prefixes for the dependency labels, but from a user perspective in both the WORKSPACE and their own BUILD files it would be ~identical.

FYI, there is this remotable attribute (https://docs.bazel.build/versions/master/skylark/lib/globals.html#repository_rule) that tells the repository rule to execute on the remote execution worker. It's useful for detecting toolchains and configuring repos for the exec platform.

This could be interesting, at least to expose as an option... But at least with how we have things set up where I am working, the place we invoke bazel has access to (a mirror of) crates.io, and our remote build workers shouldn't, so running the resolver remotely would be problematic :) Maybe something to look at in the future!

Bootstrapping, and stability

How about for now, we merge this with a warning that it's unstable, and also, without solving bootstrapping? That way, someone who wants to try it out will need to actively build the resolver and somehow pull it in as a remote repository file, which goes some way to emphasising that the feature is experimental?

This rule dynamically looks up crate dependencies as a repository rule.

See bazelbuild#2 for context.
This is more clear, as they don't need to round-trip via two .bzl files,
so the examples are self-contained. However, we'll need to work out how
to make them build on CI.
@UebelAndre
Copy link
Collaborator

I would still prefer to put it in a separate repo or have my concerns addressed but I understand it's not my call.

I do not think in it's current state it's practically usable for most users. This is something that I'd like to iterate on but will be pretty demotivated if I continue to have to deal with supporting legacy behavior from the initial commit. My push back is an attempt to avoid that situation entirely and find a solution where the bare minimum feature set is met or something is setup to iterate with exclusively future facing changes.

@gibfahn
Copy link
Contributor

gibfahn commented Mar 24, 2021

This is something that I'd like to iterate on but will be pretty demotivated if I continue to have to deal with supporting legacy behavior from the initial commit.

This sounds like the project (rules_rust) needs to be clear on what guarantees it's giving this, if it's experimental then there are no guarantees and things can change. Is there something you think we could do to make it more obviously experimental? I think the fact that you have to build your own resolver makes it fairly hard not to notice the experimentalness of it.

Personally I want this to land so it's easier to iterate on, definitely not because it's stable.

I do not think in it's current state it's practically usable for most users.

If so that should make breaking changes easier.

@UebelAndre
Copy link
Collaborator

This could be merged int an "experimental" directory. Rules Python did this when upstreaming rules_python_external but I don't know how if that ended up being the best thing. If there were an experimental directory, we could probably merge more things into it to make it easier to iterate on? I don't know why there's a desire to merge this so it can be iterated on but not things like the Prostgen rules or the persistent worker or something else. Could something be added to ARCHITECTURE.md to provide a clear place for merging experiments?

@tschuett
Copy link

How about updating the README.md instead and hinting at the unstable new feature.

* Update to use defs.bzl
* Fix up some tests
* Remove pseudo-attempt at bootstrapping
@illicitonion
Copy link
Collaborator Author

I've added the following note to the docstring of the crate_universe rule:

    This is currently highly experimental, and subject to breaking API changes without notice.

    In order to actually use this rule, you will need to make available a built version of the rust binary in `cargo/crate_universe_resolver`
    as an http_file with the name crate_universe_resolver_linux or crate_universe_resolver_darwin (whichever is appropriate for your platform).

I'm happy for us to make every commit to crate_universe a breaking change until things stabilise a bit (currently I expect to be the only person who suffers from instability there!) - let's work together to make it something we want to settle down on together :)

@illicitonion illicitonion merged commit dd7e458 into bazelbuild:main Mar 25, 2021
@illicitonion illicitonion deleted the crate_universe branch March 25, 2021 10:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants