Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persistent worker for Rust rules #412

Open
nikhilm opened this issue Sep 20, 2020 · 13 comments
Open

Persistent worker for Rust rules #412

nikhilm opened this issue Sep 20, 2020 · 13 comments

Comments

@nikhilm
Copy link

nikhilm commented Sep 20, 2020

Hello,

I have a simple persistent worker for Bazel that I've been playing with on some small projects.

Would the Rust rules be interested in integrating this? Since rustc does not support Java/TypeScript/Scala like "Compiler as an API", this really only takes advantage of in-crate incremental compilation introduced in rust 1.24. This is achieved by passing --codegen=incremental=/path/to/dir. The directory is named using a combination of hash(path to rustc wrapper) and the workspace name, with the goal of:

  1. Being shared across multiple worker instances when they are spawned by Bazel. This is why one cannot use something like the tempdir crate (would be deleted each time a worker shut down) or just a randomly named temp file (would not be shared across worker processes).
  2. Not being shared across workspaces.
  3. Not being shared across compiler updates. This should probably actually depend on the path to rustc itself, instead of the wrapper. I can fix that.

The questions I have are:

  1. Is this kind of cache sharing sound design? I don't know enough of Bazel internals nor rustc internals to have high confidence in answering yes to this.
  2. Since this project is itself in Rust, how would something like this be integrated into the rules? Options:
    1. Use pre-built binaries built with Cargo. (What I'm doing now for simplicity.)
    2. Re-write in C++.
    3. Have certain internal switches to the Rust rules to build the worker itself using the rustc wrapper directly.

If this all sounds reasonable, I can send relevant PRs.
In particular, it would also be nice to have the worker and the current process wrapper combine their functionality so that an additional process spawn is saved (Go from worker -> process wrapper -> rustc to worker -> rustc.

@nikhilm
Copy link
Author

nikhilm commented Sep 20, 2020

If we are to follow the behavior of Cargo exactly, we would also want to include the compilation mode in the incremental cache path (to reflect the Cargo default of target/{debug,release}/incremental.

@damienmg
Copy link
Collaborator

damienmg commented Sep 21, 2020

That sounds like a great option to offer better build speed.

@mfarrugi
Copy link
Collaborator

mfarrugi commented Sep 21, 2020

Is this kind of cache sharing sound design? I don't know enough of Bazel internals nor rustc internals to have high confidence in answering yes to this.

I think 'sound' is a pretty low bar, depending on what you mean. Reproducibility might be a good measure.

https://doc.rust-lang.org/rustc/codegen-options/index.html#incremental is not very specific, I assume that directory is per-crate? I don't see why avoiding sharing across workspaces is relevant, since the cache should be f(crate, rustc, crate-deps) which could overlap in different workspaces.

Since this project is itself in Rust, how would something like this be integrated into the rules? Options:
Use pre-built binaries built with Cargo. (What I'm doing now for simplicity.)
Re-write in C++.
Have certain internal switches to the Rust rules to build the worker itself using the rustc wrapper directly.

I think that bootstrapping is fairly straight forward by making using the worker (internally?) optional, and disabling it when building the worker and it's dependents.


Last time I looked, there were still reproducibility issues w/ rustc builds, and I doubt incremental compilation helps. In the worst case, if incremental building reduces reproducibility, it is possible that output changes trigger more downstream re-building and incremental builds ultimately don't save any time.

I suspect that right now most rustc builds are going to rebuild all dependents anyways (though again, I have not looked into this recently), so it's very likely this is an improvement.

I also vaguely recall there being a performance trade-off to using --codegen-units, and have seen some places set this to 0.

Also, #228 is tangentially related as another use-case for a worker.

@nikhilm
Copy link
Author

nikhilm commented Sep 22, 2020

https://doc.rust-lang.org/rustc/codegen-options/index.html#incremental is not very specific, I assume that directory is per-crate? I don't see why avoiding sharing across workspaces is relevant, since the cache should be f(crate, rustc, crate-deps) which could overlap in different workspaces.

It also probably depends on the compilation mode, since Cargo tends to keep different dirs - target/{debug,release}/incremental.
Is there a way to retrieve something similar to the crate-deps hash in Bazel, or should I simply hash all the deps?

I'll try the bootstrapping option and send a PR. Would you prefer 2 different PRs, one that simply added the worker code and another that actually enabled the build + integration?

Certainly even sandboxed rustc does not produce reproducible builds right now (tested on Linux and I have previous experiences on Windows), so I don't think this is a net negative.

Thanks for the inputs!

@nikhilm
Copy link
Author

nikhilm commented Sep 22, 2020

Is there a way to retrieve something similar to the crate-deps hash in Bazel, or should I simply hash all the deps?

I don't need to handle this as rustc already takes care of the unique hash derivation after it is passed an incremental directory. It is quite likely that the implementation also takes care of capturing things like build flags, as it seems to propagate dependency information down the tree, so I've chosen to only combine the rustc path and the compilation mode.

@nikhilm
Copy link
Author

nikhilm commented Sep 23, 2020

@mfarrugi Is there a better way to implement the following than what I'm thinking?

I think that bootstrapping is fairly straight forward by making using the worker (internally?) optional, and disabling it when building the worker and it's dependents.

Specifically, I'm having trouble with "and it's dependents". Since those are raze generated and use rust_library, I can't create my own rust_library_no_worker internal rule. The problem is, even to specify the worker executable to use as an _worker_executable property, I need an attrs that is completely different for my worker build rules and dependencies. Is there some way to control the behavior of dependencies?

@mfarrugi
Copy link
Collaborator

mfarrugi commented Sep 23, 2020

@nikhilm
Copy link
Author

nikhilm commented Sep 24, 2020

Yes, this is certainly hampered by the need to build dependencies. :(
I may have to just rewrite the relevant bits in C++ or something.
Let me look into how easy it is to tweak raze.
Editing the generated BUILD files sounds ugly.

@nikhilm
Copy link
Author

nikhilm commented Sep 24, 2020

What is the minimum Bazel version these rules need to support?

@mfarrugi
Copy link
Collaborator

mfarrugi commented Sep 24, 2020

@nikhilm
Copy link
Author

nikhilm commented Oct 3, 2020

Sorry, I've been tending to real life for a while.
I poked around at using build settings and transitions last night, including label build settings, hoping to bootstrap the worker, but hit a dead end. I'm going to try editing the BUILD files now and then file a raze issue.

@nikhilm
Copy link
Author

nikhilm commented Jan 14, 2021

I've been out of town since mid December and will continue to be so for another 3 weeks. I don't have access to my development machine until then. I'll check back on some of this once I'm back.

@dae
Copy link
Contributor

dae commented Mar 27, 2021

Minor changes were required to get this going with the latest rules_rust - they're available here if anyone needs them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants