Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Rust code integration #17090

Open
fanquake opened this issue Oct 9, 2019 · 18 comments

Comments

@fanquake
Copy link
Member

@fanquake fanquake commented Oct 9, 2019

Moving the discussion from #15798 here, as it makes more sense to have it in an issue instead of a PR. #15798 no longer reflects the current proposed Rust changes, and Rust related work is now happening across multiple others.

Corys thoughts / introduction from #15798:

This is not intended to be merged as-is, but instead to serve as a reference for anyone who might be interested in trying out some rust code inside of Bitcoin Core. I have no idea what works. I have lots of questions about debugging, threading, etc. But instead of trying to hack and document how things work, we thought it'd be fun for everyone to be able to poke at it and scratch our heads together :). If something interesting comes out of it, a discussion about merging can happen then.

It is surprisingly functional. The rust tools are impeccable. I would've thought this would be a project that would take months/years, but the rust devs have done such a good job that mostly everything already just works. The gitian descriptors have been modified to actually incorporate working rust code. All of our currently supported platforms seem to already work (even macOS cross!) with the exception of 32bit windows and ancient linux distros. The specific issues are documented in the gitian descriptors. For now, Gitian needs net access and a larger disk volume, but those are both very temporary issues

Why rust? I don't know. Maybe not. But I think it's a fair assumption to say that Rust code will eventually end up in Bitcoin Core as the result of adding a new dependency. Adoption is happing quickly. So, I think it prudent to look ahead and not only be prepared, but actively help usher it in. Already I have a few things that I would like to work on and upstream to rust-lang to make our integration nicer, and I assume that more will be uncovered as it actually gets used.

Anybody have any fun ideas?

There's still discussion to be had / documentation to be added around various Rust related decisions, such as:

  • Whether to use/require the 2015 or 2018 Editions.
  • A minimum required version of the language.
  • How failures in the Rust code are (expected to be) handled by the c++ code.

As well as higher level discussions around how far we might take the Rust integration/re-writing of certain parts of the code, and the complexities that could introduce in regards which contributors can/can't write/review Rust, or have more/less experience with the language compared to C++ etc.

Personally, I'd like to think that we'll have some off-by-default Rust as part of the Bitcoin Core 0.20.0 release; and I think something like #16834 might be a good first approach. Regardless of approach, the idea of using Rust inside Bitcoin Core seems to resonate with some of the (still limited set) of other contributors I've discussed it with.

Using rustc directly instead of Cargo

The initial Rust integration PR used Cargo for managing compiling, dependencies etc. Since then, the Rust changes have been refactored to remove the cbindgen dependency and drop any usage of Cargo altogether, in favour of using rustc directly.

See also the recent Security advisory for Cargo.

Related PRs

Fetch Headers over DNS
Rust-based Backup over-REST block downloader
Add Parallel P2P Client in Rust

Related Reading

Bootstrapping Rust using Guix

@TheBlueMatt

This comment has been minimized.

Copy link
Contributor

@TheBlueMatt TheBlueMatt commented Oct 9, 2019

A number of comments:

  • Obviously my favorite path to rust in core right now is the "make it easy to add new modules to download blocks/headers for anti-censorship", which #16834 and #16762 obviously are first steps/demos of. I think its nice cause it is completely optional, both at build-time, and in the sense that we can have threads that are doing that crash (without bringing down the rest of Core due to panic catches) without it making the rest of Core broken.
  • I think we should air towards "minimum version of the language we can get by with" until we have a reason not to, and we can discuss it more then. Largely, I think #16834 is ready-to-go enough (mod review) that we can pretty much move forward with it without committing to a lot of these decisions today.
  • WRT cargo-vs-rustc: I don't see a lot of reason to use cargo. The major benefit it provides is downloading arbitrary code from github, building it, and linking it into your program. Obviously this isn't helpful for us, and we'll almost certainly subtree any dependencies we take, and I went and tested linking dependencies that are subtree'd without cargo and its just a few more flags to rustc.
@MarcoFalke

This comment has been minimized.

Copy link
Member

@MarcoFalke MarcoFalke commented Oct 9, 2019

Be reminded that there has been an IRC discussion about this topic: http://www.erisian.com.au/bitcoin-core-dev/log-2019-04-11.html#l-392
There was no strong support for rust, but I'd like to reiterate on a few worries that have been brought up:

On concern was that it might be a blocker for compiling Bitcoin Core on exotic systems, but that seems to be addressed by only using it for additional (optional) features.

Another concern was that most Bitcoin Core developers and reviewers don't have any background in rust, so rust code might be of lower quality. However, this is a chicken-and-egg problem and can be solved by adding rust in small steps.

Finally, a concern was that we aren't running into many bugs that rust could prevent. Rust can prevent memory and race issues, but not logic errors. While I initially agreed with this point, I concluded that switching to rust will help us with memory issues (at least for the code written in rust). Just to dig up some examples from master, that would have been impossible in rust: #6636, #16796, #13351. Happy to dig out more, but I think they are good illustrations. On top of that, I (and other people, such as a resident in the recent Chaincode Summer residency) am running into segfaults and other memory issues on a regular basis. Those runtime errors require a lot of developer attention (running in gdb, valgrind, sanitizers). Rust makes them impossible and might thus speed up the workflow.

@TheBlueMatt

This comment has been minimized.

Copy link
Contributor

@TheBlueMatt TheBlueMatt commented Oct 9, 2019

Right, I'll note that previous discussions had a little less clear projects that might use rust in Core. The specific work in #16834 and #16762 is pretty different in nature.

I agree with most of your concerns, but, indeed, as you point out, the specific proposals sidestep them (at least in the immediate future) somewhat. The advantage I like for Rust in this vein is that review is a little bit easier by offloading some classes of issues onto the compiler, hopefully allowing us to move a bit faster, which is uniquely important for censorship-resistance and -detection block/header fetching (as more is better, and critical).

@laanwj

This comment has been minimized.

Copy link
Member

@laanwj laanwj commented Oct 10, 2019

FWIW I'm all for using Rust for non-required, non-critical functionality in Bitcoin Core. I think it's way too early to require it for build.

Another concern was that most Bitcoin Core developers and reviewers don't have any background in rust

Given what I've heard a lot of bitcoin developers have side-projects in rust. But as for a personal anecdote, I prefer reviewing rust code over C++. I've been very much disillusioned with C++ over the last years, especially the eternal issues around what is Undefined Behavior and what is not (just grep for commits and PRs by @practicalswift to find tons of them).

Finally, a concern was that we aren't running into many bugs that rust could prevent.

Also because we very much tiptoe around things that might cause those kind of bugs. We've been extremely careful around introducing concurrency, for example, or refactoring it to be finer-grained (e.g. @TheBlueMatt's PRs are just so scary to review), because it's so hard to get correct in C++. And even then, there have been a few nasty race conditions. Rust could allow us to be bolder around those things, resulting in more performant code in the end.

@practicalswift

This comment has been minimized.

Copy link
Member

@practicalswift practicalswift commented Oct 10, 2019

@MarcoFalke

Finally, a concern was that we aren't running into many bugs that rust could prevent. Rust can prevent memory and race issues, but not logic errors.

FWIW, a small scale bug shootout:

Both implementations written by extremely skilled and security conscious developers :)

@laanwj

I've been very much disillusioned with C++ over the last years, especially the eternal issues around what is Undefined Behavior and what is not (just grep for commits and PRs by @practicalswift to find tons of them).

Three points:

  1. The UB situation in C++ is a problematic, but with some help from the sanitizers (which we are finally using!) it is not too hard to avoid relying on UB in modern C++ code. FWIW, poorly written unsafe Rust code will have UB too :)
  2. Usually it is pretty clear if something is UB or not. Any discussion is typically around the priority of fixing, and/or if relying on the specific UB is "safe in practice" (in other words: what is the likelihood of a future compiler optimisation pass exploiting this specific UB?).
  3. As always when discussing defects: don't shoot the messenger :)

FWIW I don't think UB is a big problem in our code base as of today. I think our lack of continuous large-scale fuzz testing is a larger problem - something which I plan to address :)

@elichai

This comment has been minimized.

Copy link
Contributor

@elichai elichai commented Oct 10, 2019

I agree with most of what was said here with a few exceptions:

  1. Even though I don't know of any security critical bug bitcoin had that was memory related I think in the long term rust can help a lot with both review time and implementation time.
    Because right now when someone opens a PR with a function that for example accepts a raw pointer part of the review is to look everywhere that function is used and make sure there's no way to get to a dangling pointer or to accidentally dereference a null pointer.
    All of these problems are no-ops in safe rust. so even if currently the bitcoin review process is good at eliminating most of these problems Rust can help save review time.

  2. Personally and I'll guess for other non C++ experts, writing C++ sometimes feel scary because you're not 100% there's no UB in your code and there's not some edge case with memory issue, this problem is mitigated with sanitizers, valgrind and good review process. but it's better to know your code does what you think it does. (again this is only about memory un-clearity not logic)

Now about rust itself:

  1. I think that if we plan to make this experimental and optional for a while i'll go for the latest rust compiler available right now. because when and if it will ever be an inherit part of bitcoin mrustc will probably support newer versions(already at the verge of 1.29 support[1] ) and maybe even a gcc frontend will arise by then[2]

  2. I do understand the idea of not requiring any version newer than what is shipped with latest distros, but I think that any safety feature we'll miss because of "supporting the oldest compiler possible" will make us miss the entire point of using rust.

  3. I see no point in using Edition 2015, as Edition 2018 was stabilized in Rust 1.31. That said this is only an esthetic change, so I don't have strong feelings about it.

  4. Cargo vs. rustc. I think that as long as we can get away with using rustc directly it's fine, but I don't think this will work for the long term.

  5. Combining point 4 and 1, I think that when we will want cargo and when we will have dependencies then we will want a feature like cargo vendor which was added to cargo in 1.37 so that's another reason for using an up to date rust version. on top of that the mentioned security advisory(I found btw) is an example on why it's problematic to use old versions of fast evolving tools. I hope the rust and cargo team will learn from that experience for future features but I would argue that we should try and use an up to date rust version.

Quick simple example of dependencies we will want to use if we continue to integrate rust is cbindgen and rust-bindgen as FFI calling is a very easy way to introduce UB that no compiler will warn about (if the rust and C declarations won't match correctly).

P.S. If people want I can try and give a list of examples of features we might need that aren't available in old rustc versions. and that's without talking about changes to the borrow checker that makes writing rust code way easier.

[1] thepowersgang/mrustc#95 (comment)
[2] https://users.rust-lang.org/t/call-for-help-implementing-an-independent-rust-frontend-for-gcc/32163

@MarcoFalke

This comment has been minimized.

Copy link
Member

@MarcoFalke MarcoFalke commented Oct 10, 2019

I do understand the idea of not requiring any version newer than what is shipped with latest distros, but I think that any safety feature we'll miss because of "supporting the oldest compiler possible" will make us miss the entire point of using rust.

Another point to consider here is whether we want to ship it with the official binaries (either gitian running on Bionic or guix). If we don't ship it and it is disabled by default when compiling from source, you could argue that no one is using it, nor testing it. So we might save us from the hassle by not adding it in the first place?

@elichai

This comment has been minimized.

Copy link
Contributor

@elichai elichai commented Oct 10, 2019

@MarcoFalke what about other optional compilations? (i.e. upnp) are they shipped in binaries?

FYI currently this is what bionic has in the repos: https://packages.ubuntu.com/bionic/rustc. 1.36 is pretty up to date. and it seems like most distros are maximum 1-3 versions behind (the speculation for why are they so up to date is because they need to compile firefox heh)

@TheBlueMatt

This comment has been minimized.

Copy link
Contributor

@TheBlueMatt TheBlueMatt commented Oct 10, 2019

Right, my preference would be off-by-default-and-in-release-binaries in 0.20, but hopefully (pending guix work and mrustc-in-guix) we can change both of those in 0.21. Note that just because we support older compilers doesn't mean we build by default with older compilers in guix/release binaries.

@laanwj

This comment has been minimized.

Copy link
Member

@laanwj laanwj commented Oct 10, 2019

FWIW, poorly written unsafe Rust code will have UB too

Yes, I don't think anyone doubts that. The point is that if you write rust code without unsafe, you can rely that it won't have certain issues (excepting bugs in rust's compiler and runtime itself). The parts that are unsafe (if needed at all) automatically draw attention.

I think our lack of continuous large-scale fuzz testing is a larger problem - something which I plan to address

Fuzz testing is good in any language, be it C++, C, Rust, Python … using a different language doesn't change anything in the need for testing.

Another point to consider here is whether we want to ship it with the official binaries

That should be the goal.

@TheBlueMatt

This comment has been minimized.

Copy link
Contributor

@TheBlueMatt TheBlueMatt commented Oct 10, 2019

Another point is I dont think we need to fully decide this now. While I'm really skeptical of requiring rustc newer than 1.34 (Debian buster/stable for the next many years), we can start with that and revisit the question as we have newer, larger features that may want more features.

As for cbindgen/rust-bindgen, I'm really dubious of the value of using them at build-time, but a really cool project may be running them on travis as a linter to check that the checked-in bindings are "correct".

@jgarzik

This comment has been minimized.

Copy link
Contributor

@jgarzik jgarzik commented Oct 10, 2019

An Ode To Simplicity

Rather infamously, in a recent talk, a developer related the story of a manager who chose to reward programmers based on LOC committed. One engineer responded by making an extra effort to go through the company's codebase and delete code, ensuring they would rank at the very bottom of the chart, with negative LOC each month.

From the standpoint of must-be-secure high quality code, less is more. Less code probablistically produces fewer bugs, fewer LOC to review and analyze, and is easier to prove secure.

This was the origin of the --disable-wallet compile time flag. This permits eliding GUI and wallet code, focusing on the network-security-sensitive kernel as the minimized runtime attack surface for validating nodes.

Integrating two languages tends to go in the opposite direction from complexity reduction. FFI and unsafe rust and hidden sand traps galore. It takes another 10 years to move the C++/Rust boundary to 75+% rust.

Trying to take a step back and think holistically,

  1. There is a numerically large userbase for "the kernel" - the network validating engine - and numerically small userbase for the GUI and wallet.
  2. The kernel should be in a repo by itself, without a kitchen sink of other code such as wallet or GUI. Those accessories can live in git repos that use the kernel repo as a submodule. This creates incentives for the kernel repo (and issues and workflow) become more simple, less complex over time.
  3. Build a separate, safe Rust validating engine. Run in parallel with C++ to observe A/B differences in field operation.
  4. Migrate wallet/GUI to use REST/zmq/etc. interfaces.
  5. At this point, switch to Rust engine can safely be done.

End result in medium term is a less complex core.
End result in longer term is less complex core... in Rust.

@TheBlueMatt

This comment has been minimized.

Copy link
Contributor

@TheBlueMatt TheBlueMatt commented Oct 10, 2019

@jgarzik It looks like you missed all the current context and didn't really add much with your comment. Please go read #16834 first. Specifically, no one is proposing, nor even really considering doing any kind of validation in Rust, nor migrating existing code towards it.

@jgarzik

This comment has been minimized.

Copy link
Contributor

@jgarzik jgarzik commented Oct 10, 2019

Context was not missed. Re-read assuming this context is known, looking at another 10 years of development, and the author is observing that this is a choice point in development, where the ship could be steered towards reduced or greater complexity.

@TheBlueMatt

This comment has been minimized.

Copy link
Contributor

@TheBlueMatt TheBlueMatt commented Oct 10, 2019

This was discussed extensively at the meeting, conclusions as I understand them are as follows:

  • Target Rust 1.34.2 for now. If we end up needing something more, we can revisit then, but as an initial target it should do (probably implies we need to test it in Travis in #16834).
  • No direct conclusion on rustc-vs-cargo, but for now I don't see any strong disagreement to move forward with rustc and be willing to revisit in the future.
  • Be willing to ship on-in-release-binaries immediately in 0.20, using gitian/Canonical's rustc as it doesn't (massively) change our trust model within Gitian of trusting the Canonical toolchain, instead of gating on the ongoing Guix work. Note that this (obviously) will result in release builds using a newer rustc than our "minimum supported".
  • No significant objection to moving forward here!
@JeremyRubin

This comment has been minimized.

Copy link
Contributor

@JeremyRubin JeremyRubin commented Oct 10, 2019

As for cbindgen/rust-bindgen, I'm really dubious of the value of using them at build-time, but a really cool project may be running them on travis as a linter to check that the checked-in bindings are "correct".

This is a good point in particular -- I agree that cbindgen outputs should be committed. Prefer linting to happen during the regular local build/testing though rather than just in Travis, and think that we should have tooling which makes it simple to emit bindings to be committed to keep development flow simple. We should also be careful to not allow bindings which aren't generatable from CBindgen, so even if we aren't relying on it for build we can be sure it's in line with what could be machine generated.

@elichai

This comment has been minimized.

Copy link
Contributor

@elichai elichai commented Oct 10, 2019

oh I 100% agree that if and when we'll use cbindgen/rust-bindgen the output should be committed to track changes. we could also run them as commands manually(or as part of some special make flag) every release.
don't need to have it in a build.rs(or it can be an optional feature in the build.rs)

@elichai

This comment has been minimized.

Copy link
Contributor

@elichai elichai commented Nov 4, 2019

Saw this: #17208 (comment)
So something to note, is that rust requires floating points to be IEEE-754.
(though it does support platforms without floats at all using llvm's soft-float)
https://doc.rust-lang.org/reference/types/numeric.html#floating-point-types

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.