Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling from Rust to C++ is not safe #1

Closed
bjorn3 opened this issue Jan 8, 2020 · 17 comments
Closed

Calling from Rust to C++ is not safe #1

bjorn3 opened this issue Jan 8, 2020 · 17 comments

Comments

@bjorn3
Copy link

bjorn3 commented Jan 8, 2020

The C++ side may do arbitrary unsafe things like dereferencing dangling pointers.

@dtolnay
Copy link
Owner

dtolnay commented Jan 8, 2020

100% of C++ code is unsafe. When auditing a project, you would be on the hook for auditing all the unsafe Rust code and all the C++ code. The core safety claim under this new model is that auditing just the C++ side would be sufficient to catch all problems, i.e. the Rust side can be 100% safe.

@bjorn3
Copy link
Author

bjorn3 commented Jan 8, 2020

I understand. Making the rust side safe could make for example cargo-geiger think there is nothing to audit though, as it doesn't peek into the macro expansion.

@dtolnay
Copy link
Owner

dtolnay commented Jan 8, 2020

The user of cargo-geiger would need to know that if there is non-Rust code in the project then Rust's safety guarantees obviously do not apply to the non-Rust code.

I don't know much about cargo-geiger but hopefully there would be some way for it to flag the presence of non-Rust code.

@YaLTeR
Copy link

YaLTeR commented Jan 8, 2020

There's an interpretation of safe Rust code which allows to draw a clear boundary of what should be marked unsafe, which is that it should not be possible to cause undefined behavior with safe Rust code. Do I understand correctly that the intended usage of cxx is to write the C++ side in such a way as to satisfy this Rust side requirement, rather than marking the functions unsafe from the Rust side?

@dtolnay
Copy link
Owner

dtolnay commented Jan 8, 2020

There's an interpretation of safe Rust code which allows to draw a clear boundary of what should be marked unsafe, which is that it should not be possible to cause undefined behavior with safe Rust code.

That's certainly the case here. You can't cause undefined behavior with just safe Rust code. You need to add some unsafe C++ code in order to get undefined behavior.

Consider the following: suppose you were using traditional bindgen to expose an unsafe FFI which you then wrapped in a safe Rust API, as is common practice. That doesn't give you the right to later mess up the C++ code arbitrarily badly without getting UB...

With CXX you can mess up the Rust code arbitrarily badly without getting UB, so the Rust code is safe.

@bjorn3
Copy link
Author

bjorn3 commented Jan 8, 2020

That doesn't give you the right to later mess up the C++ code arbitrarily badly without getting UB...

Yes, but in that case you uses unsafe {} to claim that the C++ doesn't cause UB. With cxx you can call C++ code which causes UB without ever using the unsafe keyword.

@dtolnay
Copy link
Owner

dtolnay commented Jan 8, 2020

I understand what you're saying but I don't find it to be a distinction that would be valuable for maintaining correctness of a codebase in practice.

My approach is as follows:

  • Fact 1: unsafe is the minimal set of code that you need to audit to discover all sources of undefined behavior.
  • Fact 2: all C++ is unsafe.
  • Fact 3: if you don't need to audit the Rust code, it doesn't need to be unsafe.

@anderejd
Copy link

anderejd commented Jan 8, 2020

Hello, I'm the author of the tool cargo-geiger. Thank you @dtolnay for yet another significant open source Rust contribution, your work is amazing, period. This approach to C++ FFI looks wonderful from my perspective.

@bjorn3 I think I can see the point you are making, that it should be easy to find out if a crate depends on any unsafe code including FFI? Let's continue that discussion including detection of unsafe code using macro expansion over at the cargo-geiger repos.

@ogoffart
Copy link

ogoffart commented Jan 9, 2020

But by that definition, extern functions should not be unsafe. But they are:

extern {  fn init_stuff(); }

fn main() {
     init_stuff(); // Error:  call to unsafe function is unsafe and requires unsafe function or block
}

Why is calling init_stuff() unsafe? I could claim that calling it is not unsafe because it is the C code that need to be audited.

Repository owner deleted a comment from zamazan4ik Jan 9, 2020
@RalfJung
Copy link

RalfJung commented Jan 10, 2020

Fact 1: unsafe is the minimal set of code that you need to audit to discover all sources of undefined behavior.

That's already not true, though. However I have to admit that this particular refutation has little bearing on the discussion at hand.


I think @bjorn3 is following the interpretation that when running a Rust program, all possible UB must arise from some unsafe block. This property is violated by cxx, because C++ code is not inside a literal unsafe block. Usually, writing unsafe { extern_fn() } is the place where the proof obligation for the C++ code occurs.

Consider the following: suppose you were using traditional bindgen to expose an unsafe FFI which you then wrapped in a safe Rust API, as is common practice. That doesn't give you the right to later mess up the C++ code arbitrarily badly without getting UB...

No, it doesn't -- so? The unsafe block on the Rust side crucially marks the place where the proof obligation is checked, and of course if anything that is relied upon by the corresponding proof changes, it needs to be re-checked. This is true even for pure Rust code: calling some unsafe fn in a correct way does not give you the right to later change that unsafe Rust code and change its safety contract; you have to re-check all uses at that point.


@dtolnay on the other hand views C++ code as implicitly carrying an unsafe block all around it.

I can understand this view, but I think the issue with this is the lack of a clear proof obligation. unsafe blocks aren't about the exact auditing scope (see above); they are all about marking where exactly which proof obligations arise. (This is consistent with my blog post: the proof obligations for set_len start at some unsafe block in Vec that would cause UB with a incorrect length; that obligation is discharged via Vec's datastructure invariant, and that invariant in turns puts an obligation on the entire module.)

With cxx, what has to be proven about the C++ code is determined very non-locally by the cxx bridge file. The C++ type system is too weak to tell you what you have to prove. Crucially, that cxx bridge file doesn't even have any unsafe in it so it is not clear that there is some proof obligation being determined here. (This is basically the point @ogoffart is making above as well; I wonder what your thoughts on that are @dtolnay.)

@bjorn3
Copy link
Author

bjorn3 commented Jan 10, 2020

I think @bjorn3 is following the interpretation that when running a Rust program, all possible UB must arise from some unsafe block.

Indeed

Crucially, that cxx bridge file doesn't even have any unsafe in it so it is not clear that there is some proof obligation being determined here.

Maybe add a #[unsafe_assume_no_ub] (anyone a better name?) which makes the function safe to call, with the default being unsafe when #[unsafe_assume_no_ub] is not used?

@dtolnay
Copy link
Owner

dtolnay commented Jan 10, 2020

@RalfJung I think your post you linked reinforces my perspective. The emphasis is that "unsafe is the minimal set of code that you need to audit to discover all sources of undefined behavior. As with the Vec implementation you talk about, looking at one piece of unsafe code can expose other code not necessarily bearing the unsafe keyword that also needs to be considered, but through the process you discover all problems and can convince yourself that the whole system is good.

Your wording of "C++ code as implicitly carrying an unsafe block all around it" is accurate. This is just how that language works, and how audits of FFI to it always work. https://www.reddit.com/r/rust/comments/elvfyn/ffi_like_its_2020_announcing_safe_ffi_for_rust_c/fdm8olr/ covers this in more detail from the perspective of real practical users of FFI. C++ isn't about clear formal proof obligations, it relies on the programmer to understand what they are looking at in broader context; this is pervasive to the language and applies here.

I would add that in addition to C++ code carrying an implicit unsafe block all around it, Cargo build scripts also always carry an implicit unsafe block all around and must be audited to discover the necessary understanding of the system and its safety requirements.

@dtolnay
Copy link
Owner

dtolnay commented Jan 10, 2020

Moderator note: I am deleting comments because GitHub's linear comment structure makes it unmanageable to hold a complex forking discussion here. Discussions that are not the dominant thread would be better to hold here.

Repository owner deleted a comment from jsgf Jan 10, 2020
Repository owner deleted a comment from bjorn3 Jan 10, 2020
Repository owner deleted a comment from bjorn3 Jan 10, 2020
Repository owner deleted a comment from fpoli Jan 10, 2020
@dtolnay
Copy link
Owner

dtolnay commented Jan 10, 2020

But by that definition, extern functions should not be unsafe.

extern { fn init_stuff(); }

Why is calling init_stuff() unsafe? I could claim that calling it is not unsafe because it is the C code that need to be audited.

In the case of a handwritten extern, the two reasons it's unsafe are:

  • You might have the wrong signature. You don't get to mess up the Rust side arbitrarily badly (after reviewing what is being exposed from C++) without getting UB. CXX solves this.

  • Not all C++ functions with the type void (*)() are free of proof obligations.

CXX is designed for C++ functions that are fine to call with any possible value of their arguments and this would be something that an audit of the C++ side of the bridge would be responsible for verifying (but this definitely should be called out better in the documentation).

@dtolnay
Copy link
Owner

dtolnay commented Jan 10, 2020

I think @bjorn3 is following the interpretation that when running a Rust program, all possible UB must arise from some unsafe block. This property is violated by cxx, because C++ code is not inside a literal unsafe block. Usually, writing unsafe { extern_fn() } is the place where the proof obligation for the C++ code occurs.

I don't find this to be a practical litmus test nor helpful for upholding the correctness of a codebase in practice. I've called out that all C++ and all build scripts implicitly carry the unsafe you are looking for. To drive this home, consider:

// build.rs

fn main() {
    let f = curl("https://gist.github.com/dtolnay/.../mystery.rs");
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    fs::write(out_dir.join("mystery.rs"), f).unwrap();
}
// src/lib.rs

include!(concat!(env!("OUT_DIR"), "/mystery.rs"));

What do we grep for to find the undefined behavior in this project? The solution isn't to make curling unsafe, or fs::write unsafe, or include! unsafe. I would see that as grasping to justify an incorrect litmus test.

Repository owner deleted a comment from Type1J Jan 11, 2020
@0xd34d10cc
Copy link

CXX is designed for C++ functions that are fine to call with any possible value of their arguments

Does CXX codegen automatically prove this? If not, why generated bindings are safe?

@RalfJung
Copy link

The emphasis is that "unsafe is the minimal set of code that you need to audit to discover all sources of undefined behavior.

I would extend this to say "to discover all sources of undefined behavior and the proof obligations to avoid it". And that's where cxx's approach falls short: even if we accept that C++ has an implicitly unsafe block all around it, that tells us nothing about what the actual proof obligation is.

And you are right that build scripts (and most forms of code generation) break this. There's not much we can do about that. Item-level proc-macros sometimes have a similar problem, and the usual approach seems to be to call them something with unsafe in their name -- that would match @bjorn3's proposal of an #[unsafe_assume_no_ub] attribute.

CXX is designed for C++ functions that are fine to call with any possible value of their arguments and this would be something that an audit of the C++ side of the bridge would be responsible for verifying (but this definitely should be called out better in the documentation).

The part about any possible value might be worth calling out more explicitly in the docs, maybe?

Repository owner locked and limited conversation to collaborators Apr 4, 2020
@dtolnay dtolnay closed this as completed Dec 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants