Skip to content

Conversation

@apoelstra
Copy link
Collaborator

Currently our transaction environment construction is quite complicated, and (as a consequence) unsound in a couple of places. This was implemented in PRs #49 through #52 in the early days of this crate with very little review, using a strategy copied from Haskell but not really appropriate for Rust.

In particular:

  • We do not need custom marshalling functions in Rust because we can directly define repr(C) structs
  • ...in doing so, we may use Rust references in place of raw pointers, as long as the C code upholds the Rust aliasing rules (which it does, at least for these read-only POD types); in fact we may use optional references in place of nullable pointers
  • ...and we can use arrays and pointers to arrays in place of pointers to primitive types which are supposed to point to a specific number of elements
  • ...but we must make sure that any data we produce a pointer or reference to outlives the pointer/reference!

There was at least one instance where we were creating a temporary variable and extracting a pointer from it, which would then outlive the object it pointed into. (This occurred for the annex, though in dead code, and for the pegin genesis hash, in live code that has never been used.)

As an aside, we used empty enums in a couple places for opaque types. We should not do this, according to the nomicon.

Copy link
Collaborator Author

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On ffad4a0 successfully ran local tests

Copy link
Collaborator Author

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On 6678851 successfully ran local tests

@apoelstra
Copy link
Collaborator Author

cc @canndrew with this one and #308 I think we can cut a new release (or maybe we can go all the way to bitcoin support, we'll see how things are progressing on libsimplicity)

...and remove Elements and _elements from the names (where they occured).
These don't refer to anything
I can't find any citation that this is safe to do. We don't actually use
this structure in our Rust code; we populate it by calling a c_set function
which is defined on the C side, and then pass it to C functions. So no code
needs to be changed.

However, I want to remove the c_set_* functions because they hide life
information (and in fact, are unsound, as we currently use them) and serve
to obfuscate code. They seem to have come from BlockstreamResearch#48 BlockstreamResearch#49 BlockstreamResearch#50 BlockstreamResearch#51 in which
Sanket seems to have copied the corresponding Haskell code into Rust, where
it is unnecessary and dangerous.
…erences

This adds some lifetimes to our raw C structures, which in the next commits
will be used to enforce that we are not giving the C code pointers to local
variables or other unsound things. (In fact, our existing code does do this.)
As a starting point, replace a few "easy" FFI fields with stronger types.

We do not change the pointers to value types, which are listed in Rust as
"NULL, or a pointer to 9 bytes, or to 33" because it was not clear that
there was any better representation for this than *const u8.

Through the C FFI we have structs which have listed invariants of the form
"either this is NULL or it points to an array of 33 bytes". We further have
a guarantee (because we are maintainers of both the C and Rust code, and we
are responsible citizens) that these const pointers will never be mutated
through by the C code.

The rest of this comment is to justify this change:

According to the nomicon, we may use references in FFI types as long as we
promise they won't be NULL and that we won't violate the aliasing rules
(which for const pointers, simply means we won't cast them to non-const
pointers):

https://doc.rust-lang.org/nomicon/ffi.html#interoperability-with-foreign-code

The repr(C) docs further say that if our pointers might be NULL, we can
use Option<&T>:

https://doc.rust-lang.org/nomicon/other-reprs.html#reprc

Finally, where the C code takes a pointer to a u8 with an implicit claim
that this is actually a pointer to 33 contiguous u8s, we may use pointers
or references to Rust arrays, which is permissible because:

* all thin pointers have the same size and representation, justified by
  the Rust reference https://doc.rust-lang.org/reference/expressions/operator-expr.html#r-expr.as.pointer.sized
* arrays are guaranteed to be contiguous in memory in both C and Rust
(The following commits will delete the rest of these.) The role of these
functions is to marshall Rust data into C structures. This is implemented
in env.c, on the C side of the FFI boundary, and use raw pointers and
out-pointers. None of this is necessary because Rust has a well-defined
FFI interface with C.

Furthermore, it serves to obfuscate the lifetimes and provenance of the
various pointers at play.

We can remove this function, replace its call sites with direct construction
of the Rust version of the C structure, and replace raw pointers (where
possible) with references that have named lifetimes. Then our Rust code
is formally safe and the compiler will give us much more help.
The existing code creates a temporary `PeginData` structure, then
stores a pointer to its `genesis_hash` field in a struct that we
pass to the C code. This is unsound.

In general, we lack transaction environment tests where we construct
different environments then run Simplicity code that exercises them.
The nomicon doesn't like this. From the bottom of

https://doc.rust-lang.org/nomicon/ffi.html

"Notice that it is a really bad idea to use an empty enum as FFI type. The
compiler relies on empty enums being uninhabited, so handling values of
type &Empty is a huge footgun and can lead to buggy program behavior (by
triggering undefined behavior)."

They give an example that uses a phantom to also remove Send, Sync, and
Unpin. I did not do this. I think all these traits are fine, because the
type has no self-references and is never mutated by the C code except
when it is freed, and then only through a `*mut` pointer.
These structures are private data types used ephemerally by
src/jet/elements/c_env.rs. They should not be public and certainly
should not be exposed in simplicity-sys.
@apoelstra
Copy link
Collaborator Author

Rebased.

Copy link
Collaborator Author

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On 405487e successfully ran local tests

#[derive(Debug)]
pub struct RawOutputData {
pub asset: Vec<c_uchar>,
pub asset: Option<[c_uchar; 33]>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: In some places you've replaced c_uchar with u8 but not others. Using u8 consistently seems better (but I can change that in another commit).

@canndrew
Copy link
Collaborator

ACK 405487e

@apoelstra apoelstra merged commit f2f53a1 into BlockstreamResearch:master Sep 16, 2025
24 checks passed
@apoelstra apoelstra deleted the 2025-09/ffi-cleanup branch September 16, 2025 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants