types: use GhostCell in place of mutexes in union-bound and type context #305

apoelstra · 2025-08-31T22:20:50Z

I recently learned about the GhostCell crate, and its associated paper from ICFP 2021. There are a couple of similar projects from that era, notably qcell which predates the published version of ghostcell by a bit, but none of these are used much, as evidenced by the "Dependents" tab on crates.io.

I learned about them from a blog post about "generativity" which has a pattern that makes GhostCells a little more ergonomic-looking at the cost of using macros. (I don't think this is worthwhile and I didn't use them).

But the macro syntax isn't really the point. The key quote from the blog post is this

The fundamental purpose of the generativity pattern is not necessarily to improve performance, but to statically require that individual values come from or refer to the same source. The performance benefits are a symptom of this requirement.

Anyway, GhostCells are extremely powerful and I don't know why they are not more popular. In effect, they allow synchronizing a single zero-sized token, and gating access to an arbitrarily-sized set of otherwise-disconnected objects on them. So if you have a &mut Token you can get mutable access to all of your objects.

This is exactly the situation we have with our type inference context and our union bound algorithm: we have a Context object which has a Mutex controlling access to a slab of allocated bounds, and then we have a whole pile of UbElements which are connected in a forest structure that the Rust compiler doesn't really understand, but where we morally have exclusive access exactly when we have exclusive access to the slab. GhostCell lets us tell the compiler this.

This PR

Eliminates all the mutexes in the type checker except the one in the Context object that holds the slab
Fixes multiple bugs related to inconsistent contexts, including one in the policy module which made it impossible to actually use the asm fragment functionality
Adds an invariant lifetime 'brand which acts as a tag on all Types, ConstructNodes and WitnessNodes which prevent them being combined if they come from different type inference contexts
Requires the user to do node construction inside a closure which is passed to Context::with_context, rather than constructing a context object which then sorta leaks into the ether by being contained in an Arc carried by each node

The latter item sounds like a big ergonomic hit, but in practice it's not a big deal because it doesn't affect CommitNode or RedeemNode.

The use of a lifetime rather than a marker type means that while every type context has a "distinct type" for Type, Arrow, ConstructNode, etc., the distinction is erased after typechecking. So compilation speed and program size will not be adversely affected. But conversely, the compiler error messages are pretty unclear when you use this incorrectly.

uncomputable · 2025-09-01T11:58:42Z

This looks very promising. Are we ready to say that the type inference code is deadlock free, at this point?

apoelstra · 2025-09-01T12:01:50Z

@uncomputable yep!

We are going to change a bunch of stuff about type inference, so add a benchmark for decoding a fixed program that's available. This is not the best benchmark, but it's one that's available. On my system the current results is: 70,972.00 ns/iter (+/- 662.39)

This one is coauthored by Ralf Jung and has a paper (which I read) and a formal proof (which I did not). It would be hard to find a more trustworthy dependency. Also, we have a clear need for this -- currently we have Mutexes all over our type checker code which are both slow and hard to reason about.

This impl was never used and will be impossible to have once we put a GhostToken into the context.

The Context object contains an Arc, which is probably the right design decision since these are supposed to be "handles" to a type inference context which can be cheaply cloned everywhere. Without this there would be a ton of Arc::new and Arc::clone noise throughout the codebase. But we want to make this structure more elaborate, which requires putting more stuff in the Arc, which requires a new struct. So do that in its own commit, to try to reduce the amount of noise in the next one.

Right now our types::Error enum has a couple variants that include bare `Type`s in them which are intended to represent invalid types. It isn't good hygeine to do this, because these Types carry with them information about the type context (which is supposed to be an ephemeral thing created during parsing and destroyed before parsing completes). It also limits our ability to provide useful representation of incomplete types in error messages, because the bare Bound type does not contain any data about why a bound was invalid. The user is also forced to be careful with these invalid objects, especially in the case of an occurs-check failure, when any attempt to iterate on the type will infinite-loop. This commit does the bare minimum to define a new type for use in error messages. In a followup we probably want to clean things up a bit -- at the very least we should try to construct some sort of rich "occurs check failure" information. We may also want to define an IncompleteArrow, which would make the code in human_encoding a bit tidier. The attentive reviewer may notice calls to `lock.drop()` inserted before calls to Incomplete::from_bound_ref. Without these calls, the code deadlocks. There is no compiler help or anything to keep track of this. This kind of bullshit is why I'm so excited to use GhostCell, where the compiler (and borrowck) will do all the work for us. Finally, in the next commit when we add a 'brand lifetime marker to Type, we will be forced to avoid saving this type in our Error enum, because then Type will be unable to outlive its type context (and errors must outlive the context, since the context is ephemeral and the error is something we return to the user).

All this commit does is patch a 'brand lifetime (the name comes from the GhostCell paper) into `types::Context`. In the next commit this lifetime will be associated with a GhostToken but for now it's unused. However, putting an invariant lifetime onto `types::Context` means that we need an invariant lifetime on `Type`, which means one on `ConstructNode` and the `*Constructible` traits, and on any functions which create these things. In fact, this is exactly what we want! All these types and functions are ones that should be tied to a single type-inference context, and by use of GhostCells we will get compile-time assurance that users are not trying to mix types from one program with types from another. As a second benefit, this should get rid of the mutex mess in the typechecker and union-bound code. IOW this is an extremely large commit, but it doesn't do anything at all except add lifetime annotations, and they're all good.

In the new GhostCell paradigm it will be impossible for a Type (and therefore a ConstructNode) to outlive its context. This means that all the methods which currently return a disembodied ConstructNode need to accept a context as a parameter. In effect, this puts "direct constructors" and parsers/decoders of ConstructNodes on even footing: people doing direct construction of programs have always needed to create a context and keep track of it, since the constructors iden(), unit(), witness() and jets all need one. The other constructors simply copy the context Arc from the objects. This design had a couple goals: * Minimizing pain for users by minimizing the number of places they had to pass a context object * Minimizing errors by using "the correct context object" automatically by storing an Arc and cloning it behind the scenes * Allowing users of decode_expression and ConstructNode::from_str to not think about contexts at all. However, I observe that the "minimizing pain" goal is not that valuable since this only applies to ConstructNode (since CommitNode and RedeemNode have complete types in them, and these are much more commonly used than ConstructNode). Furthermore: * The current logic actually makes it impossible to decode two expressions and then combine them, since they'll have different associated type contexts. (I guess nobody has tried this?? They would get a panic.) * The "minimizing errors" goal we can achieve by GhostCell, at compile time and much more thoroughly, and in a way that "feels Rustic" in that it levers the existing borrowchecker.

Very short change, because this function is never called or tested anywhere.

Same as before, quick change, no tests..

Same justification as before. If you are constructing a ConstructNode (or a WitnessNode, which is basically the same thing), you should keep explicit track of your context objects. Also, prior to this commit, if you were converting named nodes to witness nodes for the purpose of combining them, you could not do this because to_witness_node was making a fresh type context for each conversion.

Turns out the asm fragment is basically unusable. Oops. The next commit will fix this, at the cost of some ugly API changes.

This is a bit annoying for users of the Satisfier trait: they need to construct a type inference and keep it around for the satisfaction logic to use. If the satisfier supports assembly this is necessary. If it doesn't support assembly, which is probably the more common case, then there's no need for this ... but because of lack of specialization and a general lack of expressivity in Rust's type system, they gotta do it anyway.

Prior to this commit we were using pointer values and assertions to check that users were using type contexts consistently (i.e. not using different contexts for the same program). This worked reasonably well, but was (a) slow, (b) revealed errors only at runtime, and (c) provided almost no help in guiding the user to using the correct context in the correct places. Instead, we can use a ghost_cell::GhostToken as an identity, which is a special type which can be used to assign a unique invariant "brand" lifetime to every type context. Then everything that holds a context reference, i.e. every type bound and ConstructNode, inherits this brand, and the compiler will ensure that you don't mix brands *and* that you don't fail to mix brands. And it will do it all using the borrowchecker, meaning that your code won't compile if you're wrong, but the markers get erased prior to codegen so that the compilation time and code size aren't meaningfully increased. Furthermore, the use of invariant lifetimes ensures that we catch mistakes in both directions: if the user uses two contexts where they should've used one, that's an error (which our existing assertions will probably catch); *and* if the user uses one context where they should've used two, that's probably also an error. The tradeoff here is that you can no longer construct context objects directly and then pass them around. You need to construct them using Context::with_context, which provides a fresh context object to a closure, and the context won't outlive the closure. (This is how we catch "if you use one context where you should've used two" errors; to make this mistake and still have your code compile, you need to put two program constructions into the same closure, which hopefully will be visibly wrong.) So far, based on my updating of our unit tests and all our code, this limitation doesn't seem to be a big deal. It does require a bunch of reindentation but it hasn't made the code any harder to write or structure -- and where it did force restructuring (e.g. in the Satisfy trait in the previous commit) it was because the original code was actually wrong. Another tradeoff is that when you get stuff wrong the compiler errors are pretty rough. But I think we'll have to live with this, and hopefully future compilers will learn to understand GhostCell and GhostToken constructions better. Definitely want to review this using git show --color-moved-ws=ignore-all-space because the bulk of the diff is just me re-indenting unit tests to wrap them in Context::with_context.

In the next couple commits we will replace the mutexes in the union-bound algorithm with GhostCells. Happily, we can reuse the existing GhostToken in types::Context as an access-control token. However, to do this we need to refactor a couple algorithms so that we can fish the token recursively through unify() and bind(), which naively would require the union-bound algorithm to know about our ContextInner type. To avoid this abstraction violation, we introduce the generic WithContextToken wrapper, so the union-bound algorithm can work with "a token and some arbitrary data" without knowing or understanding anything about the arbitrary data.

Now that we have the dedicated ContextInner type, we don't need a dedicated LockedContext type. It's also conceptually simpler to have a "Context, which mutex-wraps a ContextInner, which implements all the real methods" rather than having this auxiliary "locked context" type which isn't directly related to Context. (And ofc there is WithGhostToken in the middle, but the reader of the code can hopefully mostly ignore this.) This gets rid of a lot of `.inner`s which is nice.

This eliminates all the locking code from union_bound.rs, turning any lifetime mistake into compiler errors rather than deadlocks :). It also lets us simplify a fair bit of code, removing all the drops, all the lock().unwrap()s, and one shallow_clone. There are some borrow()s and borrow_mut()s, but if these compile then they are no-ops, so the reader can basically ignore them. The previous benchmark was: 70,972.00 ns/iter (+/- 662.39) The new one is: 69,947.54 ns/iter (+/- 286.68) which is a ~1.5% improvement, if these numbers are to be believed.

apoelstra · 2025-09-01T21:25:50Z

Went over this in detail in person with @canndrew. He will ACK when he gets to his computer, I think.

apoelstra · 2025-09-01T21:27:19Z

One followup item is that Incomplete::from_bound_ref takes a Context but it should take a LockedContext (or rather, a ContextInner). This is a private method and requires a locked context, so better to require the caller to have already obtained the lock than to have lots of .lock sites. I think.

canndrew · 2025-09-01T21:29:17Z

ACK 060455a

apoelstra

On 060455a successfully ran local tests

apoelstra force-pushed the 2025-08/ghostcell branch 2 times, most recently from 2ddbf3f to 0a2341b Compare August 31, 2025 22:47

apoelstra force-pushed the 2025-08/ghostcell branch from 0a2341b to defcc2c Compare September 1, 2025 19:33

apoelstra added 6 commits September 1, 2025 19:39

types: remove Default from Context

e2d1c4c

This impl was never used and will be impossible to have once we put a GhostToken into the context.

apoelstra force-pushed the 2025-08/ghostcell branch 2 times, most recently from 7fe6588 to 3999983 Compare September 1, 2025 19:45

apoelstra added 4 commits September 1, 2025 19:47

node: take context object as parameter to CommitNode::unfinalize_types

a534de4

Very short change, because this function is never called or tested anywhere.

node: add inference prameter to RedeemNode::to_construct_node

a5aa6a2

Same as before, quick change, no tests..

apoelstra force-pushed the 2025-08/ghostcell branch from 3999983 to 5a54d86 Compare September 1, 2025 19:47

apoelstra added 6 commits September 1, 2025 20:41

policy: add (broken, ignored) unit test showing a problem with asm

e18ffc0

Turns out the asm fragment is basically unusable. Oops. The next commit will fix this, at the cost of some ugly API changes.

apoelstra force-pushed the 2025-08/ghostcell branch from 5a54d86 to 060455a Compare September 1, 2025 20:41

apoelstra commented Sep 1, 2025

View reviewed changes

apoelstra merged commit 6f245ec into BlockstreamResearch:master Sep 1, 2025
24 checks passed

apoelstra deleted the 2025-08/ghostcell branch September 1, 2025 22:53

apoelstra mentioned this pull request Sep 12, 2025

refactors to prepare for the new rust-simplicity version BlockstreamResearch/SimplicityHL#143

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

types: use GhostCell in place of mutexes in union-bound and type context #305

types: use GhostCell in place of mutexes in union-bound and type context #305

Uh oh!

apoelstra commented Aug 31, 2025 •

edited

Loading

Uh oh!

uncomputable commented Sep 1, 2025

Uh oh!

apoelstra commented Sep 1, 2025

Uh oh!

apoelstra commented Sep 1, 2025

Uh oh!

apoelstra commented Sep 1, 2025

Uh oh!

canndrew commented Sep 1, 2025

Uh oh!

apoelstra left a comment

Uh oh!

Uh oh!

Uh oh!

types: use GhostCell in place of mutexes in union-bound and type context #305

types: use GhostCell in place of mutexes in union-bound and type context #305

Uh oh!

Conversation

apoelstra commented Aug 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

uncomputable commented Sep 1, 2025

Uh oh!

apoelstra commented Sep 1, 2025

Uh oh!

apoelstra commented Sep 1, 2025

Uh oh!

apoelstra commented Sep 1, 2025

Uh oh!

canndrew commented Sep 1, 2025

Uh oh!

apoelstra left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

apoelstra commented Aug 31, 2025 •

edited

Loading