Elements of Low* v3


The first version of Low* was based heavily on FStar.Buffer. We got pretty far with it---a big part of JK's thesis was built on it---so it was already quite successful. But it had many shortcomings that we wanted to improve.

The second version of Low* focused on improving the libraries that provide the memory model, monotonicity, etc., including HyperStack, the modifies clauses, locs, and a new buffer library. This has been a substantial improvement over the previous version: both EverParse and EverCrypt are based on it and, of course, so is Vale v2.

Low* v2, with some incremental improvements, ought to be sufficient for us to get miTLS done. It'll take some doing, but many of us have become accustomed to its peculiarities and we know how to steer clear of pitfalls.

However, there are many things that are still much more difficult than they should be in Low*/F*. And if we are to continue to scale up and get more people to write more verified low-level code, we need to invest in substantial improvements to Low* and F*. This time, I don't think that incremental improvements to Low* libraries alone will get us there.

At a high level, I see the design spirit of Low* as "Dafny/VCC as a library within F*". Now, I don't think we'll ever have a fully seamless embedding of some other independently designed, special-purpose verification language as a library within F*, nor should that be our success metric. Besides, we also aim to provide other features not all of which are supported by those languages (extraction to C, a low-level memory model, higher order, proof by normalization, tactics, type inference, ...). But I still see it as a useful design principle from which we will intentionally deviate. We've come quite a way towards providing an embedded, low-level imperative language as a library within F*, but there's still a long way to go.

Here are some elements of the design space of Low* v3 that have come up in many discussions over the past years. I'm probably forgetting many things, but these are the main ingredients I have in mind.

  1. Catalog a small but thorough collection of representative micro-benchmarks that we aim to write and verify with minimal fuss. For instance:

    • writing to several references, and framing reads across those writes.

    • allocating on the heap and proving that the fresh locations do not interfere with existing ones

    • deallocating memory and preserving invariants on disjoint memory locations

    • reading and writing with sub-buffers

    • ...

    Ideally, even though it consists of micro-benchmarks, this catalog should be naturally "scalable", e.g., by increasing the number of writes, or the number of memory locations in play, ...
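    As a concrete example, here is a sketch (module and names hypothetical) of what the first of these benchmarks looks like in today's Low* (v2); the goal for v3 is that such programs verify with no manual framing lemmas:

       module Bench
       open FStar.HyperStack.ST
       module B = LowStar.Buffer
       module U32 = FStar.UInt32

       (* write to two disjoint references, then frame a read across the writes *)
       let write_write_read (r1 r2: B.pointer U32.t)
         : Stack U32.t
           (requires fun h -> B.live h r1 /\ B.live h r2 /\ B.disjoint r1 r2)
           (ensures  fun _ x _ -> x == 1ul)  (* the read of r1 is unaffected by the write to r2 *)
       = B.upd r1 0ul 1ul;
         B.upd r2 0ul 2ul;
         B.index r1 0ul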

  2. Some "real" code that we want to write or port

    • AEAD?
    • PQ crypto?
    • Bits of QuicF*?
    • Revised libraries like doubly linked lists?
  3. Better modeling of the stack

    Low* models mutable stack-allocated data extremely conservatively. Every mutable local variable is modeled as a pointer into a region of the hyperstack---effectively, we have turned well-understood and easily scalable reasoning about mutable local variables into the much thornier problem of reasoning about heap-allocated, aliasable pointers.

    For variable-length stack allocations, or stack slots that we want to pass by reference to callees, this level of generality/complexity may be unavoidable. But we shouldn't pay this cost for innocuous mutable local variables.
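    To make the cost concrete, here is a sketch (names hypothetical, with B and U32 abbreviated as above) of what a single mutable counter costs in v2 today: a stack frame, an allocation, and pointer reads and writes, each carrying liveness and framing obligations:

       let count () : Stack U32.t (requires fun _ -> True) (ensures fun _ _ _ -> True) =
         push_frame ();                    (* a whole stack frame, for one local *)
         let i = B.alloca 0ul 1ul in       (* "let mutable i = 0" becomes an allocation *)
         let v = B.index i 0ul in          (* every read goes through the memory model *)
         B.upd i 0ul (U32.add v 1ul);      (* "i <- i + 1" becomes a pointer update *)
         let r = B.index i 0ul in
         pop_frame ();
         r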

    One idea for how to better model the stack is to switch to a memory model that consists of two distinct compartments:

    1. An HS.mem, which contains heap-allocated data and alloca'd stack pointers.

    2. A local_stack_frame: a record {x1:t1; ...; xn:tn} of n fields, disjoint by construction, each holding a mutable value of the corresponding type.

    For example, one idea I would like to explore is the following:

    An indexed effect for Low* computations:

       ST (result_type:Type)
          (locals: Type)
          (pre: HS.mem * locals -> prop)
          (post: HS.mem * locals -> result_type -> HS.mem * locals -> prop)
    

    A way to create a scoped stack frame:

    with_locals ({x=0ul; y=false})
                (body : unit -> ST result_type {x:u32; y:bool}
                                (requires (fun (_, locals) ->
                                          locals.x = 0ul /\
                                          locals.y = false))
                                (ensures ... ))
    

    Which logically:

    • optionally pushes a new stack frame (for any alloca's in body)
    • sets the locals compartment to the values provided
    • runs the body computation in the extended state
    • pops the stack frame it pushed, if any
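    To illustrate, here is a hypothetical complete example in the proposed style; with_locals, the indexed ST effect, and the field-assignment notation are all part of the sketch, not an existing library:

       let two_increments ()
         : ST U32.t unit (requires fun _ -> True) (ensures fun _ r _ -> r == 2ul)
       = with_locals ({x = 0ul})
           (fun () ->
             locals.x <- U32.add locals.x 1ul;  (* plain field updates: no pointers, *)
             locals.x <- U32.add locals.x 1ul;  (* no liveness, no modifies clauses  *)
             locals.x)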
  4. Concrete syntax

    The kind of modeling I propose above may lead to better verification conditions, but without some reasonable concrete syntax, it will be unusable.

    So far, we have been hesitant to add Low*-specific syntax to F*. But, given that we are writing and plan to write lots more Low* code, it may be time to add a more palatable syntax for Low*.

    E.g., rather than writing that with_locals ... mouthful by hand, we could write something like the following and desugar it to the core syntax above. And once we open the concrete-syntax box, there's a lot more we might consider, e.g., syntax for loops.

  {
    var x = 0ul;
    var y = false;
    ...
    body
  }
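    Desugaring would collect the var declarations into the initial locals record and pass the rest of the block as the body; i.e., the block above would elaborate to roughly:

      with_locals ({x = 0ul; y = false})
                  (fun () -> ... body ...)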
  5. A revised effect hierarchy for Low*

    Also related to modeling the stack is our current use of effects like Stack and push_frame/pop_frame. Related predicates like equal_stack_domains are used to enforce that pushes and pops are balanced, and equal_domains ensures that code does not heap allocate.

    Rather than trying to encode these syntactic properties within the verification condition, it would be better if we could enforce them by syntactic means.

    For example, the equal_stack_domains condition could simply be removed if we had a concrete syntax that enforced {}-matching.

    Enforcing that code does not leak via heap allocations could be done by factoring allocation into a separate effect.

     sub_effect STACK ~> HEAP : ...
     val malloc : ... -> HEAP ..
     val salloc : ... -> STACK ...
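     With such a factoring, a heap allocation in STACK code becomes an effect error caught by the typechecker, rather than a failed equal_domains obligation in the VC. Hypothetically:

        let no_leak () : STACK unit ... =
          let p = malloc ... in   (* rejected: malloc has effect HEAP, not STACK *)
          ...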

    If we wanted to, we could model both within an effect hierarchy that looks something like this:

   Well-formed Low* computations:
      HEAP: the effect of computations that may heap allocate and stack allocate in their own stack frames
      STACK: the effect of computations that only stack allocate in their own stack frames

   Low* computations that expect an enclosing stack frame in which to allocate:
      STACK_INLINE: Only stack allocate, but require an enclosing frame
      HEAP_INLINE: Require an enclosing frame for stack allocation, and may also heap allocate

      These are made well-formed by wrapping with a `with_frame`

   Here's how these four effects relate:

       HEAP  ---> HEAP_INLINE
       ^ ^             |  ^
      /  |__with_frame_|  |
     /                    |
    /                     |
  STACK  --->  STACK_INLINE
    ^              |
    |___with_frame_|
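   Concretely, the with_frame edges in the diagram might be realized by combinators with (roughly) these signatures, where pre'/post' are the frame-adjusted pre- and postconditions:

       val with_frame   : (unit -> STACK_INLINE a pre post) -> STACK a pre' post'
       val with_frame_h : (unit -> HEAP_INLINE a pre post) -> HEAP a pre' post'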
  6. Revised Low*/F* libraries, including HyperStack, Buffer, Sequence, Modifies, machine integers, ...

    • All with carefully audited SMT patterns, designed with our catalog of target programs in mind.

    • Ironing out various kinks, e.g., buffer preorders

    • Const pointers (see the sketch at the end of this list)

    • [Aseem]: Some specific points of re-design: should we make HyperStack more abstract, rather than a transparent layer over Heaps? Or can we design it so that for the common cases that we want to automate, we don't have to use Heap lemmas, while for other cases, we can conditionally expose them? For buffers, can we design it so that it is only parametric on the underlying reference preorder and then expose the base length and base offset to frame the appropriate precondition for writing?
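    As a sketch of the const-pointer point above (names illustrative, not an existing module), the essential idea is a read-only view into which mutable buffers can be cast, with no update operation exposed:

       val const_buffer : Type0 -> Type0
       val of_buffer : B.buffer a -> const_buffer a     (* a one-way cast *)
       val index : const_buffer a -> U32.t -> Stack a ...
       (* crucially, no upd is exposed on const_buffer *)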

  7. Heap: More abstract modeling of the heap

    The work on LensST tries to provide a more abstract view of a structurally separate heap memory.

    It would be nice to continue this line of work and integrate it more comprehensively with Low*.

  8. Totality checks for Low*

    Low* code currently supports only partial correctness proofs (the Stack effect includes Div).

    It would be nice to carve out a total sub-effect of Low* computations.

    Likewise, we have discussed having a sub-effect of Stack that lacks monotonicity and can be reified, for reuse in specs, relational proofs, etc.
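    In the style of the sub_effect sketch above, this might look like (names hypothetical):

       new_effect StackTot ...                 (* total: no Div in its representation *)
       sub_effect StackTot ~> Stack : ...      (* total code usable in partial contexts *)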

  9. A Low* suite of tactics?

    So far, we produce a VC for each Low* program and send it to the SMT solver using F*'s default workflow.

    Perhaps we should have several Low*-specific tactics to interpose in this process.

    For example, a VC splitting tactic that, say, splits out all proof obligations about modifies into a separate VC may be a good investment.

    [Aseem]: Going a step further, the tactics can also try to prove the modifies obligations, without SMT, by applying appropriate lemmas. In the latest iteration of modifies lemmas, we have been trying to control how SMT arrives at the proof, and this should be scriptable (at least for the common cases where we aim to provide good automation).

    Likewise for liveness and disjointness reasoning.
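    As a sketch of the shape such a tactic might take (the Tac effect is F*'s existing metaprogramming effect; the rest is hypothetical):

       open FStar.Tactics

       (* split the VC, dispatching modifies-shaped conjuncts via the
          modifies lemmas and leaving the rest to the default SMT query *)
       let solve_modifies () : Tac unit = ...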

  10. More data structures for Low*

    • Structs with mutable fields: These have been supported by Low*/KreMLin for a while now: mutating the contents of a pointer {x:t1; y:t2} is extracted to a field update by KreMLin. It would be nice to evaluate this style, make it better known, and perhaps provide better concrete syntax for it (see the sketch at the end of this list).

    • Flatly allocated arrays within structs: Another common request, this could similarly be supported by KreMLin using something like the above style.

      pointer {x:seq 10 t1; y:t2}

      Could be extracted by KreMLin to {t1 x[10]; t2 y}* with suitable abstraction in place at the source level to prevent reading out the entire sequence as a value.
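      Here is a sketch of the struct-with-mutable-fields style from the first bullet (names illustrative): updating one field of a pointed-to record, which KreMLin extracts to a C field assignment, roughly p->x = 42:

         type point = { x: U32.t; y: U32.t }

         let set_x (p: B.pointer point)
           : Stack unit (requires fun h -> B.live h p) (ensures fun _ _ _ -> True)
         = let v = B.index p 0ul in
           B.upd p 0ul ({ v with x = 42ul })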

  11. Performance profiling and optimizations of the F* compiler

  12. Interop or migration of code from Low* v2 to v3

    • Could v3 simply be implemented as a layer on top of v2?

    • Should we just axiomatize their interaction as we migrate code from one to the other?

  13. Finer-grained interactions with F* programs

    • "Democratizing scopes" to allow decomposing large proofs in scope