New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring VcGenerator #501

Closed
DavePearce opened this Issue Jul 8, 2015 · 9 comments

Comments

Projects
None yet
1 participant
@DavePearce
Member

DavePearce commented Jul 8, 2015

The class wyil.builder.VcGenerator needs significant refactoring. It is very hard to understand and, hence, maintain. Furthermore, some parts are strange:

  1. Generating Verification Conditions. These are ultimately generated via the buildVerificationCondition function. But, this method is either called directly when needed, or is called indirectly by returning a failed branch. Why have two ways?
  2. Why type environment not in VcBranch?. The type environment is not part of VcBranch, which seems odd.

@DavePearce DavePearce added this to the ChangeList Pre v0.4.0 milestone Jul 8, 2015

DavePearce added a commit that referenced this issue Jul 8, 2015

WyIL: Possible bug fix for #500.
This is approach 1 as outlined in the ticket.  It might be working, but
eitherway it's more of a hack.  Really need to refactor the VcGenerator
as per #501.
@DavePearce

This comment has been minimized.

Show comment
Hide comment
@DavePearce

DavePearce Mar 12, 2016

Member

The real question is: how should verification condition generation work?

Here are the main components:

  1. buildMacroBlock(). This is used to reduce a precondition, postcondition, invariant or assertion to a macro which can be called. To do this, it explores non-failing paths of the block and its sub-blocks. Furthermore, it does not emit verification conditions.
  2. buildVerificationCondition(). This is used during general path traversal to generate an assertion to check some property is true. For example, upon reaching a loop we will check the loop invariant on entry, etc. The generated condition is of the form assumptions ==> condition, where the assumptions are those constraints built up along the given path(s).
  3. General path traversal. This is a traversal of the execution paths through the body of a function or method. This will generate verifications as it encounters them, and build macros for the relevant components (e.g. preconditions, postconditions, etc). One of the key challenges here is to compactly generation verification conditions. In particular, by joining paths together etc.
  4. Hoare logic. This is the application of simple rules to individual bytecodes to update the set of assumptions associated with a given path.

What else is there?

Member

DavePearce commented Mar 12, 2016

The real question is: how should verification condition generation work?

Here are the main components:

  1. buildMacroBlock(). This is used to reduce a precondition, postcondition, invariant or assertion to a macro which can be called. To do this, it explores non-failing paths of the block and its sub-blocks. Furthermore, it does not emit verification conditions.
  2. buildVerificationCondition(). This is used during general path traversal to generate an assertion to check some property is true. For example, upon reaching a loop we will check the loop invariant on entry, etc. The generated condition is of the form assumptions ==> condition, where the assumptions are those constraints built up along the given path(s).
  3. General path traversal. This is a traversal of the execution paths through the body of a function or method. This will generate verifications as it encounters them, and build macros for the relevant components (e.g. preconditions, postconditions, etc). One of the key challenges here is to compactly generation verification conditions. In particular, by joining paths together etc.
  4. Hoare logic. This is the application of simple rules to individual bytecodes to update the set of assumptions associated with a given path.

What else is there?

@DavePearce

This comment has been minimized.

Show comment
Hide comment
@DavePearce

DavePearce Mar 12, 2016

Member

What is making the VcGenerator more complicated that it needs to be?

  1. Loop Invariants. Currently these are awkward because they can appear at arbitrary bytecode locations. In practice. loop invariants are currently only ever used at the beginning (before the condition) and at the end (before the condition).
  2. Path Traversal. The mechanism for dealing with different paths or "branches" as they are referred to is rather complicated. In particular, supporting unstructured control flow is awkward and joining paths non-optimal.
  3. Flow Typing. The need to try and support flow typing is seriously awkward, and didn't even work.
  4. Block Navigation. The way blocks were previously navigated was not helping either.
Member

DavePearce commented Mar 12, 2016

What is making the VcGenerator more complicated that it needs to be?

  1. Loop Invariants. Currently these are awkward because they can appear at arbitrary bytecode locations. In practice. loop invariants are currently only ever used at the beginning (before the condition) and at the end (before the condition).
  2. Path Traversal. The mechanism for dealing with different paths or "branches" as they are referred to is rather complicated. In particular, supporting unstructured control flow is awkward and joining paths non-optimal.
  3. Flow Typing. The need to try and support flow typing is seriously awkward, and didn't even work.
  4. Block Navigation. The way blocks were previously navigated was not helping either.
@DavePearce

This comment has been minimized.

Show comment
Hide comment
@DavePearce

DavePearce Mar 12, 2016

Member

Question: Can we eliminate unstructured control flow from WyIL altogether?

This would significantly simplify the VCG algorithm. Overall, I think it is possible. But, it requires including some kind of break statement to break from a loop. It also is awkward for handling if statements. For example:

if x && y:
   ... // true branch
else:
   ... // false branch

This translates into:

if x:
   if y:
      ... // true branch
   else:
      ... // false branch
 else:
   ... // false branch

And there is some obvious code duplication here. That said, we could actually eliminate it by giving the false branches the same block identifier. Then, it rests on the back-ends to notice this and efficiently implement it without duplication. Yes, that would work but it would lead to duplication in generated verification conditions!

Member

DavePearce commented Mar 12, 2016

Question: Can we eliminate unstructured control flow from WyIL altogether?

This would significantly simplify the VCG algorithm. Overall, I think it is possible. But, it requires including some kind of break statement to break from a loop. It also is awkward for handling if statements. For example:

if x && y:
   ... // true branch
else:
   ... // false branch

This translates into:

if x:
   if y:
      ... // true branch
   else:
      ... // false branch
 else:
   ... // false branch

And there is some obvious code duplication here. That said, we could actually eliminate it by giving the false branches the same block identifier. Then, it rests on the back-ends to notice this and efficiently implement it without duplication. Yes, that would work but it would lead to duplication in generated verification conditions!

@DavePearce

This comment has been minimized.

Show comment
Hide comment
@DavePearce

DavePearce Mar 13, 2016

Member

Idea. One general approach which appears to be emerging is to decouple plan generation from the mechanical process of turning bytecodes into WyCS expressions.

So, we generate a plan for each verification condition and then flatten it into a WyCS assertion separately. The plan embodies just the control-flow rather than anything else.

Member

DavePearce commented Mar 13, 2016

Idea. One general approach which appears to be emerging is to decouple plan generation from the mechanical process of turning bytecodes into WyCS expressions.

So, we generate a plan for each verification condition and then flatten it into a WyCS assertion separately. The plan embodies just the control-flow rather than anything else.

@DavePearce

This comment has been minimized.

Show comment
Hide comment
@DavePearce

DavePearce Mar 13, 2016

Member

Plan generation

  1. Generate path-graph representation for each verification condition / macro
  2. Flatten path-graphs into conjuncts / disjuncts.

Example

function max(int x, int y) -> (int r)
ensures ...:
    //
    int r = x    // 1
    if y >= x:   // 2
        r = y    // 3
    return r   // 4

This should generate one verification condition for the return statement. The path-graph should be:

1
|
2
| \
|   3
| /
4

This can then be flattened into this plan:

1 && ((2 && 3) || (!2)) && 4

Isn't this essentially just the same as turning the graph into structured form?

Member

DavePearce commented Mar 13, 2016

Plan generation

  1. Generate path-graph representation for each verification condition / macro
  2. Flatten path-graphs into conjuncts / disjuncts.

Example

function max(int x, int y) -> (int r)
ensures ...:
    //
    int r = x    // 1
    if y >= x:   // 2
        r = y    // 3
    return r   // 4

This should generate one verification condition for the return statement. The path-graph should be:

1
|
2
| \
|   3
| /
4

This can then be flattened into this plan:

1 && ((2 && 3) || (!2)) && 4

Isn't this essentially just the same as turning the graph into structured form?

@DavePearce

This comment has been minimized.

Show comment
Hide comment
@DavePearce

DavePearce Mar 13, 2016

Member

Right, this is starting to sound like a Guarded Command Language. Since the other major tools also employ such a language as an intermediate step, I'm starting to feel that this makes sense.

Guarded Command Language (WyGCL)

This is an intermediate step between WyIL and WyAL. The purpose is to eliminate unstructured control-flow and to dramatically simplify the process of converting WyGCL into WyAL.

Example

function indexOf(int[] xs, int x) -> (int r)
requires |xs| > 0
ensures r >= 0 ==> xs[r] == x
ensures r < 0 ==> no { k in 0..|xs| | xs[k] == x }:
   //
   int i = 0
   while i < |xs| where i >= 0 && all { k in 0..i | xs[k] != x }:
      if xs[i] == x:
          return i
      i = i + 1
    //
    return -1

This is then turned into the following WyGCL:

macro precondition_1(int[] xs, int x):
   |xs| > 0
macro postcondition_1(int[] xs, int x, int r):
   i >= 0 ==> xs[r] == x
macro postcondition_2(int[] xs, int x, int r):
   i < 0 ==> no { k in 0..|xs| | xs[k] == x }: 
macro loop_invariant(int[] xs, int x, int i):
   i >= 0 && all { k in 0..i | xs[k] != x }
function indexOf(int[] xs, int x) -> (int r):
    //
    assume precondition_1(xs,x)
    //
    int i = 0
    //
    // loop entry must hold on entry
    assert loop_invariant(xs,x,i)
    //
    havoc xs, i // modified variables
    //
    loop:
        assume loop_invariant(xs,x,i)
        if:
            assume i >= |xs|
            break
        else:
            assume i < |xs|
            assert i >= 0 && i < |xs| // precondition for array access
            if:
               assume xs[i] == x
               assert postcondition_1(xs,x,i)
               assert postcondition_2(xs,x,i)
               return
            else:
               assume xs[i] != x
               i = i + 1
               assert loop_invariant(xs,x,i)
    //
    // loop invariant holds on exit
    assume i >= |xs| && loop_invariant(xs,x,i)
    //
    assert postcondition_1(xs,x,-1)
    assert postcondition_2(xs,x,-1)
    return

The advantage of this is that it cuts out a lot of issues, in particular by treating various things (e.g. preconditions, postconditions, loop invariants) uniformly as assertions / assumptions.

Member

DavePearce commented Mar 13, 2016

Right, this is starting to sound like a Guarded Command Language. Since the other major tools also employ such a language as an intermediate step, I'm starting to feel that this makes sense.

Guarded Command Language (WyGCL)

This is an intermediate step between WyIL and WyAL. The purpose is to eliminate unstructured control-flow and to dramatically simplify the process of converting WyGCL into WyAL.

Example

function indexOf(int[] xs, int x) -> (int r)
requires |xs| > 0
ensures r >= 0 ==> xs[r] == x
ensures r < 0 ==> no { k in 0..|xs| | xs[k] == x }:
   //
   int i = 0
   while i < |xs| where i >= 0 && all { k in 0..i | xs[k] != x }:
      if xs[i] == x:
          return i
      i = i + 1
    //
    return -1

This is then turned into the following WyGCL:

macro precondition_1(int[] xs, int x):
   |xs| > 0
macro postcondition_1(int[] xs, int x, int r):
   i >= 0 ==> xs[r] == x
macro postcondition_2(int[] xs, int x, int r):
   i < 0 ==> no { k in 0..|xs| | xs[k] == x }: 
macro loop_invariant(int[] xs, int x, int i):
   i >= 0 && all { k in 0..i | xs[k] != x }
function indexOf(int[] xs, int x) -> (int r):
    //
    assume precondition_1(xs,x)
    //
    int i = 0
    //
    // loop entry must hold on entry
    assert loop_invariant(xs,x,i)
    //
    havoc xs, i // modified variables
    //
    loop:
        assume loop_invariant(xs,x,i)
        if:
            assume i >= |xs|
            break
        else:
            assume i < |xs|
            assert i >= 0 && i < |xs| // precondition for array access
            if:
               assume xs[i] == x
               assert postcondition_1(xs,x,i)
               assert postcondition_2(xs,x,i)
               return
            else:
               assume xs[i] != x
               i = i + 1
               assert loop_invariant(xs,x,i)
    //
    // loop invariant holds on exit
    assume i >= |xs| && loop_invariant(xs,x,i)
    //
    assert postcondition_1(xs,x,-1)
    assert postcondition_2(xs,x,-1)
    return

The advantage of this is that it cuts out a lot of issues, in particular by treating various things (e.g. preconditions, postconditions, loop invariants) uniformly as assertions / assumptions.

@DavePearce

This comment has been minimized.

Show comment
Hide comment
@DavePearce

DavePearce Mar 13, 2016

Member

The real question is: what are the main advantages?

  1. Structured Control-Flow. Obviously, having structured control-flow makes the process quite a lot easier. However, if WyIL also has structured control-flow then the benefits are less.
  2. Explicit Assumptions/Assertions. All assumptions and assertions are now explicit. This means they have been dug out of expressions, preconditions and/or postconditions, etc.
  3. Loop Invariants. Loop invariants in the above have been essentially eliminated. Whether or not this is the best way to do it remains to be seen.
  4. Explicit Macros. The macros have already been flattened. This helps to separate the process of traversing the function body to generation verification conditions, from the process of flattening an invariant into a macro. At the moment, these two things are intertwined.
  5. Havocing is explicit. The havoc statement is helpful for clearly identifying this part of the process
  6. Temporary Registers. These have been eliminated in generating the WyGCL, meaning that this part of the process is separated out into the first stage.
  7. Debugging. Obviously, having this intermediate form is useful for debugging.

What else?

Is that enough?

Can we generate WyGCL from unstructured control flow?

Member

DavePearce commented Mar 13, 2016

The real question is: what are the main advantages?

  1. Structured Control-Flow. Obviously, having structured control-flow makes the process quite a lot easier. However, if WyIL also has structured control-flow then the benefits are less.
  2. Explicit Assumptions/Assertions. All assumptions and assertions are now explicit. This means they have been dug out of expressions, preconditions and/or postconditions, etc.
  3. Loop Invariants. Loop invariants in the above have been essentially eliminated. Whether or not this is the best way to do it remains to be seen.
  4. Explicit Macros. The macros have already been flattened. This helps to separate the process of traversing the function body to generation verification conditions, from the process of flattening an invariant into a macro. At the moment, these two things are intertwined.
  5. Havocing is explicit. The havoc statement is helpful for clearly identifying this part of the process
  6. Temporary Registers. These have been eliminated in generating the WyGCL, meaning that this part of the process is separated out into the first stage.
  7. Debugging. Obviously, having this intermediate form is useful for debugging.

What else?

Is that enough?

Can we generate WyGCL from unstructured control flow?

@DavePearce

This comment has been minimized.

Show comment
Hide comment
@DavePearce

DavePearce Mar 13, 2016

Member

Some other possibilities:

  1. Heap modelling. WyGCL could be helpful in modelling the heap, where methods are modeled as accepted an array of any to model the heap and where references are indices into that array.

And some questions:

  1. Loop. An interesting question is whether we want to represent loops in a different way. This could be done, for example, by including a specific notion of invariant. A loop then corresponds explicitly to an induction, rather than in the ad-hoc fashion done above. This would need special handling of modified operands, which would rather negate the value of the havoc variable.
Member

DavePearce commented Mar 13, 2016

Some other possibilities:

  1. Heap modelling. WyGCL could be helpful in modelling the heap, where methods are modeled as accepted an array of any to model the heap and where references are indices into that array.

And some questions:

  1. Loop. An interesting question is whether we want to represent loops in a different way. This could be done, for example, by including a specific notion of invariant. A loop then corresponds explicitly to an induction, rather than in the ad-hoc fashion done above. This would need special handling of modified operands, which would rather negate the value of the havoc variable.
@DavePearce

This comment has been minimized.

Show comment
Hide comment
@DavePearce

DavePearce Mar 15, 2016

Member

Another angle on this is that we don't necessarily need to generate WyGCL from unstructured WyIL. Instead, we could implement a transformation from unstructured WyIL to structured WyIL. This is something I already want to do for e.g. the JavaScript backend, etc.

Member

DavePearce commented Mar 15, 2016

Another angle on this is that we don't necessarily need to generate WyGCL from unstructured WyIL. Instead, we could implement a transformation from unstructured WyIL to structured WyIL. This is something I already want to do for e.g. the JavaScript backend, etc.

DavePearce added a commit that referenced this issue Mar 23, 2016

WyIL: Update VcGenerator
This updates the VcGenerator to compile again and pass tests.
Unfortunately, it's looking something of a mess now and is in dire need
of some serious refactoring.  I have thought about how to do this, and
report my findings in #501

DavePearce added a commit that referenced this issue Mar 27, 2016

WyIL: Update VcGenerator
This updates the VcGenerator to compile again and pass tests.
Unfortunately, it's looking something of a mess now and is in dire need
of some serious refactoring.  I have thought about how to do this, and
report my findings in #501

@DavePearce DavePearce closed this Aug 7, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment