Convert Bytecode into Structured Form #620

Closed
DavePearce opened this Issue Apr 18, 2016 · 3 comments

Projects

None yet

1 participant

@DavePearce
Member
DavePearce commented Apr 18, 2016 edited

The WyIL bytecode has traditionally used unstructured control flow which gives a lot of flexibility. However, the reality is that using a structured notion of control-flow would make life easier:

  • Verification Condition Generation. The problem of generating verification conditions is actually surprisingly tricky. The VCGenerator is currently a complete mess because of the need to support unstructured control flow. See #501
  • Source Code Transliteration. For backends which target source-level programming languages (e.g. JavaScript, C, etc), having unstructured control-flow is a real pain.

The plan is to exploit nested bytecode blocks to implement a more traditional structured form of bytecode. The rough sequence is something like this:

  1. Compound. Update Bytecode.Compound to support multiple blocks. This will also be useful for loop invariants! (DONE)
  2. If. Make if bytecode take two optional blocks.
  3. Goto. Eliminate goto bytecode and replace with break instead.
  4. Return. Replace Return bytecode with a break bytecode instead. Return values are then specified as assignments.
  5. If / Ifis. Replace if and ifis with switch and match

This refactoring follows on logically from #502 and is part of #563.

@DavePearce DavePearce self-assigned this Apr 18, 2016
@DavePearce DavePearce added this to the ChangeList Pre v0.4.0 milestone Apr 18, 2016
@DavePearce
Member
DavePearce commented Apr 26, 2016 edited

Question: why not simply encode a full Abstract Syntax Tree for a cut-down version of Whiley?

This is not crazy, but there are some possible answers:

  1. Registers. We need each intermediate value to have a register name and type. This is essential for efficient compilation onto embedded systems.
  2. Simplicity. We want fewer constructs in the bytecode, compared with the source-level AST.
  3. Uniformity. Bytecode does not distinguish statements from expressions, making it a more uniform representation.
  4. Atomicity. Bytecodes are atomic entities which can be manipulated, unlike AST nodes which cannot easily be compared or manipulated (e.g. if they contain annotations, etc).
  5. Typing. The concept of flow typing should not be present in the underlying bytecode (though it currently is). In place, we have type tests and explicit coercions.

Overall, (2) seems a bit questionable because, in the end, the number of constructs is not that much smaller. The main differences are:

  1. Loops are represented in a single uniform fashion, rather than with different syntax for each loop kind. Even then, we could still include an indicator as the loop form.
  2. Quantifiers are broken down into a loop-like form.
  3. Logical Connectors. Logical AND and OR are implemented using sequences of nested if statements (as necessary for short-circuiting).
  4. Variable Declarations. These are just regular assignment statements.

But, that's it, right?

@DavePearce
Member
DavePearce commented Apr 26, 2016 edited

Some problems encountered in the translation so far:

  1. Type Tests. These are problematic as, without a goto, there is no place to do the physical retyping. Currently, they are translated roughly like this:

    if x is int && x < 0:
    ...
    

    becomes:

    r1 = x is int
    r3 = false
    if r1:
    r2 = x < 0
    if r2:
     r3 = true
    if r3:
    ...
    

    Here, the effects of x is int are diminished. Of course, can just insert casts as necessa ry here, but it raises the question as to where they should go exactly.

  2. If Statements. Translation of if statements. These seem to often degenerate into sequences which load either true or false into some target register. This is caused by logical connectors, but also quantifiers and other operations (e.g. type tests).
    It's possible that a better translation of if statements would lead to more succinct bytecode. In particular, the handling of logical connectors seems questionable.

@DavePearce DavePearce closed this Jun 10, 2016
@DavePearce
Member
DavePearce commented Jul 28, 2016 edited

This finally landed today in develop, just for the record. It's been a tough ride by I think this represents an important step in the evolution of Whiley.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment