Introduce control flow verification. #213

dvander · 2018-05-24T06:21:59Z

This is something I've wanted to do for years, and @peace-maker finally convinced me to do it. SourcePawn needs some internal concept of a control-flow graph. It is the gateway to many important optimizations and transformations, and it will also make our method verification much more rigorous. In fact, many optimizations cannot even be attempted without that rigor.

Overview

This patch introduces the ControlFlowGraph data structure, which is a collection of Blocks. A block (or "basic block") is a unit of control flow. It has exactly one entry-point and exactly one exit-point. Blocks form a directed, possibly cyclic graph: each one may have any number of successors or predecessors. The entry-point to a method is a single block, and any number of blocks may exit the method.
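As a rough illustration of the shape described above, here is a minimal Python sketch. It is not the patch's code: the field names (`start`, `end`, `successors`, `predecessors`) and the `add_edge` and `exits` helpers are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare blocks by identity; the graph may contain cycles
class Block:
    start: int                       # code offset of the block's first instruction
    end: Optional[int] = None        # offset just past the last one; None until terminated
    successors: List["Block"] = field(default_factory=list)
    predecessors: List["Block"] = field(default_factory=list)

    def add_edge(self, target: "Block") -> None:
        """Record a control-flow edge from this block to `target`."""
        self.successors.append(target)
        target.predecessors.append(self)


@dataclass
class ControlFlowGraph:
    entry: Block                     # the single entry block of the method
    blocks: List[Block] = field(default_factory=list)

    def exits(self) -> List[Block]:
        """Blocks with no successors, i.e. the ones that leave the method."""
        return [b for b in self.blocks if not b.successors]


# A simple loop: entry -> header -> body -> header, and header -> done.
entry, header, body, done = (Block(start=s) for s in (0, 4, 8, 12))
entry.add_edge(header)
header.add_edge(body)
header.add_edge(done)
body.add_edge(header)                # the back edge is what makes the graph cyclic
cfg = ControlFlowGraph(entry, [entry, header, body, done])
```

The loop header ends up with two predecessors (the entry block and the loop body), which is the "any number of predecessors" case from the description.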

Verification Changes

ControlFlowGraphs are created by GraphBuilder, which is explained below. The graph builder performs some very, very basic opcode verification - only what is needed to ascertain the block structure. However, this verification does introduce two new SMX requirements:

Every jump must target a cell that decodes as the start of an instruction when the code stream is read linearly, ignoring control flow. Before, it was possible to have a "split" instruction, like push.c <payload> with a jump pointing to the payload. This would decode in the interpreter, but would greatly confuse the JIT. It is now illegal, for the simple reason that analysis is super difficult if this is allowed (the code stream has to be constantly reparsed).
Local control flow in a method may not extend beyond the method. This sounds obvious, but a malformed code stream could be missing a retn instruction, and simply flow into the next method. There was no verification for this before, and it is now formally disallowed.
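To make the "split" instruction case concrete, here is a small Python sketch. It uses a made-up toy encoding, not real SMX: the opcode numbers, the `OPERANDS` table, the helper names, and the use of cell indices as jump targets are all assumptions for illustration.

```python
# Hypothetical toy encoding (not real SMX): one cell per opcode, then its operands.
PUSH_C, JUMP, RETN = 1, 2, 3
OPERANDS = {PUSH_C: 1, JUMP: 1, RETN: 0}


def instruction_starts(code):
    """Decode linearly from cell 0, ignoring control flow; return opcode cells."""
    starts, ip = set(), 0
    while ip < len(code):
        op = code[ip]
        if op not in OPERANDS:
            raise ValueError(f"unknown opcode at cell {ip}")
        if ip + OPERANDS[op] >= len(code):
            raise ValueError(f"truncated instruction at cell {ip}")
        starts.add(ip)
        ip += 1 + OPERANDS[op]
    return starts


def check_jumps(code):
    """Reject any jump whose target is not the start of an instruction."""
    starts = instruction_starts(code)
    for ip in starts:
        if code[ip] == JUMP and code[ip + 1] not in starts:
            raise ValueError(f"jump at cell {ip} targets a non-instruction cell")


ok = [PUSH_C, 3, JUMP, 4, RETN]      # the jump lands on the retn at cell 4
split = [PUSH_C, 3, JUMP, 1, RETN]   # the jump lands on push.c's payload (cell 1)

check_jumps(ok)
try:
    check_jumps(split)
    rejected = False
except ValueError:
    rejected = True
```

In `split`, the payload value happens to equal the toy `RETN` opcode, so an interpreter that simply follows the jump would decode it as an instruction. A linear decode never sees an instruction at that cell, which is the ambiguity the new requirement rules out.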

Testing

I used a corpus of ~6,000 .smx files from the forums and web compiler and confirmed that no new verification failures were introduced, which is a good sign for our scrappy compiler (so far). The time to verify these methods did not significantly change despite the new passes involved, but that may change in the future as we expand verification.

Implementation

GraphBuilder has three steps: prescan, scan, and cleanup.

Prescanning

The prescan step runs through the code stream and ensures it can actually be decoded. That is, the code stream does not terminate in the middle of an instruction. It also computes the actual bounds of the method, and verifies jump and switch instructions. Finally, along the way, it computes two important pieces of information:

An instruction bitmap, which marks the cells that contain an opcode (versus a payload/parameter).
A jump target bitmap. Any instruction that is the target of a jump gets marked in this map.
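The two bitmaps can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the toy opcodes and `OPERANDS` table are invented, jump targets are cell indices, the method bounds are taken to be the whole list, and switch instructions are left out.

```python
# Hypothetical toy encoding (not real SMX): one cell per opcode, then its operands.
PUSH_C, JUMP, JZER, RETN = 1, 2, 3, 4
OPERANDS = {PUSH_C: 1, JUMP: 1, JZER: 1, RETN: 0}
JUMPS = {JUMP, JZER}


def prescan(code):
    """Return (is_insn, is_target) bitmaps, or raise ValueError on malformed code."""
    n = len(code)
    is_insn = [False] * n      # True where a cell holds an opcode
    is_target = [False] * n    # True where some jump lands
    pending = []               # (jump offset, target) pairs to verify afterwards
    ip = 0
    while ip < n:
        op = code[ip]
        if op not in OPERANDS:
            raise ValueError(f"unknown opcode at cell {ip}")
        size = 1 + OPERANDS[op]
        if ip + size > n:
            raise ValueError(f"truncated instruction at cell {ip}")
        is_insn[ip] = True
        if op in JUMPS:
            pending.append((ip, code[ip + 1]))
        ip += size
    # Forward jumps can only be checked once the whole stream has been decoded.
    for ip, target in pending:
        if not (0 <= target < n) or not is_insn[target]:
            raise ValueError(f"jump at cell {ip} has an invalid target {target}")
        is_target[target] = True
    return is_insn, is_target


# cell: 0       1  2     3  4     5
code = [PUSH_C, 7, JZER, 5, RETN, RETN]
is_insn, is_target = prescan(code)
```

Collecting the jumps first and checking them after the decode loop is a choice made for this sketch; it keeps forward jumps simple because the instruction bitmap is complete by the time targets are validated.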

Scanning

The scan step is a fairly simple breadth-first algorithm to build a ControlFlowGraph. It creates an entry block, and walks the instruction stream looking for where to create new blocks. A few scenarios create a new basic block:

If an instruction was marked as a jump target, this means the instruction has two entry-points: the preceding instruction, and a jump from somewhere else. A loop would cause this, for example. This situation will terminate the current basic block, adding an implicit jump to a new block at the current code position.
If an instruction is an unconditional jump, the current block is terminated, and a new block is created at the jump target.
If an instruction is a conditional jump, then two new blocks must be created: one for the next instruction (the jump-not-taken case), and one for the jump target. The current block is terminated.
If an instruction is a switch, then a block must be created for each case (including the default case), and the current block is terminated.

Finally, a return statement terminates a block. The algorithm is a graph algorithm - all paths are searched, until either (1) all blocks are terminated, or (2) the code stream ends abruptly. It uses a work-queue to avoid recursion, and a hash table to ensure that blocks are not duplicated.
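The scan can be sketched with a work queue and a dictionary standing in for the hash table. This Python version is an illustration only: it reuses an invented toy encoding, assumes the jump target bitmap was already computed and validated by a prescan, and omits switch instructions. `build_graph` and `block_at` are hypothetical names.

```python
from collections import deque

# Hypothetical toy encoding (not real SMX): one cell per opcode, then its operands.
PUSH_C, JUMP, JZER, RETN = 1, 2, 3, 4
OPERANDS = {PUSH_C: 1, JUMP: 1, JZER: 1, RETN: 0}


class Block:
    def __init__(self, start):
        self.start = start
        self.end = None            # stays None until the block is terminated
        self.successors = []


def build_graph(code, is_target):
    blocks = {}                    # start offset -> Block, so no block is built twice
    work = deque()                 # blocks whose instructions still need walking

    def block_at(offset):
        if offset not in blocks:
            blocks[offset] = Block(offset)
            work.append(blocks[offset])
        return blocks[offset]

    entry = block_at(0)
    while work:
        block = work.popleft()
        ip = block.start
        while ip < len(code):
            if ip != block.start and is_target[ip]:
                # A second entry point: end here with an implicit jump.
                block.successors = [block_at(ip)]
                block.end = ip
                break
            op = code[ip]
            next_ip = ip + 1 + OPERANDS[op]
            if op == RETN:
                block.end = next_ip
                break
            if op == JUMP:
                block.successors = [block_at(code[ip + 1])]
                block.end = next_ip
                break
            if op == JZER:
                # Not-taken case first, then the jump target.
                block.successors = [block_at(next_ip), block_at(code[ip + 1])]
                block.end = next_ip
                break
            ip = next_ip
        # If the loop ran off the end of the code, block.end is still None.
    return entry, blocks


# cell: 0       1  2     3  4       5  6     7  8
code = [PUSH_C, 0, JZER, 8, PUSH_C, 1, JUMP, 2, RETN]
is_target = [i in (2, 8) for i in range(len(code))]
entry, blocks = build_graph(code, is_target)
edges = {s: [t.start for t in b.successors] for s, b in blocks.items()}
```

For this small loop, `edges` maps each block's start offset to its successors' start offsets. The block at cell 0 is cut short at cell 2 because that cell is a jump target, which is the implicit-jump case from the first scenario in the list.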

Cleanup

The cleanup step iterates over all blocks looking for any that were not terminated. In theory, this can happen to at most one block, and only if the method is malformed (for example, does not contain a retn instruction). If this is the case, verification fails.

In debug builds it also verifies that no disconnected blocks exist. Since the scan only creates blocks by following edges from the entry block, every block should be reachable from it.

This introduces three new data structures. The first is ControlFlowGraph, which represents a collection of basic blocks for a method. A basic block is a unit of code that terminates in a control flow instruction, and for which the only entry point is the start of the block. Basic blocks are represented by the new Block class. Finally, there is a new GraphBuilder class which can extract a ControlFlowGraph for a method. It also performs basic verification on control-flow, namely:
 - Jump targets must be a valid decoded instruction in the method.
 - The method's local control flow does not escape the method bounds.

This removes checks from MethodVerifier that are now unnecessary. Readability of the code stream is guaranteed by GraphBuilder, as well as the validity of jump targets.

dvander force-pushed the cfg branch 2 times, most recently from 76ab485 to beec699 · May 24, 2018 23:41

dvander added 2 commits May 24, 2018 16:46

Update the method verifier for the new ControlFlowGraph pass.

c0e30f6

dvander force-pushed the cfg branch from beec699 to c0e30f6 · May 24, 2018 23:46

dvander merged commit 88ea50e into master May 25, 2018

dvander deleted the cfg branch May 25, 2018 00:49
