Introduce control flow verification. #213
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is something I've wanted to do for years, and @peace-maker finally convinced me to do it. SourcePawn needs some internal concept of a control-flow graph. It is the gateway to many important optimizations and transformations, and it will also make our method verification much more rigorous. In fact, it is not possible to attempt many optimizations without veracity.
Overview
This patch introduces the
ControlFlowGraph
data structure, which is a collection ofBlock
s. A block (or "basic block") is a unit of control flow. It has exactly one entry-point and exactly one exit-point. Blocks form a directed, cyclic graph - each one may have any number of successors or predecessors. The entry-point to a method is a single block, and any number of blocks may exit the method.Verification Changes
ControlFlowGraph
s are created byGraphBuilder
, which is explained below. The graph builder performs some very, very basic opcode verification - only what is needed to ascertain the block structure. However, this verification does introduce two new SMX requirements:push.c <payload>
with ajump
pointing to the payload. This would decode in the interpreter, but would greatly confuse the JIT. It is now illegal, for the simple reason that analysis is super difficult if this is allowed (the code stream has to be constantly reparsed).retn
instruction, and simply flow into the next method. There was no verification for this before, and it is now formally disallowed.Testing
I used a corpus of ~6,000 .smx files from the forums and web compiler and confirmed that no new verification failures were introduced, which is a good sign for our scrappy compiler (so far). The time to verify these methods did not significantly change despite the new passes involved, but that may change in the future as we expand verification.
Implementation
GraphBuilder
has three steps: prescan, scan, and cleanup.Prescanning
The prescan step runs through the code stream and ensures it can actually be decoded. That is, the code stream does not terminate in the middle of an instruction. It also computes the actual bounds of the method, and verifies jump and switch instructions. Finally, along the way, it computes two important pieces of information:
Scanning
The scan step is a fairly simple breadth-first algorithm to build a
ControlFlowGraph
. It creates an entry block, and walks the instruction stream looking for where to create new blocks. A few scenarios create a new basic block:Finally, a
return
statement terminates a block. The algorithm is a graph algorithm - all paths are searched, until either (1) all blocks are terminated, or (2) the code stream ends abruptly. It uses a work-queue to avoid recursion, and a hash table to ensure that blocks are not duplicated.Cleanup
The cleanup step iterates all blocks looking for any that were not terminated. In theory, this could only happen on at most one block - and only if the method is malformed (for example, does not contain a
retn
instruction). If this is the case, verification fails.In debug builds it also verifies that no disconnected blocks exist. Since it is a graph algorithm, all blocks should be connected.