Conversation
76ab485 to
beec699
Compare
This introduces three new data structures. The first is ControlFlowGraph, which represents a collection of basic blocks for a method. A basic block is a unit of code that terminates in a control flow instruction, and for which the only entry point is the start of the block. Basic blocks are represented by the new Block class. Finally, there is a new GraphBuilder class which can extract a ControlFlowGraph for a method. It also performs basic verification on control-flow, namely: - Jump targets must be a valid decoded instruction in the method. - The method's local control flow does not escape the method bounds.
This removes checks from MethodVerifier that are now unnecessary. Readability of the code stream is guaranteed by GraphBuilder, as well as the validity of jump targets.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is something I've wanted to do for years, and @peace-maker finally convinced me to do it. SourcePawn needs some internal concept of a control-flow graph. It is the gateway to many important optimizations and transformations, and it will also make our method verification much more rigorous. In fact, it is not possible to attempt many optimizations without veracity.
Overview
This patch introduces the
ControlFlowGraphdata structure, which is a collection ofBlocks. A block (or "basic block") is a unit of control flow. It has exactly one entry-point and exactly one exit-point. Blocks form a directed, cyclic graph - each one may have any number of successors or predecessors. The entry-point to a method is a single block, and any number of blocks may exit the method.Verification Changes
ControlFlowGraphs are created byGraphBuilder, which is explained below. The graph builder performs some very, very basic opcode verification - only what is needed to ascertain the block structure. However, this verification does introduce two new SMX requirements:push.c <payload>with ajumppointing to the payload. This would decode in the interpreter, but would greatly confuse the JIT. It is now illegal, for the simple reason that analysis is super difficult if this is allowed (the code stream has to be constantly reparsed).retninstruction, and simply flow into the next method. There was no verification for this before, and it is now formally disallowed.Testing
I used a corpus of ~6,000 .smx files from the forums and web compiler and confirmed that no new verification failures were introduced, which is a good sign for our scrappy compiler (so far). The time to verify these methods did not significantly change despite the new passes involved, but that may change in the future as we expand verification.
Implementation
GraphBuilderhas three steps: prescan, scan, and cleanup.Prescanning
The prescan step runs through the code stream and ensures it can actually be decoded. That is, the code stream does not terminate in the middle of an instruction. It also computes the actual bounds of the method, and verifies jump and switch instructions. Finally, along the way, it computes two important pieces of information:
Scanning
The scan step is a fairly simple breadth-first algorithm to build a
ControlFlowGraph. It creates an entry block, and walks the instruction stream looking for where to create new blocks. A few scenarios create a new basic block:Finally, a
returnstatement terminates a block. The algorithm is a graph algorithm - all paths are searched, until either (1) all blocks are terminated, or (2) the code stream ends abruptly. It uses a work-queue to avoid recursion, and a hash table to ensure that blocks are not duplicated.Cleanup
The cleanup step iterates all blocks looking for any that were not terminated. In theory, this could only happen on at most one block - and only if the method is malformed (for example, does not contain a
retninstruction). If this is the case, verification fails.In debug builds it also verifies that no disconnected blocks exist. Since it is a graph algorithm, all blocks should be connected.