This repository contains the following two subdirectories:
- WCCPhiAnal - It is a Clang-based program analysis tool that computes control dependencies, weak control closure, and phi nodes (required for SSA computation)
- SSA - It is implemented as an LLVM pass to transform any LLVM IR program to the SSA form
Both are implemented and tested in LLVM/Clang version 11.
WCCPhiAnal can run as a standalone tool built using the Libtooling library support. To build WCCPhiAnal, do the following:
- Copy WCCPhiAnal in
(your llvm directory)/clang/tools
directory, - Add the instruction
add_clang_subdirectory(WCCPhiAnal)
in the CMakeLists.txt file in thetools
directory, - Build the WCCPhiAnal project by building LLVM (or selectively build projects in LLVM) which will generate the wccphigen executable. See the details of how to build Clang/LLVM.
SSA is an LLVM pass which can be built as follows:
- Copy SSA to the directory
(your llvm directory)/llvm/lib/Transforms
directory and rename SSA by SmartSSA (just to resolve naming conflict), - Add the instruction
add_subdirectory(SmartSSA)
in the CMakeLists.txt file in theTransforms
directory, - Build SSA by building LLVM (or selectively build projects in LLVM) which will generate the SmartSSA.dylib library in the
Debug/lib/
directory of your LLVM's build directory.
- Compute the weak control closure of each procedure with respect to a given subset of CFG nodes. These subset of CFG nodes can be taken randomly or given as input. WCC can be computed using various state-of-the-art methods and a novel duality-based algorithm.
- Compute WCC using the method of Danicic et al.[1]. Usage option:
-alg=CDDanicic
- Computte WCC using reaching definition-based analysis [2,3]. Usage option:
-alg=CDSAS
- Computte WCC using novel duality-based algorithm [4]. Usage option:
-alg=CDDual
- Compare the computation of WCC using all the above three methods. Usage option:
-alg=CompCD
- Compute WCC using the method of Danicic et al.[1]. Usage option:
- Compute the set of PHI nodes from the CFG of each procedure and for each program variable.
- Compute Phi nodes using the dominance frontier based method (Cytron's algorithm [5]). Usage option:
-alg=PhiDF
- Compute Phi nodes using RD-based method discussed in SCAM 2019 [6] and JSS 2020 [7]. Usage option:
-alg=PhiRD
- Compute Phi nodes using the duality-based fixpoint method [4]. Usage option:
-alg=PhiDual
- Compare the computation of Phi nodes using all the above methods. Usage option:
-alg=CompPhi
- Compute Phi nodes using the dominance frontier based method (Cytron's algorithm [5]). Usage option:
- Use the option
-proc=(Procedure name)
if you want to analyze a specific procedure. Then, you can use-cfg
or-cdg
to see the CFG or the control dependency information of the analyzed procedure. - Usually, WCC is computed with respect to a set of CFG nodes. By default, this set is obtained randomly from the set of CFG nodes. However, this option can be overridden by using the option
-np
which will then allow to get a subset of CFG nodes from console. - Dominance frontier based computation (such as [5]) of Phi nodes assumes that all program variables are defined at the beginning and introduces spuriousness in computing phi nodes. However, theis assumption is relaxed in [4,6,7] by assuming that only global variables and formal parameters are defined at the beginning. Thus, the implementation of the methods in [4,6,7] produces precise phi nodes. However, the option
-defs-at-start
forces these methods to assume that all program variables are definied at the beginning, and then the set of phi nodes in all computations are same.
Limitations:
- The computation of phi nodes does not consider the liveness of program variables.
- It generates only phi nodes, but does not transform high-level code to its SSA form. However, the above limitations are resolved in SmartSSA which transforms any LLVM IR code to its SSA form.
SmartSSA is implemented as an LLVM pass that can be invoked using the LLVM's opt
tool. It extends the original mem2reg pass of LLVM and include the implementation of duality-based phi node computation. This tool can be used using the following command line option:
opt -load (path to your ...Debug/lib/)SmartSSA.dylib -S -smartssa source.ll -o sourceOutput.ll Options
-
-m2rlive
Convert source.ll to sourceOutput.ll using the mem2reg pass of LLVM and use the liveness of variables into account to compute pruned SSA form -
-duallive
Convert source.ll to sourceOutput.ll using the SmartSSA pass and use the liveness of variables into account to compute pruned SSA form -
-m2r
Convert source.ll to sourceOutput.ll using the mem2reg pass of LLVM and without using the liveness of variables into account ( thus, it computes minimal SSA form) -
-dual
Convert source.ll to sourceOutput.ll using the SmartSSA pass and without using the liveness of variables into account to compute pruned SSA form -
-defstart
option can be used with-duallive
or-dual
to state that all program variables are defined at the beginning. The effect will be that the generated LLVM IR will be equal to the LLVM IR of mem2reg pass.
[1] Danicic, Sebastian et al. “A unifying theory of control dependence and its application to arbitrary program structures.” Theor. Comput. Sci. 412 (2011): 6809-6842.
[2] Abu Naser Masud. Efficient computation of minimal weak and strong control closure. J. Syst. Softw. 184, C (Feb 2022). DOI:https://doi.org/10.1016/j.jss.2021.111140
[3] Abu Naser Masud. Simple and Efficient Computation of Minimal Weak Control Closure. in proceeding of the 27th Intl. conf. in static analysis (SAS) (2020).
[4] Abu Naser Masud. The Duality in Computing SSA Programs and Control Dependency, under review in transactions on software engineering (TSE).
[5] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Efficiently computing static single as- signment form and the control dependence graph. ACM Trans. Program. Lang. Syst., 13(4):451–490, October 1991.
[6] A. N. Masud and F. Ciccozzi. Towards constructing the ssa form using reaching definitions over dominance frontiers. In 2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM), pages 23–33, 2019.
[7] Abu Naser Masud and Federico Ciccozzi. More precise con- struction of static single assignment programs using reaching definitions. Journal of Systems and Software, 166:110590, 2020.