## PySVF tutorial: ICFG (Interprocedural Control-Flow Graph)

### Introduction

In this Jupyter Notebook, we will explore the use of the SVF (Static Value-Flow) tool for static analysis of LLVM bitcode, focusing on the Interprocedural Control-Flow Graph (ICFG). The `pysvf` library provides functionality to analyze bitcode files and extract valuable insights from the ICFG. This notebook will guide you through the process of setting up the environment, using the ICFG-related functions, and analyzing the resulting data structures.

### Significance

- **Control Flow Analysis**: ICFG provides a comprehensive framework for analyzing the control flow within a program. It helps in understanding the execution paths and the flow of control between different functions, which is essential for optimizing compilers and detecting potential vulnerabilities.
- **Interprocedural Analysis**: One of the primary applications of ICFG is in interprocedural analysis, where it helps in determining the control flow across function boundaries, providing a more holistic view of the program's behavior.
- **Optimization and Refactoring**: By providing insights into the control flow, ICFG aids in optimizing code and refactoring it for better performance and maintainability.

### Functionality

- **Representation of Program Points and Control Flow**: In the ICFG, nodes represent program points (ordinary statements, function entries and exits, call sites and return sites), and edges represent the control flow between them. This graph-based representation allows for efficient traversal and analysis of the program's structure.
- **Interprocedural Control Flow**: Call and return edges connect a call site to the entry of the called function and the exit of that function back to the return site, so control flow can be followed across function boundaries rather than one function at a time.
- **Integration with LLVM**: ICFG is designed to work seamlessly with LLVM bitcode, making it a powerful tool for analyzing programs written in languages supported by LLVM.
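
To make the node and edge vocabulary concrete, here is a toy model of the graph shape for a `main` that calls `foo`. It is a hand-written sketch, not SVF output: the node labels, the edge-kind strings, and the `walk` helper are all assumptions made up for illustration.

```python
from collections import defaultdict

# Toy model only: node names and edge kinds are hand-written for illustration,
# they are not produced by SVF.
edges = [
    ("main:entry", "main:call foo", "intra"),
    ("main:call foo", "foo:entry", "call"),   # call site -> callee entry
    ("foo:entry", "foo:exit", "intra"),
    ("foo:exit", "main:ret foo", "ret"),      # callee exit -> return site
    ("main:ret foo", "main:exit", "intra"),
]

# Adjacency list: node -> list of (successor, edge kind)
successors = defaultdict(list)
for src, dst, kind in edges:
    successors[src].append((dst, kind))


def walk(start):
    """Follow the first successor from `start` until a node has none."""
    path = [start]
    while successors[path[-1]]:
        path.append(successors[path[-1]][0][0])
    return path


path = walk("main:entry")
interprocedural = [kind for _, _, kind in edges if kind != "intra"]
print(" -> ".join(path))
print(interprocedural)
```

The walk passes through `foo` between the call site and the return site, which is exactly what distinguishes an interprocedural graph from a set of per-function CFGs.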



In [2]:
# Install the pysvf library
# You might need to run this command in your terminal or use a Jupyter magic command
# !pip install pysvf

# Import necessary libraries
import pysvf

# Load the LLVM bitcode file
bitcode_file = "demo.ll"

# Get the PAG (SVFIR) from the bitcode file
svfir = pysvf.getPAG(bitcode_file)

# Get the interprocedural control-flow graph (ICFG) from the PAG
cfg = svfir.getICFG()

# Dump the ICFG to a file; the output file will be named "demo.icfg.dot"
cfg.dump("demo.icfg")



*********General Stats***************
################ (program : demo.ll)###############
AddrsNum            21
BBWith2Succ         1
BBWith3Succ         0
CallsNum            3
ConstArrayObj       0
ConstStructObj      0
ConstantObj         14
CopysNum            2
FIObjNum            13
FSObjNum            9
FunctionObjs        5
GepsNum             10
GlobalObjs          1
HeapObjs            1
IndCallSites        0
LoadsNum            3
MaxStructSize       0
NonPtrObj           22
ReturnsNum          1
StackObjs           2
StoresNum           7
TotalCallSite       4
TotalFieldObjects   2
TotalObjects        25
TotalPTASVFStmts    22
TotalPointers       70
TotalSVFStmts       56
VarArrayObj         1
VarStructObj        0
----------------Time and memory stats--------------------
LLVMIRTime          0.01
SVFIRTime           0.014
SymbolTableTime     0.002
#######################################################

*********PTACallGraph Stats (Andersen analysis)***************
#######

Then we can traverse the ICFG to analyze the control flow between functions and perform various analyses on the program's structure.

### Basic Operations on ICFG

#### ICFG Class
- **APIs**
  ```python
  def getNodes(self) -> List[ICFGNode]: ... # Get the list of ICFG nodes
  def getGNode(self, id: int) -> ICFGNode: ... # Get the ICFG node by id
  def getGlobalICFGNode(self) -> ICFGNode: ... # Get the global ICFG node
  def dump(self, file: str) -> None: ... # Dump the ICFG to a file
  ```

#### ICFGNode Class
- **APIs**
  ```python
  def toString(self) -> str: ... # Get the string representation of the ICFG node
  def getId(self) -> int: ... # Get the id of the ICFG node
  def getFun(self) -> SVFFunction: ... # Get the function that the ICFG node belongs to
  def getBB(self) -> SVFBasicBlock: ... # Get the basic block that the ICFG node belongs to
  def getSVFStmts(self) -> List[SVFStmt]: ... # Get the SVF statements associated with the ICFG node
  def asFunEntry(self) -> FunEntryICFGNode: ... # Downcast to FunEntryICFGNode
  def asFunExit(self) -> FunExitICFGNode: ... # Downcast to FunExitICFGNode
  def asCall(self) -> CallICFGNode: ... # Downcast to CallICFGNode
  def asRet(self) -> RetICFGNode: ... # Downcast to RetICFGNode
  def isFunEntry(self) -> bool: ... # Check if the ICFG node is a function entry node
  def isFunExit(self) -> bool: ... # Check if the ICFG node is a function exit node
  def isCall(self) -> bool: ... # Check if the ICFG node is a function call node
  def isRet(self) -> bool: ... # Check if the ICFG node is a function return node
  def getOutEdges(self) -> List[ICFGEdge]: ... # Get the out edges of the ICFG node
  def getInEdges(self) -> List[ICFGEdge]: ... # Get the in edges of the ICFG node
  ```
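
As a small illustration of the `is*` predicates listed above, the sketch below labels each node by kind and counts the labels. Only `isFunEntry`, `isFunExit`, `isCall` and `isRet` come from the API list; `node_kind`, `count_node_kinds` and the `KindStubNode` class are hypothetical helpers, and the stub exists only so the example runs without `pysvf` installed.

```python
from collections import Counter


def node_kind(node):
    """Return a short label for an ICFG node based on its is* predicates."""
    if node.isFunEntry():
        return "entry"
    if node.isFunExit():
        return "exit"
    if node.isCall():
        return "call"
    if node.isRet():
        return "ret"
    return "other"  # none of the four predicates matched


def count_node_kinds(nodes):
    """Count nodes per kind label."""
    return dict(Counter(node_kind(n) for n in nodes))


class KindStubNode:
    """Hypothetical stand-in for an ICFG node (not part of pysvf)."""

    def __init__(self, kind):
        self._kind = kind

    def isFunEntry(self):
        return self._kind == "entry"

    def isFunExit(self):
        return self._kind == "exit"

    def isCall(self):
        return self._kind == "call"

    def isRet(self):
        return self._kind == "ret"


stub_nodes = [KindStubNode(k) for k in ["entry", "call", "ret", "intra", "exit", "call"]]
kind_counts = count_node_kinds(stub_nodes)
print(kind_counts)
```

With a real graph, the same function would presumably be called as `count_node_kinds(cfg.getNodes())`.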

#### ICFGEdge Class
- **APIs**
  ```python
  def toString(self) -> str: ... # Get the string representation of the ICFG edge
  def isCFGEdge(self) -> bool: ... # Check if the edge is a CFG edge
  def isCallCFGEdge(self) -> bool: ... # Check if the edge is a call CFG edge
  def isRetCFGEdge(self) -> bool: ... # Check if the edge is a return CFG edge
  def isIntraCFGEdge(self) -> bool: ... # Check if the edge is an intra CFG edge
  def getSrcNode(self) -> ICFGNode: ... # Get the source node of the edge
  def getDstNode(self) -> ICFGNode: ... # Get the destination node of the edge
  def asIntraCFGEdge(self) -> IntraCFGEdge: ... # Downcast to IntraCFGEdge
  def asCallCFGEdge(self) -> CallCFGEdge: ... # Downcast to CallCFGEdge
  def asRetCFGEdge(self) -> RetCFGEdge: ... # Downcast to RetCFGEdge
  ```
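
The edge predicates can be combined with `getSrcNode` and `getDstNode` to pick out only the edges that cross into another function. The following is a minimal sketch under that assumption: `call_edges`, `CallDemoNode` and `CallDemoEdge` are hypothetical names, the node IDs are arbitrary, and the stubs only mimic the methods listed above.

```python
def call_edges(nodes):
    """Collect (source node id, destination node id) for every call edge."""
    pairs = []
    for node in nodes:
        for edge in node.getOutEdges():
            if edge.isCallCFGEdge():
                pairs.append((edge.getSrcNode().getId(), edge.getDstNode().getId()))
    return pairs


class CallDemoNode:
    """Hypothetical stand-in for an ICFG node (not part of pysvf)."""

    def __init__(self, node_id):
        self._id = node_id
        self.out = []

    def getId(self):
        return self._id

    def getOutEdges(self):
        return self.out


class CallDemoEdge:
    """Hypothetical stand-in for an ICFG edge (not part of pysvf)."""

    def __init__(self, src, dst, is_call):
        self._src, self._dst, self._is_call = src, dst, is_call
        src.out.append(self)  # register the edge on its source node

    def getSrcNode(self):
        return self._src

    def getDstNode(self):
        return self._dst

    def isCallCFGEdge(self):
        return self._is_call


a, b, c = CallDemoNode(10), CallDemoNode(11), CallDemoNode(12)
CallDemoEdge(a, b, is_call=False)  # ordinary edge inside one function
CallDemoEdge(b, c, is_call=True)   # edge into a callee
found = call_edges([a, b, c])
print(found)
```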

Then, we can use these APIs to traverse the ICFG and perform various analyses on the control flow between functions.

- Traverse the ICFG nodes and print their information
- Get a certain node by its ID (e.g. Global ICFG node)
- Get the SVFStmt associated with a certain node
- Downcast the ICFGNode/Edge to a specific type (e.g. CallICFGNode)
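
The traversal in the first bullet can be sketched as a breadth-first search that follows out edges from a start node. It relies only on `getId`, `getOutEdges` and `getDstNode` from the API lists above; `reachable_ids` and the two stub classes are hypothetical and are there so the sketch runs without `pysvf`.

```python
from collections import deque


def reachable_ids(start):
    """Return the ids of all nodes reachable from `start`, in BFS order."""
    seen = {start.getId()}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node.getId())
        for edge in node.getOutEdges():
            nxt = edge.getDstNode()
            if nxt.getId() not in seen:  # the seen set guards against cycles
                seen.add(nxt.getId())
                queue.append(nxt)
    return order


class BfsStubNode:
    """Hypothetical stand-in for an ICFG node (not part of pysvf)."""

    def __init__(self, node_id):
        self._id = node_id
        self.out = []

    def getId(self):
        return self._id

    def getOutEdges(self):
        return self.out


class BfsStubEdge:
    """Hypothetical stand-in for an ICFG edge (not part of pysvf)."""

    def __init__(self, src, dst):
        self._dst = dst
        src.out.append(self)

    def getDstNode(self):
        return self._dst


n = [BfsStubNode(i) for i in range(5)]
for s, d in [(0, 1), (0, 2), (1, 3), (2, 3), (3, 1)]:  # 3 -> 1 closes a loop
    BfsStubEdge(n[s], n[d])
visited = reachable_ids(n[0])
print(visited)
```

On a real graph one might start from `cfg.getGlobalICFGNode()`; node 4 in the stub graph is never printed because nothing points to it.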

In [3]:
for node in cfg.getNodes():
    print(node)

GlobalICFGNode0
CopyStmt: [Var1 <-- Var0]	
ConstNullPtrValVar ID: 0
 ptr null { constant data }
AddrStmt: [Var5 <-- Var6]	
ConstIntValVar ID: 5
 i8 37 { constant data }
AddrStmt: [Var7 <-- Var8]	
ConstIntValVar ID: 7
 i8 100 { constant data }
AddrStmt: [Var9 <-- Var10]	
ConstIntValVar ID: 9
 i8 10 { constant data }
AddrStmt: [Var11 <-- Var12]	
ConstIntValVar ID: 11
 i8 0 { constant data }
AddrStmt: [Var48 <-- Var49]	
ConstIntValVar ID: 48
 i32 3 { constant data }
AddrStmt: [Var45 <-- Var46]	
ConstIntValVar ID: 45
 i64 1 { constant data }
AddrStmt: [Var42 <-- Var43]	
ConstIntValVar ID: 42
 i32 5 { constant data }
AddrStmt: [Var39 <-- Var40]	
ConstIntValVar ID: 39
 i64 0 { constant data }
AddrStmt: [Var55 <-- Var56]	
ConstIntValVar ID: 55
 i32 1 { constant data }
AddrStmt: [Var21 <-- Var22]	
ConstIntValVar ID: 21
 i32 0 { constant data }
AddrStmt: [Var60 <-- Var61]	
ConstIntValVar ID: 60
 i1 false { constant data }
AddrStmt: [Var36 <-- Var37]	
ConstIntValVar ID: 36
 i64 8 { constant data

The output above lists the ICFG nodes in the program, starting with `GlobalICFGNode0`. Each entry shows the node's kind and ID, followed by the SVF statements attached to it, such as the `CopyStmt` and `AddrStmt` entries for constants. We can also dump the ICFG to a file and visualize it with a graph visualization tool such as Graphviz.

#### Visualizing the ICFG

To visualize the ICFG, we can use the `graphviz` library to generate a graphical representation of the control flow between functions. The `dump` function of the ICFG class generates a DOT file that can be rendered using Graphviz.
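
One way to render the dumped file is to call Graphviz's `dot` command-line tool from Python. This is a sketch under a few assumptions: Graphviz is installed separately, the input is the `demo.icfg.dot` file written by the earlier `dump` call, and `dot_command` and `render` are hypothetical helper names.

```python
import shutil
import subprocess


def dot_command(dot_file, image_file, fmt="png"):
    """Build the Graphviz command line: dot -T<fmt> <input> -o <output>."""
    return ["dot", f"-T{fmt}", dot_file, "-o", image_file]


def render(dot_file, image_file, fmt="png"):
    """Run Graphviz if `dot` is on PATH; return False when it is missing."""
    if shutil.which("dot") is None:
        return False
    subprocess.run(dot_command(dot_file, image_file, fmt), check=True)
    return True


cmd = dot_command("demo.icfg.dot", "icfgdot.png")
print(" ".join(cmd))
# render("demo.icfg.dot", "icfgdot.png")  # uncomment once the DOT file exists
```

For large programs the PNG can become unwieldy; passing `fmt="svg"` produces a vector image that is easier to zoom.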

![Alt text](icfgdot.png)


### Exploring the ICFG

Now that we have loaded the ICFG and visualized it, we can explore the control flow between functions and perform various analyses on the program's structure. We can analyze the call graph, identify function dependencies, and detect potential issues in the control flow.

#### ICFGNode

ICFGNode represents a node in the Interprocedural Control-Flow Graph (ICFG) and provides access to the function and basic block it belongs to, the SVF statements attached to it, and its incoming and outgoing edges. ICFGNode has several subtypes, namely IntraICFGNode, CallICFGNode, RetICFGNode, FunEntryICFGNode, and FunExitICFGNode, which represent different kinds of program points: ordinary statements inside a function, call sites, return sites, function entries, and function exits.

To use the APIs of a subclass, first downcast the ICFGNode to that specific type.



In [4]:
for node in cfg.getNodes():
    if isinstance(node, pysvf.CallICFGNode):
        call_node = node.asCall()
        # getCalledFunction() is only available on CallICFGNode; it returns the called function's object variable
        print(call_node.getCalledFunction())
    elif isinstance(node, pysvf.RetICFGNode):
        ret_node = node.asRet()
        # getCallICFGNode() is only available on RetICFGNode; it returns the corresponding call node
        print(ret_node.getCallICFGNode())
    elif isinstance(node, pysvf.FunEntryICFGNode):
        funentry_node = node.asFunEntry()  # downcast only; nothing is printed
    elif isinstance(node, pysvf.FunExitICFGNode):
        funexit_node = node.asFunExit()  # downcast only; nothing is printed

FunObjVar ID: 15 (base object)
add_or_sub
CallICFGNode22 {fun: main}
CallPE: [Var17 <-- Var51]	
ValVar ID: 54
   %call = call i32 @add_or_sub(i32 noundef %1, i32 noundef %2, i32 noundef 1) 
CallPE: [Var18 <-- Var53]	
ValVar ID: 54
   %call = call i32 @add_or_sub(i32 noundef %1, i32 noundef %2, i32 noundef 1) 
CallPE: [Var19 <-- Var55]	
ValVar ID: 54
   %call = call i32 @add_or_sub(i32 noundef %1, i32 noundef %2, i32 noundef 1) 

FunObjVar ID: 65 (base object)
llvm.objectsize.i64.p0
CallICFGNode25 {fun: main}
   %4 = call i64 @llvm.objectsize.i64.p0(ptr %3, i1 false, i1 true, i1 false) CallICFGNode: 

FunObjVar ID: 68 (base object)
__memcpy_chk
CallICFGNode27 {fun: main}
GepStmt: [Var90 <-- Var57]	
ValVar ID: 66
   %call4 = call ptr @__memcpy_chk(ptr noundef %3, ptr noundef %0, i64 noundef 8, i64 noundef %4) #4 
GepStmt: [Var91 <-- Var34]	
ValVar ID: 66
   %call4 = call ptr @__memcpy_chk(ptr noundef %3, ptr noundef %0, i64 noundef 8, i64 noundef %4) #4 
LoadStmt: [Var92 <-- Var91]	
ValV

The above code snippet demonstrates how to downcast an ICFGNode to a specific subtype and access information that only that subtype provides, such as the called function of a call node or the call node that matches a return node. By analyzing the ICFG nodes and edges in this way, we can follow the control flow between functions and identify potential issues in the program's structure.

### SVF Statements (SVFStmt) under ICFGNode

Each ICFGNode is associated with a set of SVF statements that represent the program's behavior at that point. By analyzing these SVF statements, we can understand the data flow and value propagation within the program, which is essential for optimizing compilers and detecting potential vulnerabilities.

The following table lists the SVF statement classes and their methods:

| Class Name       | Method Name                          | Description                                      |
|------------------|--------------------------------------|--------------------------------------------------|
| `SVFStmt`        | `toString`                           | Get the string representation of the SVF statement |
|                  | `getEdgeId`                          | Get the ID of the SVF statement                  |
|                  | `getICFGNode`                        | Get the ICFG node that the SVF statement belongs to |
|                  | `getValue`                           | Get the value of the SVF statement               |
|                  | `getBB`                              | Get the basic block that the SVF statement belongs to |
|                  | `isAddrStmt`                         | Check if the SVF statement is an address statement |
|                  | `isCopyStmt`                         | Check if the SVF statement is a copy statement   |
|                  | `isStoreStmt`                        | Check if the SVF statement is a store statement  |
|                  | `isLoadStmt`                         | Check if the SVF statement is a load statement   |
|                  | `isCallPE`                           | Check if the SVF statement is a call PE          |
|                  | `isRetPE`                            | Check if the SVF statement is a return PE        |
|                  | `isGepStmt`                          | Check if the SVF statement is a GEP statement    |
|                  | `isPhiStmt`                          | Check if the SVF statement is a phi statement    |
|                  | `isSelectStmt`                       | Check if the SVF statement is a select statement |
|                  | `isCmpStmt`                          | Check if the SVF statement is a compare statement |
|                  | `isBinaryOpStmt`                     | Check if the SVF statement is a binary operation statement |
|                  | `isUnaryOpStmt`                      | Check if the SVF statement is a unary operation statement |
|                  | `isBranchStmt`                       | Check if the SVF statement is a branch statement |
|                  | `asAddrStmt`                         | Downcast the SVF statement to an address statement |
|                  | `asCopyStmt`                         | Downcast the SVF statement to a copy statement   |
|                  | `asStoreStmt`                        | Downcast the SVF statement to a store statement  |
|                  | `asLoadStmt`                         | Downcast the SVF statement to a load statement   |
|                  | `asCallPE`                           | Downcast the SVF statement to a call PE          |
|                  | `asRetPE`                            | Downcast the SVF statement to a return PE        |
|                  | `asGepStmt`                          | Downcast the SVF statement to a GEP statement    |
|                  | `asPhiStmt`                          | Downcast the SVF statement to a phi statement    |
|                  | `asSelectStmt`                       | Downcast the SVF statement to a select statement |
|                  | `asCmpStmt`                          | Downcast the SVF statement to a compare statement |
|                  | `asBinaryOpStmt`                     | Downcast the SVF statement to a binary operation statement |
|                  | `asUnaryOpStmt`                      | Downcast the SVF statement to a unary operation statement |
|                  | `asBranchStmt`                       | Downcast the SVF statement to a branch statement |
| `AddrStmt`       | `getLHSVar`                          | Get the LHS variable of the address statement    |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the address statement |
|                  | `getRHSVar`                          | Get the RHS variable of the address statement    |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the address statement |
|                  | `getArrSize`                         | Get the array size of the address statement      |
| `CopyStmt`       | `getLHSVar`                          | Get the LHS variable of the copy statement       |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the copy statement |
|                  | `getRHSVar`                          | Get the RHS variable of the copy statement       |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the copy statement |
| `StoreStmt`      | `getLHSVar`                          | Get the LHS variable of the store statement      |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the store statement |
|                  | `getRHSVar`                          | Get the RHS variable of the store statement      |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the store statement |
| `LoadStmt`       | `getLHSVar`                          | Get the LHS variable of the load statement       |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the load statement |
|                  | `getRHSVar`                          | Get the RHS variable of the load statement       |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the load statement |
| `CallPE`         | `getCallSite`                        | Get the call site                                |
|                  | `getLHSVar`                          | Get the LHS variable of the call PE              |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the call PE    |
|                  | `getRHSVar`                          | Get the RHS variable of the call PE              |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the call PE    |
|                  | `getFunEntryICFGNode`                | Get the function entry ICFG node                 |
| `RetPE`          | `getCallSite`                        | Get the call site                                |
|                  | `getLHSVar`                          | Get the LHS variable of the return PE            |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the return PE  |
|                  | `getRHSVar`                          | Get the RHS variable of the return PE            |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the return PE  |
|                  | `getFunExitICFGNode`                 | Get the function exit ICFG node                  |
| `GepStmt`        | `getLHSVar`                          | Get the LHS variable of the GEP statement        |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the GEP statement |
|                  | `getRHSVar`                          | Get the RHS variable of the GEP statement        |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the GEP statement |
|                  | `isConstantOffset`                   | Check if the GEP statement has a constant offset |
|                  | `getConstantOffset`                  | Get the constant offset                          |
|                  | `getConstantByteOffset`              | Get the constant byte offset                     |
|                  | `getConstantStructFldIdx`            | Get the constant struct field index              |
|                  | `getOffsetVarAndGepTypePairVec`      | Get the offset variable and GEP type pair vector |
|                  | `getSrcPointeeType`                  | Get the source pointee type                      |
| `PhiStmt`        | `getRes`                             | Get the result variable                          |
|                  | `getResId`                           | Get the ID of the result variable                |
|                  | `getOpVar`                           | Get the operand variable                         |
|                  | `getOpICFGNode`                      | Get the operand ICFG node                        |
|                  | `getOpVarNum`                        | Get the number of operand variables              |
| `CmpStmt`        | `getPredicate`                       | Get the predicate                                |
|                  | `getRes`                             | Get the result variable                          |
|                  | `getResId`                           | Get the ID of the result variable                |
|                  | `getOpVar`                           | Get the operand variable                         |
|                  | `getOpVarNum`                        | Get the number of operands of the compare statement |
| `BinaryOPStmt`   | `getOpcode`                          | Get the opcode                                   |
|                  | `getRes`                             | Get the result variable                          |
|                  | `getResId`                           | Get the ID of the result variable                |
|                  | `getOpVar`                           | Get the operand variable                         |
| `UnaryOPStmt`    | `getOp`                              | Get the opcode                                   |
|                  | `getRes`                             | Get the result variable                          |
|                  | `getResVar`                          | Get the result variable                          |
|                  | `getResId`                           | Get the ID of the result variable                |
|                  | `getOpVar`                           | Get the operand variable                         |
|                  | `getOpVarId`                         | Get the ID of the operand variable               |
| `BranchStmt`     | `getSuccessors`                      | Get the successors of the branch statement       |
|                  | `getNumSuccessors`                   | Get the number of successors                     |
|                  | `isConditional`                      | Check if the branch statement is conditional     |
|                  | `isUnconditional`                    | Check if the branch statement is unconditional   |
|                  | `getCondition`                       | Get the condition variable                       |
|                  | `getBranchInst`                      | Get the branch instruction                       |

The definitions of these SVF statements are given in the companion [SVFIR](SVFIR.ipynb) (PAG) notebook. Combined with the ICFG, they let us follow data flow and value propagation at each program point, which is essential for optimizing compilers and detecting potential vulnerabilities.
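
As a sketch of how the `is*` and `as*` methods from the table might be combined, the code below tallies statements by kind and extracts the variable IDs of copy statements. The method names come from the table; `stmt_kind`, `tally_stmt_kinds`, `copy_pairs` and the stub classes are hypothetical, and the stub data mirrors the `CopyStmt: [Var1 <-- Var0]` and `AddrStmt` lines from the earlier output rather than calling `pysvf`.

```python
from collections import Counter

# Predicate names copied from the method table above.
KIND_PREDICATES = [
    "isAddrStmt", "isCopyStmt", "isStoreStmt", "isLoadStmt", "isCallPE",
    "isRetPE", "isGepStmt", "isPhiStmt", "isSelectStmt", "isCmpStmt",
    "isBinaryOpStmt", "isUnaryOpStmt", "isBranchStmt",
]


def stmt_kind(stmt):
    """Return the statement kind, e.g. "CopyStmt", or "Unknown"."""
    for name in KIND_PREDICATES:
        if getattr(stmt, name)():
            return name[2:]  # strip the leading "is"
    return "Unknown"


def tally_stmt_kinds(nodes):
    """Count SVF statements by kind over an iterable of ICFG nodes."""
    return Counter(stmt_kind(s) for n in nodes for s in n.getSVFStmts())


def copy_pairs(nodes):
    """Collect (LHS variable id, RHS variable id) for every copy statement."""
    pairs = []
    for n in nodes:
        for s in n.getSVFStmts():
            if s.isCopyStmt():
                c = s.asCopyStmt()
                pairs.append((c.getLHSVarID(), c.getRHSVarID()))
    return pairs


class StmtStub:
    """Hypothetical stand-in for an SVF statement (not part of pysvf)."""

    def __init__(self, kind, lhs, rhs):
        self.kind, self.lhs, self.rhs = kind, lhs, rhs

    def __getattr__(self, name):
        # Answer every is*() predicate by comparing against the stored kind.
        if name.startswith("is"):
            return lambda: name == "is" + self.kind
        raise AttributeError(name)

    def asCopyStmt(self):
        return self

    def getLHSVarID(self):
        return self.lhs

    def getRHSVarID(self):
        return self.rhs


class StmtHolderStub:
    """Hypothetical stand-in for an ICFG node that only holds statements."""

    def __init__(self, stmts):
        self._stmts = stmts

    def getSVFStmts(self):
        return self._stmts


demo_nodes = [
    StmtHolderStub([StmtStub("CopyStmt", 1, 0), StmtStub("AddrStmt", 5, 6)]),
    StmtHolderStub([StmtStub("AddrStmt", 7, 8)]),
]
kinds = tally_stmt_kinds(demo_nodes)
pairs = copy_pairs(demo_nodes)
print(dict(kinds), pairs)
```

Checking the predicate before calling the matching `as*` method keeps the downcast safe; the same pattern would apply to the other statement classes in the table.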



 structure.

### SVF Statements (SVFStmt) under ICFGNode

Each ICFGNode is associated with a set of SVF statements that represent the program's behavior at that point. By analyzing these SVF statements, we can understand the data flow and value propagation within the program, which is essential for optimizing compilers and detecting potential vulnerabilities.

We have following SVF statements:

| Class Name       | Method Name                          | Description                                      |
|------------------|--------------------------------------|--------------------------------------------------|
| `SVFStmt`        | `toString`                           | Get the string representation of the SVF statement |
|                  | `getEdgeId`                          | Get the ID of the SVF statement                  |
|                  | `getICFGNode`                        | Get the ICFG node that the SVF statement belongs to |
|                  | `getValue`                           | Get the value of the SVF statement               |
|                  | `getBB`                              | Get the basic block that the SVF statement belongs to |
|                  | `isAddrStmt`                         | Check if the SVF statement is an address statement |
|                  | `isCopyStmt`                         | Check if the SVF statement is a copy statement   |
|                  | `isStoreStmt`                        | Check if the SVF statement is a store statement  |
|                  | `isLoadStmt`                         | Check if the SVF statement is a load statement   |
|                  | `isCallPE`                           | Check if the SVF statement is a call PE          |
|                  | `isRetPE`                            | Check if the SVF statement is a return PE        |
|                  | `isGepStmt`                          | Check if the SVF statement is a GEP statement    |
|                  | `isPhiStmt`                          | Check if the SVF statement is a phi statement    |
|                  | `isSelectStmt`                       | Check if the SVF statement is a select statement |
|                  | `isCmpStmt`                          | Check if the SVF statement is a compare statement |
|                  | `isBinaryOpStmt`                     | Check if the SVF statement is a binary operation statement |
|                  | `isUnaryOpStmt`                      | Check if the SVF statement is a unary operation statement |
|                  | `isBranchStmt`                       | Check if the SVF statement is a branch statement |
|                  | `asAddrStmt`                         | Downcast the SVF statement to an address statement |
|                  | `asCopyStmt`                         | Downcast the SVF statement to a copy statement   |
|                  | `asStoreStmt`                        | Downcast the SVF statement to a store statement  |
|                  | `asLoadStmt`                         | Downcast the SVF statement to a load statement   |
|                  | `asCallPE`                           | Downcast the SVF statement to a call PE          |
|                  | `asRetPE`                            | Downcast the SVF statement to a return PE        |
|                  | `asGepStmt`                          | Downcast the SVF statement to a GEP statement    |
|                  | `asPhiStmt`                          | Downcast the SVF statement to a phi statement    |
|                  | `asSelectStmt`                       | Downcast the SVF statement to a select statement |
|                  | `asCmpStmt`                          | Downcast the SVF statement to a compare statement |
|                  | `asBinaryOpStmt`                     | Downcast the SVF statement to a binary operation statement |
|                  | `asUnaryOpStmt`                      | Downcast the SVF statement to a unary operation statement |
|                  | `asBranchStmt`                       | Downcast the SVF statement to a branch statement |
| `AddrStmt`       | `getLHSVar`                          | Get the LHS variable of the address statement    |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the address statement |
|                  | `getRHSVar`                          | Get the RHS variable of the address statement    |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the address statement |
|                  | `getArrSize`                         | Get the array size of the address statement      |
| `CopyStmt`       | `getLHSVar`                          | Get the LHS variable of the copy statement       |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the copy statement |
|                  | `getRHSVar`                          | Get the RHS variable of the copy statement       |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the copy statement |
| `StoreStmt`      | `getLHSVar`                          | Get the LHS variable of the store statement      |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the store statement |
|                  | `getRHSVar`                          | Get the RHS variable of the store statement      |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the store statement |
| `LoadStmt`       | `getLHSVar`                          | Get the LHS variable of the load statement       |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the load statement |
|                  | `getRHSVar`                          | Get the RHS variable of the load statement       |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the load statement |
| `CallPE`         | `getCallSite`                        | Get the call site                                |
|                  | `getLHSVar`                          | Get the LHS variable of the call PE              |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the call PE    |
|                  | `getRHSVar`                          | Get the RHS variable of the call PE              |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the call PE    |
|                  | `getFunEntryICFGNode`                | Get the function entry ICFG node                 |
| `RetPE`          | `getCallSite`                        | Get the call site                                |
|                  | `getLHSVar`                          | Get the LHS variable of the return PE            |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the return PE  |
|                  | `getRHSVar`                          | Get the RHS variable of the return PE            |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the return PE  |
|                  | `getFunExitICFGNode`                 | Get the function exit ICFG node                  |
| `GepStmt`        | `getLHSVar`                          | Get the LHS variable of the GEP statement        |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the GEP statement |
|                  | `getRHSVar`                          | Get the RHS variable of the GEP statement        |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the GEP statement |
|                  | `isConstantOffset`                   | Check if the GEP statement has a constant offset |
|                  | `getConstantOffset`                  | Get the constant offset                          |
|                  | `getConstantByteOffset`              | Get the constant byte offset                     |
|                  | `getConstantStructFldIdx`            | Get the constant struct field index              |
|                  | `getOffsetVarAndGepTypePairVec`      | Get the offset variable and GEP type pair vector |
|                  | `getSrcPointeeType`                  | Get the source pointee type                      |
| `PhiStmt`        | `getRes`                             | Get the result variable                          |
|                  | `getResId`                           | Get the ID of the result variable                |
|                  | `getOpVar`                           | Get the operand variable                         |
|                  | `getOpICFGNode`                      | Get the operand ICFG node                        |
|                  | `getOpVarNum`                        | Get the number of operand variables              |
| `CmpStmt`        | `getPredicate`                       | Get the predicate                                |
|                  | `getRes`                             | Get the result variable                          |
|                  | `getResId`                           | Get the ID of the result variable                |
|                  | `getOpVar`                           | Get the operand variable                         |
|                  | `getOpVarNum`                        | Get the number of operands of the compare statement |
| `BinaryOPStmt`   | `getOpcode`                          | Get the opcode                                   |
|                  | `getRes`                             | Get the result variable                          |
|                  | `getResId`                           | Get the ID of the result variable                |
|                  | `getOpVar`                           | Get the operand variable at the given index      |
| `UnaryOPStmt`    | `getOp`                              | Get the opcode                                   |
|                  | `getRes`                             | Get the result variable                          |
|                  | `getResVar`                          | Get the result variable                          |
|                  | `getResId`                           | Get the ID of the result variable                |
|                  | `getOpVar`                           | Get the operand variable                         |
|                  | `getOpVarId`                         | Get the ID of the operand variable               |
| `BranchStmt`     | `getSuccessors`                      | Get the successors of the branch statement       |
|                  | `getNumSuccessors`                   | Get the number of successors                     |
|                  | `isConditional`                      | Check if the branch statement is conditional     |
|                  | `isUnconditional`                    | Check if the branch statement is unconditional   |
|                  | `getCondition`                       | Get the condition variable                       |
|                  | `getBranchInst`                      | Get the branch instruction                       |

The [SVFIR](SVFIR.ipynb) (PAG) notebook defines these SVF statements. They describe how values flow and propagate through the program, which is the basis for compiler optimization and vulnerability detection.



In [5]:
for node in cfg.getNodes():
    for edge in node.getOutEdges():
        if isinstance(edge, pysvf.CallCFGEdge):
            call_edge = edge.asCallCFGEdge()
            # getCallSite() is only available on CallCFGEdge; it returns the call ICFG node for this call edge
            print(call_edge.getCallSite())
        elif isinstance(edge, pysvf.RetCFGEdge):
            ret_edge = edge.asRetCFGEdge()
            # getCallSite() is only available on RetCFGEdge; it returns the call ICFG node matching this return edge
            print(ret_edge.getCallSite())

CallICFGNode22 {fun: main}
CallPE: [Var17 <-- Var51]	
ValVar ID: 54
   %call = call i32 @add_or_sub(i32 noundef %1, i32 noundef %2, i32 noundef 1) 
CallPE: [Var18 <-- Var53]	
ValVar ID: 54
   %call = call i32 @add_or_sub(i32 noundef %1, i32 noundef %2, i32 noundef 1) 
CallPE: [Var19 <-- Var55]	
ValVar ID: 54
   %call = call i32 @add_or_sub(i32 noundef %1, i32 noundef %2, i32 noundef 1) 

CallICFGNode22 {fun: main}
CallPE: [Var17 <-- Var51]	
ValVar ID: 54
   %call = call i32 @add_or_sub(i32 noundef %1, i32 noundef %2, i32 noundef 1) 
CallPE: [Var18 <-- Var53]	
ValVar ID: 54
   %call = call i32 @add_or_sub(i32 noundef %1, i32 noundef %2, i32 noundef 1) 
CallPE: [Var19 <-- Var55]	
ValVar ID: 54
   %call = call i32 @add_or_sub(i32 noundef %1, i32 noundef %2, i32 noundef 1) 




The snippet above downcasts each ICFGEdge to its specific subtype and reads the information attached to it. The output shows the same call node (`CallICFGNode22` in `main`) twice: once for the call edge and once for the matching return edge, since both refer to the same call site.
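
Before inspecting individual call sites, it can help to tally the outgoing edges by subtype. The sketch below is a minimal illustration and not part of `pysvf`: `count_edge_kinds` is a hypothetical helper, and the small classes are stand-ins that only mimic `getOutEdges` so the example runs without a bitcode file. It assumes that `type(edge).__name__` reports the concrete edge subtype, as the `isinstance` checks above suggest.

```python
from collections import Counter


def count_edge_kinds(nodes):
    """Tally outgoing edges by the name of their concrete class."""
    return Counter(
        type(edge).__name__
        for node in nodes
        for edge in node.getOutEdges()
    )


# Hypothetical stand-ins for pysvf edge classes (names mirror the ones used above).
class IntraCFGEdge:
    pass


class CallCFGEdge:
    pass


class RetCFGEdge:
    pass


class FakeEdgeNode:
    """Stand-in for an ICFG node: exposes only getOutEdges()."""

    def __init__(self, out_edges):
        self._out_edges = out_edges

    def getOutEdges(self):
        return self._out_edges


demo_nodes = [
    FakeEdgeNode([IntraCFGEdge(), CallCFGEdge()]),
    FakeEdgeNode([RetCFGEdge(), IntraCFGEdge()]),
]
edge_counts = count_edge_kinds(demo_nodes)
print(dict(edge_counts))  # {'IntraCFGEdge': 2, 'CallCFGEdge': 1, 'RetCFGEdge': 1}
```

With a real ICFG, the same helper could presumably be called as `count_edge_kinds(cfg.getNodes())`.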

### SVF Statements (SVFStmt) under ICFGNode

Each ICFGNode is associated with a set of SVF statements that represent the program's behavior at that point. Analyzing these statements shows how data flows and values propagate through the program.

The following SVF statement classes and methods are available:

| Class Name       | Method Name                          | Description                                      |
|------------------|--------------------------------------|--------------------------------------------------|
| `SVFStmt`        | `toString`                           | Get the string representation of the SVF statement |
|                  | `getEdgeId`                          | Get the ID of the SVF statement                  |
|                  | `getICFGNode`                        | Get the ICFG node that the SVF statement belongs to |
|                  | `getValue`                           | Get the value of the SVF statement               |
|                  | `getBB`                              | Get the basic block that the SVF statement belongs to |
|                  | `isAddrStmt`                         | Check if the SVF statement is an address statement |
|                  | `isCopyStmt`                         | Check if the SVF statement is a copy statement   |
|                  | `isStoreStmt`                        | Check if the SVF statement is a store statement  |
|                  | `isLoadStmt`                         | Check if the SVF statement is a load statement   |
|                  | `isCallPE`                           | Check if the SVF statement is a call PE          |
|                  | `isRetPE`                            | Check if the SVF statement is a return PE        |
|                  | `isGepStmt`                          | Check if the SVF statement is a GEP statement    |
|                  | `isPhiStmt`                          | Check if the SVF statement is a phi statement    |
|                  | `isSelectStmt`                       | Check if the SVF statement is a select statement |
|                  | `isCmpStmt`                          | Check if the SVF statement is a compare statement |
|                  | `isBinaryOpStmt`                     | Check if the SVF statement is a binary operation statement |
|                  | `isUnaryOpStmt`                      | Check if the SVF statement is a unary operation statement |
|                  | `isBranchStmt`                       | Check if the SVF statement is a branch statement |
|                  | `asAddrStmt`                         | Downcast the SVF statement to an address statement |
|                  | `asCopyStmt`                         | Downcast the SVF statement to a copy statement   |
|                  | `asStoreStmt`                        | Downcast the SVF statement to a store statement  |
|                  | `asLoadStmt`                         | Downcast the SVF statement to a load statement   |
|                  | `asCallPE`                           | Downcast the SVF statement to a call PE          |
|                  | `asRetPE`                            | Downcast the SVF statement to a return PE        |
|                  | `asGepStmt`                          | Downcast the SVF statement to a GEP statement    |
|                  | `asPhiStmt`                          | Downcast the SVF statement to a phi statement    |
|                  | `asSelectStmt`                       | Downcast the SVF statement to a select statement |
|                  | `asCmpStmt`                          | Downcast the SVF statement to a compare statement |
|                  | `asBinaryOpStmt`                     | Downcast the SVF statement to a binary operation statement |
|                  | `asUnaryOpStmt`                      | Downcast the SVF statement to a unary operation statement |
|                  | `asBranchStmt`                       | Downcast the SVF statement to a branch statement |
| `AddrStmt`       | `getLHSVar`                          | Get the LHS variable of the address statement    |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the address statement |
|                  | `getRHSVar`                          | Get the RHS variable of the address statement    |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the address statement |
|                  | `getArrSize`                         | Get the array size of the address statement      |
| `CopyStmt`       | `getLHSVar`                          | Get the LHS variable of the copy statement       |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the copy statement |
|                  | `getRHSVar`                          | Get the RHS variable of the copy statement       |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the copy statement |
| `StoreStmt`      | `getLHSVar`                          | Get the LHS variable of the store statement      |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the store statement |
|                  | `getRHSVar`                          | Get the RHS variable of the store statement      |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the store statement |
| `LoadStmt`       | `getLHSVar`                          | Get the LHS variable of the load statement       |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the load statement |
|                  | `getRHSVar`                          | Get the RHS variable of the load statement       |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the load statement |
| `CallPE`         | `getCallSite`                        | Get the call site                                |
|                  | `getLHSVar`                          | Get the LHS variable of the call PE              |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the call PE    |
|                  | `getRHSVar`                          | Get the RHS variable of the call PE              |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the call PE    |
|                  | `getFunEntryICFGNode`                | Get the function entry ICFG node                 |
| `RetPE`          | `getCallSite`                        | Get the call site                                |
|                  | `getLHSVar`                          | Get the LHS variable of the return PE            |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the return PE  |
|                  | `getRHSVar`                          | Get the RHS variable of the return PE            |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the return PE  |
|                  | `getFunExitICFGNode`                 | Get the function exit ICFG node                  |
| `GepStmt`        | `getLHSVar`                          | Get the LHS variable of the GEP statement        |
|                  | `getLHSVarID`                        | Get the ID of the LHS variable of the GEP statement |
|                  | `getRHSVar`                          | Get the RHS variable of the GEP statement        |
|                  | `getRHSVarID`                        | Get the ID of the RHS variable of the GEP statement |
|                  | `isConstantOffset`                   | Check if the GEP statement has a constant offset |
|                  | `getConstantOffset`                  | Get the constant offset                          |
|                  | `getConstantByteOffset`              | Get the constant byte offset                     |
|                  | `getConstantStructFldIdx`            | Get the constant struct field index              |
|                  | `getOffsetVarAndGepTypePairVec`      | Get the offset variable and GEP type pair vector |
|                  | `getSrcPointeeType`                  | Get the source pointee type                      |
| `PhiStmt`        | `getRes`                             | Get the result variable                          |
|                  | `getResId`                           | Get the ID of the result variable                |
|                  | `getOpVar`                           | Get the operand variable at the given index      |
|                  | `getOpICFGNode`                      | Get the ICFG node of the operand at the given index |
|                  | `getOpVarNum`                        | Get the number of operand variables              |
| `CmpStmt`        | `getPredicate`                       | Get the predicate                                |
|                  | `getRes`                             | Get the result variable                          |
|                  | `getResId`                           | Get the ID of the result variable                |
|                  | `getOpVar`                           | Get the operand variable at the given index      |
|                  | `getOpVarNum`                        | Get the number of operands of the compare statement |
| `BinaryOPStmt`   | `getOpcode`                          | Get the opcode                                   |
|                  | `getRes`                             | Get the result variable                          |
|                  | `getResId`                           | Get the ID of the result variable                |
|                  | `getOpVar`                           | Get the operand variable at the given index      |
| `UnaryOPStmt`    | `getOp`                              | Get the opcode                                   |
|                  | `getRes`                             | Get the result variable                          |
|                  | `getResVar`                          | Get the result variable                          |
|                  | `getResId`                           | Get the ID of the result variable                |
|                  | `getOpVar`                           | Get the operand variable                         |
|                  | `getOpVarId`                         | Get the ID of the operand variable               |
| `BranchStmt`     | `getSuccessors`                      | Get the successors of the branch statement       |
|                  | `getNumSuccessors`                   | Get the number of successors                     |
|                  | `isConditional`                      | Check if the branch statement is conditional     |
|                  | `isUnconditional`                    | Check if the branch statement is unconditional   |
|                  | `getCondition`                       | Get the condition variable                       |
|                  | `getBranchInst`                      | Get the branch instruction                       |

The [SVFIR](SVFIR.ipynb) (PAG) notebook defines these SVF statements. They describe how values flow and propagate through the program, which is the basis for compiler optimization and vulnerability detection.
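
The `is*` predicates in the table offer an alternative to `isinstance` checks when classifying a statement. The sketch below is a minimal illustration under stated assumptions: `stmt_kind` and `FakeStmt` are hypothetical names, the predicate names are taken from the table, and it is assumed that each predicate takes no arguments and returns a boolean. `FakeStmt` is a stand-in so the example runs without `pysvf`.

```python
# (label, predicate method name) pairs, in the order the table lists them.
STMT_PREDICATES = [
    ("addr", "isAddrStmt"),
    ("copy", "isCopyStmt"),
    ("store", "isStoreStmt"),
    ("load", "isLoadStmt"),
    ("call_pe", "isCallPE"),
    ("ret_pe", "isRetPE"),
    ("gep", "isGepStmt"),
    ("phi", "isPhiStmt"),
    ("select", "isSelectStmt"),
    ("cmp", "isCmpStmt"),
    ("binary_op", "isBinaryOpStmt"),
    ("unary_op", "isUnaryOpStmt"),
    ("branch", "isBranchStmt"),
]


def stmt_kind(stmt):
    """Return a short label for stmt based on the first predicate that holds."""
    for label, method_name in STMT_PREDICATES:
        if getattr(stmt, method_name)():
            return label
    return "unknown"


class FakeStmt:
    """Hypothetical stand-in: answers True only for the predicate it was built with."""

    def __init__(self, true_predicate):
        self._true_predicate = true_predicate

    def __getattr__(self, name):
        # Called only for attributes not found normally, i.e. the is* predicates.
        if name.startswith("is"):
            return lambda: name == self._true_predicate
        raise AttributeError(name)


kinds = [stmt_kind(FakeStmt("isLoadStmt")), stmt_kind(FakeStmt("isBranchStmt"))]
print(kinds)  # ['load', 'branch']
```

Keeping the predicate names in one list makes it easy to adjust the mapping if the bindings expose different names.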



In [9]:
# Walk every SVF statement attached to each ICFG node and print the fields specific to its kind
for node in cfg.getNodes():
    for stmt in node.getSVFStmts():
        if isinstance(stmt, pysvf.AddrStmt):
            addr_stmt = stmt.asAddrStmt()
            print("AddrStmt: lhs_var={}, rhs_var={}".format(addr_stmt.getLHSVar(), addr_stmt.getRHSVar()))
        elif isinstance(stmt, pysvf.CopyStmt):
            copy_stmt = stmt.asCopyStmt()
            print("CopyStmt: lhs_var={}, rhs_var={}".format(copy_stmt.getLHSVar(), copy_stmt.getRHSVar()))
        elif isinstance(stmt, pysvf.BinaryOPStmt):
            binary_op_stmt = stmt.asBinaryOpStmt()
            print("BinaryOPStmt: opcode={}, res={}, op_var_0={}, op_var_1={}".format(binary_op_stmt.getOpcode(), binary_op_stmt.getRes(), binary_op_stmt.getOpVar(0), binary_op_stmt.getOpVar(1)))
        elif isinstance(stmt, pysvf.BranchStmt):
            branch_stmt = stmt.asBranchStmt()
            print("BranchStmt: successors={}, condition={}".format(branch_stmt.getSuccessors(), branch_stmt.getCondition()))
        elif isinstance(stmt, pysvf.LoadStmt):
            load_stmt = stmt.asLoadStmt()
            print("LoadStmt: lhs_var={}, rhs_var={}".format(load_stmt.getLHSVar(), load_stmt.getRHSVar()))
        elif isinstance(stmt, pysvf.StoreStmt):
            store_stmt = stmt.asStoreStmt()
            print("StoreStmt: lhs_var={}, rhs_var={}".format(store_stmt.getLHSVar(), store_stmt.getRHSVar()))
        elif isinstance(stmt, pysvf.CallPE):
            call_pe = stmt.asCallPE()
            print("CallPE: callsite={}, lhs_var={}, rhs_var={}".format(call_pe.getCallSite(), call_pe.getLHSVar(), call_pe.getRHSVar()))
        elif isinstance(stmt, pysvf.RetPE):
            ret_pe = stmt.asRetPE()
            print("RetPE: callsite={}, lhs_var={}, rhs_var={}".format(ret_pe.getCallSite(), ret_pe.getLHSVar(), ret_pe.getRHSVar()))
        elif isinstance(stmt, pysvf.GepStmt):
            gep_stmt = stmt.asGepStmt()
            print("GepStmt: lhs_var={}, rhs_var={}".format(gep_stmt.getLHSVar(), gep_stmt.getRHSVar()))
        elif isinstance(stmt, pysvf.PhiStmt):
            phi_stmt = stmt.asPhiStmt()
            print("PhiStmt: res_var={}, op_var_num={}".format(phi_stmt.getRes(), phi_stmt.getOpVarNum()))
        elif isinstance(stmt, pysvf.CmpStmt):
            cmp_stmt = stmt.asCmpStmt()
            print("CmpStmt: predicate={}, res={}, op_var0={}, op_var1={}".format(cmp_stmt.getPredicate(), cmp_stmt.getRes(), cmp_stmt.getOpVar(0), cmp_stmt.getOpVar(1)))
        elif isinstance(stmt, pysvf.UnaryOPStmt):
            unary_op_stmt = stmt.asUnaryOpStmt()
            print("UnaryOPStmt: opcode={}, res={}, op_var={}".format(unary_op_stmt.getOpcode(), unary_op_stmt.getRes(), unary_op_stmt.getOpVar()))

CopyStmt: lhs_var=DummyValVar ID: 1, rhs_var=ConstNullPtrValVar ID: 0
 ptr null { constant data }
AddrStmt: lhs_var=ConstIntValVar ID: 5
 i8 37 { constant data }, rhs_var=ConstIntObjVar ID: 6
 i8 37 { constant data }
AddrStmt: lhs_var=ConstIntValVar ID: 7
 i8 100 { constant data }, rhs_var=ConstIntObjVar ID: 8
 i8 100 { constant data }
AddrStmt: lhs_var=ConstIntValVar ID: 9
 i8 10 { constant data }, rhs_var=ConstIntObjVar ID: 10
 i8 10 { constant data }
AddrStmt: lhs_var=ConstIntValVar ID: 11
 i8 0 { constant data }, rhs_var=ConstIntObjVar ID: 12
 i8 0 { constant data }
AddrStmt: lhs_var=ConstIntValVar ID: 48
 i32 3 { constant data }, rhs_var=ConstIntObjVar ID: 49
 i32 3 { constant data }
AddrStmt: lhs_var=ConstIntValVar ID: 45
 i64 1 { constant data }, rhs_var=ConstIntObjVar ID: 46
 i64 1 { constant data }
AddrStmt: lhs_var=ConstIntValVar ID: 42
 i32 5 { constant data }, rhs_var=ConstIntObjVar ID: 43
 i32 5 { constant data }
AddrStmt: lhs_var=ConstIntValVar ID: 39
 i64 0 { constant da

The snippet above shows how to reach each SVFStmt, and through it the SVFVars it reads and writes, from an ICFGNode. In the output, each `AddrStmt` pairs a constant's value variable (`ConstIntValVar`) with its object variable (`ConstIntObjVar`).
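
Several statement kinds in the table share the `getLHSVarID` / `getRHSVarID` accessors, so their variable IDs can be gathered in one pass instead of one branch per kind. The sketch below is a minimal illustration and not part of `pysvf`: `assignment_pairs` is a hypothetical helper, the classes are stand-ins that mimic only the methods used, and it assumes `type(stmt).__name__` reports the concrete statement class. The IDs in the demo are the ones from the first two lines of the output above.

```python
# Statement kinds that, per the table, expose getLHSVarID() and getRHSVarID().
ASSIGN_KINDS = (
    "AddrStmt", "CopyStmt", "StoreStmt", "LoadStmt", "GepStmt", "CallPE", "RetPE",
)


def assignment_pairs(nodes):
    """Collect (kind, lhs_id, rhs_id) for every assignment-like statement."""
    pairs = []
    for node in nodes:
        for stmt in node.getSVFStmts():
            kind = type(stmt).__name__
            if kind in ASSIGN_KINDS:
                pairs.append((kind, stmt.getLHSVarID(), stmt.getRHSVarID()))
    return pairs


class _FakeAssign:
    """Hypothetical stand-in exposing only the two ID accessors."""

    def __init__(self, lhs_id, rhs_id):
        self._lhs_id = lhs_id
        self._rhs_id = rhs_id

    def getLHSVarID(self):
        return self._lhs_id

    def getRHSVarID(self):
        return self._rhs_id


class CopyStmt(_FakeAssign):
    pass


class AddrStmt(_FakeAssign):
    pass


class BranchStmt:
    """Stand-in for a statement kind without LHS/RHS accessors; it is skipped."""


class FakeStmtNode:
    """Stand-in for an ICFG node: exposes only getSVFStmts()."""

    def __init__(self, stmts):
        self._stmts = stmts

    def getSVFStmts(self):
        return self._stmts


demo_stmt_nodes = [FakeStmtNode([CopyStmt(1, 0), AddrStmt(5, 6), BranchStmt()])]
pairs = assignment_pairs(demo_stmt_nodes)
print(pairs)  # [('CopyStmt', 1, 0), ('AddrStmt', 5, 6)]
```

The resulting list of ID pairs is a convenient starting point for building a simple value-flow map keyed by variable ID.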

### Summary

In this tutorial, we explored the use of the SVF tool for static analysis of LLVM bitcode, focusing on the Interprocedural Control-Flow Graph (ICFG). We learned how to load the ICFG from a bitcode file, traverse the ICFG nodes and edges, and analyze the control flow between functions. By using the ICFG-related functions, we gained insights into the program's structure, control flow, and data flow, which are essential for optimizing compilers and detecting potential vulnerabilities.