
## PySVF Tutorial: SVFIR (Program Assignment Graph)

### Introduction

This Jupyter Notebook explores the use of the SVF (Static Value-Flow) tool for static analysis of LLVM bitcode. We will focus on utilizing the `pysvf` library to analyze bitcode files and extract valuable insights. The journey begins with the `getPAG` function, which is essential for analyzing LLVM bitcode and returning an `SVFIR` (Static Value-Flow Intermediate Representation). This notebook will guide you through setting up the environment, using the `getPAG` function, and analyzing the resulting data structures.

### Significance

- **Data Flow Analysis**: PAG offers a comprehensive framework for analyzing data flow within a program. It aids in understanding how values propagate through variables and functions, which is crucial for optimizing compilers and detecting potential vulnerabilities.
- **Pointer Analysis**: A primary application of PAG is in pointer analysis, helping determine the possible values that pointers can reference at various points in the program.
- **Optimization and Refactoring**: By providing insights into data flow, PAG assists in optimizing code and refactoring it for enhanced performance and maintainability.

### Functionality

- **Representation of Variables and Statements**: In PAG, variables are represented as nodes, and statements as edges. This graph-based representation allows for efficient traversal and analysis of the program's structure.
- **Interprocedural Analysis**: PAG supports interprocedural analysis, enabling data flow analysis across function boundaries, offering a more comprehensive view of the program's behavior.
- **Integration with LLVM**: Designed to work seamlessly with LLVM bitcode, PAG is a powerful tool for analyzing programs written in languages supported by LLVM.

By leveraging these features, PAG provides a robust framework for static analysis, enabling developers to gain deep insights into their code's behavior and optimize it accordingly.

**The code above performs the following steps:**

- Import the `pysvf` library: This library is used for static analysis of LLVM bitcode.
- Load the LLVM bitcode file: The bitcode file named `demo.ll` is loaded.
- Get the PAG (SVFIR) from the bitcode file: The `getPAG` function is used to obtain the Program Assignment Graph (PAG), also known as the Static Value-Flow Intermediate Representation (SVFIR).
- Get the control flow graph (CFG) from the PAG: The `getICFG` function is used to extract the interprocedural control flow graph (CFG) from the PAG.



In [6]:
# bash install pip install pysvf from testpypi

!pip install pysvf --index-url https://test.pypi.org/simple/

Looking in indexes: https://test.pypi.org/simple/


In [1]:
# Install the pysvf library
# You might need to run this command in your terminal or use a Jupyter magic command
# !pip install pysvf

# Import necessary libraries
import pysvf

# Load the LLVM bitcode file
bitcode_file = "demo.ll"

# Get the pag(SVFIR) from the bitcode file
svfir = pysvf.getPAG(bitcode_file)

# Get the control flow graph (CFG) from the pag
cfg = svfir.getICFG()



*********General Stats***************
################ (program : demo.ll)###############
AddrsNum            21
BBWith2Succ         1
BBWith3Succ         0
CallsNum            3
ConstArrayObj       0
ConstStructObj      0
ConstantObj         14
CopysNum            2
FIObjNum            13
FSObjNum            9
FunctionObjs        5
GepsNum             10
GlobalObjs          1
HeapObjs            1
IndCallSites        0
LoadsNum            3
MaxStructSize       0
NonPtrObj           22
ReturnsNum          1
StackObjs           2
StoresNum           7
TotalCallSite       4
TotalFieldObjects   2
TotalObjects        25
TotalPTASVFStmts    22
TotalPointers       70
TotalSVFStmts       56
VarArrayObj         1
VarStructObj        0
----------------Time and memory stats--------------------
LLVMIRTime          0.006
SVFIRTime           0.002
SymbolTableTime     0.001
#######################################################

*********PTACallGraph Stats (Andersen analysis)***************
######

### PAG Nodes (SVFVar)

The PAG (Program Assignment Graph) is a key data structure in SVFIR that represents the static value-flow information of the program. It consists of nodes that represent variables and statements, and edges that represent the flow of values between them. By analyzing the PAG nodes, we can gain insights into how values are propagated through the program and how variables are related to each other.

Let's explore the PAG nodes in the SVFIR obtained from the LLVM bitcode file. We will extract the number of PAG nodes and print information about each node to understand its structure and properties.

- Retrieve the total number of PAG nodes using svfir.get_pag_node_num().
- Iterate through each node and retrieve its details using svfir.get_gnode(i).

Print the details of each node.


In [2]:

pag_node_num = svfir.getPAGNodeNum()
print("Number of PAG nodes: ", pag_node_num)

for i in range(pag_node_num):
    node = svfir.getGNode(i)
    print("Node ", i, " : ", node)

Number of PAG nodes:  95
Node  0  :  ConstNullPtrValVar ID: 0
 ptr null { constant data }
Node  1  :  DummyValVar ID: 1
Node  2  :  DummyObjVar ID: 2
Node  3  :  DummyObjVar ID: 3
Node  4  :  GlobalValVar ID: 4
 @.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1 { Glob  }
Node  5  :  ConstIntValVar ID: 5
 i8 37 { constant data }
Node  6  :  ConstIntObjVar ID: 6
 i8 37 { constant data }
Node  7  :  ConstIntValVar ID: 7
 i8 100 { constant data }
Node  8  :  ConstIntObjVar ID: 8
 i8 100 { constant data }
Node  9  :  ConstIntValVar ID: 9
 i8 10 { constant data }
Node  10  :  ConstIntObjVar ID: 10
 i8 10 { constant data }
Node  11  :  ConstIntValVar ID: 11
 i8 0 { constant data }
Node  12  :  ConstIntObjVar ID: 12
 i8 0 { constant data }
Node  13  :  GlobalObjVar ID: 13
 @.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1 { Glob  }
Node  14  :  FunValVar ID: 14
add_or_sub
Node  15  :  FunObjVar ID: 15 (base object)
add_or_sub
Node  16  :  RetValPN ID: 16 uni


SVFVar is a class representing a variable in the SVF (Static Value-Flow) framework. It is not intended for direct instantiation and provides various methods to interact with and query properties of the variable.

Here we can simply introduce the inheritance hierarchy of the SVFVar class:

- **SVFVar** : Base class for all SVF variables.
  - **ValVar**: Represents a value variable.
    - **ArgValVar** : Represents an argument value variable.
    - **GepValVar** : Represents a getelementptr value variable.
    - **FunValVar** : Represents a function value variable.
    - **ConstDataValVar** : Represents a constant data value variable.
      - **ConstFPValVar** : Represents a constant floating-point value variable.
      - **ConstIntValVar** : Represents a constant integer value variable.
      - **ConstNullPtrValVar** : Represents a constant null pointer value variable.
    - **RetValPN** : Represents a return value variable.
    - **VarArgValPN** : Represents a variable argument value variable.
    - **DummyValVar** : Represents a dummy value variable.
    - **GlobalValVar** : Represents a global value variable.
    - **ConstAggValVar** : Represents a constant aggregate value variable.
    - **BlackHoleValVar** : Represents a black hole value variable.
  - **ObjVar** : Represents an object variable.
    - **BaseObjVar** : Represents a base object variable.
      - **FunObjVar** : Represents a function object variable.
      - **GlobalObjVar** : Represents a global object variable.
      - **HeapObjVar** : Represents a heap object variable.
      - **StackObjVar** : Represents a stack object variable.
      - **ConstAggObjVar** : Represents a constant aggregate object variable.
      - **ConstDataObjVar** : Represents a constant data object variable.
        - **ConstFPObjVar** : Represents a constant floating-point object variable.
        - **ConstIntObjVar** : Represents a constant integer object variable.
        - **ConstNullPtrObjVar** : Represents a constant null pointer object variable.
      - **DummyObjVar** : Represents a dummy object variable.
    - **GepObjVar** : Represents a getelementptr object variable.


Then we can downcast the SVFVar object to its derived type and access the specific properties and methods of that derived type. This allows us to work with the variables in a more specialized manner based on their type and context. 

In [3]:
for i in range(pag_node_num):
    svfvar = svfir.getGNode(i)
    if isinstance(svfvar, pysvf.ObjVar): # or you can use svfvar.is_obj_var():
        obj_var = svfvar.asObjVar()
        print("Object Variable ", i, " : ", obj_var)
    elif isinstance(svfvar, pysvf.ObjVar): # or you can use svfvar.is_val_var():
        val_var = svfvar.asValVar()
        print("Value Variable ", i, " : ", val_var)
    # other derived types can be checked similarly

Object Variable  2  :  DummyObjVar ID: 2
Object Variable  3  :  DummyObjVar ID: 3
Object Variable  6  :  ConstIntObjVar ID: 6
 i8 37 { constant data }
Object Variable  8  :  ConstIntObjVar ID: 8
 i8 100 { constant data }
Object Variable  10  :  ConstIntObjVar ID: 10
 i8 10 { constant data }
Object Variable  12  :  ConstIntObjVar ID: 12
 i8 0 { constant data }
Object Variable  13  :  GlobalObjVar ID: 13
 @.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1 { Glob  }
Object Variable  15  :  FunObjVar ID: 15 (base object)
add_or_sub
Object Variable  22  :  ConstIntObjVar ID: 22
 i32 0 { constant data }
Object Variable  32  :  FunObjVar ID: 32 (base object)
main
Object Variable  35  :  StackObjVar ID: 35
   %0 = alloca i8, i64 8, align 8 
Object Variable  37  :  ConstIntObjVar ID: 37
 i64 8 { constant data }
Object Variable  40  :  ConstIntObjVar ID: 40
 i64 0 { constant data }
Object Variable  43  :  ConstIntObjVar ID: 43
 i32 5 { constant data }
Object Variable  46  :  Con

### Additional SVFIR APIs

The `pysvf` library offers a variety of APIs to interact with SVFIR data structures and conduct static analysis on LLVM bitcode. Below are some key APIs that facilitate the extraction of valuable insights from SVFIR:

```python
class SVFIR(SVFLLVMValue):
    def __init__(self, *args, **kwargs) -> None: ...
    """Not intended for direct instantiation."""
    
    def getICFG(self) -> ICFG: ...
    """Retrieve the ICFG of the SVFIR"""

    def getCallSites(self) -> List[CallICFGNode]: ...
    """Retrieve the call sites of the SVFIR"""

    def getPAGNodeNum(self) -> int: ...
    """Retrieve the number of PAG nodes"""
    
    def getCallGraph(self) -> "CallGraph": ...
    """Retrieve the call graph of the SVFIR"""
    
    def getBaseObject(self, id: int) -> BaseObjVar: ...
    """Retrieve the base object with the specified ID"""
    
    def getGNode(self, id: int) -> SVFVar: ...
    """Retrieve the SVFVar with the specified ID"""
    
    def getGepObjVar(self, id: int, offset: int) -> int: ...
    """Retrieve the GEP object variable ID"""
    
    def getNumOfFlattenElements(self, T: SVFType) -> int: ...
    """Retrieve the number of flattened elements"""
    
    def getFlattenedElemIdx(self, T: SVFType, origId: int) -> int: ...
    """Retrieve the flattened element index"""
```

Utilizing these APIs allows for comprehensive static analysis of LLVM bitcode, providing insights into the program's structure and behavior.

### Summary

In this tutorial, we delved into the use of the `pysvf` library for static analysis of LLVM bitcode. We explored how to extract the PAG (Program Assignment Graph) from a bitcode file and analyze the PAG nodes to comprehend the static value-flow information of the program. By leveraging SVFIR data structures and APIs, you can gain profound insights into the program's behavior, optimize the code, and identify potential vulnerabilities.

