WarpTrace is a high-fidelity GPU branch divergence simulator that implements hardware-accurate SIMT (Single Instruction Multiple Thread) execution with stack-based reconvergence. This project demonstrates:
- ✅ 100% thread divergence resolution using the immediate post-dominator reconvergence algorithm
- 🔧 Hardware-accurate SIMT stack implementation in C++
- 📊 Static analysis of NVIDIA SASS assembly code
- 🔄 Automatic Control Flow Graph (CFG) generation and export
- 📈 Detailed execution statistics and performance metrics
The core simulator implements a production-quality SIMT stack that mirrors real GPU hardware:
```cpp
struct StackEntry {
    int      pc;           // Reconvergence program counter
    uint32_t active_mask;  // 32-bit thread activity mask
};
```

Divergence Handling:
- Automatic detection of branch divergence points
- Stack-based tracking of divergent execution paths
- Guaranteed reconvergence at immediate post-dominator blocks
- Support for nested divergence and complex control flow
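Because the active mask is a plain 32-bit word, warp splits reduce to bit arithmetic. As a hedged illustration (a small Python model, not the project's C++ code; the name `split_mask` is invented here):

```python
# Model a 32-thread warp's active mask as a Python int (mirrors the
# uint32_t active_mask field described above; illustrative only).
FULL_MASK = 0xFFFFFFFF

def split_mask(active_mask: int, predicate):
    """Split an active mask into taken/not-taken halves by a per-thread predicate."""
    taken = 0
    for tid in range(32):
        if active_mask & (1 << tid) and predicate(tid):
            taken |= 1 << tid
    not_taken = active_mask & ~taken & FULL_MASK
    return taken, not_taken

# Example: threads 0-15 take the branch (threadIdx < 16)
taken, not_taken = split_mask(FULL_MASK, lambda tid: tid < 16)
print(hex(taken), hex(not_taken))  # 0xffff 0xffff0000
```

The two halves are disjoint and their union is the original mask, which is exactly the invariant the SIMT stack relies on at reconvergence.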
Static analysis toolkit for NVIDIA SASS assembly:
- Instruction Parser: Parses SASS opcodes, operands, and branch targets
- CFG Builder: Constructs control flow graphs with basic block analysis
- Divergence Detection: Identifies branch points and reconvergence locations
- JSON Export: Generates CFG data for the C++ simulator
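As a rough sketch of the kind of tokenization involved (the project's actual sass_parser.py may use a different grammar; the regex and field names here are assumptions):

```python
import re

# Illustrative SASS line tokenizer: optional predicate guard, opcode with
# modifiers, then a comma-separated operand list. Not the real parser.
SASS_LINE = re.compile(
    r"^\s*(?:(@!?P\d+)\s+)?"   # optional predicate guard, e.g. @P0
    r"([A-Z][\w.]*)"           # opcode with modifiers, e.g. ISETP.LT.AND
    r"\s*([^;/]*)"             # operand list
)

def parse_sass(line: str):
    line = line.split("//")[0]          # strip trailing comment
    m = SASS_LINE.match(line)
    if not m:
        return None
    pred, opcode, operands = m.groups()
    ops = [o.strip() for o in operands.split(",") if o.strip()]
    return {"pred": pred, "opcode": opcode, "operands": ops}

print(parse_sass("@P0 BRA then_block // 16 threads take this path"))
# {'pred': '@P0', 'opcode': 'BRA', 'operands': ['then_block']}
```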
```
┌─────────────────────────────────────────────────────────────┐
│                     WarpTrace Pipeline                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  SASS Assembly ──→ Python Parser ──→ CFG Builder            │
│                                           │                 │
│                                           ↓                 │
│                                      JSON Export            │
│                                           │                 │
│                                           ↓                 │
│  C++ Simulator ←─── Load CFG ←─── cfg.json                  │
│        │                                                    │
│        ├─→ SIMT Stack Execution                             │
│        ├─→ Divergence Handling                              │
│        ├─→ Thread Reconvergence                             │
│        └─→ Statistics & Metrics                             │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
- C++: C++17 compatible compiler (GCC 7+, Clang 5+, MSVC 2017+)
- CMake: Version 3.12 or higher
- Python: Version 3.8 or higher
```bash
# Create build directory
mkdir build && cd build

# Configure with CMake
cmake ..

# Build
cmake --build .

# The executable will be: build/warptrace (or warptrace.exe on Windows)
```

Create or use an existing SASS assembly file, then parse it with the Python analyzer:
```bash
cd src/python
python main.py analyze ../../examples/simple_branch.sass -o cfg.json
```

This generates:
- cfg.json: Control Flow Graph data for the C++ simulator
- Console output with CFG statistics
Execute the C++ simulator with the generated CFG:
```bash
./build/warptrace cfg.json
```

Or run the built-in demonstration:

```bash
./build/warptrace --demo
```

Sample output:

```
WarpTrace: GPU Branch Divergence Simulator
========================================
[SIMT] Initialized with PC=0, mask=0xffffffff

[DIVERGENCE] at PC=1
  Current mask: 0xffffffff
  Taken mask:   0x0000ffff -> PC=2 (16 threads)
  Not-taken:    0xffff0000 -> PC=3 (16 threads)
  Reconverge at PC=4
  Stack depth: 2

[RECONVERGE] PC=2 -> 4, mask=0xffff -> 0xffffffff
  Active threads: 32

=== Execution Statistics ===
Total instructions executed: 156
Divergent branches: 1
Reconvergences: 1
Reconvergence rate: 100%
============================

✓ Successfully handled divergence with 100% reconvergence
```
The examples/ directory contains sample SASS programs demonstrating various divergence patterns:
Basic if-then-else divergence with 50/50 thread split.
```
ISETP.LT.AND P0, PT, R0, 16, PT   // if (threadIdx < 16)
@P0 BRA then_block                // 16 threads take this path
                                  // else_block: 16 threads take this path
```

Multiple levels of divergence demonstrating recursive reconvergence.
Threads exit a loop at different iterations, reconverging at loop exit.
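As a sketch of this loop-divergence pattern (an illustrative Python model, not the sample's actual SASS; `loop_masks` and the trip counts are invented here):

```python
# Model threads leaving a loop at different trip counts: the active mask
# shrinks each iteration until every lane has exited (illustrative only).
def loop_masks(trip_counts):
    mask = (1 << len(trip_counts)) - 1   # all lanes active at loop entry
    masks = []
    iteration = 0
    while mask:
        masks.append(mask)
        iteration += 1
        for tid, trips in enumerate(trip_counts):
            if trips == iteration:
                mask &= ~(1 << tid)      # lane tid exits the loop
    return masks

# 4 lanes exiting after 1, 2, 2, and 4 iterations
print([hex(m) for m in loop_masks([1, 2, 2, 4])])
# ['0xf', '0xe', '0x8', '0x8']
```

The warp stays partially active until the slowest lane finishes, which is why all lanes reconverge only at the loop exit.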
The simulator implements the immediate post-dominator reconvergence algorithm:
- Divergence Detection: When a branch instruction is encountered, the warp splits based on predicate evaluation
- Stack Management: Push reconvergence point and not-taken path onto stack
- Sequential Execution: Execute taken path first with reduced active mask
- Reconvergence: Pop stack to restore full warp when paths complete
This mirrors the stack-based reconvergence used by NVIDIA GPU hardware from the Fermi through Pascal architectures; Volta and later add independent thread scheduling with per-thread program counters, but post-dominator reconvergence remains a useful baseline model.
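To make the four steps concrete, here is a minimal Python model of the stack discipline (a sketch, not the project's C++ implementation; `run_branch` and its parameters are invented here, with PCs and masks taken from the demo trace):

```python
# Minimal model of stack-based reconvergence for a single branch.
# A stack entry pairs a PC with the active mask to restore there.
FULL_MASK = 0xFFFFFFFF

def run_branch(taken_pc, not_taken_pc, reconv_pc, taken_mask, active_mask=FULL_MASK):
    stack = []
    # 1. Divergence detection: the warp splits on the predicate.
    not_taken_mask = active_mask & ~taken_mask & FULL_MASK
    # 2. Stack management: push the reconvergence point, then the not-taken path.
    stack.append((reconv_pc, active_mask))        # restored last
    stack.append((not_taken_pc, not_taken_mask))  # executed second
    # 3. Sequential execution: the taken path runs first with a reduced mask.
    trace = [("exec", taken_pc, taken_mask)]
    # When the taken path reaches reconv_pc, pop and run the not-taken path.
    pc, mask = stack.pop()
    trace.append(("exec", pc, mask))
    # 4. Reconvergence: pop the final entry to restore the full warp.
    pc, mask = stack.pop()
    trace.append(("reconverge", pc, mask))
    return trace

# The demo scenario: branch splits 16/16 at PC=1, reconverges at PC=4.
for event in run_branch(taken_pc=2, not_taken_pc=3, reconv_pc=4, taken_mask=0x0000FFFF):
    print(event)
```

Note that the stack depth peaks at 2 here, matching the `Stack depth: 2` line in the demo output; nested divergence simply pushes further entries before the outer ones are popped.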
The Python analyzer performs:
- Lexical Analysis: Tokenize SASS instructions
- Semantic Analysis: Identify control flow instructions (BRA, JMP, RET, etc.)
- Graph Building: Create basic blocks and edges
- Post-Dominator Analysis: Find reconvergence points for branches
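The post-dominator step can be sketched with a standard iterative dataflow computation (illustrative; cfg_builder.py may use a different algorithm, and the block IDs here are invented):

```python
# Post-dominator sets on a tiny diamond CFG:
# block 0 branches to {1, 2}; both fall through to 3, the exit.
def post_dominators(succs, exit_block):
    nodes = set(succs)
    pdom = {n: set(nodes) for n in nodes}   # start from the full node set
    pdom[exit_block] = {exit_block}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_block}:
            # A node's post-dominators: itself plus those shared by all successors.
            new = {n} | set.intersection(*(pdom[s] for s in succs[n]))
            if new != pdom[n]:
                pdom[n], changed = new, True
    return pdom

pdom = post_dominators({0: [1, 2], 1: [3], 2: [3], 3: []}, exit_block=3)
# The branch at block 0 reconverges at its immediate post-dominator: block 3.
print(sorted(pdom[0] - {0}))  # [3]
```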
```
WarpTrace/
├── src/
│   ├── cpp/                 # C++ simulation engine
│   │   ├── simt_stack.h     # SIMT stack interface
│   │   ├── simt_stack.cpp   # Stack implementation
│   │   └── simulator.cpp    # Main simulator executable
│   └── python/              # Python SASS analyzer
│       ├── __init__.py
│       ├── sass_parser.py   # SASS instruction parser
│       ├── cfg_builder.py   # Control flow graph builder
│       ├── cfg_exporter.py  # JSON/DOT export
│       └── main.py          # Command-line interface
├── examples/                # Sample SASS programs
│   ├── simple_branch.sass
│   ├── nested_branch.sass
│   └── loop_divergence.sass
├── CMakeLists.txt           # Build configuration
└── README.md
```
The simulator tracks and reports:
- Total Instructions: Dynamic instruction count across all threads
- Divergent Branches: Number of branch divergence events
- Reconvergences: Number of successful thread reconvergences
- Reconvergence Rate: Percentage of divergences that reconverge (target: 100%)
- Stack Depth: Maximum SIMT stack depth during execution
- Active Thread Count: Threads executing at each program point
- NVIDIA SASS: Streaming ASSembly - low-level GPU instruction format
- CUDA: Parallel computing platform and programming model
- SIMT: Single Instruction Multiple Thread execution model
- PTX: Parallel Thread Execution - NVIDIA's virtual ISA
- Real PTX/SASS compilation integration
- Interactive execution debugger
- Warp occupancy analysis
- Memory divergence tracking
- Visualization of execution timeline
- Support for other GPU architectures (AMD, Intel)
In GPU architectures, threads are grouped into warps (typically 32 threads) that execute in lockstep (SIMT). When threads in a warp take different paths at a branch:
- Divergence: Warp splits into multiple execution paths
- Serialization: Paths execute sequentially (performance penalty)
- Reconvergence: Threads rejoin at a common point
WarpTrace simulates this behavior to:
- Understand GPU performance characteristics
- Validate compiler optimizations
- Analyze control flow complexity
- Educate about GPU architecture
Branch divergence is a critical performance bottleneck in GPU computing:
- Performance Impact: Up to 32× slowdown for fully divergent warps
- Compiler Optimization: Modern compilers try to minimize divergence
- Algorithm Design: GPU algorithms must consider warp-level behavior
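The worst case follows directly from serialization: with equal-length paths, slowdown scales with the number of distinct paths the warp must execute. A back-of-the-envelope model (assumes equal path lengths and ignores memory effects; `simd_efficiency` is invented here, not a simulator metric):

```python
def simd_efficiency(path_masks, warp_size=32):
    """Average fraction of lanes active per serialized execution pass."""
    active = sum(bin(m).count("1") for m in path_masks)
    return active / (len(path_masks) * warp_size)

# 50/50 if/else split: two passes of 16 lanes -> 50% efficiency (2x slowdown)
print(simd_efficiency([0x0000FFFF, 0xFFFF0000]))  # 0.5
# Fully divergent warp: 32 passes of 1 lane -> 1/32 efficiency (32x slowdown)
print(simd_efficiency([1 << i for i in range(32)]))  # 0.03125
```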
MIT License - See LICENSE file for details
Created as a portfolio demonstration of:
- Systems programming in C++
- GPU architecture knowledge
- Compiler/simulator design
- Professional software engineering practices
Simulating GPU hardware instruction scheduling with stack-based reconvergence