
SparseFlow

MLIR-based compiler for structured sparsity optimization



🎯 What is SparseFlow?

SparseFlow is an MLIR compiler infrastructure that detects and exploits structured sparsity in tensor operations. Our SPA (Sparsity Propagation Analysis) pass statically identifies removable zero computation at compile time and exports the results to drive an optimized runtime.

Key Achievement

✅ ~4× CPU speedup on structured sparse matmuls (measured and reproducible)
✅ Static analysis at compile time (no runtime overhead)
✅ 2D sparsity tracking (rows + columns)
✅ Production-ready OpenMP runtime
✅ Cross-platform verified (WSL + GitHub Codespaces)


📊 Project Status – SPA v0.6

Last Updated: December 2024

✅ What Works (Production-Ready)

  • MLIR SPA Pass: 2D sparsity analysis for linalg.matmul (row + column masks)
  • JSON Export: spa_sparsity.json with runtime-ready metadata
  • Python Demos: Reference implementations for validation
  • C++ OpenMP Runtime: Production kernel achieving ~4Γ— CPU speedup
  • Cross-Platform: Verified on WSL (Ubuntu 22.04) and GitHub Codespaces (Ubuntu 24.04)
  • Health Check: One-command verification (./quick_check.sh)
  • Documentation: Technical overview, pitch deck, benchmarks

⚠️ What's Missing (Future Work)

  • GPU Kernels: No CUDA/ROCm support yet (CPU-only)
  • MLIR Integration: No automatic lowering to runtime calls
  • Framework Integration: No PyTorch / ONNX / TensorRT support
  • Dynamic Sparsity: Only static analysis (no runtime profiling)

🎯 Honest Claim

"SparseFlow SPA v0.6 provides static 2D sparsity analysis for MLIR that detects ~75% removable FLOPs on structured patterns, exports JSON metadata, and drives an OpenMP runtime achieving ~4× CPU speedup on benchmarks from 128×128 to 1024×1024. Verified on WSL and GitHub Codespaces."


🚀 Quick Start

Try in GitHub Codespaces (3 minutes)

# Open this repo in Codespaces, then:

# 1) Health check (builds everything + runs tests)
./quick_check.sh

# 2) See the speedup
cd runtime/build && ./benchmark_sparse

Expected Result: ~4× speedup on CPU with ~75% sparsity detection

Local Setup (WSL/Linux)

# Prerequisites
sudo apt install -y llvm-19-dev mlir-19-tools libmlir-19-dev libomp-dev

# Clone and build
git clone https://github.com/MapleSilicon/SparseFlow.git
cd SparseFlow

# Build compiler passes
cd compiler/build
cmake -DCMAKE_PREFIX_PATH=/usr/lib/llvm-19 .. && make -j4

# Build runtime
cd ../../runtime/build
cmake .. && make -j4

# Run demo
cd ../../
./run_spa_v06_demo.sh

📈 Benchmark Results

CPU Performance (OpenMP, GitHub Codespaces)

| Matrix Size | Dense Time | Sparse Time | Speedup |
|-------------|------------|-------------|---------|
| 256×256     | 22.3 ms    | 5.2 ms      | 4.3×    |
| 512×512     | 336 ms     | 101 ms      | 3.3×    |
| 768×768     | 745 ms     | 156 ms      | 4.8×    |
| 1024×1024   | 4073 ms    | 945 ms      | 4.3×    |

Average: 4.2× speedup (consistent with a 75% FLOP reduction)

Pattern: 50% row + 50% column sparsity = 75% total sparsity

See BENCHMARKS.md for detailed methodology and cross-environment results.
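As a sanity check on the numbers above, the ideal speedup implied by independent row and column sparsity follows directly from the surviving FLOP fraction (a back-of-the-envelope calculation, not part of the SPA codebase):

```python
def theoretical_speedup(row_sparsity: float, col_sparsity: float) -> float:
    """Surviving FLOP fraction is (active rows) * (active cols);
    the ideal speedup is the inverse of that fraction."""
    active_fraction = (1.0 - row_sparsity) * (1.0 - col_sparsity)
    return 1.0 / active_fraction

# 50% row + 50% column sparsity removes 75% of FLOPs -> 4x ideal speedup
print(theoretical_speedup(0.5, 0.5))  # -> 4.0
```

The measured 3.3–4.8× range brackets this 4.0× bound; deviations come from cache effects and parallelization overhead.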



🔬 How It Works

Pipeline Architecture

┌─────────────┐
│ MLIR Source │  Standard linalg.matmul
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  SPA Pass   │  Detects: rowmask=[T,F,T,F]
│   (v0.6)    │           colmask=[T,T,F,F]
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ JSON Export │  spa_sparsity.json
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ C++ Runtime │  OpenMP masked matmul
│   (OpenMP)  │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ ~4× Speedup │  🔥
└─────────────┘

Example

Input MLIR:

linalg.matmul ins(%A, %B : tensor<512x512xf32>)

After SPA Analysis:

linalg.matmul {
  sparseflow.spa_rowmask = [true, false, true, false, ...],
  sparseflow.spa_colmask = [true, true, false, false, ...]
} ins(%A, %B : tensor<512x512xf32>)

JSON Export:

{
  "name": "linalg.matmul",
  "row_sparsity_pct": 50,
  "col_sparsity_pct": 50,
  "total_rows": 512,
  "total_cols": 512
}

Runtime: Uses the masks to skip 75% of the computation → 3.3× faster at 512×512
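The runtime kernel itself is C++/OpenMP, but the masking logic can be illustrated with a small pure-Python reference in the spirit of the repo's Python demos (`masked_matmul` here is a hypothetical name, not the repo's API):

```python
def masked_matmul(A, B, rowmask, colmask):
    """Reference masked matmul: C[i][j] is computed only when
    rowmask[i] and colmask[j] are both set; other entries stay 0.
    Assumes masked-off rows of A / columns of B are all-zero."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        if not rowmask[i]:
            continue  # entire output row is zero -- skip it
        for j in range(m):
            if not colmask[j]:
                continue  # entire output column is zero -- skip it
            C[i][j] = sum(A[i][p] * B[p][j] for p in range(k))
    return C

# Tiny example: row 1 of A and column 1 of B are all-zero.
A = [[1.0, 2.0], [0.0, 0.0]]
B = [[3.0, 0.0], [4.0, 0.0]]
C = masked_matmul(A, B, [True, False], [True, False])
# C == [[11.0, 0.0], [0.0, 0.0]]
```

With 50% row and 50% column sparsity, the inner sum runs for only a quarter of the output entries, which is exactly the 75% FLOP reduction reported above.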


πŸ› οΈ Repository Structure

SparseFlow/
β”œβ”€β”€ compiler/passes/        # MLIR analysis passes
β”‚   β”œβ”€β”€ spa/               # SPA v0.6 implementation
β”‚   β”œβ”€β”€ SPAExportPass.cpp  # JSON export
β”‚   └── ...
β”œβ”€β”€ runtime/               # C++ OpenMP runtime
β”‚   β”œβ”€β”€ masked_matmul.cpp  # Optimized sparse kernel
β”‚   └── benchmark_sparse.cpp
β”œβ”€β”€ docs/
β”‚   β”œβ”€β”€ SPA_OVERVIEW.md    # Technical deep-dive
β”‚   └── pitch/SLIDES.md    # Investor deck
β”œβ”€β”€ tests/                 # Test cases
β”œβ”€β”€ quick_check.sh         # Health check script
β”œβ”€β”€ run_spa_v06_demo.sh    # Complete demo
└── BENCHMARKS.md          # Performance results

🎓 Technical Details

What SPA Detects

  • 2D Sparsity: Tracks zero rows AND columns (not just 1D)
  • Static Analysis: Compile-time detection (no runtime overhead)
  • Structured Patterns: N:M, block, and custom sparsity
  • Propagation: Tracks sparsity through arithmetic operations

Supported Operations (SPA v0.6)

  • βœ… linalg.matmul (fully supported)
  • βœ… arith.addf, arith.subf (union semantics)
  • βœ… arith.mulf, arith.divf (intersection semantics)
  • βœ… arith.maximumf (ReLU detection)
  • βœ… linalg.transpose (swaps rows ↔ cols)
  • βœ… linalg.reduce (preserves non-reduced dimension)
  • βœ… tensor.expand_shape (broadcasts pattern)

Runtime Implementation

  • Language: C++ with OpenMP
  • Parallelization: #pragma omp parallel for
  • Mask Type: std::vector<uint8_t> (SIMD-friendly)
  • Algorithm: Extract active block β†’ compute β†’ scatter back

🚧 Roadmap

✅ Phase 1: Static Analysis (Complete)

  • 2D sparsity tracking
  • JSON export
  • CPU runtime
  • Cross-platform verification

🔨 Phase 2: GPU Acceleration (Next)

  • CUDA masked matmul kernel
  • 10–50× speedup potential (estimate)
  • cuSPARSE comparison

📅 Phase 3: Framework Integration (Future)

  • PyTorch plugin
  • ONNX Runtime backend
  • TensorRT integration

🔬 Phase 4: Advanced Features (Research)

  • Dynamic sparsity profiling
  • Automatic pattern detection
  • Multi-dimensional tensors

🤝 Contributing

Contributions welcome! Areas of interest:

  • GPU kernel development (CUDA/ROCm)
  • MLIR dialect integration
  • Framework plugins (PyTorch/ONNX)
  • Benchmark suite expansion

📫 Contact

Gourav Kumar - Founder, MapleSilicon
GitHub: @MapleSilicon
Project: https://github.com/MapleSilicon/SparseFlow


📄 License

Apache 2.0 - See LICENSE for details


🎉 Acknowledgments

Built with LLVM/MLIR 19. Tested on WSL and GitHub Codespaces.

Star this repo ⭐ if you find it useful!

SPA Runtime (C++ / OpenMP)

SparseFlow includes a minimal C++ runtime that consumes the SPA masks and accelerates matmuls on CPU:

  • Uses row/column masks from SPA to skip zero rows/cols
  • Implements a blocked, OpenMP-parallel matmul kernel
  • Achieves ~3–4Γ— speedup on large matmuls (512–1024) when SPA detects 75% sparsity

Quick Start

Run the full demo:

./spa-runner.sh

This will run:

  • MLIR β†’ SPA β†’ spa_sparsity.json
  • C++ runtime benchmark with dense vs sparse timings

Results

On CPU with 50% row + 50% col sparsity (75% FLOP reduction):

  • 512×512: ~3.4× speedup
  • 1024×1024: ~4.9× speedup
  • Theoretical maximum (from FLOP count alone): 4.0×

Performance varies with cache effects and OpenMP overhead; the superlinear 1024×1024 result likely reflects the improved cache behavior of the packed operands. Production deployments should target workloads ≥512×512 for consistent speedups.

Python CLI (developer preview)

From the repo root:

cd sparseflow_package
pip install -e .



## MLIR Driver (sparseflow-opt.sh)

SparseFlow provides a small convenience wrapper around `mlir-opt` to run SPA:

```bash
./sparseflow-opt.sh tests/test_spa_v6_full_2d.mlir > /tmp/out.mlir
cat spa_sparsity.json
```