Introduce Quadtrix benchmark suite with Python and C++ support by Eamon2009 · Pull Request #44 · Eamon2009/Quadtrix.cpp

Eamon2009 · 2026-05-21T13:05:57Z

Summary

- Project Versioning: sets the initial project version to 0.1.0.
- Code Shortcuts (Macros): defines shorthand macros for CUDA qualifiers (for example, `QX_DEVICE` for `__device__`) so that GPU kernel code stays concise.
- Math & Memory Utilities: adds helpers for memory alignment, rounding up to a multiple, and computing power-of-two boundaries.
- Memory Optimization: enforces 128-byte alignment so that GPU global-memory accesses can be coalesced.
- Automatic Error Checking: adds wrappers (`CUDA_CHECK`, `CUBLAS_CHECK`, `NCCL_CHECK`) that check the status returned by CUDA runtime, cuBLAS, and NCCL calls and report failures where they occur, which makes debugging easier.
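The PR description does not show how the header implements its alignment policy. As a hedged, host-side illustration only, the C++17 sketch below shows what a 128-byte policy usually involves: rounding sizes and row pitches up to 128 bytes and allocating on that boundary. All names here (`kQxAlign`, `qx_align_up`, `qx_is_aligned`, `qx_padded_pitch`, `qx_alloc_aligned`, `qx_free_aligned`) are hypothetical, not Quadtrix identifiers. The 128-byte figure lines up with a common access pattern: a warp of 32 threads each reading one 4-byte `float` touches exactly 128 contiguous bytes, so starting each row on a 128-byte boundary lets that read map onto whole, aligned memory segments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical constant: the 128-byte boundary the PR targets.
constexpr std::size_t kQxAlign = 128;

// Round n up to the next multiple of `align` (align must be a power of two).
constexpr std::size_t qx_align_up(std::size_t n, std::size_t align = kQxAlign) {
    return (n + align - 1) & ~(align - 1);
}

// True when p lies on an `align`-byte boundary.
inline bool qx_is_aligned(const void* p, std::size_t align = kQxAlign) {
    assert(align != 0 && (align & (align - 1)) == 0 && "alignment must be a power of two");
    return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}

// Row pitch in bytes for a row-major matrix whose rows should each start on a
// 128-byte boundary: e.g. 100 floats (400 bytes) are padded to 512 bytes.
constexpr std::size_t qx_padded_pitch(std::size_t cols, std::size_t elem_size) {
    return qx_align_up(cols * elem_size);
}

// Host allocation on a 128-byte boundary via C++17 aligned operator new.
inline void* qx_alloc_aligned(std::size_t bytes) {
    return ::operator new(bytes, std::align_val_t{kQxAlign});
}

// Must be paired with qx_alloc_aligned (same alignment value).
inline void qx_free_aligned(void* p) noexcept {
    ::operator delete(p, std::align_val_t{kQxAlign});
}
```

Device buffers returned by `cudaMalloc` are already aligned to at least 256 bytes, so in GPU code the padding of row pitches and sub-buffer offsets usually matters more than the base allocation.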

Introduces a CLI tool that loads, indexes, and aligns the benchmark JSON results from both backends (Python and C++) and prints a side-by-side comparison table of latency (ms), throughput (tokens/s), and percentage speedup or slowdown.
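The comparison tool's source isn't reproduced here. A minimal sketch of its alignment and percentage step, assuming both JSON files have already been parsed into a map keyed by benchmark name, might look like the following. `Result`, `ComparisonRow`, `align_results`, and `print_table` are hypothetical, and the sign convention is an assumption: latency speedup is `python_ms / cpp_ms - 1` (positive means C++ is faster) and throughput change is `cpp_tok_s / python_tok_s - 1`.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-benchmark record after JSON parsing.
struct Result {
    double latency_ms;
    double tokens_per_s;
};

struct ComparisonRow {
    std::string name;
    Result python;
    Result cpp;
    double latency_speedup_pct;    // > 0: C++ has lower latency
    double throughput_change_pct;  // > 0: C++ has higher throughput
};

// Align the two result sets by benchmark name; entries present on only one
// side are skipped rather than compared against a missing value.
std::vector<ComparisonRow> align_results(const std::map<std::string, Result>& py,
                                         const std::map<std::string, Result>& cpp) {
    std::vector<ComparisonRow> rows;
    for (const auto& [name, p] : py) {
        const auto it = cpp.find(name);
        if (it == cpp.end()) continue;
        const Result& c = it->second;
        assert(c.latency_ms > 0.0 && p.tokens_per_s > 0.0);  // avoid division by zero
        rows.push_back({name, p, c,
                        (p.latency_ms / c.latency_ms - 1.0) * 100.0,
                        (c.tokens_per_s / p.tokens_per_s - 1.0) * 100.0});
    }
    return rows;
}

// Print a side-by-side table; %+ shows an explicit sign for speedup/slowdown.
void print_table(const std::vector<ComparisonRow>& rows) {
    std::printf("%-14s %10s %10s %9s %12s %12s %9s\n",
                "benchmark", "py ms", "cpp ms", "speedup", "py tok/s", "cpp tok/s", "change");
    for (const auto& r : rows) {
        std::printf("%-14s %10.3f %10.3f %+8.1f%% %12.1f %12.1f %+8.1f%%\n",
                    r.name.c_str(), r.python.latency_ms, r.cpp.latency_ms, r.latency_speedup_pct,
                    r.python.tokens_per_s, r.cpp.tokens_per_s, r.throughput_change_pct);
    }
}
```

For example, with a Python forward latency of 10 ms and a C++ latency of 8 ms, the row reports +25.0%: the C++ path completes 1.25× as many forward passes per unit time.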

Execution wrapper for the Python runner: adds a boilerplate compatibility script that handles safe system exits and routes execution to the Python benchmark.
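The wrapper in this PR is a Python script whose contents aren't shown. Its job, ending with a status code instead of crashing, has a direct C++ counterpart; the sketch below shows that pattern as it might look on the C++ runner side. `run_guarded` and `run_benchmark` are hypothetical helpers, not part of the PR.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <functional>

// Run an entry point and turn any escaping exception into EXIT_FAILURE, so a
// failed benchmark ends with a readable message and a non-zero exit status
// instead of std::terminate aborting the process.
int run_guarded(const std::function<int()>& entry) {
    assert(entry && "entry point must be callable");
    try {
        return entry();
    } catch (const std::exception& e) {
        std::fprintf(stderr, "benchmark failed: %s\n", e.what());
    } catch (...) {
        std::fprintf(stderr, "benchmark failed: unknown exception\n");
    }
    return EXIT_FAILURE;
}

// Typical use (run_benchmark is hypothetical):
// int main(int argc, char** argv) {
//     return run_guarded([&] { return run_benchmark(argc, argv); });
// }
```

Returning the entry point's own status on success keeps the wrapper transparent to scripts and CI jobs that inspect the exit code.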

Introduces the primary Python benchmark runner, which measures model metadata, data throughput, forward-pass latency, training-step latency, and autoregressive generation. It includes utility functions for dynamic module loading, timing, and percentile calculation.

## Model Benchmarking

- Latency Profiling: tracks forward-pass, training-step, and autoregressive-generation latencies.
- Throughput Tracking: measures tokenizer processing speed and data throughput.
- Resource Monitoring: captures model metadata and system memory footprint during runs.

## Math Utilities

- Dynamic Loading: implements safe runtime module loading via `importlib` to interact dynamically with `engine/inference.py`.
- Statistical Metrics: adds custom math utilities, including a percentile calculator ($P_{50}$, $P_{90}$, $P_{99}$) for reporting latency distributions.
- Standardized Exports: lays the groundwork for structured JSON and CSV output.
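The description doesn't say which interpolation rule the percentile helper uses. One common choice is linear interpolation between closest ranks (NumPy's default `method="linear"`): for sorted samples $x_0 \le \dots \le x_{n-1}$ and rank $r = \frac{q}{100}(n-1)$, $P_q = x_{\lfloor r \rfloor} + (r - \lfloor r \rfloor)(x_{\lceil r \rceil} - x_{\lfloor r \rfloor})$. The sketch below implements that rule in C++ and reports $P_{50}$, $P_{90}$, and $P_{99}$; `percentile`, `LatencySummary`, and `summarize` are hypothetical names, and this is not the runner's own code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Percentile with linear interpolation between closest ranks.
// q is in [0, 100]; samples must be non-empty. Takes a copy so it can sort.
double percentile(std::vector<double> samples, double q) {
    assert(!samples.empty() && q >= 0.0 && q <= 100.0);
    std::sort(samples.begin(), samples.end());
    const double rank = q / 100.0 * static_cast<double>(samples.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(std::floor(rank));
    const std::size_t hi = std::min(lo + 1, samples.size() - 1);
    const double frac = rank - static_cast<double>(lo);
    return samples[lo] + frac * (samples[hi] - samples[lo]);
}

struct LatencySummary {
    double p50, p90, p99;
};

// Summarize a set of latency samples (milliseconds) at the three reported percentiles.
LatencySummary summarize(const std::vector<double>& latencies_ms) {
    return {percentile(latencies_ms, 50.0),
            percentile(latencies_ms, 90.0),
            percentile(latencies_ms, 99.0)};
}
```

For the samples 1 through 10 this rule gives $P_{50} = 5.5$ and $P_{90} = 9.1$. With only a handful of runs, $P_{99}$ is interpolated between the two slowest samples, so tail percentiles need enough iterations to mean much.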

Introduces the primary C++ benchmark runner (`cpp_benchmark.cpp`). It defines the configuration parsing, the metric structures (`Stats` and `BenchRow`), and the basic timing and utility helpers needed to mirror the capabilities of the Python benchmark suite.
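The PR names the `Stats` and `BenchRow` structures but not their fields. Assuming they hold per-benchmark summary statistics, a minimal version with a warm-up-then-measure timer could look like the following; every field, and the `compute_stats` and `time_it` helpers, are hypothetical rather than taken from `cpp_benchmark.cpp`.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical field layout; the PR names Stats and BenchRow but not their members.
struct Stats {
    std::size_t n = 0;
    double mean_ms = 0.0, min_ms = 0.0, max_ms = 0.0, stddev_ms = 0.0;
};

struct BenchRow {
    std::string name;  // e.g. "forward", "train_step", "generate"
    Stats latency;
};

Stats compute_stats(const std::vector<double>& ms) {
    Stats s;
    s.n = ms.size();
    if (s.n == 0) return s;
    const auto [lo, hi] = std::minmax_element(ms.begin(), ms.end());
    s.min_ms = *lo;
    s.max_ms = *hi;
    double sum = 0.0;
    for (double x : ms) sum += x;
    s.mean_ms = sum / static_cast<double>(s.n);
    double sq = 0.0;
    for (double x : ms) sq += (x - s.mean_ms) * (x - s.mean_ms);
    s.stddev_ms = std::sqrt(sq / static_cast<double>(s.n));  // population std. dev.
    return s;
}

// Run `fn` `warmup` times untimed, then `iters` times timed with a monotonic clock.
template <class Fn>
BenchRow time_it(const std::string& name, Fn&& fn, int warmup = 3, int iters = 20) {
    assert(warmup >= 0 && iters > 0);
    for (int i = 0; i < warmup; ++i) fn();
    std::vector<double> ms;
    ms.reserve(static_cast<std::size_t>(iters));
    for (int i = 0; i < iters; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return {name, compute_stats(ms)};
}
```

`std::chrono::steady_clock` is monotonic, which is why it is preferred over `system_clock` for measuring intervals. It measures host time only: CUDA kernel launches return before the kernel finishes, so a GPU benchmark has to synchronize (for example with `cudaDeviceSynchronize()`) inside `fn` before the clock is read.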

Added an image to the README for better visualization.


- Define the core architecture and compiler hints (`QX_INLINE`, `QX_DEVICE`, etc.).
- Implement generic math macros (`CEIL_DIV`, `ROUND_UP`, `NEXT_POW2`).
- Add memory-alignment utilities targeting 128-byte boundaries.
- Implement explicit error-checking macros for CUDA, cuBLAS, and NCCL.
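The macro bodies themselves aren't included in the PR text. The following is one conventional way to define `CEIL_DIV`, `ROUND_UP`, and `NEXT_POW2`, with assumed semantics for non-negative integer arguments; treat it as a sketch rather than Quadtrix's definitions. `next_pow2` is a hypothetical helper that `NEXT_POW2` forwards to, so its argument is evaluated only once.

```cpp
#include <cstdint>

// Ceiling division for non-negative integers: CEIL_DIV(10, 4) == 3.
#define CEIL_DIV(a, b) (((a) + (b) - 1) / (b))

// Round a up to the next multiple of b: ROUND_UP(10, 4) == 12.
#define ROUND_UP(a, b) (CEIL_DIV((a), (b)) * (b))

// Smallest power of two >= x for 32-bit x (returns 1 for x <= 1).
// Smearing the highest set bit of x - 1 into every lower position yields 2^k - 1.
constexpr std::uint32_t next_pow2(std::uint32_t x) {
    if (x <= 1) return 1;
    --x;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;
}

#define NEXT_POW2(x) next_pow2(static_cast<std::uint32_t>(x))

static_assert(CEIL_DIV(10, 4) == 3, "ceil(10 / 4) is 3");
static_assert(NEXT_POW2(65) == 128, "next power of two at or above 65 is 128");
```

A typical use is sizing a kernel launch: `CEIL_DIV(n, 256)` blocks of 256 threads cover all `n` elements, with the last block partially idle. Like most function-style macros, `CEIL_DIV` and `ROUND_UP` evaluate `b` more than once, so pass side-effect-free expressions; and as written, `next_pow2` wraps to 0 for inputs above 2^31.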

Eamon2009 added 15 commits May 16, 2026 18:16

Entry point for Python benchmark (#41)

21d9654

build: update package version

a001e10

Add image to README for project visualization

88165df

chore: upload performance and training graphs

e2a26be

chore: upload performance and training graphs

1eda492

chore: upload performance and training graphs

1a40f16

chore: upload performance and training graphs

1e20c5a

feat(main): add DirectML backend support enables execution on integra…

403e8d7

…ted iGPUs

feat(main): add DirectML backend support enables execution on integra…

b00eb71

…ted GPUs (iGPUs)

feat(main): add DirectML backend support enables execution on integra…

2f3dd15

…ted GPUs (iGPUs)

feat(main): add DirectML backend support enables execution on integra…

cbdbe40

…ted GPUs (iGPUs)

Eamon2009 self-assigned this May 21, 2026

Eamon2009 added the cuda and python labels May 21, 2026

Eamon2009 merged commit 0517a06 into exp May 21, 2026
9 checks passed

Eamon2009 merged 15 commits into exp from master
