PTX Backend by WillTrojak · Pull Request #18 · PyFR/GiMMiK

WillTrojak · 2026-05-15T12:23:17Z

This adds a PTX backend to GiMMiK. The key features are:

Mild optimisation of existing CUDA algorithms.
Optional async loads for some sparse kernels.
Dense kernel generation for Hopper and above.

Optimisations have focused on FP64; FP32 is future work.

FreddieWitherden · 2026-05-15T18:31:49Z

I know this is an utter pain, but for FP32/FP64 can you confirm correctness for all relevant PyFR matrices at a suite of N values, for all instances where a kernel is expected to work (on A100/H100/B100)?
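One way to structure such a sweep is sketched below, comparing each result against a float64 NumPy reference over a list of N values. It is only a sketch: `check_correctness`, `stand_in_kernel`, the test matrix, the N values, and the tolerances are all hypothetical, and a real run would call the generated GPU kernel in place of the NumPy stand-in.

```python
import numpy as np


def check_correctness(kernel, a, ns, dtype, rtol):
    """Return the list of n values for which kernel(a, b) disagrees
    with a float64 NumPy reference by more than rtol (scaled)."""
    rng = np.random.default_rng(0)
    failures = []
    for n in ns:
        b = rng.standard_normal((a.shape[1], n)).astype(dtype)
        # Reference product computed in double precision
        ref = a.astype(np.float64) @ b.astype(np.float64)
        out = kernel(a.astype(dtype), b)
        scale = np.abs(ref).max() or 1.0
        if np.abs(out - ref).max() > rtol * scale:
            failures.append(n)
    return failures


def stand_in_kernel(a, b):
    # Hypothetical placeholder for a call into a generated kernel
    return a @ b


# Small bidiagonal test matrix (an assumption, not a PyFR operator)
a = np.eye(6) + 0.5 * np.eye(6, k=1)
ns = [1, 31, 32, 1000]
fp64_failures = check_correctness(stand_in_kernel, a, ns, np.float64, 1e-12)
fp32_failures = check_correctness(stand_in_kernel, a, ns, np.float32, 1e-5)
```

An empty list for both precisions would indicate agreement at every N tested.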

FreddieWitherden · 2026-05-15T18:33:25Z

+                         .param .u64 _c)
+{
+% endif
+    .reg .u32 n, id, tid_x, tid_y;


Ensure we throw higher up if n is too big.

Checking here

We don't handle n being too large in any of the other backends.

In the embedded case we do, see https://github.com/PyFR/GiMMiK/blob/master/gimmik/kernels/cuda/cstream.mako#L20 (the argument case doesn't, but that is not currently used for CUDA).
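As a rough illustration of throwing "higher up", a host-side guard could reject an oversized n before any kernel is generated. This is a sketch under an assumption: since the kernel declares `n` as a `.u32` register, the limit is taken to be the unsigned 32-bit range. `validate_n` and its `limit` parameter are hypothetical names, not part of GiMMiK.

```python
U32_MAX = 2**32 - 1  # assumed limit, from n being held in a .u32 register


def validate_n(n, limit=U32_MAX):
    """Return n unchanged if it is usable, otherwise raise ValueError."""
    if not 0 < n <= limit:
        raise ValueError(f'n={n} is outside the supported range (1..{limit})')
    return n


ok = validate_n(1024)

try:
    validate_n(2**32)
    rejected = False
except ValueError:
    rejected = True
```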

FreddieWitherden · 2026-05-21T13:29:40Z

+        nnz = np.count_nonzero(arr)
+        nuq = len(np.unique(np.abs(arr)))
+        density = nnz / arr.size
+        return (nuq <= 28) or (density <= 0.15)


Check if these could do with tuning

I think that would be a separate PR.
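To see how the thresholds in the snippet above behave, the predicate can be wrapped in a standalone function and run on two small matrices. The function name `heuristic` and the test matrices are hypothetical; only the expressions and the constants 28 and 0.15 come from the diff.

```python
import numpy as np


def heuristic(arr):
    nnz = np.count_nonzero(arr)
    # Unique magnitudes; 0 counts as a value when the matrix has zeros
    nuq = len(np.unique(np.abs(arr)))
    density = nnz / arr.size
    return bool((nuq <= 28) or (density <= 0.15))


# 8x8 identity: two unique magnitudes (0 and 1), density 8/64 = 0.125
identity_ok = heuristic(np.eye(8))

# 16x16 matrix of distinct non-zeros: 256 unique magnitudes, density 1.0
dense_ok = heuristic(np.arange(1.0, 257.0).reshape(16, 16))
```

The identity matrix satisfies both conditions, while the fully populated matrix satisfies neither, so any tuning would only matter for matrices between these extremes.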

FreddieWitherden · 2026-05-22T15:28:41Z

+%   for idx, kx in enumerate(bchunks[bb]):
+    ld.shared.${pftype} bv, [bsub_thread + ${bsub_off(buf_cur, idx)}];
+%    for j, row_j in enumerate(mcx):
+<%    jx = A[row_j, kx] %>


See if NumPy indexing, A[mcx, kx], can be used in place of the inner for loop.
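A small check of the suggestion: indexing with a list of rows and a single column gathers the same entries that the per-row loop visits. The matrix and index values below are made up for illustration; only the names `A`, `mcx`, `kx`, and `row_j` follow the template.

```python
import numpy as np

A = np.arange(12.0).reshape(3, 4)  # illustrative operator matrix
mcx = [0, 2]                       # rows handled by this chunk
kx = 1                             # current column

# Per-element lookup, as in the template's inner loop
looped = [A[row_j, kx] for row_j in mcx]

# One integer-array index gathering the same entries in one step
gathered = A[mcx, kx]
```

Whether this simplifies the Mako template depends on how `jx` is used in the rest of the loop body, which is not shown here.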

Will Trojak and others added 6 commits December 2, 2025 22:13

[wip] added ptx generator for bstream

0cd7485

Addtional sparse and dense work

626c2f5

Dense and sparse optimisation

bbbb8ef

Added warp specialised dense kernel

393b409

Performance tuning and cleanup

67d1beb

Whitespace

e2a818b

WillTrojak mentioned this pull request May 15, 2026

Support for GiMMiK PTX Provider PyFR/PyFR#556

Open

FreddieWitherden reviewed May 15, 2026

Comment thread gimmik/kernels/ptx/bstream-msplit.mako Outdated

Comment thread gimmik/ptx.py Outdated (9 threads)

Comment thread gimmik/kernels/ptx/base.mako Outdated

Comment thread gimmik/kernels/ptx/bstream-msplit.mako Outdated (3 threads)

Comment thread gimmik/kernels/ptx/cstream-ksplit.mako Outdated

Comment thread gimmik/kernels/ptx/bstream.mako

Cleanups, formating and addressign comments

7d7299a

FreddieWitherden reviewed May 19, 2026

Comment thread gimmik/ptx.py Outdated (4 threads)

General cleanups and moved smem to pyfr

1d405c3

FreddieWitherden reviewed May 21, 2026

Comment thread gimmik/ptx.py Outdated (4 threads)

WillTrojak added 3 commits May 21, 2026 09:26

Fixed missing import

0e86053

Fixed additional args

1f62b5f

Cleanup and added PTX Version to handle older drivers.

79f41cb

FreddieWitherden reviewed May 22, 2026

Comment thread gimmik/kernels/ptx/bstream-msplit.mako Outdated

Comment thread gimmik/kernels/ptx/dense-mma-gAd.mako Outdated

Further cleanup

7b59ca4

This was referenced May 27, 2026

added float4 and double2 #17

Closed

Added spill to shared and launch bounds #16

Closed

2 participants