v1.18.0-release
cuDNN Frontend v1.18.0 Release Notes
cuDNN Frontend v1.18.0 is the recommended version for cuDNN 9.18.1 and later releases.
General Improvements 🚀
- Moved away from using the v0.x API internally; the cuDNN backend API is now called directly.
- Reduced execution overhead by caching repeated graph queries.
Open-Source Kernels
New open-source kernel for Grouped GEMM and SwiGLU fusion.
Enhancements ✨
Scaled Dot-Product Attention (SDPA)
- New Features: Added support for dynamic shapes for fprop. This helps reduce graph building across different batch sizes and sequence lengths.
- Support Surface:
  - Deterministic bprop is now supported for SDPA.
  - Added bprop support for ragged tensors on A100.
- More samples:
  - Open-sourced our SDPA test harness. It showcases additional testing for determinism and for FP8 sizes for MLA.
  - Added samples to showcase chunked prefill.
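To see why dynamic shapes for fprop can cut down on graph building, consider a graph cache keyed by tensor shape. The sketch below is a simplified model, not the cuDNN Frontend API: `get_graph`, `static_key`, and `dynamic_key` are hypothetical names, and it assumes a single dynamic-shape graph can serve every batch size and sequence length, which a real implementation may restrict (for example, to bounded ranges).

```python
built_graphs = {}  # cache: key -> "graph" (a plain string in this sketch)


def get_graph(key):
    """Build a graph for this key only if one is not cached already."""
    if key not in built_graphs:
        built_graphs[key] = f"graph{key}"  # stands in for an expensive build
    return built_graphs[key]


def static_key(batch, seq_len, heads, head_dim):
    # Static shapes: every distinct (batch, seq_len) needs its own graph.
    return (batch, seq_len, heads, head_dim)


def dynamic_key(batch, seq_len, heads, head_dim):
    # Dynamic shapes (assumed): batch and seq_len are supplied at run time,
    # so they drop out of the cache key.
    return (heads, head_dim)


# 4 batch sizes x 3 sequence lengths, with fixed heads and head_dim.
workloads = [(b, s, 16, 128) for b in (1, 2, 4, 8) for s in (512, 1024, 2048)]


def count_builds(key_fn):
    """Run every workload through the cache and count the graphs built."""
    built_graphs.clear()
    for shape in workloads:
        get_graph(key_fn(*shape))
    return len(built_graphs)


static_builds = count_builds(static_key)
dynamic_builds = count_builds(dynamic_key)
```

In this model the static key builds one graph per workload (12 in total), while the dynamic key builds a single graph that is reused across all of them.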
Mixture of Experts (MoE)
- New API: Added support for moe_grouped_matmul. See the C++ sample and the documentation for the API reference.
Matmul
- More samples: Open-sourced cuDNN's fuzz testing of matmuls.
Convolution
- More samples: Open-sourced cuDNN's fuzz testing of convolutions.
Additional Improvements
Benchmarking 📊
- Updated the benchmark results to reflect the SDPA improvements added in cuDNN 9.18.1.