v0.9.2 — Custom op decode, Dynamo-traceable fast path

@wizzense wizzense released this 02 Apr 21:28
· 18 commits to main since this release

Decode now runs as a `torch.library.custom_op`, which makes it ready for CUDA graph capture. Decode takes 1.14 ms per call, and the fast path is Dynamo-traceable. Single-request throughput is 45 tok/s, 4x the v0.8.1 eager baseline.