Add BLAS triangular solve derivatives #2825

Open
jlperla wants to merge 2 commits into EnzymeAD:main from jlperla:blas-triangular-derivatives

jlperla commented May 14, 2026

Summary

Adds native BLAS/LAPACK derivative support for real triangular solve routines:

  • trsv
  • trsm
  • potrs

This covers forward and reverse mode for real s/d routines, with tests for Fortran and selected CBLAS entry points.

Implementation Notes

The trsv rule follows the existing trtrs shape, specialized to a single right-hand-side vector. Reverse mode must solve for the x cotangent before forming the A cotangent, so the generated reverse rewrite order is extended for trsv in the same way as for trtrs.
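The ordering constraint can be illustrated with a minimal plain-Python sketch. It assumes a lower-triangular, non-unit-diagonal, untransposed solve and uses the standard matrix-calculus identities for x = A^{-1} b. The function names are hypothetical and this is not the code the tablegen rule generates; it only shows why the transposed solve for the right-hand-side cotangent has to happen before the A cotangent is formed.

```python
# Illustrative sketch only: plain-Python stand-ins, not Enzyme's generated rule.

def trsv_lower(A, b):
    """Solve A x = b by forward substitution (A lower triangular, non-unit diagonal)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i))) / A[i][i]
    return x

def trsv_lower_transposed(A, g):
    """Solve A^T y = g by back substitution, reading only the lower triangle of A."""
    n = len(g)
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (g[i] - sum(A[j][i] * y[j] for j in range(i + 1, n))) / A[i][i]
    return y

def trsv_lower_reverse(A, x, x_bar):
    """Return (A_bar, b_bar) for x = A^{-1} b, given the output cotangent x_bar."""
    n = len(x)
    # Step 1: the right-hand-side cotangent needs a transposed triangular solve.
    b_bar = trsv_lower_transposed(A, x_bar)
    # Step 2: the A cotangent is -b_bar x^T, kept only on the stored triangle.
    A_bar = [[-b_bar[i] * x[j] if j <= i else 0.0 for j in range(n)]
             for i in range(n)]
    return A_bar, b_bar

# Check against central finite differences of the scalar loss(A, b) = w . x.
A = [[2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [-1.0, 0.5, 4.0]]
b = [1.0, 2.0, 3.0]
w = [0.3, -0.7, 1.1]

def loss(A_, b_):
    return sum(wi * xi for wi, xi in zip(w, trsv_lower(A_, b_)))

def fd_A(i, j, h=1e-6):
    Ap = [row[:] for row in A]
    Am = [row[:] for row in A]
    Ap[i][j] += h
    Am[i][j] -= h
    return (loss(Ap, b) - loss(Am, b)) / (2 * h)

def fd_b(i, h=1e-6):
    bp, bm = b[:], b[:]
    bp[i] += h
    bm[i] -= h
    return (loss(A, bp) - loss(A, bm)) / (2 * h)

x = trsv_lower(A, b)
A_bar, b_bar = trsv_lower_reverse(A, x, w)
```

Because the loss is linear in x, its cotangent is just `w`, so `A_bar` and `b_bar` can be compared entry by entry with `fd_A` and `fd_b`.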

The trsm rule adds active A/B support with inactive alpha. It also adds trsm to BLAS extraction. Supporting both left/right side solves and CBLAS layout required a few small tablegen extensions rather than hand-written derivative code.
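For reference, the textbook identities for the two side cases are written out below, assuming no transpose on A and a constant (inactive) alpha. Dots denote tangents, bars denote cotangents, and G is an auxiliary symbol introduced here for the intermediate solve. These are standard matrix-calculus results, not a transcription of the generated rule.

```latex
\begin{align*}
\text{side = L:}\quad & A X = \alpha B, &
  \dot X &= A^{-1}\bigl(\alpha \dot B - \dot A\, X\bigr), \\
 & G = A^{-\top} \bar X, &
  \bar B &= \alpha\, G, \qquad \bar A = -\,G X^{\top}, \\[4pt]
\text{side = R:}\quad & X A = \alpha B, &
  \dot X &= \bigl(\alpha \dot B - X \dot A\bigr) A^{-1}, \\
 & G = \bar X A^{-\top}, &
  \bar B &= \alpha\, G, \qquad \bar A = -\,X^{\top} G .
\end{align*}
```

In both cases only the stored triangle of the A cotangent is meaningful, and with a unit diagonal the diagonal entries carry no derivative either.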

The potrs rule handles differentiation through the Cholesky solve using triangular BLAS operations. This does not add pivot support and does not address LU getrf/getrs.
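As a sketch of the underlying math, assume uplo = L, so the input to potrs is the Cholesky factor L with A = L L^T, and the result is X = A^{-1} B computed by two triangular solves. Differentiating with respect to the factor then gives the identities below; the symbol S is introduced here only for illustration, and the actual rule in this PR may organize the computation differently.

```latex
\begin{align*}
A &= L L^{\top}, \qquad X = A^{-1} B = L^{-\top}\bigl(L^{-1} B\bigr), \\
\dot X &= A^{-1}\Bigl(\dot B - \bigl(\dot L L^{\top} + L \dot L^{\top}\bigr) X\Bigr), \\
\bar B &= A^{-1} \bar X = L^{-\top}\bigl(L^{-1} \bar X\bigr), \\
S &= -\,\bar B X^{\top}, \qquad
\bar L = \operatorname{tril}\!\bigl((S + S^{\top})\, L\bigr).
\end{align*}
```

The cotangent of B is itself a Cholesky solve, which is why the rule can be expressed entirely with triangular BLAS operations.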

Generator/Infrastructure Changes

Most changes are local extensions to the existing BLAS tablegen machinery:

  • Generalize hidden/by-ref character handling from only trans to all BLAS char arguments such as uplo, diag, and side.
  • Add mat_ld support for temporary leading dimensions that depend on CBLAS row-major vs column-major layout.
  • Add a side_square temporary shape for trsm, where the triangular factor dimension depends on side.
  • Add a Dep DAG helper so generated operands can express data/cache dependencies while emitting the intended BLAS argument.
  • Add a layout-aware matrix accumulation helper, analogous to the existing differential matrix memcpy helper, for row-major/column-major shadow accumulation.
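The shape and layout bookkeeping behind these extensions can be sketched in a few lines of Python. All names below are hypothetical and chosen for illustration; they are not the identifiers used in enzyme-tblgen or Utils.cpp. The facts they encode are standard BLAS/CBLAS conventions: for trsm the triangular factor is m-by-m when side is L and n-by-n when side is R, and a dense matrix's leading dimension is the column stride in column-major layout and the row stride in row-major layout.

```python
# Hypothetical helper names; they mirror the ideas in the list above, not Enzyme's code.

def triangular_order(side, m, n):
    """Order of the triangular factor A in trsm for an m-by-n matrix B."""
    if side == "L":   # op(A) X = alpha B, so A is m-by-m
        return m
    if side == "R":   # X op(A) = alpha B, so A is n-by-n
        return n
    raise ValueError("side must be 'L' or 'R'")

def min_leading_dim(layout, rows, cols):
    """Smallest valid leading dimension for a dense rows-by-cols temporary."""
    return max(1, rows if layout == "col" else cols)

def flat_index(layout, i, j, ld):
    """Offset of element (i, j) in a flat buffer with leading dimension ld."""
    return i + j * ld if layout == "col" else i * ld + j

def accumulate_matrix(layout, rows, cols, dst, ld_dst, src, ld_src):
    """dst += src element-wise for two flat buffers that share one layout."""
    for i in range(rows):
        for j in range(cols):
            dst[flat_index(layout, i, j, ld_dst)] += src[flat_index(layout, i, j, ld_src)]

# A 2-by-3 matrix accumulated into a padded row-major shadow (ld_dst = 4).
shadow_row = [0.0] * 8
accumulate_matrix("row", 2, 3, shadow_row, 4, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3)

# The same matrix accumulated into a padded column-major shadow (ld_dst = 3).
shadow_col = [0.0] * 9
accumulate_matrix("col", 2, 3, shadow_col, 3, [1.0, 4.0, 2.0, 5.0, 3.0, 6.0], 2)
```

The padding slots in each shadow buffer are never touched, which is the property a layout-aware accumulation has to preserve.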

Tests

Adds focused lit tests for:

  • dtrsv_64_
  • dtrsm_64_
  • cblas_dtrsm64_
  • dpotrs_64_

Extends BLAS integration tracing and integration tests for forward and reverse triangular solve coverage.

I also validated the Julia integration locally by building Enzyme.jl against this branch with deps/build_local.jl, temporarily disabling fallback for trsv, trsm, and potrs, and running:

  • julia --project=test test/runtests.jl --verbose blas
  • julia --project=test test/runtests.jl --verbose rules/internal_rules/linear_algebra_rules
  • downstream triangular BLAS MWE

The Julia checks passed without fallback warnings for BLAS.trsv! and BLAS.trsm!. LU getrf/getrs and high-level A \ b remain separate follow-up work.

Author

jlperla commented May 14, 2026

This is intended as the first step in solving EnzymeAD/Enzyme.jl#3039

Full disclosure: this is heavily LLM-generated because I do not understand the LLVM/etc. side. It is beyond my capability, but given the comment from @vchuravy it seemed that this approach was the only feasible one to get support.

I tried to test the usage with downstream Enzyme.jl by flipping a fallback flag, etc. but I might have made a mistake.

Comment thread on enzyme/Enzyme/Utils.cpp (outdated)
Comment thread on enzyme/Enzyme/BlasDerivatives.td (outdated)

Author

jlperla commented May 14, 2026

Thanks @wsmoses, I was about to ping you. This is a test to see if I can get an LLM to generate the C++ rules (impenetrable for someone like me) to get the triangular (and eventually LU) stuff up to date. I want to push out some communications on Enzyme to the economics community, but right now it is failing on some basic examples.

If what the LLM generated is complete gibberish then I should just close this entirely. If it isn't then keep putting comments and I will tell the AI agent to implement anything you suggest and push again.

Author

jlperla commented May 14, 2026

Updated this to reduce the helper surface and address the review comments.

Main changes:

  • Removed the separate layout-aware differential matrix memcpy helper.
  • Folded row-major/column-major addressing into the existing getOrInsertDifferentialFloatMemcpyMat.
  • Removed the numeric CBLAS enum cases from the uplo_to_* tablegen matchers.
  • Adjusted the generated BLAS rule code so CBLAS layout is handled explicitly before calling the matrix accumulation helper.
  • Regenerated/updated the affected lit checks.

Validation:

  • ninja -C build-llvm16 BlasDerivativesIncGen BlasDeclarationsIncGen BlasTAIncGen BlasDiffUseIncGen
  • ninja -C build-llvm16 LLVMEnzyme-16
  • lit -sv build-llvm16/test/Enzyme/ForwardMode/blas
  • lit -sv build-llvm16/test/Enzyme/ReverseMode/blas
  • lit -sv build-llvm16/test/Integration/ForwardMode/blas.cpp
  • lit -sv build-llvm16/test/Integration/ReverseMode/blas.cpp
  • ninja -C build-llvm16 check-enzyme
  • local Enzyme.jl build against this branch with fallback temporarily disabled for trsv, trsm, and potrs
  • julia --project=test test/runtests.jl --verbose blas
  • julia --project=test test/runtests.jl --verbose rules/internal_rules/linear_algebra_rules
  • downstream triangular BLAS MWE

Author

jlperla commented May 14, 2026

@wsmoses OK, take two with the AI. It changed the code to simplify given your feedback.

I feel like this is worth at most one more iteration. If this is nonsense and a waste of your time just let me know and I will kill it. If you think the AI is getting pretty close to the right design then we can continue the experiment.

Member

wsmoses left a comment

I don't understand why we would need any changes to enzyme/Enzyme or tools/enzyme-tblgen other than adding def trsv, or adding the forward mode rules to things like potrf/etc.

Concurrently, it would be a lot easier to separate these into individual PRs for each rule added.

Author

jlperla commented May 16, 2026

@wsmoses Splitting this up per your suggestion. First of the split is #2828, which is just def trsv (forward + reverse), with the minimum-viable tablegen surface to support it — no Utils.cpp/Utils.h changes, no other BLAS rule changes. Subsequent PRs will follow for trsm, potrs, and forward-mode rules for potrf/etc.
