Skip to content

Conversation

@ChrisRackauckas-Claude
Copy link
Contributor

Summary

  • Fixes the issue where Enzyme AD with sparse matrices corrupts the primal matrix's sparsity pattern (rowval, colptr)
  • Adds sparse-safe helper functions that operate directly on nzval arrays instead of using broadcast operations that can change sparsity
  • For dense matrices, falls back to standard broadcast operations

Root Cause

Enzyme.make_zero creates shadow sparse matrices that share the structural arrays (rowval, colptr) with the primal matrix. When the reverse rule executes broadcast operations like dA .-= z * transpose(y), these can change the sparsity pattern of the shadow matrix, and because the structural arrays are shared, this inadvertently corrupts the primal matrix's structure.

Evidence from investigation:

Primal rowval === shadow rowval? true
Primal colptr === shadow colptr? true

Solution

Add sparse-safe helper functions that dispatch on AbstractSparseMatrix:

  1. _safe_add!(dst, src): For sparse matrices, adds values via dst.nzval .+= src.nzval
  2. _safe_zero!(A): For sparse matrices, zeros via fill!(A.nzval, 0)
  3. _sparse_outer_sub!(dA, z, y): For sparse matrices, only accumulates gradients into existing non-zero positions using a direct loop over the CSC structure

These preserve the sparsity pattern by operating only on the nzval array.

Test plan

  • Verify sparsity pattern is preserved after AD (rowval, colptr, nnz unchanged)
  • Verify matrix can still be displayed without AssertionError: _goodbuffers(S) crash
  • Verify dense matrix Enzyme tests still pass (note: existing tests have pre-existing failures unrelated to this change)

Fixes #835

🤖 Generated with Claude Code

@ChrisRackauckas-Claude
Copy link
Contributor Author

Updated based on feedback

Addressed two concerns:

1. GPU Safety

Changed the implementation from nested loops with scalar indexing to vectorized operations:

# Build column indices vector from colptr
col_indices = _expand_colptr_to_col_indices(colptr, n_cols, nnz_count)

# Vectorized update instead of nested loop
vals .-= z[rows] .* y[col_indices]

The _expand_colptr_to_col_indices helper still uses a loop internally (dispatched on Vector{Ti}), but this is CPU-only since SparseMatrixCSC itself is CPU-only. GPU sparse matrices (like CuSparseMatrixCSC) are different types and would need handling in their respective extensions (LinearSolveCUDAExt, etc.).

2. Non-zeros concern

The dense fallback dA .-= z * transpose(y) only applies to truly dense AbstractArray types. Sparse matrices dispatch to the SparseMatrixCSC method which only updates existing positions in the sparsity pattern. This is mathematically correct for sparse AD - gradients are only meaningful at positions where the matrix can be modified.

The issue: Enzyme.make_zero shares structural arrays (rowval, colptr)
between primal and shadow sparse matrices. Broadcast operations like
`dA .-= z * y'` can change the sparsity pattern, corrupting both shadow
AND primal matrices.

The fix: Add sparse-safe helper functions that operate directly on
the nonzeros array to preserve the sparsity pattern:
- _safe_add!: Add arrays preserving sparsity pattern
- _safe_zero!: Zero arrays preserving sparsity pattern
- _sparse_outer_sub!: Compute outer product subtraction preserving
  sparsity pattern (uses non-allocating CSC loop for CPU sparse matrices)

Also added SparseArrays as a dependency for the LinearSolveEnzymeExt
extension.

Note: This PR addresses the sparsity pattern corruption issue. The
dense matrix Enzyme tests are failing on main (pre-existing issue).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ChrisRackauckas-Claude ChrisRackauckas-Claude force-pushed the fix-enzyme-sparse-sparsity-pattern-corruption branch from 98949fb to 223cc46 Compare December 21, 2025 02:19
@ChrisRackauckas-Claude
Copy link
Contributor Author

Updated implementation

Based on feedback, the implementation has been revised to use a non-allocating loop for sparse matrix operations:

function _sparse_outer_sub!(dA::SparseMatrixCSC, z::AbstractVector, y::AbstractVector)
    rows = rowvals(dA)
    vals = nonzeros(dA)
    colptr = getcolptr(dA)

    # Non-allocating loop over CSC structure
    # This is efficient and cache-friendly (column-major order)
    @inbounds for col in 1:size(dA, 2)
        y_col = y[col]
        for idx in colptr[col]:(colptr[col + 1] - 1)
            vals[idx] -= z[rows[idx]] * y_col
        end
    end

    return dA
end

Key points:

  • Zero allocations: The loop operates directly on the existing arrays
  • Cache-friendly: Column-major order iteration matches CSC storage
  • CPU-only by design: SparseMatrixCSC is a CPU-only type; GPU sparse matrices (like CuSparseMatrixCSC) have their own types and would need handling in their respective extensions (CUDA, AMDGPU, etc.)

The SparseArrays package has been added as a dependency for the Enzyme extension.

Note: The dense matrix Enzyme tests are failing on main as a pre-existing issue (unrelated to this PR).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enzyme + SparseArrays + LinearSolve = Not Happy

2 participants