Fix allocations in 32Mixed precision methods by pre-allocating temporaries #758
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
This PR fixes excessive allocations in all 32Mixed precision LU factorization methods by properly pre-allocating temporary 32-bit arrays in the
init_cacheval
functions.Problem
The mixed precision methods (
MKL32MixedLUFactorization
,OpenBLAS32MixedLUFactorization
,AppleAccelerate32MixedLUFactorization
,RF32MixedLUFactorization
,CUDAOffload32MixedLUFactorization
,MetalOffload32MixedLUFactorization
) were allocating new Float32/ComplexF32 arrays on every solve call, causing unnecessary memory allocations and reduced performance.Solution
Modified
init_cacheval
functions to:solve!
functions by copying data instead of allocating new arraysChanges by File
init_cacheval
andsolve!
forMKL32MixedLUFactorization
init_cacheval
andsolve!
forOpenBLAS32MixedLUFactorization
init_cacheval
andsolve!
forAppleAccelerate32MixedLUFactorization
init_cacheval
andsolve!
forRF32MixedLUFactorization
init_cacheval
andsolve!
forCUDAOffload32MixedLUFactorization
init_cacheval
andsolve!
forMetalOffload32MixedLUFactorization
Performance Impact
Allocations reduced from ~80KB per solve to <1KB per solve for 100x100 matrices, providing significant performance improvements for repeated solves with the same factorization.
Test Results
All existing tests pass. The mixed precision test suite confirms the methods work correctly with both real and complex matrices.
🤖 Generated with Claude Code