BLAS: address banded matrix storage (gbmv, tbmv, etc.) #24872
This PR attempts to address complexities with the BLAS banded matrix routine wrappers (general banded matrix-vector `gbmv`, triangular banded `tbmv`).

Context
`gbmv` (resp. `tbmv`) computes the matrix-vector product `Ax = y`, where `A` is a banded (resp. triangular banded) matrix of shape `m,n`, `x` is a vector of length `n`, and `y` is a vector of length `m`.

General banded storage
A banded matrix `A` of shape `m,n`, with `ku` upper bands and `kl` lower bands, can be represented in general as (adapted from OneAPI spec docs):

In banded storage, the matrix `A` is stored in a contiguous array as (see the documentation for `?gbmv` routines):

`order == CblasRowMajor` (adapted from OneAPI spec docs):

`order == CblasColMajor` (adapted from OneAPI spec docs):

Elements `*` are not read by the `?gbmv` routines and may hold any value (e.g. set to zero).

Example
For example, consider the matrix `A` with `m=9`, `n=8`, `ku=3` and `kl=2` (adapted from: IBM docs):

Column-major layout
In the column-major layout, the matrix `A` is stored as:

The column-major layout can be interpreted as a matrix `AB_col` of shape `LDA,n` (with `LDA=kl+ku+1`) in a column-major language (e.g. Fortran), where the bands are arranged row-wise, from the upper bands to the lower bands.

When the same column-major buffer is read by a row-major language (e.g. C or Chapel), it is interpreted as the transpose, where the matrix `AB_col^t` has shape `n,LDA`:

Row-major layout
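As a hedged illustration of this layout (not code from the PR), the Python sketch below packs a dense banded matrix into a row-major band buffer and views it as an `m` by `LDA` matrix. The function name and the entry values `10*(i+1)+(j+1)` are assumptions made for the example. With 0-based indices, element `A[i][j]` is assumed to land at `flat[i*LDA + kl + j - i]`, which is the mapping given for `CblasRowMajor` in the `?gbmv` documentation.

```python
def pack_band_row_major(a, m, n, kl, ku, fill=0):
    """Pack a dense m x n matrix into a flat row-major band buffer.

    The buffer has m * lda entries with lda = kl + ku + 1; A[i][j] is
    stored at flat[i * lda + (kl + j - i)]. Unused slots keep `fill`.
    """
    lda = kl + ku + 1
    flat = [fill] * (m * lda)
    for i in range(m):
        # Only columns inside the band of row i are stored.
        for j in range(max(0, i - kl), min(n, i + ku + 1)):
            flat[i * lda + (kl + j - i)] = a[i][j]
    return flat


m, n, ku, kl = 9, 8, 3, 2
lda = kl + ku + 1
# Hypothetical entries: 10*(i+1) + (j+1) inside the band, 0 outside.
dense = [[10 * (i + 1) + (j + 1) if -ku <= i - j <= kl else 0
          for j in range(n)] for i in range(m)]

flat = pack_band_row_major(dense, m, n, kl, ku)
# View the flat buffer as AB_row with m rows and lda columns.
ab_row = [flat[i * lda:(i + 1) * lda] for i in range(m)]
```

In this sketch each row of `ab_row` holds one row of `A`, with the lower bands in the first `kl` columns, the diagonal in column `kl`, and the upper bands after it.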
In row-major layout, the matrix `A` is stored as:

The row-major layout can be interpreted as a matrix `AB_row` of shape `m,LDA` (with `LDA=kl+ku+1`) in a row-major language, where the bands are arranged column-wise, from the lower bands to the upper bands:

The row-major layout is interpreted in a column-major language as the transpose `AB_row^t` of shape `LDA,m`.

BLAS routine wrappers
In `modules/packages/BLAS.chpl`, the procedure `gbmv` wraps the usage of `cblas_?gbmv` by providing automatic detection of the array shapes.

Detection of `LDA` is done with:

Due to the signature of the `gbmv` wrapper:

a user is forced to pass a matrix (`Adom.rank == 2`) as `A`, which means they pass:

- `AB_col^t` of shape `n,LDA` if `order == Order.Col`
- `AB_row` of shape `m,LDA` if `order == Order.Row`

Proposed solution
This means that `LDA` is always on the last dimension:

The counterpoint to this solution is that, semantically, `LDA` is no longer the "leading dimension" (number of rows) but the number of columns. One can argue, however, that the "leading dimension" is the number of rows in a column-major language and the number of columns in a row-major language.

Alternative solutions
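To make the layout question behind these alternatives concrete, here is a small Python sketch (an illustration only, not code from the PR). It uses a hypothetical `AB_col` with `LDA=3` and `n=4` (a tridiagonal 4 by 4 matrix, `ku=kl=1`) stored as nested lists, and compares what a row-major language would lay out in memory against the column-major buffer that `cblas_?gbmv` expects for `Order.Col`. All helper names are assumptions.

```python
def flatten_row_major(mat):
    """Memory order of a 2D array in a row-major language (C, Chapel)."""
    return [v for row in mat for v in row]


def flatten_col_major(mat):
    """Memory order of a 2D array in a column-major language (Fortran)."""
    rows, cols = len(mat), len(mat[0])
    return [mat[r][c] for c in range(cols) for r in range(rows)]


def transpose(mat):
    return [list(col) for col in zip(*mat)]


# Hypothetical AB_col of shape (LDA, n) = (3, 4); 0 marks unused slots.
ab_col = [[0, 12, 23, 34],    # upper band
          [11, 22, 33, 44],   # diagonal
          [21, 32, 43, 0]]    # lower band

# Buffer expected by a column-major band routine.
expected = flatten_col_major(ab_col)

# Passing AB_col^t of shape (n, LDA) from a row-major language gives the
# same bytes with no copy; LDA is then the size of the last dimension.
ab_col_t = transpose(ab_col)
same_layout = flatten_row_major(ab_col_t) == expected

# Passing AB_col itself from a row-major language gives a different
# buffer, so an explicit transpose would be needed before the call.
needs_transpose = flatten_row_major(ab_col) != expected
```

Under these assumptions both `same_layout` and `needs_transpose` are true, which is the trade-off between the proposed solution and the first alternative.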
- In order to be consistent with the "math" point of view (col-layout interpreted by col-language, row-layout interpreted by row-language), the user passes `AB_col` with `order == Order.Col` and `AB_row` with `order == Order.Row`. In this case `_ldA = getLeadingDim(A, order)` is correct, but the matrix `AB_col` needs to be transposed before the `cblas_?gbmv` call so that it has the correct layout in memory, which might be expensive.
- Force the user to pass a one-dimensional array to `gbmv`. The user is responsible for doing the index computations. In this case, `LDA` should also be provided, though this defeats the purpose of having a wrapper in the first place.

Code demo
With the proposed changes:
Output (compiled with `-lopenblas`):

Next steps
test/library/BLAS/test_blas2.chpl