Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hipblasTrmm API change to allow inplace and outofplace behaviour #504

Closed
wants to merge 11 commits into from

Conversation

daineAMD
Copy link
Contributor

@daineAMD daineAMD commented Jul 5, 2022

See SWDEV-294491.

This PR changes the hipBLAS API to amend the hipblasXtrmm interface. cuBLAS deviates from the legacy BLAS interface by introducing a third matrix to the trmm interface, C. The initial implementation for this in rocBLAS can be seen in https://github.com/ROCmSoftwarePlatform/rocBLAS-internal/pull/850.

We added a deprecation warning for this API in October 2021 in #386, this PR realizes this change. We may want to wait until the next major release for this, but I thought I'd open the PR with the changes now. We can discuss next meeting with Braga and all.

Copy link
Contributor

@TorreZuk TorreZuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay if we have deprecated long enough. Minor questions made

device_vector<T> d_alpha(1);

device_vector<T>* dOut = inplace ? &dB : &dC;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you made a pointer so you clue into potential swapping, fine by me. A reference fine by me too.

Comment on lines -14441 to -14573
// clang-format off
HIPBLAS_DEPRECATED_MSG("The hipblasXtrmm API, along with batched versions, will \
be changing in a future release to allow in-place and out-of-place behavior. This change \
will introduce an output matrix 'C', matching the rocblas_xtrmm_outofplace API and the \
cublasXtrmm API.")
// clang-format on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was around for how many releases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +14541 to +14542
ldc [int]
ldc specifies the first dimension of C. ldc >= max( 1, m ).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When in-place ldc must equal ldb. ???

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't think rocBLAS or cuBLAS does any checking. I'll double check this and add comments, and can change behaviour to be consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting question. in-place gives the legacy BLAS functionality where B and C matrices are the same, so I would expect in-place to require ldb==ldc. AFAIK neither the in-place or out-of-place algorithm will not work if C == B but ldb != ldc. cuBLAS does not specify that if B==C then ldb must equal ldc to get the in place functionality. I expect we should return rocblas_status_invalid_size if B==C and ldb != ldc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cuBLAS doesn't seem to have any checking for ldb == ldc if B == C, i.e. I'm able to run cuBLAS trmm successfully with B == C and ldb != ldc.
rocBLAS doesn't do any checking for this at the moment either. I can add a comment regarding this in this file.
We can discuss if we want to deviate during tomorrow's meeting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can look at rocblas_gemm_ex.hpp validateArgs( ) I was the one who added these checks only 7 months ago. There was likely a ticket if you want me to find it, but was recent rocblas PR #998. Would think we could follow this pattern to be consistent.

Comment on lines +14894 to +14895
strideC [hipblasStride]
stride from the start of one matrix (C_i) and the next one (C_i+1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here, if B==C must the strides match or return status error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A decision was taken when _strided_batched was introduced to omit checks on stride. The user can do whatever they want, and there is no checking on stride. I think we can keep with this decision, and have no argument error checks on stride.

@daineAMD daineAMD added the noCI Disable Jenkins for this PR. label Jul 14, 2022
@daineAMD
Copy link
Contributor Author

No CI for now until we discuss which release this deprecation will go into.

emankov added a commit to emankov/HIPIFY that referenced this pull request Sep 22, 2022
+ Mark rocBLAS TRMM functions rocblas_(s|d|c|z)trmm_outofplace, as supported only for TRMM v2 CUDA analogues
+ Mark hipBLAS TRMM functions hipblas(S|D|C|Z)trmm as HIP_UNSUPPORTED
+ Regenerate and update docs and hipify-perl accordingly

[Reasons]
+ hipBLAS TRMM functions hipblas(S|D|C|Z)trmm, actually, do not match neither cublas TRMM functions, nor cublas TRMM _v2 functions: ROCm/hipBLAS#524
+ There is a correspondence between cuBLAS cublas_(s|d|c|z)trmm and rocBLAS TRMM rocblas_(s|d|c|z)trmm_outofplace, not rocblas_(s|d|c|z)trmm: fixed it

[ToDo]
+ Close ROCm/rocBLAS#1265 as erroneous
+ Remove HIP_UNSUPPORTED mark from hipblas(S|D|C|Z)trmm functions after merging ROCm/hipBLAS#504
+ Add cublas2rocblas and update cublas2hipblas synthetic tests
@daineAMD daineAMD removed the noCI Disable Jenkins for this PR. label Jan 13, 2023
@TorreZuk
Copy link
Contributor

What is going on with this PR?

@daineAMD
Copy link
Contributor Author

Will be replaced by Andrew's PR as I understand.

@daineAMD daineAMD closed this Mar 17, 2023
@amcamd
Copy link
Contributor

amcamd commented Mar 17, 2023

I forgot about this PR, I am planning to add it like the deprecation in rocBLAS. It will be possible to change behavior with -DHIPBLAS_V1.

@daineAMD
Copy link
Contributor Author

Closing in favour of #617

@daineAMD daineAMD closed this Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants