Revert "Added small-size larfg and larf kernels" #769
Merged
This reverts commit 5156c3a.
cgmb requested review from jzuniga-amd, tfalders, qjojo, EdDAzevedo, jmachado-amd and a team as code owners on July 23, 2024 18:35
tfalders approved these changes on Jul 23, 2024
qjojo approved these changes on Jul 23, 2024
EdDAzevedo pushed a commit to EdDAzevedo/rocSOLVER that referenced this pull request on Oct 26, 2024:
* Merge release-staging/rocm-rel-6.2 into develop (ROCm#750)
  cmake: add C as a project language, as clients use it (ROCm#736)
  Co-authored-by: Steve Leung <Steve.Leung@amd.com>
* Update changelog to mention the `stebz` synchronization defect (ROCm#752)
  * Update changelog to mention the `stebz` synchronization defect
  * Address review comments
  * Minor edits to changelog
  * Make line wrapping consistent
* Make sure containers used in clients are zero-initialized (ROCm#746)
  * Make sure containers used in clients are zero-initialized
  * Apply clang-format, fix bug
  * Minimal changes
  * Apply review comments
* Bump rocm-docs-core from 1.4.0 to 1.4.1 in /docs/sphinx (ROCm#754)
  Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.4.0 to 1.4.1.
  - [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
  - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
  - [Commits](ROCm/rocm-docs-core@v1.4.0...v1.4.1)
  ---
  updated-dependencies:
  - dependency-name: rocm-docs-core
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Fix memory allocation issues in Jacobi solvers (ROCm#755)
  The Jacobi solvers were trying to allocate more memory than they needed, and this led to allocation failures with clients that manage their own memory (hipSOLVER in particular).
* Bump rocm-docs-core from 1.4.1 to 1.5.0 in /docs/sphinx (ROCm#757)
  Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.4.1 to 1.5.0.
  - [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
  - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
  - [Commits](ROCm/rocm-docs-core@v1.4.1...v1.5.0)
  ---
  updated-dependencies:
  - dependency-name: rocm-docs-core
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump certifi from 2023.7.22 to 2024.7.4 in /docs/sphinx (ROCm#758)
  Bumps [certifi](https://github.com/certifi/python-certifi) from 2023.7.22 to 2024.7.4.
  - [Commits](certifi/python-certifi@2023.07.22...2024.07.04)
  ---
  updated-dependencies:
  - dependency-name: certifi
    dependency-type: indirect
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update static rocprim dependency with correct name (ROCm#745)
* Use std::min and std::max (ROCm#760)
  * Use std::min and std::max
    The clang compiler provides integer min and max functions when building for the HIP language, which will implicitly convert 64-bit integers to 32-bit. To avoid this, we will always use std::min and std::max as they do not convert their arguments.
  * Use fmax for floating point values
    These values may be affected by user input and it is undefined behaviour to use std::max with NaN values. Use fmax to ensure there is defined behaviour for this case.
* add gfx12 support (ROCm#751)
* Updates to types and index (ROCm#761)
* Fix jacobi cycle-sync (ROCm#762)
  * Add quick fix for Jacobi cycle-pairs method
  * Tidy-up code
  * Apply clang-format
  * Restore old method in a desperate attempt to make git produce useful diffs
    Of course, the old `syevj_cycle_pairs` kernel will be removed when this PR is finished.
  * Apply review comments.
  * Add small fix and tidy-up code
* Add 64-bit versions of lacgv, larf, and larfg (ROCm#747)
  * Added 64-bit larf APIs
  * Testing for 64-bit larf API
  * Updated documentation
  * Add larfg_64
  * Add lacgv_64
  * add lacgv_64
  * simplify template parameters
  * modify test coverage, remove unused code
  ---------
  Co-authored-by: Troy Alderson <58866654+tfalders@users.noreply.github.com>
* External CI: Add triggers for mainline branch
* Added small-size larfg and larf kernels (ROCm#759)
  * Empty small-size kernel for larfg
  * Implemented small-size larfg kernel
  * Minor improvement to geqr2
  * Empty small-size kernel for larf
  * Basic implementation of small-size kernel for larf
  * Improved implementation of small-size kernel for larf
  * Fixed workspace issues
  * Increased size constants
  * Updated changelog
  * Fix failures in latrd and sytd2
* Revert "Added small-size larfg and larf kernels" (ROCm#769)
  Revert "Added small-size larfg and larf kernels (ROCm#759)"
  Reverting due to failures in single-precision SYGVDX on gfx1101 when GetParam() = ({ 50, 50, 60, 70, -15, -5, 28, 35, 1 }, { 2, V, I, U }).
  This reverts commit 5156c3a.
* Drop use of f16c instructions (ROCm#768)
  The -mf16c flag is troublesome as those instructions may not be available on older CPUs. The clang compiler also seems to emit AVX instructions when it is told that it can use F16C. The flag can be dropped without consequence as rocSOLVER does not use half precision. This was done for rocBLAS in c6bc09073959a2881a701b88ae1ed9de469354f1.
* Load rocsparse using dlopen (ROCm#634)
  When BUILD_WITH_SPARSE=OFF, the rocSOLVER library will attempt to load the rocsparse shared library at runtime. The compile-time behaviour of returning rocblas_status_not_implemented is retained at run-time when rocsparse is not available. The default value of BUILD_WITH_SPARSE for shared library builds is now OFF.
* Use rocBLAS gemm_64 (ROCm#765)
  * Use rocBLAS gemm_64
  * Addressed review comment
* Update documentation requirements (ROCm#773)
* Add 64-bit version of geqrf, geqr2 (ROCm#767)
  * add 64-bit geqrf
  * update changelog and docs
  * make new 64-bit APIs optional
  * address feedback, fix exports
  * fix docs
* Fix synchronization issue in stein (ROCm#775)
  * Fix synchronization issue in stein
  * Updated changelog
* Fix stein initial eigenvectors' choices (ROCm#714)
  Improve choice of starting eigenvectors in STEIN
  This commit introduces a pseudorandom number generator that is subsequently used to generate more robust initial eigenvectors in `roclapack_stein`.
* Fix dead link to .pdf that does not exist
* Add 64-bit api for potf2, potrf, potrs (ROCm#776)
  * add 64-bit geqrf
  * update changelog and docs
  * make new 64-bit APIs optional
  * first pass
  * address feedback, fix exports
  * fix docs
  * add potrf_64, potf2_64
  * add potrs_64
  * update docs and changelog
  * fix info type in potf2 impl
  * Added potrf_info32
  * Updated test
  * Add 32-bit info variant for compatibility
  ---------
  Co-authored-by: Troy Alderson <58866654+tfalders@users.noreply.github.com>
* Multikernel option for bdsqr optimization (ROCm#717)
  * Rearranged functions
  * Added completed array to bdsqr
  * Work iteration by iteration
  * Extract bdsqr_rotate from QR step
  * Rotation improvements
  * Updated documentation
  * Typo correction
  * Restored small-size bdsqr kernel
  * Added inner loop
  * Combined logic for QR step
  * Combined logic for rotations
  * Use sqrt(n) split groups
  * Updated changelog
  * Match number of thread groups to number of split groups
  * Spawn new split blocks when a zero is detected
  * Simplify splits array
  * Restore small-size computation
  * Use iamax to find smax
  * Updated documentation
  * Only check for zeroes when QR step not applied
  * Increase BDSQR_ITERS_PER_SYNC to 10
  * Fix incorrect strideW calculation
  * Restore original rotation data layout
  * Addressed review comment
* Added small-size larfg and larf kernels (limited architectures) (ROCm#774)
  * Empty small-size kernel for larfg
  * Implemented small-size larfg kernel
  * Minor improvement to geqr2
  * Empty small-size kernel for larf
  * Basic implementation of small-size kernel for larf
  * Improved implementation of small-size kernel for larf
  * Fixed workspace issues
  * Increased size constants
  * Updated changelog
  * Fix failures in latrd and sytd2
  * Workaround: disable small-size kernels for wave 32 architectures
  * Addressed review comment
  * Revert "Addressed review comment"
    This reverts commit a08550f.
  * Revert "Workaround: disable small-size kernels for wave 32 architectures"
    This reverts commit bcddae6.
  * Workaround: disable small-size kernels for wave 32 architectures
  * Fix memory access fault
  * Fix compile error
  * Minor fix
* Fix link to rocBLAS API
* STEDC optimizations (ROCm#786)
  * optimize mergePrepare kernel to work with higher number of sub-blocks
  * optimize mergeValues kernel to work with higher number of sub-blocks
  * clean code
  * addressed review comments
  * update changelog
  * remove unused arguments
* Revert "Update changelog to mention the `stebz` synchronization defect (ROCm#752)" (ROCm#794)
  This reverts commit 1eb0f16.
* Update changelog in preparation for the 6.3 release (ROCm#795)
  * Update changelog in preparation for the 6.3 release
  * Update changelog.
  * CHANGELOG for 6.3 FC
* Add helper methods to simplify writing and debugging tests (part 1) (ROCm#788)
  Currently, most input matrices used in rocSOLVER's tests are written elementwise (by hand) on a test-by-test basis. This commit adds foundational methods to simplify the creation of input matrices with desired properties, and also syntactic sugar to abstract the common matrix operations required to support this task. Moreover, it also adds code to allow writing test code at a higher level of abstraction, to improve its legibility and future maintenance. This PR focuses on utilities and symmetric matrix generation required to refactor SYEVX's tests.
  Future PRs will address:
  * immutable, special (Toeplitz, Wilkinson, Clement, etc.) matrix generation, glued matrices, and random matrices;
  * lazy-evaluation and caching of expensive computations;
  * a method to export matrices built as part of a test to an octave script (for debugging purposes).
* Bump rocm-docs-core from 1.6.1 to 1.7.1 in /docs/sphinx (ROCm#798)
  Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.1 to 1.7.1.
  - [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
  - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
  - [Commits](ROCm/rocm-docs-core@v1.6.1...v1.7.1)
  ---
  updated-dependencies:
  - dependency-name: rocm-docs-core
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* bump versions and prepare changelog for new dev cycle (ROCm#796)
  bump versions
* Add option to skip sparse on pull requests (ROCm#801)
  * Add option to skip sparse on pull requests
    The CI will build and test without rocSPARSE when the ci:no-sparse label is added to a pull request.
  * Add hipBLASLt to deps list
  * Add hipBLAS-common as a hipBLASLt dependency
* Fix some links to the rocBLAS memory alloc API (ROCm#803)
  * Fix some links to the rocBLAS memory alloc API
  * Wording changes
* Add support for Azure Linux (ROCm#785)
  CBL-Mariner has changed its name to Azure Linux. This patch fixes the following error with rocsolver packages built on Azure Linux:
    # dnf install ./rocsolver-tests-3.27.0-6fe60164_dirty.azl3.x86_64.rpm
    Last metadata expiration check: 0:44:09 ago on Thu Aug 22 22:24:19 2024.
    Error:
     Problem: conflicting requests
      - nothing provides libgfortran4 needed by rocsolver-tests-3.27.0-6fe60164_dirty.azl3.x86_64 from @commandline
* Add gfx1151 to default targets (ROCm#804)
* Fixed a broken link to the rocBLAS logging page.
* Bump rocm-docs-core from 1.7.1 to 1.7.2 in /docs/sphinx (ROCm#800)
  Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.7.1 to 1.7.2.
  - [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
  - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.7.2/CHANGELOG.md)
  - [Commits](ROCm/rocm-docs-core@v1.7.1...v1.7.2)
  ---
  updated-dependencies:
  - dependency-name: rocm-docs-core
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump cryptography from 43.0.0 to 43.0.1 in /docs/sphinx (ROCm#820)
  Bumps [cryptography](https://github.com/pyca/cryptography) from 43.0.0 to 43.0.1.
  - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
  - [Commits](pyca/cryptography@43.0.0...43.0.1)
  ---
  updated-dependencies:
  - dependency-name: cryptography
    dependency-type: indirect
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Prevent bdsqr_estimate from rounding t to zero (ROCm#819)
  In bdsqr_estimate, the tolerance used to decide that elements of E have converged is modified by the value t, which changes as the loop traverses elements of D and E. Because of the order of operations used to calculate it, the updated t may be rounded down to zero in some cases, which brings the tolerance down to zero and prevents elements of E from being considered to have converged.
  * Prevent bdsqr_estimate from rounding t to zero
  * Addressed review comments
  * Addressed review comment
* Bump rocm-docs-core from 1.7.2 to 1.8.2 in /docs/sphinx (ROCm#824)
  Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.7.2 to 1.8.2.
  - [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
  - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.2/CHANGELOG.md)
  - [Commits](ROCm/rocm-docs-core@v1.7.2...v1.8.2)
  ---
  updated-dependencies:
  - dependency-name: rocm-docs-core
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add option to disable rocBLAS in rocsolver_gemm/_trsm (ROCm#815)
  * add flag to disable rocblas gemm and trsm in rocsolver_blas
  * Add mixed batched and strided rocsolver_gemm
  * fix rocsolver_gemm undefined reference error
  * update macro and option names
  * formatting
  * change flag name
  * remove rocsolver_gemm bool template params
  * add conj transpose to rocsolver_gemm
* Fix libfmt build errors (ROCm#828)
  * Fix build errors with libfmt 10.2.1
  * Use `fmt::print` instead of `std::cout`
* Convert change log to new format (ROCm#833)
  * Convert change log to new format
  * Add support for new architectures
  * Remove extra line break
  * Align earlier releases
  * Align releases
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Steve Leung <Steve.Leung@amd.com>
Co-authored-by: Julio Machado Silva <161654951+jmachado-amd@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Lauren Wrubleski <Liam.Wrubleski@amd.com>
Co-authored-by: Cory Bloor <Cordell.Bloor@amd.com>
Co-authored-by: Juan Zuniga-Anaya <50754207+jzuniga-amd@users.noreply.github.com>
Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
Co-authored-by: Jonah Quist <jonquist@amd.com>
Co-authored-by: amd-jmacaran <Joseph.Macaranas@amd.com>
Co-authored-by: Joseph Macaranas <145489236+amd-jmacaran@users.noreply.github.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: amd-garydeng <garydeng@amd.com>
Co-authored-by: Sandra Polifroni <sandra.polifroni@amd.com>
Co-authored-by: Angelo Gonzales <angonzal@amd.com>
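The `std::min`/`std::max` change (ROCm#760) above rests on two C++ behaviours that can be checked on the host. The sketch below is an illustration only, not rocSOLVER code: `min_int32` is a hypothetical stand-in for an `int`-only overload like the HIP integer `min` the commit describes, and `naive_max` mimics the common `(a < b) ? b : a` form of a max function. It shows that a 64-bit value passed to a 32-bit-only overload is silently narrowed while `std::min` on two `int64_t` arguments is not, and that `std::fmax` returns the non-NaN argument regardless of order, whereas a plain comparison gives an order-dependent result.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for an int-only min overload (not a real HIP API):
// 64-bit arguments are implicitly narrowed to int at the call site.
int min_int32(int a, int b) { return a < b ? a : b; }

// Hand-written max in the common "(a < b) ? b : a" form. Any comparison
// involving NaN is false, so the result depends on argument order.
double naive_max(double a, double b) { return (a < b) ? b : a; }

// std::min deduces std::int64_t for both arguments, so nothing is narrowed.
std::int64_t min_int64(std::int64_t a, std::int64_t b) { return std::min(a, b); }

// std::fmax returns the non-NaN argument when exactly one input is NaN.
double safe_max(double a, double b) { return std::fmax(a, b); }
```

`std::max` itself is deliberately not called with a NaN here, since the commit message treats that case as undefined behaviour.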
Reverts #759 due to failures in SYGVDX on gfx1101.
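For context on what the reverted kernels compute: in LAPACK, `larfg` generates an elementary (Householder) reflector H = I - tau * u * u^T with u = [1; v], chosen so that H * [alpha; x] = [beta; 0], and `larf` applies such a reflector to a matrix. The sketch below follows the reference LAPACK definitions for the real double-precision case on the CPU. It is not rocSOLVER's GPU implementation, the function names `larfg_sketch` and `larf_left_sketch` are hypothetical, and it omits the rescaling loop reference LAPACK performs when beta is tiny.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical CPU sketch of LAPACK dlarfg semantics (real case).
// Given alpha and x, returns tau and overwrites x with v so that
// H * [alpha; x] = [beta; 0], where H = I - tau * u * u^T, u = [1; v].
// On return, alpha holds beta. No rescaling for tiny beta is done here.
double larfg_sketch(double& alpha, std::vector<double>& x) {
    double xnorm = 0.0;
    for (double xi : x) xnorm = std::hypot(xnorm, xi);  // overflow-safe 2-norm
    if (xnorm == 0.0) return 0.0;                        // H = I, tau = 0
    // beta takes the opposite sign of alpha to avoid cancellation in alpha - beta.
    double beta = -std::copysign(std::hypot(alpha, xnorm), alpha);
    double tau = (beta - alpha) / beta;
    double scale = 1.0 / (alpha - beta);
    for (double& xi : x) xi *= scale;
    alpha = beta;
    return tau;
}

// Hypothetical CPU sketch of LAPACK dlarf with side = 'L':
// C := H * C = C - tau * u * (u^T * C) for an m-by-n column-major C
// with leading dimension m, where u has length m and u[0] == 1.
void larf_left_sketch(int m, int n, const std::vector<double>& u, double tau,
                      std::vector<double>& C) {
    assert(static_cast<int>(u.size()) == m);
    assert(static_cast<int>(C.size()) == m * n);
    if (tau == 0.0) return;  // H = I leaves C unchanged
    for (int j = 0; j < n; ++j) {
        double w = 0.0;  // w = u^T * C(:, j)
        for (int i = 0; i < m; ++i) w += u[i] * C[i + j * m];
        for (int i = 0; i < m; ++i) C[i + j * m] -= tau * u[i] * w;
    }
}
```

For example, with alpha = 3 and x = {4}, the sketch gives beta = -5, tau = 1.6 and v = {0.5}, and applying the reflector to the column [3; 4] yields [-5; 0] up to rounding.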