Revert "Added small-size larfg and larf kernels" #769

Merged
merged 1 commit into from
Jul 24, 2024

cgmb (Collaborator) commented Jul 23, 2024

Reverts #759 due to failures in SYGVDX on gfx1101.

@cgmb enabled auto-merge (squash) July 24, 2024 00:47
@cgmb disabled auto-merge July 24, 2024 00:48
@cgmb enabled auto-merge (squash) July 24, 2024 00:51
@cgmb merged commit 057cfa7 into develop Jul 24, 2024
12 of 16 checks passed
EdDAzevedo pushed a commit to EdDAzevedo/rocSOLVER that referenced this pull request Oct 26, 2024
* Merge release-staging/rocm-rel-6.2 into develop (ROCm#750)

cmake: add C as a project language, as clients use it (ROCm#736)

Co-authored-by: Steve Leung <Steve.Leung@amd.com>

* Update changelog to mention the `stebz` synchronization defect (ROCm#752)

* Update changelog to mention the `stebz` synchronization defect

* Address review comments

* Minor edits to changelog

* Make line wrapping consistent

* Make sure containers used in clients are zero-initialized (ROCm#746)

* Make sure containers used in clients are zero-initialized

* Apply clang-format, fix bug

* Minimal changes

* Apply review comments

* Bump rocm-docs-core from 1.4.0 to 1.4.1 in /docs/sphinx (ROCm#754)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.4.0 to 1.4.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v1.4.0...v1.4.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix memory allocation issues in Jacobi solvers (ROCm#755)

The Jacobi solvers were trying to allocate more memory than they
needed, and this led to allocation failures with clients that manage
their own memory (hipSOLVER in particular).

* Bump rocm-docs-core from 1.4.1 to 1.5.0 in /docs/sphinx (ROCm#757)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.4.1 to 1.5.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v1.4.1...v1.5.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump certifi from 2023.7.22 to 2024.7.4 in /docs/sphinx (ROCm#758)

Bumps [certifi](https://github.com/certifi/python-certifi) from 2023.7.22 to 2024.7.4.
- [Commits](certifi/python-certifi@2023.07.22...2024.07.04)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update static rocprim dependency with correct name (ROCm#745)

* Use std::min and std::max (ROCm#760)

* Use std::min and std::max

The clang compiler provides integer min and max functions when building
for the HIP language, which will implicitly convert 64-bit integers to
32-bit. To avoid this, we will always use std::min and std::max as they
do not convert their arguments.

* Use fmax for floating point values

These values may be affected by user input and it is undefined
behaviour to use std::max with NaN values. Use fmax to ensure
there is defined behaviour for this case.
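
The two points above can be illustrated with a small sketch. The helper `min_i32` is hypothetical: it only stands in for an integer `min` that takes 32-bit arguments, and it is not the function clang provides for HIP. The other names (`narrowed_min`, `safe_min`, `safe_max`) are also made up for this illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for an integer min that only accepts 32-bit values.
inline int min_i32(int a, int b)
{
    return a < b ? a : b;
}

// The 64-bit arguments are implicitly converted to int at the call, so
// values that do not fit in 32 bits are changed before the comparison.
inline std::int64_t narrowed_min(std::int64_t a, std::int64_t b)
{
    return min_i32(a, b);
}

// std::min is a template on a single type, so both arguments stay 64-bit.
inline std::int64_t safe_min(std::int64_t a, std::int64_t b)
{
    return std::min(a, b);
}

// std::fmax returns the other argument when exactly one argument is NaN,
// so the result is well defined for NaN input.
inline double safe_max(double a, double b)
{
    return std::fmax(a, b);
}
```

For example, with `a = 2^32 + 5` and `b = 10`, `safe_min` returns 10, while `narrowed_min` compares a truncated copy of `a` and therefore returns a different value. `safe_max(NAN, 1.0)` returns 1.0.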

* add gfx12 support (ROCm#751)

* Updates to types and index (ROCm#761)

* Fix jacobi cycle-sync (ROCm#762)

* Add quick fix for Jacobi cycle-pairs method

* Tidy-up code

* Apply clang-format

* Restore old method in a desperate attempt to make git produce useful diffs

Of course, the old `syevj_cycle_pairs` kernel will be removed when this
PR is finished.

* Apply review comments.

* Add small fix and tidy-up code

* Add 64-bit versions of lacgv, larf, and larfg (ROCm#747)

* Added 64-bit larf APIs

* Testing for 64-bit larf API

* Updated documentation

* Add larfg_64

* Add lacgv_64

* add lacgv_64

* simplify template parameters

* modify test coverage, remove unused code

---------

Co-authored-by: Troy Alderson <58866654+tfalders@users.noreply.github.com>

* External CI: Add triggers for mainline branch

* Added small-size larfg and larf kernels (ROCm#759)

* Empty small-size kernel for larfg

* Implemented small-size larfg kernel

* Minor improvement to geqr2

* Empty small-size kernel for larf

* Basic implementation of small-size kernel for larf

* Improved implementation of small-size kernel for larf

* Fixed workspace issues

* Increased size constants

* Updated changelog

* Fix failures in latrd and sytd2

* Revert "Added small-size larfg and larf kernels" (ROCm#769)

Revert "Added small-size larfg and larf kernels (ROCm#759)"

Reverting due to failures in single-precision SYGVDX on gfx1101 when
GetParam() = ({ 50, 50, 60, 70, -15, -5, 28, 35, 1 }, { 2, V, I, U }).

This reverts commit 5156c3a.

* Drop use of f16c instructions (ROCm#768)

The -mf16c flag is troublesome as those instructions may not be
available on older CPUs. The clang compiler also seems to emit
AVX instructions when it is told that it can use F16C.
The flag can be dropped without consequence as rocSOLVER does not use
half precision.

This was done for rocBLAS in c6bc09073959a2881a701b88ae1ed9de469354f1.

* Load rocsparse using dlopen (ROCm#634)

When BUILD_WITH_SPARSE=OFF, the rocSOLVER library will attempt to load
the rocsparse shared library at runtime. The compile-time behaviour of
returning rocblas_status_not_implemented is retained at run-time when
rocsparse is not available. The default value of BUILD_WITH_SPARSE for
shared library builds is now OFF.
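
A minimal sketch of this pattern on a POSIX system, using only `dlopen`, `dlsym` and `dlclose`. The names `sketch_status`, `optional_library` and `call_optional_feature` are hypothetical and are not taken from rocSOLVER; the enum merely stands in for a status such as `rocblas_status_not_implemented`.

```cpp
#include <dlfcn.h>

// Hypothetical status type for this sketch.
enum class sketch_status
{
    success,
    not_implemented
};

// Owns a handle to a shared library that may or may not be installed.
class optional_library
{
public:
    explicit optional_library(const char* name)
        : handle_(dlopen(name, RTLD_NOW | RTLD_LOCAL))
    {
    }
    ~optional_library()
    {
        if(handle_)
            dlclose(handle_);
    }
    optional_library(const optional_library&) = delete;
    optional_library& operator=(const optional_library&) = delete;

    bool available() const
    {
        return handle_ != nullptr;
    }

    // Returns nullptr when the library or the symbol is missing.
    void* symbol(const char* name) const
    {
        return handle_ ? dlsym(handle_, name) : nullptr;
    }

private:
    void* handle_;
};

// Reports not_implemented instead of failing when the dependency is absent.
inline sketch_status call_optional_feature(const optional_library& lib, const char* symbol_name)
{
    void* fn = lib.symbol(symbol_name);
    if(fn == nullptr)
        return sketch_status::not_implemented;
    // Real code would cast fn to the proper function-pointer type and call it.
    return sketch_status::success;
}
```

On older glibc versions a program using these calls may need to link with `-ldl`.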

* Use rocBLAS gemm_64 (ROCm#765)

* Use rocBLAS gemm_64

* Addressed review comment

* Update documentation requirements (ROCm#773)

* Add 64-bit version of geqrf, geqr2 (ROCm#767)

* add 64-bit geqrf

* update changelog and docs

* make new 64-bit APIs optional

* address feedback, fix exports

* fix docs

* Fix synchronization issue in stein (ROCm#775)

* Fix synchronization issue in stein

* Updated changelog

* Fix stein initial eigenvectors' choices (ROCm#714)

Improve choice of starting eigenvectors in STEIN

This commit introduces a pseudorandom number generator that is
subsequently used to generate more robust initial eigenvectors in
`roclapack_stein`.
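
The commit message does not describe the generator itself. As a rough illustration of the idea only, the sketch below fills a starting vector with values in [-1, 1) from a small linear congruential generator. The generator, its constants, the seeding scheme and the names `sketch_lcg` and `initial_vector` are all assumptions made for this example, not the code used in `roclapack_stein`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 32-bit linear congruential generator (illustration only).
struct sketch_lcg
{
    std::uint32_t state;

    explicit sketch_lcg(std::uint32_t seed)
        : state(seed)
    {
    }

    // Advances the state and maps it to a double in [-1, 1).
    double next()
    {
        state = 1664525u * state + 1013904223u; // wraps modulo 2^32
        return 2.0 * (state / 4294967296.0) - 1.0;
    }
};

// Builds a reproducible pseudorandom starting vector of length n.
inline std::vector<double> initial_vector(std::size_t n, std::uint32_t seed)
{
    sketch_lcg rng(seed);
    std::vector<double> v(n);
    for(double& x : v)
        x = rng.next();
    return v;
}
```

Seeding the generator explicitly keeps the starting vector reproducible between runs, which is convenient when debugging convergence problems.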

* Fix dead link to .pdf that does not exist

* Add 64-bit api for potf2, potrf, potrs (ROCm#776)

* add 64-bit geqrf

* update changelog and docs

* make new 64-bit APIs optional

* first pass

* address feedback, fix exports

* fix docs

* add potrf_64, potf2_64

* add potrs_64

* update docs and changelog

* fix info type in potf2 impl

* Added potrf_info32

* Updated test

* Add 32-bit info variant for compatibility

---------

Co-authored-by: Troy Alderson <58866654+tfalders@users.noreply.github.com>

* Multikernel option for bdsqr optimization (ROCm#717)

* Rearranged functions

* Added completed array to bdsqr

* Work iteration by iteration

* Extract bdsqr_rotate from QR step

* Rotation improvements

* Updated documentation

* Typo correction

* Restored small-size bdsqr kernel

* Added inner loop

* Combined logic for QR step

* Combined logic for rotations

* Use sqrt(n) split groups

* Updated changelog

* Match number of thread groups to number of split groups

* Spawn new split blocks when a zero is detected

* Simplify splits array

* Restore small-size computation

* Use iamax to find smax

* Updated documentation

* Only check for zeroes when QR step not applied

* Increase BDSQR_ITERS_PER_SYNC to 10

* Fix incorrect strideW calculation

* Restore original rotation data layout

* Addressed review comment

* Added small-size larfg and larf kernels (limited architectures) (ROCm#774)

* Empty small-size kernel for larfg

* Implemented small-size larfg kernel

* Minor improvement to geqr2

* Empty small-size kernel for larf

* Basic implementation of small-size kernel for larf

* Improved implementation of small-size kernel for larf

* Fixed workspace issues

* Increased size constants

* Updated changelog

* Fix failures in latrd and sytd2

* Workaround: disable small-size kernels for wave 32 architectures

* Addressed review comment

* Revert "Addressed review comment"

This reverts commit a08550f.

* Revert "Workaround: disable small-size kernels for wave 32 architectures"

This reverts commit bcddae6.

* Workaround: disable small-size kernels for wave 32 architectures

* Fix memory access fault

* Fix compile error

* Minor fix

* Fix link to rocBLAS API

* STEDC optimizations (ROCm#786)

* optimize mergePrepare kernel to work with higher number of sub-blocks
* optimize mergeValues kernel to work with higher number of sub-blocks
* clean code
* addressed review comments
* update changelog
* remove unused arguments

* Revert "Update changelog to mention the `stebz` synchronization defect (ROCm#752)" (ROCm#794)

This reverts commit 1eb0f16.

* Update changelog in preparation for the 6.3 release (ROCm#795)

* Update changelog in preparation for the 6.3 release
* Update changelog.

* CHANGELOG for 6.3 FC

* Add helper methods to simplify writing and debugging tests (part 1) (ROCm#788)

Currently, most input matrices used in rocSOLVER's tests are written elementwise (by hand) on a test-by-test basis. This commit adds foundational methods to simplify the creation of input matrices with desired properties, along with syntactic sugar that abstracts the common matrix operations required to support this task. It also adds code that allows tests to be written at a higher level of abstraction, which improves their legibility and eases future maintenance.

This PR focuses on the utilities and symmetric matrix generation required to refactor SYEVX's tests. Future PRs will address:
* immutable, special (Toeplitz, Wilkinson, Clement, etc.) matrix generation, glued matrices, and random matrices;
* lazy-evaluation and caching of expensive computations;
* a method to export matrices built as part of a test to an Octave script (for debugging purposes).

* Bump rocm-docs-core from 1.6.1 to 1.7.1 in /docs/sphinx (ROCm#798)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.1 to 1.7.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v1.6.1...v1.7.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* bump versions and prepare changelog for new dev cycle (ROCm#796)

bump versions

* Add option to skip sparse on pull requests (ROCm#801)

* Add option to skip sparse on pull requests

The CI will build and test without rocSPARSE when the ci:no-sparse
label is added to a pull request.

* Add hipBLASLt to deps list

* Add hipBLAS-common as a hipBLASLt dependency

* Fix some links to the rocBLAS memory alloc API (ROCm#803)

* Fix some links to the rocBLAS memory alloc API

* Wording changes

* Add support for Azure Linux (ROCm#785)

CBL-Mariner has changed its name to Azure Linux. This patch fixes the
following error with rocsolver packages built on Azure Linux:

# dnf install ./rocsolver-tests-3.27.0-6fe60164_dirty.azl3.x86_64.rpm
Last metadata expiration check: 0:44:09 ago on Thu Aug 22 22:24:19 2024.
Error: 
 Problem: conflicting requests
  - nothing provides libgfortran4 needed by rocsolver-tests-3.27.0-6fe60164_dirty.azl3.x86_64 from @commandline

* Add gfx1151 to default targets (ROCm#804)

* Fixed a broken link to the rocBLAS logging page.

* Bump rocm-docs-core from 1.7.1 to 1.7.2 in /docs/sphinx (ROCm#800)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.7.1 to 1.7.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.7.2/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v1.7.1...v1.7.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump cryptography from 43.0.0 to 43.0.1 in /docs/sphinx (ROCm#820)

Bumps [cryptography](https://github.com/pyca/cryptography) from 43.0.0 to 43.0.1.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](pyca/cryptography@43.0.0...43.0.1)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Prevent bdsqr_estimate from rounding t to zero (ROCm#819)

In bdsqr_estimate, the tolerance used to decide that elements of E have converged is modified by the value t, which changes as the loop traverses elements of D and E. Because of the order of operations used to calculate it, the updated t may be rounded down to zero in some cases, which brings the tolerance down to zero and prevents elements of E from being considered to have converged.

* Prevent bdsqr_estimate from rounding t to zero

* Addressed review comments

* Addressed review comment
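
The commit message does not show the expression used to update t, so the following is only a generic illustration of how the order of operations can round a result to zero in single precision. The functions `update_product_first` and `update_ratio_first` and their arguments are hypothetical; they compute the same quantity, d * t / (t + e), in two different orders.

```cpp
// Multiplies first: the intermediate product d * t can underflow to zero
// when both factors are tiny, even though the final result is representable.
inline float update_product_first(float d, float t, float e)
{
    float product = d * t;
    return product / (t + e);
}

// Divides first: for positive t and non-negative e the ratio t / (t + e)
// lies in (0, 1], so scaling d by it does not underflow prematurely.
inline float update_ratio_first(float d, float t, float e)
{
    float ratio = t / (t + e);
    return d * ratio;
}
```

With `d = t = e = 1e-30f`, the product `d * t` is about 1e-60, which is below the smallest positive single-precision value, so the first form returns zero. The second form returns about 5e-31.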

* Bump rocm-docs-core from 1.7.2 to 1.8.2 in /docs/sphinx (ROCm#824)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.7.2 to 1.8.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.2/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v1.7.2...v1.8.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add option to disable rocBLAS in rocsolver_gemm/_trsm (ROCm#815)

* add flag to disable rocblas gemm and trsm in rocsolver_blas

* Add mixed batched and strided rocsolver_gemm

* fix rocsolver_gemm undefined reference error

* update macro and option names

* formatting

* change flag name

* remove rocsolver_gemm bool template params

* add conj transpose to rocsolver_gemm

* Fix libfmt build errors (ROCm#828)

* Fix build errors with libfmt 10.2.1

* Use `fmt::print` instead of `std::cout`

* Convert change log to new format (ROCm#833)

* Convert change log to new format

* Add support for new architectures

* Remove extra line break

* Align earlier releases

* Align releases

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Steve Leung <Steve.Leung@amd.com>
Co-authored-by: Julio Machado Silva <161654951+jmachado-amd@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Lauren Wrubleski <Liam.Wrubleski@amd.com>
Co-authored-by: Cory Bloor <Cordell.Bloor@amd.com>
Co-authored-by: Juan Zuniga-Anaya <50754207+jzuniga-amd@users.noreply.github.com>
Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
Co-authored-by: Jonah Quist <jonquist@amd.com>
Co-authored-by: amd-jmacaran <Joseph.Macaranas@amd.com>
Co-authored-by: Joseph Macaranas <145489236+amd-jmacaran@users.noreply.github.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: amd-garydeng <garydeng@amd.com>
Co-authored-by: Sandra Polifroni <sandra.polifroni@amd.com>
Co-authored-by: Angelo Gonzales <angonzal@amd.com>
3 participants