* PMIS MPI support (#100)
* mpi to hip backend
* MPI enablement for PMIS
* MPI RS example
* clang-format
* example
* fix
* global Ext+I interpolation (#112)
* unused function
* Added PM as argument for aggregation function
* added pm for each level to MG class
* Added PM as argument for aggregation function #2
* Added PM as argument for aggregation function #3
* RS Ext+I added to global matrix
* modified example to dump some data for validation - work in progress
* RS Ext+I function added to headers
* RS Ext+I HIP implementation
* RS Ext+I host implementation
* Improved PM
* global Ext+I kernel update
* some multinode improvements (#118)
* added some more useful guards to parallel manager
* added CopyFromHostData, CopyToHostData, ExclusiveSum and Sort functionality for vector class ; moved boundary information from vector to matrix ; added GetFormat() to GlobalMatrix class
* clang-format
* P and R should be OperatorType, not LocalMatrix
* clang-format
* check if PM is valid when required
* added extra function for triple matrix product for simplicity
* clang-format
* clang-format
* skip free when ptr is nullptr
* fix memory leak in BaseAMG class
* rocsparse_csrgeam added
* renamed csr ext+i kernel because it is generally usable
* allowing CSR zero matrices with row_offset != nullptr, as well as zero vectors with size == 0
* clang-format
* duplicated row column entries throw a warning
* copy_x2x() functions added for readability and simplicity
* search and replace memcpy with copy fct
* search and replace memcpy with copy fct #2 ; fix for random csr generator to not generate duplicated row col entries
* clang-format
* fixes
* those asserts are wrong
* major version bump
* OpenMP parallel loop threshold needs to be int64_t in order to work with larger structures
* Allow basic structures with 64bit entries, e.g. for global indices
* nnz should be 64bit ; also restructured RSAMG for readability
* vector sizes need to be 64bit locally - also added inclusive and exclusive sum functionality
* 64bit sizes for stencils
* host vector implementation changes for 64bit sizes and in/exclusive sum ; host I/O changed to always write 64bit sizes
* host stencil 64bit changes
* max residual index changed to 64bit accordingly
* solvers adjusted for 64bit nnz
* int64_t to double conversion
* allocation size should always be 64bit ; also added copy_h2h() for simplicity
* long and long long communication support added
* cleaned up types ; IndexType2 was a stupid name anyway
* removed deprecations (major release); enabled global structure support in RSAMG
* major changes to PM; added guards for transfers; removed deprecations; fixed int overflows; functionality to generate a PM from global ghost column ids, and a parent PM
* matrix conversions 64bit nnz support with guards
* host matrix I/O changed to always write 64bit sizes ; backward compatible
* host matrix implementations changed to 64bit nnz
* RSAMG restructured - global communication should not happen in local implementations ; switched to 64bit sizes
* host CSR matrix implementation
* hip implementation ; added copy_d2h/h2d/d2d for simplicity, with async flag
* adjusted unit tests for removed deprecated functions
* RSAMG MPI example updated
* fixed sanity assert
* doc update
* example should work with only 1 process
* global routines should work with a single process
* global transpose operator
* using copy_h2h()
* _rocalution_sync should force a global barrier, too
* improved asynchronous apply / comm / halo apply
* accelerator must be available for pinned alloc/free
* fixing a few compiler warnings
* readability
* removed the flood of printf on multi gpu systems
* adjusted openmp nested (deprecation) to v5.0
* weak scaling examples
* distributed laplacian generator
* updated rsamg example
* updated rsamg mpi example
* should use OperatorType, nothing else
* fixed RSDirectInterpolation(); fixed const PM issue
* updated unit tests
* types.hpp generated by cmake ; CSR(64/32) added on host ; moved RSPMIS communication into global matrix class
* removed old types.hpp
* initial implementation for unordered set and map on hip backend
* outsourced RSAMG to improve compilation performance; added async communication for multinode; moved multinode rspmis into globalmatrix; outsourced atomics
* clang-format
* fix for streams when not building for mpi
* SA amg merge fix
* fixed missing shared memory size
* clang-format
* clang-format
* typo
* clang format
* add blockdim to UAAMG benchmark
* adjusting unit tests for removed deprecated functions
* clang format
* test fix
* clang-format
* std::sort required algorithm header
* fixing merge error
* merge fix #2
* header cleaned up
* header cleaned up #2
* fix issue with HIP not being found
* free_pinned() does nothing on nullptr
* global triple matrix product
* proper error message when coarsening fails
* fixed a bug in global triplematrixproduct
* fixed a typo
* fixed compilation issue when HIP=off
* fixes COO and CSR conversions on both host and device, and ELL on host only (#211)
* empty matrix conversion fix
* host fallback fix for rsamg and triplematprod
* Fix documentation failures (#214)

Co-authored-by: jsandham <james.sandham@amd.com>

* Add Smoothed Aggregation to amgmpi branch (#213)
* Adding global aggregation to SAAMG (#166)

Co-authored-by: jsandham <james.sandham@amd.com>

* Add MPI support for global prolongation to SAAMG (#171)

Co-authored-by: jsandham <james.sandham@amd.com>

* Add MPI support for SAAMG global transpose (#172)
* Add MPI support for SAAMG global transpose
* Fix failures in greedy aggregation caused by unfilled aggregate_root_nodes array

---------

Co-authored-by: jsandham <james.sandham@amd.com>

* Add MPI unsmoothed aggregation (#174)

Co-authored-by: jsandham <james.sandham@amd.com>

* Adding debug printing to test triple product
* Adding debug print statements for testing
* Adding more debug printing
* Testing
* Testing
* Testing
* Testing
* Testing
* Testing
* Testing
* Testing
* Testing
* Fix floating point fault caused by division by zero
* Testing
* Testing
* Testing
* Testing
* Testing
* Testing
* Testing
* Fix failures in local matrix when max_nnz_per_row is too high
* Testing
* Fix bug where we were not using a large enough hash table size
* Fix discrepancy in host and hip assert in ExtractSubMatrix
* Fixing hangs in multinode hip backend
* Fix RSAMG documentation warnings
* Testing MPI uaamg
* Fix testing_local_matrix failure
* Remove comments and temporary testing code
* PR fixes
* PR fixes
* PR fixes
* Clang formatting

---------

Co-authored-by: jsandham <james.sandham@amd.com>

* removed unused variables
* Add back functions that cannot be removed until next major release (#216)

Co-authored-by: jsandham <james.sandham@amd.com>

* fix for very large sizes where local ext matrix exceeds int32
* Remove print statements from saamg testing file

---------

Co-authored-by: James Sandham <33790278+jsandham@users.noreply.github.com>
Co-authored-by: jsandham <james.sandham@amd.com>