
The Fill function for the CUDA matrix is inefficient #624

Closed
sjsprecious opened this issue Aug 19, 2024 · 0 comments · Fixed by #626
@sjsprecious (Collaborator)

Currently the Fill function for the CUDA matrix is implemented by filling a host matrix first and then copying it to the device (https://github.com/NCAR/micm/blob/main/include/micm/cuda/util/cuda_dense_matrix.hpp#L232-L233).

A better implementation would use either the cudaMemset function or a custom CUDA kernel to fill the matrix on the device directly.
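
For illustration, a minimal sketch of the kernel-based approach is shown below. The names FillKernel and FillOnDevice are hypothetical and not part of micm; note that cudaMemset only sets individual bytes, so a kernel like this (or an equivalent library call) is needed for arbitrary fill values.

```cpp
#include <cstddef>
#include <cuda_runtime.h>

// Grid-stride loop so a single launch covers vectors of any length.
template <typename T>
__global__ void FillKernel(T* data, std::size_t n, T value)
{
  for (std::size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += static_cast<std::size_t>(gridDim.x) * blockDim.x)
  {
    data[i] = value;
  }
}

// Hypothetical host-side wrapper: launches the kernel on the device pointer
// already owned by the CUDA matrix, so no host allocation or copy is needed.
template <typename T>
void FillOnDevice(T* d_data, std::size_t n, T value)
{
  constexpr int block_size = 256;
  const int grid_size = static_cast<int>((n + block_size - 1) / block_size);
  FillKernel<<<grid_size, block_size>>>(d_data, n, value);
  cudaDeviceSynchronize();
}
```

With something along these lines, Fill could operate on the matrix's device pointer directly, which would satisfy the "no data transfer" criterion below.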

Acceptance Criteria

  • Pass all the CUDA unit tests
  • No more data transfer for this function
@sjsprecious sjsprecious self-assigned this Aug 19, 2024
@sjsprecious sjsprecious added the enhancement New feature or request label Aug 19, 2024
@sjsprecious sjsprecious added this to the CUDA Rosenbrock Solver milestone Aug 19, 2024