
The Fill function for the CUDA matrix is inefficient #624

Closed
sjsprecious opened this issue Aug 19, 2024 · 0 comments · Fixed by #626
@sjsprecious (Collaborator)

Currently the Fill function for the CUDA matrix is implemented by filling a host matrix first and then copying it to the device (https://github.com/NCAR/micm/blob/main/include/micm/cuda/util/cuda_dense_matrix.hpp#L232-L233).

A better implementation would use either the cudaMemset function or a custom CUDA kernel to fill the matrix on the device directly.
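
For illustration, a minimal sketch of the kernel-based approach is shown below. The names FillKernel and FillOnDevice are hypothetical and not part of micm; note that cudaMemset only sets individual bytes, so a kernel like this (or an equivalent library call) is needed for arbitrary fill values.

```cpp
#include <cstddef>
#include <cuda_runtime.h>

// Grid-stride loop so a single launch covers vectors of any length.
template <typename T>
__global__ void FillKernel(T* data, std::size_t n, T value)
{
  for (std::size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += static_cast<std::size_t>(gridDim.x) * blockDim.x)
  {
    data[i] = value;
  }
}

// Hypothetical host-side wrapper: launches the kernel on the device pointer
// already owned by the CUDA matrix, so no host allocation or copy is needed.
template <typename T>
void FillOnDevice(T* d_data, std::size_t n, T value)
{
  constexpr int block_size = 256;
  const int grid_size = static_cast<int>((n + block_size - 1) / block_size);
  FillKernel<<<grid_size, block_size>>>(d_data, n, value);
  cudaDeviceSynchronize();
}
```

With something along these lines, Fill could operate on the matrix's device pointer directly, which would satisfy the "no data transfer" criterion below.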

Acceptance Criteria

  • Pass all the CUDA unit tests
  • No more data transfer for this function
@sjsprecious sjsprecious self-assigned this Aug 19, 2024
@sjsprecious sjsprecious added the enhancement New feature or request label Aug 19, 2024
@sjsprecious sjsprecious added this to the CUDA Rosenbrock Solver milestone Aug 19, 2024