Test/perambulator gpu #83

Merged
merged 6 commits into from Apr 12, 2022

felixerben (Contributor)

This PR speeds up the perambulator code. The gain is most dramatic for setups with short solve times and many eigenvectors. In that regime the dominant cost is the ExtractSliceLocal calls on the 3D eigenvectors, which sit deep inside the nested loops.

This pull request copies the eigenvector data into a std::vector before the main loops start, so the slices no longer have to be extracted inside them. This costs extra memory but gives a substantial speed-up (a factor of 5 for one particular setup that Fabian Joswig was testing).
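A minimal, self-contained sketch of the caching idea, written in plain C++ rather than Grid/Hadrons. `Field4D`, `extractSlice`, `work`, `naive` and `cached` are hypothetical stand-ins and not the real API. `extractSlice` only plays the role that ExtractSliceLocal plays in the PR, and the loop order is an assumption. The sketch counts extraction calls to show where the saving comes from. It also shows that hoisting the extraction out of the loops leaves the accumulation order unchanged, so the result stays bit-identical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 4D field of eigenvector data (NOT Grid's Lattice):
// nt time slices, each holding volume3d values, stored slice by slice.
struct Field4D {
    std::size_t nt = 0;
    std::size_t volume3d = 0;
    std::vector<double> data;  // data[t * volume3d + x]
};

// Counts how often the (assumed expensive) slice extraction runs.
static std::size_t g_extract_calls = 0;

// Hypothetical analogue of ExtractSliceLocal: copy time slice t of f into out.
void extractSlice(std::vector<double>& out, const Field4D& f, std::size_t t) {
    ++g_extract_calls;
    const auto first = f.data.begin() + static_cast<std::ptrdiff_t>(t * f.volume3d);
    out.assign(first, first + static_cast<std::ptrdiff_t>(f.volume3d));
}

// Placeholder for the per-slice work done inside the loops (e.g. after a solve).
double work(const std::vector<double>& slice, std::size_t k) {
    double s = 0.0;
    for (double x : slice) s += x * static_cast<double>(k + 1);
    return s;
}

// Before: the slice is re-extracted in the innermost loop, nsolve * nt * nvec times.
// Assumes all eigenvectors share the same nt and volume3d.
double naive(const std::vector<Field4D>& evecs, std::size_t nsolve) {
    assert(!evecs.empty());
    const std::size_t nt = evecs[0].nt, nvec = evecs.size();
    double acc = 0.0;
    std::vector<double> slice;
    for (std::size_t k = 0; k < nsolve; ++k)
        for (std::size_t t = 0; t < nt; ++t)
            for (std::size_t v = 0; v < nvec; ++v) {
                extractSlice(slice, evecs[v], t);
                acc += work(slice, k);
            }
    return acc;
}

// After: every (time slice, eigenvector) pair is extracted once into a
// std::vector cache before the loops; the loops then only read the cache.
double cached(const std::vector<Field4D>& evecs, std::size_t nsolve) {
    assert(!evecs.empty());
    const std::size_t nt = evecs[0].nt, nvec = evecs.size();
    std::vector<std::vector<double>> cache(nt * nvec);  // the extra memory
    for (std::size_t t = 0; t < nt; ++t)
        for (std::size_t v = 0; v < nvec; ++v)
            extractSlice(cache[t * nvec + v], evecs[v], t);
    double acc = 0.0;
    // Same loop order and same per-slice values as naive(), so the sum is bit-identical.
    for (std::size_t k = 0; k < nsolve; ++k)
        for (std::size_t t = 0; t < nt; ++t)
            for (std::size_t v = 0; v < nvec; ++v)
                acc += work(cache[t * nvec + v], k);
    return acc;
}
```

With `nsolve` solves, `nt` time slices and `nvec` eigenvectors, the naive loop performs `nsolve * nt * nvec` extractions. The cached version performs only `nt * nvec`, at the price of keeping `nt * nvec` slices in memory. This trade-off favours setups where solves are cheap and eigenvectors are many.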

The code has been tested: it compiles, runs, and produces results bit-identical to the old version on both GPU and CPU. Fabian sees the factor-5 speed-up when running on Tursa.

@aportelli merged commit 3f6959f into aportelli:develop on Apr 12, 2022.