Test/perambulator gpu #83

felixerben · 2022-04-11T16:06:04Z

Speeds up the perambulator code; the gain is particularly drastic for setups with short solve times and many eigenvectors. In that case the dominating cost is ExtractSliceLocal (applied to the 3D eigenvectors), which is called deep inside the nested loops.

This pull request copies the extracted eigenvector slices into a std::vector before the main loops start. At the expense of this extra memory, this leads to a substantial speed-up (a factor of 5 for a particular setup that Fabian Joswig was testing).
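
To illustrate the idea outside of Grid/Hadrons, here is a minimal, self-contained sketch of hoisting a per-slice extraction out of the hot loops. It is not the code from this PR: `Field`, `extractSlice`, `preExtractSlices`, `dot` and `project` are hypothetical stand-ins, plain `double` replaces the lattice types, and `extractSlice` only plays the role of ExtractSliceLocal.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one field stored on an nt x vol3 lattice.
struct Field {
    std::size_t nt;            // number of time slices
    std::size_t vol3;          // sites per 3D time slice
    std::vector<double> data;  // layout: data[t * vol3 + x]
};

// Stand-in for the costly per-slice extraction (ExtractSliceLocal in the PR).
std::vector<double> extractSlice(const Field &f, std::size_t t) {
    assert(t < f.nt);
    auto first = f.data.begin() + static_cast<std::ptrdiff_t>(t * f.vol3);
    return std::vector<double>(first, first + static_cast<std::ptrdiff_t>(f.vol3));
}

// Extract every (eigenvector, time slice) pair once, before the hot loops.
// Slice (k, t) is stored at index k * nt + t.
std::vector<std::vector<double>> preExtractSlices(const std::vector<Field> &evecs) {
    std::vector<std::vector<double>> cache;
    if (evecs.empty()) return cache;
    const std::size_t nt = evecs.front().nt;
    cache.reserve(evecs.size() * nt);
    for (const Field &f : evecs) {
        assert(f.nt == nt);
        for (std::size_t t = 0; t < nt; ++t) cache.push_back(extractSlice(f, t));
    }
    return cache;
}

// Plain inner product of two slices of equal length.
double dot(const std::vector<double> &a, const std::vector<double> &b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

// Hot loops: project every solve onto every eigenvector slice.
// Eigenvector slices are read from the cache instead of being re-extracted
// for each (solve, t, k) combination. Output order is (solve, t, k).
std::vector<double> project(const std::vector<std::vector<double>> &cache,
                            std::size_t nVec, const std::vector<Field> &solves) {
    std::vector<double> out;
    for (const Field &s : solves) {
        assert(cache.size() == nVec * s.nt);
        for (std::size_t t = 0; t < s.nt; ++t) {
            const std::vector<double> solveSlice = extractSlice(s, t);
            for (std::size_t k = 0; k < nVec; ++k)
                out.push_back(dot(cache[k * s.nt + t], solveSlice));
        }
    }
    return out;
}
```

In this sketch the cache holds one slice per (eigenvector, time slice) pair, which is the extra memory traded for removing the extraction from the innermost loop; the values read from the cache are the same ones a direct extraction would return.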

The code compiles, runs, and has been tested: it produces bit-identical results to the old code on both GPU and CPU. Fabian sees the factor-of-5 speed-up when running on Tursa.

Felix Erben added 6 commits April 4, 2022 15:35

test: pre-extract evec slices

1762ff2

get type from object directly

3c8e8ca

grid from object

cf8451f

remove tmp from environment

ac9ad40

all handled by environment now

a733a89

final cleanup

fe45166

felixerben requested a review from aportelli as a code owner April 11, 2022 16:06

aportelli merged commit 3f6959f into aportelli:develop Apr 12, 2022
