run ends with cudaMemcpy() error #33

davidbenncsiro · 2021-11-17T03:27:46Z

Hi @CyprienBosserelle, as per today's chat, please find attached the params in question with nx and ny changed as per discussion and flow set to 1.

davidbenncsiro · 2021-11-17T05:46:21Z

params.zip

davidbenncsiro · 2021-11-19T06:09:55Z

Debugging eventually revealed the cudaMemcpy() point of failure in the code for the cases where flow=1 or swave=1. In the case where both are set to 1, the “Model crashed” exit still occurs when dt goes to zero, but for just flow=1 or swave=1, the "offending" cudaMemcpy() is on line 852 in Wave_gpu.cu:

CUDA_CHECK(cudaMemcpy(OutputVarMapCPU[Param.outvars[ivar]], 
                      OutputVarMapGPU[Param.outvars[ivar]],
                      OutputVarMaplen[Param.outvars[ivar]] * sizeof(DECNUM), …

in a loop relating to output variables.

For one of the variables being output (E), OutputVarMapGPU[Param.outvars[ivar]] is zero (a null pointer).

I added a conditional check for this value (OutputVarMapGPU[Param.outvars[ivar]]) being zero, in which case I skip the cudaMemcpy(), allowing all other variables to be output and the simulation to complete.
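The workaround described above can be sketched roughly as follows. This is a host-only illustration, not the actual change made in Wave_gpu.cu: `copyOutputs` is a hypothetical helper, `DECNUM` is assumed to be `float`, the three maps stand in for `OutputVarMapGPU`, `OutputVarMapCPU` and `OutputVarMaplen`, and `std::memcpy` stands in for `cudaMemcpy` so the sketch runs without a GPU.

```cpp
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <vector>

using DECNUM = float; // assumption: the real code may define DECNUM differently

// Hypothetical helper: copies every requested output variable whose source
// and destination buffers are non-null, and returns the names it skipped.
std::vector<std::string> copyOutputs(
    const std::vector<std::string>& outvars,
    const std::map<std::string, DECNUM*>& srcMap,      // stands in for OutputVarMapGPU
    const std::map<std::string, DECNUM*>& dstMap,      // stands in for OutputVarMapCPU
    const std::map<std::string, std::size_t>& lenMap)  // stands in for OutputVarMaplen
{
    std::vector<std::string> skipped;
    for (const std::string& name : outvars) {
        auto srcIt = srcMap.find(name);
        auto dstIt = dstMap.find(name);
        auto lenIt = lenMap.find(name);
        const bool missing = srcIt == srcMap.end() || dstIt == dstMap.end() ||
                             lenIt == lenMap.end();
        if (missing || srcIt->second == nullptr || dstIt->second == nullptr) {
            // Skip this variable instead of letting the copy fail,
            // so the remaining variables are still written out.
            skipped.push_back(name);
            continue;
        }
        // In the GPU code this would be a cudaMemcpy(..., cudaMemcpyDeviceToHost).
        std::memcpy(dstIt->second, srcIt->second, lenIt->second * sizeof(DECNUM));
    }
    return skipped;
}
```

A null check like this only catches pointers that are actually zero. A pointer that was freed but never reset would still pass the check, so it is a stopgap rather than a fix for the allocation lifetime of E.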

A run with flow=1 was quick to complete but the output didn't look great. I'm running with swave=1 and it seems more reasonable.

Even for a partial run, zb looks the same as XBeach, except for colours.

Will keep you posted on output results and further debug re: E.

We can talk more details next week as well if you like.

CyprienBosserelle · 2021-11-30T05:00:23Z

Hum... the E variable is a bit of a special one because it is allocated and freed in the wave step. So it is not a valid pointer if the wave loop is not running, and it may be a ghost (dangling) pointer if it works after the wave step. It's a bit of an ancillary output and I'm not sure I ever output it (H = E*8/(rho*g), so I output H).

I might need to remove it from the output list or allocate it directly once and for all. The per-step allocation was done when GPUs had 32 MB of RAM, but now memory is cheap and allocating once would remove the overhead of reallocating it every step...
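As a rough illustration of the "allocate once" idea, here is a host-only sketch. It does not reflect the code in any branch of the repository: `WaveBuffers` and `waveStep` are hypothetical names, `DECNUM` is assumed to be `float`, and a `std::vector` stands in for a device buffer that would otherwise be obtained with `cudaMalloc` and released with `cudaFree`.

```cpp
#include <cstddef>
#include <vector>

using DECNUM = float; // assumption: the real code may define DECNUM differently

// Hypothetical container: the E buffer is allocated once, at construction,
// and lives for the whole run instead of being allocated and freed per step.
struct WaveBuffers {
    std::vector<DECNUM> E;
    explicit WaveBuffers(std::size_t ncells) : E(ncells, DECNUM(0)) {}
};

// Hypothetical wave step: reuses the existing buffer, so any pointer to it
// taken before the step is still valid afterwards.
void waveStep(WaveBuffers& buf, DECNUM value) {
    for (DECNUM& e : buf.E) {
        e = value;
    }
}
```

With this layout the output loop can read E at any time, whether or not the wave step has run, at the cost of holding the memory for the whole simulation.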

CyprienBosserelle · 2021-11-30T05:27:06Z

I have made a new branch called CheapMem where I moved all the mem allocation to the main function and ran a quick test that seems to work.

davidbenncsiro · 2021-11-30T05:28:42Z

Thanks Cyp. Will try this out.

davidbenncsiro · 2021-12-02T06:33:05Z

@CyprienBosserelle should I switch to the CheapMem branch yet? I seem to recall you saying on Wed that you were not convinced it had fixed the E problem.
