run ends with cudaMemcpy() error #33

Open
davidbenncsiro opened this issue Nov 17, 2021 · 6 comments
Comments

@davidbenncsiro
Collaborator

davidbenncsiro commented Nov 17, 2021

Hi @CyprienBosserelle, as per today's chat, please find attached the params in question with nx and ny changed as per discussion and flow set to 1.

@davidbenncsiro
Collaborator Author

params.zip

@davidbenncsiro
Collaborator Author

davidbenncsiro commented Nov 19, 2021

Debugging eventually revealed the cudaMemcpy() point of failure in the code for the cases where flow=1 or swave=1. In the case where both are set to 1, the “Model crashed” exit still occurs when dt goes to zero, but for just flow=1 or swave=1, the "offending" cudaMemcpy() is on line 852 in Wave_gpu.cu:

CUDA_CHECK(cudaMemcpy(OutputVarMapCPU[Param.outvars[ivar]], 
                      OutputVarMapGPU[Param.outvars[ivar]],
                      OutputVarMaplen[Param.outvars[ivar]] * sizeof(DECNUM), …

in a loop relating to output variables.

For one of the variables being output (E), OutputVarMapGPU[Param.outvars[ivar]] is zero, i.e. a null device pointer.

I added a conditional check for this value (OutputVarMapGPU[Param.outvars[ivar]]) being zero, in which case I skip the cudaMemcpy(), allowing all other variables to be output and the simulation to complete.
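A minimal sketch of that guard, not the actual patch to Wave_gpu.cu. To keep it runnable without a GPU, `copyToHost` is a host-only stand-in for `cudaMemcpy(..., cudaMemcpyDeviceToHost)`. The function name `copyOutputs`, the `std::map` types and the `float` typedef for `DECNUM` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <vector>

typedef float DECNUM;  // assumption: the model's precision typedef is float

// Host-only stand-in for cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToHost),
// so the sketch builds without the CUDA toolkit.
inline void copyToHost(DECNUM* dst, const DECNUM* src, std::size_t bytes) {
    std::memcpy(dst, src, bytes);
}

// Copies every requested output variable whose source buffer is non-null and
// returns the names of the variables that were skipped.
std::vector<std::string> copyOutputs(
    const std::vector<std::string>& outvars,
    std::map<std::string, DECNUM*>& cpuMap,
    std::map<std::string, DECNUM*>& gpuMap,
    std::map<std::string, std::size_t>& lenMap)
{
    std::vector<std::string> skipped;
    for (const std::string& name : outvars) {
        auto it = gpuMap.find(name);
        const DECNUM* src = (it == gpuMap.end()) ? nullptr : it->second;
        if (src == nullptr) {
            skipped.push_back(name);  // e.g. E when the wave loop never ran
            continue;
        }
        // The destination buffer and length must exist for a copied variable.
        assert(cpuMap.count(name) == 1 && lenMap.count(name) == 1);
        copyToHost(cpuMap.at(name), src, lenMap.at(name) * sizeof(DECNUM));
    }
    return skipped;
}
```

Returning the skipped names makes it easy to log which outputs were dropped. Note that a null check only catches buffers that were never allocated; a pointer that was freed but not reset to null would still pass it.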

A run with flow=1 was quick to complete but the output didn't look great. I'm running with swave=1 and it seems more reasonable.

Even for a partial run, zb looks the same as XBeach, except for colours.

Will keep you posted on output results and further debug re: E.

We can talk more details next week as well if you like.

@CyprienBosserelle
Owner

Hmm... the E variable is a bit of a special one because it is allocated and freed inside the wave step. So it is not a valid pointer if the wave loop is not running, and it may be a dangling ("ghost") pointer if the copy happens to work after the wave step. It's a bit of an ancillary output and I'm not sure I ever output it (H = sqrt(8*E/(rho*g)), so I output H).
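For reference, linear wave theory relates wave energy and wave height by E = rho*g*H^2/8, so H can be recovered from E. A small sketch of that conversion follows; the function name `waveHeightFromEnergy` is hypothetical and is not taken from the model's source.

```cpp
#include <cassert>
#include <cmath>

// Wave height from wave energy per unit area, using E = rho * g * H^2 / 8.
// E in J/m^2, rho in kg/m^3, g in m/s^2; the result is in metres.
double waveHeightFromEnergy(double E, double rho, double g) {
    assert(E >= 0.0 && rho > 0.0 && g > 0.0);
    return std::sqrt(8.0 * E / (rho * g));
}
```

With, say, rho = 1025 kg/m^3 and g = 9.81 m/s^2 (typical seawater values, assumed here), an energy of rho*g/8 corresponds to a wave height of 1 m.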

I might need to remove it from the output list or allocate it directly, once and for all. This was done when GPUs had 32 MB of RAM, but now memory is cheap and it would remove the overhead of reallocating it every step...
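A rough sketch of the "allocate once" option, under stated assumptions: `WaveState` and `waveStep` are hypothetical names, and `std::vector` stands in for a `cudaMalloc`/`cudaFree` pair so the example runs on the host. The point is only that the buffer outlives every step, so any pointer registered for output stays valid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef float DECNUM;  // assumption: the model's precision typedef is float

// Hypothetical owner of the wave-energy buffer. It is allocated once at
// start-up and released only when the model shuts down.
struct WaveState {
    std::vector<DECNUM> E;
    explicit WaveState(std::size_t ncells) : E(ncells, 0.0f) {}
};

// One wave step that reuses the preallocated buffer instead of allocating
// and freeing it internally.
void waveStep(WaveState& state, DECNUM increment) {
    assert(!state.E.empty());
    for (DECNUM& e : state.E) {
        e += increment;
    }
}
```

Because the step never reallocates, the address of `E` is the same before and after any number of steps, which is what an output map that stores raw pointers needs.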

@CyprienBosserelle
Owner

I have made a new branch called CheapMem where I moved all the mem allocation to the main function and ran a quick test that seems to work.

@davidbenncsiro
Collaborator Author

Thanks Cyp. Will try this out.

@davidbenncsiro
Collaborator Author

@CyprienBosserelle should I switch to the CheapMem branch yet? I seem to recall you saying on Wed that you were not convinced it had fixed the E problem.
