NaN detected with RS with Dunavant quadrature rule, not with Strout #668

Open
Thomas-Ulrich opened this issue Sep 6, 2022 · 9 comments
@Thomas-Ulrich
Contributor

Thomas-Ulrich commented Sep 6, 2022

Describe the bug
I'm running the Sulawesi RS (rate-and-state) Training setup with a new mesh (the fault is a bit more segmented).
On v1.0.0-rc with the Dunavant quadrature rule, I get:

Tue Sep 06 10:52:55, Info:  Writing faultoutput at time 0.
Tue Sep 06 10:52:55, Info:  Writing faultoutput at time 0. Done.
Tue Sep 06 10:52:55, Info:  Writing free surface at time 0.
Tue Sep 06 10:52:55, Info:  Writing free surface at time 0. Done.
Tue Sep 06 10:52:55, Info:  Writing energy output at time 0 
Tue Sep 06 10:52:55, Info:  Writing energy output at time 0 Done. 
Tue Sep 06 10:53:01, Error: Detected Inf/NaN in free surface output. Aborting. 
Tue Sep 06 10:53:01, Error: Detected Inf/NaN in free surface output. Aborting.

and no problem with Stroud:

Tue Sep 06 10:51:35, Info:  Writing free surface at time 0.
Tue Sep 06 10:51:35, Info:  Writing free surface at time 0. Done.
Tue Sep 06 10:51:35, Info:  Writing energy output at time 0
Tue Sep 06 10:51:36, Info:  Writing energy output at time 0 Done.
Tue Sep 06 10:51:36, Info:  Writing faultoutput at time 0.
Tue Sep 06 10:51:36, Info:  Writing faultoutput at time 0. Done.
Tue Sep 06 10:51:40, Info:  Writing free surface at time 0.25.
Tue Sep 06 10:51:40, Info:  Writing free surface at time 0.25. Done.

Expected behavior
No NaN with the Dunavant quadrature rule.
To Reproduce

Compiled with cmake -DCOMMTHREAD=ON -DNUMA_AWARE_PINNING=ON -DASAGI=ON -DCMAKE_BUILD_TYPE=Release -DHOST_ARCH=skx -DPRECISION=double -DORDER=4 -DGEMM_TOOLS_LIST=LIBXSMM,PSpaMM ..
with seissol-env/develop-intel21-impi-x2b on SuperMUC-NG.

Setup: /hppfs/work/pr63qo/di73yeq4/Training_bug/sulawesi_bugRS

Note: the v1.0.0-rc setup with LSW (linear slip weakening) and Stroud works.

@Thomas-Ulrich
Contributor Author

OK, it turned out to be caused by the Dunavant quadrature rule. With Stroud, it works.
Strange, isn't it?

@Thomas-Ulrich Thomas-Ulrich changed the title from "NaN detected with RS with v1.0.0-rc, not with v0.9.0" to "NaN detected with RS with Dunavant quadrature rule, not with Strout" on Sep 6, 2022
@sebwolf-de
Contributor

Indeed strange, thanks for reporting!

@Thomas-Ulrich
Contributor Author

Not sure if I should re-open the issue or not.

@sebwolf-de sebwolf-de reopened this Sep 6, 2022
@Thomas-Ulrich
Contributor Author

@sebwolf-de:
I tried adding the following to main.cpp:

#include <fenv.h>

// Enable floating point exceptions.
feenableexcept(FE_OVERFLOW | FE_UNDERFLOW | FE_DIVBYZERO | FE_INVALID);

But then I only get:

Fri Oct 28 08:46:37, Info:  Computing LTS weights. 

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 1261509 RUNNING AT i03r11c03s10
=   KILLED BY SIGNAL: 8 (Floating point exception)
===================================================================================

when running with the Intel compilers in debug mode.
How can I get more information about where the error occurs?

@sebwolf-de
Contributor

The Intel compilers' "Debug" mode still applies some optimizations; see also #298. Alternatively, you could try gcc, which doesn't optimize as aggressively in "Debug" mode.
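
A complementary way to narrow down the location, without trapping at all, is to poll the floating-point status flags around suspect code regions and bisect from there. The sketch below uses only the standard `<cfenv>` functions `std::feclearexcept` and `std::fetestexcept`. The `FpFlagProbe` class and the `probeDemo` function are hypothetical helpers written for this illustration; they are not part of SeisSol. `FE_UNDERFLOW` is deliberately not checked, on the assumption that benign underflows are common in numerical code and would otherwise be reported far from the actual problem.

```cpp
#include <cassert>
#include <cfenv>
#include <cstdio>

// Hypothetical helper (not part of SeisSol): clears the floating-point status
// flags when constructed and reports the flags raised since then when destroyed.
class FpFlagProbe {
public:
  explicit FpFlagProbe(const char* label) : label_(label) {
    assert(label != nullptr);
    std::feclearexcept(FE_ALL_EXCEPT);
  }

  // Flags raised since construction. FE_UNDERFLOW and FE_INEXACT are ignored.
  int raised() const {
    return std::fetestexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW);
  }

  ~FpFlagProbe() {
    const int flags = raised();
    if (flags != 0) {
      std::fprintf(stderr, "[%s] raised:%s%s%s\n", label_,
                   (flags & FE_INVALID) ? " FE_INVALID" : "",
                   (flags & FE_DIVBYZERO) ? " FE_DIVBYZERO" : "",
                   (flags & FE_OVERFLOW) ? " FE_OVERFLOW" : "");
    }
  }

private:
  const char* label_;
};

// Demo region: a division by zero that the probe should flag.
// The volatile variables keep the compiler from folding the division away.
int probeDemo() {
  volatile double zero = 0.0;
  volatile double result = 0.0;
  FpFlagProbe probe("demo region");
  result = 1.0 / zero;
  (void)result;
  return probe.raised();
}
```

Wrapping successive phases of a run (for example, initialization and the first time step) in such probes shows which phase first raises a flag. The probe can then be moved inward until the offending expression is found.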

@sebwolf-de
Contributor

I tested the fully coupled Palu scenario with R&S friction (mesh BATNAS_v1.0_manual_final2) using the Dunavant rule and order 6, and have had no problems so far. Can you try order 6 for your scenario?

@sebwolf-de
Contributor

Or could you change this line https://github.com/SeisSol/Training/blob/b09bfe3ff732256d95d0ea0b64eb80f5e61b853b/sulawesi/Sulawesi_initial_stressRS.yaml#L18 to 1.0-0.999*Sx;? Then the taper is smooth.

@sebwolf-de
Contributor

> Or could you change this line https://github.com/SeisSol/Training/blob/b09bfe3ff732256d95d0ea0b64eb80f5e61b853b/sulawesi/Sulawesi_initial_stressRS.yaml#L18 to 1.0-0.999*Sx;? Then the taper is smooth.

This fixes the problem, although this is not the desired behaviour of SeisSol.

@sebwolf-de
Contributor

After more careful consideration, we came to the conclusion that a traction close to 0 is unphysical for R&S friction. The setup in the training is built such that the traction is close to 0 in a very narrow depth zone. By coincidence, with the new mesh and the new quadrature rule, a quadrature point on the fault lay in that zero-traction zone. The close-to-zero traction led to a negative state variable, which then caused a cascade of further unwanted behaviour.
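
To illustrate how a near-zero traction can produce a negative state variable, here is a minimal sketch. It assumes the initial state variable is obtained by inverting a regularized rate-and-state friction law at the initial slip rate, psi = a * ln((2 * V0 / V_ini) * sinh(tau / (a * sigma_n))), as in SCEC-benchmark-style setups. Whether SeisSol initializes the state variable exactly this way is not confirmed in this thread. The function names and all parameter values are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Assumed initialization (not confirmed as SeisSol's code path):
//   psi = a * ln( (2 * v0 / vIni) * sinh( tau / (a * sigmaN) ) )
// tau: shear traction [Pa], sigmaN: normal stress magnitude [Pa],
// a: direct-effect parameter, v0: reference slip rate, vIni: initial slip rate.
double initialStateVariable(double tau, double sigmaN, double a, double v0, double vIni) {
  assert(a > 0.0 && sigmaN > 0.0 && v0 > 0.0 && vIni > 0.0);
  return a * std::log(2.0 * v0 / vIni * std::sinh(tau / (a * sigmaN)));
}

// A state variable is treated as admissible here only if it is finite and positive.
bool isAdmissible(double psi) {
  return std::isfinite(psi) && psi > 0.0;
}

// Hypothetical illustration values: scale a full traction of 40 MPa by a taper factor.
double taperedStateVariable(double taperFactor) {
  const double tauFull = 40.0e6;  // Pa, hypothetical
  const double sigmaN = 100.0e6;  // Pa, hypothetical
  const double a = 0.01;          // hypothetical
  const double v0 = 1.0e-6;       // m/s, hypothetical
  const double vIni = 1.0e-16;    // m/s, hypothetical
  return initialStateVariable(taperFactor * tauFull, sigmaN, a, v0, vIni);
}
```

Under these assumptions, a taper factor of 1 or 0.001 gives a positive state variable, a factor of 1e-12 gives a small negative one, and a factor of exactly 0 gives minus infinity because the logarithm is evaluated at 0. A floor of 0.001 is what `1.0-0.999*Sx` would leave if `Sx` reaches 1, which is itself an assumption about the training setup.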
