Fix asymmetry of Coordinates x1f for multiple MeshBlock grids #112

felker · 2018-04-21T17:52:25Z

Description

While #98 fixed the small round-off asymmetries in Coordinates class floating-point calculations of x1f, x1v for uniform meshes composed of a single MeshBlock, such changes were largely restricted to calculations in coordinates.cpp. This resulted in a discrepancy with the calculation of MeshBlock boundaries in mesh.cpp's non-restart class constructor Mesh::Mesh()

This pull request applies UniformMeshGeneratorX1() to two locations in mesh.cpp:

Mesh::SetBlockSizeAndBoundaries() computation of block_size.x1min, etc.
SMR/AMR refinement checks

while also condensing and modularizing the logic of switching between UniformMeshGeneratorX1(Real x, RegionSize rs) and DefaultMeshGeneratorX1(Real x, RegionSize rs) in a new inlined function, ComputeMeshGeneratorX(int64_t index, int64_t nrange, bool sym_interval) which returns the different Real x ranges needed as input for the two functions. User-enrolled mesh generating functions should behave the same as DefaultMeshGeneratorX1()

This also fixes all violations of clang++ warning flag -Wshorten-64-to-32 (does not exist for g++ mentioned in #94. Most of these were in FFT and Multigrid functionalities, and involved loading the int64_t quantities LogicalLocation.lxi (in athena.hpp) and Mesh class nrbxi into int variables. @tomidakn explained that issues were unlikely to result from this, since FFT/Multigrid applications wouldn't have large lxi, nrbxi. Still, this may change in the future, and implicit conversions are usually bad, so they were all wrapped with

int var = static_cast<int>(loc.lxi)

for example. Also, the restart simulation Mesh() constructor was using an int for pointer indexing computed from IOWrapperSize_t, which is now a proper uint64_t.

Not all -Wshorten-64-to-32 violations detected by icc were fixed in this PR. Notably, the shearing box module treats nrbxi values as 32-bit integers, e.g.

./bvals/bvals_shear.cpp:        while (jtmp > (nrbx2 - 1)) jtmp -= nrbx2;

and I am not entirely convinced that we need nrbxi to be int64_t, even though they had been long int before I removed such data model-dependent types from the code.

Extending #105, the CI implementations now check for strict compliance to -Wshorten-64-to-32 clang++ warning via tst/ci/set_warning_cflag.sh.

Testing and validation

As with #98, tested locally on the 2D Rayleigh-Taylor problem. Now, used double precision HDF5 output implemented in #108 and loaded with h5py to validate that x1 symmetry was exactly preserved (see #111 for why athena_read.py could not be used).

Configured for clang++ with:

make clean; ./configure.py --prob=rt --nghost=4 -hdf5 -h5double; make

Tested with time/xorder=3 on a Cartesian mesh of 100x500 cells, but this time the MeshBlock size was 20x200 cells to ensure that 25x MeshBlocks spanned x1.

rt-ppm-nx1-500-mbnx1-20.pdf shows the density solution and the mirrored density differences using master.

rt-ppm-nx1-500-mbnx1-20-newmesh.pdf
shows the result produced by this pull request, which is exactly symmetric and matches the single MeshBlock solution produced by master.

I am not sure there is much reason to continue using DefaultMeshGeneratorX1() instead of adapting the [-0.5, 0.5] mapping of UniformMeshGeneratorX1() to nonuniform meshes-- neither will preserve symmetry in such problems. The only reason I kept the former is because https://github.com/PrincetonUniversity/athena/wiki/Coordinate-Systems-and-Meshes also uses the [0,1] logical location mapping. If we are ok with breaking existing user-defined mesh generators, then the code can be simplified.

To-do

Decide whether or not to keep nrbxi as int64_t
Fix icc warnings of illegal narrowing -Wshorten-64-to-32 for nrbx1 (or make them 32-bit integers), and reenable flag in the CI test on Jenkins.
Test with SMR/AMR
Consider replacing DefaultMeshGeneratorX1() with a x in [-0.5, 0.5] analog and changing the user-defined mesh generator instructions.

Add comments to athena.hpp to clarify 64-bit vs. 32-bit integer quantities in LogicalLocation and RegionSize structs

Need to clarify why the int64_t LogicalLocation.lxi and Mesh class nrbxi loaded by the file are unlikely/cannot to exceed in32_t limits

Flag works with icc, but not all icc-detected violations have been fixed. Comment it out for now.

Use before calling MeshGenerator_[] functions to compute their Real x arguments; use unit_interval=true to compute the [0,1] parameterization expected by user or non-uniform DefaultMeshGen. Use unit_interval=false to compute the zero-centered [-0.5,0.5] fractional location expected by UniformMeshGeneratorX1()

Now, it matches the use_uniform_meshgen_fn_ flag usage

felker · 2018-04-23T14:32:56Z

A quick and dirty attempt to test refinement symmetry shows a pretty asymmetric solution:

rt-ppm-nx1-500-nx2-1000-mbnx-50-amr-newmesh.pdf

I had never run any AMR simulations before, so I will have to spend more time on this. Could have issues with the refinement condition (copied from Kelvin-Hemholtz file), input variables, and with using athena_read.py.

felker · 2018-04-23T20:53:43Z

While I am not going to fix the AMR asymmetry right now, this test and others revealed that the PR was imperfect. The latest commit to master, 7a9737a, addresses a small floating point associativity error in the UniformMeshGeneratorX*() functions (the same issue that was fixed for PPM in #98). I am guessing that the nx1=500 tests avoided the error since the dx1f was smaller than in the nx1=303,304 tests that I tried after merging.

Another small fix to come soon.

felker · 2018-04-23T22:35:21Z

Well, it turns out the latest asymmetries were due to unsafe floating-point compiler optimizations. After merging the PR, I carelessly moved from configuring and compiling with

make clean; ./configure.py --prob=rt --nghost=4 -hdf5 -h5double; make

which used Apple LLVM version 9.1.0 (clang-902.0.39.1) to running parallel simulations with MPICH wrappings on Intel icc (ICC) 18.0.2 20180210; configured and compiled with:

make clean; ./configure.py --prob=rt --nghost=4 -hdf5 -h5double --cxx=icc -mpi; make

which causes ./configure.py to add the following compiler flags:

mpicxx  -O3 -ipo -xhost -inline-forceinline -qopenmp-simd -qopt-prefetch=4 -diag-disable 3180

and resulted in asymmetries in all the cases (regardless if MPI is used or not). The Intel compiler is more aggressive than the GNU compiler with math optimizations. Neither clang nor g++ enables `-ffast-math at any optimization level by default.

I tried adding -fp-model precise to the Intel compiler flags, but it did not fix the asymmetries. Also, -fp-model fast=1 is the default setting and it is value unsafe. Adding the strictest floating point math mode, -fp-model strict, causes the compiler to emit the following warnings at link-time:

src/reconstruct/reconstruction.cpp(140): warning #15552: loop was not vectorized with "simd"
src/reconstruct/reconstruction.cpp(217): warning #15552: loop was not vectorized with "simd"
src/reconstruct/reconstruction.cpp(294): warning #15552: loop was not vectorized with "simd"
src/hydro/rsolvers/hydro/hllc.cpp(44): warning #15552: loop was not vectorized with "simd"
src/task_list/time_integrator.cpp(777): warning #15552: loop was not vectorized with "simd"
src/bvals/bvals.cpp(1714): warning #15552: loop was not vectorized with "simd"
src/mesh/mesh.cpp(1338): warning #15552: loop was not vectorized with "simd"

but the asymmetries disappear. According to Intel fp-model documentation, the only differences between strict and precise is that the former 1) also enables floating point exception semantics and 2) disables FMA optimizations. Both modes ensure reproducibility by prohibiting changing the associativity of floating point calculations:

disables optimizations that can change the result of floating-point calculations, which is required for strict ANSI conformance.

so clearly that was not the source of the problems here. It seems that breaking the OpenMP vectorization of those loops or disabling the FMA optimizations was essential to maintaining symmetry. The ability of OpenMP SIMD loops to reorder FP operations can be toggled via the environment variable KMP_DETERMINISTIC_REDUCTION, so I will check this out. -fp-model strict is very slow, so avoiding this while maintaining symmetry would be ideal.

I will add my lesson to the Wiki as a warning to others. The ./configure.py script may benefit from a flag to load such options automatically.

For example, with mesh/nx1=304 and meshblock/nx1=76 or 304, just adding --cflag="-fp-model strict" changed:
rt-ppm-nx1-304-mbnx1-76-newmesh.pdf
rt-ppm-nx1-304-mbnx1-304-newmesh.pdf

to
rt-ppm-nx1-304-mbnx1-76-newmesh.pdf
rt-ppm-nx1-304-mbnx1-304-newmesh.pdf

Save nrbx1<<(ll-root_level) to lcl var for readability and reuse

efa14f0

felker added bug Broken functionality or unexpected result SMR/AMR Relating to static and adaptive mesh refinement testing Regression test suite and CI labels Apr 21, 2018

felker self-assigned this Apr 21, 2018

felker requested review from c-white, changgoo, jmstone and tomidakn as code owners April 21, 2018 17:52

felker mentioned this pull request Apr 21, 2018

Need to address g++, clang++, icc compiler warnings #94

Closed

11 tasks

felker added 14 commits April 21, 2018 16:46

Invert significance of use_meshgen_fn_ flag to prepare for new fn

979d6be

Fix implicit loss of precision coversion in athena_fft

2767e79

Add comments to athena.hpp to clarify 64-bit vs. 32-bit integer quantities in LogicalLocation and RegionSize structs

Static cast all implicit 64-to-32 bit int conversions in MgDriver

cf261f3

Need to clarify why the int64_t LogicalLocation.lxi and Mesh class nrbxi loaded by the file are unlikely/cannot to exceed in32_t limits

Use static_cast<int> conversions from int64_t nrbxi in mggravity

929e64f

Fix implicit conversion from unit64_t to int in Mesh restart const

9a4ddfb

Make nbmax int64_t type to be consistent with nrbxi

903a210

Now check -Wshorten-64-to-32 warning in CI tests

f63de32

Remove clang++ flag -Wshorten-64-to-32 from g++ warnings

cef20ab

Flag works with icc, but not all icc-detected violations have been fixed. Comment it out for now.

Switch between Uniform and DefualtMeshGenerator() fns

0bf5993

Compute rx in Coordinates using new ComputeMeshGeneratorX()

b667b95

Flip meaning of ComputeMeshGeneratorX boolean flag for consistency

9dfe129

Now, it matches the use_uniform_meshgen_fn_ flag usage

Use new rx wrapper function in SetBlockSizeAndBoundaries()

34725a6

Use new rx wrapper function in SMR/AMR refinement in Mesh()

63de3a0

felker force-pushed the multi-mb-symmetry branch from 90e6795 to 63de3a0 Compare April 21, 2018 20:48

felker changed the title ~~Fix double precision asymmetry of Coordinates x1f for multiple MeshBlock grids~~ Fix asymmetry of Coordinates x1f for multiple MeshBlock grids Apr 21, 2018

jmstone approved these changes Apr 23, 2018

View reviewed changes

felker merged commit 210b8d6 into master Apr 23, 2018

felker deleted the multi-mb-symmetry branch April 23, 2018 16:00

felker mentioned this pull request Apr 24, 2018

Fix floating-point asymmetry of AMR cell-centered prolong/restrict operators #114

Merged

4 tasks

felker mentioned this pull request May 17, 2018

SIMD vectorization disabled for Roe-type Riemann solvers #126

Closed

felker removed their assignment May 31, 2018

This was referenced Oct 5, 2018

Add more runtime NGHOST checks (especially for SMR/AMR) #163

Closed

MHD constrained transport calculations create asymmetric floating-point errors #165

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix asymmetry of Coordinates x1f for multiple MeshBlock grids #112

Fix asymmetry of Coordinates x1f for multiple MeshBlock grids #112

felker commented Apr 21, 2018 •

edited

Loading

felker commented Apr 23, 2018

felker commented Apr 23, 2018

felker commented Apr 23, 2018 •

edited

Loading

Fix asymmetry of Coordinates x1f for multiple MeshBlock grids #112

Fix asymmetry of Coordinates x1f for multiple MeshBlock grids #112

Conversation

felker commented Apr 21, 2018 • edited Loading

Description

Testing and validation

To-do

felker commented Apr 23, 2018

felker commented Apr 23, 2018

felker commented Apr 23, 2018 • edited Loading

felker commented Apr 21, 2018 •

edited

Loading

felker commented Apr 23, 2018 •

edited

Loading