Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix asymmetry of Coordinates x1f for multiple MeshBlock grids #112

Merged
merged 15 commits into from
Apr 23, 2018

Conversation

felker
Copy link
Contributor

@felker felker commented Apr 21, 2018

Description

While #98 fixed the small round-off asymmetries in Coordinates class floating-point calculations of x1f, x1v for uniform meshes composed of a single MeshBlock, such changes were largely restricted to calculations in coordinates.cpp. This resulted in a discrepancy with the calculation of MeshBlock boundaries in mesh.cpp's non-restart class constructor Mesh::Mesh()

This pull request applies UniformMeshGeneratorX1() to two locations in mesh.cpp:

  • Mesh::SetBlockSizeAndBoundaries() computation of block_size.x1min, etc.
  • SMR/AMR refinement checks

while also condensing and modularizing the logic of switching between UniformMeshGeneratorX1(Real x, RegionSize rs) and DefaultMeshGeneratorX1(Real x, RegionSize rs) in a new inlined function, ComputeMeshGeneratorX(int64_t index, int64_t nrange, bool sym_interval) which returns the different Real x ranges needed as input for the two functions. User-enrolled mesh generating functions should behave the same as DefaultMeshGeneratorX1()


This also fixes all violations of clang++ warning flag -Wshorten-64-to-32 (does not exist for g++ mentioned in #94. Most of these were in FFT and Multigrid functionalities, and involved loading the int64_t quantities LogicalLocation.lxi (in athena.hpp) and Mesh class nrbxi into int variables. @tomidakn explained that issues were unlikely to result from this, since FFT/Multigrid applications wouldn't have large lxi, nrbxi. Still, this may change in the future, and implicit conversions are usually bad, so they were all wrapped with

int var = static_cast<int>(loc.lxi)

for example. Also, the restart simulation Mesh() constructor was using an int for pointer indexing computed from IOWrapperSize_t, which is now a proper uint64_t.

Not all -Wshorten-64-to-32 violations detected by icc were fixed in this PR. Notably, the shearing box module treats nrbxi values as 32-bit integers, e.g.

./bvals/bvals_shear.cpp:        while (jtmp > (nrbx2 - 1)) jtmp -= nrbx2;

and I am not entirely convinced that we need nrbxi to be int64_t, even though they had been long int before I removed such data model-dependent types from the code.

Extending #105, the CI implementations now check for strict compliance to -Wshorten-64-to-32 clang++ warning via tst/ci/set_warning_cflag.sh.

Testing and validation

As with #98, tested locally on the 2D Rayleigh-Taylor problem. Now, used double precision HDF5 output implemented in #108 and loaded with h5py to validate that x1 symmetry was exactly preserved (see #111 for why athena_read.py could not be used).

Configured for clang++ with:

make clean; ./configure.py --prob=rt --nghost=4 -hdf5 -h5double; make

Tested with time/xorder=3 on a Cartesian mesh of 100x500 cells, but this time the MeshBlock size was 20x200 cells to ensure that 25x MeshBlocks spanned x1.

rt-ppm-nx1-500-mbnx1-20.pdf shows the density solution and the mirrored density differences using master.

rt-ppm-nx1-500-mbnx1-20-newmesh.pdf
shows the result produced by this pull request, which is exactly symmetric and matches the single MeshBlock solution produced by master.

I am not sure there is much reason to continue using DefaultMeshGeneratorX1() instead of adapting the [-0.5, 0.5] mapping of UniformMeshGeneratorX1() to nonuniform meshes-- neither will preserve symmetry in such problems. The only reason I kept the former is because https://github.com/PrincetonUniversity/athena/wiki/Coordinate-Systems-and-Meshes also uses the [0,1] logical location mapping. If we are ok with breaking existing user-defined mesh generators, then the code can be simplified.

To-do

  • Decide whether or not to keep nrbxi as int64_t
  • Fix icc warnings of illegal narrowing -Wshorten-64-to-32 for nrbx1 (or make them 32-bit integers), and reenable flag in the CI test on Jenkins.
  • Test with SMR/AMR
  • Consider replacing DefaultMeshGeneratorX1() with a x in [-0.5, 0.5] analog and changing the user-defined mesh generator instructions.

@felker felker added bug Broken functionality or unexpected result SMR/AMR Relating to static and adaptive mesh refinement testing Regression test suite and CI labels Apr 21, 2018
@felker felker self-assigned this Apr 21, 2018
Add comments to athena.hpp to clarify 64-bit vs. 32-bit
integer quantities in LogicalLocation and RegionSize structs
Need to clarify why the int64_t LogicalLocation.lxi and Mesh class
nrbxi loaded by the file are unlikely/cannot to exceed in32_t
limits
Flag works with icc, but not all icc-detected violations have
been fixed. Comment it out for now.
Use before calling MeshGenerator_[] functions to compute their
Real x arguments; use unit_interval=true to compute the [0,1]
parameterization expected by user or non-uniform DefaultMeshGen.
Use unit_interval=false to compute the zero-centered [-0.5,0.5]
fractional location expected by UniformMeshGeneratorX1()
Now, it matches the use_uniform_meshgen_fn_ flag usage
@felker felker changed the title Fix double precision asymmetry of Coordinates x1f for multiple MeshBlock grids Fix asymmetry of Coordinates x1f for multiple MeshBlock grids Apr 21, 2018
@felker
Copy link
Contributor Author

felker commented Apr 23, 2018

A quick and dirty attempt to test refinement symmetry shows a pretty asymmetric solution:

rt-ppm-nx1-500-nx2-1000-mbnx-50-amr-newmesh.pdf

I had never run any AMR simulations before, so I will have to spend more time on this. Could have issues with the refinement condition (copied from Kelvin-Hemholtz file), input variables, and with using athena_read.py.

@felker felker merged commit 210b8d6 into master Apr 23, 2018
@felker felker deleted the multi-mb-symmetry branch April 23, 2018 16:00
@felker
Copy link
Contributor Author

felker commented Apr 23, 2018

While I am not going to fix the AMR asymmetry right now, this test and others revealed that the PR was imperfect. The latest commit to master, 7a9737a, addresses a small floating point associativity error in the UniformMeshGeneratorX*() functions (the same issue that was fixed for PPM in #98). I am guessing that the nx1=500 tests avoided the error since the dx1f was smaller than in the nx1=303,304 tests that I tried after merging.

Another small fix to come soon.

@felker
Copy link
Contributor Author

felker commented Apr 23, 2018

Well, it turns out the latest asymmetries were due to unsafe floating-point compiler optimizations. After merging the PR, I carelessly moved from configuring and compiling with

make clean; ./configure.py --prob=rt --nghost=4 -hdf5 -h5double; make

which used Apple LLVM version 9.1.0 (clang-902.0.39.1) to running parallel simulations with MPICH wrappings on Intel icc (ICC) 18.0.2 20180210; configured and compiled with:

make clean; ./configure.py --prob=rt --nghost=4 -hdf5 -h5double --cxx=icc -mpi; make

which causes ./configure.py to add the following compiler flags:

mpicxx  -O3 -ipo -xhost -inline-forceinline -qopenmp-simd -qopt-prefetch=4 -diag-disable 3180

and resulted in asymmetries in all the cases (regardless if MPI is used or not). The Intel compiler is more aggressive than the GNU compiler with math optimizations. Neither clang nor g++ enables `-ffast-math at any optimization level by default.

I tried adding -fp-model precise to the Intel compiler flags, but it did not fix the asymmetries. Also, -fp-model fast=1 is the default setting and it is value unsafe. Adding the strictest floating point math mode, -fp-model strict, causes the compiler to emit the following warnings at link-time:

src/reconstruct/reconstruction.cpp(140): warning #15552: loop was not vectorized with "simd"
src/reconstruct/reconstruction.cpp(217): warning #15552: loop was not vectorized with "simd"
src/reconstruct/reconstruction.cpp(294): warning #15552: loop was not vectorized with "simd"
src/hydro/rsolvers/hydro/hllc.cpp(44): warning #15552: loop was not vectorized with "simd"
src/task_list/time_integrator.cpp(777): warning #15552: loop was not vectorized with "simd"
src/bvals/bvals.cpp(1714): warning #15552: loop was not vectorized with "simd"
src/mesh/mesh.cpp(1338): warning #15552: loop was not vectorized with "simd"

but the asymmetries disappear. According to Intel fp-model documentation, the only differences between strict and precise is that the former 1) also enables floating point exception semantics and 2) disables FMA optimizations. Both modes ensure reproducibility by prohibiting changing the associativity of floating point calculations:

disables optimizations that can change the result of floating-point calculations, which is required for strict ANSI conformance.

so clearly that was not the source of the problems here. It seems that breaking the OpenMP vectorization of those loops or disabling the FMA optimizations was essential to maintaining symmetry. The ability of OpenMP SIMD loops to reorder FP operations can be toggled via the environment variable KMP_DETERMINISTIC_REDUCTION, so I will check this out. -fp-model strict is very slow, so avoiding this while maintaining symmetry would be ideal.

I will add my lesson to the Wiki as a warning to others. The ./configure.py script may benefit from a flag to load such options automatically.

For example, with mesh/nx1=304 and meshblock/nx1=76 or 304, just adding --cflag="-fp-model strict" changed:
rt-ppm-nx1-304-mbnx1-76-newmesh.pdf
rt-ppm-nx1-304-mbnx1-304-newmesh.pdf

to
rt-ppm-nx1-304-mbnx1-76-newmesh.pdf
rt-ppm-nx1-304-mbnx1-304-newmesh.pdf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Broken functionality or unexpected result SMR/AMR Relating to static and adaptive mesh refinement testing Regression test suite and CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants