Add Modular fp #119

Draft: belericant wants to merge 16 commits into master
Conversation

belericant (Contributor)

Addresses issue #42

sbryngelson (Member) commented Mar 10, 2023

I found a few things:

We are still using dble, which converts to double precision: https://gcc.gnu.org/onlinedocs/gfortran/DBLE.html

You can get rid of the calls to dble:

[I]shb-m1pro: MFC/src $ grep -iR 'dble' ./*
./post_process/m_global_parameters.f90:        Re_trans = dble(trans)
./post_process/m_global_parameters.f90:                      + dble(ir - 1)*DLOG(R0mx/R0mn)/dble(Npt - 1)
./pre_process/m_global_parameters.fpp:        Re_trans = dble(trans)
./pre_process/m_global_parameters.fpp:                      + dble(ir - 1)*DLOG(R0mx/R0mn)/dble(nb - 1)
./simulation/m_global_parameters.fpp:        Re_trans = dble(trans)
./simulation/m_global_parameters.fpp:                      + dble(ir - 1)*DLOG(R0mx/R0mn)/dble(nb - 1)
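
For example, the first of these could become a kind-aware conversion instead (a minimal sketch, assuming wp is the working-precision kind parameter this PR introduces):

    ! before: Re_trans = dble(trans)
    Re_trans = real(trans, wp)   ! convert to the working precision instead of forcing double

The same real(..., wp) pattern applies to the dble(ir - 1) and dble(nb - 1) factors, and DLOG can become the generic log.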

Please change these to lowercase exp/sqrt:

[I]shb-m1pro: MFC/src $ grep -R 'EXP(' ./*
./post_process/m_global_parameters.f90:        c3 = (CEXP(c2) - CEXP(-c2))/(CEXP(c2) + CEXP(-c2)) ! TANH(c2)
./post_process/m_global_parameters.f90:        R0mn = 0.8_wp*EXP(-2.8_wp*sd)
./post_process/m_global_parameters.f90:        R0mx = 0.2_wp*EXP(9.5_wp*sd) + 1._wp
./post_process/m_global_parameters.f90:            R0(ir) = EXP(phi(ir))
./post_process/m_global_parameters.f90:            tmp = EXP(-0.5_wp*(phi(ir)/sd)**2)/SQRT(2._wp*pi)/sd
./post_process/m_global_parameters.f90:        tmp = EXP(-0.5_wp*(phi(1)/sd)**2)/SQRT(2._wp*pi)/sd
./post_process/m_global_parameters.f90:        tmp = EXP(-0.5_wp*(phi(Npt)/sd)**2)/SQRT(2._wp*pi)/sd
./pre_process/m_global_parameters.fpp:        c3 = (CEXP(c2) - CEXP(-c2))/(CEXP(c2) + CEXP(-c2)) ! TANH(c2)
./pre_process/m_global_parameters.fpp:        R0mn = 0.8_wp*EXP(-2.8_wp*sd)
./pre_process/m_global_parameters.fpp:        R0mx = 0.2_wp*EXP(9.5_wp*sd) + 1._wp
./pre_process/m_global_parameters.fpp:            R0(ir) = EXP(phi(ir))
./pre_process/m_global_parameters.fpp:            tmp = EXP(-0.5_wp*(phi(ir)/sd)**2)/SQRT(2._wp*pi)/sd
./pre_process/m_global_parameters.fpp:        tmp = EXP(-0.5_wp*(phi(1)/sd)**2)/SQRT(2._wp*pi)/sd
./pre_process/m_global_parameters.fpp:        tmp = EXP(-0.5_wp*(phi(nb)/sd)**2)/SQRT(2._wp*pi)/sd
./simulation/m_global_parameters.fpp:        c3 = (CEXP(c2) - CEXP(-c2))/(CEXP(c2) + CEXP(-c2)) ! TANH(c2)
./simulation/m_global_parameters.fpp:        R0mn = 0.8_wp*EXP(-2.8_wp*sd)
./simulation/m_global_parameters.fpp:        R0mx = 0.2_wp*EXP(9.5_wp*sd) + 1._wp
./simulation/m_global_parameters.fpp:            R0(ir) = EXP(phi(ir))
./simulation/m_global_parameters.fpp:            tmp = EXP(-0.5_wp*(phi(ir)/sd)**2)/SQRT(2._wp*pi)/sd
./simulation/m_global_parameters.fpp:        tmp = EXP(-0.5_wp*(phi(1)/sd)**2)/SQRT(2._wp*pi)/sd
./simulation/m_global_parameters.fpp:        tmp = EXP(-0.5_wp*(phi(nb)/sd)**2)/SQRT(2._wp*pi)/sd

And these:

[I]shb-m1pro: MFC/src $ grep -R 'SQRT(' ./*
./common/m_helper.f90:        ntmp = SQRT((4._wp*pi/3._wp)*nR3/vftmp)
./post_process/m_global_parameters.f90:        uu = SQRT(pl0/rhol0)
./post_process/m_global_parameters.f90:        phi_vn = (1._wp + SQRT(mu_v/mu_n)*(M_n/M_v)**(0.25_wp))**2 &
./post_process/m_global_parameters.f90:                 /(SQRT(8._wp)*SQRT(1._wp + M_v/M_n))
./post_process/m_global_parameters.f90:        phi_nv = (1._wp + SQRT(mu_n/mu_v)*(M_v/M_n)**(0.25_wp))**2 &
./post_process/m_global_parameters.f90:                 /(SQRT(8._wp)*SQRT(1._wp + M_n/M_v))
./post_process/m_global_parameters.f90:        omegaN = SQRT(3._wp*k_poly*Ca + 2._wp*(3._wp*k_poly - 1._wp)/(Web*R0))/R0
./post_process/m_global_parameters.f90:        c2 = CSQRT(c1)
./post_process/m_global_parameters.f90:            tmp = EXP(-0.5_wp*(phi(ir)/sd)**2)/SQRT(2._wp*pi)/sd
./post_process/m_global_parameters.f90:        tmp = EXP(-0.5_wp*(phi(1)/sd)**2)/SQRT(2._wp*pi)/sd
./post_process/m_global_parameters.f90:        tmp = EXP(-0.5_wp*(phi(Npt)/sd)**2)/SQRT(2._wp*pi)/sd
./pre_process/m_global_parameters.fpp:        uu = SQRT(pl0/rhol0)
./pre_process/m_global_parameters.fpp:        phi_vn = (1._wp + SQRT(mu_v/mu_n)*(M_n/M_v)**(0.25_wp))**2 &
./pre_process/m_global_parameters.fpp:                 /(SQRT(8._wp)*SQRT(1._wp + M_v/M_n))
./pre_process/m_global_parameters.fpp:        phi_nv = (1._wp + SQRT(mu_n/mu_v)*(M_v/M_n)**(0.25_wp))**2 &
./pre_process/m_global_parameters.fpp:                 /(SQRT(8._wp)*SQRT(1._wp + M_n/M_v))
./pre_process/m_global_parameters.fpp:        omegaN = SQRT(3._wp*k_poly*Ca + 2._wp*(3._wp*k_poly - 1._wp)/(Web*R0))/R0
./pre_process/m_global_parameters.fpp:        c2 = CSQRT(c1)
./pre_process/m_global_parameters.fpp:            tmp = EXP(-0.5_wp*(phi(ir)/sd)**2)/SQRT(2._wp*pi)/sd
./pre_process/m_global_parameters.fpp:        tmp = EXP(-0.5_wp*(phi(1)/sd)**2)/SQRT(2._wp*pi)/sd
./pre_process/m_global_parameters.fpp:        tmp = EXP(-0.5_wp*(phi(nb)/sd)**2)/SQRT(2._wp*pi)/sd
./pre_process/m_patches.f90:                    !    q_prim_vf(E_idx)%sf(i,j,0) = (1._wp * (10._wp ** 5)) + 1._wp/SQRT(x_cc(i)**2+y_cc(j)**2) &
./pre_process/m_patches.f90:                    ! radius_pressure = SQRT(x_cc(i)**2) ! 1D
./pre_process/m_patches.f90:                    ! radius_pressure = SQRT(x_cc(i)**2 + cart_y**2) ! 2D
./pre_process/m_patches.f90:                    ! radius_pressure = SQRT(x_cc(i)**2 + cart_y**2 + cart_z**2) ! 3D
./simulation/m_qbmm.fpp:                            c = SQRT(c)
./simulation/m_qbmm.fpp:        fup(1) = bu - SQRT(c2)
./simulation/m_qbmm.fpp:        fup(2) = bu + SQRT(c2)
./simulation/m_global_parameters.fpp:        uu = SQRT(pl0/rhol0)
./simulation/m_global_parameters.fpp:        phi_vn = (1._wp + SQRT(mu_v/mu_n)*(M_n/M_v)**(0.25_wp))**2 &
./simulation/m_global_parameters.fpp:                 /(SQRT(8._wp)*SQRT(1._wp + M_v/M_n))
./simulation/m_global_parameters.fpp:        phi_nv = (1._wp + SQRT(mu_n/mu_v)*(M_v/M_n)**(0.25_wp))**2 &
./simulation/m_global_parameters.fpp:                 /(SQRT(8._wp)*SQRT(1._wp + M_n/M_v))
./simulation/m_global_parameters.fpp:        omegaN = SQRT(3._wp*k_poly*Ca + 2._wp*(3._wp*k_poly - 1._wp)/(Web*R0))/R0
./simulation/m_global_parameters.fpp:        c2 = CSQRT(c1)
./simulation/m_global_parameters.fpp:            tmp = EXP(-0.5_wp*(phi(ir)/sd)**2)/SQRT(2._wp*pi)/sd
./simulation/m_global_parameters.fpp:        tmp = EXP(-0.5_wp*(phi(1)/sd)**2)/SQRT(2._wp*pi)/sd
./simulation/m_global_parameters.fpp:        tmp = EXP(-0.5_wp*(phi(nb)/sd)**2)/SQRT(2._wp*pi)/sd
./simulation/m_data_output.fpp:                        nbub = SQRT((4._wp*pi/3._wp)*nR3/alf)
./simulation/m_data_output.fpp:                            nbub = SQRT((4._wp*pi/3._wp)*nR3/alf)
./simulation/m_bubbles.fpp:                            c_liquid = SQRT(n_tait*(myP + B_tait)/(myRho*(1._wp - alf)))
./simulation/m_bubbles.fpp:        f_cgas = SQRT(tmp + (fntait - 1._wp)*fH)
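
The lowercase change is purely stylistic for EXP and SQRT (Fortran is case-insensitive), but CEXP and CSQRT are the specific names for default-kind complex arguments, so moving to the generic lowercase intrinsics also lets those calls follow the working precision. A sketch of two of the lines above:

    tmp = exp(-0.5_wp*(phi(ir)/sd)**2)/sqrt(2._wp*pi)/sd   ! was EXP(...)/SQRT(...)
    c2 = sqrt(c1)                                          ! was CSQRT(c1); the generic sqrt accepts complex arguments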

sbryngelson (Member)

@anshgupta1234 would you mind giving this PR a review as well, just to check for typos that might be around? I looked, but another set of eyes would be useful. Thanks!

anshgupta1234 (Contributor) left a comment

Didn't find anything else. Looking good!

henryleberre (Member) commented Mar 11, 2023

@sbryngelson What do you think about moving the contents of m_precision_select into m_constants?

I wonder how NVFortran will handle precision selection because we specify -Mr8:

-Mr8: (Fortran only) The compiler promotes REAL variables and constants to DOUBLE PRECISION variables and constants, respectively. DOUBLE PRECISION elements are 8 bytes in length.

-Mnor8: (Fortran only) The compiler does not promote REAL variables and constants to DOUBLE PRECISION. REAL variables will be single precision (4 bytes in length).

From https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-ref-guide/#cmdln-m-opts-optimize. It's unclear to me when and how this "promotion" happens.

In any case, should this feature be made available to the build system? I feel like another way we could have implemented this would have been to use real everywhere and let CMake set the appropriate compiler flags. We already use the flags that turn real into double everywhere. Setting single precision could (possibly) be as simple as not specifying these flags.

The benefit of the approach @belericant took is that you could elect not to use working precision for kernels that need double precision. It's modularity at the cost of verbosity. Note that we could still achieve this with the other approach by adding/removing compiler flags depending on what file is being compiled.
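
For reference, the module-based approach boils down to a single kind parameter that the rest of the code uses everywhere (a hypothetical sketch; the kind names and selection mechanism are assumptions, not the PR's actual m_precision_select contents):

    module m_precision_select
        implicit none
        integer, parameter :: sp = selected_real_kind(6, 37)    ! single precision
        integer, parameter :: dp = selected_real_kind(15, 307)  ! double precision
        ! Working precision: every real(wp) declaration and _wp literal in the
        ! code base follows this one parameter. Flipping it (by hand or via a
        ! preprocessor/fypp conditional) switches the whole solver's precision.
        integer, parameter :: wp = dp
    end module m_precision_select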

One last remark, we may lose precision in a few places by being aggressive in the way we enforce working precision. Consider this example:

x_cc(i) = x_domain%beg + (5._wp * (10._wp ** -(1)))*dx*real(2*i + 1, wp)

(5._wp * (10._wp ** -(1))) is a constant evaluated at compile-time. All operations that lead to its evaluation are done in working precision. It might be preferable to compute this in double precision and then cast it to single working precision for better fidelity.
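
One way to do that (a sketch; dp here denotes the double-precision kind and wp the working one) is to evaluate the literal expression in double precision and cast only the result:

    x_cc(i) = x_domain%beg + real(5._dp*(10._dp**(-1)), wp)*dx*real(2*i + 1, wp)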

henryleberre linked an issue Mar 11, 2023 that may be closed by this pull request
henryleberre added the enhancement and good first issue labels Mar 11, 2023
henryleberre self-requested a review March 11, 2023 23:32
sbryngelson (Member) commented Mar 12, 2023

> @sbryngelson What do you think about moving the contents of m_precision_select into m_constants?

@henryleberre I don't have a particular preference about this; it seems fine.

> I wonder how NVFortran will handle precision selection because we specify -Mr8:
>
> -Mr8: (Fortran only) The compiler promotes REAL variables and constants to DOUBLE PRECISION variables and constants, respectively. DOUBLE PRECISION elements are 8 bytes in length.
>
> -Mnor8: (Fortran only) The compiler does not promote REAL variables and constants to DOUBLE PRECISION. REAL variables will be single precision (4 bytes in length).
>
> From [docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-ref-guide/#cmdln-m-opts-optimize](https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-ref-guide/#cmdln-m-opts-optimize). It's unclear to me when and how this "promotion" happens.
>
> In any case, should this feature be made available to the build system? I feel like another way we could have implemented this would have been to use real everywhere and let CMake set the appropriate compiler flags. We already use the flags that turn real into double everywhere. Setting single precision could (possibly) be as simple as not specifying these flags.
>
> The benefit of the approach @belericant took is that you could elect not to use working precision for kernels that need double precision. It's modularity at the cost of verbosity. Note that we could still achieve this with the other approach by adding/removing compiler flags depending on what file is being compiled.

This current setup might be messing with @belericant's tests, then, no?

But to your main point: I think we will probably have to abandon any forced precision flags. The reasoning is that we now have fine-grained control, which is what I would like. I'm not sure even compiling each module with different flags would do the trick here, because some "setup" routines like s_initialize_module_XXXXX should presumably be done in higher precision (since they only run once). Then we would only use a lower precision in the compute kernels.
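
As an illustration (hypothetical code, assuming the sp/dp/wp kinds come from m_precision_select), a setup routine could do its one-time arithmetic in double precision and only store the result in working precision for the kernels:

    subroutine s_initialize_module_example(sd, weight)
        use m_precision_select
        real(wp), intent(in)  :: sd
        real(wp), intent(out) :: weight
        real(dp) :: tmp
        ! one-time setup arithmetic done in double precision ...
        tmp = 1._dp/sqrt(2._dp*acos(-1._dp))/real(sd, dp)
        ! ... and stored in working precision for use in the compute kernels
        weight = real(tmp, wp)
    end subroutine s_initialize_module_example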

> One last remark, we may lose precision in a few places by being aggressive in the way we enforce working precision. Consider this example:
>
> x_cc(i) = x_domain%beg + (5._wp * (10._wp ** -(1)))*dx*real(2*i + 1, wp)
>
> (5._wp * (10._wp ** -(1))) is a constant evaluated at compile-time. All operations that lead to its evaluation are done in working precision. It might be preferable to compute this in double precision and then cast it to single working precision for better fidelity.

@henryleberre If I understand correctly, there shouldn't be a meaningful difference as long as the computation is backward stable and not a victim of catastrophic cancellation. But I'm not sure I am fully understanding.

henryleberre pushed a commit to henryleberre/ChemMFC that referenced this pull request Mar 21, 2023
sbryngelson (Member)

Is this ready for merge? @belericant

sbryngelson (Member)

We should also check performance on CPU and GPU; perhaps @wilfonba can pass along his script for that. I would also be curious to see the performance numbers for one of those cases using single precision on a V100 or A100.

belericant (Contributor, Author)

> Is this ready for merge? @belericant

@wilfonba found issues exporting 3D runs in single precision, but that isn't due to the modifications necessary to have modular working precision. I don't know whether it makes more sense for that to be fixed in this PR or in a new one.

sbryngelson (Member)

What happens if you use wp = single and precision (output) = 1 or 2, and if you use wp = double and precision = 1 or 2? Do any of these cases break? Are they documented? @belericant @wilfonba

wilfonba (Collaborator)

@sbryngelson That's the logic I still need to add.

sbryngelson (Member)

OK, I will merge once that is complete.

wilfonba (Collaborator)

Numerical Schlieren doesn't seem to work with single working precision. When I run with double working precision and save single-precision silo files, it's nearly uniformly 1 after the first 10 time steps, but if I run with single precision and save in single precision, it's uniformly zero after the first 10 time steps.

sbryngelson (Member)

> Numerical Schlieren doesn't seem to work with single working precision. When I run with double working precision and save single-precision silo files, it's nearly uniformly 1 after the first 10 time steps, but if I run with single precision and save in single precision, it's uniformly zero after the first 10 time steps.

Here is the relevant code: https://github.com/belericant/MFC/blob/4134a660e77c20e0639f5565e768768728caa39f/src/post_process/m_derived_variables.fpp#L573-L718. It could be sensitive to precision issues, as well as to the value of schlieren_alpha(i). What do you use for these?

wilfonba (Collaborator)

@sbryngelson
'schlieren_alpha(1)' : 40, 'schlieren_alpha(2)' : 400

sbryngelson (Member)

I don't know if these are reasonable or not; I haven't used the feature in a while.

wilfonba (Collaborator)

It's one of the example cases, so I'd assume they're reasonable, but I can run it longer to see.

wilfonba (Collaborator) commented May 10, 2023

@sbryngelson Silo output works for all combinations now. For a 3D shock droplet with 2 million cells on two V100s with case optimization enabled, I get the following times per time step (over a 1000-time-step run):

Double Precision: 3.288e-2 s/step (~109,500 steps/hr)
Single Precision: 2.499e-2 s/step (~144,000 steps/hr)

for a 31.5% increase in steps/hr.

I did have to comment out the calls to cufftexecd2z and cufftexecz2d in m_fftw.fpp for it to compile on GPUs with NVHPC in single precision; it works fine on CPUs with GNU, though. It looked like on GPUs fftw3 was compiled after I got these errors, not sure what's up with that.

Also, some of the test cases don't pass on double precision, likely a bug introduced somewhere in all the changes this PR has made or me merging it with upstream.

None of the test cases pass with wp = single_precision because they all request double-precision silo files and those options are incompatible.

Perhaps there's a way to modify the toolchain such that you pass the working precision as an option and m_precision_select.f90 is automatically modified and changes all the test cases to use the correct precision silo files? @henryleberre or @anshgupta1234 could probably add this far more quickly than I can, given they have experience with it.
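
For the cuFFT issue above, one possible fix is to dispatch on the working precision, since cufftExecD2Z/cufftExecZ2D only exist for double-precision transforms while cufftExecR2C/cufftExecC2R are their single-precision counterparts. A hypothetical fragment (the preprocessor symbol, plan, and buffer names are made up, and the plans would need to be created with the matching CUFFT_R2C/CUFFT_C2R transform types):

    #ifdef MFC_SINGLE_PRECISION
        ierr = cufftExecR2C(fwd_plan, real_buf, cmplx_buf)   ! single-precision real-to-complex
    #else
        ierr = cufftExecD2Z(fwd_plan, real_buf, cmplx_buf)   ! double-precision real-to-complex
    #endif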

sbryngelson (Member) commented May 10, 2023

@wilfonba

> It looked like on GPUs fftw3 was compiled after I got these errors, not sure what's up with that.

I've seen this recently as well. It doesn't build fftw when using GPUs (because I guess it finds cuFFT?), but it will build fftw for the post-processing code for CPU purposes.

> None of the test cases pass with wp = single_precision because they all request double-precision silo files and those options are incompatible.

I don't understand what you mean here. What options? We don't have silo golden files (though we do use h5dump to check for obviously incorrect hdf5/silo files).

> Also, some of the test cases don't pass on double precision, likely a bug introduced somewhere in all the changes this PR has made or me merging it with upstream.

This will have to be fixed, of course.

> Perhaps there's a way to modify the toolchain such that you pass the working precision as an option and m_precision_select.f90 is automatically modified and changes all the test cases to use the correct precision silo files?

Presumably, we would like the precision to be selected in the case.py file. In this situation, we would just double the number of test cases: a set for single and a set for double precision.

wilfonba (Collaborator) commented May 10, 2023

@sbryngelson

> I don't understand what you mean here. What options? We don't have silo golden files (though we do use h5dump to check for obviously incorrect hdf5/silo files).

I have a check in m_checker that says single working precision and double-precision silo files are incompatible, because running a simulation in single precision and then requesting a double-precision silo file doesn't make much sense. I could make it so that, if single working precision is used and double silo precision is requested, the single-precision data is cast to double precision, but I think it makes more sense to just disallow that combination for practical purposes. I suppose allowing it would fix the testing problem.
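
A hypothetical sketch of such a check (assuming the sp/wp kinds from m_precision_select, the existing precision output parameter with 2 meaning double, and an abort routine like s_mpi_abort):

    if (wp == sp .and. precision == 2) then
        call s_mpi_abort('precision = 2 (double-precision silo files) is '// &
                         'incompatible with single working precision. Exiting ...')
    end if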

> Presumably, we would like the precision to be selected in the case.py file. In this situation, we would just double the number of test cases: a set for single and a set for double precision.

I see how this would be desirable, but I don't think it's as straightforward as other case-file options. All of the working-precision differences appear to take effect at compile time, not run time, while all current case-file options are run-time ones. I'm guessing that trying to change precision at run time would cause a bunch of run-time errors, but I can check. (Edit: since the case file goes through mfc.sh, I think we could just have the toolchain catch this option and rebuild everything.)

I'll work on getting the test cases to pass given all the changes, and talk with Henry/Ansh about toolchain changes for testing.

wilfonba (Collaborator)

I came across a cool tool that takes math expressions and the expected ranges of each variable and outputs the best way to write the expression with the least loss of precision. It may be useful if there are any calculations in the code that we suspect might have problems with single precision.

https://herbie.uwplse.org/

sbryngelson (Member)

Seems like it could be useful!

Commits:
- added ci
- added ci
- things2
- things
- added single precision to test suite and generated golden files
- added working precision to case_dicts, testing, and documentation
- fixed code not passing using double working precision

sbryngelson marked this pull request as a draft December 20, 2023 20:53
Labels: enhancement (New feature or request), good first issue (Good for newcomers)

Successfully merging this pull request may close: "Switching to modern and modular precision declaration"

5 participants