SIGFPE: Floating-point exception in CH4Mod.F90 #6428

ndkeen · 2024-05-17T05:12:47Z

On pm-cpu (as well as gcp12) this DEBUG test
SMS_D_Ld1.ne30pg2_r05_IcoswISC30E3r5.WCYCLSSP370.pm-cpu_gnu.allactive-wcprodssp
fails with GNU.

364: #0  0x14b320651dbf in ???
364: #1  0x36f58e7 in ch4_tran
364:    at /global/cfs/cdirs/e3sm/ndk/repos/ndk_mf_pm-avoid-DVS-warning/components/elm/src/biogeochem/CH4Mod.F90:3121
364: #2  0x37353e5 in __ch4mod_MOD_ch4
364:    at /global/cfs/cdirs/e3sm/ndk/repos/ndk_mf_pm-avoid-DVS-warning/components/elm/src/biogeochem/CH4Mod.F90:1709
364: #3  0x2a16186 in __elm_driver_MOD_elm_drv
364:    at /global/cfs/cdirs/e3sm/ndk/repos/ndk_mf_pm-avoid-DVS-warning/components/elm/src/main/elm_driver.F90:1189
364: #4  0x29dd153 in __lnd_comp_mct_MOD_lnd_run_mct
364:    at /global/cfs/cdirs/e3sm/ndk/repos/ndk_mf_pm-avoid-DVS-warning/components/elm/src/cpl/lnd_comp_mct.F90:617
364: #5  0x49fdab in __component_mod_MOD_component_run
364:    at /global/cfs/cdirs/e3sm/ndk/repos/ndk_mf_pm-avoid-DVS-warning/driver-mct/main/component_mod.F90:734
364: #6  0x483563 in __cime_comp_mod_MOD_cime_run
364:    at /global/cfs/cdirs/e3sm/ndk/repos/ndk_mf_pm-avoid-DVS-warning/driver-mct/main/cime_comp_mod.F90:2968
364: #7  0x49d116 in cime_driver
364:    at /global/cfs/cdirs/e3sm/ndk/repos/ndk_mf_pm-avoid-DVS-warning/driver-mct/main/cime_driver.F90:153
364: #8  0x49d179 in main
364:    at /global/cfs/cdirs/e3sm/ndk/repos/ndk_mf_pm-avoid-DVS-warning/driver-mct/main/cime_driver.F90:23
srun: error: nid004572: task 364: Floating point exception
srun: Terminating StepId=25683685.0

In components/elm/src/biogeochem/CH4Mod.F90

      ! Perform competition for oxygen and methane in each soil layer if demands over the course of the timestep
      ! exceed that available. Assign to each process in proportion to the quantity demanded in the absense of
      ! the limitation.
      do j = 1,nlevsoi
         do fc = 1, num_methc
            c = filter_methc (fc)

            o2demand = o2_decomp_depth(c,j) + o2_oxid_depth(c,j) ! o2_decomp_depth includes autotrophic root respiration
            if (o2demand > 0._r8) then
               o2stress(c,j) = min((conc_o2(c,j) / dtime + o2_aere_depth(c,j)) / o2demand, 1._r8)   ! <-- line 3121
            else
               o2stress(c,j) = 1._r8


rljacob · 2024-05-29T00:35:13Z

This would get more traction with a title that points to the routine. Fixed it. The test name doesn't help for these fully coupled water cycle cases since so many components are running.

ndkeen · 2024-05-29T17:57:40Z

Adding prints, I see that o2demand=1.2750394683855957E-312 at the crash point. That is above 0._r8 and seems reasonable to me, but this compiler isn't happy with it.
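For context, here is a minimal standalone sketch (not taken from CH4Mod.F90) of why this value is unusual. For IEEE double precision, tiny(1._r8) is about 2.2E-308, so 1.275E-312 is positive but subnormal. The program name and the run-time read of the value are choices made for this illustration only. The sketch assumes r8 corresponds to real64 and that the build does not flush subnormals to zero.

```fortran
program subnormal_demand
   use, intrinsic :: iso_fortran_env, only: r8 => real64
   use, intrinsic :: ieee_arithmetic, only: ieee_is_normal
   implicit none
   character(len=32) :: text
   real(r8) :: o2demand

   ! Value reported at the crash point in this thread. It is read at run
   ! time to sidestep any compile-time range checking of the literal.
   text = '1.2750394683855957E-312'
   read(text, *) o2demand

   ! The original guard: is the demand strictly positive?
   print *, 'o2demand > 0._r8         :', o2demand > 0._r8
   ! Is it below the smallest positive normal number?
   print *, 'o2demand < tiny(1._r8)   :', o2demand < tiny(1._r8)
   ! Is it a normal (that is, not subnormal) value?
   print *, 'ieee_is_normal(o2demand) :', ieee_is_normal(o2demand)
end program subnormal_demand
```

Under those assumptions the first two tests should come out true and the last one false: the value passes the `> 0._r8` guard even though it lies below the normal range.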

If I try something like this:

           mach_eps       = epsilon(1.0_r8)

       ...
            !if (o2demand > 0._r8) then
            if (o2demand > mach_eps) then

the run continues.

I'm not sure if that's a good solution or if all compilers support epsilon(). Or maybe we already have a shared variable representing Fortran's intrinsic epsilon()? ... Or... we don't expect o2demand to have such a value, which may point to some other issue...

peterdschwartz · 2024-05-29T18:33:15Z

@ndkeen Thanks for looking further into this. There have been a few cases of floating-point underflow/overflow happening like this, and it's good to get rid of them, since unless the machine is dividing a number by an exact multiple, the result is undetermined.

I'd have to look further into how epsilon works (there's also a tiny function). It should find the smallest representable number for your machine based on some criterion (so it would be architecture dependent and maybe compiler dependent).
We do use huge to initialize some variables, but for computations, I wonder if it's better to explicitly use a small parameter value, as certain subroutines might be more or less sensitive to the value of that parameter in producing non-bfb results.

ndkeen · 2024-06-10T16:51:37Z

I suppose we aren't hitting div-by-zero, but a different FP issue: overflow. It seems safe to use epsilon() here, though it raises the question of how many places we will need to do something like this. Let me know if I should make a PR. I would prefer to leave the decision to the developers, as they may know more about reasonable values for these quantities (i.e., if some value should really never be below x, we could implement that instead).
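The overflow-versus-div-by-zero distinction can be checked with the standard IEEE exception flags. The following is a rough sketch, not code from the model: the numerator `supply = 1.0e-3_r8` is a hypothetical stand-in for `conc_o2/dtime + o2_aere_depth`, and halting is explicitly switched off so both flags can be inspected afterwards. A build that traps on overflow (for example via gfortran's `-ffpe-trap` option, which is assumed rather than confirmed for this test) would instead stop at the division.

```fortran
program overflow_not_divzero
   use, intrinsic :: iso_fortran_env, only: r8 => real64
   use, intrinsic :: ieee_exceptions, only: ieee_get_flag, ieee_set_flag, &
        ieee_set_halting_mode, ieee_overflow, ieee_divide_by_zero
   implicit none
   character(len=32) :: text
   real(r8) :: supply, o2demand, ratio
   logical  :: overflowed, div_by_zero

   ! Do not halt on either exception in this demonstration, and start
   ! with both flags cleared.
   call ieee_set_halting_mode(ieee_overflow, .false.)
   call ieee_set_halting_mode(ieee_divide_by_zero, .false.)
   call ieee_set_flag(ieee_overflow, .false.)
   call ieee_set_flag(ieee_divide_by_zero, .false.)

   supply = 1.0e-3_r8                 ! hypothetical numerator
   text = '1.2750394683855957E-312'   ! o2demand reported in this thread
   read(text, *) o2demand

   ! The quotient is larger than huge(1._r8), so it overflows.
   ratio = supply / o2demand

   call ieee_get_flag(ieee_overflow, overflowed)
   call ieee_get_flag(ieee_divide_by_zero, div_by_zero)
   print *, 'overflow flag       :', overflowed
   print *, 'divide_by_zero flag :', div_by_zero
   print *, 'min(ratio, 1._r8)   :', min(ratio, 1._r8)
end program overflow_not_divzero
```

With the hypothetical numerator above, only the overflow flag should be raised, which is consistent with the reading that the divisor is nonzero but too small for the quotient to be representable.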

peterdschwartz · 2024-06-10T19:24:32Z

@ndkeen Ok, I solved a similar issue in a separate module (PR 5828), and I will go through a similar process to see how dependent the code is on the value of the parameter.

peterdschwartz · 2024-06-21T18:48:40Z

Made a PR that addresses this issue. Point of note: the function we want is tiny instead of epsilon. epsilon provides the round-off error for a data type, so for doubles it's on the order of 1E-16. This is much too large for most calculations, as the variables may be on the order of 1E-24, and it causes DIFFs.

On the other hand, tiny provides the smallest positive normal number for a given data type on the compiler+machine and is on the order of 1.E-308.
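To make the difference concrete, here is a small sketch comparing the two intrinsics as guards. It is an illustration under stated assumptions, not the code from the PR: the name `small_demand`, the numerator `supply`, and the sample demand of 1.0E-24 (the order of magnitude mentioned above) are all hypothetical.

```fortran
program epsilon_vs_tiny
   use, intrinsic :: iso_fortran_env, only: r8 => real64
   implicit none
   ! Hypothetical threshold name; the value comes from the tiny intrinsic.
   real(r8), parameter :: small_demand = tiny(1._r8)
   real(r8) :: supply, o2demand, o2stress

   print *, 'epsilon(1._r8) =', epsilon(1._r8)   ! about 2.2E-16 for real64
   print *, 'tiny(1._r8)    =', tiny(1._r8)      ! about 2.2E-308 for real64

   supply   = 1.0e-3_r8    ! hypothetical numerator
   o2demand = 1.0e-24_r8   ! small but physically meaningful demand

   ! An epsilon-based guard would skip this demand entirely, changing answers.
   print *, 'passes epsilon guard :', o2demand > epsilon(1._r8)
   ! A tiny-based guard keeps it and only rejects subnormal demands.
   print *, 'passes tiny guard    :', o2demand > small_demand

   if (o2demand > small_demand) then
      o2stress = min(supply / o2demand, 1._r8)
   else
      o2stress = 1._r8
   end if
   print *, 'o2stress =', o2stress
end program epsilon_vs_tiny
```

One caveat on the design: because huge(1._r8) * tiny(1._r8) is only about 4 for real64, a tiny-based guard narrows the overflow window rather than closing it completely. A demand just above tiny combined with a numerator larger than about 4 could still overflow.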

…flow' into next (PR #6483)

Replace >0._r8 check in CH4Mod with a small parameter set by tiny intrinsic function. The exact value of tiny(1._r8) may depend on the compiler and machine but is on the order of 1.E-308.

Tested on pm-cpu_intel and pm-cpu_gnu.

Fixes #6428

[BFB]

ndkeen added GNU GNU compiler related issues Atmospheric chemistry Land BGC and removed Atmospheric chemistry labels May 17, 2024

rljacob changed the title ~~SIGFPE: Floating-point exception with SMS_D_Ld1.ne30pg2_r05_IcoswISC30E3r5.WCYCLSSP370.pm-cpu_gnu.allactive-wcprodssp~~ SIGFPE: Floating-point exception in CH4Mod.F90 May 29, 2024

rljacob assigned peterdschwartz May 29, 2024

ndkeen mentioned this issue May 29, 2024

Hang in dp_coupling::d_p_coupling with newer module versions and compilers (GNU version 12.3) #6451

Open

peterdschwartz mentioned this issue Jun 21, 2024

fix underflow in methane calculation #6483

Merged

peterdschwartz closed this as completed in fff7243 Jul 11, 2024
