
EAM physgrid: high-order, property preserving remap #3141

Merged (16 commits), Sep 10, 2019
Conversation

ambrad (Member) commented Aug 19, 2019

Adds high-order, property-preserving remap between physics and GLL grids.

Provide a module, gllfvremap_mod, to remap between physics and GLL grids; its
use in dynamics/se is optionally enabled with the atmosphere namelist option
se_fv_phys_remap_alg=1 (the default, 0, uses the original remap algorithm). The remap

-- is mass conserving;
-- produces tracers that do not violate extrema established before the remap;
-- remaps tendencies only, so that the background state is not affected by remap;
-- has order of accuracy N, where N is the parameter in the physgrid specification pgN.

[BFB]

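The first two listed properties can be illustrated with a minimal 1-D sketch. This is not the E3SM implementation; the grids, weights, and operator names below are illustrative assumptions. A mass-conserving averaging map takes a "GLL" column to an "FV" column, a mass-conserving injection goes back, and only tendencies are remapped, so a zero tendency leaves the background state bit-for-bit unchanged.

```python
import numpy as np

# Hypothetical 1-D column: 4 "GLL" points with quadrature weights, 2 "FV" cells.
w_gll = np.array([1.0, 2.0, 2.0, 1.0]) / 6.0   # GLL weights (sum to 1)
w_fv  = np.array([0.5, 0.5])                   # FV cell areas

# GLL -> FV: each FV cell takes the weighted average of its two GLL points.
G2F = np.array([[w_gll[0], w_gll[1], 0.0, 0.0],
                [0.0, 0.0, w_gll[2], w_gll[3]]]) / w_fv[:, None]
# FV -> GLL: piecewise-constant injection (also mass conserving).
F2G = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

def mass(u, w):
    return float(np.dot(w, u))

u_gll = np.array([1.0, 3.0, 2.0, 4.0])
u_fv = G2F @ u_gll
assert np.isclose(mass(u_fv, w_fv), mass(u_gll, w_gll))  # mass is conserved

# Remap the tendency only: a zero physics tendency leaves the GLL state
# bit-for-bit unchanged, so the background state is unaffected by the remap.
tend_fv = np.zeros(2)
u_new = u_gll + F2G @ tend_fv
assert np.array_equal(u_new, u_gll)
```

The extremum and order-of-accuracy properties require the limiter and high-order reconstruction discussed in the review below.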
…id-ho-pp-remap

Conflicts:
	components/cam/src/dynamics/se/dp_coupling.F90
ambrad (Member Author) commented Aug 19, 2019

This branch will not run with the updated, better domain files until PR #3113 is merged.

rljacob (Member) commented Aug 20, 2019

PR #3113 has been merged.

ambrad and others added 2 commits August 19, 2019 18:39
…id-ho-pp-remap

Merge master to bring in PR 3113, which has better domain files for ne30pg2.
@@ -209,6 +211,7 @@ subroutine dyn_init1(fh, NLFileName, dyn_in, dyn_out)
! Initialize FV physics grid variables
if (fv_nphys > 0) then
call fv_physgrid_init()
if (se_fv_phys_remap_alg == 1) call gfr_init(par, elem, fv_nphys)
Contributor:

would it make more sense to have this inside fv_physgrid_init()?

@@ -598,6 +600,7 @@ subroutine stepon_final(dyn_in, dyn_out)
! Deallocate variables needed for the FV physics grid
if (fv_nphys > 0) then
call fv_physgrid_final()
if (se_fv_phys_remap_alg == 1) call gfr_finish()
Contributor:

Similar to the comment above, would it make sense to move this into fv_physgrid_final()?

Member Author:

Thanks; I'll do both.

ambrad (Member Author) commented Aug 20, 2019

I'm running this branch on cori and seeing substantial climo differences relative to before I merged master; I'm looking into the cause.

Edit: I strongly suspect the differences are due to the recent update of CLUBB. Here is master now vs master ~1.5 months ago, PRECT ANN, although so far averaged just over year 2. I'll run this out through 6 years, then provide updated data for pg2 vs np4 at this PR's SHA1.

Later: Yes, the differences are due to the CLUBB update. I've updated the physgrid results page with consistent model-vs-model results.

T_tmp(1:nphys_sq,:,:),uv_tmp(1:nphys_sq,:,:,:),&
om_tmp(1:nphys_sq,:,:),q_tmp(1:nphys_sq,:,:,:))
else
call gfr_dyn_to_fv_phys(par, dom_mt, tl_f, hvcoord, elem, ps_tmp, zs_tmp, &
Contributor:

Should we try to think of a naming scheme that makes it clearer what the two options are? Maybe something like dyn_to_fv_phys_low_order() and dyn_to_fv_phys_high_order()?
Maybe it's moot since there might not be a good reason to keep both in the long run.

Member Author:

There are 5 public API functions that dynamics/se uses. Please feel free to suggest names for each:

    gfr_init, &
    gfr_finish, &
    gfr_dyn_to_fv_phys, &
    gfr_fv_phys_to_dyn, &
    gfr_fv_phys_to_dyn_topo

If you would like me to rename the original functions, too, let me know what names you would like.

Contributor:

Initially, I thought it makes sense to use some indication of "high order" in the names, whether it's "high_order" or "HO", but that also seems a bit too general if a higher order scheme gets implemented later.
Would it be accurate to just call this "2nd order" and use "O2" in the front (or back) of the names instead of "gfr"?

ambrad (Member Author), Aug 22, 2019:

I'll be sure to mention in the added documentation (but see also the latex doc) that the OOA is N for pgN for u,v,T and max(2,N) for Q because of the strict limiter. So this current method has arbitrarily high order in N. But it's true: other methods might come in later. For example, I'm working on OOA > 1 for pg1 right now, and I might propose a boosted OOA for N < 3 later that would require a new halo exchange pattern to be implemented. On the other hand, I think we can expect any new method to be called through this interface, so I think it's safe to use the names you'd like without concern another interface will be introduced.

Contributor:

Oh, OK, I misunderstood the OOA. I guess the current naming convention is fine. Maybe it would be better to rename my scheme with something indicating "low order". In any case, let's just keep the current naming convention.

Contributor:

What is the gfr_ prefix for here?

Member Author:

(G)LL (F)V (r)emap. I tend to give public functions in a module a common prefix, like one does in C; see, e.g., the scalable space-filling curve module (sfcmap) and scalable grid initialization (sgi).

fld(i,k,2)=D(2,1)*v1 + D(2,2)*v2
end do
end do

Contributor:

What's going on here? Does this affect anything outside the physgrid mapping routines?

Member Author:

Good eye. This is inert code (i.e., it runs, but it has no effect) that I noticed while working on physgrid. I proposed to Mark and Oksana that I should remove this code chunk in this PR.

ambrad and others added 4 commits August 23, 2019 20:43
DCMIP1 experiments show that pg1 is too coarse, but perhaps this code will be of
use at some point. In the meantime, it's unit tested.
* Move gfr_init/finish to fv_physgrid_init/final.

* Reorganize file to group routines more logically; add delimiter between
  groups.

* Add more comments.

Also remove some 'optional' annotations that are no longer relevant.
end do

call dcmip2016_append_measurements(max_w,max_precl,min_ps,tl,hybrid)
end subroutine dcmip2016_test1_pg_forcing
Contributor:

These tests don't seem important for the main changes of the PR, but I noticed the DSS of precipitation and I'm curious why you would do that? I realize these tests are for stand-alone HOMME, so I'm guessing this has something to do with the output being on the GLL grid?

Member Author:

Right. Homme produces GLL-based output for these DCMIP tests, and precl is one output. So I need to remap it to the GLL grid (which includes DSSing it) for output.

do i = 1,np
p = change_coordinates(elem(ie)%spherep(i,j))
elem(ie)%state%ps_v(i,j,timeidx) = &
1.0d3*(1 + 0.05*sin(2*p%x+0.5)*sin(p%y+1.5)*sin(3*p%z+2.5))
Contributor:

Might be good to put a comment about where these magic numbers come from. I assume they are from the DCMIP protocol?

Member Author:

I will. They are not. It's just a bunch of C^inf functions with which to do convergence testing.

do j = 1,np
do i = 1,np
p = change_coordinates(elem(ie)%spherep(i,j))
tend(i,j,:) = 0.25_real_kind*sin(3.2*p%x)*sin(4.2*p%y)*sin(2.3*p%z)
Contributor:

Same as comment above about noting where these parameter values come from.

!! Test 4.
! Expensive test to determine mass conservation quality over many
! iterations.
#if 0
Contributor:

Are you going to leave this in? If so, it might be good to change this to a more descriptive CPP variable. Or is this used in other parts of HOMME?

Member Author:

I'll just take it out.

! Top-level data type and functions for high-order, shape-preserving FV <->
! GLL remap.
type, private :: GllFvRemap_t
integer :: nphys, npi
Contributor:

Might be good to explicitly state what "npi" is for, since it's not obvious.

Member Author:

Will do.

whannah1 (Contributor) left a comment:

The comments are still sparse and overly terse for my taste, but otherwise the code looks good to me. It's nice to see such thorough testing.

! Interfaces to support calling inside or outside a horizontally
! threaded region.
interface gfr_dyn_to_fv_phys
module procedure gfr_dyn_to_fv_phys_hybrid
Contributor:

Is _hybrid for threaded? What is _dom_mt for? _mt probably stands for multithreaded?

Member Author:

dom_mt is the name of the structure that contains threading information. _hybrid is the impl that is called when in a threaded region. _dom_mt is the impl that is called when not, and it then sets up a threaded region (using dom_mt) and calls _hybrid.
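This two-entry-point pattern can be sketched generically. The following is an illustrative Python stand-in for the Fortran interfaces, not the E3SM code; the names remap_hybrid and remap_dom_mt are assumptions mirroring the _hybrid/_dom_mt suffixes. One entry does the work for a single thread's element range; the other, called outside any threaded region, builds the thread decomposition itself and dispatches.

```python
from concurrent.futures import ThreadPoolExecutor

def remap_hybrid(elems, nets, nete, out):
    # Called from inside a threaded region: work only on [nets, nete).
    for ie in range(nets, nete):
        out[ie] = 2.0 * elems[ie]        # stand-in for the per-element remap

def remap_dom_mt(elems, out, nthreads=4):
    # Called from outside a threaded region: build the thread decomposition
    # (the role dom_mt plays in HOMME) and dispatch to the hybrid entry.
    n = len(elems)
    bounds = [(t * n // nthreads, (t + 1) * n // nthreads)
              for t in range(nthreads)]
    with ThreadPoolExecutor(nthreads) as ex:
        for nets, nete in bounds:
            ex.submit(remap_hybrid, elems, nets, nete, out)

elems = list(range(10))
out = [0.0] * 10
remap_dom_mt(elems, out)
```

The design benefit is that callers already inside an OpenMP-style region pay no extra region setup, while standalone callers get one created for them.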

call abortmp('gllfvremap_mod: nphys must be <= np')
end if
if (qsize == 0) then
call abortmp('gllfvremap_mod: qsize must be >= 1')
Contributor:

Why is qsize>0 needed?

Member Author:

Well, I suppose it isn't. But a use case for physgrid without any tracers seems contrived. Since I don't have a test for qsize=0, I prefer to abort on this b/c I think it's more useful to clearly say that an anomalous situation has arisen.

call get_temperature(elem(ie), wr2, hvcoord, nt)
call calc_p(hvcoord, elem(ie)%state%ps_v(:,:,nt), p)
call gfr_g2f_scalar(ie, elem(ie)%metdet, p, p_fv)
wr2 = wr2*(p/p0)**kappa
Contributor:

Why is here Exner pressure? Is this code only for theta-l?

Member Author:

No, it works for all dycores. The idea is to remap theta so that remap is adiabatic. theta_v would probably be even better, but I don't want to bring in qv.

if (nf > 1 .or. .not. gfr%boost_pg1) then
do ie = nets,nete
if (gfr%check) wr(:,:,1) = elem(ie)%state%phis
call limiter_clip_and_sum(np, elem(ie)%spheremp, gfr%qmin(1,1,ie), &
Contributor:

you limit topo?

Member Author:

Yes. This is up for discussion, along with everything else we're thinking about re: topo. But for now I feel it's safest not to introduce new topo extrema. So the topo remap problem is: remap such that the average height is conserved and so that no new extrema are created.
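The constrained topo remap described here (conserve the weighted average while creating no new extrema) can be sketched with a simple clip-and-redistribute limiter. This is an illustrative stand-in for the idea behind limiter_clip_and_sum, with assumed weights and bounds, not the HOMME routine itself.

```python
import numpy as np

def clip_and_sum(w, lo, hi, u):
    """Clip u into [lo, hi] pointwise, then restore the weighted mean by
    redistributing the clipped mass to points that still have slack."""
    v = np.clip(u, lo, hi).astype(float)
    m = np.dot(w, u - v)                   # mass removed/added by clipping
    for _ in range(50):                    # a few sweeps normally suffice
        if abs(m) < 1e-14:
            break
        slack = (hi - v) * w if m > 0 else (v - lo) * w
        s = slack.sum()
        if s <= 0:                         # bounds infeasible: give up
            break
        frac = min(1.0, abs(m) / s)
        dv = np.sign(m) * frac * slack / w
        v = v + dv
        m = m - np.dot(w, dv)
    return v

w = np.full(4, 0.25)                       # hypothetical subcell weights
u = np.array([0.0, 2.0, 0.5, 0.5])         # field with an overshoot above 1
v = clip_and_sum(w, 0.0, 1.0, u)
# v has the same weighted mean as u and lies within [0, 1].
```

Applied to phis with bounds taken from the pre-remap local extrema, this style of limiter conserves the average height without introducing new topo extrema, matching the problem statement above.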

end do
end subroutine run

subroutine gfr_check_api(hybrid, nets, nete, hvcoord, elem)
Contributor:

What does this routine do?

Member Author:

It runs a sequence of tests that are permitted to use gllfvremap_mod
only through its public API, i.e., the same routines called in the
full atmosphere. This is in contrast to the unit tests in
gllfvremap_mod.F90, which access internal functions. The API tests are
in three parts, as follows (copy-pasted from comments):

    !! Test 1.
    ! Test physgrid API.
    !   Test that if the tendency is 0, then the original field is
    ! recovered with error eps ~ machine precision.
    !   OOA for pgN should be min(N, 2). The OOA is limited to 2 because
    ! of the way the tendencies are created. Instead of setting the FV
    ! value to the average over the subcell, the sampled value at the
    ! cell midpoint (ref midpoint mapped to sphere) is used. This
    ! creates a 2nd-order error. Tests 2 and 3 test whether the remaps
    ! achieve design OOA, in particular OOA 3 for pg3 u, v, T, phis
    ! fields.

    !! Test 2.
    ! Test topo routines. For pgN, phis should have OOA min(N,2)
    ! because the field is limited.

    !! Test 3.
    ! Test FV fields that were not covered in the previous tests. This
    ! is done by copying them to look like tendencies.
    !   For pg4, u,v,T should have l2 errors that are eps because pg4
    ! reconstructs the fields exactly, even with DSS.
    !   For pg4, q should have l2 errors that converge at OOA >= 2. 2
    ! is the formal OOA b/c of the limiter. The limiter acts on the
    ! tendency, which in this case is exactly what is being examined.
    !   For pgN, N=1,2,3, u,v,T should have OOA N.
    !   For pgN, q should have OOA min(N,2).
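These convergence tests measure an observed order of accuracy by refining the grid and comparing errors. The idea can be sketched with a self-contained example: midpoint sampling of exact cell averages is a 2nd-order approximation, so halving the cell width should roughly quarter the error. This is only an illustration of the OOA-measurement technique, not any E3SM routine.

```python
import numpy as np

def cell_avg_error(n):
    # Exact cell averages of sin(x) on [0, pi] vs. midpoint-sampled values;
    # sampling at the cell midpoint approximates the average to 2nd order.
    x = np.linspace(0.0, np.pi, n + 1)
    exact = (np.cos(x[:-1]) - np.cos(x[1:])) / np.diff(x)   # exact averages
    mid = np.sin(0.5 * (x[:-1] + x[1:]))                    # midpoint samples
    return float(np.max(np.abs(mid - exact)))

e_coarse, e_fine = cell_avg_error(20), cell_avg_error(40)
ooa = float(np.log2(e_coarse / e_fine))   # observed order of accuracy, ~2
```

The same log-ratio of errors at successive resolutions is what the pgN tests check against the design OOA listed above.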

oksanaguba (Contributor) commented Sep 3, 2019

  1. I assume this is BFB because it does not touch Walter's original test in the nightlies.

  2. A suggestion: add a description to each of the routines in components/homme/src/share/gllfvremap_mod.F90.

  3. As a follow-up, it would be good to have a PR with at least one ne4 test with pg2 (or 3?).

  4. Minor question: if Qdp is remapped instead of Q, why are u, v, T, etc. not remapped as u*dp, ...? Similarly, why remap omega_p instead of omega? Also, I did not really follow what's going on with the Exner pressure in the remap code.

oksanaguba added the BFB label (PR leaves answers BFB) Sep 3, 2019
AaronDonahue (Contributor) commented Sep 3, 2019

I've been following the conversation a bit, and in response to Oksana's (4) above, this may be a good time to mention issue #2680 about how Exner pressure is calculated in dp_coupling and how it is used elsewhere.

ambrad (Member Author) commented Sep 3, 2019

  1. Yes.
  2. OK, I'll add more.
  3. Or I could do it in this PR with ne4pg2. Can I meet with you to learn how to add a newcase test?
  4. u,v,omega(_p) is a topic of discussion. One argument is: remap to conserve average velocity; another is, remap to conserve momentum. Re: T, theta*dp is in fact remapped as part of making it approximately (up to moisture) adiabatic. Re: omega_p, I was again thinking of conserving its average value, but maybe omega is better; not sure. @mt5555, any thoughts on converting omega_p to omega before remapping?

oksanaguba (Contributor) commented Sep 3, 2019

For a test, we can meet, but it is rather simple:

See the file cime/config/e3sm/tests.py for the list of tests. See the folder components/cam/cime_config/testdefs/testmods_dirs/cam/thetahy_ftype0 for shell commands and the namelist. The suffix thetahy_ftype0 is picked up when running the test SMS.ne4_ne4.FC5AV1C-L.cam-thetahy_ftype0, so that test's shell commands and namelist are used. From what I see, Walter's test does not have a separate folder in testmods_dirs, but one is needed if you want to run theta-l instead of preqx.

_Ln5 in a test name means running 5 time steps, to make the test shorter.

ambrad (Member Author) commented Sep 3, 2019

@AaronDonahue, I use p0 as defined in CAM (or in standalone Homme) in the calculation of Exner pressure. Does this answer the question?

check_p reports
    atm.log: gfr> ERROR dp p  0.000000000000000E+000  2.651938505852153E-011
in an ne30pg2 FC5AV1C-L run, meaning dp agrees exactly with dp3d but
these two defs of p agree only to ~10 digits:
    do k = 1,nlev
       p(:,:,k) = hvcoord%hyam(k)*hvcoord%ps0 + hvcoord%hybm(k)*ps
    end do
and
    dp_accum(:,:,1) = hvcoord%hyai(1)*hvcoord%ps0
    do k = 1,nlev
       dp_accum(:,:,2) = dp_accum(:,:,1) + dp3d(:,:,k)
       p(:,:,k) = half*(dp_accum(:,:,1) + dp_accum(:,:,2))
       dp_accum(:,:,1) = dp_accum(:,:,2)
    end do
This is because hybm and hyam do not give exactly the middle of the
layer. The first def is consistent with what the EAM driver
currently does, so I'm switching back to it.
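The two definitions in the quoted commit message coincide exactly only when the midpoint coefficients equal the averages of the interface coefficients. The sketch below uses hypothetical coefficients (not real EAM values) and defines hyam/hybm as exact interface midpoints, in which case the two p's agree to round-off; real coefficient files need not satisfy that property, which is the source of the ~10-digit agreement reported above.

```python
import numpy as np

ps0, ps = 1.0e5, 9.8e4                      # reference and surface pressure, Pa
# Hypothetical hybrid interface coefficients (not real EAM values):
hyai = np.array([0.0, 0.1, 0.2, 0.1, 0.0])
hybi = np.array([0.0, 0.0, 0.2, 0.6, 1.0])
# Midpoint coefficients defined here as exact interface midpoints:
hyam = 0.5 * (hyai[:-1] + hyai[1:])
hybm = 0.5 * (hybi[:-1] + hybi[1:])

# Definition 1: p directly from the midpoint coefficients.
p1 = hyam * ps0 + hybm * ps

# Definition 2: accumulate dp from the model top, then take the midpoint
# of each pair of interface pressures.
dp = np.diff(hyai) * ps0 + np.diff(hybi) * ps
pint = hyai[0] * ps0 + hybi[0] * ps + np.concatenate(([0.0], np.cumsum(dp)))
p2 = 0.5 * (pint[:-1] + pint[1:])
# With exact-midpoint hyam/hybm the two agree to round-off; coefficient
# files where hyam/hybm are not exact midpoints produce the small
# discrepancy seen in the log.
```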
AaronDonahue (Contributor):

@ambrad, I'm guessing that there is an internal "Exner" calculation used in Homme and that, as you stated, it is defined correctly based on p0. The issue I referred to is related to how "Exner" is calculated for use by the physics, which is (we believe) incorrect and different. The issue points out that this can lead to confusion when someone attempts to use state%exner from physics, when in fact it isn't the classic Exner formula.
In short, based on your response the issue is probably unrelated, but I didn't want to miss a chance to bring up what could be an inconsistency in the model and a potentially difficult-to-find bug.

But retain omega_p itself, with instructions in comments for what to
do when derived%omega_p is removed in favor of omega.
ambrad (Member Author) commented Sep 4, 2019

@oksanaguba, I believe I've addressed all your comments:

  1. omega is now remapped, with omega_p computations still remaining until an orthogonal PR that cleans up omega_p in general is made.
  2. We discussed u,v vs dp*u,dp*v in person.
  3. I added more comments. I think every routine now has a header doc.
  4. I discussed and worked out approved p and dp calculations with the group.
  5. With your help, I added test SMS_Ln5.ne4pg2_ne4pg2.FC5AV1C-L.cam-thetahy_pg2.

Thanks for the initial review round.

whannah1 (Contributor) commented Sep 4, 2019

@ambrad, can you summarize what you guys discussed about mass weighting the momentum fields? I'm curious about what your conclusions were.

ambrad (Member Author) commented Sep 4, 2019

@whannah1

The choices are to remap dp*(u,v,omega) or (u,v,omega). An additional
constraint is that we want and have implemented remap of (u,v) as follows:

  GLL (u,v) -> GLL contravariant velocity -> FV contravariant velocity -> FV (u,v).

A nonmandatory constraint is that I think we should do the same thing for each
of (u,v) and omega. At this point, it's a question of conserving linear
momentum in the reference element or average linear velocity in the reference
element. Or perhaps one could introduce a nonlinear remap operator that
conserves energy in the element. There's no sense of a global linear momentum to
conserve. The DSS will break any of these. In any case, one is not obviously
better than the others. I decided it's better not to give the impression of
conserving some well-defined momentum, so I went with remapping velocity
alone. This choice also nicely coincides with how we remap phis (no dp): we
conserve average height in the remap, not some sort of mass-weighted height.
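The distinction can be made concrete with a toy calculation (illustrative weights and values, not E3SM code): the target FV value differs depending on whether the remap conserves the weighted average of u or of dp*u, and the two coincide only when dp is constant.

```python
import numpy as np

w  = np.array([1.0, 2.0, 2.0, 1.0]) / 6.0   # source quadrature weights
u  = np.array([10.0, 12.0, 8.0, 9.0])       # velocity at source points
dp = np.array([90.0, 100.0, 110.0, 95.0])   # layer thickness (varies)

# Option 1 (the choice described above): conserve the average velocity.
u_avg = float(np.dot(w, u) / np.sum(w))

# Option 2: conserve reference-element "momentum" dp*u, then recover u
# by dividing by the remapped dp.
u_mom = float(np.dot(w, dp * u) / np.dot(w, dp))

# The two target values differ whenever dp is not constant across the
# source points, which is why the choice matters.
```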

ambrad (Member Author) commented Sep 6, 2019

I ran e3sm_developer on sandiatoss3.

oksanaguba (Contributor):

running merged next with developer suite on skybridge now.

oksanaguba (Contributor) commented Sep 10, 2019

Tests are OK (./cs.status.20190909_171401_5e8tyq), tested against

commit 45b42717ce0f027c439baed371190ced3508979a
Merge: a8bf285 50e7a78
Author: Oksana Guba <oksana.guba@gmail.com>
Date:   Fri Sep 6 15:30:57 2019 -0500

    Merge branch 'origin/whannah/atm/fix-DYN_OMEGA-output' into next (PR #3165)

oksanaguba added a commit that referenced this pull request Sep 10, 2019
…3141)

Adds high-order, property-preserving remap between physics and GLL grids.

Provide a module, gllfvremap_mod, whose use in dynamics/se is optionally enabled
with atmosphere namelist option se_fv_phys_remap_alg=1 (default 0 uses the
original remap algorithm), to remap between physics and GLL grids. The remap

-- is mass conserving;
-- produces tracers that do not violate extrema established before the remap;
-- remaps tendencies only, so that the background state is not affected by remap;
-- has order of accuracy N, where N is the parameter in the physgrid specification pgN.

[BFB]
oksanaguba (Contributor):

Merged to next; will need a bless for a new test tomorrow.

@oksanaguba oksanaguba merged commit 90f5e77 into E3SM-Project:master Sep 10, 2019
Comment on lines +1766 to +1771
integer :: ithr

ithr = omp_get_thread_num()
nets = dom_mt(ithr+1)%start
nete = dom_mt(ithr+1)%end
hybrid = hybrid_create(par, ithr, hthreads)
Member:

Should the access to dom_mt use ithr rather than ithr+1 for zero-length arrays? A seg-fault shows up with SMS_Ln5.ne4pg2_ne4pg2.FC5AV1C-L.anvil_intel.cam-thetahy_pg2 (an MPI-only run with 108 tasks for ATM) for tasks 96 and up, e.g.

106: forrtl: severe (174): SIGSEGV, segmentation fault occurred
106: Image              PC                Routine            Line        Source
106: e3sm.exe           0000000002981B51  Unknown               Unknown  Unknown
106: e3sm.exe           000000000297FC8B  Unknown               Unknown  Unknown
106: e3sm.exe           000000000290F104  Unknown               Unknown  Unknown
106: e3sm.exe           000000000290EF16  Unknown               Unknown  Unknown
106: e3sm.exe           000000000288B147  Unknown               Unknown  Unknown
106: e3sm.exe           0000000002896D40  Unknown               Unknown  Unknown
106: libpthread-2.12.s  00002B87996807E0  Unknown               Unknown  Unknown
106: e3sm.exe           000000000127A829  gllfvremap_mod_mp        1769  gllfvremap_mod.F90
106: e3sm.exe           00000000016897FA  inidat_mp_read_in         466  inidat.F90
106: e3sm.exe           00000000013CF77D  startup_initialco          54  startup_initialconds.F90
106: e3sm.exe           00000000011EB75F  inital_mp_cam_ini          42  inital.F90
106: e3sm.exe           0000000000500F40  cam_comp_mp_cam_i         159  cam_comp.F90
106: e3sm.exe           00000000004F7E39  atm_comp_mct_mp_a         312  atm_comp_mct.F90
106: e3sm.exe           000000000043B808  component_mod_mp_         257  component_mod.F90
106: e3sm.exe           000000000042A861  cime_comp_mod_mp_        1337  cime_comp_mod.F90
106: e3sm.exe           00000000004388B9  MAIN__                    122  cime_driver.F90

Contributor:

It should definitely be (ithr+1). I actually ran into a very similar issue, but I'm having trouble remembering what it was. I'm going to look back through some Slack conversations with Andrew to try to recall the details.

Member Author:

No. There are two separate issues here. I will need to fix the second.

The first is whether ithr or ithr+1 should be used for indexing in general, given that ithr starts at 0. This depends on how dom_mt is declared. If it's declared as a pointer, then it inherits the index space from the allocation of the pointer. If it's declared as an assumed-shape array, then it does not inherit the index space. This issue comes up enough that I've created a single-file reproducer of the various cases:

module ptr
contains
  subroutine ptr1(a)
    real, pointer, intent(in) :: a(:) ! pointer attribute means a has index space of allocation
    print *,'ptr1',a(0)
  end subroutine ptr1

  subroutine ptr2(a)
    real, intent(in) :: a(:) ! non-pointer attribute means a has default index space
    print *,'ptr2',a(1)
  end subroutine ptr2

  subroutine ptr3(a)
    real, intent(in) :: a(4:6) ! non-pointer attribute also allows reindexing
    print *,'ptr3',a(4)
  end subroutine ptr3

#if 0
  subroutine ptr4(a)
    real, pointer, intent(in) :: a(4:6) ! not allowed
    print *,'ptr4',a(4)
  end subroutine ptr4
#endif
end module ptr

program main
  use ptr

  real, pointer :: a(:)

  allocate(a(0:3))
  a(0) = 42
  a(1) = 2
  call ptr1(a)
  call ptr2(a)
  call ptr3(a)
  deallocate(a)
end program main

Thus, ithr+1 is correct in this case.

However, the second issue is a genuine bug. Here, for ne=4, there are 96 elements. Thus, if more than 96 procs are used, this code indexes a dom_mt that has 0 length. No index value is correct in that case. I need to fix this code so that if an MPI rank has 0 elements, it does nothing.
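The over-decomposition situation (more MPI ranks than elements) can be sketched as follows. The partitioning below is illustrative, not the E3SM decomposition: with 96 elements and 108 ranks, some ranks necessarily own an empty range, and the fix is to have those ranks skip the remap call entirely.

```python
def partition(nelem, nranks):
    # Contiguous block partition; ranks beyond nelem get empty ranges.
    base, rem = divmod(nelem, nranks)
    ranges, start = [], 0
    for r in range(nranks):
        cnt = base + (1 if r < rem else 0)
        ranges.append((start, start + cnt))
        start += cnt
    return ranges

ranges = partition(96, 108)
idle = sum(1 for a, b in ranges if a == b)   # ranks that own no elements

def do_remap(nets, nete):
    if nets == nete:
        return          # the guard: a rank with no elements does nothing
    # ... per-element work for [nets, nete) would go here ...

do_remap(*ranges[100])  # a rank past the last element returns immediately
```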

Member Author:

I've opened #3241 and will create a PR to fix this in the next few days.

Member:

Just curious, wouldn't this declaration work with dom_mt(ithr) access?

type (domain1d_t), pointer, intent(in) :: dom_mt(:)

Member Author:

To be clear, the issue you found relates to having more processes than elements. Declaration of dom_mt is orthogonal to that problem.

Re: declaring dom_mt: As a matter of personal taste, I don't like to declare something as a pointer unless it needs to be, as doing so narrows the scope of application of the subroutine. In this case, taking an assumed-shape array as input provides the greatest generality.

Member:

Yes, that works, but another seg-fault occurs at hybrid_create: need the par%dynproc check.

Member Author:

Right, exactly. Unless I'm mistaken, PR #3242 takes care of exactly the issue of including a dynproc check at all necessary spots. As I report in that PR, the 108-proc run for the 96-element case now completes. Thanks again for reporting the issue.

6 participants