Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ctrl/autodiff cleanup #471

Merged
merged 23 commits into from
Jun 16, 2021
Merged

ctrl/autodiff cleanup #471

merged 23 commits into from
Jun 16, 2021

Conversation

mjlosch
Copy link
Member

@mjlosch mjlosch commented May 6, 2021

What changes does this PR introduce?

This is part of a cleanup of pkg/ctrl and autodiff.

What is the current behaviour?

Many variables are declared in header files, but are really local variables, see part of #440, in particular:

  • clean up ctrl.h, e.g. define local arrays locally instead of in ctrl.h
  • reduce number of "unused variable" warnings by moving local declarations out of header files (e.g. AUTODIFF.h)

What is the new behaviour

local variables are removed from pkg/ctrl/ctrl.h and pkg/autodiff/AUTODIFF.h and declared locally. Number of "unused" warnings reduced.

Does this PR introduce a breaking change?

No, unless any of you customized files in your local code directories rely on these variables. In that case, you'll have to declare the variables in question locally.

Other information:

Draft PR in order to discuss further steps, if any.

Suggested addition to tag-index

o pkg/ctrl/autodiff

  • cleanup: move declarations of local variables in header files ctrl.h and AUTODIFF.h to the local variables section in the respective routines.
  • remove unused variables where this is straightforward
  • remove a few i/jmin/max variables
  • disentangle ALLOW_AUTODIFF/ALLOW_AUTODIFF_TAMC/ALLOW_OPENAD a bit more.

- part 1: AUTODIFF.h, ctrl.h
- instead declare them locally, to remove many "unused" warnings for
  tmpfld2/3d, ilev_?, max_lev?, fname_$varname
- also remove some locally declared unused variable
@mjlosch mjlosch added the adjoint Affects the adjoint model; label triggers full OpenAD test label May 6, 2021
@mjlosch mjlosch requested a review from jm-c May 6, 2021 20:09
@mjlosch
Copy link
Member Author

mjlosch commented May 6, 2021

@jm-c
I created a Draft-PR, so that we can discuss the changes here before I actually implement them.

The next step is to move act?, max?, and the keys from tamc.h, but I am not sure if that's the best idea, because there are many versions of tamc.h around (not only in the verification experiments), so that this step will break any existing setup (although the fix would be simple: just remove the variables for your local tamc.h. What do you think?

@mjlosch
Copy link
Member Author

mjlosch commented May 6, 2021

I guess I will need help with debugging the OpenAD tests. I don't even think I can compile the forward code without OpenAD installed and set up. How annoying.

@jm-c
Copy link
Member

jm-c commented May 7, 2021

@mjlosch I can take a look at OpenAD setup.
Regarding tamc.h, we have postponed this cleaning for a long time (> 20.yrs); we know it's useful to be able to check for un-used variables (it does help to find some issues) but, practically, cannot be used when local variables are declared in header file that are shared by so many routines.
we could add a bad CPP options (in CPP_OPTIONS.h ?) to enable to remove the correct variable declaration but it's a bit complicated for just few declarations to remove in tamc.h so not sure it will really help.

@mjlosch
Copy link
Member Author

mjlosch commented May 7, 2021

I don't like the CPP option idea too much. If you think it's OK to modify tamc.h this substantially, I'll go ahead an give it a try (preferably only in one commit). We can always revert if it does not look good. Should I do this before you look at OpenAD or should I wait?

@jm-c
Copy link
Member

jm-c commented May 10, 2021

I took a look at global_ocean.90x40x15 OpenAD error. I tried gdb but did not help to see where the error comes from inside S/R OpenAD_external_forcing_surf (file: external_forcing_surf_cb2m_oad.f). Then I diff it with version from unmodified OpenAD built but did not help (given that many variables are numbered and these numbers change). Note that if I were to compare external_forcing_surf.f I would not see any difference.
I can see few things to try:

  1. go back and test one change at a time (e.g., just tmpfld2d & tmpfld3d);
  2. put print statement in external_forcing_surf_cb2m_oad.f (but it's going to be painful);
  3. try to use a better debugger than the one I tried (gdb) on engaging.

@mjlosch
Copy link
Member Author

mjlosch commented May 11, 2021

I am not sure, how I can contribute to debugging the OpenAD problem (where can I find instructions to download and setup OpenAD?).

  • There are 5 failing checks (global_ocean.cs90x40x15, OpenAD, tutorial_global_oce_biogeo, tutorial_global_oce_optim, tutorial_tracer_adjsens) and 3 passing checks (halfpipe_streamice, hs94.1x64x5, isomip). What do they have in common?
  • as far as I can see, the only changes that have the potential to affect OpenAD experiments are the fields tmpfld2d, tmpfld3d, and potentially the many deprecated filenames (e.g. character*( 80) fname_theta(3)), but these are only used in ctrl_pack/unpack. Everything else is just taf-keys and integers related to the taf-checkpointing that is not used with OpenAD.
  • the change in ctrl_map_ini_genarr.F: CHARACTER*(MAX_LEN_FNAM) fnamebase (from 80 to MAX_LEN_FNAME) is a bit suspicious. That was necessary, only to suppress a warning (CHARACTER expression will be truncated in assignment (80/512) ..., but I don't think that this should make any difference.

@mjlosch
Copy link
Member Author

mjlosch commented May 13, 2021

@jahn found a fix so that it works with:

before PR 471 we had in ctrl.h (and obviously this works):

cph      common /controlvars_r/
cph     &                        tmpfld2d
cph     &                      , tmpfld3d
     _RL tmpfld2d(1-olx:snx+olx,1-oly:sny+oly,nsx,nsy)
     _RL tmpfld3d(1-olx:snx+olx,1-oly:sny+oly,nr,nsx,nsy)

Now when you put back the common block:

     common /controlvars_r/
    &                        tmpfld2d
    &                      , tmpfld3d
     _RL tmpfld2d(1-olx:snx+olx,1-oly:sny+oly,nsx,nsy)
     _RL tmpfld3d(1-olx:snx+olx,1-oly:sny+oly,nr,nsx,nsy)

and this works, too.

Leaving OpenAD aside, tmpfld2/3d clearly do not need to be in a common block, because they are always used as local variable arrays. The purpose of this PR #471 is to clean that up, declare the variables locally and by this means get rid of the unused warning messages.

If we now instead put tmpld2/3d back into a common block again (provided this works with OpenAD), this would get rid of the unused warnings, but would have these fields defined globally. Does it matter at all?

@jm-c @heimbach please comment. In particular, Patrick, can you remember, why you commented out the common blocks? in 57b21c8 (6 years ago, ...)

@jm-c
Copy link
Member

jm-c commented May 13, 2021

@mjlosch changing these 2 local arrays to global one (in common block) is not really the way we want to go.
So why not adding them back (as local array) in header file ctrl.h just for OpenAD (#ifdef ALLOW_OPENAD) ? I don't think it's worth spending too much time right now trying to cleanup OpenAD stuff.

@heimbach
Copy link
Collaborator

heimbach commented May 13, 2021 via email

@mjlosch
Copy link
Member Author

mjlosch commented May 13, 2021

@heimbach apparently having the common block now seems to work, but we don't really want to have them, do we?

@jm-c are you suggesting that I put back into "ctrl.h":

#ifdef ALLOW_OPENAD
     _RL tmpfld2d(1-olx:snx+olx,1-oly:sny+oly,    nsx,nsy)
     _RL tmpfld3d(1-olx:snx+olx,1-oly:sny+oly,nr,nsx,nsy)
#endif

and then add a #ifndef ALLOW_OPENAD around all local declarations of tmpfld2d?
That's actually not so many files as far as I can see (6 plus the ones in the local code directories.)

@jm-c
Copy link
Member

jm-c commented May 13, 2021

@mjlosch Yes, this is what I had in mind. I don't think it's a bad compromise given where we are now with OpenAD.

use CPP-flag ALLOW_OPENAD to switch declarations of tmpfld2/3d in and
out.
@mjlosch
Copy link
Member Author

mjlosch commented May 13, 2021

That worked really well ... hmph.

@jahn
Copy link
Member

jahn commented May 13, 2021

FWIW, the tmpfld*d related errors with OpenAD also go away if one initializes the local fields to zero in each subroutine.

@mjlosch
Copy link
Member Author

mjlosch commented May 14, 2021

I like that solution much better than what I have done now. @jahn it would be great if you could make this change, i.e. remove the declaration again from ctrl.h and all of the #ifndef ALLOW_OPENAD that I put in with this d37aded .
f8b1189 contains two fixes in the_main_loop.F that I would like to keep (and the obvious fix in ctrl.h that would become obsolete with your fix).
Alternatively, I can do this again "im Blindflug" ...

@jahn
Copy link
Member

jahn commented May 14, 2021

Shall I do this between "#ifdef ALLOW_OPENAD"? @jm-c?

@mjlosch
Copy link
Member Author

mjlosch commented May 14, 2021

If you asked me, then it's always a good idea to initialise fields before they are used. I didn't think of that because I just moved the declarations around. I don't think that we need #ifdef ALLOW_OPENAD.

@jahn
Copy link
Member

jahn commented May 14, 2021

There is one occurrence of tmpfld2d/3d in pkg/admtlm/admtlm_bypassad.F where the fields are clearly used uninitialized. The first use is this:

tmpfld2d(i,j,bi,bj) = tmpfld2d(i,j,bi,bj) + adetan(i,j,bi,bj)

Can someone explain to me why this is?

@mjlosch
Copy link
Member Author

mjlosch commented May 14, 2021

To me this looks like very old and unmaintained code. Do we have an example experiment for this?
It looks like tmpfld2/3d should be declared and initialised to zero here (because they are never initialised anywhere in the code so far). I don't think they need to be in a common block for this code, because the subroutines adread_active_read_* accumulate the sensitivities in files. I don't understand why the 3d sensitivities are accumulated on tmpfld3d AND in files where they are clearly mixed. This doesn't look right, but then again, I don't know who this package works.

We should probably leave this as it is, so that it is clear that it needs work, if someone tries to use the code?

@mjlosch
Copy link
Member Author

mjlosch commented May 15, 2021

@jahn @jm-c @heimbach
I had another quick look at this. I tried to compile this package (just the forward part) with global_with_exf (some random choice with exf, ECCO_CTRL_DEPRECATED defined) on the current upstream/master (I just added admtlm to code_ad/packages.conf) and there a numerous errors at compile time for these files:

admtlm_dsvd2model.F
admtlm_map.F
admtlm_metric.F
admtlm_model2dsvd.F
grdchk_main.F

the errors are related to fname_obcs* and some xx_*_file, adobjf_state_fine, g_objf_state_final not being defined. Note that this is not for this branch, but for the upstream/master branch.

Maybe Patrick can confirm, but my impression is that this package is basically an example of how to use the AD-code to compute leading growing modes (perform an SVD to get the larges singular vectors), but that this hasn't been used in a long while, not maintained and that it is not expected to work out of the box.

It wouldn't take too much effort to fix the compile time errors, but it would take (me) quite some time (which I will not invest) to make this work properly. I suggest that we leave this alone for now and, if we decide to bring this code up-to-date, have a separate PR for this.

@jahn
Copy link
Member

jahn commented May 17, 2021

Thanks for looking into this. I'll just put the declaration, so as to not introduce yet another error.

I also left the code_oad files alone. Initialization doesn't seem necessary and I did not want to make these genarr code examples too messy.

@mjlosch
Copy link
Member Author

mjlosch commented May 28, 2021

I think I am done here. There's much more to clean up, but I would like to keep the number of modified files below 200 (-:
As far as I can see there are only a few unused-variable-warnings left, for which it would be very tedious to use an endless if defined THIS || defined THAT || ... list to remove them.

@jm-c
Copy link
Member

jm-c commented Jun 9, 2021

@mjlosch 2 preliminary comments:

  1. There is a new "conflict" that pops up after I merged-in PR KPP: some cleanup and regularisation #346, might be good to fix it.
  2. When I run "testreport -adm", there are many AD experiments that do not compile:
    1D_ocean_ice_column, bottom_ctrl_5x5, global_ocean.cs32x15, global_with_exf, lab_sea, tutorial_dic_adjoffline, tutorial_global_oce_biogeo, tutorial_global_oce_optim.
    Could you check that you get same problem ?

@mjlosch
Copy link
Member Author

mjlosch commented Jun 10, 2021

Fixed the conflict. The compiler errors were due to the previous merge with master fe82ba4, not sure why it didn't show up with on my Apple computer; now fixed with 6ba58b4 (where I totally screwed up the commit message ...)

@jm-c
Copy link
Member

jm-c commented Jun 14, 2021

@mjlosch I started to look over some of the changes here but, since there are many updated files, I am not done yet. There is one thing that might be useful to add to this PR, because it's closely related to the changes already in: Normally, tamc.h should not be included in any set-up that uses OpenAD, and could, in theory, be removed form files "cb2mFiles" (as you did with tamc_keys). Do you think it's worth to try ?

And unrelated: I was going to make a PR on verification_other with modifications needed after this PR is merged, unless you are already ready to do this (I started this but not yet done).

Copy link
Member

@jm-c jm-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good to me. Some changes in kpp_calc.F related to useShelfIce might need some little adjustment but over all looks like every thing is in order.

pkg/kpp/kpp_calc.F Outdated Show resolved Hide resolved
pkg/kpp/kpp_calc.F Outdated Show resolved Hide resolved
@mjlosch
Copy link
Member Author

mjlosch commented Jun 15, 2021

@jm-c
In one place in the_main_loop.F #include tamc.h is not enclosed by #ifdef ALLOW_AUTODIFF_TAMC, but instead it's in a large #ifndef ALLOW_OPENAD block, so I think we are safe.

And unrelated: I was going to make a PR on verification_other with modifications needed after this PR is merged, unless you are already ready to do this (I started this but not yet done).

Please go ahead, I haven't started that yet.

Copy link
Member

@jm-c jm-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jm-c
Copy link
Member

jm-c commented Jun 15, 2021

If this is OK for everyone, could merge this PR soon, like tomorrow.

@jm-c jm-c merged commit 7c50f07 into MITgcm:master Jun 16, 2021
@mjlosch mjlosch deleted the ctrl_cleanup branch June 16, 2021 15:27
jm-c added a commit to owang01/MITgcm that referenced this pull request Jun 18, 2021
- document fixing missing keys decalartion in shelfice_thermodynamics.F
- update description related to PR MITgcm#471 (ctrl_cleanup).
jm-c added a commit that referenced this pull request Jun 18, 2021
* ctrl/autodiff cleanup follow-up

* make tamc.h template consistent with recent changes

- forgot to adjust the tamc.h template (which is never used)
- remove all local keys and variables, add a comment and an include statement

* remove unnecessary "#include PACKAGES_CONFIG.h" from all tamc.h

* A little cleanup

* Document a fix + update previous comments

- document fixing missing keys decalartion in shelfice_thermodynamics.F
- update description related to PR #471 (ctrl_cleanup).

* fake update

Co-authored-by: Martin Losch <martin.losch@awi.de>
Co-authored-by: Jean-Michel Campin <jmc@mit.edu>
@jm-c
Copy link
Member

jm-c commented Jun 22, 2021

Stats: over the full set (18 built) of AD verification experiments, number of compiler warning for unused variables got reduced by more than 27, from > 31.k to 1146.

jm-c added a commit to jm-c/MITgcm that referenced this pull request Jun 23, 2021
Left over from PR MITgcm#471 where local keys have been moved out of tamc.h:
All pieces of code that is tested (in regression tests) got updated.
But some pieces that are not tested might not have been updated (e.g., see PR MITgcm#482):
 bling_bio.F : fixed for the case #define USE_BLING_V1
 bling_light.F : fixed for the case #define PHYTO_SELF_SHADING
 obcs_calc_stevens.F : missing AUTODIFF_OPTIONS.h + missing local keys declarations
 obcs_sponge.F : missing AUTODIFF_OPTIONS.h + missing local keys declarations
@jm-c jm-c mentioned this pull request Jun 23, 2021
jm-c added a commit that referenced this pull request Jun 26, 2021
* add missing AUTODIFF_OPTIONS.h

need to include AUTODIFF_OPTIONS.h to get ALLOW_AUTODIFF_TAMC defined

* Add missing declaration of local keys for TAF store-dir

Left over from PR #471 where local keys have been moved out of tamc.h:
All pieces of code that is tested (in regression tests) got updated.
But some pieces that are not tested might not have been updated (e.g., see PR #482):
 bling_bio.F : fixed for the case #define USE_BLING_V1
 bling_light.F : fixed for the case #define PHYTO_SELF_SHADING
 obcs_calc_stevens.F : missing AUTODIFF_OPTIONS.h + missing local keys declarations
 obcs_sponge.F : missing AUTODIFF_OPTIONS.h + missing local keys declarations

* Remove included "tamc.h" (not needed).

* Remove included "tamc.h" (not needed).

also change 1 ALLOW_AUTODIFF_TAMC into ALLOW_AUTODIFF

* replace ALLOW_AUTODIFF with ALLOW_AUTODIFF_TAMC around TAF-store-dir

* replace byte by kind, store cflZon/Mer and gFacZ/M to avoid warnings

* Remove inluded "tamc.h" if not needed.

also:
- remove included "AUTODIFF_OPTIONS.h" if only there for tamc.h.
- change 1 ALLOW_AUTODIFF_TAMC to ALLOW_AUTODIFF in seaice_calc_strainrates.F
- + minor edits (remove few trailing blank).

* Minor ALLOW_AUTODIFF <-> ALLOW_AUTODIFF_TAMC adjustment

* minor: change some indentation in comments

* Remove TAF store-dirs and included "tamc.h"

Storage dir for OB temp and salt are not necessary since it's done in forward_step.F
(in checkpoint_lev1_directives.h -> obcs_ad_check_lev1_dir.h).
Simplify the code by removing them here (in obcs_songe.f).

* Finish removing TAF store-dirs

Forgot to remove/cleanup the rest of this file in previous commit; done now.

* Switch to #define ALLOW_OBCS_STEVENS

Note: the Stevens OB code is not used in this set-up so that compiling this code is not
necessary (and does increase the number of written tapes) but does not change results.
The ONLY reason it is turned on here is to check that this code compiles in Adjoint set-up.

* document few updates in tamc.h included files.

Co-authored-by: Martin Losch <martin.losch@awi.de>
@jm-c jm-c mentioned this pull request Dec 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
adjoint Affects the adjoint model; label triggers full OpenAD test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants