Bugfix: write tracer restart files for all experiments. #124

TomasTorsvik · 2021-10-21T06:43:02Z

Suggestion for a bugfix. This prevents the call to restart_trcwt when expcnf is not 'cesm'.

It would probably be better to make the tracer restart file generation more general, but this may require some restructuring of the restart writing procedure. At the moment there are dependencies between restart_wt, restart_ocntrcwt and aufw_bgc in terms of the restart filename.

JorgSchwinger · 2021-10-21T21:49:33Z

Not sure whether we will need restarts for other types of configurations than 'cesm' in the future, but for now this should do the job. Thanks Tomas

matsbn · 2021-10-21T23:20:52Z

Although I have not tested it in a while, tracer restart should work for other tracers than HAMOCC's for all experiment configurations. This is done in restart_ocntrcwt, called by restart_trcwt, and it will not be possible to write restart for these tracers if the test on expcnf is done as suggested in this PR. Would a possibility be to place the expcnf test inside the HAMOCC CPP block of restart_trcwt? Of course it would be good to be able to restart HAMOCC in some standalone configurations, even if just for efficient testing.

TomasTorsvik · 2021-10-22T11:23:11Z

I got confused by some preprocessing flag settings. I think I will try to replicate the restart_ocntrcwt structure for aufw_bgc and restart_ocntrcrd for aufr_bgc, so that we can have the restart file functionality for testing.

TomasTorsvik · 2021-10-24T16:35:56Z

@matsbn @JorgSchwinger
I changed the pull request so that restart files are created regardless of expcnf settings (using 'blom' filename for 'cesm' setting). I found a CPP flag CCSMCOUPLED in restart_ocntrc[rd/wt] that I removed. This does not appear anywhere else in the source code that I could find.

The new version of the tracer restart setup defines restart file names in restart_trc[rd/wt] which are passed to restart_ocntrc[rd/wt] and auf[r/w]_bgc. This removes some code duplication, but code duplication remains in restart_trc[rd/wt]. It also kind of replicates code in restart_[rd/wt], so perhaps a more generic restart module would be a better solution?

Also, this removes the option to read in HAMOCC restart files from micom runs, but it looks like this option has already been removed elsewhere in the code. Is this OK, or should we aim to keep this functionality in place in master?

Finally, should this bug fix also be applied to the release-1.0 branch? If so, I should probably rebase the bugfix branch onto a common master / release-1.0 ancestor.

JorgSchwinger · 2021-10-25T08:39:04Z

Hi Tomas,

I didn't have a closer look, but we definitely don't want to remove the capability to restart from restart files with '.micom.' in them. We don't need to write such files, but we want to restart from them.

matsbn · 2021-10-25T09:37:33Z

I agree with @JorgSchwinger that we should still be able to restart from '.micom.' files.

TomasTorsvik · 2021-10-26T12:12:39Z

@matsbn @JorgSchwinger
Here is a new version of the bugfix. This is a more generic solution that relies on the index function to split up the original restart filename, and just replaces the restart label, e.g. 'r' with 'rbgc' or 'rtrc'. This requires that the filename has a standard format, i.e. there is a fixed number of separators '.' or '_' at the end of the filename. I expect (hope) this is a fairly safe assumption for the restart files. This procedure doesn't care what comes before the label, so either 'blom' or 'micom' should be fine.
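The label replacement described above can be sketched in shell. This is only an illustration, not the Fortran code in this PR (which uses the index function). The function name tracer_restart_name and the example filenames are hypothetical, and the sketch assumes the restart label is the third dot-separated field counted from the end of the filename.

```shell
#!/usr/bin/bash
# Hypothetical helper: swap the restart label (e.g. 'r') for a tracer
# label (e.g. 'rbgc' or 'rtrc'), splitting the filename from the right.
# Assumed layout: <anything>.<label>.<date>.<extension>
tracer_restart_name() {
  local fname="$1" label="$2"
  local ext="${fname##*.}"      # text after the last '.'
  local rest="${fname%.*}"      # drop the extension
  local date="${rest##*.}"      # date field
  rest="${rest%.*}"             # drop the date field
  local prefix="${rest%.*}"     # drop the old label; the prefix is left untouched
  printf '%s.%s.%s.%s\n' "$prefix" "$label" "$date" "$ext"
}

# The prefix is never inspected, so 'blom' and 'micom' are treated alike.
tracer_restart_name mycase.blom.r.0003-01-01-00000.nc rbgc
# prints mycase.blom.rbgc.0003-01-01-00000.nc
tracer_restart_name mycase.micom.r.0003-01-01-00000.nc rtrc
# prints mycase.micom.rtrc.0003-01-01-00000.nc
```

Splitting from the right is what makes the approach independent of the case name and model name in front of the label, as long as the number of trailing separators is fixed.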

I tried to run a hybrid run with restart files from NOINYOC_T62_tn21_27 (Jörg's earlier run). It starts up, but I get an error after running one day:
ERROR: ocn_run_mct:: Internal blom clock not in sync with Sync Clock
Both RUN_REFDATE and RUN_STARTDATE are 0101-01-01, and I have CONTINUE_RUN: FALSE. Any idea what can be wrong? (I don't think it's restart related, but I would like to know anyway)

JorgSchwinger

From the HAMOCC side, this looks ok, although I suspect that the error you get somehow is related to these changes.

This should be thoroughly tested in all possible combinations (with/without PNETCDF) and with .blom. and .micom. restart files, and also in coupled configuration.

TomasTorsvik · 2021-10-27T10:07:37Z

> From the HAMOCC side, this looks ok, although I suspect that the error you get somehow is related to these changes.
>
> This should be thoroughly tested in all possible combinations (with/without PNETCDF) and with .blom. and .micom. restart files, and also in coupled configuration.

I agree that some more testing is needed. I am able to run N1850 with micom restart files from piControl, but I didn't check PNETCDF.

I was able to trace the error to a 1-day difference between the BLOM internal clock and the clock for the coupled system. In ocn_run_mct I get output blom ymd= 1010102 and sync ymd= 1010103, so for some reason BLOM is one day behind the sync clock.
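For reading the two values quoted above, here is a small shell sketch that decodes them into dates. It assumes the ymd integers encode year*10000 + month*100 + day, which is consistent with the numbers in the log output. The function name ymd_to_date is hypothetical.

```shell
#!/usr/bin/bash
# Hypothetical helper: decode an integer of the assumed form YYYYMMDD
# (leading zeros dropped) into a YYYY-MM-DD string.
ymd_to_date() {
  local ymd="$1"
  # 10# forces base 10 so that an input with leading zeros is not read as octal
  printf '%04d-%02d-%02d\n' \
    $((10#$ymd / 10000)) $((10#$ymd / 100 % 100)) $((10#$ymd % 100))
}

ymd_to_date 1010102   # blom ymd, prints 0101-01-02
ymd_to_date 1010103   # sync ymd, prints 0101-01-03
```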

matsbn · 2021-10-28T14:07:05Z

With respect to the clock difference, it seems the internal BLOM date has not been correctly incremented to take into account that with hybrid run type, the ocean component will start after the first coupling time step. What is the ocean coupling frequency in the test you did, @TomasTorsvik?

TomasTorsvik · 2021-10-28T14:32:10Z

@matsbn
I didn't do anything with the ocean coupling frequency before, so I'm not sure what I should be looking for. In the env_run.xml file in the case folder I find:
<entry id="NCPL_BASE_PERIOD" value="day">
<entry id="OCN_NCPL" value="1">
for other components:
<entry id="ATM_NCPL" value="24">
<entry id="LND_NCPL" value="$ATM_NCPL">
<entry id="ICE_NCPL" value="$ATM_NCPL">
<entry id="ROF_NCPL" value="$ATM_NCPL">
Is this it, or are you thinking of something else?

matsbn · 2021-10-28T15:16:23Z

With OCN_NCPL=1 the ocean is coupled daily. The time management change associated with the major code restructuring (commit e2a4230) has probably not been properly tested with daily coupling (most CMIP6 simulations use a sub-diurnal coupling frequency). This should of course be corrected and is likely unrelated to the restart filename generation of this PR.
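To make the arithmetic explicit, here is a minimal shell sketch that turns the env_run.xml values into a coupling interval in seconds. It assumes that an NCPL value is the number of coupling intervals per NCPL_BASE_PERIOD. The function name coupling_interval_seconds is hypothetical, and only the "hour" and "day" base periods are handled.

```shell
#!/usr/bin/bash
# Hypothetical helper: coupling interval in seconds for a given
# NCPL_BASE_PERIOD and <COMP>_NCPL value (intervals per base period).
coupling_interval_seconds() {
  local base_period="$1" ncpl="$2" base_seconds
  case "$base_period" in
    hour) base_seconds=3600 ;;
    day)  base_seconds=86400 ;;
    *) echo "unsupported NCPL_BASE_PERIOD: $base_period" >&2; return 1 ;;
  esac
  echo $((base_seconds / ncpl))
}

coupling_interval_seconds day 1    # OCN_NCPL=1, prints 86400 (daily)
coupling_interval_seconds day 24   # ATM_NCPL=24, prints 3600 (hourly)
```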

TomasTorsvik · 2021-10-28T15:18:06Z

Thanks for the update. Should we make a new issue for the ocean coupling frequency?

TomasTorsvik · 2021-10-28T15:19:38Z

Also, I am getting errors when running with IOTYPE = 1. Have tested on Betzy, will test again on Fram.

The software_environment.txt file on Betzy has no mention of PnetCDF, so I suppose NorESM has not been configured to run with PnetCDF on Betzy?

JorgSchwinger · 2021-10-28T15:52:07Z

This is probably the T62_tn21 resolution? I've been running this a lot, and I think it also worked after the major code restructuring. I'll test this tomorrow.

TomasTorsvik · 2021-10-28T16:11:56Z

> This is probably the T62_tn21 resolution? I've been running this a lot, and I think it also worked after the major code restructuring. I'll test this tomorrow.

Yes, it is T62_tn21. I have been using restart files from NOINYOC_T62_tn21_27 and NOINYOC_T62_tn21_27_betzy.

TomasTorsvik · 2021-10-29T07:19:50Z

> Also, I am getting errors when running with IOTYPE = 1. Have tested on Betzy, will test again on Fram.
>
> The software_environment.txt file on Betzy has no mention of PnetCDF, so I suppose NorESM has not been configured to run with PnetCDF on Betzy?

I included the PnetCDF module for the test run on Betzy. I get bit-identical output for an N1850_piControl run when comparing with noresm2.0.5 (without PnetCDF), for daily output after 1 year, so I don't think there are any issues with IOTYPE = 1 for these code changes.

JorgSchwinger · 2021-10-29T10:45:57Z

@TomasTorsvik @matsbn

I did a short test with the NorESM release 2.0.5 (but with the current version of blom master), and I cannot reproduce any restart issue with the T62_tn21 resolution and daily coupling frequency. I restarted from NOINYOC_T62_tn21_betzy, and also from restart files generated during the run and didn't have any problems.

This suggests it must have something to do with the changes related to this PR.

TomasTorsvik · 2021-10-29T11:49:13Z

@JorgSchwinger
Then I guess I must have a problem with my run script, because I have the same problem with noresm2.0.5. Here is the script I am using.

#!/usr/bin/bash -f

PROJECT="nn2980k"
MODVERSION="noresm2.0.5"
MODEL="NORESM/${MODVERSION}"
MACHINE="betzy"
SRCROOT="/cluster/projects/${PROJECT}/${USER}/${MODEL}"

# Use current time for case directory
timestamp=$(date +"%Y%m%dT%H%M")

# OMIP1 compset (interannual forcing, transient CO2)
#COMPSET="NOIIAOC20TR"
#CASEDIR="NOIIAOC20TR_T62_tn21_test_2x1"

# Normal year forcing, constant CO2. A 2000 year model run with this compset is available
# at /projects/NS2980K/schwinger/NOINYOC_T62_tn21_27/.
# PERIOD : "ndays", "nmonths", "nyears"
COMPSET="NOINYOC"
GRID="T62_tn21"
NSTEP="1"
PERIOD="nyears"
CASEDIR="${COMPSET}_${GRID}_${MODVERSION}_${NSTEP}${PERIOD:1:1}_${timestamp}"


if [ -d "${CASEDIR}" ]; then
  echo "Case directory already exists. Please remove and start again"
  exit 1
fi

echo "=== Creating the case ==="

# "--pecount XS" will use 4 nodes only, suitable for fram-development queue
#${SRCROOT}/cime/scripts/create_newcase --case ${CASEDIR} --res T62_tn21 --mach fram --compset ${COMPSET} --project ${PROJECT} --pecount XS

# without pecount-option 11 nodes are chosen
${SRCROOT}/cime/scripts/create_newcase --case ${CASEDIR} --res ${GRID} --mach ${MACHINE} --compset ${COMPSET} --project ${PROJECT} --pecount L

cd "${CASEDIR}" || exit 1


#echo "=== Setup the case ==="
./case.setup

# Copy NOINYOC restart files (0003-01-01) to run directory
RESTDIR="/cluster/projects/nn2980k/NorESM_restart/NOINYOC_T62_tn21_noresm2.0.5/0003-01-01-00000/"
RUNDIR="${USERWORK}/noresm/${CASEDIR}/run/"
rsync -auv "${RESTDIR}" "${RUNDIR}"

# restart settings
./xmlchange RUN_TYPE=hybrid
./xmlchange CONTINUE_RUN=false
./xmlchange RUN_REFCASE=NOINYOC_T62_tn21_noresm2.0.5_1y_20211027T1437
./xmlchange RUN_REFDATE=0003-01-01
./xmlchange RUN_STARTDATE=0003-01-01

# One year does not fit into the time limit of 00:30:00 of the devel-queue, so the run will be killed
# when the time-limit is hit (of course you can make this nicer and choose something that fits)
./xmlchange STOP_OPTION=${PERIOD}
./xmlchange STOP_N=${NSTEP}
#./xmlchange CONTINUE_RUN=FALSE
#./xmlchange RESUBMIT=9
./xmlchange JOB_QUEUE=""                --subgroup case.run --force
./xmlchange JOB_WALLCLOCK_TIME=1:00:00  --subgroup case.run
./xmlchange JOB_WALLCLOCK_TIME=1:00:00  --subgroup case.st_archive


#echo "=== Build the case ==="
#./preview_namelists
./case.build


#echo "=== Submit the case ==="
./case.submit

JorgSchwinger · 2021-10-29T17:44:01Z

Ok, I have never tested a hybrid restart for this. I just tested restarting.

TomasTorsvik · 2021-10-31T21:58:10Z

> Ok, I have never tested a hybrid restart for this. I just tested restarting.

Normal restarting ("RESUBMIT", "CONTINUE_RUN") has been tested and works fine, so at least it doesn't break anything that was already working. I suppose hybrid restart is a separate issue (I already created one in the issue tracker).

TomasTorsvik added the bug label Oct 21, 2021

TomasTorsvik requested review from matsbn and JorgSchwinger October 21, 2021 06:43

TomasTorsvik self-assigned this Oct 21, 2021

TomasTorsvik linked an issue Oct 21, 2021 that may be closed by this pull request

Writing tracer restart files only work for 'cesm' experiment? #123

Closed

JorgSchwinger approved these changes Oct 21, 2021

TomasTorsvik marked this pull request as draft October 22, 2021 11:25

TomasTorsvik force-pushed the bugfix_write_tracer_restart branch from a6c0fe1 to 398e5f4 Compare October 22, 2021 16:01

TomasTorsvik marked this pull request as ready for review October 24, 2021 16:13

TomasTorsvik requested a review from JorgSchwinger October 25, 2021 05:39

TomasTorsvik changed the title ~~Bugfix: only write tracer restart files for 'cesm' experiments.~~ Bugfix: write tracer restart files for all experiments. Oct 27, 2021

TomasTorsvik added this to To do in release-1.2 via automation Oct 27, 2021

JorgSchwinger reviewed Oct 27, 2021

TomasTorsvik and others added 8 commits October 27, 2021 13:48

Restructure tracer restart file read and write.

9b60bcd

Create filenames in restart_trcrd/restart_trcwt. Pass filenames as arguments to ocntrc and hamocc rd/wt subroutines.

Generate file name on mnproc 1, and broadcast to other nodes.

2977fbb

Remove cpp flag CCSMCOUPLED (not defined). Simplify structure.

e6d57e2

bugfix: variable name change

b879f2d

Allow 'cesm' restart files from either blom or micom.

57c8951

Include new file to get restart file name for tracers.

07cd530

Fix typos ...

c29fbe4

For consistency, rename rstfnm_ocn to rstfnm_hamocc in hamocc_init.F

afd854c

TomasTorsvik force-pushed the bugfix_write_tracer_restart branch from d2d4f62 to afd854c Compare October 27, 2021 11:49

This was referenced Oct 29, 2021

BLOM date missmatch for hybrid run with OCN_NCPL=1 #127

Open

Update restart_trc[rd/wt] files to GF90 #128

Closed

Include optional keyword "back" in index search.

7d1a9ac

release-1.2 automation moved this from To do to In progress Nov 3, 2021

matsbn approved these changes Nov 3, 2021

TomasTorsvik merged commit 9190282 into NorESMhub:master Nov 3, 2021

release-1.2 automation moved this from In progress to Done Nov 3, 2021

TomasTorsvik deleted the bugfix_write_tracer_restart branch November 3, 2021 12:15
