Bugfix: write tracer restart files for all experiments. #124

Merged

Conversation

TomasTorsvik
Contributor

Suggestion for a bugfix. This prevents the call to restart_trcwt when expcnf is not 'cesm'.

It would probably be better to make the tracer restart file generation more general, but this may require some restructuring of the restart writing procedure. At the moment there are dependencies between restart_wt, restart_ocntrcwt and aufw_bgc in terms of the restart filename.

@TomasTorsvik TomasTorsvik added the bug label Oct 21, 2021
@TomasTorsvik TomasTorsvik self-assigned this Oct 21, 2021
@TomasTorsvik TomasTorsvik linked an issue Oct 21, 2021 that may be closed by this pull request
@JorgSchwinger
Contributor

Not sure whether we will need restarts for other types of configurations than 'cesm' in the future, but for now this should do the job. Thanks Tomas

@matsbn
Contributor

matsbn commented Oct 21, 2021

Although I have not tested it in a while, tracer restart should work for other tracers than HAMOCC's for all experiment configurations. This is done in restart_ocntrcwt, called by restart_trcwt, and it will not be possible to write restart for these tracers if the test on expcnf is done as suggested in this PR. Would a possibility be to place the expcnf test inside the HAMOCC CPP block of restart_trcwt? Of course it would be good to be able to restart HAMOCC in some standalone configurations, even if just for efficient testing.

@TomasTorsvik
Contributor Author

I got confused by some preprocessing flag settings. I think I will try to replicate the restart_ocntrcwt structure for aufw_bgc and restart_ocntrcrd for aufr_bgc, so that we can have the restart file functionality for testing.

@TomasTorsvik TomasTorsvik marked this pull request as draft October 22, 2021 11:25
@TomasTorsvik TomasTorsvik marked this pull request as ready for review October 24, 2021 16:13
@TomasTorsvik
Contributor Author

@matsbn @JorgSchwinger
I changed the pull request so that restart files are created regardless of expcnf settings (using 'blom' filename for 'cesm' setting). I found a CPP flag CCSMCOUPLED in restart_ocntrc[rd/wt] that I removed. This does not appear anywhere else in the source code that I could find.

The new version of the tracer restart setup defines restart file names in restart_trc[rd/wt] which are passed to restart_ocntrc[rd/wt] and auf[r/w]_bgc. This removes some code duplication, but code duplication remains in restart_trc[rd/wt]. It also kind of replicates code in restart_[rd/wt], so perhaps a more generic restart module would be a better solution?

Also, this removes the option to read in HAMOCC restart files from micom runs, but it looks like this option has already been removed elsewhere in the code. Is this OK, or should we aim to keep this functionality in place in master?

Finally, should this bug fix be applied also for the release-1.0 branch? If so, I should probably rebase the bugfix branch to a common master / release-1.0 ancestor.

@JorgSchwinger
Contributor

Hi Tomas,

I didn't have a closer look, but we definitely don't want to remove the capability to restart from restart files with '.micom.' in them. We don't need to write such files, but we want to restart from them.

@matsbn
Contributor

matsbn commented Oct 25, 2021

I agree with @JorgSchwinger that we should still be able to restart from '.micom.' files.

@TomasTorsvik
Contributor Author

TomasTorsvik commented Oct 26, 2021

@matsbn @JorgSchwinger
Here is a new version of the bugfix. This is a more generic solution that relies on the index function to split up the original restart filename, and just replaces the restart label, e.g. 'r' with 'rbgc' or 'rtrc'. This requires that the filename has a standard format, i.e. there is a fixed number of separators '.' or '_' at the end of the filename. I expect (hope) this is a fairly safe assumption for the restart files. This procedure doesn't care what comes before the label, so either 'blom' or 'micom' should be fine.
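The label replacement described above can be sketched in shell. This is only an illustration, not the Fortran code from the PR: the function name `replace_restart_label` is hypothetical, and it assumes the label is followed by exactly two separator-delimited fields (a date stamp and a file extension), as in a name of the form `<case>.blom.r.<date>.nc`.

```bash
# Hypothetical helper: swap the restart label (e.g. 'r') for another label
# (e.g. 'rbgc' or 'rtrc'). Assumes exactly two fields follow the label.
# The function body runs in a subshell, so its variables stay local.
replace_restart_label() (
  fname=$1
  new_label=$2
  sep=${3:-.}          # separator, '.' by default; '_' can be passed instead
  head=$fname
  tail=""
  # Peel the two trailing fields (extension, then date stamp) off the end.
  for _ in 1 2; do
    case "$head" in
      *"$sep"*) ;;
      *) echo "unexpected file name format: $fname" >&2; exit 1 ;;
    esac
    tail="${sep}${head##*"$sep"}${tail}"
    head="${head%"$sep"*}"
  done
  # What remains ends in '<sep><old label>'; everything before it is kept,
  # so it does not matter whether the name contains 'blom' or 'micom'.
  case "$head" in
    *"$sep"*) ;;
    *) echo "unexpected file name format: $fname" >&2; exit 1 ;;
  esac
  printf '%s%s%s%s\n' "${head%"$sep"*}" "$sep" "$new_label" "$tail"
)

replace_restart_label "mycase.blom.r.0003-01-01-00000.nc" rbgc
# prints mycase.blom.rbgc.0003-01-01-00000.nc
replace_restart_label "mycase.micom.r.0003-01-01-00000.nc" rtrc
# prints mycase.micom.rtrc.0003-01-01-00000.nc
```

Counting separators from the end of the name is what makes the approach indifferent to the prefix, but it also means a file name with a different number of trailing fields would be split in the wrong place.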

I tried to run a hybrid run with restart files from NOINYOC_T62_tn21_27 (Jörg's earlier run). It starts up, but I get an error after running one day:
ERROR: ocn_run_mct:: Internal blom clock not in sync with Sync Clock
Both RUN_REFDATE and RUN_STARTDATE are 0101-01-01, and I have CONTINUE_RUN: FALSE. Any idea what can be wrong? (I don't think it's restart related, but I would like to know anyway)

@TomasTorsvik TomasTorsvik changed the title Bugfix: only write tracer restart files for 'cesm' experiments. Bugfix: write tracer restart files for all experiments. Oct 27, 2021
@TomasTorsvik TomasTorsvik added this to To do in release-1.2 via automation Oct 27, 2021
Contributor

@JorgSchwinger JorgSchwinger left a comment

From the HAMOCC side, this looks ok, although I suspect that the error you get somehow is related to these changes.

This should be thoroughly tested in all possible combinations (with/without PNETCDF) and with .blom. and .micom. restart files, and also in coupled configuration.

@TomasTorsvik
Contributor Author

TomasTorsvik commented Oct 27, 2021

> From the HAMOCC side, this looks ok, although I suspect that the error you get somehow is related to these changes.
>
> This should be thoroughly tested in all possible combinations (with/without PNETCDF) and with .blom. and .micom. restart files, and also in coupled configuration.

I agree that some more testing is needed. I am able to run N1850 with micom restart files from piControl, but I didn't check PNETCDF.

I was able to trace the error to 1 day difference between the BLOM internal clock and the clock for the coupled system. In ocn_run_mct I get output blom ymd= 1010102 and sync ymd= 1010103, so for some reason BLOM is on a different clock cycle.
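To make the one-day offset explicit, the two `ymd` values can be decoded and compared. The sketch below assumes the integers are packed as `YYYYMMDD` without leading zeros and that the model uses a 365-day (no-leap) calendar; both are assumptions, and the function name `ymd_to_day_number` is hypothetical.

```bash
# Hypothetical helper: convert a packed ymd integer (e.g. 1010102 for
# year 101, month 01, day 02) to a running day number.
# Assumes a 365-day calendar and an argument without leading zeros
# (a leading zero would be read as octal by shell arithmetic).
ymd_to_day_number() (
  ymd=$1
  year=$(( ymd / 10000 ))
  month=$(( ymd / 100 % 100 ))
  day=$(( ymd % 100 ))
  days_before_month=0
  m=1
  # Sum the lengths of the months that precede the given month.
  for len in 31 28 31 30 31 30 31 31 30 31 30 31; do
    [ "$m" -lt "$month" ] || break
    days_before_month=$(( days_before_month + len ))
    m=$(( m + 1 ))
  done
  echo $(( year * 365 + days_before_month + day ))
)

blom_day=$(ymd_to_day_number 1010102)
sync_day=$(ymd_to_day_number 1010103)
echo "sync clock is ahead of BLOM by $(( sync_day - blom_day )) day(s)"
# prints: sync clock is ahead of BLOM by 1 day(s)
```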

@matsbn
Contributor

matsbn commented Oct 28, 2021

With respect to the clock difference, it seems the internal BLOM date has not been correctly incremented to take into account that with hybrid run type, the ocean component will start after the first coupling time step. What is the ocean coupling frequency in the test you did, @TomasTorsvik?

@TomasTorsvik
Contributor Author

@matsbn
I didn't do anything with the ocean coupling frequency before, so I'm not sure what I should be looking for. In the env_run.xml file in the case folder I find:
<entry id="NCPL_BASE_PERIOD" value="day">
<entry id="OCN_NCPL" value="1">
for other components:
<entry id="ATM_NCPL" value="24">
<entry id="LND_NCPL" value="$ATM_NCPL">
<entry id="ICE_NCPL" value="$ATM_NCPL">
<entry id="ROF_NCPL" value="$ATM_NCPL">
Is this it, or are you thinking of something else?

@matsbn
Contributor

matsbn commented Oct 28, 2021

With OCN_NCPL=1 it is daily ocean coupling. Probably, the time management change associated with the major code restructuring (commit e2a4230) has not been properly tested with daily coupling (most CMIP6 simulations use a sub-diurnal coupling frequency). This should of course be corrected and is likely unrelated to the restart filename generation of this PR.
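As a rough illustration of how the settings quoted from env_run.xml translate into coupling intervals: the `*_NCPL` value is the number of coupling steps per `NCPL_BASE_PERIOD`. The helper name `coupling_interval_seconds` is hypothetical, and only the `day` and `hour` base periods are handled in this sketch.

```bash
# Hypothetical helper: coupling interval in seconds for a given
# NCPL_BASE_PERIOD and number of coupling steps per base period.
coupling_interval_seconds() (
  base_period=$1
  ncpl=$2
  case "$base_period" in
    day)  base_seconds=86400 ;;
    hour) base_seconds=3600 ;;
    *) echo "base period not handled in this sketch: $base_period" >&2; exit 1 ;;
  esac
  # Integer division; exact whenever ncpl divides the base period evenly.
  echo $(( base_seconds / ncpl ))
)

coupling_interval_seconds day 1    # OCN_NCPL=1  -> prints 86400 (daily)
coupling_interval_seconds day 24   # ATM_NCPL=24 -> prints 3600 (hourly)
```

With these values the ocean exchanges fields once per day while the atmosphere, land, ice and runoff components couple every hour, which is the daily-coupling case discussed here.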

@TomasTorsvik
Contributor Author

Thanks for the update. Should we make a new issue for the ocean coupling frequency?

@TomasTorsvik
Contributor Author

TomasTorsvik commented Oct 28, 2021

Also, I am getting errors when running with IOTYPE = 1. Have tested on Betzy, will test again on Fram.

The software_environment.txt file on Betzy has no mention of PnetCDF, so I suppose NorESM has not been configured to run with PnetCDF on Betzy?

@JorgSchwinger
Contributor

This is probably the T62_tn21 resolution? I've been running this a lot, and I think it also worked after the major code restructuring. I'll test this tomorrow.

@TomasTorsvik
Contributor Author

TomasTorsvik commented Oct 28, 2021

> This is probably the T62_tn21 resolution? I've been running this a lot, and I think it also worked after the major code restructuring. I'll test this tomorrow.

Yes, it is T62_tn21. I have been using restart files from NOINYOC_T62_tn21_27 and NOINYOC_T62_tn21_27_betzy.

@TomasTorsvik
Contributor Author

> Also, I am getting errors when running with IOTYPE = 1. Have tested on Betzy, will test again on Fram.
>
> The software_environment.txt file on Betzy has no mention of PnetCDF, so I suppose NorESM has not been configured to run with PnetCDF on Betzy?

I included the PnetCDF module for the test run on Betzy. I get bit-identical output for an N1850_piControl run when comparing with noresm2.0.5 (without PnetCDF), for daily output after 1 year, so I don't think there are any issues with IOTYPE = 1 for these code changes.

@JorgSchwinger
Contributor

@TomasTorsvik @matsbn

I did a short test with the NorESM release 2.0.5 (but with the current version of blom master), and I cannot reproduce any restart issue with the T62_tn21 resolution and daily coupling frequency. I restarted from NOINYOC_T62_tn21_betzy, and also from restart files generated during the run and didn't have any problems.

This suggests it must have something to do with the changes related to this PR.

@TomasTorsvik
Contributor Author

@JorgSchwinger
Then I guess I must have a problem with my run script, because I have the same problem with noresm2.0.5. Here is the script I am using.

#!/usr/bin/bash -f

PROJECT="nn2980k"
MODVERSION="noresm2.0.5"
MODEL="NORESM/${MODVERSION}"
MACHINE="betzy"
SRCROOT="/cluster/projects/${PROJECT}/${USER}/${MODEL}"

# Use current time for case directory
timestamp=$(date +"%Y%m%dT%H%M")

# OMIP1 compset (interannual forcing, transient CO2)
#COMPSET="NOIIAOC20TR"
#CASEDIR="NOIIAOC20TR_T62_tn21_test_2x1"

# Normal year forcing, constant CO2. A 2000 year model run with this compset is available
# at /projects/NS2980K/schwinger/NOINYOC_T62_tn21_27/.
# PERIOD : "ndays", "nmonths", "nyears"
COMPSET="NOINYOC"
GRID="T62_tn21"
NSTEP="1"
PERIOD="nyears"
CASEDIR="${COMPSET}_${GRID}_${MODVERSION}_${NSTEP}${PERIOD:1:1}_${timestamp}"


if [ -d "${CASEDIR}" ]; then
  echo "Case directory already exists. Please remove and start again"
  exit 1
fi

echo "=== Creating the case ==="

# "--pecount XS" will use 4 nodes only, suitable for fram-development queue
#${SRCROOT}/cime/scripts/create_newcase --case ${CASEDIR} --res T62_tn21 --mach fram --compset ${COMPSET} --project ${PROJECT} --pecount XS

# without pecount-option 11 nodes are chosen
${SRCROOT}/cime/scripts/create_newcase --case ${CASEDIR} --res ${GRID} --mach ${MACHINE} --compset ${COMPSET} --project ${PROJECT} --pecount L

cd ${CASEDIR}


#echo "=== Setup the case ==="
./case.setup

# Copy NOINYOC restart files (0003-01-01) to run directory
RESTDIR="/cluster/projects/nn2980k/NorESM_restart/NOINYOC_T62_tn21_noresm2.0.5/0003-01-01-00000/"
RUNDIR="${USERWORK}/noresm/${CASEDIR}/run/"
rsync -auv ${RESTDIR} ${RUNDIR}

# restart settings
./xmlchange RUN_TYPE=hybrid
./xmlchange CONTINUE_RUN=false
./xmlchange RUN_REFCASE=NOINYOC_T62_tn21_noresm2.0.5_1y_20211027T1437
./xmlchange RUN_REFDATE=0003-01-01
./xmlchange RUN_STARTDATE=0003-01-01

# One year does not fit into the time limit of 00:30:00 of the devel-queue, so the run will be killed
# when the time-limit is hit (of course you can make this nicer and choose something that fits)
./xmlchange STOP_OPTION=${PERIOD}
./xmlchange STOP_N=${NSTEP}
#./xmlchange CONTINUE_RUN=FALSE
#./xmlchange RESUBMIT=9
./xmlchange JOB_QUEUE=""                --subgroup case.run --force
./xmlchange JOB_WALLCLOCK_TIME=1:00:00  --subgroup case.run
./xmlchange JOB_WALLCLOCK_TIME=1:00:00  --subgroup case.st_archive


#echo "=== Build the case ==="
#./preview_namelists
./case.build


#echo "=== Submit the case ==="
./case.submit

@JorgSchwinger
Contributor

Ok, I have never tested a hybrid restart for this. I just tested restarting.

@TomasTorsvik
Contributor Author

> Ok, I have never tested a hybrid restart for this. I just tested restarting.

Normal restarting ("RESUBMIT", "CONTINUE_RUN") has been tested and works fine, so at least it doesn't break anything that was already working. I suppose hybrid restart is a separate issue (I already created one in the issue tracker).

release-1.2 automation moved this from To do to In progress Nov 3, 2021
@TomasTorsvik TomasTorsvik merged commit 9190282 into NorESMhub:master Nov 3, 2021
release-1.2 automation moved this from In progress to Done Nov 3, 2021
@TomasTorsvik TomasTorsvik deleted the bugfix_write_tracer_restart branch November 3, 2021 12:15
Development

Successfully merging this pull request may close these issues.

Writing tracer restart files only work for 'cesm' experiment?
3 participants