Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restart files are read back in again right after being written out, unnecessarily #4513

Closed
worleyph opened this issue Sep 3, 2021 · 5 comments · Fixed by #4523
Closed

Restart files are read back in again right after being written out, unnecessarily #4513

worleyph opened this issue Sep 3, 2021 · 5 comments · Fixed by #4523
Assignees

Comments

@worleyph
Copy link
Contributor

worleyph commented Sep 3, 2021

In cime_comp._mod.F90 is the following code:

   !----------------------------------------------------------
   !| Write driver restart file
   !----------------------------------------------------------
   call cime_run_write_restart(drv_pause, restart_alarm, drv_resume)

...

   !----------------------------------------------------------
   !| RESUME (read restart) if signaled
   !----------------------------------------------------------
   if (len_trim(drv_resume) > 0) then
      if (iamroot_CPLID) then
         write(logunit,103) subname,' Reading restart (resume) file ',trim(drv_resume)
         call shr_sys_flush(logunit)
      end if
      if (iamin_CPLID) then
         call seq_rest_read(drv_resume, infodata,                          &
              atm, lnd, ice, ocn, rof, glc, wav, esp, iac,                 &
              fractions_ax, fractions_lx, fractions_ix, fractions_ox,      &
              fractions_rx, fractions_gx, fractions_wx, fractions_zx)
      end if
      ! Clear the resume file so we don't try to read it again
      drv_resume = ' '
   end if

In a 1 month initial run (with restart write), the restart files are read back in immediately after being written out, after which the job ends. Also occurs for a continuation run. For a 2 month initial run with restart writes every month, the restart files are read back in immediately after being written out each time (twice, once per month). So, perhaps this is needed to verify the restart write? If not, does it makes sense to read it back in right before the job ends?

@jedwards4b
Copy link
Contributor

The drv_resume string is part of the partially implemented pause/resume capability. This capability allows for the just written restart file to be manipulated by some external process (DA) before being read back into the model. The drv_resume variable should be '' unless the pause/resume functionality is invoked. If you are finding this not to be the case then I think you have identified a bug that should be corrected.

@jayeshkrishna
Copy link
Contributor

Some more context:
cime_run_write_restart(...,drv_resume) was added to ESMCI/cime in 53700fb

Old logic (left) vs New logic (right) : cime_comp_mod.F90:

      !----------------------------------------------------------                                        |          end if    
      !| RESUME (read restart) if signaled                                                               |          call seq_resume_get_files('x', resume_files)
      !----------------------------------------------------------                                        |          if (associated(resume_files)) then
      call seq_infodata_GetData(infodata, cpl_resume=drv_resume)                                         |             drv_resume = resume_files(driver_id)
      if (len_trim(drv_resume) > 0) then                                                                 |          end if
         if (iamroot_CPLID) then                                                                         |       end if
            write(logunit,103) subname,' Reading restart (resume) file ',trim(drv_resume)                |
            call shr_sys_flush(logunit)                                                                  |       !----------------------------------------------------------
         end if                                                                                          |       !| RESUME (read restart) if signaled
         if (iamin_CPLID) then                                                                           |       !----------------------------------------------------------   
            call seq_rest_read(drv_resume, infodata,                          &                          |       if (len_trim(drv_resume) > 0) then     
                 atm, lnd, ice, ocn, rof, glc, wav, esp,                      &                          |          if (iamroot_CPLID) then
                 fractions_ax, fractions_lx, fractions_ix, fractions_ox,      &                          |             write(logunit,103) subname,' Reading restart (resume) file ',trim(drv_resume)
                 fractions_rx, fractions_gx, fractions_wx)                                               |             call shr_sys_flush(logunit)
         end if                                                                                          |          end if 
         ! Clear the resume file so we don't try to read it again                                        |          if (iamin_CPLID) then
         drv_resume = ' '                                                                                |             call seq_rest_read(drv_resume, infodata,                          &
         call seq_infodata_PutData(infodata, cpl_resume=drv_resume)                                      |                  atm, lnd, ice, ocn, rof, glc, wav, esp, iac,                 &
      end if                   

@worleyph worleyph added the bug label Sep 3, 2021
@worleyph
Copy link
Contributor Author

worleyph commented Sep 3, 2021

Thanks @jedwards4b and @jayeshkrishna. Appears that we have a bug then, as the timers (via list of PIO routines called and time spent in the timer I added around seq_rest_read) and the cpl.log output both indicate that restart files are being read back in immediately following each restart write.

Note that the case I ran is

 -compset WCYCL1850-4xCO2 --res ne30pg2_EC30to60E2r2 

@worleyph
Copy link
Contributor Author

worleyph commented Sep 3, 2021

In the current logic ...

a) drv_resume is not initialized.

b) drv_resume is first set in the call to cime_run_write_restart, to

 drv_resume = ''

if restart_alarm is not true and drv_pause is not true.

c) If a restart_alarm is true (or drv_pause is true), then drv_resume is set to a file path in the call to seq_rest_write. It retains this value upon exit from this routine, which then triggers the following call to seq_rest_read .

I do not know what the logic is supposed to be here (set drv_resume = '' at the end of seq_rest_write?), so someone else needs to make the fix.

Also note that the immediately preceding logic associated with "RUN ESP MODEL" can also leave drv_resume set to a nonzero length string, but perhaps this is what is wanted there.

@rljacob, any suggestion on who to assign to fix this bug?

@worleyph worleyph changed the title [Question] Why are restart files read back in again right after being written and right before job ends? Restart files are read back in again right after being written out, unnecessarily Sep 3, 2021
@worleyph worleyph removed the question label Sep 3, 2021
@rljacob
Copy link
Member

rljacob commented Sep 3, 2021

I'll work on it. Yes when ESP model is run then drv_resume should be a nonblank string.

rljacob added a commit that referenced this issue Sep 8, 2021
Make drv_resume a logical and use it to determine if
a restart file should be read.  Introduce drv_resume_file and use it
where drv_resume was assumed to be a filename.

Also add a log message for the real restart read at the start of a run.

Fixes #4513
rljacob added a commit that referenced this issue Sep 14, 2021
Make drv_resume a logical and set it to FALSE to prevent read of restart immediately after it is written.
Introduce drv_resume_file to use where a filename is really needed.

Also add a log message for the real restart read and make the seq_io_read_avs log messages more informative.

Fixes #4513
rljacob added a commit that referenced this issue Sep 15, 2021
Make drv_resume a logical and set it to FALSE to prevent read of restart immediately after it is written.
Introduce drv_resume_file to use where a filename is really needed.

Also add a log message for the real restart read and make the seq_io_read_avs log messages more informative.

Fixes #4513
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants