Feature request: always report model throughput after finalize #1930

lizziel · 2023-01-13T16:23:20Z

Reporting the model throughput to log at the end of the run is currently tied to pFlogger and requires MAPL logger level INFO (see here).

It would be useful if it was always output regardless of logging settings. Currently both the throughput estimation per timestep and the model timing info are always printed and are not tied to pFlogger. End-of-run model throughput reporting would fall into the same category of information as these, and always doing it would not impact performance or clutter the log.

The code has a comment that it is going to be removed soon. We use this in GCHP as a handy performance metric so request that it stay. It also seems to be much more accurate than the throughput reported per timestep. For example, my 6-hr run has the following stats. If you do the math with the total run time the correct estimation is 48 days/day.

2019/07/01  Time: 06:00:00 Throughput(days/day)[Avg Tot Run]:         63.3         73.4        219.6
   TOTAL                      449.3732       0.0000     449.3732       0
           MAPL: INFO: Model Throughput:       48.008 days per day

The text was updated successfully, but these errors were encountered:

mathomp4 · 2023-01-13T16:38:44Z

@lizziel Huh. You are right. There is a comment from William that it will be removed. I'm not sure why. I depend on it quite a bit (of course, I wrote it so, well... 😄).

Obviously as you say, the reason it is more accurate overall is that the during-run timers never see the Finalize actions (and perhaps any History writing still going on). When I wrote that, I used the brain dead method of just wall time divided by model time. Nothing fancy.

Frankly, I think that comment probably should go away or maybe change. Perhaps @tclune had a thought to revamp how we calculate or report that a few years ago?

I guess the other question for @tclune is is there a "logger" that always exists? That is, even if a user doesn't provide a logging.yaml, is there still a way to use pFlogger to write it "always"?

tclune · 2023-01-13T17:08:57Z

Even if we revamp, it is clear the comment is wrong. People now rely on it, so it cannot be deleted.

The whole point of pflogger is to be able to control whether some outputs happen and also to control where they go. I want the timing profile to go through pflogger as in theory we really should want that in a separate file rather than hidden at the bottom of the console log. (Can be both, of course.)

Isn't INFO the default, which means this in "always on" unless you don't want it to be? Could be I've not thought it through.

mathomp4 · 2023-01-13T17:33:10Z

I just checked and, yes, if you don't pass in a logging.yaml file, then you still get the print.

But maybe what @lizziel is asking is what if you build MAPL without pFlogger support? Then I suppose the print is never done since it is going through logger?

Maybe there is there a way to "if pflogger". So we could do:

if (pflogger) then
   lgr => logging%get_logger('MAPL')
   call lgr%info("Model Throughput: %f12.3 days per day", model_days_per_day)
else
   write (*,*) "Model Throughput: %f12.3 days per day", model_days_per_day
end if

(with the right format statement of course). I suppose we could do it with #ifdef BUILD_WITH_PFLOGGER...

tclune · 2023-01-13T18:29:46Z

I'm pretty sure GCHP uses pFlogger - we've added several diagnostics in ExtData using that layer.

And having a switch on top of pFlogger is the way of insanity.

mathomp4 · 2023-01-13T18:33:30Z

And having a switch on top of pFlogger is the way of insanity.

Indeed. Since it looks like the print happens no matter what, I'll make a PR to at least remove the comment as we will want this.

lizziel · 2023-01-17T15:07:32Z

Yes, we do use pFlogger but recently changed the default setting for MAPL to be WARNINGS rather than INFO. However, we still want model throughput printed. I'll revisit the reasoning behind changing INFO to WARNINGS. I think we were trying to avoid certain prints that were showing up in the log.

tclune · 2023-01-17T15:45:31Z

You can leave the default settings for MAPL to be INFO, and we can spin off a sublogger for profiling. Then you can set the reporting to be INFO for that but WARNING for everything else.

I'll have to look at the code, but I thought I was already using a sublogger for profiling. But maybe not.

tclune · 2023-01-17T16:11:10Z

For the routine small throughput bursts, this line should change:

MAPL/gridcomps/Cap/MAPL_Cap.F90

Line 339 in 5d0c6ec

lgr => logging%get_logger('MAPL')

Should instead be

lgr => logging%get_logger('MAPL.profile')

And then the level for logging the profile can be set independently.

Things are a bit more complicated for the profiles of individual components. I see now that I never finished the other request to push those writes to pFlogger:

MAPL/generic/MAPL_Generic.F90

Lines 2327 to 2334 in 5d0c6ec

    
              report = reporter%generate_report(state%t_profiler) 
        
              write(OUTPUT_UNIT,*)'' 
        
              write(OUTPUT_UNIT,*)'Times for component <' // trim(comp_name) // '>' 
        
              do i = 1, size(report) 
        
                 write(OUTPUT_UNIT,'(a)')report(i) 
        
              end do 
        
              write(OUTPUT_UNIT,*)'' 
        
           end if

This could at least check the appropriate logger to see if its level is above some threshold. Baby step. The problem is should it check MAPL.profile or should it check <comp-name>.profile. I see merits in both. The former is easier to be global, but the latter is really the "right way". (TM).

tclune · 2023-01-17T16:12:19Z

And the same issue for the global timers:

MAPL/base/ApplicationSupport.F90

Lines 187 to 193 in 5d0c6ec

    
           if (my_rank == 0) then 
        
                 report_lines = reporter%generate_report(t_p) 
        
                 write(*,'(a,1x,i0)')'Report on process: ', my_rank 
        
                 do i = 1, size(report_lines) 
        
                    write(*,'(a)') report_lines(i) 
        
                 end do 
        
            end if

But at least there it is clearer that it should be MAPL.profile

lizziel · 2023-01-17T16:52:18Z

A profiling sublogger would be great. This reminds me, if there are changes to the loggers what is the best way for me to get that update? We have a logging.yml file that sits in the GEOS-Chem repository rather than be extracted from MAPL. Are the MAPL release notes the best place to get notified that GCHP's logging.yml should be updated for new features?

tclune · 2023-01-17T17:03:36Z

@lizziel Well pFlogger itself is very stable. (Or at least mostly neglected at this time.) So the only reason to update the YAML file would be to change your defaults or add new sections because someone has instrumented a new GEOS component. AFAIK, there is no current activity on that front either. But if there were, you could monitor our main YAML file: https://github.com/GEOS-ESM/GEOSgcm_App/blob/develop/logging.yaml (Has not changed since March 2022)

With MAPL3 there will probably be a bit more thought given to how to arrange MAPL specific loggers.

In the mean time, hopefully we can quickly provide the changes you have requested in this ticket.

lizziel · 2023-01-17T17:19:48Z

Thank you!

mathomp4 · 2023-01-19T19:05:53Z

I'm taking a look at this. First, I'm just working on the "Model Throughput" timer and if I take the suggestion of @tclune for a MAPL.profiler it does work. If we use our (GEOS') current logging.yaml which has:

   MAPL:
       handlers: [mpi_shared]
       level: WARNING
       root_level: INFO

   MAPL.PROF:
       handlers: [console_plain]
       propagate: FALSE
       level: WARNING
       root_level: INFO

we get out:

       profiler: INFO: Model Throughput:     1030.440 days per day

because it's just inheriting the MAPL logger setup.

but if we change to:

   MAPL.profiler:
       handlers: [console_plain]
       propagate: FALSE
       level: WARNING
       root_level: INFO

where we edit to use the old MAPL.PROF setup, we get the console_plain output:

Model Throughput:      968.064 days per day

which I kind of like. I'll try and figure out the timers now...

mathomp4 · 2023-01-19T20:22:15Z

@lizziel Can you look at #1938

I think if you update your logging.yml to have a new handler:

   console_plain:
      class: streamhandler
      formatter: plain
      unit: OUTPUT_UNIT
      level: INFO

and a new logger:

   MAPL.profiler:
       handlers: [console_plain]
       propagate: FALSE
       level: WARNING
       root_level: INFO

you should be able to do WARNING for all of MAPL, but set this at info for the profiler output

stale · 2023-03-21T03:13:01Z

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days, it will be closed. You can add the "long term" tag to prevent the Stale bot from closing this issue.

lizziel · 2023-03-21T14:11:51Z

Thanks @mathomp4!

mathomp4 assigned tclune and mathomp4 Jan 13, 2023

mathomp4 added the ❓ Question Further information is requested label Jan 13, 2023

mathomp4 mentioned this issue Jan 13, 2023

Remove comment saying throughput report will be removed #1932

Closed

mathomp4 mentioned this issue Jan 19, 2023

Create MAPL.profiler logger #1938

Merged

7 tasks

mathomp4 linked a pull request Jan 19, 2023 that will close this issue

Create MAPL.profiler logger #1938

Merged

7 tasks

stale bot added the ❄️ Stale This issue has been marked stale label Mar 21, 2023

lizziel closed this as completed Mar 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature request: always report model throughput after finalize #1930

Feature request: always report model throughput after finalize #1930

lizziel commented Jan 13, 2023

mathomp4 commented Jan 13, 2023

tclune commented Jan 13, 2023

mathomp4 commented Jan 13, 2023

tclune commented Jan 13, 2023

mathomp4 commented Jan 13, 2023

lizziel commented Jan 17, 2023 •

edited

Loading

tclune commented Jan 17, 2023

tclune commented Jan 17, 2023

tclune commented Jan 17, 2023

lizziel commented Jan 17, 2023

tclune commented Jan 17, 2023

lizziel commented Jan 17, 2023

mathomp4 commented Jan 19, 2023

mathomp4 commented Jan 19, 2023

stale bot commented Mar 21, 2023

lizziel commented Mar 21, 2023

Feature request: always report model throughput after finalize #1930

Feature request: always report model throughput after finalize #1930

Comments

lizziel commented Jan 13, 2023

mathomp4 commented Jan 13, 2023

tclune commented Jan 13, 2023

mathomp4 commented Jan 13, 2023

tclune commented Jan 13, 2023

mathomp4 commented Jan 13, 2023

lizziel commented Jan 17, 2023 • edited Loading

tclune commented Jan 17, 2023

tclune commented Jan 17, 2023

tclune commented Jan 17, 2023

lizziel commented Jan 17, 2023

tclune commented Jan 17, 2023

lizziel commented Jan 17, 2023

mathomp4 commented Jan 19, 2023

mathomp4 commented Jan 19, 2023

stale bot commented Mar 21, 2023

lizziel commented Mar 21, 2023

lizziel commented Jan 17, 2023 •

edited

Loading