Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: always report model throughput after finalize #1930

Closed
lizziel opened this issue Jan 13, 2023 · 16 comments · Fixed by #1938
Closed

Feature request: always report model throughput after finalize #1930

lizziel opened this issue Jan 13, 2023 · 16 comments · Fixed by #1938
Assignees
Labels
❓ Question Further information is requested ❄️ Stale This issue has been marked stale

Comments

@lizziel
Copy link
Contributor

lizziel commented Jan 13, 2023

Reporting the model throughput to log at the end of the run is currently tied to pFlogger and requires MAPL logger level INFO (see here).

It would be useful if it was always output regardless of logging settings. Currently both the throughput estimation per timestep and the model timing info are always printed and are not tied to pFlogger. End-of-run model throughput reporting would fall into the same category of information as these, and always doing it would not impact performance or clutter the log.

The code has a comment that it is going to be removed soon. We use this in GCHP as a handy performance metric so request that it stay. It also seems to be much more accurate than the throughput reported per timestep. For example, my 6-hr run has the following stats. If you do the math with the total run time the correct estimation is 48 days/day.

2019/07/01  Time: 06:00:00 Throughput(days/day)[Avg Tot Run]:         63.3         73.4        219.6
   TOTAL                      449.3732       0.0000     449.3732       0
           MAPL: INFO: Model Throughput:       48.008 days per day
@mathomp4
Copy link
Member

@lizziel Huh. You are right. There is a comment from William that it will be removed. I'm not sure why. I depend on it quite a bit (of course, I wrote it so, well... 😄).

Obviously as you say, the reason it is more accurate overall is that the during-run timers never see the Finalize actions (and perhaps any History writing still going on). When I wrote that, I used the brain dead method of just wall time divided by model time. Nothing fancy.

Frankly, I think that comment probably should go away or maybe change. Perhaps @tclune had a thought to revamp how we calculate or report that a few years ago?

I guess the other question for @tclune is is there a "logger" that always exists? That is, even if a user doesn't provide a logging.yaml, is there still a way to use pFlogger to write it "always"?

@mathomp4 mathomp4 added the ❓ Question Further information is requested label Jan 13, 2023
@tclune
Copy link
Collaborator

tclune commented Jan 13, 2023

Even if we revamp, it is clear the comment is wrong. People now rely on it, so it cannot be deleted.

The whole point of pflogger is to be able to control whether some outputs happen and also to control where they go. I want the timing profile to go through pflogger as in theory we really should want that in a separate file rather than hidden at the bottom of the console log. (Can be both, of course.)

Isn't INFO the default, which means this in "always on" unless you don't want it to be? Could be I've not thought it through.

@mathomp4
Copy link
Member

I just checked and, yes, if you don't pass in a logging.yaml file, then you still get the print.

But maybe what @lizziel is asking is what if you build MAPL without pFlogger support? Then I suppose the print is never done since it is going through logger?

Maybe there is there a way to "if pflogger". So we could do:

if (pflogger) then
   lgr => logging%get_logger('MAPL')
   call lgr%info("Model Throughput: %f12.3 days per day", model_days_per_day)
else
   write (*,*) "Model Throughput: %f12.3 days per day", model_days_per_day
end if

(with the right format statement of course). I suppose we could do it with #ifdef BUILD_WITH_PFLOGGER...

@tclune
Copy link
Collaborator

tclune commented Jan 13, 2023

I'm pretty sure GCHP uses pFlogger - we've added several diagnostics in ExtData using that layer.

And having a switch on top of pFlogger is the way of insanity.

@mathomp4
Copy link
Member

And having a switch on top of pFlogger is the way of insanity.

Indeed. Since it looks like the print happens no matter what, I'll make a PR to at least remove the comment as we will want this.

@lizziel
Copy link
Contributor Author

lizziel commented Jan 17, 2023

Yes, we do use pFlogger but recently changed the default setting for MAPL to be WARNINGS rather than INFO. However, we still want model throughput printed. I'll revisit the reasoning behind changing INFO to WARNINGS. I think we were trying to avoid certain prints that were showing up in the log.

@tclune
Copy link
Collaborator

tclune commented Jan 17, 2023

You can leave the default settings for MAPL to be INFO, and we can spin off a sublogger for profiling. Then you can set the reporting to be INFO for that but WARNING for everything else.

I'll have to look at the code, but I thought I was already using a sublogger for profiling. But maybe not.

@tclune
Copy link
Collaborator

tclune commented Jan 17, 2023

For the routine small throughput bursts, this line should change:

lgr => logging%get_logger('MAPL')

Should instead be

lgr => logging%get_logger('MAPL.profile') 

And then the level for logging the profile can be set independently.

Things are a bit more complicated for the profiles of individual components. I see now that I never finished the other request to push those writes to pFlogger:

MAPL/generic/MAPL_Generic.F90

Lines 2327 to 2334 in 5d0c6ec

report = reporter%generate_report(state%t_profiler)
write(OUTPUT_UNIT,*)''
write(OUTPUT_UNIT,*)'Times for component <' // trim(comp_name) // '>'
do i = 1, size(report)
write(OUTPUT_UNIT,'(a)')report(i)
end do
write(OUTPUT_UNIT,*)''
end if

This could at least check the appropriate logger to see if its level is above some threshold. Baby step. The problem is should it check MAPL.profile or should it check <comp-name>.profile. I see merits in both. The former is easier to be global, but the latter is really the "right way". (TM).

@tclune
Copy link
Collaborator

tclune commented Jan 17, 2023

And the same issue for the global timers:

if (my_rank == 0) then
report_lines = reporter%generate_report(t_p)
write(*,'(a,1x,i0)')'Report on process: ', my_rank
do i = 1, size(report_lines)
write(*,'(a)') report_lines(i)
end do
end if

But at least there it is clearer that it should be MAPL.profile

@lizziel
Copy link
Contributor Author

lizziel commented Jan 17, 2023

A profiling sublogger would be great. This reminds me, if there are changes to the loggers what is the best way for me to get that update? We have a logging.yml file that sits in the GEOS-Chem repository rather than be extracted from MAPL. Are the MAPL release notes the best place to get notified that GCHP's logging.yml should be updated for new features?

@tclune
Copy link
Collaborator

tclune commented Jan 17, 2023

@lizziel Well pFlogger itself is very stable. (Or at least mostly neglected at this time.) So the only reason to update the YAML file would be to change your defaults or add new sections because someone has instrumented a new GEOS component. AFAIK, there is no current activity on that front either. But if there were, you could monitor our main YAML file: https://github.com/GEOS-ESM/GEOSgcm_App/blob/develop/logging.yaml (Has not changed since March 2022)

With MAPL3 there will probably be a bit more thought given to how to arrange MAPL specific loggers.

In the mean time, hopefully we can quickly provide the changes you have requested in this ticket.

@lizziel
Copy link
Contributor Author

lizziel commented Jan 17, 2023

Thank you!

@mathomp4
Copy link
Member

I'm taking a look at this. First, I'm just working on the "Model Throughput" timer and if I take the suggestion of @tclune for a MAPL.profiler it does work. If we use our (GEOS') current logging.yaml which has:

   MAPL:
       handlers: [mpi_shared]
       level: WARNING
       root_level: INFO

   MAPL.PROF:
       handlers: [console_plain]
       propagate: FALSE
       level: WARNING
       root_level: INFO

we get out:

       profiler: INFO: Model Throughput:     1030.440 days per day

because it's just inheriting the MAPL logger setup.

but if we change to:

   MAPL.profiler:
       handlers: [console_plain]
       propagate: FALSE
       level: WARNING
       root_level: INFO

where we edit to use the old MAPL.PROF setup, we get the console_plain output:

Model Throughput:      968.064 days per day

which I kind of like. I'll try and figure out the timers now...

@mathomp4 mathomp4 linked a pull request Jan 19, 2023 that will close this issue
7 tasks
@mathomp4
Copy link
Member

@lizziel Can you look at #1938

I think if you update your logging.yml to have a new handler:

   console_plain:
      class: streamhandler
      formatter: plain
      unit: OUTPUT_UNIT
      level: INFO

and a new logger:

   MAPL.profiler:
       handlers: [console_plain]
       propagate: FALSE
       level: WARNING
       root_level: INFO

you should be able to do WARNING for all of MAPL, but set this at info for the profiler output

@stale
Copy link

stale bot commented Mar 21, 2023

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days, it will be closed. You can add the "long term" tag to prevent the Stale bot from closing this issue.

@stale stale bot added the ❄️ Stale This issue has been marked stale label Mar 21, 2023
@lizziel
Copy link
Contributor Author

lizziel commented Mar 21, 2023

Thanks @mathomp4!

@lizziel lizziel closed this as completed Mar 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
❓ Question Further information is requested ❄️ Stale This issue has been marked stale
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants