
Issues related to FastTimerService and HLT #39756

Closed
silviodonato opened this issue Oct 18, 2022 · 13 comments · Fixed by #39859

@silviodonato
Contributor

I just want to report here two problems with the FastTimerService and the HLT online DQM.

  1. DQM: HLT / TimerService / Running on AMD EPYC 7763 64-Core Processor with 24 streams on 32 threads

Here the timing seems off by a factor of 2. Our offline measurements showed that the timing should be above 300 ms at high pileup.

  2. The timing VsPU and VsSCAL plots are empty.

@cms-sw/hlt-l2

@cmsbuild
Contributor

A new issue was created by @silviodonato (Silvio Donato).

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor

assign hlt

@cmsbuild
Contributor

New categories assigned: hlt

@missirol, @Martin-Grunewald you have been requested to review this pull request/issue and eventually sign. Thanks

@missirol
Contributor

(Noted, I will try to understand this in the next few days, unless it is considered important to fix it asap.)

FYI: @fwyzard

@fwyzard
Contributor

fwyzard commented Oct 18, 2022

The timing VsPU and VsSCAL plots are empty.

The reason is that the FastTimerServiceClient responsible for filling those plots tries to read the information about luminosity and pileup from SCAL, which no longer exists in Run-3.

I inquired about it a few months ago, and according to @mmusich's answer the solution would be to change the code to read that information from the OnlineLuminosityRecord produced by the onlineMetaDataDigis.
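
For illustration only, here is a minimal sketch of how that information could be read in an event-by-event module. This is not the actual FastTimerServiceClient code (the real fix may well live in a DQM producer feeding the client); the module name and the input tag "onlineMetaDataDigis" are assumptions, while instLumi() and avgPileUp() are the accessors provided by OnlineLuminosityRecord in DataFormats/OnlineMetaData:

```cpp
// Illustrative sketch, not the actual FastTimerServiceClient code: a plain
// EDAnalyzer that reads the instantaneous luminosity and average pileup from
// the OnlineLuminosityRecord produced by the onlineMetaDataDigis module.
#include "DataFormats/OnlineMetaData/interface/OnlineLuminosityRecord.h"
#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Framework/interface/MakerMacros.h"
#include "FWCore/Framework/interface/global/EDAnalyzer.h"
#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include "FWCore/Utilities/interface/InputTag.h"

class OnlineLumiPileupReader : public edm::global::EDAnalyzer<> {
public:
  explicit OnlineLumiPileupReader(edm::ParameterSet const&)
      // "onlineMetaDataDigis" is assumed to be the label of the OnlineMetaData unpacker
      : lumiToken_{consumes<OnlineLuminosityRecord>(edm::InputTag("onlineMetaDataDigis"))} {}

  void analyze(edm::StreamID, edm::Event const& event, edm::EventSetup const&) const override {
    auto const& record = event.get(lumiToken_);
    // these two quantities replace what was previously read from SCAL
    float const instLumi = record.instLumi();
    float const pileup = record.avgPileUp();
    // ... here one would fill (or pass on) the timing-vs-lumi / timing-vs-pileup inputs ...
    (void)instLumi;
    (void)pileup;
  }

private:
  edm::EDGetTokenT<OnlineLuminosityRecord> const lumiToken_;
};

DEFINE_FWK_MODULE(OnlineLumiPileupReader);
```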

Unfortunately I never had time to work on the changes :-/

@fwyzard
Contributor

fwyzard commented Oct 18, 2022

Here the timing seems off by a factor of 2. Our offline measurements showed that the timing should be above 300 ms at high pileup.

@silviodonato keep in mind that as long as the CPU usage is below ~70%, it's almost like running without hyperthreading, so it would make sense to observe a timing roughly a factor of 2 (I'd expect ~1.8x) faster than on a fully used machine.
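
(As a rough back-of-the-envelope check with the numbers quoted above, assuming a ~1.8x hyperthreading factor: a job measuring ~300 ms/ev on a fully loaded machine would be expected to show roughly 300 / 1.8 ≈ 165 ms/ev on a mostly idle one, which is in the same ballpark as the apparent factor-of-2 difference in the DQM plot.)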

@fwyzard
Contributor

fwyzard commented Oct 18, 2022

Unfortunately I never had time to work on the changes :-/

By the way, if anyone else makes the necessary changes, I would suggest also renaming these plots to _vs_pileup and _vs_lumi ...

@fwyzard
Contributor

fwyzard commented Oct 18, 2022

OK, some more information: I've re-run over the first lumisections of run 360459 (the same run as in @silviodonato's plot), using conditions similar to what we have online (CMSSW_12_4_10, same menu and global tag, 8 jobs with 32 threads / 24 streams each), and looked at the CPU time (Silvio's plot is for CPU time).

Taking total - other (which is what the FastTimerService plots), I get 260 ms/ev.
If I zoom in on the DQM plot, we see that for the first 2-3 lumisections the CPU time measured on the HLT farm was indeed 258 ms/ev.
I'm looking at the first two lumisections because at the beginning of the run the HLT had buffered some data while it was loading the application, starting the jobs, and getting the first conditions -- so the whole farm runs at maximum capacity until the buffer has been drained.

Keeping all these effects in mind, I would say that the online measurement is in very good agreement with an online-like measurement done under the same conditions 👍🏻

One last comment is about the CPU time vs real (wall clock) time: of course what actually matters for keeping up with the L1 rate is the latter.

The plot for the real time looks similar, just a bit higher.
Zooming in on the first lumisections shows a similar effect, with a peak for the first lumisection around 298 ms/ev.
From my online-like measurement I get 417 - 118 = 299 ms/ev for the real time, which is also very consistent with the online value for the first lumisection.

So... the measurements done on the online machines reproduce pretty accurately the HLT timing measured online (better than I imagined before making this check).

And the comparison between the timing value of ~ 200 ms/ev (~210 ms/ev real) after the first lumisections and the initial peak of 260 ms/ev (~300 ms/ev real) gives us an indication of the effect of hyperthreading at the level of occupancy we have around pileup 50.
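
For quick reference, the numbers quoted above can be summarized as follows (values rounded):

                            CPU time      real time
  online-like re-run        260 ms/ev     299 ms/ev
  online DQM, first LSs     ~258 ms/ev    ~298 ms/ev
  online DQM, later LSs     ~200 ms/ev    ~210 ms/ev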

@silviodonato
Contributor Author

silviodonato commented Oct 19, 2022

Thanks a lot @fwyzard! So:

  • we should/can check the timing including the hyper-threading effect by looking at the first lumisections
  • the DQM plot doesn't include the "others" part (~10%).

I will keep the issue open for item 2 (which is not urgent).

@fwyzard
Contributor

fwyzard commented Oct 19, 2022

we should/can check the timing including the hyper-threading effect by looking at the first lumisections

Yes, but only for runs that start already in stable beams; otherwise there is very little data to run on.

the DQM plot doesn't include the "others" part (~10%).

Correct - and the difference is more significant for "real time" than for "CPU time".

@missirol
Contributor

To fix the empty plots, an attempt is in #39859.

@missirol
Contributor

missirol commented Nov 4, 2022

+hlt

@cmsbuild
Contributor

cmsbuild commented Nov 4, 2022

This issue is fully signed and ready to be closed.
