Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing FastTimerClient Hist Remaking Issue : 12_4_X #37402

Merged

Conversation

Sam-Harper
Copy link
Contributor

@Sam-Harper Sam-Harper commented Mar 29, 2022

PR description:

This PR is a bug fix doing two things

  1. properly checking if a histogram already exists so there is no needless remaking (the path was missing before so the hist was not found)
  2. removing the per lumisection making of the histograms, its just the same histograms being reset to exactly the same content each time
    update: 2) after review has been changed to keep this behavior as default as it is needed by the online DQM and it is now controlled by a parameter which defaults to keep the old behaviour.

point 1) is absolutely necessary. 2) is just nice but can be removed if needed.

This is an important fix as we can not harvest timing results over many lumisections without it. This is because its very very slow to change a histogram labels which this does when it thinks it has created a new histogram.

It superseeds: #36313

PR validation:

The histograms are identical before and after this PR with the exception that the number of entries now accurately reflects the number of times SetBinContent should be called (before it was that number times the number of LS +1)

if this PR is a backport please specify the original PR and why you need to backport that PR:

We'll need this in 12_3_0, its causing a real problem when running on reemulated L1 samples (which have lots of LS to gain stats).

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37402/29067

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @Sam-Harper (Sam Harper) for master.

It involves the following packages:

  • HLTrigger/Timer (hlt)

@cmsbuild, @missirol, @Martin-Grunewald can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @missirol, @silviodonato this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@missirol
Copy link
Contributor

please test

Doing the honours.

@missirol
Copy link
Contributor

type bugfix

To be backported to 12_3_X to ease HLT development.

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6c8f0d/23510/summary.html
COMMIT: a0aa423
CMSSW: CMSSW_12_4_X_2022-03-29-1100/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37402/23510/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3585896
  • DQMHistoTests: Total failures: 7
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3585866
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 47 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 200 log files, 45 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@missirol
Copy link
Contributor

A quick look back in the history led to #5305: "Adapt the DQM clients to the 7.2.x DQMEDHarvesting interface, which is necessary to properly update histograms online, and in general when running the DQM harvesting with the "Online" convention.".

Is it possible that this update per-LS has to do with displaying the updated plots in the online DQM (or something along these lines..)?

Would it make sense to add a config parameter here to enable/disable per-LS updates? (default could be "disable")

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37402/29073

@cmsbuild
Copy link
Contributor

Pull request #37402 was updated. @cmsbuild, @missirol, @Martin-Grunewald can you please check and sign again.

@missirol
Copy link
Contributor

please test

@Sam-Harper
Copy link
Contributor Author

Hi Andrea,

This module isnt run inside the HLT menu as far as I can tell

image

Appears to be only the service, not the client. However you are right, it may be running as part of the DQM at p5? This I dont know and dont know how to find out.

Lets just solve the issue by setting the default back to the old behaviour and people can then just set it in their harvesting jobs to the correct one.

Point 1) is the actual bug which should have zero effect on anything. Other histograms already do this.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37402/29086

@cmsbuild
Copy link
Contributor

Pull request #37402 was updated. @cmsbuild, @missirol, @Martin-Grunewald can you please check and sign again.

@fwyzard
Copy link
Contributor

fwyzard commented Mar 30, 2022

This module isnt run inside the HLT menu as far as I can tell

You are right -- being a "client" means that is should be running in the harvesting step, so only in the DQM machines, reading the histograms produces by the FastTimerService which is inside the HLT itself.

@missirol
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6c8f0d/23537/summary.html
COMMIT: 4b84728
CMSSW: CMSSW_12_4_X_2022-03-30-1100/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37402/23537/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 12 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3585896
  • DQMHistoTests: Total failures: 19
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3585854
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 47 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 200 log files, 45 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@missirol
Copy link
Contributor

missirol commented Mar 31, 2022

+hlt

  • bugfix + 1 new config parameter (without changes in default behaviour)
  • to be backported to 12_3_X

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@@ -488,7 +492,7 @@ void FastTimerServiceClient::fillDescriptions(edm::ConfigurationDescriptions& de
edm::ParameterSetDescription puMEPSet;
fillPUMePSetDescription(puMEPSet);
desc.add<edm::ParameterSetDescription>("puME", puMEPSet);

desc.add<bool>("fillEveryLumiSection", true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the default be "false", according to what you write in the comments?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's just that the description does not take into account the latest changes applied to the PR.

In the description, replacing this sentence

  1. removing the per lumisection making of the histograms

with this sentence

  1. adding an option to disable the per-lumisection updating of the histograms (which remains enabled by default)

would better reflect the content of the PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@missirol in #37402 (comment) you were also suggesting to keep "disable per-LS updates" as the default, which is the same as written in the PR description.
@Sam-Harper , @missirol could you please agree on what's your actual intent here, and either fix the PR or its description? It can be merged right after.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@perrotta , I wrote that, but later comments clarified that it was safer to keep the default unchanged (per-LS update enabled by default), see #37402 (comment).

In summary, the PR description needs to be updated (it is already correct in the backport PR). @Sam-Harper could you please edit ? (I don't think I can)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry the intent changed. So 99% of people will want it to default to false. However changing the default behaviour will mess up online DQM and for the quiet life, I decided it was better to just keep the old behaviour (which is fine just inefficient) and have users update. Regardless the majority of users will actually be running a config we control not them so we can just disable it there.

So just to sumarise: Andrea raised the point that turning this off will effect online DQM. Therefore we have kept the old behaviour as default otherwise we would need to adapt DQM workflows and this is a late fix.

@perrotta
Copy link
Contributor

+1

  • Thank you Marino and Sam

@cmsbuild cmsbuild merged commit e35468f into cms-sw:master Mar 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants