Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DQM: Remove more DQMStore features #28247

Merged
merged 14 commits into from Nov 24, 2019

Conversation

schneiml
Copy link
Contributor

@schneiml schneiml commented Oct 23, 2019

PR description:

This PR removes more rarely used functionality from the DQM infrastructure, namely

  • scaleElements: not sure what this was ever used for, online maybe?
  • softReset: this feature is rather useful, but the few usages don't really justify the complexity. Used mainly in ECAL DQM.
  • setAccumulate flag: basically unused, not sure what it was supposed to signal.
  • remove*: Used in a bunch of places mostly for non-standard workflows. Remove is very hard to support safely (even today, it might randomly invalidate pointers that other modules are holding), so we have to drop it. The most common pattern is "check if ME exists, if yes remove, then book": this probably was not required in most cases, but it might be required to re-bin plots. We might change semantics of book* to replace an existing ME if merge is not possible, without invalidating the exisiting MEs (only replace the TH1). [update]
  • QStatisticalTests: completely unused. [update]

PR validation:

Requires feedback from subsystem users, mainly ECAL (@tanmaymudholkar ?) regarding softReset. I suspect this PR will change behavior for ECAL online DQM.

The other user of softReset is HLXMonitor, which as far as I understand is history. Not sure if anybody feels responsible.

scaleElements was called in the MEtoEDM workflow, so it might affect AlCa/PCL. I suspect it does not (I can't see any configuration for it, and I don't see what it would be good for), but it might be worth another check.

Remove was used in a lot of places, I suspect in most cases this will only make additional MEs show up in the output. However, there might be cases where re-booking now behaves incorrectly. I typically removed the entire cleanup functionality incl. config options, unless that seemed to complicated, then I left a comment in the now-useless function stubs.

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28247/12393

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @schneiml (Marcel Schneider) for master.

It involves the following packages:

DQM/EcalCommon
DQM/EcalMonitorTasks
DQM/EcalPreshowerMonitorModule
DQM/HLXMonitor
DQM/TrackingMonitor
DQMServices/Components
DQMServices/Core

@andrius-k, @kmaeshima, @schneiml, @cmsbuild, @jfernan2, @fioriNTU can you please review it and eventually sign? Thanks.
@barvic, @hdelanno, @makortel, @mtosi, @argiro, @missirol, @fioriNTU, @jandrea, @idebruyn, @threus this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@schneiml
Copy link
Contributor Author

please test

I don't expect the PR tests to be very useful for validation here, but let's see if they notice anything.

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 23, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/3114/console Started: 2019/10/23 16:29

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-734c5c/3114/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 10 differences found in the comparisons
  • DQMHistoTests: Total files compared: 34
  • DQMHistoTests: Total histograms compared: 2961036
  • DQMHistoTests: Total failures: 128
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2960567
  • DQMHistoTests: Total skipped: 341
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 33 files compared)
  • Checked 147 log files, 16 edm output root files, 34 DQM output files

@schneiml
Copy link
Contributor Author

No significant changes in the comparison. Still requires confirmation from ECAL and AlCa.

@fabiocos
Copy link
Contributor

please test workflow 1010.0,1020.0,1030.0,1040.0

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 20, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/3551/console Started: 2019/11/20 13:48

@fabiocos
Copy link
Contributor

@schneiml in my private tests based on CMSSW_11_0_X_2019-11-19-2300 I see that 1040.0 is now crashing...

@schneiml
Copy link
Contributor Author

@fabiocos 1040.0 runs for me, after clean rebase on CMSSW_11_0_X_2019-11-19-2300. Let's see what the bot thinks...

I have not checked the output, though.

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-734c5c/3551/summary.html

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/results/JR-comparison/PR-734c5c/1010.0_TestEnableEcalHCAL2017B+TestEnableEcalHCAL2017B+TIER0EXPTE+ALCAEXPTE+ALCAHARVDTE
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/results/JR-comparison/PR-734c5c/1020.0_AlCaLumiPixels2016H+AlCaLumiPixels2016H+TIER0EXPLP+ALCAEXPLP+ALCAHARVLP+TIER0PROMPTLP
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/results/JR-comparison/PR-734c5c/1030.0_RunHLTPhy2017B+RunHLTPhy2017B+TIER0EXPHPBS+ALCASPLITHPBS+ALCAHARVDHPBS+ALCAHARVDHPBSLOWPU
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/results/JR-comparison/PR-734c5c/1040.0_RunZeroBias2017F+RunZeroBias2017F+TIER0RAWSIPIXELCAL+ALCASPLITSIPIXELCAL+ALCAHARVDSIPIXELCAL

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 3 differences found in the comparisons
  • DQMHistoTests: Total files compared: 34
  • DQMHistoTests: Total histograms compared: 2936360
  • DQMHistoTests: Total failures: 2
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2936017
  • DQMHistoTests: Total skipped: 341
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 1454.244 KiB( 33 files compared)
  • DQMHistoSizes: changed ( 1000.0,... ): 25.259 KiB Egamma/gedPhotonAnalyzer
  • DQMHistoSizes: changed ( 1000.0,... ): 25.259 KiB Egamma/stdPhotonAnalyzer
  • DQMHistoSizes: changed ( 1000.0,... ): 5.613 KiB Egamma/Electrons
  • DQMHistoSizes: changed ( 10024.0,... ): 2.364 KiB BDHadronTracks/JetContent
  • DQMHistoSizes: changed ( 10024.0,... ): 0.842 KiB ParticleFlow/PFJetValidation
  • DQMHistoSizes: changed ( 10024.0,... ): 0.451 KiB ParticleFlow/PFMETValidation
  • DQMHistoSizes: changed ( 20034.0,... ): 25.649 KiB Egamma/stdPhotonAnalyzerHGCalFromMultiCl
  • DQMHistoSizes: changed ( 20034.0,... ): 7.058 KiB Egamma/Electrons
  • Checked 147 log files, 16 edm output root files, 34 DQM output files

@fabiocos
Copy link
Contributor

apparently there was something wrong in my setup, repeating the test

@@ -98,7 +98,6 @@
# 2 provide more detailed output
Frequency = cms.untracked.int32(50),
MEPathToSave = cms.untracked.string('AlCaReco/SiStripGainsAAG'),
deleteAfterCopy = cms.untracked.bool(True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tocheng @christopheralanwest @mmusich please suggest how to test this ALCAPROMPT, using --dasquery=file dataset=/ZeroBias/Run2016C-SiStripCalMinBiasAfterAbortGap-18Apr2017-v1/ALCARECO run=276097 in a similar way as done for SiStrpGains does not work:

Begin processing the 1st record. Run 276092, Event 32587086, LumiSection 74 on stream 0 at 22-Nov-2019 10:43:57.009 CET
----- Begin Fatal Exception 22-Nov-2019 10:43:57 CET-----------------------
An exception of category 'Configuration' occurred while
   [0] Processing  Event run: 276092 lumi: 74 event: 32587086 stream: 0
   [1] Running path 'pathALCARECOPromptCalibProdSiStripGainsAAG'
   [2] Calling method for module HLTHighLevel/'ALCARECOCalMinBiasFilterForSiStripGainsAAG'
Exception Message:
requested HLT path "pathALCARECOSiStripCalMinBiasAAG" does not exist
----- End Fatal Exception -------------------------------------------------

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you change the ALCA producer?

Copy link
Contributor

@fabiocos fabiocos Nov 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmsDriver.py step3 --datatier ALCARECO --conditions auto:run2_data -s ALCA:PromptCalibProdSiStripGainsAAG --eventcontent ALCARECO -n 1000 --dasquery=file dataset=/ZeroBias/Run2016C-SiStripCalMinBiasAfterAbortGap-18Apr2017-v1/ALCARECO run=276097 

Copy link
Contributor

@mmusich mmusich Nov 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabiocos, the command is correct but the input dataset is not.
In 2017 we changed SiStripCalMinBiasAfterAbortGap to SiStripCalMinBiasAAG due to limitations in the output dataset name length in dataset storage system.
Because of this the PD you chose is not processable tout-court in recent releases, but needs to have this line https://github.com/cms-sw/cmssw/blob/master/Calibration/TkAlCaRecoProducers/python/ALCARECOPromptCalibProdSiStripGainsAAG_cff.py#L9 adjusted.
To make it work out of the box just switch to a more recent dataset like
/StreamExpress/Run2018C-SiStripCalMinBiasAAG-Express-v1/ALCARECO (is at T2_CH_CERN)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmusich with step3 --processName ALCA2 --datatier ALCARECO --conditions auto:run2_data -s ALCA:PromptCalibProdSiStripGainsAAG --eventcontent ALCARECO -n 1000 --dasquery=file dataset=/StreamExpress/Run2018C-SiStripCalMinBiasAAG-Express-v1/ALCARECO I get

stripaag

which looks to me comparable to your test at the file crossing boundaries

@@ -27,7 +27,6 @@
# 2 provide more detailed output
Frequency=cms.untracked.int32(50),
MEPathToSave=cms.untracked.string('AlCaReco/EcalPedestalsPCL'),
deleteAfterCopy=cms.untracked.bool(True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tocheng @christopheralanwest @schneiml this can be tested using wf 1010.0 . In its step3 the log file is flooded with

Begin processing the 100th record. Run 299149, Event 579095302, LumiSection 408 on stream 3 at 20-Nov-2019 17:35:47.296 CET
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_0' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_1' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_10' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_11' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_12' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_13' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_14' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_15' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_16' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_17' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_18' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_19' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_2' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_20' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_21' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_22' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_23' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_24' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_25' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_26' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_27' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_28' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_29' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_3' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_30' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_31' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_32' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_33' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_34' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_35' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_36' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_37' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_38' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_39' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_4' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_40' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_41' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_42' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_43' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_44' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_45' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_46' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_47' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_48' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_49' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_5' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_50' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_51' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_52' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_53' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_54' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_55' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_56' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_57' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_58' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_59' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_6' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_60' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_61' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_62' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_63' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_64' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_65' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_66' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_67' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_68' in 'AlCaReco/EcalPedestalsPCL/EB/0'
(...)

Please comment

Copy link
Contributor

@fabiocos fabiocos Nov 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeating the test of @mmusich with wf 1010 step3 running 10000 events:

ecal.pdf

Copy link
Contributor Author

@schneiml schneiml Nov 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re the message see the comment I made in another location: the code that used to produce them is deleted in this PR.

Re the RSS graph: We see the usual MEtoEDM endJob spike, before as after the PR, doubling the memory use as all histograms are copied into a EDM product. Removing the delete-calls makes it an exact doubling, before it might have been a bit less, as some histos where deleted before saving.

This spike is why "normal" reco jobs switched to DQMIO saving in LS1 (which removes this spike alltogether), but for the ALCA workflows memory use was not considered a problem and they where never migrated. (Edit: This is just for context, this PR does not affect anything about the MEtoEDM procedure.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schneiml the message is in the default version, this PR removes it, sorry for the possible confusion.

Concerning the RSS plot I do not see anything worrying, the behaviour old vs new looks quite similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabiocos re the message: ok, that is expected, there is nothing deleted any more and ofc there is no reason to warn about deleting non-existing things then.

This might explain why the memory use is effectively unaffected: The delete calls were not doing anything to begin with.

if (pos != data_.end())
data_.erase(pos);
else if (warning) {
std::cout << "DQMStore: WARNING: attempt to remove non-existent"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabiocos I don't fully understand your comment in a parallel thread, were you testing the old or the new code?

The code that makes the messages you show is deleted here.

@@ -9,7 +9,6 @@
# 2 provide more detailed output
Frequency = cms.untracked.int32(50),
MEPathToSave = cms.untracked.string('AlCaReco/SiStrip'),
deleteAfterCopy = cms.untracked.bool(False)
Copy link
Contributor

@fabiocos fabiocos Nov 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

running the step2 of wf 1040 with the ALCAPRODUCER:@allForExpress option on 1000 events:
express

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabiocos, I am sorry to be picky, but I don't think @allForExpress resolves into the PromptCalibProd... ALCAPROMPT producer changed here, so I am afraid your plot proves nothing.
A more interesting test IMHO is to monitor the RSS of step3 of wf 1001.0:

cmsDriver.py step3  --datatier ALCARECO --triggerResultsProcess RECO --conditions
auto:run1_data -s ALCAOUTPUT:SiStripCalZeroBias+TkAlMinBias+DtCalib+Hotline+LumiPix
elsMinBias+AlCaPCCZeroBiasFromRECO+AlCaPCCRandomFromRECO,ALCA:PromptCalibProd+Promp
tCalibProdSiStrip+PromptCalibProdSiStripGains+PromptCalibProdSiStripGainsAAG+Prompt
CalibProdSiPixelAli+PromptCalibProdSiPixel --eventcontent ALCARECO -n 100  --filein
  file:step2.root  --fileout file:step3.root  

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmusich as far as I can see in also the Tracer dump it do runs both MEtoEDMConvertSiStrip and pathALCARECOSiStripPCLHistos that are also modified in this PR

Copy link
Contributor

@mmusich mmusich Nov 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about PromptCalibProdSiStripGains... and PromptCalibProdSiPixelAli?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or your test was limited to this particular file change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmusich I believe that we are clearly missing a test that is realistically mimicking what happens nowadays at the T0 , not bits and pieces. My understanding from a discussion with @tocheng is that @allForExpress should cover all the ALCAPRODUCER available, and I looked at a few ALCAPROMPT touched by this PR separately.
It would be important to get from ALCA a realistic workflow on recent data. Anyway I will make a final check following your suggestion (but is this test on very old data representative of what happens in today's situation?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the additional tests. I agree about the lack of a suitable relval test testing all the AlCa aspects of Express.
For tracker at least 1001.0 is a reasonable proxy. For the changes proposed here I don't think occupancy or n. of channel differences matter to measure a comparison of the memory consumption (it would be nice to have 1001.0 updated with Run2 input data though).

Copy link
Contributor

@fabiocos fabiocos Nov 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

running wf 1001 on 5000 events, the comparison of RSS used by step3 is

express2011

At this point I think that there is no evidence in these tests of strong reasons of concern, and a full test in a T0 replay may provide a final cross check

@fabiocos
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 96fbb47 into cms-sw:master Nov 24, 2019
@schneiml
Copy link
Contributor Author

Thanks everybody for the help with getting this integrated! -- now lets move on to #28379, which I am currently rebasing on top of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants