CondDB frontierKey update (11_3_X) #33644

Merged
merged 5 commits on May 20, 2021


smorovic
Contributor

@smorovic smorovic commented May 6, 2021

PR description:

frontierKey in OnlineDBOutputService and CondDBESSource is changed to an untracked string (the parameter is relevant only at runtime).

Additional Python changes add a VarParsing parameter and apply the command-line parameter to the OnlineDBOutputService.

PR validation:

Tested validity of Python changes with pylint.

if this PR is a backport please specify the original PR and why you need to backport that PR:

Backport of #33643

@cmsbuild
Contributor

cmsbuild commented May 6, 2021

A new Pull Request was created by @smorovic (Srecko Morovic) for CMSSW_11_3_X.

It involves the following packages:

CondCore/DBOutputService
CondCore/ESSources
DQM/Integration

@malbouis, @andrius-k, @yuanchao, @kmaeshima, @ErnestaP, @ahmad3213, @cmsbuild, @jfernan2, @tlampen, @ggovi, @pohsun, @rvenditti, @francescobrivio can you please review it and eventually sign? Thanks.
@mmusich, @threus, @batinkov, @tocheng, @battibass this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@smorovic
Contributor Author

smorovic commented May 6, 2021

@cmsbuild please test

@cmsbuild
Contributor

cmsbuild commented May 6, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-48146c/14909/summary.html
COMMIT: c07fbc1
CMSSW: CMSSW_11_3_X_2021-05-05-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/33644/14909/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

The workflows 140.53 have different files in step1_dasquery.log than the ones found in the baseline. You may want to check and retrigger the tests if necessary. You can check it in the "files" directory in the results of the comparisons

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 1268 differences found in the comparisons
  • DQMHistoTests: Total files compared: 38
  • DQMHistoTests: Total histograms compared: 2877046
  • DQMHistoTests: Total failures: 3680
  • DQMHistoTests: Total nulls: 19
  • DQMHistoTests: Total successes: 2873325
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -45.703 KiB( 37 files compared)
  • DQMHistoSizes: changed ( 140.53 ): -44.531 KiB Hcal/DigiRunHarvesting
  • DQMHistoSizes: changed ( 140.53 ): -1.172 KiB RPC/DCSInfo
  • Checked 160 log files, 37 edm output root files, 38 DQM output files
  • TriggerResults: no differences found

@jfernan2
Contributor

jfernan2 commented May 7, 2021

@smorovic since this PR mainly affects Online DQM, we would like to test it on the P5 machines before approving. Is there any specific sample or condition that should be used, or can we test it out of the box?
Thank you very much!

@smorovic
Contributor Author

smorovic commented May 7, 2021

Hi @jfernan2,
do you plan to test it in your playback system?
For the test run, it would be good if you could produce ".run%RUN.global" files (%RUN = run number) in the ramdisk with the following type of content:

run_key = pp_run
run_unique_key = 4e94e771-add6-41be-8683-c5f6a7a9ed1f

This is what is going to be written with the updated Run Control and DQM Function Manager (are you using it for playback?).
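For a playback test, such a file could be produced by hand. The following is only a minimal sketch: the helper name `write_global_file` is hypothetical, a temporary directory stands in for the ramdisk, and the run number is inserted into the file name as is (whether the real system pads it is not stated here).

```python
import os
import tempfile


def write_global_file(ramdisk_dir, run_number, run_key, run_unique_key):
    """Write a '.run<RUN>.global' file with two 'name = value' lines."""
    path = os.path.join(ramdisk_dir, ".run{}.global".format(run_number))
    with open(path, "w") as f:
        f.write("run_key = {}\n".format(run_key))
        f.write("run_unique_key = {}\n".format(run_unique_key))
    return path


# Example: a temporary directory stands in for the ramdisk.
demo_dir = tempfile.mkdtemp()
demo_path = write_global_file(
    demo_dir, 501482, "pp_run", "4e94e771-add6-41be-8683-c5f6a7a9ed1f")
with open(demo_path) as f:
    demo_content = f.read()
print(demo_content)
```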

With those 'global' files, a new version of hltd is needed to propagate the parameter to CMSSW:

/nfshome0/smorovic/RPM/test/hltd-python36-2.9.8-2.el7.cern.x86_64.rpm
#to be used along with (may be installed in your system):
hltd-libs-python36-2.9.2-0.el7.cern.x86_64.rpm

Note that this version of hltd requires CMSSW changes in this PR to work properly in DQM mode.

Let me know if this is feasible or we need to come up with something else.

@jfernan2
Contributor

jfernan2 commented May 7, 2021

Thanks @smorovic
Yes, the idea is to test it in playback, unless you have another solution. Is any run number preferred?
I have passed this information to @ErnestaP as DQM developer at P5

@smorovic
Contributor Author

smorovic commented May 7, 2021

any run number is fine for testing.

@jfernan2
Contributor

@smorovic we are not sure we understand your statement "Note that this version of hltd requires CMSSW changes in this PR to work properly in DQM mode".

Do you mean this PR will not work with the new HLTD version you gave us? Or does that apply only to production mode, not playback?

@smorovic
Contributor Author

Hi @jfernan2,
The new hltd version passes an additional VarParsing parameter to the CMSSW Python configuration. A commit in this PR registers that parameter in the DQM configurations. Without it, the parameter is unrecognized and running with it results in an exception instead of starting the cmsRun process [*], so the new hltd is not compatible with a CMSSW release that lacks this PR.
Old (current) hltd can run CMSSW both with and without this PR.

That is why I suggested installing the new hltd rpm for the test you want to do. For production, however, hltd can be updated only once you migrate DQM to a new release that includes this PR.

[*] This behavior of VarParsing is not very convenient; a switch to disable strict parameter checking would be useful.
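To illustrate what such a switch could look like, here is a standalone sketch of a `name=value` parser with an optional non-strict mode. It is not the VarParsing API and does not describe how CMSSW parses its arguments; the function `parse_key_value_args` and its `strict` flag are hypothetical.

```python
def parse_key_value_args(argv, registered, strict=True):
    """Parse 'name=value' arguments against a dict of registered defaults.

    In strict mode an unregistered name raises; otherwise it is collected
    in a list of ignored names and parsing continues.
    """
    values = dict(registered)
    ignored = []
    for arg in argv:
        name, sep, value = arg.partition("=")
        if not sep:
            raise ValueError("expected name=value, got {!r}".format(arg))
        if name in values:
            values[name] = value
        elif strict:
            raise RuntimeError("'{}' not registered".format(name))
        else:
            ignored.append(name)
    return values, ignored


# A configuration that knows nothing about runUniqueKey.
registered = {"runNumber": "0", "runkey": "pp_run"}
argv = ["runNumber=501490", "runkey=cosmic_run",
        "runUniqueKey=4e94e771-add6-41be-8683-c5f6a7a9ed1f"]

values, ignored = parse_key_value_args(argv, registered, strict=False)
print(values, ignored)
```

With `strict=True` the same call raises on `runUniqueKey`, which mirrors the incompatibility described above.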

@pmandrik
Contributor

Dear @smorovic ,
we installed the hltd [0], but with a .global file with the following content:

run_key = cosmic_run
run_unique_key = 4e94e771-add6-41be-8683-c5f6a7a9ed1f

hltd executes cmsRun with the following options:
"-- starting process: ['cmsRun', '/cmsnfsdqmdata/dqmdata/dqm_cmssw/playback_0511_CMSSW_11_3_0_31684_33644_33684/src/DQM/Integration/python/clients/beamfake_dqm_sourceclient-live_cfg.py', 'runInputDir=/fff/BU0/ramdisk', 'runNumber=501482', 'runkey=cosmic_run', 'runUniquekey=<_sre.SRE_Match', 'object;', 'span=(0,', '53),', "match='run_unique_key", '=', '4e94e771-add6-41be-8683-c5f6a7a9>'] --"

where "runUniquekey" is not set correctly. This is probably because [1] is missing the line 'uniqueKey = uniqueKey.group(1)' somewhere in the following part of the code:

            with open(dqm_globalrun_file, 'r') as f:
                for line in f:                
                    lines.append(line)  
            for line in lines:    
                runkey = re.search(  
                    r'\s*run_key\s*=\s*([0-9A-Za-z_]*)', line, re.I)
                if runkey:
                    runkey = runkey.group(1).lower()
                    break
            for line in lines:
                uniqueKey = re.search(
                    r'\s*run_unique_key\s*=\s*([^\s]+)', line, re.I)
                if uniqueKey:
                    break

[0] /nfshome0/smorovic/RPM/test/hltd-python36-2.9.8-2.el7.cern.x86_64.rpm
[1] /opt/hltd/python/Resource.py
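The garbled argument in the log is consistent with a regex match object being formatted as a string instead of its captured group. A minimal standalone sketch of the parsing with the suggested fix applied follows; it reuses the two regular expressions quoted above, but the function name `parse_global_lines` is hypothetical and this is not the actual hltd code.

```python
import re


def parse_global_lines(lines):
    """Return (runkey, unique_key) as strings, or None for a missing entry."""
    runkey = None
    unique_key = None
    for line in lines:
        m = re.search(r'\s*run_key\s*=\s*([0-9A-Za-z_]*)', line, re.I)
        if m:
            runkey = m.group(1).lower()
            break
    for line in lines:
        m = re.search(r'\s*run_unique_key\s*=\s*([^\s]+)', line, re.I)
        if m:
            # Keep the captured text, not the match object itself.
            unique_key = m.group(1)
            break
    return runkey, unique_key


lines = ["run_key = cosmic_run\n",
         "run_unique_key = 4e94e771-add6-41be-8683-c5f6a7a9ed1f\n"]
runkey, unique_key = parse_global_lines(lines)
print("runkey=" + runkey, "runUniqueKey=" + unique_key)
```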

@smorovic
Contributor Author

Dear @pmandrik,
thanks for testing and identifying the problem. Indeed, there is a bug, as you pointed out.
I also spotted another problem: 'runUniquekey' should be 'runUniqueKey' in the same Python script.

I built a new version of hltd, so please try with:

/nfshome0/smorovic/RPM/test/hltd-python36-2.9.8-3.el7.cern.x86_64.rpm

@pmandrik
Contributor

Dear @smorovic ,

we get the following error with the latest version of hltd, because runUniqueKey is also expected to be registered in [0]:

log file spoiler

-- starting process: ['cmsRun', '/cmsnfsdqmdata/dqmdata/dqm_cmssw/playback_0511_CMSSW_11_3_0_31684_33644_33684/src/DQM/Integration/python/clients/hlt_dqm_clientPB-live_cfg.py', 'runInputDir=/fff/BU0/ramdisk', 'runNumber=501490', 'runkey=cosmic_run', 'runUniqueKey=4e94e771-add6-41be-8683-c5f6a7a9ed1f'] --
Error: 'runUniqueKey' not registered.
----- Begin Fatal Exception 13-May-2021 12:44:40 CEST-----------------------
An exception of category 'ConfigFileReadError' occurred while
[0] Processing the python configuration file named /cmsnfsdqmdata/dqmdata/dqm_cmssw/playback_0511_CMSSW_11_3_0_31684_33644_33684/src/DQM/Integration/python/clients/hlt_dqm_clientPB-live_cfg.py
Exception Message:
unknown python problem occurred.
RuntimeError: Unknown variable

At:
/opt/offline/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0/python/FWCore/ParameterSet/VarParsing.py(205): parseArguments
/cmsnfsdqmdata/dqmdata/dqm_cmssw/playback_0511_CMSSW_11_3_0_31684_33644_33684/python/DQM/Integration/config/pbsource_cfi.py(43):
/opt/offline/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0/python/FWCore/ParameterSet/Config.py(684): load
/cmsnfsdqmdata/dqmdata/dqm_cmssw/playback_0511_CMSSW_11_3_0_31684_33644_33684/src/DQM/Integration/python/clients/hlt_dqm_clientPB-live_cfg.py(19):

----- End Fatal Exception -------------------------------------------------

-- process exit: 90 --

Also, in the playback system we use the fff_simulator tool to emulate incoming data as well as to create .global files [1].
The question is how to create relevant run_unique_key number for such simulation? Could you clarify this?

[0] https://github.com/smorovic/cmssw/blob/11_3_X_backport-frontierKey/DQM/Integration/python/config/pbsource_cfi.py
[1] https://github.com/cms-DQM/fff_dqmtools/blob/fc20c54aa06f431abea8556244dbf2f55e0ccca2/applets/fff_simulator.py#L311-L322

@smorovic
Contributor Author

Hi @pmandrik,

> Error: 'runUniqueKey' not registered.

You get this error because the parameter is not registered in the CMSSW configuration. This pull request takes care of that by adding the following code to DQM/Integration/python/config/inputsource_cfi.py:

# Parameter for frontierKey

options.register ('runUniqueKey',
          'InValid',
          VarParsing.VarParsing.multiplicity.singleton,
          VarParsing.VarParsing.varType.string,
          "Unique run key from RCMS for Frontier")

Did you fetch and apply the commits from this PR for the test? If you didn't, that is the cause of the problem.
As I said earlier, I'm not aware of a way to disable that unknown-parameter error in CMSSW. So, in DQM mode, the new hltd is not compatible with any release that doesn't integrate these Python changes (it can be deployed in production only when switching to a new release).

If you did apply the PR and it still didn't work, then there is some problem with the PR.
In that case, let me know where you are running the test and which configuration you use so that I can have a look.

Note also that I didn't add that parameter to:

DQM/Integration/python/config/fileinputsource_cfi.py
DQM/Integration/python/config/pbsource_cfi.py
DQM/Integration/python/config/unittestinputsource_cfi.py

which define different input source modules (PoolSource and DQMProtobufReader) and didn't seem like something that would be started from hltd.
Let me know if I'm wrong about that, and I'll add the same parameter there.

> Also, in the playback system we use the fff_simulator tool to emulate incoming data as well as to create .global files [1].
> The question is how to create relevant run_unique_key number for such simulation? Could you clarify this?

The value of run_unique_key is arbitrary for the test, but there are some constraints on the format: it should contain only alphanumeric characters and dashes ( - ). I posted this as an example:

run_unique_key = 4e94e771-add6-41be-8683-c5f6a7a9ed1f
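For a simulator, one way to produce a key in this format is a random UUID, which has the same shape as the example. The sketch below assumes only the constraint stated above (alphanumeric characters and dashes); the helper names are hypothetical, and any further limits, such as a maximum length, are not covered.

```python
import re
import uuid

# Only alphanumeric characters and dashes, at least one character.
_KEY_RE = re.compile(r'[0-9A-Za-z-]+')


def is_valid_unique_key(key):
    """Check that the key contains only alphanumeric characters and dashes."""
    return _KEY_RE.fullmatch(key) is not None


def make_unique_key():
    """Generate a random key shaped like the example (a version 4 UUID)."""
    return str(uuid.uuid4())


key = make_unique_key()
print(key, is_valid_unique_key(key))
```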

@pmandrik
Contributor

Thank you for the clarification @smorovic ,
indeed, as you can see in the log file from my previous comment, in the playback system we are running
DQM/Integration/python/clients/hlt_dqm_clientPB-live_cfg.py
where
DQM/Integration/python/config/pbsource_cfi.py
is used, and the fatal exception came from this file.
Could you please fix this first? If further problems appear, I will then send you the requested information.

@cmsbuild
Contributor

Pull request #33644 was updated. @malbouis, @andrius-k, @yuanchao, @kmaeshima, @ErnestaP, @ahmad3213, @cmsbuild, @jfernan2, @tlampen, @ggovi, @pohsun, @rvenditti, @francescobrivio can you please check and sign again.

@smorovic
Contributor Author

please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-48146c/15139/summary.html
COMMIT: 55dc9e7
CMSSW: CMSSW_11_3_X_2021-05-17-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/33644/15139/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 3035 differences found in the comparisons
  • DQMHistoTests: Total files compared: 38
  • DQMHistoTests: Total histograms compared: 2877046
  • DQMHistoTests: Total failures: 7777
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2869246
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 37 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 160 log files, 37 edm output root files, 38 DQM output files
  • TriggerResults: found differences in 1 / 37 workflows

@ggovi
Contributor

ggovi commented May 18, 2021

+1

@yuanchao
Contributor

+1

@jfernan2
Contributor

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_11_3_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_12_0_X is complete. This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@qliphy
Contributor

qliphy commented May 20, 2021

+1
