
Beamspot DIP server (normal DQM client) for 12_0_X #35193

Closed
wants to merge 1,823 commits

Conversation


@sikler commented Sep 8, 2021

PR description:

  • the aim was to get rid of the java-based DIP server (DQM/BeamMonitor/test/dip/BeamSpotDipServer.java) used in the past years to transmit beamspot information to LHC
  • detailed information on the Data Interchange Protocol is at https://readthedocs.web.cern.ch/display/ICKB/DIP+and+DIM
  • our beamspot server is now rewritten in C++, in the form of a normal DQM client
  • the central piece of code is the plugin DQM/BeamMonitor/plugins/BeamSpotDipServer.{cc,h}, the config at DQM/BeamMonitor/python/BeamSpotDipServer_cff.py, and the live client at DQM/Integration/python/clients/beamspotdip_dqm_sourceclient-live_cfg.py
  • it uses two external tools: dip (version 5.7.0) and log4cplus [in fact log4cplus is contained in the dip distribution], available at https://readthedocs.web.cern.ch/display/ICKB/DIP+API; both are now provided as external tools
  • mode of operation: the client is alive from the beginning to the end of the run; it checks the result of the previous beamspot fit at the end of each lumisection (in dqmEndLuminosityBlock); for now it looks at files containing the last beamspot fit and the current status of the tracker under /nfshome0/dqmpro/BeamMonitorDQM/
  • configurable through readFromNFS: works either on the files under /nfshome0 or using the beamspot from the database and DCS information on the tracker status (for now only the reading of the legacy beamspot is implemented)
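The readFromNFS switch described above might look roughly like this in a client configuration. This is a hypothetical sketch: the module label and plugin type are assumptions; only the readFromNFS and sourceFile parameter names and the /nfshome0 path appear in this PR.

```python
import FWCore.ParameterSet.Config as cms

# Hypothetical configuration fragment; module label and plugin type are
# assumptions, readFromNFS/sourceFile are taken from the PR discussion.
beamSpotDipServer = cms.EDAnalyzer("BeamSpotDipServer",
    # default: read the BeamSpotOnlineObject from the DB plus DCS status
    readFromNFS = cms.untracked.bool(False),
    # fallback: parse the fit results file written under /nfshome0
    sourceFile = cms.untracked.string(
        "/nfshome0/dqmpro/BeamMonitorDQM/BeamFitResults.txt"),
)
```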

PR validation:

  • compiles and runs on lxplus without warning or error messages
  • a test using a working DIP browser or java application is not yet possible (javaws is not supported anymore, jws is available neither on the technical network nor on lxplus; the page https://dash.web.cern.ch/DIP/ does not work)
  • live version to be tested

PR issues, questions:

  • issue: when reading from database, some fields cannot be filled (BeamSpotOnlineObject does not have the fields startTime, lumisection range, number of events used, meanPV, err_meanPV, rmsPV, err_rmsPV, maxPV); are those fields necessary for LHC?

@cmsbuild added this to the CMSSW_12_0_X milestone Sep 8, 2021

cmsbuild commented Sep 8, 2021

A new Pull Request was created by @sikler (Ferenc Siklér) for CMSSW_12_0_X.

It involves the following packages:

  • DQM/BeamMonitor (dqm)
  • DQM/Integration (dqm)

@andrius-k, @kmaeshima, @ErnestaP, @ahmad3213, @cmsbuild, @jfernan2, @rvenditti can you please review it and eventually sign? Thanks.
@threus, @batinkov, @battibass this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here


jfernan2 commented Sep 8, 2021

Thanks @sikler
The main question for this PR is, how to test it in Online DQM at P5?
Do you really need LHC beam/collisions? Can we do some kind of check using FakeBeam like FakeBeamMonitor does?


sikler commented Sep 8, 2021

@jfernan2 Sounds interesting. Is there some documentation on FakeBeam and related (twiki, presentation, similar)?


jfernan2 commented Sep 8, 2021

For FakeBeamMonitor, in this same package,
https://github.com/cms-sw/cmssw/blob/master/DQM/BeamMonitor/plugins/FakeBeamMonitor.h
I believe the two experts may shed some light, @francescobrivio @gennai
Thanks


jfernan2 commented Sep 8, 2021

BTW: @sikler
Since this PR supersedes them, it would be great if you could:

@francescobrivio

> @jfernan2 Sounds interesting. Is there some documentation on FakeBeam and related (twiki, presentation, similar)?

Hi @sikler ! About the FakeBeamMonitor, there is no documentation at the moment; it's a simple copy of the BeamMonitor, except that the BeamSpot values don't come from a fit to the events but are random numbers generated by TRandom. It was used to commission the per-lumi updates of the BeamSpot during LS2, when collisions were not available.

I will have a look at this PR asap (next few days) and see if there is a way to use it for testing DIP.


cmsbuild commented Sep 9, 2021

Pull request #35193 was updated. @andrius-k, @kmaeshima, @ErnestaP, @ahmad3213, @cmsbuild, @jfernan2, @rvenditti can you please check and sign again.

@perrotta mentioned this pull request Sep 13, 2021
Outdated review thread on DQM/BeamMonitor/plugins/BeamSpotDipServer.cc:
Comment on lines 397 to 400
startTime = bs.GetCreationTime();
startTimeStamp = bs.GetCreationTime(); // not provided by BeamSpotOnlineObject
endTime = bs.GetCreationTime();
endTimeStamp = bs.GetCreationTime(); // not provided by BeamSpotOnlineObject
Contributor

What is the difference between startTime and startTimeStamp? (and same for endTime).
Maybe they can be added to the BeamSpotOnlineObject.

Contributor Author

startTime is an std::string, while startTimeStamp is epoch (number of seconds since 1970).
In principle startTime should contain the beginning of the processed lumisection range, and endTime should mark the end of the processed lumisection range.
Could you add those? Actually, the epoch could be calculated from CreationTime through mktime.
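As a rough sketch of the mktime idea (the timestamp format used here is an assumption, not taken from the plugin), the string-to-epoch conversion could look like:

```python
import time

def creation_time_to_epoch(timestring):
    """Convert a human-readable creation time to epoch seconds,
    mirroring what mktime would do in the C++ plugin.
    The "%Y.%m.%d %H:%M:%S" format is an assumed example."""
    parsed = time.strptime(timestring, "%Y.%m.%d %H:%M:%S")
    # mktime interprets the broken-down time in the local timezone,
    # exactly like the C library mktime the comment refers to
    return int(time.mktime(parsed))

epoch = creation_time_to_epoch("2021.09.16 12:00:00")
```

Note that if the stored string were UTC rather than local time, calendar.timegm would be the safer conversion.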

Contributor

The function GetCreationTime() for the BeamSpotOnlineObject returns the moment in which the payload was actually created (see these lines); it's not related to the LS analyzed.

I'll look into adding these parameters to the online beamspot.

Resolved review threads: DQM/BeamMonitor/plugins/BeamSpotDipServer.cc, DQM/BeamMonitor/plugins/BeamSpotDipServer.h
DCSStatus = cms.untracked.string("scalersRawToDigi"),
#
sourceFile = cms.untracked.string(
"/nfshome0/dqmpro/BeamMonitorDQM/BeamFitResults.txt"),
Contributor

OK, as I mentioned above, you are reading the txt file that is filled by the "HLT workflow" (beamhlt_dqm_sourceclient-live_cfg.py#L106), while the legacy workflow (beam_dqm_sourceclient-live_cfg.py#L96) is writing to
/nfshome0/dqmpro/BeamMonitorDQM/BeamFitResultsOld.txt

We should decide which one we want to publish, or at least add an option to make this configurable.
For example for the Pilot Beam Test in October I think only the legacy workflow (which is fed by ZeroBias) will produce actual BeamSpot values.

@gennai @mmusich any thoughts?

Contributor

Hi @francescobrivio this comes a bit out of context for me. What did we use to publish before?

Contributor

Unfortunately I don't know what was published in the past; @sikler, do you know?
I will try to dig up the old DIP results.

Contributor Author

@sikler commented Sep 16, 2021

I have implemented two possible workflows for DIP publishing:

  • read the BeamSpotOnlineObject and DCSStatus, and transmit their data (this is the default)
  • and the old way of reading the /nfs files (BeamFitResults.txt and BeamFitResults_TkStatus.txt), which can be selected by readFromNFS = True

So, the default now is BeamSpotOnlineObject and DCSStatus (readFromNFS = False)
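For the readFromNFS = True path, the fit-results file is essentially key/value text. A toy parser might look like the following; the keys shown are illustrative only, since the actual file layout is not spelled out in this PR:

```python
def parse_fit_results(text):
    """Toy parser for a BeamFitResults.txt-style file: one key per line,
    followed by its value(s). The keys below are illustrative only."""
    record = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # single value -> scalar, multiple values -> list
            record[fields[0]] = fields[1] if len(fields) == 2 else fields[1:]
    return record

example = """Runnumber 123456
BeginTimeOfFit 2021.09.16 12:00:00
X0 0.0950
Y0 -0.0620
Z0 0.3300"""
beamspot = parse_fit_results(example)
```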

Contributor

Hello @sikler
What is the reason or advantage of the operation mode reading the data from the condition DB (the BeamSpotOnlineObject)? Why has it been chosen as the default?
Also, taking into account that we have two workflows ("Legacy" and "HLT") populating two different tags, which one are you going to select?

Contributor Author

Hi @ggovi

> What is the reason or advantage of the operation mode reading the data from the condition DB (the BeamSpotOnlineObject)? Why has it been chosen as the default?

It was felt that reading /nfs files was not a good practice, so a full online solution with BeamSpotOnlineObject was the way to go.

> Also, taking into account that we have two workflows ("Legacy" and "HLT") populating two different tags, which one are you going to select?

My understanding was that there will be some switch at the creation of the BeamSpotOnlineObject, and that will decide which beamspot is used online and also transmitted to LHC's DIP. So the beamspot used online should match the DIP version.

Contributor

@ggovi commented Sep 17, 2021

@sikler thanks for the answer. The fragility of the nfs filesystem indeed sounds like a good reason for preferring the DB as input. I'm actually wondering whether this problem also holds for the BeamSpot online workflows... because in that case we would be affected anyhow. @francescobrivio may clarify.
In any case, this choice has a cost - see the variables added in the payload, which will presumably not be consumed by HLT, but only "bridged out" for the DIP storage.
Concerning the choice of the two tags, I think the actual implementation has evolved with respect to the earlier discussion, and now we actually have the two payloads (in the two different tags), with no "DB-stored" information about which one has been selected/consumed by HLT. @francescobrivio may comment on this as well...

Contributor

  • @ggovi the BeamSpot online workflows do not rely on nfs; they write the payload directly and upload it to CondDB. nfs is only used to store the txt files, so I don't think there is a problem there.

  • @sikler regarding the choice of the two workflows, it's as Giacomo wrote: both payloads are produced and uploaded to the DB in different tags. Then it's up to the consumer code to choose the preferred one. At the moment there are 2 consumers: the HLT and the DQM monitoring.
    Both rely on an ESProducer (OnlineBeamSpotESProducer.cc) that reads both tags and takes the final decision on which BeamSpot to use; since both consumers are based on the same producer, the decision taken is the same for both (this has been commissioned and verified during MWGR and CRUZET).
    You can see an example of the implementation in the DQM monitoring client onlinebeammonitor_dqm_sourceclient-live_cfg.py

    So in the end we have to decide if what we publish in DIP is what we actually use in HLT (and in Express), or if we want to always publish the same result in any case (e.g. always the legacy BeamSpot).
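The "single decision point" idea described above can be modelled as a plain-Python toy. The actual selection criterion inside OnlineBeamSpotESProducer is not described in this thread, so the newest-valid-payload rule used here is purely an assumption for illustration:

```python
from dataclasses import dataclass

@dataclass
class OnlinePayload:
    tag: str            # e.g. the legacy or the HLT tag
    creation_time: int  # epoch seconds
    valid: bool         # e.g. a sanity check on the fit result

def choose_beamspot(legacy, hlt):
    """Pick one payload for all consumers (HLT, DQM, a DIP publisher),
    so every client agrees. Assumed rule: newest valid payload wins."""
    candidates = [p for p in (legacy, hlt) if p.valid]
    if not candidates:
        return None  # caller falls back to a default beamspot
    return max(candidates, key=lambda p: p.creation_time)

chosen = choose_beamspot(
    OnlinePayload("legacy", 1631793600, True),
    OnlinePayload("hlt", 1631790000, False),
)
```

Because every consumer calls the same chooser on the same two tags, the beamspot sent to DIP automatically matches the one used by HLT and DQM.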

Contributor

@francescobrivio well, if the workflows write files outside the local node disk, they could still be affected by nfs issues. In that case, it would be unreasonable to address the issue only on the DIP workflow side. Maybe we should discuss the overall picture. I'll try to set up a discussion.

@francescobrivio

@jfernan2 @sikler about testing this PR in P5:
If I understood correctly, the DIP server does not need live data, i.e. it reads all the information from txt files stored in /nfshome0/dqmpro/BeamMonitorDQM/ or from the DB.
So, once the needed changes are done, one could probably test this by running in the DQM playback system using one run from CRUZET2021. Since we have been running the FakeBeamMonitor clients so far, both the txt files and the DB values should already be there (with totally fake random values, but who cares at this point).

@sikler The only part I'm not sure about is whether we can control the destination of the DIP publication, i.e. we don't want to publish these fake values anywhere public, we just want to test that the publication (and the full chain) actually works.


sikler commented Sep 16, 2021

> @jfernan2 @sikler about testing this PR in P5:
> If I understood correctly the DIP server does not need live data, i.e. reads all the information from txt files which are stored in /nfshome0/dqmpro/BeamMonitorDQM/ or from the DB.

As I wrote above, there are two options:

  • read the BeamSpotOnlineObject and DCSStatus, and transmit their data (this is the default)
  • and the old way of reading the /nfs files (BeamFitResults.txt and BeamFitResults_TkStatus.txt), which can be selected by readFromNFS = True

So, the default now is BeamSpotOnlineObject and DCSStatus (readFromNFS = False)

> So, once the needed changes are done, one could probably test this by running in the DQM playback system using one run from CRUZET2021. Since we have been running the FakeBeamMonitor clients so far both the txt files and the DB values should be already there (with totally fake random values, but who cares at this point).

Sounds viable.

> @sikler The only part I'm not sure about is whether we can control the destination of the DIP publication, i.e. we don't want to publish these fake values anywhere public, we just want to test that the publication (and the full chain) actually works.

Yes, we can set it to anything by marking the publication as 'test', so people won't take the numbers seriously.

@cmsbuild

Pull request #35193 was updated. @emanueleusai, @ahmad3213, @cmsbuild, @jfernan2, @pmandrik, @pbo0, @rvenditti can you please check and sign again.

2 similar comments

@Martin-Grunewald

-1
wrong branch or rebase needed


slava77 commented Sep 27, 2021

-1


tvami commented Sep 27, 2021

-alca

  • PR was closed


tvami commented Sep 27, 2021

-db
