
Beamspot DIP server (normal DQM client) for 12_0_X #35193

Closed
wants to merge 1,823 commits

Conversation


@sikler commented Sep 8, 2021

PR description:

  • the aim was to get rid of the java-based DIP server (DQM/BeamMonitor/test/dip/BeamSpotDipServer.java) used in the past years to transmit beamspot information to LHC
  • detailed information on the Data Interchange Protocol is at https://readthedocs.web.cern.ch/display/ICKB/DIP+and+DIM
  • our beamspot server is now rewritten in C++, in the form of a normal DQM client
  • the central piece of code is the plugin DQM/BeamMonitor/plugins/BeamSpotDipServer.{cc,h}, the config at DQM/BeamMonitor/python/BeamSpotDipServer_cff.py, and the live client at DQM/Integration/python/clients/beamspotdip_dqm_sourceclient-live_cfg.py
  • it uses two external tools: dip (version 5.7.0) and log4cplus [in fact log4cplus is contained in the dip distribution], available at https://readthedocs.web.cern.ch/display/ICKB/DIP+API; both are now provided as external tools
  • mode of operation: the client is alive from the beginning to the end of the run; it checks the result of the previous beamspot fit at the end of each lumisection (in dqmEndLuminosityBlock); for now it looks at files containing the last beamspot fit and the current status of the tracker under /nfshome0/dqmpro/BeamMonitorDQM/
  • configurable through readFromNFS: works either on the files under /nfshome0 or using the beamspot from the database and DCS information on the tracker status (for now only the reading of the legacy beamspot is implemented)
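The readFromNFS switch described above might look roughly like this in a client configuration. This is a hypothetical sketch: the module label and plugin type are assumptions; only the readFromNFS and sourceFile parameter names and the /nfshome0 path appear in this PR.

```python
import FWCore.ParameterSet.Config as cms

# Hypothetical configuration fragment; module label and plugin type are
# assumptions, readFromNFS/sourceFile are taken from the PR discussion.
beamSpotDipServer = cms.EDAnalyzer("BeamSpotDipServer",
    # default: read the BeamSpotOnlineObject from the DB plus DCS status
    readFromNFS = cms.untracked.bool(False),
    # fallback: parse the fit results file written under /nfshome0
    sourceFile = cms.untracked.string(
        "/nfshome0/dqmpro/BeamMonitorDQM/BeamFitResults.txt"),
)
```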

PR validation:

  • compiles and runs on lxplus without warning or error messages
  • a test using a working DIP browser or java application is not yet possible (javaws is not supported anymore, jws is available neither on the technical network nor on lxplus; the page https://dash.web.cern.ch/DIP/ does not work)
  • live version to be tested

PR issues, questions:

  • issue: when reading from database, some fields cannot be filled (BeamSpotOnlineObject does not have the fields startTime, lumisection range, number of events used, meanPV, err_meanPV, rmsPV, err_rmsPV, maxPV); are those fields necessary for LHC?

@cmsbuild added this to the CMSSW_12_0_X milestone Sep 8, 2021

cmsbuild commented Sep 8, 2021

A new Pull Request was created by @sikler (Ferenc Siklér) for CMSSW_12_0_X.

It involves the following packages:

  • DQM/BeamMonitor (dqm)
  • DQM/Integration (dqm)

@andrius-k, @kmaeshima, @ErnestaP, @ahmad3213, @cmsbuild, @jfernan2, @rvenditti can you please review it and eventually sign? Thanks.
@threus, @batinkov, @battibass this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here


jfernan2 commented Sep 8, 2021

Thanks @sikler
The main question for this PR is, how to test it in Online DQM at P5?
Do you really need LHC beam/collisions? Can we do some kind of check using FakeBeam like FakeBeamMonitor does?


sikler commented Sep 8, 2021

@jfernan2 Sounds interesting. Is there some documentation on FakeBeam and related (twiki, presentation, similar)?


jfernan2 commented Sep 8, 2021

For FakeBeamMonitor, in this same package,
https://github.com/cms-sw/cmssw/blob/master/DQM/BeamMonitor/plugins/FakeBeamMonitor.h
I believe the two experts may shed some light, @francescobrivio @gennai
Thanks


jfernan2 commented Sep 8, 2021

BTW: @sikler
Since this PR supersedes them, it would be great if you could:

@francescobrivio

> @jfernan2 Sounds interesting. Is there some documentation on FakeBeam and related (twiki, presentation, similar)?

Hi @sikler ! About the FakeBeamMonitor, there is no documentation at the moment; it's a simple copy of the BeamMonitor, except that the BeamSpot values don't come from a fit to the events but are random numbers generated by TRandom. It was used to commission the per-lumi updates of the BeamSpot during LS2, when collisions were not available.

I will have a look at this PR asap (next few days) and see if there is a way to use it for testing DIP.


cmsbuild commented Sep 9, 2021

Pull request #35193 was updated. @andrius-k, @kmaeshima, @ErnestaP, @ahmad3213, @cmsbuild, @jfernan2, @rvenditti can you please check and sign again.

@perrotta mentioned this pull request Sep 13, 2021
Outdated review thread on DQM/BeamMonitor/plugins/BeamSpotDipServer.cc:
Comment on lines 397 to 400
startTime = bs.GetCreationTime();
startTimeStamp = bs.GetCreationTime(); // not provided by BeamSpotOnlineObject
endTime = bs.GetCreationTime();
endTimeStamp = bs.GetCreationTime(); // not provided by BeamSpotOnlineObject
Contributor

What is the difference between startTime and startTimeStamp? (and same for endTime).
Maybe they can be added to the BeamSpotOnlineObject.

Contributor Author

startTime is an std::string, while startTimeStamp is epoch (number of seconds since 1970).
In principle startTime should contain the beginning of the processed lumisection range, and endTime should mark the end of the processed lumisection range.
Could you add those? Actually, the epoch could be calculated from CreationTime through mktime.
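As a rough sketch of the mktime idea (the timestamp format used here is an assumption, not taken from the plugin), the string-to-epoch conversion could look like:

```python
import time

def creation_time_to_epoch(timestring):
    """Convert a human-readable creation time to epoch seconds,
    mirroring what mktime would do in the C++ plugin.
    The "%Y.%m.%d %H:%M:%S" format is an assumed example."""
    parsed = time.strptime(timestring, "%Y.%m.%d %H:%M:%S")
    # mktime interprets the broken-down time in the local timezone,
    # exactly like the C library mktime the comment refers to
    return int(time.mktime(parsed))

epoch = creation_time_to_epoch("2021.09.16 12:00:00")
```

Note that if the stored string were UTC rather than local time, calendar.timegm would be the safer conversion.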

Contributor

The function GetCreationTime() for the BeamSpotOnlineObject returns the moment in which the payload was actually created (see these lines); it's not related to the LS analyzed.

I'll look into adding these parameters to the online beamspot.

Resolved review threads: DQM/BeamMonitor/plugins/BeamSpotDipServer.cc, DQM/BeamMonitor/plugins/BeamSpotDipServer.h
DCSStatus = cms.untracked.string("scalersRawToDigi"),
#
sourceFile = cms.untracked.string(
"/nfshome0/dqmpro/BeamMonitorDQM/BeamFitResults.txt"),
Contributor

OK, as I mentioned above, you are reading the txt file that is filled by the "HLT workflow" (beamhlt_dqm_sourceclient-live_cfg.py#L106), while the legacy workflow (beam_dqm_sourceclient-live_cfg.py#L96) is writing to
/nfshome0/dqmpro/BeamMonitorDQM/BeamFitResultsOld.txt

We should decide which one we want to publish, or at least add an option to make this configurable.
For example for the Pilot Beam Test in October I think only the legacy workflow (which is fed by ZeroBias) will produce actual BeamSpot values.

@gennai @mmusich any thoughts?

Contributor

Hi @francescobrivio this comes a bit out of context for me. What did we use to publish before?

Contributor

Unfortunately I don't know what was published in the past; @sikler, do you know?
I will try to dig up the old DIP results.

Contributor Author

@sikler commented Sep 16, 2021

I have implemented two possible workflows for DIP publishing:

  • read the BeamSpotOnlineObject and DCSStatus, and transmit their data (this is the default)
  • and the old way of reading the /nfs files (BeamFitResults.txt and BeamFitResults_TkStatus.txt), which can be selected by readFromNFS = True

So, the default now is BeamSpotOnlineObject and DCSStatus (readFromNFS = False)
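For the readFromNFS = True path, the fit-results file is essentially key/value text. A toy parser might look like the following; the keys shown are illustrative only, since the actual file layout is not spelled out in this PR:

```python
def parse_fit_results(text):
    """Toy parser for a BeamFitResults.txt-style file: one key per line,
    followed by its value(s). The keys below are illustrative only."""
    record = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # single value -> scalar, multiple values -> list
            record[fields[0]] = fields[1] if len(fields) == 2 else fields[1:]
    return record

example = """Runnumber 123456
BeginTimeOfFit 2021.09.16 12:00:00
X0 0.0950
Y0 -0.0620
Z0 0.3300"""
beamspot = parse_fit_results(example)
```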

Contributor

Hello @sikler
What is the reason or advantage of the operation mode reading the data from the condition DB (the BeamSpotOnlineObject)? Why has it been chosen as the default?
Also, taking into account that we have two workflows ("Legacy" and "HLT") populating two different tags, which one are you going to select?

Contributor Author

Hi @ggovi

> What is the reason or advantage of the operation mode reading the data from the condition DB (the BeamSpotOnlineObject)? Why has it been chosen as the default?

It was felt that reading /nfs files was not a good practice, so a full online solution with BeamSpotOnlineObject was the way to go.

> Also, taking into account that we have two workflows ("Legacy" and "HLT") populating two different tags, which one are you going to select?

My understanding was that there will be some switch at the creation of the BeamSpotOnlineObject, and that will decide which beamspot is used online and also transmitted to LHC's DIP. So the beamspot used online should match the DIP version.

Contributor

@ggovi commented Sep 17, 2021

@sikler thanks for the answer. The fragility of the nfs filesystem indeed sounds like a good reason for preferring the DB as input. I'm actually wondering whether this problem also holds for the BeamSpot online workflows... because in that case we would be affected anyhow. @francescobrivio may clarify.
In any case, this choice has a cost - see the variables added in the payload, which will presumably not be consumed by HLT, but only "bridged out" for the DIP storage.
Concerning the choice of the two tags, I think the actual implementation has evolved with respect to the earlier discussion, and now we actually have the two payloads (in the two different tags), with no "DB-stored" information about which one has been selected/consumed by HLT. @francescobrivio may comment on this as well...

Contributor

  • @ggovi the BeamSpot online workflows do not rely on nfs; they write the payload directly and upload it to CondDB. nfs is only used to store the txt files, so I don't think there is a problem there.

  • @sikler regarding the choice of the two workflows, it's as Giacomo wrote: both payloads are produced and uploaded to the DB in different tags. Then it's up to the consumer code to choose the preferred one. At the moment there are 2 consumers: the HLT and the DQM monitoring.
    Both rely on an ESProducer (OnlineBeamSpotESProducer.cc) that reads both tags and takes the final decision on which BeamSpot to use; since both consumers are based on the same producer, the decision taken is the same for both (this has been commissioned and verified during MWGR and CRUZET).
    You can see an example of the implementation in the DQM monitoring client onlinebeammonitor_dqm_sourceclient-live_cfg.py

    So in the end we have to decide if what we publish in DIP is what we actually use in HLT (and in Express), or if we want to always publish the same result in any case (e.g. always the legacy BeamSpot).
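The "single decision point" idea described above can be modelled as a plain-Python toy. The actual selection criterion inside OnlineBeamSpotESProducer is not described in this thread, so the newest-valid-payload rule used here is purely an assumption for illustration:

```python
from dataclasses import dataclass

@dataclass
class OnlinePayload:
    tag: str            # e.g. the legacy or the HLT tag
    creation_time: int  # epoch seconds
    valid: bool         # e.g. a sanity check on the fit result

def choose_beamspot(legacy, hlt):
    """Pick one payload for all consumers (HLT, DQM, a DIP publisher),
    so every client agrees. Assumed rule: newest valid payload wins."""
    candidates = [p for p in (legacy, hlt) if p.valid]
    if not candidates:
        return None  # caller falls back to a default beamspot
    return max(candidates, key=lambda p: p.creation_time)

chosen = choose_beamspot(
    OnlinePayload("legacy", 1631793600, True),
    OnlinePayload("hlt", 1631790000, False),
)
```

Because every consumer calls the same chooser on the same two tags, the beamspot sent to DIP automatically matches the one used by HLT and DQM.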

Contributor

@francescobrivio well, if the workflows write files outside the local node disk, they could still be affected by nfs issues. In that case, it would be unreasonable to address the issue only on the DIP workflow side. Maybe we should discuss the overall picture. I'll try to set up a discussion.

@francescobrivio

@jfernan2 @sikler about testing this PR in P5:
If I understood correctly, the DIP server does not need live data, i.e. it reads all the information from txt files stored in /nfshome0/dqmpro/BeamMonitorDQM/ or from the DB.
So, once the needed changes are done, one could probably test this by running in the DQM playback system using one run from CRUZET2021. Since we have been running the FakeBeamMonitor clients so far, both the txt files and the DB values should already be there (with totally fake random values, but who cares at this point).

@sikler The only part I'm not sure about is whether we can control the destination of the DIP publication, i.e. we don't want to publish these fake values anywhere public, we just want to test that the publication (and the full chain) actually works.


sikler commented Sep 16, 2021

> @jfernan2 @sikler about testing this PR in P5:
> If I understood correctly the DIP server does not need live data, i.e. reads all the information from txt files which are stored in /nfshome0/dqmpro/BeamMonitorDQM/ or from the DB.

As I wrote above, there are two options:

  • read the BeamSpotOnlineObject and DCSStatus, and transmit their data (this is the default)
  • and the old way of reading the /nfs files (BeamFitResults.txt and BeamFitResults_TkStatus.txt), which can be selected by readFromNFS = True

So, the default now is BeamSpotOnlineObject and DCSStatus (readFromNFS = False)

> So, once the needed changes are done, one could probably test this by running in the DQM playback system using one run from CRUZET2021. Since we have been running the FakeBeamMonitor clients so far both the txt files and the DB values should be already there (with totally fake random values, but who cares at this point).

Sounds viable.

> @sikler The only part I'm not sure about is whether we can control the destination of the DIP publication, i.e. we don't want to publish these fake values anywhere public, we just want to test that the publication (and the full chain) actually works.

Yes, we can set it to anything by marking the publication as 'test', so people won't take the numbers seriously.

@cmsbuild

Pull request #35193 was updated. @emanueleusai, @ahmad3213, @cmsbuild, @jfernan2, @pmandrik, @pbo0, @rvenditti can you please check and sign again.

2 similar comments

@Martin-Grunewald

-1
wrong branch or rebase needed


slava77 commented Sep 27, 2021

-1


tvami commented Sep 27, 2021

-alca

  • PR was closed


tvami commented Sep 27, 2021

-db
