Providing CMSSW version info to ElasticSearch #7786

Closed
sciaba opened this issue Apr 11, 2017 · 14 comments

@sciaba

sciaba commented Apr 11, 2017

When doing analysis in ElasticSearch, it is very useful to have the CMSSW version that was used by the job. I'd like to ask that this information be made available for export to ElasticSearch.

@ticoann
Contributor

ticoann commented Apr 25, 2017

@amaltaro, Alan, which code provides this information?

@amaltaro
Contributor

amaltaro commented Apr 25, 2017

I think this is part of the HTCondor monitoring. Is it enough to add a new job classad, or do we have to do something else, @sciaba? If a classad is enough, then it has to be a comma-separated string (or a list...), since there are jobs running multiple CMSSW releases.

@sciaba
Author

sciaba commented Apr 25, 2017

Good question. A list of releases would be good but one would need to know something about what CMSSW did each time it ran. Let me discuss it with Brian at the earliest opportunity.

@hufnagel
Member

Seems like an oversight that this is not already available from CMSSW via a ChirpCMSSW_cmsRunN_Param field. I would open a CMSSW ticket for it.

Otherwise please don't add more classads, just chirp it directly from the CMSSW runtime executor, which knows what CMSSW version it's executing.

That way you get per-step information only, but that is easy enough to search for as long as you know the naming conventions, and the per-step CMSSW version will be needed anyway at some point.
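For illustration only, here is a minimal sketch of what chirping the version per step could look like. It only builds the `condor_chirp set_job_attr` command line and does not run it, since that works only inside a running HTCondor job. The attribute name `ChirpCMSSW_cmsRun<N>_Version` is an assumption modelled on the `ChirpCMSSW_cmsRunN_Param` pattern mentioned above, not an existing field, and `build_chirp_command` is a hypothetical helper.

```python
def build_chirp_command(step_number, cmssw_version):
    """Build the argv for chirping one step's CMSSW version into the job ad.

    The attribute name is hypothetical; it only mimics the
    ChirpCMSSW_cmsRunN_Param naming pattern from this thread.
    """
    attr_name = "ChirpCMSSW_cmsRun%d_Version" % step_number
    # ClassAd string values must be wrapped in double quotes.
    attr_value = '"%s"' % cmssw_version
    return ["condor_chirp", "set_job_attr", attr_name, attr_value]


if __name__ == "__main__":
    # One command per step; a real executor would pass each list to subprocess.
    for step, version in enumerate(["CMSSW_9_1_1", "CMSSW_9_0_0"], start=1):
        print(" ".join(build_chirp_command(step, version)))
```

Keeping one attribute per step means the step number stays in the attribute name, so a search only needs the naming convention to find the version for a given step.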

@sciaba
Author

sciaba commented Apr 25, 2017

That would be perfect.

@amaltaro
Contributor

@bbockelm, as we discussed yesterday: what WMAgent has to provide is a list of CMSSW releases in the job classad, correct?

@hufnagel
Member

Isn't it more important that CMSSW provides this information, so you can match per-step parameters to the CMSSW version?

@bbockelm
Contributor

Hi Dirk,

To first order, it is a bit of a toss-up whether done in the agent or CMSSW. I think the agent is the right place though because:

  • If done in CMSSW, we'd only get the information for future releases. Implementing it in the agent would work today (already done in CRAB).
  • If done by the agent, we'd get this information before the job starts up (someday it could be used for matchmaking in condor) and for steps that didn't run (because of a job failure in an earlier step).

I suppose there's a reasonably good argument that we should do it in both places (think: CMSSW runs in other places, not just WMAgent). But I think it really should be in the agent for Andrea's use case.

Brian

@amaltaro amaltaro modified the milestones: WMAgent1706, WMAgent1705 Jun 7, 2017
@ticoann ticoann modified the milestones: WMAgent1706, WMAgent1707 Jul 4, 2017
@amaltaro amaltaro modified the milestones: WMAgent1708, WMAgent1707 Aug 1, 2017
@amaltaro amaltaro modified the milestones: WMAgent1709, WMAgent1708 Sep 12, 2017
@ticoann ticoann modified the milestones: WMAgent1709, WMAgent1710 Oct 24, 2017
@ticoann ticoann modified the milestones: WMAgent1710, WMAgent1712 Nov 27, 2017
@ticoann ticoann modified the milestones: WMAgent1712, WMAgent1801 Jan 18, 2018
@ticoann ticoann modified the milestones: WMAgent1801, WMAgent1802 Feb 12, 2018
@bbockelm
Contributor

@amaltaro - can we pull this one back up from the dead? Is it relatively simple to fix?

@amaltaro
Contributor

It should be fairly simple to implement. We just have to decide how to publish this information for multi-step jobs (I assume a comma-separated ordered string), e.g.:

  • step1 on 911, step2 on 911 and step3 on 900 --> "CMSSW_9_1_1,CMSSW_9_1_1,CMSSW_9_0_0"
  • step1 on 800 and step2 on 800 --> "CMSSW_8_0_0,CMSSW_8_0_0"

Does it sound good? How about the scramArchs? Do we actually care about them?
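As a rough sketch of the proposed format (not the actual WMAgent code), the ordered string could be built and read back like this. The function names `format_cmssw_versions` and `parse_cmssw_versions` are hypothetical and used only for illustration.

```python
def format_cmssw_versions(step_versions):
    """Join per-step CMSSW releases, in step order, into one string."""
    return ",".join(step_versions)


def parse_cmssw_versions(attr_value):
    """Split the string back into an ordered list, one entry per step."""
    return attr_value.split(",") if attr_value else []


if __name__ == "__main__":
    # step1 on 911, step2 on 911 and step3 on 900
    value = format_cmssw_versions(["CMSSW_9_1_1", "CMSSW_9_1_1", "CMSSW_9_0_0"])
    print(value)
    # Position i in the parsed list corresponds to step i+1.
    print(parse_cmssw_versions(value))
```

Repeated releases are kept rather than de-duplicated, so the position in the string still identifies the step.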

@sciaba
Author

sciaba commented Feb 19, 2018

How can one know the nature of the steps? Is it always obvious what stepN is? If there is no way to know what CMSSW was doing in a particular step, the information is almost useless.

@amaltaro
Contributor

HTCondor classads don't carry step-wise information; instead they contain information about a job/task as a whole. You'd probably need to correlate a job's information with the ReqMgr request information, unless CMSSW already provides some of that metadata via condor_chirp...

@sciaba
Author

sciaba commented Feb 19, 2018

I hope it does, and if not it should (and the same for the CMSSW version as well, as Dirk had suggested). Correlating information from different sources is often unwieldy.
In general we should find a sound solution for expressing the information about the steps, considering that multi-step jobs are more and more common. This goes beyond the scope of this ticket, though.

@bbockelm
Contributor

@amaltaro - we don't really care about the SCRAM arch because we get the runtime OS information from other sources.

Nowhere in CMS do we define what a DIGI job is: that's been a long-standing problem in all our monitoring, but we can't really fix this in this ticket.
