Start providing DESIRED_CMSDatatier based on DESIRED_CMSDataset in a form of a classad #11613

Closed
khurtado opened this issue Jun 16, 2023 · 8 comments

Comments

@khurtado

Impact of the new feature
Inform which systems would be affected by this new feature: WMAgent. This is related to the condor submit plugin.

Is your feature request related to a problem? Please describe.
While working on #11608, we realized that the CMSPrimaryDataTier parameter, used in Grafana monitoring, was Unknown in most cases.

Describe the solution you'd like
The CMSPrimaryDataTier parameter used in monit is derived from a classad called DESIRED_CMSDataset, as follows:
https://github.com/dmwm/cms-htcondor-es/blob/master/src/htcondor_es/convert_to_json.py#L751-L757

        info = str(result["DESIRED_CMSDataset"]).split("/")
        if len(info) > 3:
            result["CMSPrimaryPrimaryDataset"] = info[1]
            result["CMSPrimaryProcessedDataset"] = info[2]
            result["CMSPrimaryDataTier"] = info[-1]

Hence, we should either make sure the data tier information is included in the DESIRED_CMSDataset classad, or create a new DESIRED_CMSDatatier classad and have the monitoring spider use it to populate CMSPrimaryDataTier instead.
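A minimal sketch of the second option, assuming the job ad can be treated as a plain dict. It reuses the spider's split logic quoted above and derives a DESIRED_CMSDatatier value from DESIRED_CMSDataset. The helper names `split_cms_dataset` and `add_desired_datatier` are hypothetical and not part of WMCore or cms-htcondor-es.

```python
def split_cms_dataset(dataset):
    """Split a dataset name of the form '/Primary/Processed/TIER'.

    Mirrors the spider logic: str(...).split("/") yields
    ['', primary, processed, tier], so more than 3 elements are required.
    Returns None when the name does not have that shape.
    """
    info = str(dataset).split("/")
    if len(info) > 3:
        return {
            "CMSPrimaryPrimaryDataset": info[1],
            "CMSPrimaryProcessedDataset": info[2],
            "CMSPrimaryDataTier": info[-1],
        }
    return None


def add_desired_datatier(job_ad):
    """Hypothetical helper: set DESIRED_CMSDatatier from DESIRED_CMSDataset."""
    dataset = job_ad.get("DESIRED_CMSDataset")
    if not dataset:
        return job_ad  # no input dataset in the job: nothing to derive
    parts = split_cms_dataset(dataset)
    if parts is not None:
        job_ad["DESIRED_CMSDatatier"] = parts["CMSPrimaryDataTier"]
    return job_ad
```

For an illustrative ad `{"DESIRED_CMSDataset": "/EGamma/Run2022C-v1/RAW"}`, `add_desired_datatier` adds `DESIRED_CMSDatatier = "RAW"`; ads without an input dataset are returned unchanged.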

Describe alternatives you've considered

Additional context
Related to #11608

@khurtado

khurtado commented Jul 7, 2023

@amaltaro So, after looking at the values of these classads, it seems CMSPrimaryDataTier is missing only when DESIRED_CMSDataset is also missing. This means we always set this value whenever an inputDataset is defined in the job package. Hence, nothing to do?

Do we want to move the logic from the spider and create:

    ["CMSPrimaryPrimaryDataset"]
    ["CMSPrimaryProcessedDataset"]
    ["CMSPrimaryDataTier"]

on our own, or should we just leave it there and close this ticket? Or am I missing something?

EDIT: I have changed the dashboard below to classify by taskType instead of CMSPrimaryDataTier
https://monit-grafana.cern.ch/d/ifXAfjLVk/production-jobs-exit-code-monitoring?orgId=11&from=1654625206007&to=1688753206007&viewPanel=92

@amaltaro

amaltaro commented Jul 7, 2023

@khurtado I fail to see these classads in our submitter plugin:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/BossAir/Plugins/SimpleCondorPlugin.py

so it must be set by the spider monitoring itself. I don't think we need to change it; on the other hand, we have seen a few issues with spider monitoring over the last months, and maybe it's time for us to start providing this information from the source(?)

It would be worth checking a handful of jobs that contain DESIRED_CMSDataset to make sure that the dataset is properly broken down into the 3 classads that you mentioned above.

@khurtado

khurtado commented Jul 7, 2023

@amaltaro Yes, that's correct. The spider creates them based on DESIRED_CMSDataset, which we always pass correctly unless there is no inputDataset in the job definition.

I checked the ES data and didn't find any instance where the length of DESIRED_CMSDataset.split("/") was less than 4 (with [0] being just an empty string from the split). I also could not find any instance in the last year where DESIRED_CMSDataset existed but the 3 parameters derived from it by the spider did not, so everything seems fine.

We could indeed move the logic for those 3 parameters here, but things seem to be working properly as of now. What are the issues over the last few months you are referring to? Are these general spider issues, or are they specific to the classads/parameters in this GH issue?
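The consistency check described above can be sketched as follows, assuming the ES documents have already been fetched as flat dicts (the actual query and document layout are not shown here). `find_inconsistent` is a hypothetical helper; it flags dataset names that split into fewer than 4 elements, and records that have DESIRED_CMSDataset but lack, or have empty, derived fields.

```python
DERIVED = (
    "CMSPrimaryPrimaryDataset",
    "CMSPrimaryProcessedDataset",
    "CMSPrimaryDataTier",
)


def find_inconsistent(records):
    """Return (short_names, missing_derived) lists of offending records."""
    short_names = []
    missing_derived = []
    for rec in records:
        dataset = rec.get("DESIRED_CMSDataset")
        if dataset is None:
            # Jobs without an input dataset are expected to lack all four fields
            continue
        parts = str(dataset).split("/")
        # A well-formed name starts with "/", so parts[0] == "" and len(parts) >= 4
        if len(parts) < 4:
            short_names.append(rec)
        # Treat both absent and empty derived values as missing
        if any(not rec.get(key) for key in DERIVED):
            missing_derived.append(rec)
    return short_names, missing_derived
```

Both lists coming back empty over a year of records corresponds to the result reported above.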

@amaltaro

amaltaro commented Jul 7, 2023

@khurtado Kenyi, you are right! I overlooked the initial issue description, and I see you properly pointed out the relevant code converting the dataset into the other 3 attributes.
I also had a look at one ReReco workflow and I see the CMSPrimaryDataTier classad is correct:
https://monit-opensearch.cern.ch/dashboards/app/discover?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-15m,to:now))&_a=(columns:!(data.CMSPrimaryPrimaryDataset,data.CMSPrimaryDataTier,data.CMSPrimaryProcessedDataset,data.DESIRED_CMSDataset),filters:!(),index:dd1ba850-d169-11ea-966a-e1c0a7950cea,interval:auto,query:(language:kuery,query:'data.WMAgent_RequestName:pdmvserv_Run2022C_EGamma_27Jun2023_230627_115909_2933'),sort:!())

but CMSPrimaryProcessedDataset and CMSPrimaryPrimaryDataset are empty (maybe there is no index for these fields in ES?).

Having said that, I would say that there is nothing to be done here. On the other hand, I find those classads misleading (it is hard to tell whether they relate to the input or the output). My suggestions, which could be implemented in the spider script itself, are:

  • CMSPrimaryPrimaryDataset: why Primary twice in the name?
  • instead of CMSPrimaryDataTier, I would rename it to DESIRED_CMSDatatier, with a similar change for the other two classads.

If you think it's worth discussing this with Federica and Nikodemas, perhaps add them to this issue and/or open a CMSMonitoring-specific ticket for this.
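The renaming suggestion could be applied in the spider as a simple key mapping. This is a sketch only: DESIRED_CMSDatatier is the name proposed above, while DESIRED_CMSPrimaryDataset and DESIRED_CMSProcessedDataset are hypothetical target names chosen for illustration, and `rename_attributes` is not an existing spider function.

```python
# Current spider attribute names -> proposed DESIRED_-prefixed names.
# Only DESIRED_CMSDatatier is named in the discussion; the other two
# target names are hypothetical.
RENAMES = {
    "CMSPrimaryDataTier": "DESIRED_CMSDatatier",
    "CMSPrimaryPrimaryDataset": "DESIRED_CMSPrimaryDataset",
    "CMSPrimaryProcessedDataset": "DESIRED_CMSProcessedDataset",
}


def rename_attributes(doc):
    """Return a copy of doc with the derived attributes renamed."""
    return {RENAMES.get(key, key): value for key, value in doc.items()}
```

Keys not listed in the mapping pass through unchanged, so the rest of the monitoring document is unaffected.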

@khurtado

khurtado commented Jul 7, 2023

@amaltaro Ahhh, I was looking at the cms-20* index, not the monit_prod_condor* one. Indeed, that's strange, and it looks like they should be part of the latter as well. I will create a ticket in monit.

EDIT: GH issue created:
dmwm/cms-htcondor-es#211

@khurtado

khurtado commented Jul 7, 2023

@amaltaro If there is nothing to be done here on WMCore, should I close the ticket and just follow up on the monit GH repo?

@amaltaro

amaltaro commented Jul 7, 2023

I guess we can close this issue out. If CMS Monit decides to push those classads back into the WM realm, then we can reopen this issue. Whatever you prefer, Kenyi!

@khurtado

khurtado commented Jul 7, 2023

Sounds good! Closing this one then.

@khurtado khurtado closed this as completed Jul 7, 2023
Projects
Status: Done
Development

No branches or pull requests

2 participants