-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CMSSW performance metrics to WMArchive document #11696
Conversation
Jenkins results:
|
Jenkins results:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just saw there was a pending review request for this PR, even though it's labelled as "PR: Work in progress".
I know we have to resume working on this once the flow of these metrics are fully understood. I don't understand why this PR provides the CMSSWMetrics.py
module though. That is an example and it should be placed under the correct location (either test/* or src/python/WMQuality).
7542509
to
47e4de2
Compare
Jenkins results:
|
test this please |
Alan, the |
Jenkins results:
|
Here is a link to a document which now contains cmssw metrics in WMArchive |
Alan, I applied this PR along with #11663 to vocms0291 WMAgent and run one workflow
and, then I found that these metrics are propagated into WMArchive, see this document Therefore, I removed work in progress label and now this PR along with #11663 are ready for review. Please note that this PR address how to propagate metrics to WMArchvie, while #11663 provides CMSSW metrics in FJR JSON by parsing appropriate parts of CMSSW XML report. Both PRs will be required to complete #11538 Just in case here a direct link to FJR JSON in OpenSearch/MONIT infrastructure (previously I added link to a filtered page). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the PR @vkuznet
I do not see anything dramatic that could be a showstopper. Nevertheless I made 2 or 3 comments inline. You may want to take a quick look. (no need to change anything though if you think it is already good enough)
Jenkins results:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vkuznet Valentin, I left a few comments along the code.
In addition to those comments:
- please check the report from Jenkins and take the necessary actions. New modules are supposed to be clean of any suggestions, unless there is no simple way to do that.
- one thing that is not desirable IMO is that we will have many metrics being provided twice, under different categories. Perhaps we should make a commitment for 1 or 2 years ahead such that we can remove all the other metrics and provide only
cmssw
.
def CMSSWMetrics(): | ||
""" | ||
Helper function to convert CMSSW_METRICS dict values to their data-types | ||
:return: dictionary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can infer from the code, but it would be helpful if you provided 2 or 3 lines of an example dictionary that would be returned.
rdict[key] = ndict | ||
return rdict | ||
|
||
def XrdMetrics(mdict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a unit test for this function
@@ -52,7 +54,8 @@ | |||
'readTotalMB': float, | |||
'readTotalSecs': float, | |||
'writeTotalMB': float, | |||
'writeTotalSecs': float}} | |||
'writeTotalSecs': float}, | |||
'cmssw': CMSSWMetrics()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand it right that this function would be executed only once during the lifetime of the ArchiveDataReporter? Which is the component responsible for parsing the FJR and creating the WMArchive document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be done once, but is hard to say based on my current understanding of the all involved components. The parsing will be done by WMCore/FwkJobReport/XMLParser.py
when it will parse provided CMSSW XML FJR file, i.e. it will be done in WMCore/WMSpec/Steps/Executors/CMSSW.py
module and whoever component will call it. But it is not happen in ArchiveDataReporter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was afraid it could happen for every single job, but grepping the code:
(py3.8) OR-FVFH37J7Q6LR:WMCore amaltar2$ grep -rI DataMap src/* | grep import
src/python/WMComponent/ArchiveDataReporter/ArchiveDataPoller.py:from WMCore.Services.WMArchive.DataMap import createArchiverDoc
my understand is that the map will be created once, the moment ArchiveDataReporter component gets started.
In that case, everything is fine.
@@ -207,6 +211,10 @@ def typeCastPerformance(performDict): | |||
for key in PERFORMANCE_TYPE: | |||
if key in performDict: | |||
for param in PERFORMANCE_TYPE[key]: | |||
if key == 'cmssw': | |||
# so far we skip validation of cmssw metrics values | |||
newPerfDict[key] = performDict[key] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these are new metrics to be provided to the monitoring system, why do we skip this validation?
One of my fears is that, once we start validating it, that we get different metric data types for different CMSSW releases. If that happens, this will crash until someone looks and resolve the issue. Perhaps the way to deal with this would be to catch an exception and simply provide the value as is, thus avoiding human action.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is why I skipped validation, and in fact I do not know how to properly do it.
Alan, regarding new module clean code. The pep8 errors comes from non complaint indentation of
Of course I can manually try to align all parts (460 lines in total) but I get it from JSON file and parse it with |
Jenkins results:
|
@vkuznet Valentin, my IDE is not completely in sync with the rules used by jenkins pylint. can you fetch it and make a new commit with all its changes to see how it goes? |
Jenkins results:
|
Alan, here is your IDE failures from pep8:
|
Haa, I found which option I was missing in the IDE. Valentin, if you don't mind, can you please fetch an updated file from |
Jenkins results:
|
e834435
to
1e9a3ed
Compare
Jenkins results:
|
1e9a3ed
to
ec85ae7
Compare
Jenkins results:
|
ec85ae7
to
08f293c
Compare
Jenkins results:
|
Alan, after applying your latest IDE and few more iterations I made most of pep8 errors gone (except previous code in DataMap). Said that, I think this PR is ready for final review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valentin, the code looks good in general. However, I fear that we will be providing inconsistent data type in the output document.
The method named typeCastPerformance
does data casting for everything but cmssw
metrics. In addition, AFAIR the FJR has a string type for boolean data, which means that anything like "false", "False" would be evaluated to True
.
I think it would be really helpful to get a FJR - that would be parsed by ArchiveDataReporter - and have it go through the DataMap logic to see what the outcome would be. Are you able to provide both income and outcome documents?
Alan, yes I can easily generate JSON report from XML using this piece of code:
and you can find the JSON report over here and as you can see it is using |
Given that you have all this code in your repository, can you please generate the output file for ALL of those performace sections? They are: |
Alan, I updated report and now it contains all performance metrics. |
Thanks Valentin. I was trying to figure out why the Here is one example metric
versus
Said that, the |
@vkuznet Valentin, in the coming days, can you please create a new GH issue to remove the now duplicate sections from the FJR and WMArchive documents. Please provide an attachment/link/reference to the input FJR: |
Fixes #11538
Status
ready
Description
Add CMSSW performance metrics to WMArchive document
Is it backward compatible (if not, which system it affects?)
YES
Related PRs
#11663
External dependencies / deployment changes