Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ResourceControlUpdater to continuously update PNNs in the database #11599

Merged
merged 2 commits into from
May 31, 2023

Conversation

amaltaro
Copy link
Contributor

@amaltaro amaltaro commented May 29, 2023

Fixes #11596

Status

ready

Description

This PR provides the following:

  • fix GetSiteInfo DAO which was broken when no site name was provided
    • also update its return data type to always be a list (list of dictionaries)
  • support in ResourceControl listSiteInfo to list all resources in a single command
  • also change its return data type to be more consistent
  • finally, implement in ResourceControlUpdater an extra method to continuously compare CRIC vs local database;
    • in case of differences, insert the new PNN(s) into the database and make the relevant PSN association

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

None

External dependencies / deployment changes

None

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 20 new failures
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 16 warnings and errors that must be fixed
    • 4 warnings
    • 37 comments to review
  • Pylint py3k check: failed
    • 25 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 1 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14263/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 4 new failures
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 17 warnings and errors that must be fixed
    • 4 warnings
    • 41 comments to review
  • Pylint py3k check: failed
    • 25 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 1 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14264/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 10 new failures
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 18 warnings and errors that must be fixed
    • 4 warnings
    • 65 comments to review
  • Pylint py3k check: failed
    • 25 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14265/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 19 warnings and errors that must be fixed
    • 6 warnings
    • 115 comments to review
  • Pylint py3k check: failed
    • 25 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14266/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 16 warnings and errors that must be fixed
    • 6 warnings
    • 115 comments to review
  • Pylint py3k check: failed
    • 25 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14267/artifact/artifacts/PullRequestReport.html

@@ -70,6 +71,10 @@ def __init__(self, config):
# set resource control
self.resourceControl = ResourceControl(config=self.config)

# get CRIC wrapper instance
self.cricPattern = ".*" # all sites
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no need to create self data with static value since it never changed. Unless it shown otherwise I suggest to move value .* directly into updatePNNMap function and put it into function call where the comment will be more appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see pros and cons in each implementation. With this, we have a clear definition at the initialization of the object about which parameters are used within this module/thread. Creating a constant inside the function is fine as well. I will update this code soon.

"""
# first create a dictionary to make key look-up easier
resp = {}
for thisItem in DBFormatter.format(self, result):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since below you takes elements of thisItem it is better to check that its length has at least 7 elements (the largest index I see in code below). If not it is better to provide appropriate logging message and skip the rest of the code in a loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be done, but this is a SQL statement where we request 7 fields, hence it must have 7 elements in the data objects. I don't think we gain anything from adding this check here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alan, my point is that it was unclear reading the PR code why we need 7 elements until I opened full file and saw the SQL statement which you mentioned. If you do not want to perform the check which will not hurt the codebase, then at least put appropriate comment either in docstring or better directly in for loop codebase that severn elements are part of SQL statement.

siteName = siteInfo['site_name']
if siteName not in cricMap:
logging.warning("Site name: %s cannot be found in CRIC.", siteName)
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we should simply continue here. We'd rather do something. the fact that the site has disappeared from CRIC must mean something. So I'd rather try to update the Agent's resource control and remove it from there as well. Even though it may require some more complicated code changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CRIC call uses the data-processing API, which requires an association of PNN and PSN.
Many opportunistic resources do not have a PNN, and those would fall into this warning message. Maybe I could update the message to something like "cannot be found in CRIC data-processing API"?

Copy link
Contributor

@todor-ivanov todor-ivanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @amaltaro
Thanks for this PR, I've left few comments inline.


results[0]["pnn"] = pnns
return results[0]
return results
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the DAO did change it's return format... that's obvious, but is this the only place where it is used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API is only consumed by "bin/wmagent-resource-control", which has been updated accordingly.

@@ -364,8 +364,9 @@ def addMCFakeFile(self):
siteInfo = self.getLocationInfo.execute(conn=self.getDBConn(),
transaction=self.existingTransaction())
for site in siteInfo:
if site['pnn'] in self.commonLocation:
locations.add(site['pnn'])
for pnn in site['pnn']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't that be achieved with set operations, instead of additional for cycle?

@@ -91,6 +96,13 @@ def algorithm(self, parameters):
logging.info("This component is not enabled in the configuration. Doing nothing.")
return

logging.info("Checking and updating PSN x PNN map...")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this x here in this log message is confusing. Maybe we can use PSN to PNN map instead.

@amaltaro
Copy link
Contributor Author

@todor-ivanov @vkuznet I think I addressed all of your concerns in the last 2 commits (to be properly squashed before merging it). Please have another look.

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 17 warnings and errors that must be fixed
    • 6 warnings
    • 115 comments to review
  • Pylint py3k check: failed
    • 25 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14268/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 17 warnings and errors that must be fixed
    • 6 warnings
    • 115 comments to review
  • Pylint py3k check: failed
    • 25 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14269/artifact/artifacts/PullRequestReport.html

@amaltaro
Copy link
Contributor Author

And the last commit fixes an actual bug that I had in the code. It has now been tested in vocms0193 and it seems to be working fine.

Copy link
Contributor

@todor-ivanov todor-ivanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things look good to me.

@@ -364,8 +364,7 @@ def addMCFakeFile(self):
siteInfo = self.getLocationInfo.execute(conn=self.getDBConn(),
transaction=self.existingTransaction())
for site in siteInfo:
if site['pnn'] in self.commonLocation:
locations.add(site['pnn'])
locations.update(set(site['pnn']) & set(self.commonLocation))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Way better... Thanks @amaltaro

append new item

Correct list of pnns

ResourceControlUpdater can now update PNN/PSN association

Deal with list of PNNs in WMBSHelper

Mock CRIIC in ResourceControlUpdater

Apply Todor/Valentin suggestions on ResourceControlUpdater

change set operation in WMBSHelper

Apply Todor/Valentin suggestions on GetSiteInfo

Fix ResourceControl API to listSiteInfo
fix WMBS/Locations unit test

pylint fixes; warn replaced by warning
@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 17 warnings and errors that must be fixed
    • 6 warnings
    • 115 comments to review
  • Pylint py3k check: failed
    • 25 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14270/artifact/artifacts/PullRequestReport.html

@amaltaro amaltaro merged commit 993e485 into dmwm:master May 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New PNNs for T2_FR_GRIF_* sites
4 participants