New function to get dataset locations from MSPileup (addressing #11620) #11879

Merged
merged 2 commits into from
Feb 9, 2024

Conversation

Contributor

@anpicci anpicci commented Jan 25, 2024

Fixes #11620

Status

In development

Description

A new method is introduced to get the pileup dataset location leveraging MSPileup

Is it backward compatible (if not, which system it affects?)

NO (It's a new feature for fetching information)

Related PRs

None

External dependencies / deployment changes

No

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 4 new failures
    • 1 tests no longer failing
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 1 warnings
    • 24 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 5 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14788/artifact/artifacts/PullRequestReport.html

Contributor

@amaltaro amaltaro left a comment

Andrea, this is the right place to update the pileup data location in the (global) workqueue. However, I left some comments that need to be followed up. Thanks

#datasets = [d for prec in pileupDatasets.values() for d in prec]
#self.pileupData = self.getDatasetLocations(datasets)
### flattening the pileupDatasets to have (dbsurl, dataset) pairs to loop over in the getDatasetLocationsFromMSPileup
datasetsWithDbsURL = [(dbsUrl, dataset) for dbsUrl, datasets in pileupDatasets.items() for dataset in datasets]
Contributor

As I look into this, I would even say that we don't have to reorganize the data that comes out of self.wmspec.listPileupDatasets(). That structure would be in a format like:

{"cmsweb production DBSReader": [pileupA, pileupB, pileupC],
  "cmsweb-testbed DBSReader": [pileupD]}

Contributor Author

I've thought about this for a while; maybe it's better to keep it as it is for now. The alternative would be a for loop over pileupDatasets that passes each key as the dbsUrl, together with its list [pileupA, pileupB, pileupC], to getDatasetLocationsFromMSPileup. This would imply updating self.pileupData at each iteration of the outer loop. What do you think?

Contributor

Sorry for missing your reply. Yes, I would be in favor of writing code like:

for dbsUrl, datasets in pileupDatasets.items():
    # note that each workflow will be either configured to production or to testbed url, hence single iteration
    self.pileupData = self.getDatasetLocationsFromMSPileup(dbsUrl, datasets)

This way we:
a) don't need to refactor the data structure above (named pileupDatasets)
b) don't need to map the MSPileup url for each pileup dataset
c) don't need to set the MSPileup url for each pileup dataset, as in general the majority (or all) will share the same url
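
To illustrate the point above, here is a minimal sketch (not code from this PR) that walks a dictionary shaped like the one described for listPileupDatasets() and performs one lookup per DBS url. The lookup function `fakeLocations`, the RSE names, and the dataset names are hypothetical stand-ins. Results are merged with `dict.update`, an assumption added here so that nothing is overwritten should more than one url ever be present.

```python
# Hypothetical stand-in for the structure described above; values are illustrative only.
pileupDatasets = {
    "cmsweb production DBSReader": ["pileupA", "pileupB", "pileupC"],
    "cmsweb-testbed DBSReader": ["pileupD"],
}


def fakeLocations(dbsUrl, datasets):
    """Hypothetical lookup: returns {dataset: [RSE, ...]} for a single DBS url."""
    rse = "T2_TEST_RSE" if "testbed" in dbsUrl else "T1_PROD_RSE"
    return {dataset: [rse] for dataset in datasets}


pileupData = {}
for dbsUrl, datasets in pileupDatasets.items():
    # One call per DBS url; no flattening into (dbsUrl, dataset) pairs is needed.
    pileupData.update(fakeLocations(dbsUrl, datasets))

print(pileupData)
```

With a single key in the dictionary, as is expected for a workflow configured against one DBS instance, the loop body runs once and plain assignment would behave the same as `update`.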

if isinstance(datasetsWithDbsURL, str):
    datasetsWithDbsURL = [datasetsWithDbsURL]
result = {}
for dataset, dbsUrl in zip(datasetsWithDbsURL):
Contributor

And this line gives me a ValueError exception.
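
A likely explanation, sketched here under the assumption that the list holds (dbsUrl, dataset) pairs as built earlier in the diff: `zip()` called on a single iterable wraps each element in a 1-tuple, so unpacking into two names fails. The url and dataset values below are hypothetical.

```python
# Hypothetical (dbsUrl, dataset) pair, mirroring the flattened list from the diff.
pairs = [("https://example.invalid/dbs", "/Pileup/Example-v1/RAW")]

# zip() over a single iterable wraps every element in a 1-tuple.
wrapped = list(zip(pairs))

try:
    for dataset, dbsUrl in zip(pairs):
        pass
    unpackError = None
except ValueError as exc:
    # Each item is a 1-tuple, so there are not enough values for two names.
    unpackError = exc

# Iterating the list directly unpacks each pair as intended.
unpacked = [(dbsUrl, dataset) for dbsUrl, dataset in pairs]
print(type(unpackError).__name__, unpacked)
```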

Contributor Author

I'm going to commit a new version where L292 and L293 are deleted, as well as the zip in L295.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 4 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 1 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 6 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14797/artifact/artifacts/PullRequestReport.html

@anpicci
Contributor Author

anpicci commented Jan 31, 2024

From the second Jenkins test run, I get this traceback:

Traceback (most recent call last):
  File "/build/cmsbld/jenkins/workspace/DMWM-WMCorePy3-PR-unittests/SLICE/3/label/cms-dmwm-cc7/code/src/python/WMCore/WorkQueue/Policy/Start/StartPolicyInterface.py", line 298, in getDatasetLocationsFromMSPileup
    doc = getPileupDocs(msPileupUrl, queryDict)
  File "/build/cmsbld/jenkins/workspace/DMWM-WMCorePy3-PR-unittests/SLICE/3/label/cms-dmwm-cc7/code/src/python/WMCore/MicroService/Tools/Common.py", line 119, in getPileupDocs
    data = mgr.getdata(mspileupUrl, queryDict, headers, verb='POST',
  File "/build/cmsbld/jenkins/workspace/DMWM-WMCorePy3-PR-unittests/SLICE/3/label/cms-dmwm-cc7/code/src/python/WMCore/Services/pycurl_manager.py", line 363, in getdata
    _, data = self.request(url=url, params=params, headers=headers, verb=verb,
  File "/build/cmsbld/jenkins/workspace/DMWM-WMCorePy3-PR-unittests/SLICE/3/label/cms-dmwm-cc7/code/src/python/Utils/PortForward.py", line 66, in portMangle
    return callFunc(callObj, newUrl, *args, **kwargs)
  File "/build/cmsbld/jenkins/workspace/DMWM-WMCorePy3-PR-unittests/SLICE/3/label/cms-dmwm-cc7/code/src/python/WMCore/Services/pycurl_manager.py", line 353, in request
    raise exc
http.client.HTTPException: url=https://cmsweb-prod.cern.ch:8443/ms-pileup/data/pileup, code=403, reason=Forbidden, headers={'Date': 'Tue, 30 Jan 2024 16:57:17 GMT', 'Server': 'Apache', 'Set-Cookie': 'cms-auth=3c656d3e60fea6a6a2d9956121c9b268bc31ac1f5ac876473d7677c5a14fc9678fc84ab3e0fc11d4;path=/;secure;httponly;expires=Thu, 01-Jan-1970 00:00:01 GMT', 'Content-Type': 'text/html;charset=utf-8', 'Content-Length': '753', 'X-Rest-Status': '100', 'X-Rest-Time': '1001.596 us', 'CMS-Server-Time': 'D=63250 t=1706633837604723'}, result=b'\n\n\n \n <title>403 Forbidden</title>\n <style type="text/css">\n #powered_by {\n margin-top: 20px;\n border-top: 2px solid black;\n font-style: italic;\n }\n\n #traceback {\n color: red;\n }\n </style>\n\n \n

403 Forbidden\nYou are not authorized to access this resource.\n\n    \n \n Powered by CherryPy 18.8.0\n \n\n \n\n'

for WMCore_t.WorkQueue_t.Policy_t.Start_t.Block_t.BlockTestCase.testPileupData. Is this relevant?
@amaltaro

@amaltaro
Contributor

Dennis and I discussed the same issue yesterday. It turns out the query to MSPileup is incorrect. Before you make any further changes in this PR, I think it would be better to converge on Dennis' developments in #11870

@amaltaro
Contributor

@anpicci Andrea, now that we have the changes Dennis provided merged in master:
#11870

could you please update this PR accordingly? Please use the getPileupDocs() provided in the new python package. Thanks

@anpicci
Contributor Author

anpicci commented Feb 1, 2024

Thank you @d-ylee @amaltaro. Do I need to rebase my local repo?

@amaltaro
Contributor

amaltaro commented Feb 1, 2024

Yes @anpicci . Please let me know if you need any advice on that. If you prefer, you can also create a new development branch/PR.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 1 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 7 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14804/artifact/artifacts/PullRequestReport.html

@anpicci
Contributor Author

anpicci commented Feb 1, 2024

@amaltaro from the Jenkins logs, I read:
W0201, line 160 in StartPolicyInterface.__call__: Attribute 'mask' defined outside __init__
I have checked that it is indeed the only self attribute defined outside __init__; the others are set to None in __init__ and modified afterwards. Should we apply the same strategy to self.mask?
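
For reference, a minimal sketch of the strategy being asked about, using a hypothetical class rather than the real StartPolicyInterface: the attribute is declared as None in `__init__` and only reassigned later, which is the pattern Pylint's W0201 check expects.

```python
class PolicySketch:
    """Hypothetical class for illustration; not the real StartPolicyInterface."""

    def __init__(self):
        # Declare every instance attribute up front, even if it is filled in later.
        self.mask = None

    def __call__(self, mask=None):
        # Reassigning an attribute that __init__ already declared does not trigger W0201.
        self.mask = mask
        return self.mask


policy = PolicySketch()
initialMask = policy.mask
updatedMask = policy(mask={"FirstLumi": 1})  # hypothetical mask content
print(initialMask, updatedMask)
```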

In addition, I still get this error from the second test:
Traceback (most recent call last):
  File "/build/cmsbld/jenkins/workspace/DMWM-WMCorePy3-PR-unittests/SLICE/3/label/cms-dmwm-cc7/code/src/python/WMCore/WorkQueue/Policy/Start/StartPolicyInterface.py", line 299, in getDatasetLocationsFromMSPileup
    doc = getPileupDocs(msPileupUrl, queryDict, method='POST')[0]
  File "/build/cmsbld/jenkins/workspace/DMWM-WMCorePy3-PR-unittests/SLICE/3/label/cms-dmwm-cc7/code/src/python/WMCore/Services/MSPileup/MSPileupUtils.py", line 25, in getPileupDocs
    data = mgr.getdata(mspileupUrl, queryDict, headers, verb=method,
  File "/build/cmsbld/jenkins/workspace/DMWM-WMCorePy3-PR-unittests/SLICE/3/label/cms-dmwm-cc7/code/src/python/WMCore/Services/pycurl_manager.py", line 363, in getdata
    _, data = self.request(url=url, params=params, headers=headers, verb=verb,
  File "/build/cmsbld/jenkins/workspace/DMWM-WMCorePy3-PR-unittests/SLICE/3/label/cms-dmwm-cc7/code/src/python/Utils/PortForward.py", line 66, in portMangle
    return callFunc(callObj, newUrl, *args, **kwargs)
  File "/build/cmsbld/jenkins/workspace/DMWM-WMCorePy3-PR-unittests/SLICE/3/label/cms-dmwm-cc7/code/src/python/WMCore/Services/pycurl_manager.py", line 353, in request
    raise exc
http.client.HTTPException: url=https://cmsweb-prod.cern.ch:8443/ms-pileup/data/pileup, code=403, reason=Forbidden, headers={'Date': 'Thu, 01 Feb 2024 14:58:59 GMT', 'Server': 'Apache', 'Set-Cookie': 'cms-auth=575598d84b057ed202789f0472a4904a64268cff4b6685a92f03efd815dc2348a41d90116e365080;path=/;secure;httponly;expires=Thu, 01-Jan-1970 00:00:01 GMT', 'Content-Type': 'text/html;charset=utf-8', 'Content-Length': '753', 'X-Rest-Status': '100', 'X-Rest-Time': '777.721 us', 'CMS-Server-Time': 'D=98148 t=1706799539815054'}, result=b'\n\n\n \n <title>403 Forbidden</title>\n <style type="text/css">\n #powered_by {\n margin-top: 20px;\n border-top: 2px solid black;\n font-style: italic;\n }\n\n #traceback {\n color: red;\n }\n </style>\n\n \n

403 Forbidden\nYou are not authorized to access this resource.\n\n    \n \n Powered by CherryPy 18.8.0\n \n\n \n\n'

@amaltaro
Contributor

amaltaro commented Feb 1, 2024

@anpicci I suspect you didn't rebase this PR properly. You can rebase your development branch with the same instructions provided in this link: https://steveklabnik.com/writing/how-to-squash-commits-in-a-github-pull-request
just remove the -i option (for interactive squashing).

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 1 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 7 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14808/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 10 warnings and errors that must be fixed
    • 1 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 7 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14809/artifact/artifacts/PullRequestReport.html

@anpicci
Contributor Author

anpicci commented Feb 2, 2024

@amaltaro thanks to your correction, the error coming from MSPileup in the default test disappeared. Should I worry about the other errors in the default test?

@amaltaro
Contributor

amaltaro commented Feb 2, 2024

I see that this unit test is still failing:

    WMCore_t.WorkQueue_t.WorkQueue_t.WorkQueueTest:testPileupOnProduction changed from success to failure

While this one is likely unstable:

    WMCore_t.WorkQueue_t.WorkQueue_t.WorkQueueTest:testThrottling changed from success to failure

@amaltaro
Contributor

amaltaro commented Feb 2, 2024

Giving it a second look, the first unit test is failing with:

root: ERROR: Error getting block location from MSPileup for /GammaGammaToEE_Elastic_Pt15_8TeV-lpair/Summer12-START53_V7C-v1/GEN-SIM: name 'dataItem' is not defined
Traceback (most recent call last):
  File "/build/cmsbld/jenkins/workspace/DMWM-WMCorePy3-PR-unittests/SLICE/8/label/cms-dmwm-cc7/code/src/python/WMCore/WorkQueue/Policy/Start/StartPolicyInterface.py", line 295, in getDatasetLocationsFromMSPileup
    queryDict = {'query': {'pileupName': dataItem},
NameError: name 'dataItem' is not defined

This error is buried among the hundreds of log lines in this unit test: https://cmssdt.cern.ch/dmwm-jenkins/job/DMWM-WMCore-PR-test/14809/testReport/junit/WMCore_t.WorkQueue_t.WorkQueue_t/WorkQueueTest/testPileupOnProduction/

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 1 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 7 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14810/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 1 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 7 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14811/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 1 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 7 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14812/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 1 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 7 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14813/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 1 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 7 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14814/artifact/artifacts/PullRequestReport.html

@anpicci
Contributor Author

anpicci commented Feb 3, 2024

With the last commit, the message in WMCore_t.WorkQueue_t.Policy_t.Start_t.Block_t.BlockTestCase:testPileupData
changed from:

root: INFO: pileupDatasets: {'https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader': {'/HighPileUp/Run2011A-v1/RAW'}}
root: INFO: datasetsWithDbsURL: [('https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader', '/HighPileUp/Run2011A-v1/RAW')]
root: INFO: dbsUrl: https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader
root: INFO: msPileupUrl: https://cmsweb-prod.cern.ch/ms-pileup/data/pileup
root: ERROR: Did not find any pileup document for query: {'pileupName': '/HighPileUp/Run2011A-v1/RAW'}

to

root: INFO: pileupDatasets: {'https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader': {'/HighPileUp/Run2011A-v1/RAW'}}
root: INFO: datasetsWithDbsURL: [('https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader', '/HighPileUp/Run2011A-v1/RAW')]
root: INFO: dbsUrl: https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader
root: INFO: msPileupUrl: https://cmsweb-prod.cern.ch/ms-pileup/data/pileup
root: WARNING: Did not find any pileup document for query: {'pileupName': '/HighPileUp/Run2011A-v1/RAW'}

according to the modification I have made. This raises another question for me: why do we get a 2011 file from here that is not present in the documents returned by MSPileup, in either prod or testbed? Is it correct to ignore pileup datasets from here that are not found in the MSPileup documents?

In any case, the only error now reported as must-fix concerns WMCore_t.WorkQueue_t.WorkQueue_t/WorkQueueTest/testPileupOnProduction, but I am not sure how to fix it. I suspect it might be related to the missing pileup dataset.

@amaltaro

@amaltaro
Contributor

amaltaro commented Feb 3, 2024

I see. So far, data location has in general been resolved through Rucio. In many cases we actually mock the calls to Rucio, so that unit tests can reliably define the location of some of the DIDs they use without making an HTTP request to a Rucio server.

I don't have an easy answer on how to mock that single MSPileup HTTP request. So we might have to change the pileup being used in that workqueue unit test to one of those that we actually have in MSPileup (we need to check if the workflow is using DBS - Rucio - testbed or production?). This might cause other issues with (un)mocked DBS data, but we will have to test that.
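
One possible direction, shown only as a sketch with hypothetical names (`MSPileupClientSketch`, `currentLocations`, the url and the fake document are all invented for illustration): if the HTTP call lives behind a class method, the standard library's `unittest.mock.patch.object` can replace it inside a test without touching the network. This is not how WMCore's emulators are implemented; it only demonstrates the general mocking technique.

```python
from unittest import mock


class MSPileupClientSketch:
    """Hypothetical wrapper around the HTTP call; not part of WMCore."""

    def getPileupDocs(self, url, query):
        raise RuntimeError("network access is not available in unit tests")


def currentLocations(client, url, dataset):
    """Code under test: asks the client for documents and returns the current RSEs."""
    docs = client.getPileupDocs(url, {"query": {"pileupName": dataset}})
    return docs[0]["currentRSEs"]


# Fake document with only the fields this sketch needs.
fakeDocs = [{"pileupName": "/Fake/Pileup-v1/RAW", "currentRSEs": ["T1_FAKE_Disk"]}]

with mock.patch.object(MSPileupClientSketch, "getPileupDocs",
                       return_value=fakeDocs) as patched:
    locations = currentLocations(MSPileupClientSketch(),
                                 "https://example.invalid/ms-pileup",
                                 "/Fake/Pileup-v1/RAW")

print(locations, patched.call_count)
```

The patch is undone when the `with` block exits, so other tests still see the original method.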

@anpicci
Contributor Author

anpicci commented Feb 5, 2024

I see. Is this something I can help with?

@amaltaro
Contributor

amaltaro commented Feb 5, 2024

Yes, we have to adapt/fix that unit test according to the changes provided here. If you are running unit tests in the docker image, you could venture into that code yourself.

Otherwise I will try to pull your branch out and get that fixed, later in my day.

    result[dataset] = doc['currentRSEs']
except IndexError:
    self.logger.warning('Did not find any pileup document for query: %s', queryDict['query'])
    continue
Contributor

I am really glad I had a careful look at these unit tests, otherwise a nasty bug would have slipped through.
Please add

                result[dataset] = []  # equal to an empty currentRSEs

before the continue statement. Otherwise, workqueue elements would be completely missing the pileup information. This fixes one of the unit tests.
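
The effect of that fallback can be sketched in isolation. The function below is a simplified stand-in, not the PR's method: `fetchDocs` is a hypothetical callable that returns a list of documents for a dataset, and an empty list models the case where no document is found.

```python
def locationsFromDocs(datasets, fetchDocs):
    """Sketch: map each dataset to its currentRSEs, or to [] when no document exists."""
    result = {}
    for dataset in datasets:
        try:
            doc = fetchDocs(dataset)[0]
            result[dataset] = doc["currentRSEs"]
        except IndexError:
            # No document found: keep the key, with an empty location list.
            result[dataset] = []
    return result


# Hypothetical documents; only one of the two requested datasets is known.
knownDocs = {"/Known/Pileup-v1/RAW": [{"currentRSEs": ["T1_FAKE_Disk"]}]}
locations = locationsFromDocs(["/Known/Pileup-v1/RAW", "/Missing/Pileup-v1/RAW"],
                              lambda name: knownDocs.get(name, []))
print(locations)
```

Without the assignment in the `except` branch, the missing dataset would have no key at all in the returned dictionary.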

@amaltaro
Contributor

amaltaro commented Feb 6, 2024

@anpicci Andrea, the 2nd unit test can be fixed with the changes I provided in this file:
https://amaltaro.web.cern.ch/amaltaro/forAndreaP/Block_t.py

It is a sub-optimal fix though, given that it will only work inside this unit test module.

Ideally we should have a way to mock the module/function in https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Services/MSPileup/MSPileupUtils.py, but AFAIU, we would have to convert it to a class to have it properly emulated under: https://github.com/dmwm/WMCore/tree/master/src/python/WMQuality/Emulators

Please request another review once you have applied these modifications (and try to keep src/* changes separated from test/*).

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 1 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 7 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14828/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor

amaltaro commented Feb 7, 2024

@anpicci it looks like we have a new unit test failing now. Let's try to resolve these as best we can for now. We can eventually get back to this through the issue that I just created: #11891

PS.: I will have a look at the other failing unit test later and share any solutions that I might find.

Contributor

@amaltaro amaltaro left a comment

@anpicci I left further comments for your consideration.

I am also in favor of moving forward with this PR, even if that means leaving a broken unit test behind:

WMCore_t.WorkQueue_t.WorkQueue_t.WorkQueueTest:testProcessingWithPileup changed from success to failure

For that, I think we could pull in @d-ylee to look at mocking MSPileup interaction.

@@ -12,6 +12,8 @@
from WMCore.WorkQueue.DataStructs.WorkQueueElement import WorkQueueElement
from WMCore.DataStructs.LumiList import LumiList
from WMCore.WorkQueue.WorkQueueExceptions import WorkQueueWMSpecError, WorkQueueNoWorkError
#from WMCore.MicroService.Tools.Common import getPileupDocs
Contributor

Feel free to delete this line.

@@ -159,10 +161,15 @@ def __call__(self, wmspec, task, data=None, mask=None, team=None, continuous=Fal
self.validate()
try:
    pileupDatasets = self.wmspec.listPileupDatasets()
    self.logger.info(f'pileupDatasets: {pileupDatasets}')
Contributor

Given that we already have some logging a few lines below, I would suggest either removing this line or setting it to debug level.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 1 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 7 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14834/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 1 tests no longer failing
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 1 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 6 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14835/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 1 tests no longer failing
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 12 warnings and errors that must be fixed
    • 1 warnings
    • 58 comments to review
  • Pylint py3k check: failed
    • 1 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 7 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14836/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 12 warnings and errors that must be fixed
    • 1 warnings
    • 58 comments to review
  • Pylint py3k check: failed
    • 1 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 7 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14837/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 1 tests no longer failing
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 12 warnings and errors that must be fixed
    • 1 warnings
    • 58 comments to review
  • Pylint py3k check: failed
    • 1 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 7 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14838/artifact/artifacts/PullRequestReport.html

Contributor

@amaltaro amaltaro left a comment

@anpicci Andrea, please see my comments along the code. Once you provide those changes, feel free to have them already squashed with the first commit. Thanks!

-datasets = [d for prec in pileupDatasets.values() for d in prec]
-self.pileupData = self.getDatasetLocations(datasets)
+#datasets = [d for prec in pileupDatasets.values() for d in prec]
+#self.pileupData = self.getDatasetLocations(datasets)
Contributor

Feel free to completely delete these 2 lines.

Returns a dictionary with the location of the datasets according to MSPileup
The definition of "location" here is a union of all sites holding at least
part of the dataset (defined by the DATASET grouping).
:param datasets: list of datasets
Contributor

Please update the list of parameters in the docstring.
I think the definition of "location" is now outdated as well, so you might want to delete it and perhaps rephrase the first sentence to something like "Returns a dictionary with the current location of the datasets according to MSPileup".

self.logger.debug(f'doc: {doc}')
if len(currentRSEs) == 0:
self.logger.warning(f'No RSE has a copy of the desired pileup dataset. Expected RSEs: {doc["expectedRSEs"]}')
self.logger.info(f'locationsFromPileup - name: {dataset}, currentRSEs: {doc["currentRSEs"]}, containerFraction: {doc["containerFraction"]}')
Contributor

Maybe delete this logging line in favor of the one above, on which I will leave a comment.

self.logger.debug(f'msPileupUrl: {msPileupUrl}')
doc = getPileupDocs(msPileupUrl, queryDict, method='POST')[0]
currentRSEs = doc['currentRSEs']
self.logger.debug(f'doc: {doc}')
Contributor

I would suggest setting this one to info level and printing something like "Retrieved MSPileup document: {doc}".

except IndexError:
    self.logger.warning('Did not find any pileup document for query: %s', queryDict['query'])
    result[dataset] = []
    continue
Contributor

The continue statement can actually be removed here, given that the loop will move on to the next iteration anyway.

'filters': ['expectedRSEs', 'currentRSEs', 'pileupName', 'containerFraction', 'ruleIds']}
pileUpinstance = '-testbed' if 'cmsweb-testbed' in dbsUrl else '-prod'
msPileupUrl = f"https://cmsweb{pileUpinstance}.cern.ch/ms-pileup/data/pileup"
self.logger.debug(f'dbsUrl: {dbsUrl}')
Contributor

Perhaps merge these 2 debugging records into a single one as well, e.g.: "Using DbsUrl: {dbsUrl} and MSPileup URL: {msPileupUrl}".
Also move it under the first for loop, given that it will not change between datasets of the same url instance.

try:
    queryDict = {'query': {'pileupName': dataset},
                 'filters': ['expectedRSEs', 'currentRSEs', 'pileupName', 'containerFraction', 'ruleIds']}
    pileUpinstance = '-testbed' if 'cmsweb-testbed' in dbsUrl else '-prod'
Contributor

These 2 lines with the url logic don't need to be executed for each dataset. So they can move under the first for loop instead of the inner one.

Contributor Author

Ok, this would imply replacing dataset with datasets in these lines. I am wondering if this further modification would work or not. My original idea relates to the implementation made by @d-ylee here of the MSPileupUtils

Contributor

My suggestion is to implement something like:

for dbsUrl, datasets in datasetsWithDbsURL.items():
    pileUpinstance = '-testbed' if 'cmsweb-testbed' in dbsUrl else '-prod'
    msPileupUrl = f"https://cmsweb{pileUpinstance}.cern.ch/ms-pileup/data/pileup"
    self.logger.info(f'Will fetch {len(datasets)} datasets from MSPileup url: {msPileupUrl}')
    for dataset in datasets:
        queryDict = {'query': {'pileupName': dataset},
                     'filters': ['expectedRSEs', 'currentRSEs', 'pileupName', 'containerFraction', 'ruleIds']}
        try:
            doc = getPileupDocs(msPileupUrl, queryDict, method='POST')[0]
 ....

does it make sense to you?
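
As a self-contained illustration of the url logic discussed in this thread, the sketch below computes the MSPileup endpoint once per DBS url and only builds the per-dataset queries in the inner loop. The helper names `msPileupUrlFor` and `planQueries` are hypothetical, no HTTP request is made, and the testbed DBS url in the example is an assumed value used only to exercise the other branch.

```python
def msPileupUrlFor(dbsUrl):
    """Derive the MSPileup endpoint from a DBS url, mirroring the snippet quoted above."""
    instance = "-testbed" if "cmsweb-testbed" in dbsUrl else "-prod"
    return f"https://cmsweb{instance}.cern.ch/ms-pileup/data/pileup"


def planQueries(pileupDatasets):
    """Return (msPileupUrl, queryDict) pairs; the url is computed once per DBS url."""
    plan = []
    for dbsUrl, datasets in pileupDatasets.items():
        msPileupUrl = msPileupUrlFor(dbsUrl)  # hoisted out of the inner loop
        for dataset in datasets:
            queryDict = {"query": {"pileupName": dataset},
                         "filters": ["expectedRSEs", "currentRSEs", "pileupName",
                                     "containerFraction", "ruleIds"]}
            plan.append((msPileupUrl, queryDict))
    return plan


plan = planQueries({
    "https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader": ["/HighPileUp/Run2011A-v1/RAW"],
})
# Assumed testbed DBS url, for illustration only.
testbedUrl = msPileupUrlFor("https://cmsweb-testbed.cern.ch/dbs/int/global/DBSReader")
print(plan[0][0], testbedUrl)
```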

…ation in StartPolicyInterface dmwm#11620

Modifying the implementation of MSPileup in StartPolicyInterface dmwm#11620

Implementation of MSPileupUtils in StartPolicyInterface dmwm#11620

Fixing a bug in getDatasetLocationsFromMSPileup dmwm#11620

Fixing again bugs in getDatasetLocationsFromMSPileup dmwm#11620

Fixing again bugs in getDatasetLocationsFromMSPileup dmwm#11620

Trying to debug Jenkins for getDatasetLocationsFromMSPileup dmwm#11620

Trying to debug Jenkins for getDatasetLocationsFromMSPileup dmwm#11620

Trying to debug Jenkins for getDatasetLocationsFromMSPileup dmwm#11620

Fixing an important bug in StartPolicyInterface dmwm#11620

Addressing Alan's comments for StartPolicyInterface dmwm#11620

Optimization of the new function in StartPolicyInterface dmwm#11620
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 1 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14840/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 1 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14841/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 12 warnings and errors that must be fixed
    • 1 warnings
    • 58 comments to review
  • Pylint py3k check: failed
    • 1 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 5 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14842/artifact/artifacts/PullRequestReport.html

@anpicci anpicci requested a review from amaltaro February 9, 2024 13:51
Contributor

@amaltaro amaltaro left a comment

Thanks Andrea, it looks good to me.

@amaltaro amaltaro merged commit 775be59 into dmwm:master Feb 9, 2024
1 of 4 checks passed
amaltaro added a commit to amaltaro/WMCore that referenced this pull request Feb 21, 2024
amaltaro added a commit that referenced this pull request Feb 21, 2024
Fix variable name in log record; complement to #11879
Development

Successfully merging this pull request may close these issues.

Refactor pileup data location in global workqueue to support partial availability
3 participants