
Continuous update of the pileup availability #11884

Merged: 2 commits merged into dmwm:master on Feb 26, 2024


@vkuznet (Contributor) commented Feb 1, 2024

Fixes #11619

Status

In development

Description

Implement logic for continuous update of pileup availability based on #11619 (comment)

Main logic of the algorithm:

  • for every pileup record, find the location of the tarball
  • get the configuration files within the tarball
  • update the pileup conf content with new MSPileup info using this algorithm:
    • compare json conf blocks with those found in msPUBlockLoc
      • if a block is found in msPUBlockLoc, we keep it and update its RSEs from the ones found in the msPUBlockLoc entry
      • if a block does not exist in msPUBlockLoc, we discard it
    • if a block from msPUBlockLoc is not found in the json conf block list
  • efficiently (i.e. only once) update the pileupconf.json files within the tarball
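The block-merging rules above can be sketched in Python roughly as follows. This is an illustrative sketch, not the code from this PR: the function name `mergePileupBlocks`, the assumed pileupconf.json shape (pileup type -> block name -> dict with `FileList` and `PhEDExNodeNames`), and the assumption that `msPUBlockLoc` maps block names to lists of RSEs are all hypothetical. Because the description does not state what happens to blocks that exist only in `msPUBlockLoc`, the sketch merely reports them to the caller.

```python
def mergePileupBlocks(puConf, msPUBlockLoc):
    """Merge pileupconf.json blocks with MSPileup block locations.

    Hypothetical shapes (assumptions, not taken from the PR):
      puConf:       {puType: {blockName: {"FileList": [...], "PhEDExNodeNames": [...]}}}
      msPUBlockLoc: {blockName: [rse, ...]}
    Returns (newConf, changed, missing); the input dicts are not modified.
    """
    newConf = {}
    changed = False
    seen = set()
    for puType, blocks in puConf.items():
        newBlocks = {}
        for blockName, info in blocks.items():
            if blockName not in msPUBlockLoc:
                # block unknown to MSPileup: discard it
                changed = True
                continue
            seen.add(blockName)
            rses = sorted(msPUBlockLoc[blockName])
            if sorted(info.get("PhEDExNodeNames", [])) != rses:
                changed = True
            # keep the block and refresh its RSEs from the MSPileup entry
            newBlocks[blockName] = dict(info, PhEDExNodeNames=rses)
        newConf[puType] = newBlocks
    # blocks known to MSPileup but absent from the json conf; their handling
    # is not specified in the description, so they are only reported here
    missing = sorted(set(msPUBlockLoc) - seen)
    return newConf, changed, missing
```

Returning a `changed` flag lets the caller skip rewriting the tarball when nothing moved, in line with the "only once" requirement.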

Is it backward compatible (if not, which system does it affect?)

YES

Related PRs

External dependencies / deployment changes

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
    • 7 tests added
  • Python3 Pylint check: failed
    • 10 warnings and errors that must be fixed
    • 2 warnings
    • 34 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 12 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14802/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
    • 7 tests added
  • Python3 Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 2 warnings
    • 34 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 12 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14803/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 1 tests no longer failing
    • 7 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 2 warnings and errors that must be fixed
    • 2 warnings
    • 33 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 5 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14806/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 1 tests no longer failing
    • 7 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 2 warnings and errors that must be fixed
    • 2 warnings
    • 33 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 3 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14807/artifact/artifacts/PullRequestReport.html

@vkuznet requested a review from amaltaro on February 1, 2024 19:11
@vkuznet changed the title "Fix issue 11619" to "Continuous update of the pileup availability" on Feb 1, 2024
@amaltaro (Contributor) left a comment

Valentin, I have not yet reviewed the actual logic of the pileup json update. However, I left some review comments for your consideration.

Please also update the PR description, in your own words, with what is provided in this PR. There is no need for a very detailed description, just things like "Util method to find pileup json file in workflow sandbox; Fetch pileup data location from MSPileup; etc.".

@@ -101,8 +289,49 @@ def algorithm(self, parameters=None):
with CodeTimer("Rucio block resolution", logger=logging):
    self.findRucioBlocks(uniqueActivePU, msPileupList)

# TODO in future tickets: find the json files in the spec/sandbox area and tweak them
# define where to look for sandbox tar balls
idir = getattr(self.config.WorkflowUpdater, "componentDir")
Contributor

Please move this to the __init__ method.

Contributor Author

Why? The point is that I don't know what happens at run time and when this area is created. In other words, is it part of the WMA installation, or is it created at run time, i.e. when the first job is processed? I think the code is in the right place, since in the line below I check whether the directory exists and use '/data/srv/wmagent/current/install/wmagentpy3' as a fallback area. If neither exists, the algorithm cycle fails with a proper error in the log.

Contributor

If this attribute does not exist, the component will actually fail to even start. This is not a runtime attribute; it is a deployment configuration attribute. In other words, it is defined only once in an agent's lifetime.

Contributor Author

Alan, it is not about the component but about the directory. Does the directory exist before run time?

Contributor

Yes, it does. I am only saying that whenever you read the component configuration, the ideal place for that is the __init__ method (or setup, when it exists).
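A minimal sketch of the pattern suggested in this thread: read the deployment configuration once in `__init__` and only use the prepared value inside the polling cycle. `PollerSketch` is a hypothetical class, not the real `WorkflowUpdaterPoller`, and `types.SimpleNamespace` stands in for the agent configuration object; only the `WorkflowUpdater.componentDir` attribute is taken from the snippet above.

```python
import os
from types import SimpleNamespace


class PollerSketch:
    """Hypothetical stand-in for a poller; not the real WorkflowUpdaterPoller."""

    def __init__(self, config):
        # Deployment configuration is read once, when the component starts,
        # so a missing attribute raises AttributeError at startup, not mid-cycle.
        self.config = config
        self.componentDir = getattr(config.WorkflowUpdater, "componentDir")

    def algorithm(self, parameters=None):
        # The polling cycle only checks the runtime state of the directory.
        if not os.path.isdir(self.componentDir):
            raise RuntimeError(f"directory {self.componentDir} does not exist")
        return self.componentDir


# Usage with a stand-in configuration object
config = SimpleNamespace(WorkflowUpdater=SimpleNamespace(componentDir=os.getcwd()))
poller = PollerSketch(config)
```

Splitting the work this way separates a configuration error (caught once, at startup) from a transient filesystem problem (reported on each cycle).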

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 5 warnings and errors that must be fixed
    • 2 warnings
    • 34 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14817/artifact/artifacts/PullRequestReport.html

@vkuznet requested a review from amaltaro on February 5, 2024 18:26
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 7 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 5 warnings and errors that must be fixed
    • 2 warnings
    • 34 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 5 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14818/artifact/artifacts/PullRequestReport.html

@amaltaro (Contributor) commented Feb 5, 2024

Update the PR description, please.

@vkuznet (Contributor Author) commented Feb 5, 2024

PR description is updated.

@amaltaro (Contributor) left a comment

Valentin, in addition to the comments made along the code, please do use the report provided by Jenkins, as you will see that it points to errors in the current code.

I have to say that I am not very fond of using a temporary directory for expanding the current sandbox, making the necessary updates, and compressing things again. I feel like it could give us a hard time one day. The only alternative I can think of would be to completely refactor how pileup is transferred and accessed inside the job...

src/python/Utils/FileTools.py (3 review threads, now outdated and resolved)
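The temporary-directory approach discussed above (expand the sandbox, edit each pileupconf.json, recompress once) might look roughly like the sketch below. It rests on several assumptions: the sandbox is a tar archive that `tarfile` can open with mode `r:*`, it is repacked with bzip2 compression, and `updateFunc` is a hypothetical caller-supplied hook returning `(newConf, changed)`. The function name `rewritePileupConfs` is likewise illustrative.

```python
import json
import os
import shutil
import tarfile
import tempfile


def rewritePileupConfs(tarPath, updateFunc):
    """Update every pileupconf.json inside tarPath and repack at most once.

    updateFunc(conf) -> (newConf, changed) is a hypothetical caller-supplied hook.
    Returns the number of pileupconf.json files that were rewritten.
    """
    tmpDir = tempfile.mkdtemp()
    try:
        # sandbox tarballs are created by the agent itself, i.e. trusted input
        with tarfile.open(tarPath, "r:*") as tar:
            names = tar.getnames()
            tar.extractall(tmpDir)
        nUpdated = 0
        for name in names:
            if os.path.basename(name) != "pileupconf.json":
                continue
            path = os.path.join(tmpDir, name)
            with open(path) as fd:
                newConf, changed = updateFunc(json.load(fd))
            if changed:
                with open(path, "w") as fd:
                    json.dump(newConf, fd)
                nUpdated += 1
        if nUpdated:
            # repack once into a sibling file, then swap it in place of the original
            newPath = tarPath + ".new"
            with tarfile.open(newPath, "w:bz2") as tar:
                for entry in sorted(os.listdir(tmpDir)):
                    tar.add(os.path.join(tmpDir, entry), arcname=entry)
            os.replace(newPath, tarPath)
        return nUpdated
    finally:
        # the temporary area is always cleaned up, even on failure
        shutil.rmtree(tmpDir, ignore_errors=True)
```

Writing the new archive next to the original and then calling `os.replace` means that, on the same filesystem, readers see either the old or the new tarball, never a half-written one.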
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 7 tests added
  • Python3 Pylint check: failed
    • 6 warnings and errors that must be fixed
    • 2 warnings
    • 34 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 5 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14823/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 2 warnings and errors that must be fixed
    • 2 warnings
    • 34 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 5 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14824/artifact/artifacts/PullRequestReport.html

@vkuznet (Contributor Author) commented Feb 6, 2024

Alan, the original PR was checked against Jenkins; only after you asked me to re-arrange things did I stop checking, since you said it was not a complete review. That said, Jenkins now passes, after I added the imports that went missing during the code re-arrangement.

In your review you asked for a few optimizations which I find premature. I tried to address all your comments and to state my view on this subject.

@vkuznet requested a review from amaltaro on February 6, 2024 14:17
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 tests no longer failing
    • 7 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 2 warnings and errors that must be fixed
    • 2 warnings
    • 34 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 5 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14827/artifact/artifacts/PullRequestReport.html

@amaltaro (Contributor) commented Feb 7, 2024

Valentin, I think we covered these details the last time we chatted, but let me write down the logic here (neither complete nor fully detailed):

  1. check if the agent has active workflows with active pileup
  2. for each active pileup, resolve its location through MSPileup (considering customName whenever needed)
  3. for each pileup (or customName), find all the Rucio datasets (CMS blocks) attached to it
  4. now, for each workflow spec path (where the workflow sandbox is) with active pileup(s), do the following:
    • first, check if the pileupconf.json belongs to the same container (it is rare, but a workflow could have multiple pileups, hence multiple pileupconf.json files for the different pileups)
    • second, check if the number of blocks in the existing pileupconf.json is equal to the number of blocks retrieved from Rucio (from step 3 above). If not, we must update pileupconf.json; otherwise, we might still have to (see the next bullet)
    • third, check if the location of each block in pileupconf.json is equal to the location from MSPileup (from step 2 above). If any is different, we must update the pileupconf.json
  5. if any of the pileupconf.json files of a given workflow was updated, we need to recreate the workflow tarball

Some of this is already implemented in master, and some of it in this PR. But I believe there are still changes to be made here.

Regarding how exactly pileupconf.json needs to be updated, I think we discussed that and your PR seems to be correct so far.
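The second and third checks under step 4 can be expressed as a small predicate. This is a sketch under assumptions: `needsUpdate` is a hypothetical name, the blocks of an existing pileupconf.json are assumed to be already flattened into a `{blockName: [RSEs]}` dict, and the container check (first bullet) is left to the caller.

```python
def needsUpdate(puConfBlocks, rucioBlocks, msPileupRSEs):
    """Decide whether a pileupconf.json must be rewritten.

    Hypothetical inputs (assumptions):
      puConfBlocks: {blockName: [rse, ...]} read from an existing pileupconf.json
      rucioBlocks:  set of block names attached to the container in Rucio (step 3)
      msPileupRSEs: RSEs where MSPileup reports the pileup (step 2)
    """
    # second check: the block count differs from what Rucio reports
    if len(puConfBlocks) != len(rucioBlocks):
        return True
    # third check: any block location differs from the MSPileup location
    expected = sorted(msPileupRSEs)
    return any(sorted(rses) != expected for rses in puConfBlocks.values())
```

Step 5 then reduces to repacking a workflow tarball only when `needsUpdate` is true for at least one of its pileupconf.json files.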

@vkuznet (Contributor Author) commented Feb 7, 2024

Alan, I think I resolved all your concerns with the latest commits. I also squashed them. Meanwhile, thanks for the logic reminder in your #11884 (comment), but as far as I understand, it is fully implemented. If something is missing, please let me know explicitly. Please note that I'm still awaiting the Jenkins tests, and if something pops up I'll resolve it.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 7 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 2 warnings and errors that must be fixed
    • 2 warnings
    • 34 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 5 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14829/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 7 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 2 warnings
    • 44 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 5 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14830/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
    • 8 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 10 warnings and errors that must be fixed
    • 2 warnings
    • 62 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14884/artifact/artifacts/PullRequestReport.html

@amaltaro (Contributor)

Valentin, according to our meeting yesterday, I understand this is ready for another review. If so, then please request it through the GH interface.

Before that, though, I would suggest you review some of the comments I left in the code, as I see there are still changes to be applied to it.

In addition, I see a trend of modifying the source code (src/python/*) to be able to run unit tests. To me, this makes the code unnecessarily confusing, and I would suggest actually adopting the emulators. If the emulator does not exist, we should create it. If it does not provide the data and/or methods that we need, we need to extend it. If needed, that can also be tracked in a different issue.

For instance, the Rucio emulator can be found here: https://github.com/dmwm/WMCore/blob/master/src/python/WMQuality/Emulators/RucioClient/MockRucioApi.py

we can update the mocked data by updating and running this script: https://github.com/dmwm/WMCore/blob/master/src/python/WMQuality/Emulators/RucioClient/MakeRucioMockFile.py

and this is where we are currently mocking Rucio: https://github.com/dmwm/WMCore/blob/master/src/python/WMQuality/Emulators/EmulatedUnitTestCase.py#L58

The last bit of change that you need is to update the unit test class to actually inherit from EmulatedUnitTestCase instead of unittest.TestCase.

I think @d-ylee is going to write some simple documentation for that, given that we discussed basically the same thing yesterday.

@vkuznet (Contributor Author) commented Feb 20, 2024

Alan, yes, the PR is ready for review, and I resolved all requested issues to the best of my ability. That said, even though there are general guidelines, I think not everything can be aligned with them. As I pointed out in this PR, the test of overwriting the existing tarball requires writing it to another place; that's why an additional parameter was added to the source code. I did use the Rucio mock APIs, but due to a limitation of the current configuration (it is impossible to pass an actual Python object through the config), I used a similar pattern of adding an optional parameter (by the way, this is similar to what we do in the MSPileup codebase). This certainly falls into the category of additions to the source code, but I can't find any other way to keep it only in the test codebase.

@amaltaro (Contributor) left a comment

Valentin, if you use mock.patch as I suggested in my previous reply, there should be no need to pass in a new Rucio object instance.

Please review your code and patch objects accordingly, so that no source code changes are needed to accommodate unit tests.
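The suggestion amounts to patching the `Rucio` name where the poller module looks it up, instead of passing a Rucio instance through the constructor. The self-contained sketch below illustrates this with `unittest.mock.patch.object` on a stand-in namespace; `pollerModule`, `RealRucio`, `Poller` and `listBlocks` are all hypothetical names used only for illustration. Against a real module, `mock.patch("package.module.Rucio")` with the dotted path of the module that uses `Rucio` has the same effect.

```python
import types
from unittest import mock

# Hypothetical stand-in for the namespace of the module under test
pollerModule = types.SimpleNamespace()


class RealRucio:
    """Hypothetical client that would talk to a real Rucio server."""

    def listBlocks(self, container):
        raise RuntimeError("would contact a real Rucio server")


pollerModule.Rucio = RealRucio


class Poller:
    """Builds its own Rucio client; no extra constructor parameter for tests."""

    def __init__(self):
        # the name is resolved through the module namespace at call time,
        # which is exactly what mock.patch replaces
        self.rucio = pollerModule.Rucio()

    def blocks(self, container):
        return self.rucio.listBlocks(container)


def testPollerWithPatchedRucio():
    with mock.patch.object(pollerModule, "Rucio") as mockRucio:
        mockRucio.return_value.listBlocks.return_value = ["/pu#1", "/pu#2"]
        result = Poller().blocks("/pu")
    mockRucio.return_value.listBlocks.assert_called_once_with("/pu")
    return result
```

Once the `with` block exits, `pollerModule.Rucio` is restored to `RealRucio`, so the patch never leaks into other tests.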

@vkuznet (Contributor Author) commented Feb 20, 2024

Alan, thanks for the pointers. Honestly, I missed them due to an annoying GitHub feature that hides (collapses) long discussions, and what you pointed out was hidden while I was reviewing the other changes. I'll take care of those. I'll also try to use mock.patch (which is new to me) to get rid of the extra parameter.

That said, while I was awaiting your reply, I was working on an implementation of block info retrieval via concurrent DBS calls. If you can review this gist, I can add this code to this PR right away, since I tested and benchmarked it in a local environment. For 463 blocks, it takes 9 seconds to get all files and their numbers of events.

@vkuznet (Contributor Author) commented Feb 20, 2024

Alan, I expanded the collapsed GH threads and resolved all issues; only one required removing an obsolete function. Apart from that, I had already addressed all issues. I'll work on the mocking interface now.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
    • 8 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 10 warnings and errors that must be fixed
    • 2 warnings
    • 62 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14893/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
    • 8 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 12 warnings and errors that must be fixed
    • 2 warnings
    • 66 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14894/artifact/artifacts/PullRequestReport.html

@vkuznet (Contributor Author) commented Feb 20, 2024

Alan, mock.patch is in place, but for the record, I was already using EmulatedUnitTestCase in my unit test. The reason I created the Rucio client and passed it around is that I did not know mock.patch applied here. It only applies to the classes listed in the EmulatedUnitTestCase.py module. Because we added Rucio to WorkflowUpdaterPoller, I needed to add 'WMComponent.WorkflowUpdater.WorkflowUpdaterPoller.Rucio' to the emulated classes, and now Rucio works with mock.patch. I simply was unaware of that.

That said:

@amaltaro (Contributor) left a comment

Valentin, please find my review along the code.

self.rucio = None
# define where to look for sandbox tar balls
self.sandboxDir = getattr(self.config.WorkflowUpdater, "sandboxDir")
if not os.path.exists(self.sandboxDir):
Contributor

My point is that the workflow spec parameter already gives us the location of the tarball (and WMSandbox), so there is no need to define this variable here.

@amaltaro (Contributor)

For your gist, I would suggest having it in a separate PR, especially because we now also have a separate GH issue for that. However, I can already see that your code is inconsistent with the implementation in https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/Steps/Fetchers/PileupFetcher.py, given that your gist does not check whether a file is valid or not. There could be more, but let's wait for the actual review.

Side note: for that implementation, I think we could eventually start porting other methods used by the microservices, available in this common module: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/MicroService/Tools/Common.py. That said, I think we could create a new module under WMCore/Services/DBS for the pycurl-based DBS queries.

@vkuznet (Contributor Author) commented Feb 22, 2024

> For your gist, I would suggest to have it in the separate PR - especially because now we also have a separated GH issue for that. However, I can already see that your code is inconsistent with the implementation in https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/Steps/Fetchers/PileupFetcher.py, given that your gist does not check whether a file is valid or not. There could be more, but let's wait for the actual review.

A first look at the Fetchers reveals their main issue: the codebase is sequential. Therefore, if we are willing to spend a lot of time on DBS calls to get block info for a large number of blocks, we can use it. That implementation scales linearly, i.e. if we have 1K blocks, I bet it will spend minutes getting info about them (maybe close to an hour). I provided a concurrent version whose look-up time is close to that of a single DBS query. Regarding valid files, it is a correct observation, and it can be handled either via a DBS query parameter (I need to check this) or by filtering the output to yield only valid files.

> Side note: for that implementation, I think we could - eventually - start porting other methods used by the microservices, available in this common module: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/MicroService/Tools/Common.py. Said that, I think we could have a new module created under WMCore/Services/DBS for the pycurl based DBS queries.

Yes, I think it would be a good idea to put this into a separate module. But I suggest not using the DBS client (DBSReader), since it is a sequential codebase, and instead creating concurrent functions via the getdata function of the pycurl manager.
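For comparison with the sequential Fetcher code, here is a minimal sketch of concurrent per-block lookups that also drops invalid files, as raised in the review. It swaps in `concurrent.futures.ThreadPoolExecutor` instead of the pycurl manager's `getdata` mentioned above. `fetchBlockInfo` is a hypothetical name, `fetchFiles` is a hypothetical hook standing in for a DBS files query, and the field names `logical_file_name`, `event_count` and `is_file_valid` are assumed to match detailed DBS file records.

```python
from concurrent.futures import ThreadPoolExecutor


def fetchBlockInfo(blocks, fetchFiles, maxWorkers=16):
    """Fetch per-block file info concurrently and keep only valid files.

    fetchFiles(block) -> list of dicts with (assumed) keys "logical_file_name",
    "event_count" and "is_file_valid"; it is a hypothetical stand-in for a
    real DBS query.
    """
    def one(block):
        # filter out invalid files, as PileupFetcher is said to do
        files = [f for f in fetchFiles(block) if f.get("is_file_valid") == 1]
        return block, {
            "FileList": [f["logical_file_name"] for f in files],
            "NumberOfEvents": sum(f.get("event_count", 0) for f in files),
        }

    # I/O-bound lookups overlap in threads; maxWorkers bounds the load on DBS
    with ThreadPoolExecutor(max_workers=maxWorkers) as pool:
        return dict(pool.map(one, blocks))
```

Since each lookup mostly waits on the network, running them in a bounded pool makes the total time depend on the slowest batches of queries rather than on their sum, which is the effect described in the benchmark above.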

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 8 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 10 warnings and errors that must be fixed
    • 2 warnings
    • 64 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 1 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14907/artifact/artifacts/PullRequestReport.html

@amaltaro (Contributor) left a comment

@vkuznet please find further review along the code. Before pushing in your upcoming changes, please squash commits accordingly.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 10 warnings and errors that must be fixed
    • 2 warnings
    • 64 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 1 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14914/artifact/artifacts/PullRequestReport.html

@vkuznet (Contributor Author) commented Feb 26, 2024

Alan, I addressed your last review feedback and updated the code. The Jenkins checks are fine. I hope this is the last iteration. Feel free to take another look.

@amaltaro (Contributor) left a comment

@vkuznet Valentin, it's looking good to me. Please squash those commits accordingly and we can merge it.

@vkuznet (Contributor Author) commented Feb 26, 2024

Squashing is done.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 10 warnings and errors that must be fixed
    • 2 warnings
    • 64 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 1 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14918/artifact/artifacts/PullRequestReport.html

@amaltaro (Contributor)

Thanks Valentin

@amaltaro merged commit a4bc374 into dmwm:master on Feb 26, 2024
3 of 4 checks passed
Development

Successfully merging this pull request may close these issues.

WMAgent: continuous update of the pileup availability
3 participants