Initial implementation for WorkflowUpdater component #11795

Merged: 4 commits merged on Nov 22, 2023

amaltaro
Contributor

@amaltaro commented Nov 14, 2023

Fixes #11733

Status

Ready

Description

This pull request provides the foundation of a new component called WorkflowUpdater. Further details are:

  • component connects to the same Rucio instance as the one used in WorkQueueManager, using the usual WMAgent account (either wma_test or wma_prod)
  • component talks to MSPileup to retrieve a snapshot of the pileups in the system
  • component talks to Rucio to retrieve a list of blocks for a given container name
  • it fetches a list of active workflows in the agent, based on wmbs_subscriptions and matching only Production/Processing jobs
  • it loads the spec of each active workflow to check whether it requests pileup
  • to be implemented: update the workflow sandbox with the relevant pileup information
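The steps listed above can be pictured as a single polling cycle. The sketch below is only an illustration under stated assumptions, not the component's actual code: the four callables (`getActiveWorkflows`, `requiresPileup`, `getPileupDocs`, `listBlocks`) are hypothetical stand-ins for the WMBS DAO, the spec loading, the MSPileup query and the Rucio query. The Production/Processing filtering is assumed to happen inside `getActiveWorkflows`, and querying Rucio with `customName` when it is set (otherwise `pileupName`) is also an assumption. The sandbox update is left out, since it is still to be implemented.

```python
from typing import Callable, Dict, List


class WorkflowUpdaterCycleSketch:
    """Hypothetical outline of one WorkflowUpdater polling cycle."""

    def __init__(self, getActiveWorkflows: Callable[[], List[dict]],
                 requiresPileup: Callable[[str], bool],
                 getPileupDocs: Callable[[], List[dict]],
                 listBlocks: Callable[[str], List[str]]):
        self.getActiveWorkflows = getActiveWorkflows  # WMBS query stand-in
        self.requiresPileup = requiresPileup          # spec loading stand-in
        self.getPileupDocs = getPileupDocs            # MSPileup stand-in
        self.listBlocks = listBlocks                  # Rucio stand-in

    def algorithm(self) -> Dict[str, dict]:
        # 1. active workflows, as [{"name": ..., "spec": ...}]
        wflowSpecs = self.getActiveWorkflows()
        # 2. keep only the workflows whose spec requests pileup
        puWflows = [w["name"] for w in wflowSpecs
                    if self.requiresPileup(w["spec"])]
        if not puWflows:
            return {}  # nothing to do in this cycle
        # 3. snapshot of the pileups known to MSPileup
        pileupMap = {}
        for puItem in self.getPileupDocs():
            # assumption: use customName when set, otherwise pileupName
            container = puItem.get("customName") or puItem["pileupName"]
            pileupMap[puItem["pileupName"]] = {
                "rses": puItem["currentRSEs"],
                # 4. list the blocks of the pileup container in Rucio
                "blocks": self.listBlocks(container),
            }
        # 5. (to be implemented) update the sandbox of each workflow in puWflows
        return pileupMap
```

Returning early when no workflow needs pileup avoids any MSPileup or Rucio traffic in cycles where there is nothing to update.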

In addition, this PR includes deployment changes and requires new URLs in the secrets file.

Extra: tell git to ignore the .vscode directory and its files (for VSCode users).

NOTE that this component is expected to be functional, but its actual functionality is still missing, since other tickets still need to be addressed.

Is it backward compatible (if not, which system it affects?)

NO, new component!

Related PRs

None

External dependencies / deployment changes

It has 3 dependencies so far:

  1. A new secrets parameter called MSPILEUP_URL
  2. Depends on this deployment update: Update WMAgent deployment to parse MSPILEUP_URL - take2 deployment#1293
  3. And the new pileup data structure using customName, provided in: Add support for customName (custom container DID) #11765
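As an illustration of dependency 1, a configuration step could read the new MSPILEUP_URL value from the secrets file. This sketch assumes a simple `KEY=VALUE` line format with `#` comments; the two helper functions and the example URL are hypothetical and do not reproduce the actual WMAgent deployment scripts.

```python
from typing import Dict


def parseSecrets(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments (assumed format)."""
    secrets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split once: URLs may contain '='
        secrets[key.strip()] = value.strip()
    return secrets


def getMSPileupUrl(secrets: Dict[str, str]) -> str:
    """Fail early and loudly if the new parameter is missing."""
    url = secrets.get("MSPILEUP_URL")
    if not url:
        raise KeyError("MSPILEUP_URL is missing from the secrets file")
    return url
```

Failing at configuration time makes a forgotten secrets update visible immediately, instead of surfacing later as a failed MSPileup call.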

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 8 warnings and errors that must be fixed
    • 3 warnings
    • 42 comments to review
  • Pylint py3k check: failed
    • 1 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 7 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14625/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 3 warnings
    • 43 comments to review
  • Pylint py3k check: failed
    • 1 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 8 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14638/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 4 warnings
    • 44 comments to review
  • Pylint py3k check: failed
    • 1 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 8 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14641/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 4 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 4 warnings
    • 44 comments to review
  • Pylint py3k check: failed
    • 1 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 8 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14642/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 tests no longer failing
    • 3 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 4 warnings
    • 55 comments to review
  • Pylint py3k check: failed
    • 1 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 10 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14648/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 4 new failures
    • 13 tests deleted
    • 1 tests no longer failing
    • 2 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 78 warnings and errors that must be fixed
    • 10 warnings
    • 275 comments to review
  • Pylint py3k check: failed
    • 1 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 374 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14649/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 8 tests deleted
    • 1 tests no longer failing
    • 2 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 81 warnings and errors that must be fixed
    • 10 warnings
    • 297 comments to review
  • Pylint py3k check: failed
    • 1 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 377 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14650/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor Author

test this please

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 8 tests deleted
    • 1 tests no longer failing
    • 2 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 81 warnings and errors that must be fixed
    • 10 warnings
    • 297 comments to review
  • Pylint py3k check: failed
    • 1 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 377 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14651/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 8 tests deleted
    • 1 tests no longer failing
    • 2 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 81 warnings and errors that must be fixed
    • 10 warnings
    • 297 comments to review
  • Pylint py3k check: failed
    • 1 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 377 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14652/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 8 tests deleted
    • 1 tests no longer failing
    • 1 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 81 warnings and errors that must be fixed
    • 10 warnings
    • 297 comments to review
  • Pylint py3k check: failed
    • 1 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 377 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14653/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor Author

I think this PR is ready for review. I am still working on the unit tests, but I don't plan to make any other major changes in this pull request. @todor-ivanov @vkuznet please have a look whenever you can.

"""
# call the base class
super(Harness, self).__init__()
print("WorkflowUpdater.__init__")
Contributor

Why do you need print here? Why not use the logger instead?

thisPU = {"pileupName": puItem['pileupName'],
          "customName": puItem['customName'],
          "rses": puItem['currentRSEs'],
          "blocks": []}
Contributor

I see that blocks will always be an empty list; why do you need to construct it here?

Contributor Author

My goal is to populate this variable with a list of block names in a future pull request, so that we have a data structure with all the relevant information for a given pileup dataset.

Contributor

In this case, please change the docstring accordingly to avoid confusion. For instance, I suggest making concrete estimates using the approximate time spent in Rucio, e.g. O(1-10 sec), multiplied by the total number of given workflows, i.e. how much time we'll spend for 100 workflows (which is what I expect on average).

Contributor Author

I updated the docstring. I could also add a comment with the expected time spent there, but as things change, I am sure it would simply become outdated and meaningless.

still unfinished, then it is considered an unfinished/active workflow.
"""

sql = """SELECT wmbs_workflow.name, wmbs_workflow.spec
Contributor

I know it may be late in the WMCore game, but SQL statements should not be hard-coded; it is much better to keep them in templates so that we can easily change them without touching the code (for the record, this is done in the dbs2go codebase). This also makes it easy to profile and benchmark the SQL without running the code.

Contributor Author

I might have already seen a GH issue reporting it; if not, would you mind opening a dedicated issue to tackle this? I'd rather have a dedicated issue so that, whenever we change how DAOs are defined, we make it consistent across the code (potentially involving T0 as well).

Contributor

Yes, it can go to a separate issue. The way to make it backward compatible is the following: the DAO APIs should take an optional file parameter and read the SQL template from that file. Then the refactoring can move the SQL statements into templates. I'm fine with having it as a separate issue though.
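The backward-compatible approach described in this thread, an optional file parameter that overrides the SQL defined in code, could look roughly like the sketch below. The class names and the query are hypothetical and illustrative only; they do not reproduce WMCore's DAO base class or the actual `GetUnfinishedWorkflows` statement.

```python
from pathlib import Path
from typing import Optional


class TemplatedDAO:
    """Hypothetical DAO base: SQL defined in code unless a template file is given."""
    sql = ""

    def __init__(self, sqlFile: Optional[str] = None):
        if sqlFile is not None:
            # read the statement from an external template, so it can be
            # changed, profiled or benchmarked without touching the code
            self.sql = Path(sqlFile).read_text(encoding="utf-8").strip()


class ListWorkflowsDAO(TemplatedDAO):
    # illustrative query only, not the real WMBS DAO statement
    sql = "SELECT name, spec FROM wmbs_workflow"
```

Because the parameter defaults to `None`, existing DAOs and their callers keep working unchanged, and statements can be moved to template files one DAO at a time.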

Oracle implementation of Workflow.GetUnfinishedWorkflows
"""

from WMCore.WMBS.MySQL.Workflow.GetUnfinishedWorkflows import GetUnfinishedWorkflows as MySQLGetUnfinishedWorkflows
Contributor

You probably need to break this line to make it more readable and pylint/PEP 8 compliant with the 80-character limit.

workflow=mergeWorkflow,
split_algo="ParentlessMergeBySize")

file1 = File(lfn="file1", size=1024, events=1024, first_event=0, locations={"T2_CH_CERN"})
Contributor

Since the locations set does not change, why not define it as a local variable and reuse it everywhere?

Contributor Author

Let me change it real quick.
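The reviewer's suggestion can be sketched as follows; note that `{"T2_CH_CERN"}` is a set literal. `FakeFile` is a hypothetical stand-in for WMCore's `File` class, used only to keep the example self-contained. Since the sketch does not rely on how the real class stores the locations it receives, it uses an immutable `frozenset` for the shared value and copies it per file.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FakeFile:
    """Hypothetical stand-in for WMCore's File, for illustration only."""
    lfn: str
    size: int
    events: int
    first_event: int = 0
    locations: Set[str] = field(default_factory=set)

    def __post_init__(self):
        # give every file its own mutable copy of the locations
        self.locations = set(self.locations)


# defined once, reused for every file; frozenset makes accidental edits fail
LOCATIONS = frozenset({"T2_CH_CERN"})

files = [FakeFile(lfn=f"file{i}", size=1024, events=1024, first_event=0,
                  locations=LOCATIONS)
         for i in range(1, 4)]
```

Sharing a plain mutable set between test files would let a change made through one file leak into the others; the immutable constant plus per-file copy rules that out.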

Contributor

@todor-ivanov left a comment

It looks good.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 8 tests deleted
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 81 warnings and errors that must be fixed
    • 10 warnings
    • 296 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 376 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14654/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests deleted
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 81 warnings and errors that must be fixed
    • 10 warnings
    • 296 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 376 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14655/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor Author

@vkuznet Valentin, I have not squashed those commits yet. Please review the code again and then I can squash them, or let me know if you want me to squash them before your review. Thanks

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 8 tests deleted
    • 1 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 81 warnings and errors that must be fixed
    • 10 warnings
    • 296 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 376 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14656/artifact/artifacts/PullRequestReport.html

fix default config and polling cycle

Parse new MSPILEUP_URL component configuration
puWflows = self.findWflowsWithPileup(wflowSpecs)

# otherwise, move on retrieving pileups
pileupMap = self.getPileupDocs()
Contributor

If pileupMap is not used in the return value, there is no need for this local variable; you should either rename it to _ or not assign the right-hand side of this expression to anything at all.

Contributor Author

It will be used as we work on the other tickets relevant to this component.

:param listSpecs: a list of dictionary with workflow name and spec path
:return: a list with the workflow names that require pileup
"""
wflowsWithPU = []
Contributor

Based on your numbers, the list will be about 8.8KB and would be copied from place to place. Also, you can get the equivalent of len(gen) simply by adding another local counter variable. As I said, I'm not against the list, but I have no idea how it will be used afterwards, and I'm trying to estimate the cost of passing a large list through multiple function calls.
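The alternative hinted at in this comment, a generator plus a local counter instead of returning a list, might look like the sketch below. The `"name"`/`"spec"` keys follow the docstring (workflow name and spec path), while `requiresPileup` and `handle` are hypothetical callables standing in for loading the spec and for the per-workflow work.

```python
from typing import Callable, Dict, Iterable, Iterator


def findWflowsWithPileup(listSpecs: Iterable[Dict[str, str]],
                         requiresPileup: Callable[[str], bool]) -> Iterator[str]:
    """Lazily yield the names of workflows whose spec requires pileup.

    :param listSpecs: dicts with workflow "name" and "spec" path
    :param requiresPileup: hypothetical predicate on the spec path
    """
    for item in listSpecs:
        if requiresPileup(item["spec"]):
            yield item["name"]


def processPileupWorkflows(listSpecs: Iterable[Dict[str, str]],
                           requiresPileup: Callable[[str], bool],
                           handle: Callable[[str], None]) -> int:
    """Consume the generator once, counting instead of calling len()."""
    numWflows = 0
    for wflowName in findWflowsWithPileup(listSpecs, requiresPileup):
        handle(wflowName)  # per-workflow work (hypothetical)
        numWflows += 1     # local counter replaces len() on a list
    return numWflows
```

The trade-off: a generator can be consumed only once, so if the workflow names are needed in several places afterwards, returning a list is the simpler choice.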

def findRucioBlocks(self, uniquePUList, pileupMap):
"""
Given a list of unique pileup dataset names, list all of
their blocks in Rucio
Contributor

I suggest adding an appropriate comment to the docstring about the potential slowness of this function, especially as the number of processed workflows increases. Moreover, I suggest adding the @timeFunction decorator to explicitly log the time spent in this function, so that we won't be guessing but can look it up in the log after operations.
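A timing decorator of the kind suggested here could be sketched as follows. `logTiming` is a hypothetical stand-in written for this example, not WMCore's `timeFunction` (whose exact signature is not reproduced here), and `listBlocks` is a hypothetical callable standing in for the Rucio query. In this sketch the cost of `findRucioBlocks` grows linearly with the number of unique pileups, since it makes one lookup per pileup.

```python
import functools
import logging
import time
from typing import Callable, Dict, Iterable, List

logger = logging.getLogger(__name__)


def logTiming(func):
    """Hypothetical decorator: log the wall-clock time spent in func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # logged even if func raises
            logger.info("%s took %.3f secs", func.__name__,
                        time.perf_counter() - start)
    return wrapper


@logTiming
def findRucioBlocks(uniquePUList: Iterable[str],
                    listBlocks: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """List the blocks of each unique pileup container.

    Potentially slow: one (remote) lookup per pileup, so the total time
    grows with the number of unique pileups used by active workflows.
    """
    return {puName: listBlocks(puName) for puName in uniquePUList}
```

`functools.wraps` keeps the original function name, so the log line identifies `findRucioBlocks` rather than the wrapper.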

self.testInit.generateWorkDir(config)

# First the general stuff
config.section_("General")
Contributor

I want to make a general comment here. Using the custom WMCore configuration significantly impacts code portability to other frameworks, like Flask, etc. There is no need to use it everywhere; a more standard configuration format, like .ini, .json or .yaml, is a better choice. Relying on a custom format adds an extra layer of dependency without much gain in code structure or functionality, and makes it impossible to port (parts of) the code to other languages that do not have the Python-based WMCore configuration. That said, there is nothing here to fix or address; my comment is a general observation.
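To make this general point concrete: the same kind of settings could live in a standard `.ini` file read with Python's `configparser`. The section and option names, the values and the URL below are illustrative assumptions, not the component's actual configuration.

```python
import configparser

# hypothetical equivalent of a few WMCore configuration sections
INI_TEXT = """
[General]
workDir = /tmp/workflowupdater

[WorkflowUpdater]
pollInterval = 3600
rucioAccount = wma_test
msPileupUrl = https://example.invalid/ms-pileup/data/pileup
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

# typed accessors instead of attribute-style access on a config object
pollInterval = config.getint("WorkflowUpdater", "pollInterval")
msPileupUrl = config.get("WorkflowUpdater", "msPileupUrl")
```

One trade-off: `configparser` stores strings and converts them on access, whereas a Python-based configuration holds Python objects directly; in exchange, the `.ini` file can be read from any language.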

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests deleted
    • 1 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 81 warnings and errors that must be fixed
    • 10 warnings
    • 296 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 376 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14657/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor Author

@vkuznet I addressed your relevant concerns in the last commit (to be squashed eventually).
I have also updated the initial description to note that this implementation is incomplete. The component is supposed to be functional with these changes, but it does not yet perform all the tasks it is planned to (those require further development tracked in other tickets).

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 8 tests deleted
    • 1 tests added
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 78 warnings and errors that must be fixed
    • 10 warnings
    • 296 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 376 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14658/artifact/artifacts/PullRequestReport.html

Contributor

@vkuznet left a comment

Alan, thanks for addressing the issues I pointed out. The code looks fine on my side and you can proceed (I do not know whether you intend to merge it or keep working on it, which is why I only left a comment for now).

@amaltaro
Contributor Author

Thank you for the prompt reviews, Valentin. Yes, the idea is to get it merged and resume activities on this component and other related tickets.

I am going to squash the commits accordingly, and also fix or refactor some pylint issues to make the code look better.

New DAOs for finding active workflows

Return spec path from the DAO

Load spec file and find whether pileup is required or not

fix logger object when instantiating Rucio

Change Rucio account to wma_test

Valentins suggestions, part 1

use GET method instead of POST

Fix some data structures; time findRucioBlocks

pylint fixes in WorkflowUpdater
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests deleted
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 72 warnings and errors that must be fixed
    • 10 warnings
    • 288 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 13 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14659/artifact/artifacts/PullRequestReport.html

unit tests - rename testWorkload function by createTestWorkload

fix unit tests calls and imports

use wma_test in unit tests

Valentins unit test suggestions

pylint fixes for test package
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests deleted
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 72 warnings and errors that must be fixed
    • 10 warnings
    • 288 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 13 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14660/artifact/artifacts/PullRequestReport.html

@amaltaro merged commit a3a2b0c into dmwm:master on Nov 22, 2023
3 of 4 checks passed
Labels: None yet
Projects: None yet
Development: successfully merging this pull request may close these issues:

  • Create new WMAgent component to perform pileup and workflow sandbox updates: WorkflowUpdater

4 participants