Initial implementation for WorkflowUpdater component #11795

amaltaro · 2023-11-14T21:59:28Z

Fixes #11733

Status

Ready

Description

This pull request provides the foundation of a new component called WorkflowUpdater. Further details are:

- The component connects to the same Rucio instance as the one used by WorkQueueManager, using the usual WMAgent account (either wma_test or wma_prod).
- It talks to MSPileup to retrieve a snapshot of the pileups in the system.
- It talks to Rucio to retrieve the list of blocks for a given container name.
- It fetches the list of active workflows in the agent, based on wmbs_subscriptions and matching only Production/Processing jobs.
- It loads the spec of each active workflow to check whether it requests pileup.
- To be implemented: update the workflow sandbox with the relevant pileup information.
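The active-workflow lookup described above can be sketched with an in-memory SQLite database. The tables are a heavily simplified, hypothetical stand-in for the WMBS schema (only the columns this query needs), and the query only illustrates the idea; it is not the actual GetUnfinishedWorkflows DAO SQL.

```python
import sqlite3

# Hypothetical, simplified WMBS-like schema: only the columns this query needs.
SCHEMA = """
CREATE TABLE wmbs_workflow (id INTEGER PRIMARY KEY, name TEXT, spec TEXT);
CREATE TABLE wmbs_sub_types (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE wmbs_subscription (id INTEGER PRIMARY KEY, workflow INTEGER,
                                subtype INTEGER, finished INTEGER);
"""

# A workflow counts as "active" if it has at least one unfinished
# Production/Processing subscription; DISTINCT collapses multiple subscriptions.
ACTIVE_WFLOWS_SQL = """
SELECT DISTINCT wmbs_workflow.name, wmbs_workflow.spec
FROM wmbs_workflow
JOIN wmbs_subscription ON wmbs_subscription.workflow = wmbs_workflow.id
JOIN wmbs_sub_types ON wmbs_sub_types.id = wmbs_subscription.subtype
WHERE wmbs_sub_types.name IN ('Production', 'Processing')
  AND wmbs_subscription.finished = 0
ORDER BY wmbs_workflow.name
"""


def getActiveWorkflows(conn):
    """Return a list of {"name": ..., "spec": ...} dicts for active workflows."""
    return [{"name": name, "spec": spec}
            for name, spec in conn.execute(ACTIVE_WFLOWS_SQL)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO wmbs_sub_types VALUES (?, ?)",
                     [(1, "Processing"), (2, "Merge"), (3, "Production")])
    conn.executemany("INSERT INTO wmbs_workflow VALUES (?, ?, ?)",
                     [(1, "wf_proc", "/specs/wf_proc.pkl"),
                      (2, "wf_merge", "/specs/wf_merge.pkl"),
                      (3, "wf_done", "/specs/wf_done.pkl")])
    conn.executemany("INSERT INTO wmbs_subscription VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 0),   # unfinished Processing -> active
                      (2, 2, 2, 0),   # unfinished Merge -> ignored
                      (3, 3, 3, 1)])  # finished Production -> ignored
    print(getActiveWorkflows(conn))
    # prints [{'name': 'wf_proc', 'spec': '/specs/wf_proc.pkl'}]
```

Returning the spec path alongside the name lets the next step load each spec and decide whether it requests pileup.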

In addition, there are deployment changes, and the secrets file also needs changes to provide the new URLs.

Extra: tell git to ignore the .vscode directory/files in the repository (for VSCode users).

NOTE that this component is expected to be functional, but its main functionality is still missing, as other tickets still need to be addressed.

Is it backward compatible (if not, which system it affects?)

NO, new component!

Related PRs

None

External dependencies / deployment changes

It has 3 dependencies so far:

- A new secrets parameter called MSPILEUP_URL
- Depends on this deployment update: Update WMAgent deployment to parse MSPILEUP_URL - take2 deployment#1293
- And the new pileup data structure using customName, provided in: Add support for customName (custom container DID) #11765
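The new MSPILEUP_URL parameter lives in the agent secrets file. As a minimal sketch, assuming that file uses simple KEY=VALUE lines (an assumption here; the real parsing is done by the deployment scripts referenced above), reading the parameter could look like this. The function names and the example URL are hypothetical.

```python
def parseSecrets(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    secrets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split only on the first '=' so values may themselves contain '='
        key, value = line.split("=", 1)
        secrets[key.strip()] = value.strip()
    return secrets


def getMSPileupUrl(secrets):
    """Return MSPILEUP_URL, failing loudly if the parameter is missing."""
    url = secrets.get("MSPILEUP_URL")
    if not url:
        raise KeyError("MSPILEUP_URL is missing from the secrets file")
    return url


if __name__ == "__main__":
    sample = "# hypothetical secrets file\nMSPILEUP_URL=https://example.org/ms-pileup\n"
    print(getMSPileupUrl(parseSecrets(sample)))
```

Failing early on a missing MSPILEUP_URL makes a misconfigured agent obvious at startup rather than at the first polling cycle.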

cmsdmwmbot · 2023-11-14T22:08:27Z

Jenkins results:

Python3 Unit tests: failed
- 1 new failures
- 1 tests no longer failing
- 1 changes in unstable tests
Python3 Pylint check: failed
- 8 warnings and errors that must be fixed
- 3 warnings
- 42 comments to review
Pylint py3k check: failed
- 1 errors and warnings that should be fixed
Pycodestyle check: succeeded
- 7 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14625/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2023-11-17T11:39:31Z

Jenkins results:

Python3 Unit tests: failed
- 1 new failures
- 2 changes in unstable tests
Python3 Pylint check: failed
- 9 warnings and errors that must be fixed
- 3 warnings
- 43 comments to review
Pylint py3k check: failed
- 1 errors and warnings that should be fixed
Pycodestyle check: succeeded
- 8 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14638/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2023-11-17T20:11:39Z

Jenkins results:

Python3 Unit tests: succeeded
- 1 tests no longer failing
- 2 changes in unstable tests
Python3 Pylint check: failed
- 9 warnings and errors that must be fixed
- 4 warnings
- 44 comments to review
Pylint py3k check: failed
- 1 errors and warnings that should be fixed
Pycodestyle check: succeeded
- 8 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14641/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2023-11-18T13:32:02Z

Jenkins results:

Python3 Unit tests: failed
- 2 new failures
- 4 changes in unstable tests
Python3 Pylint check: failed
- 9 warnings and errors that must be fixed
- 4 warnings
- 44 comments to review
Pylint py3k check: failed
- 1 errors and warnings that should be fixed
Pycodestyle check: succeeded
- 8 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14642/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2023-11-20T21:21:26Z

Jenkins results:

Python3 Unit tests: failed
- 1 tests no longer failing
- 3 tests added
- 1 changes in unstable tests
Python3 Pylint check: failed
- 11 warnings and errors that must be fixed
- 4 warnings
- 55 comments to review
Pylint py3k check: failed
- 1 errors and warnings that should be fixed
Pycodestyle check: succeeded
- 10 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14648/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2023-11-21T02:34:39Z

Jenkins results:

Python3 Unit tests: failed
- 4 new failures
- 13 tests deleted
- 1 tests no longer failing
- 2 tests added
- 2 changes in unstable tests
Python3 Pylint check: failed
- 78 warnings and errors that must be fixed
- 10 warnings
- 275 comments to review
Pylint py3k check: failed
- 1 errors and warnings that should be fixed
Pycodestyle check: succeeded
- 374 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14649/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2023-11-21T02:56:14Z

Jenkins results:

Python3 Unit tests: failed
- 1 new failures
- 8 tests deleted
- 1 tests no longer failing
- 2 tests added
- 2 changes in unstable tests
Python3 Pylint check: failed
- 81 warnings and errors that must be fixed
- 10 warnings
- 297 comments to review
Pylint py3k check: failed
- 1 errors and warnings that should be fixed
Pycodestyle check: succeeded
- 377 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14650/artifact/artifacts/PullRequestReport.html

amaltaro · 2023-11-21T02:57:12Z

test this please

cmsdmwmbot · 2023-11-21T03:05:14Z

Jenkins results:

Python3 Unit tests: failed
- 1 new failures
- 8 tests deleted
- 1 tests no longer failing
- 2 tests added
- 2 changes in unstable tests
Python3 Pylint check: failed
- 81 warnings and errors that must be fixed
- 10 warnings
- 297 comments to review
Pylint py3k check: failed
- 1 errors and warnings that should be fixed
Pycodestyle check: succeeded
- 377 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14651/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2023-11-21T03:19:39Z

Jenkins results:

Python3 Unit tests: failed
- 8 tests deleted
- 1 tests no longer failing
- 2 tests added
- 2 changes in unstable tests
Python3 Pylint check: failed
- 81 warnings and errors that must be fixed
- 10 warnings
- 297 comments to review
Pylint py3k check: failed
- 1 errors and warnings that should be fixed
Pycodestyle check: succeeded
- 377 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14652/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2023-11-21T03:32:35Z

Jenkins results:

Python3 Unit tests: failed
- 8 tests deleted
- 1 tests no longer failing
- 1 tests added
- 2 changes in unstable tests
Python3 Pylint check: failed
- 81 warnings and errors that must be fixed
- 10 warnings
- 297 comments to review
Pylint py3k check: failed
- 1 errors and warnings that should be fixed
Pycodestyle check: succeeded
- 377 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14653/artifact/artifacts/PullRequestReport.html

amaltaro · 2023-11-21T12:02:35Z

I think this PR is good for a review. I am still working on the unit test, but I don't plan to make any other major changes in this pull request. @todor-ivanov @vkuznet please have a look whenever you can.

vkuznet · 2023-11-21T12:50:27Z

src/python/WMComponent/WorkflowUpdater/WorkflowUpdater.py

+        """
+        # call the base class
+        super(Harness, self).__init__()
+        print("WorkflowUpdater.__init__")


Why do you need print here, and why not use the logger instead?
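A minimal sketch of the suggestion, using the standard logging module instead of print(). The class below is a hypothetical stand-in, not the real Harness subclass; it only shows routing the message through a logger.

```python
import logging


class WorkflowUpdaterSketch:
    """Hypothetical stand-in for the component class (not the real Harness subclass)."""

    def __init__(self, logger=None):
        # Use an injected or module-level logger instead of print(), so the
        # message lands in the component log with a timestamp and level.
        self.logger = logger or logging.getLogger(__name__)
        self.logger.info("WorkflowUpdater.__init__")


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    WorkflowUpdaterSketch()
```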


vkuznet · 2023-11-21T13:01:42Z

src/python/WMComponent/WorkflowUpdater/WorkflowUpdaterPoller.py

+            thisPU = {"pileupName": puItem['pileupName'],
+                      "customName": puItem['customName'],
+                      "rses": puItem['currentRSEs'],
+                      "blocks": []}


I see that blocks will always be an empty list; why do you need to construct it here?

My goal is to populate this variable with a list of block names in a future pull request. Just so we have a data structure with all the relevant information for a given pileup dataset.

In that case, please change the docstring accordingly to avoid confusion. For instance, I suggest making concrete estimates using the approximate time spent in Rucio, e.g. O(1-10 sec), multiplied by the total number of workflows, i.e. how much time we'll spend for 100 workflows (which is what I expect on average).

I updated the docstring. I could also add a comment about the expected time spent there, but as things change, I am sure it would simply become outdated and meaningless.
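The data structure discussed in this thread can be sketched as follows. Keying the map by pileup name, the `listBlocks` callable (standing in for a Rucio lookup) and the rule "use customName when set, otherwise pileupName" are all assumptions made for illustration; only the four fields come from the diff above.

```python
def buildPileupMap(pileupDocs):
    """
    Map each pileup name to the MSPileup fields the component needs.
    "blocks" starts empty and is meant to be filled in a later step.
    """
    pileupMap = {}
    for puItem in pileupDocs:
        pileupMap[puItem["pileupName"]] = {
            "pileupName": puItem["pileupName"],
            "customName": puItem["customName"],
            "rses": puItem["currentRSEs"],
            "blocks": [],
        }
    return pileupMap


def fillBlocks(pileupMap, listBlocks):
    """
    Hypothetical follow-up step: populate "blocks" via listBlocks(container).
    Assumption: a non-empty customName is the container to resolve.
    """
    for info in pileupMap.values():
        container = info["customName"] or info["pileupName"]
        info["blocks"] = sorted(listBlocks(container))
    return pileupMap
```

Keeping "blocks" in the structure from the start means later steps only fill it in, without changing the shape consumers rely on.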


vkuznet · 2023-11-21T13:10:20Z

src/python/WMCore/WMBS/MySQL/Workflow/GetUnfinishedWorkflows.py

+    still unfinished, then it is considered an unfinished/active workflow.
+    """
+
+    sql = """SELECT wmbs_workflow.name, wmbs_workflow.spec


I know that it may be late in the WMCore game, but SQL statements should not be hard-coded in the code; it is much better to keep them in templates so that we can easily change them without touching the code (for the record, this is done in the dbs2go codebase). This also makes it easy to profile and benchmark the SQL without the code.

I might have already seen a GH issue reporting it; if not, would you like to open a dedicated issue to tackle this? I'd rather have a dedicated issue so that, whenever we change how DAOs are defined, we make it consistent across the code (potentially involving T0 as well).

Yes, it can go to a separate issue, and the way to make it backward compatible is the following: the DAO APIs should take an optional file parameter and read the SQL template from that file. Then the refactoring can be done to move SQL statements into templates. I'm fine with having it as a separate issue, though.
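The backward-compatible approach proposed above can be sketched like this: the hard-coded SQL stays as the default, and an optional file parameter overrides it. `TemplatedDAO`, `ExampleDAO` and `sqlFile` are hypothetical names; this is not the WMCore DAO base class.

```python
class TemplatedDAO:
    """
    Hypothetical DAO base class: keeps the hard-coded SQL as the default
    (backward compatible) but lets callers load it from a template file.
    """
    sql = ""

    def __init__(self, sqlFile=None):
        if sqlFile is not None:
            # The instance attribute shadows the class-level default SQL
            with open(sqlFile, encoding="utf-8") as fd:
                self.sql = fd.read().strip()


class ExampleDAO(TemplatedDAO):
    # Illustrative default only, not the real GetUnfinishedWorkflows query
    sql = "SELECT wmbs_workflow.name, wmbs_workflow.spec FROM wmbs_workflow"
```

Existing callers that pass no file keep the current behavior, so the SQL can be moved into templates one DAO at a time.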

vkuznet · 2023-11-21T13:11:07Z

src/python/WMCore/WMBS/Oracle/Workflow/GetUnfinishedWorkflows.py

+Oracle implementation of Workflow.GetUnfinishedWorkflows
+"""
+
+from WMCore.WMBS.MySQL.Workflow.GetUnfinishedWorkflows import GetUnfinishedWorkflows as MySQLGetUnfinishedWorkflows


You probably need to break this line to make it more readable and pylint/PEP 8 compliant (80 characters).


vkuznet · 2023-11-21T13:13:36Z

test/python/WMComponent_t/WorkflowUpdater_t/WorkflowUpdater_t.py

+                                         workflow=mergeWorkflow,
+                                         split_algo="ParentlessMergeBySize")
+
+        file1 = File(lfn="file1", size=1024, events=1024, first_event=0, locations={"T2_CH_CERN"})


Since the locations set does not change, why not define it as a local variable and then use it everywhere?

Let me change it real quick.
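A minimal sketch of that refactoring. `FakeFile` is a hypothetical stand-in for WMCore's File class, used only so the example runs on its own; the point is defining the unchanging locations once.

```python
from dataclasses import dataclass, field


@dataclass
class FakeFile:
    """Hypothetical stand-in for WMCore's File, only for this illustration."""
    lfn: str
    size: int
    events: int
    first_event: int = 0
    locations: set = field(default_factory=set)


def createTestFiles(numFiles):
    # Define the unchanging locations once instead of repeating the literal
    locations = {"T2_CH_CERN"}
    return [FakeFile(lfn=f"file{i}", size=1024, events=1024, first_event=0,
                     locations=set(locations))  # copy: files don't share one set
            for i in range(1, numFiles + 1)]
```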

todor-ivanov

It looks good.

cmsdmwmbot · 2023-11-21T17:50:27Z

Jenkins results:

Python3 Unit tests: failed
- 1 new failures
- 8 tests deleted
- 1 tests added
- 1 changes in unstable tests
Python3 Pylint check: failed
- 81 warnings and errors that must be fixed
- 10 warnings
- 296 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 376 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14654/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2023-11-21T20:31:46Z

Jenkins results:

Python3 Unit tests: succeeded
- 8 tests deleted
- 1 tests added
- 1 changes in unstable tests
Python3 Pylint check: failed
- 81 warnings and errors that must be fixed
- 10 warnings
- 296 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 376 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14655/artifact/artifacts/PullRequestReport.html

amaltaro · 2023-11-21T21:14:59Z

@vkuznet Valentin, I have not yet squashed those commits. Please review the code again and then I can squash it. Or let me know if you want me to squash it before your review. Thanks

cmsdmwmbot · 2023-11-21T21:24:53Z

Jenkins results:

Python3 Unit tests: failed
- 2 new failures
- 8 tests deleted
- 1 tests added
- 2 changes in unstable tests
Python3 Pylint check: failed
- 81 warnings and errors that must be fixed
- 10 warnings
- 296 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 376 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14656/artifact/artifacts/PullRequestReport.html


vkuznet · 2023-11-22T13:04:44Z

src/python/WMComponent/WorkflowUpdater/WorkflowUpdaterPoller.py

+            puWflows = self.findWflowsWithPileup(wflowSpecs)
+
+            # otherwise, move on retrieving pileups
+            pileupMap = self.getPileupDocs()


If pileupMap is not used in the return value, there is no need to make it a local variable: either change it to _ or do not assign the right-hand side of this expression to anything at all.

It will be used as we work on the other tickets related to this component.

vkuznet · 2023-11-22T13:05:27Z

src/python/WMComponent/WorkflowUpdater/WorkflowUpdaterPoller.py

+            thisPU = {"pileupName": puItem['pileupName'],
+                      "customName": puItem['customName'],
+                      "rses": puItem['currentRSEs'],
+                      "blocks": []}


In that case, please change the docstring accordingly to avoid confusion. For instance, I suggest making concrete estimates using the approximate time spent in Rucio, e.g. O(1-10 sec), multiplied by the total number of workflows, i.e. how much time we'll spend for 100 workflows (which is what I expect on average).

vkuznet · 2023-11-22T13:14:36Z

src/python/WMComponent/WorkflowUpdater/WorkflowUpdaterPoller.py

+        :param listSpecs: a list of dictionary with workflow name and spec path
+        :return: a list with the workflow names that require pileup
+        """
+        wflowsWithPU = []


Based on your numbers, the list will be 8.8KB and would be copied from place to place. Also, you can get the equivalent of len(gen) simply by adding another local counter variable. As I said, I'm not against the list, but I have no idea how it will be used afterwards, and I'm trying to estimate the cost of having a large list go through multiple function calls.
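The generator-plus-counter alternative mentioned here could look like the sketch below. `requiresPileup` is a hypothetical callable standing in for loading a spec and inspecting it. As a side note on the cost question, passing a Python list to a function passes a reference, so the list contents are not copied on each call.

```python
def iterWflowsWithPileup(listSpecs, requiresPileup):
    """
    Lazily yield the names of workflows that require pileup.
    listSpecs: list of {"name": ..., "spec": ...} dicts.
    requiresPileup: hypothetical callable taking a spec path, returning bool.
    """
    for item in listSpecs:
        if requiresPileup(item["spec"]):
            yield item["name"]


def countWflowsWithPileup(listSpecs, requiresPileup):
    """Consume the generator while counting, since len() needs a sequence."""
    count = 0
    for _name in iterWflowsWithPileup(listSpecs, requiresPileup):
        count += 1
    return count
```

The trade-off is that a generator can only be consumed once, so a list remains the simpler choice if several later steps need the same names.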

vkuznet · 2023-11-22T13:20:30Z

src/python/WMComponent/WorkflowUpdater/WorkflowUpdaterPoller.py

+    def findRucioBlocks(self, uniquePUList, pileupMap):
+        """
+        Given a list of unique pileup dataset names, list all of
+        their blocks in Rucio


I suggest adding an appropriate comment to the docstring about the potential slowness of this function, especially with an increasing number of processed workflows. Moreover, I suggest adding the @timeFunction decorator to explicitly dump the time spent in this function, so that we will not be guessing but can look it up in the log during operations.
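The review suggests the existing @timeFunction decorator; since its exact interface is not shown here, the sketch below uses a self-contained, hypothetical `logElapsedTime` decorator to illustrate the idea of logging the time spent per call. The simplified `findRucioBlocks(uniquePUList, listBlocks)` signature is also an assumption, not the method from the diff.

```python
import functools
import logging
import time


def logElapsedTime(func):
    """Hypothetical decorator: log how long each call to func takes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Logged even if func raises, so slow failures are visible too
            elapsed = time.perf_counter() - start
            logging.getLogger(func.__module__).info(
                "%s took %.3f secs", func.__name__, elapsed)
    return wrapper


@logElapsedTime
def findRucioBlocks(uniquePUList, listBlocks):
    """
    Simplified sketch: one lookup per unique pileup name, so the total time
    grows with the number of distinct pileups across active workflows.
    listBlocks is a hypothetical callable wrapping the Rucio query.
    """
    return {puName: listBlocks(puName) for puName in uniquePUList}
```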

vkuznet · 2023-11-22T13:29:25Z

test/python/WMComponent_t/WorkflowUpdater_t/WorkflowUpdater_t.py

+        self.testInit.generateWorkDir(config)
+
+        # First the general stuff
+        config.section_("General")


I want to point out a general comment here. The use of a custom WMCore configuration significantly impacts code portability to other frameworks, like Flask, etc. There is no need to use it everywhere, and a more standard configuration format, like .ini, .json or .yaml, is a better choice. Relying on a custom format adds an extra layer of dependency without much gain in code structure or functionality, and makes it impossible to port (parts of) the code to other languages that do not have the WMCore Python-based configuration. That said, there is nothing here to fix or address; my comment is a general observation.
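To illustrate the point about standard formats, a component configuration in plain JSON can be loaded and validated with the standard library alone. The keys and defaults below are hypothetical, not the component's real configuration parameters.

```python
import json

# Hypothetical defaults; key names are illustrative only.
DEFAULTS = {"pollInterval": 600, "rucioAccount": "wma_test", "msPileupUrl": None}


def loadComponentConfig(text):
    """Merge a JSON object over the defaults and validate the result."""
    loaded = json.loads(text)
    if not isinstance(loaded, dict):
        raise ValueError("configuration must be a JSON object")
    config = dict(DEFAULTS)
    config.update(loaded)
    unknown = set(config) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown configuration keys: {sorted(unknown)}")
    if not isinstance(config["pollInterval"], int) or config["pollInterval"] <= 0:
        raise ValueError("pollInterval must be a positive integer")
    return config
```

The same JSON file could be read from any language, which is the portability argument made above.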

cmsdmwmbot · 2023-11-22T13:37:26Z

Jenkins results:

Python3 Unit tests: succeeded
- 8 tests deleted
- 1 tests added
- 2 changes in unstable tests
Python3 Pylint check: failed
- 81 warnings and errors that must be fixed
- 10 warnings
- 296 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 376 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14657/artifact/artifacts/PullRequestReport.html

amaltaro · 2023-11-22T16:08:38Z

@vkuznet I addressed your relevant concerns in the last commit (to be eventually squashed).
I have also updated the initial description to note that this implementation is incomplete. The component is supposed to be functional with these changes, but it does not yet perform all the tasks it is planned to (as those require further developments tracked in different tickets).

cmsdmwmbot · 2023-11-22T16:16:29Z

Jenkins results:

Python3 Unit tests: failed
- 1 new failures
- 8 tests deleted
- 1 tests added
- 3 changes in unstable tests
Python3 Pylint check: failed
- 78 warnings and errors that must be fixed
- 10 warnings
- 296 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 376 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14658/artifact/artifacts/PullRequestReport.html

vkuznet

Alan, thanks for addressing the issues I pointed out; the code looks fine on my side and you can proceed (I do not know whether you intend to merge it or keep working on it, which is why I only left a comment for now).

amaltaro · 2023-11-22T20:27:31Z

Thank you for the prompt reviews, Valentin. Yes, the idea is to get it merged and resume activities on this component and other related tickets.

I am going to squash the commits accordingly, and also fix/refactor some pylint warnings to make it look better.

- New DAOs for finding active workflows
- Return spec path from the DAO
- Load spec file and find whether pileup is required or not
- fix logger object when instantiating Rucio
- Change Rucio account to wma_test
- Valentins suggestions, part 1
- use GET method instead of POST
- Fix some data structures; time findRucioBlocks
- pylint fixes in WorkflowUpdater

cmsdmwmbot · 2023-11-22T20:59:14Z

Jenkins results:

Python3 Unit tests: succeeded
- 8 tests deleted
- 1 tests added
- 1 changes in unstable tests
Python3 Pylint check: failed
- 72 warnings and errors that must be fixed
- 10 warnings
- 288 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 13 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14659/artifact/artifacts/PullRequestReport.html


cmsdmwmbot · 2023-11-22T21:08:27Z

Jenkins results:

Python3 Unit tests: succeeded
- 8 tests deleted
- 1 tests added
- 1 changes in unstable tests
Python3 Pylint check: failed
- 72 warnings and errors that must be fixed
- 10 warnings
- 288 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 13 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14660/artifact/artifacts/PullRequestReport.html

amaltaro added the "PR: Do not merge yet" and "PR: Work in progress" labels Nov 14, 2023

amaltaro mentioned this pull request Nov 18, 2023

Update WMAgent deployment to parse MSPILEUP_URL dmwm/deployment#1290

Merged

amaltaro force-pushed the fix-11733 branch from c779c77 to 664aa0d on November 21, 2023 02:20

amaltaro force-pushed the fix-11733 branch from 664aa0d to 2b4673c on November 21, 2023 02:46

amaltaro force-pushed the fix-11733 branch from 2b4673c to 0df90d1 on November 21, 2023 03:11

amaltaro force-pushed the fix-11733 branch from 0df90d1 to dfd28ce on November 21, 2023 03:22

amaltaro added the PR: squashing needed label Nov 21, 2023

amaltaro requested review from vkuznet and todor-ivanov November 21, 2023 11:58

vkuznet requested changes Nov 21, 2023

todor-ivanov approved these changes Nov 21, 2023

amaltaro force-pushed the fix-11733 branch from bc78b0a to ec7ea39 on November 21, 2023 21:13

amaltaro requested a review from vkuznet November 21, 2023 21:14

Deployment related changes to support component WorkflowUpdater

c7d4d59

- fix default config and polling cycle
- Parse new MSPILEUP_URL component configuration

amaltaro force-pushed the fix-11733 branch from ec7ea39 to 26f801e on November 22, 2023 13:26

amaltaro removed the "PR: Do not merge yet", "PR: Work in progress" and "PR: squashing needed" labels Nov 22, 2023

vkuznet requested changes Nov 22, 2023

amaltaro mentioned this pull request Nov 22, 2023

Update WMAgent deployment to parse MSPILEUP_URL - take2 dmwm/deployment#1293

Merged

vkuznet reviewed Nov 22, 2023

amaltaro added 2 commits November 22, 2023 15:34

Ignore hidden vscode directory - for VSCode IDE

f98b6c6

amaltaro force-pushed the fix-11733 branch from 3bc09d9 to f7742e1 on November 22, 2023 20:43

unit tests for WorkflowUpdater_t

fcccf57

- unit tests - rename testWorkload function by createTestWorkload
- fix unit tests calls and imports
- use wma_test in unit tests
- Valentins unit test suggestions
- pylint fixes for test package

amaltaro force-pushed the fix-11733 branch from f7742e1 to fcccf57 on November 22, 2023 20:59

amaltaro merged commit a3a2b0c into dmwm:master Nov 22, 2023
3 of 4 checks passed

amaltaro mentioned this pull request Jan 10, 2024

Fix typo in WorkflowUpdater module #11859

Merged
