Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix part pu block level information missing fix 11736 #11975

Conversation

todor-ivanov
Copy link
Contributor

Fixes #11736

Status

ready

Description

Previously we were having issues obtaining block level information from DBS, which was fixed by correctly configuring the dbsUrl inside the WorflowUpdater component. This relieved two more issues :

  • We were wrongly checking for empty records returned from updateBlockInfo we were getting an object like this {'mc':{}}, which is not an empty dictionary and was misleading us.
  • We were always getting the above returned by updateBlockInfo, because we were always sending an empty list of blocs msPUBlockLoc:{} to be searched for. And we were always sending this list empty, because we were failing to resolve block level information from Rucio here: , due to a bad data structure flattening in the main algorithm here:
    uniqueActivePU = flattenList([item['pileup'] for item in puWflows])
    and the main reason for that faulty flattening of the data structure was this exact line here:
    wflowSpec['pileup'] = pileupSpecs.values()
    where we populate the block level information as dictionary views, instead of direct sets. The resultant data structure (upon flattening) was not a pure list of strings, but rather a list of sets:
  • What were getting:
In [111]: uniqueActivePU
Out[111]: 
[{'/Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX',
  '/RelValMinBias_14TeV/CMSSW_11_2_0_pre8-112X_mcRun3_2024_realistic_v10_forTrk-v1/GEN-SIM'},
 {'/Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX',
  '/RelValMinBias_14TeV/CMSSW_11_2_0_pre8-112X_mcRun3_2024_realistic_v10_forTrk-v1/GEN-SIM'},
 {'/Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX',
  '/RelValMinBias_14TeV/CMSSW_11_2_0_pre8-112X_mcRun3_2024_realistic_v10_forTrk-v1/GEN-SIM'},
 {'/Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX',
  '/RelValMinBias_14TeV/CMSSW_11_2_0_pre8-112X_mcRun3_2024_realistic_v10_forTrk-v1/GEN-SIM'},
 {'/Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX',
  '/RelValMinBias_14TeV/CMSSW_11_2_0_pre8-112X_mcRun3_2024_realistic_v10_forTrk-v1/GEN-SIM'}]
  • What we were expecting:
In [112]: uniqueActivePU2
Out[112]: 
['/Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX',
 '/RelValMinBias_14TeV/CMSSW_11_2_0_pre8-112X_mcRun3_2024_realistic_v10_forTrk-v1/GEN-SIM',
 '/Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX',
 '/RelValMinBias_14TeV/CMSSW_11_2_0_pre8-112X_mcRun3_2024_realistic_v10_forTrk-v1/GEN-SIM',
 '/Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX',
 '/RelValMinBias_14TeV/CMSSW_11_2_0_pre8-112X_mcRun3_2024_realistic_v10_forTrk-v1/GEN-SIM',
 '/Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX',
 '/RelValMinBias_14TeV/CMSSW_11_2_0_pre8-112X_mcRun3_2024_realistic_v10_forTrk-v1/GEN-SIM',
 '/Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX',
 '/RelValMinBias_14TeV/CMSSW_11_2_0_pre8-112X_mcRun3_2024_realistic_v10_forTrk-v1/GEN-SIM']

With the current Fix:

I've fetched the changes from @vkuznet who has provided the first part of the fix

  • We force any faulty call to updateBlockInfo to return an empty {}
  • We iterate through the sets of PU datasets returned from WMBSHelper.listPileupDatasets for every workflow and add them not as a view in the returned workflowSpec but rather as a pure set. This way later the structure is flattened properly
  • We convert the already flattened data structure uniqueActivePU to a set, in order to avoid null or repeated cycles later in the code.

Is it backward compatible (if not, which system it affects?)

NO NEED BE

Related PRs

None

External dependencies / deployment changes

None

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
  • Python3 Pylint check: failed
    • 2 warnings and errors that must be fixed
    • 1 warnings
    • 6 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15029/artifact/artifacts/PullRequestReport.html

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@todor-ivanov thank you for fixing this. Patch looks good in general, but I have a minor request that I would prefer to have in before merging this.

@@ -293,7 +294,7 @@ def algorithm(self, parameters=None):
logging.info("Agent has no active workflows with pileup at the moment")
return
# resolve unique active pileup dataset names
uniqueActivePU = flattenList([item['pileup'] for item in puWflows])
uniqueActivePU = set(flattenList([item['pileup'] for item in puWflows]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the set!

@@ -463,7 +464,10 @@ def findWflowsWithPileup(listSpecs):
workloadHelper.load(wflowSpec['spec'])
pileupSpecs = workloadHelper.listPileupDatasets()
if pileupSpecs:
wflowSpec['pileup'] = pileupSpecs.values()
puSet = set()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todor, let me suggest a more readable variation to this solution:

wflowSpec['pileup'] = set()
for pileupN in pileupSpecs.values():
    wflowSpec['pileup'].add(pileupN)

Variables with less than 3 chars are usually unwanted as well.

Copy link
Contributor Author

@todor-ivanov todor-ivanov Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amaltaro This suggestion is not going to work. Here is the error it fails with:

2024-04-26 20:01:28,199:139921118983936:ERROR:WorkflowUpdaterPoller:Failed to load spec file for: /data/srv/wmagent/v2.3.2rc8/install/wmagentpy3/WorkQueueManager/cache/tivanov_SC_MultiPU_Feb2024_Val_PartPU_v7_240419_220203_5072/WMSandbox/WMWorkload.pkl. Details: unhashable type: 'set'

And the explanation is simple: you cannot add a mutable object (a view member - a set in our case) as a member of a set. All members of a set needs to be hashable, while the members of the current view are not - they are sets by their own. I've been facing this very same error in few variants of this patch, until I realized our only way out is to iterate through the already present sets inside the view and make a union with the zero set.

checking it manually gives the same error:

In [138]: print(pileupSpecs)
{'https://cmsweb-testbed.cern.ch/dbs/int/global/DBSReader': {'/Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX', '/RelValMinBias_14TeV/CMSSW_11_2_0_pre8-112X_mcRun3_2024_realistic_v10_forTrk-v1/GEN-SIM'}}

In [139]: wflowSpec = {}

In [140]: wflowSpec['pileup'] = set()

In [141]: for pileupN in pileupSpecs.values():
     ...:     wflowSpec['pileup'].add(pileupN)
     ...: 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[141], line 2
      1 for pileupN in pileupSpecs.values():
----> 2     wflowSpec['pileup'].add(pileupN)

TypeError: unhashable type: 'set'

I can change the value names, though, such that they are not two chars long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this on the other hand ... works:

In [143]: for pileupN in pileupSpecs.values():
     ...:     wflowSpec['pileup'] =  wflowSpec['pileup'] | pileupN
     ...: 

In [144]: wflowSpec['pileup']
Out[144]: 
{'/Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX',
 '/RelValMinBias_14TeV/CMSSW_11_2_0_pre8-112X_mcRun3_2024_realistic_v10_forTrk-v1/GEN-SIM'}

I am going to update the patch. I like it better, indeed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @amaltaro - Done. And tested. Please take another look.

@todor-ivanov todor-ivanov force-pushed the bugfix_PartPU_BlockLevelInformationMissing_fix-11736 branch from 47d7f6f to b99acf3 Compare April 26, 2024 18:31
@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
  • Python3 Pylint check: failed
    • 2 warnings and errors that must be fixed
    • 1 warnings
    • 6 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 1 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15030/artifact/artifacts/PullRequestReport.html

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't see it was actually returning a set instead of a plain pileup string.
Changes look good to me. Please squash the commits and we are good to go.

Optain sets from workloadHelper.listPileupDatasets() instead of views && make uniqueActivePU a set

Do the set union directly in wflowSpec['pilup']
@todor-ivanov todor-ivanov force-pushed the bugfix_PartPU_BlockLevelInformationMissing_fix-11736 branch from b99acf3 to dde9386 Compare April 27, 2024 02:51
@todor-ivanov
Copy link
Contributor Author

thanks @amaltaro

Please squash the commits and we are good to go

Done. Feel free to merge and cut a new tag at your convenience.

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
  • Python3 Pylint check: failed
    • 2 warnings and errors that must be fixed
    • 1 warnings
    • 6 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 1 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15031/artifact/artifacts/PullRequestReport.html

@amaltaro amaltaro merged commit e21a6e2 into dmwm:master Apr 27, 2024
2 of 4 checks passed
@amaltaro
Copy link
Contributor

Thanks @todor-ivanov . If you haven't done yet, please apply this patch to the agent you have been using for such validations. Given that it's a simple patch on the agent side only, I don't think we have to upgrade everything to start from scratch.

In any case, this fix is now in tag 2.3.2rc9.

@todor-ivanov
Copy link
Contributor Author

The agent was patched and a new workflow is already in the queue:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Integrate all the partial pileup features together and run final pre-production tests
3 participants