Don't cross-check location when data is reported without location #11878

Merged: amaltaro merged 2 commits into dmwm:master on Jan 29, 2024


amaltaro
Contributor

Fixes #11877

Status

ready

Description

Fixes an issue introduced by #11810, where input for ACDC workflows could be checked against Rucio. That check is wrong: ACDC workflows have no real data as input; they use a collection of data from the ACDC server instead.

I am just ditching that extra check against Rucio and leaving this update to the DataLocationUpdater WorkQueueManager thread, which is supposed to continuously update input/pileup data location.
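
A minimal sketch of the idea, not the actual WMCore code: the function name `validate_locations` and its arguments are hypothetical. It only illustrates handling data reported without a location by logging a warning and moving on, with no remote cross-check, so that a background updater can fill in the location later.

```python
import logging

logger = logging.getLogger("StartPolicySketch")  # hypothetical logger name


def validate_locations(data_name, locations):
    """Return the de-duplicated, sorted locations for a piece of input data.

    If no location is reported, log a warning and return an empty list
    instead of cross-checking against an external service. Hypothetical
    helper for illustration only; not part of WMCore.
    """
    if not locations:
        logger.warning("Input data has no location, data=%s", data_name)
        return []
    return sorted(set(locations))
```

With this shape, an element for an ACDC collection can still be queued with an empty location list, which matches the warning followed by a successful split in the log lines quoted later in this conversation.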

Is it backward compatible (if not, which system does it affect)?

YES

Related PRs

Complement to #11810

External dependencies / deployment changes

None

@amaltaro
Contributor Author

Given that the Rucio call was originally added only for further debugging, I went ahead and applied this patch to vocms0282. This time WorkQueueManager did not crash, and it produced the following log lines for the same problematic ACDC workflow (see the issue for further details):

2024-01-25 04:31:53,228:140045050111744:INFO:WorkQueue:Splitting /cmsunified_ACDC0_r-0-Run2017C_Charmonium_UL2017_MiniAODv2_BParking_240123_014605_3460/DataProcessing with policy name ResubmitBlock and policy params {'name': 'ResubmitBlock', 'args': {}}
2024-01-25 04:31:53,629:140045050111744:WARNING:StartPolicyInterface:Input data has no location, spec=<WMCore.WMSpec.WMWorkload.WMWorkloadHelper object at 0x7f5ec4ec91f0>, data=/acdc/cmsunified_ACDC0_r-0-Run2017C_Charmonium_UL2017_MiniAODv2_BParking_240123_014605_3460/:wangz_r-0-Run2017C_Charmonium_UL2017_MiniAODv2_BParking_240111_162540_2272:DataProcessing/0/373
2024-01-25 04:31:53,629:140045050111744:INFO:WorkQueue:Work splitting completed with 1 units, 0 rejectedWork and 0 badWork
2024-01-25 04:31:53,629:140045050111744:INFO:WorkQueue:Queuing element a0caf38833a7073868fd4c24fa349e32 for /cmsunified_ACDC0_r-0-Run2017C_Charmonium_UL2017_MiniAODv2_BParking_240123_014605_3460/DataProcessing with policy ResubmitBlock, with 373 job(s) and 373 lumis on /acdc/cmsunified_ACDC0_r-0-Run2017C_Charmonium_UL2017_MiniAODv2_BParking_240123_014605_3460/:wangz_r-0-Run2017C_Charmonium_UL2017_MiniAODv2_BParking_240111_162540_2272:DataProcessing/0/373

I still don't understand why it reports no location, since it was supposed to be "T2_FI_HIP". If this is a bug, it is probably unrelated to these changes and has been there for a long time.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 22 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14787/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 3 new failures
    • 1 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 12 warnings and errors that must be fixed
    • 41 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14789/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 74 warnings and errors that must be fixed
    • 57 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14790/artifact/artifacts/PullRequestReport.html

Make location of ACDC documents as union set; make ResubmitBlock policy more verbose
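
As an illustration of what "union set" means in the commit title above, here is a minimal sketch under stated assumptions: the document layout (a `"locations"` key holding a list of site names) and the function name `union_of_locations` are hypothetical and are not taken from the ACDC server schema.

```python
def union_of_locations(acdc_documents):
    """Collect every site that appears in at least one ACDC document.

    Each document is assumed (hypothetically) to be a dict with an
    optional "locations" list of site names.
    """
    locations = set()
    for doc in acdc_documents:
        # Documents without a location contribute nothing to the union.
        locations.update(doc.get("locations", []))
    return sorted(locations)


docs = [
    {"locations": ["T2_FI_HIP"]},
    {"locations": []},
    {"locations": ["T1_US_FNAL", "T2_FI_HIP"]},
]
print(union_of_locations(docs))
```

A union keeps a site if any document lists it, whereas an intersection would collapse to an empty set as soon as one document had no location.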
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 74 warnings and errors that must be fixed
    • 57 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14793/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor Author

Commits have been squashed. Merging it and will soon backport it to an agent branch.

@amaltaro amaltaro merged commit 4ca5c77 into dmwm:master Jan 29, 2024
3 of 4 checks passed
@amaltaro
Contributor Author

This PR has been backported to the 2.3.0_wmagent branch, all 5 agents have been patched (including the RelVal agent), and a new wmagent tag, 2.3.0.1, has been released on PyPI.

Development

Successfully merging this pull request may close these issues.

Agent v2.3.0 trying to resolve input data for ACDC workflows
3 participants