Refactor pileup data location in global workqueue to support partial availability #11620

Closed
todor-ivanov opened this issue Jun 27, 2023 · 1 comment · Fixed by #11879 or #11908


todor-ivanov commented Jun 27, 2023

Impact of the new feature
Global WorkQueue

Is your feature request related to a problem? Please describe.
This feature needs to be deployed with: #11732

Complementary to allowing partial data placement for Premixed pileup datasets, we also need to allow partial data location in Global WorkQueue. This means we no longer have a hard requirement that the pileup container be fully locked and available under the wmcore_pileup account; hence, the data lookup granularity has to change from the container level to the Rucio dataset level.

Describe the solution you'd like
The pileup data location algorithm needs to be refactored to resolve data location based on Rucio datasets instead of the container. Given that some pileup containers have tens of thousands of Rucio datasets, the algorithm also needs to support high concurrency, either by relying on the existing pycurl_manager module or by adopting the Python asyncio library.
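As a rough illustration of the asyncio option, the sketch below runs one look-up per Rucio dataset with a cap on the number of requests in flight. It is not WMCore code: `resolve_locations`, `fake_lookup`, the `max_concurrency` parameter and the dataset names are all hypothetical, and `fake_lookup` only stands in for whatever Rucio query the real implementation would issue.

```python
import asyncio


async def resolve_locations(dataset_names, lookup, max_concurrency=50):
    """Resolve the location of each Rucio dataset concurrently.

    `lookup` is a hypothetical coroutine that takes a dataset name and
    returns a list of RSE names. At most `max_concurrency` look-ups
    run at the same time.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(name):
        # The semaphore keeps the number of in-flight requests bounded.
        async with semaphore:
            return name, await lookup(name)

    pairs = await asyncio.gather(*(bounded(name) for name in dataset_names))
    return dict(pairs)


async def fake_lookup(name):
    # Assumption: stand-in for a real Rucio query, returns canned RSEs.
    await asyncio.sleep(0)
    return ["T1_US_FNAL_Disk"] if name.endswith("0") else ["T2_CH_CERN"]


if __name__ == "__main__":
    names = [f"/Hypothetical/Pileup/PREMIX#block{i}" for i in range(10)]
    locations = asyncio.run(resolve_locations(names, fake_lookup, max_concurrency=3))
    for dataset, rses in locations.items():
        print(dataset, rses)
```

A semaphore is used here rather than fixed-size batches so that a slow request does not hold back the rest of its batch; the same bound could presumably be achieved with pycurl_manager's own concurrency settings.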

We can likely have a relaxed error handling logic, such as:

  • if 100% of the data look-up requests fail, we raise a hard exception and fail the workflow acquisition (to be retried in the next cycle)
  • if only a fraction of the data look-up requests fail, we treat those Rucio datasets as having an empty location
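The two rules above could be sketched roughly as follows. The sketch assumes the look-up results arrive as a dictionary mapping each Rucio dataset name to either a list of RSEs or the exception raised by its request; `PileupLocationError` and `apply_relaxed_policy` are hypothetical names, not existing WMCore objects.

```python
class PileupLocationError(Exception):
    """Hypothetical exception raised when every look-up request failed."""


def apply_relaxed_policy(raw_results):
    """Apply the relaxed error handling to the raw look-up results.

    `raw_results` maps a Rucio dataset name to either a list of RSE
    names (success) or an Exception instance (failed request).
    """
    failures = [name for name, value in raw_results.items()
                if isinstance(value, Exception)]

    # Rule 1: everything failed, so fail the acquisition for this cycle.
    if raw_results and len(failures) == len(raw_results):
        raise PileupLocationError(
            f"All {len(failures)} pileup location look-ups failed")

    # Rule 2: partial failure, so failed datasets get an empty location.
    return {name: [] if isinstance(value, Exception) else list(value)
            for name, value in raw_results.items()}
```

With `asyncio.gather(..., return_exceptions=True)`, failed requests come back as exception objects in place of results, which is one way to build the input this function expects.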

For Global WorkQueue purposes, if at least one Rucio dataset is available at a given RSE (under wmcore_pileup), then we should consider that RSE a valid pileup location. I do not think it matters whether the service finds 100k Rucio datasets or only one available under a given RSE, as we will simply trust that pileup availability meets the CompOps criteria.
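In other words, the set of valid pileup locations is simply the union of the per-dataset locations. A minimal sketch, assuming the per-dataset results are already available as a dictionary of RSE lists (the function name and the example data are hypothetical):

```python
def valid_pileup_rses(locations_by_dataset):
    """Return every RSE that hosts at least one Rucio dataset.

    `locations_by_dataset` maps a Rucio dataset name to the list of
    RSEs where it is available (an empty list if none or unknown).
    """
    rses = set()
    for rse_list in locations_by_dataset.values():
        rses.update(rse_list)
    # Sorted only to make the result deterministic.
    return sorted(rses)


if __name__ == "__main__":
    example = {
        "dataset_a": ["T1_US_FNAL_Disk", "T2_CH_CERN"],
        "dataset_b": ["T2_CH_CERN"],
        "dataset_c": [],  # failed or unavailable look-up
    }
    print(valid_pileup_rses(example))
```

Note that an RSE holding a single dataset counts the same as one holding the whole container, which matches the intent stated above.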

Describe alternatives you've considered
See the note above about which concurrency library to use.

Additional context
This is part of the meta issue: #11537

@amaltaro changed the title from "Support partial pileup data location in Global/Local Workqueue" to "Refactor pileup data location in global workqueue to support fraction availability" on Sep 21, 2023
@amaltaro changed the title from "Refactor pileup data location in global workqueue to support fraction availability" to "Refactor pileup data location in global workqueue to support partial availability" on Sep 21, 2023
@amaltaro
Contributor

Please refer to this comment #11732 (comment), which suggests coupling pileup data location to the MSPileup information across WM.

@anpicci anpicci self-assigned this Jan 18, 2024
anpicci added a commit to anpicci/WMCore that referenced this issue Jan 25, 2024
anpicci added a commit to anpicci/WMCore that referenced this issue Feb 1, 2024
anpicci added a commit to anpicci/WMCore that referenced this issue Feb 2, 2024
…ation in StartPolicyInterface dmwm#11620

Modifying the implementation of MSPileup in StartPolicyInterface dmwm#11620

Implementation of MSPileupUtils in StartPolicyInterface dmwm#11620
anpicci added a commit to anpicci/WMCore that referenced this issue Feb 2, 2024
anpicci added a commit to anpicci/WMCore that referenced this issue Feb 2, 2024
anpicci added a commit to anpicci/WMCore that referenced this issue Feb 2, 2024
anpicci added a commit to anpicci/WMCore that referenced this issue Feb 7, 2024
anpicci added a commit to anpicci/WMCore that referenced this issue Feb 8, 2024
…ation in StartPolicyInterface dmwm#11620

Modifying the implementation of MSPileup in StartPolicyInterface dmwm#11620

Implementation of MSPileupUtils in StartPolicyInterface dmwm#11620

Fixing a bug in getDatasetLocationsFromMSPileup dmwm#11620

Fixing again bugs in getDatasetLocationsFromMSPileup dmwm#11620

Fixing again bugs in getDatasetLocationsFromMSPileup dmwm#11620

Trying to debug Jenkins for getDatasetLocationsFromMSPileup dmwm#11620

Trying to debug Jenkins for getDatasetLocationsFromMSPileup dmwm#11620

Trying to debug Jenkins for getDatasetLocationsFromMSPileup dmwm#11620

Fixing an important bug in StartPolicyInterface dmwm#11620
anpicci added a commit to anpicci/WMCore that referenced this issue Feb 8, 2024
anpicci added a commit to anpicci/WMCore that referenced this issue Feb 8, 2024
anpicci added a commit to anpicci/WMCore that referenced this issue Feb 8, 2024
anpicci added a commit to anpicci/WMCore that referenced this issue Feb 8, 2024
…ation in StartPolicyInterface dmwm#11620

Modifying the implementation of MSPileup in StartPolicyInterface dmwm#11620

Implementation of MSPileupUtils in StartPolicyInterface dmwm#11620

Fixing a bug in getDatasetLocationsFromMSPileup dmwm#11620

Fixing again bugs in getDatasetLocationsFromMSPileup dmwm#11620

Fixing again bugs in getDatasetLocationsFromMSPileup dmwm#11620

Trying to debug Jenkins for getDatasetLocationsFromMSPileup dmwm#11620

Trying to debug Jenkins for getDatasetLocationsFromMSPileup dmwm#11620

Trying to debug Jenkins for getDatasetLocationsFromMSPileup dmwm#11620

Fixing an important bug in StartPolicyInterface dmwm#11620

Addressing Alan's comments for StartPolicyInterface dmwm#11620
anpicci added a commit to anpicci/WMCore that referenced this issue Feb 8, 2024
Aligning unit test Block_t to the latest src updates dmwm#11620
anpicci added a commit to anpicci/WMCore that referenced this issue Feb 9, 2024
anpicci added a commit to anpicci/WMCore that referenced this issue Feb 9, 2024
…ation in StartPolicyInterface dmwm#11620

Modifying the implementation of MSPileup in StartPolicyInterface dmwm#11620

Implementation of MSPileupUtils in StartPolicyInterface dmwm#11620

Fixing a bug in getDatasetLocationsFromMSPileup dmwm#11620

Fixing again bugs in getDatasetLocationsFromMSPileup dmwm#11620

Fixing again bugs in getDatasetLocationsFromMSPileup dmwm#11620

Trying to debug Jenkins for getDatasetLocationsFromMSPileup dmwm#11620

Trying to debug Jenkins for getDatasetLocationsFromMSPileup dmwm#11620

Trying to debug Jenkins for getDatasetLocationsFromMSPileup dmwm#11620

Fixing an important bug in StartPolicyInterface dmwm#11620

Addressing Alan's comments for StartPolicyInterface dmwm#11620

Optimization of the new function in StartPolicyInterface dmwm#11620
anpicci added a commit to anpicci/WMCore that referenced this issue Feb 9, 2024
amaltaro added a commit that referenced this issue Feb 9, 2024
New function to get dataset locations from MSPileup (addressing #11620)