MSPileup: complete support to fraction of pileup container #11621

Closed
todor-ivanov opened this issue Jun 27, 2023 · 0 comments · Fixed by #11807

todor-ivanov commented Jun 27, 2023

Impact of the new feature
MSPileup

Is your feature request related to a problem? Please describe.
This issue potentially depends on: #11734

To implement partial data placement for Premixed pileups, the fraction (the desired share of the pileup to keep on disk) must be manageable and must meet the following requirements:

  • When this number is reduced, the workflow management system should free some Rucio rules and ensure the new fraction on disk.
  • When this number is increased, the workflow management system should create new rules to trigger a tape recall and ensure the new fraction on disk.
  • Fraction increases and decreases should be applied evenly at every defined RSE, e.g. if we reduce the fraction from 100% to 50% for a pileup whose locations are defined as CERN and FNAL, we expect the fraction to drop to 50% at both sites.

To make the fraction of a Premixed pileup manageable in the partial data placement mechanism, it is not enough for the MSPileup interface to allow such a change. The change must also be propagated dynamically to all workflows that are currently running and using the pileup in question. This means we should support live updates to the pileup JSON map that is shipped with every single job. In other words, we cannot create a pileup location JSON file at the beginning of the workflow and keep using it, unchanged, during the lifetime of the workflow.

Describe the solution you'd like
The current pileup configuration in MSPileup already takes into consideration that a fraction of a container could eventually be requested, but so far it has always been set to 1, see:

  "containerFraction": 1.0,

With this ticket, we should allow this value to be greater than 0.0 and less than or equal to 1.0. The container fraction here is defined in terms of pileup blocks, regardless of how many files and/or events are in each block.
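As an illustration of the allowed range, here is a minimal validation sketch. The function name `validate_container_fraction` is hypothetical and not part of MSPileup; it only encodes the (0.0, 1.0] constraint described above.

```python
def validate_container_fraction(value):
    """Return value as a float if it lies in (0.0, 1.0], else raise ValueError.

    Hypothetical helper for illustration only, not an MSPileup API.
    """
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(
            f"containerFraction must be a number, got {type(value).__name__}"
        )
    fraction = float(value)
    # NaN fails both comparisons, so it is rejected here as well
    if not 0.0 < fraction <= 1.0:
        raise ValueError(
            f"containerFraction must be in (0.0, 1.0], got {fraction}"
        )
    return fraction
```

With this check, `1.0` and `0.5` pass, while `0.0`, negative values and anything above `1.0` are rejected.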

The expected behavior is that, whenever the containerFraction field changes for a given pileup dataset, MSPileup reacts as follows:

  • if it decreases, MSPileup needs to find out how many Rucio dataset rules need to be removed. The service should not delete any rules, but instead set their expiration to 24h (or whatever we decide to have)
  • if it increases, MSPileup needs to find out how many Rucio dataset rules need to be added. New Rucio dataset rules need to be created for datasets with NO wmcore_pileup rules.
  • in addition, this rule removal/addition needs to be performed against each RSE where the pileup is expected to be (defined by expectedRSEs)

The formula for the number of Rucio datasets is (note the rounding up):

ceil(containerFraction * num_rucio_datasets)
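The formula and the per-RSE behavior described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the input shapes (an ordered list of dataset names and a map from RSE to the set of datasets that already have a rule) and the choice of which datasets to add or expire first are all hypothetical, and the sketch only plans the changes without calling Rucio.

```python
import math


def target_dataset_count(container_fraction, num_rucio_datasets):
    """Number of Rucio datasets to keep on disk, rounded up."""
    return math.ceil(container_fraction * num_rucio_datasets)


def plan_rule_changes(container_fraction, all_datasets, ruled_by_rse):
    """Plan which dataset rules to add or expire at each RSE.

    all_datasets: ordered list of dataset names in the pileup container.
    ruled_by_rse: dict mapping RSE name -> set of datasets that currently
                  have a rule at that RSE (assumed input shape).
    Returns a dict mapping RSE name -> {"add": [...], "expire": [...]}.
    """
    target = target_dataset_count(container_fraction, len(all_datasets))
    plan = {}
    for rse, ruled in ruled_by_rse.items():
        current = [d for d in all_datasets if d in ruled]
        missing = [d for d in all_datasets if d not in ruled]
        delta = target - len(current)
        if delta > 0:
            # fraction increased: add rules only for datasets without one
            plan[rse] = {"add": missing[:delta], "expire": []}
        elif delta < 0:
            # fraction decreased: mark the surplus rules for expiration
            plan[rse] = {"add": [], "expire": current[delta:]}
        else:
            plan[rse] = {"add": [], "expire": []}
    return plan
```

For a container with four datasets and a fraction of 0.5, an RSE holding all four would get two rules marked for expiration, and an RSE holding only one would get one new rule. Because the product is a floating-point number, it may land marginally above an integer for some fractions, in which case `ceil` keeps one extra dataset.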

NOTE: as soon as we start creating rules per Rucio dataset, MSPileup will have to handle thousands of extra rules, so we will probably have to refactor how completion and rules are tracked in the service.

Describe alternatives you've considered
An alternative, potentially the best one, would be to have MSPileup create new Rucio containers holding the resulting number of Rucio datasets. I believe that would require the following steps:

  1. create a new container DID in Rucio
  2. attach every single Rucio dataset into the freshly created container DID
  3. create one rule for this new container
  4. finally, set an expiration time (24h) for the previous rule that is either shrinking or expanding.
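The four steps above could be sketched as follows. Everything here is an assumption made for illustration: `RecordingClient` and its method names are hypothetical stand-ins that only record calls, they are not the Rucio client API, and picking the first datasets of the list is an arbitrary choice.

```python
import math

EXPIRATION_SECONDS = 24 * 60 * 60  # the 24h grace period suggested above


class RecordingClient:
    """Hypothetical stand-in that records calls instead of talking to Rucio."""

    def __init__(self):
        self.calls = []

    def create_container(self, name):
        self.calls.append(("create_container", name))

    def attach_dataset(self, container, dataset):
        self.calls.append(("attach_dataset", container, dataset))

    def create_rule(self, container, rse_expression):
        self.calls.append(("create_rule", container, rse_expression))
        return f"rule-{container}-{rse_expression}"

    def expire_rule(self, rule_id, lifetime):
        self.calls.append(("expire_rule", rule_id, lifetime))


def replace_container_rule(client, new_container, datasets, fraction,
                           rse_expression, old_rule_id):
    """Swap the old rule for a rule on a new, smaller or larger container."""
    keep = datasets[:math.ceil(fraction * len(datasets))]
    # 1. create a new container DID
    client.create_container(new_container)
    # 2. attach every selected dataset to the new container
    for dataset in keep:
        client.attach_dataset(new_container, dataset)
    # 3. create one rule for the new container
    new_rule_id = client.create_rule(new_container, rse_expression)
    # 4. let the previous rule expire after the grace period
    client.expire_rule(old_rule_id, EXPIRATION_SECONDS)
    return new_rule_id
```

Creating the new rule before expiring the old one keeps the data pinned throughout the transition, which matches the ordering of the steps listed above.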

Additional context
This is part of the meta issue #11537, and it depends on the completion of #11801.

@todor-ivanov todor-ivanov changed the title Support dynamic pilup fraction diring workflow lifetime Support dynamic pileup fraction during workflow lifetime Jun 29, 2023
@amaltaro amaltaro changed the title Support dynamic pileup fraction during workflow lifetime MSPileup: complete support to fraction of pileup container Sep 21, 2023
@vkuznet vkuznet self-assigned this Nov 28, 2023
@vkuznet vkuznet moved this from Todo to In Progress in WMCore quarterly developments Dec 11, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in WMCore quarterly developments Jan 18, 2024