Impact of the new feature
MSPileup
Is your feature request related to a problem? Please describe.
This issue potentially depends on: #11734
To implement partial data placement for Premixed pileups, we need to allow the fraction number (the desired fraction of the pileup kept on disk) to be adjustable and to fulfill the following requirements/desired features:
When this number is reduced, the workflow management system should release some Rucio rules and ensure the new fraction on disk.
When this number is increased, the workflow management system should create new rules to trigger tape recall and ensure the new fraction on disk.
Fraction increase and decrease should be done evenly at every defined RSE, e.g. if we reduce the fraction from 100% to 50% for a pileup whose locations are defined as CERN and FNAL, we expect a fraction drop of 50% at both sites.
In order to allow a manageable fraction of a Premixed pileup in the partial data placement mechanism, we need to ensure not only that the MSPileup interface allows such a change, but also that this change is dynamically propagated to all the workflows currently running and using the pileup in question. This means we should support live updates to the pileup JSON map that is shipped with every single job. In other words, we cannot create a pileup location JSON file at the beginning of the workflow and keep using it - unchanged - during the lifetime of the workflow.
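For context, the pileup location map shipped with every job is essentially a small JSON document mapping pileup blocks to their current disk locations, and it is this document that would have to be refreshed while the workflow is running. A purely illustrative sketch of the kind of information it carries (the keys and block names below are hypothetical approximations, not the exact WMCore pileup configuration format):

# Illustrative only: approximate shape of a pileup location map
pileup_map = {
    "mc": {
        "/Premix/PU2023/PREMIX#block-1": {
            "FileList": ["/store/premix/file1.root"],
            "PhEDExNodeNames": ["T1_US_FNAL_Disk", "T2_CH_CERN"],
        },
        "/Premix/PU2023/PREMIX#block-2": {
            "FileList": ["/store/premix/file2.root"],
            "PhEDExNodeNames": [],  # block dropped from disk after a fraction decrease
        },
    }
}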
Describe the solution you'd like
The current pileup configuration in MSPileup already takes into consideration that a fraction of a container could eventually be requested, but so far it has always been set to 1, see:
"containerFraction": 1.0,
With this ticket, we should allow this value to be greater than 0.0 and less than or equal to 1.0. The container fraction definition here refers to pileup blocks, regardless of how many files and/or events are in each block.
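For illustration only, the intended constraint could be expressed as in the sketch below (the function name is hypothetical and not part of the MSPileup schema code):

def validate_container_fraction(fraction):
    """containerFraction must be greater than 0.0 and at most 1.0."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError(f"containerFraction must be in (0.0, 1.0], got {fraction}")
    return fraction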
The expected behavior is that, whenever the containerFraction field changes for a given pileup dataset, MSPileup acts accordingly:
if it decreases, MSPileup needs to find out how many Rucio dataset rules need to be removed. The service should not delete any rules, but instead set their expiration to 24h (or whatever lifetime we decide on);
if it increases, MSPileup needs to find out how many Rucio dataset rules need to be added. New Rucio dataset rules need to be created for datasets with NO wmcore_pileup rules;
in addition, this rule removal/addition needs to be performed against each RSE where the pileup is expected to be (defined by expectedRSEs).
The formula for the number of Rucio datasets is (note the round-up):
ceil(containerFraction * num_rucio_datasets)
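To make the bookkeeping concrete, the sketch below (plain Python with hypothetical inputs, not the actual MSPileup code) shows how the number of dataset rules to add or remove per RSE could be derived from a containerFraction change:

from math import ceil

def dataset_rules_delta(num_rucio_datasets, old_fraction, new_fraction):
    """Positive result: rules to add; negative: rules to expire.
    Uses the round-up formula ceil(containerFraction * num_rucio_datasets)."""
    old_count = ceil(old_fraction * num_rucio_datasets)
    new_count = ceil(new_fraction * num_rucio_datasets)
    return new_count - old_count

# e.g. a pileup container with 1000 Rucio datasets, fraction going from 1.0 to 0.5
delta = dataset_rules_delta(1000, 1.0, 0.5)
for rse in ["T1_US_FNAL_Disk", "T2_CH_CERN"]:  # hypothetical expectedRSEs
    if delta < 0:
        # shrink: do not delete rules, set a 24h expiration on abs(delta) of them
        print(f"{rse}: expire {abs(delta)} dataset rules in 24h")
    elif delta > 0:
        # grow: create delta rules for datasets with no wmcore_pileup rule yet
        print(f"{rse}: create {delta} new dataset rules")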
NOTE that the moment we start creating Rucio dataset-based rules, we inflate the MSPileup logic with thousands of extra rules, which will likely require refactoring how completion and rules are tracked in the service.
Describe alternatives you've considered
An alternative, potentially the best one, would be to have MSPileup create a new Rucio container holding the resulting number of Rucio datasets. I believe that would require the following steps (see the sketch after this list):
create a new container DID in Rucio;
attach the selected Rucio datasets to the freshly created container DID;
create one rule for this new container;
finally, set an expiration time (24h) on the previous rule of the container that is being shrunk or expanded.
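A minimal sketch of those steps, assuming the standard Rucio client calls (add_container, attach_dids, add_replication_rule, list_did_rules, update_replication_rule); the scope, container names, RSEs and the wmcore_pileup account below are hypothetical placeholders, and this is an illustration of the idea rather than the actual MSPileup implementation:

from math import ceil
from rucio.client import Client

client = Client()
scope = "cms"                               # hypothetical scope
old_container = "/Premix/PU2023/PREMIX"     # hypothetical pileup container
fraction = 0.5                              # new containerFraction

# 1) pick the Rucio datasets that should remain pinned on disk
dataset_names = [d["name"] for d in client.list_content(scope, old_container)]
keep = dataset_names[: ceil(fraction * len(dataset_names))]

# 2) create a new container DID and attach the selected datasets to it
new_container = old_container + "_V2"       # hypothetical naming convention
client.add_container(scope, new_container)
client.attach_dids(scope, new_container,
                   [{"scope": scope, "name": name} for name in keep])

# 3) create one rule for the new container at each expected RSE
for rse in ["T1_US_FNAL_Disk", "T2_CH_CERN"]:   # hypothetical expectedRSEs
    client.add_replication_rule([{"scope": scope, "name": new_container}],
                                copies=1, rse_expression=rse,
                                account="wmcore_pileup")

# 4) let the rules on the previous container expire in 24 hours
for rule in client.list_did_rules(scope, old_container):
    client.update_replication_rule(rule["id"], {"lifetime": 24 * 60 * 60})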
Additional context
This is part of the meta issue #11537, and it depends on completion of #11801.