Tier0 and Rucio Subscriptions #4489
Comments
Two options here. We could leave things as they are (with the custodial site set in the Tier0 config and the placement rules for each output type hard-coded in Tier0 code) and just replace the current per-dataset PhEDEx subscriptions with per-block Rucio rules. This has the advantage that setting the rules, tracking workflow completion, and removing the rules (i.e. the complete Tier0 data lifetime management) are handled in one place. The second option is that the custodial assignment of primary datasets to T1 and the placement rules for the different Tier0 outputs are kept in Rucio itself. That's cleaner in terms of managing these rules, but it raises some technical concerns. Rucio would need to detect, automatically and reliably, what type of data it is dealing with. While that's easy for primary datasets, what about the different PromptReco outputs (RECO, AOD, AlcaReco skims, physics skims)? And what about keeping PromptReco and ReReco separate? Are parsing rules for the /A/B/C CMS dataset name enough, or would the Tier0 injection need to 'tag' the data with a type?
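To make the "parsing rules for the /A/B/C dataset name" question concrete, here is a minimal sketch of what such parsing could look like. It is not Tier0 or Rucio code. The helper names and the example dataset names are hypothetical, and the check for a literal "PromptReco" substring in the middle component is an assumption about naming, which is exactly the kind of convention that explicit tagging would make unnecessary.

```python
from typing import NamedTuple


class DatasetName(NamedTuple):
    """The three components of a /A/B/C dataset name."""
    primary: str    # A: primary dataset
    processed: str  # B: processed dataset string
    tier: str       # C: data tier, e.g. RECO or AOD


def parse_dataset_name(name: str) -> DatasetName:
    """Split a /A/B/C name into its parts, rejecting malformed names."""
    parts = name.split("/")
    # A well-formed name starts with "/" and has exactly three non-empty parts.
    if len(parts) != 4 or parts[0] != "" or not all(parts[1:]):
        raise ValueError(f"not a /A/B/C dataset name: {name!r}")
    return DatasetName(*parts[1:])


def looks_like_prompt_reco(name: str) -> bool:
    """Heuristic only.

    ASSUMPTION: PromptReco output carries the literal string "PromptReco"
    in the processed (B) component. If that convention is not guaranteed,
    name parsing alone cannot separate PromptReco from ReReco.
    """
    return "PromptReco" in parse_dataset_name(name).processed
```

The data tier falls out of the name directly, so telling RECO from AOD is easy. Separating PromptReco from ReReco depends entirely on the naming convention assumed above.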
Take a look at the metadata listed here: https://github.com/dmwm/CMSRucio/wiki/Draft-Rucio-data-management-plan#other-metadata To keep RECO and PromptRECO separate, I think we could use prod_step or phys_group. There may be other metadata fields one could use too. The Tier0 could set these fields, and they could then be used to determine which Rucio subscriptions apply.
After consideration, it was decided not to use Rucio Subscriptions, because Tier0 needs to be able to easily modify the destination sites for every dataset. Instead, the PhEDEx procedure was adapted to Rucio (WMCore PR10006). The procedure has been tested and used successfully in production. I will close this issue.
So we don't forget (and this may be more operational): we need to explore converting the Tier0 "rules" about where datasets are subscribed in PhEDEx into Rucio Subscriptions, which are best thought of as "rule generators": "Place datasets matching this metadata at this group of sites".
This is mentioned here: https://docs.google.com/document/d/1Ih8GAQRnYb9umtHwc4ofV8yisNkIZecmYgk5vEms19w/edit#heading=h.pxep6ubef9bw and in a milestone for Rucio
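The "rule generator" idea can be sketched in a few lines of plain Python. This is an illustration of the concept only, not the Rucio client API. The `Subscription` class, the `generate_rules` helper, the `datatype` key, and the filter and site-expression values are all hypothetical, and `prod_step` is used only because it is one of the metadata fields suggested in the comments.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    """A rule generator: 'place datasets matching this metadata at these sites'."""
    name: str
    metadata_filter: dict  # each key maps to a list of accepted values
    site_expression: str   # hypothetical expression naming a group of sites
    copies: int = 1

    def matches(self, metadata: dict) -> bool:
        # Every filter key must be present with one of the accepted values.
        return all(
            metadata.get(key) in accepted
            for key, accepted in self.metadata_filter.items()
        )


def generate_rules(dataset: str, metadata: dict, subscriptions: list) -> list:
    """Return one placement rule per subscription that matches the dataset."""
    return [
        {
            "dataset": dataset,
            "site_expression": sub.site_expression,
            "copies": sub.copies,
            "subscription": sub.name,
        }
        for sub in subscriptions
        if sub.matches(metadata)
    ]


# Illustrative values only; none of these come from a real configuration.
subscriptions = [
    Subscription("prompt-aod", {"prod_step": ["PromptReco"], "datatype": ["AOD"]}, "tier=1"),
    Subscription("all-reco", {"datatype": ["RECO"]}, "tier=1", copies=2),
]
rules = generate_rules(
    "/A/B/AOD", {"prod_step": "PromptReco", "datatype": "AOD"}, subscriptions
)
```

In this sketch the placement policy lives in the subscription list rather than in the injecting code, which is the trade-off the issue is asking about: the Tier0 would only need to set the metadata correctly.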