Tier0 and Rucio Subscriptions #4489

Closed
ericvaandering opened this issue Mar 26, 2019 · 3 comments


@ericvaandering
Member

So we don't forget (this may be more of an operational matter): we need to explore converting the Tier0 "rules" for where datasets are subscribed in PhEDEx into Rucio Subscriptions. These are best thought of as "rule generators": "Place datasets matching this metadata at this group of sites".

This is mentioned here: https://docs.google.com/document/d/1Ih8GAQRnYb9umtHwc4ofV8yisNkIZecmYgk5vEms19w/edit#heading=h.pxep6ubef9bw and in a milestone for Rucio
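
To make the "rule generator" idea concrete, here is a minimal sketch in plain Python. It does not use the Rucio client or its API; the `Subscription` class, the `generate_rules` helper, the metadata keys and values, and the site names are all hypothetical and only illustrate the matching logic.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Subscription:
    """Toy stand-in for a subscription: a metadata filter plus a group of sites."""
    name: str
    metadata_filter: Dict[str, str]  # every key/value pair must match
    sites: Tuple[str, ...]           # hypothetical site names

    def matches(self, metadata: Dict[str, str]) -> bool:
        return all(metadata.get(k) == v for k, v in self.metadata_filter.items())


def generate_rules(dataset: str, metadata: Dict[str, str],
                   subscriptions: List[Subscription]) -> List[Tuple[str, str]]:
    """Return one (dataset, site) rule per site of every matching subscription."""
    return [(dataset, site)
            for sub in subscriptions if sub.matches(metadata)
            for site in sub.sites]


# Hypothetical subscriptions; the metadata keys and values are assumptions.
subscriptions = [
    Subscription("prompt-aod", {"datatype": "AOD", "prod_step": "PromptReco"},
                 ("T1_EXAMPLE_A", "T2_EXAMPLE_B")),
    Subscription("raw-custodial", {"datatype": "RAW"}, ("T1_EXAMPLE_A",)),
]

dataset = "/ExamplePD/Run0000A-PromptReco-v1/AOD"
rules = generate_rules(dataset, {"datatype": "AOD", "prod_step": "PromptReco"},
                       subscriptions)
print(rules)
```

In this sketch, the dataset only picks up rules from the first subscription, because its metadata does not satisfy the RAW filter.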

@hufnagel
Member

hufnagel commented Mar 26, 2019

Two options here.

We could leave things as they are (with the custodial site setting in the Tier0 config and the placement rules for each output type hard-coded in the Tier0 code) and just replace the current per-dataset PhEDEx subscriptions with per-block Rucio rules. This has the advantage that setting the rules, tracking workflow completion, and removing the rules (i.e. the complete Tier0 data lifetime management) are handled in one place.
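
As a rough illustration of what "handled in one place" could look like, the sketch below keeps per-block rules and workflow bookkeeping in a single in-memory object. It is not Tier0 or WMCore code: `BlockRuleTracker`, its methods, the block names, and the assumption that rules are released when a workflow completes are all hypothetical.

```python
from typing import Dict, List, Set


class BlockRuleTracker:
    """Toy model: one rule set per block, released when its workflow completes."""

    def __init__(self) -> None:
        self.rules: Dict[str, Set[str]] = {}              # block -> sites with a rule
        self.workflow_blocks: Dict[str, List[str]] = {}   # workflow -> its blocks

    def add_block(self, workflow: str, block: str, sites: List[str]) -> None:
        """Register a new block for a workflow and set its placement rules."""
        self.workflow_blocks.setdefault(workflow, []).append(block)
        self.rules[block] = set(sites)

    def complete_workflow(self, workflow: str) -> List[str]:
        """Drop the rules of all blocks of a finished workflow; return those blocks."""
        released = self.workflow_blocks.pop(workflow, [])
        for block in released:
            self.rules.pop(block, None)
        return released


tracker = BlockRuleTracker()
block1 = "/ExamplePD/Run0000A-v1/RAW#block1"  # hypothetical block names
block2 = "/ExamplePD/Run0000A-v1/RAW#block2"
tracker.add_block("wf1", block1, ["T1_EXAMPLE_A"])
tracker.add_block("wf1", block2, ["T1_EXAMPLE_A"])

released = tracker.complete_workflow("wf1")
print(released, tracker.rules)
```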

The second option is to keep the custodial assignment of primary datasets to T1s and the placement rules for the different Tier0 outputs in Rucio itself. That's cleaner in terms of managing these rules, but it raises some technical concerns. Rucio would need to detect, automatically and reliably, what type of data it is dealing with. While that's easy for primary datasets, what about the different PromptReco outputs (RECO, AOD, AlcaReco Skims, Physics Skims)? And what about keeping PromptReco and ReReco separate? Are parsing rules for the /A/B/C CMS dataset name enough, or would the Tier0 injection need to 'tag' the data with a type?
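
For reference, name-based detection could look something like the sketch below. It is only a heuristic under stated assumptions: the three parts of the /A/B/C name are taken to be primary dataset, processed dataset, and data tier, and PromptReco output is assumed to carry the string "PromptReco" in the middle part. The function names and example dataset names are made up.

```python
from typing import NamedTuple


class DatasetName(NamedTuple):
    primary: str
    processed: str
    tier: str


def parse_dataset_name(name: str) -> DatasetName:
    """Split an /A/B/C dataset name into its three parts."""
    parts = name.split("/")
    # A valid name starts with "/" and has exactly three non-empty parts.
    if len(parts) != 4 or parts[0] != "" or not all(parts[1:]):
        raise ValueError(f"not an /A/B/C dataset name: {name!r}")
    return DatasetName(*parts[1:])


def looks_like_prompt_reco(name: str) -> bool:
    """Heuristic (assumption): 'PromptReco' appears in the processed part."""
    return "PromptReco" in parse_dataset_name(name).processed


prompt = "/ExamplePD/Run0000A-PromptReco-v1/AOD"   # hypothetical names
rereco = "/ExamplePD/Run0000A-01Jan2000-v1/AOD"
print(parse_dataset_name(prompt))
print(looks_like_prompt_reco(prompt), looks_like_prompt_reco(rereco))
```

A heuristic like this depends entirely on naming conventions being followed, which is the reason to consider explicit tagging at injection instead.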

@ericvaandering
Member Author

Take a look at the metadata listed here: https://github.com/dmwm/CMSRucio/wiki/Draft-Rucio-data-management-plan#other-metadata

To keep RECO and PromptRECO separate, I think we could use prod_step or phys_group. There may be other metadata fields one could use too.

So the Tier0 could set these fields, which could then be used to determine which Rucio subscriptions apply.

@germanfgv
Contributor

After consideration, it was decided not to use Rucio Subscriptions, because Tier0 needs to be able to easily modify the destination sites for every dataset. Instead, the PhEDEx procedure was adapted to Rucio (WMCore PR10006). The procedure has been tested and used successfully in production. I will close this issue.
