Exclude NANOAOD from Unified output dataplacement for standard workfl…#532
Conversation
|
the relval nanoaod is not the same as central production aod, they have to be available where people expect to run validation ; CERN for example |
|
@nsmith- Nick, give that NANO data will be soon managed by Rucio/DM team, you might want to have a special rule/subscription for RelVal NANO. |
|
Will WMAgent be able to set some DID metadata that replicates Line 6506 in d11ab40 |
d11ab40 to
dec68ab
Compare
|
@nsmith- your question is quite relevant. I do not think we discussed how to fetch this exact piece of information when we were discussing the output data placement in Micro Services. I was kind of always thinking of this info like a given and being easily accessible through ReqMgr. |
|
@sharad1126 ,What is the status of this PR? Did you manage to test it somehow? And are you planning to merge it some time this week? I believe we will need it soon. Thanks in advance. |
|
@todor-ivanov For the marking of RelVal datasets, I believe after discussing with @amaltaro the plan here is to set the PhEDEx group name as metadata on the container DID, and all RelVal have that group. |
|
@todor-ivanov I am not sure how to test this. you need to provide me some specific workflows for this case where we can try running the script on. |
|
Thanks Nick @nsmith-, This was yet another aspect of the point you were making and I did not fully get it before. Yes, marking the the dataset itself as a Relval makes quite a sense, and IIUC as it cannot be done through the dataset metadata itself, you suggest that we do it with including the needed markers in the transfer submission metadata, right? I think in the current implementation, what unified is doing is more or less similar - it marks those datasets with setting the PhEDEx group to And just to clarify, when you are mentioning container DID, you refer to a Rucio Container, right? |
Actually I think it needs to be in the DID creation metadata (on injection by WMAgent), as opposed to the rule metadata (what I take to be "transfer submission"--really there is no transfer submission concept in rucio, just rule creation, which may or may not require an actual transfer) |
|
Thanks @nsmith- I think I got the picture now. Most of this discussion provides useful information for the Output Data Placement in Microservices issue though. But it was really good that you brought that up here. Thanks again. |
|
@sharad1126 I do not think we can test it explicitly with some dedicated workflows. At least not until the testing environment for Unified is completely setup so one can do such a test with sending such workflows to testbed. The best we can do here is just you do the normal test in the so far semi testing machine (vocms0** - I do not remember which one was that) as you normally do with any other change in the code, and just check if there are no any typos or something that totally breaks the 'closor' module. What I remember you telling me about that proccess is, that you were cherrypicking commits in that machine order to do the test. Or did I understand you wrong? |
@nsmith- Nick, I did try it - I guess for block creation - and the did creation failed. Can you please check whether it's supported or not at block/container creation and let us know? |
|
@amaltaro it seems, unlike for rule metadata, that for DID metadata the keys are enforced to be from a predefined set. One has to first add the metadata key via commands like https://rucio.readthedocs.io/en/latest/api/meta-data.html |
|
Yes, now I recall you had already mentioned it in the past week. |
|
We'll carry on this metadata discussion at dmwm/WMCore#9655 |
|
as it looks like from the discussion above, this is still an on going PR. @amaltaro @todor-ivanov should I convert this to draft? |
|
@sharad1126 As Far as it concerns the Unified code, It is the whole thing. I do not think you should turn it into draft. Again, the data tiers to fall under the change here (and to be excluded from data placement) are configurable through Unified Config. You can always change them. And the question is more like - should we switch off data placement for About the discussion triggered above - it is already moved in Wmcore, as It concerns how we will distinguish the data in the Future implementation. |
|
@todor-ivanov does that mean we can test and then just merge this PR? |
|
@sharad1126 Yes. Even if we decide to wait with disabling the |
|
@todor-ivanov from my side it looks like it is ok to merge. I don't have any specific things to test it on but for the python related checks and all, it is all ok to merge. |
amaltaro
left a comment
There was a problem hiding this comment.
@todor-ivanov I have made a couple of comments in the code that will make this change cleaner.
@sharad1126 no, it cannot be merged and put in production because the default configuration is already skipping NANO data.
I still have a question though. If we do not place those NANOAD datasets here, won't it be a blocker to get workflows announced?
|
@amaltaro thanks for confirming. I think that sending the data to ddm and announcing workflows are still two different things and if needed, workflows can be announced without sending data to ddm. the whole point being, one needs to take care that the workflow status is changed accordingly. |
That's exactly my point. I think unified checks on the output data placement status, and I'm afraid that not placing NANOAOD might cause workflows to get stuck in |
|
actual location of the output is only checked for DQMIO location and possibility of running harvesting. |
de5ef60 to
1573cb1
Compare
|
@amaltaro @sharad1126 @vlimant Here is the code with the early interruption of the cycle. Which I agree, looks much prettier .... but in this case I think the beauty will just introduce some issues. And the way I did it the previous time was exactly to avoid the behavior Alan and Sharad were discussing above - to avoid Having 'NANOAOD' workflows that due to this check and interruption of the logic start accumulating in |
|
Upon a double check I noticed which was the line I was overlooking and now I convinced myself it will work both ways. So the current version of the PR is definitely achieving the same goal as the previous one. @sharad1126 could you please test and merge it? |
|
@todor-ivanov @amaltaro I see no issues in the code. There is nothing I can test this code on and won't affect anything in unified's side. It won't stop doing anything which it was doing previously as you just introduced 2 new lists which checks and sends logs only. The other lists which has NANOAOD will still work as it is as the NANOAOD in those lists still exist. |
|
Thanks @todor-ivanov , it looks good to me. Todor, can you please update the description of this PR with what exactly is being proposed here? That will clarify things to Sharad as well. @nsmith- Nick, do we need any handshake before it can get merged? |
|
@amaltaro that would be great indeed. thanks. |
|
Done |
|
As I understand it, merging this changes production behavior. I think then we need to wait for all WMAgents to be running in Rucio-Nano mode before removing this? |
…ows. Exclude NANOAOD from Unified output dataplacement for RelVal workflows. Exclude NANOAOD from Unified - add tier_to_skip in UnifiedConfig. Early DataTier check and cycle interruption && put back NANOOD in the default list tiers_to_DDM Revert privious bad membership check Renaming tier configuration lists Switching off Rucio for nonrelval tiers
1573cb1 to
ba6634c
Compare
|
@sharad1126 this one must be merged before [1]. Thanks! [1] |
|
@nsmith- Nick, did you manage to create the NANO rucio subscriptions? I guess that's the only thing that is holding us on merging this PR, isn't it? |
|
Ok so its not clear to me yet: if this is merged then unified will stop sending NANO replication requests to Dynamo? |
|
@sharad1126 let me try to clarify this PR. It provides to configuration parameters:
In case they are set to an empty list, we deal with data subscription as it's done nowadays. Otherwise, datatiers listed won't get a DDM/PhEDEx subscription. |
|
@amaltaro I understand what you mean. My point was to introduce the second one later but ok, I don't mind this as well. From my side, I see no issues in this PR and we are ready to merge this once you are ready to handle the relval NANOAOD output. |
Fixes #530
Status
Description
As part of the Rucio migration there have been a plan put into action to initially adapt the NANOAOD data tier. Which means that Unified should not subscribe data transfers when it comes to those datasets.
After our previous investigation of the code we concluded that Unified is doing output data placement through DDM for standard Workflows and through Phedex for RelVal. It seems we can benefit from the already existing parameter 'tier_to_DDM' here. And we can hard code the exclusion of NANOD for Relval in a similar way as it is done for the rest of the data tiers here: [1]
[1] https://github.com/CMSCompOps/WmAgentScripts/blob/master/Unified/closor.py#L502-L518
Is it backward compatible (if not, which system it affects?)
Related PRs
NO
External dependencies / deployment changes
We may want to make it configurable for RelVal workflows too.
Mention people to look at PRs
@sharad1126 @amaltaro @vlimant