This repository was archived by the owner on Sep 12, 2024. It is now read-only.

Remove all input data placement; force transferor to set wflows to staged #519

Merged
sharad1126 merged 2 commits into CMSCompOps:master from amaltaro:fix-wmcore-9527
Apr 8, 2020

@amaltaro amaltaro commented Apr 1, 2020

Fixes dmwm/WMCore#9527

Status

tested

Description

This is the first step towards migrating the data management logic out of the Unified umbrella (relying instead on the WMCore MicroServices for that).

This patch was supposed to simply disable the input data placement made in the transferor module, but nothing is ever that simple, and I noticed that the assignor module has A LOT of data location/presence logic as well.

A summary for the transferor changes is:

  • forced the transferor module to set workflows to "staged" (in the Unified database/Oracle)
  • forced the transferor module to run with execute=False, so that no PhEDEx subscription requests are made
  • still let the module run in dry-run mode, so that we can see what the PhEDEx requests would have been
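
The transferor behaviour summarized above can be sketched roughly as follows. This is only an illustration under stated assumptions: `process_workflow`, `set_status`, `submit` and the shape of `planned_requests` are hypothetical names, not the actual Unified code. Only the `execute=False` flag and the "staged" status come from the description.

```python
import logging

logger = logging.getLogger("transferor_sketch")


def process_workflow(workflow, planned_requests, set_status, submit, execute=False):
    """Log or submit the planned transfer requests, then mark the workflow as staged.

    All names in this function are hypothetical; this is not the Unified API.
    """
    submitted = []
    for request in planned_requests:
        if execute:
            submit(request)
            submitted.append(request)
        else:
            # Dry run: record what would have been requested, submit nothing.
            logger.info("dry-run: would subscribe %s for %s", request, workflow)
    # The workflow moves to "staged" regardless of any data placement.
    set_status(workflow, "staged")
    return submitted
```

With `execute=False` the function returns an empty list and the only side effects are the log lines and the status update, which is what keeps the dry-run logs available for later comparison.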

A summary of the assignor changes is:

  • removed all the data management logic, like:
    • no longer check anything about secondary dataset (besides campaign related info)
    • no longer check anything related to the primary dataset/blocks
    • no longer consider the number of copies of the data
  • removed partial and good_enough options
  • primary_AAA option relies only on the campaign (or module command line option) configuration
  • secondary_AAA option relies only on the campaign (or module command line option) configuration
  • removed the opportunistic sites logic (which depends on primary/secondary locations)
  • finally, the siteWhitelist depends on the utils function getSiteWhiteList(), which takes into consideration a number of things: I/O, goodIO sites, goodAAA sites, blow_up limits, campaign SiteWhitelist/SiteBlacklist info, memory/multicore sites, and architectures. That should be it!
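
To illustrate the idea that the whitelist is now derived from campaign and site-capability constraints only, with no data location or presence input, here is a minimal sketch. It does not reproduce getSiteWhiteList(): the function name `build_site_whitelist`, its parameters, and the order in which the constraints are applied are all assumptions made for the example.

```python
def build_site_whitelist(candidate_sites, campaign_whitelist=None,
                         campaign_blacklist=None, capable_sites=None):
    """Narrow candidate sites using campaign and capability constraints only.

    Hypothetical helper: no dataset location or presence is consulted.
    """
    sites = set(candidate_sites)
    if campaign_whitelist:
        # Keep only sites the campaign explicitly allows.
        sites &= set(campaign_whitelist)
    if capable_sites is not None:
        # Keep only sites meeting memory/multicore/architecture needs.
        sites &= set(capable_sites)
    if campaign_blacklist:
        # Drop sites the campaign explicitly excludes.
        sites -= set(campaign_blacklist)
    return sorted(sites)
```

For example, with candidates `SITE_A` to `SITE_D`, a campaign whitelist of `SITE_A`, `SITE_B`, `SITE_C`, capable sites `SITE_B`, `SITE_C`, `SITE_D` and a blacklist containing `SITE_C`, only `SITE_B` remains.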

Is it backward compatible (if not, which system it affects?)

no

Related PRs

none

External dependencies / deployment changes

Yes, we need to wait for the CMSWEB production upgrade. Meant to happen on April 7.

Mention people to look at PRs

@sharad1126 and others

@sharad1126
Contributor

@amaltaro can you please make a PR to the testing branch as well?

Comment thread Unified/assignor.py
        parameters['EventsPerJob'] = eventsPerJob
    else:
        spl = wfh.getSplittings()[0]
        eventsPerJobEstimated = spl['events_per_job'] if 'events_per_job' in spl else None
Contributor Author

revert

@sharad1126
Contributor

We might need to check the batchor and checkor too.
The state transitions in Unified are generally listed here: https://github.com/CMSCompOps/WmAgentScripts/wiki/Modules-need-to-update-reqmgr-status

@vlimant
Contributor

vlimant commented Apr 1, 2020

My 2 cents, @amaltaro: better not to touch transferor at all. That way, in case of "disaster" with the move to the MS, ops will still have a handle for initiating what needs to happen to get things done.

same for assignor (hard to review how you handle the massaging of the site-whitelist based).

You could have a specific option in injector to set all workflows directly to "staged" (unified status)

@vlimant
Contributor

vlimant commented Apr 1, 2020

batchor: no need, it only detects relvals and sets the campaigns up
checkor: it's all post-production, so no need

@amaltaro
Contributor Author

amaltaro commented Apr 1, 2020

@vlimant Jean-Roch,

> You could have a specific option in injector to set all workflows directly to "staged" (unified status)

Did you mean the transferor? If so, yes, I have also considered that, but then we would lose all the transferor logic in place (we might need some of those dry-run logs) in case we need to compare things.

For the assignor, it's hard to simply bypass all the data management logic in there. With this patch, the juggling with the site whitelist comes mostly from the utils/getSiteWhiteList function. So there is no massaging coming from either data location or data presence.

If we manage to keep all the required changes in a single PR, we could also revert the changes and update the Unified production node (if we are in deep trouble and can't quickly fix MS).

Thanks for spending your time looking into it though!

Sharad, from Jean-Roch's comment, it looks like these 2 modules are all that we need to worry about for input data placement.

@vlimant
Contributor

vlimant commented Apr 2, 2020

> did you mean the transferor? If so, yes I have also considered that, but then we would miss all the transferor logic in place (we might need some of those dry-run logs) in case we need to compare things.

I meant injector, which essentially sees the wf in assignment-approved and sets them to "considered" for Unified; setting them directly to "staged" for Unified will pass them straight to assignor. Keep transferor as is, to view the logic when there is an issue, and to produce the necessary transfer requests.

For assignor, the "partial" option has been rather essential for data reprocessing, so better to keep the logic in place.
The logic for tailoring the site whitelist is also quite essential (classical PU, etc.). I don't actually understand the reasoning for removing it, since it is not creating transfer requests but only checking that the workflow is assignable.

@amaltaro
Contributor Author

amaltaro commented Apr 2, 2020

> I meant injector that essentially see the wf in assignment-approved and set them considered for unified ; setting directly into "staged" for unified will directly pass it to assignor. Keep transferor as is, to view the logic when there is an issue, and produce the necessary transfer requests.

I see. transferor works on requests that are in assignment-approved in ReqMgr2 and in considered in Oracle (MongoDB?). If injector is the first door the workflows have to go through, then setting workflows to the staged Unified status there will do the trick.
It's still safer to set transferor to execute=False though: in case something else puts workflows back in considered, we at least make sure that no transfers are going to be made.
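
A minimal sketch of the two safeguards discussed here, assuming the status names mentioned in this thread (assignment-approved in ReqMgr2; considered and staged in Unified). The function names and the `bypass_transferor` flag are hypothetical and only illustrate the decision logic, not the real modules.

```python
def injector_status(reqmgr_status, bypass_transferor=True):
    """Return the Unified status for a newly seen workflow, or None to skip it.

    Hypothetical helper: with bypass_transferor=True, workflows go straight
    to "staged" and are picked up by assignor without passing transferor.
    """
    if reqmgr_status != "assignment-approved":
        return None
    return "staged" if bypass_transferor else "considered"


def transferor_may_submit(unified_status, execute=False):
    """Allow transfers only for "considered" workflows and only when execute is True."""
    return unified_status == "considered" and execute
```

Even if a workflow ends up back in considered, `transferor_may_submit` stays false as long as `execute` is left at its default, which is the second line of defence described above.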

In summary, anything interacting with the DM system has to be stopped. Not necessarily now, but in a month or two from now. So the PU, the partial, the good_enough bits, are all going away.

And given that Unified is flexible, if we face big problems we can always revert patches and update Unified, AFAIU. That is unlike the CMSWEB services, which would take at least a day to get updates into production.
