Versioning T0 replay runs when the same run is used more than once #4475

vytjan · 2019-01-30T15:01:11Z

Because of how WMStats works, once we inject run 123456 in one replay and all its workflows reach the "normal-archived" state, no workflows of run 123456 are displayed on testbed WMStats when the same run is used in any later replay.
Since this year we will run multiple replays injecting the same run(s), it would be useful to have information about runs/workflows of every replay on WMStats.
@hufnagel what do you think, would it be reasonable to use versioning for replayed runs or workflows similar to what we currently use in the processing version for datasets?
It could be added as some additional parameter in the replay configuration I guess. Or could there be some different approach to this issue?
As I discussed with Alan, we would rather not delete any couchdb documents containing the workflows of run 123456 from older replays in such a case.

hufnagel · 2019-01-30T19:07:10Z

WMStats can't distinguish between different versions of the same workflow as long as the workflow is named exactly the same. Correct @amaltaro ?

But we have the freedom to name workflows whatever we want in the Tier0. You could include a version number or a string of sorts in the workflow name. The agent vobox hostname doesn't really work by itself, but how about vobox hostname plus processing version for the replay? We increment the latter anyway when we do new replays on the same agent/vobox. It should be easy to change the python in the config to force this to be consistent.

If you plan on running the same replay with the same processing version multiple times, you would have to invent some unique identifier/counter. But that still doesn't help you keep track of things, since you still need to be able to tell which replay used which identifier. So you might as well use the processing version, which has the useful side effect of also separating the output data of the replay.

One thing though, you likely want this to be optional since I don't think we want this for the production Tier0, correct?

amaltaro · 2019-01-30T20:11:02Z

I initially thought about a random number, just to make the workflow unique so that the wmstats functionality works properly.
However, as Dirk points out, we might want to have a logic where it's easy to identify the workflow names created. How about we make it:
normal workflow name + short hostname (not fqdn) + timestamp with date and time (same as we have for production)
? This should be enough.

Then as we discussed as well, it should be a configurable parameter for the Tier0Feeder which you only enable when you're running a replay or so.

vytjan · 2019-01-31T12:30:49Z

Alan, I would go with Dirk's suggestion to add the agent vobox hostname (or even the processing era, which is e.g. Tier0_REPLAY_vocms015) + processing version of the replay to the workflow names.
As I checked the Tier0Config.py and RunConfigAPI.py, an easy way could be adding an optional 'isReplay' parameter to ReplayOfflineConfiguration.py.
If it is defined, then the workflow name would be changed from
Express_Run326607_StreamHIExpressAlignment
to
Express_Run326607_StreamHIExpressAlignment_Tier0_REPLAY_vocms015_v212.

Dirk, Alan, would such naming make sense? We are identifying every replay with the processing version anyway. Or we could add just the processing version.
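
For illustration, a minimal sketch of the renaming described above. The function name `unique_workflow_name` and its arguments are hypothetical and are not taken from Tier0Config.py or RunConfigAPI.py; the flag only decides whether the suffix is appended.

```python
def unique_workflow_name(base_name, era, proc_version, enabled=True):
    """Append '<era>_v<proc_version>' to base_name when enabled.

    All names here are illustrative assumptions, not the T0 code.
    """
    if not enabled:
        # Flag not set: leave the workflow name untouched.
        return base_name
    return "%s_%s_v%d" % (base_name, era, proc_version)


print(unique_workflow_name("Express_Run326607_StreamHIExpressAlignment",
                           "Tier0_REPLAY_vocms015", 212))
```

With the flag left off, the same call returns the original name, which is the behaviour one would presumably want outside of replays.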

hufnagel · 2019-01-31T14:46:00Z

If we create an isReplay parameter, would we want to use it to configure other settings too? There are a few parameters we just toggle between two settings, one for production and one for replays...

Otherwise I would just leave this at useUniqueWorkflowName or something like that, which tells you exactly what the parameter actually does.

vytjan · 2019-01-31T16:31:46Z

Thanks for your feedback Dirk. Indeed, since this parameter will only be used for naming workflows, then we would like to have it as specific as possible. Then I agree something like useUniqueWorkflowName will work.
I am going to create a patch for this and test it.

vytjan · 2019-02-07T19:16:02Z

So I added a new Global T0 configuration parameter UseUniqueWorkflowName.
The workflow name is formed as follows:
Express_Run322057_StreamExpressAlignment_vocms047_v270

I implemented it following other global parameters setup and already ran a simple test replay on vocms047:
vytjan/T0@master...vytjan:versioning-workflows
However, I am not sure whether UseUniqueWorkflowName should be used to set the hostname of the VM or whether it should simply contain a boolean value in the replay configuration.

amaltaro · 2019-02-07T21:37:23Z

Just an idea, how about setting that config somewhere here:
https://github.com/vytjan/T0/blob/ecf29162c776a4eecb5e87c94a0e2dcea70455ee/src/python/T0/RunConfig/Tier0Config.py#L247

and then evaluate it at Tier0Feeder before building the workflow name?
The hostname is defined in the WMAgent configuration file (or socket.gethostname() for instance).

In addition to that, it might be worth adding the date, just in case someone runs two replays with the same settings, something like
currentTime = time.strftime('%y%m%d_%H%M%S', time.localtime(time.time()))
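
A small sketch of how the short hostname and the timestamp suffix could be produced with the standard library. The helper names and the example FQDN are made up for illustration; only `socket.gethostname()` and the `time.strftime` format string come from the comment above.

```python
import socket
import time


def short_hostname(fqdn=None):
    """Return the part of the hostname before the first dot."""
    name = fqdn if fqdn is not None else socket.gethostname()
    return name.split(".")[0]


def timestamp_suffix(epoch_seconds=None):
    """Format a local-time stamp as yymmdd_HHMMSS."""
    when = time.time() if epoch_seconds is None else epoch_seconds
    return time.strftime("%y%m%d_%H%M%S", time.localtime(when))


# 'vocms047.example.org' is a placeholder FQDN, not a real host name.
print(short_hostname("vocms047.example.org"))
print(timestamp_suffix())
```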

vytjan · 2019-02-11T17:32:07Z

I changed UseUniqueWorkflowName to a boolean parameter, EnableUniqueWorkflowName.
I also added the acquisition era string and the timestamp to the workflow name. Thus, the workflow name in replays is now formed as follows (seconds are excluded from the timestamp on purpose):
PromptReco_Run322057_JetHT_Tier0_REPLAY_vocms047_v271_190211_1814.

vytjan/T0@master...vytjan:versioning-workflows#diff-798a0fd2fcb838e49ef7b0b833411e1aR558

Now it makes more sense and the workflow names are unique/separable enough. At the same time, the longest WF name is ~85 characters, so there is still some room before we reach 100:
Express_Run322057_StreamALCALUMIPIXELSEXPRESS_Tier0_REPLAY_vocms0313_v271_190211_1802
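
As a rough sketch, the final naming scheme and the length check can be reproduced as below. The function and constant names are hypothetical and this is not the code from the patch; the limit of 100 is only the figure quoted above.

```python
from datetime import datetime

MAX_NAME_LENGTH = 100  # assumption: the 100-character mark quoted above


def replay_workflow_name(base_name, era, proc_version, when):
    """Build '<base>_<era>_v<version>_<yymmdd_HHMM>' (seconds left out)."""
    stamp = when.strftime("%y%m%d_%H%M")
    return "%s_%s_v%d_%s" % (base_name, era, proc_version, stamp)


def fits_limit(name):
    """True if the name does not exceed the assumed limit."""
    return len(name) <= MAX_NAME_LENGTH


name = replay_workflow_name("PromptReco_Run322057_JetHT",
                            "Tier0_REPLAY_vocms047", 271,
                            datetime(2019, 2, 11, 18, 14))
print(name, len(name), fits_limit(name))
```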

vytjan · 2019-02-22T12:26:36Z

Merged #4476

vytjan closed this as completed Feb 22, 2019
