Versioning T0 replay runs when the same run is used more than once #4475

Closed
vytjan opened this issue Jan 30, 2019 · 9 comments

@vytjan
Contributor

vytjan commented Jan 30, 2019

Because of how WMStats works, once we inject run 123456 in one replay and all of its workflows reach the "normal-archived" state, no workflows of run 123456 are displayed on the testbed WMStats when the same run is used in any later replay.
Since this year we will run multiple replays injecting the same run(s), it would be useful to see the runs/workflows of every replay on WMStats.
@hufnagel what do you think, would it be reasonable to version replayed runs or workflows in a way similar to how we currently use the processing version for datasets?
I guess it could be added as an additional parameter in the replay configuration. Or is there a different approach to this issue?
As I discussed with Alan, we would rather not delete the couchdb documents containing the workflows of run 123456 from older replays in such a case.

@hufnagel
Member

WMStats can't distinguish between different versions of the same workflow as long as the workflow is named exactly the same. Correct @amaltaro ?

But we have the freedom to name workflows whatever we want in the Tier0. You could include a version number or a string of sorts in the workflow name. The agent vobox hostname doesn't really work by itself, but how about the vobox hostname plus the processing version for the replay? We increment the latter anyway when we do new replays on the same agent/vobox. It should be easy to change the python in the config to force this to be consistent.

If you plan on running the same replay with the same processing version multiple times, you would have to invent some unique identifier/counter. But that still doesn't help you keep track of things, since you still need to be able to tell which replay used which identifier. So you might as well use the processing version, which has the useful side effect of also separating the output data of the replay.

One thing though, you likely want this to be optional since I don't think we want this for the production Tier0, correct?

@amaltaro
Contributor

I initially thought about a random number, just to make it a unique workflow and able to properly use wmstats functionality.
However, as Dirk points out, we might want to have a logic where it's easy to identify the workflow names created. How about we make it:
normal workflow name + short hostname (not fqdn) + timestamp with date and time (same as we have for production)
? This should be enough.

Then as we discussed as well, it should be a configurable parameter for the Tier0Feeder which you only enable when you're running a replay or so.

@vytjan
Contributor Author

vytjan commented Jan 31, 2019

Alan, I would go with Dirk's suggestion to add the agent vobox hostname (or even the processing era, which is e.g. Tier0_REPLAY_vocms015) + processing version of the replay to the workflow names.
As I checked the Tier0Config.py and RunConfigAPI.py, an easy way could be adding an optional 'isReplay' parameter to ReplayOfflineConfiguration.py.
If it is defined, then the workflow name would be changed from
Express_Run326607_StreamHIExpressAlignment
to
Express_Run326607_StreamHIExpressAlignment_Tier0_REPLAY_vocms015_v212.

Dirk, Alan, would such naming make sense? We are identifying every replay with the processing version anyway. Or we could add just the processing version.
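The renaming proposed above can be sketched roughly as follows. This is only an illustration, not the Tier0 code: the function name `replay_workflow_name` and its `era`, `proc_version` and `unique` arguments are hypothetical, and the real logic would live wherever the Tier0Feeder builds the workflow name.

```python
def replay_workflow_name(base_name, era, proc_version, unique=False):
    """Return base_name, optionally suffixed with the era string and processing version.

    Hypothetical helper for illustration; none of these names exist in T0.
    """
    if not unique:
        # Production behaviour: the workflow name is left untouched.
        return base_name
    # Replay behaviour: append e.g. "_Tier0_REPLAY_vocms015_v212".
    return "%s_%s_v%d" % (base_name, era, proc_version)


print(replay_workflow_name("Express_Run326607_StreamHIExpressAlignment",
                           "Tier0_REPLAY_vocms015", 212, unique=True))
# -> Express_Run326607_StreamHIExpressAlignment_Tier0_REPLAY_vocms015_v212
```

With `unique=False` the name comes back unchanged, which is what we would want for the production Tier0.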

@hufnagel
Member

hufnagel commented Jan 31, 2019

If we create an isReplay parameter, would we want to use it to configure other settings too? There are a few parameters we just toggle between two settings, one for production and one for replays...

Otherwise I would just leave this at useUniqueWorkflowName or something like that, which tells you exactly what the parameter actually does.

@vytjan
Contributor Author

vytjan commented Jan 31, 2019

Thanks for your feedback Dirk. Indeed, since this parameter will only be used for naming workflows, then we would like to have it as specific as possible. Then I agree something like useUniqueWorkflowName will work.
I am going to create a patch for this and test it.

@vytjan
Contributor Author

vytjan commented Feb 7, 2019

So I added a new global T0 configuration parameter, UseUniqueWorkflowName.
With it enabled, the workflow name is formed as follows:
Express_Run322057_StreamExpressAlignment_vocms047_v270

I implemented it following other global parameters setup and already ran a simple test replay on vocms047:
vytjan/T0@master...vytjan:versioning-workflows
I am not sure, though, whether UseUniqueWorkflowName should be used to set the hostname of the VM or should simply hold a boolean value in the replay configuration.

@amaltaro
Contributor

amaltaro commented Feb 7, 2019

Just an idea, how about setting that config somewhere here:
https://github.com/vytjan/T0/blob/ecf29162c776a4eecb5e87c94a0e2dcea70455ee/src/python/T0/RunConfig/Tier0Config.py#L247

and then evaluate it at Tier0Feeder before building the workflow name?
The hostname is defined in the WMAgent configuration file (or socket.gethostname() for instance).

In addition to that, it might be worth adding the date and time, just in case someone runs two replays with the same settings, something like
currentTime = time.strftime('%y%m%d_%H%M%S', time.localtime(time.time()))
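A slightly fuller sketch of the two pieces mentioned here, the short hostname and the timestamp, could look like this. The helper names `short_hostname` and `replay_timestamp` are made up for illustration, and the optional arguments exist only so the output can be checked without depending on the machine or the clock.

```python
import socket
import time


def short_hostname(fqdn=None):
    """Return the part of the hostname before the first dot, e.g. 'vocms047'."""
    # Falls back to the local machine's name when no hostname is passed in.
    fqdn = fqdn or socket.gethostname()
    return fqdn.split(".")[0]


def replay_timestamp(now=None, with_seconds=True):
    """Format a local-time stamp as yymmdd_HHMMSS (or yymmdd_HHMM without seconds)."""
    fmt = "%y%m%d_%H%M%S" if with_seconds else "%y%m%d_%H%M"
    # time.localtime(None) uses the current time.
    return time.strftime(fmt, time.localtime(now))


print(short_hostname("vocms047.cern.ch"))  # -> vocms047
print(replay_timestamp())                  # current local time, e.g. 190211_181430
```

Dropping the seconds (`with_seconds=False`) keeps the name shorter, at the cost of a collision if two replays with identical settings start within the same minute.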

@vytjan
Contributor Author

vytjan commented Feb 11, 2019

I changed UseUniqueWorkflowName to a boolean parameter, EnableUniqueWorkflowName.
I also added the acquisition era string and the timestamp to the workflow name. The workflow name in replays is now formed like this (the seconds are left out of the timestamp on purpose):
PromptReco_Run322057_JetHT_Tier0_REPLAY_vocms047_v271_190211_1814.

vytjan/T0@master...vytjan:versioning-workflows#diff-798a0fd2fcb838e49ef7b0b833411e1aR558

Now it makes more sense and the workflow names are unique/separable enough. At the same time, the longest WF name is ~85 characters, so there is still some room before we reach 100:
Express_Run322057_StreamALCALUMIPIXELSEXPRESS_Tier0_REPLAY_vocms0313_v271_190211_1802
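To keep an eye on the length, a check along these lines could be applied when the name is put together. This is a sketch only: `unique_replay_name` and `MAX_NAME_LENGTH` are hypothetical names, and the value 100 is just the figure mentioned above, not a limit taken from the code.

```python
MAX_NAME_LENGTH = 100  # assumption: the "100" mentioned above, not read from any config


def unique_replay_name(base_name, era, proc_version, timestamp):
    """Join the name parts with underscores and refuse names over the assumed limit."""
    name = "_".join([base_name, era, "v%d" % proc_version, timestamp])
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError("workflow name too long (%d characters): %s" % (len(name), name))
    return name


name = unique_replay_name("Express_Run322057_StreamALCALUMIPIXELSEXPRESS",
                          "Tier0_REPLAY_vocms0313", 271, "190211_1802")
print(name, len(name))
# -> Express_Run322057_StreamALCALUMIPIXELSEXPRESS_Tier0_REPLAY_vocms0313_v271_190211_1802 85
```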

@vytjan
Contributor Author

vytjan commented Feb 22, 2019

Merged #4476

@vytjan vytjan closed this as completed Feb 22, 2019