New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[override] DF/data4es: (possible) production step names for task. #371
Conversation
a10cc71
to
a16ae65
Compare
We have `ctag` field with the current (last) AMI tag; but fully understand the process, to which the given task belongs, we can only by the whole chain of AMI tags (as they basically define the steps of data processing). This field is not required now for any use-case, but will allow more accurate classification of tasks by steps and, consequently, let us provide more adequate results in some use-cases.
New fields are added on stage 017 (adjustMetadata): * ctag_format_step; * ami_tags_format_step. There are different ways to say one step from another: * MC production step name (already exists as 'step_name' field); * current AMI tag + output data format; * chain of AMI tags + output data format. The latter is supposed to be the most universal, but initially was used only the first one, then for some cases the second was invened, and... ...and there's no good way to make things as they are supposed to be all at once. So we need all the possible namings. --- NOTE 1. Step names are based on the output data formats. They can be taken from task's metadata `output_formats` field, but it wasn't used before (when we needed tasks classification by format-based-steps), so I am not sure can or cannot it be used here. NOTE 2. Since data format is taken not from `output_formats`, but from output datasets' names, the functionality from Stage 093 (datasetFormats) is reused by moving it into `pyDKB` library (module `atlas.misc` is created for this kind of more-than-once used functions).
No one will be interested in statictics related to these formats. In theory.
According to M.Borodin.
According to M.Borodin, field 'output_formats' and list of data formats, derived from the list of dataset names, must be exactly the same (if format is specified in 'output_formats', the task will produce a dataset in the given format).
e92199c
to
9c06106
Compare
Rebase on the new version of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to my vacation and ambiguous state of this PR, this is merely a group of suggestions on things that caught my eye instead of complete review followed by approval/call for changes.
Co-authored-by: Evildoor <evildoor256@gmail.com>
Co-authored-by: Evildoor <evildoor256@gmail.com>
Will close and reopen the PR in attemt to restart the Travis PR build... |
f23ad15
to
e163b42
Compare
The merge is supposed to make the PR "mergeable" and allow Travis CI PR build.
Well, it looks like the Travis PR check wouldn't start because this PR was "outdated"; at least on the requests page I saw this error:
;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, most of the PR looks fine, but the last non-style change (9c06106) makes things look strange. We have a PR which adds something to stage 017 and adjusts following data samples. And also moves some functionality from stage 093 into a new library intended for ATLAS-related stuff.
Sure, one can understand why this happened by studying commits in this PR and both stages are fulfilling the similar role in different branches of the dataflow, but still - do the library-related things really belong to this PR? Furthermore, is there any need to do these changes now, when the function is still only used in 093?
P.S. Even if the answer is "yes" and everything is left as-is, there is no need to import atlas
in 017.
Historicaly -- yes:
However, since it is a single PR, not a series of PRs (which it would be, if the first version was reviewed and merged before the second one was implemented) -- there is a possibility (and formal reasons) to make the commit history more "refined". "Refined" means "better", so the only reason for "not" is the price of this refinement (in terms of time). I will do the refinement; this, however, means that this PR will be closed and replaced with another one (or two, actually) -- and then the next related PR (#374) will be rebased on the new version. Stay tuned... |
Overridden with
#379(merged),#380(merged)(To be closed after those above are merged)
New fields are added on stage 017 (adjustMetadata):
ami_tags
;ctag_format_step
;ami_tags_format_step
.To be discussed:
output_formats
?Yes.
TXT
,LOG
, ...?) be filtered out?Yes, but only
LOG
.ToDo: