Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[override] DF/data4es: (possible) production step names for task. #371

Closed
wants to merge 10 commits into from

Conversation

mgolosova
Copy link
Collaborator

@mgolosova mgolosova commented Jun 17, 2020

Overridden with #379 (merged), #380 (merged)

(To be closed after those above are merged)

New fields are added on stage 017 (adjustMetadata):

  • ami_tags;
  • ctag_format_step;
  • ami_tags_format_step.

To be discussed:

  • would it be correct to take data formats from output_formats?
    Yes.
  • should some formats (TXT, LOG, ...?) be filtered out?
    Yes, but only LOG.

ToDo:

@mgolosova mgolosova self-assigned this Jun 17, 2020
@mgolosova mgolosova marked this pull request as draft June 17, 2020 14:51
@mgolosova
Copy link
Collaborator Author

@mgolosova mgolosova changed the title [pending] DF/data4es: (possible) prodcution step names for task. [pending] DF/data4es: (possible) production step names for task. Jun 23, 2020
@mgolosova mgolosova force-pushed the data4es-017-events-calculation-v2 branch from a10cc71 to a16ae65 Compare June 25, 2020 15:47
We have `ctag` field with the current (last) AMI tag; but fully
understand the process, to which the given task belongs, we can only by
the whole chain of AMI tags (as they basically define the steps of data
processing).

This field is not required now for any use-case, but will allow more
accurate classification of tasks by steps and, consequently, let us
provide more adequate results in some use-cases.
New fields are added on stage 017 (adjustMetadata):
* ctag_format_step;
* ami_tags_format_step.

There are different ways to say one step from another:
* MC production step name (already exists as 'step_name' field);
* current AMI tag + output data format;
* chain of AMI tags + output data format.

The latter is supposed to be the most universal, but initially was used
only the first one, then for some cases the second was invened, and...
...and there's no good way to make things as they are supposed to be all
at once. So we need all the possible namings.

---

NOTE 1.
Step names are based on the output data formats. They can be taken from
task's metadata `output_formats` field, but it wasn't used before (when
we needed tasks classification by format-based-steps), so I am not sure
can or cannot it be used here.

NOTE 2.
Since data format is taken not from `output_formats`, but from output
datasets' names, the functionality from Stage 093 (datasetFormats) is
reused by moving it into `pyDKB` library (module `atlas.misc` is
created for this kind of more-than-once used functions).
No one will be interested in statictics related to these formats. In
theory.
According to M.Borodin, field 'output_formats' and list of data formats,
derived from the list of dataset names, must be exactly the same (if
format is specified in 'output_formats', the task will produce a dataset
in the given format).
@mgolosova
Copy link
Collaborator Author

mgolosova force-pushed the data4es-017-steps branch from e92199c to 9c06106

Rebase on the new version of data4es-017-events-calculation-v2 + join of a couple of related commits (e.g. syntax fixes).

Copy link
Contributor

@Evildoor Evildoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to my vacation and ambiguous state of this PR, this is merely a group of suggestions on things that caught my eye instead of complete review followed by approval/call for changes.

Utils/Dataflow/017_adjustMetadata/adjustMetadata.py Outdated Show resolved Hide resolved
Utils/Dataflow/017_adjustMetadata/adjustMetadata.py Outdated Show resolved Hide resolved
Utils/Dataflow/pyDKB/atlas/misc.py Outdated Show resolved Hide resolved
Base automatically changed from data4es-017-events-calculation-v2 to master June 29, 2020 11:06
@mgolosova mgolosova changed the title [pending] DF/data4es: (possible) production step names for task. DF/data4es: (possible) production step names for task. Jun 29, 2020
@mgolosova mgolosova mentioned this pull request Jul 22, 2020
@mgolosova mgolosova marked this pull request as ready for review July 22, 2020 12:14
@mgolosova
Copy link
Collaborator Author

Will close and reopen the PR in attemt to restart the Travis PR build...

@mgolosova mgolosova closed this Jul 22, 2020
@mgolosova mgolosova reopened this Jul 22, 2020
@mgolosova mgolosova changed the base branch from master to 009-typo-fix July 23, 2020 07:55
@mgolosova mgolosova changed the base branch from 009-typo-fix to master July 23, 2020 07:55
@mgolosova mgolosova force-pushed the data4es-017-steps branch 2 times, most recently from f23ad15 to e163b42 Compare July 23, 2020 12:40
The merge is supposed to make the PR "mergeable" and allow Travis CI PR
build.
@mgolosova
Copy link
Collaborator Author

Well, it looks like the Travis PR check wouldn't start because this PR was "outdated"; at least on the requests page I saw this error: “GitHub payload is missing a merge commit”, a recommendation for which sounds like: "please confirm your pull request is open and mergeable". Strange, however, that there was no problem in running the checks when I added a new one or removed the last commit... but anyway, now

  • all the checks have passed.

;)

Copy link
Contributor

@Evildoor Evildoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, most of the PR looks fine, but the last non-style change (9c06106) makes things look strange. We have a PR which adds something to stage 017 and adjusts following data samples. And also moves some functionality from stage 093 into a new library intended for ATLAS-related stuff.

Sure, one can understand why this happened by studying commits in this PR and both stages are fulfilling the similar role in different branches of the dataflow, but still - do the library-related things really belong to this PR? Furthermore, is there any need to do these changes now, when the function is still only used in 093?

P.S. Even if the answer is "yes" and everything is left as-is, there is no need to import atlas in 017.

@mgolosova
Copy link
Collaborator Author

@Evildoor,

is there any need to do these changes now, when the function is still only used in 093?

Historicaly -- yes:

  • 1f9cc50 and 9c06106 have a several days gap between them;
  • 9c06106 is not a "fix" of 1f9cc50, it is a different implementation of the same functionality -- which could not be used previously for the author (me) didn't have enough information;
  • rewriting a history in this case does not save someone who reads the commit history from a bunch of mistakes/typos and theire fixes, it masks the real history of development.

However, since it is a single PR, not a series of PRs (which it would be, if the first version was reviewed and merged before the second one was implemented) -- there is a possibility (and formal reasons) to make the commit history more "refined". "Refined" means "better", so the only reason for "not" is the price of this refinement (in terms of time).

I will do the refinement; this, however, means that this PR will be closed and replaced with another one (or two, actually) -- and then the next related PR (#374) will be rebased on the new version. Stay tuned...

@mgolosova mgolosova changed the title DF/data4es: (possible) production step names for task. [override] DF/data4es: (possible) production step names for task. Jul 29, 2020
@mgolosova
Copy link
Collaborator Author

@Evildoor,
PRs #379, #380 are ready for review.

@mgolosova mgolosova marked this pull request as draft July 29, 2020 11:01
@mgolosova
Copy link
Collaborator Author

#379 and #380 are merged, so this PR's functionality is fully presented in master.
Closing.

@mgolosova mgolosova closed this Jul 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants