[override] DF/data4es: (possible) production step names for task. #371

mgolosova · 2020-06-17T14:51:47Z

Overridden with #379 (merged), #380 (merged)

(To be closed after those above are merged)

New fields are added on stage 017 (adjustMetadata):

ami_tags;
ctag_format_step;
ami_tags_format_step.

To be discussed:

Would it be correct to take data formats from `output_formats`?
Yes.
Should some formats (TXT, LOG, ...?) be filtered out?
Yes, but only LOG.

ToDo:

rebase after "Merge REST API server branch to master" (#176) and remove the cherry-picked commit (bf88a2f);

mgolosova · 2020-06-19T11:02:54Z

Extension of data4es for data on event processing progress

We have the `ctag` field with the current (last) AMI tag, but the process to which a given task belongs can be fully understood only from the whole chain of AMI tags (as they basically define the steps of data processing). This field is not required for any use case right now, but it will allow more accurate classification of tasks by steps and, consequently, let us provide more adequate results in some use cases.

New fields are added on stage 017 (adjustMetadata):

* ctag_format_step;
* ami_tags_format_step.

There are different ways to tell one step from another:

* MC production step name (already exists as the 'step_name' field);
* current AMI tag + output data format;
* chain of AMI tags + output data format.

The latter is supposed to be the most universal, but initially only the first one was used, then the second was invented for some cases, and... ...and there's no good way to make things as they are supposed to be all at once. So we need all the possible namings.

---

NOTE 1. Step names are based on the output data formats. They can be taken from the `output_formats` field of the task's metadata, but that field wasn't used before (when we needed to classify tasks by format-based steps), so I am not sure whether it can be used here.

NOTE 2. Since the data format is taken not from `output_formats` but from the names of the output datasets, the functionality from Stage 093 (datasetFormats) is reused by moving it into the `pyDKB` library (the module `atlas.misc` is created for this kind of function that is used more than once).
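To make the two format-based namings concrete, here is a minimal sketch of how such values could be assembled from a tag (or tag chain) and a list of output formats. The function name `step_names`, the `"<tags>:<FORMAT>"` layout and the `:` separator are assumptions made for illustration; they are not taken from the stage 017 code. The only filtering applied is the one agreed in the discussion (LOG is skipped).

```python
# Formats treated as "technical" and skipped; per the discussion, only LOG.
TECHNICAL_FORMATS = {"LOG"}


def step_names(ctag, ami_tags, formats, sep=":"):
    """Build format-based step names for one task.

    The "<tag(s)><sep><FORMAT>" layout and the ':' separator are
    assumptions made for illustration, not the stage 017 output format.
    """
    # Drop technical formats; the comparison is case-insensitive.
    kept = [f for f in formats if f.upper() not in TECHNICAL_FORMATS]
    result = {}
    if ctag:
        # Current (last) AMI tag + output data format.
        result["ctag_format_step"] = [sep.join((ctag, f)) for f in kept]
    if ami_tags:
        # Whole chain of AMI tags + output data format.
        result["ami_tags_format_step"] = [sep.join((ami_tags, f)) for f in kept]
    return result


example = step_names("p3654", "e5271_s3126_r9364_p3654",
                     ["DAOD_HIGG2D1", "LOG"])
```

Both fields are lists because a task may produce more than one non-technical output format; a task without tags simply gets no step name fields.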

According to M.Borodin, field 'output_formats' and list of data formats, derived from the list of dataset names, must be exactly the same (if format is specified in 'output_formats', the task will produce a dataset in the given format).
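A rough sketch of the equivalence described above: derive the formats from dataset names and compare them with `output_formats`. It assumes that the data format is the fifth dot-separated field of a dataset name and that a name may carry an optional `scope:` prefix. Both are assumptions about the naming convention, and `dataset_format` / `formats_from_datasets` are hypothetical helpers, not the functions moved into `pyDKB`.

```python
def dataset_format(dataset_name):
    """Return the data format encoded in a dataset name, or None.

    Assumption: the format is the fifth dot-separated field of the name.
    """
    # Drop an optional "scope:" prefix, if present.
    name = dataset_name.split(":", 1)[-1]
    fields = name.split(".")
    return fields[4] if len(fields) > 4 else None


def formats_from_datasets(dataset_names):
    """Collect unique formats from dataset names, preserving order."""
    formats = []
    for name in dataset_names:
        fmt = dataset_format(name)
        if fmt is not None and fmt not in formats:
            formats.append(fmt)
    return formats


# Made-up dataset names, used only to exercise the helpers.
datasets = [
    "mc16_13TeV.364100.Sample.deriv.DAOD_HIGG2D1.e5271_s3126_r9364_p3654",
    "mc16_13TeV:mc16_13TeV.364100.Sample.deriv.DAOD_HIGG2D1.e5271_p3654",
]
derived = formats_from_datasets(datasets)
# The check implied by the statement: both sources give the same set.
same = sorted(derived) == sorted(["DAOD_HIGG2D1"])
```

If the two sources always agree, either one can feed the step name generation; the comparison is only useful as a sanity check on real task metadata.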

mgolosova · 2020-06-25T16:39:14Z

mgolosova force-pushed the data4es-017-steps branch from e92199c to 9c06106

Rebase on the new version of data4es-017-events-calculation-v2 + join of a couple of related commits (e.g. syntax fixes).

Evildoor

Due to my vacation and the ambiguous state of this PR, this is merely a group of suggestions on things that caught my eye, not a complete review followed by approval or a call for changes.

Utils/Dataflow/017_adjustMetadata/adjustMetadata.py

Utils/Dataflow/pyDKB/atlas/misc.py

mgolosova · 2020-07-22T12:33:40Z

Will close and reopen the PR in an attempt to restart the Travis PR build...

mgolosova · 2020-07-23T13:02:35Z

Well, it looks like the Travis PR check wouldn't start because this PR was "outdated"; at least on the requests page I saw this error: “GitHub payload is missing a merge commit”, for which the recommendation reads: "please confirm your pull request is open and mergeable". Strange, however, that there was no problem running the checks when I added a new commit or removed the last one... but anyway, now all the checks have passed. ;)

Evildoor

Uh, most of the PR looks fine, but the last non-style change (9c06106) makes things look strange. We have a PR which adds something to stage 017 and adjusts following data samples. And also moves some functionality from stage 093 into a new library intended for ATLAS-related stuff.

Sure, one can understand why this happened by studying the commits in this PR, and both stages fulfill a similar role in different branches of the dataflow, but still: do the library-related things really belong to this PR? Furthermore, is there any need to make these changes now, when the function is still only used in 093?

P.S. Even if the answer is "yes" and everything is left as-is, there is no need to import atlas in 017.

mgolosova · 2020-07-29T10:24:23Z

@Evildoor,

> is there any need to do these changes now, when the function is still only used in 093?

Historically -- yes:

1f9cc50 and 9c06106 have a gap of several days between them;
9c06106 is not a "fix" of 1f9cc50, it is a different implementation of the same functionality -- one that could not be used earlier because the author (me) didn't have enough information;
rewriting history in this case does not spare a reader of the commit history a bunch of mistakes/typos and their fixes; it masks the real history of development.

However, since it is a single PR, not a series of PRs (which it would be, if the first version was reviewed and merged before the second one was implemented) -- there is a possibility (and formal reasons) to make the commit history more "refined". "Refined" means "better", so the only reason for "not" is the price of this refinement (in terms of time).

I will do the refinement; this, however, means that this PR will be closed and replaced with another one (or two, actually) -- and then the next related PR (#374) will be rebased on the new version. Stay tuned...

mgolosova · 2020-07-29T11:00:25Z

@Evildoor,
PRs #379, #380 are ready for review.

mgolosova · 2020-07-29T11:14:37Z

#379 and #380 are merged, so this PR's functionality is fully present in master.
Closing.

mgolosova self-assigned this Jun 17, 2020

mgolosova marked this pull request as draft June 17, 2020 14:51

mgolosova mentioned this pull request Jun 18, 2020

[override] DF: events processing progress data. #359

Closed

4 tasks

mgolosova mentioned this pull request Jun 22, 2020

[override] DF/data4es: Stage 040 (progress data) #374

Closed

2 tasks

mgolosova changed the title ~~[pending] DF/data4es: (possible) prodcution step names for task.~~ [pending] DF/data4es: (possible) production step names for task. Jun 23, 2020

mgolosova force-pushed the data4es-017-events-calculation-v2 branch from a10cc71 to a16ae65 June 25, 2020 15:47

mgolosova added 5 commits June 25, 2020 18:12

DF/data4es/017: ignore technical data formats when generate step names.

a0adae1

No one will be interested in statistics related to these formats. In theory.

DF/data4es/017: TXT is not a 'technical' format.

16b9542

According to M.Borodin.

mgolosova force-pushed the data4es-017-steps branch from e92199c to 9c06106 June 25, 2020 16:15

Evildoor reviewed Jun 27, 2020

Base automatically changed from data4es-017-events-calculation-v2 to master June 29, 2020 11:06

mgolosova changed the title ~~[pending] DF/data4es: (possible) production step names for task.~~ DF/data4es: (possible) production step names for task. Jun 29, 2020

mgolosova and others added 4 commits June 29, 2020 14:10

pyDKB: typo fix (in docstring).

d806359

Co-authored-by: Evildoor <evildoor256@gmail.com>

DF/data4es/017: fix wording (in docstring).

5c25843

Co-authored-by: Evildoor <evildoor256@gmail.com>

DF/data4es/017: typo fix (docstring).

e163b42

DF/data4es/017: fix word order (docstring).

f23ad15

mgolosova mentioned this pull request Jul 22, 2020

095 update: part 2 #284

Merged

mgolosova marked this pull request as ready for review July 22, 2020 12:14

mgolosova closed this Jul 22, 2020

mgolosova reopened this Jul 22, 2020

mgolosova changed the base branch from master to 009-typo-fix July 23, 2020 07:55

mgolosova changed the base branch from 009-typo-fix to master July 23, 2020 07:55

mgolosova force-pushed the data4es-017-steps branch 2 times, most recently from f23ad15 to e163b42 July 23, 2020 12:40

Merge remote-tracking branch 'origin/master' into data4es-017-steps

cad6a01

The merge is supposed to make the PR "mergeable" and allow Travis CI PR build.

Evildoor requested changes Jul 24, 2020

mgolosova changed the title ~~DF/data4es: (possible) production step names for task.~~ [override] DF/data4es: (possible) production step names for task. Jul 29, 2020

This was referenced Jul 29, 2020

DF/data4es/093: refactoring. #379

Merged

DF/data4es/017: steps (refined history). #380

Merged

mgolosova marked this pull request as draft July 29, 2020 11:01

mgolosova closed this Jul 29, 2020
