Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DF/data4es: events calculation v2. #369

Merged
merged 2 commits into from Jun 29, 2020

Conversation

mgolosova
Copy link
Collaborator

@mgolosova mgolosova commented Jun 17, 2020

New algorithm of calculation for these values was suggested by M.Borodin Mikhail.Borodin@cern.ch during a discussion on "how to get current progress info for a task from DEFT" as the "most general" one.

There can possibly be some cases not addressed in the current version, but they are to be detected when we start collecting and using the data.


Waiting for: #368 (parent task metadata).

ToDo:

@mgolosova mgolosova self-assigned this Jun 17, 2020
@mgolosova
Copy link
Collaborator Author

Base automatically changed from data4es-task-parent-info to master June 25, 2020 12:47
@mgolosova mgolosova force-pushed the data4es-017-events-calculation-v2 branch from a078a6e to a10cc71 Compare June 25, 2020 15:32
New algorithm of calculation for these values was suggested by
M.Borodin <Mikhail.Borodin@cern.ch> during a discussion on "how to get
current progress info for a task from DEFT" as the "most general" one.

There can possibly be some cases not addressed in the current version,
but they are to be detected when we start collecting and using the data.
@mgolosova mgolosova force-pushed the data4es-017-events-calculation-v2 branch from a10cc71 to a16ae65 Compare June 25, 2020 15:47
@mgolosova
Copy link
Collaborator Author

First I tried to simply merge master here, but the branch this one was derived from was rewritten (rebased on a more actual master), so there was an extra commit duplicating the one that was already in master with different id.

mgolosova force-pushed the data4es-017-events-calculation-v2 branch from a078a6e to a10cc71 16 minutes ago

Here I fixed the merge commit (messed a bit with the mapping conflict resolving and in addition to the conflict resolution fixed codestyle).

mgolosova force-pushed the data4es-017-events-calculation-v2 branch from a10cc71 to a16ae65 17 seconds ago

Here I rebased current branch to the master and thus get rid of multiple identic commits in the history.

It was a good idea to stay away from rebases and push-forces, but, unfortunately, one rebase has already happened -- and now we have to follow this paradigm to avoid extra mess in the resulting commits history in master.

@mgolosova mgolosova changed the title [pending] DF/data4es: events calculation v2. DF/data4es: events calculation v2. Jun 25, 2020
@mgolosova mgolosova requested a review from Evildoor June 25, 2020 15:56
Copy link
Contributor

@Evildoor Evildoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took some time to verify the logic, but it seems to be correct according to trello, emails, and "Mikhail said so". Hence, only two text-related comments remain.

Utils/Dataflow/017_adjustMetadata/adjustMetadata.py Outdated Show resolved Hide resolved
result = None

# First task in chain
if data['taskid'] == data['chain_id']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Foreword: this comment is just a bunch of thoughts about the situation that was already discussed in #368, so it does not call for changes in this PR, and may be left without answer.


This means that this function MUST be executed after chain data handling in transform_chain_data(). It's indeed coded this way, so if the said function correctly processes the data then there should be no problems. However...

  1. What if something actually goes wrong?
  2. If the chain roots are now receiving this special treatment, then it's another reason to think about "Is it a good idea to designate a task as a root of its own chain if something goes wrong?".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. If something goes wrong (as in “there’s no taskid or chain_id), then we can not calculate values we need properly, and, in fact, there’s something completely wring about given message. If it’s a one-time accident, we fail now and try again later when the process restarted. If we keep on failing at this record than it’s not an accident and we need to take a closer look at the data and decide how to handle it. When it’s no longer a hypothetical “what if”.

  2. This”special treatment” means “we need to know if this task is a first in chain or not. If we don’t know for sure, but set chain_id to taskid, here we’ll think that it is first, although it might as well be the opposite. I believe it would be incorrect.

@mgolosova mgolosova merged commit 239cc26 into master Jun 29, 2020
@mgolosova mgolosova deleted the data4es-017-events-calculation-v2 branch June 29, 2020 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants