-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Samples update 2018 #199
Samples update 2018 #199
Conversation
3bac0bd
to
6ca050c
Compare
e3a7e27
to
f32a8e4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few comments before I checked the samples themselves, please take a look.
Can you also try and resolve the conflicts with master
? There are two ways to do:
git fetch; git rebase origin/master
<...solve conflicts here...>
git push --force
or
git fetch; git merge origin/master
<...solve conflicts here...>
git push
The first will "hide" conflicts, so that no one will be able to see and check how they were solved (well, except that recently GitHub started to keep the story of push-forces, allowing to see the difference between two versions); the second will save this information within the merge commit.
Both looks fine for me; merge
is definitely more "right" way, yet rebase
is "less noisy". So I don`t know which way would I prefer in same situation )
Resolved the conflicts. |
e9424f3This commit is a bit confusing, as it mixes up "what was" (that requires one more fix) and "what became".
Actualy, the good rule is: when you want to write in commit log something like "do(ne) this and do(ne) that" (here: "update" and "add"), stop for a moment and think. If you can`t clearly explain, how this two changes are related to each other, it is better to separate them and make two individual commits. An example of "related" changes is: "Data processing stage changed: now it produces records with additional field 'X' (change 'A'). So 'X' is added to the mapping (change 'B') and output samples are updated (change 'C')." If one does not perform 'B' or 'C' at all, things become inconsistent: previous version of the output data example simply becomes incorrect, and missed field in the mapping makes general behaviour unpredictable and may lead to conflicts in the futire (one who wants to know what data are available in the final storage, will check the mapping and won`t be aware of the field 'X'; one who wants to make changes to another stage, may try to add his own version of 'X' field, and it will lead to a conflict). This three chages may go separately, as three commits (within a single PR), if the author wishes, yet it is fine to commit three-in-one, as they are clearly related to each other. |
You are absolutely right about e9424f3. I tried to properly update the existing commits with symbolic links' changes, but something got in the way, so I put them into a separate commit to deal with later, if necessary. Well, that "later" is now then. The commit was scrapped and its changes were incorporated into earlier commits:
This way, the symbolic links should properly represent the situation with the samples at all points of time. While doing so I had to rewrite history, and now numerous commits merged from master are displayed separately. Is this ok, or should I redo this PR in a separate branch? |
Doesn`t look very good, you`re right. Here`s what happened.
After merge:
After rebase:
In other words: to What you can do:
New branch is created just to have an easy way to step back and start it over, same can be done right in the Functionally there is no difference if you close this PR and open another one from the new branch: the only difference is that you can keep the name of this PR`s source branch, replacing its content. |
Aha, I see. Thank you very much for the detailed explanation. |
As ATLAS' development progresses, the data can change. Some types of new information, such as `hs06sec` for Oracle / Chicago ES, can be absent in 2016 sample. Therefore, new 2018 sample are added to properly illustrate the state of the data, but the 2016 one are preserved because the system must work correctly regardless of the data gathering time. Update the stage's README to describe the samples. Conflicts: Utils/Dataflow/009_oracleConnector/README
Update the stage's README to describe the samples.
Update the stage's README to describe the samples.
Update the stage's README to describe the samples. Add a symbolic link to the input used for creating the sample.
Update the stage's README to describe the samples.
Update the stage's README to describe the samples.
Update the stage's README to describe the samples. Add symbolic links to the input files used to produce the sample.
Update all READMEs to reflect the change. Update symbolic links broken by the change.
Current sample was obtained by processing 009's output, while the stage should be working with 091's output. Replace the sample with correct one.
Current sample was obtained by processing 091's output, while the stage should be working with 025's output. Replace the sample with correct one.
Current sample was obtained with stage 025 skipped. Update it properly.
@mgolosova
was not an option due to the fact that at least one commit before the merge had to be changed. I decided that I got tired of hauling this merge in this PR, so I started new branch from current master, cherry-picked commits into it and reset this branch to a new one (diff said that they were identical). Please, take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Evildoor, still can not approve the PR.
Main reason is that the 2018 output sample of the 025 does not contain toths06*
metadata.
Maybe there were no information in Chicago ES when you created the PR, but now it is definitely there. So I think I should ask you to update the sample; without these fields it simply does not serve its purpose...
Also, just noticed that Stage 025 input
directory still linked to the 009's output
(091 output/input_*
files are expected). There are two options:
- simply link
025_*/input
to091_*/output
, as in 093.
Then in theREADME
(of both 025 and 093 stages) should be specified which input files were used; - create a directory with two links to
091_*/output/input_*
files.
Then the 093'sinput
should be updated accordingly (to contain only091_*/output/output_*
files).
And, talking about links to the output
directory and individual files: 016's input
is a directory with two links, while it can be a single link to the 025's output
. I see it is due to the fact that originally there was a link to the 091's specific output file, but now it can be changed. It`s just because links to directories simplify further samples management: if one needs to rename samples, he/she will rename only output samples, not input and output together. Won`t work for all stages, of course, but still...
There`s an artificial line in 009's 2016 output sample:
{"taskid": 13319225, "start_time" : "27-02-2018 16:56:38", "end_time" : "07-03-2018 14:24:46", "status": "done"}
It was added in #191, to have at least one record with new metadata fields. Now it can (and should) be removed, as exactly for this purpose 2018 samples were added.
Seems like this record was to be propagated from there only to the 025's output, which was updated in this PR and does not contain this line any longer, so it is the only sample to fix.
Everything else seems to be fine, thank you!
The line was added by hand for sake of having at least one record with new metadata fields in the sample. Now such records can be obtained from 2018 samples, so the line is no longer needed.
- Redirect stage 025's input to stage 091's output as it should be. - Specify which of the 091's samples were used in stage 025's README. This is important since stage 091 produces two types of files. - Do the same in stage 093's README for consistency.
Done.
Missed that, thank you. Done, first option, because...
... I agree with you - links to directories are easier to work with.
Removed it. |
- Re-produce the sample to add missing toths06* fields. - Update the corresponding samples of stages 016 and 019 that follow the 025.
Directory links are easier to maintain.
My bad, fixed. |
@Evildoor, thank you! Approved and merged now. |
The data changes overtime, which means that something can be missing in the current samples that were taken from the 2016's data. Samples from the 2018's data should be added to properly represent the data, but the system must work on both sets.
Note - data gathering intervals were saved in
Utils/Dataflow/009_oracleConnector/README
.