Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AnnexJsonProtocol progress bar fixes #5421

Merged
merged 6 commits into from Feb 23, 2021

Conversation

kyleam
Copy link
Contributor

@kyleam kyleam commented Feb 17, 2021

This series makes two changes to AnnexJsonProtocol:

  • corrects a bug in the progress bar ID calculation that resulted in using the same ID across concurrent operations on different files

  • adds the file name to the progress bar

Together these fix gh-5419.

@kyleam kyleam changed the base branch from master to maint February 17, 2021 17:33
@codecov
Copy link

codecov bot commented Feb 17, 2021

Codecov Report

Merging #5421 (455047d) into maint (528e199) will increase coverage by 0.02%.
The diff coverage is 96.55%.

Impacted file tree graph

@@            Coverage Diff             @@
##            maint    #5421      +/-   ##
==========================================
+ Coverage   89.52%   89.55%   +0.02%     
==========================================
  Files         296      296              
  Lines       41734    41756      +22     
==========================================
+ Hits        37362    37394      +32     
+ Misses       4372     4362      -10     
Impacted Files Coverage Δ
datalad/support/annexrepo.py 90.69% <96.55%> (+0.09%) ⬆️
datalad/downloaders/tests/test_http.py 90.49% <0.00%> (+1.90%) ⬆️
datalad/ui/utils.py 56.25% <0.00%> (+9.37%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 528e199...455047d. Read the comment docs.

@kyleam
Copy link
Contributor Author

kyleam commented Feb 18, 2021

Taking another look at the example from gh-5419, the progress bar handling still looks off, even with the main issue resolved. I'm not quite sure in what way, but I think it has to do with clearing finished bars (*). I went back to before get was switched to AnnexJsonProtocol (46d78a8353^), and I don't think the same issue is present (things still look off, but in a different way). Hmm..

(*) I see the same issue when I use just "command", "key", and "file" fields to generate the hash, so it doesn't appear to be related to the hash discussion above.

@yarikoptic
Copy link
Member

the progress bar handling still looks off,

FWIW:

off-1: indeed -- they seems to be not "finalized"/closed...
While trying with 0.13.7 I have mentioned somewhat similar (but not as drastic) behavior:

$> datalad install -g -J5 ///dbic/QA
Total (9 ok out of 2255):   1%|▎                                                           | 403M/77.8G [01:16<6:27:48, 3.32MB/s]
sourcedata/sub-emmet/ses-20180521/f .. 21_task-rest_acq-3mm_bold.dicom.tgz:  96%|██████████▌| 45.6M/47.6M [00:29<00:00, 4.13MB/s]
sourcedata/sub-emmet/ses-20180508/f .. 508_task-rest_acq-p2_bold.dicom.tgz: 100%|██████████▉| 48.3M/48.4M [00:29<00:00, 2.41MB/s]
sourcedata/sub-amit/ses-20180508/dw .. 0508_acq-DTIX30Xp2Xs4_dwi.dicom.tgz:  35%|███▉       | 31.9M/90.0M [01:16<43:42, 22.2kB/s]
sourcedata/sub-emmet/ses-20180508/d .. 0508_acq-DTIX30Xp2Xs4_dwi.dicom.tgz:  61%|██████▋    | 55.4M/90.3M [01:16<00:11, 3.15MB/s]
sourcedata/sub-emmet/ses-20180521/d .. _acq-DTIX30Xp2Xs4X2mm_dwi.dicom.tgz:   8%|▋        | 6.97M/83.6M [01:15<3:35:48, 5.92kB/s]
sourcedata/sub-emmet/ses-20180521/f .. _acq-3mm_dir-AP_magnitude.dicom.tgz:  99%|██████████▉| 2.65M/2.68M [00:13<00:00, 5.31MB/s]
sourcedata/sub-emmet/ses-20180521/f .. -rest_acq-p2Xs4X35mm_bold.dicom.tgz:  99%|██████████▉| 49.9M/50.3M [00:14<00:00, 1.72MB/s]
sourcedata/sub-emmet/ses-20180521/f .. _acq-3mm_dir-LR_magnitude.dicom.tgz:   0%|                    | 0.00/2.68M [00:00<?, ?B/s]
sourcedata/sub-emmet/ses-20180521/a .. s-20180521_acq-MPRAGE_T1w.dicom.tgz:  58%|██████▎    | 9.46M/16.4M [00:55<04:12, 27.5kB/s]
sourcedata/sub-emmet/ses-20180521/f .. 521_task-rest_acq-p2_bold.dicom.tgz:   2%|▏           | 974k/48.1M [00:07<13:03, 60.2kB/s]

you can see that some are at 99% (and in fact probably 100%) or even 100% but they linger around. Some seems to be eventually reused for another progress bar but in total we have more than 5 requested. The same with 0.12.7 -- so I suspect that this particular aspect is probably due to tqdm itself (not rushing to close/wipe out complete progress bars?) or how we use tqdm. (the same behavior after I upgraded to tqdm 4.57.0)

off-2: Another "off" for me is that paths are not "compressed" whenever too long. Compare current to above:

$> datalad install -g -J5 ///dbic/QA
Total:   0%|                                                             | 0.00/76.9G [00:00<?, ? Bytes/s]
Get sourcedata/sub-emmet/ses-20180531/dwi/sub-emmet_ses-20180531_acq-DTIX30Xp2X2mm_dwi.dicom.tgz:  56%|▌| 
Get sourcedata/sub-qa/ses-20161205/func/sub-qa_ses-20161205_task-rest_acq-p2Xs4X35mm_bold.dicom.tgz:  38%|
Get sourcedata/sub-qa/ses-20161219/dwi/sub-qa_ses-20161219_acq-DTIX30Xp2_dwi.dicom.tgz:   1%| | 1.13M/85.9
Get sourcedata/sub-qa/ses-20161212/dwi/sub-qa_ses-20161212_acq-DTIX30Xp2_dwi.dicom.tgz:   4%| | 3.34M/82.5
Get sourcedata/sub-qa/ses-20161219/dwi/sub-qa_ses-20161219_acq-DTIX30Xp2Xs4_dwi.dicom.tgz:  43%|▍| 37.8M/

@kyleam
Copy link
Contributor Author

kyleam commented Feb 18, 2021

Thanks for confirming on your end.

off-2: Another "off" for me is that paths are not "compressed" whenever too long

I noticed that too when looking at the output on 46d78a8353^. That issue is already mentioned in the second commit, but when discussing it, I hadn't realized that the old behavior was to elide. So I agree that to restore the pre-0.14 behavior it needs to be dealt with rather than punted on.

@kyleam kyleam marked this pull request as draft February 18, 2021 15:51
This is used a good amount and will be used more in upcoming commits.
_proc_json_record() logs error messages from the _action_ value [*].
However, it only does so when "byte-progress" is in the top-level
record.  I'm not sure if there's ever a case where the "action" and
"byte-progress" keys aren't both present, but, since the current code
bothers to check both conditions, move the logging of the errors
upstream so that it's triggered even if byte-progress isn't present.

[*] It's not clear to me that git-annex ever adds error messages to
    this value instead of the top-level record.
The code for generating the progress bar ID and label are going to get
extended in upcoming commits.  Extract it to separate methods to avoid
blowing up _proc_json_record().
_proc_json_record() considers only byte-progress records when updating
individual progress bars.  However, --json-progress output from
git-annex commands may not include a progress record where the
byte-progress value equals the total size.  The only indication that
the record is complete, then, is the final non-progress record for the
operation:

  {"byte-progress":14173395,
    "action":{"command":"get",
              "key":"SHA256E-s25194376--...",
              "file":"foo",
              ...},
    "total-size":25194376,
    "percent-progress":"56.26%"}
  {"command":"get","note":"from origin...\nchecksum...","success":true,
   "key":"SHA256E-s25194376--...",
   "file":"foo",
    ...}

Rework _proc_json_record() so that the final record triggers the
progress bar tear down.  In order to process a bar's completion, the
final non-progress record needs to be linked to the byte-progress bar.
While the action record seems to be a reliable link across progress
records, it isn't present in the final non-progress record, so hard
code a set of fields---command, key, and file---that should be
sufficient for linking the records for a single operations.

This new approach also fixes an error in the progress ID logic.  The
old approach was to generate a progress bar ID for incoming data by
hashing id(self) and the action of the json record.  However, the
calculation only considered the action's _keys_.  That means that it
would, for example, incorrectly collapse get() results for different
files into the same progress bar because the action results share the
same set of keys.

Fixes datalad#5419.
AnnexJsonProtocol uses the command from the action result as the
progress bar label.  When operations act on a particular file and work
in parallel, there's no indication about which file a progress bar is
for:

  Get:  15%...
  Get:  25%...

Prior to 0.14.0 (in particular 46d78a8), when AnnexRepo.get()
didn't use AnnexJsonProtocol, the above would be displayed as

  foo:  15%...
  foobar:  25%...

Update AnnexJsonProtocol to include the file name, if present,
effectively merging the above two displays into

  Get foo:  15%...
  Get foobar:  25%...

The display is a bit uglier because the progress bars aren't aligned
unless the files happen to be the same length and because long file
names can take up a lot of space, but both of those seem better than
not having the information at all.  The latter will be mitigated by
the next commit.
In 0.14 commands using ProcessAnnexProgressIndicators were switched to
AnnexJsonProtocol, and ProcessAnnexProgressIndicators was removed
(948ccf3).  This switch dropped the file names from the progress
bars, and the last commit added them back.  Long file names, though,
can interfere with the display of the progress bar.

Deal with this by porting similar truncation logic to what was in the
now-removed ProcessAnnexProgressIndicators.
@kyleam kyleam marked this pull request as ready for review February 19, 2021 20:51
@kyleam
Copy link
Contributor Author

kyleam commented Feb 19, 2021

All right, I think the latest push should resolve all of the 0.14 regressions. The issue with bars lingering was because bars weren't torn down unless a progress record for 100% came in, which doesn't always happen. The fix pushed is to use a progress bar ID that can link the progress records (which contain the action fields) to the final record for the operation (no action field). I don't see a way to do that without hard coding a subset of keys, so, @yarikoptic, you'll be happy to see that your suggested "command", "key", "file" set is used.

@yarikoptic
Copy link
Member

I thought to report a new issue since thought that it is unrelated but it seems it might actually be triggered by changes here.

we do get back individual progress bars now (look great, thanks @kyleam), but Total seems became completely inadequate. Even though it seems to get Total updated when any progress bar finishes (not "on the fly" like with 0.13.7; so ideally total should also get updated with any new progress bar updated), its ETA and speed seems to remain incorrect. When I tried maint version, it seems that speed does get updated (thus decided that it might be due to changes in this PR).

$> datalad get -J4 .
Total:   2%|█▏                                               | 58.9M/2.41G [01:00<25:57:08, 25.2k Bytes/s]
Get sub-03/func/sub-03_ta .. sktask_run-02_bold.nii.gz:  68%|██▋ | 31.5M/46.2M [00:57<01:02, 236k Bytes/s]
Get sub-03/func/sub-03_ta .. sktask_run-03_bold.nii.gz:  79%|██▎| 36.3M/46.2M [00:17<00:05, 1.67M Bytes/s]
Get sub-02/func/sub-02_ta .. sktask_run-02_bold.nii.gz:  63%|██▌ | 31.8M/50.7M [00:30<01:03, 299k Bytes/s]
Get sub-02/func/sub-02_ta .. sktask_run-01_bold.nii.gz:  77%|██▎| 39.3M/50.8M [00:15<00:03, 3.45M Bytes/s]

and some times it seems to not get updated at all (I had 3 or 4 progress bar disappear but Total remains 0) (that is on ///openneuro/ds000001 where I dropped everything before running get)

$> datalad get -J4 .
Total:   0%|                                                             | 0.00/2.42G [00:00<?, ? Bytes/s]
Get sub-02/func/sub-02_ta .. sktask_run-03_bold.nii.gz:  78%|██▎| 39.5M/50.7M [00:32<00:08, 1.32M Bytes/s]
Get sub-01/func/sub-01_ta .. sktask_run-03_bold.nii.gz:  26%|| 12.5M/47.3M [00:27<00:53, 650k Bytes/s]
Get sub-03/anat/sub-03_T1w.nii.gz:  41%|██████████▏              | 2.16M/5.28M [00:01<00:03, 907k Bytes/s]
Get sub-02/func/sub-02_ta .. sktask_run-01_bold.nii.gz:   7%|| 3.67M/50.8M [00:08<00:39, 1.20M Bytes/s]

so it might be just that Total is missing updates on some completed progress bars but not the other.

@kyleam
Copy link
Contributor Author

kyleam commented Feb 23, 2021

Thanks for testing.

thus decided that it might be due to changes in this PR

Quite possible. I'll take another look.

@kyleam
Copy link
Contributor Author

kyleam commented Feb 23, 2021

Even though it seems to get Total updated when any progress bar finishes [...], its ETA and speed seems to remain incorrect
[...]
and some times it seems to not get updated at all

I'm not seeing how any of the changes here would trigger that. Assuming the new conditions map onto the old ones as intended, there shouldn't be any change to the handling of the global progress bar (which is untouched by the diff).

Trying with maint on ds000001, it's of course hard to tell because all the bars are incorrectly collapsed into one, but I see what looks like both those behaviors for some runs. And for this PR I also only see it sometimes. For the cases where the update doesn't happen, it seems like the bytes might eventually comes around, though I haven't completely convinced myself that's true.

I'm having a hard time debugging this in any controlled fashion. With smaller set of files, I haven't had any luck triggering these issues.

not "on the fly" like with 0.13.7; so ideally total should also get updated with any new progress bar updated

Yes, I believe that's a deliberate design decision of AnnexJsonProtocol. I'd prefer not to bundle that change in this PR.

@yarikoptic
Copy link
Member

Ok, let's proceed with this PR as is and I will file a separate issue for the total

@yarikoptic yarikoptic merged commit 31fbf69 into datalad:maint Feb 23, 2021
@kyleam kyleam deleted the progress-bar-interference branch February 24, 2021 14:25
yarikoptic added a commit to yarikoptic/datalad that referenced this pull request Feb 25, 2021
…abel

Original odd experience was reported in
datalad#5421 (comment) and it was
due to interplay of two factors:

1 - Total is updated only after git-annex verifies the checksum.

Total progress bar is updated only upon the final "ok" from git-annex
that we got the file.  That also involves validation of the checksum by
git-annex. So, even though file is finished downloading, and its progress bar
disappeared, Total would not be updated until that file is "ok"ed by annex.
That takes time, so perceptually to the user  (me) it was odd that Total shows
only some little total although a good number of files already "downloaded"

Good news: since Joey implemented "on the fly" checksumming (at least for
built in "special remotes") -- with e.g. `8.20210223+git1-g33af7d193-1~ndall+1`
experience is much better since there is no delay from progressbar disappearing
and the final message arriving and thus Total gets updated.

2 - Default tqdm setting is not good for "rare" updates

See `smoothing` at https://github.com/tqdm/tqdm . Default `0.3` is a
compromise between providing user with smth between "momentary" and
"mean" speed.  When Total is updated only once a minute, that leads to
"current" speed being indeed quite low if most of the time there were no
progress at all ;)

Since "smoothing" is ATM quit specific to tqdm and we have quite a number of
subclasses from ProgressBarBase  I decided to just go for an ad-hoc solution
here and make any progress bar which has "total" in its label to use
smoothing=0, i.e. to report the mean.

Altogether with a new git-annex and this total smoothing of 0, Total looks
quite decent now ;)

Closes datalad#5434
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Absent filename in makes multiple progressbars to "compete" for the display
3 participants