
progress bars: use github.com/vbauerster/mpb #581

Merged: 1 commit into containers:master, Feb 15, 2019

Conversation

vrothberg (Member)

Use the github.com/vbauerster/mpb library for displaying the progress
bars. It is actively maintained, is designed for multi-bar usage and
just gets the job done.

This will fix issues with single bars being displayed multiple times.
Note that updating the bar's prefix does not work anymore. We're now
logging those entries, for instance, when we skip a blob as it's
already present.

The output now looks as follows (e.g., when pulling jboss/wildfly):

```
Getting image source signatures
Copying blob aeb7866da422 [======================================] 71.2 MiB / 71.2 MiB
Copying blob 157601a0b538 [======================================] 10.3 MiB / 10.3 MiB
Copying blob 642f4164f381 [======================================] 1.9 KiB / 1.9 KiB
Copying blob bda512e97517 [======================================] 74.6 MiB / 74.6 MiB
Copying blob 4cccaafdae21 [======================================] 170.9 MiB / 170.9 MiB
Copying config 2602b48525 [======================================] 6.5KiB / 6.5KiB
Writing manifest to image destination
Storing signatures
```

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>

@vrothberg (Member Author)

@rhatdan @mtrmac @baude @mheon This will fix the messed-up progress bars in Podman master. The new library makes a very good impression on me and, most importantly, it works :^)

@vrothberg (Member Author)

I'll spin up a Skopeo CI PR, where you can also test.

@rhatdan (Member) commented Feb 9, 2019

Works for me, but it needs the skopeo change.

@vrothberg force-pushed the mpb-progress-bars branch 2 times, most recently from b7e4147 to 67f0a25 on February 11, 2019.
@vrothberg (Member Author)

> Works for me, but it needs the skopeo change.

There were two deadlocks in the code, which are now fixed, and Skopeo is green (see containers/skopeo#602).
@rhatdan @mtrmac @runcom PTAL

@rhatdan (Member) commented Feb 11, 2019

LGTM

@vrothberg (Member Author)

(updated to latest master)

copy/copy.go (review thread, outdated and resolved)
@giuseppe (Member)

I've left a comment in the Skopeo PR; I am seeing a problem when I try to pull jboss/wildfly.

@vrothberg force-pushed the mpb-progress-bars branch 2 times, most recently from 5b9542e to e8eedb8 on February 14, 2019.
Use the github.com/vbauerster/mpb library for displaying the progress
bars.  It is actively maintained, is designed for multi-bar usage and
just gets the job done.

This will fix issues with single bars being displayed multiple times.
Note that updating the bar's prefix does not work anymore.  We're now
logging those entries, for instance, when we skip a blob as it's
already present.

The output now looks as follows (e.g., when pulling jboss/wildfly):

```
Getting image source signatures
Copying blob aeb7866da422 [======================================] 71.2 MiB / 71.2 MiB
Copying blob 157601a0b538 [======================================] 10.3 MiB / 10.3 MiB
Copying blob 642f4164f381 [======================================] 1.9 KiB / 1.9 KiB
Copying blob bda512e97517 [======================================] 74.6 MiB / 74.6 MiB
Copying blob 4cccaafdae21 [======================================] 170.9 MiB / 170.9 MiB
Copying config 2602b48525 [======================================] 6.5KiB / 6.5KiB
Writing manifest to image destination
Storing signatures
```

Note that this change requires the following dependencies:
 * github.com/vbauerster/mpb v3.3.4
 * github.com/mattn/go-isatty v0.0.4
 * github.com/VividCortex/ewma v1.1.1

Once we migrate to using Go modules, we can start using mpb v4.x.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
@giuseppe (Member)

LGTM

@rhatdan rhatdan merged commit 22beff9 into containers:master Feb 15, 2019
@mtrmac (Collaborator) left a comment

I’m sorry about getting to this so late. I can do the smaller cleanups myself.

@vrothberg PTAL at least at the Progress.Wait comment — am I missing anything?

```
destInfo, err := c.copyBlobFromStream(ctx, bytes.NewReader(configBlob), srcInfo, nil, false, true, bar)
if err != nil {
	return err
}
if bar != nil {
	bar.SetTotal(0, true)
```
mtrmac (Collaborator):

Should this be 0 or len(configBlob)?

vrothberg (Member Author):

0 is intended. It has no impact on the displayed total size, but it does trigger the bar to be marked as completed.
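A minimal sketch of that behaviour (not the copy.go code; the total of 1024 is illustrative), assuming mpb v3's Bar.SetTotal(total, final) as used in this PR:

```
package main

import "github.com/vbauerster/mpb"

func main() {
	p := mpb.New()
	bar := p.AddBar(1024) // illustrative total, standing in for the config blob size
	// In copy.go the blob is streamed through bar.ProxyReader(...) at this point.
	// Per the explanation above, a total of 0 leaves the displayed size untouched,
	// and final=true marks the bar as complete so p.Wait() will not block on it.
	bar.SetTotal(0, true)
	p.Wait()
}
```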

```
-canModifyBlob bool, isConfig bool, bar *pb.ProgressBar) (types.BlobInfo, error) {
+canModifyBlob bool, isConfig bool, bar *mpb.Bar) (types.BlobInfo, error) {
	c.progressWG.Add(1)
	defer c.progressWG.Done()
```
mtrmac (Collaborator):

Does c.progressWG do anything we need? The synchronization of parallel layer copies AFAICS happens independently via copyGroup, and the only effect of mpb.WithWaitGroup seems to be that ic.c.progressPool.Wait() blocks on c.progressWG in addition to doing everything else it does.

vrothberg (Member Author):

That's intended as well. While testing this PR, we experienced situations where the progress bars were still rendering even though the copy operations had completed (including the config copy). Rendering is a background process with a specific refresh rate, and the copier's WaitGroup makes sure that we proceed only once it's done.
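A rough sketch of the pattern described above, under the assumption that mpb v3's mpb.WithWaitGroup option is used the same way copier.progressWG is (the loop and sleep are purely illustrative):

```
package main

import (
	"sync"
	"time"

	"github.com/vbauerster/mpb"
)

func main() {
	var copyWG sync.WaitGroup // stands in for copier.progressWG
	pool := mpb.New(mpb.WithWaitGroup(&copyWG))

	for i := 0; i < 3; i++ { // three pretend blob copies
		copyWG.Add(1)
		bar := pool.AddBar(100)
		go func() {
			defer copyWG.Done()
			for j := 0; j < 100; j++ {
				bar.IncrBy(1)
				time.Sleep(time.Millisecond)
			}
		}()
	}

	// Wait blocks on copyWG as well as on the background renderer, so control
	// only moves on once every bar has been drawn for the last time.
	pool.Wait()
}
```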

mtrmac (Collaborator):

> and the copier's WaitGroup makes sure that we proceed only once it's done.

I can’t see how that works; the only use of progressWG inside the code is p.uwg.Wait() immediately at the start of Progress.Wait, which AFAICS is immediately satisfied because copyGroup.Wait ensured that all layer copies are done, and copyConfig has returned.

Progress.Wait does wait for the separate render process to finish, via pState.zeroWait (“please exit”) and Progress.done (“exited”); I can’t see that this depends on progressWG in any way.

```
@@ -600,13 +580,31 @@ func (ic *imageCopier) copyUpdatedConfigAndManifest(ctx context.Context) ([]byte
	return nil, err
}

ic.c.progressPool.Wait()
```
mtrmac (Collaborator):

The documentation of Progress.Wait says

> After this method has been called, there is no way to reuse *Progress instance.

The middle of copyUpdatedConfigAndManifest is not a good place for such a call, because copyUpdatedConfigAndManifest can be called multiple times per image (as well as, strictly speaking, zero times if an error happens before). Either the pool should live across all of those calls (and in that case the “Writing manifest to image destination” message just below needs to happen in some way that does not interfere with the progressPool), or it should be closed before even starting to deal with configs/manifests (using just a message, not a progress bar, for the config, or maybe creating a one-use single-bar progress pool just for each attempt to copy a manifest).

(Similar in nature is a concern about attaching progressPool to copier vs. imageCopier; right now that makes ~zero difference; longer-term, after #400 is resurrected, copier might live across copies of multiple single-arch images, whereas imageCopier would carry state related to one of them.)

(Purely aesthetically, it would be nice for the “closing” of the progress pool to happen (or be guaranteed to happen as a last-resort backstop) at the same level of the call stack that creates the progress pool.)
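One way the "one-use single-bar pool per attempt" idea mentioned above could look; the helper name and the write callback are hypothetical and not part of copy.go:

```
package copysketch

import "github.com/vbauerster/mpb"

// copyManifestWithBar is a hypothetical helper: the pool's lifetime matches a
// single manifest-copy attempt, so Wait() is never called in the middle of a
// longer-lived function and the pool is never reused afterwards.
func copyManifestWithBar(manifest []byte, write func([]byte) error) error {
	pool := mpb.New()
	bar := pool.AddBar(int64(len(manifest)))
	err := write(manifest)
	bar.SetTotal(0, true) // complete the bar whether or not the write succeeded
	pool.Wait()           // safe to print to the same stream after this returns
	return err
}
```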

vrothberg (Member Author):

Again, that's intended. We need to make sure the progress bars have finished rendering before printing; see the next line, ic.c.Printf("Writing manifest to image destination\n"). I understand it's not ideal given the assumptions in the code, but I could not find any other way and am very hesitant to rewrite copy.go.

vrothberg (Member Author):

> (Similar in nature is a concern about attaching progressPool to copier vs. imageCopier; right now that makes ~zero difference; longer-term, after #400 is resurrected, copier might live across copies of multiple single-arch images, whereas imageCopier would carry state related to one of them.)

It's possible to add new bars dynamically, and they will render correctly as long as nothing prints in between.

vrothberg (Member Author):

> (Purely aesthetically, it would be nice for the “closing” of the progress pool to happen (or be guaranteed to happen as a last-resort backstop) at the same level of the call stack that creates the progress pool.)

I share that desire, but mpb doesn't expose a Nuke() API. It is possible, though, to nuke each bar individually, which would allow working around that.

```
}
} else {
	cld.destInfo, cld.diffID, cld.err = ic.copyLayer(ctx, srcLayer, bar)
}
data[index] = cld
if bar != nil {
```
mtrmac (Collaborator):

bar can never be nil AFAICS, so drop this check. (The code would crash in bar.ProxyReader anyway.)

vrothberg (Member Author):

There was one test where exactly this case happened while copying a config. It's really just a defense for corner cases.

```
destInfo, err := c.copyBlobFromStream(ctx, bytes.NewReader(configBlob), srcInfo, nil, false, true, bar)
if err != nil {
	return err
}
if bar != nil {
```
mtrmac (Collaborator):

bar can never be nil AFAICS, so drop this check. (The code would crash in bar.ProxyReader anyway.)

vrothberg (Member Author):

There was one test case where this happened. I could not figure out why, even after looking at the mpb code, so I just added the check to defend against the corner case. I think it's related to non-TTY output, as the test case (i.e., TestCopyOCIRoundTrip) worked when run manually.

mtrmac (Collaborator) commented Feb 18, 2019:

Second, working around crashes without understanding how they happen is not great for reliability. AFAICS the only way bar can be nil is if AddBar returns nil, which in turn can happen only if Progress.Wait has been called.

… which probably points to #581 (comment), calling Wait and looping back to copyUpdatedConfigAndManifest, as the underlying cause: the code tried schema2, failed, tried schema1, and copyConfig did not use the bar (and did not trigger the bar.ProxyReader crash) because schema1 does not have a separate config object.

I’ll take a stab at fixing this; if you can remember what the failing situation was (or is it just straightforward make test-skopeo failing?), that would be useful.

vrothberg (Member Author):

You're absolutely right, and I was/am not happy with it either. The failure is straightforward to reproduce with integration tests (i.e., TestCopyOCIRoundTrip).

I think a more or less easy solution is to break out the printing parts (including related code) and move the Wait() call to the end of copyOneImage.

mtrmac (Collaborator) commented Feb 18, 2019:

Thanks, reproduced. I was not quite right about this happening with the schema1 fallback: it is copying an OCI image and falling back to schema2. For the record, the reason it does not crash in bar.ProxyReader is that creating the ProxyReader does not access bar, only non-empty reads do, and PutBlob detects that the config is already present at the destination, so no reads happen.

I think I know how to deal with this…

vrothberg (Member Author):

> I think I know how to deal with this…

Thank you so much for looking into this 🙏

@vrothberg deleted the mpb-progress-bars branch on February 17, 2019.
@vrothberg (Member Author)

> I’m sorry about getting to this so late. I can do the smaller cleanups myself.

All your comments are valid concerns, but I couldn't find any other way without refactoring parts of copy.go.

> @vrothberg PTAL at least at the Progress.Wait comment — am I missing anything?

You missed a lot of pain :^) The progress bars are always "rendered" in the background with a specific refresh rate, which makes them subject to race conditions with the rest of the code. This is why the "old" library had issues; mpb avoids such situations by integrating a sync.WaitGroup. Where we can do this is constrained by the calls that print to the reportWriter: any print will cause issues for the progress bars.
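A compressed illustration of that ordering constraint (illustrative values, not the copy.go code): finish the pool before writing anything else to the same stream.

```
package main

import (
	"fmt"

	"github.com/vbauerster/mpb"
)

func main() {
	pool := mpb.New() // the renderer refreshes the terminal in the background
	bar := pool.AddBar(3)
	for i := 0; i < 3; i++ {
		bar.IncrBy(1) // stands in for copying a blob via bar.ProxyReader
	}
	// Printing before Wait() races with the renderer's refresh ticks and can
	// corrupt the bars; only after Wait() returns is the stream free again.
	pool.Wait()
	fmt.Println("Writing manifest to image destination")
}
```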
