This repository has been archived by the owner on Aug 16, 2023. It is now read-only.

Calculate ETA based on current average rate instead of global average #28

Closed
a-j-wood opened this issue May 29, 2022 · 3 comments

Comments

@a-j-wood
Owner

lemonsqueeze-average-rate-interval-updated-v2.patch.txt

Date: Sat, 26 May 2018 08:30:06 +0200
From: lemonsqueeze
Subject: ETA based on current average rate in pv ?

Hi Andrew,

Thanks for pv, I love this little tool. It's a great way to keep track
of progress for anything file related.

Yesterday I ran into this scenario:
I was running a really long task that generates an 800 MB file at about
50k/s. Something like:

while true ; do ls -l /* ; sleep 1 ; done > /tmp/file

Some time after it started, I thought of using pv to keep track of
progress with:

tail -n +1 -f /tmp/file | pv -s 800m >/dev/null

I was mostly interested in the ETA. Unfortunately, if you do this after
a few MB have been generated, the ETA is wildly wrong for a very long
time: pv reads the existing data very quickly at first, which fools it
into expecting a similar rate for the rest of the transfer, and it
takes a really long time for the average rate to drop back to the
50k/s we are actually getting.

It'd be nice to have an ETA based on current rate average here, say
over last 30s, instead of global rate average.

I'm experimenting with this patch on top of pv 1.6.6:

https://github.com/lemonsqueeze/pv

It keeps track of history periodically over a given time window and
uses that to compute the current average rate for the ETA. It looks like
it works pretty well. I also added an --average-rate-interval option to
change the time window (the name isn't great, I'm looking for a better one).
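
As an illustration only (this is not the code from the patch), a windowed rate tracker along these lines could be sketched in Python as follows. The class name `WindowedRate`, the one-second sampling, and the 5 MiB backlog read in the first second are all assumptions made for the example; the 30-second window and the 50k/s rate come from the email above.

```python
from collections import deque

KIB, MIB = 1024, 1024 * 1024


class WindowedRate:
    """Average transfer rate over roughly the last `window` seconds (hypothetical helper)."""

    def __init__(self, window=30.0):
        self.window = window
        self.history = deque()  # (timestamp, total_bytes) samples, oldest first

    def update(self, now, total_bytes):
        """Record a sample and drop samples older than the window edge."""
        self.history.append((now, total_bytes))
        # Keep the newest sample at or before the window edge as the baseline.
        while len(self.history) > 2 and self.history[1][0] <= now - self.window:
            self.history.popleft()

    def rate(self):
        """Bytes per second between the oldest and newest retained samples."""
        if len(self.history) < 2:
            return 0.0
        (t0, b0), (t1, b1) = self.history[0], self.history[-1]
        return (b1 - b0) / (t1 - t0) if t1 > t0 else 0.0

    def eta(self, size):
        """Seconds left at the windowed rate, or None if no progress is visible."""
        r = self.rate()
        if r <= 0:
            return None
        return max(size - self.history[-1][1], 0) / r


# Assumed scenario: pv attaches late, reads a 5 MiB backlog in the first
# second, then sees 50 KiB/s. One sample per second for two minutes.
size = 800 * MIB
tracker = WindowedRate(window=30.0)
tracker.update(0, 0)
total = 0
for t in range(1, 121):
    total += 5 * MIB if t == 1 else 50 * KIB
    tracker.update(t, total)

global_eta = (size - total) * 120 / total  # ETA from the global average rate
print(f"windowed rate: {tracker.rate() / KIB:.1f} KiB/s")
print(f"windowed ETA:  {tracker.eta(size):.0f} s")
print(f"global ETA:    {global_eta:.0f} s")
```

Under these assumptions the windowed rate settles at the real 50 KiB/s once the burst has left the window, while the global-average ETA after two minutes is still only about half of the windowed one.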

Let me know what you think. If there's a git repository somewhere, I
could create a pull request.

Cheers,

Matt


Date: Wed, 8 Aug 2018 10:48:09 +0100
From: Andrew Wood
To: lemonsqueeze
Subject: Re: ETA based on current average rate in pv ?

Thanks for your email, and for the patch. Sorry for the delayed
response, I am quite behind with my mailbox.

I'll have a good look at the code changes in the near future - they look
fine at first glance but I'll need to sit down and get my head around it,
and maybe add a few more comments for my own benefit.

There have been a couple of other suggestions and patches regarding the
rate display and ETA calculation, so what I'll probably try to do is bring
them all in such that the user has a choice of what algorithm to use.

I am hoping to have a new release ready in the next few months.

Do you have a web site I can mention in the contributors section of the
documentation?

Thanks again.


Date: Thu, 6 Sep 2018 13:50:15 +0200
From: lemonsqueeze
To: Andrew Wood
Subject: Re: ETA based on current average rate in pv ?

Hi Andrew,

Sorry, it's my turn to answer late.
I revisited my patch recently, with some fixes and cleanup but nothing drastic:

  • fixed --average-rate so it displays the current average rate instead
    of the global average rate (to be consistent with the ETA)
  • the default time window for the average rate is now 10s; I think most
    people would be comfortable with that
  • better name for --avg-rate-interval: how about --rate-window ?

The changes are in my git repo (https://github.com/lemonsqueeze/pv);
otherwise, the full patch against 1.6.6 is attached.

I don't have a dedicated website atm, but you could use my github page
for the website.

Cheers,
Matt

@a-j-wood a-j-wood added the enhancement New feature or request label May 29, 2022
@a-j-wood
Owner Author

Correspondence about a similar request:

Christoph-Biedl-exponential-smoothing-eta.patch.txt

Date: Sat, 1 Jul 2017 14:50:47 +0200
From: Christoph Biedl
To: Debian Bug Tracking System
Subject: Bug#866747: pv: ETA prediction suffers badly from bandwidth bursts, use exponential smoothing

Package: pv
Version: 1.6.0-1+b1
Severity: wishlist
Tags: upstream patch

Hello,

if the bandwidth of the data pv processes contains bursts - not just
small fluctuations - the ETA (actually: time left) prediction suffers
pretty badly since AFAICT the simple but fragile linear computation is
used, based on total data transferred so far, and total time spent for
this.

This hits me in two scenarios:

  • A tar pipe "tar -cf - . | pv --size=... | ..." where tar reads a
    few huge files at raw speed (say: 100 Mbyte/sec) while many spread
    small files slow down the process significantly (say: to 100 kbyte/sec).

  • The receiving side right of pv does buffering, so the first bunch of
    data is processed pretty fast, followed by the much slower rate for
    sustained operation.

In both cases, pv's ETA prediction will happily show a short time
that's left, but then start counting upwards, reach a peak, and
eventually count downwards as expected, although not quite at one
second per second. As a result, the predicted total time is way
below the time this will actually take.

This is a generic problem with predicting the future; however, the
linear approach is the problematic part here because bursts have a huge
impact on the computation.

How to repeat:

(
dd if=/dev/zero bs=1M count=1 status=none | pv --quiet --rate-limit=1M
dd if=/dev/zero bs=1M count=9 status=none | pv --quiet --rate-limit=10k
) | pv --size=10M >/dev/null

The commands in the subshell produce 10 MiB of data: a burst for the
first 1 MiB, then a pretty slow rate for the rest.

Expected:

After jumping to "10%" instantly and declaring a remaining time of
about nine seconds, pv should pretty soon (within, say, 30 seconds)
adapt the prediction to the actual input rate. As a result, the time
spent (second column) plus the time left (last column) should soon sum
up to 922 seconds, or "15:22".

Actually seen:

As described above. A few examples, progress bar stripped, note the
total time estimation keeps growing until the end:

| 1,09MiB 0:00:10 [10,1KiB/s] 10% ETA 0:01:21
(total time estimation: 91sec = 01:31)

| 1,28MiB 0:00:30 [10,1KiB/s] 12% ETA 0:03:23
(total time estimation: 233sec = 03:53)

| 1,58MiB 0:01:00 [10,1KiB/s] 15% ETA 0:05:20
(total time estimation: 380sec = 06:20)

| 2,16MiB 0:02:00 [10,1KiB/s] 21% ETA 0:07:14
(total time estimation: 554sec = 09:14)

| 2,75MiB 0:03:00 [10,1KiB/s] 27% ETA 0:07:54
(total time estimation: 654sec = 10:54)

| 3,33MiB 0:04:00 [10,1KiB/s] 33% ETA 0:07:59
(total time estimation: 719sec = 11:59)

| 3,92MiB 0:05:00 [10,1KiB/s] 39% ETA 0:07:45
(total time estimation: 765sec = 12:45)

| 6,85MiB 0:10:00 [10,1KiB/s] 68% ETA 0:04:35
(total time estimation: 875sec = 14:35)

| 8,02MiB 0:12:00 [10,1KiB/s] 80% ETA 0:02:57
(total time estimation: 897sec = 14:57)

| 8,61MiB 0:13:00 [10,1KiB/s] 86% ETA 0:02:06
(total time estimation: 906sec = 15:06)

| 9,19MiB 0:14:00 [10,1KiB/s] 91% ETA 0:01:13
(total time estimation: 913sec = 15:13)

| 9,78MiB 0:15:00 [10,1KiB/s] 97% ETA 0:00:20
(total time estimation: 920sec = 15:20)
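
The figures above are consistent with a purely linear estimate, time left = bytes remaining × elapsed time / bytes transferred. The following Python sketch recomputes them under an idealised model that is an assumption of this example, not something taken from pv's source: the first 1 MiB arrives within the first second and the rest arrives at exactly 10 KiB/s. The function names are made up for illustration, and the results can differ from the observed values by a second or so.

```python
MIB = 1024 * 1024
SIZE = 10 * MIB
BURST = 1 * MIB        # assumed to arrive within the first second
SLOW_RATE = 10 * 1024  # bytes per second afterwards ("10k" taken as 10 KiB)


def transferred(elapsed):
    """Bytes seen after `elapsed` seconds under the idealised two-phase model."""
    if elapsed <= 1:
        return BURST * elapsed
    return min(SIZE, BURST + SLOW_RATE * (elapsed - 1))


def linear_eta(elapsed):
    """Seconds left if the overall average rate so far were to continue."""
    done = transferred(elapsed)
    if done <= 0:
        return None
    return (SIZE - done) * elapsed / done


def mmss(seconds):
    """Format a number of seconds as MM:SS, truncating fractions."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


# Real duration: 1 s for the burst plus 9 MiB at 10 KiB/s.
actual_total = 1 + (SIZE - BURST) / SLOW_RATE
print(f"actual total: {mmss(actual_total)}")

for elapsed in (10, 30, 60, 300, 600, 900):
    eta = linear_eta(elapsed)
    print(f"{mmss(elapsed)} elapsed, ETA {mmss(eta)}, "
          f"total estimate {mmss(elapsed + eta)}")
```

Because the burst stays in the average forever, the total estimate only approaches the real duration as the transfer nears completion.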

Suggestion:

Use exponential smoothing, either as an option (preferably the
default), or even as a replacement. Some tests here suggest a
smoothing factor of 0.5 would serve the purpose quite well. Making
this another option would be nice to have, though.
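
For readers unfamiliar with the technique, here is a minimal Python sketch of exponential smoothing applied to the test case above. It is not the attached patch: the function name, the one-second sampling interval, and the convention that the factor weights the newest sample are assumptions of this example, and the patch may define the factor differently.

```python
MIB = 1024 * 1024
SIZE = 10 * MIB
SLOW_RATE = 10 * 1024  # bytes per second ("10k" taken as 10 KiB)


def smoothed_rates(samples, alpha=0.5):
    """Yield s_n = alpha * x_n + (1 - alpha) * s_(n-1), seeded with the first sample."""
    smoothed = None
    for sample in samples:
        if smoothed is None:
            smoothed = float(sample)
        else:
            smoothed = alpha * sample + (1 - alpha) * smoothed
        yield smoothed


# One rate sample per second: a 1 MiB/s burst, then 20 seconds at 10 KiB/s.
samples = [MIB] + [SLOW_RATE] * 20
rates = list(smoothed_rates(samples, alpha=0.5))

elapsed = len(samples)
done = sum(samples)  # each sample covers one second
eta = (SIZE - done) / rates[-1]
print(f"smoothed rate after {elapsed} s: {rates[-1]:.0f} B/s")
print(f"estimated total: {elapsed + eta:.0f} s")
```

With a factor of 0.5 the weight of the initial burst halves with every sample, so in this model the estimated total is already close to the expected 922 seconds after about 20 seconds.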

( some time passes )

Okay, I was in the right mood to demonstrate this. First was a proof of
concept in Perl, followed by a quick hack for pv. Ended up with a patch
that includes parameter parsing and documentation. Attached.

Cheers,

Christoph

@a-j-wood
Owner Author

Closing as per pull request #65 - the new "--eta-window" option will be in the next release after 1.6.20.

@a-j-wood
Owner Author

Correction, it's "--average-rate-window".
