use ctrl-c to trigger checkpoint&abort #4606

fortran77 · 2019-06-05T21:42:18Z

Have you checked borgbackup docs, FAQ, and open GitHub issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

Issue, really a feature request.

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

1.1.9

Operating system (distribution) and version.

Ubuntu 16.04, Ubuntu 18.04, CentOS 7, OS X 10.13.6.

Hardware / network configuration, and filesystems used.

Various.

How much data is handled by borg?

Varies, kBytes to GBytes.

Full borg commandline that led to the problem (leave out excludes and passwords)

borg create --progress ...

Describe the problem you're observing.

Borg makes checkpoints at intervals. Suppose we want to interrupt a backup (maybe it's taking too long and we'll try again later). If waiting 5 minutes longer would allow a checkpoint to be created, it's advisable to wait. But Borg doesn't tell us when a checkpoint is coming or when one has been done. So we have no idea when is the right time to hit ^C.

A good solution would be for Borg to print when a checkpoint has been done if the --progress option has been given; and, if possible, also print how much longer to the next checkpoint.

In a more fanciful scenario, I can imagine a progress bar with marks showing where checkpoints will occur or have occurred. Like how some video services show you where the ads are. (May be hard to implement as Borg probably does not know how much data will be transmitted in the future.)

Currently, --progress causes a line of output to be shown at the bottom of the screen. The documentation doesn't mention that any part of this output tells us when a checkpoint occurs.

Admittedly, I have not downloaded and tested version 1.1.10 to see if it prints anything different.

A search for the word "checkpoint" in issues, docs, and faq didn't turn up anything relevant.

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

Not applicable.

Include any warning/errors/backtraces from the system logs

Not applicable.


ThomasWaldmann · 2019-06-06T14:02:00Z

Is it really that important?

The default is 1800s (30 minutes) IIRC, so in the average case you'd lose about 15 minutes, in the worst case almost 30 minutes.

The default used to be lower (5 minutes IIRC) in earlier borg/attic versions, but that had a big impact due to the overhead of a commit, so it was raised to 30 minutes to have less overhead and better overall speed.

If you expect to interrupt often or to have a rather unstable repo connection, you can reduce the checkpoint interval to less than 30 minutes with the --checkpoint-interval commandline option.
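The arithmetic behind these numbers can be sketched in a few lines of Python. This is only an illustration: it assumes the interrupt arrives at a uniformly random moment within a checkpoint interval and ignores the time the commit itself takes. The function name lost_work_seconds is made up for this example and is not part of borg.

```python
def lost_work_seconds(checkpoint_interval=1800):
    """Return (average, worst_case) seconds of work lost on interruption.

    Assumes the interrupt is equally likely at any moment between two
    checkpoints, so on average half an interval is lost.
    """
    average = checkpoint_interval / 2
    worst_case = checkpoint_interval
    return average, worst_case


for interval in (300, 1800):
    average, worst_case = lost_work_seconds(interval)
    print(f"interval {interval}s: avg {average / 60:.1f} min, "
          f"worst {worst_case / 60:.1f} min")
```

Under these assumptions, a shorter interval trades less lost work for more frequent commit overhead.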

About the progress display:

No, it doesn't show when a checkpoint happened / will happen.
I'm a bit reluctant about adding to / changing the progress display. It is rather crowded already due to the amount of information it shows. It needs to fit into 80 char width, and adding anything reduces the space available for the path/filename.
Users often wonder what the info in the progress display means, and adding even more info wouldn't make it easier.

fortran77 · 2019-06-06T20:23:40Z

I think most people would find it useful. I know I would have when I was debugging backup scripts sending data on a limited-speed uplink and using ^C frequently.

It might be only 15 minutes of data on average, but human psychology makes it seem like a lot more when you have an annoying suspicion that you just lost 29 minutes of backup effort. Not knowing makes it seem a lot worse.

The real question perhaps is how much we are willing to give up to get checkpoint status. The current --progress display stays on one line, so it doesn't pollute the screen by filling it with lines of output. On the down side, we only get to see current information, and it's hard to tell how backup speed varies with time.

If there were to be a more verbose display that actually filled the screen with lines of output, then including checkpoint status in that would be a good idea. The possibility of producing such additional verbose output (perhaps with a --verbose option) should be considered. Maybe for the long run.

fortran77 · 2019-06-06T20:33:21Z

The other possibility would be a two-line display instead of one line. I truly do not know what it takes in coding effort to do this. Can one trivially generalize to n lines of status, or does increasing the lines beyond 1 take entirely different code?

ThomasWaldmann · 2019-06-06T20:36:33Z

A 2-line display is not possible with the super-simplistic way of cursor control we use now (just using CR).
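For readers unfamiliar with the CR technique, here is a rough Python sketch of a one-line status display. It is not borg's actual progress code; the names render_status and show_progress and the default width of 79 columns are assumptions made for this example. It shows why a second line is out of reach: a carriage return only moves the cursor back to the start of the current line, never up.

```python
import sys


def render_status(text, width=79):
    """Return one status line: CR, then text cut or padded to width.

    Padding with spaces wipes leftovers from a longer previous status.
    """
    return "\r" + text[:width].ljust(width)


def show_progress(paths, out=sys.stderr, width=79):
    """Overwrite a single terminal line with the path being processed."""
    for count, path in enumerate(paths, start=1):
        out.write(render_status(f"{count} files {path}", width))
        out.flush()
    out.write("\n")  # leave the final status visible


if __name__ == "__main__":
    show_progress(["/etc/hosts", "/home/user/a-rather-long-file-name.txt"])
```

Anything added to the status text competes with the path for the same fixed width, which is the crowding concern raised earlier in the thread.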

fortran77 · 2019-06-06T23:32:56Z

Here are a couple of ideas.

Have a certain signal, e.g., SIGQUIT, set a flag.

Elsewhere, in a suitable loop, when that flag is seen, a checkpoint is done and then the program exits.

Alternatively:

Most of the time when Borg is manually interrupted, the user will resume eventually. So maybe instead of SIGQUIT we just use SIGINT. Then a single ^C should always cause a checkpoint-and-exit as above. A second ^C during the checkpoint can then tell Borg to abort the checkpoint and do an immediate exit.

ThomasWaldmann · 2019-06-07T09:32:17Z

@fortran77 sounds neat, but I guess this could produce quite complicated tracebacks and make analysis more complicated when Ctrl-C is hit in an error/hanging state (because trying to commit then would just error/hang again).

fortran77 · 2019-06-07T19:23:28Z

Direct response is at the end below. My thought process comes first:

My first inclination was to respond by recommending my first idea. I.e., SIGINT causes Borg to exit, while SIGQUIT asks for checkpoint-then-exit. This way nothing changes for the average user who hits ^C when trying to abort a hung backup.

But then I thought: Why make the user have to think about whether or not a checkpoint should be done? We should just try to do the checkpoint on ^C. That way, if a checkpoint is at all possible, one will be done.

The average user will notice if ^C isn't interrupting the backup, and will surely hit ^C a few more times. We are all used to misbehaving processes not exiting, and it's normal for us to try again a few times.

So my second idea seems the better one. If Borg is behaving properly but the network is hung, it will try a checkpoint and remain hung, and the user will hit ^C again and Borg will exit.

If Borg is behaving properly but the backup is just going slowly, the user will hit ^C, a checkpoint will occur, and everything will be fine. If a message such as "Saving checkpoint ..." were printed, the user would know to wait a little for this to complete before hitting ^C again.

Now to respond more directly to your comment: The first ^C sets a flag and disables the ^C signal handler. The flag is detected in a loop elsewhere, causing a checkpoint-then-exit.

So the second ^C acts normally and interrupts whatever is going on. None of the code, except setting the flag, will occur inside a signal handler. So the only effect of the first ^C is to set a flag. All other code executes normally.

If the code is misbehaving, it might not detect the flag at all. The second ^C will cause it to exit. If it does detect the flag and begins a commit, then the second ^C will abort the commit.

So from the user's point of view, the second ^C acts just like the first ^C used to act in the past.
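A minimal Python sketch of this scheme, to make the idea concrete. It is not Borg's code and not necessarily how any later implementation works; the names CheckpointOnInterrupt, backup, process and save_checkpoint are all hypothetical. The handler only sets a flag and restores the previous SIGINT handler, so the second ^C behaves exactly as ^C did before.

```python
import signal


class CheckpointOnInterrupt:
    """First SIGINT sets a flag; the second SIGINT acts as usual."""

    def __init__(self):
        self.requested = False
        self._previous = signal.default_int_handler

    def _handler(self, signum, frame):
        # Nothing but flag-setting happens inside the signal handler.
        self.requested = True
        # Restore the earlier handler so a second ^C interrupts normally.
        signal.signal(signal.SIGINT, self._previous)

    def __enter__(self):
        previous = signal.signal(signal.SIGINT, self._handler)
        if previous is not None:  # None means it was not set from Python
            self._previous = previous
        return self

    def __exit__(self, *exc_info):
        signal.signal(signal.SIGINT, self._previous)
        return False  # never swallow exceptions such as KeyboardInterrupt


def backup(items, process, save_checkpoint):
    """Process items; return False if stopped early after a checkpoint."""
    with CheckpointOnInterrupt() as interrupt:
        for item in items:
            process(item)
            if interrupt.requested:  # the flag is checked in the normal loop
                print("Saving checkpoint ...")
                save_checkpoint()  # a second ^C here aborts the checkpoint
                return False
    return True
```

If the loop hangs and never checks the flag, the restored handler still lets the second ^C raise KeyboardInterrupt as before. Note that Python only allows installing signal handlers from the main thread.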

ThomasWaldmann · 2019-06-08T12:49:00Z

OK, sounds good, guess we'll have to try that.

fortran77 · 2019-06-08T16:13:47Z

Great! I am glad my somewhat long explanation made sense.

I'll start learning Python, just in case.

ThomasWaldmann · 2019-06-22T21:26:22Z

@fortran77 have a look at #4635.

fortran77 · 2019-06-22T21:59:29Z

Looks really good!

Question: Does it do any harm to make the checkpoint logging ('... starting checkpoint creation...' , '... finished checkpoint creation') unconditional, so all checkpoints are logged, not just the ones caused by ^C? This would be useful information.

Also, the average user might otherwise misinterpret the selective logging and think that no checkpoints are done unless ^C is hit.

ThomasWaldmann · 2019-06-22T22:09:44Z

The reason why I added this output is to keep users away from hitting ctrl-c again (at least not for the well-working case), so they don't interrupt the checkpoint creation.

Not sure if I want to output this for the normal checkpoints: on INFO level, it might disturb the file list output (and scroll away quickly); on DEBUG level, one usually would not see it anyway.

fortran77 · 2019-06-22T22:11:32Z

Makes sense, thanks.

ThomasWaldmann · 2019-08-02T18:21:53Z

I had the impression that it did not work correctly in all cases yet, so I have now added some improved state tracking.

@fortran77 can you test the code from the PR while doing some longer borg operations (create and also some others) and hitting ctrl-c at various times?

create has support for "first ctrl-c checkpoint"; other operations shouldn't behave differently than before this PR (except that they now need 2 ctrl-c to abort).

fortran77 · 2019-08-08T08:00:47Z

I'm a bit slow right now, but will check.

ThomasWaldmann · 2019-08-30T19:12:01Z

@fortran77 did you check already?

fortran77 · 2019-08-31T01:28:41Z

My apologies, I've been more-or-less away from the keyboard recently, sorry about that. Will see if I can do the checks in the next 2–3 days.

ThomasWaldmann added this to the hydrogen-alpha7 milestone Jun 8, 2019

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Jun 22, 2019

special behaviour on first ctrl-c, fixes borgbackup#4606

38b9697

like: - try saving a checkpoint if borg create is ctrl-c-ed

ThomasWaldmann self-assigned this Jun 22, 2019

ThomasWaldmann changed the title ~~Feature request: show when checkpoint is done or better still when it is about to be done~~ use ctrl-c to trigger checkpoint&abort Jun 22, 2019

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Aug 2, 2019

special behaviour on first ctrl-c, fixes borgbackup#4606

8d5661b

like: - try saving a checkpoint if borg create is ctrl-c-ed

ThomasWaldmann closed this as completed in 9732fe4 Sep 6, 2019

ThomasWaldmann added a commit that referenced this issue Sep 6, 2019

Merge pull request #4635 from ThomasWaldmann/ctrlc-checkpoint

aa7df50

first ctrl-c: checkpoint and abort, fixes #4606

elho pushed a commit to elho/borg that referenced this issue Mar 17, 2020

special behaviour on first ctrl-c, fixes borgbackup#4606

481bd0b

like: - try saving a checkpoint if borg create is ctrl-c-ed
