use ctrl-c to trigger checkpoint&abort #4606

Closed
fortran77 opened this issue Jun 5, 2019 · 17 comments

@fortran77

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

Issue, really a feature request.

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

1.1.9

Operating system (distribution) and version.

Ubuntu 16.04, Ubuntu 18.04, CentOS 7, OS X 10.13.6.

Hardware / network configuration, and filesystems used.

Various.

How much data is handled by borg?

Varies, kBytes to GBytes.

Full borg commandline that lead to the problem (leave away excludes and passwords)

borg create --progress ...

Describe the problem you're observing.

Borg makes checkpoints at intervals. Suppose we want to interrupt a backup (maybe it's taking too long and we'll try again later). If waiting 5 minutes longer would allow a checkpoint to be created, it's advisable to wait. But Borg doesn't tell us when a checkpoint is coming or when one has been done. So we have no idea when the right time to hit ^C is.

A good solution would be for Borg to print when a checkpoint has been done if the --progress option has been given; and, if possible, also print how much longer to the next checkpoint.
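Assuming checkpoints are purely time-based (the reply below mentions a default interval of 1800 s), the "how much longer to the next checkpoint" part only needs the time of the last checkpoint. A minimal sketch follows. CheckpointTimer and its methods are hypothetical names, not borg internals, and the clock is injectable only so the logic can be tested without waiting:

```python
import time


class CheckpointTimer:
    """Hypothetical sketch: track when the next time-based checkpoint is due."""

    def __init__(self, interval_s=1800, clock=time.monotonic):
        self.interval_s = interval_s
        self.clock = clock   # injectable so tests need not wait in real time
        self.last = clock()  # time of the last checkpoint (or of the start)

    def remaining(self):
        """Seconds until the next checkpoint is due (0 if it is already due)."""
        return max(0.0, self.interval_s - (self.clock() - self.last))

    def due(self):
        return self.remaining() == 0.0

    def mark_done(self):
        """Call right after a checkpoint has been written."""
        self.last = self.clock()

    def status(self):
        """Short text that a one-line progress display could append."""
        return f"next ckpt in {self.remaining() / 60:.0f} min"
```

The main loop would check due() between files, write a checkpoint, then call mark_done(); a progress line could show status().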

In a more fanciful scenario, I can imagine a progress bar with marks showing where checkpoints will occur or have occurred. Like how some video services show you where the ads are. (May be hard to implement as Borg probably does not know how much data will be transmitted in the future.)

Currently, --progress causes a line of output to be shown at the bottom of the screen. The documentation doesn't mention that any part of this output tells us when a checkpoint occurs.

Admittedly, I have not downloaded and tested version 1.1.10 to see if it prints anything different.

A search for the word "checkpoint" in issues, docs, and faq didn't turn up anything relevant.

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

Not applicable.

Include any warning/errors/backtraces from the system logs

Not applicable.

@ThomasWaldmann
Member

Is it really that important?

The default is 1800s (30 mins) IIRC, so on average you'd lose about 15 mins, in the worst case almost 30 mins.

The default used to be shorter (5 mins IIRC) in earlier borg/attic versions, but the overhead of each commit had a big impact, so it was raised to 30 mins for less overhead / better overall speed.

If you expect to interrupt often or to have a rather unstable repo connection, you can reduce the checkpoint interval to less than 30 mins using the --checkpoint-interval commandline option.
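The trade-off described in this reply can be written out as a rough model. It rests on two assumptions not stated in the thread: an interrupt lands at a uniformly random moment between checkpoints, and each checkpoint commit costs a fixed number of seconds (commit_cost_s; the 30 s used below is a made-up figure). checkpoint_tradeoff is an illustrative name, not part of borg:

```python
def checkpoint_tradeoff(interval_s, commit_cost_s):
    """Rough model of the checkpoint interval trade-off (an assumption,
    not borg's own accounting).

    Returns (average_loss_s, worst_loss_s, overhead_fraction):
    - an interrupt lands uniformly at random between two checkpoints, so on
      average half an interval of work is lost, at worst a whole interval;
    - every checkpoint adds commit_cost_s seconds on top of interval_s
      seconds of useful work.
    """
    average_loss_s = interval_s / 2
    worst_loss_s = interval_s
    overhead_fraction = commit_cost_s / (interval_s + commit_cost_s)
    return average_loss_s, worst_loss_s, overhead_fraction


# Compare the old ~5 min default with the 30 min default.
# commit_cost_s=30 is a hypothetical figure, not a measured borg value.
for interval in (300, 1800):
    avg, worst, overhead = checkpoint_tradeoff(interval, commit_cost_s=30)
    print(f"{interval:>5} s: lose ~{avg / 60:.1f} min on average, "
          f"up to {worst / 60:.1f} min; commit overhead {overhead:.1%}")
```

A shorter interval shrinks the expected loss linearly but raises the share of time spent committing, which is the reason given above for moving the default from 5 to 30 minutes.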

About the progress display:

  • No, it doesn't show when a checkpoint happened / will happen.
  • I'm a bit reluctant to add to / change the progress display. It is already rather crowded due to the amount of information it shows. It needs to fit into 80 char width, and adding anything reduces the space available for the path/filename.
  • Users often wonder what the info in the progress display means, and adding even more info wouldn't make it easier.

@fortran77
Author

I think most people would find it useful. I know I would have when I was debugging backup scripts sending data on a limited-speed uplink and using ^C frequently.

It might be only 15 min of data on average, but human psychology makes it seem like a lot more when you have an annoying suspicion that you just lost 29 minutes of backup effort. Not knowing makes it seem a lot worse.

The real question perhaps is how much we are willing to give up to get checkpoint status. The current --progress display stays on one line, so it doesn't pollute the screen by filling it with lines of output. On the down side, we only get to see current information, and it's hard to tell how backup speed varies with time.

If there were to be a more verbose display that actually filled the screen with lines of output, then including checkpoint status in that would be a good idea. The possibility of producing such additional verbose output (perhaps with a --verbose option) should be considered. Maybe for the long run.

@fortran77
Author

The other possibility would be a two-line display instead of one line. I truly do not know what it takes in coding effort to do this. Can one trivially generalize to n lines of status, or does increasing the lines beyond 1 take entirely different code?

@ThomasWaldmann
Member

2 line display is not possible using the super-simplistic way of cursor control we use now (just using CR).
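A small sketch shows why the CR approach limits the display to one line, and why every extra field costs path space. It is hypothetical: format_progress and show_progress are not borg functions, and the stats text is a made-up example. The status is redrawn by writing a carriage return, which only moves the cursor to column 0 of the current line, and the whole line must fit the terminal width:

```python
import shutil
import sys


def format_progress(stats, path, width=80):
    """Fit stats plus a path into one line of at most `width` characters.

    The stats part is kept whole; the path gets whatever room is left and is
    shortened from the left, so every extra stats field costs path space.
    """
    room = width - len(stats) - 1  # -1 for the separating space
    if room <= 0:
        return stats[:width]
    if len(path) > room:
        path = "..." + path[-(room - 3):] if room > 3 else path[-room:]
    return f"{stats} {path}"


def show_progress(stats, path, stream=sys.stderr, width=None):
    """Redraw the single status line in place using only a carriage return."""
    if width is None:
        width = shutil.get_terminal_size((80, 24)).columns
    usable = width - 1  # leave the last column free to avoid an automatic wrap
    line = format_progress(stats, path, usable)
    # "\r" only returns to column 0 of the current line; it cannot move the
    # cursor up, which is why a second status line is out of reach this way.
    stream.write("\r" + line.ljust(usable))  # pad to erase a longer old line
    stream.flush()
```

Drawing two lines would require moving the cursor up a line (for example with terminal escape sequences or a curses-style library), which is a different and less portable kind of cursor control.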

@fortran77
Author

Here are a couple of ideas.

Have a certain signal, e.g., SIGQUIT, set a flag.

Elsewhere, in a suitable loop, when that flag is seen, a checkpoint is done and then the program exits.
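The flag-and-poll idea as a minimal Python sketch. Assumptions: a POSIX system (SIGQUIT, usually Ctrl-\ in a terminal, does not exist on Windows), and CheckpointRequest, backup and save_checkpoint are hypothetical names standing in for borg's real file loop and commit:

```python
import signal


class CheckpointRequest:
    """Hypothetical helper: a signal handler that does nothing but set a flag."""

    def __init__(self):
        self.requested = False

    def handler(self, signum, frame):
        # Keep the handler trivial; the checkpoint itself runs in the main loop.
        self.requested = True

    def install(self, signum=signal.SIGQUIT):
        # SIGQUIT is POSIX-only; must be called from the main thread.
        signal.signal(signum, self.handler)


def backup(items, request, save_checkpoint):
    """Stand-in for the file loop: poll the flag once per item."""
    done = []
    for item in items:
        done.append(item)  # stand-in for chunking and storing one file
        if request.requested:
            save_checkpoint(done)  # commit what has been stored so far...
            break                  # ...then leave the loop and exit
    return done
```

Polling once per file keeps the signal handler free of any repository work; the cost is that the checkpoint starts only after the current file is finished.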

Alternatively:

Most of the time when Borg is manually interrupted, the user will resume eventually. So maybe instead of SIGQUIT we just use SIGINT. Then a single ^C should always cause a checkpoint-and-exit as above. A second ^C during the checkpoint can then tell Borg to abort the checkpoint and do an immediate exit.

@ThomasWaldmann
Member

@fortran77 sounds neat, but I guess this could produce quite complicated tracebacks and make analysis harder when Ctrl-C is hit in an error/hanging state (because trying to commit then would just error/hang again).

@fortran77
Author

fortran77 commented Jun 7, 2019

Direct response is at the end below. My thought process comes first:

My first inclination was to respond by recommending my first idea. I.e., SIGINT causes Borg to exit, while SIGQUIT asks for checkpoint-then-exit. This way nothing changes for the average user who hits ^C when trying to abort a hung backup.

But then I thought: Why make the user have to think about whether or not a checkpoint should be done? We should just try to do the checkpoint on ^C. That way, if a checkpoint is at all possible, one will be done.

The average user will notice if ^C isn't interrupting the backup, and will surely hit ^C a few more times. We are all used to misbehaving processes not exiting, and it's normal for us to try again a few times.

So my second idea seems the better one. If Borg is behaving properly but the network is hung, it will try a checkpoint and remain hung, and the user will hit ^C again and Borg will exit.

If Borg is behaving properly but the backup is just going slowly, the user will hit ^C, a checkpoint will occur, and everything will be fine. If a message such as "Saving checkpoint ..." is printed, the user will know to wait a little for the checkpoint to complete before hitting ^C again.

Now to respond more directly to your comment: The first ^C sets a flag and disables the ^C signal handler. The flag is detected in a loop elsewhere, causing a checkpoint-then-exit.

So the second ^C acts normally and interrupts whatever is going on. None of the code, except setting the flag, will occur inside a signal handler. So the only effect of the first ^C is to set a flag. All other code executes normally.

If the code is misbehaving, it might not detect the flag at all. The second ^C will cause it to exit. If it does detect the flag and begins a commit, then the second ^C will abort the commit.

So from the user's point of view, the second ^C acts just like the first ^C used to act in the past.
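The two-stage behaviour described in this comment can be sketched with Python's signal module. This is a hypothetical illustration, not the code of the eventual borg change: TwoStageInterrupt and checkpoint_then_exit are made-up names, and commit stands in for writing the checkpoint. The first SIGINT only sets a flag and reinstalls Python's default handler, so a second Ctrl-C raises KeyboardInterrupt as before, even in the middle of the commit:

```python
import signal


class TwoStageInterrupt:
    """Hypothetical sketch of the proposed first-Ctrl-C behaviour.

    First SIGINT:  only set a flag, then restore Python's default handler.
    Second SIGINT: the default handler raises KeyboardInterrupt, i.e. the
                   old "abort immediately" behaviour, even mid-commit.
    """

    def __init__(self):
        self.checkpoint_and_exit = False

    def handler(self, signum, frame):
        self.checkpoint_and_exit = True
        # Disable ourselves so the next Ctrl-C acts like it always did.
        signal.signal(signal.SIGINT, signal.default_int_handler)

    def install(self):
        # Must be called from the main thread.
        signal.signal(signal.SIGINT, self.handler)


def checkpoint_then_exit(state, commit):
    """Called from the main loop between files; True means the caller should stop."""
    if not state.checkpoint_and_exit:
        return False
    print("Saving checkpoint ...")  # tells the user to wait before pressing Ctrl-C again
    commit()  # a second Ctrl-C during this call raises KeyboardInterrupt
    print("Checkpoint saved.")
    return True
```

If the program is hung and never reaches checkpoint_then_exit, the flag is simply ignored and the second Ctrl-C still aborts, which matches the "misbehaving code" case above.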

@ThomasWaldmann
Copy link
Member

OK, sounds good, guess we'll have to try that.

ThomasWaldmann added this to the hydrogen-alpha7 milestone Jun 8, 2019
@fortran77
Author

Great! I am glad my somewhat long explanation made sense.

I'll start learning Python, just in case.

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Jun 22, 2019
like:
 - try saving a checkpoint if borg create is ctrl-c-ed
ThomasWaldmann self-assigned this Jun 22, 2019
ThomasWaldmann changed the title from "Feature request: show when checkpoint is done or better still when it is about to be done" to "use ctrl-c to trigger checkpoint&abort" Jun 22, 2019
@ThomasWaldmann
Member

@fortran77 have a look at #4635.

@fortran77
Author

Looks really good!

Question: Does it do any harm to make the checkpoint logging ('... starting checkpoint creation...', '... finished checkpoint creation') unconditional, so that all checkpoints are logged, not just the ones caused by ^C? This would be useful information.

Also, the average user might otherwise misinterpret the selective logging and think that no checkpoints are done unless ^C is hit.

@ThomasWaldmann
Member

The reason I added this output is to keep users from hitting ctrl-c again (at least in the well-working case), so they don't interrupt the checkpoint creation.

Not sure if I want to output this for the normal checkpoints: on INFO level, it might disturb the file list output (and scroll away quickly); on DEBUG level, one usually would not see it anyway.
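The INFO-vs-DEBUG choice discussed here can be expressed with Python's standard logging module. This is a hypothetical sketch, not borg's logging code: log_checkpoint and the logger name are made up, and the message text only approximates the wording quoted above. Ctrl-C-triggered checkpoints are logged at INFO so the user sees them and waits; routine ones go to DEBUG so they stay out of the file list:

```python
import logging

logger = logging.getLogger("checkpoint_demo")  # hypothetical logger name


def log_checkpoint(phase, user_requested):
    """Log the start/finish of a checkpoint.

    Ctrl-C-triggered checkpoints use INFO (visible by default), routine
    time-based ones use DEBUG (visible only with debug logging enabled).
    """
    level = logging.INFO if user_requested else logging.DEBUG
    logger.log(level, "%s checkpoint creation", phase)
```

With this split, enabling debug logging would reveal every checkpoint, while a normal run shows only the ones the user asked for.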

@fortran77
Author

Makes sense, thanks.

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Aug 2, 2019
like:
 - try saving a checkpoint if borg create is ctrl-c-ed
@ThomasWaldmann
Member

I had the impression that it did not yet work correctly in all cases, so I added some improved state tracking now.

@fortran77 can you test the code from the PR while doing some longer borg operations (create and also some others) and hitting ctrl-c at various times?

create supports "first ctrl-c checkpoint"; other operations shouldn't behave differently than before this PR (except that they now need 2 ctrl-c to abort).

@fortran77
Author

I'm a bit slow right now, but will check.

@ThomasWaldmann
Member

@fortran77 did you check already?

@fortran77
Author

My apologies, I've been more-or-less away from the keyboard recently, sorry about that. Will see if I can do the checks in the next 2–3 days.

ThomasWaldmann added a commit that referenced this issue Sep 6, 2019
first ctrl-c: checkpoint and abort, fixes #4606
elho pushed a commit to elho/borg that referenced this issue Mar 17, 2020
like:
 - try saving a checkpoint if borg create is ctrl-c-ed

2 participants