use ctrl-c to trigger checkpoint&abort #4606
Is it really that important? The default checkpoint interval is 1800 s (30 min) IIRC, so on average you'd lose about 15 min of work, and in the worst case almost 30 min. The default used to be shorter (5 min IIRC) in earlier borg/attic versions, but the overhead of each commit had a big impact, so it was raised to 30 min to reduce overhead and improve overall speed. If you expect to interrupt often or have a rather unstable repo connection, you can shorten the checkpoint interval with the --checkpoint-interval command-line option. About the progress display:
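The arithmetic above as a small Python sketch. It assumes checkpoints land exactly every interval_s seconds and that an interrupt is equally likely at any second; work_lost and average_work_lost are illustrative names, not borg functions.

```python
def work_lost(elapsed_s: int, interval_s: int = 1800) -> int:
    """Seconds of work done since the most recent periodic checkpoint."""
    return elapsed_s % interval_s


def average_work_lost(interval_s: int = 1800) -> float:
    """Mean loss when the interrupt is equally likely at any whole second."""
    losses = [work_lost(t, interval_s) for t in range(interval_s)]
    return sum(losses) / len(losses)


print(work_lost(2700))         # 900: interrupted 45 min in, 15 min lost
print(work_lost(3599))         # 1799: just before the 2nd checkpoint
print(average_work_lost())     # 899.5: about half the 30 min interval
print(average_work_lost(600))  # 299.5: with --checkpoint-interval 600
```

Shortening the interval shrinks both the average and the worst-case loss proportionally, at the cost of more commits.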
I think most people would find it useful. I know I would have when I was debugging backup scripts sending data over a limited-speed uplink and using ^C frequently. It might be only 15 min of data on average, but human psychology makes it seem like a lot more when you have an annoying suspicion that you just lost 29 minutes of backup effort; not knowing makes it seem much worse. The real question is perhaps how much we are willing to give up to get checkpoint status. The current --progress display stays on one line, so it doesn't pollute the screen with lines of output. On the downside, we only see current information, and it's hard to tell how backup speed varies over time. If there were a more verbose display that actually filled the screen with lines of output, including checkpoint status in it would be a good idea. The possibility of producing such additional verbose output (perhaps with a --verbose option) should be considered, maybe in the long run.
The other possibility would be a two-line display instead of one line. I truly do not know how much coding effort this would take. Can one trivially generalize to n lines of status, or does going beyond 1 line take entirely different code?
A 2-line display is not possible with the super-simplistic cursor control we use now (just CR).
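To illustrate the difference: with CR alone, a program can only return to column 0 of the current line and overwrite it. Going to n lines needs a way to move the cursor up, e.g. the ANSI escape ESC[nA (cursor up) together with ESC[K (clear to end of line). The sketch below assumes an ANSI-capable terminal, a fixed number of status lines, and lines shorter than the terminal width (wrapping would throw off the cursor math); render_one_line and render_lines are hypothetical names.

```python
import io


def render_one_line(out, status: str, width: int = 79) -> None:
    """CR-only update: jump to column 0 and overwrite the current line.

    Padding with spaces hides leftovers from a longer previous status.
    """
    out.write("\r" + status.ljust(width))


def render_lines(out, lines: list, first: bool) -> None:
    """Overwrite an n-line status block in place using ANSI escapes.

    The cursor is left at the end of the last line, so on later calls we
    move up len(lines) - 1 lines, return to column 0, and redraw every
    line, clearing each one first with ESC[K.
    """
    if not first and len(lines) > 1:
        out.write(f"\x1b[{len(lines) - 1}A")  # cursor up n-1 lines
    out.write("\r" + "\n".join("\x1b[K" + line for line in lines))


buf = io.StringIO()
render_lines(buf, ["files: 120", "last checkpoint: 12 min ago"], first=True)
render_lines(buf, ["files: 135", "last checkpoint: 13 min ago"], first=False)
print(repr(buf.getvalue()))
```

Once cursor-up is available, n lines cost about the same as 2; the harder part is output that does not interpret escape sequences, such as redirected logs, where they show up as garbage.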
Here are a couple of ideas. First: have a certain signal, e.g. SIGQUIT, set a flag. Elsewhere, in a suitable loop, when that flag is seen, a checkpoint is done and then the program exits. Alternatively: most of the time when Borg is manually interrupted, the user will eventually resume. So maybe instead of SIGQUIT we just use SIGINT. Then a single ^C would always cause a checkpoint-and-exit as above. A second ^C during the checkpoint could then tell Borg to abort the checkpoint and exit immediately.
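The first idea can be sketched with Python's standard signal module. This is a minimal sketch, not borg code: InterruptFlag and backup are hypothetical names, SIGINT is used (SIGQUIT would work the same way on POSIX systems), and signal.raise_signal (Python 3.8+) simulates the key press. The handler only records the request; the loop checks the flag between items, where stopping is safe.

```python
import signal


class InterruptFlag:
    """Hypothetical helper: while active, a signal only sets a flag."""

    def __init__(self, signum: int = signal.SIGINT):
        self.signum = signum
        self.requested = False
        self._previous = None

    def _handler(self, signum, frame):
        self.requested = True  # no real work inside the handler

    def __enter__(self):
        self._previous = signal.signal(self.signum, self._handler)
        return self

    def __exit__(self, *exc_info):
        # Put back whatever handler was installed before (SIG_DFL if unknown).
        previous = self._previous if self._previous is not None else signal.SIG_DFL
        signal.signal(self.signum, previous)
        return False


def backup(items, interrupt_before=None):
    """Process items one by one; after a ^C, stop and report a checkpoint.

    `interrupt_before` simulates the user pressing ^C just before that index.
    Returns (items processed, whether a checkpoint was taken).
    """
    done = []
    with InterruptFlag() as flag:
        for i, item in enumerate(items):
            if i == interrupt_before:
                signal.raise_signal(signal.SIGINT)  # simulated ^C
            if flag.requested:
                # A real tool would write its checkpoint here, then exit.
                return done, True
            done.append(item)
    return done, False


print(backup(["a", "b", "c", "d"], interrupt_before=2))  # (['a', 'b'], True)
print(backup(["a", "b", "c"]))                           # (['a', 'b', 'c'], False)
```

Keeping the handler down to one assignment means the interesting work (writing the checkpoint) never happens inside the handler itself; it runs at a point the loop chose.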
@fortran77 sounds neat, but I guess this could produce quite complicated tracebacks and make analysis harder when Ctrl-C is hit in an error/hanging state (because trying to commit then would just error/hang again).
Direct response is at the end below; my thought process comes first. My first inclination was to recommend my first idea, i.e., SIGINT causes Borg to exit, while SIGQUIT asks for checkpoint-then-exit. This way nothing changes for the average user who hits ^C when trying to abort a hung backup. But then I thought: why make the user think about whether or not a checkpoint should be done? We should just try to do the checkpoint on ^C. That way, if a checkpoint is at all possible, one will be done. The average user will notice if ^C isn't interrupting the backup, and will surely hit ^C a few more times. We are all used to misbehaving processes not exiting, and it's normal to try again a few times. So my second idea seems the better one. If Borg is behaving properly but the network is hung, it will try a checkpoint and remain hung; the user will hit ^C again and Borg will exit. If Borg is behaving properly but the backup is just going slowly, the user will hit ^C, a checkpoint will occur, and everything will be fine. If a message such as "Saving checkpoint ..." could be printed, the user would know to wait a little for it to complete before hitting ^C again. Now to respond more directly to your comment: the first ^C sets a flag and disables the ^C signal handler. The flag is detected in a loop elsewhere, causing a checkpoint-then-exit, so the second ^C acts normally and interrupts whatever is going on. None of the code, except setting the flag, runs inside a signal handler, so the only effect of the first ^C is to set a flag; all other code executes normally. If the code is misbehaving, it might not detect the flag at all, and the second ^C will make it exit. If it does detect the flag and begins a commit, the second ^C will abort the commit. So from the user's point of view, the second ^C acts just like the first ^C used to.
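A minimal Python sketch of the mechanism just described, assuming it is built on the standard signal module; the class and function names are hypothetical and not taken from borg or #4635. Instead of removing the handler outright, this sketch re-installs signal.default_int_handler, which is what makes the second ^C raise KeyboardInterrupt as usual. signal.raise_signal (Python 3.8+) stands in for the user's key presses, and save_checkpoint is only a placeholder for a slow commit.

```python
import signal


class FirstCtrlCRequestsCheckpoint:
    """Hypothetical helper: the first SIGINT only sets a flag.

    The handler re-installs Python's default SIGINT handler, so a second ^C
    raises KeyboardInterrupt wherever the program happens to be, e.g. in
    the middle of a slow commit.
    """

    def __init__(self):
        self.requested = False
        self._previous = None

    def _handler(self, signum, frame):
        self.requested = True
        signal.signal(signal.SIGINT, signal.default_int_handler)

    def __enter__(self):
        self._previous = signal.signal(signal.SIGINT, self._handler)
        return self

    def __exit__(self, *exc_info):
        previous = self._previous if self._previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGINT, previous)
        return False


def save_checkpoint(second_ctrl_c: bool) -> str:
    """Placeholder for a slow commit; optionally simulate a second ^C midway."""
    if second_ctrl_c:
        signal.raise_signal(signal.SIGINT)  # impatient user presses ^C again
    return "checkpoint saved"


def run(second_ctrl_c: bool) -> str:
    with FirstCtrlCRequestsCheckpoint() as guard:
        signal.raise_signal(signal.SIGINT)  # first ^C: only sets the flag
        if guard.requested:  # the main loop notices the flag
            print("Saving checkpoint ...")
            try:
                return save_checkpoint(second_ctrl_c)
            except KeyboardInterrupt:
                # A real tool would exit immediately here.
                return "checkpoint aborted"
    return "finished normally"


print(run(second_ctrl_c=False))  # prints "Saving checkpoint ..." then "checkpoint saved"
print(run(second_ctrl_c=True))   # prints "Saving checkpoint ..." then "checkpoint aborted"
```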
OK, sounds good, guess we'll have to try that.
Great! I am glad my somewhat long explanation made sense. I'll start learning Python, just in case.
like: - try saving a checkpoint if borg create is ctrl-c-ed
@fortran77 have a look at #4635.
Looks really good! Question: does it do any harm to make the checkpoint logging ('... starting checkpoint creation...', '... finished checkpoint creation') unconditional, so all checkpoints are logged, not just the ones caused by ^C? This would be useful information. Also, the average user might otherwise misinterpret the selective logging and think that no checkpoints are done unless ^C is hit.
The reason I added this output is to keep users from hitting ctrl-c again (at least in the well-working case), so they don't interrupt the checkpoint creation. Not sure if I want to output this for the normal checkpoints: on INFO level, it might disturb the file list output (and scroll away quickly); on DEBUG level, one usually would not see it anyway.
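One way to implement the trade-off described here, sketched with Python's standard logging module: ctrl-c-triggered checkpoints log at INFO, periodic ones at DEBUG. This is an assumed design for illustration, not what #4635 does; log_checkpoint and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("checkpoint-sketch")  # hypothetical logger name


def log_checkpoint(triggered_by_user: bool, write_checkpoint) -> None:
    """Wrap a checkpoint write with start/finish log messages.

    User-triggered checkpoints log at INFO, so the person who just pressed
    ctrl-c sees them; periodic ones log at DEBUG, so they stay out of the
    file list that INFO-level output shows.
    """
    level = logging.INFO if triggered_by_user else logging.DEBUG
    logger.log(level, "starting checkpoint creation...")
    write_checkpoint()
    logger.log(level, "finished checkpoint creation")


logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log_checkpoint(True, lambda: None)   # both messages are shown at INFO level
log_checkpoint(False, lambda: None)  # hidden unless DEBUG output is enabled
```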
Makes sense, thanks.
Had the impression that it did not work correctly yet in all cases, so I added some improved state tracking now. @fortran77 can you test the code from the PR while doing some longer borg operations (create and also some others) and hitting ctrl-c at various times?
I'm a bit slow right now, but will check.
@fortran77 did you check already?
My apologies, I've been more-or-less away from the keyboard recently, sorry about that. Will see if I can do the checks in the next 2–3 days.
first ctrl-c: checkpoint and abort, fixes #4606
Have you checked borgbackup docs, FAQ, and open Github issues?
Yes
Is this a BUG / ISSUE report or a QUESTION?
Issue, really a feature request.
System information. For client/server mode post info for both machines.
Your borg version (borg -V).
1.1.9
Operating system (distribution) and version.
Ubuntu 16.04, Ubuntu 18.04, CentOS 7, OS X 10.13.6.
Hardware / network configuration, and filesystems used.
Various.
How much data is handled by borg?
Varies, kBytes to GBytes.
Full borg commandline that led to the problem (leave out excludes and passwords)
Describe the problem you're observing.
Borg makes checkpoints at intervals. Suppose we want to interrupt a backup (maybe it's taking too long and we'll try again later). If waiting 5 more minutes would allow a checkpoint to be created, it's advisable to wait. But Borg doesn't tell us when a checkpoint is coming or when one has been done, so we have no idea when the right time to hit ^C is.
A good solution would be for Borg to print when a checkpoint has been done if the --progress option is given and, if possible, also print how long it is until the next checkpoint.
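A sketch of the requested status text in Python. The function names and wording are made up for illustration and are not existing borg output; it assumes the tool records when its last checkpoint was written and knows its checkpoint interval.

```python
def fmt_mmss(seconds: float) -> str:
    """Format a duration as M:SS (fractions of a second are dropped)."""
    seconds = int(seconds)
    return f"{seconds // 60}:{seconds % 60:02d}"


def checkpoint_status(now: float, last_checkpoint: float, interval_s: float = 1800) -> str:
    """Short suffix for a --progress line: time since / until checkpoints.

    Treats the next checkpoint as due `interval_s` seconds after the last
    one, which is only an estimate of when it will actually be written.
    """
    since = now - last_checkpoint
    remaining = interval_s - since
    if remaining <= 0:
        return f"last checkpoint {fmt_mmss(since)} ago, next one due now"
    return f"last checkpoint {fmt_mmss(since)} ago, next in ~{fmt_mmss(remaining)}"


print(checkpoint_status(now=1000, last_checkpoint=100))
# last checkpoint 15:00 ago, next in ~15:00
print(checkpoint_status(now=2000, last_checkpoint=100))
# last checkpoint 31:40 ago, next one due now
```

Appending such a suffix to the existing one-line display would keep the single-line format while answering "is now a good time to hit ^C?".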
In a more fanciful scenario, I can imagine a progress bar with marks showing where checkpoints will occur or have occurred, like how some video services show you where the ads are. (This may be hard to implement, as Borg probably does not know in advance how much data will be transmitted.)
Currently, --progress causes a line of output to be shown at the bottom of the screen. The documentation doesn't mention that any part of this output tells us when a checkpoint occurs.
Admittedly, I have not downloaded and tested version 1.1.10 to see if it prints anything different.
A search for the word "checkpoint" in issues, docs, and faq didn't turn up anything relevant.
Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.
Not applicable.
Include any warning/errors/backtraces from the system logs
Not applicable.