
Added resticprofile flags --no-lock & --lock-wait #33

Merged: 9 commits from the ft-lock-wait branch into creativeprojects:master on Apr 23, 2021

Conversation

@jkellerer (Collaborator) commented Feb 14, 2021

This draft PR adds 2 resticprofile flags (plus schedule & global configs) to control locking behaviour (profile locks & restic remote locks).

@creativeprojects, let me know whether this is acceptable. If so, I'll test and document it.
Currently I'm not using the built-in locking because I'm missing these 2 features.

Added flags:

  • --no-lock: Ignores any defined locks.
    Motivation: Not all tasks really need a lock, e.g. I usually run resticprofile --name $PROFILE_NAME unlock as a pre-run command to ensure stale locks are removed. With profile locks this only works when locks can be ignored.

  • --lock-wait: Wait up to the specified duration to acquire a lock before failing.
    Motivation: When triggering operations that should use a lock, one can't always avoid concurrency (e.g. in schedules that may take longer than expected). However, it should be configurable whether concurrency causes an immediate failure or only fails after some time has passed without the conflict resolving. (Example invocations follow below.)
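
As an illustration (hypothetical profile name; durations chosen arbitrarily, not taken from this PR), the new flags could be combined with existing commands like this:

  # wait up to 30 minutes for profile and repository locks before giving up
  resticprofile --name my-profile --lock-wait 30m backup

  # run unlock as a pre-run/maintenance step without honouring the profile lock
  resticprofile --name my-profile --no-lock unlock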

Added schedule config:

  • schedule-lock-mode (default | fail | ignore): Controls whether scheduled runs set the --lock-wait flag (default) or the --no-lock flag (ignore).
  • schedule-lock-wait: Sets the duration passed to --lock-wait when schedule-lock-mode is default.

Added global config:

  • restic-lock-retry-after: Sets the delay before retrying restic commands that fail to acquire a repository lock. A value of 0 disables handling of remote lock failures.
  • restic-stale-lock-age: Sets the age above which a repository lock is considered stale. Stale locks are not waited upon; instead restic unlock is tried once (when force-inactive-lock is also set in the profile). A value of 0 disables stale lock handling. (An example configuration follows below.)
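
To make the new options concrete, here is a minimal sketch of a YAML configuration combining them. The option names are the ones listed above; the profile name, repository, schedule, and chosen durations are made up, and placing the schedule options next to the backup section's schedule keys is an assumption rather than something stated in this PR:

  global:
    restic-lock-retry-after: 1m      # retry restic commands that failed on a repository lock
    restic-stale-lock-age: 2h        # repository locks older than this are treated as stale

  my-profile:                        # hypothetical profile name
    repository: "/srv/restic/shared-repo"          # hypothetical repository
    lock: "/tmp/resticprofile.shared-repo.lock"    # local (profile) lock file
    force-inactive-lock: true        # allow a single restic unlock attempt on stale locks

    backup:
      schedule: "hourly"             # hypothetical schedule
      schedule-lock-mode: default    # default | fail | ignore
      schedule-lock-wait: 45m        # duration passed to --lock-wait in default mode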

Status

  • Implemented drafts
  • Test config changes
  • Test profile lock handling
  • Test restic lock handling

@sonarcloud (bot) commented Feb 14, 2021

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 1 (rating A)

No coverage information
Duplication: 0.0%

@codecov (bot) commented Feb 14, 2021

Codecov Report

Merging #33 (35e9a1f) into master (d64dc4f) will increase coverage by 1.34%.
The diff coverage is 75.26%.

@@            Coverage Diff             @@
##           master      #33      +/-   ##
==========================================
+ Coverage   53.48%   54.83%   +1.34%     
==========================================
  Files          49       50       +1     
  Lines        3255     3478     +223     
==========================================
+ Hits         1741     1907     +166     
- Misses       1359     1394      +35     
- Partials      155      177      +22     
Impacted Files Coverage Δ
main.go 0.44% <0.00%> (-0.02%) ⬇️
schedule_jobs.go 3.52% <0.00%> (-0.23%) ⬇️
config/flag.go 78.57% <54.54%> (-9.90%) ⬇️
shell/command.go 66.66% <70.00%> (ø)
wrapper.go 73.93% <78.57%> (+2.45%) ⬆️
shell/analyser.go 80.00% <80.00%> (ø)
config/config.go 63.63% <100.00%> (ø)
config/global.go 100.00% <100.00%> (ø)
config/profile.go 90.32% <100.00%> (+0.15%) ⬆️
config/schedule.go 86.84% <100.00%> (+4.69%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d64dc4f...35e9a1f.

@creativeprojects (Owner) commented

Yeah, I don't particularly need it the way I use restic. But if you need it in your workflow, please go ahead 😃

@creativeprojects (Owner) commented

Actually I had this issue recently, although this PR wouldn't have solved it:

I have 2 central backup machines where all my servers push new files every night. One of the servers had a big upgrade and pushed a lot more than usual, and then all the other nightly backups failed.

I think the "wait for lock" option could also check the restic lock regularly and wait until it's free to go. If possible. Maybe.

@jkellerer (Collaborator, Author) commented

Yes, I have this as a separate pre-run script that checks the restic lock (and waits for it and/or removes stale locks).
That's also on my list for an additional PR.

Sorry for the delay, I'll try to finish this PR next week and will then also check how to integrate restic lock handling.

@creativeprojects (Owner) commented

> Yes, I have this as a separate pre-run script that checks the restic lock (and waits for it and/or removes stale locks).
> That's also on my list for an additional PR.

Excellent 👍🏻

> Sorry for the delay, I'll try to finish this PR next week and will then also check how to integrate restic lock handling.

Don't worry, I've also been busy on other things lately 😄

@jkellerer (Collaborator, Author) commented Apr 11, 2021

This PR has been updated to include the missing ideas, namely the schedule config and restic lock handling (stale locks and lock wait). I still need to test and review the changes myself, and rewrite the output capturing since I missed the latest changes in master :).

"--lock-wait" also waits on non-stale restic remote locks
Stale locks are unlocked when "force-inactive-locks" is true
Uses global section to configure the behaviour (can also be disabled)
@jkellerer force-pushed the ft-lock-wait branch 3 times, most recently from 3f90b19 to fb2d0c6 on April 18, 2021 13:44
@jkellerer (Collaborator, Author) commented

Unit tests have been written and the code has been cleaned up.
I'm testing it now on 2 similar servers with overlapping schedules, with my previous shell pre-run script for lock-wait removed.

@jkellerer (Collaborator, Author) commented

I had several profiles scheduled hourly on 2 servers pointing to the same repo, resulting in several hundred backups over the past few days. There was lots of log output like this:

2021/04/20 00:23:12 lock wait (remaining 22m2s / elapsed 22m58s): rest:https://**:**@backup-server:8080/shared-repo/ locked by PID 8475 on app-serv-02 by root (UID 0, GID 0)
2021/04/20 00:15:17 lock wait (remaining 29m57s / elapsed 15m3s): /tmp/resticprofile.shared-repo.lock locked by root on Tuesday, 20-Apr-21 00:00:03 CEST from app-serv-01

... but not a single failure; all scheduled jobs succeeded.
The profile configuration was adjusted to contain schedule-lock-wait: 45m and lock: /tmp/file (a sketch of such a profile follows below).
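
For reference, a minimal sketch of what that test profile could look like (the profile name is made up, the lock path is taken from the log lines above, and placing schedule-lock-wait inside the backup section is an assumption):

  shared-repo:                                     # hypothetical profile name
    lock: "/tmp/resticprofile.shared-repo.lock"    # local profile lock
    backup:
      schedule-lock-wait: 45m                      # wait up to 45 minutes for locks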

@creativeprojects: I'm removing the WIP flag as I think it is ready for review now.

@jkellerer marked this pull request as ready for review on April 20, 2021 19:58
@jkellerer changed the title from "WIP: Added resticprofile flags --no-lock & --lock-wait" to "Added resticprofile flags --no-lock & --lock-wait" on Apr 20, 2021
@creativeprojects (Owner) commented

Excellent!

I'll study it in the next few days 👍🏻

Thanks

(A review comment on wrapper.go was marked outdated and resolved.)
@sonarcloud (bot) commented Apr 23, 2021

SonarCloud Quality Gate failed.

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 3 (rating A)

No coverage information
Duplication: 3.3%

@creativeprojects (Owner) commented

> 2021/04/20 00:23:12 lock wait (remaining 22m2s / elapsed 22m58s): rest:https://**:**@backup-server:8080/shared-repo/ locked by PID 8475 on app-serv-02 by root (UID 0, GID 0)
> 2021/04/20 00:15:17 lock wait (remaining 29m57s / elapsed 15m3s): /tmp/resticprofile.shared-repo.lock locked by root on Tuesday, 20-Apr-21 00:00:03 CEST from app-serv-01

The message I'm using for the local lock is completely different from the restic one. I never noticed that before and it does look stupid 😆 (I might want to change that at some point.)

Anyway, I tried a few closely overlapping backups overnight and it worked perfectly fine, so I'm merging the PR.
I'll keep it running for a few days and release a new version later.

Thank you very much, it works nicely and it's going to be very useful 👍🏻

@creativeprojects merged commit 331b710 into creativeprojects:master on Apr 23, 2021
@jkellerer deleted the ft-lock-wait branch on April 24, 2021 06:16
@renard commented Oct 25, 2022

Coming late to the party.

Wouldn't it be useful to have options such as --no-lock available in the configuration file?
For example, I have a read-only repository on a read-only remote ZFS volume, so locks cannot be created.

I would prefer having a global lock-mode similar to schedule-lock-mode instead of having to remember to add the --no-lock flag (which is still useful, don't get me wrong). Same for --lock-wait.

In the same way, wouldn't it be nice to have an extra-flags key for commands, to allow users to pass extra arguments to restic?

@jkellerer (Collaborator, Author) commented

Hi @renard, it looks like you are referring to restic's no-lock flag. You can simply add no-lock: true to your profile; any flag that has no special meaning for resticprofile is passed on to restic. (See the sketch below.)
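
For example, a minimal profile sketch (the profile name and repository path are made up) where restic is always invoked with --no-lock:

  ro-profile:                                    # hypothetical profile name
    repository: "/mnt/readonly-zfs/restic-repo"  # hypothetical read-only repository
    no-lock: true                                # passed through to restic as --no-lock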

@renard commented Oct 27, 2022

Ah yes, that's it. We should be careful about that: the no-lock key MUST be true.

Looks like other variants are not taken into account, i.e. you cannot use yes instead of true.

Some implementations allow yes/no, on/off, or 0/1, but they are not part of the YAML standard.

This is still unclear to me. However, specifying true works fine. Thanks again for the clarification.

EDIT: it looks like the boolean syntax changed between YAML 1.1 and 1.2: https://yaml.org/spec/1.1/#id864510
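
As a small illustration (how the value is interpreted ultimately depends on the YAML version the parser implements):

  no-lock: true   # a boolean in both YAML 1.1 and the YAML 1.2 core schema
  no-lock: yes    # a boolean in YAML 1.1, but just the string "yes" under a strict YAML 1.2 parser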
