Borg Deadlocked #813

Closed
rumpelsepp opened this issue Mar 30, 2016 · 26 comments

@rumpelsepp
Contributor

Last night all my backups failed due to some kind of deadlock. I have three servers: one holds the borg repository, and the other two push their backups into that same repository. I do hourly backups; the diff between hourly backups is usually only about 9 MB, so they are normally very fast. The setup worked well for about one week.

To avoid races, I schedule my hourly cron job with sleep $(jot -r 1 1 600) && backup.sh, which adds a random delay of up to 10 minutes. Additionally, I have set the --lock-wait flag to 1800. Last night I got the following emails:

Server:

Remote: Borg 1.0.0: exception in RPC call:
Remote: Traceback (most recent call last):
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/remote.py", line 94, in serve
Remote:     res = f(*args)
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 429, in put
Remote:     self.prepare_txn(self.get_transaction_id())
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 178, in prepare_txn
Remote:     self.lock.upgrade()
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 298, in upgrade
Remote:     self.acquire(exclusive=True, remove=SHARED)
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 257, in acquire
Remote:     self._wait_for_readers_finishing(remove, sleep)
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 286, in _wait_for_readers_finishing
Remote:     raise LockTimeout(self.path)
Remote: borg.locking.LockTimeout: /var/borg/backup/lock
Remote: Platform: FreeBSD korriban 10.2-RELEASE-p9 FreeBSD 10.2-RELEASE-p9 #0: Thu Jan 14 01:32:46 UTC 2016     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 amd64
Remote: Borg: 1.0.0  Python: CPython 3.4.4
Remote: PID: 11415  CWD: /var/borg
Remote: sys.argv: ['/usr/local/bin/borg', 'serve', '--umask=077']
Remote: SSH_ORIGINAL_COMMAND: None
Remote: 
Remote Exception (see remote log for the traceback)
Platform: FreeBSD coruscant 10.2-RELEASE-p14 FreeBSD 10.2-RELEASE-p14 #0: Wed Mar 16 20:46:12 UTC 2016     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 amd64
Borg: 1.0.0  Python: CPython 3.4.4
PID: 59110  CWD: /root
sys.argv: ['/usr/local/bin/borg', 'create', '-C', 'lz4', '--lock-wait', '1800', 'borg@bla.tld:/var/borg/backup::coruscant-2016-03-30-010210-CEST', '/usr/home', '/etc', '/usr/local/etc']
SSH_ORIGINAL_COMMAND: None

Client 1

Failed to create/acquire the lock /root/.cache/borg/018468490212573b7a255fdbeadc052964eb2fee8e0446b79abc7fead592bfb2/lock.exclusive (timeout).

Client 2

Failed to create/acquire the lock /root/.cache/borg/018468490212573b7a255fdbeadc052964eb2fee8e0446b79abc7fead592bfb2/lock.exclusive (timeout).

The server still has too many borg processes running:

$ ps aux | grep borg
borg      11410   1,5  0,9 153684  38712  -  Ss    1:02am      5:18,73 /usr/local/bin/python3.4 /usr/local/bin/borg serve --umask=077
root      11407   0,0  0,2  86496   7468  -  Is    1:02am      0:00,05 sshd: borg [priv] (sshd)
borg      11409   0,0  0,2  86496  10304  -  I     1:02am      0:00,24 sshd: borg@notty (sshd)
root      25779   0,0  0,2  86496   7460  -  Is    5:08am      0:00,05 sshd: borg [priv] (sshd)
borg      25781   0,0  0,2  86496   7596  -  I     5:08am      0:00,01 sshd: borg@notty (sshd)
borg      25782   0,0  0,9 149588  38024  -  Is    5:08am      2:22,57 /usr/local/bin/python3.4 /usr/local/bin/borg serve --umask=077
root      35819   0,0  0,2  86496   7468  -  Is    8:00am      0:00,05 sshd: borg [priv] (sshd)
borg      35821   0,0  0,2  86496   7480  -  I     8:00am      0:00,01 sshd: borg@notty (sshd)
borg      35822   0,0  0,9 149332  37648  -  Ss    8:00am      0:01,22 /usr/local/bin/python3.4 /usr/local/bin/borg serve --umask=077
root      36103   0,0  0,2  86496   7460  -  Is    8:08am      0:00,05 sshd: borg [priv] (sshd)
borg      36105   0,0  0,2  86496   7472  -  I     8:08am      0:00,01 sshd: borg@notty (sshd)
borg      36106   0,0  0,9 149332  37632  -  Ss    8:08am      0:01,22 /usr/local/bin/python3.4 /usr/local/bin/borg serve --umask=077
stefan    37298   0,0  0,1  18824   2564  1  S+    8:28am      0:00,01 grep borg

It seems that it deadlocked very badly. Any ideas what went wrong? Is this a bug or a misconfiguration?
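
For anyone reproducing this setup, here is a minimal Python sketch of the scheduling described at the top of this report: a random delay like the one jot -r 1 1 600 produces, followed by a borg create call with --lock-wait 1800. The function names (jittered_delay, build_create_command, run_backup) are made up for illustration; the borg arguments are copied from the sys.argv shown in the client traceback above.

```python
import random
import subprocess
import time


def jittered_delay(max_seconds=600, rng=random):
    """Return a whole-second delay in [1, max_seconds], like `jot -r 1 1 600`."""
    return rng.randint(1, max_seconds)


def build_create_command(repo, archive, paths, lock_wait=1800):
    """Build the `borg create` argument list used in this report.

    `--lock-wait` is how long borg waits for the repository lock
    before giving up with a timeout.
    """
    return [
        "borg", "create", "-C", "lz4",
        "--lock-wait", str(lock_wait),
        f"{repo}::{archive}",
        *paths,
    ]


def run_backup(repo, archive, paths):
    """Hypothetical wrapper: sleep for a random delay, then run the backup."""
    time.sleep(jittered_delay())
    return subprocess.run(build_create_command(repo, archive, paths)).returncode
```

Note that the jitter only spreads out the start times. It does nothing about a lock that is never released, which is why every later run still timed out after 1800 seconds.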

@enkore
Contributor

enkore commented Mar 30, 2016

Can you determine which process/host hogged the lock first and what happened there?

In 1.0.0 there are some circumstances (mainly abnormal process termination) that usually result in stale locks. Some changes were made in master since then to avoid these; unless a Borg process is SIGKILLed, there shouldn't be any stale locks anymore.

@rumpelsepp
Contributor Author

It seems like the remote host has crashed and the lock was not cleaned up properly.

@ThomasWaldmann
Member

OK, in that case, I'd say this is a feature, not a bug. If the backup host crashed while borg was active, you want to know about it, and you want further backups to stop until you manually remove the lock AND run a borg check on the repo.
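
The manual recovery mentioned above (remove the lock, then check the repo) could be scripted roughly like this. It is only a sketch: recovery_commands and recover are hypothetical helper names, and the repository location is whatever you pass in. The two subcommands used, borg break-lock and borg check, are real borg commands.

```python
import subprocess


def recovery_commands(repo):
    """Return the commands for manual recovery, in the order to run them."""
    return [
        # Only break the lock when you are sure no borg process
        # is still working on this repository.
        ["borg", "break-lock", repo],
        # Verify repository consistency before resuming backups.
        ["borg", "check", repo],
    ]


def recover(repo, run=subprocess.run):
    """Run each recovery step, stopping at the first failure."""
    for cmd in recovery_commands(repo):
        run(cmd, check=True)
```

Calling recover("/var/borg/backup") on the repository host would run both steps and raise subprocess.CalledProcessError if either one fails.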

@enkore
Contributor

enkore commented Mar 30, 2016

The borg serve-side lock should have been cleaned up when the client goes away. At least that's the intention I read into the code:

        while True:
            r, w, es = select.select([stdin_fd], [], [], 10)
            ...
            if es:
                self.repository.close()
                return

( https://github.com/borgbackup/borg/blob/master/borg/remote.py#L75 )

Btw. maybe a repository.rollback() is more appropriate here? Or is this the regular clean-up path? Not sure...
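
To make the clean-up idea concrete, here is a small self-contained sketch of a select-based serve loop that releases a lock once the client side of the pipe goes away. This is not Borg's code: DemoRepository, BUFSIZE and this serve function are stand-ins invented for illustration, and the try/finally is a design choice of the sketch, not something the excerpt above does.

```python
import os
import select

BUFSIZE = 4096  # assumed read size for this sketch


class DemoRepository:
    """Hypothetical stand-in for a repository that holds a lock while open."""

    def __init__(self):
        self.locked = True

    def close(self):
        # Closing the repository releases its lock.
        self.locked = False


def serve(stdin_fd, repository, timeout=10):
    """Collect client data until the client goes away, then release the lock."""
    received = []
    try:
        while True:
            r, _w, es = select.select([stdin_fd], [], [stdin_fd], timeout)
            if r:
                data = os.read(stdin_fd, BUFSIZE)
                if not data:  # EOF: the client closed its end
                    break
                received.append(data)
            elif es:  # exceptional condition on the descriptor
                break
            # On timeout (all lists empty) keep waiting for the client.
    finally:
        # Runs on every exit path, including exceptions raised while serving.
        repository.close()
    return b"".join(received)
```

With os.pipe() as a stand-in for the SSH connection, writing a few bytes and closing the write end makes serve return those bytes with repository.locked set to False. A loop like this only cleans up if the process actually sees EOF or an error on its input; a serve process whose SSH connection stays open keeps its lock.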

@ThomasWaldmann
Member

@enkore @rumpelsepp I understood "remote host has crashed" as "the machine running 'borg serve' has crashed". If I understood correctly, there is nothing to do here.

@enkore
Contributor

enkore commented Mar 30, 2016

Misunderstanding on my part then :)

@rumpelsepp
Contributor Author

No. The only thing I can find is the error message with the traceback in the first cron mail. The machine did not crash; it was fine but refused to acquire locks, so the other machines spammed me with Borg "Could not acquire lock..." mails. :)

First email:

Remote: Borg 1.0.0: exception in RPC call:
Remote: Traceback (most recent call last):
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/remote.py", line 94, in serve
Remote:     res = f(*args)
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 429, in put
Remote:     self.prepare_txn(self.get_transaction_id())
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 178, in prepare_txn
Remote:     self.lock.upgrade()
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 298, in upgrade
Remote:     self.acquire(exclusive=True, remove=SHARED)
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 257, in acquire
Remote:     self._wait_for_readers_finishing(remove, sleep)
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 286, in _wait_for_readers_finishing
Remote:     raise LockTimeout(self.path)
Remote: borg.locking.LockTimeout: /var/borg/backup/lock
Remote: Platform: FreeBSD korriban 10.2-RELEASE-p9 FreeBSD 10.2-RELEASE-p9 #0: Thu Jan 14 01:32:46 UTC 2016     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 amd64
Remote: Borg: 1.0.0  Python: CPython 3.4.4
Remote: PID: 11410  CWD: /var/borg
Remote: sys.argv: ['/usr/local/bin/borg', 'serve', '--umask=077']
Remote: SSH_ORIGINAL_COMMAND: None
Remote: 
Remote: Borg 1.0.0: exception in RPC call:
Remote: Traceback (most recent call last):
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/remote.py", line 94, in serve
Remote:     res = f(*args)
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 429, in put
Remote:     self.prepare_txn(self.get_transaction_id())
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 178, in prepare_txn
Remote:     self.lock.upgrade()
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 298, in upgrade
Remote:     self.acquire(exclusive=True, remove=SHARED)
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 257, in acquire
Remote:     self._wait_for_readers_finishing(remove, sleep)
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 286, in _wait_for_readers_finishing
Remote:     raise LockTimeout(self.path)
Remote: borg.locking.LockTimeout: /var/borg/backup/lock
Remote: Platform: FreeBSD korriban 10.2-RELEASE-p9 FreeBSD 10.2-RELEASE-p9 #0: Thu Jan 14 01:32:46 UTC 2016     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 amd64
Remote: Borg: 1.0.0  Python: CPython 3.4.4
Remote: PID: 11410  CWD: /var/borg
Remote: sys.argv: ['/usr/local/bin/borg', 'serve', '--umask=077']
Remote: SSH_ORIGINAL_COMMAND: None
Remote: 
Remote: Borg 1.0.0: exception in RPC call:
Remote: Traceback (most recent call last):
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/remote.py", line 94, in serve
Remote:     res = f(*args)
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 429, in put
Remote:     self.prepare_txn(self.get_transaction_id())
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 178, in prepare_txn
Remote:     self.lock.upgrade()
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 298, in upgrade
Remote:     self.acquire(exclusive=True, remove=SHARED)
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 257, in acquire
Remote:     self._wait_for_readers_finishing(remove, sleep)
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 286, in _wait_for_readers_finishing
Remote:     raise LockTimeout(self.path)
Remote: borg.locking.LockTimeout: /var/borg/backup/lock
Remote: Platform: FreeBSD korriban 10.2-RELEASE-p9 FreeBSD 10.2-RELEASE-p9 #0: Thu Jan 14 01:32:46 UTC 2016     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 amd64
Remote: Borg: 1.0.0  Python: CPython 3.4.4
Remote: PID: 11410  CWD: /var/borg
Remote: sys.argv: ['/usr/local/bin/borg', 'serve', '--umask=077']
Remote: SSH_ORIGINAL_COMMAND: None
Remote: 
Remote: Borg 1.0.0: exception in RPC call:
Remote: Traceback (most recent call last):
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/remote.py", line 94, in serve
Remote:     res = f(*args)
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 429, in put
Remote:     self.prepare_txn(self.get_transaction_id())
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 178, in prepare_txn
Remote:     self.lock.upgrade()
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 298, in upgrade
Remote:     self.acquire(exclusive=True, remove=SHARED)
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 257, in acquire
Remote:     self._wait_for_readers_finishing(remove, sleep)
Remote:   File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 286, in _wait_for_readers_finishing
Remote:     raise LockTimeout(self.path)
Remote: borg.locking.LockTimeout: /var/borg/backup/lock
Remote: Platform: FreeBSD korriban 10.2-RELEASE-p9 FreeBSD 10.2-RELEASE-p9 #0: Thu Jan 14 01:32:46 UTC 2016     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 amd64
Remote: Borg: 1.0.0  Python: CPython 3.4.4
Remote: PID: 11410  CWD: /var/borg
Remote: sys.argv: ['/usr/local/bin/borg', 'serve', '--umask=077']
Remote: SSH_ORIGINAL_COMMAND: None
Remote: 
Connection closed by remote host

@ThomasWaldmann
Member

OK, so the borg serve machine did not crash, but there was a lock in the repo and we do not know why.

I guess we can't do anything here if we do not find out why the lock was there, so I think I'll close this unless we get more information.

@rumpelsepp
Contributor Author

Yeah. When this issue comes up again I will dig deeper into it and maybe enable debug logs. Until now everything was as silent as possible because of cron. At the moment everything runs fine again.


@jperville

Having this issue very often here, backing up 10 servers or so onto a shared backup repository.

As the original poster said, it is possible to mitigate this a bit using a random sleep and a longer lock-wait, but every morning I have several instances of the backup which fail after a very long time, usually because they waited too long for the lock. I am logging failed backups into my monitoring system, so I can tell exactly how many backups were successful and how many returned a non-zero exit status (and after how many seconds).
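
The "random sleep" mitigation described here can be generalised into a retry loop with jittered exponential backoff. The following is only a sketch under assumptions: `run_with_backoff` and its parameters are hypothetical names, not part of borg, and it treats every non-zero exit status as a reason to retry, which may not be appropriate for every failure.

```python
import random
import subprocess
import time


def run_with_backoff(cmd, attempts=5, base_delay=30.0, max_delay=600.0,
                     run=subprocess.call, sleep=time.sleep, rng=random.random):
    """Run `cmd` until it exits with status 0 or `attempts` is exhausted.

    Between attempts, sleep for a random fraction of an exponentially
    growing delay, so that clients which collided once are unlikely to
    collide again. Returns the exit status of the last attempt.
    `run`, `sleep` and `rng` are injectable only to make the sketch testable.
    """
    status = None
    for attempt in range(attempts):
        status = run(cmd)
        if status == 0:
            return status
        if attempt < attempts - 1:
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * rng())  # uniform in [0, delay)
    return status
```

A cron job could call `run_with_backoff(["/path/to/backup.sh"])` (the path is a placeholder) and report the returned status to the monitoring system.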

@ThomasWaldmann
Member

If the lock wait times out (after waiting for whatever time you specified), there are 2 possibilities:

  • the lock is legitimately held by a running borg process (in that case, there is no problem, it is working as expected)
  • a borg process died somehow and left an orphan lock that won't go away until you remove it manually using borg break-lock (in that case, one needs to find out why the borg process has died)

Maybe upgrade to borg 1.0.1, which has some fixes, including one related to lock cleanup.
Tell us if it helped.

@jperville

@ThomasWaldmann I have backported the fix for #773, but other than that I'm still using 1.0.0-4 (because that's the latest package available from the Ubuntu PPA as of today).

@enkore
Contributor

enkore commented Apr 9, 2016

#830 (1.0.1+) / #777 (master) should clean up any locks if Borg crashes inside Python (it can't handle a SIGSEGV or something like that), so these are of interest if stuff crashes / "terminates unexpectedly" regularly on your servers.

@jperville

Thanks, just asked in the "distro packages needed" issue for a borg 1.0.1 package, will try it as soon as it becomes available.

@lhupfeldt

I just got a stale lock and I'm not running any simultaneous backups.
If a lock is found when a new backup starts, then it should be determined whether the process is still running. If it is not, any checks that are necessary after a crash should be run automatically, and the lock deleted if the checks succeed. A lock file is not needed to determine whether a process is running, but it can be used to determine that it was running and crashed.
I was on 1.0.0 when the issue occurred, I have upgraded to 1.0.3 just now.

@ThomasWaldmann
Member

@lhupfeldt you can't determine from machine A whether a borg process on machine B is still running (when both are clients of a repo on machine C). So this is not a general solution; it only works for the easy case where there is only one client, always on the same machine.
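
To make the limitation concrete, here is a minimal sketch of a liveness check, assuming the lock records the holder's hostname and PID. Both fields and the function name `lock_holder_status` are hypothetical; nothing here describes how borg stores its locks. On POSIX systems the local case can be answered with signal 0, while the remote case can only be reported as unknown.

```python
import os
import socket


def lock_holder_status(holder_host, holder_pid, local_host=None):
    """Classify a lock holder as 'alive', 'dead' or 'unknown' (POSIX only).

    holder_host and holder_pid are hypothetical fields, assumed to have
    been written into the lock when it was acquired.
    """
    if local_host is None:
        local_host = socket.gethostname()
    if holder_host != local_host:
        # The holder runs on another machine; it cannot be probed from here.
        return "unknown"
    try:
        os.kill(holder_pid, 0)  # signal 0 checks existence, sends nothing
    except ProcessLookupError:
        return "dead"
    except PermissionError:
        # The process exists but belongs to another user.
        return "alive"
    return "alive"
```

Only a "dead" result would justify removing a lock automatically; "unknown" is exactly the multi-client case described above. PID reuse can also make a dead holder look alive, so even the local answer is a heuristic.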

@enkore
Contributor

enkore commented Jun 16, 2016

I was on 1.0.0 when the issue occurred, I have upgraded to 1.0.3 just now.

Since 1.0.0 there were some improvements in how process termination and locking are handled; in 1.0.0 most premature exits would have caused a stale lock (e.g. computer shutdown, ^C, connection loss, …), while this should be fixed for almost everything except SIGKILL / hard crashes by 1.0.3. Note that it's merely an inconvenience, not something causing corruption...

@lhupfeldt

lhupfeldt commented Jun 16, 2016

@ThomasWaldmann I disagree. It is definitely more complex than determining if a process is running locally, but basically, if a human can determine it, then so can a program. The client borg will ask the server borg if it is running, and distinguishing should not be hard if each server process is started with a unique cmdline, so that it can recognize other instances of itself. I guess the check needs to ensure that only one process is accessing a specific repository, and that can be seen from the cmdline.

#!/bin/python3

# Copyright (c) 2012 Lars Hupfeldt Nielsen, Hupfeldt IT
# All rights reserved. This work is under a BSD license, see LICENSE.TXT.

import sys, os
import psutil

def singleton_script():
    """Exit with status 1 if another process with this script's name is running."""
    proc_name = os.path.basename(__file__)
    my_proc = None

    for proc in psutil.process_iter():
        try:
            try:
                # Handle script called as 'python <script>'
                cmdline = proc.cmdline()
                arg_name = os.path.basename(cmdline[1]) if len(cmdline) > 1 else None
            except psutil.NoSuchProcess:
                # Zombie process, or the process exited while we were iterating
                continue
            except (PermissionError, psutil.AccessDenied, IndexError):
                arg_name = None

            if proc_name in (os.path.basename(proc.name()), arg_name):
                if my_proc:
                    print("Already running")
                    sys.exit(1)
                # The first match is assumed to be this process itself
                my_proc = proc

        except psutil.NoSuchProcess:
            # The process exited between the cmdline() and name() calls
            continue
        except UnicodeDecodeError:
            # Workaround for broken psutil on non-English installations
            # Singleton is still guaranteed if script is installed in a full path with an 'ascii' name
            pass


if __name__ == "__main__":
    import time
    singleton_script()
    print("Going to sleep for 10 seconds. Run one more of me to test!")
    time.sleep(10)

Modifying this to check the cmdline arguments would do the trick.
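
A rough sketch of that modification follows, under the assumption that the repository path appears somewhere in the server process's argument list (for example as the value of an option). The helper names `uses_repo`, `other_processes_using` and `running_cmdlines` are hypothetical, and the matching rule is deliberately naive.

```python
import os


def uses_repo(cmdline, repo_path):
    """Return True if any argument of `cmdline` names `repo_path`."""
    wanted = os.path.normpath(repo_path)
    for arg in cmdline[1:]:
        # Accept both "--option=value" and a bare "value"
        if arg.startswith("--") and "=" in arg:
            arg = arg.split("=", 1)[1]
        if arg and os.path.normpath(arg) == wanted:
            return True
    return False


def other_processes_using(repo_path, cmdlines, own_pid):
    """Return the sorted PIDs (other than own_pid) whose argv names repo_path.

    `cmdlines` maps pid -> argv list, e.g. as returned by running_cmdlines().
    """
    return sorted(pid for pid, argv in cmdlines.items()
                  if pid != own_pid and uses_repo(argv, repo_path))


def running_cmdlines():
    """Collect pid -> argv for all visible processes (requires psutil)."""
    import psutil  # third-party, imported lazily so the helpers above work without it

    result = {}
    for proc in psutil.process_iter():
        try:
            result[proc.pid] = proc.cmdline()
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    return result
```

Usage would be along the lines of `other_processes_using("/var/borg/backup", running_cmdlines(), os.getpid())`. Note that the `borg serve` command line in the log above carries no repository path at all, so this check only works if the server is started with one.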

@ThomasWaldmann
Member

"asking the server" and "cmdline arguments" require an interface change - we can only do that on bigger release steps and we want to do that rarely, as it usually breaks (or just doesn't work) with older clients or servers. I suggest you just try the latest 1.0.x and see if that solves most such problems.

@lhupfeldt

lhupfeldt commented Jun 16, 2016

I have already upgraded to the latest version, so hopefully that will reduce the problem.

I'm not necessarily talking about adding any new cmdline arguments; couldn't the restrict-to-path option be used for this? Even if a new parameter were added, it could probably be backwards compatible.
I don't think adding this feature would need to break older clients; they would just fail with the same error they are failing with now. And newer clients talking to an older server could fail gracefully in the same manner, unless you think this feature requires changing existing APIs in an incompatible manner.

I am able to resolve lock file problems myself, but I think all the family members I'm providing server space for would call me without a clue as to why their backup failed if they got this error.

@level323

Another approach that may be worth considering is using a separate lock management tool. I particularly like FLOM for this purpose:

https://github.com/tiian/flom

@lhupfeldt

@level323 Does this require an external daemon? If it does I would consider it an unnecessary complexity and it may be overkill for this purpose.

@level323

@lhupfeldt No, an external daemon is not required. But it does have that capability. FLOM supports many different use cases and locking mechanisms. FLOM compiles fairly easily with few dependencies.

I'm not saying it's "the" solution to the needs of this particular use case. My suggestion was offered more out of a sense of pragmatism - that is, borg's lock handling may still have some rough edges (or may not be designed to suit certain use cases) in which case FLOM can likely resolve your immediate issue and may provide a useful solution for use cases that borg's locking system was not designed for.

In the particular case mentioned by @jperville I believe FLOM can be used to completely eliminate the locking problems that were encountered. In this case FLOM can be used to ensure that only one instance of borg is ever running at any one time on the machine housing the central backup repo. Furthermore (if desired) FLOM can also be used to queue each remote source machine to backup to the central repo in turn (strictly one at a time).

@jperville

This lock issue is still current for me, to the point that I gave up on the idea of using a shared borg repository for multiple clients.

@enkore
Contributor

enkore commented Jun 17, 2016

What I don't quite understand (yet), and you didn't really say anything about it... do you have these issues due to crashes, or is it just timing? In the latter case a different locking system just wouldn't make a difference...

@jperville

For me it is mostly timing: when several clients (on different hosts) run at the same time and push to different archives in the same repository, they can take a really long time to acquire the lock (if they acquire it at all). I can reproduce this quite easily by running my backup command explicitly on each host from a terminal at close to the same time. Each backup should complete in around 5 minutes; however, when I start 3 at the same time, borg keeps fighting for the lock and no backup completes (or if it does, it takes way too long).
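
If the problem is purely one of timing, one workaround is to give each host a fixed, non-overlapping start offset instead of a random delay. The sketch below assumes the list of hosts and a rough per-run duration are known in advance; `staggered_offsets` is a hypothetical helper and the host names are placeholders.

```python
def staggered_offsets(hosts, window_seconds, run_seconds):
    """Assign each host a distinct start offset (in seconds) within a window.

    Hosts are spread evenly across the window, in sorted order, so that
    the result is stable from run to run. Raises ValueError if the runs
    cannot fit into the window without overlapping.
    """
    hosts = sorted(hosts)
    if not hosts:
        return {}
    if len(hosts) * run_seconds > window_seconds:
        raise ValueError("window too short to run all hosts one after another")
    slot = window_seconds // len(hosts)
    return {host: index * slot for index, host in enumerate(hosts)}
```

With three hosts, an hourly window and 5-minute runs, `staggered_offsets(["web", "db", "mail"], 3600, 300)` spaces the starts 20 minutes apart. This only reduces contention; it does not remove the need for the lock itself, and a run that overruns its slot can still collide with the next one.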
