Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically remove stale locks #1674

Merged
merged 7 commits into from Nov 7, 2016

Conversation

enkore
Copy link
Contributor

@enkore enkore commented Oct 2, 2016

Supersedes #1246

Fixes #562

Edit travis #3000 :)

@enkore enkore mentioned this pull request Oct 2, 2016
@codecov-io
Copy link

codecov-io commented Oct 2, 2016

Current coverage is 84.37% (diff: 76.00%)

Merging #1674 into master will decrease coverage by 0.08%

@@             master      #1674   diff @@
==========================================
  Files            20         20          
  Lines          6609       6664    +55   
  Methods           0          0          
  Messages          0          0          
  Branches       1125       1136    +11   
==========================================
+ Hits           5582       5623    +41   
- Misses          754        767    +13   
- Partials        273        274     +1   

Powered by Codecov. Last update b1ac207...6b88da2

@enkore enkore changed the title (wip) Automatically remove stale locks Automatically remove stale locks Oct 7, 2016
@@ -323,12 +323,15 @@ def borg_cmd(self, args, testing):
opts.append('--critical')
else:
raise ValueError('log level missing, fix this code')
env_vars = []
if bool(os.environ.get('BORG_UNIQUE_HOSTNAME')):
env_vars.append('BORG_UNIQUE_HOSTNAME=yes')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a while ago I changed semantics of env vars. in early borg and attic, non-present env vars meant False and any present ones meant True (even if e.g. FOO=False or FOO=0 was used - they would be evaluated as truish).

after my change, env vars are handled like yes() input. So, it is FOO=0 and FOO=1 or FOO=yes and FOO=no now.

So, I think it should be:

unique_hostname = os.environ.get('BORG_UNIQUE_HOSTNAME')
if unique_hostname is not None:
    env_vars.append('BORG_UNIQUE_HOSTNAME=%s' % unique_hostname)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, there are some other places that do bool(os.environ.get('B_U_H')), I'll change those as well for consistent behaviour.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I added prompt=True|False to yes() to facilitate this. Output now looks like this:

Remote: Enabled removal of stale repository locks
Remote: Killed stale lock tiger.3300-0.
Remote: Removed stale exclusive roster lock for pid 3300.
Remote: Removed stale exclusive roster lock for pid 3300.
Enabled removal of stale cache locks
Killed stale lock tiger.3292-0.
Removed stale exclusive roster lock for pid 3292.
Removed stale exclusive roster lock for pid 3292.

The doubled roster locks are because this is done on loading, so if it's not immediately saved that gets doubled. I think we could just move those down to INFO log level, or remove them.

else: # pragma: no cover
remote_path = args.remote_path or os.environ.get('BORG_REMOTE_PATH', 'borg')
remote_path = replace_placeholders(remote_path)
return [remote_path, 'serve'] + opts
return env_vars + [remote_path, 'serve'] + opts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this is no-cover: did you test this manually, using a ssh: repo location?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it works.

dead_lock = ExclusiveLock(lockpath, id=dead_id).acquire()
with ExclusiveLock(lockpath, id=our_id, kill_stale_locks=True):
with pytest.raises(NotMyLock):
dead_lock.release()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

69 can only acquire the lock, if there is no other lock for lockpath.
but there is dead_lock. as it is dead, and kill_stale_locks, it gets killed.
but why does 71 raise NotMyLock then and not NotLocked?
and why is it allowed to enter the with-block if the lock is still considered present?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NotMyLock because there is an active lock and it belongs to 69. NotLocked would be the result if we de-indentend 70/71 one level, in which case 69 would be unlocked, and since 68 is dead, there is no lock = NotLocked.

@enkore enkore self-assigned this Oct 10, 2016
@enkore enkore removed their assignment Nov 3, 2016
# the lock made by the parent, so it needs to use the same PID for that.
_pid = os.getpid()
# XXX this sometimes requires live internet access for issuing a DNS query in the background.
_hostname = socket.gethostname()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a way around that API-wise, if the machine is configured that way.

@enkore
Copy link
Contributor Author

enkore commented Nov 3, 2016

Feature-wise this is done. Output could use some polishing perhaps:

Done. I added prompt=True|False to yes() to facilitate this. Output now looks like this:

Remote: Enabled removal of stale repository locks
Remote: Killed stale lock tiger.3300-0.
Remote: Removed stale exclusive roster lock for pid 3300.
Remote: Removed stale exclusive roster lock for pid 3300.
Enabled removal of stale cache locks
Killed stale lock tiger.3292-0.
Removed stale exclusive roster lock for pid 3292.
Removed stale exclusive roster lock for pid 3292.

The doubled roster locks are because this is done on loading, so if it's not immediately saved that gets doubled. I think we could just move those down to INFO log level, or remove them.

#1674 (comment)

Copy link
Member

@ThomasWaldmann ThomasWaldmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some small nitpicks

@@ -71,6 +71,8 @@ def __init__(self, repository, key, manifest, path=None, sync=True, do_files=Fal
self.key = key
self.manifest = manifest
self.path = path or os.path.join(get_cache_dir(), repository.id_str)
self.unique_hostname = yes(true_msg='Enabled removal of stale cache locks',
env_var_override='BORG_UNIQUE_HOSTNAME', prompt=False, env_msg=None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BORG_UNIQUE_HOSTNAME is a bit ambiguous / unintuitive maybe.
It is meant as boolean, but from the name, it could also be meant as a var holding a hostname.

BORG_HOSTNAME_IS_UNIQUE maybe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also affects the python variable names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok


return local_pid_alive(pid)

def local_pid_alive(pid):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pep8: add one more blank line above this one.

try:
# This doesn't work on Windows.
# This does not kill anything, 0 means "see if we can send a signal to this process or not".
# Possible errors: No such process (== stale lock) or permission denied (not a stale lock)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a dot at end of sentence.

if err.errno == errno.ESRCH:
# ESRCH = no such process
return False
# Any other error (eg. permissions) mean that the process ID refers to a live process
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... means ... (and add a dot at the end).

@ThomasWaldmann
Copy link
Member

Note:

I reviewed the log levels and they seem ok as they are. Users should get notified if there are stale locks because this points to a problem in their setup.

Not sure though if we always want to read the "Enabled removal ..." msg from yes() all the time.

@enkore
Copy link
Contributor Author

enkore commented Nov 5, 2016

They're quick and distinct feedback that it worked / is set up correctly (on both ends). I guess we could move it to INFO level, moving the emission of the message from yes() to the call site.

I'll apply that and the other suggestions you made.

@enkore
Copy link
Contributor Author

enkore commented Nov 7, 2016

squash, merge?

Copy link
Member

@ThomasWaldmann ThomasWaldmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, squash some, merge.

verygreen and others added 7 commits November 7, 2016 21:54
If BORG_UNIQUE_HOSTNAME shell variable is set, stale locks
in both cache and repository are deleted.

Stale lock is defined as a lock that's originating from the same
hostname as us, and correspond to a pid that no longer exists.

This fixes borgbackup#562
@enkore enkore merged commit cf44954 into borgbackup:master Nov 7, 2016
@enkore enkore deleted the f/stale-lock-murderer branch November 7, 2016 20:58
@enkore
Copy link
Contributor Author

enkore commented Nov 7, 2016

@verygreen :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants