Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Deleting stale locks #1246

Closed
wants to merge 1 commit into from

Conversation

verygreen
Copy link
Contributor

If BORG_UNIQ_HOSTNAME shell variable is set, stale locks
in both cache and repository are deleted.
I wanted to add a parameter to the repo config, but alas, we lock
it before the read (makes sense, really), so the only two options
are the shell variable and a command line switch.
Do we want the switch? What would the name be? Any good ways
to enable this permanently for those wishing so if the config is
off-limits?

Currently if the option is set - we would detect if a lock is of
known format and would delete it.
We only update roster if we are to do an exclusive lock. Should we care
to prune stale locks from roster if we are only getting a shared lock,
or can we leave it be until such a time exclusive locker comes around?

Also I guess I need feedback on other named things and also my
weak python-fu ;)

This fixes #562
eventually ;)

@@ -70,6 +70,10 @@ def __init__(self, repository, key, manifest, path=None, sync=True, do_files=Fal
self.key = key
self.manifest = manifest
self.path = path or os.path.join(get_cache_dir(), repository.id_str)
if os.environ.get('BORG_UNIQ_HOSTNAME'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uniq is the unix command, but i guess we rather spell it correctly as unique.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another thing that came to my mind:
if we create and store a uuid clientside, we could use that and not need the user to assert uniqueness of hostname.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever uuid you store, it becomes not uniq once you clone a vm or something, right? I thought that was the main reason for not making this the default, otherwise we can just switch to such uuids (generatedand stored in .cache/borg on startup if not there already) and be done with it .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the hostname is also the same after cloning, right?
but maybe it is remembered more easily to be changed than some borg uuid. hmm...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frequently enough hostname is NOT something that is cloned with a VM, because it's obtaned via a reverse DNS lookup, for example.

e.g. I have a single subdir that holds an OS instance that I export with NFS, I then run tens of virtual machines out of it (nfsroot), every one of those virtual machines has a different hostname because of different DNS record.

@PlasmaPower
Copy link
Contributor

Can we try to read /etc/machine-id first, and if that exists use it instead of the hostname and automatically enable this feature?

We could also try to read the mac of the default network interface, which would be cross-platform.

@ThomasWaldmann
Copy link
Member

Not sure if the MAC addr is a good idea. It is not always unique even for real hardware (although it should be) and also finding out and accessing the default interface might be rather platform specific.

Same goes for machine-id as it seems. It is there on systemd machines. But BSD, OS X, Windows?
So we would rather generate an own unique ID and store it on the client in the config, that works everywhere.

@verygreen
Copy link
Contributor Author

The real trick is how to determine if our local unique ID has become stale and is no longer unique (because a vm was cloned or whatever).

@verygreen
Copy link
Contributor Author

Also another vector is - if we store our own ID - where do we store it? in home dir? but that could be a network share and therefore visible from multiple nodes making it non-unique as well.

@PlasmaPower
Copy link
Contributor

PlasmaPower commented Jul 3, 2016

Not sure if the MAC addr is a good idea. It is not always unique even for real hardware (although it should be)

In which cases? I'm talking about the hardware MAC here, not the MAC actually used for networking (can be changed software level).

finding out and accessing the default interface might be rather platform specific

from uuid import getnode

https://docs.python.org/3.1/library/uuid.html

@verygreen
Copy link
Contributor Author

I guess mac (no matter if it was software overridden even) is a good proxy for a unique id, otherwise things would just not work on the network and we only care about all of this for networked access anyway.

@enkore
Copy link
Contributor

enkore commented Jul 3, 2016

Obtaining some unique or maybe generated ID depending on various intransparent factors (=uuid.getnode) is IMHO a bad match for this.

Sticking with the hostname is clear and easy to understand + verify.

@PlasmaPower
Copy link
Contributor

@enkore If two MACs on the same network are the same, then they've got bigger problems. Also, they'll be much more independent then hostnames. We can specify an override in the config if you'd like.

@verygreen
Copy link
Contributor Author

We cannot specify an override in the config because we lock before we have a chance of reading the repo config.
Unless there's some node-global config somewhere that I missed that could be read before opening a repository.

@enkore
Copy link
Contributor

enkore commented Jul 3, 2016

@enkore If two MACs on the same network are the same, then they've got bigger problems. Also, they'll be much more independent then hostnames. We can specify an override in the config if you'd like.

getnode's documentation is rather unclear on this. I read it as

  • may or may not return a hardware-address derived number of a random NIC in the system
  • may or may not generate a random number and return that instead
  • may or may not store that number somewhere in the FS ("The first time this runs, it may launch a separate program")

note added by @ThomasWaldmann: it stores the ID into a module global var after determining it. It is not persistent.

@PlasmaPower
Copy link
Contributor

PlasmaPower commented Jul 3, 2016

@enkore I'd assume the cache is in memory. Also, this is Python's recommended way of doing this specific task, do you have a better way? Hostname conflicts aren't that rare, especially when given the default Linux hostname is localhost, and that most cloud hosting services probably share hostnames across VMs.


# Just nuke the stale locks early on load
if self.ok_to_kill_zombie_locks:
for key in (SHARED, EXCLUSIVE):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is ... um, unelegant.
I wonder if there's an easier way to do these checks.

@ThomasWaldmann
Copy link
Member

The env var could be BORG_UUID=... so users can put there anything they like and borg would just use that string, if given, as unique id.

@verygreen
Copy link
Contributor Author

verygreen commented Jul 3, 2016

I feel like BORG_UUID could be a separate thing altogether with its own patch because the changes are going to be mostly orthogonal to this patch other than one extra check for the unique_hostname var.

return False

try:
# This may not work in Windows.
Copy link
Contributor

@enkore enkore Jul 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should work in principle; on Windows you don't really send signals, but open a process and then call various functions on the process handle. Python shouldn't be different; opening the process for a dead PID fails and should raise OSError. The error number will be different from ESRCH ...

Doesn't matter too much; this branch doesn't support Windows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have read that at first kill(pid, 0) used to kill a process in windows for good. And then that was changed and it started to return error at all times.
So this nees to be verified.

Copy link
Contributor

@enkore enkore Jul 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have to look at the implementation to see if/whether signal is checked for it's value. If it's not any signal number may terminate a process, yes...

static PyObject *
os_kill_impl(PyModuleDef *module, pid_t pid, Py_ssize_t signal)
#ifndef MS_WINDOWS
#else /* !MS_WINDOWS */
{
    PyObject *result;
    DWORD sig = (DWORD)signal;
    DWORD err;
    HANDLE handle;

    /* Console processes which share a common console can be sent CTRL+C or
       CTRL+BREAK events, provided they handle said events. */
    if (sig == CTRL_C_EVENT || sig == CTRL_BREAK_EVENT) {
        if (GenerateConsoleCtrlEvent(sig, (DWORD)pid) == 0) {
    }

    /* If the signal is outside of what GenerateConsoleCtrlEvent can use,
       attempt to open and terminate the process. */
    handle = OpenProcess(PROCESS_ALL_ACCESS, FALSE, (DWORD)pid);
    if (handle == NULL) {
        err = GetLastError();
        return PyErr_SetFromWindowsErr(err);
    }

    if (TerminateProcess(handle, sig) == 0) {
        err = GetLastError();
        result = PyErr_SetFromWindowsErr(err);
    } else {
        Py_INCREF(Py_None);
        result = Py_None;
    }

    CloseHandle(handle);
    return result;
}
#endif /* !MS_WINDOWS */
#endif /* HAVE_KILL */

So this would terminate a live process with exit code = 0 for signal = 0.

Anyway, that's something for the windows branch to handle...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since windows port is in the works, I feel like potentially questionable Windows-related bits should be at least highlighted to be checked as the port is progressing.

Copy link
Contributor

@enkore enkore Jul 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, leave the comments in :)

@enkore
Copy link
Contributor

enkore commented Jul 3, 2016

Logic-wise this looks sound to me. A bit of "nitpicking" was already mentioned; I would add that using both "zombie lock" and "stale lock" to refer to the same thing is confusing, use only "stale lock".

The unique-host-thingy is as @verygreen says orthogonal to this patch and doesn't seem to be end-of-discussion'd yet. So I'd say we move that to another ticket.

@verygreen
Copy link
Contributor Author

It's stale lock on the outside, zombie lock on the inside, if you noticed.
But I can change that - no problems.


def load(self):
try:
with open(self.path) as f:
data = json.load(f)

# Just nuke the stale locks early on load
if self.ok_to_kill_zombie_locks:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, this here means that we won't do any stale lock detection if BORG_UNIQUE_HOSTNAME is not set, so no message would be printed hinting about this possibility, unlike the other check which is a bit inconsistent.
Do we want this message here or is it ok to omit it and let people learn through the documentation, I wonder?

@enkore
Copy link
Contributor

enkore commented Jul 3, 2016

"what to do with the hostname field?"-discussion -> #1253

If BORG_UNIQUE_HOSTNAME shell variable is set, stale locks
in both cache and repository are deleted.

Stale lock is defined as a lock that's originating from the same
hostname as us, and correspond to a pid that no longer exists.

This fixes borgbackup#562
thread_id = 0
return _hostname, _pid, thread_id


def check_lock_stale(host, pid, thread):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_lock_stale

@ThomasWaldmann
Copy link
Member

BTW, just FYI: i did another round of researching 3rd party lockfile libs. There is still nothing great out there, everything I found has obvious issues.

@ThomasWaldmann
Copy link
Member

BTW, master or 1.0-maint?

@verygreen
Copy link
Contributor Author

we need both, I think.
But I want a master version first and then we can backport.

@enkore
Copy link
Contributor

enkore commented Jul 21, 2016

I'm thinking master first, maybe see it at least in a RC in action before backporting?

@enkore
Copy link
Contributor

enkore commented Oct 2, 2016

Superseded by #1674

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

better locking
4 participants