# Global and local pinning (#31)

Status: open, wants to merge 2 commits into master.
## Conversation

@ghost commented Apr 7, 2018

Introduce the concept of global pinning, which allows one to pin the current thread without going through a thread-local handle.

In addition to that, we'll slightly tweak the general interface of crossbeam-epoch in order to fix several reported problems with it.
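
For a concrete picture, here is a minimal sketch of the two pinning styles under discussion, written against the crossbeam-epoch API of the time (the exact names are illustrative and may differ from the final interface):

```rust
extern crate crossbeam_epoch as epoch;

use epoch::Collector;

fn main() {
    // Local pinning: register a per-thread handle, then pin through it.
    let collector = Collector::new();
    let handle = collector.register();
    let guard = handle.pin();
    drop(guard);

    // The default global collector already exposes a convenience function
    // that pins via a hidden thread-local handle; "global pinning" would
    // additionally allow pinning without any thread-local handle at all.
    let guard = epoch::pin();
    drop(guard);
}
```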

Rendered

I'm torn between all the possible alternatives and don't expect the RFC to land in its original form. It's hard to choose the right solution here, so help me out. :)

Related issues:

@Amanieu This RFC is the answer (a belated one, sorry!) to your questions about anonymous pinning.

@Amanieu commented Apr 7, 2018

I'm pretty happy with this RFC. My use case is a bit complicated because it involves threads that can potentially be killed at any point. I've included a rough code example which shows how my code works:

```rust
// There is one of these for each thread. They are tracked in a global list of
// all threads. ThreadData must be Send since it may be dropped in a different
// thread when a thread is killed.
struct ThreadData {
    handle: Handle,
    guard: Option<Guard>,
}

// This is what the main loop of a thread looks like. The key fact to remember
// here is that threads can be asynchronously killed at any point, except when
// the ThreadData is locked.
fn run_thread(thread_data: &Mutex<ThreadData>) {
    let guard: &Guard;

    {
        // Lock the thread. This prevents our thread from getting killed.
        // (parking_lot-style Mutex: lock() returns the guard directly.)
        let mut lock = thread_data.lock();

        // Pin the thread by setting thread_data.guard to Some(Guard).
        // The fields are borrowed separately so that the closure can use
        // the handle while the guard slot is mutably borrowed.
        let ThreadData { handle, guard: slot } = &mut *lock;
        let pinned = slot.get_or_insert_with(|| handle.pin());

        // We cheat a bit here by re-binding the lifetime of the guard so that
        // it outlives the lock. This is safe because we know that one of the
        // following occurs:
        // - Our thread is not killed, and we don't touch thread_data.guard
        //   anywhere except at the end of this function.
        // - If our thread is killed, then our killer will drop our ThreadData
        //   (which it must lock before killing us). This will unpin the thread
        //   and avoid any memory leaks.
        guard = unsafe { &*(pinned as *const Guard) };

        // The thread stays pinned even after we release the lock.
        drop(lock);
    }

    // The main body of the thread goes here. We can safely access lock-free
    // data structures using our Guard, while still allowing our thread to be
    // killed at any point.

    // Unpin the guard after we are done using it.
    thread_data.lock().guard = None;
}
```

@Amanieu commented Apr 7, 2018

I don't care much about global pinning for my use case. There is one piece of code where I do need it, but I've just used a `Mutex<Handle>` as a workaround (the mutex is needed for other reasons anyway, so performance is not an issue).

@Amanieu commented Apr 8, 2018

I would add another alternative: only allow a single `Guard` to exist per `Handle` at any point. This allows `Handle` to be safely sent to another thread, since you can't get another `Guard` out of it while the previous `Guard` is still alive.

This would be ideal for my use case since I only have a single `Guard` in use at any one point; however, I understand that it may be too limiting as a general-purpose solution.
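
For illustration, a sketch of how that restriction could surface with nesting (the `Option`-returning `pin` is the shape proposed later in this thread; `Handle` and `Guard` are the crossbeam-epoch types):

```rust
// Hypothetical one-guard-per-handle interface: pin() returns None while
// another Guard obtained from this Handle is still alive.
fn example(handle: &Handle) {
    let outer = handle.pin().expect("no other guard is alive");

    // A nested component that also tries to pin now gets None, which is
    // why this design can be too limiting as a general-purpose solution.
    assert!(handle.pin().is_none());

    drop(outer);
    assert!(handle.pin().is_some());
}
```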

Review comment on the RFC text:

> Now, the `Global::pin()` method might be implemented like this:

Did you mean `Collector::pin()`?

@ghost (Author) replied:

Well, `Collector::pin()` would simply call `Global::pin()`, just like `Handle::pin()` now calls `Local::pin()`. It's the same thing :)

Reply:

(Oh, for some reason I hadn't realized `Global` is also a type and not only a variant.)


Review comment on the RFC text:

> # Alternatives
>
> ### Add safe `Guard::pin_with()` and remove `Guard::clone()`

Did you mean `Handle::pin_with()`?

@ghost (Author) replied:

Yes, thank you! Fixed.

@ghost commented Apr 17, 2018

> I would add another alternative: only allow a single Guard to exist per Handle at any point. This allows Handle to be safely sent to another thread since you can't get another Guard out of it while the previous Guard is still alive.

If we add a lifetime to `Guard`, then we could do something like this:

```rust
struct Handle { ... }

struct Guard<'a> { ... }

impl Handle {
    fn pin(&self) -> Guard<'_> { ... }

    fn pin_mut(&mut self) -> Guard<'_> { ... }
}
```

This way you can obtain a unique guard if you want (`pin_mut`), but don't have to (`pin`). Is this what you had in mind?
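
A usage sketch of this borrowed-guard API (signatures assumed from the snippet above, not taken from the released crate):

```rust
fn example(handle: &mut Handle) {
    {
        // Unique guard: while `unique` is alive, `handle` stays mutably
        // borrowed, so no second guard can be created and the handle
        // cannot be moved or sent anywhere.
        let unique = handle.pin_mut();
        // ... use `unique` ...
    }

    // Shared guards: multiple simultaneous pins are allowed.
    let g1 = handle.pin();
    let g2 = handle.pin();
    drop((g1, g2));
}
```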

@Amanieu commented Apr 18, 2018

@stjepang That doesn't work for me, since I need to keep both the `Guard` and the `Handle` in thread-local storage, which means that both must be `'static`.

@ghost commented Apr 18, 2018

@Amanieu I see.

So in that case, would you prefer something similar to `raw_lock`/`raw_unlock` from parking_lot, except that in our case the methods would be named `Handle::raw_pin` and `Handle::raw_unpin` (the latter being unsafe)?

What kind of interface would suit you best?

@Amanieu commented Apr 18, 2018

Here is the interface that I have in mind:

```rust
impl Handle {
    // This fails if a Guard already exists for this Handle.
    fn pin(&self) -> Option<Guard>;
}
```

`Guard` will no longer be `Clone`, which, together with the above change, ensures that there is at most one `Guard` per `Handle` at any time. This will allow us to make both `Handle: Send` and `Guard: Send`.

@Amanieu commented Apr 18, 2018

Only allowing one guard at a time might be too restrictive for the global collector since it doesn't allow nesting, so we could instead provide the following API just for the global collector (using the hidden per-thread handle):

```rust
thread_local! {
    // Interior mutability (needed to set the Guard later) is elided here.
    static HANDLE: (Handle, Option<Guard>) = (COLLECTOR.register(), None);
}

// In the `epoch` module:
pub fn pin_with<F: FnOnce(&Guard)>(f: F);
```

This will call `f` with a reference to the currently active thread-local `Guard` if there is one, or pin the thread first if there is no currently active guard.
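
A minimal sketch of how this could be implemented on top of the thread-local above (the `UnsafeCell`, the `created` flag, and the lack of panic-safety are my simplifications, not part of the proposal):

```rust
use std::cell::UnsafeCell;

thread_local! {
    static TLS: UnsafeCell<(Handle, Option<Guard>)> =
        UnsafeCell::new((COLLECTOR.register(), None));
}

pub fn pin_with<F: FnOnce(&Guard)>(f: F) {
    TLS.with(|tls| {
        // Safety sketch: this state is only touched from the owning thread,
        // and nested calls only read the already-initialized guard.
        let (handle, slot) = unsafe { &mut *tls.get() };
        let created = slot.is_none();
        let guard = slot.get_or_insert_with(|| handle.pin());
        f(guard);
        if created {
            // Unpin when the outermost pin_with returns. (Not panic-safe;
            // a real implementation would use a drop guard.)
            unsafe { (*tls.get()).1 = None; }
        }
    })
}
```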

@ghost commented Apr 18, 2018

@Amanieu That seems a bit too restrictive: in particular, it would make it impossible to implement iterators that own guards, wouldn't it?

However, going the other way around and implementing the one-guard-per-handle interface by wrapping the current one would be straightforward (except for implementing `MyGuard: Send`):

```rust
use std::cell::Cell;

struct MyHandle {
    handle: Handle,
    pinned: Cell<bool>,
}

impl MyHandle {
    fn pin(&self) -> Option<MyGuard> {
        if self.pinned.get() {
            None
        } else {
            self.pinned.set(true);
            Some(MyGuard { guard: self.handle.pin() })
        }
    }
}

struct MyGuard {
    guard: Guard,
}

// (MyGuard's Drop would also need to reset `pinned`; elided here.)

unsafe impl Send for MyHandle {}
```

Getting `MyGuard: Send` right is a bit tricky, though...

Maybe we should expose a few low-level methods for pinning without creating guards (like `raw_lock` and `raw_unlock`):

```rust
impl Handle {
    unsafe fn pin_unchecked(&self);
    unsafe fn unpin_unchecked(&self);
    unsafe fn defer_unchecked<F: FnOnce()>(&self, f: F);
}
```

Then we can implement your interface like this:

```rust
use std::sync::atomic::{AtomicBool, Ordering::Relaxed};

struct MyHandle {
    handle: Handle,
    pinned: AtomicBool,
}

impl MyHandle {
    fn pin(&self) -> Option<MyGuard> {
        if self.pinned.swap(true, Relaxed) {
            None
        } else {
            unsafe { self.handle.pin_unchecked(); }
            // Cloning the handle atomically increments the refcount.
            Some(MyGuard { handle: self.handle.clone() })
        }
    }
}

struct MyGuard {
    handle: Handle,
}

impl Drop for MyGuard {
    fn drop(&mut self) {
        unsafe { self.handle.unpin_unchecked(); }
    }
}

unsafe impl Send for MyHandle {}
unsafe impl Send for MyGuard {}
```

@Amanieu commented Apr 18, 2018

The problem with `MyHandle` is that it needs to give out a `&Guard`, because that is what `SkipList` expects. However, this is unsound because the `Guard` can be cloned. This breaks the invariant of only having one guard per handle, which `MyHandle` requires in order to be safely sent to another thread (the `Guard` must always be sent along with the `Handle`).

After consideration, I am happy with your proposed API, except that I would like `Guard::clone` to be removed.

@Amanieu commented Apr 19, 2018

For reference, here's my implementation of `SendHandle`:

```rust
use core::mem;

struct SendHandle {
    handle: Handle,
    guard: Option<Guard>,
}

unsafe impl Send for SendHandle {}

impl SendHandle {
    pub fn new(collector: &Collector) -> SendHandle {
        SendHandle {
            handle: collector.register(),
            guard: None,
        }
    }

    pub fn pin(&mut self) -> GuardRef {
        let handle = &mut self.handle;
        self.guard.get_or_insert_with(|| handle.pin());
        GuardRef(&mut self.guard)
    }

    pub fn get(&mut self) -> Option<&Guard> {
        self.guard.as_ref()
    }

    pub fn unpin(&mut self) {
        self.guard = None;
    }
}

struct GuardRef<'a>(&'a mut Option<Guard>);

impl<'a> GuardRef<'a> {
    // Keep the thread pinned past the GuardRef's lifetime.
    pub fn forget(self) {
        mem::forget(self);
    }
}

impl<'a> core::ops::Deref for GuardRef<'a> {
    type Target = Guard;

    fn deref(&self) -> &Guard {
        self.0.as_ref().unwrap()
    }
}

impl<'a> Drop for GuardRef<'a> {
    // Dropping the GuardRef unpins by clearing the stored guard.
    fn drop(&mut self) {
        *self.0 = None;
    }
}
```

Two points of note:

- It only exposes a `&Guard`, not a `&mut Guard`. The latter would be unsound because you could "steal" the guard with `mem::replace`.
- This implementation is unsound if `Guard` is clonable, because you could send the `SendHandle` to thread B while still having a guard in thread A, which can lead to data races.
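
A hypothetical usage sketch of the above (the surrounding setup is assumed, not taken from the comment):

```rust
fn example(collector: &Collector) {
    let mut send_handle = SendHandle::new(collector);

    {
        // Pin and use the &Guard that GuardRef derefs to.
        let guard = send_handle.pin();
        // ... access lock-free data structures through &*guard ...

        // forget() keeps the thread pinned past this scope; a plain drop
        // of `guard` would unpin immediately.
        guard.forget();
    }

    // The pin is still active here, and `send_handle` (with its stored
    // guard) may be sent to another thread. Unpin once we're done.
    send_handle.unpin();
}
```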

@Vtec234 (Member) left a review:

Left some questions, but in general this looks very promising!


Review comment on the RFC text:

> Consider this. What happens if we read a stale value of `self.epoch` and increment a counter just after the epoch advanced one step forward, and execute the fence just before the epoch is advanced one step forward again? That means we're effectively two epochs behind, while other threads might think we're zero epochs behind.
>
> I think the problem is easily fixable by changing when a bag becomes expired. Rather than defining a bag as expired if it's at least 2 epochs old, we should say it's expired if it's at least 3 epochs old. That's it.
@Vtec234:

Please forgive my ignorance, but as I understand it, the benefit of using parity over a single counter is that with parity the epoch can still be advanced by 1 when a global pin is present, right? If a larger counter is used, say mod 4, and the epoch age required for destruction is extended to 5, can the epoch advance by up to 3 with a global pin present? I'm not really proposing anything like that, but I'm curious about the relationship between the number of counters and the minimal free-able epoch age.


Review comment on the RFC text:

> Another alternative is to add a lifetime to `Guard` so that it becomes `Guard<'a>`. The lifetime `'a` borrows the `Handle` and therefore prevents it from being moved at all, which means it cannot be sent to another thread while such a guard exists.
>
> This way we don't have to remove `Handle::clone()`.

I like this alternative, but if we do go with it, I suggest removing `Handle::clone()` anyway, since, as you mentioned, it is not clear what the method should do.


Review comment on the RFC text:

> The main drawback of this solution is that the lifetime might at times be annoying, but note that the lifetime can simply be elided in most real-world situations.
>
> ### Two kinds of handles: `SharedHandle` and `LocalHandle`

Are there any drawbacks to this alternative besides it being somewhat more complex to implement and use?

@ghost (Author) replied:

Not really, that's the only drawback.

@ghost commented Apr 26, 2018

@Amanieu

> After consideration, I am happy with your proposed API, except that I would like `Guard::clone` to be removed.

That seems reasonable. One thing to keep in mind is that this precludes us from ever adding a method `Guard::handle()` returning an `Option<&Handle>` (because one could just do `guard.handle().pin()` to clone the guard). But I suppose that tradeoff is worth making.
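
To spell that out, a sketch of how the hypothetical accessor would reintroduce cloning (`Guard::handle()` does not exist; it is precisely the method being ruled out above):

```rust
// Even with Guard::clone() removed, this would still act as a clone:
fn clone_via_handle(guard: &Guard) -> Option<Guard> {
    // handle() is the hypothetical Option<&Handle> accessor.
    guard.handle().map(|handle| handle.pin())
}
```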

@Vtec234

> Please forgive my ignorance, but as I understand it, the benefit of using parity over a single counter is that with parity the epoch can still be advanced by 1 when a global pin is present, right?

Yes, exactly.

> If a larger counter is used, say mod 4, and the epoch age required for destruction is extended to 5, can the epoch advance by up to 3 with a global pin present?

Actually, it doesn't matter whether we have 2, or 4, or even 100 counters. Here's a lengthy explanation... Sorry in advance: it'll require some mental gymnastics. :)

Consider what happens in our current epoch tracking mechanism. In order to pin the current thread, we (1) load the global epoch, (2) store it into our thread-local storage, and (3) execute a fence. Since `1 << 63` is so large, we can pretty safely assume that the global epoch won't overflow between steps (2) and (3). That is just extremely unlikely to happen. :) One more thing: recall that our basic invariant is that the global epoch can advance at most once while the thread is pinned (i.e. between step (3) and the final call to unpin).

But what if it were likely for the global epoch to advance by `1 << 63` (on 64-bit platforms) between steps (2) and (3)? Let's say that in step (1) we load `global_epoch = 7`. Now assume that after we store it into thread-local storage in step (2) and execute the fence in step (3), the global epoch almost wraps around and becomes `global_epoch = 6`. Concurrently, another thread is executing the function `try_advance`: it managed to iterate the whole list between our steps (2) and (3), and is now almost finished; the only thing left to do is to increment the global epoch. After our step (3) has finished, that thread sets `global_epoch = 8`. This means that the global epoch can in fact advance twice while we're pinned (first from 6 to 7, and then from 7 to 8).

If we assume it's possible for the global epoch to advance by `1 << 63` between steps (2) and (3), then we have to change the expiration threshold from 2 to 3. The reason we need to do that if we introduce global pinning with two counters is simply that it's very possible for the global epoch to wrap around modulo 2 (in contrast to wrapping modulo `1 << 63`).
