Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC isn't closure friendly. #50

Open
kevincox opened this issue Oct 31, 2016 · 17 comments
Open

GC isn't closure friendly. #50

kevincox opened this issue Oct 31, 2016 · 17 comments
Labels

Comments

@kevincox
Copy link
Contributor

This might be very difficult but I think it is worth considering how the GC can work with closures.

Currently moving a Gc into a closure causes problems because it can't be found in the trace. Also it will probably cause a panic on cleanup if the closure is stored in a GCed object because it will be rooted which will cause assert!(finalizer_safe()) to trigger.

I don't know if it would be possible to add support for tracing closures, I'm pretty sure this isn't possible in current Rust but maybe a compiler plugin could accomplish it. That being said some "tracing" traits aren't implemented for closure (Debug) but that might be for another reason.

There are other problems as well. Due to moving being so transparent there is no way to mark a "field" as #[unsafe_ignore_trace] if you want to capture a non-Gc containing object as well. This seems like it would make it impossible to choose the right "default".

That being said it would be very nice to be able to close over values and have them automatically traced if something could be worked out.

@apasel422
Copy link
Contributor

Could you post a code example that causes problems this way?

@Manishearth
Copy link
Owner

GCd objects don't get unrooted when put inside closures. They also won't get unrooted if the closure is put inside another GCd object, and you can cause a leak if you make a cycle here. The worse case here is a leak. You don't need to trace down closures for safety because all their contents have already been rooted.

A Gc<T> only unroots when it is placed transitively inside another Gc<T>. The operation of moving a Gc<T> into a closure doesn't do this, so it doesn't unroot.

@mystor
Copy link
Collaborator

mystor commented Oct 31, 2016

It would be awesome if we could traverse closure implementations, but I don't think that that will be possible until the rust language itself adds tracing infrastructure, as there is no place in the pipeline where we can hook in right now which gives us the information we would need (what variables are captured by the closure), but also allows us to implement traits.

@kevincox
Copy link
Contributor Author

Exactly what @Manishearth said. You stay rooted inside closures.

However this can do more then just cause cycles. If the closure is inside another Gc object and get's Dropped it will cause a panic because the captured reference will be Dropped as well, causing the assertion I mentioned to fire.

@mystor Is there any work in the language for tracing? I know that Servo did something similar for their GC so maybe I'll try to look into that to see if they are using any tricks. But yes, I think you're right in that we will need to wait for better tracing support.

@Manishearth
Copy link
Owner

http://manishearth.github.io/blog/2016/08/18/gc-support-in-rust-api-design/

@izgzhen
Copy link
Contributor

izgzhen commented Nov 1, 2016

Do we have a RFC in rust-lang side related to this now?

SkylerLipthay added a commit to SkylerLipthay/bean that referenced this issue Jul 14, 2019
Unfortunately, it seems like I am walled.

Manishearth/rust-gc#50

I thought the garbage collection story in Rust was more mature than it
actually is, and that I could ride the coattails of those who know how
the hell garbage collection actually works. Either I switch gears and
learn how to situate myself in garbage, or I call this project a dead
end because I'm running out of steam. :)
@andersk
Copy link
Collaborator

andersk commented May 4, 2021

Currently moving a Gc into a closure causes problems because it can't be found in the trace. Also it will probably cause a panic on cleanup if the closure is stored in a GCed object because it will be rooted which will cause assert!(finalizer_safe()) to trigger.

These issues are just a consequence of the incorrect #[unsafe_ignore_trace] that must have been used in order to store the closure in a Gc, right? An incorrect #[unsafe_ignore_trace] can cause these issues without any closures involved, too.

It’s certainly a limitation that there seems to be no way to correctly implement Trace for a struct with a closure containing a Gc. But at least one should not accidentally get an incorrect implementation without a visible “unsafe”.

@Manishearth
Copy link
Owner

Correct, this is still safe.

My recommendation woudl be having a second trait that is : Trace that you use trait objects of

@kevincox
Copy link
Contributor Author

kevincox commented May 4, 2021

These issues are just a consequence of the incorrect #[unsafe_ignore_trace]

Yes. But there is currently no way to trace a closure. Maybe a better name for this issue is "Allow tracing through closures."

I agree that the user should be aware of this to some degree. However it is quite easy to start with a closure that doesn't capture any Gc references but then later you add one. When you start capturing this reference you don't even necessarily notice (maybe the type Gc isn't spelt out in the nearby code) and the unsafe_ignore_trace is in a different file. So while the user does have to opt into this is is very likely that the user makes a mistake later and ends up in this situation because of how implicit closure captures are.

@andersk
Copy link
Collaborator

andersk commented May 4, 2021

Since a closure type cannot be named, it can only be contained in a generic struct, which is unsafe if there exists any unsafe instantiation of it. This is unsafe because it’s possible to misuse unsafely:

#[derive(Finalize, Trace)]
struct Closure<F: Fn()>(#[unsafe_ignore_trace] F);

so regardless of whether it’s actually misused, the author is responsible for ensuring that it’s not exposed as part of a safe API. It shares all the same considerations with a function like this:

fn convert<T, U>(value: &T) -> &U {
    unsafe { std::mem::transmute(value) }
}

which could be used correctly, but even a correct use could become incorrect after some nonlocal code change. Obviously, the author of the convert function is responsible for marking it unsafe and documenting its safety invariant. Similarly, the author of struct Closure is responsible for making its constructor private, wrapping it in a factory function that’s marked unsafe, and documenting its safety invariant. The call site in both cases is then required to use an unsafe block, whose presence locally alerts programmers that there are safety invariants to be maintained when making changes.

Alternatively, depending on the use case, it might be possible to restrict the struct such that it can only be used safely:

#[derive(Finalize, Trace)]
struct Closure<F: Copy + Fn()>(#[unsafe_ignore_trace] F);

@gagbo
Copy link

gagbo commented Sep 15, 2022

Hello,

I'm trying to implement a scheme interpreter using this crate and the Copy restriction doesn't work for me. Currently I'm trying to bundle in the same structure the closure and all the captures that happened in the closure, hoping that the extra fields would act as proper roots for the closure:

type ManagedEnv = Gc<GcCell<Env>>;
type ManagedValue = Gc<GcCell<Value>>;
trait Analyze: Fn(ManagedEnv) -> ManagedValue {}

#[derive(Trace, Finalize)]
pub struct Analyzed {
    /// The manually tracked list of roots that are used/captured by the closure
    ///
    /// This list does not contain all the roots that are indirectly captured by
    /// the thunks included in the analysis, since they are captured in [thunks]
    /// already.
    roots: Vec<R7Result>,

    /// The manually tracked list of thunks that are used/captured by the closure
    thunks: Vec<Gc<Analyzed>>,

    /// It is safe to ignore tracing because of the roots and the thunks are
    /// included in the struct, so those other fields will be responsible of
    /// tracing and it is okay to lose tracing information from the closure
    /// capture that happens at compilation time.
    ///
    /// see https://github.com/Manishearth/rust-gc/issues/50 for reasons why
    /// closures cannot automagically root the Gc'd values that get moved into
    /// it.
    #[unsafe_ignore_trace]
    fun: Box<dyn Analyze>,
}

Of course, as it's my first implementation, I'm still encountering bugs, but I'd like to know if that seems obviously wrong of if that's something that could work ? I guess the worst case scenario is when the closure is creating some ManagedValues in the body, but I thought that those values would not matter as they either are temporary (local to the closure execution), or returned outside of the closure, where they could be properly tracked.

@Manishearth
Copy link
Owner

@gagobo Wich Copy restriction? I don't think we have one.

@Manishearth
Copy link
Owner

The following code compiles just fine for me:

use gc::*;

type ManagedEnv = Gc<GcCell<Env>>;
type ManagedValue = Gc<GcCell<Value>>;
trait Analyze: Fn(ManagedEnv) -> ManagedValue {}

#[derive(Trace, Finalize)]
struct Env;

#[derive(Trace, Finalize)]
struct Value;
#[derive(Trace, Finalize)]
struct R7Result;
#[derive(Trace, Finalize)]
pub struct Analyzed {
    /// The manually tracked list of roots that are used/captured by the closure
    ///
    /// This list does not contain all the roots that are indirectly captured by
    /// the thunks included in the analysis, since they are captured in [thunks]
    /// already.
    roots: Vec<R7Result>,

    /// The manually tracked list of thunks that are used/captured by the closure
    thunks: Vec<Gc<Analyzed>>,

    /// It is safe to ignore tracing because of the roots and the thunks are
    /// included in the struct, so those other fields will be responsible of
    /// tracing and it is okay to lose tracing information from the closure
    /// capture that happens at compilation time.
    ///
    /// see https://github.com/Manishearth/rust-gc/issues/50 for reasons why
    /// closures cannot automagically root the Gc'd values that get moved into
    /// it.
    #[unsafe_ignore_trace]
    fun: Box<dyn Analyze>,
}

@gagbo
Copy link

gagbo commented Sep 15, 2022

I meant the Copy restriction in the Closure example earlier in the conversation.

If it looks fine to you then all is good (and you can ignore the rest of the post, it's less "issue relevant" but more "newbie conversation")

I implemented the code (mostly) as you put just above, and it compiles fine but I started triggering finalizer_safe panics (when calling two functions in a row in my scheme repl, so very not minimal example) and I wasn't sure if it was because my design is unsound to begin with, or if it's "just me" mis-representing the traceable fields in my application code when building the Analyzed instances. Even after reading your blog post the exact semantics of Trace and (especially) Finalize weren't super clear to me, so adding the closure capture management on top of those put me in total darkness, but that's how we learn :)

If I ever manage to build a MWE that can trigger Finalize panics with this struct I will add it here to show what I mean.

And thanks for making this crate, it might not look like it right now, but it does save me a lot of headaches

@Manishearth
Copy link
Owner

Yeah so like I said the way to do closures soundly is to have trait Closure: Trace and use something like Gc<dyn Closure> or whatever (you may need some further plumbing to make that work). But you basically have to manually define closure capture types and implement a trait on them that has a .call() method.

@gagbo
Copy link

gagbo commented Sep 17, 2022

I see what you mean.

For the sake of completeness, and not being a denvercoder9, this is a stub of the design I'm going with, where indeed the builtin closure types just have to be banished to avoid issues.

use gc::*

#[derive(Trace, Finalize)]
struct Env{}
#[derive(Trace, Finalize)]
struct Val{}
type ManagedVal = Gc<GcCell<Val>>;
type ManagedEnv = Gc<GcCell<Env>>;

// I don't want to deal with generic traits around arguments and return types,
// I'm only interested in one type of closure, therefore the trait is very concrete

pub trait CloseOnEnv: Trace {
    fn call(&self, env: ManagedEnv) -> ManagedVal;
}

#[derive(Trace, Finalize)]
pub struct Closure {
    // Maybe the box and extra type is unnecessary,
    // I only keep them for the easy alias
    inner: Box<dyn CloseOnEnv>,
}
pub type ManagedClosure = Gc<Closure>;

impl CloseOnEnv for Closure {
    fn call(&self, env: ManagedEnv) -> ManagedVal {
        self.inner.call(env)
    }
}

impl Closure {
    pub fn managed<T: CloseOnEnv + 'static>(inner: T) -> ManagedClosure {
        Gc::new(Self {
            inner: Box::new(inner),
        })
    }
}

// Example of usage:

// No capture needed
impl CloseOnEnv for ManagedVal {
    fn call(&self, _env: ManagedEnv) -> ManagedVal {
        self.clone()
    }
}

// Capture needed, making a combinator that executes a sequence of closures passed as arguments
fn sequence<T>(analyses: T) -> ManagedClosure
where
    T: Iterator<Item = ManagedClosure>,
{
    // A local struct with capture semantics that is going to be returned, and
    // that has proper semantics
    #[derive(Trace, Finalize)]
    struct Analyzed {
        parts: Vec<ManagedClosure>,
    }

    impl CloseOnEnv for Analyzed {
        fn call(&self, env: ManagedEnv) -> ManagedVal {
            let mut folded: ManagedVal = ManagedVal::new_null();
            for analysed in self.parts.iter() {
                folded = analysed.call(env.clone())?;
            }
            Ok(folded)
        }
    }

    Closure::managed(Analyzed {
        parts: analyses.collect(),
    })
}

Thanks for your guidance and your time

@jedel1043
Copy link

(Could open a new issue for this, but I think it's too related to the same underlying problem of inspecting captured variables).

Unfortunately, the workaround proposed falls apart for async blocks and async functions, because every await point captures the variables used after the await.

On another note, is there any RFC related to inspecting the captured variables of closures/futures?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants