Inline module for zero-cost tail call among multiple functions (mutual recursion) #20

Open
SichangHe opened this issue May 25, 2024 · 5 comments

SichangHe commented May 25, 2024

It would be similar to PyO3/pyo3#3815 and look like this:

#[tailcall::tailcall_module]
pub mod oddity {
    #[tailcall::multi_tailcall]
    pub fn is_even(x: u64) -> bool {
        if x > 0 {
            is_odd(x as u32 - 1)
        } else {
            true
        }
    }

    #[tailcall::multi_tailcall]
    pub fn is_odd(y: u32) -> bool {
        if y > 0 {
            is_even(y as u64 - 1)
        } else {
            false
        }
    }
}

And expand to something like:

pub mod oddity {
    enum __TailCallState {
        is_even(u64),
        is_odd(u32),
    }
    #[inline(never)]
    fn __multi_tailcall(mut tailcall_state: __TailCallState) -> bool {
        'tailcall_loop: loop {
            return match tailcall_state {
                __TailCallState::is_even(x) => {
                    if x > 0 {
                        tailcall_state = __TailCallState::is_odd(x as u32 - 1);
                        continue 'tailcall_loop;
                    } else {
                        true
                    }
                }
                __TailCallState::is_odd(y) => {
                    if y > 0 {
                        tailcall_state = __TailCallState::is_even(y as u64 - 1);
                        continue 'tailcall_loop;
                    } else {
                        false
                    }
                }
            }
        }
    }

    #[inline(never)]
    pub fn is_even(x: u64) -> bool {
        let tailcall_state = __TailCallState::is_even(x);
        __multi_tailcall(tailcall_state)
    }

    #[inline(never)]
    pub fn is_odd(y: u32) -> bool {
        let tailcall_state = __TailCallState::is_odd(y);
        __multi_tailcall(tailcall_state)
    }
}
  1. The whole inline module's AST (in this case the oddity module) would be fed into the tailcall_module proc macro.
  2. The proc macro finds all the functions $f_i$ (here, is_even and is_odd) annotated with tailcall::multi_tailcall.
  3. Validate that all the $f_i$ have the same return type. Users should put functions with a different return type in another module. (A rough skeleton covering steps 1–3 is sketched after this list.)
  4. Make an enum __TailCallState representing the arguments for each function.
  5. Create an internal __multi_tailcall function that takes a __TailCallState and runs the loop, inlining each tail-call function's body with the same rewriting currently done for single recursion.
  6. Rewrite each $f_i$ to create a __TailCallState and simply call the internal function __multi_tailcall with that state.
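
For concreteness, here is a rough, hypothetical skeleton of how steps 1–3 of such a tailcall_module attribute macro could look, assuming a proc-macro crate using syn 2.x (with the "full" feature) and quote; the names, the error messages, and the helper has_multi_tailcall_attr are all illustrative, and steps 4–6 are elided:

// Hypothetical sketch, not existing code in this crate.
use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, Item, ItemFn, ItemMod};

#[proc_macro_attribute]
pub fn tailcall_module(_attr: TokenStream, item: TokenStream) -> TokenStream {
    // Step 1: the whole inline module's AST is fed into the macro.
    let module = parse_macro_input!(item as ItemMod);
    let Some((_brace, items)) = &module.content else {
        // Only inline modules (`mod foo { ... }`) carry their items here.
        return quote!(compile_error!("expected an inline module");).into();
    };

    // Step 2: collect every function annotated with `multi_tailcall`.
    let tail_fns: Vec<&ItemFn> = items
        .iter()
        .filter_map(|item| match item {
            Item::Fn(f) if has_multi_tailcall_attr(f) => Some(f),
            _ => None,
        })
        .collect();

    // Step 3: validate that all collected functions share one return type.
    // (Comparing token strings is a crude stand-in for real type comparison.)
    let return_types: Vec<String> = tail_fns
        .iter()
        .map(|f| {
            let output = &f.sig.output;
            quote!(#output).to_string()
        })
        .collect();
    if return_types.windows(2).any(|pair| pair[0] != pair[1]) {
        return quote!(compile_error!(
            "all #[multi_tailcall] functions in a module must share a return type"
        );)
        .into();
    }

    // Steps 4-6 (emit __TailCallState, __multi_tailcall, and the thin
    // wrappers) would rewrite `items` and re-emit the module; elided here.
    quote!(#module).into()
}

fn has_multi_tailcall_attr(f: &ItemFn) -> bool {
    f.attrs.iter().any(|attr| {
        attr.path()
            .segments
            .last()
            .is_some_and(|segment| segment.ident == "multi_tailcall")
    })
}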

What do you think about this idea?

alecdotninja (Owner) commented May 28, 2024

Yes, I do think that would work. I suggested something similar here: #3 (comment)

I need to push up my branch, but I am privately working on a different solution based on stack-allocated thunks which would not require a module-level annotation. The main problem with that approach is that since Rust does not yet support DSTs on the stack, a conservatively large size (and alignment) has to be chosen for the thunk data.

It would be interesting to benchmark both approaches. In the enum case, you are still going to allocate as much stack space as the largest variant, so I am not sure which will be better.
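
As a rough aside (not part of the original comment), that enum-size point is easy to check directly: a Rust enum occupies the space of its largest variant plus the discriminant and any padding. A minimal sketch using the __TailCallState from the expansion above:

// Illustrative only: the state enum from the expansion sketched earlier.
#[allow(non_camel_case_types, dead_code)]
enum __TailCallState {
    is_even(u64),
    is_odd(u32),
}

fn main() {
    // On a typical 64-bit target this prints 16: 8 bytes for the largest
    // payload (u64) plus the discriminant and alignment padding.
    println!("{}", std::mem::size_of::<__TailCallState>());
}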

SichangHe (Author) commented:

Hey @alecdotninja, thanks for getting back.

I hope the idea above would be relatively easy to implement. Sorry for missing your earlier, similar suggestion.

Please don't worry about the stack space allocation. It is only done once for the whole recursion, and, if I understand correctly, allocation/deallocation each take only one instruction regardless of the size: see compiler output line 147. The allocation is not that big unless the function arguments are unusually large, and it probably lives in the L1 cache, which is plenty fast.

alecdotninja (Owner) commented:

There is no need to be sorry, @SichangHe! I appreciate your thoughts in this space.

I've created a new draft PR with my in-progress work on the new Thunk-based runtime system.

If you have any ideas or thoughts on the current implementation, I would welcome any feedback there.

SichangHe (Author) commented:

Intuitively, #21 adds the overhead of copying the closure and its vtable pointer (16 bytes) and doing a vtable lookup (one memory load) on every call.
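
As a rough illustration of where a 16-byte figure comes from (an aside, not part of the original comment): a reference to a trait object such as a closure is a fat pointer, i.e. a data pointer plus a vtable pointer.

fn main() {
    // On a 64-bit target this prints 16: 8 bytes for the data pointer plus
    // 8 bytes for the vtable pointer of the trait object.
    println!("{}", std::mem::size_of::<&dyn Fn(u64) -> bool>());
}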

SichangHe (Author) commented:

#21 (comment):

It's not easy to take a function's arguments and represent them as enum variants in Rust, especially when generics or a receiver are involved (and remember, all references are generic in at least their lifetime). 😵

There is an easy way to work around this, but it duplicates code: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c0f7b792bb666cff0e327fce10215d72
Just declare the enum as generic over all its variants, then inline all the function calls to create the enum using the argument tuples and let Rustc infer the types.
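
For illustration, a minimal, hypothetical sketch of that workaround (simplified from the description above, not copied from the playground link): the state enum is generic over each variant's argument tuple, so the macro never has to spell out argument types (lifetimes, generics, receivers); the cost is that the dispatch loop is duplicated in every entry point, where type inference can see concrete constructor calls.

#[allow(non_camel_case_types)]
enum __TailCallState<IsEven, IsOdd> {
    is_even(IsEven),
    is_odd(IsOdd),
}

pub fn is_even(x: u64) -> bool {
    // The enum's type parameters are inferred from these constructor calls,
    // so no argument type ever has to be written down explicitly.
    let mut state = __TailCallState::is_even((x,));
    loop {
        state = match state {
            __TailCallState::is_even((x,)) => {
                if x > 0 {
                    __TailCallState::is_odd((x as u32 - 1,))
                } else {
                    return true;
                }
            }
            __TailCallState::is_odd((y,)) => {
                if y > 0 {
                    __TailCallState::is_even((y as u64 - 1,))
                } else {
                    return false;
                }
            }
        };
    }
}

// `is_odd` would repeat the same loop with a different initial state; that
// repetition is the code duplication mentioned above.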
