Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use OnceCell instead of Lazy to enable inlining #10

Open
Veetaha opened this issue Jan 16, 2022 · 13 comments
Open

Use OnceCell instead of Lazy to enable inlining #10

Veetaha opened this issue Jan 16, 2022 · 13 comments

Comments

@Veetaha
Copy link

Veetaha commented Jan 16, 2022

Lazy and OnceCell types are basically the same in their semantics except for one thing.
OnceCell is constructed with an empty constructor (OnceCell::new()), and the closure that initializes the OnceCell is passed at the call site via cell.get_or_init(/* closure */). This allows for get_or_init() call to be generic over the accepted closure type, and thus rustc is able to better optimize it by inlining the closure.

However, Lazy type is designed to be more friendly at the definition site where we specify the closure that initializes the value right at the construction of the Lazy type via Lazy::new(/* closure */). The problem here is that the closure's type gets type erased here. The closure is coerced to a function pointer. If you take a look at the definition of the Lazy type, you will see that it is defined as

pub struct Lazy<T, F = fn() -> T> { /* */ } 

See that the F generic parameter defaults to a function pointer. That's why, due to this coercion to a function pointer, rustc has less type information and thus is more likely not to inline the closure invocation.

I suggest switching the type that regex!() returns from Lazy<T> to once_cell::sync::OnceCell<T>.
Unfortunately, it is a breaking change for the users that depend on the type of the value returned from regex!() macro to be &Lazy<Regex>, so it means a major update if this is implemented. I wish that the regex!() macro returned &'static Regex (via Lazy::deref impl) from the start, otherwise it would be a patch update.

@Canop
Copy link
Owner

Canop commented Jan 17, 2022

Do you have indication that inlining the initializer would notably improve performances ? Did you build a test and benchmark, maybe ?

@Veetaha
Copy link
Author

Veetaha commented Jan 17, 2022

I haven't. In fact, any performance penalty here is generally very neglectable due to the fact that both types don't invoke the closure when they are already initialized, so this is only the difference on the first invocation I think. In general, if there is the possibility to do so I always prefer OnceCell, and in the case of this crate there is indeed such a possibility, both render the same results:

fn once_cell_regex() -> &'static Regex {
    static ONCE_CELL: OnceCell<Regex> = OnceCell::new();
    ONCE_CELL.get_or_init(|| Regex::new(CODE).unwrap())
}

fn lazy_regex() -> &'static Regex {
    static LAZY: Lazy<Regex> = Lazy::new(|| Regex::new(CODE).unwrap());
    &LAZY
}

I would like @matklad to comment on this issue, interested to hear his opinion

@matklad
Copy link

matklad commented Jan 17, 2022

The proper way to do this I think is to return

pub struct LazyRegex<'a> {
  re: &'a str,
  cell: once_cell::Sync::OneceCell<regex::Regex>,
}

impl ops::Deref for LazyRegex { 
  type Target = regex::Regex 
  fn deref(&self) { /* calls get_or_inint */}
}

this:

  • indeed makes it "statically obvious" to compiler how the regex is going to be constructed
  • makes once_cell a private impl detail, so you'd be able to switch to std::lazy::OnceCell once that is stable (which ... is quite unclear when actually happens) without making a breaking change
  • have some interractions with size which I am not sure about:
    • we don't store a pointer to fn, which compiler can't optimize out
    • we store an &'a str which is twice as large

Ok, scratch all that, I think the last bullet point is the most important one, and it makes me think that the current design with returning Lazy<Regex> is actually the correct one.

We have two choices here:

  • either each lazy regex is a separate type, where we can avoid any indirect calls
  • or there's some common type for all lazy regexes, where we do have some dynamism/runtime indirection via storing either an fn pointer or a pointer to original string

I don't think the first one is what we want, and, for the second one, between storing &str and storing fn, the second one
seems preferable, as it wastes less space. And if you are going to store fn anyway, then just exposing Lazy seems preferable to a custom type.

@Veetaha
Copy link
Author

Veetaha commented Jan 17, 2022

To clarify, I am talking specifically about regex!() macro. It currently returns &'static Lazy<Regex>. I don't think it makes sense to add any wrappers here, it may just return &'static Regex

@matklad
Copy link

matklad commented Jan 17, 2022

Oh... That one certainly should work like this:

https://docs.rs/once_cell/latest/once_cell/#lazily-compiled-regex

@Veetaha
Copy link
Author

Veetaha commented Jan 17, 2022

Wow, didn't know it was directly in once_cell docs =)

@Canop
Copy link
Owner

Canop commented Jan 18, 2022

(me neither)

I don't think this breaking change would be a real problem for lazy_regex users but I'm not sure I see real benefits either.

@Veetaha
Copy link
Author

Veetaha commented Jan 18, 2022

Well, the benefit is at least in future-proofing the API. regex!() shouldn't expose its internal implementation detail of what type it uses under the hood for static storage. I think we would need to do this breaking change either way when the OnceCell/Lazy types are upstreamed to stdlib. But if we do it today, we first of all hide this impl. detail, and second of all make it a patch update for regex!() to transition to std::lazy types. Though on the other hand lazy_regex!() would still require this library to make a major release when changing its return type to std::lazy::Lazy, hm....

@Canop
Copy link
Owner

Canop commented Jan 18, 2022

Such breaking change with a major version is perfectly OK IMO. After all this library is targeted at people who have more than just a basic level in Rust.

But it's better to ship it with more than this, so I'll probably sit with this issue open until either I have other features to add or once_cell enters the std lib.

@Veetaha
Copy link
Author

Veetaha commented Jan 18, 2022

Seems reasonable

@Veetaha
Copy link
Author

Veetaha commented Jun 4, 2022

One limitation with this that I've suddenly found is that autoderef coercion is not supported in constant expressions used at statics initialization, so we can not make something like this:

static RETRIES: &[&Regex] = &[
    regex!("connection reset by peer"i),
    regex!("connection closed by remote host"i),
];

It will result in compile error:

cannot perform deref coercion on `lazy_regex::Lazy<lazy_regex::Regex>` in statics

I wonder if we could work around this by providing a const fn getter to Lazy/OnceCell in once_cell crate?

const fn const_get<T>(this: Lazy<T>/OnceCell<T>) -> &T

cc @matklad

@msrd0
Copy link
Contributor

msrd0 commented Sep 6, 2023

Just FYI, OnceCell has been included in the Rust std lib a few month ago: https://blog.rust-lang.org/2023/06/01/Rust-1.70.0.html#oncecell-and-oncelock

@Canop
Copy link
Owner

Canop commented Sep 7, 2023

@msrd0 I'm waiting for LazyCell to stabilize. At this point I'll cleanly move to std.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants