Allocate interned strings in a contiguous buffer #16
Comments
Thanks for the suggestion! I have also already given some thought to how we can reduce the number of allocations. I was also experimenting with integrating https://crates.io/crates/bucket_vec with which prevents re-allocations compared with a normal vector upon Have you looked into the |
Yup, the above version is basically an inline version of I haven't seen Quick and non-scientific benchmarking shows that the above code (with
|
Thanks for the benchmarks. So 20-30% faster is "slightly outperforms"? :P Is the I really like your proposal and if we can make sure that these changes actually improve performance for most use cases I am more than willing to either implement or merge a PR for it. You are right about the |
    use std::collections::HashMap;

    #[derive(Default)]
    pub struct Interner {
        // Maps each interned string to its id.
        map: HashMap<String, u32>,
        // Maps each id back to its string; the id is the index.
        vec: Vec<String>,
    }

    impl Interner {
        pub fn intern(&mut self, name: &str) -> u32 {
            if let Some(&idx) = self.map.get(name) {
                return idx;
            }
            let idx = self.map.len() as u32;
            // Two separate heap allocations per new string:
            // one for the map key and one for the vec entry.
            self.map.insert(name.to_owned(), idx);
            self.vec.push(name.to_owned());
            idx
        }

        pub fn lookup(&self, id: u32) -> &str {
            self.vec[id as usize].as_str()
        }
    }

I haven't actually tried My interners are from a hobby project of mine, which I am too embarrassed to make public rn :) I won't be submitting any PRs (but will probably write a blog about the technique), so feel free to run with the ideas! |
Published the impl in a blog :) One improvement I've noticed is that |
Nice read! Some comments of mine:
Also thank you for trying out the trie approach. I wanted to experiment with that myself. :) |
@matklad Today I implemented a rough skeleton of your proposal into The results are inconclusive: The summary is that I have to dig deeper into this but it seems that there are some significant trade-offs.
|
This has been implemented in the new Thanks @matklad for proposing this implementation! |
This is just a crazy idea, which might not actually work out in practice, but...
At the moment, the interner does roughly as many allocations as there are interned strings, as each string is a separate box. A more efficient approach would be to append strings to a growing stable storage (`Vec<String>`, where each `String` has double the capacity of the previous one). That way, there would be roughly `O(log(n))` allocations, which helps if the original strings are not themselves separately allocated (if, for example, a compiler interns all identifiers from a single source file).
The minimal implementation doesn't look too bad:
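A rough sketch of the storage scheme described above, not necessarily the code originally posted in this issue. The name `ChunkedInterner`, its fields, and the `with_capacity` constructor are hypothetical and chosen for illustration. The sketch assumes that a `String` never reallocates while its length stays within its capacity, and it relies on `unsafe` to hand out references into the buffers, so it should be reviewed carefully before being used for real.

```rust
use std::collections::HashMap;
use std::mem;

/// Hypothetical interner that appends strings to doubling buffers.
pub struct ChunkedInterner {
    // The `'static` lifetimes are a private fiction: the slices point into
    // `buf` and `full`, which live as long as the interner itself.
    map: HashMap<&'static str, u32>,
    vec: Vec<&'static str>,
    // Buffer currently being filled. It is never grown in place.
    buf: String,
    // Retired buffers, kept alive so earlier slices stay valid.
    full: Vec<String>,
}

impl ChunkedInterner {
    pub fn with_capacity(cap: usize) -> Self {
        ChunkedInterner {
            map: HashMap::new(),
            vec: Vec::new(),
            buf: String::with_capacity(cap.next_power_of_two()),
            full: Vec::new(),
        }
    }

    pub fn intern(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.map.get(name) {
            return id;
        }
        // SAFETY (assumed): see the comments in `alloc`.
        let name = unsafe { self.alloc(name) };
        let id = self.map.len() as u32;
        self.map.insert(name, id);
        self.vec.push(name);
        id
    }

    pub fn lookup(&self, id: u32) -> &str {
        // The returned lifetime is shortened to that of `&self`.
        self.vec[id as usize]
    }

    unsafe fn alloc(&mut self, name: &str) -> &'static str {
        let cap = self.buf.capacity();
        if cap < self.buf.len() + name.len() {
            // Not enough room: retire the current buffer and start a new
            // one with at least double the capacity. Moving the `String`
            // value does not move its heap data.
            let new_cap = (cap.max(name.len()) + 1).next_power_of_two();
            let new_buf = String::with_capacity(new_cap);
            let old_buf = mem::replace(&mut self.buf, new_buf);
            self.full.push(old_buf);
        }
        let start = self.buf.len();
        // Fits within capacity, so this push does not reallocate.
        self.buf.push_str(name);
        let interned: *const str = &self.buf[start..];
        &*interned
    }
}
```

Interning the same string twice returns the same id, and `lookup` maps an id back to its text. Because each new buffer at least doubles in capacity, the number of buffer allocations grows roughly logarithmically with the total size of the interned strings, while the `HashMap` and `Vec` still reallocate on their own schedule.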