Allocate interned strings in a contiguous buffer #16

Closed
matklad opened this issue Mar 22, 2020 · 8 comments

matklad commented Mar 22, 2020

This is just a crazy idea, which might not actually work out in practice, but...

At the moment, the interner performs roughly as many allocations as there are interned strings, since each string is a separate box. A more efficient approach would be to append the strings to a growing stable storage (a Vec<String> where each String has double the capacity of the previous one). That way, there would be only O(log(n)) allocations, which helps if the original strings are not themselves separately allocated (if, for example, a compiler interns all identifiers from a single source file).

The minimal implementation doesn't look too bad:

use std::collections::HashMap;

pub struct Interner {
    /// &'static strs point into self.buf
    map: HashMap<&'static str, u32>,
    vec: Vec<&'static str>,
    /// Invariant: these strings are never resized
    buf: Vec<String>,
}

impl Default for Interner {
    fn default() -> Self {
        Interner { map: HashMap::default(), vec: Vec::new(), buf: vec![String::with_capacity(128)] }
    }
}

impl Interner {
    pub fn intern(&mut self, name: &str) -> u32 {
        if let Some(&idx) = self.map.get(name) {
            return idx;
        }
        let name = unsafe { self.alloc(name) };
        let idx = self.map.len() as u32;
        self.map.insert(name, idx);
        self.vec.push(name);
        idx
    }

    pub fn lookup(&self, id: u32) -> &str {
        self.vec[id as usize]
    }

    /// Safety: the returned &'static str actually only lives as long as
    /// self.buf. That is sound here because the buffers are never resized
    /// or dropped while the Interner is alive, and the reference never
    /// escapes the public API with the 'static lifetime.
    unsafe fn alloc(&mut self, name: &str) -> &'static str {
        let mut last = self.buf.last_mut().unwrap();
        if last.capacity() - last.len() < name.len() {
            // Not enough room: start a fresh buffer with (at least) double
            // the capacity, keeping the number of allocations logarithmic.
            let cap = (last.capacity().max(name.len()) + 1).next_power_of_two();
            self.buf.push(String::with_capacity(cap));
            last = self.buf.last_mut().unwrap();
        }
        let start = last.len();
        last.push_str(name);
        // Extend the lifetime; valid per the invariant documented above.
        &*(&last[start..] as *const str)
    }
}
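
A quick usage sketch, to show the intended behaviour: interning the same string twice yields the same index, and lookup resolves it back.

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("hello");
    let b = interner.intern("world");
    assert_ne!(a, b);
    // Interning an already-known string returns the original index.
    assert_eq!(interner.intern("hello"), a);
    // lookup resolves an index back to the interned string.
    assert_eq!(interner.lookup(a), "hello");
}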

Robbepop commented Mar 22, 2020

Thanks for the suggestion!

I have also already given some thought to how we can reduce the number of allocations.
Your proposal is interesting and I will experiment a bit with it over the weekend.
So far, one behaviour of StringInterner has been that it can efficiently take ownership of a String while also accepting &str. I see this being in conflict with certain arena-like designs (sketched below). I have also already experimented with using Pin in StringInterner.
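
To sketch the conflict (hypothetical types, not StringInterner's actual API): an interner that boxes each string can simply reuse an owned String's allocation, while a contiguous-buffer arena has to copy the bytes in either case.

use std::borrow::Cow;
use std::collections::HashMap;

/// Hypothetical boxed-per-string interner (the lookup side is omitted).
#[derive(Default)]
pub struct BoxedInterner {
    map: HashMap<Box<str>, u32>,
}

impl BoxedInterner {
    pub fn intern(&mut self, name: Cow<str>) -> u32 {
        if let Some(&idx) = self.map.get(name.as_ref()) {
            return idx;
        }
        let idx = self.map.len() as u32;
        // Cow::Owned hands over its existing allocation for free, while
        // Cow::Borrowed has to allocate a copy. A contiguous-buffer arena
        // must copy the bytes into its buffer in both cases, so the
        // owned-String fast path is lost there.
        self.map.insert(name.into_owned().into_boxed_str(), idx);
        idx
    }
}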

I was also experimenting with integrating https://crates.io/crates/bucket_vec, which, unlike a normal vector, does not re-allocate its contents upon push, so we could store u8 in it directly instead of String. However, there are other complications with it.

Have you looked into the ustr crate? It is also a string interner; in a similar vein, it allocates whole chunks of data and puts strings into those chunks until they are full, then simply leaks them and allocates another chunk, all to avoid heap allocations. According to its authors it is faster than string_interner, but it uses far more unsafe code as a trade-off.

matklad commented Mar 22, 2020

Yup, the above version is basically an inlined version of bucket_vec (but by using String directly, one avoids the unsafety around UTF-8 validity).

I haven't seen ustr before, thanks for the pointer. It indeed seems much more complicated, and leaking memory won't fit every use-case.

Quick and non-scientific benchmarking shows that the above code (with FxHashMap) slightly outperforms ustr:

λ hyperfine ./target/many-allocs ./target/few-allocs ./target/ustr 
Benchmark #1: ./target/many-allocs
  Time (mean ± σ):     385.0 ms ±  10.4 ms    [User: 352.5 ms, System: 32.3 ms]
  Range (min … max):   370.1 ms … 403.7 ms    10 runs
 
Benchmark #2: ./target/few-allocs
  Time (mean ± σ):     217.1 ms ±  12.5 ms    [User: 198.7 ms, System: 18.4 ms]
  Range (min … max):   206.4 ms … 248.1 ms    11 runs
 
Benchmark #3: ./target/ustr
  Time (mean ± σ):     298.6 ms ±   5.5 ms    [User: 275.6 ms, System: 22.9 ms]
  Range (min … max):   292.2 ms … 311.5 ms    10 runs
 
Summary
  './target/few-allocs' ran
    1.38 ± 0.08 times faster than './target/ustr'
    1.77 ± 0.11 times faster than './target/many-allocs'

Robbepop commented Mar 22, 2020

Thanks for the benchmarks. So 20-30% faster is "slightly outperforms"? :P

Is many-allocs the current StringInterner and few-allocs a fork of yours, or does it only benchmark inserts into an FxHashMap?

I really like your proposal, and if we can make sure that these changes actually improve performance for most use cases, I am more than willing to either implement it or merge a PR for it.

You are right about bucket_vec: your solution without it is probably a bit less involved and thus preferable.

matklad commented Mar 22, 2020

few-allocs is the code from the original comment, many-allocs is this thing:

use std::collections::HashMap;

#[derive(Default)]
pub struct Interner {
    map: HashMap<String, u32>,
    vec: Vec<String>,
}

impl Interner {
    pub fn intern(&mut self, name: &str) -> u32 {
        if let Some(&idx) = self.map.get(name) {
            return idx;
        }
        let idx = self.map.len() as u32;
        // Note: each new string is heap-allocated twice, once for the
        // map key and once for the vec entry.
        self.map.insert(name.to_owned(), idx);
        self.vec.push(name.to_owned());
        idx
    }

    pub fn lookup(&self, id: u32) -> &str {
        self.vec[id as usize].as_str()
    }
}

I haven't actually tried string-interner.

My interners are from a hobby project of mine, which I am too embarrassed to make public rn :) I won't be submitting any PRs (but will probably write a blog about the technique), so feel free to run with the ideas!

matklad commented Mar 22, 2020

Published the impl in a blog :)

One improvement I've noticed is that buf: String, full: Vec<String> is better than bufs: Vec<String>, as it avoids a bounds check when accessing the current buffer.
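
Roughly, only the allocation path changes (a sketch, with the field names as described above):

use std::collections::HashMap;

pub struct Interner {
    map: HashMap<&'static str, u32>,
    vec: Vec<&'static str>,
    /// The buffer currently being filled; reached without indexing.
    buf: String,
    /// Retired buffers, kept alive so the &'static strs stay valid.
    full: Vec<String>,
}

impl Interner {
    // intern and lookup are as in the earlier version.

    unsafe fn alloc(&mut self, name: &str) -> &'static str {
        let cap = self.buf.capacity();
        if cap < self.buf.len() + name.len() {
            // Retire the current buffer and start a bigger one.
            let new_cap = (cap.max(name.len()) + 1).next_power_of_two();
            let new_buf = String::with_capacity(new_cap);
            let old_buf = std::mem::replace(&mut self.buf, new_buf);
            self.full.push(old_buf);
        }
        let interned = {
            let start = self.buf.len();
            self.buf.push_str(name);
            &self.buf[start..]
        };
        &*(interned as *const str)
    }
}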

Robbepop commented Mar 22, 2020

Nice read!

Some comments of mine:

  1. You could store Box<str> instead of String for the buffers and get rid of a usize per buffer. Since there won't be too many String instances, this won't be very significant, to be honest. However, it would state the usage intent more clearly, since you are no longer interested in mutating those strings once they are in place.
  2. Instead of StrId(u32) you could use StrId(NonZeroU32) and offset all IDs by 1 to allow for certain niche optimizations with Option<StrId> (a sketch follows this list).
  3. You could try to use the Pin API instead of 'static references, because pinning those strings is what you are actually doing, and then use the pinned references (Pin<&str>) in vec and map.
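
A sketch of suggestion 2 (StrId as a hypothetical handle type):

use std::num::NonZeroU32;

/// Offsetting IDs by 1 frees the all-zeroes bit pattern, so the compiler
/// can use it as the None niche of Option<StrId>.
#[derive(Copy, Clone, PartialEq, Eq, Hash)]
pub struct StrId(NonZeroU32);

impl StrId {
    fn from_index(idx: u32) -> Self {
        // idx + 1 is never zero (assuming idx < u32::MAX).
        StrId(NonZeroU32::new(idx + 1).unwrap())
    }

    fn index(self) -> usize {
        (self.0.get() - 1) as usize
    }
}

With that, std::mem::size_of::<Option<StrId>>() equals size_of::<StrId>(), so tables of optional IDs stay compact.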

Also thank you for trying out the trie approach. I wanted to experiment with that myself. :)

Robbepop commented Jul 12, 2020

@matklad Today I implemented a rough skeleton of your proposal in StringInterner, to the point where I was able to compare StringInterner with and without your proposal.

The results are inconclusive:
Below you can see the benchmarks of StringInterner with your proposal in comparison to StringInterner without it. Wherever there is a negative change (e.g. in get_or_intern/fill empty), your proposal is a speed-up; otherwise your proposal made the operation less efficient.

The summary is that get_or_intern for an initially empty StringInterner got faster by approximately 40%; however, for an already filled StringInterner (when interning is no longer needed later on) we see a roughly 28% decline in performance, and the same goes for resolve and get.
I am not entirely done implementing your proposal yet, but at first sight, which interning mechanism you want depends highly on the use case.

I have to dig deeper into this but it seems that there are some significant trade-offs.

     Running target/release/deps/bench-f392f0d1d999f1c7
get_or_intern/fill empty                                                                            
                        time:   [11.019 ms 11.096 ms 11.179 ms]
                        change: [-39.691% -39.131% -38.555%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

get_or_intern/already filled                                                                            
                        time:   [8.3101 ms 8.3953 ms 8.4876 ms]
                        change: [+25.861% +28.208% +30.536%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

resolve/already filled  time:   [193.34 us 197.57 us 202.07 us]                                   
                        change: [+17.449% +21.265% +24.802%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

get/already filled      time:   [8.3322 ms 8.4016 ms 8.4752 ms]                               
                        change: [+35.789% +38.018% +40.381%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Robbepop commented Jul 14, 2020

This has been implemented in the new string-interner 0.10 within the new BucketBackend, which is used as the default backend for StringInterner.

Thanks @matklad for proposing this implementation!
