Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index look-up performance improvements #1189

Open
volsa opened this issue Apr 3, 2024 · 3 comments
Open

Index look-up performance improvements #1189

volsa opened this issue Apr 3, 2024 · 3 comments
Labels
performance refactor internal change, cleanup, code-style-improvement

Comments

@volsa
Copy link
Member

volsa commented Apr 3, 2024

Related to #1185, we did some further investigations into potential resolver performance improvments and realized that there are no obvious bottlenecks per-se (at least the flamegraph didn't show any) but rather looking at the call-graph the to_lowercase() method becomes suspicious (which is called around ~24 million times in an internal project according to callgrind).

Specifically the find_type (and many other index methods) call to_lowercase() before looking up strings in a HashMap. It might be interesting to see if removing all to_lowercase() calls for look-ups improves the plc check performance. If it does, we may be able to improve the performance by implementing some tweaks into the index while keeping the compiler case-sensitive (because the norm defines it as such). Some of these tweaks may be:

  • Implement caching for queries, e.g. if find_type("MyType") is called twice, the first call will cache the result whereas the second call will check if there's a cache-entry (the to_lowercase() method will thereby only be called in the first look-up)
  • String interning, relevant blog post by the rust-analyzer author
  • See if we can somehow include the make_ascii_lowercase method which looks interesting because of in-place modifications
    • pretty sure this will not help though, because it doesn't solve the issue of multiple calls on the same String
    • there might be a problem with UTF-8 strings?
  • ...some further tweaks which I can't think of right now

For what it's worth here's the call graph and flame-graph (note the somewhat narrow spikes in the flamegraph, indicating no real bottlenecks are present)
callgraph
image

@volsa volsa added refactor internal change, cleanup, code-style-improvement performance labels Apr 3, 2024
@volsa
Copy link
Member Author

volsa commented Apr 12, 2024

Some quick and dirty benchmarks in which I made index look-ups case-sensitive, indicating removing the to_lowercase()s could improve the performance. See also master...case-sensitive-index-lookup

Master Branch

~/Development/rusty on  master [$?] is 📦 v0.2.0 via 🦀 v1.77.1 
➜ ./profile.sh 
    Finished release [optimized] target(s) in 0.07s
Benchmark 1: ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=1
  Time (mean ± σ):      4.340 s ±  0.039 s    [User: 4.064 s, System: 0.274 s]
  Range (min … max):    4.295 s …  4.411 s    10 runs
 
Benchmark 2: ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=2
  Time (mean ± σ):      2.906 s ±  0.049 s    [User: 4.135 s, System: 0.278 s]
  Range (min … max):    2.835 s …  2.973 s    10 runs
 
Benchmark 3: ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=4
  Time (mean ± σ):      2.110 s ±  0.018 s    [User: 4.108 s, System: 0.286 s]
  Range (min … max):    2.087 s …  2.143 s    10 runs
 
Benchmark 4: ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=8
  Time (mean ± σ):      1.775 s ±  0.025 s    [User: 4.222 s, System: 0.328 s]
  Range (min … max):    1.745 s …  1.816 s    10 runs
 
Summary
  ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=8 ran
    1.19 ± 0.02 times faster than ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=4
    1.64 ± 0.04 times faster than ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=2
    2.45 ± 0.04 times faster than ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=1

Case Sensitive Branch

~/Development/rusty on  master [$!?] is 📦 v0.2.0 via 🦀 v1.77.1 
➜ ./profile.sh 
    Finished release [optimized] target(s) in 0.07s
Benchmark 1: ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=1
  Time (mean ± σ):      3.035 s ±  0.060 s    [User: 2.736 s, System: 0.292 s]
  Range (min … max):    2.956 s …  3.133 s    10 runs
 
Benchmark 2: ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=2
  Time (mean ± σ):      2.051 s ±  0.022 s    [User: 2.765 s, System: 0.261 s]
  Range (min … max):    2.027 s …  2.105 s    10 runs
 
Benchmark 3: ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=4
  Time (mean ± σ):      1.569 s ±  0.021 s    [User: 2.769 s, System: 0.293 s]
  Range (min … max):    1.538 s …  1.593 s    10 runs
 
Benchmark 4: ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=8
  Time (mean ± σ):      1.337 s ±  0.017 s    [User: 2.842 s, System: 0.311 s]
  Range (min … max):    1.319 s …  1.374 s    10 runs
 
Summary
  ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=8 ran
    1.17 ± 0.02 times faster than ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=4
    1.53 ± 0.02 times faster than ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=2
    2.27 ± 0.05 times faster than ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=1

@riederm
Copy link
Collaborator

riederm commented Apr 15, 2024

maybe it is also worth looking into haschmap-implementations that use case-insensitive hashing/equal to accomplish the case-insensitive lookups.
options:

It is not clear yet if something like unicase is cheaper compared to the to_lower_case calls.

@volsa
Copy link
Member Author

volsa commented May 7, 2024

For completeness sake some benchmarks to compare how these different variants differ in their performance

test benches::cached_lookup         ... bench:  77,384,767 ns/iter (+/- 682,933)
test benches::unicase_cow_lookup    ... bench: 128,091,353 ns/iter (+/- 475,277)
test benches::unicase_string_lookup ... bench: 152,539,598 ns/iter (+/- 531,244)
test benches::lowercase_lookup      ... bench: 172,139,632 ns/iter (+/- 6,025,281)

test result: ok. 0 passed; 0 failed; 1 ignored; 4 measured; 0 filtered out; finished in 159.87s
  struct CacheMap<K, V> {
      map: HashMap<K, V>,
      cache: FrozenMap<K, Box<Option<V>>>,
  }
  
  impl CacheMap<String, i32> {
      fn new() -> Self {
          CacheMap {
              map: HashMap::new(),
              cache: FrozenMap::new(),
          }
      }
  
      fn insert(&mut self, key: &str, value: i32) {
          self.map.insert(key.to_lowercase(), value);
      }
  
      fn get(&self, key: &str) -> Option<&i32> {
          if let Some(cached) = self.cache.get(key) {
              return cached.as_ref();
          }
  
          let lowered = key.to_lowercase();
          match self.map.get(&lowered) {
              Some(entry) => {
                  self.cache.insert(key.to_string(), Box::new(Some(entry.clone())));
                  return Some(entry);
              }
              None => {
                  self.cache.insert(key.to_string(), Box::new(None));
                  return None;
              }
          }
      }
  }

    #[bench]
    fn lowercase_lookup(bench: &mut Bencher) {
        let mut hm: HashMap<String, i32> = HashMap::new();
        hm.insert("FooBar".to_string(), 0);

        bench.iter(|| {
            for _ in 0..=1_000_000 {
                hm.get(&"FooBar".to_lowercase());
                hm.get(&"FOoBar".to_lowercase());
                hm.get(&"FoOBar".to_lowercase());
                hm.get(&"Foobar".to_lowercase());
                hm.get(&"FooBAr".to_lowercase());
                hm.get(&"FooBaR".to_lowercase());
            }
        });
    }

    #[bench]
    fn unicase_string_lookup(bench: &mut Bencher) {
        let mut hm: HashMap<UniCase<String>, i32> = HashMap::new();
        hm.insert(UniCase::new(String::from("FooBar")), 0);

        bench.iter(|| {
            for _ in 0..=1_000_000 {
                hm.get(&UniCase::new(String::from("FooBar")));
                hm.get(&UniCase::new(String::from("FOoBar")));
                hm.get(&UniCase::new(String::from("FoOBar")));
                hm.get(&UniCase::new(String::from("Foobar")));
                hm.get(&UniCase::new(String::from("FooBAr")));
                hm.get(&UniCase::new(String::from("FooBaR")));
            }
        });
    }

    #[bench]
    fn unicase_cow_lookup(bench: &mut Bencher) {
        let mut hm: HashMap<UniCase<Cow<str>>, i32> = HashMap::new();
        hm.insert(UniCase::new(Cow::Owned("FooBar".to_string())), 0);

        bench.iter(|| {
            for _ in 0..=1_000_000 {
                hm.get(&UniCase::new(Cow::Borrowed("FooBar")));
                hm.get(&UniCase::new(Cow::Borrowed("FOoBar")));
                hm.get(&UniCase::new(Cow::Borrowed("FoOBar")));
                hm.get(&UniCase::new(Cow::Borrowed("Foobar")));
                hm.get(&UniCase::new(Cow::Borrowed("FooBAr")));
                hm.get(&UniCase::new(Cow::Borrowed("FooBaR")));
            }
        });
    }

    #[bench]
    fn cached_lookup(bench: &mut Bencher) {
        let mut hm: CacheMap<String, i32> = CacheMap::new();
        hm.insert("FooBar", 0);

        // All of these look-ups will have a cache-miss in their first call, but a cache entry in the following ones.
        // Furthermore all of these will return 0, because a `to_lowercase` will be called exactly once per cache-miss.
        bench.iter(|| {
            for _ in 0..=1_000_000 {
                hm.get("FooBar");
                hm.get("FooBar");
                hm.get("FOoBar");
                hm.get("FoOBar");
                hm.get("Foobar");
                hm.get("FooBAr");
                hm.get("FooBaR");
            }
        });
    }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance refactor internal change, cleanup, code-style-improvement
Projects
None yet
Development

No branches or pull requests

2 participants