Index look-up performance improvements #1189
Comments
Some quick and dirty benchmarks in which I made index look-ups case-sensitive, indicating removing the [...]
[Benchmark results: Master Branch vs. Case Sensitive Branch]
Maybe it is also worth looking into hashmap implementations that use case-insensitive hashing/equality to accomplish the case-insensitive look-ups.
It is not clear yet whether something like `unicase` is cheaper than the `to_lowercase` calls.
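To make the "case-insensitive hashing/equal" idea concrete, here is a minimal std-only sketch. `CiKey` and `find` are hypothetical names, not part of the project or of `unicase`, and the comparison is ASCII-only, so it is narrower than what `to_lowercase` or `unicase` cover.

```rust
use std::borrow::Cow;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical key wrapper: equality and hashing ignore ASCII case.
#[derive(Debug, Clone)]
pub struct CiKey<'a>(pub Cow<'a, str>);

impl PartialEq for CiKey<'_> {
    fn eq(&self, other: &Self) -> bool {
        // Compares byte by byte without allocating a lowered copy.
        self.0.eq_ignore_ascii_case(&other.0)
    }
}

impl Eq for CiKey<'_> {}

impl Hash for CiKey<'_> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed the lowered bytes to the hasher so that keys which compare
        // equal also hash equally.
        for byte in self.0.bytes() {
            state.write_u8(byte.to_ascii_lowercase());
        }
        // Terminator byte (never valid in UTF-8) to separate this key from
        // any data hashed afterwards.
        state.write_u8(0xff);
    }
}

/// Looks up `name` regardless of ASCII case, borrowing the input string.
pub fn find<'a>(index: &HashMap<CiKey<'a>, i32>, name: &'a str) -> Option<i32> {
    index.get(&CiKey(Cow::Borrowed(name))).copied()
}
```

With this shape the look-up allocates nothing; whether it actually beats a `to_lowercase` call plus a plain `String` look-up would still have to be measured.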
For completeness' sake, some benchmarks comparing how these variants perform:
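#![feature(test)] // `#[bench]` requires a nightly toolchain
extern crate test;

use std::borrow::Cow;
use std::collections::HashMap;

use elsa::FrozenMap; // assumption: `FrozenMap` comes from the `elsa` crate
use test::Bencher;
use unicase::UniCase;

/// Case-insensitive map that caches the result of each look-up per exact spelling.
struct CacheMap<K, V> {
    map: HashMap<K, V>,
    cache: FrozenMap<K, Box<Option<V>>>,
}

impl CacheMap<String, i32> {
    fn new() -> Self {
        CacheMap {
            map: HashMap::new(),
            cache: FrozenMap::new(),
        }
    }

    fn insert(&mut self, key: &str, value: i32) {
        self.map.insert(key.to_lowercase(), value);
    }

    fn get(&self, key: &str) -> Option<&i32> {
        // Fast path: this exact spelling has been looked up before.
        if let Some(cached) = self.cache.get(key) {
            return cached.as_ref();
        }

        // Slow path: lower-case once, then remember the result (hit or miss).
        let lowered = key.to_lowercase();
        match self.map.get(&lowered) {
            Some(entry) => {
                self.cache.insert(key.to_string(), Box::new(Some(*entry)));
                Some(entry)
            }
            None => {
                self.cache.insert(key.to_string(), Box::new(None));
                None
            }
        }
    }
}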

#[bench]
fn lowercase_lookup(bench: &mut Bencher) {
    let mut hm: HashMap<String, i32> = HashMap::new();
    // Store the lowered key; otherwise every lowered look-up below would miss.
    hm.insert("FooBar".to_lowercase(), 0);
    bench.iter(|| {
        for _ in 0..=1_000_000 {
            hm.get(&"FooBar".to_lowercase());
            hm.get(&"FOoBar".to_lowercase());
            hm.get(&"FoOBar".to_lowercase());
            hm.get(&"Foobar".to_lowercase());
            hm.get(&"FooBAr".to_lowercase());
            hm.get(&"FooBaR".to_lowercase());
        }
    });
}

#[bench]
fn unicase_string_lookup(bench: &mut Bencher) {
    let mut hm: HashMap<UniCase<String>, i32> = HashMap::new();
    hm.insert(UniCase::new(String::from("FooBar")), 0);
    bench.iter(|| {
        for _ in 0..=1_000_000 {
            hm.get(&UniCase::new(String::from("FooBar")));
            hm.get(&UniCase::new(String::from("FOoBar")));
            hm.get(&UniCase::new(String::from("FoOBar")));
            hm.get(&UniCase::new(String::from("Foobar")));
            hm.get(&UniCase::new(String::from("FooBAr")));
            hm.get(&UniCase::new(String::from("FooBaR")));
        }
    });
}

#[bench]
fn unicase_cow_lookup(bench: &mut Bencher) {
    let mut hm: HashMap<UniCase<Cow<str>>, i32> = HashMap::new();
    hm.insert(UniCase::new(Cow::Owned("FooBar".to_string())), 0);
    bench.iter(|| {
        for _ in 0..=1_000_000 {
            hm.get(&UniCase::new(Cow::Borrowed("FooBar")));
            hm.get(&UniCase::new(Cow::Borrowed("FOoBar")));
            hm.get(&UniCase::new(Cow::Borrowed("FoOBar")));
            hm.get(&UniCase::new(Cow::Borrowed("Foobar")));
            hm.get(&UniCase::new(Cow::Borrowed("FooBAr")));
            hm.get(&UniCase::new(Cow::Borrowed("FooBaR")));
        }
    });
}

#[bench]
fn cached_lookup(bench: &mut Bencher) {
    let mut hm: CacheMap<String, i32> = CacheMap::new();
    hm.insert("FooBar", 0);
    // Each distinct spelling misses the cache on its first look-up and hits it on every later one.
    // All of these return `Some(&0)`; `to_lowercase` is called exactly once per cache miss.
    bench.iter(|| {
        for _ in 0..=1_000_000 {
            hm.get("FooBar");
            hm.get("FooBar");
            hm.get("FOoBar");
            hm.get("FoOBar");
            hm.get("Foobar");
            hm.get("FooBAr");
            hm.get("FooBaR");
        }
    });
}
Related to #1185, we did some further investigation into potential resolver performance improvements and realized that there are no obvious bottlenecks per se (at least the flamegraph didn't show any). Looking at the call graph, however, the `to_lowercase()` method becomes suspicious: according to callgrind it is called around 24 million times in an internal project.

Specifically, `find_type` (and many other index methods) call `to_lowercase()` before looking up strings in a HashMap. It might be interesting to see whether removing all `to_lowercase()` calls for look-ups improves the `plc check` performance. If it does, we may be able to improve the performance by implementing some tweaks in the index while keeping the compiler case-insensitive (because the norm defines it as such). Some of these tweaks may be:
- If `find_type("MyType")` is called twice, the first call will cache the result whereas the second call will check if there's a cache entry (the `to_lowercase()` method will thereby only be called in the first look-up)
- The `make_ascii_lowercase` method, which looks interesting because of its in-place modifications

For what it's worth, here's the call graph and flame-graph (note the somewhat narrow spikes in the flamegraph, indicating no real bottlenecks are present)
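As a rough illustration of the `make_ascii_lowercase` idea, the sketch below lowers the key in a reusable scratch buffer instead of allocating a fresh `String` per look-up. `find_lowercased` and the `scratch` parameter are hypothetical and not taken from the project; the map is assumed to store already-lowered keys. Unlike `to_lowercase`, this only affects ASCII letters, so identifiers containing non-ASCII letters would behave differently.

```rust
use std::collections::HashMap;

/// Looks up `name` case-insensitively (ASCII only) in a map whose keys
/// were lower-cased on insertion. `scratch` is reused across calls so the
/// look-up itself does not allocate once the buffer is large enough.
pub fn find_lowercased<'m>(
    index: &'m HashMap<String, i32>,
    name: &str,
    scratch: &mut String,
) -> Option<&'m i32> {
    scratch.clear();
    scratch.push_str(name);
    // In-place modification: no new String is created here.
    scratch.make_ascii_lowercase();
    index.get(scratch.as_str())
}
```

A caller would keep one `String` alive next to the index and pass it to every look-up, for example `find_lowercased(&index, "MyType", &mut scratch)`.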