
Faster Caching #42

Closed
o0Ignition0o opened this issue Oct 18, 2021 · 11 comments

o0Ignition0o (Contributor) commented Oct 18, 2021

We're currently using a HashMap to cache introspection and queries, which might not be the best choice: the default hashing function is SipHash 1-3, which is robust but not the fastest.

We might want to investigate better hash functions, or dedicated caching crates altogether.

Geal (Contributor) commented Oct 18, 2021

Note that hashing functions where the key is controlled by client requests can lead to hash flooding attacks, where keys are generated specifically to create a lot of collisions and ultimately DoS the server. This gets rediscovered every couple of years in every web platform (PHP, Ruby, JS...), and it's the reason algorithms like SipHash were developed.
If we end up using a hashmap-like structure, I strongly advise that we use a safe hashing function, since this will be internet-facing code.
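
For illustration, both concerns can be reconciled by swapping in a faster hasher that is still randomly keyed, and therefore still resistant to hash flooding. A minimal sketch, assuming the ahash crate as a stand-in (not something the router ships with):

```rust
use std::collections::HashMap;

fn main() {
    // ahash::RandomState is randomly keyed per map, so attacker-chosen
    // keys cannot reliably collide, while hashing faster than std's
    // SipHash-based default. Assumes `ahash` is added as a dependency.
    let mut cache: HashMap<String, String, ahash::RandomState> =
        HashMap::with_hasher(ahash::RandomState::new());

    cache.insert("introspection".into(), "cached response".into());
    assert!(cache.contains_key("introspection"));
}
```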

cecton (Contributor) commented Oct 18, 2021

I haven't checked, but there are specialized crates for that: https://crates.io/crates/lru

Geal (Contributor) commented Oct 18, 2021

> I haven't checked, but there are specialized crates for that: https://crates.io/crates/lru

lru allows choosing the hash function: https://docs.rs/lru/0.7.0/lru/struct.LruCache.html#method.with_hasher
Using it instead of HashMap would already be an improvement, so let's try it.
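
A rough sketch of what that could look like against lru 0.7 as linked above (the capacity of 512 is an arbitrary placeholder):

```rust
use std::collections::hash_map::RandomState;

use lru::LruCache;

fn main() {
    // LruCache::with_hasher (lru 0.7) takes a capacity in entries plus
    // any BuildHasher; reusing std's RandomState keeps the keyed,
    // flooding-resistant SipHash default while adding LRU eviction.
    let mut cache: LruCache<String, String, RandomState> =
        LruCache::with_hasher(512, RandomState::new());

    cache.put("query { me { id } }".to_string(), "plan".to_string());
    assert!(cache.get("query { me { id } }").is_some());
}
```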

o0Ignition0o (Contributor, Author) commented Oct 18, 2021

Any suggestions as to the LRU size we wanna try? (We could also go the unbounded() route.)

Geal (Contributor) commented Oct 18, 2021

That should have its own configuration option. The unfortunate part is that the cache doesn't measure memory usage, only the number of entries. We should not have unbounded resource usage; nothing is unbounded in computers 😉
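
A minimal sketch of such a knob, with hypothetical field names and defaults (not the router's actual configuration schema), assuming serde with the derive feature and serde_yaml:

```rust
use serde::Deserialize;

/// Hypothetical configuration shape: bound the cache by entry count,
/// defaulting to a finite size rather than an unbounded cache.
#[derive(Debug, Deserialize)]
struct CacheConfig {
    /// Maximum number of cached entries (not bytes).
    #[serde(default = "default_cache_limit")]
    query_cache_limit: usize,
}

fn default_cache_limit() -> usize {
    512
}

fn main() {
    let config: CacheConfig =
        serde_yaml::from_str("query_cache_limit: 1024").unwrap();
    assert_eq!(config.query_cache_limit, 1024);
}
```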

abernix (Member) commented Oct 18, 2021

A configuration knob for the LRU that is memory-based does seem preferable to one based on the number of items. In the Gateway, for example, we crudely make this a "30MiB cache" by merely checking the serialized length of the JSON object and using that as a relative indicator.

(Yes, acknowledging that this is literally the loosest meaning of "30 MiB" and is undoubtedly off by a factor of hopefully not more than 2. 😜)
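
A sketch of that estimate translated to Rust, assuming serde_json and a hypothetical helper name:

```rust
use serde_json::Value;

/// Rough per-entry cost in the spirit of the Gateway's "30 MiB" cache:
/// the serialized JSON length stands in for actual memory usage.
fn approximate_cost(value: &Value) -> usize {
    serde_json::to_string(value).map(|s| s.len()).unwrap_or(0)
}

fn main() {
    let value: Value =
        serde_json::from_str(r#"{"data":{"me":{"id":"1"}}}"#).unwrap();
    println!("~{} bytes", approximate_cost(&value));
}
```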

o0Ignition0o (Contributor, Author) commented

We could play with pop_lru() and size_of and see how it goes.
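
A sketch of the idea, assuming lru 0.7 and a byte total tracked alongside the cache (evict_to_budget is a hypothetical helper, and string lengths stand in for real sizes):

```rust
use lru::LruCache;

/// Hypothetical byte-budget eviction layered on top of lru 0.7: pop
/// least-recently-used entries until the running total fits the budget.
fn evict_to_budget(
    cache: &mut LruCache<String, String>,
    current_bytes: &mut usize,
    budget_bytes: usize,
) {
    while *current_bytes > budget_bytes {
        match cache.pop_lru() {
            Some((key, value)) => {
                *current_bytes =
                    current_bytes.saturating_sub(key.len() + value.len());
            }
            None => break,
        }
    }
}

fn main() {
    let mut cache = LruCache::unbounded();
    let mut bytes = 0usize;
    let (key, value) = ("query".to_string(), "plan".to_string());
    bytes += key.len() + value.len();
    cache.put(key, value);
    evict_to_budget(&mut cache, &mut bytes, 1024);
    assert!(bytes <= 1024);
}
```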

cecton (Contributor) commented Oct 20, 2021

> We could play with pop_lru() and size_of and see how it goes.

Maybe use that one instead: https://doc.rust-lang.org/std/mem/fn.size_of_val.html

Though... I think we could just use it on the entire object without popping anything if you want.

Geal (Contributor) commented Oct 29, 2021

We could also take a look at https://crates.io/crates/stretto

o0Ignition0o transferred this issue from another repository Nov 4, 2021
garypen (Contributor) commented Nov 19, 2021

The complication here is that functions such as size_of_val() only report the stack size of an item, not its heap usage. That makes tracking memory much more complicated and intrusive since, for example, every String will report 24 bytes no matter how much heap it actually consumes.
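
To make the String point concrete, a quick check (the 24 bytes assumes a 64-bit target):

```rust
use std::mem::size_of_val;

fn main() {
    let short = String::from("hi");
    let long = "x".repeat(1_000_000);

    // Only the stack portion of the String is measured (pointer,
    // length, capacity), so heap contents don't show up at all.
    assert_eq!(size_of_val(&short), 24);
    assert_eq!(size_of_val(&long), 24);
}
```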

Furthermore, memory management from within an application is fairly muddy even in low-level languages, because the low-level allocators (jemalloc, libc, tcmalloc, etc.) all follow their own policies for requesting memory from the underlying OS. The allocator might map large pages, or multiple small pages, to satisfy requests, and might not return memory pages to the OS until the program terminates.

We could go down the route @abernix describes of serializing data and using that to estimate how much memory is being consumed. That would impose a performance burden on cache mutation operations. I could do some benchmarking to explore that, but my instincts are to skip it and only consider it if we have a concrete memory consumption concern due to massive variation between element sizes.

We could use an abstract cost mechanism such as stretto exposes, but I'm nervous about using a crate which is so new. Also: estimating the cost would still be intrusive.

All of this means I think it's much more realistic (and far less intrusive) to just manage the number of items held in the cache.

Summary: don't try to track actual memory usage. Do use an established crate (I'd favour the lru crate), and continue to use SipHash for now, since robustness trumps speed. Provide a default size for caches and consider exposing a configuration override.

@garypen
Copy link
Contributor

garypen commented Dec 14, 2021

We introduced an LRU caching mechanism for the router in #174 and refined it in #235. However, that approach left some questions unresolved, and these were captured in an umbrella issue: #244.

I'm going to close this issue and mark it resolved by #174. The remaining question here, about which hashing algorithm to use, is still open, so I'll add it to #244.
