
Improve the new CachingMap struct by addressing these concerns #244

Closed
garypen opened this issue Dec 6, 2021 · 5 comments
@garypen
Contributor

garypen commented Dec 6, 2021

Describe the solution you'd like
The PR to unify caching surfaced a number of discussions which didn't seem to be converging on a resolution. We are going to make progress by extracting those discussions into this issue and merging the code as is. Consider these items for future changes in this area.
Describe alternatives you've considered

  1. Traits or Generics for the CachingMap struct

  2. Propagating Debug information

  3. Reducing Cache Contention on the Cache Lock

  4. Release the caching functionality as a separate crate.

Additional context
I've extracted the comments from the PR and reproduced them below as faithfully as I can.

cecton 3 days ago
This dynamic dispatching is unnecessary. We should probably use a generic parameter instead.

Member
Author
@garypen garypen 3 days ago
I put a comment in the PR about how the resolver would be used. It made the code a lot cleaner and I really don't think the performance matters in this use case.

Member
@cecton cecton 3 days ago
Ah sorry I didn't read.

Member
@cecton cecton 3 days ago
> After spending a bit of time exploring different ideas, I've settled
> on the idea proposed by Geoffroy to provide a value resolver for use
> by the cache map.
>
> This approach is cleanest from the point of view of client interactions,
> although it does require clients to implement the CacheResolver trait.

You don't really explain why this approach was better. In my opinion it is much better if the user just provides a function-like callback.

> Ideally we could store a callback in the CachingMap struct, but
> lifetimes and async restrictions mean I can't make that work at the
> moment.

Is this comment related?

What about this?

```rust
use std::marker::PhantomData;

pub struct CachingStuff<K, V, F> {
    callback: F,
    phantom_data: PhantomData<(K, V)>,
}

impl<K, V, F> CachingStuff<K, V, F>
where
    F: Fn(K) -> V,
{
    pub fn new(callback: F) -> Self {
        Self {
            callback,
            phantom_data: PhantomData,
        }
    }

    pub fn get(&self, key: K) -> V {
        (self.callback)(key)
    }
}

#[derive(Debug)]
pub struct MyError;

fn main() {
    let cache = CachingStuff::new(|_key: String| Ok::<u32, MyError>(42));
    let value_res: Result<u32, MyError> = cache.get("boo".to_string());
    println!("Value: {:?}", value_res);
}
```

Member
Author
@garypen garypen 3 days ago
Sorry if the detail isn't apparent. The issue was caused by trying to maintain an async function callback. The various things we are calling are async and I wanted to provide an async interface to prevent blocking of threads when retrieving values (I guess I could have used spawn_blocking with a sync interface...).

I couldn't figure out a nice way to store such a "thing" (async fn callback effectively) in a struct.
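For illustration only, here is a minimal sketch (with made-up names such as `CachingThing` and `Resolver`, not the PR's code) of one way an async callback can be stored in a struct, by boxing the returned future. It compiles, but every call allocates and the type signature is noisy:

```rust
use std::future::Future;
use std::pin::Pin;

// Hypothetical alias: a callback that returns a boxed, type-erased future.
type Resolver<K, V> =
    Box<dyn Fn(K) -> Pin<Box<dyn Future<Output = V> + Send>> + Send + Sync>;

struct CachingThing<K, V> {
    resolver: Resolver<K, V>,
}

impl<K, V> CachingThing<K, V> {
    async fn get(&self, key: K) -> V {
        // Call the stored callback and await the boxed future it returns.
        (self.resolver)(key).await
    }
}

#[tokio::main]
async fn main() {
    let cache: CachingThing<String, usize> = CachingThing {
        resolver: Box::new(|key: String| -> Pin<Box<dyn Future<Output = usize> + Send>> {
            Box::pin(async move { key.len() })
        }),
    };
    println!("Value: {}", cache.get("boo".to_string()).await);
}
```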

Member
@Geal Geal 3 days ago
the cache could be generic over the CacheResolver without going all the way to using a function though. That would simplify the code a bit. The performance difference is not meaningful

```rust
pub struct CachingMap<K, V, R> {
    #[derivative(Debug = "ignore")]
    cached: Mutex<LruCache<K, CacheResult>>,
    #[allow(clippy::type_complexity)]
    #[derivative(Debug = "ignore")]
    wait_map: Mutex<HashMap<K, Sender<(K, CacheResult)>>>,
    cache_limit: usize,
    #[derivative(Debug = "ignore")]
    resolver: R,
}
```
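A minimal sketch of what "generic over the CacheResolver" could look like, assuming an async-trait-based resolver trait; the bounds are illustrative and the LRU cache and wait map fields are elided:

```rust
use async_trait::async_trait;
use std::marker::PhantomData;

// Sketch of the resolver trait the cache would be generic over.
#[async_trait]
pub trait CacheResolver<K, V> {
    async fn retrieve(&self, key: K) -> V;
}

// Sketch of the cache itself; the LruCache and wait map fields are elided.
pub struct CachingMap<K, V, R> {
    resolver: R,
    _marker: PhantomData<(K, V)>,
}

impl<K, V, R> CachingMap<K, V, R>
where
    K: Send + 'static,
    V: Send + 'static,
    R: CacheResolver<K, V> + Send + Sync,
{
    pub fn new(resolver: R) -> Self {
        Self { resolver, _marker: PhantomData }
    }

    // On a cache miss, the cache delegates to the resolver; no dynamic dispatch involved.
    pub async fn get(&self, key: K) -> V {
        self.resolver.retrieve(key).await
    }
}
```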

Member
@cecton cecton 3 days ago
```rust
use std::future::Future;
use std::marker::PhantomData;

pub struct CachingStuff<K, V, F, FUT> {
    callback: F,
    phantom_data: PhantomData<(K, V, FUT)>,
}

impl<K, V, F, FUT> CachingStuff<K, V, F, FUT>
where
    F: Fn(K) -> FUT,
    FUT: Future<Output = V>,
{
    pub fn new(callback: F) -> Self {
        Self {
            callback,
            phantom_data: PhantomData,
        }
    }

    pub async fn get(&self, key: K) -> V {
        (self.callback)(key).await
    }
}

#[derive(Debug)]
pub struct MyError;

#[tokio::main]
async fn main() {
    let cache = CachingStuff::new(|_key: String| async { Ok::<u32, MyError>(42) });
    let value_res: Result<u32, MyError> = cache.get("boo".to_string()).await;
    println!("Value: {:?}", value_res);

    let cache = CachingStuff::new(|_key: String| async {
        tokio::task::spawn_blocking(move || Ok::<u32, MyError>(42))
            .await
            .unwrap()
    });
    let value_res: Result<u32, MyError> = cache.get("boo".to_string()).await;
    println!("Value: {:?}", value_res);
}
```

Member
@Geal Geal 3 days ago
I'd rather we have the trait and a generic member, implement the trait over Fn, and let the calling side decide how they will use it. This will change nothing about the functionality, will be more flexible, and you'll still have the function if you want it.
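For reference, a resolver trait can indeed be blanket-implemented over plain async functions, so the calling side can still just hand over a closure. A hedged sketch follows; the trait is repeated from the sketch above so this compiles on its own, and the bounds are illustrative rather than taken from the PR:

```rust
use async_trait::async_trait;
use std::future::Future;

#[async_trait]
pub trait CacheResolver<K, V> {
    async fn retrieve(&self, key: K) -> V;
}

// Blanket impl: any async closure or function from K to V can act as a resolver.
#[async_trait]
impl<K, V, F, Fut> CacheResolver<K, V> for F
where
    K: Send + 'static,
    V: Send + 'static,
    F: Fn(K) -> Fut + Send + Sync,
    Fut: Future<Output = V> + Send + 'static,
{
    async fn retrieve(&self, key: K) -> V {
        (self)(key).await
    }
}

#[tokio::main]
async fn main() {
    // The calling side still only provides a function.
    let resolver = |key: String| async move { key.len() };
    println!("Value: {}", resolver.retrieve("boo".to_string()).await);
}
```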

Member
@cecton cecton 3 days ago
It makes the code more complicated for no good reason.

This cache can only call one function per caching thing anyway. If we have new use cases and it feels relevant to add a trait, then we can add a trait. Until then, please keep the code stupid simple.

Geal 3 days ago
the cache could be generic over the CacheResolver without going all the way to using a function though. That would simplify the code a bit. The performance difference is not meaningful

```rust
pub struct CachingMap<K, V, R> {
    #[derivative(Debug = "ignore")]
    cached: Mutex<LruCache<K, CacheResult>>,
    #[allow(clippy::type_complexity)]
    #[derivative(Debug = "ignore")]
    wait_map: Mutex<HashMap<K, Sender<(K, CacheResult)>>>,
    cache_limit: usize,
    #[derivative(Debug = "ignore")]
    resolver: R,
}
```

Geal 3 days ago
Do we really need Derivative here? Does the cache need to implement Debug? The only field that will be displayed is the cache limit.

Member
@cecton cecton 3 days ago
No, we don't need it, but it's nice to have and it only costs 5 lines. I'm fine with it, and implementing Debug as much as possible makes the API nicer to use.

So I'm not against removing it. Slight preference for keeping it.

Member
Author
@garypen garypen 3 days ago
It's required because:

apollo-router-core/src/query_planner/caching_query_planner.rs

```rust
#[derive(Debug)]
pub struct CachingQueryPlanner<T: QueryPlanner> {
    cm: CachingMap<QueryKey, Arc>,
    ...
```

Member
@cecton cecton 3 days ago
That's not required. We can use derivative there if we want.
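For context, this is roughly the alternative being described: the consumer derives Debug via the derivative crate and ignores the field itself, so the cache type never needs a Debug impl. A self-contained sketch with stand-in names (`Wrapper` and `Cache` are illustrative, not the router's types):

```rust
use derivative::Derivative;

// Stand-in for a field whose type does not (or should not) implement Debug.
pub struct Cache;

#[derive(Derivative)]
#[derivative(Debug)]
pub struct Wrapper {
    limit: usize,
    // The consumer excludes the non-Debug field, so the cache itself stays Debug-free.
    #[derivative(Debug = "ignore")]
    cache: Cache,
}

fn main() {
    let w = Wrapper { limit: 512, cache: Cache };
    // Prints something like: Wrapper { limit: 512 } (the ignored field is omitted).
    println!("{:?}", w);
}
```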

Member
@Geal Geal 3 days ago
why would having the cache implement Debug make the API nice to use?

Member
Author
@garypen garypen 3 days ago
I don't have strong feelings about this but I think it's better to control the output of Debug here rather than in every consumer of this struct.

Member
@cecton cecton 3 days ago
> why would having the cache implement Debug make the API nice to use?

Because people don't need to use derivative when they want to wrap the object. They can just derive(Debug) and it works immediately.

Member
@cecton cecton 3 days ago
> I think it's better to control the output of Debug here rather than in every consumer of this struct.

Fewer lines using derivative (also fewer opportunities to mess up the implementation). Though I don't mind if you do want to implement Debug yourself.

Member
@Geal Geal 3 days ago
I do not think there are cases where we would need to print the cache, especially if it's only to show the cache limit. it should not have to implement Debug at all

Member
@cecton cecton 3 days ago
Ok let's remove the debug impl then because I don't have strong opinion on this.

> So I'm not against removing it. Slight preference for keeping it.

Member
Author
@garypen garypen 3 days ago
I think we've missed something here though. We have to have Debug on CachingQueryPlanner right now, because it is collected in a caching span (#[instrument]). So it might be the right thing not to have a Debug impl on the cache, but to avoid it we would have to exclude the field in every Debug consumer; right now, this is the right answer (I think). If we remove the Debug requirements upstream, we can remove this one at the same time.

Maybe putting the above comment on the source would be helpful?

Member
@cecton cecton 3 days ago
Oh!! That's because it is on the trait QueryPlanner. I don't mind removing this trait bound. Then on ApolloRouter you can use derivative to exclude the field from the debug output.

Member
@Geal Geal 3 days ago
> We have to have Debug on CachingQueryPlanner right now, because it is collected in a caching span. (#[instrument]).

that sounds like something we do not want to have (cf the other perf issues we had with #[instrument])

Member
Author
@garypen garypen 3 days ago
I think the use of "skip_all" as recommended in the ADR on telemetry will sort out the telemetry perf problem when we implement it.

I'm trying to minimise the "blast radius" of this PR, so I'd like to avoid making changes outside of those the PR requires. I want to avoid modifying QueryPlanner/ApolloRouter/... and contain the changes to the minimum set. Keeping this change for now, fixing it later, and documenting in a comment that it can be removed later seems the cleanest approach.
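For reference, this is the kind of change "skip_all" refers to: when all arguments are skipped, #[instrument] no longer records them, so they no longer need to implement Debug. A minimal sketch using the tracing crate, with made-up names:

```rust
use tracing::instrument;

// Stand-in for CachingQueryPlanner without a Debug impl.
pub struct Planner;

// By default #[instrument] records each argument using its Debug impl, which is
// what currently forces Debug onto the planner; skip_all records none of them.
#[instrument(skip_all)]
pub async fn plan(_planner: &Planner, _query: &str) {
    // ... do the work ...
}
```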

Geal 3 days ago
could this be done in a different scope? Like this:

```rust
{
    let mut locked_cache = self.cached.lock().await;
    locked_cache.put(key.clone(), value.clone());
    // Update our wait list
    let mut locked_wait_map = self.wait_map.lock().await;
    locked_wait_map.remove(&key);
}
```
Because with the current code, the cache and map locks are still held when we're await'ing on the blocking task

Member
Author
@garypen garypen 3 days ago
I'm concerned that we would introduce a race if we didn't keep both the locks until the end of the notification. There may be opportunities for optimisation here, but I don't want to take the risk since it would only be a micro-optimisation and require careful analysis to make sure there were no mistakes.

Member
@Geal Geal 3 days ago
I don't think there will be a race here:

  • the sender was removed from the map after the value was put in the cache, so any new queries for the same key will get it from the cache directly
  • all the receivers have already dropped the locks when they're waiting

Member
Author
@garypen garypen 3 days ago
I don't think so either, but I'm not sure. So, rather than take the risk of being wrong, I don't think the small improvement in scalability justifies the risk.

In particular, I'm concerned that a cache expiration event would cause a race. Very unlikely, but requires some careful thinking that I haven't done.

Member
@Geal Geal 3 days ago
Yes, the concerns are real. Actually, could we work on a proof that it works, even without the modification I want? Not necessarily something formal, just exploring the various cases and checking that they will be safe.

Member
@Geal Geal 3 days ago
Ok, so I tried to explore the various states. This looks like the kind of problem where we should throw a formal-methods tool like TLA+ at it if we want to look at cases with more than 2 queries.

Here's the list of operations:

Cl: lock the cache
Cu: unlock the cache
Wl: lock the wait map
Wu: unlock the wait map
Ch: cache hit
Cm: cache miss
Wh: wait map hit
Wm: Wait map miss
Ci: cache insert
Wi: wait map insert
Wr: wait map remove
S: subscribe to the broadcast
Rx: receive
Tx: broadcast to all the receivers
G: get from the resolver
There should be three flows:

  • cache hit: ClChCu, then done
  • cache miss, first query to ask the cache (set up the broadcast, do the work, send values): ClCmWlCu WmWiWu G ClCiWlWrTxWuCu
  • cache miss, another query is currently working: ClCmWlCu WhSWu Rx
Now I'll try variations with concurrent queries. First thing we can consider: the sequence ClCmWlCu is completely protected by the cache lock, so we cannot see multiple queries inside it at the same time. When a query reaches the Rx state (waiting for another query to do the work), we can consider it safe: it won't lock anything else.

1ClCmWlCu 2ClCm 2Wl: can't happen, 1 has Wl

1ClCmWlCu 2ClCm 1WmWiWuG 1Cl: can't happen, 2 has Cl
1ClCmWlCu 2ClCm 1WmWiWuG 2WlCu 2Wm: can't happen, 1 has Wi

1ClCmWlCu 2ClCm 1WmWiWuG 2WlCuWhSWu: from here, no issue, 2 is not waiting on any locks

1ClCmWlCu 2ClCm 1WmWiWuG 2WlCu 2WhS 1ClCi 1Wl: can't happen, 2 has Wl
1ClCmWlCu 2ClCm 1WmWiWuG 2WlCu 2WhSWuRx 1ClCiWlWrTxWuCu OK

1ClCmWlCu 1WmWiWuG 2ClCmWl 1Cl: can't happen, 2 has Cl
1ClCmWlCu 1WmWiWuG 2ClCmWlCu 1ClCi 1Wl: can't happen, 2 has Wl

1ClCmWlCu 1WmWiWuG 2ClCmWlCu 1ClCi 2WhSWuRx 1WlWrTxWuCu OK

Member
@Geal Geal 3 days ago
the change I propose would affect the last sequence for the first query: ClCiWlWrTxWuCu would become ClCiWlWrWuCuTx

In all the cases I tried, once 1 reaches Wr, 2 is already in a safe state

Member
@Geal Geal 3 days ago
another note: the steps 2ClCm and 1WmWiWuG are commutative.

and actually the 3-queries case resolves to fewer cases, since it's very constrained:
1ClCmWlCu 2ClCm 1WmWiWuG 2WlCu 3ClCm 1Cl: can't happen 3 has Cl
1ClCmWlCu 2ClCm 1WmWiWuG 2WlCu 3ClCm 3Wl: can't happen 2 has Wl
1ClCmWlCu 2ClCm 1WmWiWuG 2WlCu 3ClCm 2WhSWuRx 3WlCu 1ClCi 1Wl: can't happen 3 has Wl
1ClCmWlCu 2ClCm 1WmWiWuG 2WlCu 3ClCm 2WhSWuRx 3WlCu 1ClCi 3WhSWuRx 1WlWrTxWuCu OK

Member
@o0Ignition0o o0Ignition0o 3 days ago
I'm having a hard time following the discussion; is there a chance we can land the "conservative approach" and chase the optimisation as a follow-up?

The code as is looks like a great improvement already IMO

Member
@Geal Geal 3 days ago
It is not just an optimization, but a way to avoid high contention on the cache lock: the broadcast channel works by cloning the value before sending it to each receiver, so if the value is large and there are a lot of receivers, the lock would be held while executing a series of clones, which is potentially expensive. And so no other queries would be able to use the cache during that time.
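A minimal, hedged illustration of the reordering being discussed (stand-in types and names, not the router's code): release both locks first, then broadcast, so whatever cloning the broadcast entails happens outside the critical section.

```rust
use std::collections::HashMap;
use tokio::sync::{broadcast, Mutex};

async fn insert_and_notify(
    cache: &Mutex<HashMap<String, Vec<u8>>>,
    wait_map: &Mutex<HashMap<String, broadcast::Sender<Vec<u8>>>>,
    key: String,
    value: Vec<u8>,
) {
    let sender = {
        let mut cache = cache.lock().await;
        cache.insert(key.clone(), value.clone());
        // Update the wait list while still holding the cache lock.
        let mut wait_map = wait_map.lock().await;
        wait_map.remove(&key)
    }; // both locks are dropped here

    if let Some(tx) = sender {
        // The broadcast, and whatever cloning it entails, now happens with no lock held.
        let _ = tx.send(value);
    }
}
```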

If the caching implementation is generic enough, it should be moved to its own crate. This would make it easier to replace with a potential future off-the-shelf implementation, or to be used in other projects as a useful part of the Rust ecosystem.

I don't think this externalisation should be done until the relevant discussions in points 1 to 3 are resolved.

@cecton
Contributor

cecton commented Dec 6, 2021

Point 1

I'm in favor of embracing the KISS philosophy. If you don't need an additional trait to achieve the same goal, don't use an additional trait.

As I have shown in this discussion, no trait is needed to cover all our use cases.

@cecton
Contributor

cecton commented Dec 6, 2021

Point 4

I recommend that if the "caching thingy" is generic enough, it should be moved to a separate project. This is because we want to share with the community the bits we can. They shouldn't have to pull in Apollo Router to get our cool caching thingy; it should be accessible directly.

  • Don't even prefix the crate name with apollo-: we don't want to give the impression this is Apollo-specific; we want to stay generic
  • Make a separate repository: so it looks more autonomous
  • Use the traditional "MIT OR Apache-2.0" license: this is what the Rust community uses. There is no reason for us to go further than that since this code will be simple and not part of our business logic.

@cecton
Contributor

cecton commented Dec 6, 2021

(I have no opinion on point 2 and 3)

@garypen
Contributor Author

garypen commented Dec 14, 2021

One more thing to consider in our caching thinking is the original question which motivated (to some degree at least) #42: namely, should we be using the default hasher or something faster?
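For illustration, swapping the hasher is a type-level change. The sketch below assumes a third-party hasher crate (fxhash is just one example, not a recommendation); the map alias and key names are made up:

```rust
use std::collections::HashMap;

use fxhash::FxBuildHasher; // example only: any BuildHasher works here

// HashMap keeps its API; only the hashing strategy changes.
type FastMap<K, V> = HashMap<K, V, FxBuildHasher>;

fn main() {
    let mut m: FastMap<String, u32> = FastMap::default();
    m.insert("query".to_string(), 42);
    println!("{:?}", m.get("query"));
}
```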

@garypen
Contributor Author

garypen commented Dec 14, 2021

This is a very big issue that covers a number of different facets of our current caching implementation. I'm going to split it into five separate issues to make it easier to consider each of them independently. See
#270
#271
#272
#273
#274
for more details.

@garypen garypen closed this as completed Dec 14, 2021
@abernix abernix added size/small and removed triage labels Dec 14, 2021