
Warm the query planner cache with hot queries #174

Merged — 18 commits, Nov 25, 2021
Conversation

@garypen (Contributor) commented Nov 22, 2021

If we use an LRU cache (crate: lru) for our query plans, then, whenever
we update queries or configuration, we can pick the top current query
keys and use those to pre-populate a new cache.

Most of this change is just expressing that simple idea.

There are still questions to be resolved:

  • How large should the cache be and should it be configurable?
  • How many hot keys should we preserve on re-configuration?
  • How best to expose this functionality without affecting good
    code structure too much?

Answers:

  • Cache defaults to 100 items and is configurable
  • Preserve 20% of hot keys on re-configuration/schema update
  • Currently passed as a parameter; will change to adopt configuration updates

This started as a draft PR for early review; it is now in final review.
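The warming idea above can be sketched in miniature. This is a std-only stand-in, not the PR's code: `QueryKey`, `QueryPlan`, `PlanCache`, and `warm` are illustrative names, and the PR itself uses the `lru` crate behind an async mutex.

```rust
use std::collections::VecDeque;

// Illustrative stand-ins for the PR's types.
type QueryKey = String;
type QueryPlan = String;

// Most-recently-used entries sit at the front, mirroring the iteration
// order of the `lru` crate's cache.
struct PlanCache {
    plan_cache_limit: usize,
    cached: VecDeque<(QueryKey, QueryPlan)>,
}

impl PlanCache {
    // Mirrors the PR's get_hot_keys(): keep the hottest 20% of entries.
    fn get_hot_keys(&self) -> Vec<QueryKey> {
        self.cached
            .iter()
            .take(self.plan_cache_limit / 5)
            .map(|(key, _value)| key.clone())
            .collect()
    }
}

// On reconfiguration or schema update, build a fresh cache and
// pre-populate it by re-planning the hot keys.
fn warm(
    new_cache: &mut PlanCache,
    hot_keys: Vec<QueryKey>,
    plan: impl Fn(&QueryKey) -> QueryPlan,
) {
    for key in hot_keys {
        let p = plan(&key);
        new_cache.cached.push_front((key, p));
    }
}
```

With the defaults described above (cache size 100), `plan_cache_limit / 5` preserves the 20 hottest keys across a reload.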

@garypen garypen requested a review from Geal November 22, 2021 16:02
Make sure to wait for the future which we generate when getting the key
of our query plan.
This version avoids the twin evils of:
 - locking the cache for the entire duration of the delegated get
 - OR doing multiple delegated gets for a single query

It's more complex, so maybe not desirable on those grounds, and it does
introduce new failure paths and a new crate (bus). If we like the
approach, I could probably get rid of the bus crate and do this using a
condvar or a different spmc implementation.
Slightly better comments help keep things clear.
...from CachingQueryPlanner implementation
I wasn't happy with the bus crate, so this change removes it and uses
the tokio broadcast channel instead. Maybe not as fast, but it's well
supported, and it's always good to use fewer crates.

All the mock tests are passing now, but I'm not really happy with the
changes I made. They might need more detailed examination.
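For illustration, here is a minimal synchronous sketch of the coalescing idea discussed above, using std's Condvar (one of the alternatives the author mentions). The names `CoalescingCache` and `get_or_compute` are invented for this sketch; the PR's actual code is async and uses a tokio broadcast channel.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Condvar, Mutex};

// The first caller for a key marks it in-flight and computes the plan
// with the lock released; later callers for the same key wait on the
// condvar rather than holding the cache lock through the computation
// or recomputing the plan themselves.
struct CoalescingCache {
    inner: Mutex<Inner>,
    ready: Condvar,
}

struct Inner {
    cached: HashMap<String, String>,
    in_flight: HashSet<String>,
}

impl CoalescingCache {
    fn get_or_compute(&self, key: &str, compute: impl Fn() -> String) -> String {
        let mut guard = self.inner.lock().unwrap();
        loop {
            if let Some(v) = guard.cached.get(key) {
                return v.clone();
            }
            if guard.in_flight.contains(key) {
                // Someone else is computing this plan; wait for it.
                guard = self.ready.wait(guard).unwrap();
            } else {
                guard.in_flight.insert(key.to_string());
                break;
            }
        }
        drop(guard); // release the lock while the expensive work runs
        let value = compute();
        let mut guard = self.inner.lock().unwrap();
        guard.in_flight.remove(key);
        guard.cached.insert(key.to_string(), value.clone());
        self.ready.notify_all();
        value
    }
}
```

This keeps the "only one delegated get per query" property while never holding the cache lock across the planning call.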
?

I don't know, but I'll remove it again.
@garypen garypen marked this pull request as ready for review November 24, 2021 17:24
So that CircleCI complains less.
@cecton (Contributor) left a comment

Partially reviewed. I still need to review & understand the entire logic.

(Inline comments on crates/apollo-router/src/lib.rs, router_factory.rs, and main.rs, since resolved.)
Most of them, at least. I'm leaving the broader configuration and cache
handling questions raised by Cecile to later work.
Add space to comment at line 172
@garypen garypen requested review from Geal and cecton November 25, 2021 09:38
@cecton (Contributor) left a comment

Since the discussion about the parameter has been moved, this is no longer blocking this PR 😁

I still haven't finished reviewing the core logic, but I have more remarks about the traits.

(Inline comments on .gitignore and crates/apollo-router/src/router_factory.rs, since resolved.)
Comment on lines +100 to 109

async fn get_hot_keys(&self) -> Vec<QueryKey> {
    let locked_cache = self.cached.lock().await;
    locked_cache
        .iter()
        .take(self.plan_cache_limit / 5)
        .map(|(key, _value)| key.clone())
        .collect()
}
}
A reviewer (Contributor) commented:

This needs to move outside the trait implementation.

The trait QueryPlanner shouldn't be aware of any caching functionality.

Suggested change:

-    async fn get_hot_keys(&self) -> Vec<QueryKey> {
-        let locked_cache = self.cached.lock().await;
-        locked_cache
-            .iter()
-            .take(self.plan_cache_limit / 5)
-            .map(|(key, _value)| key.clone())
-            .collect()
-    }
 }
+
+impl<T: QueryPlanner> CachingQueryPlanner<T> {
+    async fn get_hot_keys(&self) -> Vec<QueryKey> {
+        let locked_cache = self.cached.lock().await;
+        locked_cache
+            .iter()
+            .take(self.plan_cache_limit / 5)
+            .map(|(key, _value)| key.clone())
+            .collect()
+    }
+}

@garypen (Contributor, Author) replied:

I agree that it may be undesirable, but I think this would break RouterFactory::recreate() because:
let hot_keys = graph.get_query_planner().get_hot_keys().await;
I'm trying to decide if there's a reason why a non-caching QueryPlanner would return hot keys... I mean, it could, but the current implementation doesn't do any tracking and just returns an empty vector.
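That observation could look roughly like this. An illustrative sketch only, sync for brevity (the real method is async) and with stand-in names: a default trait method lets a non-caching planner satisfy the trait by returning no keys, which is why RouterFactory::recreate() keeps compiling.

```rust
// Illustrative stand-in for the PR's query key type.
type QueryKey = String;

trait QueryPlanner {
    // Default: non-caching planners do no tracking, so they
    // simply report no hot keys.
    fn get_hot_keys(&self) -> Vec<QueryKey> {
        Vec::new()
    }
}

// A planner with no cache gets the empty-vector behaviour for free.
struct SimplePlanner;
impl QueryPlanner for SimplePlanner {}
```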

A reviewer (Contributor) replied:

Yes I just saw... 🤔

A reviewer (Contributor) replied:

I found the API problem behind this issue, but it's best to solve it in another PR. So let's keep this for now.

@o0Ignition0o (Contributor) left a comment

The only question I had was about the get_hot_keys function, which we cleared up together, otherwise this looks good to me overall.

The conversations seem to be more about nits, which can be iterated on IMO

And put it in my global .gitignore
Make sure to only drop the cache lock after acquiring the wait lock.
Also address more review comments.
@cecton (Contributor) left a comment

Some small nits, but good to go 🤗 Thanks a lot for your patience!!

More review comments
@garypen (Contributor, Author) commented Nov 25, 2021

Some small nits, but good to go 🤗 Thanks a lot for your patience!!

Thanks for your diligent review. I think the quality is much improved.

@garypen garypen dismissed Geal’s stale review November 25, 2021 13:47

I've added the requested changes.

@garypen garypen merged commit 5a3e34c into main Nov 25, 2021
@garypen garypen deleted the warm-query-planner branch November 25, 2021 13:50
@garypen garypen linked an issue Nov 25, 2021 that may be closed by this pull request
@garypen garypen mentioned this pull request Dec 14, 2021
Successfully merging this pull request may close these issues:

  • warm the query planner cache with well known queries