perf(runloop) prefetch services when rebuilding router #4101

p0pr0ck5 · 2018-12-16T06:21:20Z

Summary

Trade space for time to improve performance when rebuilding the router by loading the service associated with a given route from a local table, rather than requiring a separate query for each route in the cluster. This reduces the number of queries needed to rebuild the router with N routes from N to roughly N / 1000 * 2.

This is a proof-of-concept commit that needs further discussion (for example, should we cache services in memory over a longer lifetime, rather than using a function-local table for potentially several thousand objects).

Full changelog

prefetch services when rebuilding router

Trade space for time to improve performance when rebuilding the router by loading the service associated with a given route from a local table, rather than requiring a separate query for each route in the cluster. This reduces the number of queries needed to rebuild the router with N routes from N to roughly N / 1000 * 2. This is a proof-of-concept commit that needs further discussion (for example, should we cache services in memory over a longer lifetime, rather than using a function-local table for potentially several thousand objects).

p0pr0ck5 · 2018-12-17T19:21:45Z

Given the comments in #4055 (comment), I'm changing the label to this to 'please review'. I think this is a valid change that should be pushed as part of 1.0 as it involves no additional necessary configuration and no breaking client change.

bungle · 2018-12-17T20:26:46Z

Why not use caching strategy from #4102 instead?

bungle · 2018-12-17T20:30:20Z

TBH, I think deciding on these these will be post-1.0.0. The issue of 500/401 is already (mostly) solved.

p0pr0ck5 · 2018-12-17T20:30:46Z

The goal here was to implement an improvement with as little behavioral change as possible, to be tested in accordance with the traffic patterns noted in #4055. Caching Services potentially imbues non-trivial memory pressure on a given node, depending on the size of the cluster config; I was hoping to have this change tested with as little divergence from current/RC4 code as possible. But definitely, caching object has value if it's not too big a burden to bear in the shm. I've no strong feelings about caching Service objects in-memory vs fetching them, but tbvh I don't have the bandwidth the implement and test a larger changeset at this point. Hopefully someone can review this and help drive it forward into 1.0. I'll be happy to help with a pair of hands where and when i can :)

p0pr0ck5 · 2018-12-17T20:34:52Z

The issue of 500/401 is already solved.

I'm not sure that it is, per #4055. The latest test results from @jeremyjpj0916 for code that is released (available as of #4084; see #4055 (comment)) indicated that the constant-500 errors were solved (due to the incorrect mutex handling), but potentially-sporadic 500 responses are still very possible (see #4055 (comment)). This PR attempts to solve that problem and that problem alone - future optimizations (timer-based rebuild, configurable consistency, etc) are out of scope here.

bungle · 2018-12-17T20:36:27Z

We already presented a maybe theoretical problem with paging on routes. Your PR introduces another (possibly theoretical) issue: think about new route and service being added while building a router using paging. You get new route in results, but the related service is not found in your gathered services, so your code will return an error. The caching does not have that problem.

p0pr0ck5 · 2018-12-17T20:40:07Z

Iterative results. This approach is a pragmatic- send less expensive network traffic where we don't need. In odd cases where many Routes are added near-simultaneously, the router likely breaks as-is. This PR doesn't try to change this; it just makes talking to the DB (and in particular, strongly-consistent C* installations, the behavior of which is easily forgotten during development) less expensive. Nothing more, nothing less.

bungle · 2018-12-17T20:41:58Z

This PR changes that as it makes hitting here more possible:
https://github.com/Kong/kong/pull/4101/files#diff-1f0f137350308286de76ec59740c8edaR150

At least you could add a fallback to normal select in that case.

bungle · 2018-12-17T21:00:12Z

As I am obviously biased as I would merge #4102 or #4107 (similar to what we do with plugins already) instead of this, so I ask others to do review and possible merge (I do not have anything against it, though I do not see why it cannot wait).

p0pr0ck5 · 2019-01-11T00:58:09Z

This will get handled elsewhere

p0pr0ck5 mentioned this pull request Dec 16, 2018

Kong throws 404 not found on route paths that exist at times. #4055

Closed

bungle mentioned this pull request Dec 16, 2018

feat(core) router rebuild timer (revisited) #4102

Closed

bungle added the pr/discussion This PR is being debated. Probably just a few details. label Dec 16, 2018

p0pr0ck5 added pr/please review and removed pr/discussion This PR is being debated. Probably just a few details. labels Dec 17, 2018

bungle mentioned this pull request Dec 17, 2018

perf(runloop) cache services on router building #4107

Closed

p0pr0ck5 closed this Jan 11, 2019

p0pr0ck5 deleted the perf/prefetch-router-services branch January 11, 2019 00:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(runloop) prefetch services when rebuilding router #4101

perf(runloop) prefetch services when rebuilding router #4101

p0pr0ck5 commented Dec 16, 2018

p0pr0ck5 commented Dec 17, 2018

bungle commented Dec 17, 2018

bungle commented Dec 17, 2018 •

edited

Loading

p0pr0ck5 commented Dec 17, 2018

p0pr0ck5 commented Dec 17, 2018

bungle commented Dec 17, 2018 •

edited

Loading

p0pr0ck5 commented Dec 17, 2018

bungle commented Dec 17, 2018

bungle commented Dec 17, 2018 •

edited

Loading

p0pr0ck5 commented Jan 11, 2019

perf(runloop) prefetch services when rebuilding router #4101

perf(runloop) prefetch services when rebuilding router #4101

Conversation

p0pr0ck5 commented Dec 16, 2018

Summary

Full changelog

p0pr0ck5 commented Dec 17, 2018

bungle commented Dec 17, 2018

bungle commented Dec 17, 2018 • edited Loading

p0pr0ck5 commented Dec 17, 2018

p0pr0ck5 commented Dec 17, 2018

bungle commented Dec 17, 2018 • edited Loading

p0pr0ck5 commented Dec 17, 2018

bungle commented Dec 17, 2018

bungle commented Dec 17, 2018 • edited Loading

p0pr0ck5 commented Jan 11, 2019

bungle commented Dec 17, 2018 •

edited

Loading

bungle commented Dec 17, 2018 •

edited

Loading

bungle commented Dec 17, 2018 •

edited

Loading