Add a simple cache for objects stored in etcd #7288
Conversation
func (h *EtcdHelper) GetFromCache(index uint64) (obj reflect.Value, ok bool) {
	h.mutex.RLock()
	defer h.mutex.RUnlock()
	obj, ok = h.cache[index]
Would need to do a copy here to prevent people from mutating the cache - add that so your benchmarks are accurate.
Yeah, this is unfortunately true.
But you can do the copy after unlocking.
@@ -38,6 +39,8 @@ type EtcdHelper struct {
	Codec runtime.Codec
	// optional, no atomic operations can be performed without this interface
	Versioner EtcdVersioner
	cache map[uint64]reflect.Value
Please add a comment explaining this cache... including why you chose reflect.Value to store instead of runtime.Object, that it depends on etcd's indexes being globally unique per object (so if we make multiple etcd clusters, we'd need to include object key as part of this key), and that it's limited to N entries.
Done.
I like this a lot more than I was expecting to. :) But I fear adding a deep copy call in the cache is going to give back most of the performance gains. :(

My money is on it'll be 10% better. I'd really like to wait on the local caching until we have the unified watcher from etcd. Then we will be in a much better place to impose a "fake etcdhelper". Just my 2c.
Force-pushed the branch from 179bba0 to 0bef09e.
const maxEtcdCacheEntries int = 100000

func (h *EtcdHelper) getFromCache(index uint64) (interface{}, bool) {
	var obj interface{}
What's the measured performance of this after you make these changes?
Unfortunately we don't have a good metric to measure this, so I will use the same ones I used initially:
- e2e performance tests still run ~25% faster (from ~20 min down to ~15 min)
- CPU usage is slightly higher, and profiling shows that DeepCopy uses up to 30% (usually around 10%). Still, the CPU is not saturated and reaches 100% only rarely (though it comes close).
Also, latency metrics didn't degrade -- the 99th percentile of API calls, except list pods, runs below 1s (checked manually from apiserver metrics).
On a related note, we need a better metric than total e2e time.
 	}
 	return nil
 }
+// etcdCache defines interface used for caching objects stored in etcd. Objects are keyed by
+// their Node.ModifiedIndex, which is unique across all types.
+// All implementations must be thread-safe.
+type etcdCache interface {
+	getFromCache(index uint64) (interface{}, bool)
+	addToCache(index uint64, obj interface{})
+}
+
+const maxEtcdCacheEntries int = 100000
+
+func (h *EtcdHelper) getFromCache(index uint64) (interface{}, bool) {
+	var obj interface{}
> Unfortunately we don't have a good metric to measure this, so I will use the same ones as I've used initially:
> - e2e performance tests still run ~25% faster (from ~20 min down to ~15 min)
> - CPU usage is slightly higher and profiling shows that DeepCopy uses up to 30% (usually around 10%). Still, CPU is not saturated and uses 100% only rarely (but is very close).
How much more memory?
Fortunately, the work being done for conversion can be used to implement efficient DeepCopy, so I would expect that portion to go down.
> Also latency metrics didn't degrade -- the 99th percentile of API calls, except list pods, runs below 1s (checked manually based on apiserver metrics). On a related note, we need a better metric than total e2e time.
Agree.
> Fortunately, the work being done for conversion can be used to implement efficient DeepCopy, so I would expect that portion to go down.
Yeah, but that work should also make this caching mechanism mostly unnecessary; after it's finished, this will only save on unmarshalling costs.
@smarterclayton I checked memory and (surprisingly) memory footprint is much lower with cache. Maximum reserved memory during performance tests (100 nodes, 3000 pods) is:
- 2.9 GB without cache
- 1.8 GB with cache
It seems that parsing JSON uses a lot of memory. If it's done in multiple goroutines concurrently, it can affect memory usage pretty significantly.
@lavalamp I think that even with fast conversions we will still benefit from the cache due to:
- reduced memory footprint
- unmarshalling JSON can actually use a lot of CPU (I've seen profiles with >20%)
On Apr 28, 2015, at 7:21 AM, Filip Grzadkowski notifications@github.com wrote:
> @smarterclayton I checked memory and (surprisingly) memory footprint is much lower with cache. Maximum reserved memory during performance tests (100 nodes, 3000 pods) is 2.9 GB without cache and 1.8 GB with cache. It seems that parsing JSON uses a lot of memory.
> @lavalamp I think that even with fast conversions we will still benefit from the cache.
I'd like to see a quick investigation of ugorji after the conversion work is in place - if unmarshalling JSON is making a significant memory impact and ugorji is cheap to set up (since I believe it can fall back to the default json serializer implementation), it may be a very quick win for us.
I addressed all of the initial comments; tests pass, so this PR is ready for normal review.
@@ -205,7 +217,7 @@ func TestWatchEtcdError(t *testing.T) {
	fakeClient := NewFakeEtcdClient(t)
	fakeClient.expectNotFoundGetSet["/some/key"] = struct{}{}
	fakeClient.WatchImmediateError = fmt.Errorf("immediate error")
-	h := EtcdHelper{fakeClient, codec, versioner}
+	h := NewEtcdHelper(fakeClient, codec)
I'm a little worried about removing the versioner setting here-- it means that this test will test inconsistent sets of things if someone changes the "versioner" at the top of the file OR if someone changes the versioner that NewEtcdHelper uses. It's not too likely to change so I'm not too worried, just vaguely uneasy.
That's true, but on the other hand we want a test setup that is consistent with prod.
LGTM -- Travis appears to have found a data race that you should probably figure out before we merge.
	// have to be revisited if we decide to do things like multiple etcd clusters, or etcd will
	// support multi-object transactions that will result in many objects with the same index.
	// Number of entries stored in the cache is controlled by maxEtcdCacheEntries constant.
	cache map[uint64]interface{}
nit: recommend storing runtime.Object instead of interface{}.
But that would require additional casting, which doesn't improve readability.
It may not improve readability, but it tells readers a very important feature of the objects they will find in this cache. Types should be as narrowly scoped as possible.
Force-pushed the branch from 1434872 to 965c0ba.
Won't there be a consistency issue here if we have multiple apiservers? @rrati ^ FYI.
@timothysc If we have multiple apiservers, we do not need the same cache entries. We only require that etcd returns the same ModifiedIndex for the same objects (which will be the case).
Can you get numbers again with the second deep copy? Also, I really would prefer to use runtime.Object as the type of the cache...
DeepCopy in the write path does not change anything, because of the hit ratio (1 write for ~100 reads). Per request, changed to runtime.Object.
e2e tests pass.
LGTM, no more comments.
LGTM
Add a simple cache for objects stored in etcd
What's the status now? Looks like it got reverted.
@timothysc - yes, it was reverted, but the hopefully-fixed version has been sent out for review - see #7559
This is a prototype of a simple cache for objects stored in etcd. It uses ModifiedIndex as a key, assuming that each modification changes only one object.
It's pretty hard to benchmark the effect of this change, but I have some measurements and observations.
@wojtek-t @davidopp @lavalamp @timothysc @smarterclayton