KAFKA-5932: Avoid call to fetchPrevious in FlushListeners #3978
Conversation
ping @dguy @mjsax @guozhangwang
@@ -110,16 +111,20 @@ private void maybeForward(final ThreadCache.DirtyEntry entry,
final RecordContext current = context.recordContext();
context.setRecordContext(entry.recordContext());
try {
V previous = sendOldValues ? fetchPrevious(key, windowedKey.window().start()) : null;
Nit: `previous` -> `oldValue`
One nit.
Updated for comments.
PR seems reasonable to me. But don't know this part of the code very well.
One minor and one not-so-minor but easy-to-fix comment. It would also be good to verify that for time-window aggregations this does indeed provide some performance boost. I suspect it should, as we avoid the RocksDB seek.
@@ -110,16 +111,20 @@ private void maybeForward(final ThreadCache.DirtyEntry entry,
final RecordContext current = context.recordContext();
context.setRecordContext(entry.recordContext());
try {
V oldValue = sendOldValues ? fetchPrevious(key, windowedKey.window().start()) : null;
nit: final
@@ -170,7 +171,7 @@ private void putAndMaybeForward(final ThreadCache.DirtyEntry entry, final Intern
final Bytes rawKey = Bytes.wrap(serdes.rawKey(key.key()));
if (flushListener != null) {
final AGG newValue = serdes.valueFrom(entry.newValue());
final AGG oldValue = fetchPrevious(rawKey, key.window());
final AGG oldValue = sendOldValues ? fetchPrevious(rawKey, key.window()) : null;
Here we need to do:
final AGG oldValue = newValue == null || sendOldValues ? fetchPrevious(..) : null;
This is because SessionWindows
have a dynamic time range: the end time can change even when the start stays fixed. So we need to send deletes for the previous smaller window when a window is merged, e.g., a simple count:
a@0 -> SessionKey(key=a start=0, end=0), 1
a@5 -> SessionKey(key=a start=0, end=0), null (delete this as it is merged)
SessionKey(key=a start=0, end=5), 2 (this is the new merged session)
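A minimal sketch of the suggested guard (hypothetical names; not the actual `CachingSessionStore` code). The point is that a merge tombstones the old session (`newValue == null`), so the previous value must still be fetched to forward the delete, even when `sendOldValues` is false:

```java
// Hypothetical sketch of the suggested flush-listener condition for
// session windows; names and structure are illustrative only.
public class SessionFlushSketch {
    static String decide(String newValue, boolean sendOldValues, String stored) {
        // Skip the store seek only when there is a new value AND the
        // downstream does not want old values. A null newValue means the
        // session was merged away, and the delete must carry the old value.
        return (newValue == null || sendOldValues) ? stored : null;
    }

    public static void main(String[] args) {
        // Merged session: newValue is null, old value fetched regardless.
        System.out.println(decide(null, false, "1"));
        // Plain update without sendOldValues: the seek is skipped.
        System.out.println(decide("2", false, "1"));
    }
}
```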
@@ -170,7 +171,7 @@ private void putAndMaybeForward(final ThreadCache.DirtyEntry entry, final Intern
final Bytes rawKey = Bytes.wrap(serdes.rawKey(key.key()));
if (flushListener != null) {
final AGG newValue = serdes.valueFrom(entry.newValue());
final AGG oldValue = fetchPrevious(rawKey, key.window());
final AGG oldValue = sendOldValues ? fetchPrevious(rawKey, key.window()) : null;
is there a reason why fetchPrevious has to rely on iterating over the store instead of just calling get() for the right key?
There is no get() for a SessionStore. The key in the session store is a combination of the record key, start and end time. We only know the start time for the previous key, so we need to find the previous session with the correct start time.
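A toy illustration of that point (hypothetical classes, not the actual Streams code): since the store key is `(recordKey, start, end)` but the caller only knows the record key and the window start, the matching session must be found by scanning rather than by a point lookup:

```java
// Hypothetical illustration of why SessionStore offers no point get():
// the end time of a session is part of its key but unknown to the caller.
import java.util.ArrayList;
import java.util.List;

public class SessionKeySketch {
    static final class SessionKey {
        final String key; final long start; final long end;
        SessionKey(String key, long start, long end) {
            this.key = key; this.start = start; this.end = end;
        }
    }

    // Mimics fetchPrevious: iterate until a session with the matching
    // record key and start time is found; the end time is discovered
    // by the scan, not supplied by the caller.
    static SessionKey findByStart(List<SessionKey> store, String key, long start) {
        for (SessionKey k : store) {
            if (k.key.equals(key) && k.start == start) {
                return k;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<SessionKey> store = new ArrayList<>();
        store.add(new SessionKey("a", 0, 5)); // merged session [0, 5]
        SessionKey found = findByStart(store, "a", 0);
        System.out.println(found.end); // end time only known after the scan
    }
}
```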
what about for window stores, couldn't we use get() there?
No, it's a similar situation: there isn't a get() for a WindowStore. The key is a combination of the record key and a timestamp when the record is placed in the store, and records with the same key could be stored across multiple segments. @dguy correct me if I'm mistaken here.
I thought the timestamp would uniquely define the segment in which that key is stored.
@xvrl there is no get on WindowStore. We could add one and it would work in scenarios where we don't have duplicates, i.e., the key for a WindowStore is (recordKey, timestamp, sequenceNumber); if the store doesn't have duplicates the sequence number is always 0. If the store does have duplicates then we don't know what the sequence number is.
Without a KIP to add a get() to WindowStore, the only thing we could do is add a bit of a hack: check whether the innermost store is a RocksDBSegmentedBytesStore and, if so, call get(..) on that. If it isn't, then we'd still need to call fetch.
For the DSL this would work, as the only time we have duplicates in the WindowStore is for joins, and we disable caching for those so it skips this code path. However, for the PAPI, we would need to always disable caching if duplicates are set. Which we probably should do anyway, as it won't work as is.
Thanks for the explanation @dguy, very helpful to understand where caching and sequence numbers come into play. It might be worthwhile to put this in a JIRA somewhere. I do think it would be a useful optimization to have eventually, as fetches have some setup / teardown overhead.
updated this
looks like one of my comments got swallowed in the update #3978 (comment)
@xvrl responded above
retest this please
Thanks @bbejeck, LGTM. We can consider doing further WindowStore optimizations in another PR
merged to trunk
To benchmark the changes I modified the
Bottom line, skipping
@bbejeck thanks for the benchmark numbers. I'm assuming you ran those with default caching settings? If so that's a pretty big improvement, given that we probably aren't flushing the cache very often, but it's hard to interpret without knowing how often we flush. I'd be curious to see what the "raw" improvements would be if we set the cache size to zero.
@xvrl no problem. Yes, I ran those benchmarks using default cache settings. I can re-run the benchmarks with the cache size set to zero and update the results.
@xvrl here's an updated benchmark using the same code, but with caching set to zero
No description provided.