[9/N][Emit final] Emit final for session window aggregations #12204
Conversation
@@ -39,6 +39,13 @@
 */
public interface SessionStore<K, AGG> extends StateStore, ReadOnlySessionStore<K, AGG> {

// TODO: javadoc; both ends are inclusive
default KeyValueIterator<Windowed<K>, AGG> findSessions(final Instant earliestSessionEndTime,
This is related to 1) in the description, and the first open question: is this public API worth adding? Note I added it to `SessionStore`, not `ReadOnlySessionStore`, so as not to expose it via IQv1; also, I've only added this function with an `Instant` param type.
I think there is no way around it? In the end, we allow users to plug in a custom session store -- thus, if they use the new emit-final feature, they will need to implement this new method -- existing code with custom session stores should not break, because existing code neither implements nor calls this new method.
If we don't make it public API, we would prevent users from passing in custom session stores in combination with the new emit-final feature, which seems too restrictive?
Why did you pick `Instant` over `long` (wondering if `long` might be better, as it's more of an internal API)?
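For context, a common pattern here is a public `Instant`-based default method that converts to epoch millis and delegates to a `long`-based internal variant. A minimal sketch of that delegation pattern (the `SessionRangeQuery` interface and all names below are illustrative stand-ins, not the PR's actual code):

```java
import java.time.Instant;

// Hypothetical sketch: public Instant variant delegating to the internal
// long-based variant. Not a Kafka Streams interface.
interface SessionRangeQuery<T> {
    // internal, long-based variant (epoch milliseconds)
    T findSessions(long earliestSessionEndTimeMs, long latestSessionEndTimeMs);

    // public-facing Instant variant, converting to epoch millis before delegating
    default T findSessions(final Instant earliestSessionEndTime,
                           final Instant latestSessionEndTime) {
        if (earliestSessionEndTime == null || latestSessionEndTime == null) {
            throw new NullPointerException("session end-time bounds must not be null");
        }
        return findSessions(earliestSessionEndTime.toEpochMilli(),
                            latestSessionEndTime.toEpochMilli());
    }
}
```

With this shape, only the `long` variant needs a real implementation; the `Instant` overload comes for free via the default method.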
@@ -202,25 +205,43 @@ public void remove(final Windowed<Bytes> sessionKey) {

@Override
public byte[] fetchSession(final Bytes key,
                           final long earliestSessionEndTime,
                           final long latestSessionStartTime) {
                           final long sessionStartTime,
This is a minor fix on the param names: the old ones are simply wrong and misleading.
final long latestEndTime = ApiUtils.validateMillisecondInstant(latestSessionEndTime,
    prepareMillisCheckFailMsgPrefix(latestSessionEndTime, "latestSessionEndTime"));

final KeyValueIterator<Bytes, byte[]> bytesIterator = wrapped().fetchAll(earliestEndTime, latestEndTime);
This is the second open question: with the current prefixed (base, i.e. time-first) session key schema, this `fetchAll` would effectively search `[earliestEnd, INF]` because of this logic: https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/PrefixedSessionKeySchemas.java#L46
This is because we translate the range query without a key inside `AbstractRocksDBTimeOrderedSegmentedBytesStore` by using `lowerRange`/`upperRange` instead of `lowerRange`/`upperRangeFixedSize`: https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/AbstractRocksDBTimeOrderedSegmentedBytesStore.java#L241-L242
I cannot remember why we need to do this. @lihaosky @mjsax do you remember why?
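As background for why a time-first layout supports an end-time range scan at all: when the 8-byte big-endian end timestamp leads the key, unsigned lexicographic byte order agrees with numeric order on `endTime` (for non-negative timestamps), so a byte-range scan on the prefix is an end-time range scan. A standalone sketch of that property (illustrative, not the schema classes linked above):

```java
import java.nio.ByteBuffer;

// Illustrative time-first layout: big-endian endTime prefix, then the key bytes.
final class TimeFirstKey {
    static byte[] encode(final long endTime, final byte[] key) {
        final ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + key.length);
        buf.putLong(endTime);  // ByteBuffer is big-endian by default
        buf.put(key);
        return buf.array();
    }

    // unsigned lexicographic comparison, as a byte-wise store comparator would do
    static int compare(final byte[] a, final byte[] b) {
        final int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            final int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

Because the timestamp prefix dominates the comparison, a key with a "larger" payload but smaller `endTime` still sorts first, which is exactly what the time-ordered segmented store relies on.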
Not sure -- I always need to think very hard to understand (not even sure if I succeed) the fetch logic and how we compute the bounds...
But same question as above: why do we need this new method instead of calling `findSessions(null, null, A, B)`? I briefly dug into the code and it seems it would do the same thing?
The main reason is that for emit-final I need a range API that is based on `endTime` for both ends. That's also why, within its implementation, I have to use `fetchAll` instead of `fetch` here.
If I read the code correctly, what `fetchAll()` does is correct: from my understanding, `fetchAll()` is implemented to find "overlapping sessions" given a lower and upper bound -- the lower bound must be smaller than the session end and the upper bound must be larger than the session start to find an overlap. Because the upper bound compares to session start, and we use the "base" layout, we need to search the full "data/base part" of the store.
I guess the issue is that you actually cannot use `fetchAll()` at all for our purpose here? Passing in `lastEndTime` does not work (does it?) as it would be used to compare to session start times, but we want a comparison to session end time. -- Thus, I think the right solution is to actually also add the new `findSessions()` to the internal `SegmentedStore` and implement a proper iterator there?
Sounds good. I would just have a special handling on both lower/upper bound as well as the hasNext function for this specific purpose.
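The distinction debated above reduces to two different predicates; this standalone sketch (illustrative, not Streams code) shows why the overlap semantics of `fetchAll()`/`findSessions()` and the emit-final both-ends-on-`endTime` semantics can select different sessions:

```java
// Two session-selection predicates, written out explicitly.
final class SessionPredicates {
    // findSessions-style: keep sessions that OVERLAP the query range, i.e.
    // the session must end at or after the lower bound and start at or
    // before the upper bound.
    static boolean overlaps(final long sessionStart, final long sessionEnd,
                            final long earliestSessionEndTime,
                            final long latestSessionStartTime) {
        return sessionEnd >= earliestSessionEndTime
            && sessionStart <= latestSessionStartTime;
    }

    // emit-final-style: bound the session END time on both sides.
    static boolean endTimeInRange(final long sessionEnd,
                                  final long earliestSessionEndTime,
                                  final long latestSessionEndTime) {
        return sessionEnd >= earliestSessionEndTime
            && sessionEnd <= latestSessionEndTime;
    }
}
```

For example, a session `[0, 20]` overlaps the range `[5, 10]`, but its end time `20` lies outside `[5, 10]`, so the two queries disagree on it -- which is why the overlap-based API cannot be reused directly for emit-final.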
@@ -91,8 +91,8 @@ public interface SegmentedBytesStore extends StateStore {
/**
 * Gets all the key-value pairs that belong to the windows within the given time range.
 *
 * @param from the beginning of the time slot from which to search
 * @param to   the end of the time slot from which to search
 * @param from the beginning of the time slot from which to search (inclusive)
Minor javadoc improvement to remind developers.
@@ -38,7 +39,7 @@
@SuppressWarnings({"unchecked", "rawtypes"})
TimestampedTupleForwarder(final StateStore store,
                          final ProcessorContext<K, Change<V>> context,
                          final TimestampedCacheFlushListener<K, V> flushListener,
                          final CacheFlushListener<K, ?> flushListener,
This is 3) in the description: since we can use the base class, we no longer need the duplicated TupleForwarder.
tupleForwarder.maybeForward(new Record<>(windowedkey, new Change<>(newAgg, sendOldValues ? oldAgg : null), newTimestamp));
}

// TODO: consolidate SessionWindow with TimeWindow to merge common functions
I realized that our SessionWindow, TimeWindow, and even SlidingWindow cause many code duplications (e.g. here), where we could just consolidate them into the same class, with boolean flags indicating whether the start/end are inclusive or exclusive; with that we can further reduce code duplication. Will file a JIRA for it.
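The consolidation idea could look roughly like this (a hypothetical sketch of the proposed JIRA, not existing Kafka code): one window class where each bound carries an inclusiveness flag, so session windows (both ends inclusive) and time windows (end exclusive) share the overlap logic.

```java
// Hypothetical consolidated window: inclusiveness is data, not a subclass.
final class FlexibleWindow {
    final long start;
    final long end;
    final boolean startInclusive;
    final boolean endInclusive;

    FlexibleWindow(final long start, final long end,
                   final boolean startInclusive, final boolean endInclusive) {
        this.start = start;
        this.end = end;
        this.startInclusive = startInclusive;
        this.endInclusive = endInclusive;
    }

    boolean overlap(final FlexibleWindow other) {
        // two windows touching at a single point overlap only if both
        // of the touching bounds are inclusive
        final boolean thisEndsBeforeOther = endInclusive && other.startInclusive
            ? end < other.start : end <= other.start;
        final boolean otherEndsBeforeThis = other.endInclusive && startInclusive
            ? other.end < start : other.end <= start;
        return !thisEndsBeforeOther && !otherEndsBeforeThis;
    }
}
```

With this shape, two session-style windows `[0, 5]` and `[5, 10]` overlap (they share the point 5), while two time-style windows `[0, 5)` and `[5, 10)` do not -- the same code path handles both behaviors.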
}
}

agg = aggregator.apply(record.key(), record.value(), agg);
final Windowed<KIn> sessionKey = new Windowed<>(record.key(), mergedWindow);
store.put(sessionKey, agg);

maybeForwardUpdate(sessionKey, null, agg, record.timestamp());
/*
Will remove commented out code when removing WIP, ditto elsewhere.
private long observedStreamTime = ConsumerRecord.NO_TIMESTAMP;
private InternalProcessorContext<Windowed<KIn>, Change<VAgg>> internalProcessorContext;

private final Time time = Time.SYSTEM;
Should we not better pass in a `Time` object, so we can mock it using TTD?
Makes sense. I will do this in a follow-up PR after merging this.
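A sketch of the suggested injection (illustrative stand-ins; `Clock`/`MockClock` below are not Kafka's `Time`/`MockTime` utilities): instead of hard-coding `Time.SYSTEM`, the processor takes the clock as a constructor argument, so tests can advance it deterministically.

```java
// Minimal clock abstraction a processor could depend on instead of Time.SYSTEM.
interface Clock {
    long milliseconds();
}

// Test double: time only moves when the test says so.
final class MockClock implements Clock {
    private long nowMs;

    MockClock(final long startMs) {
        this.nowMs = startMs;
    }

    @Override
    public long milliseconds() {
        return nowMs;
    }

    void sleep(final long ms) {
        nowMs += ms;
    }
}
```

A test can then call `clock.sleep(1001L)` to cross an emit interval boundary without real wall-clock waiting.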
}

// Update the sent record timestamp to the window end time if possible
final long newTimestamp = windowedkey.key() != null ? windowedkey.window().end() : oldTimestamp;
For what case could `windowedkey.key() == null`? Is this even possible?
This behavior was meant to be inherited from the deleted code: https://github.com/apache/kafka/pull/12204/files#diff-85c8c92d464af8eb3a60684bf929725f8fc5263353c38cacc20bee4cefe4fd9eL53, but after checking that logic I now realize it's not necessary anymore (the original PR had to do so since we could not programmatically guarantee it's always not `null`, but in this change we do not have that concern anymore).
Will remove.
emitFinalLatencySensor = emitFinalLatencySensor(threadId, context.taskId().toString(),
    internalProcessorContext.currentNode().name(), metrics);
emittedRecordsSensor = emittedRecordsSensor(threadId, context.taskId().toString(), processorName, metrics);
emitFinalLatencySensor = emitFinalLatencySensor(threadId, context.taskId().toString(), processorName, metrics);
Thanks for all the cleanup -- it's somewhat distracting from the actual changes.
Can we (in the future) extract refactorings/cleanups into individual PRs to simplify reviewing?
Yes! It's my bad to mingle them together here.
null,
Long.MAX_VALUE,
endTimeMap.subMap(earliestEndTime, latestEndTime + 1).entrySet().iterator(),
true);
Not sure if I fully understand why we add this new method instead of calling `findSession(null, null, A, B)`?
The code to create the iterator is different, but I am also not sure why. Is it semantically actually the same? Calling `findSession(null, null, A, B)` would do:
registerNewIterator(null, // same
                    null, // same
                    latestSessionStartTime, // why does your code pass Long.MAX_VALUE?
                    // but because we use `tailMap` instead of `subMap` below, it seems to do the same thing overall?
                    endTimeMap.tailMap(earliestSessionEndTime, true).entrySet().iterator(),
                    true); // same
Logically, the main reason is that for emit-final we need a range query where the from/to are both `endTime`, i.e. you can see the parameters are `earliestSessionEndTime` and `latestSessionEndTime`. Whereas for the existing functions, their semantics are based on `earliestSessionEndTime` but `latestSessionStartTime`. And that's also the reason for using `Long.MAX_VALUE` here.
On the physical implementation, the main difference is not in the in-memory session store, but the rocksDB session store. I will reply there separately.
Ok. I read the code of `InMemorySessionStore` in detail and now understand what's going on. This LGTM.
return registerNewIterator(null,
                           null,
                           Long.MAX_VALUE,
                           endTimeMap.subMap(earliestEndTime, latestEndTime + 1).entrySet().iterator(),
Nit: can we call `subMap(earliestEndTime, true, latestEndTime, true)`, which is the same thing but more "intuitive", as we always search for inclusive bounds throughout the code (otherwise, this would be the only place with an exclusive upper bound)?
Ack.
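The nit in isolation: `NavigableMap.subMap(from, to)` is exclusive on the upper key, hence the `+ 1` workaround, while the four-argument overload states the inclusive bounds directly. A self-contained demonstration (the map contents are made up):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Demonstrates that subMap(from, to + 1) and subMap(from, true, to, true)
// select the same entries for long keys.
final class SubMapDemo {
    static NavigableMap<Long, String> sampleEndTimeMap() {
        final NavigableMap<Long, String> m = new TreeMap<>();
        m.put(5L, "sessions ending at 5");
        m.put(10L, "sessions ending at 10");
        m.put(11L, "sessions ending at 11");
        return m;
    }

    static boolean equivalentForms(final NavigableMap<Long, String> m,
                                   final long earliestEndTime,
                                   final long latestEndTime) {
        // half-open form with the "+ 1" workaround vs. explicit inclusive form
        return m.subMap(earliestEndTime, latestEndTime + 1)
                .equals(m.subMap(earliestEndTime, true, latestEndTime, true));
    }
}
```

Beyond readability, the four-argument form also avoids the overflow hazard of `latestEndTime + 1` when the upper bound is `Long.MAX_VALUE`.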
// since subMap is exclusive on toKey, we need to plus one
return registerNewIterator(null,
                           null,
                           Long.MAX_VALUE,
nit: indentation
}

private long emitRangeLowerBound() {
    return lastEmitWindowCloseTime == ConsumerRecord.NO_TIMESTAMP ? 0L : Math.max(0L, lastEmitWindowCloseTime);
nit: can this be simplified to `Math.max(0L, lastEmitWindowCloseTime)`? (Can also be addressed in a follow-up PR)
Ack.
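The simplification is safe because `ConsumerRecord.NO_TIMESTAMP` is `-1L`, and `Math.max(0L, -1L)` is already `0L`, so the explicit sentinel check is redundant. A standalone check of the equivalence (class and method names are illustrative):

```java
// Both forms of the lower-bound computation, side by side.
final class EmitRangeBound {
    static final long NO_TIMESTAMP = -1L;  // value of ConsumerRecord.NO_TIMESTAMP

    static long original(final long lastEmitWindowCloseTime) {
        return lastEmitWindowCloseTime == NO_TIMESTAMP
            ? 0L
            : Math.max(0L, lastEmitWindowCloseTime);
    }

    static long simplified(final long lastEmitWindowCloseTime) {
        return Math.max(0L, lastEmitWindowCloseTime);
    }
}
```

The two functions agree for the sentinel, zero, and any positive close time, which is why the ternary branch can be dropped.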
final long windowCloseTime,
final long emitRangeLowerBound,
final long emitRangeUpperBound) {
    final long startMs = time.milliseconds();
Should we use milliseconds or nanoseconds? (I am always unsure)
Should be ms to be consistent with other metrics.
@@ -204,7 +203,8 @@ public static void writeBinary(final ByteBuffer buf,
final long endTime) {
    buf.putLong(endTime);
    buf.putLong(startTime);
    buf.put(key.get());
    if (key != null)
Can `key` ever be `null` here? (nit: add `{}` to the block)
Yes, it's possible, since this is used to write the lower/upper boundaries, in which the key can be `null`.
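A standalone sketch of the null-tolerant write being discussed (illustrative names, not the PR's exact code): when serializing range bounds rather than concrete session keys, only the two timestamps are written and the key part is skipped.

```java
import java.nio.ByteBuffer;

// Time-first layout: end timestamp, then start timestamp, then (optionally) the key.
final class SessionKeyBytes {
    static byte[] writeBinary(final byte[] key, final long startTime, final long endTime) {
        final int keyLen = key == null ? 0 : key.length;
        final ByteBuffer buf = ByteBuffer.allocate(2 * Long.BYTES + keyLen);
        buf.putLong(endTime);    // end timestamp leads in the time-first layout
        buf.putLong(startTime);
        if (key != null) {       // range bounds carry no key bytes
            buf.put(key);
        }
        return buf.array();
    }
}
```

A `null` key thus yields a 16-byte bound (two longs only), while a real session key appends its bytes after the timestamps.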
final Windowed<Bytes> windowedKey = TimeFirstSessionKeySchema.from(bytes);
final long endTime = windowedKey.window().end();

if (endTime <= latestSessionEndTime && endTime >= earliestSessionEndTime)
nit: add `{}` to the block
@@ -35,7 +35,7 @@ public class SessionKeySchema implements SegmentedBytesStore.KeySchema {
private static final byte[] MIN_SUFFIX = new byte[SUFFIX_SIZE];

public static int keyByteLength(final Bytes key) {
    return key.get().length + 2 * TIMESTAMP_SIZE;
    return (key == null ? 0 : key.get().length) + 2 * TIMESTAMP_SIZE;
Can `key` ever be `null` here?
Yes that's possible -- see above comment, for the lower/upper bound cases.
Overall LGTM.
A few nits. Also, there is some missing JavaDoc, and some stuff you just put into comments but have not removed yet. Also ok to clean up in a follow-up PR.
Thanks @mjsax, I've addressed your comments and also added the test coverage. While adding tests I noticed a bug in the code and fixed it (see my comment above). Could you please take another look?
return windowCloseTime - 1;
}

private boolean shouldRangeFetch(final long emitRangeLowerBound, final long emitRangeUpperBound) {
This is one minor bug I detected in the latest commit.
final long latestSessionEndTime) {
    final List<KeyValueSegment> searchSpace = segments.segments(earliestSessionEndTime, latestSessionEndTime, true);

// here we want [0, latestSE, FF] as the upper bound to cover any possible keys,
This is the other minor bug I detected in the latest commit.
@@ -52,6 +53,16 @@ private SessionStore<Bytes, byte[]> maybeWrapCaching(final SessionStore<Bytes, b
if (!enableCaching) {
    return inner;
}

if (!inner.persistent()) {
Here's the other change I made to work around the current tricky situation: since in-memory stores are always "time ordered" as well, we strip the caching if the inner store is not persistent. cc @mjsax
If we strip the caching, this applies to eager emitting, right? -- So it would be a behavioral change? Do we want to piggy-back such a change into this KIP? Sounds "risky"?
Sounds good, I will move this logic into the earlier stage during the topology building phase for the moment. Also cc @lihaosky who would do similar things for sliding windows.
@@ -253,20 +300,29 @@ public void shouldRemoveMergedSessionsFromStateStore() {

@Test
public void shouldHandleMultipleSessionsAndMerging() {
    time.sleep(1001L);
Might it be easier to change the internal config of the "emit interval" and set it to zero (instead of advancing time)?
Ack!
LGTM (assuming Jenkins passes).
@@ -286,7 +286,8 @@ private <VR> StoreBuilder<SessionStore<K, VR>> materialize(final MaterializedInt
builder.withLoggingDisabled();
}

if (materialized.cachingEnabled()) {
// do not enable cache if the emit final strategy is used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @lihaosky this is what I did for session store during the topology building phase.
…12204)
* Add a new API for session windows to range query session windows by end time (KIP related).
* Augment the session window aggregator with an emit strategy.
* Minor: consolidated some duplicated classes.
* Test: unit test on the session window aggregator.

Reviewers: Guozhang Wang <wangguoz@gmail.com>