
KAFKA-6474: remove KStreamTestDriver #6732

Merged: 7 commits merged into apache:trunk on May 19, 2019
Conversation

vvcephei (Contributor):

The implementation of KIP-258 broke the state store methods in KStreamTestDriver.
These methods were unused in this project, so the breakage was not detected.
Since this is an internal testing utility, and it was deprecated and partially removed in
favor of TopologyTestDriver, I opted to just complete the removal of the class.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

vvcephei (Contributor, Author) left a comment:

Hey @ableegoldman @abbccdda @bbejeck , do you mind taking a look at this?

Someone reported that KIP-258 broke their tests, but it turned out they were using this deprecated, internal test utility. Rather than "fix" it, we can just go ahead and finally remove it.

PunctuationType.WALL_CLOCK_TIME,
timestamp -> context.forward(-1, (int) timestamp)
);
}
vvcephei (Contributor, Author):

Re-introducing test logic that had been previously removed (see the lines below that I uncommented).

"200:1110 (ts: 0)",
"2000:11110 (ts: 0)",
"-1:2 (ts: 2)",
"-1:3 (ts: 3)"
vvcephei (Contributor, Author):

These two extra results are from the punctuation. You can see that they were previously expected in the commented-out expectation on old line 89.

"B:0+2-2+4-4 (ts: 0)",
"B:0+2-2+4-4+7 (ts: 0)",
"C:0+5-5 (ts: 0)",
"C:0+5-5+8 (ts: 0)"),
vvcephei (Contributor, Author):

Note that there are some extra intermediate states here, because TTD doesn't do caching (or rather, it flushes after each record).

Member:

Not sure if I can follow. The old code calls driver.flushState(); after each driver.process(...) call.

vvcephei (Contributor, Author):

Answered this above. Even though we flush between input records, the internal streams code still forwards the retraction first and then the addition. Those would previously get cached together, but TTD passes them through individually.
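To make the coalescing point concrete, here is a toy sketch (hypothetical code, not the real Streams record cache) contrasting a cache that coalesces per-key updates on flush with a driver that forwards every update individually:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CacheSketch {

    // With a record cache, the retraction and the addition for the same key collapse
    // into a single entry; only the latest value per key is forwarded on flush.
    static List<String> flushCoalesced(List<Map.Entry<String, String>> updates) {
        Map<String, String> cache = new LinkedHashMap<>();
        for (Map.Entry<String, String> kv : updates) {
            cache.put(kv.getKey(), kv.getValue()); // later update overwrites earlier one
        }
        List<String> out = new ArrayList<>();
        cache.forEach((k, v) -> out.add(k + ":" + v));
        return out;
    }

    // TopologyTestDriver-style: every update is forwarded downstream individually.
    static List<String> forwardEach(List<Map.Entry<String, String>> updates) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> kv : updates) {
            out.add(kv.getKey() + ":" + kv.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // one input record for key A produces a retraction, then an addition
        List<Map.Entry<String, String>> updates = List.of(
            Map.entry("A", "0+1-1"),
            Map.entry("A", "0+1-1+3"));
        System.out.println(flushCoalesced(updates)); // [A:0+1-1+3]
        System.out.println(forwardEach(updates));    // [A:0+1-1, A:0+1-1+3]
    }
}
```

This is only an illustration of why the TTD-based tests see more intermediate results than the old cached expectations did.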

"A:0+1-1 (ts: 0)",
"A:0+1-1+3 (ts: 0)",
"A:0+1-1+3-3 (ts: 0)",
"A:0+1-1+3-3+4 (ts: 0)"
vvcephei (Contributor, Author):

This difference is a mystery to me. We previously expected A:0+4 (ts: 0). In other words, the processor never even saw the intermediate records. I can't figure out the mechanism for this, so there's a risk we're not testing the right thing here anymore. Help?

ableegoldman (Contributor), May 14, 2019:

I'm not too (at all) familiar with either driver, but isn't the "mechanism" just that before we were only flushing after processing all three records, whereas now we're flushing on every record (and thus see these intermediates as you point out above)?

vvcephei (Contributor, Author):

Yeah, but what seems strange to me is that the processor should still have seen all the events, with caching absorbing some of the intermediate ones. Looking at the old output, though, it seems the processor never saw those intermediate events, which indicates there was some caching upstream of the processor. There's only a source KTable, and it's not materialized. I was thinking it should not actually be stored in that case, but now that I'm reflecting on it again, I think that's only if you have optimizations enabled. So that would explain it.

Contributor:

@vvcephei I agree with you; plus, there's a call to driver.flushState() (line 142 of the removed code), so indeed I suspect it was caching that was filtering the intermediate records.

Member:

The source KTable is materialized for this case. That's why groupBy().aggregate() is only executed after the flush on the last value per key.

However, with TopologyTestDriver we cannot mimic this behavior any longer. Hence, it would make sense to delete this test, as it's the same as testAggBasic

vvcephei (Contributor, Author):

Roger, wilco.

assertEquals(
asList(
"green:1 (ts: 0)",
"green:2 (ts: 0)",
vvcephei (Contributor, Author):

A couple of extra intermediate states here.

"1:1 (ts: 0)",
"1:12 (ts: 0)",
"1:2 (ts: 0)",
"1: (ts: 0)",
vvcephei (Contributor, Author):

One extra intermediate state here.

// that in turn will cause an eviction on reducer-topic. It will flush
// key 2 as it is the only dirty entry in the cache
driver.process("tableOne", "1", "5");
assertEquals(Long.valueOf(4L), reduceResults.get("2"));
vvcephei (Contributor, Author):

This logic isn't possible to test with TTD, so it would have to become a full integration test. But it seems like it's testing the logic of caching and repartitioning, not the logic of the KTableAggregate, so I'm proposing to remove the test.

Thoughts?

Contributor:

SGTM

Contributor:

+1

ableegoldman (Contributor) left a comment:

LGTM!

vvcephei (Contributor, Author):

@mjsax maybe you can have a look at this, if you have a chance.

abbccdda (Contributor) left a comment:

One high-level comment: do you think it's worth adding non-zero timestamp records to this test? Would that help us verify the order?


final String[] expected = {"2:10 (ts: 0)", "20:110 (ts: 0)", "200:1110 (ts: 0)", "2000:11110 (ts: 0)"};
final String[] expected = {
Contributor:

Could we initialize expected first and use its size for the assertion on L100?

mjsax (Member) commented May 15, 2019:

@vvcephei Thanks for the PR. This should be covered as https://issues.apache.org/jira/browse/KAFKA-6474, not as a MINOR PR. Please also comment on the ticket. I don't think we need to reassign it, as @h314to did the lion's share of the work on the ticket. Just to make sure we are all on the same page.

vvcephei changed the title from "MINOR: remove KStreamTestDriver" to "KAFKA-6474: remove KStreamTestDriver" on May 15, 2019
vvcephei (Contributor, Author):

Hi, @h314to ! I hope I didn't step on your toes. @mjsax just reminded me that this PR (which I made for other reasons) is actually in the scope of your ticket. Do you mind giving me a review?

vvcephei (Contributor, Author):

Hey @abbccdda , thanks for the review!

We certainly could add non-zero timestamps to the test, but I don't think it would help us verify the order, since TTD always just processes one record at a time, and we're already asserting a specific order in the expectation array. WDYT?

vvcephei (Contributor, Author):

Thanks for the reminder @mjsax ! I've changed the title of the PR.

bbejeck (Contributor) commented May 16, 2019:

Java 8 failed, Java 11 passed, but test results already cleaned up

retest this please

bbejeck (Contributor) left a comment:

Thanks for the PR, @vvcephei. LGTM.



asList(
"A:0+1 (ts: 0)",
"B:0+2 (ts: 0)",
"A:0+1-1 (ts: 0)",
Member:

This seems different from the old expected result.

vvcephei (Contributor, Author):

Hm, I figured it's because TTD forwards every record through one at a time. Even though we flush after each change, the old code would still "coalesce" the retraction and update. Does this seem wrong to you?

Member:

Interesting. Should be fine. Still not sure I understand why, though...

> Hm, I figured it's because TTD forwards every record through one at a time.

Yes. But a single input record results in an add plus a remove, so I am wondering why those are not "coalesced" in TTD, too? The caching in the stores should behave the same...

> Does this seem wrong to you?

Not necessarily. Just trying to understand why TTD behaves differently.

Member:

It seems that caching is disabled, but I am not sure why?

Member:

@vvcephei Still would like to understand this. Why does caching not take effect any longer?

Contributor:

@mjsax @vvcephei I ran the new commit locally and I think I get the difference here.

In TopologyTestDriver#pipeInput:

// Process the record ...
task.process();
task.maybePunctuateStreamTime();
task.commit();
captureOutputRecords();

I.e., each piped record immediately causes a commit. In this case, when processing the two records from the repartition topic, each of them triggers pipeInput once and hence commits once; so the single pipeInput for the original record causes two more pipeInputs from the repartition topic, and hence commits twice and flushes twice.

In the old KStreamTestDriver, by contrast, we do not commit for records piped from the repartition topic, so there is only one flush.
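The flush arithmetic described above can be sketched with a toy counter (hypothetical code, not the real driver; it assumes one input record emits exactly two records into the repartition topic):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PipeInputSketch {

    // Count how many commits (and hence flushes) one input record causes
    // under each driver's commit policy.
    static int simulate(boolean commitPerPipedRecord) {
        int commits = 0;
        Deque<String> toPipe = new ArrayDeque<>();
        toPipe.add("input-record");
        while (!toPipe.isEmpty()) {
            String record = toPipe.poll();
            if (record.equals("input-record")) {
                // processing the input emits a subtraction and an addition
                // into the repartition topic, which the driver also pipes
                toPipe.add("repartition-subtraction");
                toPipe.add("repartition-addition");
            }
            if (commitPerPipedRecord || record.equals("input-record")) {
                commits++; // TopologyTestDriver commits after every piped record
            }
        }
        return commits;
    }

    public static void main(String[] args) {
        // TopologyTestDriver-style: commit for the input plus each repartition record
        System.out.println("TTD-style commits:  " + simulate(true));  // 3
        // KStreamTestDriver-style: only the single explicit flushState() per input
        System.out.println("KSTD-style commits: " + simulate(false)); // 1
    }
}
```

This is just a model of the behavior described in the comment, showing why TTD flushes more often and therefore surfaces more intermediate results.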

Contributor:

I feel that, generally speaking, the commit-on-every-pipeInput behavior of TopologyTestDriver is debatable, especially since we call pipeInput recursively for repartition topics, which means each of the records arriving via the repartition topic triggers a commit as well. I'll merge this PR as-is, and we can discuss whether we want to change this behavior later.

Member:

Ah. That makes sense. Thanks!

Not sure if we want/need to change the behavior. Also, it would require a KIP imho, because people may have tests in place testing for the current behavior... Not sure if it's worth it.

"B:0+2 (ts: 0)",
"A:0+1-1 (ts: 0)",
"A:0+1-1+3 (ts: 0)",
"B:0+2-2 (ts: 0)",
Member:

Same here.

Why do we get more intermediate results?

vvcephei (Contributor, Author):

Same explanation, I think.

@@ -125,23 +118,43 @@ public void testAggCoalesced() {
final KTable<String, String> table2 = table1
.groupBy(
MockMapper.noOpKeyValueMapper(),
stringSerialzied)
stringSerialized)
.aggregate(MockInitializer.STRING_INIT,
Member:

nit: move this to next line


driver.pipeInput(recordFactory.create(topic1, "NULL", "5"));

driver.pipeInput(recordFactory.create(topic1, "B", "7"));
Member:

nit: why so many blank lines?

vvcephei (Contributor, Author):

ah, it was s/driver.flushState();//g :) I'll remove the blank lines.


driver.pipeInput(recordFactory.create(input, "C", "yellow"));

driver.pipeInput(recordFactory.create(input, "D", "green"));
Member:

nit: why so many blank lines?

vvcephei (Contributor, Author):

Hey @bbejeck or @guozhangwang , I've rebased and addressed @mjsax's feedback.

I think the only outstanding comment is #6732 (comment), but it seems out of scope for this PR (it's about TTD itself, not this change). I'm thinking we can just merge this and follow up on that question next week. WDYT?

guozhangwang (Contributor):

cc @RichardYuSTUG since he's also interested in this ticket.

guozhangwang (Contributor) left a comment:

Minor comment, otherwise LGTM.

@@ -275,53 +226,17 @@ public void testCountWithInternalStore() {
final StreamsBuilder builder = new StreamsBuilder();
final String input = "count-test-input";

final MockProcessorSupplier<String, Object> supplier = new MockProcessorSupplier<>();
Contributor:

Why can't we continue to reuse a single MockProcessorSupplier?

vvcephei (Contributor, Author):

Ah, I misread it. I thought they were all reusing the processor, somehow, which I thought was risky. Since it's just the supplier, I'll move it back.


vvcephei (Contributor, Author):

@guozhangwang, thanks for the review. I moved the supplier back to a field.

@guozhangwang guozhangwang merged commit c140f09 into apache:trunk May 19, 2019
@vvcephei vvcephei deleted the MINOR-remove-KStreamTestDriver branch May 20, 2019 17:56
pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
The implementation of KIP-258 broke the state store methods in KStreamTestDriver.
These methods were unused in this project, so the breakage was not detected.
Since this is an internal testing utility, and it was deprecated and partially removed in
favor of TopologyTestDriver, I opted to just complete the removal of the class.

Reviewers: A. Sophie Blee-Goldman <ableegoldman@gmail.com>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
6 participants