KAFKA-14742: Throttle connectors in ExactlyOnceSourceIntegrationTest to fix flakey OOMEs #13291

Merged
merged 2 commits into apache:trunk on Feb 28, 2023

Conversation

gharris1727
Contributor

On my local machine, testIntervalBoundary asserts on nearly 2.5 million records, even though the test appears to be written to need only 100-1000 records to perform its assertions. This causes OOMEs in the test assertions, which iterate over the set of records and allocate memory as they go.

I looked into reducing the assertion's memory overhead, but it didn't seem practical as even the smallest allocations appeared to exceed the memory limit.

Instead, I configured the pre-existing throttle mechanism inside the MonitorableSourceConnector, so that tests now seem to produce ~90k records on my machine, leaving adequate spare memory for the existing assertions to pass without issue.
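
For context, a hedged sketch of what the throttled connector configuration looks like from the test side; the property constants come from MonitorableSourceConnector, while the surrounding setup and the choice to reuse recordsProduced for both values are illustrative rather than a claim about the exact committed code:

Map<String, String> props = new HashMap<>();
props.put(NAME_CONFIG, CONNECTOR_NAME);
props.put(MESSAGES_PER_POLL_CONFIG, Integer.toString(recordsProduced)); // records returned by each poll()
props.put(THROUGHPUT_CONFIG, Integer.toString(recordsProduced));        // throttle ceiling, in records per second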

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

…to fix flakey OOMEs

Signed-off-by: Greg Harris <greg.harris@aiven.io>
@C0urante added connect, tests (Test fixes (including flaky tests)) labels Feb 23, 2023
Contributor

@edoardocomar left a comment

Many thanks for the PR, looks good to me.
I could see that the number of records produced was so large that I was getting an OOM running the tests too.
I just left a couple of small comments.

@@ -81,6 +81,7 @@
import static org.apache.kafka.connect.integration.MonitorableSourceConnector.CUSTOM_EXACTLY_ONCE_SUPPORT_CONFIG;
import static org.apache.kafka.connect.integration.MonitorableSourceConnector.CUSTOM_TRANSACTION_BOUNDARIES_CONFIG;
import static org.apache.kafka.connect.integration.MonitorableSourceConnector.MESSAGES_PER_POLL_CONFIG;
import static org.apache.kafka.connect.integration.MonitorableSourceConnector.THROUGHPUT_CONFIG;
Contributor

This could be THROUGHPUT_MSGS_PER_SEC_CONFIG.

Contributor Author

I think that would make sense if the underlying configuration was throughput.msgs.per.sec, but it is currently throughput. I preferred to keep the existing name instead of renaming + aliasing the configuration, just to keep this PR small.

Do you think renaming the configuration is important here?

Contributor

I had to go and look up what unit throughput was in, and ThroughputThrottler says "Can be messages/sec or bytes/sec", which is why its name is generic.
In the case of this test it is messages/sec, so for me the longer name I suggested helps readability.
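
To make the mechanics concrete, a hedged sketch of how a ThroughputThrottler-style send loop is typically driven; this is not the MonitorableSourceConnector code, produceOneRecord is a hypothetical stand-in, and the class is assumed to be org.apache.kafka.tools.ThroughputThrottler:

import org.apache.kafka.tools.ThroughputThrottler;

// Sketch only: the throttler compares an accumulated amount against its target,
// so the caller decides whether that amount counts messages (as here) or bytes.
static void sendThrottled(long targetMsgsPerSec, long totalRecords) {
    ThroughputThrottler throttler = new ThroughputThrottler(targetMsgsPerSec, System.currentTimeMillis());
    for (long sent = 0; sent < totalRecords; sent++) {
        produceOneRecord(); // hypothetical helper that emits one record
        if (throttler.shouldThrottle(sent + 1, System.currentTimeMillis())) {
            throttler.throttle(); // sleeps until the average rate drops back under the target
        }
    }
}

static void produceOneRecord() { /* record emission omitted in this sketch */ }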

@@ -266,6 +267,7 @@ public void testPollBoundary() throws Exception {
props.put(NAME_CONFIG, CONNECTOR_NAME);
props.put(TRANSACTION_BOUNDARY_CONFIG, POLL.toString());
props.put(MESSAGES_PER_POLL_CONFIG, Integer.toString(recordsProduced));
props.put(THROUGHPUT_CONFIG, Integer.toString(recordsProduced));
Contributor

The config is a Long, so these settings could be
Long.toString(100L)
I checked the test that was OOM'ing for me too, and the number of records actually produced with your setting is still much larger than actually required.
I found using the same variable recordsProduced for the throughput a bit puzzling; maybe just using another literal would be OK.

Contributor

Even 50 msgs/sec will be enough.

Contributor Author

Since the results of Integer.toString(100) and Long.toString(100L) are the same, I don't think this is necessary.
The reason I re-used the same variable was because I wanted to keep the runtime of the test constant. If there were two variables, someone could tune one while holding the other constant until the test timed out.

I agree that recordsProduced is a poor name, because this test produces many more records than that under normal conditions. Do you have a better name in mind?

Contributor

Again, it may not be necessary to use Long instead of Integer, but it helps: the property is a long, and I'd prefer to set a long rather than rely on conversion.
And using two variables instead of one, although with related values, again helps readability.
Reusing the same one is something that makes me stop and think "why...?"
So having a second variable, e.g.
long throughput_msgs_sec = recordsProduced / 2L;
would be my preference, along with a short line comment for it, e.g.
// need to limit actual records.count() to avoid OOM

Signed-off-by: Greg Harris <greg.harris@aiven.io>
@gharris1727
Contributor Author

@edoardocomar

I pulled these values out into three separate constants with descriptive names, let me know if this is closer to what you had in mind!
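
For illustration only, a hedged sketch of what such a refactor can look like; these constant names and values are hypothetical, not necessarily the ones in the commit:

// Hypothetical names/values, for illustration only.
private static final int MINIMUM_MESSAGES = 100;                                 // records the assertions actually need
private static final String MESSAGES_PER_POLL = Integer.toString(MINIMUM_MESSAGES);
private static final String MESSAGES_PER_SECOND = Long.toString(MINIMUM_MESSAGES / 2); // throttle, records per second

The test could then pass the string constants straight into the connector props (e.g. props.put(THROUGHPUT_CONFIG, MESSAGES_PER_SECOND)), so the per-poll batch size and the throttle can each be read and tuned at a glance.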

Contributor

@edoardocomar left a comment

Thanks LGTM

@edoardocomar merged commit 867fb29 into apache:trunk Feb 28, 2023
rittikaadhikari added a commit to confluentinc/kafka that referenced this pull request Feb 28, 2023
…tream-trunk-27-Feb-2023

* commit 'dcc179995153c22c6248702976b60755b0b9fda8':
  MINOR: srcJar should depend on processMessages task (apache#13316)
  KAFKA-14659 source-record-write-[rate|total] metrics should exclude filtered records (apache#13193)
  MINOR: ExponentialBackoff Javadoc improvements (apache#13317)
  KAFKA-14742: Throttle connectors in ExactlyOnceSourceIntegrationTest to fix flakey OOMEs (apache#13291)