Skip to content

Fix nondeterministic behavior in Protobuf and Quartz-related tests#18690

Merged
kfaraz merged 4 commits intoapache:masterfrom
Shiyang-Zhao:fix-nondeterminism-protobuf-and-quartz
Oct 27, 2025
Merged

Fix nondeterministic behavior in Protobuf and Quartz-related tests#18690
kfaraz merged 4 commits intoapache:masterfrom
Shiyang-Zhao:fix-nondeterminism-protobuf-and-quartz

Conversation

@Shiyang-Zhao
Copy link
Contributor

This PR fixes nondeterministic behavior in the following flaky tests:

  • org.apache.druid.data.input.protobuf.InlineDescriptorProtobufBytesDecoderTest.testSingleDescriptorNoMessageType
  • org.apache.druid.data.input.protobuf.FileBasedProtobufBytesDecoderTest.testSingleDescriptorNoMessageType
  • org.apache.druid.indexing.scheduledbatch.QuartzCronSchedulerConfigTest.testInvalidCronExpression

Description

The tests InlineDescriptorProtobufBytesDecoderTest.testSingleDescriptorNoMessageType and FileBasedProtobufBytesDecoderTest.testSingleDescriptorNoMessageType failed intermittently because Protobuf does not guarantee a consistent order when iterating over message descriptors.

Depending on internal ordering, the decoder could return either google.protobuf.Timestamp or prototest.ProtoTestEvent. The tests originally asserted equality against a single expected name, causing nondeterministic failures even though both outcomes were correct.

Failure messages:

[ERROR] org.apache.druid.data.input.protobuf.InlineDescriptorProtobufBytesDecoderTest.testSingleDescriptorNoMessageType -- Time elapsed: 0.118 s <<< FAILURE!
org.opentest4j.AssertionFailedError: expected: <google.protobuf.Timestamp> but was: <prototest.ProtoTestEvent>
	at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1145)
	at org.apache.druid.data.input.protobuf.InlineDescriptorProtobufBytesDecoderTest.testSingleDescriptorNoMessageType(InlineDescriptorProtobufBytesDecoderTest.java:129)

[WARNING] Flakes: 
[WARNING] org.apache.druid.data.input.protobuf.InlineDescriptorProtobufBytesDecoderTest.testSingleDescriptorNoMessageType
[ERROR]   Run 1: InlineDescriptorProtobufBytesDecoderTest.testSingleDescriptorNoMessageType:129 expected: <google.protobuf.Timestamp> but was: <prototest.ProtoTestEvent>
[INFO]   Run 2: PASS
[ERROR] org.apache.druid.data.input.protobuf.FileBasedProtobufBytesDecoderTest.testSingleDescriptorNoMessageType -- Time elapsed: 0.111 s <<< FAILURE!
org.opentest4j.AssertionFailedError: expected: <google.protobuf.Timestamp> but was: <prototest.ProtoTestEvent>
	at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1145)
	at org.apache.druid.data.input.protobuf.FileBasedProtobufBytesDecoderTest.testSingleDescriptorNoMessageType(FileBasedProtobufBytesDecoderTest.java:135)

[WARNING] Flakes: 
[WARNING] org.apache.druid.data.input.protobuf.FileBasedProtobufBytesDecoderTest.testSingleDescriptorNoMessageType
[ERROR]   Run 1: FileBasedProtobufBytesDecoderTest.testSingleDescriptorNoMessageType:135 expected: <google.protobuf.Timestamp> but was: <prototest.ProtoTestEvent>
[INFO]   Run 2: PASS

Proposed Changes:

  • Adjusted test expectations to recognize multiple valid descriptor outcomes.

The QuartzCronSchedulerConfigTest.testInvalidCronExpression failed due to nondeterministic ordering in Quartz’s validation error messages.

When validating malformed cron expressions, Quartz reports multiple expected-part counts (e.g., [6, 7] vs [7, 6]), and the order of these values is not stable. The previous test compared messages strictly by string equality, leading to flaky results when the order changed.

Failure messages:

[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.069 s <<< FAILURE! -- in org.apache.druid.indexing.scheduledbatch.QuartzCronSchedulerConfigTest
[ERROR] org.apache.druid.indexing.scheduledbatch.QuartzCronSchedulerConfigTest.testInvalidCronExpression -- Time elapsed: 0.059 s <<< FAILURE!
java.lang.AssertionError: 
Expected: (message: "Quartz schedule[0 15 10 * *] is invalid: [Cron expression contains 5 parts but we expect one of [6, 7]]" and targetPersona: is <USER> and category: is <INVALID_INPUT> and errorCode: is "invalidInput")
     but: message: "Quartz schedule[0 15 10 * *] is invalid: [Cron expression contains 5 parts but we expect one of [6, 7]]" was "Quartz schedule[0 15 10 * *] is invalid: [Cron expression contains 5 parts but we expect one of [7, 6]]"
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
	at org.apache.druid.indexing.scheduledbatch.QuartzCronSchedulerConfigTest.testInvalidCronExpression(QuartzCronSchedulerConfigTest.java:109)

[WARNING] Flakes: 
[WARNING] org.apache.druid.indexing.scheduledbatch.QuartzCronSchedulerConfigTest.testInvalidCronExpression
[ERROR]   Run 1: QuartzCronSchedulerConfigTest.testInvalidCronExpression:109 
Expected: (message: "Quartz schedule[0 15 10 * *] is invalid: [Cron expression contains 5 parts but we expect one of [6, 7]]" and targetPersona: is <USER> and category: is <INVALID_INPUT> and errorCode: is "invalidInput")
     but: message: "Quartz schedule[0 15 10 * *] is invalid: [Cron expression contains 5 parts but we expect one of [6, 7]]" was "Quartz schedule[0 15 10 * *] is invalid: [Cron expression contains 5 parts but we expect one of [7, 6]]"
[INFO]   Run 2: PASS

Proposed Changes:

  • Updated the test to tolerate equivalent error lists with differing element order.

This PR has:

  • been self-reviewed.
  • ensured no production logic changes beyond test stabilization.

Shiyang Zhao added 3 commits October 22, 2025 22:56
Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, @Shiyang-Zhao .
I have left a minor suggestion.

@@ -132,7 +133,11 @@ public void testSingleDescriptorNoMessageType()
{
final var decoder = new FileBasedProtobufBytesDecoder("proto_test_event.desc", null);

assertEquals("google.protobuf.Timestamp", decoder.getDescriptor().getFullName());
String actual = decoder.getDescriptor().getFullName();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a short comment on the source of the flakiness/difference.

@@ -111,8 +112,13 @@ public void testInvalidCronExpression()
DruidException.class,
() -> new QuartzCronSchedulerConfig("0 15 10 * *")
),
DruidExceptionMatcher.invalidInput().expectMessageIs(
"Quartz schedule[0 15 10 * *] is invalid: [Cron expression contains 5 parts but we expect one of [6, 7]]"
Matchers.anyOf(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a one-line comment on the source of the flakiness.

@Shiyang-Zhao
Copy link
Contributor Author

@kfaraz
Thanks for reviewing and for the helpful feedback! I’ve added one-line comments explaining the source of the non-determinism in all three tests. Please let me know if you’d like me to adjust the wording or make any other changes.

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, @Shiyang-Zhao !

@kfaraz kfaraz merged commit 8b32892 into apache:master Oct 27, 2025
91 of 92 checks passed
@kgyrtkirk kgyrtkirk added this to the 36.0.0 milestone Jan 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants