
Fix flaky tests in multiple modules due to non-deterministic specification of HashSet and JSON objects #15418

Closed


yashdeep97 (Contributor) commented Nov 23, 2023

I found several flaky tests in various modules using the NonDex plugin for Maven.

Description

Fixed the following flaky tests:

processing module:

  1. org.apache.druid.segment.nested.NestedDataColumnSupplierV4Test#testBasicFunctionality
  2. org.apache.druid.segment.nested.NestedDataColumnSupplierV4Test#testConcurrency
  3. org.apache.druid.segment.nested.NestedDataColumnSupplierTest#testBasicFunctionality
  4. org.apache.druid.segment.nested.NestedDataColumnSupplierTest#testArrayFunctionality
  5. org.apache.druid.segment.nested.NestedDataColumnSupplierTest#testConcurrency
  6. org.apache.druid.segment.virtual.ExpressionVirtualColumnTest#testRequiredColumns
  7. org.apache.druid.java.util.emitter.core.ParametrizedUriEmitterTest#testEmitterWithParametrizedUriExtractor

services module:

  1. org.apache.druid.server.router.TieredBrokerHostSelectorTest#testGetAllBrokers

indexing-service module:

  1. org.apache.druid.indexing.overlord.TaskLockboxTest#testGetLockedIntervals

extensions-core/parquet-extensions module:

  1. org.apache.druid.data.input.parquet.WikiParquetReaderTest#testUint32Datatype
  2. org.apache.druid.data.input.parquet.WikiParquetReaderTest#testWiki
  3. org.apache.druid.data.input.parquet.NestedColumnParquetReaderTest#testNestedColumnSchemalessNestedTestFile
  4. org.apache.druid.data.input.parquet.CompatParquetReaderTest#testBinaryAsString
  5. org.apache.druid.data.input.parquet.CompatParquetReaderTest#testParquetThriftCompat
  6. org.apache.druid.data.input.parquet.CompatParquetReaderTest#testReadNestedArrayStruct
  7. org.apache.druid.data.input.parquet.CompatParquetReaderTest#testProtoStructWithArray
  8. org.apache.druid.data.input.parquet.FlattenSpecParquetReaderTest#testFlat1FlattenSelectListItem
  9. org.apache.druid.data.input.parquet.FlattenSpecParquetReaderTest#testFlat1NoFlattenSpec
  10. org.apache.druid.data.input.parquet.FlattenSpecParquetReaderTest#testFlat1Autodiscover
  11. org.apache.druid.data.input.parquet.FlattenSpecParquetReaderTest#testFlat1Flatten
  12. org.apache.druid.data.input.parquet.FlattenSpecParquetReaderTest#testNested1NoFlattenSpec
  13. org.apache.druid.data.input.parquet.FlattenSpecParquetReaderTest#testNested1Autodiscover
  14. org.apache.druid.data.input.parquet.FlattenSpecParquetReaderTest#testNested1Flatten
  15. org.apache.druid.data.input.parquet.FlattenSpecParquetReaderTest#testNested1FlattenSelectListItem
  16. org.apache.druid.data.input.parquet.TimestampsParquetReaderTest#testDateHandling

server module:

  1. org.apache.druid.discovery.BaseNodeRoleWatcherTest#testGeneralUseSimulation
  2. org.apache.druid.indexing.overlord.supervisor.SupervisorStatusTest#testJsonAttr
  3. org.apache.druid.metadata.input.SqlEntityTest#testExecuteQuery

Problem:

The tests listed above were reported as flaky (tests that assume a deterministic implementation of a non-deterministic specification) when run with the NonDex tool.
The tests contain assertions that compare strings created from JSON objects, and lists created from HashSets and HashMaps.

However, HashSet does not guarantee any iteration order, so tests that assume a deterministic HashSet ordering are flaky. In addition, some tests check for a specific ordering of keys in JSON strings. Two JSON objects are equal even when their keys appear in a different order, but a string comparison treats the two serializations as different. Because the Jackson ObjectMapper does not guarantee a consistent ordering of JSON keys (for example, map entries are written in the map's iteration order by default), these string comparisons are flaky.

Thus, when NonDex shuffles the HashSet elements and the JSON keys, these tests fail.
The errors can be found in this GitHub issue: #15471

To reproduce run:

mvn -pl <module_name> edu.illinois:nondex-maven-plugin:2.1.1:nondex -Dtest=<test_name>

Fix:

For tests that fail because lists built from HashSets or HashMaps come back in a different order, convert both ArrayLists to HashSets and compare the two sets with assertEquals().
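A minimal sketch of the set-based comparison, using only java.util. The class name, method names, and sample values are hypothetical stand-ins rather than code from the PR; in the actual JUnit tests the final check would presumably be written as Assert.assertEquals(new HashSet<>(expected), new HashSet<>(actual)).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetComparisonSketch
{
  // Hypothetical stand-in for code under test that returns a list built from a HashSet.
  static List<String> columnsFromHashSet()
  {
    Set<String> columns = new HashSet<>(Arrays.asList("x", "y", "z"));
    // The list follows HashSet iteration order, which the HashSet contract leaves unspecified.
    return new ArrayList<>(columns);
  }

  // Order-insensitive comparison: copy both lists into HashSets and compare the sets.
  static boolean sameElements(List<String> expected, List<String> actual)
  {
    return new HashSet<>(expected).equals(new HashSet<>(actual));
  }

  public static void main(String[] args)
  {
    List<String> expected = Arrays.asList("z", "x", "y");
    List<String> actual = columnsFromHashSet();
    // expected.equals(actual) would depend on iteration order; this check does not.
    System.out.println(sameElements(expected, actual)); // prints true
  }
}
```

One caveat of this approach: copying into a set also discards duplicates, so a duplicated element would go unnoticed. Sorting both lists before comparing is an alternative when duplicates matter.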

For tests that fail because JSON keys appear in a different order, parse both JSON strings into JsonNode trees (for example, with ObjectMapper.readTree()) and compare the two trees with assertEquals().
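The tree comparison can be sketched as below, assuming jackson-databind is on the classpath (as it is in Druid). The class and method names are hypothetical. It relies on ObjectNode equality comparing key/value mappings, so key order and whitespace (such as the pretty-printed output of DEFAULT_JSON_WRITER) do not matter, while array element order still does.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.UncheckedIOException;

public class JsonTreeComparisonSketch
{
  private static final ObjectMapper MAPPER = new ObjectMapper();

  // Parses a JSON string into a tree, wrapping Jackson's checked exception.
  static JsonNode parse(String json)
  {
    try {
      return MAPPER.readTree(json);
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Object keys are compared as a mapping (order-insensitive); arrays stay order-sensitive.
  static boolean sameJson(String expected, String actual)
  {
    return parse(expected).equals(parse(actual));
  }

  public static void main(String[] args)
  {
    String expected = "{\"dim1\":\"a\",\"metric1\":1}";
    String actual = "{\n  \"metric1\" : 1,\n  \"dim1\" : \"a\"\n}";
    System.out.println(expected.equals(actual)); // prints false: raw strings differ
    System.out.println(sameJson(expected, actual)); // prints true: same JSON object
  }
}
```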

These fixes were made in collaboration with @same8891, @Rette66, @prathyushreddylpr, and @lxb007981.

Key changed/added classes in this PR
  • org.apache.druid.segment.nested.NestedDataColumnSupplierV4Test
  • org.apache.druid.segment.nested.NestedDataColumnSupplierTest
  • org.apache.druid.segment.virtual.ExpressionVirtualColumnTest
  • org.apache.druid.java.util.emitter.core.ParametrizedUriEmitterTest
  • org.apache.druid.server.router.TieredBrokerHostSelectorTest
  • org.apache.druid.indexing.overlord.TaskLockboxTest
  • org.apache.druid.data.input.parquet.WikiParquetReaderTest
  • org.apache.druid.data.input.parquet.NestedColumnParquetReaderTest
  • org.apache.druid.data.input.parquet.CompatParquetReaderTest
  • org.apache.druid.data.input.parquet.FlattenSpecParquetReaderTest
  • org.apache.druid.data.input.parquet.TimestampsParquetReaderTest
  • org.apache.druid.discovery.BaseNodeRoleWatcherTest
  • org.apache.druid.indexing.overlord.supervisor.SupervisorStatusTest
  • org.apache.druid.metadata.input.SqlEntityTest

This PR has:

  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.

yashdeep97 (Contributor, Author) commented Nov 23, 2023

Hi @cryptoe, thanks for reviewing my previous pull request (#15318)!
Could you please review this one as well? It's quite similar to the previous one (I found more flaky tests in other classes of the processing module).

Please let me know if you need additional details. Thanks in advance!

cryptoe (Contributor) commented Nov 28, 2023

@yashdeep97 Could you and the other folks working with https://github.com/TestingResearchIllinois/NonDex raise a single PR with the changes to the module?

cryptoe mentioned this pull request Nov 28, 2023
yashdeep97 (Contributor, Author)

Hi @cryptoe,
Absolutely! I will reach out to the others who have similar PRs and try to merge them into one.
Also, would it be preferable to have one PR per module (e.g., processing, sql, etc.) or a single combined PR for tests in all modules?

yashdeep97 marked this pull request as draft December 1, 2023 01:26
prathyushreddylpr and others added 2 commits November 30, 2023 19:54
…st and FlattenSpecParquetReaderTest

Co-authored-by: lakkidi2 <lakkidi2@fa23-cs527-035.cs.illinois.edu>
Co-authored-by: lxb007981 <lxb007981@hotmail.com>
Co-authored-by: Prathyush Reddy Lakkidi <prathyushreddylakkidi@Prathyushs-MacBook-Air.local>
yashdeep97 changed the title Fix flaky tests in processing module to Fix flaky tests in multiple modules due to non-deterministic specification of HashSet and JSON objects Dec 1, 2023
yashdeep97 marked this pull request as ready for review December 1, 2023 23:14
yashdeep97 (Contributor, Author)

Hi @cryptoe,
I have merged all the pending PRs from students in the course, and updated the PR description.
Please let me know if you need further clarification or if any changes are necessary.

@@ -39,6 +39,7 @@
class BaseParquetReaderTest extends InitializedNullHandlingTest
{
ObjectWriter DEFAULT_JSON_WRITER = new ObjectMapper().writerWithDefaultPrettyPrinter();
protected final ObjectMapper objectMapper = new ObjectMapper();
Contributor

Nit: you can use the object mapper created on line 42 for line 41 and remove the inline instantiation of the object mapper.
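The nit can be illustrated with a simplified, hypothetical stand-in for the two fields in BaseParquetReaderTest (assuming jackson-databind on the classpath). One detail worth noting: objectMapper has to be declared before the writer that uses it. Java runs instance field initializers in textual order and rejects a simple-name forward reference in an initializer at compile time, while writing this.objectMapper would compile but read null.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectWriter;
import java.util.Collections;

public class ReaderTestFieldsSketch
{
  // Single mapper shared by the test; declared first so the writer below can use it.
  protected final ObjectMapper objectMapper = new ObjectMapper();

  // Derived from the shared mapper instead of a second inline "new ObjectMapper()".
  ObjectWriter DEFAULT_JSON_WRITER = objectMapper.writerWithDefaultPrettyPrinter();

  public static void main(String[] args) throws Exception
  {
    ReaderTestFieldsSketch sketch = new ReaderTestFieldsSketch();
    // Pretty-prints a small map using the writer derived from the shared mapper.
    System.out.println(sketch.DEFAULT_JSON_WRITER.writeValueAsString(Collections.singletonMap("k", 1)));
  }
}
```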

expectedJsonBinary,
DEFAULT_JSON_WRITER.writeValueAsString(sampledAsBinary.get(0).getRawValues())
);
objectMapper.readTree(expectedJsonBinary),
Contributor

Create a JSON test utility class and then use it to do the assertion everywhere.
You can put that class in druid core, I think.
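A possible shape for such a utility, sketched under assumptions: the class name JsonTestUtils, the method assertJsonEquals, and the package and module it would live in are all hypothetical, since the location was still an open question in this thread. It assumes jackson-databind on the classpath and throws a plain AssertionError; a real Druid version might delegate to JUnit's Assert.assertEquals on the two trees instead to get JUnit's failure formatting.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper; name and location are illustrative only.
public final class JsonTestUtils
{
  private static final ObjectMapper MAPPER = new ObjectMapper();

  private JsonTestUtils()
  {
  }

  // Fails if the two JSON documents differ, ignoring object key order and whitespace.
  public static void assertJsonEquals(String expectedJson, String actualJson)
  {
    JsonNode expected = parse(expectedJson);
    JsonNode actual = parse(actualJson);
    if (!expected.equals(actual)) {
      throw new AssertionError("JSON mismatch\nexpected: " + expected + "\nactual:   " + actual);
    }
  }

  private static JsonNode parse(String json)
  {
    try {
      return MAPPER.readTree(json);
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With a helper like this, the Parquet assertion above could become JsonTestUtils.assertJsonEquals(expectedJsonBinary, DEFAULT_JSON_WRITER.writeValueAsString(sampledAsBinary.get(0).getRawValues())).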

Contributor Author

Hi @cryptoe, thanks for your feedback. I'll work on making these changes over this weekend.

Contributor Author

Hi @cryptoe, I can't figure out where to place this new class, which does the assertion after converting both values to JSON trees. Should I place it under the druid-testing-tools module in extensions-core? (I did not understand what "druid core" refers to.)

"mediumBroker", ImmutableList.of(),
"coldBroker", ImmutableList.of("coldHost1:8080", "coldHost2:8080"),
"hotBroker", ImmutableList.of("hotHost:8080")
"mediumBroker", ImmutableSet.of(),
Contributor

Why is this change required?

Contributor Author

yashdeep97 Dec 17, 2023

It builds the list of hostnames from servers, which is a ConcurrentHashMap:

private final ConcurrentHashMap<String, NodesHolder> servers = new ConcurrentHashMap<>();

ConcurrentHashMap does not guarantee any ordering of its keys, so the servers can be returned in a different order from the one the assertion expects.
For example, we might get ImmutableList.of("coldHost2:8080", "coldHost1:8080") instead of ImmutableList.of("coldHost1:8080", "coldHost2:8080"), which makes the test fail.
Thus, the test incorrectly assumes a deterministic ordering of the keys in ConcurrentHashMap.
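A sketch of why comparing per-broker sets, as the ImmutableSet change does, removes the dependency on iteration order. It uses only java.util; the coldServers map and the toSetValues helper are hypothetical stand-ins, not the selector's actual code, and the PR may perform the conversion differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class BrokerMapComparisonSketch
{
  // Copies each broker's host collection into a set, so host order no longer matters.
  static Map<String, Set<String>> toSetValues(Map<String, ? extends Collection<String>> brokers)
  {
    Map<String, Set<String>> result = new LinkedHashMap<>();
    brokers.forEach((broker, hosts) -> result.put(broker, new HashSet<>(hosts)));
    return result;
  }

  public static void main(String[] args)
  {
    // Stand-in for the servers map; ConcurrentHashMap does not specify key iteration order.
    ConcurrentHashMap<String, Boolean> coldServers = new ConcurrentHashMap<>();
    coldServers.put("coldHost1:8080", true);
    coldServers.put("coldHost2:8080", true);

    Map<String, List<String>> actual = new LinkedHashMap<>();
    actual.put("mediumBroker", new ArrayList<>());
    actual.put("coldBroker", new ArrayList<>(coldServers.keySet()));
    actual.put("hotBroker", Arrays.asList("hotHost:8080"));

    Map<String, List<String>> expected = new LinkedHashMap<>();
    expected.put("mediumBroker", Collections.emptyList());
    expected.put("coldBroker", Arrays.asList("coldHost1:8080", "coldHost2:8080"));
    expected.put("hotBroker", Arrays.asList("hotHost:8080"));

    // expected.equals(actual) depends on keySet() order; the per-broker set comparison does not.
    System.out.println(toSetValues(expected).equals(toSetValues(actual))); // prints true
  }
}
```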

cryptoe (Contributor) commented Jan 3, 2024

Can you also include #15268, if it isn't included already?

kfaraz (Contributor) commented Jan 11, 2024

Have all of these tests been reported as flaky by Druid users/devs? I am not sure if that is really the case. These changes seem to be using sets instead of lists in a lot of places and changing equality assertions of json strings to JsonNode objects. Are these changes really needed?

If there is indeed a flaky test, please share links in the PR description to one or more builds where these tests actually failed.

For cases where there is a flake, we should try to figure out exactly what is causing the flakiness and update tests only when we fully understand the intent of the original test.

cc: @yashdeep97 , @cryptoe


This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

github-actions bot added the stale label Mar 12, 2024

github-actions bot commented Apr 9, 2024

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

github-actions bot closed this Apr 9, 2024

8 participants