HIVE-26277: Add unit tests for ColumnStatsAggregator classes #3339

Closed


asolimando
Member

@asolimando asolimando commented Jun 2, 2022

What changes were proposed in this pull request?

Adding unit tests for *ColumnStatsAggregator classes (first commit), fixing bugs discovered while writing the UTs (second commit) and guarding against invoking the methods with an empty list of statistics (which leads to NPEs).
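
For illustration, the empty-list guard described above could look roughly like the sketch below. The method name `checkStatisticsList` is taken from the merged commit message further down; the class name, the generic signature, the exception type, and the toy `maxNumDistinctValues` aggregator are all assumptions made for this example, not the actual Hive code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the Hive implementation.
final class StatsListGuardSketch {

    // Fail fast with a clear message instead of hitting an NPE (or an
    // index error) deeper inside the aggregation logic.
    static <T> void checkStatisticsList(List<T> stats) {
        if (stats == null || stats.isEmpty()) {
            throw new IllegalArgumentException("Statistics list must not be empty");
        }
    }

    // Toy aggregator (assumption): returns the largest NDV among partitions.
    static long maxNumDistinctValues(List<Long> ndvs) {
        checkStatisticsList(ndvs);
        long max = ndvs.get(0); // safe: the list has at least one element
        for (long ndv : ndvs) {
            max = Math.max(max, ndv);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxNumDistinctValues(Arrays.asList(3L, 9L, 4L)));
    }
}
```

Calling the guard at the top of each aggregation method keeps the precondition in one place, so every aggregator rejects empty input in the same way.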

Why are the changes needed?

Lack of unit tests is detrimental to code quality, as highlighted by the bugs discovered while writing the tests.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

mvn test -Dtest.groups=org.apache.hadoop.hive.metastore.annotation.MetastoreUnitTest -Dtest='*ColumnStatsAggregatorTest.java' -pl standalone-metastore/metastore-server

@zabetak
Contributor

zabetak commented Jun 8, 2022

Hey @asolimando, if the bugs can appear in production I would suggest creating a new JIRA describing the problem or repurposing this one. It is more important to know that a commit is fixing a bug rather than adding tests.

@asolimando
Member Author

> Hey @asolimando, if the bugs can appear in production I would suggest creating a new JIRA describing the problem or repurposing this one. It is more important to know that a commit is fixing a bug rather than adding tests.

You are right @zabetak, I'd then rename the Jira ticket to "Fixed NPEs and rounding issues in ColumnStatsAggregator classes" and give the details of what has been fixed in the extended comments, something like:

1. lost precision when the result of an integer division was assigned to a floating-point variable (`densityAvgSum` was incorrectly updated, bug affecting almost all aggregator classes)
2. checks against potential NPEs (affecting almost all aggregator classes)
3. NDV lower bound was not updated and was left at zero for the Date and Timestamp aggregator classes
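
To make the first item concrete, here is a minimal sketch of the precision loss and of the cast that avoids it. Only the variable name `densityAvgSum` comes from the discussion; the class, the method names, the `{range, ndv}` input layout, and the "range divided by distinct-value count" formula are assumptions for illustration and may not match the Hive code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the Hive implementation.
final class DensityAvgSketch {

    // Buggy variant: long / long truncates BEFORE the result is widened
    // to double, so the fractional part is already gone.
    static double truncatingSum(List<long[]> partitions) {
        double densityAvgSum = 0.0;
        for (long[] p : partitions) {
            long range = p[0];
            long ndv = p[1];
            densityAvgSum += range / ndv; // integer division
        }
        return densityAvgSum;
    }

    // Fixed variant: casting one operand forces floating-point division.
    static double exactSum(List<long[]> partitions) {
        double densityAvgSum = 0.0;
        for (long[] p : partitions) {
            long range = p[0];
            long ndv = p[1];
            densityAvgSum += (double) range / ndv;
        }
        return densityAvgSum;
    }

    public static void main(String[] args) {
        // Each entry is {range, ndv} for one partition (made-up numbers).
        List<long[]> parts = Arrays.asList(new long[] {10, 4}, new long[] {7, 2});
        System.out.println(truncatingSum(parts)); // prints 5.0 (2 + 3)
        System.out.println(exactSum(parts));      // prints 6.0 (2.5 + 3.5)
    }
}
```

The error compounds with the number of partitions, since each term can lose up to one unit before being summed.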

Added unit tests covering all the aggregator classes.

WDYT?

Contributor

@zabetak zabetak left a comment

@asolimando Thanks a lot for investing so much time in testing; definitely needed!

I didn't go over all the changes in the PR but I left some comments here and there (most of them rather minor) which could reduce the code size and make it more readable. I left the comments in specific places but I think they apply in more than one place.
Let's discuss them to see if it makes sense to incorporate them in the PR before I do a complete pass.

@zabetak zabetak force-pushed the master-HIVE-26277-add_aggregator_UTs branch from e2fcd42 to be7f2c5 Compare September 14, 2022 13:17
Contributor

@zabetak zabetak left a comment

Hey @asolimando, I pushed a few small changes to your branch:

Let me know what you think of those, and from my side the change is good to go once tests come back green.

@sonarcloud

sonarcloud bot commented Sep 14, 2022

Kudos, SonarCloud Quality Gate passed!

Bugs: 2 (rating C)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 44 (rating A)

No Coverage information
No Duplication information

@asolimando
Member Author

Thanks @zabetak, your commits LGTM and are improving the contribution!
Tests are back green now, so I think we are good to go when you have time :)

@zabetak zabetak closed this in b6cbb2e Sep 16, 2022
@asolimando asolimando deleted the master-HIVE-26277-add_aggregator_UTs branch September 21, 2022 09:48
DongWei-4 pushed a commit to DongWei-4/hive that referenced this pull request Oct 28, 2022
… (Alessandro Solimando reviewed by Stamatis Zampetakis)

1. Add and invoke checkStatisticsList to prevent NPEs in aggregators;
they all rely on a non-empty list of statistics.
2. Cast integers to double in divisions to make computations more
accurate and avoid rounding issues.
3. Align loggers names to match the class they are in and avoid
misleading log messages.
4. Add documentation for ndvtuner based on current understanding of how
it should work.

Closes apache#3339

Move (and complete) ndvTuner documentation from tests to production classes
dengzhhu653 pushed a commit to dengzhhu653/hive that referenced this pull request Dec 15, 2022
… (Alessandro Solimando reviewed by Stamatis Zampetakis)