HIVE-23829: Compute Stats Incorrect for Binary Columns #1313

Merged: belugabehr merged 12 commits into apache:master on Oct 28, 2020

HunterL (Contributor) commented Jul 24, 2020

Updated LazySimpleSerDe so that it no longer tries to auto-detect whether binary columns are Base64-encoded and instead relies on a table property. The previous auto-detection was expensive, and it did not correctly check whether values were valid Base64, which in niche cases could cause statistics to be computed incorrectly.
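The difference between guessing and declaring the encoding can be illustrated with a minimal plain-Java sketch. This is not Hive's actual LazySimpleSerDe code: `heuristicDecode` is a hypothetical, simplified stand-in for the kind of auto-detection described above (decode whenever every byte is in the Base64 alphabet), and the `decodeBase64` flag in `propertyDecode` stands in for the table property (the review below mentions `hive.serialization.decode.binary.as.base64`). The example shows how raw binary values that merely look like Base64 get decoded into different bytes, which skews a simple statistic such as maximum value length.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BinaryDecodeSketch {

    // Hypothetical heuristic: decode a value whenever every byte is in the Base64
    // alphabet. It scans every value, and a raw value that only *looks* like Base64
    // is silently replaced by its decoded bytes.
    static byte[] heuristicDecode(byte[] raw) {
        for (byte b : raw) {
            boolean inAlphabet = (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                    || (b >= '0' && b <= '9') || b == '+' || b == '/' || b == '=';
            if (!inAlphabet) {
                return raw; // contains a non-Base64 byte: keep the value as-is
            }
        }
        try {
            return Base64.getDecoder().decode(raw);
        } catch (IllegalArgumentException e) {
            return raw; // alphabet matched, but the value is not well-formed Base64
        }
    }

    // Property-driven approach: the table declares whether binary values are Base64,
    // so no per-value guessing is needed. `decodeBase64` stands in for that property.
    static byte[] propertyDecode(byte[] raw, boolean decodeBase64) {
        return decodeBase64 ? Base64.getDecoder().decode(raw) : raw;
    }

    // A simple column statistic: the maximum length in bytes of the stored values.
    static int maxLength(byte[][] column) {
        int max = 0;
        for (byte[] value : column) {
            max = Math.max(max, value.length);
        }
        return max;
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Raw binary values (NOT Base64-encoded) that happen to use only Base64 characters.
        byte[][] rawColumn = { ascii("abcdefgh"), ascii("data") };

        byte[][] guessed = new byte[rawColumn.length][];
        byte[][] declared = new byte[rawColumn.length][];
        for (int i = 0; i < rawColumn.length; i++) {
            guessed[i] = heuristicDecode(rawColumn[i]);
            declared[i] = propertyDecode(rawColumn[i], false); // table says: not Base64
        }

        System.out.println("heuristic max length: " + maxLength(guessed));  // 6 (wrong)
        System.out.println("property max length:  " + maxLength(declared)); // 8 (correct)
    }
}
```

With the flag set to true, genuinely encoded values are decoded as expected, e.g. `propertyDecode(ascii("aGVsbG8="), true)` yields the bytes of `hello`. Making the encoding an explicit declaration also removes the per-value alphabet scan from the read path.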

belugabehr (Contributor) left a comment

LGTM, pending a test.

belugabehr (Contributor) commented Aug 25, 2020

@HunterL Really great stuff. Need one test with hive.serialization.decode.binary.as.base64 set to true.

Edit: The default is true, so presumably some tests already have this flag enabled. Are there any examples of this being exercised (i.e., Base64 decoding applied to a data set)?

github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.

github-actions bot added the stale label Oct 25, 2020
belugabehr merged commit 0e4e1ac into apache:master Oct 28, 2020
belugabehr (Contributor)

@HunterL Merged to master. Thanks!

3 participants