HIVE-23954 count(*) with count(distinct) gives wrong results with hive.optimize.countdistinct=true #1414

Closed. The PR proposed merging 1 commit.

@EugeneChung commented Aug 20, 2020

What changes were proposed in this pull request?

It skips reducer deduplication when a query mixes count over all rows (count(*)) with count(distinct ...).
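To make the rule concrete, here is a minimal sketch of the kind of guard the patch describes: deduplication is applied only when the optimization flag is on and the query does not mix plain and DISTINCT counts. AggCall, mixes_plain_and_distinct_count and should_apply_dedup are hypothetical names chosen for illustration, not Hive's API; the actual patch is not shown in this PR text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AggCall:
    """Hypothetical, simplified stand-in for an aggregate call in a query plan."""
    name: str          # aggregate function name, e.g. "count"
    distinct: bool     # True for count(DISTINCT ...)
    args: tuple = ()   # an empty tuple models count(*)


def mixes_plain_and_distinct_count(aggs: List[AggCall]) -> bool:
    """Return True when a plain count and a count(DISTINCT ...) appear together."""
    counts = [a for a in aggs if a.name.lower() == "count"]
    has_distinct = any(a.distinct for a in counts)
    has_plain = any(not a.distinct for a in counts)
    return has_distinct and has_plain


def should_apply_dedup(aggs: List[AggCall], countdistinct_enabled: bool) -> bool:
    """Hypothetical guard: deduplicate only if the flag is on and counts are not mixed."""
    return countdistinct_enabled and not mixes_plain_and_distinct_count(aggs)
```

For select count(*), count(distinct mid), the aggregate list is [AggCall("count", False), AggCall("count", True, ("mid",))], so the guard returns False and the optimization is skipped even with the flag enabled.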

Why are the changes needed?

A query such as select count(*), count(distinct mid) from db1.table1 where partitioned_column = '...' returns wrong results, especially for count(*).
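The PR does not spell out how the wrong count(*) arises. One plausible failure mode, stated here only as an assumption, is that rows are collapsed to one per distinct mid before count(*) is evaluated, so count(*) ends up counting groups instead of rows. The sketch below contrasts the expected SQL semantics with that hypothetical faulty plan; correct_counts and counts_after_collapsing are illustrative names, not Hive code.

```python
from typing import Dict, Hashable, List, Tuple

Row = Dict[str, Hashable]


def correct_counts(rows: List[Row], col: str) -> Tuple[int, int]:
    """Reference semantics: count(*) counts every row; count(DISTINCT col) skips NULLs."""
    total = len(rows)
    distinct = len({r[col] for r in rows if r[col] is not None})
    return total, distinct


def counts_after_collapsing(rows: List[Row], col: str) -> Tuple[int, int]:
    """Hypothetical faulty plan: rows collapsed to one per distinct col value first."""
    groups = {r[col] for r in rows}  # one entry per value, including a NULL group
    total = len(groups)              # count(*) now counts groups, not rows
    distinct = len({g for g in groups if g is not None})
    return total, distinct
```

For mid values 1, 1, 2 and NULL, correct_counts gives (4, 2) while counts_after_collapsing gives (3, 2): under this assumed plan count(distinct mid) is still right, but count(*) is wrong, which matches the symptom reported above.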

Does this PR introduce any user-facing change?

No

How was this patch tested?

I ran the same query, select count(*), count(distinct mid) from db1.table1 where partitioned_column = '...', against the same data set. With my patch, hive.optimize.countdistinct=true returns the same correct result as hive.optimize.countdistinct=false.

@EugeneChung (Author) commented Aug 25, 2020

The init-metastore error appears to be unrelated to my patch:

[2020-08-20T11:29:42.268Z] Status: Downloaded newer image for postgres:latest
[2020-08-20T11:31:48.820Z] 3a1dc3a0b3a75eaf727731e23f9967bbc6007831481878432cad7c8354e0c922
[2020-08-20T11:31:48.821Z] waiting for postgres to be available...
[2020-08-20T11:31:48.821Z] psql: FATAL:  the database system is starting up
[2020-08-20T11:31:48.821Z] ok
[2020-08-20T11:31:48.821Z] NOTICE:  database "ms_hive_precommit_pr_1414_1_wqrcz_rrtk5_5s4x9" does not exist, skipping
[2020-08-20T11:31:48.821Z] DROP DATABASE
[2020-08-20T11:31:48.821Z] ERROR:  role "hive" does not exist

http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1414/1/tests shows that all tests passed.

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.

github-actions bot added the stale label Oct 25, 2020
github-actions bot closed this Nov 2, 2020