[Bug] Paimon Table produce more reduce task for same data volume #864

Open
1 of 2 tasks
JingsongLi opened this issue Apr 10, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@JingsongLi
Contributor

JingsongLi commented Apr 10, 2023

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

0.4

Compute Engine

hive

Minimal reproduce step

Using the Hive client, run the same statement over the same amount of data against a Paimon table and a Hive table, then compare the results:
Paimon table:

image

hive table:

image

What doesn't meet your expectations?

Reading the Paimon table should use a lower number of reducers.

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@calvinjiang
Contributor

I'd like to fix this issue.

@JingsongLi
Contributor Author

> I'd like to fix this issue.

How do you plan to fix it?

@wg1026688210
Contributor

Hi @JingsongLi, is it because there are more files in the Paimon table? Would it be effective to reduce the number of reduce tasks by setting the number of upstream map tasks with mapred.map.tasks, which would reduce the number of shuffle files?

@JingsongLi
Contributor Author

> Hi @JingsongLi, is it because there are more files in the Paimon table? Would it be effective to reduce the number of reduce tasks by setting the number of upstream map tasks with mapred.map.tasks, which would reduce the number of shuffle files?

We should figure out how Hive infers the number of tasks, and then try to work around it.
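
As a starting point for that investigation, here is a rough sketch of how Hive on MapReduce is generally understood to estimate the reducer count when mapred.reduce.tasks is left unset: the total input size is divided by hive.exec.reducers.bytes.per.reducer and capped by hive.exec.reducers.max. The function name, the default values, and the input sizes in the usage note are assumptions for illustration, not measurements from this issue, and the real logic in Hive has more cases (bucketing, other engines).

```python
# Assumed defaults from recent Hive releases; verify them on your cluster.
BYTES_PER_REDUCER = 256_000_000  # hive.exec.reducers.bytes.per.reducer
MAX_REDUCERS = 1009              # hive.exec.reducers.max


def estimate_reducers(total_input_bytes,
                      bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """Approximate Hive's size-based reducer estimate (hypothetical helper).

    The estimate is ceil(input size / bytes per reducer), clamped to the
    range [1, max_reducers].
    """
    size = max(total_input_bytes, bytes_per_reducer)
    reducers = -(-size // bytes_per_reducer)  # integer ceiling division
    return max(1, min(max_reducers, reducers))


if __name__ == "__main__":
    # Hypothetical input sizes: the same data reported as 2 GB versus 10 GB.
    print(estimate_reducers(2_000_000_000))
    print(estimate_reducers(10_000_000_000))
```

If this estimate is what drives the difference, then the input size reported for the Paimon table would be the thing to compare against the Hive table, rather than the number of map tasks. That is only a hypothesis until it is checked against the Hive version in use.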
