[Improvement] Read HDFS data files with random sequence to distribute pressure #451

zuston · 2022-12-29T07:03:53Z

Code of Conduct

I agree to follow this project's Code of Conduct

Search before asking

I have searched in the issues and found no similar issues.

What would you like to be improved?

PR #396 added support for concurrently writing a single partition's data into multiple HDFS files. Given that, it would be better for the client to read those HDFS data files in random order, so that read pressure is spread across the files.

How should we improve?

No response

Are you willing to submit PR?

Yes I am willing to submit a PR!

zuston · 2022-12-29T07:04:27Z

PTAL @advancedxy @jerqi

jerqi · 2022-12-29T08:17:15Z

I'm not sure that it can bring much performance improvement.

zuston · 2022-12-29T08:56:15Z

> I'm not sure that it can bring much performance improvement.

The goal is to reduce the pressure that multiple readers put on a datanode, especially when the files have only 1 replica.

advancedxy · 2022-12-30T03:24:45Z

> To reduce the datanode pressure from multiple readers, especially for 1 replica.

Normally, shouldn't there be only one reader reading one or more partition files?

Have you encountered this case in production?

zuston · 2022-12-30T03:27:29Z

If the partition is heavily skewed, many readers handle that same partition.
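To make the idea concrete, here is a minimal sketch of reading in random order. It is not the code from the linked PR; the class name `RandomReadOrder`, the method `shuffledReadOrder`, and the file names are all hypothetical and only illustrate the approach. Each reader shuffles its own copy of the partition's data file list, so concurrent readers of a skewed partition are unlikely to all start on the same file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class RandomReadOrder {

    // Returns a shuffled copy of the data file list; the input list is not modified.
    static List<String> shuffledReadOrder(List<String> dataFiles, Random rng) {
        List<String> order = new ArrayList<>(dataFiles);
        Collections.shuffle(order, rng);
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical names for the files one partition was written into concurrently.
        List<String> dataFiles =
            Arrays.asList("p0_0.data", "p0_1.data", "p0_2.data", "p0_3.data");

        // Each simulated reader uses its own Random, so the read orders
        // usually differ from reader to reader.
        for (int reader = 0; reader < 3; reader++) {
            List<String> order = shuffledReadOrder(dataFiles, new Random());
            System.out.println("reader " + reader + " reads " + order);
        }
    }
}
```

Every reader still reads every file exactly once; only the order changes, so the result is the same while the load at any given moment is less concentrated on one file.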

zuston changed the title ~~[Improvement] Randomly read single HDFS data files to distribute stress~~ [Improvement] Randomly read HDFS data files to distribute stress Dec 29, 2022

zuston changed the title ~~[Improvement] Randomly read HDFS data files to distribute stress~~ [Improvement] Randomly read HDFS data files to distribute pressure Dec 29, 2022

zuston changed the title ~~[Improvement] Randomly read HDFS data files to distribute pressure~~ [Improvement] Read HDFS data files with random sequence to distribute pressure Dec 29, 2022

jerqi linked a pull request Jan 3, 2023 that will close this issue

[ISSUE-451][Improvement] Read HDFS data files with random sequence to distribute pressure #452

Merged

zuston closed this as completed in #452 Jan 3, 2023
