@kaori-seasons kaori-seasons commented Jun 11, 2022

Related to FLINK-28009
Optimizes the data split logic for large data volumes using the Java Stream API.

@kaori-seasons changed the title from "Optimize data split" to "[FLINK-28009] Optimize data split" on Jun 11, 2022
@JingsongLi
Contributor

Hi @complone, thanks for the contribution.
Do you mean that FileStorePathFactory.createDataFilePathFactory is very slow?

@JingsongLi
Contributor

CC @tsreaper

@kaori-seasons
Author

@JingsongLi Yes. With large data volumes, a lot of data has to be split. A parallel stream first groups entries with the same key into an internal bucket, so I think this can improve performance.
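
To make the grouping idea concrete, here is a minimal sketch of collecting entries by partition and bucket with a sequential stream and with a parallel stream. `FileEntry` and `SplitGroupingSketch` are hypothetical stand-ins invented for illustration; they are not classes from this PR or from Flink, and the sketch says nothing about which variant is faster.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class SplitGroupingSketch {

    /** Hypothetical stand-in for a data file entry; not an actual Flink class. */
    static final class FileEntry {
        private final String partition;
        private final int bucket;
        private final String fileName;

        FileEntry(String partition, int bucket, String fileName) {
            this.partition = partition;
            this.bucket = bucket;
            this.fileName = fileName;
        }

        String partition() { return partition; }
        int bucket() { return bucket; }
        String fileName() { return fileName; }
    }

    /** Groups file names by partition, then by bucket, using a sequential stream. */
    static Map<String, Map<Integer, List<String>>> groupSequential(List<FileEntry> entries) {
        return entries.stream().collect(
                Collectors.groupingBy(FileEntry::partition,
                        Collectors.groupingBy(FileEntry::bucket,
                                Collectors.mapping(FileEntry::fileName, Collectors.toList()))));
    }

    /** Same grouping on a parallel stream; partial maps built per thread are merged at the end. */
    static Map<String, Map<Integer, List<String>>> groupParallel(List<FileEntry> entries) {
        return entries.parallelStream().collect(
                Collectors.groupingBy(FileEntry::partition,
                        Collectors.groupingBy(FileEntry::bucket,
                                Collectors.mapping(FileEntry::fileName, Collectors.toList()))));
    }

    public static void main(String[] args) {
        List<FileEntry> entries = List.of(
                new FileEntry("p1", 0, "a"),
                new FileEntry("p1", 1, "b"),
                new FileEntry("p2", 0, "c"),
                new FileEntry("p1", 0, "d"));
        // Both variants produce the same grouping; only the execution strategy differs.
        System.out.println(groupSequential(entries).equals(groupParallel(entries)));
    }
}
```

Because the source list is ordered, both variants keep the encounter order inside each bucket list, so the results compare equal. The parallel variant pays for splitting and merging, which is why small inputs often do not benefit from it.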

@JingsongLi
Contributor

> @JingsongLi Yes. With large data volumes, a lot of data has to be split. A parallel stream first groups entries with the same key into an internal bucket, so I think this can improve performance.

Do you have any benchmark data? For example, how many times can it be executed per second?
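
As a rough illustration of the kind of number asked for here, the sketch below counts how many times a grouping task completes in about one second, once with a sequential stream and once with a parallel stream. The workload (grouping integers into 16 buckets) and all names are assumptions made up for this example, not the code under review. A hand-rolled loop like this is only indicative; a harness such as JMH would give more trustworthy figures.

```java
import java.util.List;
import java.util.Map;
import java.util.function.IntSupplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class ThroughputSketch {

    /** Keeps results observable so the JIT is less likely to discard the measured work. */
    static volatile int sink;

    /** Runs the task repeatedly for roughly durationMillis and returns the number of completed runs. */
    static long countExecutions(IntSupplier task, long durationMillis) {
        long deadline = System.nanoTime() + durationMillis * 1_000_000L;
        long count = 0;
        while (System.nanoTime() < deadline) {
            sink += task.getAsInt();
            count++;
        }
        return count;
    }

    /** Hypothetical workload: groups the integers 0..n-1 into 16 buckets and returns the bucket count. */
    static int groupIntoBuckets(int n, boolean parallel) {
        IntStream range = IntStream.range(0, n);
        if (parallel) {
            range = range.parallel();
        }
        Map<Integer, List<Integer>> buckets =
                range.boxed().collect(Collectors.groupingBy(i -> i % 16));
        return buckets.size();
    }

    public static void main(String[] args) {
        int n = 100_000;
        // Warm-up runs so that JIT compilation does not dominate the measurement.
        countExecutions(() -> groupIntoBuckets(n, false), 1_000);
        countExecutions(() -> groupIntoBuckets(n, true), 1_000);

        long sequential = countExecutions(() -> groupIntoBuckets(n, false), 1_000);
        long parallel = countExecutions(() -> groupIntoBuckets(n, true), 1_000);
        System.out.println("sequential executions per second: " + sequential);
        System.out.println("parallel executions per second:   " + parallel);
    }
}
```

The results vary with input size, machine, and JVM, so they are best reported together with those details.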

@kaori-seasons
Author

@JingsongLi According to Oracle's performance tests, parallel stream performance depends on the number of CPU cores, so I will switch to a sequential stream to simplify the code logic.
Related to parallel-streams-performance-benchmark
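
For context on the CPU core dependency: by default, parallel streams run on the common `ForkJoinPool`, whose parallelism is derived from the number of available processors. The sketch below prints those values and counts how many distinct threads actually process a parallel stream on the current machine. The class and method names are hypothetical and exist only for this illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

final class ParallelismSketch {

    /** Target parallelism of the common pool that parallel streams use by default. */
    static int commonPoolParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    /** Counts the distinct threads that process a parallel stream of the given size. */
    static int distinctWorkerThreads(int elements) {
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        IntStream.range(0, elements).parallel()
                .forEach(i -> threadNames.add(Thread.currentThread().getName()));
        return threadNames.size();
    }

    public static void main(String[] args) {
        System.out.println("available processors:    "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: " + commonPoolParallelism());
        System.out.println("threads used by stream:  " + distinctWorkerThreads(1_000_000));
    }
}
```

On a machine with few cores the thread count stays small, so a parallel stream has little room to outperform a sequential one.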

@JingsongLi
Contributor

@complone Any update on the benchmark?

@kaori-seasons
Author

> @complone Any update on the benchmark?

No. I only simplified the code using the Stream API.

@JingsongLi
Contributor

JingsongLi commented Jun 21, 2022

Hi @complone,
Before submitting a PR, it is best to discuss clearly in JIRA what needs to be done and what the general idea is.
I'll close this one. If we (you and I) make further progress, we'll continue working on the code.

@JingsongLi closed this on Jun 21, 2022