Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ISSUE-378][HugePartition][Part-2] Introduce memory usage limit and data flush #471

Merged
merged 7 commits into from
Jan 17, 2023

Conversation

zuston
Copy link
Member

@zuston zuston commented Jan 11, 2023

What changes were proposed in this pull request?

  1. Introduce memory usage limit for huge partition to keep the regular partition writing stable
  2. Once partition is marked as huge-partition, when its buffer size is greater than rss.server.single.buffer.flush.threshold value, single-buffer flush will be triggered whatever the single buffer flush is enabled or not

Why are the changes needed?

  1. To solve the problems mentioned by [Improvement] Optimize data flushing and memory usage for huge partitions to improve stability  #378

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

  1. UTs

@zuston zuston changed the title [ISSUE-378][HugePartition][Part-2] Introduce memory usage limit [ISSUE-378][HugePartition][Part-2] Introduce memory usage limit and data flush Jan 11, 2023
@zuston zuston requested a review from jerqi January 11, 2023 11:37
@advancedxy
Copy link
Contributor

advancedxy commented Jan 12, 2023

Introduce independent buffer flush for huge partition, which will make admin flush data from huge partition to HDFS directly when the conf of rss.server.single.buffer.flush.enabled is disable.

I don't think we should introduce a new configuration to control buffer flush. rss.server.single.buffer.flush.enabled can be used for the huge partition buffer flush purpose. If the huge partition limit is enabled, the single buffer flush could be enabled automatically.

@advancedxy
Copy link
Contributor

I think these two similar configurations will bring more confusion to end user.

@zuston
Copy link
Member Author

zuston commented Jan 12, 2023

I don't think we should introduce a new configuration to control buffer flush. rss.server.single.buffer.flush.enabled can be used for the huge partition buffer flush purpose. If the huge partition limit is enabled, the single buffer flush could be enabled automatically.

If single buffer flush could be enabled automatically, how to set the flush threshold size? I don't hope the rss.server.single.buffer.flush.enabled is enabled for regular partitions, which could be flushed to ssd/hdd directly instead of cold storage. And huge partition could be flushed to HDFS.

I think these two similar configurations will bring more confusion to end user.

Emm. Yes.

@zuston
Copy link
Member Author

zuston commented Jan 12, 2023

It seems I got your point. The conf of rss.server.single.buffer.flush.threshold should be specified especially for huge partition whatever the single buffer flush is enabled or not. Right?

@codecov-commenter
Copy link

codecov-commenter commented Jan 12, 2023

Codecov Report

Merging #471 (cfffd7a) into master (19a8bac) will increase coverage by 0.07%.
The diff coverage is 75.47%.

@@             Coverage Diff              @@
##             master     #471      +/-   ##
============================================
+ Coverage     58.78%   58.86%   +0.07%     
- Complexity     1704     1714      +10     
============================================
  Files           206      206              
  Lines         11471    11517      +46     
  Branches       1024     1033       +9     
============================================
+ Hits           6743     6779      +36     
- Misses         4317     4324       +7     
- Partials        411      414       +3     
Impacted Files Coverage Δ
...pache/uniffle/server/ShuffleServerGrpcService.java 0.79% <0.00%> (-0.02%) ⬇️
.../org/apache/uniffle/server/ShuffleTaskManager.java 76.50% <76.92%> (-0.06%) ⬇️
...he/uniffle/server/buffer/ShuffleBufferManager.java 83.21% <90.47%> (+0.45%) ⬆️
...a/org/apache/uniffle/server/ShuffleServerConf.java 99.31% <100.00%> (+0.02%) ⬆️
...rg/apache/uniffle/server/ShuffleServerMetrics.java 97.05% <0.00%> (-0.14%) ⬇️
...org/apache/uniffle/server/ShuffleFlushManager.java 84.04% <0.00%> (+0.08%) ⬆️
...g/apache/uniffle/server/ShuffleDataFlushEvent.java 83.67% <0.00%> (+0.34%) ⬆️
...ava/org/apache/uniffle/server/ShuffleTaskInfo.java 100.00% <0.00%> (+5.55%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@advancedxy
Copy link
Contributor

I don't think we should introduce a new configuration to control buffer flush. rss.server.single.buffer.flush.enabled can be used for the huge partition buffer flush purpose. If the huge partition limit is enabled, the single buffer flush could be enabled automatically.

If single buffer flush could be enabled automatically, how to set the flush threshold size? I don't hope the rss.server.single.buffer.flush.enabled is enabled for regular partitions, which could be flushed to ssd/hdd directly instead of cold storage. And huge partition could be flushed to HDFS.

I think these two similar configurations will bring more confusion to end user.

Emm. Yes.

Sorry, I forgot to reply this comment.

If I understand the code and design correctly, single.buffer.flushed.enabled is added to support flushing big buffer to cold storage, such as hdfs directly, which I think it perfectly matches you intention when serving huge partitions.

So, I think we could reuse the rss.server.single.buffer.flush.threshold settings, which is 64MB by default. We don't have to introduce rss.server.huge-partition.memory.limit.ratio ?

@zuston
Copy link
Member Author

zuston commented Jan 12, 2023

So, I think we could reuse the rss.server.single.buffer.flush.threshold settings, which is 64MB by default. We don't have to introduce rss.server.huge-partition.memory.limit.ratio ?

Memory limit is required. Usually the huge partition writing speed is fast, and the HDFS flushing speed is slower than writing. If missing memory limit, the regular partitions will be affected.

By the way, this design has been introduced into our internal version, it works well. Before this PR, sometimes one huge partition will make other regular partitions buffer require fail.

@advancedxy
Copy link
Contributor

So, I think we could reuse the rss.server.single.buffer.flush.threshold settings, which is 64MB by default. We don't have to introduce rss.server.huge-partition.memory.limit.ratio ?

Memory limit is required. Usually the huge partition writing speed is fast, and the HDFS flushing speed is slower than writing. If missing memory limit, the regular partitions will be affected.

By the way, this design has been introduced into our internal version, it works well. Before this PR, sometimes one huge partition will make other regular partitions buffer require fail.

I'm ok to add rss.server.huge-partition.size.threshold setting, which is used to add memory limit. I just think maybe we should reuse rss.server.single.buffer.flush.threshold instead of another rss.server.huge-partition.memory.limit.ratio settings?

@zuston
Copy link
Member Author

zuston commented Jan 12, 2023

I'm ok to add rss.server.huge-partition.size.threshold setting, which is used to add memory limit. I just think maybe we should reuse rss.server.single.buffer.flush.threshold instead of another rss.server.huge-partition.memory.limit.ratio settings?

This suggestion has been accepted 😁 (Here: #471 (comment)).

Please review the latest code.

@advancedxy
Copy link
Contributor

I'm ok to add rss.server.huge-partition.size.threshold setting, which is used to add memory limit. I just think maybe we should reuse rss.server.single.buffer.flush.threshold instead of another rss.server.huge-partition.memory.limit.ratio settings?

This suggestion has been accepted 😁 (Here: #471 (comment)).

Please review the latest code.

en. I will take a look in details by tomorrow. But on the surface, the rss.server.huge-partition.memory.limit.ratio setting still existed?
image

@zuston
Copy link
Member Author

zuston commented Jan 12, 2023

rss.server.huge-partition.memory.limit.ratio is the conf for memory limitation rather than buffer flush threshold. The original conf to control flush for huge partition has been removed. 49c2400

@zuston
Copy link
Member Author

zuston commented Jan 16, 2023

PTAL @advancedxy. After this PR is merged, I will introduce some metrics about huge partition.

Copy link
Contributor

@advancedxy advancedxy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@zuston zuston merged commit e802b93 into apache:master Jan 17, 2023
@zuston
Copy link
Member Author

zuston commented Jan 17, 2023

Merged. Thanks for your review @advancedxy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants