Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#859][Improvement] Set MALLOC_ARENA_MAX in start-shuffle-server.sh #860

Merged
merged 2 commits into from
May 10, 2023

Conversation

rhh777
Copy link
Contributor

@rhh777 rhh777 commented May 10, 2023

What changes were proposed in this pull request?

Reduce memory usage for some versions of glibc. Prevent oom.

Why are the changes needed?

Fix: #859

Does this PR introduce any user-facing change?

No.

How was this patch tested?

UT

@jerqi jerqi changed the title [ISSUE-859][Improvement] Set MALLOC_ARENA_MAX in start-shuffle-server.sh [#859][Improvement] Set MALLOC_ARENA_MAX in start-shuffle-server.sh May 10, 2023
@jerqi
Copy link
Contributor

jerqi commented May 10, 2023

Will it influence the performance of writing data? Have you test the patch in your local environment?

# memory usage to explode. This interacts badly
# with the many threads that we use in rss. Tune the variable
# down to prevent vmem explosion. Default value is 8 * CORES.
export MALLOC_ARENA_MAX=${MALLOC_ARENA_MAX:-4}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we choose 4 as our default value?
The comment is a little misunderstanding.

Default value is 8 * CORES

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we choose 4 as our default value? The comment is a little misunderstanding.

Default value is 8 * CORES

Comments may be confusing. Default value is 8 * CORES. Is the default value for glibc. ShuffleServer set to 4 by default is a reference to hadoop/presto configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we choose 4 as our default value? The comment is a little misunderstanding.

Default value is 8 * CORES

Comments may be confusing. Default value is 8 * CORES. Is the default value for glibc. ShuffleServer set to 4 by default is a reference to hadoop/presto configuration.

Could you modify the misunderstanding comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we choose 4 as our default value? The comment is a little misunderstanding.

Default value is 8 * CORES

Comments may be confusing. Default value is 8 * CORES. Is the default value for glibc. ShuffleServer set to 4 by default is a reference to hadoop/presto configuration.

Could you modify the misunderstanding comment?

yes

@codecov-commenter
Copy link

codecov-commenter commented May 10, 2023

Codecov Report

Merging #860 (90be432) into master (0125fe3) will increase coverage by 2.14%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##             master     #860      +/-   ##
============================================
+ Coverage     56.57%   58.71%   +2.14%     
- Complexity     2174     2175       +1     
============================================
  Files           327      307      -20     
  Lines         15979    13619    -2360     
  Branches       1263     1263              
============================================
- Hits           9040     7997    -1043     
+ Misses         6430     5185    -1245     
+ Partials        509      437      -72     

see 21 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@jerqi
Copy link
Contributor

jerqi commented May 10, 2023

Will it influence the performance of writing data? Have you test the patch in your local environment?

According to prestodb/presto#8993, there will be subtle difference among different values of MALLOC_ARENA_MAX.

@jerqi
Copy link
Contributor

jerqi commented May 10, 2023

Will it influence the performance of writing data? Have you test the patch in your local environment?

I am a little doubtful about the effect of this pr. Because we don't have compress/decompress operation in the shuffle server.

@rhh777
Copy link
Contributor Author

rhh777 commented May 10, 2023

Will it influence the performance of writing data? Have you test the patch in your local environment?

Theoretically, in some cases, it will affect the performance. In my environment self-test, there is no obvious change, which may be due to the data volume. However, the situation in OOM no longer occurs.

@rhh777
Copy link
Contributor Author

rhh777 commented May 10, 2023

Will it influence the performance of writing data? Have you test the patch in your local environment?

I am a little doubtful about the effect of this pr. Because we don't have compress/decompress operation in the shuffle server.

Does the Resident Set Size of the shuffle server process continuously increase in your environment?

@jerqi
Copy link
Contributor

jerqi commented May 10, 2023

Will it influence the performance of writing data? Have you test the patch in your local environment?

I am a little doubtful about the effect of this pr. Because we don't have compress/decompress operation in the shuffle server.

Does the Resident Set Size of the shuffle server process continuously increase in your environment?

We don't use the memory limit. So we don't find this issue. If you verified this pr in your production environment, we can merge it.

Copy link
Contributor

@jerqi jerqi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @rhh777

@jerqi
Copy link
Contributor

jerqi commented May 10, 2023

Merged to master.

@jerqi jerqi merged commit d303fe6 into apache:master May 10, 2023
23 checks passed
@rhh777
Copy link
Contributor Author

rhh777 commented May 10, 2023

Will it influence the performance of writing data? Have you test the patch in your local environment?

I am a little doubtful about the effect of this pr. Because we don't have compress/decompress operation in the shuffle server.

Does the Resident Set Size of the shuffle server process continuously increase in your environment?

We don't use the memory limit. So we don't find this issue. If you verified this pr in your production environment, we can merge it.

Have you ever observed the Resident Set Size of the shuffle server process? In my environment, if I don't limit it, it just keeps growing.

@jerqi
Copy link
Contributor

jerqi commented May 10, 2023

I never observe the resident set size. If I have time, I will observer it.

@jerqi
Copy link
Contributor

jerqi commented May 10, 2023

Have you used the Uniffle in your production environment?

@rhh777
Copy link
Contributor Author

rhh777 commented May 10, 2023

Have you used the Uniffle in your production environment?

Currently deployed in a test environment, running tpcds tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement] ShuffleServer set MALLOC_ARENA_MAX
3 participants