[SPARK-45651][BUILD][FOLLOWUP] Reduce mvn -Xmx option to 2g in publish_snapshot workflow by EnricoMi · Pull Request #43538 · apache/spark

EnricoMi · 2023-10-26T09:54:29Z

What changes were proposed in this pull request?

Limit max memory for mvn clean deploy to 2g when run in the publish_snapshot GitHub workflow.

Why are the changes needed?

The host that runs the workflow has only 7G of memory, while the release-build.sh script sets the limit to 12g, causing the process to be killed (for branch master).

Does this PR introduce any user-facing change?

No

How was this patch tested?

Not tested

Was this patch authored or co-authored using generative AI tooling?

No

EnricoMi · 2023-10-26T10:03:25Z

CC @LuciferYang @HyukjinKwon

LuciferYang · 2023-10-26T10:06:35Z

dev/create-release/release-build.sh

 cd ..

-export MAVEN_OPTS="-Xss128m -Xmx12g -XX:ReservedCodeCacheSize=1g"
+export MAVEN_OPTS="-Xss128m -Xmx${MAVEN_MXM_OPT:-12g} -XX:ReservedCodeCacheSize=1g"
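
A minimal sketch of how the `${MAVEN_MXM_OPT:-12g}` expansion in the changed line behaves: the 12g default applies only when the variable is unset or empty, so a caller such as the workflow can lower the heap limit by setting `MAVEN_MXM_OPT`. The helper `build_maven_opts` is hypothetical and exists only for illustration; the real script exports `MAVEN_OPTS` directly.

```shell
#!/bin/sh
# Hypothetical helper mirroring the MAVEN_OPTS line from the diff.
build_maven_opts() {
  printf '%s\n' "-Xss128m -Xmx${MAVEN_MXM_OPT:-12g} -XX:ReservedCodeCacheSize=1g"
}

# Variable unset: the 12g default is used.
unset MAVEN_MXM_OPT
default_opts=$(build_maven_opts)

# Variable set (as a workflow might do): the override wins.
MAVEN_MXM_OPT=2g
override_opts=$(build_maven_opts)

echo "default:  $default_opts"
echo "override: $override_opts"
```

Because the default stays at 12g, callers that do not set the variable should see no change in behavior.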


OK, I previously changed the -Xmx option to 3g in the build/mvn script, not here, so the previous fix probably didn't take effect...

So perhaps the -Xmx here could be larger.

> so the previous fix probably didn't take effect...

I suspect so. The memory logging that is currently in place will tell us for sure whether an attempt has any effect, or just too little.

> So perhaps the -Xmx here could be larger.

Why so? Are you referring to the 12g in this line or the 2g in the workflow? Since the build workflow already uses 2g, we should either stick to 2g for consistency or bump the 2g in build.yml to 3g as well.

What I mean is that MAVEN_MXM_OPT could perhaps be 4g: I tested this before, and java-other-versions also runs successfully on GA with 4g.

But 2g is also fine with me.

Understood, any of 2g, 3g, or 4g is fine with me, too.

LuciferYang

LGTM if tests pass

HyukjinKwon · 2023-10-27T00:10:04Z

Merged to master.

HyukjinKwon · 2023-10-27T00:47:28Z

Seems not working (https://github.com/apache/spark/actions/runs/6661131541/job/18103495351). I reverted this for now.

HyukjinKwon · 2023-10-27T02:38:36Z

For now, we use different Docker images for the actual test and the snapshot build. We cache the image (see image_url at .github/workflows/build_and_test.yml, and https://github.com/apache/spark/blob/master/dev/infra/Dockerfile). Can we use the same image for both?

EnricoMi · 2023-10-27T08:50:30Z

> Seems not working (https://github.com/apache/spark/actions/runs/6661131541/job/18103495351). I reverted this for now.

The problem was unrelated:

408 Request Timeout

EnricoMi · 2023-10-27T09:06:50Z

That workflow run did not pick up the changes of this PR:

https://github.com/apache/spark/actions/runs/6661131541/workflow

EnricoMi · 2023-10-27T09:08:51Z

In fact, you manually triggered the publish, and it worked perfectly fine for master:
https://github.com/apache/spark/actions/runs/6661389655/job/18104176828
https://github.com/apache/spark/actions/runs/6661389655/workflow

Branch branch-3.5 failed due to an unrelated 408 Request Timeout. So no need to revert.

LuciferYang · 2023-10-27T09:09:20Z

Shall we revive this PR and give it another try ...

EnricoMi · 2023-10-27T09:10:46Z

Yes, please!

EnricoMi · 2023-10-27T09:17:29Z

The memory statistics with 2g are:

MiB Mem :   6922.0 total,    579.9 free,   5634.8 used,    707.2 buff/cache
MiB Swap:   4096.0 total,   2069.7 free,   2026.2 used.    976.4 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   2494 runner    20   0   16.4g   5.1g  23040 S 118.8  75.5 103:16.37 java

So even with 2g, memory usage is still quite high, but it is significantly better than with 12g: there is headroom of about 2g of free swap and 500m of free memory.
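
As a rough sketch, the headroom figures can be read from the two summary lines of the `top` output quoted in this comment. The helper `free_mib` is hypothetical, and the parsing assumes the exact line layout shown in the log; other `top` versions may format these lines differently.

```shell
#!/bin/sh
# Sample lines copied from the memory statistics quoted in this comment.
mem_line='MiB Mem :   6922.0 total,    579.9 free,   5634.8 used,    707.2 buff/cache'
swap_line='MiB Swap:   4096.0 total,   2069.7 free,   2026.2 used.    976.4 avail Mem'

# Hypothetical helper: print the number that precedes the "free," token.
free_mib() {
  printf '%s\n' "$1" | awk '{ for (i = 2; i <= NF; i++) if ($i == "free,") print $(i - 1) }'
}

free_mem=$(free_mib "$mem_line")
free_swap=$(free_mib "$swap_line")

# Sum both values to get the combined headroom in MiB.
headroom=$(awk -v m="$free_mem" -v s="$free_swap" 'BEGIN { printf "%.1f", m + s }')

echo "free mem: ${free_mem} MiB, free swap: ${free_swap} MiB, combined: ${headroom} MiB"
```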

HyukjinKwon · 2023-10-27T09:30:31Z

I manually triggered once more at https://github.com/apache/spark/actions/runs/6661389655 and it failed too. Is that also unrelated?

EnricoMi · 2023-10-27T09:30:47Z

Revived in #43555.

HyukjinKwon · 2023-10-27T09:30:47Z

If that's the case we can get this in again and see if it works.

HyukjinKwon · 2023-10-27T09:31:20Z

(Sorry, I'm on my phone right now, so I can't properly check the logs myself.)

EnricoMi · 2023-10-27T09:36:30Z

Same HTTP timeout:

2023-10-27T00:46:23.1689370Z mem: top - 00:46:23 up 7 min,  0 users,  load average: 0.86, 0.96, 0.50
2023-10-27T00:46:23.1690374Z mem: Tasks: 129 total,   1 running, 128 sleeping,   0 stopped,   0 zombie
2023-10-27T00:46:23.1691430Z mem: %Cpu(s):  0.0 us,  3.2 sy,  0.0 ni, 96.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
2023-10-27T00:46:23.1692524Z mem: MiB Mem :   6922.0 total,    596.6 free,   2637.2 used,   3688.3 buff/cache
2023-10-27T00:46:31.1391267Z mem: MiB Swap:   4096.0 total,   4095.2 free,      0.8 used.   39
2023-10-27T00:46:31.1396047Z org.apache.maven.wagon.TransferFailedException: transfer failed for https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-network-shuffle_2.12/3.5.1-SNAPSHOT/spark-network-shuffle_2.12-3.5.1-20231027.004212-88-sources.jar.md5, status: 408 Request Timeout

EnricoMi · 2023-10-27T09:40:43Z

Looks like a known unrelated issue, workaround here: https://github.com/kiegroup/kie-wb-common/pull/3416/files

…h_snapshot workflow

### What changes were proposed in this pull request?
This re-does #43538, which has [falsely been reverted](#43538 (comment)). Limit max memory for `mvn clean deploy` to `2g` when run in `publish_snapshot` Github workflow.

### Why are the changes needed?
The host that runs the workflow has only 7G of memory, while the `release-build.sh` script sets the limit to 12g, causing the process to be killed (for branch `master`).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not tested

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43555 from EnricoMi/publish-snapshot-mvn-xmx-2.

Authored-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: yangjie01 <yangjie01@baidu.com>

EnricoMi · 2023-10-27T11:04:42Z

I think a better alternative is to retry: https://maven.apache.org/plugins/maven-deploy-plugin/deploy-mojo.html#retryfaileddeploymentcount

See #43559.
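
The linked plugin parameter makes the deploy step itself retry a failed upload. Purely to illustrate the retry idea, here is a generic shell sketch; it is not what #43559 implements, and both `retry` and `flaky_upload` are hypothetical names, with `flaky_upload` standing in for an upload that hits transient errors such as the 408 above.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command until it succeeds or the
# maximum number of attempts is reached.
retry() {
  max_attempts=$1
  shift
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
  done
  return 0
}

# Hypothetical stand-in for a transient failure: fails twice, then succeeds.
calls=0
flaky_upload() {
  calls=$((calls + 1))
  [ "$calls" -ge 3 ]
}

retry 5 flaky_upload
retry_status=$?
echo "status: $retry_status after $calls calls"
```

A wrapper like this would re-run the whole command on every failure, whereas a plugin-level retry repeats only the failed upload, which is one reason to prefer the built-in option.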

HyukjinKwon · 2023-10-27T11:47:19Z

Let's retry a few times and see if it actually works first.

Reduce mvn -Xmx option to 2g in publish_snapshot workflow

d288013

EnricoMi mentioned this pull request Oct 26, 2023

[SPARK-45651][Build] Log memory usage of publish snapshot workflow #43513

Closed

github-actions bot added BUILD INFRA labels Oct 26, 2023

EnricoMi changed the title ~~[SPARK-45651][Build][Follow-up] Reduce mvn -Xmx option to 2g in publish_snapshot workflow~~ [SPARK-45651][BUILD][FOLLOWUP] Reduce mvn -Xmx option to 2g in publish_snapshot workflow Oct 26, 2023

LuciferYang reviewed Oct 26, 2023


LuciferYang approved these changes Oct 26, 2023


HyukjinKwon approved these changes Oct 27, 2023


HyukjinKwon closed this in 00c638c Oct 27, 2023

EnricoMi deleted the publish-snapshot-mvn-xmx branch October 27, 2023 09:09

EnricoMi mentioned this pull request Oct 27, 2023

[SPARK-45651][BUILD][FOLLOWUP] Reduce mvn -Xmx option to 2g in publish_snapshot workflow #43555

Closed

EnricoMi mentioned this pull request Oct 27, 2023

[SPARK-44751][SQL][FOLLOWUP] Change xmlExpressions.scala package name #43102

Closed
