
[CELEBORN-1530] support MPU for S3 #2830

Closed
zhaohehuhu wants to merge 15 commits into apache:main from zhaohehuhu:dev-1021


@zhaohehuhu zhaohehuhu commented Oct 21, 2024

What changes were proposed in this pull request?

As the title says: support multipart upload (MPU) for S3.

Why are the changes needed?

AWS S3 does not support append, so Celeborn had to copy the previously written data from S3 back to the worker and write it to S3 again, which heavily amplifies the write volume. This PR implements a better solution based on multipart upload (MPU) that avoids the copy-and-write.
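To make the difference concrete, here is a minimal in-memory sketch. It does not use the AWS SDK and is not the handler added by this PR; `InMemoryMultipartUpload` and `MultipartUploadSketch` are hypothetical names invented for illustration. It only models the idea that each flushed buffer is sent once as its own numbered part and the store assembles the parts on completion, and it compares the bytes written with a rewrite-everything approach.

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a store with multipart upload.
// This is NOT the AWS SDK and NOT Celeborn's S3MultipartUploadHandler.
class InMemoryMultipartUpload {
  // Parts keyed by part number; TreeMap iterates in ascending key order.
  private final Map<Integer, byte[]> parts = new TreeMap<>();
  private long bytesSent = 0;

  // Send one flushed buffer as its own part. Earlier parts are never re-read or re-sent.
  void uploadPart(int partNumber, byte[] data) {
    if (partNumber < 1) {
      throw new IllegalArgumentException("part numbers start at 1");
    }
    parts.put(partNumber, data.clone());
    bytesSent += data.length;
  }

  // Assemble the final object by concatenating parts in part-number order.
  byte[] complete() {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte[] part : parts.values()) {
      out.write(part, 0, part.length);
    }
    return out.toByteArray();
  }

  long bytesSent() {
    return bytesSent;
  }
}

class MultipartUploadSketch {
  // Upload volume when every flush rewrites the whole object (old data plus the new buffer).
  // The download of the old data is not counted here.
  static long copyAndWriteBytes(int[] flushSizes) {
    long total = 0;
    long objectSize = 0;
    for (int size : flushSizes) {
      objectSize += size;
      total += objectSize;
    }
    return total;
  }

  // Upload volume when every flush is sent exactly once as a part.
  static long multipartBytes(int[] flushSizes) {
    long total = 0;
    for (int size : flushSizes) {
      total += size;
    }
    return total;
  }

  public static void main(String[] args) {
    int[] flushes = {4, 4, 4};
    System.out.println("copy-and-write bytes: " + copyAndWriteBytes(flushes));
    System.out.println("multipart bytes: " + multipartBytes(flushes));
  }
}
```

Under these assumptions the rewrite approach grows quadratically with the number of flushes, while the multipart approach stays linear in the data size.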

Does this PR introduce any user-facing change?

How was this patch tested?

[Image: WechatIMG257]

I conducted an experiment with a 1GB input dataset to compare the performance of Celeborn using only S3 storage versus using SSD storage. The results showed that Celeborn with SSD storage was approximately three times faster than with only S3 storage.

[Screenshot: 2024-11-16 at 13:02:10]

The screenshot above is from the second test I ran, with 5000 mappers and reducers.

@FMX changed the title from "support MPU for S3" to "[CELEBORN-1530] support MPU for S3" on Oct 21, 2024
@FMX self-requested a review on October 21, 2024 08:53
@FMX
Contributor

FMX commented Oct 23, 2024

Thanks for this PR. Are there any test results?

@zhaohehuhu
Contributor Author

> Thanks for this PR. Are there any test results?

Not yet. I am about to run a benchmark for it.

Contributor

@FMX FMX left a comment

Thanks for this PR, but there are some points to polish.

Comment thread multipart-uploader/pom.xml Outdated
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-1.2-api</artifactId>
Contributor

This dependency is duplicated.

Contributor Author

done

Comment thread multipart-uploader/pom.xml Outdated
<name>aws-mpu-deps</name>
</property>
</activation>
<dependencies>
Contributor

I think these dependencies can be moved to the regular dependencies section, because this module is only loaded when the aws-mpu profile is activated.

Contributor Author

done

Comment thread multipart-uploader/pom.xml Outdated

<profiles>
<profile>
<id>aws-mpu</id>
Contributor

The profile name can be changed to aws.

Contributor Author

done


package org.apache.celeborn.server.common.service.mpu.bean;

public class AWSCredentials {
Contributor

This class should not be in the common module.

Contributor Author

done

Comment thread worker/pom.xml Outdated
<property>
<name>aws-mpu-deps</name>
</property>
</activation>
Contributor

This segment is not needed.

<activation>
        <property>
          <name>aws-mpu-deps</name>
        </property>
      </activation>

DynConstructors.builder()
.impl(
"org.apache.celeborn.S3MultipartUploadHandler",
awsCredentials.getClass(),
Contributor

Passing the arguments directly to S3MultipartUploadHandler should be enough for this scenario.

Contributor Author

done
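One reading of this suggestion is that the reflectively loaded handler can take its settings as plain constructor arguments instead of a shared bean such as `awsCredentials`. The sketch below illustrates that pattern with plain `java.lang.reflect` rather than Celeborn's `DynConstructors`; `ReflectiveLoaderSketch` is a hypothetical name, and `java.lang.StringBuilder` stands in for the handler class only so that the example runs without the optional module on the classpath.

```java
import java.lang.reflect.Constructor;

// Hypothetical helper, not Celeborn code: loads a class by name at runtime
// and calls the constructor that matches the given parameter types.
class ReflectiveLoaderSketch {
  static Object newInstance(String className, Class<?>[] paramTypes, Object... args) {
    try {
      Class<?> clazz = Class.forName(className);
      Constructor<?> ctor = clazz.getConstructor(paramTypes);
      return ctor.newInstance(args);
    } catch (ReflectiveOperationException e) {
      // Covers a missing class, a missing constructor, and a failing constructor.
      throw new IllegalStateException("Cannot instantiate " + className, e);
    }
  }

  public static void main(String[] args) {
    // StringBuilder(String) is a stand-in target; a real handler would declare
    // its own constructor taking plain values such as strings.
    Object instance =
        newInstance("java.lang.StringBuilder", new Class<?>[] {String.class}, "plain-argument");
    System.out.println(instance);
  }
}
```

Because the caller only names parameter types from the JDK, it needs no compile-time dependency on any class from the optional module.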

task = new S3FlushTask(flushBuffer, diskFileInfo.getDfsPath(), notifier, true);
task =
new S3FlushTask(
flushBuffer, notifier, true, s3MultipartUploadHandler, partNumber);
Contributor

Suggested change
- flushBuffer, notifier, true, s3MultipartUploadHandler, partNumber);
+ flushBuffer, notifier, true, s3MultipartUploadHandler, partNumber++);

Contributor Author

done

if (task != null) {
addTask(task);
flushBuffer = null;
partNumber++;
Contributor

This line can be removed

Contributor Author

done
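The two review comments on `partNumber` fit together: with a post-increment in the constructor call, the task receives the current value and the counter advances in the same expression, so the separate `partNumber++` statement becomes redundant. A minimal sketch of that behaviour follows; `PartNumberSketch` is a hypothetical class, and starting the counter at 1 is an assumption made here, not something shown in the diff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the flusher state; only the counter matters here.
class PartNumberSketch {
  private int partNumber = 1; // assumed starting value
  private final List<Integer> submitted = new ArrayList<>();

  // Post-increment: the current value is passed on, then the counter advances.
  void flush() {
    submitted.add(partNumber++);
  }

  List<Integer> submitted() {
    return submitted;
  }

  int nextPartNumber() {
    return partNumber;
  }

  public static void main(String[] args) {
    PartNumberSketch sketch = new PartNumberSketch();
    sketch.flush();
    sketch.flush();
    System.out.println(sketch.submitted() + ", next = " + sketch.nextPartNumber());
  }
}
```

Keeping both the post-increment and the separate statement would advance the counter twice per flush and leave gaps in the part numbers.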

s3MultipartUploadHandler.complete();
}

if (notifier.hasException()) {
Contributor

These two if blocks can be merged.

Contributor Author

done

import java.lang.{Long => JLong}
import java.util.{List => JList}

case class MultipartUploadRequestParam(
Contributor

Unused class. Can be removed.

Contributor Author

done

@zyclove

zyclove commented Oct 30, 2024

@zhaohehuhu @FMX @WillemJiang
When can we expect this PR to be fully validated and merged into the main branch? It is very important.

@zhaohehuhu
Contributor Author

> @zhaohehuhu @FMX @WillemJiang When can we expect this pr to be fully validated and merged into the main branch? It's very important.

I still need more time to test it fully, because S3 has some limitations related to MPU.
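The comment does not say which limitations are meant. As an assumption, the sketch below uses the publicly documented S3 multipart limits (at most 10,000 parts per upload, at least 5 MiB per part except the last one, and at most 5 TiB per object) to show how they constrain the part size. `PartSizeSketch` and its methods are hypothetical and are not part of this PR.

```java
// Hypothetical helper: picks a part size that satisfies the documented S3 multipart limits.
class PartSizeSketch {
  static final long MIN_PART_BYTES = 5L * 1024 * 1024; // 5 MiB, the last part may be smaller
  static final int MAX_PARTS = 10_000;
  static final long MAX_OBJECT_BYTES = 5L * 1024 * 1024 * 1024 * 1024; // 5 TiB

  // Smallest part size of at least 5 MiB that fits totalBytes into at most 10,000 parts.
  static long minPartSize(long totalBytes) {
    if (totalBytes < 0 || totalBytes > MAX_OBJECT_BYTES) {
      throw new IllegalArgumentException("size outside the supported range: " + totalBytes);
    }
    long needed = (totalBytes + MAX_PARTS - 1) / MAX_PARTS; // ceiling division
    return Math.max(MIN_PART_BYTES, needed);
  }

  // Number of parts needed for totalBytes at the given part size.
  static long partCount(long totalBytes, long partSize) {
    return (totalBytes + partSize - 1) / partSize; // ceiling division
  }

  public static void main(String[] args) {
    long oneGiB = 1024L * 1024 * 1024;
    long partSize = minPartSize(oneGiB);
    System.out.println(partSize + " bytes per part, " + partCount(oneGiB, partSize) + " parts");
  }
}
```

A writer that flushes fixed-size buffers has to keep each buffer at or above the minimum part size and has to stay within the part-count limit for a single shuffle file.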

@FMX
Contributor

FMX commented Nov 11, 2024

> @zhaohehuhu @FMX @WillemJiang When can we expect this pr to be fully validated and merged into the main branch? It's very important.

Every PR should be production-ready before it is merged.

Contributor

@FMX FMX left a comment

LGTM. Thanks. Merged into main (v0.6.0).

@FMX FMX closed this in a2d3972 Nov 22, 2024
@zhaohehuhu zhaohehuhu deleted the dev-1021 branch November 22, 2024 08:52