
Add support for uploading big files on S3 #337

Closed · stof opened this issue Mar 19, 2020 · 14 comments · Fixed by #403 or #653

Labels: enhancement (New feature or request)

Comments

stof commented Mar 19, 2020

PutObject is limited to 5GB. Bigger objects (up to 5TB) require using a multipart upload. The official SDK has an upload() method, which handles this transparently (by default, it switches to multipart much earlier than the 5GB threshold, probably because they found it more efficient or more reliable to upload in chunks). Does it make sense to provide the same kind of feature in this SDK?
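
For reference, this is roughly what that looks like with the official SDK (a minimal sketch; the region, bucket, key and file names are placeholders):

use Aws\S3\S3Client;

$s3 = new S3Client(['region' => 'eu-west-1', 'version' => 'latest']);

// upload() transparently chooses between a single PutObject request and a
// multipart upload based on the size of the body.
$s3->upload('example-bucket', 'backups/huge.tar', fopen('/tmp/huge.tar', 'rb'), 'private');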

jderusse commented Mar 19, 2020

Probably because they don't implement signing on chunks/streams, and put the entire payload in memory to generate the signature.
Edit: they don't put the payload in memory, but they do read it twice: once to generate the hash, and a second time when uploading the content.
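
(A rough illustration of that double read, assuming a local file: SigV4 needs the payload hash up front for the x-amz-content-sha256 header, so the body is read once for hashing and then again for the actual transfer.)

// Pass 1: read the whole file to compute the hash that goes into the signature.
$payloadHash = hash_file('sha256', '/tmp/huge.tar');

// Pass 2: re-read the same file while streaming it as the request body.
$body = fopen('/tmp/huge.tar', 'rb');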

For now, we just provide low-level methods and let users deal with such edge cases.

But since the beginning we have planned to add extra high-level methods/helpers. I think this is a great candidate to start with. Thanks for the proposal.
We just have to figure out where to put that code.

stof commented Mar 19, 2020

No, the 5GB limit is part of the PutObject endpoint itself. It is not about the signing part. It is about the fact that bigger files must be uploaded in several requests (I totally understand that they don't want to allow incoming requests with a 5TB body).

@jderusse

I was talking about this part: “by default, it will switch to multipart much earlier than the 5GB threshold”.

Nyholm added the enhancement (New feature or request) label on Mar 19, 2020
@jderusse

Thinking about it, I'm not sure this should be part of this project. It's more something that should be managed by a higher-level application, a "flysystem-adapter" for instance.

One may want a $ses->send('<html>') that converts HTML to text, or a $sqs->consume() that handles all the complexity of polling the endpoint.
IMHO, those things should be managed by tools like "symfony/mailer-adapter" or "symfony/messenger-adapter".

stof commented Mar 24, 2020

Uploading a big file to S3 while taking into account the limitations of the S3 PutObject API (needing to switch to a multipart upload) seems like a fit for an S3 SDK: it only deals with S3 APIs.
And this SDK does not even expose the multipart endpoints, so a separate package cannot even do it (it would have to use the huge official SDK, and at that point there is nothing left to do, as the official SDK has the feature built in).

Nyholm commented Mar 24, 2020

I think this is an interesting feature. I think we should add it when there are users asking for it. As I understand it, @stof, you are just pointing out a missing feature, right?

The question we should ask ourselves now is: do we see a way to add this feature in the future without breaking BC?

stof commented Mar 24, 2020

Well, the issues I opened for big files and presigned requests correspond to what I found missing in order to migrate the Incenteev project from the official AWS SDK to async-aws/s3 (based on a code review, not on actually doing the migration, so I might have missed something).

Nyholm added a commit to Nyholm/aws that referenced this issue Mar 29, 2020
Nyholm commented Mar 29, 2020

I added a PR for this. See #403

Nyholm added a commit that referenced this issue Mar 31, 2020
* [S3] MultipartUpload

This will fix #337

* Added UploadPart and ListParts

* Added unit tests

* Added some tests

* Make sure tests are green
@jderusse

I think @stof's point is to provide a helper method upload() that automatically switches from putObject to a multipart upload if the file is too big, and takes care of splitting the file into small parts and uploading them (re-uploading a part in case of a retry), as sketched below.
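
A minimal sketch of the splitting part (the chunk size is an arbitrary assumption; S3 only requires that every part except the last one is at least 5MB):

$stream = fopen($file, 'rb');
$chunkSize = 64 * 1024 * 1024; // 64MB, anything above the 5MB minimum works

while (!feof($stream)) {
    $chunk = stream_get_contents($stream, $chunkSize);
    if ('' === $chunk) {
        break; // file size was an exact multiple of the chunk size
    }
    // upload $chunk as the next part; on a retry the same part number
    // can simply be uploaded again, S3 keeps the last version of a part
}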

jderusse reopened this on Mar 31, 2020
Nyholm commented Mar 31, 2020

Yes.
I've been thinking about this.

Could we (manually) create a simple upload/download abstraction layer on top of the S3 client, similar to the official SDK?

$s3 = new S3Client();
$simpleS3 = new SimpleS3Client($s3, 'example-bucket');

$simpleS3->upload($path, $file, $acl, $somethingElse);
$simpleS3->download($path);

Nyholm commented Apr 1, 2020

Here is an example implementation.

The object $data parameter is a value object that we would need to create for each method.

use AsyncAws\Core\Exception\Exception;
use AsyncAws\Core\Exception\Http\HttpException;
use AsyncAws\S3\Result\GetObjectOutput;
use AsyncAws\S3\S3Client;

class SimpleS3Client
{
    /**
     * @var S3Client
     */
    private $s3;

    /**
     * Default bucket if none is provided.
     *
     * @var string|null
     */
    private $bucket;

    public function __construct(S3Client $s3, ?string $bucket = null)
    {
        $this->s3 = $s3;
        $this->bucket = $bucket;
    }

    /**
     * @throws Exception
     */
    public function upload(object $data): bool
    {
        // TODO convert $data to $input

        if (empty($input['Bucket'])) {
            $input['Bucket'] = $this->bucket;
        }

        // TODO get the content length from the input or by reading the body
        if ($contentLength < 5 * 1024 * 1024 * 1024) {
            // Small enough for a single PutObject request.
            $this->s3->putObject($input);

            return true;
        }

        // Too big for PutObject: switch to a multipart upload.
        $uploadId = $this->s3->createMultipartUpload($input)->getUploadId();

        $partNumber = 1;
        $parts = [];
        foreach ($data as $chunk) { // TODO split the body into chunks of at least 5MB
            $part = $this->s3->uploadPart([
                'Bucket' => $input['Bucket'],
                'Key' => $input['Key'],
                'UploadId' => $uploadId,
                'PartNumber' => $partNumber,
                'Body' => $chunk,
            ]);
            $parts[] = ['ETag' => $part->getEtag(), 'PartNumber' => $partNumber];
            ++$partNumber;
        }

        // S3 assembles the final object from the (PartNumber, ETag) list.
        $this->s3->completeMultipartUpload([
            'Bucket' => $input['Bucket'],
            'Key' => $input['Key'],
            'UploadId' => $uploadId,
            'MultipartUpload' => ['Parts' => $parts],
        ]);

        return true;
    }

    /**
     * @throws Exception
     */
    public function download(object $data): GetObjectOutput
    {
        // TODO convert $data to $input

        if (empty($input['Bucket'])) {
            $input['Bucket'] = $this->bucket;
        }

        // TODO convert input to GetObjectRequest

        $result = $this->s3->getObject($input);
        $result->resolve();

        return $result;
    }

    /**
     * @throws Exception
     */
    public function remove(object $data): bool
    {
        // TODO convert $data to $input

        if (empty($input['Bucket'])) {
            $input['Bucket'] = $this->bucket;
        }

        // TODO convert input to DeleteObjectRequest

        try {
            $this->s3->deleteObject($input);
        } catch (HttpException $e) {
            return false;
        }

        return true;
    }
}
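
For what it's worth, usage would then look something like this ($uploadData, $downloadData and $removeData stand for the value objects the TODOs above refer to):

$simpleS3 = new SimpleS3Client(new S3Client(), 'example-bucket');

$simpleS3->upload($uploadData);               // switches to multipart automatically
$output = $simpleS3->download($downloadData);
$simpleS3->remove($removeData);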

Nyholm commented May 7, 2020

If I can get some help with a draft of an API (or a class skeleton), then I would be happy to implement this.
I was sure there was a SimpleS3 abstraction somewhere, but I cannot find it.

jderusse commented May 7, 2020

Nyholm commented May 24, 2020

I started here: #653

@stof Could you have a look and see whether the API works for you?
