
How to use Pre-signed URLs for multipart upload. #2305

Closed
harshit196 opened this issue Feb 25, 2020 · 15 comments

@harshit196

I am already using pre-signed URLs to let the client side put files in S3. What is still not clear to me is how to upload large files to S3 in multiple parts using these pre-signed URLs.

@swetashre swetashre self-assigned this Feb 25, 2020
@swetashre
Contributor

@harshit196 - Thank you for your post. A multipart upload requires three API calls:

  1. create_multipart_upload
  2. upload_part
  3. complete_multipart_upload

You can use a pre-signed URL with any of these operations. For example, in the code below I have used one with the upload_part API call:

import boto3
import requests

s3 = boto3.client('s3')
max_size = 5 * 1024 * 1024  # you can choose your own part size (every part except the last must be at least 5 MiB)

res = s3.create_multipart_upload(Bucket=bucket_name, Key=key)
upload_id = res['UploadId']

# Upload each part with its own pre-signed URL, storing every part's
# ETag and part number in a list as you go.
parts = []
part_no = 1
with target_file.open('rb') as f:
    while True:
        file_data = f.read(max_size)  # read the content of one part
        if not file_data:
            break
        signed_url = s3.generate_presigned_url(
            ClientMethod='upload_part',
            Params={'Bucket': bucket_name, 'Key': key,
                    'UploadId': upload_id, 'PartNumber': part_no})
        res = requests.put(signed_url, data=file_data)
        parts.append({'ETag': res.headers['ETag'], 'PartNumber': part_no})
        part_no += 1

# After uploading all parts, call complete_multipart_upload, which requires that parts list.
res = s3.complete_multipart_upload(Bucket=bucket_name, Key=key,
                                   MultipartUpload={'Parts': parts},
                                   UploadId=upload_id)

In the example above I have used a pre-signed URL only for the upload_part API, but if you want you can use one for all three API calls.

Hope it helps and please let me know if you have any questions.

@harshit196
Author

harshit196 commented Feb 26, 2020

Thanks for the detailed response.
I still have one query.
In step 2 of the three steps above, the client will keep requesting pre-signed URLs from the server until the complete file has been put to the bucket. Since we can always pre-compute the number of parts the file will be divided into, let's assume the file is divided into 15 parts: can the server send all 15 pre-signed URLs to the client in one call, or will the client have to ask the server for the pre-signed URLs one by one?
If it is possible to send multiple pre-signed URLs to the client in one go, will S3 be able to assemble a single object out of the multiple parts based on the part numbers and ETag values?

@swetashre
Contributor

generate_presigned_url is a local operation: the client (botocore) does not make any call to the server (S3) to return the URL. The function creates a signed URL based on the given parameters.

So, to answer your question: if you have 15 parts, then you have to generate 15 signed URLs and use each of those URLs with a requests.put() call to upload each part to S3.

@harshit196
Author

Thanks for the great help.

@julien-c

Reading your code sample @swetashre, I was wondering: is there any way to leverage boto3's multipart file upload capabilities (i.e. retries, multithreading, etc.) when using pre-signed URLs?

i.e. Is there any way to use S3Transfer, boto3.s3.upload_file, or boto3.s3.MultipartUpload with pre-signed URLs?

@julien-c

(For context: we allow our users to upload large files, and we'd rather use boto3's code for the actual multipart uploads than roll our own custom code to upload each chunk with e.g. requests.)

@PN-picsell

@julien-c did you find a way to achieve this with boto3-only methods? We have exactly the same needs as you, and our own custom code covering the whole process efficiently, but as we rethink part of our codebase I came here to see if there is any new, simpler way to leverage boto3's API.

@matteosimone

matteosimone commented Jun 21, 2021

@julien-c @PN-picsell Continuing the chain: did either of you achieve this by reusing anything from boto3?

@julien-c

@matteosimone @PN-picsell No, I rolled my own...

@lebovic

lebovic commented Dec 14, 2021

@julien-c did you happen to implement the multipart uploads in a public repo?

We're about to roll our own as well for https://github.com/trytoolchest/toolchest-client-python/, and it would be amazing to have another open source reference for the additional functionality (retries, multithreading, etc).

@julien-c

@lebovic not really, sorry. What I have in the open is mostly inside https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/commands/lfs.py but this is probably a bit specific to the context of implementing a LFS custom transfer agent. Let me know if this helps.

@lebovic

lebovic commented Dec 14, 2021

@julien-c that's very helpful! Thanks for the reference.

@suzukieng

Hi, has anyone actually managed to get a pre-signed URL for the complete_multipart_upload operation to work?

It seems to work fine for the upload_part operation, but the complete_multipart_upload pre-signed URL seems to be missing the MultipartUpload dictionary with the parts list.

This is how I create the URL:

url = s3_client.generate_presigned_url(
    'complete_multipart_upload',
    Params={
        'Bucket': self.bucket_name, 'Key': key, 'UploadId': upload_id,
        'MultipartUpload': {'Parts': parts_param}
    },
    ExpiresIn=PRESIGNED_URLS_EXPIRATION_SECONDS
)

This is the URL that gets spit out:

https://<bucket>.s3.amazonaws.com/<key>?uploadId=<upload_id>&AWSAccessKeyId=<upload_id>&Signature=<signature>&Expires=<expires>

Or am I expected to put the MultipartUpload stuff into the POST body? I don't think so, as it would have to be formatted exactly right so the signature will match.

Related SO question: https://stackoverflow.com/q/70754676/1370154

@TulyOpt

TulyOpt commented Aug 26, 2022

@suzukieng, yes! See https://stackoverflow.com/q/70754676/1370154 for how to update your code: MultipartUpload is removed from Params, and the CompleteMultipartUpload parts list is passed in the request body as XML.
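Following that answer, a minimal sketch of completing the upload through a pre-signed URL (the helper names are illustrative; parts is the same list of {'ETag': ..., 'PartNumber': ...} dicts as in the earlier example):

```python
import requests


def build_complete_xml(parts):
    # Serialize the parts list into the CompleteMultipartUpload XML body
    # that S3 expects when the URL was signed without the MultipartUpload param.
    inner = ''.join(
        '<Part><PartNumber>{PartNumber}</PartNumber><ETag>{ETag}</ETag></Part>'.format(**p)
        for p in sorted(parts, key=lambda p: p['PartNumber'])
    )
    return '<CompleteMultipartUpload>{}</CompleteMultipartUpload>'.format(inner)


def complete_via_presigned_url(s3_client, bucket, key, upload_id, parts):
    # Sign WITHOUT MultipartUpload in Params; the request body of an S3
    # pre-signed URL is not covered by the signature, so the parts list
    # can travel as an XML POST body without breaking the signature match.
    url = s3_client.generate_presigned_url(
        'complete_multipart_upload',
        Params={'Bucket': bucket, 'Key': key, 'UploadId': upload_id},
    )
    return requests.post(url, data=build_complete_xml(parts))
```

This also answers the signature concern above: the body does not have to match anything in the signature, only the signed query parameters do.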

@jmaragh

jmaragh commented Apr 11, 2023

@suzukieng were you able to resolve this issue using generate_presigned_url?
