Boto3 Client Put Object Hangs Indefinitely #3657

Closed
jacob-aegis opened this issue Apr 7, 2023 · 12 comments
Assignees
Labels
bug This issue is a confirmed bug. closed-for-staleness p3 This is a minor priority issue response-requested Waiting on additional information or feedback. s3

Comments

@jacob-aegis

Describe the bug

We're uploading image bytes frequently to the same bucket in s3. We upload several million images each day using this same code snippet, but we are finding that put_object intermittently hangs indefinitely (around 1000 uploads each day). The only resolution has been to relaunch the application pod with the faulty s3 client.

Any suggestions to help us catch why this is hanging? The call is in a try/except block within our application, but no exceptions are being logged.

Expected Behavior

Successful upload or an exception

Current Behavior

Indefinite hanging

Reproduction Steps

import boto3
from botocore.client import Config as BotoConfig

s3_client = boto3.client("s3", config=BotoConfig(max_pool_connections=50))

s3_client.put_object(Bucket=bucket, Key=key, Body=img_bytes, StorageClass='STANDARD_IA')
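
For context, a minimal sketch of the same call with explicit connect/read timeouts and standard-mode retries (the timeout values are arbitrary examples), intended to raise a timeout exception rather than block forever:

# Hedged sketch only: bucket, key, and img_bytes are the same placeholders
# used above; the timeout values are illustrative, not recommendations.
import boto3
from botocore.client import Config as BotoConfig

config = BotoConfig(
    max_pool_connections=50,
    connect_timeout=5,              # seconds to establish the connection
    read_timeout=60,                # seconds to wait on the socket for response bytes
    retries={'mode': 'standard'},
)
s3_client = boto3.client("s3", config=config)
s3_client.put_object(Bucket=bucket, Key=key, Body=img_bytes, StorageClass='STANDARD_IA')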

Possible Solution

No response

Additional Information/Context

No response

SDK version used

boto3-1.26.108

Environment details (OS name and version, etc.)

Amazon Linux 2023 ARM

@jacob-aegis jacob-aegis added bug This issue is a confirmed bug. needs-triage This issue or PR still needs to be triaged. labels Apr 7, 2023
@RyanFitzSimmonsAK
Contributor

Hi @jacob-aegis, thanks for reaching out. Just to be clear, do you mean that around 1000 images are hanging out of those that you upload, or that they continuously hang after reaching your 1000th upload? Also, could you provide debug logs of an upload that hangs? You can get debug logs by adding boto3.set_stream_logger('') to the top of your script and redacting any sensitive information. Thanks!
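
For reference, a minimal sketch of what that looks like (placed at the top of the script, before the client is created):

import boto3

# Emits verbose DEBUG-level logging for boto3/botocore to stderr
boto3.set_stream_logger('')

s3_client = boto3.client("s3")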

@RyanFitzSimmonsAK RyanFitzSimmonsAK self-assigned this Apr 7, 2023
@RyanFitzSimmonsAK RyanFitzSimmonsAK added s3 p3 This is a minor priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Apr 7, 2023
@jacob-aegis
Author

We have about 2000+ pods on Kubernetes each day, and each pod uploads images to s3 during its lifecycle. Across those 2000+ pods we upload up to 10M images a day.

We are seeing around 10 pods' s3 clients running into this hanging issue, which I estimate leads to about 1000 hanging uploads. The pods themselves are able to upload to other services (DynamoDB, RDS, etc.) and otherwise have internet access.

I'm hesitant to turn on debug logging since I'd have to do so on all 2000+ pods, and there is no pattern to why the uploads sometimes hang.

@jacob-aegis
Author

jacob-aegis commented Apr 7, 2023

We changed our code to generate a presigned URL and then PUT with requests. It's still showing intermittent hanging on the PUT, and adding a timeout to the request seems to have no effect.

import logging
import requests

def upload_with_presigned_url(url, data, key, is_jpg):
    headers = {'content-type': 'binary/octet-stream'} if is_jpg else {}
    response = requests.put(url, data=data, headers=headers, timeout=3.05)
    if response.status_code not in (200, 201):
        logging.error(response.text)
        logging.error(f"Failed to upload with presigned url {key}")


url = s3_client.generate_presigned_url(
    "put_object",
    {'Bucket': bucket, 'Key': key, 'StorageClass': 'STANDARD_IA', 'ContentType': 'binary/octet-stream'},
    1000,
)

upload_with_presigned_url(url, img_bytes, key, is_jpg)
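
One note on the snippet above (an observation, not something confirmed in this thread): requests' timeout bounds only the connect and per-read socket waits, not the total duration of the upload, and it can be given as a (connect, read) tuple, e.g.:

# Hypothetical variant of the PUT inside the function above: separate connect
# and read timeouts; requests raises ConnectTimeout / ReadTimeout if either
# limit is exceeded.
response = requests.put(url, data=data, headers=headers, timeout=(3.05, 30))
response.raise_for_status()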

@RyanFitzSimmonsAK RyanFitzSimmonsAK added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Apr 7, 2023
@RyanFitzSimmonsAK
Contributor

Thanks for following up. In your config, how do you have retries configured?

@RyanFitzSimmonsAK RyanFitzSimmonsAK added response-requested Waiting on additional information or feedback. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Apr 7, 2023
@jacob-aegis
Author

We're not passing retries in our config. We're only using requests for the put at the moment, but can switch back to uploading via the S3 Client's put_object. What config do you suggest?

@jacob-aegis
Author

We realize we actually already tested with this configuration before but still saw hanging:

botoconfig = BotoConfig(max_pool_connections=50, read_timeout=1, retries={'max_attempts': 1})

@RyanFitzSimmonsAK RyanFitzSimmonsAK removed the response-requested Waiting on additional information or feedback. label Apr 10, 2023
@RyanFitzSimmonsAK
Contributor

RyanFitzSimmonsAK commented Apr 10, 2023

By setting max_attempts to 1 without specifying the mode, you're actually overriding legacy mode's default of 5. I would recommend trying retries in standard mode, without overriding max_attempts, and letting me know how that works. Thanks!
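
For illustration, a minimal sketch of that suggestion (keeping the original pool size from the earlier snippets):

from botocore.client import Config as BotoConfig

# Standard retry mode with its default max_attempts, instead of overriding
# max_attempts while implicitly staying on legacy mode.
botoconfig = BotoConfig(
    max_pool_connections=50,
    retries={'mode': 'standard'},
)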

@jacob-aegis
Author

We tried standard mode and unfortunately we are still seeing the same issues. In the interim we have added a mechanism to reboot the application if it notices hanging, which is a sufficient workaround for the time being.
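
For anyone looking for a similar stopgap, a rough and purely illustrative sketch (not the mechanism used here) of detecting a stalled put_object by running it in a worker thread with a deadline:

import concurrent.futures

# Illustrative only: s3_client, bucket, key, and img_bytes follow the earlier
# snippets; the 30-second budget is an arbitrary example.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)

def put_with_deadline(bucket, key, img_bytes, deadline=30):
    future = _pool.submit(
        s3_client.put_object,
        Bucket=bucket, Key=key, Body=img_bytes, StorageClass='STANDARD_IA',
    )
    try:
        return future.result(timeout=deadline)
    except concurrent.futures.TimeoutError:
        # The request may still be stuck in the worker thread; surface the
        # hang so the caller can recreate the client or restart the pod.
        raise RuntimeError(f"put_object exceeded {deadline}s for key {key}")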

@RyanFitzSimmonsAK
Contributor

Glad to hear you have a workaround while we figure this out. I do want to ask again if there is any way you can get debug logs of a failed upload. Those would really help me narrow this issue down. Otherwise, is there any consistency in the failed uploads that might point us in the right direction?

@RyanFitzSimmonsAK RyanFitzSimmonsAK added the response-requested Waiting on additional information or feedback. label Apr 19, 2023
@RyanFitzSimmonsAK RyanFitzSimmonsAK removed the bug This issue is a confirmed bug. label Apr 26, 2023
@github-actions

Greetings! It looks like this issue hasn’t been active in longer than five days. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.

@github-actions github-actions bot added closing-soon This issue will automatically close in 4 days unless further comments are made. closed-for-staleness and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Apr 26, 2023
@RyanFitzSimmonsAK RyanFitzSimmonsAK added the bug This issue is a confirmed bug. label Nov 2, 2023
@Erokos

Erokos commented Nov 10, 2023

Hi,
can we keep this issue open? I noticed that file uploads from our Django app hang for 5 seconds before they eventually reach the bucket, and this happens only on ARM Kubernetes clusters. Is there any info on when this issue will be resolved, or on the status of solving it?

@Erokos

Erokos commented Nov 23, 2023

For anyone coming across this thread with similar issues to the one I described above: we resolved it by updating our app's base Docker image from python:${PYTHON_VERSION}-slim-buster to python:${PYTHON_VERSION}-slim-bookworm, which fixed our upload problems.
Our reasoning was that boto probably relies on some underlying OS package that could be malfunctioning on older OSes for ARM, so updating helped.
