New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3 client no reuse of HTTPS connections #1128

Closed
peterbe opened this Issue Jun 14, 2017 · 10 comments

Comments

Projects
None yet
4 participants
@peterbe

peterbe commented Jun 14, 2017

I have a function that looks something like this:

def break_up(zip, bucket):
    session = boto3.session.Session()
    client = session.client('s3')

    # open the zip file object and iterate over its files
    for filename, fileobj, filesize in unpack(zip):
        try:
            existing_object = client.head_object(
                Bucket=bucket,
                Key=filename,
            )
            if existing_object['ContentLength'] == filesize:
                # Already have this in S3, no need to upload again
                continue
            update = True
            new = False
        except ClientError as ex:
            if exception.response['Error']['Code'] == '404':
               update = False
               new = True
       
         # Now upload, the file into S3
         client.put_object(
             Bucket=bucket,
             Key=filename,
             Body=fileobj,
        )
        # ... do some logging based on 'update' and 'new'

(Note! I didn't copy-n-paste my real code. Tried to "dumb" it down to the bare essentials for the sake of this issue)

So, as you can see, it opens a .zip file, for each file within it does a HeadObject followed by a PutObject (if the file needs to be uploaded again).

The code works. For a .zip file with about 40 files, the whole thing finishes in a couple of seconds.

However, I get this logging output:

worker_1         | [2017-06-14 15:07:04,876: INFO/PoolWorker-1] Starting new HTTPS connection (2): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:05,688: INFO/PoolWorker-1] Starting new HTTPS connection (3): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:06,454: INFO/PoolWorker-1] Starting new HTTPS connection (4): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:07,111: INFO/PoolWorker-1] Starting new HTTPS connection (5): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:08,043: INFO/PoolWorker-1] Starting new HTTPS connection (6): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:08,563: INFO/PoolWorker-1] Starting new HTTPS connection (7): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:09,123: INFO/PoolWorker-1] Starting new HTTPS connection (8): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:09,948: INFO/PoolWorker-1] Starting new HTTPS connection (9): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:10,781: INFO/PoolWorker-1] Starting new HTTPS connection (10): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:11,484: INFO/PoolWorker-1] Starting new HTTPS connection (11): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:12,494: INFO/PoolWorker-1] Starting new HTTPS connection (12): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:13,636: INFO/PoolWorker-1] Starting new HTTPS connection (13): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:14,439: INFO/PoolWorker-1] Starting new HTTPS connection (14): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:15,227: INFO/PoolWorker-1] Starting new HTTPS connection (15): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:16,034: INFO/PoolWorker-1] Starting new HTTPS connection (16): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:16,644: INFO/PoolWorker-1] Starting new HTTPS connection (17): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:17,588: INFO/PoolWorker-1] Starting new HTTPS connection (18): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:18,793: INFO/PoolWorker-1] Starting new HTTPS connection (19): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:19,359: INFO/PoolWorker-1] Starting new HTTPS connection (20): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:20,076: INFO/PoolWorker-1] Starting new HTTPS connection (21): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:20,668: INFO/PoolWorker-1] Starting new HTTPS connection (22): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:21,230: INFO/PoolWorker-1] Starting new HTTPS connection (23): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:21,886: INFO/PoolWorker-1] Starting new HTTPS connection (24): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:22,790: INFO/PoolWorker-1] Starting new HTTPS connection (25): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:23,483: INFO/PoolWorker-1] Starting new HTTPS connection (26): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:25,057: INFO/PoolWorker-1] Starting new HTTPS connection (27): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:26,409: INFO/PoolWorker-1] Starting new HTTPS connection (28): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:27,264: INFO/PoolWorker-1] Starting new HTTPS connection (29): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:28,056: INFO/PoolWorker-1] Starting new HTTPS connection (30): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:28,688: INFO/PoolWorker-1] Starting new HTTPS connection (31): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:29,293: INFO/PoolWorker-1] Starting new HTTPS connection (32): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:31,310: INFO/PoolWorker-1] Starting new HTTPS connection (33): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:31,791: INFO/PoolWorker-1] Starting new HTTPS connection (34): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:33,230: INFO/PoolWorker-1] Starting new HTTPS connection (35): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:34,240: INFO/PoolWorker-1] Starting new HTTPS connection (36): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:34,993: INFO/PoolWorker-1] Starting new HTTPS connection (37): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:35,450: INFO/PoolWorker-1] Starting new HTTPS connection (38): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:36,274: INFO/PoolWorker-1] Starting new HTTPS connection (39): symbols-public-dev.s3.amazonaws.com
worker_1         | [2017-06-14 15:07:37,697: INFO/PoolWorker-1] Starting new HTTPS connection (40): symbols-public-dev.s3.amazonaws.com

(Sorry for the formatting. It comes from my Celery task worker)

The point is, even though I reuse the same session (all within under 60 seconds) it seems to make a new connection each time.

Either this is user error or a bug in that it's unable to re-use an established HTTP connection.

boto3==1.4.4
botocore==1.5.66
s3transfer==0.1.10

@jamesls

This comment has been minimized.

Member

jamesls commented Jun 14, 2017

Not able to repro this. We rely on requests/urllib3 to do the connection pooling for us. Given the short duration in your logs, the only think I can think of for why it's creating a new connection would be if these requests happen in separate threads. If I try this script below I only get a single connection:

# file: /tmp/test.py
import boto3
import boto3.session
boto3.set_stream_logger('')


session = boto3.session.Session()
s3 = session.client('s3')
s3.put_object(Bucket='bucketname-here', Key='asdfasdf', Body=b'asdfadf')
s3.put_object(Bucket='bucketname-here', Key='asdfasdf', Body=b'asdfadf')
s3.put_object(Bucket='bucketname-here', Key='asdfasdf', Body=b'asdfadf')
s3.put_object(Bucket='bucketname-here', Key='asdfasdf', Body=b'asdfadf')
s3.put_object(Bucket='bucketname-here', Key='asdfasdf', Body=b'asdfadf')
s3.put_object(Bucket='bucketname-here', Key='asdfasdf', Body=b'asdfadf')

I get:

$ python /tmp/test.py 2>&1 | grep connectionpool
2017-06-14 16:23:35,127 botocore.vendored.requests.packages.urllib3.connectionpool [INFO] Starting new HTTPS connection (1): bucketname-here.s3.amazonaws.com
2017-06-14 16:23:35,231 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "PUT /asdfasdf HTTP/1.1" 200 0
2017-06-14 16:23:35,268 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "PUT /asdfasdf HTTP/1.1" 200 0
2017-06-14 16:23:35,308 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "PUT /asdfasdf HTTP/1.1" 200 0
2017-06-14 16:23:35,340 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "PUT /asdfasdf HTTP/1.1" 200 0
2017-06-14 16:23:35,376 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "PUT /asdfasdf HTTP/1.1" 200 0
2017-06-14 16:23:35,409 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "PUT /asdfasdf HTTP/1.1" 200 0

If you try my script (replacing with your own bucket name) do you get similar output?

@peterbe

This comment has been minimized.

peterbe commented Jun 16, 2017

Hmm... I can't reproduce it myself either.

from botocore.exceptions import ClientError
import boto3
import boto3.session
boto3.set_stream_logger('')


session = boto3.session.Session()
s3 = session.client('s3')
B = 'symbols-public-dev'
K = 'test_boto3_issue_1128.txt'
for i in range(10):
    try:
        s3.head_object(
            Bucket=B,
            Key=K,
        )
        print("EXISTED ALREADY")
    except ClientError as exc:
        if exc.response['Error']['Code'] == '404':
            print("DID NOT EXIST")
        else:
            raise
    s3.put_object(Bucket=B, Key=K, Body=b'asdfasdf')

and the output

root@847ffb8e91bd:/app# python boto3_issue_1128_2.py 2>&1 | grep connectionpool
2017-06-16 15:50:36,058 botocore.vendored.requests.packages.urllib3.connectionpool [INFO] Starting new HTTPS connection (1): symbols-public-dev.s3.amazonaws.com
2017-06-16 15:50:36,485 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "HEAD /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:36,625 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "PUT /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:36,698 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "HEAD /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:36,912 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "PUT /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:36,983 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "HEAD /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:37,123 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "PUT /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:37,194 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "HEAD /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:37,341 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "PUT /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:37,415 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "HEAD /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:37,556 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "PUT /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:37,629 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "HEAD /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:37,771 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "PUT /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:37,844 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "HEAD /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:37,985 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "PUT /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:39,242 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "HEAD /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:39,394 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "PUT /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:39,465 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "HEAD /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:39,606 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "PUT /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:39,676 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "HEAD /test_boto3_issue_1128.txt HTTP/1.1" 200 0
2017-06-16 15:50:39,844 botocore.vendored.requests.packages.urllib3.connectionpool [DEBUG] "PUT /test_boto3_issue_1128.txt HTTP/1.1" 200 0

I need to dig deeper to see what of the environment that's causing the recreation of a connection.

@peterbe

This comment has been minimized.

peterbe commented Jun 16, 2017

A ha! I figured it out. At least how to reproduce it.
Consider this change:

from botocore.exceptions import ClientError
import boto3
import boto3.session
boto3.set_stream_logger('')


session = boto3.session.Session()
s3 = session.client('s3')
B = 'symbols-public-dev'
K_base = 'test_boto3_issue_1128_{}.txt'
import random
for i in range(10):
    K = K_base.format(random.random())
    try:
        s3.head_object(
            Bucket=B,
            Key=K,
        )
        print("EXISTED ALREADY")
    except ClientError as exc:
        if exc.response['Error']['Code'] == '404':
            print("DID NOT EXIST")
        else:
            raise
    s3.put_object(Bucket=B, Key=K, Body=b'asdfasdf')

Now, the filename (aka. key name) is always different so every time it checks if the file is there, it concludes that it needs to do the s3.put_object.

If you do that you get Starting new HTTPS connection for every iteration in the loop.

I suspect that because the there's an exception frame, it resets the connection.

Do you know a different way to ask if the file exists without having to rely on try:except ClientError:?

@peterbe

This comment has been minimized.

peterbe commented Jun 16, 2017

For the record, my code above is a bit nuts because it does the s3.pub_object independent of the business logic lines above. In my real app I only do the put_object if the file did not exist.

@peterbe

This comment has been minimized.

peterbe commented Jun 16, 2017

Sure enough, if you do this logic instead, it works!

for i in range(10):
    K = K_base.format(random.random())

    response = s3.list_objects(
        Bucket=B,
        Prefix=K,
    )
    existing_keys = [x['Key'] for x in response['Contents']]
    if K in existing_keys:
        print("EXISTED ALREADY")
    else:
        print("DID NOT EXIST")
    s3.put_object(Bucket=B, Key=K, Body=b'asdfasdf')

So the bug here is that raising an exception does reset any connections.

Now, I wonder, if it's ANY exception or just botocore.exceptions.ClientError.

@peterbe

This comment has been minimized.

peterbe commented Jun 16, 2017

I injected an error...

try:
    1/0
except Exception:
    pass

in the loop and it didn't reset the connection.

@peterbe

This comment has been minimized.

peterbe commented Jun 16, 2017

For the record, the problem with starting a new connection only happens if you do a put_object after the exception has happened.

@joguSD

This comment has been minimized.

Contributor

joguSD commented Jul 21, 2017

Possibly related to boto/botocore#1248?

@timuralp

This comment has been minimized.

timuralp commented Jul 22, 2017

@peterbe @joguSD it is the same issue as boto/botocore#1248.

@peterbe the reason you see new connections after an error is because S3 responses on error sets the Transfer-Encoding to chunked, which triggers this bug in urllib3: urllib3/urllib3#1234.

You can turn on logging/wire logs and see the problem first hand:

send: 'HEAD /<bucket>/<existing key> HTTP/1.1\r\nHost: s3.amazonaws.com\r\nAccept-Encoding: identity\r\nX-Amz-Content-SHA256: <HASH>\r\nAuthorization: AWS4-HMAC-SHA256 Credential=<AWS ACCESS KEY ID>/20170722/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=<Signature>\r\nX-Amz-Date: 20170722T175805Z\r\nUser-Agent: Boto3/1.4.4 Python/2.7.11+ Linux/4.11.0-1-amd64 Botocore/1.5.86\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: x-amz-id-2: DqWO9cXvRMm73NyS2FzftvtxaG3vzCEfwxLsujOj7gvnsxVe/ITcUoKlQrcpnwgu3cSPRkP5lQI=
header: x-amz-request-id: F30D7265A62AB402
header: Date: Sat, 22 Jul 2017 17:58:06 GMT
header: Last-Modified: Sat, 22 Jul 2017 17:58:06 GMT
header: ETag: "d41d8cd98f00b204e9800998ecf8427e"
header: Accept-Ranges: bytes
header: Content-Type: binary/octet-stream
header: Content-Length: 0
header: Server: AmazonS3

send: 'HEAD /<bucket>/<non existing key> HTTP/1.1\r\nHost: s3.amazonaws.com\r\nAccept-Encoding: identity\r\nX-Amz-Content-SHA256: <HASH>\r\nAuthorization: AWS4-HMAC-SHA256 Credential=<AWS ACCESS KEY ID>/20170722/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=<Signature>\r\nX-Amz-Date: 20170722T175959Z\r\nUser-Agent: Boto3/1.4.4 Python/2.7.11+ Linux/4.11.0-1-amd64 Botocore/1.5.86\r\n\r\n'
reply: 'HTTP/1.1 404 Not Found\r\n'
header: x-amz-request-id: 6EEF16DFB4CE8782
header: x-amz-id-2: yPx8Sq+Par0pQ3SfBT2FTH3XrukfcOlRpVNzM8+Yy3KEioRtaRWoAz0heZNaq+mCuurNmo6x5sY=
header: Content-Type: application/xml
header: Transfer-Encoding: chunked
header: Date: Sat, 22 Jul 2017 17:59:59 GMT
header: Server: AmazonS3

In the first case, the key exists and there is no "Transfer-Encoding" header. In the second case, the key does not exist, the "Transfer-Encoding: chunked" header is set and the connection is not released (the last part I omitted).

One of these bugs should be set as the dup of the other.

@joguSD

This comment has been minimized.

Contributor

joguSD commented Aug 3, 2017

Duplicate of boto/botocore#1248

@joguSD joguSD marked this as a duplicate of boto/botocore#1248 Aug 3, 2017

@joguSD joguSD closed this Aug 3, 2017

@joguSD joguSD added the duplicate label Aug 3, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment