Can't use bucket names with dots #2836
Comments
Having the same issue here, running on Python 2.7.9. Python 2.7.9 introduced strict certificate checking. This might be why the error is happening. @lewisdiamond — are you also running Python 2.7.9? |
Note: setting the following configuration solves the issue for me:
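(The snippet itself was lost in extraction; given that later comments all point at OrdinaryCallingFormat, the boto config presumably looked something like this:)

```ini
[s3]
calling_format = boto.s3.connection.OrdinaryCallingFormat
```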
|
@krallin yes, 2.7.9 |
Same here and @krallin's config change fixes it for me. |
Same issue experienced here. Solved by @krallin's fix. |
Thanks @krallin, your fix works for me on Python 2.7.9. |
This looks like an AWS bug to me. As far as I can tell from the various RFCs (e.g. RFC 6125), a wildcard such as *.s3.amazonaws.com matches only a single label, so Python is right to reject autobahn.ws.s3.amazonaws.com; the real problem is that S3 serves dotted bucket hostnames under a certificate that cannot cover them. |
This workaround does not work for me. Test program:

```python
from boto.s3.connection import S3Connection

conn = S3Connection()
print conn.get_bucket("web-autobahn-ws")
print conn.get_bucket("autobahn.ws")
```

Without the workaround in my boto config:

```
$ python test.py
<Bucket: web-autobahn-ws>
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    print conn.get_bucket("autobahn.ws")
  File "c:\Python27\lib\site-packages\boto\s3\connection.py", line 502, in get_bucket
    return self.head_bucket(bucket_name, headers=headers)
  File "c:\Python27\lib\site-packages\boto\s3\connection.py", line 521, in head_bucket
    response = self.make_request('HEAD', bucket_name, headers=headers)
  File "c:\Python27\lib\site-packages\boto\s3\connection.py", line 664, in make_request
    retry_handler=retry_handler
  File "c:\Python27\lib\site-packages\boto\connection.py", line 1068, in make_request
    retry_handler=retry_handler)
  File "c:\Python27\lib\site-packages\boto\connection.py", line 942, in _mexe
    request.body, request.headers)
  File "c:\Python27\lib\httplib.py", line 1001, in request
    self._send_request(method, url, body, headers)
  File "c:\Python27\lib\httplib.py", line 1035, in _send_request
    self.endheaders(body)
  File "c:\Python27\lib\httplib.py", line 997, in endheaders
    self._send_output(message_body)
  File "c:\Python27\lib\httplib.py", line 850, in _send_output
    self.send(msg)
  File "c:\Python27\lib\httplib.py", line 812, in send
    self.connect()
  File "c:\Python27\lib\httplib.py", line 1216, in connect
    server_hostname=server_hostname)
  File "c:\Python27\lib\ssl.py", line 350, in wrap_socket
    _context=self)
  File "c:\Python27\lib\ssl.py", line 566, in __init__
    self.do_handshake()
  File "c:\Python27\lib\ssl.py", line 796, in do_handshake
    match_hostname(self.getpeercert(), self.server_hostname)
  File "c:\Python27\lib\ssl.py", line 269, in match_hostname
    % (hostname, ', '.join(map(repr, dnsnames))))
ssl.CertificateError: hostname 'autobahn.ws.s3.amazonaws.com' doesn't match either of '*.s3.amazonaws.com', 's3.amazonaws.com'
```

With the workaround:

```
$ python test.py
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    print conn.get_bucket("web-autobahn-ws")
  File "c:\Python27\lib\site-packages\boto\s3\connection.py", line 502, in get_bucket
    return self.head_bucket(bucket_name, headers=headers)
  File "c:\Python27\lib\site-packages\boto\s3\connection.py", line 549, in head_bucket
    response.status, response.reason, body)
boto.exception.S3ResponseError: S3ResponseError: 301 Moved Permanently
```
|
FWIW, here is how to monkey patch away hostname verification:

```python
import ssl

# Disables all certificate verification for HTTPS, not just hostname checks.
if hasattr(ssl, '_create_unverified_context'):
    ssl._create_default_https_context = ssl._create_unverified_context
```

Other than that, it seems migrating to CloudFront (which doesn't require source S3 buckets to be dotted) might be an option. |
Here is a more specific monkey patch:

```python
import ssl

_old_match_hostname = ssl.match_hostname

def _new_match_hostname(cert, hostname):
    # Strip the dots out of the bucket portion so that the single-label
    # wildcard in S3's certificate (*.s3.amazonaws.com) matches.
    if hostname.endswith('.s3.amazonaws.com'):
        pos = hostname.find('.s3.amazonaws.com')
        hostname = hostname[:pos].replace('.', '') + hostname[pos:]
    return _old_match_hostname(cert, hostname)

ssl.match_hostname = _new_match_hostname
```
|
I imagine S3 can't (or just doesn't) generate certificates on the fly for buckets that don't match the generic certificate. However, it might make sense for boto to default to the ordinary calling format (vs. the subdomain calling format), at least for "dotted" buckets where the subdomain calling format cannot work? The default might have to do with #443, though. Is your bucket located outside of us-east-1? Your issue looks a lot like #443. |
Using a patched HTTP connection seems to work too. The following code uses the standard library for SSL cert validation, but submits a different hostname for the validation to pass (one that matches S3's certificate). It appears to work on Python 2.7.9 (though it will not work on earlier versions of Python, since those don't have the newer ssl features, such as ssl.SSLContext, that it relies on).
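The snippet didn't survive extraction here, so what follows is only a minimal sketch of the approach described; the class name, helper name, and dot-collapsing rule are assumptions, not the original code:

```python
import httplib
import socket
import ssl

def _collapse_bucket_dots(hostname):
    # 'my.bucket.s3.amazonaws.com' -> 'mybucket.s3.amazonaws.com', a name
    # that the single-label wildcard *.s3.amazonaws.com does match.
    suffix = '.s3.amazonaws.com'
    if hostname.endswith(suffix):
        hostname = hostname[:-len(suffix)].replace('.', '') + suffix
    return hostname

class DotBucketHTTPSConnection(httplib.HTTPSConnection):
    """Validate the server certificate, but against a collapsed hostname."""

    def connect(self):
        sock = socket.create_connection((self.host, self.port), self.timeout)
        context = ssl.create_default_context()  # still verifies the cert chain
        context.check_hostname = False  # we do the hostname check ourselves
        self.sock = context.wrap_socket(sock, server_hostname=self.host)
        ssl.match_hostname(self.sock.getpeercert(),
                           _collapse_bucket_dots(self.host))
```

Since boto connections accept an https_connection_factory pair, something like S3Connection(https_connection_factory=(DotBucketHTTPSConnection, ())) should be enough to wire it in.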
Output:
|
Note that I'm still getting a 400 when connecting to S3 in Frankfurt, but I think that's because S3 in Frankfurt requires a different signature format (Signature Version 4). |
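If that is the cause, boto can be told to sign S3 requests with Signature Version 4 through its config file. A sketch of the documented option (SigV4 also requires connecting with an explicit regional host, e.g. via boto.s3.connect_to_region('eu-central-1')):

```ini
[s3]
use-sigv4 = True
```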
I'm working on a patch here: https://github.com/krallin/boto/compare/fix-2836. First, I'm making sure that cert validation is left up to Boto regardless of the Python version. Finally, I'll also try and add an option for boto to accept certs for "dotted" buckets on S3. Cheers, |
Same issue here with 2.7.9. However, aws-cli, which is based on the newer botocore, works fine as long as the arguments are passed in a certain way. There's a related issue in aws-cli. |
Although boto disables certificate hostname validation for S3, the standard library still checks certificates in Python 2.7.9.
Yeah same here, getting an error when accessing buckets with dots. |
I'm having similar issues. With this code:

```python
conn = S3Connection(awsAccessKeyID, awsSecretKey)
```

it works fine with a bucket name that has no periods in it. And if I change the code to:

```python
conn = S3Connection(awsAccessKeyID, awsSecretKey, calling_format=OrdinaryCallingFormat())
```

it now works when there are periods in the name, but fails when there are no periods in the name. The only solution I've found to work is to if/else based on the bucket name:

```python
if '.' in bucketName:
    conn = S3Connection(awsAccessKeyID, awsSecretKey, calling_format=OrdinaryCallingFormat())
else:
    conn = S3Connection(awsAccessKeyID, awsSecretKey)
```

Hope that helps. EDIT: It's still failing for international buckets (anything other than US Standard). I'm trying to figure that out next. |
I'll post some info here to save a little time: the ordinary calling format uses older, "path-style" URLs like https://s3.amazonaws.com/bucket/key, instead of the subdomain ("virtual-hosted") style https://bucket.s3.amazonaws.com/key. Since the bucket name never becomes part of the hostname, dots in it can't break certificate matching; the trade-off is that path-style requests must go to the endpoint of the bucket's own region, or S3 answers with a 301 redirect. |
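For illustration, boto's calling-format classes make the difference easy to see; a quick sketch (the bucket and key names are made up):

```python
from boto.s3.connection import OrdinaryCallingFormat, SubdomainCallingFormat

scf = SubdomainCallingFormat()
ocf = OrdinaryCallingFormat()

# Subdomain style puts the bucket into the hostname, which is what
# breaks TLS matching for dotted names:
print(scf.build_host('s3.amazonaws.com', 'my.bucket'))  # my.bucket.s3.amazonaws.com

# Ordinary (path) style keeps the hostname fixed and moves the bucket
# into the request path instead:
print(ocf.build_host('s3.amazonaws.com', 'my.bucket'))  # s3.amazonaws.com
print(ocf.build_path_base('my.bucket', 'some/key'))     # /my.bucket/some/key
```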
@gholms thanks for the insight! So I have buckets located in various regions, with and without dots in their names, that I have to upload to. I would like a clean solution that can handle them all. My current solution is now:

```python
conn = boto.s3.connect_to_region(
    region,
    aws_access_key_id=awsAccessKeyID,
    aws_secret_access_key=awsSecretKey,
    calling_format=OrdinaryCallingFormat()
)
```

But this requires me to map a region to each bucket. If my goal is to not require a region with each bucket, is my only option to wait for @krallin's PR? |
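One way to avoid maintaining a bucket-to-region map is to ask S3 itself where each bucket lives. A sketch (the helper name is made up, and it assumes the default endpoint will answer GetBucketLocation for buckets in any region, which it historically does):

```python
import boto.s3
from boto.s3.connection import S3Connection, OrdinaryCallingFormat

def connect_for_bucket(bucket_name, access_key, secret_key):
    # Ask the default endpoint where the bucket lives (GetBucketLocation),
    # then reconnect to that region with path-style addressing so the
    # bucket name never appears in the TLS hostname.
    probe = S3Connection(access_key, secret_key,
                         calling_format=OrdinaryCallingFormat())
    location = probe.get_bucket(bucket_name, validate=False).get_location()
    # '' means US Standard (us-east-1); legacy buckets may report 'EU'.
    region = {'': 'us-east-1', 'EU': 'eu-west-1'}.get(location, location)
    return boto.s3.connect_to_region(
        region,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        calling_format=OrdinaryCallingFormat())
```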
@MATTSE, unfortunately that PR doesn't really seem to be making much progress :( It's been lingering there for a while and I haven't heard back. As explained a bit above, the change might have been caused by your upgrading to Python 2.7.9, which broke a lot of things that used to silently work (albeit insecurely!) as far as SSL is concerned. Cheers, |
This sets the remote logs URL to an S3 bucket, making sure our logs persist even if Airflow's host machine is destroyed. There's a caveat, though: we can't use buckets with dots in the name (e.g. "datastore.opentrials.net"). This is because Airflow still uses the older boto (not boto3), which has this issue (see boto/boto#2836 and https://issues.apache.org/jira/browse/AIRFLOW-115). Fixes opentrials/opentrials#763
If you are experiencing this issue in Airflow: go to the Admin -> Connections settings for your S3 connection and add an extra key setting the calling format. Your Extra field should look like:
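(The example didn't survive extraction; assuming the boto-based S3Hook reads a calling_format key from the connection's extras, the Extra field would presumably be along these lines:)

```json
{"calling_format": "boto.s3.connection.OrdinaryCallingFormat"}
```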
|
@YunanHu Thanks. |
For S3 buckets with dots in them (e.g. qubole.customer_name), boto cannot verify SSL certs. The workaround is to use OrdinaryCallingFormat, which puts the bucket name in the URL path rather than the domain. More info in: boto/boto#2836
If you're looking for a solution that does not depend upon global configs, here's the code for connecting to a bucket in a non-default region:

```python
import boto.s3
from boto.s3.connection import OrdinaryCallingFormat

def get_bucket(bucket_name):
    """
    Establish a connection and get an S3 bucket
    """
    s3 = boto.s3.connect_to_region(
        'us-east-2',
        # endpoint name from https://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
        host='s3-us-east-2.amazonaws.com',
        calling_format=OrdinaryCallingFormat()
    )
    return s3.get_bucket(bucket_name)
```
|
Based on @oberstet's solution, but with support for regional endpoints:

```python
import ssl

_old_match_hostname = ssl.match_hostname

def remove_dot(host):
    """
    >>> remove_dot('a.x.s3-eu-west-1.amazonaws.com')
    'ax.s3-eu-west-1.amazonaws.com'
    >>> remove_dot('a.s3-eu-west-1.amazonaws.com')
    'a.s3-eu-west-1.amazonaws.com'
    >>> remove_dot('s3-eu-west-1.amazonaws.com')
    's3-eu-west-1.amazonaws.com'
    >>> remove_dot('a.x.s3-eu-west-1.example.com')
    'a.x.s3-eu-west-1.example.com'
    """
    if not host.endswith('.amazonaws.com'):
        return host
    parts = host.split('.')
    h = ''.join(parts[:-3])
    if h:
        h += '.'
    return h + '.'.join(parts[-3:])

def _new_match_hostname(cert, hostname):
    return _old_match_hostname(cert, remove_dot(hostname))

ssl.match_hostname = _new_match_hostname
```
|
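A note on usage (the module and bucket names below are made up): the patch only needs to run once, before boto opens any connection, since Python's ssl module looks the function up at handshake time:

```python
# s3_ssl_patch.py would contain the snippet above (hypothetical module name).
import s3_ssl_patch  # noqa: F401 -- importing it installs the patch
from boto.s3.connection import S3Connection

conn = S3Connection()
print(conn.get_bucket('my.dotted.bucket'))  # hypothetical dotted bucket
```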
It looks like boto in Python 2 does not handle bucket names with periods in them [1]. An SSL CertificateError is thrown. Possible solutions are monkey patching, adding a check for a period to determine which S3 calling format to use, or updating to boto3. Opted to update to boto3. [1] boto/boto#2836
@benlk Thanks for the function. For future reference, I'm writing to confirm that it's working in Python 3.7.4 and boto 2.49.0. |
Using boto with an S3 bucket whose name contains dots, e.g. my.bucket (which yields the hostname my.bucket.s3.amazonaws.com), fails:
ssl.CertificateError: hostname 'my.bucket.s3.amazonaws.com' doesn't match either of '*.s3.amazonaws.com', 's3.amazonaws.com'