
Can't use bucket names with dots #2836

Open
lewisdiamond opened this issue Dec 22, 2014 · 99 comments

Comments
@lewisdiamond

Using boto with an S3 bucket whose name contains dots (addressed as e.g. my.bucket.s3.amazonaws.com) fails:

ssl.CertificateError: hostname 'my.bucket.s3.amazonaws.com' doesn't match either of '*.s3.amazonaws.com', 's3.amazonaws.com'

@krallin
Contributor

krallin commented Dec 22, 2014

Having the same issue here, running on Python 2.7.9, which introduced strict certificate hostname checking. That is likely why the error started happening.

@lewisdiamond — are you also running Python 2.7.9?

@krallin
Contributor

krallin commented Dec 22, 2014

Note: setting the following configuration (in boto's config file, e.g. ~/.boto) solves the issue for me:

[s3]
calling_format = boto.s3.connection.OrdinaryCallingFormat
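
The same thing can be done per connection instead of process-wide, by passing a calling format directly (a sketch using boto's S3Connection keyword argument):

from boto.s3.connection import S3Connection, OrdinaryCallingFormat

# Equivalent to the [s3] calling_format config entry, but scoped to
# this one connection.
conn = S3Connection(calling_format=OrdinaryCallingFormat())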

@lewisdiamond
Author

@krallin yes, 2.7.9

@kmarekspartz

Same here and @krallin's config change fixes it for me.

@starrify

Same issue experienced here. Solved by @krallin's fix.

@jbmartin

Thanks @krallin, your fix works for me on Python 2.7.9.

@alex

alex commented Jan 3, 2015

This looks like an AWS bug to me; as far as I can tell from the relevant RFCs, a *.domain.com entry in a SAN should not match domain1.domain2.domain.com.
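
This is easy to demonstrate against the standard library's matcher directly, since a wildcard is only supposed to cover a single label (a minimal sketch with a hand-built cert dict rather than a real certificate):

import ssl

# SANs copied from the error message above, in the dict format that
# ssl.match_hostname() expects
cert = {'subjectAltName': (('DNS', '*.s3.amazonaws.com'),
                           ('DNS', 's3.amazonaws.com'))}

ssl.match_hostname(cert, 'my-bucket.s3.amazonaws.com')  # one label: matches
try:
    ssl.match_hostname(cert, 'my.bucket.s3.amazonaws.com')  # two labels
except ssl.CertificateError as e:
    print e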

@oberstet

oberstet commented Jan 3, 2015

This workaround does not work for me.

Test program:

from boto.s3.connection import S3Connection
conn = S3Connection()
print conn.get_bucket("web-autobahn-ws")
print conn.get_bucket("autobahn.ws")

Without the workaround in .boto:

$ python test.py
<Bucket: web-autobahn-ws>
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    print conn.get_bucket("autobahn.ws")
  File "c:\Python27\lib\site-packages\boto\s3\connection.py", line 502, in get_bucket
    return self.head_bucket(bucket_name, headers=headers)
  File "c:\Python27\lib\site-packages\boto\s3\connection.py", line 521, in head_bucket
    response = self.make_request('HEAD', bucket_name, headers=headers)
  File "c:\Python27\lib\site-packages\boto\s3\connection.py", line 664, in make_request
    retry_handler=retry_handler
  File "c:\Python27\lib\site-packages\boto\connection.py", line 1068, in make_request
    retry_handler=retry_handler)
  File "c:\Python27\lib\site-packages\boto\connection.py", line 942, in _mexe
    request.body, request.headers)
  File "c:\Python27\lib\httplib.py", line 1001, in request
    self._send_request(method, url, body, headers)
  File "c:\Python27\lib\httplib.py", line 1035, in _send_request
    self.endheaders(body)
  File "c:\Python27\lib\httplib.py", line 997, in endheaders
    self._send_output(message_body)
  File "c:\Python27\lib\httplib.py", line 850, in _send_output
    self.send(msg)
  File "c:\Python27\lib\httplib.py", line 812, in send
    self.connect()
  File "c:\Python27\lib\httplib.py", line 1216, in connect
    server_hostname=server_hostname)
  File "c:\Python27\lib\ssl.py", line 350, in wrap_socket
    _context=self)
  File "c:\Python27\lib\ssl.py", line 566, in __init__
    self.do_handshake()
  File "c:\Python27\lib\ssl.py", line 796, in do_handshake
    match_hostname(self.getpeercert(), self.server_hostname)
  File "c:\Python27\lib\ssl.py", line 269, in match_hostname
    % (hostname, ', '.join(map(repr, dnsnames))))
ssl.CertificateError: hostname 'autobahn.ws.s3.amazonaws.com' doesn't match either of '*.s3.amazonaws.com', 's3.amazonaws.com'

With the workaround:

$ python test.py
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    print conn.get_bucket("web-autobahn-ws")
  File "c:\Python27\lib\site-packages\boto\s3\connection.py", line 502, in get_bucket
    return self.head_bucket(bucket_name, headers=headers)
  File "c:\Python27\lib\site-packages\boto\s3\connection.py", line 549, in head_bucket
    response.status, response.reason, body)
boto.exception.S3ResponseError: S3ResponseError: 301 Moved Permanently

@oberstet

oberstet commented Jan 3, 2015

FWIW, here is how to monkey patch away hostname verification:

import ssl
if hasattr(ssl, '_create_unverified_context'):
    ssl._create_default_https_context = ssl._create_unverified_context
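
(Note that this disables certificate verification for anything in the process that uses the standard library's default HTTPS context, not just S3.)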

Other than that, migrating to CloudFront (which doesn't require the source S3 bucket name to contain dots) might be an option.

@oberstet

oberstet commented Jan 5, 2015

Here is a more specific monkey patch:

import ssl

_old_match_hostname = ssl.match_hostname

def _new_match_hostname(cert, hostname):
    if hostname.endswith('.s3.amazonaws.com'):
        pos = hostname.find('.s3.amazonaws.com')
        # Collapse the dots in the bucket part so the result matches
        # the *.s3.amazonaws.com wildcard
        hostname = hostname[:pos].replace('.', '') + hostname[pos:]
    return _old_match_hostname(cert, hostname)

ssl.match_hostname = _new_match_hostname

@krallin
Contributor

krallin commented Jan 5, 2015

@alex,

I imagine S3 can't (or just doesn't) generate certificates on the fly for buckets that don't match the generic certificate.

However, it might make sense for boto to default to the ordinary calling format (vs. the subdomain calling format), at least for "dotted" buckets where the subdomain calling format will not work.

The default might have to do with #443, though.

@oberstet,

Is your bucket located outside of us-east-1? Your issue looks a lot like #443.

@oberstet

oberstet commented Jan 5, 2015

@krallin yes, my bucket is in EU West. And yes, the calling-format workaround triggers an error that looks very similar to #443. Currently, the only thing that works for me is monkey patching.

@krallin
Contributor

krallin commented Jan 5, 2015

@oberstet

Using a patched HTTPS connection seems to work too. The following code uses the standard library for SSL cert validation, but submits a different hostname for validation so that it passes (one that matches S3's certificate).

This works on Python 2.7.9 (though it will not work on an earlier version of Python, since those don't have ssl.SSLContext, so some conditionals would be required).

import logging
import socket, ssl
import re

import boto
from boto.https_connection import CertValidatingHTTPSConnection

logging.basicConfig(level=logging.WARNING)

class TestHttpConnection(CertValidatingHTTPSConnection):
    # !! Unsafe on Python < 2.7.9
    def __init__(self, *args, **kwargs):
        CertValidatingHTTPSConnection.__init__(self, *args, **kwargs)  # No super, it's an old-style class
        self.ssl_ctx = ssl.create_default_context(cafile=self.ca_certs)  # Defaults to cert validation
        if self.cert_file is not None:
            self.ssl_ctx.load_cert_chain(certfile=self.cert_file, keyfile=self.key_file)

    def connect(self):
        "Connect to a host on a given (SSL) port."
        if hasattr(self, "timeout"):
            sock = socket.create_connection((self.host, self.port), self.timeout)
        else:
            sock = socket.create_connection((self.host, self.port))

        if re.match(r".*\.s3.*\.amazonaws\.com", self.host):
            # Validate against the parent S3 hostname, which the cert covers
            patched_host = ".".join(self.host.rsplit(".", 4)[1:])
        else:
            patched_host = self.host  # Fall back to normal validation
        boto.log.warn("Connecting to '%s', validated as '%s'", self.host, patched_host)
        self.sock = self.ssl_ctx.wrap_socket(sock, server_hostname=patched_host)


def main():
    from boto.s3.connection import S3Connection
    conn = S3Connection(https_connection_factory=(TestHttpConnection, ()))

    print conn.get_bucket("boto2836.us-east-1.test")
    print "Standard OK"

    print conn.get_bucket("boto2836.eu-west-1.test")
    print "EU OK"

if __name__ == "__main__":
    main()

Output:

WARNING:boto:Connecting to 'boto2836.us-east-1.test.s3.amazonaws.com', validated as 'test.s3.amazonaws.com'
<Bucket: boto2836.us-east-1.test>
Standard OK
WARNING:boto:Connecting to 'boto2836.eu-west-1.test.s3.amazonaws.com', validated as 'test.s3.amazonaws.com'
WARNING:boto:Connecting to 'boto2836.eu-west-1.test.s3-eu-west-1.amazonaws.com', validated as 'test.s3-eu-west-1.amazonaws.com'
<Bucket: boto2836.eu-west-1.test>
EU OK

@krallin
Contributor

krallin commented Jan 5, 2015

Note that I'm still getting a 400 when connecting to S3 in Frankfurt, but I think that's because S3 in Frankfurt requires a different signature format (Signature Version 4).
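
For reference, eu-central-1 only accepts Signature Version 4, which boto has a config switch for; a sketch based on boto's documented use-sigv4 option (it also requires pinning the endpoint host):

[s3]
use-sigv4 = True

together with connecting via S3Connection(host='s3.eu-central-1.amazonaws.com').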

@krallin
Contributor

krallin commented Jan 5, 2015

I'm working on a patch here: https://github.com/krallin/boto/compare/fix-2836

First, I'm making sure that cert validation is left up to boto regardless of the Python version (which means that when validate_certs is None, as is the case in S3Connection, certs are indeed not validated). This is done, and I'm now adding tests (I have a few done manually; I just need to convert them to integration tests).

Finally, I'll also try to add an option for boto to accept certs for "dotted" buckets on S3.

Cheers,

@excieve

excieve commented Jan 5, 2015

Same issue here with 2.7.9. However, aws-cli, which is based on the newer botocore, works fine as long as the arguments are passed a certain way. There's a related issue in aws-cli.

krallin added a commit to krallin/boto that referenced this issue Jan 5, 2015
Although boto disables certificate hostname validation for S3, the
standard library still checks certificates in Python 2.7.9.
powdahound added a commit to vantage-sh/ec2instances.info that referenced this issue Jan 26, 2015
@NamanJn

NamanJn commented Feb 26, 2015

Yeah same here, getting an error when accessing buckets with dots.

@m-tse

m-tse commented Mar 5, 2015

I'm having similar issues.

With this code:

conn = S3Connection(awsAccessKeyID, awsSecretKey)

It works fine with a bucket name that has no periods in it, like matt-test.
But if a bucket name has a period in it, like matt.test, I'll get the following error:

InvalidCertificateException: Host matt.test.s3.amazonaws.com returned an invalid certificate (remote hostname \"matt.test.s3.amazonaws.com\" does not match certificate): {'notAfter': 'Apr  9 23:59:59 2015 GMT', 'subjectAltName': ((u'DNS', '*.s3.amazonaws.com'), (u'DNS', 's3.amazonaws.com')), 'subject': ((('countryName', u'US'),), (('stateOrProvinceName', u'Washington'),), (('localityName', u'Seattle'),), (('organizationName', u'Amazon.com Inc.'),), (('commonName', u'*.s3.amazonaws.com'),))}

And if I change the code to:

conn = S3Connection(awsAccessKeyID, awsSecretKey, calling_format=OrdinaryCallingFormat())

it now works when there are periods in the name, but fails when there are none. Here's the failure for a name without periods:

<Error><Code>PermanentRedirect</Code><Message>The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.</Message><Bucket>matt-test</Bucket><Endpoint>matt-test.s3.amazonaws.com</Endpoint><RequestId>DB89A58C5FFB8B2E</RequestId><HostId>ODxMzw0brxB4PyqpmGD+Ecff8lak6DuULecHrt3S6PHcRclft8tFaDjUXRXd62dm</HostId></Error>"

The only solution I've found that works is to if/else based on the bucket name:

from boto.s3.connection import S3Connection, OrdinaryCallingFormat

if '.' in bucketName:
    conn = S3Connection(awsAccessKeyID, awsSecretKey, calling_format=OrdinaryCallingFormat())
else:
    conn = S3Connection(awsAccessKeyID, awsSecretKey)

Hope that helps.

EDIT: It's still failing for international buckets (anything other than US Standard). I'm trying to figure that out next.

@gholms
Contributor

gholms commented Mar 5, 2015

I'll post some info here to save a little time: the ordinary calling format uses older, "path-style" URLs like https://s3-us-west-2.amazonaws.com/bukkit/key to address things. This means that when you create the S3Connection you have to point it at the right endpoint, or the service will reply with a redirect that is intentionally difficult to handle automatically. As long as you use the DNS name that matches the region of the bucket you want to work with, it should work just fine.
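
In boto terms that means something like the following (a sketch; the endpoint and bucket name are illustrative):

from boto.s3.connection import S3Connection, OrdinaryCallingFormat

# Path-style addressing puts the bucket in the URL path, so the endpoint
# host itself has to match the bucket's region.
conn = S3Connection(host='s3-us-west-2.amazonaws.com',
                    calling_format=OrdinaryCallingFormat())
bucket = conn.get_bucket('bukkit')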

@m-tse

m-tse commented Mar 6, 2015

@gholms thanks for the insight!

So I have buckets, with and without dots in their names, located in various regions that I have to upload to, and I would like a clean solution that handles them all. My current solution is now:

import boto.s3
from boto.s3.connection import OrdinaryCallingFormat

conn = boto.s3.connect_to_region(
    region,
    aws_access_key_id=awsAccessKeyID,
    aws_secret_access_key=awsSecretKey,
    calling_format=OrdinaryCallingFormat())

But this requires me to map a region (us-east-1, us-west-1, etc.) to each bucket, which is something I haven't had to do before. Previously the default calling format worked fine for buckets with dots in their name. Looking at my logs, it seems that starting February 13th I began getting the ssl.CertificateError mentioned in the first post of this issue for buckets with a dot in their name. Nothing in the code changed, although it's possible that some software on the box got updated.

If my goal is not to require a region for each bucket, is my only option to wait for @krallin's PR?
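
(One way to avoid a hand-maintained bucket-to-region map is to ask S3 where the bucket lives first. A sketch, assuming the credentials are allowed to call GetBucketLocation and that the lookup succeeds against the default endpoint:

import boto.s3
from boto.s3.connection import S3Connection, OrdinaryCallingFormat

def connect_for_bucket(bucket_name, key_id, secret):
    # Path-style probe against the default endpoint; validate=False skips
    # the HEAD request that would otherwise hit the redirect.
    probe = S3Connection(key_id, secret,
                         calling_format=OrdinaryCallingFormat())
    location = probe.get_bucket(bucket_name, validate=False).get_location()
    # GetBucketLocation returns '' for US Standard; very old EU buckets may
    # report the legacy 'EU' constraint, which would need extra mapping.
    return boto.s3.connect_to_region(
        location or 'us-east-1',
        aws_access_key_id=key_id,
        aws_secret_access_key=secret,
        calling_format=OrdinaryCallingFormat()))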

@krallin
Contributor

krallin commented Mar 6, 2015

@m-tse,

Unfortunately that PR doesn't really seem to be making much progress :( It's been lingering there for a while and I haven't heard back.

As explained a bit above, the change might have been caused by you upgrading to Python 2.7.9, which broke a lot of stuff that used to silently work (albeit insecurely!) as far as SSL is concerned.

Cheers,

vitorbaptista added a commit to opentrials/opentrials-airflow that referenced this issue Apr 28, 2017
This sets the remote logs URL to a S3 bucket, making sure our logs persist even
if Airflow's host machine is destroyed. There's a caveat, though: we can't use
buckets with dots in the name (e.g. "datastore.opentrials.net"). This is
because Airflow still uses the older boto (not boto3) that has this issue (see
boto/boto#2836 and
https://issues.apache.org/jira/browse/AIRFLOW-115).

Fixes opentrials/opentrials#763
RevolutionTech pushed a commit to infoscout/boto that referenced this issue May 9, 2017
Although boto disables certificate hostname validation for S3, the
standard library still checks certificates in Python 2.7.9.
@Koff

Koff commented May 19, 2017

If you are experiencing this issue in Airflow, go to the Admin -> Connections settings for your S3 connection and add an extra key, calling_format, to your connection dictionary with boto.s3.connection.OrdinaryCallingFormat as the value.

Your Extra field should look like:

{"aws_access_key_id":"_your_key_", "aws_secret_access_key": "_your_secret_", "calling_format": "boto.s3.connection.OrdinaryCallingFormat"}

@tolgahanuzun

@YunanHu Thanks.
My problem is solved.

mrterry added a commit to mrterry/qds-sdk-py that referenced this issue Aug 30, 2017
For S3 buckets with dots in them (e.g. qubole.customer_name), boto cannot
verify SSL certs. The work-around is to use OrdinaryCallingFormat, which
puts the bucket name in the URL rather than the domain.

More info in: boto/boto#2836
@benlk

benlk commented Oct 24, 2017

If you're looking for a solution that does not depend upon global configs, here's the code for connecting to a bucket in a non-us-east-1 region that has periods in its name:

import boto.s3
from boto.s3.connection import OrdinaryCallingFormat

def get_bucket( bucket_name ):
    """
    Establish a connection and get an S3 bucket
    """
    s3 = boto.s3.connect_to_region(
        'us-east-2',
        host='s3-us-east-2.amazonaws.com', # endpoint name from https://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
        calling_format=OrdinaryCallingFormat()
    )

    return s3.get_bucket(bucket_name)
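
Usage is then just (bucket name illustrative):

bucket = get_bucket( 'my.dotted.bucket' )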

@ykhrustalev

Based on @oberstet's solution, but with support for regional endpoints:

import ssl

_old_match_hostname = ssl.match_hostname


def remove_dot(host):
    """
    >>> remove_dot('a.x.s3-eu-west-1.amazonaws.com')
    'ax.s3-eu-west-1.amazonaws.com'
    >>> remove_dot('a.s3-eu-west-1.amazonaws.com')
    'a.s3-eu-west-1.amazonaws.com'
    >>> remove_dot('s3-eu-west-1.amazonaws.com')
    's3-eu-west-1.amazonaws.com'
    >>> remove_dot('a.x.s3-eu-west-1.example.com')
    'a.x.s3-eu-west-1.example.com'
    """
    if not host.endswith('.amazonaws.com'):
        return host
    parts = host.split('.')
    h = ''.join(parts[:-3])
    if h:
        h += '.'
    return h + '.'.join(parts[-3:])


def _new_match_hostname(cert, hostname):
    return _old_match_hostname(cert, remove_dot(hostname))


ssl.match_hostname = _new_match_hostname
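
Since the docstring examples are doctests, the helper can be sanity-checked with python -m doctest on whatever file it lives in.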

dustinspecker added a commit to dustinspecker/dockerfiles that referenced this issue Sep 3, 2019
It looks like boto in Python 2 does not handle bucket names with periods
in them [1]. An SSL CertificateError is thrown.

Looks like possible solutions are monkey patching, adding a check for
periods to determine which S3 calling format to use, or updating to boto3.

Opted to update to boto3.

[1] boto/boto#2836
dustinspecker added a commit to dustinspecker/dockerfiles that referenced this issue Sep 12, 2019
It looks like boto in Python 2 does not handle bucket names with periods
in them [1]. An SSL CertificateError is thrown.

Looks like possible solutions are monkey patching, adding a check for
periods to determine which S3 calling format to use, or updating to boto3.

Opted to update to boto3.

[1] boto/boto#2836
@UrosOgrizovic

@benlk Thanks for the function. For future reference, I can confirm it works with Python 3.7.4 and boto 2.49.0.

symroe added a commit to DemocracyClub/UK-Polling-Stations that referenced this issue Mar 1, 2023
GeoWill pushed a commit to DemocracyClub/UK-Polling-Stations that referenced this issue Mar 2, 2023