Glacier code returning Name or service not known on big files #1055

jorourke opened this Issue Oct 14, 2012 · 8 comments




In uploading a 10GB archive (tar file verified) it seems the connection method is failing.

Traceback (most recent call last):
File "", line 12, in
archive_id = vault.upload_archive(filename)
File "/usr/local/lib/python2.7/dist-packages/boto/glacier/", line 77, in upload_archive
return self.create_archive_from_file(filename)
File "/usr/local/lib/python2.7/dist-packages/boto/glacier/", line 140, in create_archive_from_file
File "/usr/local/lib/python2.7/dist-packages/boto/glacier/", line 152, in write
File "/usr/local/lib/python2.7/dist-packages/boto/glacier/", line 141, in send_part
content_range, part)
File "/usr/local/lib/python2.7/dist-packages/boto/glacier/", line 625, in upload_part
File "/usr/local/lib/python2.7/dist-packages/boto/glacier/", line 78, in make_request
File "/usr/local/lib/python2.7/dist-packages/boto/", line 910, in make_request
return self._mexe(http_request, sender, override_num_retries)
File "/usr/local/lib/python2.7/dist-packages/boto/", line 872, in _mexe
raise e
socket.gaierror: [Errno -2] Name or service not known

I get this uploading from a large EC2 instance. My code is:

import boto
import boto.glacier
import sys

GLACIER_VAULT = 'dropbox-photos-vault'
ONE_MB = 1024 * 1024
if __name__ == "__main__":

    filename = sys.argv[1]
    conn =  boto.glacier.connect_to_region('us-east-1')
    vault = conn.get_vault(GLACIER_VAULT)
    archive_id = vault.upload_archive(filename)

    print 'Archive {0} created'.format(archive_id)
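Until the pooling behaviour is fixed, one blunt workaround is to retry when the transient socket.gaierror surfaces. This is only a sketch; retry_call and its parameters are my own names, not part of boto:

```python
import socket
import time

def retry_call(fn, attempts=3, delay=1.0):
    """Call fn(), retrying on socket.gaierror up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except socket.gaierror:
            if attempt == attempts:
                raise  # out of retries, propagate the error
            time.sleep(delay)

# e.g. archive_id = retry_call(lambda: vault.upload_archive(filename))
```

Note that retrying upload_archive restarts the whole multipart upload from scratch, so for a 10GB file this only papers over the underlying leak.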
@jorourke jorourke closed this Oct 14, 2012
@jorourke jorourke reopened this Oct 14, 2012

After further investigation, I believe that the connection pool associated with a particular (host, is_secure) key simply keeps growing, because HostConnectionPool._conn_ready(conn) always returns False. The pool grows until we hit this error, which makes this a serious issue for large uploads. The connection pooling code needs improving on two counts: first, there is no limit on the number of connections a single pool can hold; second, connections are never closed. I suspect that this code:

    def _conn_ready(self, conn):
        """
        There is a nice state diagram at the top of It
        indicates that once the response headers have been read (which
        _mexe does before adding the connection to the pool), a
        response is attached to the connection, and it stays there
        until it's done reading. This isn't entirely true: even after
        the client is done reading, the response may be closed, but
        not removed from the connection yet.

        This is ugly, reading a private instance variable, but the
        state we care about isn't available in any public methods.
        """
        if ON_APP_ENGINE:
            # Google AppEngine implementation of HTTPConnection doesn't contain
            # _HTTPConnection__response attribute. Moreover, it's not possible
            # to determine if given connection is ready. Reusing connections
            # simply doesn't make sense with App Engine urlfetch service.
            return False
        else:
            response = getattr(conn, '_HTTPConnection__response', None)
            return (response is None) or response.isclosed()

Needs improvement.
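To illustrate what the private-attribute check is trying to detect, the same logic can be exercised against stub objects. FakeConnection and FakeResponse below are hypothetical stand-ins for httplib internals, not boto or stdlib classes; only the getattr/isclosed check mirrors the quoted code:

```python
class FakeResponse(object):
    """Stand-in for httplib's HTTPResponse."""
    def __init__(self, closed):
        self._closed = closed
    def isclosed(self):
        return self._closed

class FakeConnection(object):
    """Stand-in for httplib's HTTPConnection."""
    pass

def conn_ready(conn):
    # Same check as the quoted _conn_ready, minus the App Engine branch.
    response = getattr(conn, '_HTTPConnection__response', None)
    return (response is None) or response.isclosed()

conn = FakeConnection()
print(conn_ready(conn))   # True: no response attached yet

conn._HTTPConnection__response = FakeResponse(closed=False)
print(conn_ready(conn))   # False: client still reading the response

conn._HTTPConnection__response = FakeResponse(closed=True)
print(conn_ready(conn))   # True: response fully read and closed
```

If the response object never reaches the closed state (or the attribute has a different name, as on App Engine), conn_ready is permanently False and every request allocates a fresh connection.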

I will look for an elegant fix and prepare a pull request.
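As a rough illustration of the direction such a fix could take (not boto's actual implementation; the class and parameter names are mine), a pool could cap its size and close the connections it evicts instead of letting them accumulate:

```python
import collections

class BoundedConnectionPool(object):
    """Hypothetical pool holding at most `max_size` idle connections;
    evicted connections are closed rather than leaked."""
    def __init__(self, max_size=10):
        self.max_size = max_size
        self._idle = collections.deque()

    def put(self, conn):
        if len(self._idle) >= self.max_size:
            oldest = self._idle.popleft()
            oldest.close()  # close the evicted connection instead of leaking it
        self._idle.append(conn)

    def get(self):
        # Reuse the most recently returned connection, if any.
        return self._idle.pop() if self._idle else None
```

This addresses both problems noted above: the pool cannot grow without bound, and connections that leave the pool are explicitly closed.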

@garnaat garnaat closed this Oct 17, 2012
stigger commented Nov 6, 2012

The fix does not seem to work -- _conn_ready still always returns False and eventually everything crashes with "Name or service not known".

jorourke commented Nov 6, 2012

stigger, can you post any code for me to look at?

stigger commented Nov 6, 2012

I'm using glacier-cmd ( It could be that the problem is not in boto but in glacier-cmd itself; however, it just calls writer.write() (

jorourke commented Nov 6, 2012

Do you know what version of boto it is using?

stigger commented Nov 6, 2012

I just checked out the boto/develop branch and tested with it. The problem is still there; the trick didn't fix it.

jorourke commented Nov 6, 2012

Can you post the stack trace you're seeing, and any other relevant information?

stigger commented Nov 6, 2012

Apparently, I was wrong -- they are calling layer1.upload_part directly. I applied that trick there, and now it works as expected.

I'm very sorry for bumping this issue. boto works correctly, this issue is definitely fixed.
