Glacier code returning Name or service not known on big files #1055

Closed
jorourke opened this Issue Oct 14, 2012 · 8 comments


@jorourke
Contributor

When uploading a 10GB archive (the tar file is verified), the connection method seems to fail.

Traceback (most recent call last):
  File "glacier_upload_tar.py", line 12, in <module>
    archive_id = vault.upload_archive(filename)
  File "/usr/local/lib/python2.7/dist-packages/boto/glacier/vault.py", line 77, in upload_archive
    return self.create_archive_from_file(filename)
  File "/usr/local/lib/python2.7/dist-packages/boto/glacier/vault.py", line 140, in create_archive_from_file
    writer.write(data)
  File "/usr/local/lib/python2.7/dist-packages/boto/glacier/writer.py", line 152, in write
    self.send_part()
  File "/usr/local/lib/python2.7/dist-packages/boto/glacier/writer.py", line 141, in send_part
    content_range, part)
  File "/usr/local/lib/python2.7/dist-packages/boto/glacier/layer1.py", line 625, in upload_part
    response_headers=response_headers)
  File "/usr/local/lib/python2.7/dist-packages/boto/glacier/layer1.py", line 78, in make_request
    data=data)
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 910, in make_request
    return self._mexe(http_request, sender, override_num_retries)
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 872, in _mexe
    raise e
socket.gaierror: [Errno -2] Name or service not known

I get this when uploading from a large EC2 instance. My code is:

import boto
import boto.glacier
import sys

GLACIER_VAULT = 'dropbox-photos-vault'
ONE_MB = 1024 * 1024
if __name__ == "__main__":

    filename = sys.argv[1]
    conn = boto.glacier.connect_to_region('us-east-1')
    vault = conn.get_vault(GLACIER_VAULT)
    archive_id = vault.upload_archive(filename)

    print 'Archive {0} created'.format(archive_id)
@jorourke jorourke closed this Oct 14, 2012
@jorourke jorourke reopened this Oct 14, 2012
@jorourke
Contributor

After further investigation, I believe the connection pool associated with a particular (host, is_secure) key simply keeps growing, because HostConnectionPool._conn_ready(conn) in connection.py always returns False. The pool grows until we hit this error, which is a serious issue for large uploads. The connection pooling in connection.py needs improving: first, there is no limit on the number of connections in a single pool; second, connections are never closed. I suspect that this code:

def _conn_ready(self, conn):
    """
    There is a nice state diagram at the top of httplib.py. It
    indicates that once the response headers have been read (which
    _mexe does before adding the connection to the pool), a
    response is attached to the connection, and it stays there
    until it's done reading. This isn't entirely true: even after
    the client is done reading, the response may be closed, but
    not removed from the connection yet.

    This is ugly, reading a private instance variable, but the
    state we care about isn't available in any public methods.
    """
    if ON_APP_ENGINE:
        # Google AppEngine implementation of HTTPConnection doesn't contain
        # _HTTPConnection__response attribute. Moreover, it's not possible
        # to determine if given connection is ready. Reusing connections
        # simply doesn't make sense with App Engine urlfetch service.
        return False
    else:
        response = getattr(conn, '_HTTPConnection__response', None)
        return (response is None) or response.isclosed()

Needs improvement.
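As an aside, the private-attribute check above can be observed directly with plain `http.client` (Python 3's successor to httplib). This is a minimal demonstration, not boto code: it spins up a throwaway local server, issues one request, and shows that the response stays attached to the connection until its body has been fully read — which is exactly the state `_conn_ready` is probing for.

```python
import http.client
import http.server
import threading

# Tiny local HTTP/1.1 server so the demo needs no network access.
class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive, so the connection is reusable

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/")
resp = conn.getresponse()

# Once the headers are in, the response stays attached to the
# connection via the same private attribute _conn_ready() peeks at.
attached = getattr(conn, "_HTTPConnection__response", None) is resp
closed_before = resp.isclosed()   # body not read yet
resp.read()                       # drain the body
closed_after = resp.isclosed()    # now the connection may be reused

print(attached, closed_before, closed_after)  # True False True
conn.close()
server.shutdown()
```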

I will look for an elegant fix and prepare a pull request.
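To sketch the direction such a fix could take, here is a toy pool that addresses both complaints above: a hard cap on pool size and explicit closing of evicted or stale connections. All names and limits here are hypothetical illustrations, not boto's actual implementation in connection.py.

```python
import collections
import time

class BoundedConnectionPool:
    """Sketch of a per-(host, is_secure) pool with a size cap and
    stale-connection cleanup. Hypothetical, not boto's real pool."""

    MAX_SIZE = 10        # illustrative cap, not a boto value
    STALE_AFTER = 60.0   # seconds before a pooled conn is distrusted

    def __init__(self):
        # Each entry is (connection, time it was returned to the pool).
        self._queue = collections.deque()

    def put(self, conn):
        # Close the oldest connection instead of growing without bound.
        if len(self._queue) >= self.MAX_SIZE:
            old, _ = self._queue.popleft()
            old.close()
        self._queue.append((conn, time.time()))

    def get(self):
        now = time.time()
        while self._queue:
            conn, returned_at = self._queue.popleft()
            if now - returned_at <= self.STALE_AFTER:
                return conn
            conn.close()  # too old to trust; drop it
        return None  # caller must open a fresh connection

    def size(self):
        return len(self._queue)
```

With a cap like this, a long multipart upload recycles a handful of sockets instead of accumulating one per part.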

@garnaat garnaat closed this Oct 17, 2012
@stigger
stigger commented Nov 6, 2012

The fix does not seem to work -- _conn_ready still always returns False and eventually everything crashes with "Name or service not known".

@jorourke
Contributor
jorourke commented Nov 6, 2012

stigger, can you post any code for me to look at?

@stigger
stigger commented Nov 6, 2012

I'm using glacier-cmd (https://github.com/uskudnik/amazon-glacier-cmd-interface). It could be that the problem is not in boto, but in glacier-cmd itself, however, it just calls writer.write() (GlacierWrapper.py:1086).

@jorourke
Contributor
jorourke commented Nov 6, 2012

Do you know what version of boto it is using?

@stigger
stigger commented Nov 6, 2012

I just checked out the boto develop branch and tested with it; the problem is still there, and the response.read() trick didn't fix it.

@jorourke
Contributor
jorourke commented Nov 6, 2012

Can you post the stack trace you're seeing, and any other relevant information?

@stigger
stigger commented Nov 6, 2012

Apparently, I was wrong -- glacier-cmd is calling layer1.upload_part directly. I applied the response.read() trick there, and now it works as expected.

I'm very sorry for bumping this issue. boto works correctly, this issue is definitely fixed.
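For anyone landing here later, a minimal model of why the response.read() trick matters, using stand-in objects rather than real boto/httplib classes: `_conn_ready` only reports a connection as reusable once its attached response says it is closed, and draining the body is what closes it.

```python
# FakeResponse/FakeConnection are stand-ins for httplib objects;
# conn_ready mirrors the check in boto's HostConnectionPool._conn_ready().

class FakeResponse:
    def __init__(self):
        self._open = True

    def read(self):
        self._open = False  # draining the body closes the response
        return b""

    def isclosed(self):
        return not self._open

class FakeConnection:
    def __init__(self):
        # One leading underscore, so no name mangling applies here.
        self._HTTPConnection__response = FakeResponse()

def conn_ready(conn):
    response = getattr(conn, '_HTTPConnection__response', None)
    return (response is None) or response.isclosed()

conn = FakeConnection()
print(conn_ready(conn))                # False: body never read, can't reuse
conn._HTTPConnection__response.read()  # the "trick"
print(conn_ready(conn))                # True: safe to return to the pool
```

A caller that never drains the body leaks one connection per request, which is exactly the unbounded growth described earlier in this thread.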
