Glacier: Raise exception on empty archive upload attempt #1083

Closed
petertodd wants to merge 1 commit

2 participants

@petertodd

Currently you get a weird failure in tree_hash() instead:

Traceback (most recent call last):
  <snip>
  File "/home/pete/src/glacier-cli/boto/glacier/writer.py", line 65, in tree_hash
    return hashes[0]
IndexError: list index out of range

Amazon does not allow empty archives to be uploaded, so raising an exception specific to this error is correct.

FWIW I've also written a (much uglier) patch for glacier-cli to allow you to fake uploads of empty archives so git-annex works.
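
For anyone hitting this, here's roughly what the caller-side difference looks like once the patch is in (a sketch: the region, vault and file names are illustrative, not from this PR):

    import boto.glacier
    from boto.glacier.exceptions import UploadArchiveError

    glacier = boto.glacier.connect_to_region('us-east-1')
    vault = glacier.get_vault('my-vault')
    try:
        vault.upload_archive('maybe-empty.bin')
    except UploadArchiveError as e:
        # With this patch an empty file raises EmptyArchiveError (a
        # subclass of UploadArchiveError) instead of the IndexError above.
        print('upload rejected: %s' % e)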

@petertodd referenced this pull request in basak/glacier-cli
Merged

Fake the ability to upload an empty archive if requested #7

@jamesls commented on the diff
boto/glacier/writer.py
@@ -89,6 +90,10 @@ def compute_hashes_from_fileobj(fileobj, chunk_size=1024 * 1024):
        linear_hash.update(chunk)
        chunks.append(hashlib.sha256(chunk).digest())
        chunk = fileobj.read(chunk_size)
+
+    if not chunks:
+        raise EmptyArchiveError()
@jamesls (Owner) added a note

Wouldn't it make more sense for this check to happen up a level? It just seems odd that compute_hashes_from_fileobj raises an EmptyArchiveError. It seems more appropriate for Vault.upload_archive to perform the empty-file check and raise an exception if appropriate.

It's almost a reasonable thing to do if Vault.upload_archive has been given a file, because with a file it can find out ahead of time how big the file is. Unfortunately it isn't reasonable if it's been given a file descriptor instead, like stdin: in that case the current code calls compute_hashes_from_fileobj() without being able to do any validation on the file object. To allow Vault.upload_archive() to raise the error you'd have to give compute_hashes_from_fileobj() its own "empty file" error anyway, because tree_hash() just doesn't make sense without something to hash. (Well, it could have been defined as sha256('').digest(), but that's not how Amazon defined it.)

Even with a file of known length, someone could still truncate the file between the check of its length and when we upload it, so putting a similar check in the Writer class, as opposed to upload_archive(), still makes sense too.

IMO Amazon really should have allowed zero-length uploads...
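
To illustrate the file-vs-pipe distinction, the best up-front check upload_archive() could do is something like this (looks_empty() is hypothetical, not part of this patch):

    import os
    import stat

    def looks_empty(fileobj):
        # Only meaningful for regular files: stat the descriptor before
        # reading anything. A pipe like stdin has no length to inspect,
        # so emptiness can only be detected while streaming -- which is
        # why the check sits next to the hashing code.
        try:
            st = os.fstat(fileobj.fileno())
        except (AttributeError, OSError):
            return False  # no underlying descriptor at all
        return stat.S_ISREG(st.st_mode) and st.st_size == 0

And even when that returns False, the truncation race above still applies, so the streaming-side check stays useful either way.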

@jamesls was assigned

@jamesls (Owner)

Rereading what you wrote above, it might just make sense to use sha256('').digest() for the empty-body case. You'll get an exception from Amazon that specifies that a content length of 0 is not allowed. That way, if zero-length archives are ever allowed, there's nothing we have to do to support them.
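
Roughly this, i.e. give tree_hash() an explicit empty case (a sketch of the idea, not necessarily the exact code that gets merged):

    import hashlib

    def tree_hash(fo_hashes):
        # Degenerate case: nothing was read, so hash the empty string.
        # The request still goes out and Amazon rejects the zero-length
        # body itself; nothing here breaks if that restriction is lifted.
        if not fo_hashes:
            return hashlib.sha256('').digest()
        hashes = list(fo_hashes)
        # Pair up adjacent hashes level by level; an odd hash at the end
        # of a level is promoted unchanged, per Glacier's tree-hash spec.
        while len(hashes) > 1:
            new_hashes = []
            for i in range(0, len(hashes), 2):
                if i + 1 < len(hashes):
                    new_hashes.append(
                        hashlib.sha256(hashes[i] + hashes[i + 1]).digest())
                else:
                    new_hashes.append(hashes[i])
            hashes = new_hashes
        return hashes[0]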

@jamesls closed this pull request from a commit
@jamesls Merge branch 'glacier-tree-hash' into develop
Fixes #1083.

* glacier-tree-hash:
  Account for empty archives in hash calculations
52f6720
@jamesls closed this in 52f6720
Commits on Oct 26, 2012
  1. @petertodd
Showing 2 changed files with 12 additions and 0 deletions.
  1. +3 −0  boto/glacier/exceptions.py
  2. +9 −0 boto/glacier/writer.py
3  boto/glacier/exceptions.py
@@ -44,3 +44,6 @@ def __init__(self, expected_responses, response):
class UploadArchiveError(Exception):
    pass
+
+class EmptyArchiveError(UploadArchiveError):
+    pass
9 boto/glacier/writer.py
@@ -27,6 +27,7 @@
import math
import json
+from .exceptions import EmptyArchiveError

_ONE_MEGABYTE = 1024 * 1024
@@ -89,6 +90,10 @@ def compute_hashes_from_fileobj(fileobj, chunk_size=1024 * 1024):
        linear_hash.update(chunk)
        chunks.append(hashlib.sha256(chunk).digest())
        chunk = fileobj.read(chunk_size)
+
+    if not chunks:
+        raise EmptyArchiveError()
+
    return linear_hash.hexdigest(), bytes_to_hex(tree_hash(chunks))
@@ -156,6 +161,10 @@ def close(self):
            return
        if self._buffer_size > 0:
            self.send_part()
+
+        if not self._tree_hashes:
+            raise EmptyArchiveError()
+
        # Complete the multipart glacier upload
        hex_tree_hash = bytes_to_hex(tree_hash(self._tree_hashes))
        response = self.vault.layer1.complete_multipart_upload(self.vault.name,
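
To exercise the behaviour this diff proposes without touching the network (a sketch; io.BytesIO stands in for a pipe like stdin):

    import io
    from boto.glacier.writer import compute_hashes_from_fileobj
    from boto.glacier.exceptions import EmptyArchiveError

    try:
        compute_hashes_from_fileobj(io.BytesIO(b''))
    except EmptyArchiveError:
        print('empty stream caught before tree_hash() runs')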