
Conversation

Contributor
@pzrq pzrq commented Dec 2, 2014

pzrq added 4 commits December 2, 2014 13:25
…on't need to explain as much magic in an earlier test.
…he chunks, otherwise it turns a 1.6MB file into more than 400MB (when I gave up and killed the request).
@pzrq pzrq changed the title Python 3 Error: "Invalid argument %r for b()" when using upload_object_via_stream [LIBCLOUD-639] Python 3 Error: "Invalid argument %r for b()" when using upload_object_via_stream Dec 2, 2014
@@ -742,7 +742,8 @@ def _stream_data(self, response, iterator, chunked=False,
         if calculate_hash:
             data_hash = self._get_hash_function()
 
-        generator = libcloud.utils.files.read_in_chunks(iterator, chunk_size)
+        generator = libcloud.utils.files.read_in_chunks(iterator, chunk_size,
+                                                        fill_size=True)
Member
Why is the fill_size change necessary?

Contributor Author
Refer to my commit message:
"Don't send one byte at a time, read the iterator in chunks and fill the chunks, otherwise it turns a 1.6MB file into more than 400MB (when I gave up and killed the request)."
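The difference the commit message describes can be sketched as follows. This is a simplified illustration, not libcloud's actual `read_in_chunks()` implementation, and it assumes the source iterator yields bytes objects of arbitrary (possibly tiny) size:

```python
# Simplified sketch of the fill_size distinction (not libcloud's real code).
# Assumes the iterator yields bytes objects of arbitrary, possibly tiny, size.
def read_in_chunks(iterator, chunk_size=8192, fill_size=False):
    if not fill_size:
        # Pass pieces through as-is: a source yielding single bytes means
        # one network write per byte, hence the massive overhead observed.
        for piece in iterator:
            yield piece
        return
    # fill_size=True: buffer pieces until a full chunk is available.
    buf = b''
    for piece in iterator:
        buf += piece
        while len(buf) >= chunk_size:
            yield buf[:chunk_size]
            buf = buf[chunk_size:]
    if buf:
        yield buf

pieces = [b'a', b'b', b'c', b'd', b'e']
# Without fill_size, five tiny writes; with it, full chunks plus a remainder.
assert list(read_in_chunks(iter(pieces))) == pieces
assert list(read_in_chunks(iter(pieces), chunk_size=2,
                           fill_size=True)) == [b'ab', b'cd', b'e']
```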

Contributor Author

To be more explicit, I tested this by watching the python3.4 process in the OS X Activity Monitor > Network > Sent Bytes. Without this fix I saw over 400MB uploaded before I gave up; debugging later revealed the process was sending the bytes one at a time, so probably one per packet, with all the associated overhead.

With this fix, the process only uploaded a little over the expected 1.6MB for my test file, which makes sense given typical network overheads (wrapping in packets, a <1500-byte MTU causing some fragmentation, etc.).

Contributor Author

Looked at it again in a debugger and it appears this only affects storage drivers with supports_chunked_encoding = True, i.e. the Atmos and Rackspace drivers at this time.
https://github.com/apache/libcloud/blob/ea16988/libcloud/storage/base.py#L634-636

Otherwise it uses the else branch directly below and consumes the entire iterator as already documented:
https://github.com/apache/libcloud/blob/ea16988/libcloud/storage/base.py#L400-408

EDIT: Changed links to include a gitref so they should be stable over time.
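The dispatch described above can be sketched like this. The names here are invented for illustration (see the linked base.py for the real code); the point is that only drivers advertising chunked encoding stream the iterator through `read_in_chunks()`, so only they were hit by the one-byte-at-a-time behaviour:

```python
# Hypothetical sketch of the upload dispatch (invented names, not libcloud's
# actual code). Only the chunked path streams via read_in_chunks(), so only
# it was affected by the missing fill_size=True.
class FakeDriver:
    def __init__(self, supports_chunked_encoding):
        self.supports_chunked_encoding = supports_chunked_encoding

def upload_path(driver):
    if driver.supports_chunked_encoding:
        return 'streamed'  # e.g. Atmos, Rackspace: goes via read_in_chunks()
    return 'buffered'      # other drivers: iterator consumed in full up front

assert upload_path(FakeDriver(True)) == 'streamed'
assert upload_path(FakeDriver(False)) == 'buffered'
```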

@Kami
Member

Kami commented Dec 2, 2014

Can you please paste the whole exception with a stack trace?

I'm wondering where an int gets passed to the b function. I'm thinking this might be a programmer error (an int shouldn't get passed to the b function), but I can't be 100% sure without seeing a full stack trace.

@pzrq
Contributor Author

pzrq commented Dec 2, 2014

Stack trace from running the test:

/usr/local/Cellar/python3/3.4.2_1/bin/python3 /Applications/PyCharm.app/Contents/helpers/pycharm/utrunner.py /Users/pzrq/Projects/libcloud/libcloud/test/storage/test_cloudfiles.py::CloudFilesTests::test_upload_object_via_stream_python3_bytes_error true
Testing started at 10:51 PM ...

Error
Traceback (most recent call last):
  File "/Users/pzrq/Projects/libcloud/libcloud/test/storage/test_cloudfiles.py", line 688, in test_upload_object_via_stream_python3_bytes_error
    object_name="img_or_vid",
  File "/Users/pzrq/Projects/libcloud/libcloud/storage/base.py", line 157, in upload_object_via_stream
    iterator, self, object_name, extra=extra, **kwargs)
  File "/Users/pzrq/Projects/libcloud/libcloud/storage/drivers/cloudfiles.py", line 445, in upload_object_via_stream
    headers=headers)
  File "/Users/pzrq/Projects/libcloud/libcloud/storage/drivers/cloudfiles.py", line 779, in _put_object
    headers=headers, file_path=file_path, iterator=iterator)
  File "/Users/pzrq/Projects/libcloud/libcloud/storage/base.py", line 660, in _upload_object
    **upload_func_kwargs)
  File "/Users/pzrq/Projects/libcloud/libcloud/storage/base.py", line 749, in _stream_data
    chunk = next(generator)
  File "/Users/pzrq/Projects/libcloud/libcloud/utils/files.py", line 74, in read_in_chunks
    chunk = b(get_data(*args))
  File "/Users/pzrq/Projects/libcloud/libcloud/utils/py3.py", line 89, in b
    raise TypeError("Invalid argument %r for b()" % (s,))
TypeError: Invalid argument 98 for b()

AFAIK that's how Python 3 works with iter(b'some bytes'), per my understanding of http://stackoverflow.com/a/21017834 and the related question.

However, I did notice I only updated one of the three usages of read_in_chunks() in libcloud/storage/base.py. I verified it worked for me: I uploaded a file and got the same file back (matching hash, and the image visually looked the same) from Rackspace's Cloud Files API. If you can pass the test in a cleaner way, I'm all ears :)

@pzrq
Contributor Author

pzrq commented Dec 2, 2014

Or to put it another way, at a raw Python 3.4 console:

pzrq@icebox:~$ python3
Python 3.4.2 (default, Oct 19 2014, 17:52:17) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> for x in iter(b'hi'):
...   print(x)
... 
104
105

@pzrq
Contributor Author

pzrq commented Dec 8, 2014

@Kami Added a regression test for the fill_size=True change, please let me know if you have any other concerns.

@Kami
Member

Kami commented Dec 9, 2014

@pzrq Sorry for the delay.

I was traveling a week ago so I missed your comments. I will go over it and get it merged into trunk asap.

@asfgit asfgit closed this in 8fe3e40 Dec 9, 2014
@Kami
Member

Kami commented Dec 9, 2014

Merged into trunk.

Thanks!

@Kami
Member

Kami commented Dec 9, 2014

Also forgot to say - good catch and improvement on the fill_size thing.

@pzrq
Contributor Author

pzrq commented Dec 9, 2014

@Kami Thanks for merging. I hope you're enjoying traveling.
