
Conversation

Contributor
@pzrq pzrq commented Dec 2, 2014

pzrq added 4 commits December 2, 2014 13:25
…on't need to explain as much magic in an earlier test.
…he chunks, otherwise it turns a 1.6MB file into more than 400MB (when I gave up and killed the request).
@pzrq pzrq changed the title Python 3 Error: "Invalid argument %r for b()" when using upload_object_via_stream [LIBCLOUD-639] Python 3 Error: "Invalid argument %r for b()" when using upload_object_via_stream Dec 2, 2014
@@ -742,7 +742,8 @@ def _stream_data(self, response, iterator, chunked=False,
         if calculate_hash:
             data_hash = self._get_hash_function()
 
-        generator = libcloud.utils.files.read_in_chunks(iterator, chunk_size)
+        generator = libcloud.utils.files.read_in_chunks(iterator, chunk_size,
+                                                        fill_size=True)
Member
Why is the fill_size change necessary?

Contributor Author
Refer to my commit message:
"Don't send one byte at a time, read the iterator in chunks and fill the chunks, otherwise it turns a 1.6MB file into more than 400MB (when I gave up and killed the request)."
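The difference the commit message describes can be sketched as follows. This is a simplified illustration, not libcloud's actual `read_in_chunks()` implementation, and it assumes the source iterator yields bytes objects of arbitrary (possibly tiny) size:

```python
# Simplified sketch of the fill_size distinction (not libcloud's real code).
# Assumes the iterator yields bytes objects of arbitrary, possibly tiny, size.
def read_in_chunks(iterator, chunk_size=8192, fill_size=False):
    if not fill_size:
        # Pass pieces through as-is: a source yielding single bytes means
        # one network write per byte, hence the massive overhead observed.
        for piece in iterator:
            yield piece
        return
    # fill_size=True: buffer pieces until a full chunk is available.
    buf = b''
    for piece in iterator:
        buf += piece
        while len(buf) >= chunk_size:
            yield buf[:chunk_size]
            buf = buf[chunk_size:]
    if buf:
        yield buf

pieces = [b'a', b'b', b'c', b'd', b'e']
# Without fill_size, five tiny writes; with it, full chunks plus a remainder.
assert list(read_in_chunks(iter(pieces))) == pieces
assert list(read_in_chunks(iter(pieces), chunk_size=2,
                           fill_size=True)) == [b'ab', b'cd', b'e']
```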

Contributor Author

To be more explicit, I tested this by watching the python3.4 process in the OS X Activity Monitor > Network > Sent Bytes. Without this fix I saw over 400MB uploaded before I gave up; debugging later revealed the process was sending the bytes one at a time, so probably one per packet, with all the associated overhead.

With this fix, the process only uploaded a little over the expected 1.6MB for my test file, which makes sense given typical network overheads (wrapping in packets, a <1500-byte MTU causing some fragmentation, etc.).

Contributor Author

Looked at it again in a debugger and it appears this only affects storage drivers with supports_chunked_encoding = True, i.e. the Atmos and Rackspace drivers at this time.
https://github.com/apache/libcloud/blob/ea16988/libcloud/storage/base.py#L634-636

Otherwise it uses the else branch directly below and consumes the entire iterator as already documented:
https://github.com/apache/libcloud/blob/ea16988/libcloud/storage/base.py#L400-408

EDIT: Changed links to include a gitref so they should be stable over time.
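The dispatch described above can be sketched like this. The names here are invented for illustration (see the linked base.py for the real code); the point is that only drivers advertising chunked encoding stream the iterator through `read_in_chunks()`, so only they were hit by the one-byte-at-a-time behaviour:

```python
# Hypothetical sketch of the upload dispatch (invented names, not libcloud's
# actual code). Only the chunked path streams via read_in_chunks(), so only
# it was affected by the missing fill_size=True.
class FakeDriver:
    def __init__(self, supports_chunked_encoding):
        self.supports_chunked_encoding = supports_chunked_encoding

def upload_path(driver):
    if driver.supports_chunked_encoding:
        return 'streamed'  # e.g. Atmos, Rackspace: goes via read_in_chunks()
    return 'buffered'      # other drivers: iterator consumed in full up front

assert upload_path(FakeDriver(True)) == 'streamed'
assert upload_path(FakeDriver(False)) == 'buffered'
```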

@Kami
Member

Kami commented Dec 2, 2014

Can you please paste the whole exception with a stack trace?

I'm wondering where an int gets passed to the b function. I'm thinking this might be a programmer error (an int shouldn't get passed to the b function), but I can't be 100% sure without seeing a full stack trace.

@pzrq
Contributor Author

pzrq commented Dec 2, 2014

Stack trace from running the test:

/usr/local/Cellar/python3/3.4.2_1/bin/python3 /Applications/PyCharm.app/Contents/helpers/pycharm/utrunner.py /Users/pzrq/Projects/libcloud/libcloud/test/storage/test_cloudfiles.py::CloudFilesTests::test_upload_object_via_stream_python3_bytes_error true
Testing started at 10:51 PM ...

Error
Traceback (most recent call last):
  File "/Users/pzrq/Projects/libcloud/libcloud/test/storage/test_cloudfiles.py", line 688, in test_upload_object_via_stream_python3_bytes_error
    object_name="img_or_vid",
  File "/Users/pzrq/Projects/libcloud/libcloud/storage/base.py", line 157, in upload_object_via_stream
    iterator, self, object_name, extra=extra, **kwargs)
  File "/Users/pzrq/Projects/libcloud/libcloud/storage/drivers/cloudfiles.py", line 445, in upload_object_via_stream
    headers=headers)
  File "/Users/pzrq/Projects/libcloud/libcloud/storage/drivers/cloudfiles.py", line 779, in _put_object
    headers=headers, file_path=file_path, iterator=iterator)
  File "/Users/pzrq/Projects/libcloud/libcloud/storage/base.py", line 660, in _upload_object
    **upload_func_kwargs)
  File "/Users/pzrq/Projects/libcloud/libcloud/storage/base.py", line 749, in _stream_data
    chunk = next(generator)
  File "/Users/pzrq/Projects/libcloud/libcloud/utils/files.py", line 74, in read_in_chunks
    chunk = b(get_data(*args))
  File "/Users/pzrq/Projects/libcloud/libcloud/utils/py3.py", line 89, in b
    raise TypeError("Invalid argument %r for b()" % (s,))
TypeError: Invalid argument 98 for b()

AFAIK that's how Python 3 works with iter(b'some bytes'), per my understanding of http://stackoverflow.com/a/21017834 and the related question.

However, I did notice I only updated one of the three usages of read_in_chunks() in libcloud/storage/base.py. I verified it worked for me: I uploaded a file and got the same file back (matching hash, and the image visually looked the same) from Rackspace's Cloud Files API. If you can pass the test in a cleaner way, I'm all ears :)

@pzrq
Contributor Author

pzrq commented Dec 2, 2014

Or to put it another way, at a raw Python 3.4 console:

pzrq@icebox:~$ python3
Python 3.4.2 (default, Oct 19 2014, 17:52:17) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> for x in iter(b'hi'):
...   print(x)
... 
104
105

@pzrq
Contributor Author

pzrq commented Dec 8, 2014

@Kami Added a regression test for the fill_size=True change, please let me know if you have any other concerns.

@Kami
Member

Kami commented Dec 9, 2014

@pzrq Sorry for the delay.

I was traveling a week ago so I missed your comments. I will go over it and get it merged into trunk asap.

@asfgit asfgit closed this in 8fe3e40 Dec 9, 2014
@Kami
Member

Kami commented Dec 9, 2014

Merged into trunk.

Thanks!

@Kami
Member

Kami commented Dec 9, 2014

Also forgot to say - good catch and improvement on the fill_size thing.

@pzrq
Contributor Author

pzrq commented Dec 9, 2014

@Kami Thanks for merging. I hope you're enjoying traveling.
