Document.get_attachment() fails with binary data and when write_to is not None #102

toddreed · 2016-02-28T22:07:36Z

Using 2.0.0b2 with Python 3.5.0 and connecting to CouchDB 1.6.1.

I'm trying to access an attachment with this code:

image = Document(database, result['id'])
image_filename = '{0}.jpg'.format(image['_id'])

with open(image_filename, 'wb') as file:
    image.get_attachment('image', write_to=file, attachment_type='binary')

This produces the following exception:

  File "[redacted]/src/python/lemonaid.py", line 97, in seedannotations
    image.get_attachment('image', write_to=file, attachment_type='binary')
  File "[redacted]/lib/python3.5/site-packages/cloudant/document.py", line 396, in get_attachment
    write_to.write(resp.raw)
TypeError: a bytes-like object is required, not 'HTTPResponse'

In the debugger, the type of resp.raw is reported as <class 'requests.packages.urllib3.response.HTTPResponse'>.
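
The failure can be reproduced without CouchDB. Below is a minimal sketch that assumes io.BytesIO as a stand-in for resp.raw (both are file-like objects rather than bytes values). Copying in chunks with shutil.copyfileobj is one way to handle such a stream; it is not necessarily how the library was fixed.

```python
import io
import shutil

# Stand-in for resp.raw: a readable file-like object, not a bytes value.
raw = io.BytesIO(b"\xff\xd8\xff\xe0 fake JPEG payload")
target = io.BytesIO()  # stand-in for a file opened with mode 'wb'

try:
    target.write(raw)  # same mistake as write_to.write(resp.raw)
    error = None
except TypeError as exc:
    error = exc

# Copying chunk by chunk works for any readable binary stream.
raw.seek(0)
shutil.copyfileobj(raw, target)
```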

Also, passing 'binary' for attachment_type does not work as documented. The get_attachment() method attempts to convert binary data to Unicode:

# from get_attachment...
        if attachment_type == 'json':
            return resp.json()
        return unicode_(resp.content)

When the response is binary, the last line above produces this exception:

  File "[redacted]/python3.5/site-packages/cloudant/document.py", line 400, in get_attachment
    return unicode_(resp.content)
  File "[redacted]/lib/python3.5/site-packages/cloudant/_2to3.py", line 95, in unicode_
    return astr.decode(ENCODING) if hasattr(astr, 'decode') else astr
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
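
This error is expected for any JPEG, since JPEG data starts with the bytes 0xFF 0xD8 and 0xFF is never valid in UTF-8. A small reproduction follows, using a hand-written byte string rather than a real attachment:

```python
jpeg_prefix = b"\xff\xd8\xff\xe0"  # first bytes of a typical JPEG/JFIF file

try:
    jpeg_prefix.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Binary attachments have to stay as bytes; only text should be decoded.
text = b"plain text".decode("utf-8")
```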

I would also argue that the attachment_type argument is ill-conceived. What if the caller does not know whether the attachment is binary or text? Ideally, the method should expose the content-type of the attachment rather than presume the caller knows. (In my case, my image attachment could be either PNG or JPEG, but since get_attachment() does not provide the content-type, I'm guessing in my code right now.)


alfinkel · 2016-02-29T23:16:19Z

This does appear to be a bug. We'll take a closer look and resolve.

alfinkel · 2016-03-07T14:36:04Z

The write_to functionality has been fixed and the attachment will be handled based on the response Content-Type now. The attachment_type argument is now optional so that the caller can override the attachment handling logic if necessary.
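
As a rough illustration of the idea (not the library's actual code), handling based on Content-Type with an optional override could look like the sketch below. The helper name decode_attachment and its parameters are hypothetical.

```python
import json

def decode_attachment(raw, content_type, attachment_type=None):
    """Hypothetical helper: pick a return type from the Content-Type header."""
    if attachment_type is None:
        # Strip parameters such as "; charset=utf-8" before comparing.
        media_type = content_type.split(";")[0].strip().lower()
        if media_type == "application/json":
            attachment_type = "json"
        elif media_type.startswith("text/"):
            attachment_type = "text"
        else:
            attachment_type = "binary"
    if attachment_type == "json":
        return json.loads(raw.decode("utf-8"))
    if attachment_type == "text":
        return raw.decode("utf-8")
    return raw  # binary: hand back the bytes untouched
```

Passing attachment_type explicitly skips the Content-Type check, which mirrors the override described above.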

alfinkel · 2016-03-07T18:15:48Z

Re: #106 (comment)

If I understand correctly, the functionality you are looking for is actually already available. The content type of any attachment is stored in a document's "_attachments" metadata. So you should either already have that information before calling get_attachment() or you will have that information as part of the call to get_attachment().

The Document content would look something like:

{
  "_id": "foo",
  "_rev": ...,
  ...
  "_attachments": {
    "bar": {
      "content_type": "image/jpeg",
      ...
    }
  }
}

If you already have a fetched version of the Document locally, then you can find the content_type and have your code react accordingly.

As in:

doc = db['foo']
content_type = doc['_attachments']['bar']['content_type']
# Handle your file settings and call get_attachment
...

If you are working with a Document object that has not been fetched from the server, then get_attachment() fetches that document as an initial step. In this case you would not use the write_to argument; instead, set attachment_type='binary', which returns the raw data of the attachment. Your code can then inspect the content_type in _attachments and process the raw data returned by the method accordingly.

As in:

doc = Document(db, 'foo')
data = doc.get_attachment('bar', attachment_type='binary')
content_type = doc['_attachments']['bar']['content_type']
# Handle your file settings and process "data" as necessary
...
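
For the PNG-or-JPEG case, the "handle your file settings" step might be as simple as choosing a file extension from the content type. This is only a sketch; the attachment_filename helper and the EXTENSIONS mapping are made up for illustration.

```python
# Hypothetical mapping; extend it for the image types you expect.
EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png"}

def attachment_filename(doc_id, content_type):
    """Build a file name from the document id and the attachment's content type."""
    extension = EXTENSIONS.get(content_type, ".bin")  # fallback for unknown types
    return doc_id + extension
```

The resulting name can then be passed to open(..., 'wb') and the bytes in data written with file.write(data).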

I hope this helps.

toddreed · 2016-03-07T18:30:38Z

Yes, that helps. Thanks for showing how the content type can be obtained.

alfinkel added the bug label Feb 29, 2016

alfinkel added this to the 2.0.0 milestone Feb 29, 2016

alfinkel mentioned this issue Mar 3, 2016

Fix Document.get_attachment bug #106

Merged

alfinkel closed this as completed Mar 7, 2016

alfinkel reopened this Mar 7, 2016

alfinkel added question and removed bug labels Mar 7, 2016

alfinkel closed this as completed Mar 7, 2016

alfinkel added the bug label Mar 7, 2016
