This repository has been archived by the owner on Mar 11, 2022. It is now read-only.

Document.get_attachment() fails with binary data and when write_to is not None #102

Closed
toddreed opened this issue Feb 28, 2016 · 4 comments

Comments

@toddreed

Using cloudant 2.0.0b2 with Python 3.5.0 and connecting to CouchDB 1.6.1.

I'm trying to access an attachment with this code:

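from cloudant.document import Document

# database is an open cloudant Database; result comes from an earlier query (not shown)
image = Document(database, result['id'])
image_filename = '{0}.jpg'.format(image['_id'])

# Write the 'image' attachment straight into a local binary file
with open(image_filename, 'wb') as file:
    image.get_attachment('image', write_to=file, attachment_type='binary')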

This produces the following exception:

  File "[redacted]/src/python/lemonaid.py", line 97, in seedannotations
    image.get_attachment('image', write_to=file, attachment_type='binary')
  File "[redacted]/lib/python3.5/site-packages/cloudant/document.py", line 396, in get_attachment
    write_to.write(resp.raw)
TypeError: a bytes-like object is required, not 'HTTPResponse'

In the debugger, the type of resp.raw is reported as <class 'requests.packages.urllib3.response.HTTPResponse'>.
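Because resp.raw is a readable stream rather than a bytes object, file.write() rejects it. A minimal sketch of copying such a stream into an open binary file, assuming only that the body object has a read() method: the helper name write_stream is hypothetical, and io.BytesIO stands in for both the urllib3 response and the output file. Note that with requests, resp.raw is generally only readable when the request was made with stream=True, and reading it directly does not undo any Content-Encoding by default, so writing resp.content is the simpler option when the attachment fits in memory.

```python
import io
import shutil


def write_stream(raw, write_to, chunk_size=64 * 1024):
    """Copy a file-like response body into an open binary file.

    `raw` is any object with a read() method (for example a urllib3
    HTTPResponse). file.write() needs bytes, so the stream is read in
    chunks and each chunk is written, instead of passing the stream itself.
    """
    shutil.copyfileobj(raw, write_to, chunk_size)


# Stand-ins for resp.raw and for a file opened with mode 'wb'.
fake_raw = io.BytesIO(b"\xff\xd8\xff\xe0 fake JPEG bytes")
out = io.BytesIO()
write_stream(fake_raw, out)
print(len(out.getvalue()))
```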

Also, passing 'binary' for attachment_type does not work as documented, because get_attachment() tries to convert the binary data to Unicode:

# excerpt from get_attachment()
if attachment_type == 'json':
    return resp.json()
return unicode_(resp.content)

When the response is binary, the last line above produces this exception:

  File "[redacted]/python3.5/site-packages/cloudant/document.py", line 400, in get_attachment
    return unicode_(resp.content)
  File "[redacted]/lib/python3.5/site-packages/cloudant/_2to3.py", line 95, in unicode_
    return astr.decode(ENCODING) if hasattr(astr, 'decode') else astr
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
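The decode failure can be reproduced without CouchDB: a JPEG file begins with the bytes FF D8 FF, and 0xFF is never a valid byte in UTF-8. The function decode_body below is a hypothetical sketch of the missing 'binary' branch in the excerpt above, not the library's code.

```python
# First bytes of a JPEG file; 0xFF cannot start (or appear in) valid UTF-8.
jpeg_header = b"\xff\xd8\xff\xe0"

try:
    jpeg_header.decode("utf-8")
except UnicodeDecodeError as exc:
    # Same failure as in the traceback: position 0, "invalid start byte".
    print(exc.start, exc.reason)


def decode_body(content, attachment_type):
    """Hypothetical sketch: only decode when the caller did not ask for binary."""
    if attachment_type == "binary":
        return content  # raw bytes, returned untouched
    return content.decode("utf-8")
```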

I would also argue that the attachment_type argument is ill-conceived. What if the caller does not know whether the attachment is binary or text? Ideally, the method should expose the attachment's content type rather than presume the caller knows it. (In my case, the image attachment could be either PNG or JPEG, but since get_attachment() does not provide the content type, I'm currently guessing in my code.)

@alfinkel
Contributor

This does appear to be a bug. We'll take a closer look and resolve.

@alfinkel alfinkel added the bug label Feb 29, 2016
@alfinkel alfinkel added this to the 2.0.0 milestone Feb 29, 2016
@alfinkel
Contributor

alfinkel commented Mar 7, 2016

The write_to functionality has been fixed, and attachments are now handled based on the response's Content-Type. The attachment_type argument is now optional, so the caller can override this handling logic when necessary.
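A minimal sketch of what Content-Type-driven handling with an optional override could look like. The function name handle_attachment is hypothetical, this is not the library's actual implementation, and text is assumed to be UTF-8 (any charset parameter in the header is ignored).

```python
import json


def handle_attachment(content, content_type, attachment_type=None):
    """Hypothetical sketch of Content-Type-driven attachment handling.

    content:         the response body as bytes
    content_type:    the response's Content-Type header, e.g. "image/jpeg"
    attachment_type: optional override, one of "json", "text" or "binary"
    """
    if attachment_type is None:
        # Drop parameters such as "; charset=utf-8" and normalise case.
        mime = content_type.split(";")[0].strip().lower()
        if mime == "application/json":
            attachment_type = "json"
        elif mime.startswith("text/"):
            attachment_type = "text"
        else:
            attachment_type = "binary"

    if attachment_type == "json":
        return json.loads(content.decode("utf-8"))
    if attachment_type == "text":
        return content.decode("utf-8")
    return content  # binary: hand back the bytes unchanged
```

The override is checked first, so a caller who knows better than the server's header can still force a particular interpretation.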

@alfinkel alfinkel closed this as completed Mar 7, 2016
@alfinkel alfinkel reopened this Mar 7, 2016
@alfinkel alfinkel added question and removed bug labels Mar 7, 2016
@alfinkel
Contributor

alfinkel commented Mar 7, 2016

Re: #106 (comment)

If I understand correctly, the functionality you are looking for is already available. The content type of every attachment is stored in the document's "_attachments" metadata, so either you already have that information before calling get_attachment(), or you will have it as part of the call to get_attachment(), which fetches the document.

The Document content would look something like:

{
  "_id": "foo",
  "_rev": ...,
  ...
  "_attachments": {
    "bar": {
      "content_type": "image/jpeg",
      ...
    }
  }
}
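Because this metadata is ordinary JSON, looking up an attachment's content type is a plain dictionary lookup. A minimal sketch that also copes with a missing attachment; attachment_content_type is a hypothetical helper, and the sample doc is a trimmed-down version of the structure above.

```python
def attachment_content_type(doc, name):
    """Return the stored content_type of attachment `name`, or None.

    `doc` is any dict-like document body that may contain CouchDB's
    "_attachments" metadata.
    """
    stub = doc.get("_attachments", {}).get(name)
    return stub.get("content_type") if stub else None


doc = {
    "_id": "foo",
    "_attachments": {
        "bar": {"content_type": "image/jpeg"},
    },
}
print(attachment_content_type(doc, "bar"))
```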

If you already have a locally fetched copy of the Document, you can look up the content_type and have your code react accordingly.

As in:

doc = db['foo']
content_type = doc['_attachments']['bar']['content_type']
# Handle your file settings and call get_attachment
...
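Applied to the original PNG-or-JPEG case, the content type can decide the local filename before the file is opened for write_to. A minimal sketch; the EXTENSIONS table and attachment_filename are hypothetical, cover only the two image types mentioned in this thread, and use .bin as an assumed fallback.

```python
# Hypothetical mapping; extend it for whatever types your attachments use.
EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
}


def attachment_filename(doc_id, content_type, default_ext=".bin"):
    """Build a local filename whose extension matches the content type."""
    mime = content_type.split(";")[0].strip().lower()
    return "{0}{1}".format(doc_id, EXTENSIONS.get(mime, default_ext))


print(attachment_filename("foo", "image/png"))
```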

If you are working with a Document object that has not been fetched from the server, get_attachment() fetches the document as an initial step. In this case, don't use the write_to argument; instead, set attachment_type='binary', which returns the raw data of the attachment. Your code can then interrogate the "_attachments" content_type and process the returned raw data accordingly.

As in:

doc = Document(db, 'foo')
data = doc.get_attachment('bar', attachment_type='binary')
content_type = doc['_attachments']['bar']['content_type']
# Handle your file settings and process "data" as necessary
...

I hope this helps.

@toddreed
Author

toddreed commented Mar 7, 2016

Yes, that helps. Thanks for showing how the content type can be obtained.

@alfinkel alfinkel closed this as completed Mar 7, 2016
@alfinkel alfinkel added the bug label Mar 7, 2016

2 participants