Skip to content
This repository has been archived by the owner on Mar 11, 2022. It is now read-only.

Fix Document.get_attachment bug #106

Merged
merged 2 commits into from Mar 7, 2016
Merged

Conversation

alfinkel
Copy link
Contributor

@alfinkel alfinkel commented Mar 3, 2016

What

Fix the Document get_attachment method so that it now correctly creates text and binary files as exepected as well as returns text, binary, and json content appropriately.

How

  • Make the attachment_type method argument optional.
  • Add logic to figure out what type of content should be returned by the method as well as what kind of file should be created if a write_to argument is provided.

Testing

  • Add a test to verify getting a text attachment and writing it to a file.
  • Add a test to verify getting a json attachment and writing it to a file.
  • Add a test to verify getting a binary attachment and writing it to a file.

Reviewers

reviewer: @emlaver
reviewer: @ricellis

Issues

@emlaver
Copy link
Member

emlaver commented Mar 4, 2016

+1

@ricellis
Copy link
Member

ricellis commented Mar 4, 2016

What happens in the case that there is an attachment which is text/plain but uses a charset other than unicode? e.g. Content-type: text/plain; charset=IBM038
If using attachment_type=None it will assume text which will then hit write_to.write(resp.text), but what charset will that use? I'm guessing probably unicode by default?

@alfinkel
Copy link
Contributor Author

alfinkel commented Mar 4, 2016

re: #106 (comment)

It will (now) use utf-8 to encode. Done in b0b522c. Thanks for catching that.

@ricellis
Copy link
Member

ricellis commented Mar 4, 2016

I think b0b522c is ok for the JSON case because JSON has to be unicode.
However, I still think this might be broken for a text attachment that is in another encoding. Don't we need to read the charset type from the header to either read it in the correct charset or convert it to unicode?

@alfinkel
Copy link
Contributor Author

alfinkel commented Mar 4, 2016

The Python Requests module handles decoding/encoding. See http://docs.python-requests.org/en/master/user/quickstart/#response-content. Based on this I should probably remove the b0b522c commit.

@alfinkel
Copy link
Contributor Author

alfinkel commented Mar 4, 2016

Calling resp.text returns the encoded content.

@ricellis
Copy link
Member

ricellis commented Mar 4, 2016

Ah ok, thanks for investigating. Would you mind adding a test that uses a text attachment with some other charset, just to validate that the behaviour is as we expect?

@alfinkel
Copy link
Contributor Author

alfinkel commented Mar 4, 2016

I don't think that there is much upside in creating a test to verify that the Python Requests module is behaving as it is supposed to. At best I think all that it would prove is that it works for the charset chosen for that test. If the concern is that a charset may not be handled correctly by Requests then there is a route for a user of this method to take, which is to provide the attachment_type='binary' argument to the method call and subsequently take the returned raw response content and encode it however they want to before writing it to a file.

@ricellis
Copy link
Member

ricellis commented Mar 7, 2016

ok +1

- Change method signature and make attachment_type argument optional.
- Add logic to figure out whether attachment should returned as
text, json, or binary.
- Add additional get attachment tests
alfinkel added a commit that referenced this pull request Mar 7, 2016
@alfinkel alfinkel merged commit 763e8d3 into master Mar 7, 2016
@toddreed
Copy link

toddreed commented Mar 7, 2016

I still think that the API for get_attachment() is flawed because of the dependency between the write_to parameter and the attachment_type. Suppose the content type of the attachment is not known to the caller: they pass None for attachment_type so that get_attachment() will choose. The problem is that the caller needs to provide a write_to that is compatible. If the content type is text, then a text stream needs to be opened; if the content type is binary, a binary stream needs to be opened:

with open('attachment', 'w') as f: # will fail if attachment is binary because f expects str not bytes
    doc.get_attachment('attachment', write_to=f)

I think that a get_attachment() API needs to provide the data and the content type. (For example, in my case my content type is image/*, and I want to know is it image/png or image/jpg?)

@alfinkel
Copy link
Contributor Author

alfinkel commented Mar 7, 2016

Reopening issue #102 as a question and continuing the conversation there.

@ricellis ricellis deleted the 102-fix-get-attachment-bug branch August 3, 2016 10:23
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants