Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing decode error #405

Closed
wants to merge 6 commits into from
Closed

Fixing decode error #405

wants to merge 6 commits into from

Conversation

epifanio
Copy link

Fixing issue with UTF-8 strings when

decoding = self.charset or self.default_body_encoding

passes the wrong decoding option.

adding a try / except statement when the body.decode fails in detecting the right encoding
fixing bare except
Copy link
Member

@stevepiercy stevepiercy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial triage. Still needs further review from maintainers.

.gitignore Outdated Show resolved Hide resolved
try:
return body.decode(decoding, self.unicode_errors)
except UnicodeDecodeError:
return body.decode("UTF-8", self.unicode_errors)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes test coverage to fail. 100% test coverage is required in PRs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am not familiar with coverage test, trying to learn. I had a look at the code in test_response.py
which seems the part of coverage addressing the function I modified.
do you have any hints on what needs to be added, in order to fix the coverage issue?
Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add a test that covers that which is not covered, specifically the except block.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@epifanio to save time, you can also run tests locally before committing and pushing to this PR by installing tox and using tox -e py27,py36,coverage. Let me know if you need help with that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got the tox -e py27,py36,coverage working. From the test coverage codei see some clue can come from the following lines https://github.com/Pylons/webob/blob/master/tests/test_response.py#L758-L787. — it is my understanding I should add a new test method like def test_text_get_wrong_encoding(). To try to cover the case when a wrong encoding is passed to _text_get method. Is that correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're on the right path. I would narrow it down by running a specific test in debug mode, and step through, to see which lines are executed. Saves guesswork.

As far as its name, naming is hard. Maybe incorporate the method name, e.g., text_get_body_unicode_decode_error or something? Although longer, it gives hints to what it tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not fall back to attempting to decode with UTF-8 if the remote client has sent a charset.

IDE specific instructions should go in a personal `.gitignore` file.
@@ -18,3 +18,4 @@ WebOb.egg-info/
pytest*.xml
coverage*.xml
.pytest_cache/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't update the .gitignore, I would ask you to please rebase your changes to avoid touching this file at all.

try:
return body.decode(decoding, self.unicode_errors)
except UnicodeDecodeError:
return body.decode("UTF-8", self.unicode_errors)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not fall back to attempting to decode with UTF-8 if the remote client has sent a charset.

@digitalresistor
Copy link
Member

I am sorry, but I am going to reject this change. If the remote client has sent a charset, we shouldn't attempt to guess and fall back to decoding UTF-8. If the remote has not set a charset, we are correctly using the self.default_body_encoding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants