Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Unicode response message #41

Closed
doanguyen opened this issue Dec 8, 2018 · 4 comments
Closed

Support Unicode response message #41

doanguyen opened this issue Dec 8, 2018 · 4 comments
Assignees

Comments

@doanguyen
Copy link

Hi Cole,

I was reading the test and see the test_live.py::test_qq_login has been marked because of utf-8 unsupported. As far as I investigated, the reason is that qq smtp service return non-ascii (and even non utf-8 chars):

535 Error: \xc7\xeb\xca\xb9\xd3\xc3\xca\xda\xc8\xa8\xc2\xeb\xb5\xc7\xc2\xbc\xa1\xa3\xcf\xea\xc7\xe9\xc7\xeb\xbf\xb4: http://service.mail.qq.com/cgi-bin/help?subtype=1&&id=28&&no=1001256\r\n

I was thinking of 2 posibility to solve this problem:

  1. Using chardet to guess the response message encoding.
  2. Using re to eliminate non-utf8 (ascii) chars since it is the message, we still have response code

What do you think?

@cole
Copy link
Owner

cole commented Dec 8, 2018

Thanks for submitting. Yeah, I'm not sure how best to handle this. IIRC originally we just did UTF-8 decoding, but then the SMTP protocol actually requires ASCII responses, so I changed it to that.

I agree though this is situation where we might as well be liberal with what we accept (as you pointed out, there is still the status code). However:

  1. I don't like using chardet here as it would be nice to avoid requiring any additional libs (currently they are only required for tests).
  2. Stripping non-utf8 chars seems not ideal either, as character encoding issues could leave you with an empty response when there is actually data.

smtplib seems to avoid decoding response test altogether, which is a neat solution, although it would be a bigger change. Maybe it'll look at that though.

@doanguyen
Copy link
Author

Another possibility just come to my mind is that we can just print out the formatted bytecodes that are not fit in utf-8 chars, my temp solution just to get rid of the error is something like:

message = line[4:].strip(b" \t\r\n").decode("ascii", errors='backslashreplace')

Thanks for giving very nice library.

@cole
Copy link
Owner

cole commented Dec 8, 2018

Yeah, I was just looking at that too 😄
I think backslashreplace should allow debugging without errors in most cases. That seems the best to me right now.

Glad you're finding it useful!

@cole cole reopened this Jan 14, 2019
@cole cole self-assigned this Jan 14, 2019
@cole
Copy link
Owner

cole commented Jan 14, 2019

Fixed in 581d718.

@cole cole closed this as completed Jan 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants