Enhancements #84

Merged
merged 12 commits into from
Feb 15, 2021


nitishkansal
Contributor

  1. A few enhancements to the regexes used to parse header values.
  2. Get the filename with the Python email module instead of deriving it from Content-ID or Content-Disposition. Some email services don't follow the standard, and Content-ID is missing entirely from the attachment part.
  3. Reduced noise in error logging: the error was already handled but was still being logged. Ideally there would be a logger setting to suppress specific log levels.
  4. Changed how attachments are identified. Some email services send the HTML and plain-text parts with a Content-ID, so those parts were treated as attachments when they should be treated as the email body.
  5. Changed how the payload is handled. When get_payload() decodes a part that is 7bit, 8bit, or has no Content-Transfer-Encoding, it encodes the text with raw-unicode-escape, which leaves the message as unreadable gibberish. So after calling get_payload() with decode, we check whether the Content-Transfer-Encoding was one of those culprits and, if so, decode the result with raw-unicode-escape, so that the message is back as it was sent before it is passed to ported_string().
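
For readers following items 2 and 4, here is a minimal sketch of the idea using only the standard library. It is not the code in this PR: `is_attachment` and the sample message are hypothetical, and the rule it applies (a text part with no filename and no `attachment` disposition is body, even if it carries a Content-ID) is an assumed simplification.

```python
import email
from email.message import Message


def is_attachment(part: Message) -> bool:
    """Hypothetical heuristic: decide whether a MIME part is an attachment."""
    if part.is_multipart():
        return False  # containers are never attachments themselves
    if part.get_content_disposition() == "attachment":
        return True
    # get_filename() reads the Content-Disposition "filename" parameter and
    # falls back to the Content-Type "name" parameter; it ignores Content-ID.
    if part.get_filename() is not None:
        return True
    # A text part without a filename is treated as body, Content-ID or not.
    return part.get_content_maintype() != "text"


RAW_MESSAGE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="b"\n'
    "\n"
    "--b\n"
    "Content-Type: text/html; charset=utf-8\n"
    "Content-ID: <body-1>\n"
    "\n"
    "<p>hello</p>\n"
    "--b\n"
    'Content-Type: application/pdf; name="report.pdf"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "JVBERi0=\n"
    "--b--\n"
)

message = email.message_from_string(RAW_MESSAGE)
leaf_parts = [p for p in message.walk() if not p.is_multipart()]
attachment_names = [p.get_filename() for p in leaf_parts if is_attachment(p)]
body_types = [p.get_content_type() for p in leaf_parts if not is_attachment(p)]
```

In this sample the HTML part has a Content-ID but no filename, so it stays in the body, while the PDF part is picked up through the `name` parameter even though it has neither Content-ID nor Content-Disposition.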

We also checked the test suite that ships with the library, but it only checks whether email parsing works, not whether the encoding is preserved. So the test suite passes while the emails are left with gibberish characters.
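
A check of the kind described as missing might look like the sketch below. It assumes the message is parsed from a `str` with `email.message_from_string`, which is the case where `get_payload(decode=True)` falls back to raw-unicode-escape for non-ASCII text. `decode_text_part` is a hypothetical helper illustrating item 5, not the function added in this PR.

```python
import email
from email.message import Message

ORIGINAL_TEXT = "h\u00e9llo \u2713\n"  # non-ASCII body text
RAW_MESSAGE = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
) + ORIGINAL_TEXT


def decode_text_part(part: Message) -> str:
    """Hypothetical helper: return the text of a non-multipart part."""
    cte = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
    payload = part.get_payload(decode=True)
    if payload is None:
        return ""
    if cte in ("", "7bit", "8bit"):
        # Undo the raw-unicode-escape encoding applied by get_payload().
        return payload.decode("raw-unicode-escape")
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # unknown charset name in the header
        return payload.decode("utf-8", errors="replace")


message = email.message_from_string(RAW_MESSAGE)
escaped_bytes = message.get_payload(decode=True)  # raw-unicode-escape bytes
decoded_text = decode_text_part(message)
assert decoded_text == ORIGINAL_TEXT  # the encoding survived the round trip
```

One caveat with this approach: body text that literally contains a backslash followed by `u` and four hex digits would also be unescaped, so the round trip is only exact when the original has no such sequences.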

PS: We have been using this library for almost 8-9 months now and it has been great to use. We were still facing encoding issues, though, so we investigated and made some changes to the library. We have been running with these updates for almost 6-7 months, and our encoding complaints are down by 95%-96%. The main problems were with Unicode characters and different Content-Type values, which are mostly resolved now.

… of getting it from content id or content disposition as some of the emails are breaking because they send body with content-id
…m by mail.domain.com with esmtp envelope-from <support@domain.com> id 1jt7Nz-0000Da-by for xyz@domain.com; Wed, 08 Jul 2020 10:33:11 +0000
…id by mail.domain.com with esmtps TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256 envelope-from <local@domain.co.id> id 1jr0oT-0006e6-Mx for local@domain.com; Thu, 02 Jul 2020 15:07:51 +0000
…6 by smtpd.kaskus.co.id Postfix with ESMTP id 8C02C2E063E for <formail@ctemplar.com>; Wed, 8 Jul 2020 18:40:03 +0700 WIB
…om with XMail 1.2 password ESMTP Server id <S000000> for <local@domain.com> from <local@domain.com>; Mon, 6 Jul 2020 01:09:35 +0900
…ught and handled already and we dont even need to parse this header, just stopping some noise in our error logger
… all encoding which are not decoded properly by python
@The-Hidden-Hand

@fedelemantuano
Hello, are you accepting pull requests?

@fedelemantuano
Contributor

The PR looks good. I'm looking into it.
