Mails with blocks added after underscore are not correctly managed #13

Closed
jbaranguan opened this issue Apr 12, 2023 · 10 comments

Comments

@jbaranguan

jbaranguan commented Apr 12, 2023

Hi,

Your lib is great! Thank you!

Nevertheless, I have an issue when I parse a forwarded message that ends with an automatically inserted block placed after a line of multiple "_" characters.

A reproducer:

I transfer you that mail.

De : Jorge BARANGUAN <baranguan@hotmail.com>
Envoyé : jeudi 6 avril 2023 16:17
À : Jorge BARANGUAN <jorge.baranguan@iwecloud.com>
Objet : ***URGENT** 9673155358 nos réf


MY body email...
  ________________________________
  This email (including any attachments) is intended for the designated recipient(s) only, and may be confidential, non-public, proprietary, and/or protected by the attorney-client or other privilege. Unauthorized reading, distribution, copying or other use of this communication is prohibited and may be unlawful. Receipt by anyone other than the intended recipient(s) should not be deemed a waiver of any privilege or protection. If you are not the intended recipient or if you believe that you have received this email in error, please notify the sender immediately and delete all copies from your computer system without reading, saving, printing, forwarding or using it in any manner. Although it has been checked for viruses and other malicious software (\"malware\"), we do not warrant, represent or guarantee in any way that this communication is free of malware or potentially damaging defects. All liability for any actual or alleged loss, damage, or injury arising out of or resulting in any way from the receipt, opening or use of this email is expressly disclaimed.

When performing new EmailForwardParser().read(mailBody, "***URGENT** 9673155358 nos réf"), the lib detects the part after the ____ (This email (including any attachments) is intended for the designated recipient(s) only...) as the forwarded email, hence I cannot extract the from/to information.

Do you think it could be fixed by removing these groups of _ characters before parsing?
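A minimal sketch of the pre-processing idea above, for illustration only. `stripUnderscoreRules` is a hypothetical helper, not part of the library, and the threshold of five underscores is an assumption:

```javascript
// Hypothetical helper (not part of email-forward-parser).
// Drops every line that consists only of a run of underscores,
// optionally surrounded by whitespace. The minimum run length (5)
// is an assumption chosen for illustration.
function stripUnderscoreRules(body) {
  return body
    .split(/\r?\n/)
    .filter((line) => !/^\s*_{5,}\s*$/.test(line))
    .join("\n");
}

// Intended use, with the call from this issue:
// new EmailForwardParser().read(stripUnderscoreRules(mailBody), subject);
```

One caveat: this removes every underscore rule, so it would also remove a genuine separator in emails where such a line really does introduce the forwarded message.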

@eliottvincent
Member

Hey Jorge! Can you please provide me with the full export of that email? With headers etc. Like an .EML file or even .txt.
Furthermore, from what email client was the email forwarded?

@jbaranguan
Author

I'm afraid that I cannot provide you the exact full export of the email because it contains personal data of our clients. I made a first round of anonymization of the content to try to remove some personal data.

The email was forwarded from Outlook 2019 to our platform, and the body-plain is provided by mailgun.js, our mail provider. Attached is the JSON file saved by our WS when it was received from Mailgun. I reproduced the problem using this transformed email.

email.txt

@jbaranguan
Author

The email body contains a thread of forwarded messages, and I cannot tell which mail client is used by the user whose messages get the automatic "This email (including..." block.

@eliottvincent
Member

Thanks for the anonymized email!

Could you please send me a screenshot showing the specific version of Outlook? I think it's the "new" Outlook 2019. In that version, there is no separator anymore, which makes the parsing really difficult, especially with a long chain of email replies / email forwards (your case).

What happens is that the ________________________________ part at the end acts as a false positive, as it's the exact separator used by Outlook 365 / Outlook Live. And this library "prefers" an exact separator rather than no separator at all.

If we delete it, the parsing is successful. There is one remaining issue with recipients who have a comma in their name (e.g. "C,A" or "LBRN, NFZ"), which are wrongly parsed because I never expected this format. I will update the library to fix this.

For the ________________________________ thing, I need to find a solution to avoid detecting this as a false positive.
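One possible direction, sketched under assumptions and not taken from the library's code: only treat an underscore line as a separator when a header-like line follows shortly after it. The names `RULE`, `HEADER`, and `isLikelySeparator` are hypothetical, and the header labels cover just a few English and French fields:

```javascript
// Hypothetical sketch, not the library's actual implementation.
// A long run of underscores on its own line (length threshold assumed).
const RULE = /^\s*_{10,}\s*$/;
// Assumed header labels: a handful of English and French field names.
const HEADER = /^\s*(From|De|Sent|Envoyé|To|À|Subject|Objet)\s*:/i;

// Returns true only if lines[index] is an underscore rule AND one of
// the next `lookahead` lines looks like a forwarded-message header.
function isLikelySeparator(lines, index, lookahead = 3) {
  if (!RULE.test(lines[index])) return false;
  return lines
    .slice(index + 1, index + 1 + lookahead)
    .some((line) => HEADER.test(line));
}
```

With the reproducer from this issue, the underscore line before the legal disclaimer is followed by "This email (including...", so it would be rejected. A disclaimer that happens to start with a word like "To:" would still slip through, so this is a heuristic at best.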

@jbaranguan
Author

jbaranguan commented Apr 13, 2023

Thanks for the quick response!

I cannot get a screenshot of the specific Outlook version, because the user is a client of one of our clients.

I was thinking that one best-effort approach would be to discard the detected email text if it does not contain a proper forwarded email (no from, to, subject, etc.) and then iterate over the remaining body until you find a well-formatted email. What do you think?

EDIT: This approach does not work either with a thread like my example. The parsing should be performed following the email order, otherwise you will always find emails in the middle of the thread if they have a separator that is handled with a higher priority, right?
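To illustrate the point about following the email order, here is a rough sketch that picks whichever candidate appears first in the body instead of ranking candidates by pattern priority. Everything in it is hypothetical: `CANDIDATES`, its two patterns, and `findEarliestCandidate` are made up for this example and do not reflect the library's real patterns.

```javascript
// Hypothetical candidate patterns, for illustration only.
const CANDIDATES = [
  // A line made of a long run of underscores.
  { name: "underscore-rule", regex: /^[ \t]*_{10,}[ \t]*$/m },
  // A separator-less forward that starts directly with a sender header.
  { name: "header-block", regex: /^[ \t]*(From|De)[ \t\u00A0]*:.*$/m },
];

// Returns the candidate whose first match starts earliest in the body,
// or null when nothing matches.
function findEarliestCandidate(body) {
  let best = null;
  for (const { name, regex } of CANDIDATES) {
    const match = regex.exec(body);
    if (match && (best === null || match.index < best.index)) {
      best = { name, index: match.index };
    }
  }
  return best;
}
```

On a body shaped like my reproducer, the "De :" header block comes before the trailing underscore line, so the header block would be chosen even though the underscore line is an exact separator match.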

@eliottvincent
Member

That's exactly true, the higher the better. In fact this is already enforced but there is an edge case when the highest email has no separator at all.

I can definitely improve things. I'll have a look at this in the coming days!

@eliottvincent
Member

Hey! I have improved the support for nested emails, v1.4.0 will fix your issues.

Let me know ;)

@jbaranguan
Author

jbaranguan commented May 16, 2023 via email

@eliottvincent
Member

Hey there! Were you able to test?

@jbaranguan
Author

Hey! Yes, I did, and it works much better :)

We're releasing a new version to production today that includes your fix. I hope it will resolve all our support tickets on this! 👍

I'm closing the ticket.

Thank you very much!
