Mails with blocks added after underscore are not correctly managed #13

jbaranguan · 2023-04-12T13:16:33Z

Hi,

Your lib is great! Thank you!

Nevertheless, I have an issue when I parse a forwarded message that contains an automatically inserted block, appended at the end after a line of multiple "_" characters.

A reproducer:

I transfer you that mail.

De : Jorge BARANGUAN <baranguan@hotmail.com>
Envoyé : jeudi 6 avril 2023 16:17
À : Jorge BARANGUAN <jorge.baranguan@iwecloud.com>
Objet : ***URGENT** 9673155358 nos réf


MY body email...
  ________________________________
  This email (including any attachments) is intended for the designated recipient(s) only, and may be confidential, non-public, proprietary, and/or protected by the attorney-client or other privilege. Unauthorized reading, distribution, copying or other use of this communication is prohibited and may be unlawful. Receipt by anyone other than the intended recipient(s) should not be deemed a waiver of any privilege or protection. If you are not the intended recipient or if you believe that you have received this email in error, please notify the sender immediately and delete all copies from your computer system without reading, saving, printing, forwarding or using it in any manner. Although it has been checked for viruses and other malicious software (\"malware\"), we do not warrant, represent or guarantee in any way that this communication is free of malware or potentially damaging defects. All liability for any actual or alleged loss, damage, or injury arising out of or resulting in any way from the receipt, opening or use of this email is expressly disclaimed.

When performing new EmailForwardParser().read(mailBody, "***URGENT** 9673155358 nos réf"), the lib detects the part after the ____ (This email (including any attachments) is intended for the designated recipient(s) only...) as the forwarded email, hence I cannot extract the from/to information.

Do you think that it could be fixed by removing these groups of _ characters before parsing?
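A minimal sketch of the workaround asked about above, applied on the caller's side before the body is handed to the parser. The helper name `stripUnderscoreLines` and the threshold of 10 underscores are assumptions made for illustration and are not part of the library. Note that it removes every underscore-only line, so it would also drop a line of underscores that a mail client uses as a genuine forward separator.

```javascript
// Hypothetical caller-side helper (not part of the library).
// Drops every line that consists only of underscores (plus optional
// surrounding whitespace). The minimum run length is an assumed value.
function stripUnderscoreLines(body, minLength = 10) {
  const underscoreOnly = new RegExp(`^\\s*_{${minLength},}\\s*$`);
  return body
    .split(/\r?\n/)
    .filter((line) => !underscoreOnly.test(line))
    .join("\n");
}

// Shortened version of the reproducer from this report.
const reproducerBody = [
  "I transfer you that mail.",
  "",
  "De : Jorge BARANGUAN <baranguan@hotmail.com>",
  "Objet : ***URGENT** 9673155358 nos réf",
  "",
  "MY body email...",
  "  ________________________________",
  "  This email (including any attachments) is intended for the designated recipient(s) only",
].join("\n");

const cleanedBody = stripUnderscoreLines(reproducerBody);
```

`cleanedBody` could then be passed to `read` in place of the raw body, as in the call shown in the report.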


eliottvincent · 2023-04-12T13:22:49Z

Hey Jorge! Can you please provide me with the full export of that email? With headers etc. Like an .EML file or even .txt.
Furthermore, from what email client was the email forwarded?

jbaranguan · 2023-04-12T14:33:05Z

I'm afraid that I cannot provide you with the exact full export of the email because it contains personal data of our clients. I did a first round of anonymization on the content to remove some of that personal data.

The email was forwarded from Outlook 2019 to our platform, and the body-plain is provided by mailgun.js, our mail provider. Attached is the JSON file saved by our WS when the email was received from Mailgun. I reproduced the problem using this transformed email.

email.txt

jbaranguan · 2023-04-12T14:34:40Z

The email body contains a thread of forwarded messages, and I cannot say which mailer is used by the user whose messages get the automatic block ("This email (including...") inserted.

eliottvincent · 2023-04-12T18:55:39Z

Thanks for the anonymized email!

Could you please send me a screenshot showing the specific version of Outlook? I think it's the "new" Outlook 2019. In that version, there is no separator anymore, which makes the parsing really difficult, especially with a long chain of email replies and forwards (your case).

What happens is that the ________________________________ part at the end acts as a false positive, as it's the exact separator used by Outlook 365 / Outlook Live. And this library "prefers" an exact separator rather than no separator at all.

If we delete it, the parsing is successful. There is one remaining issue with recipients that have a comma in their name (e.g. "C,A" or "LBRN, NFZ"), which are wrongly parsed because I never expected this format. I will update the library to fix this.

For the ________________________________ thing, I need to find a solution to avoid detecting this as a false positive.
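One possible way to tell a genuine separator from the disclaimer case, sketched here purely as an illustration and not as what the library does: accept an underscore line as a separator only when a header-like line follows shortly after it. The helper name `isLikelySeparator`, the lookahead of 3 lines and the list of header labels (the French ones seen in this thread plus assumed English equivalents) are all assumptions.

```javascript
// Hypothetical helper (not the library's implementation).
// The header labels are assumptions: the French labels from this thread
// plus their presumed English equivalents.
const UNDERSCORE_LINE = /^\s*_{10,}\s*$/;
const HEADER_LINE = /^\s*(From|De|Sent|Envoyé|To|À|Subject|Objet)\s*:/i;

// Returns true when lines[index] is an underscore-only line AND one of the
// next `lookahead` non-empty lines looks like a forwarded-message header.
function isLikelySeparator(lines, index, lookahead = 3) {
  if (!UNDERSCORE_LINE.test(lines[index])) return false;
  let inspected = 0;
  for (let i = index + 1; i < lines.length && inspected < lookahead; i++) {
    if (lines[i].trim() === "") continue; // blank lines do not count
    if (HEADER_LINE.test(lines[i])) return true;
    inspected++;
  }
  return false;
}

// Underscore line followed by headers: treated as a separator.
const withHeaders = [
  "See below.",
  "________________________________",
  "From: Someone <someone@example.com>",
  "Subject: Hello",
];

// Underscore line followed by a disclaimer: not treated as a separator.
const withDisclaimer = [
  "MY body email...",
  "  ________________________________",
  "  This email (including any attachments) is intended for the designated recipient(s) only",
];

const headersResult = isLikelySeparator(withHeaders, 1);
const disclaimerResult = isLikelySeparator(withDisclaimer, 1);
```

With these inputs, `headersResult` is `true` and `disclaimerResult` is `false`.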

jbaranguan · 2023-04-13T07:56:14Z

Thanks for the quick response!

I cannot take a screenshot of the specific version of Outlook, as the user is a client of one of our clients.

I was thinking that one possibility, in a best-effort mode, would be to discard the detected email text if it does not contain a proper forwarded email (no from, to, subject, etc.) and to iterate on the remaining body until you find a well-formatted email. What do you think?

EDIT: This approach does not work either with a thread like my example. The parsing should be performed following the email order, otherwise you will always find emails in the middle of the thread if they have a separator that is handled with a higher priority, right?
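The ordering idea in the EDIT above can be sketched as follows: scan the body from the top and take the first line that matches any candidate, instead of picking the candidate type with the highest priority wherever it occurs. This is only an illustration under assumed names and patterns (`CANDIDATES`, `findFirstForwardStart`, and the two regular expressions); it is not the library's detection logic.

```javascript
// Hypothetical candidate patterns (assumptions, not the library's own).
const CANDIDATES = [
  { kind: "underscore-separator", pattern: /^\s*_{10,}\s*$/ },
  { kind: "header-block", pattern: /^\s*(From|De)\s*:/i },
];

// Walks the body line by line and returns the first line that matches any
// candidate, so position in the thread wins over the candidate's priority.
function findFirstForwardStart(body) {
  const lines = body.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    for (const { kind, pattern } of CANDIDATES) {
      if (pattern.test(lines[i])) return { line: i, kind };
    }
  }
  return null; // no forwarded message detected
}

// Shortened version of the reproducer from this report.
const threadBody = [
  "I transfer you that mail.",
  "",
  "De : Jorge BARANGUAN <baranguan@hotmail.com>",
  "Objet : ***URGENT** 9673155358 nos réf",
  "",
  "MY body email...",
  "  ________________________________",
  "  This email (including any attachments) is intended for the designated recipient(s) only",
].join("\n");

const firstStart = findFirstForwardStart(threadBody);
```

For this shortened reproducer, `firstStart` points at line index 2 (the "De :" header) rather than at the underscore line further down, even though the underscore pattern is listed first.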

eliottvincent · 2023-04-14T08:13:02Z

That's exactly true, the higher the better. In fact this is already enforced but there is an edge case when the highest email has no separator at all.

I can definitely improve things. I'll have a look at this in the coming days!

eliottvincent · 2023-05-10T20:28:18Z

Hey! I have improved the support for nested emails, v1.4.0 will fix your issues.

Let me know ;)

jbaranguan · 2023-05-16T12:30:38Z

Hello! Thank you very much for the fix. I think that I will be able to test it next week, I have a lot of work to do for now, I'll let you know! Have a nice day!


eliottvincent · 2023-06-16T14:42:35Z

Hey there! Were you able to test?

jbaranguan · 2023-06-16T14:50:34Z

Hey! Yes, I did, and it works much better :)

We're releasing a new version in production today containing your fix, I hope it will fix all our support tickets on that! 👍

I'm closing the ticket.

Thank you very much!

jbaranguan closed this as completed Jun 16, 2023
