Replace MIME parser completely with some lite version for critical parts #18

Closed
exander77 opened this issue Apr 17, 2020 · 33 comments
Labels
bug Something isn't working duplicate This issue or pull request already exists

Comments

@exander77

exander77 commented Apr 17, 2020

I have encountered several problems inside Bridge and during message encryption on the server as well. The culprit seems to be that during encryption ProtonMail tries to parse messages even though this is not actually needed, which causes a lot of problems. There are a million and one things that could go wrong.

I suggest replacing the MIME parser for critical tasks with a lite version.
It should be able to find the recipient and the message body, replace the message body with the encrypted one, and add headers. It should not do anything else with the message. It should work even if the message does not comply with the RFCs.

I think that finding the recipient (to decide the key), finding the body, replacing it, and adding headers should be safe enough to never fail. I mean like 100%, because there are strict conditions on these which need to be met to even deliver the mail. For the encryption, nothing else is really needed, which in other words means nothing else should be done.
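A minimal sketch of what such a lite pass could look like, in Python for brevity. Everything here is an assumption for illustration: the function names, the `X-Lite-Encrypted` header, and the use of base64 as a stand-in for the real encryption step are hypothetical and not taken from Bridge. The point is only that the message is split at the first empty line and nothing else is interpreted, so the operation is reversible byte for byte even on a non-compliant message.

```python
import base64
import re

# First "To:" header, including folded continuation lines.
TO_RE = re.compile(rb"^To:(.*(?:\r?\n[ \t]+.*)*)", re.IGNORECASE | re.MULTILINE)
ADDR_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+")


def split_message(raw: bytes):
    """Split at the first empty line without interpreting anything else."""
    m = re.search(rb"\r?\n\r?\n", raw)
    if m is None:  # no body at all: treat everything as headers
        return raw, b"", b""
    return raw[:m.start()], m.group(0), raw[m.end():]


def find_recipients(headers: bytes):
    """Return the addresses of the first To: header, or an empty list."""
    m = TO_RE.search(headers)
    return ADDR_RE.findall(m.group(1)) if m else []


def encrypt_in_place(raw: bytes, encrypt, extra_headers):
    """Prepend headers and replace the body; existing headers stay untouched."""
    headers, sep, body = split_message(raw)
    eol = b"\r\n" if sep.startswith(b"\r\n") else b"\n"
    added = b"".join(name + b": " + value + eol for name, value in extra_headers)
    return added + headers + sep + encrypt(body)


def decrypt_in_place(stored: bytes, decrypt, added_count: int):
    """Drop exactly the prepended header lines, then restore the body."""
    rest = stored
    for _ in range(added_count):
        rest = rest.split(b"\n", 1)[1]
    headers, sep, body = split_message(rest)
    return headers + sep + decrypt(body)


# A deliberately sloppy message: duplicate parameter, NUL byte, bare LF.
raw = (b"From: bob@example.org\r\n"
       b"To: Alice <alice@example.org>,\r\n\tcarol@example.org\r\n"
       b"Content-Type: text/plain; charset=x; charset=y\r\n"
       b"\r\n"
       b"body with \x00 binary and a bare\nLF\r\n")

recipients = find_recipients(split_message(raw)[0])
# base64 is only a placeholder for the real (reversible) encryption step.
stored = encrypt_in_place(raw, base64.b64encode, [(b"X-Lite-Encrypted", b"1")])
restored = decrypt_in_place(stored, base64.b64decode, 1)
print(recipients, restored == raw)
```

Because the body is treated as an opaque byte string, the duplicate `charset` parameter and the stray line endings never reach a parser and cannot cause a failure.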

This supersedes #8.

If the UI shows a broken email, it is fine, but if the email gets broken during encryption, it is not fine.

My opinion is that after decryption is completed you should get exactly the same email, to a single bit. This is currently not the case at all, and with the current approach of complete mail parsing it is impossible.

@exander77 exander77 changed the title Replace MIME parser completely with some lite version Replace MIME parser completely with some lite version for critical parts Apr 17, 2020
@exander77
Author

This is an aggregation of errors encountered during the migration of a larger batch of emails:
Almost 300 emails failed to import.

Sorted by the number of occurrences, which is the first item on each line.

General errors:

    168 NO mime: invalid media parameter
     50 NO non-utf8 content without charset specification
     38 NO multipart: NextPart: EOF
     24 NO mime: duplicate parameter name
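
The two most frequent errors above look like strict media-type parameter parsing rejecting the whole message. As an illustration only, here is a lenient alternative sketched in Python: it skips malformed parameters and keeps the first of any duplicates instead of failing. The function name `lenient_media_type` is hypothetical, and the naive split on `;` is a known simplification that would mishandle a quoted value containing a semicolon.

```python
def lenient_media_type(value: str):
    """Parse a Content-Type value without ever raising on bad parameters."""
    pieces = [p.strip() for p in value.split(";")]
    # Fall back to a generic type when the type itself is missing.
    media_type = pieces[0].lower() or "application/octet-stream"
    params = {}
    for piece in pieces[1:]:
        name, sep, val = piece.partition("=")
        name = name.strip().lower()
        if not sep or not name:
            continue  # malformed parameter: skip it instead of failing
        # setdefault keeps the first occurrence of a duplicated name.
        params.setdefault(name, val.strip().strip('"'))
    return media_type, params


parsed = lenient_media_type('Text/Plain; charset="utf-8"; charset=latin1; =oops; name')
print(parsed)
```

A parser like this can still decide that a message looks suspicious, but it leaves that decision to the caller rather than refusing the import.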

Specific multipart errors:

      2 NO multipart: unexpected line in Next(): "idka&utm_campa=\r\n"
      2 NO multipart: unexpected line in Next(): "ansparent; text-decoration: =\r\n"
      2 NO multipart: unexpected line in Next(): ".2011&utm_content=3DElektro&utm_campaign=3DListopad\"><img bo=\r\n"
      2 NO malformed MIME header line: DÄujeme za Váš zájem o produkty z internetového obchodu HyperMotoShop.cz. VaÅ¡e objednávka byla pÅijata a bude v nejbližší dobÄyÅízena.
      2 NO malformed MIME header line: --bed3fc21cebe086ca423bad0f7ed139b0--
      2 NO malformed MIME header line: --be2b4fba2ec876e0d6baea9120d5eabde--
      2 NO malformed MIME header line: --9929ef3793b8ac4f7afe31260fa69e6a--

I can supply specific messages if needed.

@cuthix
Collaborator

cuthix commented Apr 22, 2020

Thanks for more details. It would be nice to have some of the messages which are failing. If they don't contain any personal information, can you send the eml files to bridge@protonmail.ch? Thanks 🤗

It will be solved by #8, so I will close this as a duplicate.

@cuthix cuthix closed this as completed Apr 22, 2020
@cuthix cuthix added the duplicate This issue or pull request already exists label Apr 22, 2020
@exander77
Author

@cuthix Is it possible to get a testing account for this purpose with at least 5GB storage? I have already lost some emails through the process (but was able to recover them). I needed to fix several of them by running them through reformime -r7. And I would need to start all over. Or maybe even better... can Bridge be run in some testing mode without actually storing messages on the server (just parse, encrypt)? That would speed up the process of testing.

I have also pushed fixes into imapsync to work with the ProtonBridge:
imapsync/imapsync#163
https://github.com/exander77/imapsync/tree/dev

ProtonBridge is rather talkative when doing an append, and it broke some not so bulletproof regular expressions in the Perl IMAPClient library. I can now test with my patched imapsync.

@exander77
Author

I have modified imapsync to store all problematic messages. I will attach them and send them to bridge@protonmail.ch. I would like to note that there is also a similar problem when receiving messages (not related to Bridge). I have already reported that to support.

@exander77
Author

I have created a new alias, which seems to be sufficient. I am now running the whole migration again. I discovered yet another problem: some emails are silently dropped, but OK APPEND successful is reported. Another data loss...

@exander77
Author

Please reopen this, this is not a duplicate. This is the only sane way forward.

@cuthix cuthix reopened this Apr 27, 2020
@cuthix
Collaborator

cuthix commented Apr 27, 2020

If you would like to keep this open I don't mind. From the dev point of view this and #8 have the same solution: we need to rework the way MIME is parsed before import to the API.

@cuthix cuthix added the bug Something isn't working label Apr 27, 2020
@exander77
Author

exander77 commented Apr 27, 2020

@cuthix I am not sure. The problems with the MIME parser I have experienced so far are pretty serious. And I am not talking only about Bridge, the problem is on the ProtonMail Servers as well. I think they handle incoming emails in a similar way.

I just found and reported to support a new problem:
When I receive an email with an attachment encoded as quoted-printable, it breaks the attachment by replacing \r\n with \n in a binary file. :( The binary file is then completely corrupted. The Bridge will be affected similarly.
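
To make the failure mode concrete, here is a small Python illustration. The quoted-printable fragment is made up (it only imitates the shape of a zip attachment), and the `replace` call is an assumption about what the normalization step does, based on the symptom described above; it is not ProtonMail's code.

```python
import quopri

# Hypothetical quoted-printable attachment fragment: "=\r\n" is a soft line
# break (removed on decoding), a plain "\r\n" is a hard break and is data.
qp_part = b'PK=03=04=0A=\r\n=00=00<?xml version=3D"1.0"?>\r\n<Policy/>\r\n'

decoded = quopri.decodestring(qp_part)

# What a text-oriented "newline standardization" would do to the payload.
normalized = decoded.replace(b"\r\n", b"\n")

print(decoded)
print(len(decoded) - len(normalized))  # bytes silently dropped from the file
```

For a text part the lost carriage returns are harmless. For a zip file, every offset and checksum after the first dropped byte no longer matches, so the archive cannot be opened.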

If you guys think you can write a perfect MIME parser for both the Bridge and the Server, go for it. But so far it is not even passable (do not be offended in any way). It fails on so many things and I am not sure you can make it. I am not even sure it is possible to make it, with all the possible encodings and non-standard behaviour the parser may encounter. So I am not even sure I can blame you for not writing a perfect parser. What I really think is that this is not the way to go.

I think you regularly break millions of emails without even knowing it. And that sincerely is scary. Regular people may not notice... they will just think the mail is broken for some reason. But in every case I have investigated so far, ProtonMail was the cause.

I have to repeat myself here: My opinion is that after decryption is completed you should get exactly the same email, to a single bit. This is currently not the case at all, and with the current approach of complete mail parsing it is impossible.

Edit: Added horrifying comparison of the corrupted email attachment.
Screenshot_2020-04-27_15-05-49

@cuthix
Collaborator

cuthix commented Apr 27, 2020

When I receive an email with attachment encoded as quoted-printable

Just to be sure I understand.. This applies to web-app as well, right?

My opinion is that after decryption is completed you should get exactly the same email, to a single bit.

This email was not sent from ProtonMail user and it was not encrypted by sender but by our server on receiving, right?

Edit: Added horrifying comparison of the corrupted email attachment.

I suspect that this might be bridge encoding issue (after decryption). What did you diff here?

@exander77
Author

exander77 commented Apr 27, 2020

When I receive an email with attachment encoded as quoted-printable

Just to be sure I understand.. This applies to web-app as well, right?

I have to check that first.

My opinion is that after decryption is completed you should get exactly the same email, to a single bit.

This email was not sent from ProtonMail user and it was not encrypted by sender but by our server on receiving, right?

Yes, it was encrypted by ProtonMail Servers and sent from outside of ProtonMail (not encrypted at all).

Edit: Added horrifying comparison of the corrupted email attachment.

I suspect that this might be bridge encoding issue (after decryption). What did you diff here?

No, the email was downloaded from the Web UI and the attachment as well. The diff is between the original attachment and the file downloaded from the Web UI.

The mail was completely reworked by the ProtonMail Server, so Bridge can't make it worse. Original:

------=_NextPart_001_00C7_01D61C8E.D584C3F0
Content-Type: application/x-zip-compressed;
	name="anyconnect.zip"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="anyconnect.zip"

PK=03=04=0A=
=00=00=00=00=00Fj=95Px!=EB5=95=05=00=00=95=05=00=00=19=00=00=00AnyConnect=
LocalPolicy.xml<?xml version=3D"1.0" encoding=3D"UTF-8"?>
<AnyConnectLocalPolicy xmlns=3D"http://schemas.xmlsoap.org/encoding/" =
xmlns:xsi=3D"http://www.w3.org/2001/XMLSchema-instance" =
xsi:schemaLocation=3D"http://schemas.xmlsoap.org/encoding/ =

From ProtonMail:

-----------------------e839d213a401c6d6a45003b37c090ba5
Content-Type: application/x-zip-compressed; filename="anyconnect.zip"; name="anyconnect.zip"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="anyconnect.zip"; name="anyconnect.zip"

UEsDBAoAAAAAAEZqlVB4Ies1lQUAAJUFAAAZAAAAQW55Q29ubmVjdExvY2FsUG9saWN5LnhtbDw/
eG1sIHZlcnNpb249IjEuMCIgZW5jb2Rpbmc9IlVURi04Ij8+CjxBbnlDb25uZWN0TG9jYWxQb2xp
Y3kgeG1sbnM9Imh0dHA6Ly9zY2hlbWFzLnhtbHNvYXAub3JnL2VuY29kaW5nLyIgeG1sbnM6eHNp
PSJodHRwOi8vd3d3LnczLm9yZy8yMDAxL1hNTFNjaGVtYS1pbnN0YW5jZSIgeHNpOnNjaGVtYUxv
Y2F0aW9uPSJodHRwOi8vc2NoZW1hcy54bWxzb2FwLm9yZy9lbmNvZGluZy8gQW55Q29ubmVjdExv
Y2FsUG9saWN5LnhzZCIgYWN2ZXJzaW9uPSI0LjguMDIwNDUiPgo8QnlwYXNzRG93bmxvYWRlcj5m

It seems that ProtonMail received a quoted-printable encoded attachment, broke it, and re-encoded it as base64.

@exander77
Author

exander77 commented Apr 27, 2020

@cuthix So, good news, Bridge actually processes this mail fine.

Bridge imported original:

Content-Type: application/x-zip-compressed; filename="anyconnect.zip"; name="anyconnect.zip"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="anyconnect.zip"; name="anyconnect.zip"

UEsDBAoAAAAAAEZqlVB4Ies1lQUAAJUFAAAZAAAAQW55Q29ubmVjdExvY2FsUG9saWN5LnhtbDw/
eG1sIHZlcnNpb249IjEuMCIgZW5jb2Rpbmc9IlVURi04Ij8+DQo8QW55Q29ubmVjdExvY2FsUG9s
aWN5IHhtbG5zPSJodHRwOi8vc2NoZW1hcy54bWxzb2FwLm9yZy9lbmNvZGluZy8iIHhtbG5zOnhz
aT0iaHR0cDovL3d3dy53My5vcmcvMjAwMS9YTUxTY2hlbWEtaW5zdGFuY2UiIHhzaTpzY2hlbWFM
b2NhdGlvbj0iaHR0cDovL3NjaGVtYXMueG1sc29hcC5vcmcvZW5jb2RpbmcvIEFueUNvbm5lY3RM
b2NhbFBvbGljeS54c2QiIGFjdmVyc2lvbj0iNC44LjAyMDQ1Ij4NCjxCeXBhc3NEb3dubG9hZGVy
PmZhbHNlPC9CeXBhc3NEb3dubG9hZGVyPg0KPEVuYWJsZUNSTENoZWNrPmZhbHNlPC9FbmFibGVD
UkxDaGVjaz4NCjxFeGNsdWRlRmlyZWZveE5TU0NlcnRTdG9yZT5mYWxzZTwvRXhjbHVkZUZpcmVm
b3hOU1NDZXJ0U3RvcmU+DQo8RXhjbHVkZU1hY05hdGl2ZUNlcnRTdG9yZT5mYWxzZTwvRXhjbHVk

ProtonMail received:

Content-Type: application/x-zip-compressed; filename="anyconnect.zip"; name="anyconnect.zip"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="anyconnect.zip"; name="anyconnect.zip"

UEsDBAoAAAAAAEZqlVB4Ies1lQUAAJUFAAAZAAAAQW55Q29ubmVjdExvY2FsUG9saWN5LnhtbDw/
eG1sIHZlcnNpb249IjEuMCIgZW5jb2Rpbmc9IlVURi04Ij8+CjxBbnlDb25uZWN0TG9jYWxQb2xp
Y3kgeG1sbnM9Imh0dHA6Ly9zY2hlbWFzLnhtbHNvYXAub3JnL2VuY29kaW5nLyIgeG1sbnM6eHNp
PSJodHRwOi8vd3d3LnczLm9yZy8yMDAxL1hNTFNjaGVtYS1pbnN0YW5jZSIgeHNpOnNjaGVtYUxv
Y2F0aW9uPSJodHRwOi8vc2NoZW1hcy54bWxzb2FwLm9yZy9lbmNvZGluZy8gQW55Q29ubmVjdExv
Y2FsUG9saWN5LnhzZCIgYWN2ZXJzaW9uPSI0LjguMDIwNDUiPgo8QnlwYXNzRG93bmxvYWRlcj5m
YWxzZTwvQnlwYXNzRG93bmxvYWRlcj4KPEVuYWJsZUNSTENoZWNrPmZhbHNlPC9FbmFibGVDUkxD
aGVjaz4KPEV4Y2x1ZGVGaXJlZm94TlNTQ2VydFN0b3JlPmZhbHNlPC9FeGNsdWRlRmlyZWZveE5T
U0NlcnRTdG9yZT4KPEV4Y2x1ZGVNYWNOYXRpdmVDZXJ0U3RvcmU+ZmFsc2U8L0V4Y2x1ZGVNYWNO

So this actually means that there are bugs in the Server parsing when mails are received.
And there are bugs when importing emails using the Bridge.

Does this mean that you are running different code on the Server and on Bridge? The headers are similar, but the base64 content is different. The Server broke the attachment on receipt; the Bridge import did not break it.
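
The difference can be checked directly by decoding the second base64 line of each dump above. This Python snippet uses only those two lines as copied from this comment; the variable names are mine.

```python
import base64

# Second base64 line of the "Bridge imported original" dump.
bridge_line = "eG1sIHZlcnNpb249IjEuMCIgZW5jb2Rpbmc9IlVURi04Ij8+DQo8QW55Q29ubmVjdExvY2FsUG9s"
# Second base64 line of the "ProtonMail received" dump.
server_line = "eG1sIHZlcnNpb249IjEuMCIgZW5jb2Rpbmc9IlVURi04Ij8+CjxBbnlDb25uZWN0TG9jYWxQb2xp"

bridge_bytes = base64.b64decode(bridge_line)
server_bytes = base64.b64decode(server_line)

print(bridge_bytes)
print(server_bytes)

# If the only change is CRLF -> LF, normalizing the Bridge bytes must give a
# prefix of the Server bytes (the Server line then runs one byte further).
only_newlines_differ = server_bytes.startswith(bridge_bytes.replace(b"\r\n", b"\n"))
print(only_newlines_differ)
```

The Bridge copy still contains the carriage return after the XML declaration, while the Server copy has only a line feed, which matches the \r\n to \n replacement reported earlier.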

@exander77
Author

exander77 commented Apr 27, 2020

Also, I would like to point out that I am not happy that the contents of the email were modified in both instances: the encoding of attachments was changed, and the order of attachments was changed by Bridge.

And I would point out that there are differences even within your own environment with regard to MIME handling.

Update:
Investigating those emails closely...

ProtonMail Server made images into inline attachments:
Content-Disposition: inline;
Bridge made them standard attachments:
Content-Disposition: attachment;
But they were not attachments in the first place.

To summarize it:
When receiving quoted-printable attachments, they are broken.
When importing quoted-printable attachments, they are fine.
In both cases, they are converted to base64.
Deliberate and inconsistent changes in multipart content (why?).

Content-Disposition is an optional header (https://tools.ietf.org/html/rfc2183), and when it is not part of the headers, there is probably a reason for it and it should not be added. Why are you making unwarranted and inconsistent changes in email content?

@exander77
Author

exander77 commented Apr 27, 2020

I am trying to think about this problem from various angles, but I have to stand my ground: making unwarranted changes to the message content is bad on all counts.

ProtonMail should not, in any case, modify email content for any reason except:

  1. Adding new headers to email (but not the multiparts).
  2. Replacing plain text body with the encrypted body and the other way around.

Other changes are unwarranted, breaking, and generally harmful. These include:

  1. Changing existing headers.
  2. Removing existing headers.
  3. Altering new line characters.
  4. Altering boundary strings (part of header actually).
  5. Adding new headers into multiparts of the message.
  6. Doing any and all alterations to the message body. The message body should be treated as a whole and no changes should be introduced.
  7. Breaking any email signatures.

I would even question if this is legal. There is the secrecy of correspondence (it differs across jurisdictions), which protects the contents of correspondence from tampering. It is largely tolerated to do automated scans of emails for viruses and spam, and even to alter headers, for example Subject, to flag a message as spam. But I do not think it is tolerable for the message body to be altered in any way during transit. Even if encryption is applied, there should be no alteration of the actual unencrypted content. I haven't tested this yet, but doesn't it break email signatures as well?

@exander77
Author

exander77 commented Apr 27, 2020

Oh, yes... I haven't set up S/MIME verification for ProtonMail in my mail client yet, I just verified on the command line. Both ProtonMail Server and Bridge broke S/MIME signed emails.

$ openssl smime -verify -in FW_\ anyconnect\ -\ ProtonMail.eml 
Error reading S/MIME message
139644724533056:error:0D0D40CD:asn1 encoding routines:SMIME_read_ASN1:invalid mime type:../crypto/asn1/asn_mime.c:469:type: multipart/mixed

$ openssl smime -verify -in FW_\ anyconnect\ -\ Bridge.eml 
Error reading S/MIME message
139802251859776:error:0D0D40CD:asn1 encoding routines:SMIME_read_ASN1:invalid mime type:../crypto/asn1/asn_mime.c:469:type: multipart/mixed

$ openssl smime -verify -in FW_\ anyconnect\ -\ Gmail.eml
...
Verification successful

Creating a new issue: #26

So this is quite serious, ProtonMail is breaking signatures on emails. Basically altering emails in the man-in-the-middle fashion.

For me, this basically means, that I can delete everything I have imported, wait for you to fix this, reimport it and verify, that you have not broken anything. Awwww...

I can expect that my S/MIME signed emails will be broken as well?

I am not sure how to put this into words...

@exander77 exander77 mentioned this issue Apr 27, 2020
@exander77
Author

exander77 commented Apr 30, 2020

Another problem mentioned here (while receiving messages): https://www.reddit.com/r/ProtonMail/comments/gavp2v/received_message_are_all_fucked_up/

This is breakage of messages on all fronts. This has to be solved ASAP.

@emersion

emersion commented May 5, 2020

So this is quite serious, ProtonMail is breaking signatures on emails. Basically altering emails in the man-in-the-middle fashion.

Yes. The standard Go library breaks DKIM and S/MIME signatures because it re-orders header fields and mutates whitespace. Another library would need to be used instead, such as go-message's textproto package.
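
To illustrate why this breaks signatures, here is a Python sketch that imitates a map-based header store. It is not the Go library's code: `reserialize_via_map` is a hypothetical helper, and sorting plus whitespace collapsing stand in for whatever re-ordering and mutation a real implementation performs. Any hash computed over the original bytes no longer matches after such a round trip.

```python
import hashlib

raw_headers = (
    b"Subject:  Quarterly   report\r\n"
    b"From: bob@example.org\r\n"
    b"Date: Mon, 27 Apr 2020 15:05:49 +0200\r\n"
)


def reserialize_via_map(block: bytes) -> bytes:
    """Parse headers into a dict and write them back out (lossy on purpose)."""
    fields = {}
    for line in block.split(b"\r\n"):
        if not line:
            continue
        name, _, value = line.partition(b":")
        # Original spacing is discarded here, as is the original field order.
        fields[name.strip()] = b" ".join(value.split())
    return b"".join(name + b": " + fields[name] + b"\r\n" for name in sorted(fields))


rebuilt = reserialize_via_map(raw_headers)
same_digest = hashlib.sha256(rebuilt).digest() == hashlib.sha256(raw_headers).digest()
print(rebuilt)
print(same_digest)
```

The rebuilt block carries the same information for a human reader, but a verifier only sees that the signed bytes changed.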

@emersion

emersion commented May 5, 2020

Hmm, another thing is that the ProtonMail API doesn't expose the original raw message structure. Text parts get mangled, the MIME structure needs to be re-generated from scratch, and I can't remember if per-part headers are exposed.

@exander77
Author

exander77 commented May 5, 2020

@emersion Yes, it seems that way. All the signatures are broken.

I am stunned that this could be even in production. This is happening both when using IMAP Bridge and when receiving messages by ProtonMail server.

Basically all messages are corrupted in different ways. Some to a point where part of the headers ends up in the body or binary attachments are completely broken.

@cuthix Can ProtonMail give an estimate of when this will be fixed? This is serious on so many levels I can't even describe it. I thought that after 6 years, ProtonMail would be in a reasonably usable state. This is just not normal.

Not corrupting messages seems a prerequisite for running this as a paid product.

@emersion

emersion commented May 5, 2020

Some to a point where part of the headers ends up in the body or binary attachments are completely broken.

This sounds like a different issue.

I am stunned that this could be even in production.

Keep in mind most users don't care about S/MIME, text parts or the MIME structure getting preserved.

@MorgothSauron

Keep in mind most users don't care about S/MIME, text parts or the MIME structure getting preserved.

Because most people don't care, including me up until recently, is not a reason for a broken "system". Right now I have to go through all my messages I imported and received to make sure nothing is broken/corrupted.

@exander77
Author

Some to a point where part of the headers ends up in the body or binary attachments are completely broken.

This sounds like a different issue.

I think it is related. It is a wrong detection of multipart parts. This all boils down to message parsing. Messages are parsed into some structure and then reconstructed. I think this process is just wrong. You can never reconstruct the original message that way. Messages need to be modified in a reversible manner. I myself have implemented various encryption/signing schemes for messages and this is just not the way to do it.

I am stunned that this could be even in production.

Keep in mind most users don't care about S/MIME, text parts or the MIME structure getting preserved.

I am not sure that this is an argument at all. It could be an argument for not signing emails or not verifying signatures. It just cannot be an argument for breaking the signatures.

S/MIME is widely used and a lot of business emails are signed (even if you do not verify them). Suppose I have a problem in 5 years and need to pull up a contract from a mail signed with S/MIME. What will I do then, when I discover that all signatures on my emails were broken by ProtonMail? S/MIME gives me a guarantee that I have something to provide to the court.

@exander77
Author

Because most people don't care, including me up until recently, is not a reason for a broken "system". Right now I have to go through all my messages I imported and received to make sure nothing is broken/corrupted.

I am thinking about patching imapsync to do verification of the synchronized emails: basically, downloading each one after synchronization and comparing it with the original message. But the problem is that the emails are changed so much by ProtonMail that it is hard to say whether it is actually still the same one. So far I can verify with some certainty that text contents and attachment contents are preserved. I am actually worried about what I will find, because I found most of these problems by accident: "Why the hell can't I open this attachment?" etc.
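
A rough sketch of such a check, in Python rather than imapsync's Perl, and not an actual imapsync patch. The names `leaf_digests`, `compare`, and `build` are hypothetical. It first tests for a byte-identical copy and, failing that, falls back to comparing the decoded content of every leaf part, ignoring part order and transfer encoding since both are known to change.

```python
import base64
import hashlib
from email import message_from_bytes


def leaf_digests(raw: bytes):
    """SHA-256 of every decoded leaf part, sorted so part order is ignored."""
    digests = []
    for part in message_from_bytes(raw).walk():
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True) or b""
        digests.append(hashlib.sha256(payload).hexdigest())
    return sorted(digests)


def compare(original: bytes, downloaded: bytes) -> str:
    if original == downloaded:
        return "identical"
    if leaf_digests(original) == leaf_digests(downloaded):
        return "content-preserved"  # structure changed, payload bytes intact
    return "corrupted"


def build(payload: bytes, boundary: bytes) -> bytes:
    """Test helper: a two-part message with one base64 attachment."""
    return b"\r\n".join([
        b'Content-Type: multipart/mixed; boundary="' + boundary + b'"',
        b"",
        b"--" + boundary,
        b"Content-Type: text/plain",
        b"",
        b"hello",
        b"--" + boundary,
        b"Content-Type: application/octet-stream",
        b"Content-Transfer-Encoding: base64",
        b"",
        base64.b64encode(payload),
        b"--" + boundary + b"--",
        b"",
    ])


attachment = b"PK\x03\x04\r\n\x00"
original = build(attachment, b"b1")
rewritten = build(attachment, b"b2")                         # new boundary only
damaged = build(attachment.replace(b"\r\n", b"\n"), b"b2")   # CRLF -> LF

results = [compare(original, m) for m in (original, rewritten, damaged)]
print(results)
```

A "content-preserved" result still means signatures may be broken; only "identical" guarantees that.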

Would anybody consider this normal if this were a standard paper mail service? Would it be normal for somebody to go through the contents of your letter and cut the signatures from the signed contracts? Lose some attachments and mangle the paper?

@bartbutler

We receive literally millions of emails per day and do not receive any sort of regular reports of corrupted messages. There are many reasons we parse the messages on receipt, mostly due to performance/disk space and the encryption processing. Standardizing both newlines and charsets is also a good thing to do for unsigned emails on the server as it makes the client's job significantly easier.

If you have examples of emails which have parts which are corrupted by the server-side ingestion, I would love to get some examples and we'll address them. We also do have a MIME structure-preserving email mode we use for PGP/MIME encrypted or signed emails which we could also probably use if the parsing detects an S/MIME signature, though we'd need to look into this.

But not processing/cleaning/splitting unsigned/unencrypted emails by default would be untenable and in any case will just shift the parsing burden to the client where it's even more difficult to ensure a consistent user experience.

@exander77
Author

exander77 commented Jul 26, 2020

@bartbutler This is quite an arrogant statement. You corrupt a lot of these emails. I have reported numerous instances of messages unable to be parsed by the Bridge or corrupted by your servers when received. I have been reporting it since April. I had a call with your programmers as well.

Standardizing both newlines and charsets is also a good thing to do for unsigned emails on the server as it makes the client's job significantly easier.

No this is an abomination. And you should be shot for this. You do not fuck with email content! You can parse it all you want, but for the love of God, store the email as you have received it and do not fuck with it.

I sent a lot of them to support@protonmail.com, bridge@protonmail.com, etc.

This is just from the top of my head:

  1. You break a large number of undeliverable mail server responses. All from Office 365 for example. Not fixed yet. Also from Allegro.pl and others. Your MIME parsers get rekt when parsing an email which has another email as text in its body. Basically all undeliverable responses and a lot of other legitimate emails when somebody sends you an email including headers in the body. I think I have even seen a normal email be fucked this way. Just a specific content in the email... and the body is not correctly detected.
  2. You break an email where the attachment is encoded as quoted-printable by replacing every \r\n in the binary file with \n. You fucked my business mail by corrupting the attachment and made me look like a moron. So much for normalizing new lines in my binary attachment. Thanks a lot!
  3. I am no longer able to verify S/MIME signatures, they are all fucked.
  4. I am not able to import like 300 emails, because Bridge cannot handle them, and this is after I ran reformime -r9 on a lot of them. Before that, it was a much higher number.

But not processing/cleaning/splitting unsigned/unencrypted emails by default would be untenable and in any case will just shift the parsing burden to the client where it's even more difficult to ensure a consistent user experience.

Bullshit, neither Gmail, nor Outlook, nor any other service fucked with my email content. They may parse it, but they preserve the original as well, so I can get what was actually received. How do you do it when I get an encrypted email from a third party and you can't fuck with it on the server? This is so completely nonsensical. I am really interested in how you do it when I receive an encrypted email from outside.

@exander77
Author

exander77 commented Jul 26, 2020

I will give an example of how it works in reality:

People use either the import tool or Bridge to import emails. And it silently drops a large number of their emails (because it can't parse them). They don't even notice that they lost a few hundred emails.
People sometimes receive a corrupted email or attachment, but will usually blame it on the sender or something.

Your arrogance is based on the customers who do not have the ability and knowledge to find out that they lost emails or that their emails were corrupted.

Sometimes I saw somebody on Reddit complain about lost emails. It was disregarded by ProtonMail. But I am pretty sure you actually lost their emails, as I myself have found a lot of ways to lose emails. My first experience with Bridge was losing 500 emails in the first 5 minutes: #2. I can only guess how many emails were lost by this feature.

I actually tried to verify my emails after import. And when I get corrupted email I try to find out how that happened.

@bartbutler

bartbutler commented Jul 26, 2020

@bartbutler This is quite an arrogant statement. You corrupt a lot of these emails. I have reported numerous instances of messages unable to be parsed by the Bridge or corrupted by your servers when received. I have been reporting it since April. I had a call with your programmers as well.

Then we should fix it.

Standardizing both newlines and charsets is also a good thing to do for unsigned emails on the server as it makes the client's job significantly easier.

No this is an abomination. And you should be shot for this. You do not fuck with email content! You can parse it all you want, but for the love of God, store the email as you have received it and do not fuck with it.

I'm happy to do what I can here but no, I should not be shot for this. Please remain civil.

I sent a lot of them to support@protonmail.com, bridge@protonmail.com, etc.

This is just from the top of my head:

  1. You break a large number of undeliverable mail server responses. All from Office 365 for example. Not fixed yet. Also from Allegro.pl and others. Your MIME parsers get rekt when parsing an email which has another email as text in its body. Basically all undeliverable responses and a lot of other legitimate emails when somebody sends you an email including headers in the body. I think I have even seen a normal email be fucked this way. Just a specific content in the email... and the body is not correctly detected.

Yes, we should treat encapsulated emails correctly, and I'd be happy to assign someone to fix this.

  2. You break an email where the attachment is encoded as quoted-printable by replacing every \r\n in the binary file with \n. You fucked my business mail by corrupting the attachment and made me look like a moron. So much for normalizing new lines in my binary attachment. Thanks a lot!

We should not be processing any newlines in any attachments. I'd be happy to fix this as well.

  3. I am no longer able to verify S/MIME signatures, they are all fucked.

As I said somewhere else, we have a message mode that might work for this if we can detect the presence of S/MIME

  4. I am not able to import like 300 emails, because Bridge cannot handle them, and this is after I ran reformime -r9 on a lot of them. Before that, it was a much higher number.

Yes, we should continue improving this.

But not processing/cleaning/splitting unsigned/unencrypted emails by default would be untenable and in any case will just shift the parsing burden to the client where it's even more difficult to ensure a consistent user experience.

Bullshit, neither Gmail, nor Outlook, nor any other service fucked with my email content. They may parse it, but they preserve the original as well, so I can get what was actually received. How do you do it when I get an encrypted email from a third party and you can't fuck with it on the server? This is so completely nonsensical. I am really interested in how you do it when I receive an encrypted email from outside.

We can't preserve the original and parsed content, it will more than double our storage costs. The others can use compression and other clever tricks to get around this--we cannot, because all we have is ciphertext. For PGP/MIME coming in we are careful to not disrupt the signed part of the message, which is the solution I think could work for S/MIME. It has significant performance and space tradeoffs for other (non-bridge) clients though which is why we do not use this mode for everything.

@exander77
Author

It will soon be more than 3 months since I reported a lot of these issues. I have little hope you can actually make a really good parser within a reasonable time. I don't think this approach is even viable. You should write a lightweight parser to get the necessary headers and the encryption done, replace the body, and preserve the result and just reference it. Complete parsing and reconstruction is extremely hard.

In this case, my civility got corrupted along with my emails. :) When you replaced \r\n with \n in my binary file attachment, I kind of lost it.

We can't preserve the original and parsed content, it will more than double our storage costs. The others can use compression and other clever tricks to get around this--we cannot, because all we have is ciphertext. For PGP/MIME coming in we are careful to not disrupt the signed part of the message, which is the solution I think could work for S/MIME. It has significant performance and space tradeoffs for other (non-bridge) clients though which is why we do not use this mode for everything.

You could compress the original message by referencing parsed data but you would need to stop messing with them.
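
A sketch of what I mean, in Python. This is only the bookkeeping, with hypothetical names (`deduplicate`, `reconstruct`) and a plain dict standing in for the storage backend; in the real system each blob would of course be encrypted. The original message is kept as a skeleton of literal bytes plus references to the large parts, so each attachment is stored once and the original can still be rebuilt exactly.

```python
import hashlib


def deduplicate(raw: bytes, spans):
    """Replace each (start, end) span with a reference to a stored blob."""
    store, skeleton, pos = {}, [], 0
    for start, end in sorted(spans):
        skeleton.append(("literal", raw[pos:start]))
        blob = raw[start:end]
        key = hashlib.sha256(blob).hexdigest()  # content address of the blob
        store[key] = blob
        skeleton.append(("ref", key))
        pos = end
    skeleton.append(("literal", raw[pos:]))
    return skeleton, store


def reconstruct(skeleton, store) -> bytes:
    """Rebuild the original bytes from literals and referenced blobs."""
    return b"".join(data if kind == "literal" else store[data]
                    for kind, data in skeleton)


attachment = b"UEsDBAoAAAAAAEZqlVB4Ies1lQUAAJUFAAAZAAAA" * 50
raw = (b"Content-Type: multipart/mixed; boundary=x\r\n\r\n--x\r\n"
       b"Content-Transfer-Encoding: base64\r\n\r\n" + attachment + b"\r\n--x--\r\n")

start = raw.index(attachment)
skeleton, store = deduplicate(raw, [(start, start + len(attachment))])
skeleton_size = sum(len(data) for kind, data in skeleton if kind == "literal")
print(reconstruct(skeleton, store) == raw, skeleton_size, len(raw))
```

The skeleton only costs the size of the headers and boundaries, but this works only if the stored parts are the untouched bytes from the original message.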

@exander77
Author

exander77 commented Jul 28, 2020

Any news on this? I would like to try to import my emails again. I was paying for Gmail the whole time. But I gave up. So it will be fun. I will try to migrate my mails not with imapsync but by copying them in Thunderbird.

@exander77
Author

Any news on this? What is taking so long? This is a critical feature. And are quoted-printable still broken? Should I test?

@jameshoulahan
Contributor

jameshoulahan commented Oct 4, 2020

Hi, as discussed in a number of recently opened issues, the parser has been rewritten and will be released in upcoming versions of bridge and import export. It is currently available for testing in the latest beta releases, albeit with a couple of quirks that we intend to fix in the coming few days before the next stable release.

@exander77
Author

I am closing this. Will open a new issue for persisting problems.

@trev-dev

trev-dev commented Nov 6, 2021

Hey, so, I'm using ProtonMail bridge to send messages via SMTP in mu4e (emacs) and the HTML multipart is being removed somehow. Is this related? If I send via Google's SMTP the multipart data is preserved just fine.

7 participants