Replace MIME parser completely with some lite version for critical parts #18
This is an aggregation of errors encountered during the migration of a larger batch of emails, sorted by the number of occurrences (the first item on each line). General errors:
Specific multipart errors:
I can supply specific messages if needed. |
Thanks for the additional details. It would be nice to have some of the failing messages. If they don't contain any personal information, can you send the eml files to bridge@protonmail.ch? Thanks 🤗 This will be solved by #8, so I will close this as a duplicate. |
@cuthix Is it possible to get a testing account for this purpose with at least 5GB storage? I have already lost some emails through the process (but was able to recover them). I needed to fix several of them going through I have also pushed fixes into imapsync to work with the ProtonBridge: ProtonBridge is kind of talkative when doing APPEND, and it broke some not-so-bulletproof regular expressions in the Perl IMAPClient library. I can now test with my patched imapsync. |
I have modified imapsync to store all problematic messages. I will attach them and send them to bridge@protonmail.ch. I would like to note, that there is also a similar problem when receiving messages (not related to Bridge). Already reported that to support. |
I have created a new alias, it seems to be sufficient. I am now running the whole migration again. I discovered yet another problem. Some emails are silently dropped, but OK APPEND successful is reported. Another data loss... |
Please reopen this, this is not a duplicate. This is the only sane way forward. |
If you would like to keep this open, I don't mind. From a dev point of view, this and #8 have the same solution: we need to rework how MIME is parsed before import to the API. |
@cuthix I am not sure. The problems with the MIME parser I have experienced so far are pretty serious. And I am not talking only about Bridge; the problem is on the ProtonMail servers as well. I think they handle incoming emails in a similar way. I just found out and reported to support a new problem:
If you guys think you can write a perfect MIME parser for both the Bridge and the server, go for it. But so far it is not even passable (do not be offended in any way). It fails on so many things, and I am not sure you can make it. I am not even sure it is possible to make it, with all the possible encodings and non-standard behaviour the parser may encounter. So I am not even sure I can blame you for not writing a perfect parser.
What I really think is that this is not the way to go. I think you regularly break millions of emails without even knowing it. And that sincerely is scary. Regular people may not notice; for them, the mail is just broken for some reason. But in everything I have investigated so far, ProtonMail was always the cause.
I have to repeat myself here: My opinion is that after decryption is completed you should get exactly the same email, to a single bit. That is currently not the case at all, and with the current approach of fully parsing the mail it is impossible.
Edit: Added horrifying comparison of the corrupted email attachment. |
Just to be sure I understand.. This applies to web-app as well, right?
This email was not sent from ProtonMail user and it was not encrypted by sender but by our server on receiving, right?
I suspect that this might be bridge encoding issue (after decryption). What did you diff here? |
I have to check that first.
Yes, was encrypted by ProtonMail Servers, sent from outside of ProtonMail (not encrypted at all).
No, the email was downloaded from the Web UI, and the attachment as well. The diff compares the original attachment with the file downloaded from the Web UI. The mail was completely reworked by the ProtonMail server, so Bridge can't make it worse. Original:
From ProtonMail:
It seems that ProtonMail received quoted-printable encoded attachment, broke it and reencoded it as base64. |
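A change of transfer encoding from quoted-printable to base64 is only harmless if both forms decode to the same bytes. The following is a minimal sketch of such a check using Python's standard `quopri` and `base64` modules. The sample payload and the helper name `same_payload` are made up for illustration and are not taken from the affected email.

```python
import base64
import quopri

def same_payload(qp_encoded: bytes, b64_encoded: bytes) -> bool:
    """True if both transfer encodings decode to identical bytes."""
    return quopri.decodestring(qp_encoded) == base64.b64decode(b64_encoded)

# Hypothetical quoted-printable payload as it might arrive.
qp_original = b"na=C3=AFve =3D test"
decoded = quopri.decodestring(qp_original)

# A lossless re-encoding: decode once, then base64 the exact same bytes.
b64_good = base64.b64encode(decoded)

# A lossy re-encoding: the decoded bytes were altered before encoding.
b64_bad = base64.b64encode(decoded.replace(b"=", b""))

print(same_payload(qp_original, b64_good))  # True
print(same_payload(qp_original, b64_bad))   # False
```

Comparing decoded payloads rather than encoded text is what separates a cosmetic re-encoding from actual corruption.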
@cuthix So, good news, Bridge actually processes this mail fine. Bridge imported original:
ProtonMail received:
So, this actually means, that there are bugs in the Server parsing when mails are received. Does this mean, that you are running different code on Server and on Bridge? The results are similar based on headers, but base64 content is different. The server receiving it broke it, Bridge import did not break it. |
Also, I would like to point out that I am not happy that the contents of the email were modified in both instances: the encoding of the attachments was changed, and the order of attachments was changed by Bridge. I would also point out that there are differences even within your own environment with regard to MIME handling. Update: ProtonMail Server made images into inline attachments: To summarize it: Content-Disposition is an optional header (https://tools.ietf.org/html/rfc2183), and when it is not part of the headers, there is probably a reason for it and it should not be added. Why are you making unwarranted and inconsistent changes to email content? |
I am trying to think about this problem from various angles, but I have to stand my ground, that doing unwarranted changes to the message content is bad on all accounts. ProtonMail should not, in any case, modify email content for any reason except:
Other changes are unwarranted, breaking and generally harmful, this consists of:
I would even question if this is legal. There is the secrecy of correspondence (it differs in various jurisdictions), which protects the contents of the correspondence from tampering. Automated scans of emails for viruses and spam are largely tolerated, and so is altering headers, for example the Subject, to flag a message as spam. But I do not think it is tolerable for the message body to be altered in any way during transit. Even if encryption is applied, there should be no alteration of the actual unencrypted content. I haven't tested this yet, but doesn't it break email signatures as well? |
Oh, yes... I didn't set up S/MIME verification for ProtonMail in mail client yet, just verified on the command line. Both ProtonMail Server and Bridge broke S/MIME signed emails.
Creating a new issue: #26 So this is quite serious, ProtonMail is breaking signatures on emails. Basically altering emails in the man-in-the-middle fashion. For me, this basically means, that I can delete everything I have imported, wait for you to fix this, reimport it and verify, that you have not broken anything. Awwww... I can expect that my S/MIME signed emails will be broken as well? I am not sure how to put this into words... |
Another problem mentioned here (while receiving messages): https://www.reddit.com/r/ProtonMail/comments/gavp2v/received_message_are_all_fucked_up/ This is breakage of messages on all fronts. This has to be solved ASAP. |
Yes. The standard Go library breaks DKIM and S/MIME signatures because it re-orders header fields and mutates whitespace. Another library would need to be used instead, such as go-message's textproto package. |
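To make the header problem concrete, here is a minimal sketch of what happens when header fields are parsed into a map and written back out. It is an illustration only, not the Go library's actual code: the function name `reserialize_via_map` and the sample headers are hypothetical, and folded (multi-line) header values are not handled.

```python
RAW_HEADERS = (
    b"received: from a.example\r\n"
    b"Subject:   Hello\r\n"
    b"From: alice@example.org\r\n"
)

def reserialize_via_map(raw: bytes) -> bytes:
    """Parse header lines into a dict, then emit them again.

    The dict loses the original field order, the key normalisation changes
    the case of field names, and strip() discards the original whitespace.
    """
    fields = {}
    for line in raw.split(b"\r\n"):
        if not line:
            continue
        name, _, value = line.partition(b":")
        key = b"-".join(part.capitalize() for part in name.strip().split(b"-"))
        fields.setdefault(key, []).append(value.strip())
    out = b""
    for key in sorted(fields):
        for value in fields[key]:
            out += key + b": " + value + b"\r\n"
    return out

rebuilt = reserialize_via_map(RAW_HEADERS)
print(rebuilt == RAW_HEADERS)  # False: order, case and spacing all changed
```

Any signature computed over the original header bytes no longer matches the rebuilt ones, even though no field was added or removed.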
Hmm, another thing is that the ProtonMail API doesn't expose the original raw message structure. Text parts get mangled, the MIME structure needs to be re-generated from scratch, and I can't remember if per-part headers are exposed. |
@emersion Yes, it seems that way. All the signatures are broken. I am stunned that this could even be in production. This is happening both when using the IMAP Bridge and when messages are received by the ProtonMail server. Basically all messages are corrupted in different ways, some to the point where part of the headers ends up in the body or binary attachments are completely broken. @cuthix Can ProtonMail give an estimate of when this will be fixed? This is serious on so many levels I can't even describe it. I thought that after 6 years, ProtonMail is in a reasonably usable state. This is just not normal. Not corrupting messages seems a prerequisite for running this as a paid product. |
This sounds like a different issue.
Keep in mind most users don't care about S/MIME, text parts or the MIME structure getting preserved. |
That most people don't care, including me up until recently, is not a reason for a broken "system". Right now I have to go through all the messages I imported and received to make sure nothing is broken or corrupted. |
I think it is related. It is a wrong detection of multipart parts. This all boils down to message parsing. Messages are parsed into some structure and then reconstructed. I think this process is just wrong. You can never reconstruct the original message. Messages need to be modified in a reversible manner. I myself have implemented various encryption/signing schemes for messages, and this is just not the way to do it.
I am not sure that this is an argument at all. It could be an argument for not signing emails or not verifying signatures. It just cannot be an argument for breaking the signatures. S/MIME is widely used and a lot of business emails are signed (even if you do not verify them). Consider that I have a problem in 5 years and need to pull up a contract in a mail signed with S/MIME. What will I do then, when I discover that all signatures on my emails were broken by ProtonMail? S/MIME gives me a guarantee that I have something to provide to the court. |
I am thinking about patching imapsync to verify the synchronized emails: basically, downloading each one after synchronization and comparing it with the original message. But the problem is that the emails are changed so much by ProtonMail that it is hard to say whether it is actually still the same one. So far I can verify with some certainty that text contents and attachment contents are preserved. I am actually worried about what I will find, because I found most of these problems by accident: "Why the hell I can't open this attachment?" etc. Would anybody consider this normal if this was a standard paper letter mail service? Would it be normal for somebody to go through the contents of your letter and cut the signatures from the signed contracts? Lose some attachments and mangle the paper? |
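The verification idea above can be sketched as a byte-level comparison of the uploaded and re-downloaded message. This is a minimal sketch, not imapsync code: the helper names and the sample messages are hypothetical, and fetching the messages over IMAP is left out.

```python
import hashlib

def sha256_hex(raw: bytes) -> str:
    """Digest of the raw message bytes, suitable for logging."""
    return hashlib.sha256(raw).hexdigest()

def first_difference(a: bytes, b: bytes) -> int:
    """Offset of the first differing byte, or -1 if the inputs are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return -1 if len(a) == len(b) else min(len(a), len(b))

# Hypothetical original and the copy fetched back after the sync.
original = b"Subject: test\r\n\r\nline one\r\nline two\r\n"
downloaded = original.replace(b"line two", b"line 2")

if sha256_hex(original) != sha256_hex(downloaded):
    offset = first_difference(original, downloaded)
    print("message changed, first difference at byte", offset)
```

A strict check like this only passes if the server returns the message bit for bit, so in the situation described above it would flag nearly every message; a looser comparison of decoded text and attachment payloads would be needed to separate re-encoding from real data loss.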
We receive literally millions of emails per day and do not receive any sort of regular reports of corrupted messages. There are many reasons we parse the messages on receipt, mostly due to performance/disk space and the encryption processing. Standardizing both newlines and charsets is also a good thing to do for unsigned emails on the server, as it makes the client's job significantly easier. If you have examples of emails with parts that are corrupted by the server-side ingestion, I would love to get some and we'll address them. We also have a MIME structure-preserving email mode that we use for PGP/MIME encrypted or signed emails, which we could probably also use if the parsing detects an S/MIME signature, though we'd need to look into this. But not processing/cleaning/splitting unsigned/unencrypted emails by default would be untenable and in any case would just shift the parsing burden to the client, where it's even more difficult to ensure a consistent user experience. |
@bartbutler This is quite an arrogant statement. You corrupt a lot of these emails. I have reported numerous instances of messages unable to be parsed by the Bridge or corrupted by your servers when received. I have been reporting it since April. I had a call with your programmers as well.
No this is an abomination. And you should be shot for this. You do not fuck with email content! You can parse it all you want, but for the love of God, store the email as you have received it and do not fuck with it. I sent a lot of them to support@protonmail.com, bridge@protonmail.com, etc. This is just from the top of my head:
Bullshit, neither Gmail, nor Outlook, nor any other service fucked with my email content. They may parse it, but they preserve the original as well, so I can get what was actually received. How you do it when I get an encrypted email from the third party and you can't fuck it on the server? This is so completely nonsensical. I am really interested in how you do it when I receive an encrypted email from outside. |
I will give an example of how it works in reality: People use either the import tool or Bridge to import emails. And it silently drops a large number of their emails (because it can't parse them). They don't even notice that they lost a few hundred emails. Your arrogance is based on the customers who do not have the ability and knowledge to find out that they lost emails or that their emails were corrupted. Sometimes I saw somebody on Reddit complain about lost emails. It was disregarded by ProtonMail. But I am pretty sure you actually lost their emails, as I myself have found a lot of ways to lose emails. My first experience with Bridge was losing 500 emails in the first 5 minutes: #2 I can only guess how many emails were lost by this feature. I actually tried to verify my emails after import. And when I get a corrupted email, I try to find out how that happened. |
Then we should fix it.
I'm happy to do what I can here but no, I should not be shot for this. Please remain civil.
Yes, we should treat encapsulated emails correctly, and I'd be happy to assign someone to fix this.
We should not be processing any newlines in any attachments. I'd be happy to fix this as well.
As I said somewhere else, we have a message mode that might work for this if we can detect the presence of S/MIME
Yes, we should continue improving this.
We can't preserve the original and parsed content, it will more than double our storage costs. The others can use compression and other clever tricks to get around this--we cannot, because all we have is ciphertext. For PGP/MIME coming in we are careful to not disrupt the signed part of the message, which is the solution I think could work for S/MIME. It has significant performance and space tradeoffs for other (non-bridge) clients though which is why we do not use this mode for everything. |
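The storage argument rests on the fact that well-encrypted data looks random and therefore does not compress. A minimal sketch of that effect with Python's standard `zlib` follows. It assumes that hash-derived pseudo-random bytes are a fair stand-in for ciphertext; no real encryption is performed, and the helper name `pseudo_ciphertext` is hypothetical.

```python
import hashlib
import zlib

def pseudo_ciphertext(n: int) -> bytes:
    """Deterministic random-looking bytes, used here as a stand-in for ciphertext."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])

# Repetitive plaintext, as is typical for mail headers and markup.
plaintext = b"Content-Type: text/plain; charset=utf-8\r\n" * 400
cipher_like = pseudo_ciphertext(len(plaintext))

plain_ratio = len(zlib.compress(plaintext)) / len(plaintext)
cipher_ratio = len(zlib.compress(cipher_like)) / len(cipher_like)

print(f"plaintext compresses to {plain_ratio:.1%} of its size")
print(f"ciphertext-like data compresses to {cipher_ratio:.1%} of its size")
```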
It has been more than 3 months since I reported a lot of these issues. I have little hope that you can actually make a really good parser within a reasonable time. I don't think this approach is even viable. You should write a lightweight parser that gets the necessary headers and the encryption done, replaces the body, preserves the result, and just references it. Complete parsing and reconstruction is extremely hard. In this case, my civility got corrupted with my emails. :) When you replaced \r\n with \n in my binary file attachment, I kind of lost it.
You could compress the original message by referencing the parsed data, but you would need to stop messing with it. |
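Why newline normalisation is destructive for binary attachments can be shown with the PNG file signature, which deliberately contains a `\r\n` pair so that this kind of translation is detectable. This is a minimal sketch; `normalize_newlines` is a hypothetical stand-in for whatever the server does, not its actual code.

```python
# The 8-byte PNG file signature contains a CR LF pair by design.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def normalize_newlines(data: bytes) -> bytes:
    """Text-style newline normalisation: fine for text, destructive for binary."""
    return data.replace(b"\r\n", b"\n")

damaged = normalize_newlines(PNG_SIGNATURE)

print(len(PNG_SIGNATURE), len(damaged))  # 8 7
print(damaged == PNG_SIGNATURE)          # False: no longer a valid PNG header
```

The same applies to any decoded binary payload that happens to contain the bytes `0D 0A`, so normalisation is only safe on parts known to be text.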
Any news on this? I would like to try to import my emails again. I was paying for Gmail the whole time, but gave up. So it will be fun. I will try to migrate my mails without imapsync, by copying them in Thunderbird. |
Any news on this? What is taking so long? This is a critical feature. And are |
Hi, as discussed in a number of recently opened issues, the parser has been rewritten and will be released in upcoming versions of bridge and import export. It is currently available for testing in the latest beta releases, albeit with a couple of quirks that we intend to fix in the coming few days before the next stable release. |
I am closing this. Will open a new issue for persisting problems. |
Hey, so, I'm using ProtonMail bridge to send messages via SMTP in |
I have encountered several problems inside Bridge and during message encryption on the server as well. The culprit here seems to be that during encryption ProtonMail tries to parse messages even though it is not actually needed, which causes a lot of problems. There are a million and one things that could go wrong.
I suggest replacing MIME parser for critical tasks with a lite version.
It should be able to find recipient and message body, be able to replace the message body with the encrypted one and add headers. It should not be doing anything else with the message. It should work even if the message does not comply with RFCs.
I think that finding the recipient (to decide which key to use), finding the body, replacing it, and adding headers should be safe enough to never fail. I mean like 100%, because there are strict conditions on these which need to be met for the mail to be delivered at all. For the encryption, nothing else is really needed, which in other words means nothing else should be done.
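The "lite" approach described above can be sketched as follows: split the raw bytes at the first empty line, look up the recipient without rewriting anything, swap the body, and prepend new headers so the existing header bytes stay untouched. This is a minimal sketch under stated assumptions, not ProtonMail's implementation: all function names, the `X-Example-Encrypted` header, and the `<ciphertext>` placeholder are hypothetical, folded headers are ignored when looking up the recipient, and no real encryption is performed.

```python
def split_message(raw: bytes):
    """Split at the first empty line. Returns (header_block, separator, body)."""
    hits = [(raw.find(s), s) for s in (b"\r\n\r\n", b"\n\n")]
    hits = [(i, s) for i, s in hits if i != -1]
    if not hits:
        return raw, b"", b""  # no body at all
    i, sep = min(hits)
    return raw[:i], sep, raw[i + len(sep):]

def find_header(header_block: bytes, name: bytes):
    """Case-insensitive lookup that only reads, never rewrites, the header block."""
    prefix = name.lower() + b":"
    for line in header_block.splitlines():
        if line.lower().startswith(prefix):
            return line[len(prefix):].strip()
    return None

def replace_body(raw: bytes, new_body: bytes, extra_headers) -> bytes:
    """Prepend extra headers and swap the body; original header bytes are kept verbatim."""
    header, sep, _ = split_message(raw)
    eol = b"\r\n" if sep.startswith(b"\r") else b"\n"
    added = b"".join(h + eol for h in extra_headers)
    return added + header + (sep or eol + eol) + new_body

raw = (b"From: a@example.org\r\nTo: b@example.org\r\n"
       b"subject:  odd   spacing\r\n\r\nsecret body\r\n")
header, sep, body = split_message(raw)
recipient = find_header(header, b"To")  # would select the encryption key
encrypted = replace_body(raw, b"<ciphertext>\r\n", [b"X-Example-Encrypted: 1"])
```

Because the original header block is copied byte for byte, odd spacing and field-name case survive, and the step is reversible: dropping the prepended header and restoring the decrypted body yields the original message.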
This supersedes #8.
If the UI shows a broken email, it is fine, but if the email gets broken during encryption, it is not fine.
My opinion is that after decryption is completed you should get exactly the same email, to a single bit. That is currently not the case at all, and with the current approach of fully parsing the mail it is impossible.