NIAD-3124: Fix inbound compressed fragments being corrupted #163
Merged
adrianclay merged 4 commits into main on Aug 8, 2024
b1d150a to 65d07e6
This change has been tested by the PRM team using a file that had previously been known to cause problems (received fragments were being stored at over 12MB instead of staying within the 5MB Spine limit). We also regression tested an in/out EHR transfer containing an attached image, which transferred successfully. We are happy that this change has resolved the issue we raised. Thanks to the NIA team for their efforts in working on this :)
Previously we were using the email.Message.get_content() method. This caused the raw_data_manager to forcibly convert any payload with a Content-Type of text/* to a string.
This behaviour is unwelcome under the GP2GP Large Message spec, where an attachment can claim to be "text/plain", for example, but in reality contain the GZIP-compressed, base64-encoded representation of "text/plain" content.
There was previously code to work around the raw_data_manager; however, the workaround was buggy when the attachment was base64 encoded but not decompressible (for example, because it is a fragment).
This new implementation avoids the ContentManager by calling email.Message.get_payload() instead.
It also avoids guessing whether a payload is base64 by inspecting the Content-Transfer-Encoding header directly. This should make the behaviour of the inbound adaptor more predictable, and it also matches how the `email` package decides whether a message is base64 or not.
https://docs.python.org/3/library/email.contentmanager.html#email.contentmanager.raw_data_manager
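The difference between the two calls can be reproduced with the standard library alone. The following is a minimal sketch, not the adaptor's actual code: the sample text, the way the fragment is cut, and all variable names are assumptions made for illustration. It builds a part that claims to be text/plain but carries a base64-encoded fragment of a GZIP stream, then reads it back both ways.

```python
import base64
import gzip
import zlib
from email import message_from_bytes, policy

# Hypothetical attachment: text that is GZIP compressed, then split into fragments.
original = ('GP2GP attachment text ' * 50).encode('utf-8')
compressed = gzip.compress(original)
fragment = compressed[: len(compressed) // 2]  # one fragment, incomplete on its own

# The part claims to be text/plain, but the body is base64 of binary GZIP data.
raw = (
    b'Content-Type: text/plain\n'
    b'Content-Transfer-Encoding: base64\n'
    b'\n' + base64.encodebytes(fragment)
)
part = message_from_bytes(raw, policy=policy.default)

# get_content() goes through raw_data_manager, which decodes text/* to str.
# Bytes that are not valid in the charset are replaced, so the data is corrupted.
via_content_manager = part.get_content()

# get_payload(decode=True) only undoes the Content-Transfer-Encoding, returning bytes.
via_payload = part.get_payload(decode=True)

# The header can be inspected directly instead of guessing the encoding.
is_base64 = str(part.get('Content-Transfer-Encoding', '')).lower() == 'base64'

# A lone fragment cannot be decompressed, so it has to be stored as raw bytes.
try:
    gzip.decompress(via_payload)
    decompressable = True
except (EOFError, OSError, zlib.error):
    decompressable = False
```

Here `via_payload` is byte-for-byte the fragment that was sent, whereas `via_content_manager` is a `str` containing U+FFFD replacement characters where the binary data was not valid ASCII.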
martin-nhs reviewed on Aug 8, 2024
Comment on lines +284 to +287
```python
logger.info(
    'Successfully decoded message part with {ContentType} {ContentTransferEncoding} as string',
    fparams=logger_dict
)
```
What version of Python is being used? Couldn't we just use f-strings here or is this not recommended?
Suggested change
```diff
- logger.info(
-     'Successfully decoded message part with {ContentType} {ContentTransferEncoding} as string',
-     fparams=logger_dict
- )
+ logger.info(f'Successfully decoded message part with {ContentType} {ContentTransferEncoding} as string')
```
Author
I'm not sure, this code existed before this PR 🤷🏻
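One point worth noting about the f-string question: inside an f-string, `{ContentType}` refers to a variable of that name in scope, not to a key of `logger_dict`. The sketch below uses plain `str.format` as a stand-in, since how the project's logger handles `fparams` is not shown in this thread; the dictionary values are made up for illustration.

```python
# Hypothetical values; in the PR these come from the message part's headers.
logger_dict = {'ContentType': 'text/plain', 'ContentTransferEncoding': 'base64'}

# Template style: placeholders are filled from the dictionary keys.
template = 'Successfully decoded message part with {ContentType} {ContentTransferEncoding} as string'
via_format = template.format(**logger_dict)

# f-string style: the dictionary has to be indexed explicitly,
# because bare names would be looked up as variables.
via_fstring = (
    f"Successfully decoded message part with {logger_dict['ContentType']} "
    f"{logger_dict['ContentTransferEncoding']} as string"
)
```

Both produce the same message text, so the choice mostly depends on whether the logger needs the parameters as separate structured fields.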
Co-authored-by: martin-nhs <127403254+martin-nhs@users.noreply.github.com>
martin-nhs approved these changes on Aug 8, 2024
Type of change
Bug fix (non-breaking change which fixes an issue)