Skip to content

NIAD-3124: Fix inbound compressed fragments being corrupted#163

Merged
adrianclay merged 4 commits intomainfrom
NIAD-3124
Aug 8, 2024
Merged

NIAD-3124: Fix inbound compressed fragments being corrupted#163
adrianclay merged 4 commits intomainfrom
NIAD-3124

Conversation

@adrianclay
Copy link
Copy Markdown
Contributor

@adrianclay adrianclay commented Jul 16, 2024

What + Why

Previously we were using the email.Message.get_content() method. This would cause the raw_data_manager to forcefully convert any payload with a Content-Type of text/* to a string.

This behaviour is unwelcome with GP2GP Large Message spec where an attachment can claim to be a "text/plain" for example, but in reality contain the GZIP compressed base64 representation of "text/plain".

There was previously code to workaround the raw_data_manager, however the workaround was buggy in the case that the attachment was encoded with base64 but not decompressable (for example because it's a fragment).

This new implementation avoids the ContentManager by calling the email.Message.get_payload() instead.

Also avoid guessing whether a payload was base64 or not by inspecting the Content-Transfer-Encoding header directly. This should make the behaviour of the inbound adaptor more predictable, and also matches how the email package decides whether a message is base64 or not.

Type of change

Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have performed a self-review of my code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have updated the Changelog with details of my change in the UNRELEASED section if this change will affect end users

@adrianclay adrianclay force-pushed the NIAD-3124 branch 4 times, most recently from b1d150a to 65d07e6 Compare July 19, 2024 14:44
@adrianclay adrianclay changed the title NIAD-3124: WIP fix for compressed fragments being corrupted NIAD-3124: Fix inbound compressed fragments being corrupted Jul 19, 2024
@adrianclay adrianclay marked this pull request as ready for review July 19, 2024 14:56
@chrisbloe-nhse
Copy link
Copy Markdown

This change has been tested by the PRM team using a file that had previously been known to cause problems (fragments received were being stored in excess of 12MB instead of the 5MB Spine limit.)

We also regression tested an in/out EHR transfer containing an attached image, which transferred successfully.

We are happy that this change has resolved the issue we raised - thanks for the efforts of the NIA team for working on this :).

Previously we were using the email.Message.get_content() method.
This would cause the raw_data_manager to forcefully convert any
payload with a Content-Type of text/* to a string.

This behaviour is unwelcome with GP2GP Large Message spec
where an attachment can claim to be a "text/plain" for example, but
in reality contain the GZIP compressed base64 representation of
"text/plain".

There was previously code to workaround the raw_data_manager,
however the workaround was buggy in the case that the attachment was
encoded with base64 but not decompressable (for example because it's
a fragment).

This new implementation avoids the ContentManager by calling the
email.Message.get_payload() instead.

Also avoid guessing whether a payload was base64 or not by inspecting
the Content-Transfer-Encoding header directly. This should make the
behaviour of the inbound adaptor more predictable, and also matches
how the `email` package decides whether a message is base64 or not.

https://docs.python.org/3/library/email.contentmanager.html#email.contentmanager.raw_data_manager
@adrianclay adrianclay enabled auto-merge (squash) August 8, 2024 06:59
Comment thread mhs/common/mhs_common/messages/ebxml_request_envelope.py Outdated
Comment on lines +284 to +287
logger.info(
'Successfully decoded message part with {ContentType} {ContentTransferEncoding} as string',
fparams=logger_dict
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What version of Python is being used? Couldn't we just use f-strings here or is this not recommended?

Suggested change
logger.info(
'Successfully decoded message part with {ContentType} {ContentTransferEncoding} as string',
fparams=logger_dict
)
logger.info(f'Successfully decoded message part with {ContentType} {ContentTransferEncoding} as string')

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, this code existed before this PR 🤷🏻

Co-authored-by: martin-nhs <127403254+martin-nhs@users.noreply.github.com>
@adrianclay adrianclay merged commit b54e4eb into main Aug 8, 2024
@adrianclay adrianclay deleted the NIAD-3124 branch August 8, 2024 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants