NIAD-3124: Fix inbound compressed fragments being corrupted by adrianclay · Pull Request #163 · NHSDigital/integration-adaptor-mhs

adrianclay · 2024-07-16T14:13:08Z

What + Why

Previously we were using the email.Message.get_content() method. This would cause the raw_data_manager to forcefully convert any payload with a Content-Type of text/* to a string.

This behaviour is unwelcome under the GP2GP Large Message spec, where an attachment can claim to be "text/plain", for example, but in reality contain the GZIP-compressed, base64-encoded representation of a "text/plain" document.

There was previously code to work around the raw_data_manager; however, the workaround was buggy when the attachment was encoded with base64 but could not be decompressed (for example, because it is a fragment).

This new implementation avoids the ContentManager by calling the email.Message.get_payload() instead.

It also avoids guessing whether a payload is base64 by inspecting the Content-Transfer-Encoding header directly. This should make the behaviour of the inbound adaptor more predictable, and it matches how the email package decides whether a message is base64.
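The difference between the two calls can be illustrated with a small, self-contained sketch. It is not the code from this PR: the sample message, the `read_part_bytes` helper and its return shape are assumptions made for illustration. It builds a part that claims to be text/plain but carries a base64-encoded fragment of a GZIP stream, then compares `get_content()` with `get_payload(decode=True)`.

```python
import base64
import gzip
from email import message_from_bytes, policy

# Hypothetical sample data: half of a GZIP stream, so it cannot be
# decompressed on its own (similar in spirit to a large-message fragment).
compressed = gzip.compress(b"hello fragment " * 50)
fragment = compressed[: len(compressed) // 2]

raw = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + base64.b64encode(fragment) + b"\r\n"
)
part = message_from_bytes(raw, policy=policy.default)

# get_content() goes through the content manager, which decodes text/*
# payloads to str; non-ASCII bytes are replaced, so the binary data is lost.
as_text = part.get_content()


def read_part_bytes(message_part):
    """Hypothetical helper: return (payload bytes, is_base64).

    The base64 decision is read from the Content-Transfer-Encoding header
    rather than guessed from the payload itself.
    """
    cte = str(message_part.get("Content-Transfer-Encoding", "")).strip().lower()
    return message_part.get_payload(decode=True), cte == "base64"


payload, is_base64 = read_part_bytes(part)
```

In this sketch `payload` is the untouched fragment as bytes, whereas `as_text` is a string in which the non-ASCII GZIP bytes have already been replaced.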

Type of change

Bug fix (non-breaking change which fixes an issue)

Checklist:

I have performed a self-review of my code
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have updated the Changelog with details of my change in the UNRELEASED section if this change will affect end users

chrisbloe-nhse · 2024-07-23T15:32:47Z

This change has been tested by the PRM team using a file that had previously been known to cause problems (fragments received were being stored in excess of 12MB instead of the 5MB Spine limit.)

We also regression tested an in/out EHR transfer containing an attached image, which transferred successfully.

We are happy that this change has resolved the issue we raised - thanks for the efforts of the NIA team for working on this :).

https://docs.python.org/3/library/email.contentmanager.html#email.contentmanager.raw_data_manager

martin-nhs · 2024-08-08T08:36:23Z

+        logger.info(
+            'Successfully decoded message part with {ContentType} {ContentTransferEncoding} as string',
+            fparams=logger_dict
+        )


What version of Python is being used? Couldn't we just use f-strings here or is this not recommended?

Suggested change

-        logger.info(
-            'Successfully decoded message part with {ContentType} {ContentTransferEncoding} as string',
-            fparams=logger_dict
-        )
+        logger.info(f'Successfully decoded message part with {ContentType} {ContentTransferEncoding} as string')

adrianclay · 2024-08-08

I'm not sure; this code existed before this PR 🤷🏻
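For comparison, here is a rough sketch of the two styles discussed in this thread. How the project's logger consumes `fparams` is not shown here, so `str.format` stands in for it as an assumption, and the dictionary values are made up. One detail the sketch makes visible: an f-string would need to look the values up in `logger_dict`, because bare `ContentType` and `ContentTransferEncoding` names only resolve if they exist as variables in scope.

```python
# Hypothetical values; in the PR these would come from the message part.
logger_dict = {
    "ContentType": "text/plain",
    "ContentTransferEncoding": "base64",
}

# Template style: placeholders are filled from a dictionary later.
# str.format is only a stand-in for whatever the logger does with fparams.
template = (
    "Successfully decoded message part with "
    "{ContentType} {ContentTransferEncoding} as string"
)
via_template = template.format(**logger_dict)

# f-string style: values are interpolated immediately and must be
# looked up explicitly in the dictionary.
via_fstring = (
    f"Successfully decoded message part with "
    f"{logger_dict['ContentType']} {logger_dict['ContentTransferEncoding']} as string"
)
```

Both produce the same text. The template style keeps the parameters as separate values that a logger could also emit as structured fields, which an f-string would not preserve.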

Co-authored-by: martin-nhs <127403254+martin-nhs@users.noreply.github.com>

adrianclay force-pushed the NIAD-3124 branch 4 times, most recently from b1d150a to 65d07e6 Compare July 19, 2024 14:44

adrianclay changed the title ~~NIAD-3124: WIP fix for compressed fragments being corrupted~~ NIAD-3124: Fix inbound compressed fragments being corrupted Jul 19, 2024

adrianclay marked this pull request as ready for review July 19, 2024 14:56

adrianclay added 3 commits August 8, 2024 07:59

Regression test for compressed fragment being corrupted by inbound

f0bd2e6

Add to CHANGELOG

34faa19

adrianclay force-pushed the NIAD-3124 branch from 1b81a77 to 34faa19 Compare August 8, 2024 06:59

adrianclay enabled auto-merge (squash) August 8, 2024 06:59

martin-nhs reviewed Aug 8, 2024

Refactor, remove typehint

28f8eaf

Co-authored-by: martin-nhs <127403254+martin-nhs@users.noreply.github.com>

martin-nhs approved these changes Aug 8, 2024

adrianclay merged commit b54e4eb into main Aug 8, 2024

adrianclay deleted the NIAD-3124 branch August 8, 2024 16:13

adrianclay merged 4 commits into main from NIAD-3124