NIFI-9451 add input character set property for PutEmail and additiona…#6193
NIFI-9451 add input character set property for PutEmail and additiona…#6193emiliosetiadarma wants to merge 2 commits intoapache:mainfrom emiliosetiadarma:NIFI-9451
Conversation
greyp9
left a comment
There was a problem hiding this comment.
Nice implementation, @emiliosetiadarma!
On the base64 encoding, I'm thinking about the ability for the mail receiver to scan the message without needing to decode it first. Not sure of a good way to do that while also supporting Unicode, but it is a change from the existing behavior.
| + "If not set, UTF-8 will be the default value.") | ||
| .required(true) | ||
| .addValidator(StandardValidators.CHARACTER_SET_VALIDATOR) | ||
| .defaultValue("UTF-8") |
There was a problem hiding this comment.
My vote is for using a constant here; one option is StandardCharsets.UTF_8.name().
There was a problem hiding this comment.
Sure that makes sense, making the changes
| message.setSentDate(new Date()); | ||
| final Charset charset = getCharset(context); | ||
| final String charsetName = MimeUtility.mimeCharset(charset.name()); | ||
| final DataHandler messageDataHandler = new DataHandler( |
There was a problem hiding this comment.
If I understand correctly, this would unconditionally encode the body text into base64?
There was a problem hiding this comment.
Yeah it will encode the body text into base64. I figured out that the existing implementation of PutEmail used base64 encoding of body text when the ATTACH_FILE property is set to true.
Just extended it to the non ATTACH_FILE case and made sure to set the appropriate header. If I hadn't done that, the email sent would have a Content-Transfer-Encoding of 7-bit which can only represent US-ASCII characters
| // Verify that the Message was populated correctly | ||
| assertEquals("Expected a single message to be sent", 1, processor.getMessages().size()); | ||
| Message message = processor.getMessages().get(0); | ||
| assertNotEquals(rawString, message.getContent()); |
There was a problem hiding this comment.
Is it reasonable to tighten up this check? What exactly are we expecting the differences to be?
There was a problem hiding this comment.
After looking at the test, I decided to rework it a bit. My goal in this test was to show that the text "SoftwÄrë Ënginëër Ön NiFi" which has characters that are not supported with US-ASCII (proven by the first assertNotEquals(rawBytes, rawBytesUTF8)). This will result in a corrupted message (Softw?r? ?ngin??r ?n NiFi) when improperly setting the INPUT_CHARACTER_SET.
I reworked the test to send "SoftwÄrë Ënginëër Ön NiFi" while setting the INPUT_CHARACTER_SET to US-ASCII (and making sure the result is a corrupted message) - indicating that we must make sure that we specify the appropriate character set for the message
|
Thanks @greyp9 for the comments! I've addressed some of them. As for checking to see the ability for the mail receiver to check the message without decoding it first, I will take a look and post another comment if I find something! |
exceptionfactory
left a comment
There was a problem hiding this comment.
Thanks for the updates on this @emiliosetiadarma.
The latest version changes the message structure to use Base64 and multipart for the body, which works, but may introduce other unintended consequences. Although it may complicate the implementation, it seems like it would be better to use multipart structuring and Base64 encoding only when the Input Character Set property is something other than UTF-8. That would avoid changing the outgoing message structure unless the Input Character Set is configured with something other than the default setting.
| final MimeMultipart multipart = new MimeMultipart(); | ||
| final MimeBodyPart mimeText = new PreencodedMimeBodyPart("base64"); | ||
| mimeText.setDataHandler(messageDataHandler); | ||
| mimeText.setHeader("Content-Transfer-Encoding", "base64"); | ||
| multipart.addBodyPart(mimeText); |
There was a problem hiding this comment.
This approach changes the message format to always use multipart formatting, which does not seem the like the ideal solution.
| + "If not set, UTF-8 will be the default value.") | ||
| .required(true) | ||
| .addValidator(StandardValidators.CHARACTER_SET_VALIDATOR) | ||
| .defaultValue(StandardCharsets.UTF_8.name()) |
There was a problem hiding this comment.
This should probably be StandardCharsets.UTF_8.toString() instead of UTF_8.name().
|
Closing while revising |
…l tests
Also fixed an issue regarding sending non US-ASCII messages in PutEmail. The original version had non US-ASCII messages appear as "?" in the resulting email.
Summary
NIFI-9451
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000NIFI-00000Pull Request Formatting
mainbranchVerification
Please indicate the verification steps performed prior to pull request creation.
Build
mvn clean install -P contrib-checkLicensing
LICENSEandNOTICEfilesDocumentation