NIFI-9451 add input character set property for PutEmail and additiona… by emiliosetiadarma · Pull Request #6193 · apache/nifi

emiliosetiadarma · 2022-07-11T17:45:38Z

…l tests

Also fixed an issue regarding sending non US-ASCII messages in PutEmail. The original version had non US-ASCII messages appear as "?" in the resulting email.

Summary

NIFI-9451

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Apache NiFi Jira issue created

Pull Request Tracking

Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000

Pull Request Formatting

Pull Request based on current revision of the main branch
Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

Build completed using mvn clean install -P contrib-check
- JDK 8
- JDK 11
- JDK 17

Licensing

New dependencies are compatible with the Apache License 2.0 according to the License Policy
New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

Documentation formatting appears as expected in rendered files

greyp9

Nice implementation, @emiliosetiadarma!

On the base64 encoding, I'm thinking about the ability for the mail receiver to scan the message without needing to decode it first. Not sure of a good way to do that while also supporting Unicode, but it is a change from the existing behavior.

greyp9 · 2022-07-11T21:35:42Z

...dle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutEmail.java

+                    + "If not set, UTF-8 will be the default value.")
+            .required(true)
+            .addValidator(StandardValidators.CHARACTER_SET_VALIDATOR)
+            .defaultValue("UTF-8")


My vote is for using a constant here; one option is StandardCharsets.UTF_8.name().

Sure that makes sense, making the changes

greyp9 · 2022-07-11T21:37:06Z

...dle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutEmail.java

-            message.setSentDate(new Date());
+            final Charset charset = getCharset(context);
+            final String charsetName = MimeUtility.mimeCharset(charset.name());
+            final DataHandler messageDataHandler = new DataHandler(


If I understand correctly, this would unconditionally encode the body text into base64?

Yeah it will encode the body text into base64. I figured out that the existing implementation of PutEmail used base64 encoding of body text when the ATTACH_FILE property is set to true.

Just extended it to the non ATTACH_FILE case and made sure to set the appropriate header. If I hadn't done that, the email sent would have a Content-Transfer-Encoding of 7-bit which can only represent US-ASCII characters

greyp9 · 2022-07-11T21:43:58Z

...nifi-standard-processors/src/test/java/org/apache/nifi/processors/standard/TestPutEmail.java

+        // Verify that the Message was populated correctly
+        assertEquals("Expected a single message to be sent", 1, processor.getMessages().size());
+        Message message = processor.getMessages().get(0);
+        assertNotEquals(rawString, message.getContent());


Is it reasonable to tighten up this check? What exactly are we expecting the differences to be?

After looking at the test, I decided to rework it a bit. My goal in this test was to show that the text "SoftwÄrë Ënginëër Ön NiFi" which has characters that are not supported with US-ASCII (proven by the first assertNotEquals(rawBytes, rawBytesUTF8)). This will result in a corrupted message (Softw?r? ?ngin??r ?n NiFi) when improperly setting the INPUT_CHARACTER_SET.

I reworked the test to send "SoftwÄrë Ënginëër Ön NiFi" while setting the INPUT_CHARACTER_SET to US-ASCII (and making sure the result is a corrupted message) - indicating that we must make sure that we specify the appropriate character set for the message

emiliosetiadarma · 2022-07-12T19:11:49Z

Thanks @greyp9 for the comments! I've addressed some of them. As for checking to see the ability for the mail receiver to check the message without decoding it first, I will take a look and post another comment if I find something!

exceptionfactory

Thanks for the updates on this @emiliosetiadarma.

The latest version changes the message structure to use Base64 and multipart for the body, which works, but may introduce other unintended consequences. Although it may complicate the implementation, it seems like it would be better to use multipart structuring and Base64 encoding only when the Input Character Set property is something other than UTF-8. That would avoid changing the outgoing message structure unless the Input Character Set is configured with something other than the default setting.

exceptionfactory · 2022-08-10T20:31:45Z

...dle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutEmail.java

+            final MimeMultipart multipart = new MimeMultipart();
+            final MimeBodyPart mimeText = new PreencodedMimeBodyPart("base64");
+            mimeText.setDataHandler(messageDataHandler);
+            mimeText.setHeader("Content-Transfer-Encoding", "base64");
+            multipart.addBodyPart(mimeText);


This approach changes the message format to always use multipart formatting, which does not seem the like the ideal solution.

exceptionfactory · 2022-08-10T20:33:06Z

...dle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutEmail.java

+                    + "If not set, UTF-8 will be the default value.")
+            .required(true)
+            .addValidator(StandardValidators.CHARACTER_SET_VALIDATOR)
+            .defaultValue(StandardCharsets.UTF_8.name())


This should probably be StandardCharsets.UTF_8.toString() instead of UTF_8.name().

…l tests

emiliosetiadarma · 2022-08-16T21:50:13Z

Closing while revising

greyp9 reviewed Jul 11, 2022

View reviewed changes

exceptionfactory mentioned this pull request Jul 14, 2022

NIFI-9451 add attribute for specifying charset of the incoming flowfi… #5622

Closed

13 tasks

exceptionfactory requested changes Aug 10, 2022

View reviewed changes

emiliosetiadarma added 2 commits August 11, 2022 18:47

NIFI-9451 add input character set property for PutEmail and additiona…

f37cdcc

…l tests

NIFI-9451 changes based on latest comments and reworking a test

518ddef

emiliosetiadarma closed this Aug 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NIFI-9451 add input character set property for PutEmail and additiona…#6193

NIFI-9451 add input character set property for PutEmail and additiona…#6193
emiliosetiadarma wants to merge 2 commits intoapache:mainfrom
emiliosetiadarma:NIFI-9451

emiliosetiadarma commented Jul 11, 2022 •

edited

Loading

Uh oh!

greyp9 left a comment

Uh oh!

greyp9 Jul 11, 2022

Uh oh!

emiliosetiadarma Jul 12, 2022

Uh oh!

greyp9 Jul 11, 2022

Uh oh!

emiliosetiadarma Jul 12, 2022

Uh oh!

greyp9 Jul 11, 2022

Uh oh!

emiliosetiadarma Jul 12, 2022

Uh oh!

emiliosetiadarma commented Jul 12, 2022

Uh oh!

exceptionfactory left a comment

Uh oh!

exceptionfactory Aug 10, 2022

Uh oh!

exceptionfactory Aug 10, 2022

Uh oh!

emiliosetiadarma commented Aug 16, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

emiliosetiadarma commented Jul 11, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Tracking

Issue Tracking

Pull Request Tracking

Pull Request Formatting

Verification

Build

Licensing

Documentation

Uh oh!

greyp9 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

emiliosetiadarma commented Jul 12, 2022

Uh oh!

exceptionfactory left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

emiliosetiadarma commented Aug 16, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

emiliosetiadarma commented Jul 11, 2022 •

edited

Loading