Skip to content

NIFI-9451 add input character set property for PutEmail and additiona…#6193

Closed
emiliosetiadarma wants to merge 2 commits intoapache:mainfrom
emiliosetiadarma:NIFI-9451
Closed

NIFI-9451 add input character set property for PutEmail and additiona…#6193
emiliosetiadarma wants to merge 2 commits intoapache:mainfrom
emiliosetiadarma:NIFI-9451

Conversation

@emiliosetiadarma
Copy link
Contributor

@emiliosetiadarma emiliosetiadarma commented Jul 11, 2022

…l tests

Also fixed an issue regarding sending non US-ASCII messages in PutEmail. The original version had non US-ASCII messages appear as "?" in the resulting email.

Summary

NIFI-9451

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 8
    • JDK 11
    • JDK 17

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

Copy link
Contributor

@greyp9 greyp9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice implementation, @emiliosetiadarma!

On the base64 encoding, I'm thinking about the ability for the mail receiver to scan the message without needing to decode it first. Not sure of a good way to do that while also supporting Unicode, but it is a change from the existing behavior.

+ "If not set, UTF-8 will be the default value.")
.required(true)
.addValidator(StandardValidators.CHARACTER_SET_VALIDATOR)
.defaultValue("UTF-8")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote is for using a constant here; one option is StandardCharsets.UTF_8.name().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure that makes sense, making the changes

message.setSentDate(new Date());
final Charset charset = getCharset(context);
final String charsetName = MimeUtility.mimeCharset(charset.name());
final DataHandler messageDataHandler = new DataHandler(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this would unconditionally encode the body text into base64?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it will encode the body text into base64. I figured out that the existing implementation of PutEmail used base64 encoding of body text when the ATTACH_FILE property is set to true.

Just extended it to the non ATTACH_FILE case and made sure to set the appropriate header. If I hadn't done that, the email sent would have a Content-Transfer-Encoding of 7-bit which can only represent US-ASCII characters

// Verify that the Message was populated correctly
assertEquals("Expected a single message to be sent", 1, processor.getMessages().size());
Message message = processor.getMessages().get(0);
assertNotEquals(rawString, message.getContent());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it reasonable to tighten up this check? What exactly are we expecting the differences to be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at the test, I decided to rework it a bit. My goal in this test was to show that the text "SoftwÄrë Ënginëër Ön NiFi" which has characters that are not supported with US-ASCII (proven by the first assertNotEquals(rawBytes, rawBytesUTF8)). This will result in a corrupted message (Softw?r? ?ngin??r ?n NiFi) when improperly setting the INPUT_CHARACTER_SET.

I reworked the test to send "SoftwÄrë Ënginëër Ön NiFi" while setting the INPUT_CHARACTER_SET to US-ASCII (and making sure the result is a corrupted message) - indicating that we must make sure that we specify the appropriate character set for the message

@emiliosetiadarma
Copy link
Contributor Author

Thanks @greyp9 for the comments! I've addressed some of them. As for checking to see the ability for the mail receiver to check the message without decoding it first, I will take a look and post another comment if I find something!

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates on this @emiliosetiadarma.

The latest version changes the message structure to use Base64 and multipart for the body, which works, but may introduce other unintended consequences. Although it may complicate the implementation, it seems like it would be better to use multipart structuring and Base64 encoding only when the Input Character Set property is something other than UTF-8. That would avoid changing the outgoing message structure unless the Input Character Set is configured with something other than the default setting.

Comment on lines +414 to +418
final MimeMultipart multipart = new MimeMultipart();
final MimeBodyPart mimeText = new PreencodedMimeBodyPart("base64");
mimeText.setDataHandler(messageDataHandler);
mimeText.setHeader("Content-Transfer-Encoding", "base64");
multipart.addBodyPart(mimeText);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach changes the message format to always use multipart formatting, which does not seem the like the ideal solution.

+ "If not set, UTF-8 will be the default value.")
.required(true)
.addValidator(StandardValidators.CHARACTER_SET_VALIDATOR)
.defaultValue(StandardCharsets.UTF_8.name())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be StandardCharsets.UTF_8.toString() instead of UTF_8.name().

@emiliosetiadarma
Copy link
Contributor Author

Closing while revising

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants