NIFI-12670 JoltTransform processors incorrectly encode/decode text on Windows #8842

jrsteinebrey · 2024-05-16T17:41:17Z

NIFI-12670 JoltTransform processors incorrectly encode/decode text in the Jolt Specification on Windows

Summary

NIFI-12670

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Apache NiFi Jira issue created

Pull Request Tracking

Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-12670
Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-12670

Pull Request Formatting

Pull Request based on current revision of the main branch
Pull Request refers to a feature branch with one commit containing changes

Verification

I ran the processors in the UI and confirmed they worked correctly with Linux and Windows encoded files.
I also added unit tests.

Build

Build completed using mvn clean install -P contrib-check
- JDK 21

UI Contributions

No UI changes

Licensing

No new dependencies

Documentation

No documentation changes.


exceptionfactory

Thanks for proposing this correction and improvement @jrsteinebrey.

The changes in the AbstractJoltTransform class make sense in terms of using UTF-8; however, we should avoid making changes to nifi-api interfaces. In particular, the createReader() method without a character set specified could easily lead to incorrect usage, because it falls back to the platform-default character encoding.

For that reason, I recommend constructing the Reader with UTF-8 in the AbstractJoltTransform class as a way forward.
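To illustrate the reviewer's point, here is a minimal sketch (not NiFi's actual AbstractJoltTransform code) of why the character set must be explicit. A `java.io.InputStreamReader` created without a charset decodes with the platform default, so a UTF-8 Jolt specification containing non-ASCII text is garbled wherever that default is not UTF-8. The class and method names are hypothetical, and `ISO_8859_1` merely stands in for a non-UTF-8 platform default such as windows-1252.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class JoltSpecCharsetDemo {

    // Hypothetical helper: decodes the stream with the given charset and joins lines with '\n'
    static String decode(InputStream in, Charset charset) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: always reads the specification as UTF-8, independent of the platform default
    static String readSpec(InputStream in) {
        return decode(in, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escape keeps this source file ASCII-only; \u00fc is a lowercase u with umlaut
        String spec = "[{\"operation\":\"default\",\"spec\":{\"city\":\"Z\u00fcrich\"}}]";
        byte[] utf8Bytes = spec.getBytes(StandardCharsets.UTF_8);

        String correct = readSpec(new ByteArrayInputStream(utf8Bytes));
        // ISO-8859-1 stands in for a non-UTF-8 platform default such as windows-1252
        String garbled = decode(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1);

        System.out.println(correct.equals(spec)); // true
        System.out.println(garbled.equals(spec)); // false: U+00FC decodes as U+00C3 U+00BC
    }
}
```

Passing `StandardCharsets.UTF_8` at the point where the Reader is constructed keeps the decoding independent of the JVM's `file.encoding` setting, which matches the reviewer's recommendation to fix the encoding in the processor rather than add a charset-less reader method to nifi-api.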

nifi-api/src/main/java/org/apache/nifi/components/resource/ResourceReference.java

nifi-api/src/main/java/org/apache/nifi/components/resource/Utf8TextResource.java

exceptionfactory

Thanks for adjusting the approach @jrsteinebrey, the latest version looks good. +1 merging

- Specified UTF-8 encoding for reading Jolt Transform to avoid decoding issues on Windows or platforms with different default character sets

This closes apache#8842

Signed-off-by: David Handermann <exceptionfactory@apache.org>

NIFI-12670 JoltTransform processors incorrectly encode/decode text in the Jolt Specification on Windows

93c0977

exceptionfactory requested changes May 16, 2024


nifi-api/src/main/java/org/apache/nifi/components/resource/ResourceReference.java (outdated)

nifi-api/src/main/java/org/apache/nifi/components/resource/Utf8TextResource.java (outdated)

jrsteinebrey added 2 commits May 16, 2024 14:08

NIFI-12670 fixed a typo

03bfd88

NIFI-12670 set UTF-8 character encoding in AbstractTransform.java

d2b0cb8

jrsteinebrey requested a review from exceptionfactory May 17, 2024 12:47

exceptionfactory approved these changes May 17, 2024


exceptionfactory closed this in b27fc46 May 17, 2024
