NIFI-12670 JoltTransform processors incorrectly encode/decode text on Windows #8842

Closed
jrsteinebrey wants to merge 3 commits


jrsteinebrey
Contributor

NIFI-12670 JoltTransform processors incorrectly encode/decode the text of the Jolt Specification on Windows

Summary

NIFI-12670

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-12670
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-12670

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

I ran the processors in the UI and confirmed they worked correctly with Linux- and Windows-encoded files.
I also added unit tests.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

UI Contributions

No UI changes

Licensing

No new dependencies

Documentation

No documentation changes.

Contributor

exceptionfactory left a comment


Thanks for proposing this correction and improvement @jrsteinebrey.

The changes in the AbstractJoltTransform class make sense in terms of using UTF-8; however, we should avoid making changes to nifi-api interfaces. In particular, a createReader() method without a character set specified could easily lead to incorrect usage, falling back to platform-default character encoding.

For that reason, I recommend constructing the Reader with UTF-8 in the AbstractJoltTransform class as a way forward.
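As a rough illustration of that recommendation, the sketch below wraps an InputStream in a Reader with UTF-8 given explicitly, so decoding does not depend on the platform default. The class name Utf8SpecReader and the method readSpecification are hypothetical names for this example, not code from AbstractJoltTransform or nifi-api; only standard java.io and java.nio.charset classes are used.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of NiFi.
public class Utf8SpecReader {

    // Reads the whole stream as text, always decoding as UTF-8
    // instead of relying on the platform-default character set.
    static String readSpecification(InputStream in) throws IOException {
        try (Reader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
            return text.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // A small Jolt-like specification containing a non-ASCII character (e with acute accent).
        String spec = "{\"operation\":\"default\",\"spec\":{\"name\":\"caf\u00e9\"}}";
        byte[] utf8Bytes = spec.getBytes(StandardCharsets.UTF_8);

        // Explicit UTF-8 decoding round-trips the text on any platform.
        String decoded = readSpecification(new ByteArrayInputStream(utf8Bytes));
        System.out.println(decoded.equals(spec)); // prints true

        // Decoding the same bytes with a single-byte charset corrupts the accented character.
        String misdecoded = new String(utf8Bytes, StandardCharsets.ISO_8859_1);
        System.out.println(misdecoded.equals(spec)); // prints false
    }
}
```

The second check in main mimics what can happen when the default character set is a single-byte encoding, as is common on Windows: the two UTF-8 bytes of the accented character are decoded as two separate characters.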

Contributor

exceptionfactory left a comment


Thanks for adjusting the approach @jrsteinebrey, the latest version looks good. +1 merging

shubhluck pushed a commit to shubhluck/nifi that referenced this pull request Jun 1, 2024
- Specified UTF-8 encoding for reading Jolt Transform to avoid decoding issues on Windows or platforms with different default character sets

This closes apache#8842

Signed-off-by: David Handermann <exceptionfactory@apache.org>