Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NIFI-2964] Added ability for AttributeToJSON to handle nested JSON when either outputting to a flow file or an attribute. #7231

Closed
wants to merge 5 commits into from

Conversation

dan-s1
Copy link
Contributor

@dan-s1 dan-s1 commented May 5, 2023

Summary

NIFI-2964

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 11
    • JDK 17

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

@dan-s1 dan-s1 changed the title [NIFI-2964] Added ability to handle nested JSON. [NIFI-2964] Added ability for AttributeToJSON to handle nested JSON when outputting to a flow file. May 5, 2023
@dan-s1
Copy link
Contributor Author

dan-s1 commented May 8, 2023

@exceptionfactory I looked over my code since last Friday's initial commit and I realized there were other things that needed to be added so I have added 2 other commits. I believe for my PR for NIFI-11167 I squashed too much when I had to make changes. For this PR would you like me to squash these or should I leave this as is?

@exceptionfactory
Copy link
Contributor

Thanks @dan-s1, it's fine for now, if you end up making additional changes before further review, that squashing commits sounds good, otherwise it looks good for now.

@dan-s1 dan-s1 changed the title [NIFI-2964] Added ability for AttributeToJSON to handle nested JSON when outputting to a flow file. [NIFI-2964] Added ability for AttributeToJSON to handle nested JSON when either outputting to a flow file or an attribute. May 8, 2023
@dan-s1
Copy link
Contributor Author

dan-s1 commented May 12, 2023

@exceptionfactory It was brought to my attention with my previous changes I was not keeping the current functionality of escaping JSON. I therefore pushed another commit which adds a property to allow for filtering which attributes should be retained as JSON or when left blank all attributes are escaped as the current functionality is. I also squashed the other commits.

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noting the compatibility issue @dan-s1. Unfortunately, I think the fact of compatibility issues highlights some of the potential complexity that this introduces.

If we go forward with this change, it seems better to have a simple property like JSON Handling Strategy with values of Escaped String or Nested Object. That way, anything detected as JSON would be treated the same way, without the potential complexity of pattern-matching on FlowFile attribute names. What do you think of that approach?

@dan-s1
Copy link
Contributor Author

dan-s1 commented May 17, 2023

@exceptionfactory Sorry I have to retract my initial response. My team had a specific use case where they wanted to pick and choose what they wanted as JSON and what they wanted as escaped. The regex would allow for that. Although it does add complexity it does allow for keeping the current functionality intact. The property you suggested would either always treat attributes as JSON or always as escaped.

@exceptionfactory
Copy link
Contributor

@exceptionfactory Sorry I have to retract my initial response. My team had a specific use case where they wanted to pick and choose what they wanted as JSON and what they wanted as escaped. The regex would allow for that. Although it does add complexity it does allow for keeping the current functionality intact. The property you suggested would either always treat attributes as JSON or always as escaped.

Thanks for the reply @dan-s1. Based on that description, are you saying that there is a reason to keep some JSON values as escaped strings? That seems counter-intuitive in general, although every flow is different. The JSON output of AttributesToJSON can be modified using other Processors like JoltTransformJSON. Does translating all JSON attribute values into objects create processing problems in the use case mentioned?

@dan-s1
Copy link
Contributor Author

dan-s1 commented May 17, 2023

@exceptionfactory I agree with you that is counter intuitive so much so my first iteration of this PR actually always converted to JSON. I was told though that converting all attributes to JSON would cause a problem in a particular flow. So per your suggestion could we have 3 allowable values, Escaped String, Nested Object and Pick&Choose? If Pick&Choose is chosen then there is another property which depends on it which would be the regex for picking which attributes should be converted to JSON.

@exceptionfactory
Copy link
Contributor

exceptionfactory commented May 17, 2023

Thanks @dan-s1, having three allowable values sounds like it could be the best solution in terms of both flexibility and usability. For the third option, it could be something like Nested Object for Matched Attributes, and then the dependent property could be JSON Attribute Names Pattern.

I am still wondering about the details of the flow in terms of why selective conversion might be necessary. It adds code complexity, which impacts future maintenance. Can you provide any additional details on the use case, and why it would not work in conjunction with other Processors?

@dan-s1
Copy link
Contributor Author

dan-s1 commented May 18, 2023

@exceptionfactory After much deliberation we realized you were right that NIFI has other processors to handle the JSON transformations needed. We would still like though the ability to handle attributes as nested objects as you suggested:

If we go forward with this change, it seems better to have a simple property like JSON Handling Strategy with values of Escaped String or Nested Object. That way, anything detected as JSON would be treated the same way, without the potential complexity of pattern-matching on FlowFile attribute names.

This would clearly highlight the fact that the processor can handle nested objects. Is that still okay?

@exceptionfactory
Copy link
Contributor

@exceptionfactory After much deliberation we realized you were right that NIFI has other processors to handle the JSON transformations needed. We would still like though the ability to handle attributes as nested objects as you suggested:

If we go forward with this change, it seems better to have a simple property like JSON Handling Strategy with values of Escaped String or Nested Object. That way, anything detected as JSON would be treated the same way, without the potential complexity of pattern-matching on FlowFile attribute names.

This would clearly highlight the fact that the processor can handle nested objects. Is that still okay?

Thanks for evaluating the options. Yes, I think the Handling Strategy approach with the two options provides clarity on both the implementation side and the usability side. If you can make those adjustments to the PR, that would be great.

@dan-s1
Copy link
Contributor Author

dan-s1 commented May 22, 2023

@exceptionfactory Added the handling strategy as we discussed.

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing the new property changes @dan-s1. This looks close to completion, just noted a few additional recommendations.

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the latest updates @dan-s1, the functionality looks good.

In the course of final review, one final question: what do you think about shortening the strategy values to just Escaped and Nested? This came to mind since the nested structure could be an array. Other than that, this looks ready to go.

@dan-s1
Copy link
Contributor Author

dan-s1 commented May 25, 2023

No problem. Should the display name for Nested also be changed to Nested?
In addition should the description for Nested be changed from

Handles JSON attribute values as nested structured objects

to
"Handles JSON attribute values as nested structured objects or arrays."

@exceptionfactory
Copy link
Contributor

No problem. Should the display name for Nested also be changed to Nested? In addition should the description for Nested be changed from

Handles JSON attribute values as nested structured objects

to "Handles JSON attribute values as nested structured objects or arrays."

Thanks, yes, shortening both the enum and the display name to would be good. Updating the description would also be helpful.

The net result should be NESTED("Nested", ...) and ESCAPED("Escaped", ...) where ... is the description.

…e display names and one of the descriptions.
@dan-s1
Copy link
Contributor Author

dan-s1 commented May 25, 2023

@exceptionfactory Done. Let me know if there is anything else.

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for working through the feedback on this new feature @dan-s1, the latest version looks good! Planning to merge soon.

exceptionfactory pushed a commit that referenced this pull request May 25, 2023
This closes #7231

Signed-off-by: David Handermann <exceptionfactory@apache.org>
(cherry picked from commit d6600e6)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants