NIFI-9901 Add nifi-xml-processing to nifi-commons #5962

Closed
wants to merge 1 commit into from

exceptionfactory
Contributor

Description of PR

NIFI-9901 Adds the nifi-xml-processing module to nifi-commons and refactors XML handling to use common components.

Components in nifi-xml-processing replace the usage of XmlUtils in nifi-security-utils with interfaces and implementation classes specific to each type of XML processing. The module includes the following interfaces derived from standard Java XML components:

  • DocumentProvider for DOM Documents
  • InputSourceParser for SAX Parsing
  • XMLEventReaderProvider and XMLStreamReaderProvider for StAX Parsing
  • SchemaValidator for XML Schema Validation
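To make the DOM case concrete, here is a minimal sketch of what a provider-style wrapper around `DocumentBuilderFactory` might look like using only standard JAXP. The class name `DocumentProviderSketch`, its methods, and the choice to reject DOCTYPE declarations outright are assumptions for illustration, not the actual nifi-xml-processing implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical illustration; not the NiFi StandardDocumentProvider
class DocumentProviderSketch {
    // Xerces feature recognized by the JDK built-in parser; rejects any DOCTYPE declaration
    private static final String DISALLOW_DOCTYPE_DECL =
            "http://apache.org/xml/features/disallow-doctype-decl";

    /** Parse a stream into a DOM Document using a restricted factory. */
    static Document parse(final InputStream inputStream) {
        final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.setFeature(DISALLOW_DOCTYPE_DECL, true);
            final DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(inputStream);
        } catch (final ParserConfigurationException | SAXException | IOException e) {
            // Single unchecked exception so callers do not handle JAXP exceptions directly
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    /** Convenience wrapper for String input, used only for demonstration. */
    static String rootElementName(final String xml) {
        final InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        return parse(in).getDocumentElement().getNodeName();
    }
}
```

Keeping the factory configuration in one class means callers never touch `DocumentBuilderFactory` themselves, which is the general idea behind replacing scattered factory setup with shared components.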

The nifi-xml-processing Maven configuration includes spotbugs-maven-plugin with findsecbugs-plugin to analyze components for XML processing vulnerabilities during the build.

General changes include refactoring references to javax.xml.parsers.DocumentBuilderFactory and javax.xml.stream.XMLInputFactory throughout the system to use nifi-xml-processing components.
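For the StAX side, a comparable sketch is shown below. It assumes a restricted `XMLInputFactory` with DTD support and external entities turned off; the class and method names are hypothetical and the exact properties set by the NiFi providers may differ.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration; not the NiFi XMLStreamReaderProvider
class StreamReaderProviderSketch {
    /** Create a stream reader from a factory with DTD and external entity support disabled. */
    static XMLStreamReader getStreamReader(final String xml) {
        final XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            return factory.createXMLStreamReader(new StringReader(xml));
        } catch (final XMLStreamException e) {
            throw new IllegalStateException("Reader creation failed", e);
        }
    }

    /** Count START_ELEMENT events to show the reader in use. */
    static int countStartElements(final String xml) {
        final XMLStreamReader reader = getStreamReader(xml);
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
            reader.close();
        } catch (final XMLStreamException e) {
            throw new IllegalStateException("Reading failed", e);
        }
        return count;
    }
}
```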

Specific changes include updates to the EvaluateXPath and EvaluateXQuery Processors to disable Document Type Declaration Validation in the default configuration. Other adjustments include relocating Apache Commons Configuration 2 classes to nifi-lookup-services, the only module that references those components. This relocation allows nifi-xml-processing to avoid any external dependencies.

As a result of this refactoring, nifi-security-utils no longer contains any XML processing components, which allows some referencing modules to avoid unnecessary transitive dependencies.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically main)?

  • Is your initial contribution a single, squashed commit? Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not squash or use --force when pushing to allow for clean monitoring of changes.

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?
  • Have you verified that the full build is successful on JDK 8?
  • Have you verified that the full build is successful on JDK 11?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
  • If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check GitHub Actions CI for build issues and submit an update to your PR as soon as possible.

- Refactored XML parsing to use providers from nifi-xml-processing
- Configured spotbugs-maven-plugin with findsecbugs-plugin in nifi-xml-processing
- Disabled Validate DTD in default configuration for EvaluateXPath and EvaluateXQuery
- Replaced configuration of DocumentBuilder and streaming XML Readers with shared components
- Removed XML utilities from nifi-security-utils
- Moved Commons Configuration classes to nifi-lookup-services
Contributor

@greyp9 greyp9 left a comment

Nice refactor! Had a few areas where I didn't quite understand the change, but don't see any hold ups. On to testing...

final DocumentBuilderFactory documentBuilderFactory = getDocumentBuilderFactory();

try {
documentBuilderFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, ProcessingFeature.SECURE_PROCESSING.isEnabled());
Contributor

What is the benefit of defining the value here to be a lookup?

Contributor Author

The purpose of the ProcessingFeature.SECURE_PROCESSING reference was to define the property value in a central location that could be reused in multiple classes.
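As a rough sketch of that idea, an enum can pair each feature URI with its intended value so that every factory is configured from the same source. The enum below is hypothetical: only `SECURE_PROCESSING` and `isEnabled()` appear in the diff above, while `getFeature()`, the second constant, and the helper methods are assumptions.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical illustration of a central feature definition; not the NiFi ProcessingFeature
class ProcessingFeatureSketch {
    enum Feature {
        SECURE_PROCESSING(XMLConstants.FEATURE_SECURE_PROCESSING, true),
        // Assumed second entry, included only to show reuse across multiple features
        DISALLOW_DOCTYPE_DECL("http://apache.org/xml/features/disallow-doctype-decl", true);

        private final String feature;
        private final boolean enabled;

        Feature(final String feature, final boolean enabled) {
            this.feature = feature;
            this.enabled = enabled;
        }

        String getFeature() {
            return feature;
        }

        boolean isEnabled() {
            return enabled;
        }
    }

    /** Apply every centrally defined feature to a new factory. */
    static DocumentBuilderFactory newConfiguredFactory() {
        final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            for (final Feature feature : Feature.values()) {
                factory.setFeature(feature.getFeature(), feature.isEnabled());
            }
        } catch (final ParserConfigurationException e) {
            throw new IllegalStateException("Feature configuration failed", e);
        }
        return factory;
    }

    /** Read a feature back from the factory, wrapping the checked exception. */
    static boolean isEnabled(final DocumentBuilderFactory factory, final Feature feature) {
        try {
            return factory.getFeature(feature.getFeature());
        } catch (final ParserConfigurationException e) {
            throw new IllegalStateException("Feature lookup failed", e);
        }
    }
}
```

With this shape, changing a default means editing one enum constant rather than hunting for boolean literals across several provider classes.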

try {
parseInputSource(inputSource, contentHandler);
} catch (final ParserConfigurationException|SAXException e) {
throw new ProcessingException("Parser Configuration failed", e);
Contributor

This would encompass both the parser configuration and the parse operation.

Parser Configuration / Parse Operation failed

Contributor Author

That's a good point, will adjust the exception to use a more general "Parsing failed" message.
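A minimal sketch of the pattern under discussion: one try block covers both parser configuration and the parse operation, so the wrapped message stays general. The `ProcessingException` class here is a local stand-in, and catching `IOException` in the same block is an assumption of this sketch rather than something shown in the diff.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not the NiFi InputSourceParser implementation
class InputSourceParserSketch {
    /** Local stand-in for an unchecked processing exception. */
    static class ProcessingException extends RuntimeException {
        ProcessingException(final String message, final Throwable cause) {
            super(message, cause);
        }
    }

    static void parse(final InputSource inputSource, final ContentHandler contentHandler) {
        try {
            // Configuration steps and the parse itself can both fail inside this block
            final SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            final XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(contentHandler);
            reader.parse(inputSource);
        } catch (final ParserConfigurationException | SAXException | IOException e) {
            // General message because the cause may be configuration or parsing
            throw new ProcessingException("Parsing failed", e);
        }
    }

    /** Count elements with a simple handler to show the parser in use. */
    static int countElements(final String xml) {
        final int[] count = {0};
        parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(final String uri, final String localName,
                                     final String qName, final Attributes attributes) {
                count[0]++;
            }
        });
        return count[0];
    }
}
```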

@@ -427,8 +429,7 @@ private PoliciesUsersAndGroups parsePoliciesUsersAndGroups(final String fingerpr

  final byte[] fingerprintBytes = fingerprint.getBytes(StandardCharsets.UTF_8);
  try (final ByteArrayInputStream in = new ByteArrayInputStream(fingerprintBytes)) {
- final DocumentBuilder docBuilder = createSafeDocumentBuilder();
- final Document document = docBuilder.parse(in);
+ final Document document = parseFingerprint(in);
Contributor

Why doesn't this usage use the new StandardDocumentProvider?

Contributor Author

Good question! This is the only remaining location with a local reference to the DocumentBuilderFactory. PR #5514 for NIFI-9069 included the direct implementation, as opposed to using XmlUtils, because of class loading issues between nifi-framework-api and dependent modules. It might be possible to revisit the problem, but this change maintains a limited scope for nifi-framework-api module dependencies.

@@ -156,7 +152,7 @@ public class EvaluateXQuery extends AbstractProcessor {
  .description("Specifies whether or not the XML content should be validated against the DTD.")
  .required(true)
  .allowableValues("true", "false")
- .defaultValue("true")
+ .defaultValue("false")
Contributor

Why this change?

Contributor Author

Disabling Document Type Validation in the default configuration provides a more secure starting point for new instances of the Processor. The implementation in StandardDocumentProvider provides standard security restrictions on Document Type Validation, so enabling the Validate DTD property is not the optimal configuration. Changing the default value to false retains the property for deployments where embedded DTD validation is desired.
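For readers less familiar with what DTD validation means at the parser level, the following generic JAXP sketch contrasts a validating and a non-validating `DocumentBuilderFactory`. It is not the Processor code: the class name, the strict `ErrorHandler`, and the sample document in the usage note are all assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical illustration of DTD validation in plain JAXP; not EvaluateXPath or EvaluateXQuery code
class ValidateDtdSketch {
    // Treat validation errors as failures instead of logging and continuing
    private static final ErrorHandler STRICT = new ErrorHandler() {
        @Override
        public void warning(final SAXParseException e) {
            // Warnings are ignored in this sketch
        }

        @Override
        public void error(final SAXParseException e) throws SAXException {
            throw e;
        }

        @Override
        public void fatalError(final SAXParseException e) throws SAXException {
            throw e;
        }
    };

    /** Return true when the document parses without errors under the selected mode. */
    static boolean parses(final String xml, final boolean validateDtd) {
        final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validateDtd);
        try {
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            final DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(STRICT);
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (final SAXException e) {
            return false;
        } catch (final ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser setup or input failed", e);
        }
    }
}
```

As a usage example, a document such as `<!DOCTYPE a [<!ELEMENT a EMPTY>]><a><b/></a>` violates its own internal DTD, so it is rejected only when `validateDtd` is true; with validation off, the same document parses as well-formed XML.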

* @param source Source to be validated
*/
@Override
public void validate(final Schema schema, final Source source) {
Contributor

Do we only support the specification of a single schema when validating a document?

Contributor Author

Yes, the Schema object provides the newValidator() method, so this interface encapsulates that operation and sets standard properties on the Validator.
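A minimal sketch of that encapsulation, using only `javax.xml.validation`: the method takes one `Schema`, creates a `Validator` from it, applies restrictions, and validates the `Source`. Which properties count as "standard" is not stated in this thread, so the secure-processing feature and the two external-access properties below are assumptions, as are the class and helper names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical illustration; not the NiFi SchemaValidator implementation
class SchemaValidatorSketch {
    /** Validate a Source against a single Schema with a restricted Validator. */
    static void validate(final Schema schema, final Source source) {
        final Validator validator = schema.newValidator();
        try {
            validator.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // Empty values deny access to external DTDs and schemas (assumed restrictions)
            validator.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
            validator.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
            validator.validate(source);
        } catch (final SAXException | IOException e) {
            throw new IllegalStateException("Validation failed", e);
        }
    }

    /** Build a Schema from an inline XSD string, for demonstration only. */
    static Schema newSchema(final String xsd) {
        final SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (final SAXException e) {
            throw new IllegalStateException("Schema creation failed", e);
        }
    }

    /** Convenience check returning a boolean instead of throwing. */
    static boolean isValid(final String xsd, final String xml) {
        final Schema schema = newSchema(xsd);
        try {
            validate(schema, new StreamSource(new StringReader(xml)));
            return true;
        } catch (final IllegalStateException e) {
            return false;
        }
    }
}
```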

@@ -162,10 +163,6 @@ public class EvaluateXPath extends AbstractProcessor {

private final AtomicReference<XPathFactory> factoryRef = new AtomicReference<>();

static {
Contributor

!

@@ -137,7 +138,7 @@ public class EvaluateXPath extends AbstractProcessor {
.description("Specifies whether or not the XML content should be validated against the DTD.")
.required(true)
.allowableValues("true", "false")
.defaultValue("true")
Contributor

Is there a little context for this change?

Contributor Author

As mentioned for EvaluateXQuery, disabling validation of the Document Type Declaration in the default configuration provides a more secure starting point for new instances of the Processor.

@exceptionfactory
Contributor Author

Thanks for the initial feedback @greyp9! Will plan on updating the one exception message noted, as well as anything else, following additional feedback from testing.

Contributor

@greyp9 greyp9 left a comment

Checked out with JRE 8 and JRE 11. Tested processing of a couple of simple XML documents using ValidateXML and EvaluateXPath. LGTM

@greyp9 greyp9 closed this in 15f7590 Apr 13, 2022
asfgit pushed a commit that referenced this pull request Apr 14, 2022
- Refactored XML parsing to use providers from nifi-xml-processing
- Configured spotbugs-maven-plugin with findsecbugs-plugin in nifi-xml-processing
- Disabled Validate DTD in default configuration for EvaluateXPath and EvaluateXQuery
- Replaced configuration of DocumentBuilder and streaming XML Readers with shared components
- Removed XML utilities from nifi-security-utils
- Moved Commons Configuration classes to nifi-lookup-services

This closes #5962
Signed-off-by: Paul Grey <greyp@apache.org>
genehynson pushed a commit to influxdata/nifi that referenced this pull request May 17, 2022
krisztina-zsihovszki pushed a commit to krisztina-zsihovszki/nifi that referenced this pull request Jun 28, 2022