Skip to content

NIFI-2142 Cache compiled XSLT in TransformXml#609

Closed
jfrazee wants to merge 4 commits intoapache:masterfrom
jfrazee:NIFI-2142
Closed

NIFI-2142 Cache compiled XSLT in TransformXml#609
jfrazee wants to merge 4 commits intoapache:masterfrom
jfrazee:NIFI-2142

Conversation

@jfrazee
Copy link
Member

@jfrazee jfrazee commented Jul 6, 2016

Two other tidbits:

  1. This also modifies the XSLT_FILENAME attribute to support expression language.
  2. This updated the test files to have 3 space indents because Saxon defaults to that and AFAICT the free version doesn't let you change that -- it's either no indent or 3 space. Could be wrong though.

See #591 for PR against 0.x and past discussion.

@JPercivall
Copy link
Contributor

Hey @jfrazee, looks like you have contrib check issues that cause the build to fail:

[WARNING] src/main/java/org/apache/nifi/processors/standard/TransformXml.java24:8 UnusedImports: Unused import - java.util.Collection.
[WARNING] src/test/java/org/apache/nifi/processors/standard/TestTransformXml.java19 AvoidStarImport: Using the '.' form of import should be avoided - java.io..
[WARNING] src/test/java/org/apache/nifi/processors/standard/TestTransformXml.java36:8 UnusedImports: Unused import - org.junit.Ignore.

@jfrazee
Copy link
Member Author

jfrazee commented Jul 13, 2016

@JPercivall Oh, right. I think these are corrected now but just noticed the image bundle has a bunch of license violations (even after rebasing).

@JPercivall
Copy link
Contributor

@jfrazee the image bundle is no longer in master (replaced by the media bundle). Do a "git clean -i -d" to remove any old bundles.

.description("Maximum size of the stylesheet cache.")
.required(true)
.defaultValue("100")
.addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this property a max character length or data size limit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this is a integer validator but the property is interpreted as a long on line 171

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually my new guess is that this is the maximum number of stylesheets stored (the description could be added to, lol)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JPercivall Yes, that's correct. So "Maximum number of stylesheets to store in the cache"?

@JPercivall
Copy link
Contributor

@jfrazee I've made a couple comments. Looks good overall, mainly just documentation comments. Do you by chance have a template and/or some example xml/XSLT that I could use for testing to verify the changes?

@jfrazee
Copy link
Member Author

jfrazee commented Jul 28, 2016

  1. @JPercivall I addressed all your comments above and added a template to https://gist.github.com/jfrazee/a501336826aa77d0daa74e8e0aa3afd0.
  2. There is one change in behavior now -- a cache size of 0 disables the cache whereas it was just allowing for an arbitrarily large cache.
  3. There's a screenshot of the example data flow added to https://issues.apache.org/jira/browse/NIFI-2142 which illustrates the difference between running with and without caching. It's not 100% apples to apples with the older TransformXml, but behaves the same way. The queues back up and there's a stark contrast in Tasks/Time.
  4. I rebased on the latest master, but contrib-check is failing because of the commenting style on AbstractSNMPProcessor.

CacheBuilder cacheBuilder = CacheBuilder.newBuilder().maximumSize(cacheSize);

if (cacheSize <= 0) {
logger.warn("Stylesheet cache disabled because cache size is set to 0");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache will still be created and on line 213 it will try to use it to look up an xslt if the cache size is 0. I'd suggest not even creating it in that case.

@jfrazee
Copy link
Member Author

jfrazee commented Jul 28, 2016

@JPercivall Addressed your latest comments and added a test with the cache disabled since the logic varies depending on that value now.

@JPercivall
Copy link
Contributor

Sorry for taking so long @jfrazee, I got caught up with something I thought was an issue but turned out it was just me messing up.

+1

Visually verified code, and did a contrib check build. In standalone and a secure 3 node cluster tested the caching using the provided template. Even with just 5 stylesheets being used I saw speed improvements of ~10x! Thanks @jfrazee, I will merge it in.

@asfgit asfgit closed this in bb24312 Aug 10, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants