Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-15121: Move XSLT (tr param) to scripting contrib #2306

Merged
merged 33 commits into from Feb 15, 2021

Conversation

epugh
Copy link
Contributor

@epugh epugh commented Feb 4, 2021

Description

Reduce security exposure for solr by moving the xslt to scripting.

Solution

Migrating code.

Tests

running existing tests.

Checklist

Please review the following and check all that apply:

  • [X ] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • [ X] I have created a Jira issue and added the issue ID to my pull request title.
  • [ X] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • [X ] I have developed this patch against the master branch.
  • [ X] I have run ./gradlew check.
  • [ X] I have added tests for my changes.
  • [ X] I have added documentation for the Ref Guide (for Solr changes only).

@epugh
Copy link
Contributor Author

epugh commented Feb 4, 2021

I still have a broken test, and need to update docs.

@epugh epugh requested a review from dsmiley February 5, 2021 22:03
Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There used to be an XsltUpdateRequestHandler. Maybe we need to bring it back.
https://issues.apache.org/jira/browse/SOLR-2630
(not sure which JIRA it was removed in)
It would mean that use of "tr" would require a dedicated handler but I think that's fine not many people use this and they can register at /update/xslt.

/**
* Parameters used across the XSLT related classes.
*/
public interface XSLTParams {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see why you added this here but I don't think there's a role for XSLTParams. Just move this "tr" constant to the plugin that defines it for its internal use only. CommonParams was in SolrJ deliberately; This XSLTParams you added is not (and should not be).

* See the related unit test OutputWriterTest.
*
*/
public class XSLTOutputWriterTest extends SolrTestCaseJ4 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatting in this file is a bit off (indentation varies etc.). Lucene recently adopted the Google Java Format. I use an IntelliJ plugin for this. Can you please reformat this file with that? Even the built-in formatter is probably fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this blog post: http://www.practicesofmastery.com/post/eclipse-google-java-style-guide/ (since I use Eclipse)... Do we have a JIRA about moving towards this? Should I look to figure out how to add the Google Java Format to the default Eclipse setup? Or at least a message when you run ./gradlew eclipse that recommends adding this formatter to your setup?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently there's an Eclipse plugin listed here: https://github.com/google/google-java-format
The Lucene side issue is here: https://issues.apache.org/jira/browse/LUCENE-9564
But you'll see in the comments were held off on Solr for a later date.

BTW if I'm mistaken and you moved an existing source file, then I recommend doing no formatting changes as it may fool git rename detection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your recent edits here look good BTW.

@@ -69,8 +77,8 @@ public void testUpdate() throws Exception
"</random>";

Map<String,String> args = new HashMap<>();
args.put(CommonParams.TR, "xsl-update-handler-test.xsl");

args.put(XSLTParams.TR, "xsl-update-handler-test.xsl");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In tests, you can just refer to the parameter directly: "tr". Yonik advised this practice once on the reasoning that the test catches API changes you might not have meant to make. Also, it's shorter :-)

// Try to load the scripting contrib modules XSLTLoader, and if not, fall back to the XMLLoader.
// Enable the XSLTLoader by including the Scripting contrib module.
try {
final Class<?> clazz = Class.forName( "org.apache.solr.scripting.util.xslt.XSLTLoader" );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the scripting contrib is added as a package, Solr core.jar will not be able to load it in this way, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the part where I didn't like what I did... And was putting something up as a starter.. I really appreciate your and @uschindler input on this. Definitily lets find another approach!

import static org.apache.solr.common.params.CommonParams.NAME;


public class XSLTLoader extends ContentStreamLoader {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you substantially copied XMLLoader? Sorry; let's find another approach.

@@ -14,7 +14,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.solr.scripting.response;
package org.apache.solr.scripting;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not the xslt package?

* See the related unit test OutputWriterTest.
*
*/
public class XSLTOutputWriterTest extends SolrTestCaseJ4 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your recent edits here look good BTW.

@dsmiley
Copy link
Contributor

dsmiley commented Feb 8, 2021

@epugh I see in your latest change that XMLLoader continues to know about XSLT stylesheets. Couldn't/shouldn't that go in the XsltRequestHandler that you added? I suppose you're getting somewhat conflicting feedback between me and @uschindler . My preference is that all XSLT stuff is in this new contrib. Uwe had suggested some trivial toggle/setting to the existing code. I think his latest feedback is basically the idea of adding an intercepting spot (in XMLLoader) so that it needn't have the XSLT code there but it could be added in by the contrib, say by subclassing XMLLoader. @uschindler does this make sense to you?

@epugh
Copy link
Contributor Author

epugh commented Feb 8, 2021

I think there is a bit of back and forth.. I added a boolean to XMLLoader constructor to decide if the tr parameter was acceptible. I was going to have it throw an exception from the UpdateRequestHandler if a tr was passed in via solr params, but allow it if it was from the XSLTUpdateRequestHandler...

I'm open to any approach. I wonder if the reason to keep the TransformerProvider.java in core is for people who write their own Solr plugins that use it, and not have to require the scripting module?

I'm going to push up a set of changes where all the paths work and tests pass momentarily.

@uschindler
Copy link
Contributor

uschindler commented Feb 8, 2021

Yes, I tend to think of an interception point.

By default in core the interception point just returns the original Source instance. In the subclass it applies the transformation and returns a new Source instance.

@uschindler
Copy link
Contributor

uschindler commented Feb 8, 2021

I have to again look at the code, to figure out the best interaction that is still performant and clean to implement. So we don't convert too often between different XML API representations.

I think the converter protected method should take a Source impl and return a new Source impl, in base class it just returns the parameter unmodified.

epugh@opensourceconnections.com added 2 commits February 8, 2021 13:14
… XSLT in class names.

I moved everything to XSLT* to match the XML* pattern.
@epugh
Copy link
Contributor Author

epugh commented Feb 8, 2021

@uschindler if you want to push up an example/make the change the way you are thinking, I'm 100% happy to have that! Your Java skillz are way beyond mine, and while I sort of understnad what you say, it may take you 20 minutes to get the change you want, and then I'll learn from you ;-)

@epugh
Copy link
Contributor Author

epugh commented Feb 8, 2021

I am testing out the idea of exporting in Solr XML, and then reimporting using a XSLT, which is something metnioned int he docs.. to see how well this works.

epugh@opensourceconnections.com added 4 commits February 8, 2021 15:40
… xslt working on either the export or the import end of things.
…the scripting jar included.

a bit icky that solrj uses the sample_techproducts_configs for unit testing...
@epugh
Copy link
Contributor Author

epugh commented Feb 9, 2021

Okay, I've done a bunch of tweaking (banging my head?) against the ref guide docs, and they are working, and I think all the tests are passing.

I don't like that the SolrJ tests depend on the sample_techproducts_configs directory, but I think adding some startup=lazy settings for the XSLT classes means that the links in the ref guide will work, and the solrj tests don't blow up on the missing xslt classes.

@@ -150,7 +150,7 @@ rather than the "mycores" directory.
Can I run ZooKeeper and Solr clusters under Docker?
---------------------------------------------------

At the network level the ZooKeeper nodes need to be able to talk to eachother,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to just commit this typo stuff by itself directly (no PR needed)

* This tests the XSLTUpdateRequestHandler ability to work with XSLT stylesheet and xml content.
* </p>
*/
public class XSLTUpdateRequestHandlerTest extends SolrTestCaseJ4 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I the most common naming convention is for acronyms to use camel case instead of all-caps. This has been a long trend over many years. I know why you changed it here -- to match the class it's testing, and so it's a net improvement over what existed. But arguably it's the class it's testing that should be renamed; not this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am down for making this change... XSLT -> Xslt for the files touched in this PR, and then maybe a seperate JIRA for XML --> Xml in the other classes?

@epugh
Copy link
Contributor Author

epugh commented Feb 13, 2021

Thanks for getting your hands dirty @dsmiley and making these changes, they LGTM.

@uschindler
Copy link
Contributor

@uschindler if you want to push up an example/make the change the way you are thinking, I'm 100% happy to have that! Your Java skillz are way beyond mine, and while I sort of understnad what you say, it may take you 20 minutes to get the change you want, and then I'll learn from you ;-)

Sorry, my father died on Thursday. So excuse my ignorance! To me @dsmiley changes look fine. I will review later as some "long term specialist" 😉 on XML processing.

@dsmiley
Copy link
Contributor

dsmiley commented Feb 13, 2021

Sorry, my father died on Thursday.

Wow; so sorry to hear! I fear the same for my own Dad; I can't take his existence for granted so I'll be visiting him tomorrow.

I will review later as some "long term specialist" 😉 on XML processing.

You're not the only one; I'm a long-time XML fan :-) It seems the new kids know little about such things :-/ "What is this weird syntax... " as he/she goes off to then write a dozen lines that an XPath does in one.

I kept the all-caps XSLT despite my preferences because All of the XML related APIs we use are using all-caps for XML and thus it looked off to see it differently. But then... the user doesn't know that and sees XML in the config file so... Hmm.

Maybe the XSLT response writer should demand "tr" and thus be more consistent with the XSLT response writer? After all, you're choosing to send the response there so why would you send it there if it wasn't? I guess it gives you choice and the ability to be more similar to what we have today. Shrug.

We need some docs to show how to reference the XSLT update handler, or at the very least an upgrade note about this.

I hate the crappy caching; it's inferior to a SolrCache but alas... I hold myself back from feature-creep. This code was written forever-ago so I can appreciate it has excuses.

@epugh
Copy link
Contributor Author

epugh commented Feb 13, 2021

Sorry to hear that news @uschindler ....

I did a pass through the Ref Guide to document how to use the XSLT stuff, and more explicitly called out the sample .xsl files that we have. I can add more if needed.

So, if we aren't changing the name of the Java class in this PR, then I think we just need to update the Upgrade to Solr 9 page, and we're good to go?

@dsmiley
Copy link
Contributor

dsmiley commented Feb 13, 2021

So, if we aren't changing the name of the Java class in this PR, then I think we just need to update the Upgrade to Solr 9 page, and we're good to go?

+1

@epugh epugh merged commit e6d9eaa into apache:master Feb 15, 2021
@mikemccand
Copy link
Member

Sorry, my father died on Thursday.

So sorry to hear this @uschindler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants