NIFI-7572: Added ScriptedTransformRecord processor. Addressed a coupl…#4374
NIFI-7572: Added ScriptedTransformRecord processor. Addressed a coupl…#4374markap14 wants to merge 3 commits intoapache:mainfrom
Conversation
| "org.apache.nifi.processors.standard.QueryRecord", | ||
| "org.apache.nifi.processors.standard.JoltTransformRecord", | ||
| "org.apache.nifi.processors.standard.LookupRecord"}) | ||
| public class ScriptedTransformRecord extends AbstractProcessor implements Searchable { |
There was a problem hiding this comment.
As a scripted component it should have an @Restricted annotation for EXECUTE_CODE permissions
|
|
||
|
|
||
| private volatile String scriptToRun = null; | ||
| private volatile ScriptingComponentHelper scriptingComponentHelper = new ScriptingComponentHelper(); |
There was a problem hiding this comment.
Nitpick but this can be final :)
| <p> | ||
| Each time that the script is invoked, it is expected to return a | ||
| <a href="https://javadoc.io/static/org.apache.nifi/nifi-record/1.11.4/org/apache/nifi/serialization/record/Record.html">Record</a> object. | ||
| That Record is then written using the configured Record Writer. If the script returns a <code>null</code> value, the Record will not be written. If the script returns an object that is not |
There was a problem hiding this comment.
Should add a sentence here describing how a Collection of Records can be returned and all will be written out.
MikeThomsen
left a comment
There was a problem hiding this comment.
Looks really good. Definitely want to get this in before 1.12.0!
| if (!(element instanceof Record)) { | ||
| throw new RuntimeException("Evaluated script against Record number " + index + " of " + flowFile | ||
| + " but instead of returning a Record or Collection of Records, script returned a Collection of values, " | ||
| + "at least one of which was not a Record but instead was: " + returnValue); |
There was a problem hiding this comment.
ot a Record but instead was: " + returnValue
could be
ot a Record but instead was: " + element.getClass().getName()
to better show the specific bad value type.
There was a problem hiding this comment.
I think which one (between the object toString() vs. class name) is more informative depends on the particular object. The default implementation of toString() includes the class name, but if that is overridden, the toString() should be more informative.
| } | ||
|
|
||
| private ScriptEvaluator createEvaluator(final ScriptEngine scriptEngine, final FlowFile flowFile) throws ScriptException { | ||
| if (PYTHON_SCRIPT_LANGUAGE.equalsIgnoreCase(scriptEngine.getFactory().getLanguageName())) { |
There was a problem hiding this comment.
@mattyb149 we deprecated support for Jython, didn't we, in ExecuteScript?
There was a problem hiding this comment.
My memory on the matter is vague, but I remember there being some serious bug in Jython that made us have to exclude Jython.
There was a problem hiding this comment.
We don't really have a way of deprecating such a thing. We generally recommend people use Groovy instead of Jython because the performance difference is pretty huge. And Jython definitely has some key limitations, such as the inability to call "C Python" libraries. But I don't recall any bugs that are critical enough for us to say that it shouldn't be used. Unless there's something I missed. @mattyb149 does that ring a bell for you?
There was a problem hiding this comment.
The Jython issue for scripted components was mostly because there's a bug where we can't subclass interfaces that have default methods. For the scripted components whose only job pretty much is to implement such interfaces (RecordReaderFactory, etc.) Jython doesn't make much sense. But Python/Jython shows up a lot in our community, so for this particular processor I think it's a good idea to support Jython here. I imagine quite a few folks will find this very helpful, to quickly script up a transformation in a language they're comfortable with, without having to worry about implementing much of the NiFi API.
| } | ||
|
|
||
|
|
||
| private class PythonScriptEvaluator implements ScriptEvaluator { |
There was a problem hiding this comment.
@markap14 curious why we're not pre-compiling Groovy and other languages as well.
There was a problem hiding this comment.
The ScriptEngine API doesn't support pre-compiling the script - and shouldn't. That should be handled by the Script Engine itself. It is in the case of Groovy and I would suppose others. Unfortunately, the Jython engine doesn't bother with this, and it's up to the implementors to do it. Honestly, the fact that we cast the script engine into a particular implementation and pre-compile for Jython is a hack. But went with it anyway because a quick benchmark on my laptop showed a performance improvement of well over 10x for a simple script. And Jython is likely to be the most commonly used language, I would guess.
There was a problem hiding this comment.
We could check the instance to see if it is Compilable and compile it if so, but IIRC some engines don't mark themselves as Compilable but they always compile under the hood anyway.
There was a problem hiding this comment.
Ah, I did not realize that there was a Compilable interface. That definitely is cleaner. I can update to use that.
There was a problem hiding this comment.
Thanks @MikeThomsen . Well I gave this a try with CompiledScript. Unfortunately, it looks like JRuby has another problem in that it ignores the bindings that are passed to it. So I'll avoid going the route of CompiledScript for now, and just use it the way that I was previously. In testing this though, I found a bug where I was still constantly recompiling the Jython script when I didn't need to be. So will push an update for htat.
There was a problem hiding this comment.
I think we'll need to move toward some abstraction which does the evaluation either with a compiledscript or with invocable as is appropriate. Groovy would definitely benefit from the compiledscript option, the rest are iffy. For Kotlin it was an absolute requirement (for performance reasons) to compile the last time I played with it.
There was a problem hiding this comment.
I don't believe Groovy will benefit from compiling, because the Groovy Engine itself already handles pre-compiling it and caching a Map of Script text to CompiledScript object. I would probably shy away from creating another abstraction that performs Invocable vs. CompileScript, etc. and instead just push for the teams maintaining the Engines themselves to address the concerns.
There was a problem hiding this comment.
... and after re-reading some of the JavaDocs, perhaps it does make sense. Unfortunately, the javax.script package has a lot of these "magical interfaces." We can probably do some refactoring and abstraction in a way that can handle pre-compiling where necessary and using the most appropriate interface. Personally I think this is a weakness in the API, but it is the API that we have. So another layer of abstraction may at least allow us to address it once instead of in each processor, etc.
I do think that is beyond the scope of this Jira, though. Would make sense in a follow-on Jira.
There was a problem hiding this comment.
@markap14 Scala and Kotlin are both JSR223-capable, but would definitely need to be compiled. When I experimented locally with Kotlin and I found that the engine appeared to have very different performance numbers with Invocable vs Compilable.
The guidance from the JRuby team is that all it does is save you from reparsing the scripts, but even that would definitely have value. We already have a ticket to track some of that here.
I agree that the changes should be made here, but they're worth exploring in other Jiras.
…e of minor bugs in mock classes that were encountered during testing.
…cript. Updated documentation showing performance difference. Fixed problematic unit tests for TestResizeImage
|
@markap14 I'll make some time tonight or tomorrow to review. |
|
+1 LGTM tested a few scenarios, @MikeThomsen are you good with me merging? |
|
@mattyb149 I'll do it. |
…e of minor bugs in mock classes that were encountered during testing. NIFI-7572: Addressed review feedback NIFI-7572: Fixed bug that resulted in constantly recompiling Jython script. Updated documentation showing performance difference. Fixed problematic unit tests for TestResizeImage This closes apache#4374 Signed-off-by: Mike Thomsen <mthomsen@apache.org>
…e of minor bugs in mock classes that were encountered during testing. NIFI-7572: Addressed review feedback NIFI-7572: Fixed bug that resulted in constantly recompiling Jython script. Updated documentation showing performance difference. Fixed problematic unit tests for TestResizeImage This closes apache#4374 Signed-off-by: Mike Thomsen <mthomsen@apache.org>
…e of minor bugs in mock classes that were encountered during testing. NIFI-7572: Addressed review feedback NIFI-7572: Fixed bug that resulted in constantly recompiling Jython script. Updated documentation showing performance difference. Fixed problematic unit tests for TestResizeImage This closes apache#4374 Signed-off-by: Mike Thomsen <mthomsen@apache.org>
…e of minor bugs in mock classes that were encountered during testing. NIFI-7572: Addressed review feedback NIFI-7572: Fixed bug that resulted in constantly recompiling Jython script. Updated documentation showing performance difference. Fixed problematic unit tests for TestResizeImage This closes apache#4374 Signed-off-by: Mike Thomsen <mthomsen@apache.org>
…e of minor bugs in mock classes that were encountered during testing. NIFI-7572: Addressed review feedback NIFI-7572: Fixed bug that resulted in constantly recompiling Jython script. Updated documentation showing performance difference. Fixed problematic unit tests for TestResizeImage This closes apache#4374 Signed-off-by: Mike Thomsen <mthomsen@apache.org>
…e of minor bugs in mock classes that were encountered during testing.
Thank you for submitting a contribution to Apache NiFi.
Please provide a short description of the PR here:
Description of PR
Enables X functionality; fixes bug NIFI-YYYY.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically
master)?Is your initial contribution a single, squashed commit? Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not
squashor use--forcewhen pushing to allow for clean monitoring of changes.For code changes:
mvn -Pcontrib-check clean installat the rootnififolder?LICENSEfile, including the mainLICENSEfile undernifi-assembly?NOTICEfile, including the mainNOTICEfile found undernifi-assembly?.displayNamein addition to .name (programmatic access) for each of the new properties?For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check GitHub Actions CI for build issues and submit an update to your PR as soon as possible.