Skip to content

NIFI-7572: Added ScriptedTransformRecord processor. Addressed a coupl…#4374

Closed
markap14 wants to merge 3 commits intoapache:mainfrom
markap14:NIFI-7572
Closed

NIFI-7572: Added ScriptedTransformRecord processor. Addressed a coupl…#4374
markap14 wants to merge 3 commits intoapache:mainfrom
markap14:NIFI-7572

Conversation

@markap14
Copy link
Copy Markdown
Contributor

…e of minor bugs in mock classes that were encountered during testing.

Thank you for submitting a contribution to Apache NiFi.

Please provide a short description of the PR here:

Description of PR

Enables X functionality; fixes bug NIFI-YYYY.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit? Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not squash or use --force when pushing to allow for clean monitoring of changes.

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?
  • Have you verified that the full build is successful on JDK 8?
  • Have you verified that the full build is successful on JDK 11?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
  • If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check GitHub Actions CI for build issues and submit an update to your PR as soon as possible.

"org.apache.nifi.processors.standard.QueryRecord",
"org.apache.nifi.processors.standard.JoltTransformRecord",
"org.apache.nifi.processors.standard.LookupRecord"})
public class ScriptedTransformRecord extends AbstractProcessor implements Searchable {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a scripted component it should have an @Restricted annotation for EXECUTE_CODE permissions

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch!



private volatile String scriptToRun = null;
private volatile ScriptingComponentHelper scriptingComponentHelper = new ScriptingComponentHelper();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick but this can be final :)

<p>
Each time that the script is invoked, it is expected to return a
<a href="https://javadoc.io/static/org.apache.nifi/nifi-record/1.11.4/org/apache/nifi/serialization/record/Record.html">Record</a> object.
That Record is then written using the configured Record Writer. If the script returns a <code>null</code> value, the Record will not be written. If the script returns an object that is not
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add a sentence here describing how a Collection of Records can be returned and all will be written out.

Copy link
Copy Markdown
Contributor

@MikeThomsen MikeThomsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good. Definitely want to get this in before 1.12.0!

if (!(element instanceof Record)) {
throw new RuntimeException("Evaluated script against Record number " + index + " of " + flowFile
+ " but instead of returning a Record or Collection of Records, script returned a Collection of values, "
+ "at least one of which was not a Record but instead was: " + returnValue);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ot a Record but instead was: " + returnValue

could be

ot a Record but instead was: " + element.getClass().getName()

to better show the specific bad value type.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think which one (between the object toString() vs. class name) is more informative depends on the particular object. The default implementation of toString() includes the class name, but if that is overridden, the toString() should be more informative.

}

private ScriptEvaluator createEvaluator(final ScriptEngine scriptEngine, final FlowFile flowFile) throws ScriptException {
if (PYTHON_SCRIPT_LANGUAGE.equalsIgnoreCase(scriptEngine.getFactory().getLanguageName())) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattyb149 we deprecated support for Jython, didn't we, in ExecuteScript?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My memory on the matter is vague, but I remember there being some serious bug in Jython that made us have to exclude Jython.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't really have a way of deprecating such a thing. We generally recommend people use Groovy instead of Jython because the performance difference is pretty huge. And Jython definitely has some key limitations, such as the inability to call "C Python" libraries. But I don't recall any bugs that are critical enough for us to say that it shouldn't be used. Unless there's something I missed. @mattyb149 does that ring a bell for you?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Jython issue for scripted components was mostly because there's a bug where we can't subclass interfaces that have default methods. For the scripted components whose only job pretty much is to implement such interfaces (RecordReaderFactory, etc.) Jython doesn't make much sense. But Python/Jython shows up a lot in our community, so for this particular processor I think it's a good idea to support Jython here. I imagine quite a few folks will find this very helpful, to quickly script up a transformation in a language they're comfortable with, without having to worry about implementing much of the NiFi API.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying @mattyb149!

}


private class PythonScriptEvaluator implements ScriptEvaluator {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markap14 curious why we're not pre-compiling Groovy and other languages as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ScriptEngine API doesn't support pre-compiling the script - and shouldn't. That should be handled by the Script Engine itself. It is in the case of Groovy and I would suppose others. Unfortunately, the Jython engine doesn't bother with this, and it's up to the implementors to do it. Honestly, the fact that we cast the script engine into a particular implementation and pre-compile for Jython is a hack. But went with it anyway because a quick benchmark on my laptop showed a performance improvement of well over 10x for a simple script. And Jython is likely to be the most commonly used language, I would guess.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could check the instance to see if it is Compilable and compile it if so, but IIRC some engines don't mark themselves as Compilable but they always compile under the hood anyway.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I did not realize that there was a Compilable interface. That definitely is cleaner. I can update to use that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markap14 FYI, there is a bug in the version of JRuby that we're using that caused me some really weird issues when I played around with NIFI-5422. Still working on that one. Here's the issue I raised. They think they have a fix, but we'd need to test:

jruby/jruby#6230

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MikeThomsen . Well I gave this a try with CompiledScript. Unfortunately, it looks like JRuby has another problem in that it ignores the bindings that are passed to it. So I'll avoid going the route of CompiledScript for now, and just use it the way that I was previously. In testing this though, I found a bug where I was still constantly recompiling the Jython script when I didn't need to be. So will push an update for htat.

Copy link
Copy Markdown
Contributor

@MikeThomsen MikeThomsen Jul 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll need to move toward some abstraction which does the evaluation either with a compiledscript or with invocable as is appropriate. Groovy would definitely benefit from the compiledscript option, the rest are iffy. For Kotlin it was an absolute requirement (for performance reasons) to compile the last time I played with it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe Groovy will benefit from compiling, because the Groovy Engine itself already handles pre-compiling it and caching a Map of Script text to CompiledScript object. I would probably shy away from creating another abstraction that performs Invocable vs. CompileScript, etc. and instead just push for the teams maintaining the Engines themselves to address the concerns.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and after re-reading some of the JavaDocs, perhaps it does make sense. Unfortunately, the javax.script package has a lot of these "magical interfaces." We can probably do some refactoring and abstraction in a way that can handle pre-compiling where necessary and using the most appropriate interface. Personally I think this is a weakness in the API, but it is the API that we have. So another layer of abstraction may at least allow us to address it once instead of in each processor, etc.

I do think that is beyond the scope of this Jira, though. Would make sense in a follow-on Jira.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markap14 Scala and Kotlin are both JSR223-capable, but would definitely need to be compiled. When I experimented locally with Kotlin and I found that the engine appeared to have very different performance numbers with Invocable vs Compilable.

The guidance from the JRuby team is that all it does is save you from reparsing the scripts, but even that would definitely have value. We already have a ticket to track some of that here.

I agree that the changes should be made here, but they're worth exploring in other Jiras.

@markap14 markap14 changed the base branch from master to main July 7, 2020 15:55
…cript. Updated documentation showing performance difference. Fixed problematic unit tests for TestResizeImage
@MikeThomsen
Copy link
Copy Markdown
Contributor

@markap14 I'll make some time tonight or tomorrow to review.

@mattyb149
Copy link
Copy Markdown
Contributor

+1 LGTM tested a few scenarios, @MikeThomsen are you good with me merging?

@MikeThomsen
Copy link
Copy Markdown
Contributor

@mattyb149 I'll do it.

@asfgit asfgit closed this in 6cece9c Jul 8, 2020
simonbence pushed a commit to simonbence/nifi that referenced this pull request Jul 9, 2020
…e of minor bugs in mock classes that were encountered during testing.

NIFI-7572: Addressed review feedback
NIFI-7572: Fixed bug that resulted in constantly recompiling Jython script. Updated documentation showing performance difference. Fixed problematic unit tests for TestResizeImage

This closes apache#4374

Signed-off-by: Mike Thomsen <mthomsen@apache.org>
Wastack pushed a commit to Wastack/nifi that referenced this pull request Jul 20, 2020
…e of minor bugs in mock classes that were encountered during testing.

NIFI-7572: Addressed review feedback
NIFI-7572: Fixed bug that resulted in constantly recompiling Jython script. Updated documentation showing performance difference. Fixed problematic unit tests for TestResizeImage

This closes apache#4374

Signed-off-by: Mike Thomsen <mthomsen@apache.org>
ets pushed a commit to ets/nifi that referenced this pull request Oct 2, 2020
…e of minor bugs in mock classes that were encountered during testing.

NIFI-7572: Addressed review feedback
NIFI-7572: Fixed bug that resulted in constantly recompiling Jython script. Updated documentation showing performance difference. Fixed problematic unit tests for TestResizeImage

This closes apache#4374

Signed-off-by: Mike Thomsen <mthomsen@apache.org>
driesva pushed a commit to driesva/nifi that referenced this pull request Mar 19, 2021
…e of minor bugs in mock classes that were encountered during testing.

NIFI-7572: Addressed review feedback
NIFI-7572: Fixed bug that resulted in constantly recompiling Jython script. Updated documentation showing performance difference. Fixed problematic unit tests for TestResizeImage

This closes apache#4374

Signed-off-by: Mike Thomsen <mthomsen@apache.org>
krisztina-zsihovszki pushed a commit to krisztina-zsihovszki/nifi that referenced this pull request Jun 28, 2022
…e of minor bugs in mock classes that were encountered during testing.

NIFI-7572: Addressed review feedback
NIFI-7572: Fixed bug that resulted in constantly recompiling Jython script. Updated documentation showing performance difference. Fixed problematic unit tests for TestResizeImage

This closes apache#4374

Signed-off-by: Mike Thomsen <mthomsen@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants