NIFI-4946 nifi-spark-bundle: Adding support for pyfiles, file, jars options (#2521)
Conversation
Team, this PR was created in line with our current requirements. We would like these changes to go into the mainline, for which we need the community's help with reviewing and adding test code, since we are new to NiFi extensions. The next plan is to add test cases: could anyone point out existing test cases that do similar testing?
zenfenan left a comment
Thank you @Mageswaran1989 for the contribution. I haven't done actual testing with these changes yet; I'll do that and update the review as well. Thanks.
```java
properties.add(MAIN_PY_FILE);
properties.add(NAME);
properties.add(CODE);
// properties.add(ARGS);
```
Only the pyfiles and file options are tested; the rest are yet to be tested. The plan is to implement test modules and use them to test the other features, since the current manual testing requires a long routine of compiling, copying, and restarting NiFi.
```java
log.debug(" ====> jsonResponse: " + jsonResponse);
```
Cosmetic change: it would be great if this log.debug message could be changed to something of a proper standard, like "JSON Response: ", i.e. remove the ====>.
Sure, this will be removed in the next commit.
```java
        break;
    default:
        log.debug(" ====> default State: " + state);
        session.transfer(flowFile, REL_WAIT);
```
Same as above for these log.debug messages as well.
```java
public static final PropertyDescriptor JAR_FILES = new PropertyDescriptor.Builder()
        .name("exec-spark-iactive-jarfiles")
        .displayName("jars")
```
displayName is what will be rendered on the UI, so let's change it to "JARs" or "Application JARs".
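To make the suggestion concrete, a descriptor with a UI-friendly display name might look like the sketch below. The description wording and validator choice are assumptions for illustration, not part of the PR (this fragment requires the NiFi API on the classpath):

```java
// Sketch only: same property with a readable displayName, per the review
// comment. Description text and validator are illustrative assumptions.
public static final PropertyDescriptor JAR_FILES = new PropertyDescriptor.Builder()
        .name("exec-spark-iactive-jarfiles")
        .displayName("Application JARs")
        .description("Comma-separated list of JAR files to submit to Livy with the job") // assumed wording
        .required(false)
        .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
        .expressionLanguageSupported(false)
        .build();
```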
```java
        .expressionLanguageSupported(false)
        .build();
```
```java
public static final PropertyDescriptor NAME = new PropertyDescriptor.Builder()
```
Is this supposed to be the Spark app name? It looks like it is never used anywhere other than being added to the PropertyDescriptor list.
As said before, this is not yet considered. We just wanted to get a hang of the code with our basic requirements.
```java
public static final PropertyDescriptor MAIN_PY_FILE = new PropertyDescriptor.Builder()
        .name("exec-spark-iactive-main-py-file")
        .displayName("file")
```
Same as the JARs case. Most of the PropertyDescriptors here use all-lowercase characters for displayName. Please change it.
```java
String jsonResponse = null;
```
```java
if (StringUtils.isEmpty(jsonResponse)) {
```
This will be true all the time, right?
Once the current approach is accepted, this can be taken care of.
```java
//Incoming flow file is not an JSON file hence consider it to be an triggering point
```
```java
String batchPayload = "{ \"pyFiles\": [\"" + context.getProperty(PY_FILES).getValue() + "\"], " +
        "\"file\" : \"" + context.getProperty(MAIN_PY_FILE).getValue() + "\" }";
```
This is confusing to me. Why are we saying that if the incoming flowfile is not valid JSON, we go ahead with the assumption that it is going to be PySpark? The assumption here lacks clarity. Please correct me if I'm wrong.
Could you please check the description at https://issues.apache.org/jira/browse/NIFI-4946.
The assumption was made so that it doesn't break the existing code flow, while at the same time we wanted to know the status of the submitted job.
So the naive idea was to re-route the Livy JSON response back to the Spark processor itself, so that it can get the last submitted URL from the custom (tampered) JSON response, wait for the user-specified wait time, and query Livy again for the job status in a loop until it succeeds or fails.
So when the processor is configured to submit a Spark job, it expects the incoming flowfile to be a custom JSON response with a URL field to query Livy; if not, it is considered a triggering point, nothing else.
I am open to any ideas from your end.
Thanks.
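On a side note about the batchPayload construction quoted earlier: plain string concatenation produces invalid JSON as soon as a path contains a quote or backslash. Below is a minimal, hypothetical sketch of a safer builder (class and method names are invented for illustration; in the processor itself a real JSON library such as org.json, whose JSONException the code already catches, would be the better choice):

```java
// Hypothetical helper: builds the Livy batch payload with basic JSON escaping.
// Field names "pyFiles" and "file" follow the Livy batches API used in the PR.
class LivyBatchPayload {

    // Escape backslashes and double quotes so arbitrary paths remain valid JSON.
    static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static String buildBatchPayload(String pyFiles, String mainPyFile) {
        return "{ \"pyFiles\": [\"" + escapeJson(pyFiles) + "\"], "
                + "\"file\": \"" + escapeJson(mainPyFile) + "\" }";
    }

    public static void main(String[] args) {
        // Prints: { "pyFiles": ["/tmp/deps.zip"], "file": "/tmp/main.py" }
        System.out.println(buildBatchPayload("/tmp/deps.zip", "/tmp/main.py"));
    }
}
```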
@zenfenan could you please review the above logic and suggest a way to handle plain Scala/Python code and packaged source files for pyfiles/jars?
The code can be submitted using the CODE property, right? That used to work. Or are you asking for a way to upload/send files or jars through Livy?
As per the code flow at https://issues.apache.org/jira/browse/NIFI-4946, currently I am able to send *.zip files (Python modules) through Livy. My question was: what should we do with the flowfile when we are using the processor to submit a batch job?
Sometime this week I am planning to add support for JAR files, args, and the application name over the Livy options.
The catch here is that, unlike plain Spark code, batch process code takes time to finish, which is expected, as we know. So, as a hack, I was re-routing the JSON response after batch submission back to the processor itself, where I poll the incoming flowfile and check whether it is a JSON file; if so, I try to get the Livy URL to post again and learn the status of the batch job for as long as it runs. Once the job is known to have finished, the success route is triggered.
That was the reason I made the assumption that if the incoming file is JSON, it is from a previous batch job submission.
In short, the flow file:
- Is considered a triggering point, (or)
- Is considered plain Spark code that compiles over Livy, (or)
- Is a Livy JSON response, which can further be used to check the status of a long-running Spark batch job

I was looking for the right NiFi way of handling this. I feel I am being too conservative, trying to fit all the functionality into one processor:
- Flow file/property can be used to run Spark code
- Pyfiles can be used to run a Spark batch job
- Jars can be used to run a Spark batch job
- Args options for batch mode
- By re-routing success to itself, it can monitor the long-running job over the Livy REST APIs
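The three-way decision described above could be sketched as a small check on the incoming flowfile content. This is purely illustrative (hypothetical class name, regex-based detection); the actual processor would parse the content with a JSON library instead:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the routing decision: if the content carries a Livy
// "url" field, it is treated as a previous batch-submission response to poll;
// otherwise it is treated as a plain trigger (or code to run).
class LivyResponseCheck {

    private static final Pattern URL_FIELD =
            Pattern.compile("\"url\"\\s*:\\s*\"([^\"]+)\"");

    // Returns the Livy polling URL, or null when the content is not a response.
    static String extractLivyUrl(String flowFileContent) {
        if (flowFileContent == null) {
            return null;
        }
        Matcher m = URL_FIELD.matcher(flowFileContent);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Prints: http://livy:8998/batches/1
        System.out.println(extractLivyUrl("{\"id\":1,\"url\":\"http://livy:8998/batches/1\"}"));
        // Prints: null
        System.out.println(extractLivyUrl("plain trigger content"));
    }
}
```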
Since the current processor is called ExecuteSparkInteractive, you could move the batch functionality out into something called SubmitSparkJob or similar. The outgoing flow file could contain the application ID and/or any other information that would allow you to monitor the job downstream (perhaps with InvokeHTTP through Livy, for example). Are you still interested in working on this? It seems like very nice functionality to have in NiFi!
```java
} catch (JSONException | InterruptedException e) {
```
```java
//Incoming flow file is not an JSON file hence consider it to be an triggering point
```
Cosmetic change: multiple empty lines were left. IMHO, one empty line should be enough for better readability.
We're marking this PR as stale due to lack of updates in the past few months. If after another couple of weeks the stale label has not been removed, this PR will be closed. This stale marker and eventual auto-close do not indicate a judgment of the PR, just a lack of reviewer bandwidth, and help us keep the PR queue manageable. If you would like this PR re-opened, you can do so and a committer can remove the stale tag. Or you can open a new PR. Try to help review other PRs to increase PR review bandwidth, which in turn helps yours.
Thank you for submitting a contribution to Apache NiFi.
In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

For all changes:
- Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
- Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
- Has your PR been rebased against the latest commit within the target branch (typically master)?
- Is your initial contribution a single, squashed commit?

For code changes:

For documentation related changes:

Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.