[ZEPPELIN-1665] Z.run with external note executable and access resource for zeppelin in each interpreter #1637

Closed
wants to merge 32 commits into from

@cloverhearts cloverhearts commented Nov 15, 2016

What is this PR for?

Currently, the z.run command is restricted: it can only execute paragraphs within the same note.
This PR extends it so that paragraphs of other notes can be executed as well.
It also lays the groundwork for each Interpreter implementation to access Zeppelin's resources freely.

What type of PR is it?

Improvement, Feature

Todos

  • Extend z.run
  • Run all paragraphs in an external note
  • Run a single paragraph in an external note
  • Access Zeppelin resources from each interpreter
  • Improve test cases
  • Write usage documentation

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-1665

How should this be tested?

Currently under development.

Run a paragraph in the same note

%spark
z.run("paragraphID")

Run a paragraph in an external note

z.run("noteid", "paragraphid");

Run all paragraphs in a note

z.runNote("noteid");
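
To make the addressing in the calls above concrete, here is a minimal in-memory sketch in Java. It is not Zeppelin code: `ToyNotebook`, `addParagraph`, and the `log` field are hypothetical names used only for illustration, and the sketch assumes that running a note means running its paragraphs in insertion order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; not Zeppelin's actual classes.
class ToyNotebook {
    // noteId -> (paragraphId -> action), both kept in insertion order
    private final Map<String, Map<String, Runnable>> notes = new LinkedHashMap<>();
    // Records which paragraphs ran, as "noteId/paragraphId"
    final List<String> log = new ArrayList<>();

    void addParagraph(String noteId, String paragraphId) {
        notes.computeIfAbsent(noteId, k -> new LinkedHashMap<>())
             .put(paragraphId, () -> log.add(noteId + "/" + paragraphId));
    }

    // Counterpart of z.run("noteid", "paragraphid"): one paragraph of any note
    void run(String noteId, String paragraphId) {
        Map<String, Runnable> note = notes.get(noteId);
        if (note == null || !note.containsKey(paragraphId)) {
            throw new IllegalArgumentException("Unknown paragraph: " + noteId + "/" + paragraphId);
        }
        note.get(paragraphId).run();
    }

    // Counterpart of z.runNote("noteid"): every paragraph of the note, in order
    void runNote(String noteId) {
        Map<String, Runnable> note = notes.get(noteId);
        if (note == null) {
            throw new IllegalArgumentException("Unknown note: " + noteId);
        }
        for (Runnable paragraph : note.values()) {
            paragraph.run();
        }
    }
}
```

In this model, the single-argument form z.run("paragraphID") would simply be `run(currentNoteId, paragraphId)` with the note ID taken from the running paragraph's context.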

Screenshots (if appropriate)

  • paragraph run (screenshot: zrun)

  • note run (screenshot: runnote)

Questions:

  • Do the license files need an update? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? Yes

@cloverhearts cloverhearts changed the title from "[ZEPPELIN-1665] Z.run with external note executable and access resource for zeppelin in each interpreter" to "[ZEPPELIN-1665 : WIP] Z.run with external note executable and access resource for zeppelin in each interpreter" on Nov 15, 2016
Member

@Leemoonsoo Leemoonsoo left a comment

Thanks @cloverhearts for the contribution. I left some comments. Please take a look.

* get Remote Zeppelin Server Controller interface
*/
@ZeppelinApi
public abstract RemoteWorksController getRemoteZeppelinServerController();
Member

I think it's better to pass RemoteWorksController through InterpreterContext here, and to avoid defining new methods in the Interpreter class.
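
A rough sketch of that suggestion, under stated assumptions: the types below are simplified stand-ins that only borrow the names used in this review, and `runParagraph`, `getRemoteWorksController`, and `RunInterpreter` are hypothetical. The point is that the base class gains no new abstract method, so existing interpreters need no changes.

```java
// Simplified stand-ins for the types named in the review; not Zeppelin's real classes.
interface RemoteWorksController {
    void runParagraph(String noteId, String paragraphId); // hypothetical method
}

final class InterpreterContext {
    private final String noteId;
    private final RemoteWorksController remoteWorksController;

    InterpreterContext(String noteId, RemoteWorksController remoteWorksController) {
        this.noteId = noteId;
        this.remoteWorksController = remoteWorksController;
    }

    String getNoteId() { return noteId; }

    // The controller travels with the per-call context
    RemoteWorksController getRemoteWorksController() { return remoteWorksController; }
}

abstract class Interpreter {
    // No controller-specific abstract method here, so subclasses are untouched
    abstract String interpret(String code, InterpreterContext context);
}

class RunInterpreter extends Interpreter {
    @Override
    String interpret(String paragraphId, InterpreterContext context) {
        // Only code that needs the controller reaches for it through the context
        context.getRemoteWorksController().runParagraph(context.getNoteId(), paragraphId);
        return "requested " + paragraphId;
    }
}
```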

@@ -43,6 +44,7 @@
private int port = -1;
private final String interpreterDir;
private final String localRepoDir;
private RemoteWorksController remoteWorksController;
Member

RemoteInterpreterManagedProcess should not hold a reference to RemoteWorksController, because it never uses the object.
This can be avoided by adding a new method to the RemoteInterpreterProcessListener interface.
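
One way to read this suggestion, sketched with hypothetical names (`ProcessListenerSketch`, `onRemoteRunParagraph`, `ManagedProcessSketch`, and `RecordingListener` are not taken from the Zeppelin code base): the process only forwards the request to its listener, and whoever implements the listener decides how to run the paragraph.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the listener interface named in the review.
interface ProcessListenerSketch {
    // Stand-in for a callback that already exists on the listener
    void onOutputAppend(String noteId, String paragraphId, String output);

    // New callback (name is an assumption): an interpreter asks to run a paragraph
    void onRemoteRunParagraph(String noteId, String paragraphId);
}

// The process keeps only its listener; it holds no controller reference.
class ManagedProcessSketch {
    private final ProcessListenerSketch listener;

    ManagedProcessSketch(ProcessListenerSketch listener) {
        this.listener = listener;
    }

    void handleRunRequest(String noteId, String paragraphId) {
        // Forward the request; the server-side listener acts on it
        listener.onRemoteRunParagraph(noteId, paragraphId);
    }
}

// A server-side implementation; here it just records the requests it receives.
class RecordingListener implements ProcessListenerSketch {
    final List<String> runRequests = new ArrayList<>();

    @Override
    public void onOutputAppend(String noteId, String paragraphId, String output) {
        // Not relevant to this sketch
    }

    @Override
    public void onRemoteRunParagraph(String noteId, String paragraphId) {
        runRequests.add(noteId + "/" + paragraphId);
    }
}
```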

@@ -118,6 +120,9 @@ public ZeppelinServer() throws Exception {

notebook.addNotebookEventListener(heliumApplicationFactory);
notebook.addNotebookEventListener(notebookWsServer.getNotebookInformationListener());

remoteWorksManager = new RemoteWorksManager(notebook);
replFactory.setRemoteController(remoteWorksManager.getInstance());
Member

remoteWorksManager.getInstance() is invoked only here and nowhere else.
That means we don't need the singleton pattern here.

Member Author

I deleted this class file.

return RemoteWorksManager.instance;
}

private class NotebookJobManager implements RemoteWorksController {
Member

What is the purpose of having the private NotebookJobManager class and the RemoteWorksController interface?
I think all the methods in NotebookJobManager can be moved to RemoteWorksManager, and the RemoteWorksController interface can be removed.

I think that's simpler and easier to understand.

Member Author

This was impatience on my part.
I expected RemoteWorksController to be needed in various cases in the future,
but that is not what we need now.
You are right.
I will change this.

@@ -50,8 +50,10 @@ enum RemoteInterpreterEventType {
OUTPUT_UPDATE = 9,
ANGULAR_REGISTRY_PUSH = 10,
APP_STATUS_UPDATE = 11,
REMOTE_ZEPPELIN_SERVER_CONTROL = 12
Member

We can use a more specific name for this, such as PARAGRAPH_RUN_CONTEXT.

1: RemoteZeppelinServerControlEvent type,
2: string eventOwnerKey
3: string msg
}
Member

Basically, we don't need enum RemoteZeppelinServerControlEvent, enum RemoteZeppelinServerResourceType, struct ZeppelinServerResourceParagraphRunner, struct ZeppelinServerResource, or struct RemoteZeppelinServerController.

Please try to use the generic message data structure RemoteInterpreterEvent for sending events from the interpreter process to the Zeppelin server process.
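
As an illustration of that idea, the sketch below assumes the generic event carries an event type plus a single string payload, and that request-specific fields are JSON-serialized into that string. `EventTypeSketch`, `GenericEvent`, and `RunRequestEvents` are hypothetical names, and the hand-written encoder exists only to keep the example dependency-free; real code would more likely use a JSON library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical event types; the real enum lives in the Thrift IDL.
enum EventTypeSketch { OUTPUT_UPDATE, RUN_PARAGRAPH_REQUEST }

// One generic envelope for every event: a type plus a JSON string payload.
final class GenericEvent {
    final EventTypeSketch type;
    final String data;

    GenericEvent(EventTypeSketch type, String data) {
        this.type = type;
        this.data = data;
    }
}

final class RunRequestEvents {
    // Minimal encoder for flat string maps; escapes only quotes and backslashes.
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        return sb.append('}').toString();
    }

    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // No dedicated struct per request: the fields go into the string payload
    static GenericEvent runParagraph(String noteId, String paragraphId) {
        Map<String, String> payload = new LinkedHashMap<>();
        payload.put("noteId", noteId);
        payload.put("paragraphId", paragraphId);
        return new GenericEvent(EventTypeSketch.RUN_PARAGRAPH_REQUEST, toJson(payload));
    }
}
```

With this shape, adding a new kind of request means adding an enum value and agreeing on the payload fields, without introducing new structs in the IDL.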

Member Author

Thank you,
the modification is done. :)

@@ -109,4 +140,6 @@ service RemoteInterpreterService {
RemoteApplicationResult loadApplication(1: string applicationInstanceId, 2: string packageInfo, 3: string noteId, 4: string paragraphId);
RemoteApplicationResult unloadApplication(1: string applicationInstanceId);
RemoteApplicationResult runApplication(1: string applicationInstanceId);

void remoteZeppelinServerControlFeedback(1: RemoteZeppelinServerController response);
Member

Please try to use primitive types instead of RemoteZeppelinServerController. You can JSON-serialize complex data into a string, like the other RPC methods above.
Also, please use a more specific name, such as 'sendRunContexts()'.

# Conflicts:
#	zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterEventPoller.java
#	zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/thrift/RemoteInterpreterEventType.java
# Conflicts:
#	scio/src/main/scala/org/apache/zeppelin/scio/ScioInterpreter.scala
# Conflicts:
#	zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreter.java
#	zeppelin-interpreter/src/main/thrift/RemoteInterpreterService.thrift
#	zeppelin-interpreter/src/test/java/org/apache/zeppelin/interpreter/remote/RemoteAngularObjectTest.java
#	zeppelin-interpreter/src/test/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterOutputTestStream.java
#	zeppelin-interpreter/src/test/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterTest.java
#	zeppelin-interpreter/src/test/java/org/apache/zeppelin/resource/DistributedResourcePoolTest.java
#	zeppelin-interpreter/src/test/java/org/apache/zeppelin/scheduler/RemoteSchedulerTest.java
#	zeppelin-zengine/src/main/java/org/apache/zeppelin/interpreter/InterpreterFactory.java
@cloverhearts
Member Author

@moon Thank you for your kind and helpful feedback.
Sorry for the late reply.
I went through your comments and checked them against my code.
I think you are right on all points.
I will fix most of it.
Thank you.

@cloverhearts
Member Author

Changed the structure and removed remoteWorksManager.
The event is now implemented using RemoteProcessEventListener.

@cloverhearts
Member Author

(screenshot: runnote)

Added the z.runNote("noteID") feature.

@cloverhearts
Member Author

CI error.

It is unrelated to my changes.
The same failures also occur in other PRs.

Results :

Tests in error: 
  ParagraphActionsIT.testEditOnDoubleClick:443 » ElementNotVisible Element is no...
  ZeppelinIT.testAngularDisplay:142->AbstractZeppelinIT.waitForParagraph:70->AbstractZeppelinIT.pollingWait:96 » Timeout

@cloverhearts cloverhearts changed the title from "[ZEPPELIN-1665 : WIP] Z.run with external note executable and access resource for zeppelin in each interpreter" to "[ZEPPELIN-1665] Z.run with external note executable and access resource for zeppelin in each interpreter" on Nov 22, 2016
@cloverhearts
Member Author

All work is done.

Member

@Leemoonsoo Leemoonsoo left a comment

Thanks @cloverhearts for addressing comments.

I left a few more comments about some unnecessary changes to the import blocks in many interpreter modules, and about keeping the Thrift IDL consistent.

import org.apache.zeppelin.interpreter.InterpreterContext;
import org.apache.zeppelin.interpreter.InterpreterPropertyBuilder;
import org.apache.zeppelin.interpreter.InterpreterResult;
import org.apache.zeppelin.interpreter.*;
Member

Could you keep the import block unchanged here (and in the other interpreters)?

@@ -25,6 +25,7 @@
import org.apache.zeppelin.interpreter.InterpreterContext;
import org.apache.zeppelin.interpreter.InterpreterResult;
import org.apache.zeppelin.interpreter.InterpreterResult.Code;
import org.apache.zeppelin.interpreter.RemoteWorksController;
Member

Could you keep PigInterpreter.java unchanged?

@@ -114,6 +110,8 @@
private SparkDependencyResolver dep;
private String sparkUrl;

private RemoteWorksController remoteWorksController;
Member

Could you keep SparkInterpreter.java unchanged?

struct RemoteZeppelinServerController {
2: string eventOwnerKey
3: string msg
}
Member

Every message that RemoteInterpreter sends to ZeppelinServer using RemoteInterpreterEventClient is of type RemoteInterpreterEvent. Shall we avoid defining new types (ZeppelinServerResourceParagraphRunner and RemoteZeppelinServerController) for the payload?
All other events send their payload using RemoteInterpreterEvent instead of defining their own struct in the Thrift IDL, so let's do it in a consistent way.

@@ -110,4 +122,6 @@ service RemoteInterpreterService {
RemoteApplicationResult loadApplication(1: string applicationInstanceId, 2: string packageInfo, 3: string noteId, 4: string paragraphId);
RemoteApplicationResult unloadApplication(1: string applicationInstanceId);
RemoteApplicationResult runApplication(1: string applicationInstanceId);

void onReceivedResourceParagraphRunners(1: RemoteInterpreterEvent response);
Member

This is a message from ZeppelinServer to the Interpreter process. Like the other method definitions, shall we use primitive types instead of RemoteInterpreterEvent? If you have a complex payload, you can JSON-serialize it and send it as a string (see angularObjectAdd at line 118).

@cloverhearts
Member Author

@Leemoonsoo
Thank you for your advice.
I have finished editing.
May I ask you for a review once again?

@Leemoonsoo
Member

Could you keep the following files unchanged and resolve the conflicts with master?

bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java
cassandra/src/main/java/org/apache/zeppelin/cassandra/CassandraInterpreter.java
flink/src/main/java/org/apache/zeppelin/flink/FlinkInterpreter.java
ignite/src/main/java/org/apache/zeppelin/ignite/IgniteSqlInterpreter.java
jdbc/src/main/java/org/apache/zeppelin/jdbc/JDBCInterpreter.java
kylin/src/main/java/org/apache/zeppelin/kylin/KylinInterpreter.java
lens/src/main/java/org/apache/zeppelin/lens/LensInterpreter.java
markdown/src/main/java/org/apache/zeppelin/markdown/Markdown.java
pig/src/main/java/org/apache/zeppelin/pig/PigInterpreter.java
postgresql/src/main/java/org/apache/zeppelin/postgresql/PostgreSqlInterpreter.java
python/src/main/java/org/apache/zeppelin/python/PythonInterpreterPandasSql.java
scalding/src/main/java/org/apache/zeppelin/scalding/ScaldingInterpreter.java
spark/src/main/java/org/apache/zeppelin/spark/DepInterpreter.java
spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java
spark/src/main/java/org/apache/zeppelin/spark/SparkInterpreter.java
spark/src/main/java/org/apache/zeppelin/spark/SparkSqlInterpreter.java
zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/dev/DevInterpreter.java
zeppelin-interpreter/src/test/java/org/apache/zeppelin/interpreter/remote/mock/MockInterpreterA.java
zeppelin-interpreter/src/test/java/org/apache/zeppelin/interpreter/remote/mock/MockInterpreterB.java
zeppelin-interpreter/src/test/java/org/apache/zeppelin/interpreter/remote/mock/MockInterpreterAngular.java
zeppelin-interpreter/src/test/java/org/apache/zeppelin/interpreter/remote/mock/MockInterpreterResourcePool.java
zeppelin-zengine/src/main/java/org/apache/zeppelin/interpreter/InterpreterFactory.java
zeppelin-zengine/src/test/java/org/apache/zeppelin/interpreter/mock/MockInterpreter2.java

Except for that, LGTM

# Conflicts:
#	zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/InterpreterContext.java
#	zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/thrift/InterpreterCompletion.java
#	zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/thrift/RemoteApplicationResult.java
#	zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/thrift/RemoteInterpreterContext.java
#	zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/thrift/RemoteInterpreterEvent.java
#	zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/thrift/RemoteInterpreterResult.java
#	zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/thrift/RemoteInterpreterService.java
@cloverhearts
Member Author

One test case is failing.
I will fix it.

@cloverhearts
Member Author

Okay, all problems are fixed and CI is green.
I changed the z.run test case in the angular object watch test.
@Leemoonsoo Could you please review this PR?

@Leemoonsoo
Member

I can still see some unnecessary changes to the import blocks in many files.
Could you keep them unchanged? I listed the files in my previous comment.

@cloverhearts
Member Author

ci retry

@cloverhearts
Member Author

@Leemoonsoo Sorry, I think I misunderstood your answer. All edits are complete.
Thank you for your reply.

@Leemoonsoo
Member

@cloverhearts Thanks for the great contribution.
LGTM. I will merge to master if there is no further discussion.

2 participants