[ZEPPELIN-1165 : WIP] Code-based job workflow #1799
create new issue on jira |
@cloverhearts This is very interesting. I have a few questions
|
@zjffdu In fact, this feature depends on Spark. Because the calls can be made anywhere in the code, they can be used during analysis or combined with external libraries. On its own, this is not a complete implementation of the workflow.
or
or
|
@zjffdu |
@cloverhearts What I mean is that code like the following would be called many times by users
It is just like a code template, so what I suggest is that we create a high-level workflow framework which uses these APIs internally. Users would just need to specify the dependencies between paragraphs using this framework; they would not need to check job status as in the code above. |
BTW, in the first phase we can provide the high-level framework so that users can call it programmatically, and in the second phase it would be better to allow users to do it through drag & drop in the UI. |
@zjffdu Perhaps you could give me more of your opinions about this? |
Thanks @cloverhearts. After reading #1176: this PR is the first phase of this feature (implementing a low-level API for workflow), is that correct? |
@zjffdu |
From my point of view this kind of functionality should be provided by the core framework. It is a way of defining the paragraph execution workflow implicitly, without the need to program it explicitly. |
I'm also a little confused about what this PR really is about: the pictures above point to paragraph execution order and control, but the discussion also points to notebook execution workflows. Often paragraphs within notebooks depend on others and therefore need to be executed in a certain order. I feel this kind of paragraph execution control should be handled by the core framework, based on settings for each paragraph within the notebook. Additionally, in some places the discussion mentions implementing this feature at the interpreter level. It is not clear to me why the notebook workflow definition feature should be reimplemented in different interpreters in different ways. The internals of a notebook are of no interest when it is executed within a workflow; all that matters is success or failure, and a definition at the workflow level of what should happen in case of a failure. So from my point of view the notebook workflow feature should also be implemented in the core code, independently of the available interpreters. |
How about not repeating
or just
|
Yes, apart from workflow, this feature is essential. (Get paragraph status) And we will separate the functions related to the workflow into other PRs. For example, getting paragraph status, deleting paragraph output. Thank you a lot for your opinion. |
@Leemoonsoo |
As far as I remember from another discussion, the paragraph IDs will change if you export/import or copy a notebook (not sure which one applies). If that is the case, the workflow will be broken after import. If the user in front of the screen is not familiar with the code and logic of the notebook, it might be difficult to fix. What about a simple "z.wait(ordernumber or paragraphId)" function, which makes the paragraph wait for the paragraph referenced by the ordernumber or ID to finish successfully, or cancels the paragraph execution in case of an error? This way all paragraphs without z.wait would be executed in parallel, and those calling z.wait would be executed in sequence after the ones they depend on. Additionally, this kind of functionality would not be mixed with job handling at the notebook level. |
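To make the proposed `z.wait` semantics concrete, here is a minimal, self-contained sketch in plain Python. It is not Zeppelin code: `Note`, `run_all`, `wait`, and `ParagraphFailed` are hypothetical names invented for illustration, and paragraphs are modelled as ordinary functions. It only shows the idea that every paragraph starts at once and ordering comes solely from `wait` calls.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ParagraphFailed(Exception):
    """Raised in a waiting paragraph when the paragraph it depends on failed."""


class Note:
    """Toy stand-in for a note (hypothetical, not Zeppelin's API)."""

    def __init__(self, paragraph_ids):
        # One completion flag per paragraph, plus a success/failure record.
        self._done = {pid: threading.Event() for pid in paragraph_ids}
        self._ok = {}

    def wait(self, pid):
        # Block until the referenced paragraph has finished, then
        # cancel the caller if that paragraph did not succeed.
        self._done[pid].wait()
        if not self._ok[pid]:
            raise ParagraphFailed(pid)

    def run_all(self, bodies):
        # Start every paragraph at once, one thread each, so a waiting
        # paragraph can never starve the paragraph it depends on.
        def run_one(pid):
            try:
                bodies[pid](self)
                self._ok[pid] = True
            except Exception:
                self._ok[pid] = False
            finally:
                self._done[pid].set()

        with ThreadPoolExecutor(max_workers=len(bodies)) as pool:
            list(pool.map(run_one, bodies))
        return dict(self._ok)


log = []
bodies = {
    "p1": lambda note: log.append("p1"),
    "p2": lambda note: (note.wait("p1"), log.append("p2")),  # runs after p1
    "p3": lambda note: log.append("p3"),                      # independent
}
results = Note(bodies).run_all(bodies)
```

In this sketch a failure in `p1` would mark `p2` as failed too, which matches the "cancel the paragraph execution in case of an error" part of the proposal. Circular waits would deadlock; a real implementation would need to detect them.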
@rasehorn Case 1
Or Case 2
And would you please explain more regarding |
@cloverhearts Also: I'm only talking about the use case of ensuring a certain sequence of paragraph executions when runAll is called for the notebook. If you explicitly call z.run(paragraphId) within a certain notebook after runAll() was called, you probably execute those paragraphs twice. The easiest way to ensure a certain sequence of paragraph execution after runAll() was issued is to make the paragraphs wait for the one they depend on to finish. Let's say we have three paragraphs. From my point of view this would be the easiest way for a Zeppelin user to ensure a certain sequence of paragraph execution, including control over which paragraphs are executed in parallel. To answer your particular question: |
I agree with @rasehorn that workflow execution should be done in a high-level framework. Users just need to define the workflow (specify the dependencies between paragraphs). I am also pasting an image to illustrate my current idea. In the following screenshot we have 4 paragraphs: paragraph 1 needs to run first, and paragraphs 2, 3 and 4 can run concurrently after paragraph 1. In each paragraph's top right area, we can allow the user to specify that paragraph's dependencies. Here, paragraph_1 has no dependencies, and paragraphs 2, 3 and 4 depend on paragraph 1. After the workflow is defined (dependencies are specified), we can click the button at the top right of the note to run all the paragraphs in the note. We could also provide a REST API for running the whole note. |
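As a rough illustration of the dependency idea in the comment above, the sketch below groups paragraphs into "waves" that could run concurrently, using Python's standard `graphlib` module (Python 3.9+). The function name `execution_waves` and the dictionary layout are assumptions made for this example; nothing here is part of Zeppelin.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Group paragraphs into waves; all paragraphs in a wave may run concurrently.

    `dependencies` maps a paragraph name to the set of paragraphs it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole wave as finished
    return waves


# The example from the comment: paragraph 1 first, then 2, 3 and 4 together.
deps = {
    "paragraph_1": set(),
    "paragraph_2": {"paragraph_1"},
    "paragraph_3": {"paragraph_1"},
    "paragraph_4": {"paragraph_1"},
}
print(execution_waves(deps))
# [['paragraph_1'], ['paragraph_2', 'paragraph_3', 'paragraph_4']]
```

A "run all" button or REST call could then execute one wave at a time and stop at the first wave in which a paragraph fails.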
@cloverhearts Stop?? |
close #83 close #86 close #125 close #133 close #139 close #146 close #193 close #203 close #246 close #262 close #264 close #273 close #291 close #299 close #320 close #347 close #389 close #413 close #423 close #543 close #560 close #658 close #670 close #728 close #765 close #777 close #782 close #783 close #812 close #822 close #841 close #843 close #878 close #884 close #918 close #989 close #1076 close #1135 close #1187 close #1231 close #1304 close #1316 close #1361 close #1385 close #1390 close #1414 close #1422 close #1425 close #1447 close #1458 close #1466 close #1485 close #1492 close #1495 close #1497 close #1536 close #1545 close #1561 close #1577 close #1600 close #1603 close #1678 close #1695 close #1739 close #1748 close #1765 close #1767 close #1776 close #1783 close #1799
When can we expect this functionality to be available in zeppelin? |
@aviralKumar13 I don't think this PR is in progress now. It might be better to do that in another scheduling framework, e.g. Airflow. What Zeppelin needs to provide is an easy-to-use API for running paragraphs/notes, and this PR would be helpful for that: #3887 |
@zjffdu Thanks for the info. I am looking for dependency-based execution in Zeppelin, where one paragraph waits for another paragraph's execution to finish. Airflow can help me with the scheduling, but getting a response from Zeppelin on whether a paragraph has completed is not possible as of today, right? I need to poll the Spark job interface for the status of the job. |
@aviralKumar13 You can take a look at #3887, which provides an easy API for note/paragraph execution and status polling |
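For readers wondering what status polling from an external scheduler might look like, here is a minimal generic sketch. It does not use the API from #3887: `fetch_status` is a hypothetical callable that you would implement yourself to return the paragraph's current status as a string, and the terminal status names are assumptions.

```python
import time

# Assumed names for statuses that mean "the paragraph will not change any more".
TERMINAL = {"FINISHED", "ERROR", "ABORT"}


def wait_for_paragraph(fetch_status, interval=1.0, timeout=60.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll `fetch_status()` until it returns a terminal status or time runs out."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"paragraph still {status} after {timeout}s")
        sleep(interval)  # back off before asking again


# Usage with a fake status source standing in for a real lookup.
fake = iter(["PENDING", "RUNNING", "FINISHED"])
final = wait_for_paragraph(lambda: next(fake), interval=0.0)
```

A scheduler task could call such a helper after triggering a paragraph and treat any result other than `"FINISHED"` as a failed step.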
Okay, it looks great. Is it on track to be merged to master? Roughly when will it be available? |
I think it will be merged to master soon. |
This PR has stopped functioning. |
What is this PR for?
Code based workflow (work in progress)
This is a re-implementation of #1176.
or
or
Workflow process feature.
(Paragraphs can be run consecutively, each one starting only after the previous one has succeeded.)
Case 1
Please check the following flowchart.
Case 2
In general, to run multiple paragraphs, you run the entire note.
This is a good way to run the many paragraphs contained in a note.
However, a problem occurs when the paragraphs use different interpreters.
When the paragraphs of a note each use a different interpreter, they are started in sequence but finish at different times.
For example, Markdown is a very fast interpreter.
Its processing completes very quickly.
This is a problem when paragraphs must execute sequentially.
This feature guarantees a defined execution order across the interpreters used in a notebook.
Case 3
For concurrent jobs in the workflow ...
With the current design, jobs that are supposed to run at the same time behave as follows:
They share the results of the job.
But even when jobs need to run at the same time, they remain subject to the execution flow.
**The following paragraph is executed only if the preceding results succeed.**
What type of PR is it?
Improvement
jira
https://issues.apache.org/jira/browse/ZEPPELIN-1165
Questions: