[chore] Change class name from process definition to workflow #26

Merged · 12 commits · Nov 16, 2022
2 changes: 2 additions & 0 deletions .github/workflows/ci.yaml
@@ -157,6 +157,8 @@ jobs:
repository: apache/dolphinscheduler
path: dolphinscheduler
submodules: true
# Temporarily added to make https://github.com/apache/dolphinscheduler-sdk-python/issues/12 work
# ref: refs/pull/12918/head
- name: Cache local Maven repository
uses: actions/cache@v3
with:
1 change: 1 addition & 0 deletions .gitignore
@@ -5,6 +5,7 @@
# Cache
__pycache__/
.tox/
.pytest_cache/

# Build
build/
4 changes: 2 additions & 2 deletions DEVELOP.md
@@ -46,9 +46,9 @@ define by code, user usually do not care user, tenant, or queue exists or not. A
a new workflow from his/her code definition. So we have some **side objects** in the `pydolphinscheduler/side`
directory; they only check whether an object exists, and create it if it does not, as the sketch below shows.
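
A hypothetical sketch of that pattern (class and method names here are illustrative, not the actual implementation):

```python
# Hypothetical side-object sketch: look the object up first and create it
# only when it is absent.
class SideObject:
    def __init__(self, name: str):
        self.name = name

    def exists(self) -> bool:
        # Would query the DolphinScheduler backend; stubbed out here.
        return False

    def create_if_not_exists(self) -> None:
        # Create the object only when the backend does not know it yet.
        if not self.exists():
            print(f"creating side object {self.name}")


SideObject("tenant_exists").create_if_not_exists()
```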

### Process Definition
### Workflow

pydolphinscheduler workflow object name, process definition is also same name as Java object(maybe would be change to
pydolphinscheduler's workflow object name; it is also the same name as the corresponding Java object (it may be changed to
a simpler word later).

### Tasks
2 changes: 1 addition & 1 deletion README.md
@@ -78,7 +78,7 @@ python ./tutorial.py
> tenant value in `example/tutorial.py`. For now the value is `tenant_exists`; please change it to a username that exists
> in your environment.

After command execute, you could see a new project with single process definition named *tutorial* in the
After the command executes, you can see a new project with a single workflow named *tutorial* in the
[UI-project list](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/project/project-list.html).

## Develop
12 changes: 10 additions & 2 deletions UPDATING.md
@@ -22,10 +22,18 @@ under the License.
This file documents non-backward-compatible updates and notifies users of the detailed changes in pydolphinscheduler.
It was started after version 2.0.5 was released.

## dev
## Main

* Remove parameter ``task_location`` in process definition and Java Gateway service ([#11681](https://github.com/apache/dolphinscheduler/pull/11681))
* Remove the spark version of spark task ([#11860](https://github.com/apache/dolphinscheduler/pull/11860)).
* Change class name from process definition to workflow ([#26](https://github.com/apache/dolphinscheduler-sdk-python/pull/26))
* Deprecate class `ProcessDefinition` in favor of `Workflow`
* Deprecate class `SubProcess` in favor of `SubWorkflow`, and rename its parameter `process_definition_name` to `workflow_name`
* Rename class `Dependent`'s parameter `process_definition_name` to `workflow_name`
* All of the above deprecated names will be removed in version 4.1.0; see the migration sketch below
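
A hedged migration sketch (the import paths follow this change; the `tenant` value is illustrative):

```python
# Before (deprecated, scheduled for removal in 4.1.0)
from pydolphinscheduler.core.process_definition import ProcessDefinition

with ProcessDefinition(name="my-workflow", tenant="tenant_exists") as pd:
    ...

# After
from pydolphinscheduler.core.workflow import Workflow

with Workflow(name="my-workflow", tenant="tenant_exists") as workflow:
    ...
```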

## 3.1.0

* Remove parameter ``task_location`` in process definition and Java Gateway service ([#11681](https://github.com/apache/dolphinscheduler/pull/11681))

## 3.0.0

50 changes: 25 additions & 25 deletions docs/source/concept.rst
@@ -20,25 +20,25 @@ Concepts

In this section, you will learn the core concepts of *PyDolphinScheduler*.

Process Definition
------------------
Workflow
--------

Process definition describe the whole things except `tasks`_ and `tasks dependence`_, which including
A workflow describes everything except `tasks`_ and `tasks dependence`_, including the
name, schedule interval, and schedule start and end time. You would know scheduler

Process definition could be initialized in normal assign statement or in context manger.
A workflow can be initialized with a normal assignment statement or with a context manager.

.. code-block:: python

# Initialization with assign statement
pd = ProcessDefinition(name="my first process definition")
pd = Workflow(name="my first workflow")

# Or with a context manager
with ProcessDefinition(name="my first process definition") as pd:
with Workflow(name="my first workflow") as pd:
pd.submit()

Process definition is the main object communicate between *PyDolphinScheduler* and DolphinScheduler daemon.
After process definition and task is be declared, you could use `submit` and `run` notify server your definition.
The workflow is the main object for communication between *PyDolphinScheduler* and the DolphinScheduler daemon.
After the workflow and tasks are declared, you can use `submit` and `run` to notify the server of your definition.

If you just want to submit your definition and create the workflow without running it, use the attribute `submit`.
But if you want to run the workflow after you submit it, use the attribute `run`.
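
For example, a minimal sketch of the difference:

.. code-block:: python

    pd = Workflow(name="my first workflow")

    # submit: create the workflow definition on the server without running it
    pd.submit()

    # run: create the definition and trigger a run immediately
    pd.run()
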
@@ -84,7 +84,7 @@ Tenant is the user who run task command in machine or in virtual machine. it cou
.. code-block:: python

#
pd = ProcessDefinition(name="process definition tenant", tenant="tenant_exists")
pd = Workflow(name="workflow tenant", tenant="tenant_exists")

.. note::

@@ -93,9 +93,9 @@ Tenant is the user who run task command in machine or in virtual machine. it cou
Execution Type
~~~~~~~~~~~~~~

Decision which behavior to run when process definition have multiple instances. when process definition
Decides which behavior to use when a workflow has multiple instances. When a workflow's
schedule interval is too short, multiple instances may run at the same time. We can use this
parameter to control the behavior about how to run those process definition instances. Currently we
parameter to control how those workflow instances run. Currently we
have four execution types:

* ``parallel`` (default value): it means all instances will be allowed to run even though the previous
@@ -105,7 +105,7 @@ have four execution type:
* ``serial_discard``: it means all instances will be discarded (abandoned) if the previous instance
is not finished.
* ``serial_priority``: it means all instances will wait for the previous instance to finish,
and all the waiting instances will be executed base on process definition priority order.
and all the waiting instances will be executed based on workflow priority order.

Parameter ``execution type`` can be set in

@@ -114,8 +114,8 @@ Parameter ``execution type`` can be set in

.. code-block:: python

pd = ProcessDefinition(
name="process-definition",
pd = Workflow(
name="workflow_name",
execution_type="parallel"
)

@@ -141,7 +141,7 @@ If you want to see all type of tasks, you could see :doc:`tasks/index`.
Tasks Dependence
~~~~~~~~~~~~~~~~

You could define many tasks in on single `Process Definition`_. If all those task is in parallel processing,
You could define many tasks in on single `Workflow`_. If all those task is in parallel processing,
then you could leave them alone without adding any additional information. But if there have some tasks should
not be run unless pre task in workflow have be done, we should set task dependence to them. Set tasks dependence
have two mainly way and both of them is easy. You could use bitwise operator `>>` and `<<`, or task attribute
Expand All @@ -164,23 +164,23 @@ have two mainly way and both of them is easy. You could use bitwise operator `>>
# for tasks that share the same dependence
task1 >> [task2, task3]

Task With Process Definition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Task With Workflow
~~~~~~~~~~~~~~~~~~

In most of data orchestration cases, you should assigned attribute `process_definition` to task instance to
decide workflow of task. You could set `process_definition` in both normal assign or in context manger mode
In most data orchestration cases, you should assign the attribute `workflow` to a task instance to
decide which workflow the task belongs to. You can set `workflow` in both normal assignment and context manager mode

.. code-block:: python

# Normal assign, have to explicit declaration and pass `ProcessDefinition` instance to task
pd = ProcessDefinition(name="my first process definition")
shell_task = Shell(name="shell", command="echo shell task", process_definition=pd)
# Normal assignment: explicitly declare and pass the `Workflow` instance to the task
pd = Workflow(name="my first workflow")
shell_task = Shell(name="shell", command="echo shell task", workflow=pd)

# Context manger, `ProcessDefinition` instance pd would implicit declaration to task
with ProcessDefinition(name="my first process definition") as pd:
# Context manager: the `Workflow` instance pd is implicitly attached to the task
with Workflow(name="my first workflow") as pd:
shell_task = Shell(name="shell", command="echo shell task")

With both `Process Definition`_, `Tasks`_ and `Tasks Dependence`_, we could build a workflow with multiple tasks.
With `Workflow`_, `Tasks`_, and `Tasks Dependence`_, we can build a workflow with multiple tasks.

Authentication Token
--------------------
2 changes: 1 addition & 1 deletion docs/source/tasks/index.rst
@@ -40,7 +40,7 @@ In this section
kubernetes

datax
sub_process
sub_workflow

sagemaker
mlflow
@@ -15,22 +15,22 @@
specific language governing permissions and limitations
under the License.
Sub Process
===========
Sub Workflow
============

.. automodule:: pydolphinscheduler.tasks.sub_process
.. automodule:: pydolphinscheduler.tasks.sub_workflow


YAML file example
-----------------

.. literalinclude:: ../../../examples/yaml_define/SubProcess.yaml
.. literalinclude:: ../../../examples/yaml_define/SubWorkflow.yaml
:start-after: # under the License.
:language: yaml



example_subprocess.yaml:
example_sub_workflow.yaml:

.. literalinclude:: ../../../examples/yaml_define/example_sub_workflow.yaml
:start-after: # under the License.
58 changes: 29 additions & 29 deletions docs/source/tutorial.rst
@@ -37,7 +37,7 @@ There are two types of tutorials: traditional and task decorator.
versatility than the traditional way because it only supports Python functions, without built-in task
support. But it is helpful if your workflow is built entirely with Python, or if you already have some Python
workflow code and want to migrate it to pydolphinscheduler.
- **YAML File**: We can use pydolphinscheduler CLI to create process using YAML file: :code:`pydolphinscheduler yaml -f tutorial.yaml`.
- **YAML File**: We can use the pydolphinscheduler CLI to create a workflow from a YAML file: :code:`pydolphinscheduler yaml -f tutorial.yaml`.
We can find more YAML file examples in `examples/yaml_define <https://github.com/apache/dolphinscheduler-sdk-python/tree/main/examples/yaml_define>`_

.. tab:: Tradition
@@ -72,7 +72,7 @@ First of all, we should import the necessary module which we would use later jus
:start-after: [start package_import]
:end-before: [end package_import]

In tradition tutorial we import :class:`pydolphinscheduler.core.process_definition.ProcessDefinition` and
In the traditional tutorial we import :class:`pydolphinscheduler.core.workflow.Workflow` and
:class:`pydolphinscheduler.tasks.shell.Shell`.

If you want to use other task types, you can :doc:`see all tasks we support <tasks/index>`
@@ -84,16 +84,16 @@ First of all, we should import the necessary module which we would use later jus
:start-after: [start package_import]
:end-before: [end package_import]

In task decorator tutorial we import :class:`pydolphinscheduler.core.process_definition.ProcessDefinition` and
In the task decorator tutorial we import :class:`pydolphinscheduler.core.workflow.Workflow` and
:func:`pydolphinscheduler.tasks.func_wrap.task`.

Process Definition Declaration
------------------------------
Workflow Declaration
--------------------

We should instantiate :class:`pydolphinscheduler.core.process_definition.ProcessDefinition` object after we
import them from `import necessary module`_. Here we declare basic arguments for process definition(aka, workflow).
We define the name of :code:`ProcessDefinition`, using `Python context manager`_ and it **the only required argument**
for `ProcessDefinition`. Besides, we also declare three arguments named :code:`schedule` and :code:`start_time`
We should instantiate a :class:`pydolphinscheduler.core.workflow.Workflow` object after we
import it from `import necessary module`_. Here we declare the basic arguments for the workflow.
We define the name of the :code:`Workflow` using a `Python context manager`_; the name is **the only required argument**
for `Workflow`. Besides, we also declare the arguments :code:`schedule` and :code:`start_time`,
which set the workflow schedule interval and schedule start time, and the argument :code:`tenant`, which
defines the tenant that will run the tasks in the DolphinScheduler worker. See :ref:`section tenant <concept:tenant>` in
*PyDolphinScheduler* :doc:`concept` for more information.
Expand All @@ -116,12 +116,12 @@ will be running this task in the DolphinScheduler worker. See :ref:`section tena
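A minimal sketch of such a declaration (the argument values here are assumptions for illustration, not taken from the tutorial file):

.. code-block:: python

    from pydolphinscheduler.core.workflow import Workflow

    with Workflow(
        name="tutorial",
        schedule="0 0 0 * * ? *",  # run every day at 00:00
        start_time="2021-01-01",
        tenant="tenant_exists",
    ) as workflow:
        ...  # task declarations go here
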

.. literalinclude:: ../../examples/yaml_define/tutorial.yaml
:start-after: # under the License.
:end-before: # Define the tasks under the workflow
:end-before: # Define the tasks within the workflow
:language: yaml

We could find more detail about :code:`ProcessDefinition` in :ref:`concept about process definition <concept:process definition>`
if you are interested in it. For all arguments of object process definition, you could find in the
:class:`pydolphinscheduler.core.process_definition` API documentation.
We can find more detail about :code:`Workflow` in :ref:`concept about workflow <concept:workflow>`
if you are interested. For all arguments of the workflow object, see the
:class:`pydolphinscheduler.core.workflow` API documentation.

Task Declaration
----------------
@@ -144,7 +144,7 @@ Task Declaration

We declare four tasks to show how to create tasks, all of which are created by the task decorator
:func:`pydolphinscheduler.tasks.func_wrap.task`. All we have to do is add a decorator named
:code:`@task` to existing Python function, and then use them inside :class:`pydolphinscheduler.core.process_definition`
:code:`@task` to an existing Python function, and then use it inside :class:`pydolphinscheduler.core.workflow`
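
A minimal sketch of the decorator pattern (the function body and names are assumed for illustration):

.. code-block:: python

    from pydolphinscheduler.core.workflow import Workflow
    from pydolphinscheduler.tasks.func_wrap import task

    @task
    def say_hello():
        # an ordinary Python function becomes a task
        print("hello")

    with Workflow(name="tutorial_decorator", tenant="tenant_exists") as workflow:
        say_hello()
        workflow.run()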

.. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
:dedent: 0
@@ -157,13 +157,13 @@ Task Declaration
.. tab:: YAML File

.. literalinclude:: ../../examples/yaml_define/tutorial.yaml
:start-after: # Define the tasks under the workflow
:start-after: # Define the tasks within the workflow
:language: yaml

Setting Task Dependence
-----------------------

After we declare both process definition and task, we have four tasks that are independent and will be running
After we declare both the workflow and tasks, we have four independent tasks that will run
in parallel. If you want one task to start only after some other task has finished, you have to set
dependencies on those tasks.
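
A minimal sketch of that dependency setup (assuming the four task names used by the tutorial):

.. code-block:: python

    # both children run after task_parent
    task_parent >> task_child_one
    task_parent >> task_child_two

    # task_union starts only after both children have finished
    task_union << [task_child_one, task_child_two]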

@@ -193,7 +193,7 @@ and task `task_child_two` was done, because both two task is `task_union`'s upst
We can use :code:`deps:[]` to set task dependencies

.. literalinclude:: ../../examples/yaml_define/tutorial.yaml
:start-after: # Define the tasks under the workflow
:start-after: # Define the tasks within the workflow
:language: yaml

.. note::
Expand All @@ -210,7 +210,7 @@ After that, we finish our workflow definition, with four tasks and task dependen
local, we should let the DolphinScheduler daemon know the definition of our workflow. So the last thing we
have to do is submit the workflow to the DolphinScheduler daemon.

Fortunately, we have a convenient method to submit workflow via `ProcessDefinition` attribute :code:`run` which
Fortunately, we have a convenient method to submit the workflow via the `Workflow` attribute :code:`run`, which
will create the workflow definition as well as the workflow schedule.

.. tab:: Tradition
@@ -245,24 +245,24 @@ At last, we could execute this workflow code in your terminal like other Python

If you did not start your DolphinScheduler API server, you can find how to start it in
:ref:`start:start Python gateway service`. Besides the attribute :code:`run`, we have the attribute
:code:`submit` for object `ProcessDefinition` which just submits workflow to the daemon but does not set
the workflow schedule information. For more detail, you could see :ref:`concept:process definition`.
:code:`submit` for the `Workflow` object, which just submits the workflow to the daemon but does not set
the workflow schedule information. For more detail, see :ref:`concept:workflow`.

DAG Graph After Tutorial Run
----------------------------

After we run the tutorial code, you can log in to the DolphinScheduler web UI and see the
`DolphinScheduler project page`_. They is a new process definition be created by *PyDolphinScheduler* and it
`DolphinScheduler project page`_. There is a new workflow created by *PyDolphinScheduler*,
named "tutorial" or "tutorial_decorator". The task graph of the workflow looks like the following:

.. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py
:language: text
:lines: 24-28

Create Process Using YAML File
------------------------------
Create Workflow Using YAML File
-------------------------------

We can use pydolphinscheduler CLI to create process using YAML file
We can use the pydolphinscheduler CLI to create a workflow from a YAML file

.. code-block:: bash

@@ -271,7 +271,7 @@
We can use the following four special grammars to define workflows more flexibly.

- :code:`$FILE{"file_name"}`: Read the contents of the file (:code:`file_name`) and insert them at that location.
- :code:`$WORKFLOW{"other_workflow.yaml"}`: Refer to another process defined using YAML file (:code:`other_workflow.yaml`) and replace the process name in this location.
- :code:`$WORKFLOW{"other_workflow.yaml"}`: Refer to another workflow defined in a YAML file (:code:`other_workflow.yaml`) and insert the workflow name at this location.
- :code:`$ENV{env_name}`: Read the environment variable (:code:`env_name`) and insert it at that location.
- :code:`${CONFIG.key_name}`: Read the configuration value of the key (:code:`key_name`) and insert it at that location.

@@ -290,7 +290,7 @@ For examples, our file directory structure is as follows:
├── Dependent.yaml
├── example_datax.json
├── example_sql.sql
├── example_subprocess.yaml
├── example_sub_workflow.yaml
├── Flink.yaml
├── Http.yaml
├── MapReduce.yaml
@@ -300,17 +300,17 @@
├── Shell.yaml
├── Spark.yaml
├── Sql.yaml
├── SubProcess.yaml
├── SubWorkflow.yaml
└── Switch.yaml

After we run

.. code-block:: bash

pydolphinscheduler yaml -f yaml_define/SubWorkflow.yaml
pydolphinscheduler yaml -file yaml_define/SubWorkflow.yaml


the :code:`$WORKFLOW{"example_sub_workflow.yaml"}` will be set to :code:`$WORKFLOW{"yaml_define/example_sub_workflow.yaml"}`, because :code:`./example_subprocess.yaml` does not exist and :code:`yaml_define/example_sub_workflow.yaml` does.
the :code:`$WORKFLOW{"example_sub_workflow.yaml"}` will be set to :code:`$WORKFLOW{"yaml_define/example_sub_workflow.yaml"}`, because :code:`./example_sub_workflow.yaml` does not exist and :code:`yaml_define/example_sub_workflow.yaml` does.

Furthermore, this feature supports recursion all the way down.

2 changes: 1 addition & 1 deletion examples/yaml_define/Condition.yaml
@@ -19,7 +19,7 @@
workflow:
name: "Condition"

# Define the tasks under the workflow
# Define the tasks within the workflow
tasks:
- { "task_type": "Shell", "name": "pre_task_1", "command": "echo pre_task_1" }
- { "task_type": "Shell", "name": "pre_task_2", "command": "echo pre_task_2" }