[RFC] Add initial design doc for core Couler API #24

Merged
merged 4 commits into from
Sep 28, 2020

terrytangyuan
Member

@terrytangyuan terrytangyuan commented Aug 31, 2020

Hi community,

We are opening this PR to share our thoughts on supporting multiple workflow engines. We'd appreciate any feedback and suggestions. In addition, if you are interested in contributing either a new backend or functionalities of the existing backend, please let us know in this pull request.

Signed-off-by: terrytangyuan <terrytangyuan@gmail.com>

Core operations (`couler.ops`):

* `run_step(step_def)` where `step_def` is the step definition that can be a container spec, Python function,

Will the name `step` be a little confusing, since a step means different things in Tekton and Argo?

In addition, I am a little curious about the design of the DAG in Couler. In Argo, DAG is one of the templates, but in Tekton, there is no concept of templates and users can simply run their tasks in DAG.

Member Author

@terrytangyuan terrytangyuan Sep 1, 2020

Great question! Thanks for chiming in.

I think it should be fine as long as we document it properly. In Couler, a step simply means a node in the workflow graph (the "smallest" unit in some sense) which is essentially the same for Argo and Tekton. In Tekton, a task consists of multiple steps and a pipeline consists of multiple tasks (example).

Perhaps we can model the analogy like the following:

  • step (Couler) = step (Argo) = step (Tekton)
  • reusable step (Couler) = template (Argo) = task (Tekton)
  • workflow (Couler) = workflow (Argo) = pipeline (Tekton)

Even though this may not be 100% accurate, we can try to model this as closely as possible to cover most use cases. What do you think?


LGTM

Contributor

Just to extend the analogy above to other systems like Airflow, Dagster, and Prefect

| Workflow Engine | Couler | Argo | Tekton | Airflow | Dagster | Prefect |
| --- | --- | --- | --- | --- | --- | --- |
| Step | Step | step | step | task | solid | task |
| Composite steps | Reusable step | template | task | SubDag or TaskGroup | composite solid | ??? |
| Workflow | Workflow | workflow | pipeline | DAG | pipeline | Flow |

I am not sure about the definition of Reusable step, but I assume it means composite steps where it references task for Tekton?

I think it will be nice to have a comparison table and keep it updated to help adoption for people from other backends and for implementation for new backends.

Member Author

Thanks! Great idea. I'll add this table to the doc. By "reusable step", I meant a parameterized template that can be used to define a step, where users only have to specify a few parameters.
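
To illustrate the "reusable step" idea described above, here is a minimal sketch in which a parameterized Python function acts as the template. Everything in it is an assumption made for illustration: `run_step` is a toy recorder rather than Couler's implementation, and `train_model`, the image name, and the command are hypothetical.

```python
# Toy stand-in for a workflow: an ordered list of recorded steps.
# `run_step` here is a hypothetical recorder, not Couler's implementation.
workflow_steps = []


def run_step(step_def):
    """Record a step definition and return its name."""
    workflow_steps.append(step_def)
    return step_def["name"]


def train_model(dataset, epochs=10):
    """A 'reusable step': a parameterized template over a container spec."""
    return run_step({
        "name": f"train-{dataset}",
        "image": "example/trainer:latest",  # hypothetical image name
        "command": ["python", "train.py"],  # hypothetical command
        "args": ["--dataset", dataset, "--epochs", str(epochs)],
    })


# Two steps defined from one template; only the parameters differ.
train_model("mnist")
train_model("cifar10", epochs=20)
```

Under this reading, the template would correspond roughly to an Argo template or a Tekton task, and each call to one step in the workflow.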

@chinazj

chinazj commented Sep 3, 2020

some suggestions:

  1. Do we need an initialization step?
  2. Tekton's PipelineResources should be added as a parameter to the step function. These parameters have different types. **Treat parameters as a structure or class**.
type parameter struct {
      param_type 
      url
      value
}
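
The struct suggested above could be sketched in Python as a dataclass. This is only an illustration of the suggestion, not an agreed design: the class name `Parameter`, the optional fields, the example type strings, and the `describe` helper are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Parameter:
    """Hypothetical typed parameter, mirroring the struct fields above."""
    param_type: str              # e.g. "git" or "string" (illustrative values)
    url: Optional[str] = None    # location, for resource-like parameters
    value: Optional[str] = None  # literal value, for plain parameters


def describe(param: Parameter) -> str:
    """Render a parameter as `type=target`, preferring the URL when set."""
    target = param.url if param.url is not None else param.value
    return f"{param.param_type}={target}"


source = Parameter(param_type="git", url="https://example.com/repo.git")
epochs = Parameter(param_type="string", value="10")
```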

@terrytangyuan
Member Author

@chinazj Thanks for the great questions. See my replies below:

  1. Do you mean the initialization of the workflow? Currently the workflow and its attributes are managed globally and we have an experimental couler._cleanup() to clean up the states so users can start defining a new workflow. This may not be ideal/friendly to some users and it's definitely good to have an object/scope to perform initialization and hold the states. cc @chunyang-wen who had similar thoughts on this.

  2. Yes, parameters/artifacts are definitely things we plan to support in the future. For the Argo backend, we have input/output as args in the step function, but we have not thought of a unified API for this yet. The plan is to provide a minimal set of standardized APIs first, and then the community can help bring up and contribute new proposals on designs like parameters/artifacts.

@xinbinhuang
Contributor

Hi thanks for the great work!

I am curious about a few things about the goals of couler:

  1. Who is the main audience?
  2. What is the preferred programming paradigm (e.g. functional)?
  3. What is the high-level system design/model for how Couler interacts with and manages backends?
  4. What are the main use cases and current workloads you have in production in Alibaba?

@terrytangyuan
Member Author

terrytangyuan commented Sep 3, 2020

@xinbinhuang Thanks for the feedback and questions. Please see my replies below:

  1. Data scientists and engineers who want to define workflows for machine learning applications or DevOps/automation tasks with ease, without learning complicated OOP concepts or a DSL.

  2. Functional and imperative.

  3. Have you seen the following in the doc? Basically we want to use this mechanism to switch backends as well as potentially providing configurations in submit(config=workflow_config(schedule="* * * * 1")) that may be backend-specific. Implementations for each backend can be flexible as long as the APIs specific to the backend can work together.

    Backends (`couler.backends`):
    * `get_backend()`
    * `use_backend("argo")`
    
  4. We cannot disclose any specific use cases but we currently use Couler for ML tasks mostly. The main benefits of Couler are: the standardized interface across different engines so it's easy to switch engines for different use cases for better performance; reusable templates; shared best practices/validation/optimizations, etc.
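
The backend-switching mechanism from reply 3 can be pictured as a small registry. The sketch below is a toy model and not Couler's implementation: the registry `_BACKENDS`, the rendered dictionary shapes, and this `submit` signature are assumptions. Only the names `get_backend`, `use_backend`, and the `schedule` example come from the discussion above.

```python
# Hypothetical registry: backend name -> function that renders a workflow.
# The rendered dictionaries are illustrative, not real Argo or Tekton manifests.
_BACKENDS = {
    "argo": lambda steps: {"kind": "Workflow", "steps": steps},
    "tekton": lambda steps: {"kind": "Pipeline", "tasks": steps},
}
_active = "argo"


def use_backend(name):
    """Select the backend used by later submit() calls."""
    global _active
    if name not in _BACKENDS:
        raise ValueError(f"unknown backend: {name}")
    _active = name


def get_backend():
    """Return the name of the currently selected backend."""
    return _active


def submit(steps, config=None):
    """Render steps with the active backend; config is passed through as-is."""
    rendered = _BACKENDS[_active](list(steps))
    if config:
        rendered["config"] = dict(config)  # may be backend-specific
    return rendered
```

With this shape, the same step list could be rendered for either engine by calling `use_backend("tekton")` or `use_backend("argo")` before `submit(...)`.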

* `when(cond, if_op, else_op)` where
* `cond` can be any of the following predicates: `equal()`, `not_equal()`, `bigger()`, `smaller()`, `bigger_equal()`, `smaller_equal()`.
* The operation defined in `if_op` will be executed when `cond` is true. Otherwise, `else_op` will be executed.
* `while_loop(cond, func, *args, **kwargs)` where
Member

naming suggestion: while_loop -> loop?

Member Author

@terrytangyuan terrytangyuan Sep 8, 2020

"loop" by itself might be a bit confusing as it can also be interpreted as "for loop"


* `get_backend()`
* `use_backend("argo")`

Contributor

Can we add a parameters section? (or at least think about that behavior?)

I think the matrix definition proposed above is great, and I would also love to see how parameters are handled. I know that you may define global parameters, which are shared across the steps in a workflow (for Airflow at least). This could be achieved in Argo; I'm not sure how that is handled in Tekton.

Contributor

OK I see that is a future thing (input/output/parameters), though from my perspective this is one thing we want to get to early

Member Author

We currently support input parameters via args, e.g. see this test for examples. However, I have not put much thought into parameters for other backends yet. That might require a dedicated design doc.


result = flip_coin()
couler.when(couler.equal(result, "heads"), lambda: heads())
couler.when(couler.equal(result, "tails"), lambda: tails())
Contributor

I understand the functional beauty of this, but having to specify the `when` away from the involved step function spreads the logic across two places.

I'm not sure how tekton does this, but could we have it available as a decorator, e.g.

from couler import when, equal

@when(equal(flip_coin().result, "heads"))
def heads():
  ...

I feel it's easier to "see" the DAG flow when such logic is located together, especially when you have a large DAG that involves tons of steps.

Member Author

Keeping the conditional logic and the op closer to each other is definitely a good thought. One concern is that users like data scientists and analysts might not know decorators well and may get confused. That said, we can consider this decorator as an additional convenience method that's equivalent to the existing API. Feel free to propose it again in the future when the existing implementation is in place, and we'll take a look together with other backends as well.

Member

Good discussion. I have a slightly different opinion. A decorator appears to keep the conditional logic together with the op logic, but it does not really show the full workflow. Instead, lines 77/78/79 show the full workflow, like a "main" function.

Contributor

I have a similar comment as @binarycrayon. I think it will be nice to provide @when, @task or other constructs to users, but I agree that we can provide another set of these "higher-level" functional APIs which utilize the "lower-level" APIs (i.e. run_step, etc)
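
The layering discussed in this thread, a higher-level decorator built on the lower-level functional call, can be sketched as follows. This is a toy model under loud assumptions: conditions are evaluated eagerly in plain Python rather than compiled into a workflow, `when_decorator` is a hypothetical name, and `result` stands in for the output of `flip_coin()`.

```python
from functools import wraps

executed = []


def equal(left, right):
    """Toy predicate: evaluated eagerly here, unlike in a workflow engine."""
    return left == right


def when(cond, if_op, else_op=None):
    """Lower-level functional form: run if_op when cond holds, else else_op."""
    if cond:
        return if_op()
    if else_op is not None:
        return else_op()
    return None


def when_decorator(cond):
    """Higher-level form: attach the condition where the step is defined."""
    def wrap(func):
        @wraps(func)
        def guarded():
            return when(cond, func)
        return guarded
    return wrap


result = "heads"  # stand-in for the output of flip_coin()


@when_decorator(equal(result, "heads"))
def heads():
    executed.append("heads")


@when_decorator(equal(result, "tails"))
def tails():
    executed.append("tails")


# The call sequence still reads like a "main" function.
heads()
tails()
```

Here the decorator only delegates to `when`, so both styles stay equivalent, and the sequence of calls at the bottom still shows the overall flow in one place.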

Comment on lines +25 to +27
* `when(cond, if_op, else_op)` where
* `cond` can be any of the following predicates: `equal()`, `not_equal()`, `bigger()`, `smaller()`, `bigger_equal()`, `smaller_equal()`.
* The operation defined in `if_op` will be executed when `cond` is true. Otherwise, `else_op` will be executed.

are we missing a next op that simply specifies the order of steps? (e.g. a linear workflow)

Member Author

Actually, that was intentional. Couler's APIs are imperative: once users declare an op, it is automatically added to the workflow as one of the steps. Users can also set the dependencies explicitly to construct a linear workflow if they want.
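
The imperative behavior described in this reply can be modeled with a small recorder: declaring a step adds it to the workflow immediately, and ordering is expressed through explicit dependencies. The `Workflow` class below is a hypothetical illustration and not Couler's internals; its method signatures are assumptions.

```python
class Workflow:
    """Toy recorder; a hypothetical model, not Couler's internals."""

    def __init__(self):
        self.steps = []         # step names in declaration order
        self.dependencies = {}  # step name -> list of upstream step names

    def run_step(self, name):
        # Declaring a step adds it to the workflow immediately.
        self.steps.append(name)
        self.dependencies.setdefault(name, [])
        return name

    def set_dependencies(self, name, upstream):
        # Explicit edges, for users who want to spell out the order.
        self.dependencies[name] = list(upstream)


wf = Workflow()
fetch = wf.run_step("fetch")
train = wf.run_step("train")
report = wf.run_step("report")

# Chain the three steps into a linear workflow.
wf.set_dependencies(train, [fetch])
wf.set_dependencies(report, [train])
```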


Should we add dag and set_dependencies in this doc?

Based on the current dag design, if users want to add a `when` condition, they can only add couler.when outside of couler.dag. What's the best practice for this case?

@paguos
Contributor

paguos commented Sep 27, 2020

Does Couler support local execution of workflows? Also, it would be great to expand the API to support serverless frameworks like AWS Lambda.

@terrytangyuan
Member Author

terrytangyuan commented Sep 28, 2020

@paguos Neither local execution nor serverless frameworks have been taken into consideration yet. For the Argo backend, Argo Events might be a potential direction. In the meantime, feel free to share any thoughts and possible approaches with us.

@merlintang
Member

/lgtm

@terrytangyuan terrytangyuan merged commit 9c3addb into master Sep 28, 2020
@terrytangyuan terrytangyuan deleted the api-design branch September 28, 2020 17:51