add run_if param to TaskSettings #53
A task in Databricks workflows refers to a single unit of work that is executed as part of a larger data processing pipeline. Tasks are typically designed to perform a specific set of operations on data, such as loading data from a source, transforming it, and storing it in a destination. In brickflow, tasks are designed so that a plain Python function maps to a Databricks workflow task.

This assumes you have already read the workflows section and have created a workflow object.
Task
A Databricks workflow task can be created by decorating a Python function with brickflow's task decorator.
Task dependency
Define a task dependency using the "depends_on" parameter of the task decorator. Dependent tasks can be provided as direct Python callables, as strings, or as a list of callables/strings.
Task parameters
Task parameters can be defined as key-value pairs in the definition of the function on which the task is defined.
Common task parameters
In the workflows section, we saw how common task parameters are created at the workflow level. In this section, we shall see how to use them.
Inbuilt task parameters
There are many inbuilt task parameters that can be accessed using the brickflow context, as shown above.
Clusters
You have the flexibility to use a different cluster for each task or to assign a custom cluster to a task.
Libraries
You have the flexibility to use specific libraries for a particular task.
Task types
There are different task types supported by brickflow right now; the default task type used by brickflow is NOTEBOOK.
Trigger rules
Two types of trigger rules can be applied to a task: either ALL_SUCCESS or NONE_FAILED.
Tasks conditional run
Add a condition for running a task based on the results of its parent tasks.

This option determines whether the task is run once its dependencies have completed. Available options:

ALL_SUCCESS
: All dependencies have executed and succeeded

AT_LEAST_ONE_SUCCESS
: At least one dependency has succeeded

NONE_FAILED
: None of the dependencies have failed and at least one was executed

ALL_DONE
: All dependencies have completed and at least one was executed

AT_LEAST_ONE_FAILED
: At least one dependency has failed

ALL_FAILED
: All dependencies have failed

Airflow Operators
We have adopted/extended certain Airflow operators that might be needed to run as a task in Databricks workflows. Typically, for Airflow operators, the task function returns the operator, and brickflow executes the operator based on the task's return type.
Bash Operator
You can use the Bash operator as shown below.
Task Dependency Sensor
Even if you migrate to Databricks workflows, brickflow gives you the flexibility to keep a dependency on an Airflow job.
Autosys Sensor
This operator calls an Autosys API and is used to place a dependency on Autosys jobs when necessary.
Workflow Dependency Sensor
Waits for another workflow to finish before kicking off the current workflow's tasks.