Skip to content

[WIP][AIRFLOW-594] Add dag source plugin hooks#4531

Closed
drewsonne wants to merge 11 commits intoapache:masterfrom
drewsonne:add_dag_source_plugin_hooks
Closed

[WIP][AIRFLOW-594] Add dag source plugin hooks#4531
drewsonne wants to merge 11 commits intoapache:masterfrom
drewsonne:add_dag_source_plugin_hooks

Conversation

@drewsonne
Copy link
Contributor

@drewsonne drewsonne commented Jan 15, 2019

Make sure you have checked all steps below.

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:

    This PR is a precursor to loading DAGs from pip packages. It provides a new plugin structure, AirflowPluginDagSource with two methods collect_dags_from_plugins and put_dag_plugins_on_disk where a Plugin author can:

  • implement the former and provide logic to inject the airflow.model.DAG objects into a provided airflow.modal.DagBag object using their own custom logic,

  • or implement the latter and provide logic to write the dag python files to disk in the dag folder where they will be collected by the normal DAG loading setup.

From discussion below, potential use cases could be:

  • to load dags from pip packages
  • to load dags from (for example) S3, GCP Cloud Storage, Azure Blob Storage (using the built in hooks/connections);
  • integration with say, https://github.com/wooga/airconditioner without mixing yaml and python for dag definitions;
  • tighter integration with https://github.com/python-bonobo/bonobo
  • as an intermediary step to handle some automatic lineage injection on DAG load
  • perform some validation or linting of DAGs before loading.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
    • All the public functions and the classes in the PR contain docstrings that explain what it does

Code Quality

  • Passes flake8

@bjoernpollex-sc
Copy link

It feels like creating a plugin to deploy DAGs is an unnecessary extra step. Also, DAGs aren't really plugins, right? Wouldn't it be easier to just provide a separate entry-point for DAGs? #730, which never got merged, has an example of this.

Btw., thanks for working on this, and also #4412, these are great improvements!

@drewsonne
Copy link
Contributor Author

drewsonne commented Jan 15, 2019

This doesn't assume DAGs are plugins, and although it would be easier for this specific functionality, yes, if someone wanted to:

then we need new PRs for each of those pieces of functionality and wait for the release cycle, unless we're building airflow from src in our prod env's.

If this (eventually, after I clean the PR up) is approved, the next step is a built-in plugin to load DAGs from entrypoints.

@bjoernpollex-sc
Copy link

Great points, I was too focused on my own particular use-case!

@drewsonne
Copy link
Contributor Author

You did make me think some of these through a bit harder, and I think I should change the PR to reflect :-)

@drewsonne drewsonne force-pushed the add_dag_source_plugin_hooks branch from 609eb7d to 0a3477d Compare January 16, 2019 06:59
@drewsonne drewsonne changed the title [WIP] Add dag source plugin hooks [WIP][AIRFLOW-594] Add dag source plugin hooks Jan 16, 2019
@drewsonne drewsonne force-pushed the add_dag_source_plugin_hooks branch 7 times, most recently from 3494b72 to f6fddde Compare January 22, 2019 18:18
@drewsonne drewsonne force-pushed the add_dag_source_plugin_hooks branch 4 times, most recently from adad27b to 1c64713 Compare February 14, 2019 05:33
@drewsonne drewsonne force-pushed the add_dag_source_plugin_hooks branch 2 times, most recently from 9b2221d to c17d546 Compare February 23, 2019 19:43
@drewsonne drewsonne force-pushed the add_dag_source_plugin_hooks branch 2 times, most recently from b64cb4e to 5aba041 Compare April 6, 2019 08:15
docs/conf.py Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This look like a unreleated changes. This should be done in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I was trying to figure out why the doc step was failing in the build on travis

docs/plugins.rst Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I thought about it yesterday and I was planning to improve it.

docs/plugins.rst Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* For ``AirflowPluginDagSource`` plugin, ``name``, ``add_dags_to_dagbag``, and
* For ::class:`~airflow.plugin.AirflowPluginDagSource` plugin, ``name``, ``add_dags_to_dagbag``, and

This will allow automatic linking to the class documentation when such documentation will be created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gotta fix the test cases and then I'll crack on with the doc problems :-D

Add property to the AirflowPlugin object which takes a list of `AirflowPluginDagSource` objects to provide a structure to aggregate DAGs from disparate sources and provide them to Airflow.
Provides Hooks in model initialisaiton to allow callbacks to DAG source plugins to inject DAGs at runtime.
@drewsonne drewsonne force-pushed the add_dag_source_plugin_hooks branch from 29cfc09 to 6037183 Compare April 11, 2019 19:20
@drewsonne drewsonne closed this Apr 12, 2019
@drewsonne drewsonne deleted the add_dag_source_plugin_hooks branch April 12, 2019 05:34
@drewsonne drewsonne restored the add_dag_source_plugin_hooks branch April 12, 2019 05:53
@drewsonne drewsonne deleted the add_dag_source_plugin_hooks branch April 12, 2019 05:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants