
marquez-airflow


A library that integrates Airflow DAGs with Marquez for automatic metadata collection.

Status

This library is under active development at WeWork.

Requirements

Installation

$ pip install marquez-airflow

To install from source, run:

$ python setup.py install

Usage

To use this library, replace the line from airflow import DAG with from marquez_airflow import DAG. The rest of the DAG definition stays the same, as in this example:

# Import DAG from marquez_airflow instead of airflow.
from marquez_airflow import DAG
from airflow.operators.dummy_operator import DummyOperator


DAG_NAME = 'my_DAG_name'

default_args = {
    # Marquez metadata: where the DAG lives and which datasets it reads and writes.
    'marquez_location': 'github://data-dags/dag_location/',
    'marquez_input_urns': ["s3://some_data", "s3://more_data"],
    'marquez_output_urns': ["s3://output_data"],

    # Standard Airflow arguments.
    'owner': ...,
    'depends_on_past': False,
    'start_date': ...,
}

dag = DAG(DAG_NAME, schedule_interval='*/10 * * * *',
          default_args=default_args, description="yet another DAG")

# Two placeholder tasks; run_this_too runs after run_this.
run_this = DummyOperator(task_id='run_this', dag=dag)
run_this_too = DummyOperator(task_id='run_this_too', dag=dag)
run_this_too.set_upstream(run_this)
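
If several DAGs share the same Airflow defaults, the three marquez_* keys can be merged in per DAG. The sketch below is one way to do that. The helper with_marquez_args is hypothetical and not part of marquez-airflow; it only builds a plain dict using the key names from the example above, and the 'owner' value is a placeholder.

```python
def with_marquez_args(base_args, location, input_urns, output_urns):
    """Return a copy of base_args with the three marquez_* keys added.

    Hypothetical helper, not part of marquez-airflow. It assumes only the
    key names used in the usage example above.
    """
    # A bare string would be iterated character by character, so reject it.
    for label, urns in (("input_urns", input_urns), ("output_urns", output_urns)):
        if isinstance(urns, str):
            raise TypeError(label + " must be a list of URNs, not a single string")

    merged = dict(base_args)  # copy, so the shared defaults are not mutated
    merged.update({
        'marquez_location': location,
        'marquez_input_urns': list(input_urns),
        'marquez_output_urns': list(output_urns),
    })
    return merged


# Shared Airflow defaults; 'owner' is a placeholder value for illustration.
shared_args = {'owner': 'data-team', 'depends_on_past': False}

default_args = with_marquez_args(
    shared_args,
    location='github://data-dags/dag_location/',
    input_urns=["s3://some_data", "s3://more_data"],
    output_urns=["s3://output_data"],
)
```

The resulting default_args dict can then be passed to DAG(...) as in the example above. Copying the shared dict keeps one DAG's Marquez metadata from leaking into another's.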

Contributing

See CONTRIBUTING.md for more details about how to contribute.
