API Reference

Operators

Operators allow for generation of certain types of tasks that become nodes in the DAG when instantiated. All operators derive from BaseOperator and inherit many attributes and methods that way. Refer to the BaseOperator documentation for more details.

There are 3 main types of operators:

  • Operators that perform an action, or tell another system to perform an action
  • Transfer operators that move data from one system to another
  • Sensors, a type of operator that keeps running until a certain criterion is met. Examples include a specific file landing in HDFS or S3, a partition appearing in Hive, or a specific time of the day. Sensors are derived from BaseSensorOperator and run a poke method at a specified poke_interval until it returns True.
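
The sensor behaviour described above can be pictured as a simple polling loop. The following is a minimal sketch in plain Python, not Airflow's actual implementation: run_sensor, its sleep and clock parameters, and the SensorTimeout exception are hypothetical names introduced for illustration only.

```python
import time


class SensorTimeout(Exception):
    """Raised when the criterion is not met in time (hypothetical name)."""


def run_sensor(poke, poke_interval=60, timeout=3600,
               sleep=time.sleep, clock=time.monotonic):
    """Call poke() every poke_interval seconds until it returns True.

    Simplified model of the behaviour described above; this is not
    the code of BaseSensorOperator.
    """
    started = clock()
    while not poke():
        if clock() - started > timeout:
            raise SensorTimeout("criterion not met within %s seconds" % timeout)
        sleep(poke_interval)
    return True


# Usage: a fake criterion that is met on the third check.
attempts = []


def file_has_landed():
    attempts.append(1)
    return len(attempts) >= 3


result = run_sensor(file_has_landed, poke_interval=0, timeout=5)
```

The sleep and clock arguments exist only so the loop can be exercised quickly in a test; a real sensor is configured through poke_interval and timeout.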

BaseOperator

All operators are derived from BaseOperator and acquire much functionality through inheritance. Since this is the core of the engine, it's worth taking the time to understand the parameters of BaseOperator and the primitive features they make available to your DAGs.

airflow.models.BaseOperator

BaseSensorOperator

All sensors are derived from BaseSensorOperator. In addition to the BaseOperator attributes, they inherit the timeout and poke_interval attributes.

airflow.sensors.base_sensor_operator.BaseSensorOperator

Core Operators

Operators

airflow.operators.bash_operator.BashOperator

airflow.operators.python_operator.BranchPythonOperator

airflow.operators.check_operator.CheckOperator

airflow.operators.docker_operator.DockerOperator

airflow.operators.dummy_operator.DummyOperator

airflow.operators.druid_check_operator.DruidCheckOperator

airflow.operators.email_operator.EmailOperator

airflow.operators.generic_transfer.GenericTransfer

airflow.operators.hive_to_druid.HiveToDruidTransfer

airflow.operators.hive_to_mysql.HiveToMySqlTransfer

airflow.operators.hive_to_samba_operator.Hive2SambaOperator

airflow.operators.hive_operator.HiveOperator

airflow.operators.hive_stats_operator.HiveStatsCollectionOperator

airflow.operators.check_operator.IntervalCheckOperator

airflow.operators.jdbc_operator.JdbcOperator

airflow.operators.latest_only_operator.LatestOnlyOperator

airflow.operators.mssql_operator.MsSqlOperator

airflow.operators.mssql_to_hive.MsSqlToHiveTransfer

airflow.operators.mysql_operator.MySqlOperator

airflow.operators.mysql_to_hive.MySqlToHiveTransfer

airflow.operators.oracle_operator.OracleOperator

airflow.operators.pig_operator.PigOperator

airflow.operators.postgres_operator.PostgresOperator

airflow.operators.presto_check_operator.PrestoCheckOperator

airflow.operators.presto_check_operator.PrestoIntervalCheckOperator

airflow.operators.presto_to_mysql.PrestoToMySqlTransfer

airflow.operators.presto_check_operator.PrestoValueCheckOperator

airflow.operators.python_operator.PythonOperator

airflow.operators.python_operator.PythonVirtualenvOperator

airflow.operators.s3_file_transform_operator.S3FileTransformOperator

airflow.operators.s3_to_hive_operator.S3ToHiveTransfer

airflow.operators.s3_to_redshift_operator.S3ToRedshiftTransfer

airflow.operators.python_operator.ShortCircuitOperator

airflow.operators.http_operator.SimpleHttpOperator

airflow.operators.slack_operator.SlackAPIOperator

airflow.operators.slack_operator.SlackAPIPostOperator

airflow.operators.sqlite_operator.SqliteOperator

airflow.operators.subdag_operator.SubDagOperator

airflow.operators.dagrun_operator.TriggerDagRunOperator

airflow.operators.check_operator.ValueCheckOperator

airflow.operators.redshift_to_s3_operator.RedshiftToS3Transfer

Sensors

airflow.sensors.external_task_sensor.ExternalTaskSensor

airflow.sensors.hdfs_sensor.HdfsSensor

airflow.sensors.hive_partition_sensor.HivePartitionSensor

airflow.sensors.http_sensor.HttpSensor

airflow.sensors.metastore_partition_sensor.MetastorePartitionSensor

airflow.sensors.named_hive_partition_sensor.NamedHivePartitionSensor

airflow.sensors.s3_key_sensor.S3KeySensor

airflow.sensors.s3_prefix_sensor.S3PrefixSensor

airflow.sensors.sql_sensor.SqlSensor

airflow.sensors.time_sensor.TimeSensor

airflow.sensors.time_delta_sensor.TimeDeltaSensor

airflow.sensors.web_hdfs_sensor.WebHdfsSensor

Community-contributed Operators

Operators

airflow.contrib.operators.awsbatch_operator.AWSBatchOperator

airflow.contrib.operators.bigquery_check_operator.BigQueryCheckOperator

airflow.contrib.operators.bigquery_check_operator.BigQueryValueCheckOperator

airflow.contrib.operators.bigquery_check_operator.BigQueryIntervalCheckOperator

airflow.contrib.operators.bigquery_get_data.BigQueryGetDataOperator

airflow.contrib.operators.bigquery_operator.BigQueryCreateEmptyTableOperator

airflow.contrib.operators.bigquery_operator.BigQueryCreateExternalTableOperator

airflow.contrib.operators.bigquery_operator.BigQueryOperator

airflow.contrib.operators.bigquery_table_delete_operator.BigQueryTableDeleteOperator

airflow.contrib.operators.bigquery_to_bigquery.BigQueryToBigQueryOperator

airflow.contrib.operators.bigquery_to_gcs.BigQueryToCloudStorageOperator

airflow.contrib.operators.cassandra_to_gcs.CassandraToGoogleCloudStorageOperator

airflow.contrib.operators.databricks_operator.DatabricksSubmitRunOperator

airflow.contrib.operators.dataflow_operator.DataFlowJavaOperator

airflow.contrib.operators.dataflow_operator.DataflowTemplateOperator

airflow.contrib.operators.dataflow_operator.DataFlowPythonOperator

airflow.contrib.operators.dataproc_operator.DataprocClusterCreateOperator

airflow.contrib.operators.dataproc_operator.DataprocClusterScaleOperator

airflow.contrib.operators.dataproc_operator.DataprocClusterDeleteOperator

airflow.contrib.operators.dataproc_operator.DataProcPigOperator

airflow.contrib.operators.dataproc_operator.DataProcHiveOperator

airflow.contrib.operators.dataproc_operator.DataProcSparkSqlOperator

airflow.contrib.operators.dataproc_operator.DataProcSparkOperator

airflow.contrib.operators.dataproc_operator.DataProcHadoopOperator

airflow.contrib.operators.dataproc_operator.DataProcPySparkOperator

airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateBaseOperator

airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateInstantiateOperator

airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateInstantiateInlineOperator

airflow.contrib.operators.datastore_export_operator.DatastoreExportOperator

airflow.contrib.operators.datastore_import_operator.DatastoreImportOperator

airflow.contrib.operators.discord_webhook_operator.DiscordWebhookOperator

airflow.contrib.operators.druid_operator.DruidOperator

airflow.contrib.operators.ecs_operator.ECSOperator

airflow.contrib.operators.emr_add_steps_operator.EmrAddStepsOperator

airflow.contrib.operators.emr_create_job_flow_operator.EmrCreateJobFlowOperator

airflow.contrib.operators.emr_terminate_job_flow_operator.EmrTerminateJobFlowOperator

airflow.contrib.operators.file_to_gcs.FileToGoogleCloudStorageOperator

airflow.contrib.operators.file_to_wasb.FileToWasbOperator

airflow.contrib.operators.gcp_container_operator.GKEClusterCreateOperator

airflow.contrib.operators.gcp_container_operator.GKEClusterDeleteOperator

airflow.contrib.operators.gcs_download_operator.GoogleCloudStorageDownloadOperator

airflow.contrib.operators.gcs_list_operator.GoogleCloudStorageListOperator

airflow.contrib.operators.gcs_operator.GoogleCloudStorageCreateBucketOperator

airflow.contrib.operators.gcs_to_bq.GoogleCloudStorageToBigQueryOperator

airflow.contrib.operators.gcs_to_gcs.GoogleCloudStorageToGoogleCloudStorageOperator

airflow.contrib.operators.gcs_to_s3.GoogleCloudStorageToS3Operator

airflow.contrib.operators.hipchat_operator.HipChatAPIOperator

airflow.contrib.operators.hipchat_operator.HipChatAPISendRoomNotificationOperator

airflow.contrib.operators.hive_to_dynamodb.HiveToDynamoDBTransferOperator

airflow.contrib.operators.jenkins_job_trigger_operator.JenkinsJobTriggerOperator

airflow.contrib.operators.jira_operator.JiraOperator

airflow.contrib.operators.kubernetes_pod_operator.KubernetesPodOperator

airflow.contrib.operators.mlengine_operator.MLEngineBatchPredictionOperator

airflow.contrib.operators.mlengine_operator.MLEngineModelOperator

airflow.contrib.operators.mlengine_operator.MLEngineVersionOperator

airflow.contrib.operators.mlengine_operator.MLEngineTrainingOperator

airflow.contrib.operators.mongo_to_s3.MongoToS3Operator

airflow.contrib.operators.mysql_to_gcs.MySqlToGoogleCloudStorageOperator

airflow.contrib.operators.postgres_to_gcs_operator.PostgresToGoogleCloudStorageOperator

airflow.contrib.operators.pubsub_operator.PubSubTopicCreateOperator

airflow.contrib.operators.pubsub_operator.PubSubTopicDeleteOperator

airflow.contrib.operators.pubsub_operator.PubSubSubscriptionCreateOperator

airflow.contrib.operators.pubsub_operator.PubSubSubscriptionDeleteOperator

airflow.contrib.operators.pubsub_operator.PubSubPublishOperator

airflow.contrib.operators.qubole_check_operator.QuboleCheckOperator

airflow.contrib.operators.qubole_check_operator.QuboleValueCheckOperator

airflow.contrib.operators.qubole_operator.QuboleOperator

airflow.contrib.operators.s3_list_operator.S3ListOperator

airflow.contrib.operators.s3_to_gcs_operator.S3ToGoogleCloudStorageOperator

airflow.contrib.operators.segment_track_event_operator.SegmentTrackEventOperator

airflow.contrib.operators.sftp_operator.SFTPOperator

airflow.contrib.operators.slack_webhook_operator.SlackWebhookOperator

airflow.contrib.operators.snowflake_operator.SnowflakeOperator

airflow.contrib.operators.spark_jdbc_operator.SparkJDBCOperator

airflow.contrib.operators.spark_sql_operator.SparkSqlOperator

airflow.contrib.operators.spark_submit_operator.SparkSubmitOperator

airflow.contrib.operators.sqoop_operator.SqoopOperator

airflow.contrib.operators.ssh_operator.SSHOperator

airflow.contrib.operators.vertica_operator.VerticaOperator

airflow.contrib.operators.vertica_to_hive.VerticaToHiveTransfer

airflow.contrib.operators.winrm_operator.WinRMOperator

Sensors

airflow.contrib.sensors.aws_redshift_cluster_sensor.AwsRedshiftClusterSensor

airflow.contrib.sensors.bash_sensor.BashSensor

airflow.contrib.sensors.bigquery_sensor.BigQueryTableSensor

airflow.contrib.sensors.cassandra_sensor.CassandraRecordSensor

airflow.contrib.sensors.datadog_sensor.DatadogSensor

airflow.contrib.sensors.emr_base_sensor.EmrBaseSensor

airflow.contrib.sensors.emr_job_flow_sensor.EmrJobFlowSensor

airflow.contrib.sensors.emr_step_sensor.EmrStepSensor

airflow.contrib.sensors.file_sensor.FileSensor

airflow.contrib.sensors.ftp_sensor.FTPSensor

airflow.contrib.sensors.ftp_sensor.FTPSSensor

airflow.contrib.sensors.gcs_sensor.GoogleCloudStorageObjectSensor

airflow.contrib.sensors.gcs_sensor.GoogleCloudStorageObjectUpdatedSensor

airflow.contrib.sensors.gcs_sensor.GoogleCloudStoragePrefixSensor

airflow.contrib.sensors.hdfs_sensor.HdfsSensorFolder

airflow.contrib.sensors.hdfs_sensor.HdfsSensorRegex

airflow.contrib.sensors.jira_sensor.JiraSensor

airflow.contrib.sensors.pubsub_sensor.PubSubPullSensor

airflow.contrib.sensors.qubole_sensor.QuboleSensor

airflow.contrib.sensors.redis_key_sensor.RedisKeySensor

airflow.contrib.sensors.sftp_sensor.SFTPSensor

airflow.contrib.sensors.wasb_sensor.WasbBlobSensor

Macros

Here's a list of variables and macros that can be used in templates.

Default Variables

The Airflow engine passes a few variables by default that are accessible in all templates:

Variable Description
{{ ds }} the execution date as YYYY-MM-DD
{{ ds_nodash }} the execution date as YYYYMMDD
{{ prev_ds }} the previous execution date as YYYY-MM-DD. If {{ ds }} is 2016-01-08 and schedule_interval is @weekly, {{ prev_ds }} will be 2016-01-01.
{{ next_ds }} the next execution date as YYYY-MM-DD. If {{ ds }} is 2016-01-01 and schedule_interval is @weekly, {{ next_ds }} will be 2016-01-08.
{{ yesterday_ds }} yesterday's date as YYYY-MM-DD
{{ yesterday_ds_nodash }} yesterday's date as YYYYMMDD
{{ tomorrow_ds }} tomorrow's date as YYYY-MM-DD
{{ tomorrow_ds_nodash }} tomorrow's date as YYYYMMDD
{{ ts }} same as execution_date.isoformat()
{{ ts_nodash }} same as ts without - and :
{{ execution_date }} the execution_date (datetime.datetime)
{{ prev_execution_date }} the previous execution date, if available (datetime.datetime)
{{ next_execution_date }} the next execution date (datetime.datetime)
{{ dag }} the DAG object
{{ task }} the Task object
{{ macros }} a reference to the macros package, described below
{{ task_instance }} the task_instance object
{{ end_date }} same as {{ ds }}
{{ latest_date }} same as {{ ds }}
{{ ti }} same as {{ task_instance }}
{{ params }} a reference to the user-defined params dictionary, which can be overridden by the dictionary passed through trigger_dag -c if you enabled dag_run_conf_overrides_params in airflow.cfg
{{ var.value.my_var }} globally defined variables represented as a dictionary
{{ var.json.my_var.path }} globally defined variables represented as a dictionary with deserialized JSON objects; append the path to the key within the JSON object
{{ task_instance_key_str }} a unique, human-readable key to the task instance, formatted as {dag_id}_{task_id}_{ds}
{{ conf }} the full configuration object located at airflow.configuration.conf, which represents the content of your airflow.cfg
{{ run_id }} the run_id of the current DAG run
{{ dag_run }} a reference to the DagRun object
{{ test_mode }} whether the task instance was called using the CLI's test subcommand
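
To make the date-related variables in the table concrete, here is a rough sketch that derives the same strings with the standard library. It is not how Airflow builds its template context; the execution date is an assumed value, and the 7-day step for @weekly is an assumption about how consecutive weekly runs are spaced.

```python
from datetime import datetime, timedelta

# Assumed execution date, chosen for illustration only.
execution_date = datetime(2016, 1, 8)

ds = execution_date.strftime("%Y-%m-%d")
ds_nodash = ds.replace("-", "")
yesterday_ds = (execution_date - timedelta(days=1)).strftime("%Y-%m-%d")
tomorrow_ds = (execution_date + timedelta(days=1)).strftime("%Y-%m-%d")
ts = execution_date.isoformat()
ts_nodash = ts.replace("-", "").replace(":", "")

# Assumption: with an @weekly schedule, consecutive runs are 7 days apart.
weekly = timedelta(weeks=1)
prev_ds = (execution_date - weekly).strftime("%Y-%m-%d")
next_ds = (execution_date + weekly).strftime("%Y-%m-%d")

print(ds, ds_nodash, yesterday_ds, tomorrow_ds)
print(ts, ts_nodash, prev_ds, next_ds)
```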

Note that you can access the object's attributes and methods with simple dot notation. Here are some examples of what is possible: {{ task.owner }}, {{ task.task_id }}, {{ ti.hostname }}, ... Refer to the models documentation for more information on the objects' attributes and methods.

The var template variable allows you to access variables defined in Airflow's UI. You can access them as either plain-text or JSON. If you use JSON, you are also able to walk nested structures, such as dictionaries like: {{ var.json.my_dict_var.key1 }}
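
The nested access described above amounts to deserializing the stored JSON text and following a dotted path of keys. A minimal sketch of that idea in plain Python follows; walk_json and the stored value are hypothetical and only illustrate the semantics, not Airflow's implementation.

```python
import json


def walk_json(raw_value, path):
    """Deserialize raw_value and follow a dotted path of dictionary keys."""
    node = json.loads(raw_value)
    for key in path.split("."):
        node = node[key]
    return node


# Hypothetical stored value for a variable named my_dict_var.
my_dict_var = '{"key1": {"nested": "value"}, "key2": 42}'

# Comparable to {{ var.json.my_dict_var.key1 }} in a template.
key1 = walk_json(my_dict_var, "key1")

# Comparable to {{ var.json.my_dict_var.key1.nested }} in a template.
nested = walk_json(my_dict_var, "key1.nested")
```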

Macros

Macros are a way to expose objects to your templates; they live under the macros namespace.

A few commonly used libraries and methods are made available.

Variable Description
macros.datetime The standard lib's datetime.datetime
macros.timedelta The standard lib's datetime.timedelta
macros.dateutil A reference to the dateutil package
macros.time The standard lib's time
macros.uuid The standard lib's uuid
macros.random The standard lib's random
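
As an illustration of what these entries give a template author, the sketch below builds a stand-in for the macros namespace from the standard-library entries in the table and evaluates the Python equivalent of two template expressions. The stand-in object and the execution date are assumptions made for this example; dateutil is omitted because it is a third-party package.

```python
import datetime
import random
import time
import types
import uuid

# Stand-in for the macros namespace (illustrative, not Airflow's object).
macros = types.SimpleNamespace(
    datetime=datetime.datetime,
    timedelta=datetime.timedelta,
    time=time,
    uuid=uuid,
    random=random,
)

# Assumed execution date, chosen for illustration only.
execution_date = macros.datetime(2016, 1, 8)

# Python equivalent of a template expression such as
# {{ (execution_date - macros.timedelta(days=5)).strftime("%Y-%m-%d") }}
five_days_earlier = (execution_date - macros.timedelta(days=5)).strftime("%Y-%m-%d")

# Python equivalent of {{ macros.uuid.uuid4() }}
run_token = str(macros.uuid.uuid4())
```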

Some Airflow-specific macros are also defined:

airflow.macros

airflow.macros.hive.closest_ds_partition

airflow.macros.hive.max_partition
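
Among the helpers in airflow.macros are ds_add and ds_format, which shift and reformat date strings. The sketch below re-creates their general behaviour with the standard library so it can be run without Airflow; it is an approximation for illustration, and the module documentation remains the authority on exact signatures and edge cases.

```python
from datetime import datetime, timedelta


def ds_add(ds, days):
    """Add (or subtract, if negative) days to a YYYY-MM-DD string."""
    parsed = datetime.strptime(ds, "%Y-%m-%d")
    return (parsed + timedelta(days=days)).strftime("%Y-%m-%d")


def ds_format(ds, input_format, output_format):
    """Parse ds with input_format and render it with output_format."""
    return datetime.strptime(ds, input_format).strftime(output_format)


shifted_forward = ds_add("2015-01-01", 5)
shifted_back = ds_add("2015-01-06", -5)
reformatted = ds_format("2015-01-01", "%Y-%m-%d", "%m-%d-%y")
```

In a template these would be called through the namespace, for example {{ macros.ds_add(ds, 7) }}.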

Models

Models are built on top of the SQLAlchemy ORM Base class, and instances are persisted in the database.

airflow.models

Hooks

Hooks are interfaces to external platforms and databases, implementing a common interface when possible and acting as building blocks for operators.
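
To illustrate the idea of a common interface that operators can build on, here is a small hook-like wrapper around sqlite3. MiniSqliteHook and count_rows are hypothetical names, not Airflow classes; the method names loosely mirror those of DbApiHook, but this sketch makes no claim about how the real hooks are implemented.

```python
import os
import sqlite3
import tempfile


class MiniSqliteHook:
    """Hypothetical hook-like wrapper around sqlite3 (not an Airflow class)."""

    def __init__(self, database_path):
        self.database_path = database_path

    def get_conn(self):
        """Return a new connection to the external system."""
        return sqlite3.connect(self.database_path)

    def run(self, sql, parameters=()):
        """Execute a statement and commit it."""
        conn = self.get_conn()
        try:
            conn.execute(sql, parameters)
            conn.commit()
        finally:
            conn.close()

    def get_records(self, sql, parameters=()):
        """Execute a query and return all rows."""
        conn = self.get_conn()
        try:
            return conn.execute(sql, parameters).fetchall()
        finally:
            conn.close()


def count_rows(hook, table):
    """Operator-like building block that relies only on the hook interface."""
    # The table name is trusted here; identifiers cannot be bound as parameters.
    return hook.get_records("SELECT COUNT(*) FROM %s" % table)[0][0]


# Usage with a throwaway database file.
db_path = os.path.join(tempfile.mkdtemp(), "example.db")
hook = MiniSqliteHook(db_path)
hook.run("CREATE TABLE events (name TEXT)")
hook.run("INSERT INTO events VALUES (?)", ("landed",))
row_count = count_rows(hook, "events")
```

Because count_rows depends only on get_records, any hook exposing the same method could be swapped in without changing it.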

airflow.hooks.dbapi_hook.DbApiHook

airflow.hooks.docker_hook.DockerHook

airflow.hooks.hive_hooks

airflow.hooks.http_hook.HttpHook

airflow.hooks.druid_hook.DruidDbApiHook

airflow.hooks.druid_hook.DruidHook

airflow.hooks.hdfs_hook.HDFSHook

airflow.hooks.jdbc_hook.JdbcHook

airflow.hooks.mssql_hook.MsSqlHook

airflow.hooks.mysql_hook.MySqlHook

airflow.hooks.oracle_hook.OracleHook

airflow.hooks.pig_hook.PigCliHook

airflow.hooks.postgres_hook.PostgresHook

airflow.hooks.presto_hook.PrestoHook

airflow.hooks.S3_hook.S3Hook

airflow.hooks.samba_hook.SambaHook

airflow.hooks.slack_hook.SlackHook

airflow.hooks.sqlite_hook.SqliteHook

airflow.hooks.webhdfs_hook.WebHDFSHook

airflow.hooks.zendesk_hook.ZendeskHook

Community-contributed hooks

airflow.contrib.hooks.aws_dynamodb_hook.AwsDynamoDBHook

airflow.contrib.hooks.aws_hook.AwsHook

airflow.contrib.hooks.aws_lambda_hook.AwsLambdaHook

airflow.contrib.hooks.azure_data_lake_hook.AzureDataLakeHook

airflow.contrib.hooks.azure_fileshare_hook.AzureFileShareHook

airflow.contrib.hooks.bigquery_hook.BigQueryHook

airflow.contrib.hooks.cassandra_hook.CassandraHook

airflow.contrib.hooks.cloudant_hook.CloudantHook

airflow.contrib.hooks.databricks_hook.DatabricksHook

airflow.contrib.hooks.datadog_hook.DatadogHook

airflow.contrib.hooks.datastore_hook.DatastoreHook

airflow.contrib.hooks.discord_webhook_hook.DiscordWebhookHook

airflow.contrib.hooks.emr_hook.EmrHook

airflow.contrib.hooks.fs_hook.FSHook

airflow.contrib.hooks.ftp_hook.FTPHook

airflow.contrib.hooks.ftp_hook.FTPSHook

airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook

airflow.contrib.hooks.gcp_container_hook.GKEClusterHook

airflow.contrib.hooks.gcp_dataflow_hook.DataFlowHook

airflow.contrib.hooks.gcp_dataproc_hook.DataProcHook

airflow.contrib.hooks.gcp_mlengine_hook.MLEngineHook

airflow.contrib.hooks.gcp_pubsub_hook.PubSubHook

airflow.contrib.hooks.gcs_hook.GoogleCloudStorageHook

airflow.contrib.hooks.jenkins_hook.JenkinsHook

airflow.contrib.hooks.jira_hook.JiraHook

airflow.contrib.hooks.mongo_hook.MongoHook

airflow.contrib.hooks.pinot_hook.PinotDbApiHook

airflow.contrib.hooks.qubole_hook.QuboleHook

airflow.contrib.hooks.redis_hook.RedisHook

airflow.contrib.hooks.redshift_hook.RedshiftHook

airflow.contrib.hooks.salesforce_hook.SalesforceHook

airflow.contrib.hooks.segment_hook.SegmentHook

airflow.contrib.hooks.sftp_hook.SFTPHook

airflow.contrib.hooks.slack_webhook_hook.SlackWebhookHook

airflow.contrib.hooks.snowflake_hook.SnowflakeHook

airflow.contrib.hooks.spark_jdbc_hook.SparkJDBCHook

airflow.contrib.hooks.spark_sql_hook.SparkSqlHook

airflow.contrib.hooks.spark_submit_hook.SparkSubmitHook

airflow.contrib.hooks.sqoop_hook.SqoopHook

airflow.contrib.hooks.ssh_hook.SSHHook

airflow.contrib.hooks.vertica_hook.VerticaHook

airflow.contrib.hooks.wasb_hook.WasbHook

airflow.contrib.hooks.winrm_hook.WinRMHook

Executors

Executors are the mechanism by which task instances get run.

airflow.executors.local_executor.LocalExecutor

airflow.executors.celery_executor.CeleryExecutor

airflow.executors.sequential_executor.SequentialExecutor

Community-contributed executors

airflow.contrib.executors.mesos_executor.MesosExecutor