Operators allow for generation of certain types of tasks that become nodes in the DAG when instantiated. All operators derive from BaseOperator
and inherit many attributes and methods that way. Refer to the BaseOperator documentation for more details.
There are 3 main types of operators:
- Operators that performs an action, or tell another system to perform an action
- Transfer operators move data from one system to another
- Sensors are a certain type of operator that will keep running until a certain criterion is met. Examples include a specific file landing in HDFS or S3, a partition appearing in Hive, or a specific time of the day. Sensors are derived from
BaseSensorOperator
and run a poke method at a specifiedpoke_interval
until it returnsTrue
.
All operators are derived from BaseOperator
and acquire much functionality through inheritance. Since this is the core of the engine, it's worth taking the time to understand the parameters of BaseOperator
to understand the primitive features that can be leveraged in your DAGs.
airflow.models.BaseOperator
All sensors are derived from BaseSensorOperator
. All sensors inherit the timeout
and poke_interval
on top of the BaseOperator
attributes.
airflow.sensors.base_sensor_operator.BaseSensorOperator
airflow.operators.bash_operator.BashOperator
airflow.operators.python_operator.BranchPythonOperator
airflow.operators.check_operator.CheckOperator
airflow.operators.docker_operator.DockerOperator
airflow.operators.dummy_operator.DummyOperator
airflow.operators.druid_check_operator.DruidCheckOperator
airflow.operators.email_operator.EmailOperator
airflow.operators.generic_transfer.GenericTransfer
airflow.operators.hive_to_druid.HiveToDruidTransfer
airflow.operators.hive_to_mysql.HiveToMySqlTransfer
airflow.operators.hive_to_samba_operator.Hive2SambaOperator
airflow.operators.hive_operator.HiveOperator
airflow.operators.hive_stats_operator.HiveStatsCollectionOperator
airflow.operators.check_operator.IntervalCheckOperator
airflow.operators.jdbc_operator.JdbcOperator
airflow.operators.latest_only_operator.LatestOnlyOperator
airflow.operators.mssql_operator.MsSqlOperator
airflow.operators.mssql_to_hive.MsSqlToHiveTransfer
airflow.operators.mysql_operator.MySqlOperator
airflow.operators.mysql_to_hive.MySqlToHiveTransfer
airflow.operators.oracle_operator.OracleOperator
airflow.operators.pig_operator.PigOperator
airflow.operators.postgres_operator.PostgresOperator
airflow.operators.presto_check_operator.PrestoCheckOperator
airflow.operators.presto_check_operator.PrestoIntervalCheckOperator
airflow.operators.presto_to_mysql.PrestoToMySqlTransfer
airflow.operators.presto_check_operator.PrestoValueCheckOperator
airflow.operators.python_operator.PythonOperator
airflow.operators.python_operator.PythonVirtualenvOperator
airflow.operators.s3_file_transform_operator.S3FileTransformOperator
airflow.operators.s3_to_hive_operator.S3ToHiveTransfer
airflow.operators.s3_to_redshift_operator.S3ToRedshiftTransfer
airflow.operators.python_operator.ShortCircuitOperator
airflow.operators.http_operator.SimpleHttpOperator
airflow.operators.slack_operator.SlackAPIOperator
airflow.operators.slack_operator.SlackAPIPostOperator
airflow.operators.sqlite_operator.SqliteOperator
airflow.operators.subdag_operator.SubDagOperator
airflow.operators.dagrun_operator.TriggerDagRunOperator
airflow.operators.check_operator.ValueCheckOperator
airflow.operators.redshift_to_s3_operator.RedshiftToS3Transfer
airflow.sensors.external_task_sensor.ExternalTaskSensor
airflow.sensors.hdfs_sensor.HdfsSensor
airflow.sensors.hive_partition_sensor.HivePartitionSensor
airflow.sensors.http_sensor.HttpSensor
airflow.sensors.metastore_partition_sensor.MetastorePartitionSensor
airflow.sensors.named_hive_partition_sensor.NamedHivePartitionSensor
airflow.sensors.s3_key_sensor.S3KeySensor
airflow.sensors.s3_prefix_sensor.S3PrefixSensor
airflow.sensors.sql_sensor.SqlSensor
airflow.sensors.time_sensor.TimeSensor
airflow.sensors.time_delta_sensor.TimeDeltaSensor
airflow.sensors.web_hdfs_sensor.WebHdfsSensor
airflow.contrib.operators.awsbatch_operator.AWSBatchOperator
airflow.contrib.operators.bigquery_check_operator.BigQueryCheckOperator
airflow.contrib.operators.bigquery_check_operator.BigQueryValueCheckOperator
airflow.contrib.operators.bigquery_check_operator.BigQueryIntervalCheckOperator
airflow.contrib.operators.bigquery_get_data.BigQueryGetDataOperator
airflow.contrib.operators.bigquery_operator.BigQueryCreateEmptyTableOperator
airflow.contrib.operators.bigquery_operator.BigQueryCreateExternalTableOperator
airflow.contrib.operators.bigquery_operator.BigQueryOperator
airflow.contrib.operators.bigquery_table_delete_operator.BigQueryTableDeleteOperator
airflow.contrib.operators.bigquery_to_bigquery.BigQueryToBigQueryOperator
airflow.contrib.operators.bigquery_to_gcs.BigQueryToCloudStorageOperator
airflow.contrib.operators.cassandra_to_gcs.CassandraToGoogleCloudStorageOperator
airflow.contrib.operators.databricks_operator.DatabricksSubmitRunOperator
airflow.contrib.operators.dataflow_operator.DataFlowJavaOperator
airflow.contrib.operators.dataflow_operator.DataflowTemplateOperator
airflow.contrib.operators.dataflow_operator.DataFlowPythonOperator
airflow.contrib.operators.dataproc_operator.DataprocClusterCreateOperator
airflow.contrib.operators.dataproc_operator.DataprocClusterScaleOperator
airflow.contrib.operators.dataproc_operator.DataprocClusterDeleteOperator
airflow.contrib.operators.dataproc_operator.DataProcPigOperator
airflow.contrib.operators.dataproc_operator.DataProcHiveOperator
airflow.contrib.operators.dataproc_operator.DataProcSparkSqlOperator
airflow.contrib.operators.dataproc_operator.DataProcSparkOperator
airflow.contrib.operators.dataproc_operator.DataProcHadoopOperator
airflow.contrib.operators.dataproc_operator.DataProcPySparkOperator
airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateBaseOperator
airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateInstantiateOperator
airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateInstantiateInlineOperator
airflow.contrib.operators.datastore_export_operator.DatastoreExportOperator
airflow.contrib.operators.datastore_import_operator.DatastoreImportOperator
airflow.contrib.operators.discord_webhook_operator.DiscordWebhookOperator
airflow.contrib.operators.druid_operator.DruidOperator
airflow.contrib.operators.ecs_operator.ECSOperator
airflow.contrib.operators.emr_add_steps_operator.EmrAddStepsOperator
airflow.contrib.operators.emr_create_job_flow_operator.EmrCreateJobFlowOperator
airflow.contrib.operators.emr_terminate_job_flow_operator.EmrTerminateJobFlowOperator
airflow.contrib.operators.file_to_gcs.FileToGoogleCloudStorageOperator
airflow.contrib.operators.file_to_wasb.FileToWasbOperator
airflow.contrib.operators.gcp_container_operator.GKEClusterCreateOperator
airflow.contrib.operators.gcp_container_operator.GKEClusterDeleteOperator
airflow.contrib.operators.gcs_download_operator.GoogleCloudStorageDownloadOperator
airflow.contrib.operators.gcs_list_operator.GoogleCloudStorageListOperator
airflow.contrib.operators.gcs_operator.GoogleCloudStorageCreateBucketOperator
airflow.contrib.operators.gcs_to_bq.GoogleCloudStorageToBigQueryOperator
airflow.contrib.operators.gcs_to_gcs.GoogleCloudStorageToGoogleCloudStorageOperator
airflow.contrib.operators.gcs_to_s3.GoogleCloudStorageToS3Operator
airflow.contrib.operators.hipchat_operator.HipChatAPIOperator
airflow.contrib.operators.hipchat_operator.HipChatAPISendRoomNotificationOperator
airflow.contrib.operators.hive_to_dynamodb.HiveToDynamoDBTransferOperator
airflow.contrib.operators.jenkins_job_trigger_operator.JenkinsJobTriggerOperator
airflow.contrib.operators.jira_operator.JiraOperator
airflow.contrib.operators.kubernetes_pod_operator.KubernetesPodOperator
airflow.contrib.operators.mlengine_operator.MLEngineBatchPredictionOperator
airflow.contrib.operators.mlengine_operator.MLEngineModelOperator
airflow.contrib.operators.mlengine_operator.MLEngineVersionOperator
airflow.contrib.operators.mlengine_operator.MLEngineTrainingOperator
airflow.contrib.operators.mongo_to_s3.MongoToS3Operator
airflow.contrib.operators.mysql_to_gcs.MySqlToGoogleCloudStorageOperator
airflow.contrib.operators.postgres_to_gcs_operator.PostgresToGoogleCloudStorageOperator
airflow.contrib.operators.pubsub_operator.PubSubTopicCreateOperator
airflow.contrib.operators.pubsub_operator.PubSubTopicDeleteOperator
airflow.contrib.operators.pubsub_operator.PubSubSubscriptionCreateOperator
airflow.contrib.operators.pubsub_operator.PubSubSubscriptionDeleteOperator
airflow.contrib.operators.pubsub_operator.PubSubPublishOperator
airflow.contrib.operators.qubole_check_operator.QuboleCheckOperator
airflow.contrib.operators.qubole_check_operator.QuboleValueCheckOperator
airflow.contrib.operators.qubole_operator.QuboleOperator
airflow.contrib.operators.s3_list_operator.S3ListOperator
airflow.contrib.operators.s3_to_gcs_operator.S3ToGoogleCloudStorageOperator
airflow.contrib.operators.segment_track_event_operator.SegmentTrackEventOperator
airflow.contrib.operators.sftp_operator.SFTPOperator
airflow.contrib.operators.slack_webhook_operator.SlackWebhookOperator
airflow.contrib.operators.snowflake_operator.SnowflakeOperator
airflow.contrib.operators.spark_jdbc_operator.SparkJDBCOperator
airflow.contrib.operators.spark_sql_operator.SparkSqlOperator
airflow.contrib.operators.spark_submit_operator.SparkSubmitOperator
airflow.contrib.operators.sqoop_operator.SqoopOperator
airflow.contrib.operators.ssh_operator.SSHOperator
airflow.contrib.operators.vertica_operator.VerticaOperator
airflow.contrib.operators.vertica_to_hive.VerticaToHiveTransfer
airflow.contrib.operators.winrm_operator.WinRMOperator
airflow.contrib.sensors.aws_redshift_cluster_sensor.AwsRedshiftClusterSensor
airflow.contrib.sensors.bash_sensor.BashSensor
airflow.contrib.sensors.bigquery_sensor.BigQueryTableSensor
airflow.contrib.sensors.cassandra_sensor.CassandraRecordSensor
airflow.contrib.sensors.datadog_sensor.DatadogSensor
airflow.contrib.sensors.emr_base_sensor.EmrBaseSensor
airflow.contrib.sensors.emr_job_flow_sensor.EmrJobFlowSensor
airflow.contrib.sensors.emr_step_sensor.EmrStepSensor
airflow.contrib.sensors.file_sensor.FileSensor
airflow.contrib.sensors.ftp_sensor.FTPSensor
airflow.contrib.sensors.ftp_sensor.FTPSSensor
airflow.contrib.sensors.gcs_sensor.GoogleCloudStorageObjectSensor
airflow.contrib.sensors.gcs_sensor.GoogleCloudStorageObjectUpdatedSensor
airflow.contrib.sensors.gcs_sensor.GoogleCloudStoragePrefixSensor
airflow.contrib.sensors.hdfs_sensor.HdfsSensorFolder
airflow.contrib.sensors.hdfs_sensor.HdfsSensorRegex
airflow.contrib.sensors.jira_sensor.JiraSensor
airflow.contrib.sensors.pubsub_sensor.PubSubPullSensor
airflow.contrib.sensors.qubole_sensor.QuboleSensor
airflow.contrib.sensors.redis_key_sensor.RedisKeySensor
airflow.contrib.sensors.sftp_sensor.SFTPSensor
airflow.contrib.sensors.wasb_sensor.WasbBlobSensor
Here's a list of variables and macros that can be used in templates
The Airflow engine passes a few variables by default that are accessible in all templates
Variable | Description |
---|---|
{{ ds }} |
the execution date as YYYY-MM-DD |
{{ ds_nodash }} |
the execution date as YYYYMMDD |
|
the previous execution date as |
|
the next execution date as |
{{ yesterday_ds }} |
yesterday's date as YYYY-MM-DD |
{{ yesterday_ds_nodash }} |
yesterday's date as YYYYMMDD |
{{ tomorrow_ds }} |
tomorrow's date as YYYY-MM-DD |
{{ tomorrow_ds_nodash }} |
tomorrow's date as YYYYMMDD |
{{ ts }} |
same as execution_date.isoformat() |
{{ ts_nodash }} |
same as ts without - and : |
{{ execution_date }} |
the execution_date, (datetime.datetime) |
{{ prev_execution_date }} |
the previous execution date (if available) (datetime.datetime) |
{{ next_execution_date }} |
the next execution date (datetime.datetime) |
{{ dag }} |
the DAG object |
{{ task }} |
the Task object |
{{ macros }} |
a reference to the macros package, described below |
{{ task_instance }} |
the task_instance object |
{{ end_date }} |
same as {{ ds }} |
{{ latest_date }} |
same as {{ ds }} |
{{ ti }} |
same as {{ task_instance }} |
|
a reference to the user-defined params dictionary which can be overridden by the dictionary passed through |
{{ var.value.my_var }} |
global defined variables represented as a dictionary |
|
global defined variables represented as a dictionary with deserialized JSON object, append the path to the key within the JSON object |
|
a unique, human-readable key to the task instance formatted |
|
the full configuration object located at |
{{ run_id }} |
the run_id of the current DAG run |
{{ dag_run }} |
a reference to the DagRun object |
|
whether the task instance was called using the CLI's test subcommand |
Note that you can access the object's attributes and methods with simple dot notation. Here are some examples of what is possible: {{ task.owner }}
, {{ task.task_id }}
, {{ ti.hostname }}
, ... Refer to the models documentation for more information on the objects' attributes and methods.
The var
template variable allows you to access variables defined in Airflow's UI. You can access them as either plain-text or JSON. If you use JSON, you are also able to walk nested structures, such as dictionaries like: {{ var.json.my_dict_var.key1 }}
Macros are a way to expose objects to your templates and live under the macros
namespace in your templates.
A few commonly used libraries and methods are made available.
Variable | Description |
---|---|
macros.datetime |
The standard lib's datetime.datetime |
macros.timedelta |
|
macros.dateutil |
A reference to the dateutil package |
macros.time |
The standard lib's time |
macros.uuid |
The standard lib's uuid |
macros.random |
The standard lib's random |
Some airflow specific macros are also defined:
airflow.macros
airflow.macros.hive.closest_ds_partition
airflow.macros.hive.max_partition
Models are built on top of the SQLAlchemy ORM Base class, and instances are persisted in the database.
airflow.models
Hooks are interfaces to external platforms and databases, implementing a common interface when possible and acting as building blocks for operators.
airflow.hooks.dbapi_hook.DbApiHook
airflow.hooks.docker_hook.DockerHook
airflow.hooks.hive_hooks
airflow.hooks.http_hook.HttpHook
airflow.hooks.druid_hook.DruidDbApiHook
airflow.hooks.druid_hook.DruidHook
airflow.hooks.hdfs_hook.HDFSHook
airflow.hooks.jdbc_hook.JdbcHook
airflow.hooks.mssql_hook.MsSqlHook
airflow.hooks.mysql_hook.MySqlHook
airflow.hooks.oracle_hook.OracleHook
airflow.hooks.pig_hook.PigCliHook
airflow.hooks.postgres_hook.PostgresHook
airflow.hooks.presto_hook.PrestoHook
airflow.hooks.S3_hook.S3Hook
airflow.hooks.samba_hook.SambaHook
airflow.hooks.slack_hook.SlackHook
airflow.hooks.sqlite_hook.SqliteHook
airflow.hooks.webhdfs_hook.WebHDFSHook
airflow.hooks.zendesk_hook.ZendeskHook
airflow.contrib.hooks.aws_dynamodb_hook.AwsDynamoDBHook
airflow.contrib.hooks.aws_hook.AwsHook
airflow.contrib.hooks.aws_lambda_hook.AwsLambdaHook
airflow.contrib.hooks.azure_data_lake_hook.AzureDataLakeHook
airflow.contrib.hooks.azure_fileshare_hook.AzureFileShareHook
airflow.contrib.hooks.bigquery_hook.BigQueryHook
airflow.contrib.hooks.cassandra_hook.CassandraHook
airflow.contrib.hooks.cloudant_hook.CloudantHook
airflow.contrib.hooks.databricks_hook.DatabricksHook
airflow.contrib.hooks.datadog_hook.DatadogHook
airflow.contrib.hooks.datastore_hook.DatastoreHook
airflow.contrib.hooks.discord_webhook_hook.DiscordWebhookHook
airflow.contrib.hooks.emr_hook.EmrHook
airflow.contrib.hooks.fs_hook.FSHook
airflow.contrib.hooks.ftp_hook.FTPHook
airflow.contrib.hooks.ftp_hook.FTPSHook
airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook
airflow.contrib.hooks.gcp_container_hook.GKEClusterHook
airflow.contrib.hooks.gcp_dataflow_hook.DataFlowHook
airflow.contrib.hooks.gcp_dataproc_hook.DataProcHook
airflow.contrib.hooks.gcp_mlengine_hook.MLEngineHook
airflow.contrib.hooks.gcp_pubsub_hook.PubSubHook
airflow.contrib.hooks.gcs_hook.GoogleCloudStorageHook
airflow.contrib.hooks.jenkins_hook.JenkinsHook
airflow.contrib.hooks.jira_hook.JiraHook
airflow.contrib.hooks.mongo_hook.MongoHook
airflow.contrib.hooks.pinot_hook.PinotDbApiHook
airflow.contrib.hooks.qubole_hook.QuboleHook
airflow.contrib.hooks.redis_hook.RedisHook
airflow.contrib.hooks.redshift_hook.RedshiftHook
airflow.contrib.hooks.salesforce_hook.SalesforceHook
airflow.contrib.hooks.segment_hook.SegmentHook
airflow.contrib.hooks.sftp_hook.SFTPHook
airflow.contrib.hooks.slack_webhook_hook.SlackWebhookHook
airflow.contrib.hooks.snowflake_hook.SnowflakeHook
airflow.contrib.hooks.spark_jdbc_hook.SparkJDBCHook
airflow.contrib.hooks.spark_sql_hook.SparkSqlHook
airflow.contrib.hooks.spark_submit_hook.SparkSubmitHook
airflow.contrib.hooks.sqoop_hook.SqoopHook
airflow.contrib.hooks.ssh_hook.SSHHook
airflow.contrib.hooks.vertica_hook.VerticaHook
airflow.contrib.hooks.wasb_hook.WasbHook
airflow.contrib.hooks.winrm_hook.WinRMHook
Executors are the mechanism by which task instances get run.
airflow.executors.local_executor.LocalExecutor
airflow.executors.celery_executor.CeleryExecutor
airflow.executors.sequential_executor.SequentialExecutor
airflow.contrib.executors.mesos_executor.MesosExecutor