Airflow has limited support for Microsoft Azure: interfaces exist only for Azure Blob Storage and Azure Data Lake. The hook, sensor, and operators for Blob Storage, as well as the Azure Data Lake hook, are in the contrib section.
Airflow can be configured to read and write task logs in Azure Blob Storage. See write-logs-azure.
All classes communicate via the Windows Azure Storage Blob protocol. Make sure that an Airflow connection of type wasb exists. Authorization can be done by supplying a login (=Storage account name) and password (=Storage account key), or a login and SAS token in the extra field (see connection wasb_default for an example).
The operators are defined in the following modules:
airflow.contrib.sensors.wasb_sensor
airflow.contrib.operators.wasb_delete_blob_operator
airflow.contrib.operators.file_to_wasb
They use airflow.contrib.hooks.wasb_hook.WasbHook
to communicate with Microsoft Azure.
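As a minimal sketch (the file path, container, and blob name below are placeholders, and the dag object is assumed to be defined elsewhere in the DAG file), uploading a local file to Blob Storage could look like this:

    from airflow.contrib.operators.file_to_wasb import FileToWasbOperator

    # Upload a local file to a blob container, using the wasb_default
    # connection described above. All names below are placeholders.
    upload_file = FileToWasbOperator(
        task_id="upload_file_to_wasb",
        file_path="/tmp/example.csv",
        container_name="example-container",
        blob_name="example.csv",
        wasb_conn_id="wasb_default",
        dag=dag,
    )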
Cloud variant of an SMB file share. Make sure that an Airflow connection of type wasb exists. Authorization can be done by supplying a login (=Storage account name) and password (=Storage account key), or a login and SAS token in the extra field (see connection wasb_default for an example).
It uses airflow.contrib.hooks.azure_fileshare_hook.AzureFileShareHook
to communicate with Microsoft Azure.
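For illustration, a hook-level sketch (the share, directory, and file names are placeholders) that uploads a local file to a file share might look like this:

    from airflow.contrib.hooks.azure_fileshare_hook import AzureFileShareHook

    # Uses the wasb_default connection described above; names are placeholders.
    hook = AzureFileShareHook(wasb_conn_id="wasb_default")
    hook.load_file(
        file_path="/tmp/report.csv",
        share_name="example-share",
        directory_name="reports",
        file_name="report.csv",
    )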
AzureCosmosDBHook communicates via the Azure Cosmos library. Make sure that an Airflow connection of type azure_cosmos exists. Authorization can be done by supplying a login (=Endpoint URI), password (=secret key), and extra fields database_name and collection_name to specify the default database and collection to use (see connection azure_cosmos_default for an example).
The operators are defined in the following modules:
airflow.contrib.operators.azure_cosmos_operator
airflow.contrib.sensors.azure_cosmos_sensor
They also use airflow.contrib.hooks.azure_cosmos_hook.AzureCosmosDBHook
to communicate with Microsoft Azure.
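A minimal sketch of inserting a document (the database, collection, and document contents are placeholders; the dag object is assumed):

    from airflow.contrib.operators.azure_cosmos_operator import (
        AzureCosmosInsertDocumentOperator,
    )

    insert_doc = AzureCosmosInsertDocumentOperator(
        task_id="insert_cosmos_document",
        database_name="example-db",
        collection_name="example-collection",
        document={"id": "42", "value": "example"},
        azure_cosmos_conn_id="azure_cosmos_default",
        dag=dag,
    )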
AzureDataLakeHook communicates via a REST API compatible with WebHDFS. Make sure that an Airflow connection of type azure_data_lake exists. Authorization can be done by supplying a login (=Client ID), password (=Client Secret), and extra fields tenant (Tenant) and account_name (Account Name) (see connection azure_data_lake_default for an example).
The operators are defined in the following modules:
airflow.contrib.operators.adls_list_operator
airflow.contrib.operators.adls_to_gcs
They also use airflow.contrib.hooks.azure_data_lake_hook.AzureDataLakeHook
to communicate with Microsoft Azure.
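As a hook-level sketch (local and remote paths are placeholders), uploading a file to the lake could look like this:

    from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook

    # Uses the azure_data_lake_default connection described above.
    hook = AzureDataLakeHook(azure_data_lake_conn_id="azure_data_lake_default")
    hook.upload_file(local_path="/tmp/data.csv", remote_path="/landing/data.csv")
    # Verify the upload (returns True if the remote file exists).
    print(hook.check_for_file("/landing/data.csv"))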
Azure Container Instances provides a method to run a Docker container without having to worry about managing infrastructure. The AzureContainerInstanceHook requires a service principal. The credentials for this principal can either be defined in the extra field key_path, as an environment variable named AZURE_AUTH_LOCATION, or by providing a login/password and tenantId in extras.
The operator is defined in the airflow.contrib.operators.azure_container_instances_operator
module.
They also use airflow.contrib.hooks.azure_container_volume_hook.AzureContainerVolumeHook
, airflow.contrib.hooks.azure_container_registry_hook.AzureContainerRegistryHook
and airflow.contrib.hooks.azure_container_instance_hook.AzureContainerInstanceHook
to communicate with Microsoft Azure.
The AzureContainerRegistryHook requires a host/login/password to be defined in the connection.
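A minimal sketch of running a public image (the connection id, resource group, container group name, and region are placeholders; the dag object is assumed):

    from airflow.contrib.operators.azure_container_instances_operator import (
        AzureContainerInstancesOperator,
    )

    run_container = AzureContainerInstancesOperator(
        ci_conn_id="azure_default",      # assumed service-principal connection
        registry_conn_id=None,           # public image, no registry login needed
        resource_group="example-rg",
        name="example-container-group",
        image="hello-world:latest",
        region="westeurope",
        task_id="run_container",
        dag=dag,
    )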
Airflow has extensive support for Amazon Web Services. Note, however, that the hooks, sensors, and operators are in the contrib section.
Airflow can be configured to read and write task logs in Amazon Simple Storage Service (Amazon S3). See write-logs-amazon.
The operators are defined in the following modules:
airflow.contrib.operators.emr_add_steps_operator
airflow.contrib.operators.emr_create_job_flow_operator
airflow.contrib.operators.emr_terminate_job_flow_operator
They also use airflow.contrib.hooks.emr_hook.EmrHook
to communicate with Amazon Web Services.
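For example, a sketch of creating a job flow (the overrides and connection ids are placeholders; the dag object is assumed):

    from airflow.contrib.operators.emr_create_job_flow_operator import (
        EmrCreateJobFlowOperator,
    )

    # job_flow_overrides is merged with the job flow config stored on the
    # emr_default connection; the name below is a placeholder.
    create_job_flow = EmrCreateJobFlowOperator(
        task_id="create_emr_job_flow",
        aws_conn_id="aws_default",
        emr_conn_id="emr_default",
        job_flow_overrides={"Name": "example-cluster"},
        dag=dag,
    )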
The operators are defined in the following modules:
airflow.operators.s3_file_transform_operator
airflow.contrib.operators.s3_list_operator
airflow.contrib.operators.s3_to_gcs_operator
airflow.contrib.operators.s3_to_gcs_transfer_operator
airflow.operators.s3_to_hive_operator
They also use airflow.hooks.S3_hook.S3Hook
to communicate with Amazon Web Services.
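A hook-level sketch (the bucket and key are placeholders), assuming an aws_default connection with valid credentials:

    from airflow.hooks.S3_hook import S3Hook

    hook = S3Hook(aws_conn_id="aws_default")
    # Upload a string to a placeholder bucket/key, overwriting if present.
    hook.load_string(
        string_data="hello from Airflow",
        key="example/hello.txt",
        bucket_name="example-bucket",
        replace=True,
    )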
The AWSBatchOperator is defined in the airflow.contrib.operators.awsbatch_operator module.
The operators are defined in the following modules:
airflow.contrib.sensors.aws_redshift_cluster_sensor
airflow.operators.redshift_to_s3_operator
airflow.operators.s3_to_redshift_operator
They also use airflow.contrib.hooks.redshift_hook.RedshiftHook
to communicate with Amazon Web Services.
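As a sketch, waiting for a (placeholder) cluster to become available (the dag object is assumed):

    from airflow.contrib.sensors.aws_redshift_cluster_sensor import (
        AwsRedshiftClusterSensor,
    )

    wait_for_cluster = AwsRedshiftClusterSensor(
        task_id="wait_for_redshift_cluster",
        cluster_identifier="example-cluster",  # placeholder cluster name
        target_status="available",
        aws_conn_id="aws_default",
        dag=dag,
    )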
The operator is defined in the airflow.contrib.operators.hive_to_dynamodb
module.
It uses airflow.contrib.hooks.aws_dynamodb_hook.AwsDynamoDBHook
to communicate with Amazon Web Services.
It uses airflow.contrib.hooks.aws_lambda_hook.AwsLambdaHook
to communicate with Amazon Web Services.
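A hook-level sketch of a synchronous invocation (the function name and payload are placeholders):

    import json

    from airflow.contrib.hooks.aws_lambda_hook import AwsLambdaHook

    hook = AwsLambdaHook(
        function_name="example-function",
        aws_conn_id="aws_default",
        invocation_type="RequestResponse",
    )
    response = hook.invoke_lambda(payload=json.dumps({"key": "value"}))
    print(response["StatusCode"])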
It uses airflow.contrib.hooks.aws_firehose_hook.AwsFirehoseHook
to communicate with Amazon Web Services.
For more instructions on using Amazon SageMaker in Airflow, please see the SageMaker Python SDK README.
The operators are defined in the following modules:
airflow.contrib.operators.sagemaker_training_operator
airflow.contrib.operators.sagemaker_tuning_operator
airflow.contrib.operators.sagemaker_model_operator
airflow.contrib.operators.sagemaker_transform_operator
airflow.contrib.operators.sagemaker_endpoint_config_operator
airflow.contrib.operators.sagemaker_endpoint_operator
They use airflow.contrib.hooks.sagemaker_hook.SageMakerHook to communicate with Amazon Web Services.
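A sketch of a training task (training_config is assumed to be a dict built elsewhere in the DAG file, in the format expected by the SageMaker CreateTrainingJob API; the dag object is also assumed):

    from airflow.contrib.operators.sagemaker_training_operator import (
        SageMakerTrainingOperator,
    )

    train_model = SageMakerTrainingOperator(
        task_id="sagemaker_train",
        config=training_config,       # assumed CreateTrainingJob-style dict
        aws_conn_id="aws_default",
        wait_for_completion=True,
        dag=dag,
    )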
With contributions from Databricks, Airflow has several operators that enable submitting and running jobs on the Databricks platform. Internally the operators talk to the api/2.0/jobs/runs/submit endpoint.
The operators are defined in the airflow.contrib.operators.databricks_operator
module.
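For example, a sketch of a notebook run on a new cluster (the cluster spec and notebook path are placeholders; the dag object is assumed):

    from airflow.contrib.operators.databricks_operator import (
        DatabricksSubmitRunOperator,
    )

    notebook_run = DatabricksSubmitRunOperator(
        task_id="notebook_run",
        databricks_conn_id="databricks_default",
        new_cluster={
            "spark_version": "2.4.x-scala2.11",  # placeholder versions/sizes
            "node_type_id": "r3.xlarge",
            "num_workers": 2,
        },
        notebook_task={"notebook_path": "/Users/example@example.com/PrepareData"},
        dag=dag,
    )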
Airflow has extensive support for the Google Cloud Platform. Note, however, that most hooks and operators are in the contrib section, which means they have beta status: they can have breaking changes between minor releases.
See the GCP connection type documentation (howto/connection/gcp) to configure connections to GCP.
Airflow can be configured to read and write task logs in Google Cloud Storage. See write-logs-gcp.
All hooks are based on airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook.
The operators are defined in the following modules:
airflow.contrib.operators.bigquery_check_operator
airflow.contrib.operators.bigquery_get_data
airflow.contrib.operators.bigquery_table_delete_operator
airflow.contrib.operators.bigquery_to_bigquery
airflow.contrib.operators.bigquery_to_gcs
They also use airflow.contrib.hooks.bigquery_hook.BigQueryHook
to communicate with Google Cloud Platform.
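As a sketch, a data-quality check against a (placeholder) table (the dag object is assumed):

    from airflow.contrib.operators.bigquery_check_operator import BigQueryCheckOperator

    # Fails the task if the query's first row evaluates to false
    # (e.g. COUNT(*) == 0).
    check_rows = BigQueryCheckOperator(
        task_id="check_bq_rows",
        sql="SELECT COUNT(*) FROM `example-project.example_dataset.example_table`",
        bigquery_conn_id="bigquery_default",
        use_legacy_sql=False,
        dag=dag,
    )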
The operator is defined in the airflow.contrib.operators.gcp_spanner_operator
package.
They also use airflow.contrib.hooks.gcp_spanner_hook.CloudSpannerHook
to communicate with Google Cloud Platform.
The operator is defined in the airflow.contrib.operators.gcp_sql_operator
package.
They also use airflow.contrib.hooks.gcp_sql_hook.CloudSqlDatabaseHook
and airflow.contrib.hooks.gcp_sql_hook.CloudSqlHook
to communicate with Google Cloud Platform.
The operator is defined in the airflow.contrib.operators.gcp_bigtable_operator
package.
They also use airflow.contrib.hooks.gcp_bigtable_hook.BigtableHook
to communicate with Google Cloud Platform.
The operator is defined in the airflow.contrib.operators.gcp_cloud_build_operator
package.
They also use airflow.contrib.hooks.gcp_cloud_build_hook.CloudBuildHook
to communicate with Google Cloud Platform.
The operators are defined in the airflow.contrib.operators.gcp_compute_operator
package.
They also use airflow.contrib.hooks.gcp_compute_hook.GceHook
to communicate with Google Cloud Platform.
The operators are defined in the airflow.contrib.operators.gcp_function_operator
package.
They also use airflow.contrib.hooks.gcp_function_hook.GcfHook
to communicate with Google Cloud Platform.
The operators are defined in the airflow.contrib.operators.dataflow_operator
package.
They also use airflow.contrib.hooks.gcp_dataflow_hook.DataFlowHook
to communicate with Google Cloud Platform.
The operators are defined in the airflow.contrib.operators.dataproc_operator
package.
airflow.contrib.operators.datastore_export_operator.DatastoreExportOperator
Export entities from Google Cloud Datastore to Cloud Storage.
airflow.contrib.operators.datastore_import_operator.DatastoreImportOperator
Import entities from Cloud Storage to Google Cloud Datastore.
They also use airflow.contrib.hooks.datastore_hook.DatastoreHook
to communicate with Google Cloud Platform.
airflow.contrib.operators.mlengine_operator.MLEngineBatchPredictionOperator
Start a Cloud ML Engine batch prediction job.
airflow.contrib.operators.mlengine_operator.MLEngineModelOperator
Manages a Cloud ML Engine model.
airflow.contrib.operators.mlengine_operator.MLEngineTrainingOperator
Start a Cloud ML Engine training job.
airflow.contrib.operators.mlengine_operator.MLEngineVersionOperator
Manages a Cloud ML Engine model version.
The operators are defined in the airflow.contrib.operators.mlengine_operator
package.
They also use airflow.contrib.hooks.gcp_mlengine_hook.MLEngineHook
to communicate with Google Cloud Platform.
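A sketch of a training job (the project, bucket, package, and module names are placeholders; the dag object is assumed):

    from airflow.contrib.operators.mlengine_operator import MLEngineTrainingOperator

    train = MLEngineTrainingOperator(
        task_id="ml_engine_train",
        project_id="example-project",
        job_id="example_training_{{ ds_nodash }}",
        package_uris=["gs://example-bucket/trainer-0.1.tar.gz"],
        training_python_module="trainer.task",
        training_args=[],
        region="us-central1",
        dag=dag,
    )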
The operators are defined in the following modules:
airflow.contrib.operators.file_to_gcs
airflow.contrib.operators.gcs_acl_operator
airflow.contrib.operators.gcs_download_operator
airflow.contrib.operators.gcs_list_operator
airflow.contrib.operators.gcs_operator
airflow.contrib.operators.gcs_to_bq
airflow.contrib.operators.gcs_to_gcs
airflow.contrib.operators.mysql_to_gcs
airflow.contrib.operators.mssql_to_gcs
airflow.contrib.sensors.gcs_sensor
airflow.contrib.operators.gcs_delete_operator
They also use airflow.contrib.hooks.gcs_hook.GoogleCloudStorageHook
to communicate with Google Cloud Platform.
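A minimal sketch of uploading a local file (paths and bucket name are placeholders; the dag object is assumed):

    from airflow.contrib.operators.file_to_gcs import FileToGoogleCloudStorageOperator

    upload = FileToGoogleCloudStorageOperator(
        task_id="upload_to_gcs",
        src="/tmp/example.csv",
        dst="data/example.csv",
        bucket="example-bucket",
        google_cloud_storage_conn_id="google_cloud_default",
        dag=dag,
    )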
The operators are defined in the following modules:
airflow.contrib.operators.gcp_transfer_operator
airflow.contrib.sensors.gcp_transfer_operator
They also use airflow.contrib.hooks.gcp_transfer_hook.GCPTransferServiceHook
to communicate with Google Cloud Platform.
The operator is defined in the airflow.contrib.operators.gcp_vision_operator
package.
They also use airflow.contrib.hooks.gcp_vision_hook.CloudVisionHook
to communicate with Google Cloud Platform.
The operator is defined in the airflow.contrib.operators.gcp_text_to_speech_operator
package.
They also use airflow.contrib.hooks.gcp_text_to_speech_hook.GCPTextToSpeechHook
to communicate with Google Cloud Platform.
The operator is defined in the airflow.contrib.operators.gcp_speech_to_text_operator
package.
They also use airflow.contrib.hooks.gcp_speech_to_text_hook.GCPSpeechToTextHook
to communicate with Google Cloud Platform.
The operator is defined in the airflow.contrib.operators.gcp_translate_speech_operator
package.
They also use airflow.contrib.hooks.gcp_speech_to_text_hook.GCPSpeechToTextHook and airflow.contrib.hooks.gcp_translate_hook.CloudTranslateHook to communicate with Google Cloud Platform.
airflow.contrib.operators.gcp_translate_operator.CloudTranslateTextOperator
Translate a string or list of strings.
The operator is defined in the airflow.contrib.operators.gcp_translate_operator
package.
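A sketch of translating a short string (the target language and values are placeholders; the dag object is assumed):

    from airflow.contrib.operators.gcp_translate_operator import CloudTranslateTextOperator

    translate = CloudTranslateTextOperator(
        task_id="translate_text",
        values=["Hello world"],   # a string or list of strings
        target_language="de",
        format_="text",
        source_language=None,     # let the API detect the source language
        model="base",
        dag=dag,
    )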
The operators are defined in the airflow.contrib.operators.gcp_video_intelligence_operator
package.
They also use airflow.contrib.hooks.gcp_video_intelligence_hook.CloudVideoIntelligenceHook
to communicate with Google Cloud Platform.
The operators are defined in the airflow.contrib.operators.gcp_container_operator
package.
They also use airflow.contrib.hooks.gcp_container_hook.GKEClusterHook
to communicate with Google Cloud Platform.
The operators are defined in the airflow.contrib.operators.gcp_natural_language_operator
package.
They also use airflow.contrib.hooks.gcp_natural_language_operator.CloudNaturalLanguageHook
to communicate with Google Cloud Platform.
The operators are defined in the airflow.contrib.operators.gcp_dlp_operator
package.
They also use airflow.contrib.hooks.gcp_dlp_hook.CloudDLPHook
to communicate with Google Cloud Platform.
The operators are defined in the airflow.contrib.operators.gcp_tasks_operator
package.
They also use airflow.contrib.hooks.gcp_tasks_hook.CloudTasksHook
to communicate with Google Cloud Platform.
Apache Airflow has a native operator and hooks to talk to Qubole, which lets you submit your big data jobs directly to Qubole from Apache Airflow.
The operators are defined in the following modules:
airflow.contrib.operators.qubole_operator
airflow.contrib.sensors.qubole_sensor
airflow.contrib.operators.qubole_check_operator
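For example, a sketch of submitting a Hive command (the query and cluster label are placeholders; the dag object is assumed):

    from airflow.contrib.operators.qubole_operator import QuboleOperator

    hive_query = QuboleOperator(
        task_id="run_hive_query",
        command_type="hivecmd",
        query="SHOW TABLES",
        cluster_label="default",
        qubole_conn_id="qubole_default",
        dag=dag,
    )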