[AIRFLOW-XXX] Simplify AWS/Azure/Databricks operators listing (apache…

cherry-picked from 6a66ece
mik-laj authored and kaxil committed Mar 30, 2020
1 parent 0431195 commit 31d5c0b
Showing 1 changed file with 50 additions and 120 deletions.
170 changes: 50 additions & 120 deletions docs/integration.rst
@@ -46,21 +46,13 @@ Airflow connection of type ``wasb`` exists. Authorization can be done by supplyi
login (=Storage account name) and password (=KEY), or login and SAS token in the extra
field (see connection ``wasb_default`` for an example).

Interface with Azure Blob Storage.

Checks if a blob is present on Azure Blob storage.

Deletes blob(s) on Azure Blob Storage.

Checks if blobs matching a prefix are present on Azure Blob storage.
The operators are defined in the following modules:

Uploads a local file to a container as a blob.
* :mod:`airflow.contrib.sensors.wasb_sensor`
* :mod:`airflow.contrib.operators.wasb_delete_blob_operator`
* :mod:`airflow.contrib.operators.file_to_wasb`

They use :class:`airflow.contrib.hooks.wasb_hook.WasbHook` to communicate with Microsoft Azure.
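
As a rough illustration of the two authorization options described for a ``wasb`` connection, the sketch below assembles the connection fields as plain Python data. The helper name ``wasb_connection_fields`` and the ``sas_token`` extra key are assumptions made for this example and are not defined by this change; check the hook you are using for the key it actually reads.

```python
import json


def wasb_connection_fields(account_name, account_key=None, sas_token=None):
    """Return the fields of a ``wasb`` connection as a dict (hypothetical helper)."""
    # Exactly one credential is expected: the storage account key or a SAS token.
    if (account_key is None) == (sas_token is None):
        raise ValueError("provide exactly one of account_key or sas_token")

    fields = {
        "conn_id": "wasb_default",
        "conn_type": "wasb",
        "login": account_name,    # login = storage account name
        "password": account_key,  # password = storage account key (None with SAS)
        "extra": None,
    }
    if sas_token is not None:
        # Assumed key name: the SAS token is stored as JSON in the extra field.
        fields["extra"] = json.dumps({"sas_token": sas_token})
    return fields


# Placeholder values only; substitute real credentials from a secure source.
key_based = wasb_connection_fields("mystorageaccount", account_key="<storage-key>")
sas_based = wasb_connection_fields("mystorageaccount", sas_token="<sas-token>")
```

The same values could equally be entered through the Airflow connection form; the dict only makes the login/password/extra mapping explicit.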

Azure File Share
@@ -70,8 +62,7 @@ type ``wasb`` exists. Authorization can be done by supplying a login (=Storage a
and password (=Storage account key), or login and SAS token in the extra field
(see connection ``wasb_default`` for an example).

Interface with Azure File Share.
It uses :class:`airflow.contrib.hooks.azure_fileshare_hook.AzureFileShareHook` to communicate with Microsoft Azure.

Azure CosmosDB
@@ -81,15 +72,12 @@ Airflow connection of type ``azure_cosmos`` exists. Authorization can be done by
login (=Endpoint uri), password (=secret key) and extra fields database_name and collection_name to specify the
default database and collection to use (see connection ``azure_cosmos_default`` for an example).

Interface with Azure CosmosDB.
The operators are defined in the following modules:

Simple operator to insert document into CosmosDB.

Simple sensor to detect document existence in CosmosDB.
* :mod:`airflow.contrib.operators.azure_cosmos_operator`
* :mod:`airflow.contrib.sensors.azure_cosmos_sensor`

They also use :class:`airflow.contrib.hooks.azure_cosmos_hook.AzureCosmosDBHook` to communicate with Microsoft Azure.
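
The ``database_name`` and ``collection_name`` extras act as defaults for the connection. One way such defaults might be combined with explicit arguments is sketched below; ``resolve_cosmos_target`` is a hypothetical helper written for this illustration, and the precedence rule (explicit argument first, connection extra second) is an assumption, not a description of the hook's code.

```python
import json


def resolve_cosmos_target(extra_json, database_name=None, collection_name=None):
    """Pick the database and collection, preferring explicit arguments (hypothetical helper)."""
    # The extra field of a connection is a JSON document; treat empty as "no defaults".
    extras = json.loads(extra_json) if extra_json else {}
    database = database_name or extras.get("database_name")
    collection = collection_name or extras.get("collection_name")
    if database is None or collection is None:
        raise ValueError("database_name and collection_name must be set somewhere")
    return database, collection


# Example extras as they might appear on an ``azure_cosmos`` connection.
extra = json.dumps({"database_name": "analytics", "collection_name": "events"})
default_target = resolve_cosmos_target(extra)
overridden = resolve_cosmos_target(extra, collection_name="audit")
```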

Azure Data Lake
@@ -99,14 +87,12 @@ Airflow connection of type ``azure_data_lake`` exists. Authorization can be done
login (=Client ID), password (=Client Secret) and extra fields tenant (Tenant) and account_name (Account Name)
(see connection ``azure_data_lake_default`` for an example).

Interface with Azure Data Lake.
The operators are defined in the following modules:

Lists the files located in a specified Azure Data Lake path.
* :mod:`airflow.contrib.operators.adls_list_operator`
* :mod:`airflow.contrib.operators.adls_to_gcs`

Copies files from an Azure Data Lake path to a Google Cloud Storage bucket.
They also use :class:`airflow.contrib.hooks.azure_data_lake_hook.AzureDataLakeHook` to communicate with Microsoft Azure.

Azure Container Instances
@@ -118,20 +104,13 @@ credentials for this principal can either be defined in the extra field ``key_pa
environment variable named ``AZURE_AUTH_LOCATION``,
or by providing a login/password and tenantId in extras.

The AzureContainerRegistryHook requires a host/login/password to be defined in the connection.

Interface with Azure Container Volumes
The operator is defined in the :mod:`airflow.contrib.operators.azure_container_instances_operator` module.

Start/Monitor a new ACI.

Wrapper around a single ACI.

Interface with ACR
They also use :class:`airflow.contrib.hooks.azure_container_volume_hook.AzureContainerVolumeHook`,
:class:`airflow.contrib.hooks.azure_container_registry_hook.AzureContainerRegistryHook` and
:class:`airflow.contrib.hooks.azure_container_instance_hook.AzureContainerInstanceHook` to communicate with Microsoft Azure.

The AzureContainerRegistryHook requires a host/login/password to be defined in the connection.

.. _AWS:
@@ -152,87 +131,61 @@ See :ref:`write-logs-amazon`.

Interface with AWS EMR.

Adds steps to an existing EMR JobFlow.
The operators are defined in the following modules:

Creates an EMR JobFlow, reading the config from the EMR connection.

Terminates an EMR JobFlow.
* :mod:`airflow.contrib.operators.emr_add_steps_operator`
* :mod:`airflow.contrib.operators.emr_create_job_flow_operator`
* :mod:`airflow.contrib.operators.emr_terminate_job_flow_operator`

They also use :class:`airflow.contrib.hooks.emr_hook.EmrHook` to communicate with Amazon Web Service.


Interface with AWS S3.

Copies data from a source S3 location to a temporary location on the local filesystem.

Lists the files matching a key prefix from a S3 location.
The operators are defined in the following modules:

Syncs an S3 location with a Google Cloud Storage bucket.

Syncs an S3 bucket with a Google Cloud Storage bucket using the GCP Storage Transfer Service.

Moves data from S3 to Hive. The operator downloads a file from S3, stores the file locally before loading it into a Hive table.
* :mod:`airflow.operators.s3_file_transform_operator`
* :mod:`airflow.contrib.operators.s3_list_operator`
* :mod:`airflow.contrib.operators.s3_to_gcs_operator`
* :mod:`airflow.contrib.operators.s3_to_gcs_transfer_operator`
* :mod:`airflow.operators.s3_to_hive_operator`

They also use :class:`airflow.hooks.S3_hook.S3Hook` to communicate with Amazon Web Service.

AWS Batch Service

Execute a task on AWS Batch Service.

The operator is defined in the :class:`airflow.contrib.operators.awsbatch_operator.AWSBatchOperator` module.

AWS RedShift

Waits for a Redshift cluster to reach a specific status.

Interact with AWS Redshift, using the boto3 library.
The operators are defined in the following modules:

Executes an unload command to S3 as CSV with or without headers.

Executes a copy command from S3 as CSV with or without headers.
* :mod:`airflow.contrib.sensors.aws_redshift_cluster_sensor`
* :mod:`airflow.operators.redshift_to_s3_operator`
* :mod:`airflow.operators.s3_to_redshift_operator`

They also use :class:`airflow.contrib.hooks.redshift_hook.RedshiftHook` to communicate with Amazon Web Service.

AWS DynamoDB

Moves data from Hive to DynamoDB.
The operator is defined in the :class:`airflow.contrib.operators.hive_to_dynamodb` module.

Interface with AWS DynamoDB.
It uses :class:`airflow.contrib.hooks.aws_dynamodb_hook.AwsDynamoDBHook` to communicate with Amazon Web Service.

AWS Lambda

Interface with AWS Lambda.

It uses :class:`airflow.contrib.hooks.aws_lambda_hook.AwsLambdaHook` to communicate with Amazon Web Service.

AWS Kinesis

Interface with AWS Kinesis Firehose.
It uses :class:`airflow.contrib.hooks.aws_firehose_hook.AwsFirehoseHook` to communicate with Amazon Web Service.

Amazon SageMaker
@@ -242,27 +195,16 @@ For more instructions on using Amazon SageMaker in Airflow, please see `the Sage

.. _the SageMaker Python SDK README:

Interface with Amazon SageMaker.

Create a SageMaker training job.
The operators are defined in the following modules:

Create a SageMaker tuning job.

Create a SageMaker model.

Create a SageMaker transform job.

Create a SageMaker endpoint config.

Create a SageMaker endpoint.

They use :class:`airflow.contrib.hooks.sagemaker_hook.SageMakerHook` to communicate with Amazon Web Service.

.. _Databricks:

@@ -273,19 +215,7 @@ With contributions from `Databricks <>`__, Airflow has se
which enable the submitting and running of jobs to the Databricks platform. Internally the
operators talk to the ``api/2.0/jobs/runs/submit`` `endpoint <>`_.

Submits a Spark job run to Databricks using the
API endpoint.

Runs an existing Spark job in Databricks using the
API endpoint.

The operators are defined in the :class:`airflow.contrib.operators.databricks_operator` module.
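
To make the ``api/2.0/jobs/runs/submit`` call more concrete, the following sketch builds the URL and a JSON body for a one-off notebook run using only the standard library. It does not show the operators' internals: ``build_submit_run_request`` is a hypothetical helper, the host and all values are placeholders, and the payload field names (``run_name``, ``new_cluster``, ``notebook_task``) reflect a common shape of the Jobs API request that should be checked against the Databricks documentation.

```python
import json


def build_submit_run_request(host, notebook_path, spark_version, node_type_id, num_workers):
    """Return (url, body) for a one-off run submission (hypothetical helper)."""
    # The endpoint path is the one named in the text above.
    url = "{}/api/2.0/jobs/runs/submit".format(host.rstrip("/"))
    payload = {
        "run_name": "example-run",
        "new_cluster": {
            "spark_version": spark_version,
            "node_type_id": node_type_id,
            "num_workers": num_workers,
        },
        "notebook_task": {"notebook_path": notebook_path},
    }
    # sort_keys keeps the serialized body stable, which helps when logging or diffing.
    return url, json.dumps(payload, sort_keys=True)


# Placeholder workspace and cluster settings, for illustration only.
url, body = build_submit_run_request(
    host="https://example.cloud.databricks.com/",
    notebook_path="/Users/someone@example.com/notebook",
    spark_version="<spark-version>",
    node_type_id="<node-type-id>",
    num_workers=2,
)
```

Nothing is sent over the network here; the pair ``(url, body)`` is what an HTTP client would POST, with authentication handled separately.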

.. _GCP:
