Installation

Getting Airflow

Airflow is published as the apache-airflow package on PyPI. Installing it, however, can sometimes be tricky because Airflow is a bit of both a library and an application. Libraries usually keep their dependencies open and applications usually pin them, but we should do neither and both at the same time. We decided to keep our dependencies as open as possible (in setup.py) so users can install different versions of libraries if needed. This means that from time to time a plain pip install apache-airflow will not work or will produce an unusable Airflow installation.

In order to have a repeatable installation, however, starting from Airflow 1.10.10 and updated in Airflow 1.10.12 we also keep a set of "known-to-be-working" constraint files in the constraints-master and constraints-1-10 orphan branches. Those "known-to-be-working" constraints are maintained per major/minor Python version. You can use them as constraint files when installing Airflow from PyPI. Note that you have to specify the correct Airflow version and Python version in the URL.
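The same constraint file can also be combined with a requirements file of your own, so the whole environment stays reproducible. A minimal sketch (the requirements.txt name and its contents are only an illustration, not part of Airflow itself):

# Hypothetical requirements.txt with your own pinned top-level dependencies, e.g.:
#   apache-airflow[postgres]==1.10.15
#   requests
# Install everything against the "known-to-be-working" constraints:
pip install -r requirements.txt \
    --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-1.10.15/constraints-3.6.txt"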

Prerequisites

On a Debian-based Linux OS:

sudo apt-get update
sudo apt-get install build-essential

1. Installing just Airflow
AIRFLOW_VERSION=1.10.15
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
# For example: 3.6
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# For example: https://raw.githubusercontent.com/apache/airflow/constraints-1.10.15/constraints-3.6.txt
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"

Note

In November 2020, a new version of pip (20.3) was released with a new, 2020 resolver. This resolver does not yet work with Apache Airflow and might lead to errors during installation, depending on your choice of extras. In order to install Airflow you need to either downgrade pip to version 20.2.4 (pip install --upgrade pip==20.2.4) or, in case you use pip 20.3, add the option --use-deprecated legacy-resolver to your pip install command.
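For instance, the two workarounds described above look roughly like this (a sketch, reusing the ${AIRFLOW_VERSION} and ${CONSTRAINT_URL} variables set earlier):

# Option 1: downgrade pip to the last release that still uses the old resolver
pip install --upgrade pip==20.2.4
# Option 2: keep pip 20.3+ but fall back to the legacy resolver for this install
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}" --use-deprecated legacy-resolver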

2. Installing with extras (for example postgres, google)
AIRFLOW_VERSION=1.10.15
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow[postgres,google]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"

You need certain system-level packages in order to install Airflow. These are the requirements known to be needed on Linux systems (tested on Debian Buster):

sudo apt-get install -y --no-install-recommends \
        freetds-bin \
        krb5-user \
        ldap-utils \
        libffi6 \
        libsasl2-2 \
        libsasl2-modules \
        libssl1.1 \
        locales  \
        lsb-release \
        sasl2-bin \
        sqlite3 \
        unixodbc

You also need database client packages (Postgres or MySQL) if you want to use those databases.
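On Debian-based systems these would typically be, for example (standard Debian package names; adjust for your distribution):

# PostgreSQL client and development headers
sudo apt-get install -y postgresql-client libpq-dev
# MySQL client and development headers
sudo apt-get install -y default-mysql-client default-libmysqlclient-dev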

If the airflow command is not recognized (this can happen on Windows when using WSL), ensure that ~/.local/bin is in your PATH environment variable, and add it if necessary:

PATH=$PATH:~/.local/bin
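To make that change permanent, you can append it to your shell profile, for example:

echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc
source ~/.bashrc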

Extra Packages

The apache-airflow PyPI basic package only installs what's needed to get started. Subpackages can be installed depending on what will be useful in your environment. For instance, if you don't need connectivity with Postgres, you won't have to go through the trouble of installing the postgres-devel yum package, or whatever equivalent applies on the distribution you are using.

Behind the scenes, Airflow does conditional imports of operators that require these extra dependencies.
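You can see the effect of this from the command line: core Airflow imports fine, while an operator or hook that needs a missing extra raises an ImportError. An illustrative check, assuming the postgres extra is not installed:

python -c "import airflow; print(airflow.__version__)"
python -c "from airflow.hooks.postgres_hook import PostgresHook" \
    || echo "postgres extra (psycopg2) is not installed"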

Here's the list of the subpackages and what they enable:

Fundamentals:

subpackage | install command | enables
all | pip install 'apache-airflow[all]' | All Airflow features known to man
all_dbs | pip install 'apache-airflow[all_dbs]' | All database integrations
devel | pip install 'apache-airflow[devel]' | Minimum dev tools requirements
devel_all | pip install 'apache-airflow[devel_all]' | All dev tools requirements
devel_azure | pip install 'apache-airflow[devel_azure]' | Azure development requirements
devel_ci | pip install 'apache-airflow[devel_ci]' | Development requirements used in CI
devel_hadoop | pip install 'apache-airflow[devel_hadoop]' | Airflow + dependencies on the Hadoop stack
doc | pip install 'apache-airflow[doc]' | Packages needed to build docs
password | pip install 'apache-airflow[password]' | Password authentication for users

Apache Software:

subpackage | install command | enables
atlas | pip install 'apache-airflow[apache.atlas]' | Apache Atlas to use Data Lineage feature
cassandra | pip install 'apache-airflow[apache.cassandra]' | Cassandra related operators & hooks
druid | pip install 'apache-airflow[apache.druid]' | Druid related operators & hooks
hdfs | pip install 'apache-airflow[apache.hdfs]' | HDFS hooks and operators
hive | pip install 'apache-airflow[apache.hive]' | All Hive related operators
presto | pip install 'apache-airflow[apache.presto]' | All Presto related operators & hooks
webhdfs | pip install 'apache-airflow[webhdfs]' | HDFS hooks and operators

Services:

subpackage | install command | enables
aws | pip install 'apache-airflow[amazon]' | Amazon Web Services
azure | pip install 'apache-airflow[microsoft.azure]' | Microsoft Azure
azure_blob_storage | pip install 'apache-airflow[azure_blob_storage]' | Microsoft Azure (blob storage)
azure_cosmos | pip install 'apache-airflow[azure_cosmos]' | Microsoft Azure (cosmos)
azure_container_instances | pip install 'apache-airflow[azure_container_instances]' | Microsoft Azure (container instances)
azure_data_lake | pip install 'apache-airflow[azure_data_lake]' | Microsoft Azure (data lake)
azure_secrets | pip install 'apache-airflow[azure_secrets]' | Microsoft Azure (secrets)
cloudant | pip install 'apache-airflow[cloudant]' | Cloudant hook
databricks | pip install 'apache-airflow[databricks]' | Databricks hooks and operators
datadog | pip install 'apache-airflow[datadog]' | Datadog hooks and sensors
gcp | pip install 'apache-airflow[gcp]' | Google Cloud
github_enterprise | pip install 'apache-airflow[github_enterprise]' | GitHub Enterprise auth backend
google | pip install 'apache-airflow[google]' | Google Cloud (same as gcp)
google_auth | pip install 'apache-airflow[google_auth]' | Google auth backend
hashicorp | pip install 'apache-airflow[hashicorp]' | Hashicorp Services (Vault)
jira | pip install 'apache-airflow[jira]' | Jira hooks and operators
qds | pip install 'apache-airflow[qds]' | Enable QDS (Qubole Data Service) support
salesforce | pip install 'apache-airflow[salesforce]' | Salesforce hook
sendgrid | pip install 'apache-airflow[sendgrid]' | Send email using SendGrid
segment | pip install 'apache-airflow[segment]' | Segment hooks and sensors
sentry | pip install 'apache-airflow[sentry]' | Sentry service for application logging and monitoring
slack | pip install 'apache-airflow[slack]' | :class:`airflow.providers.slack.operators.slack.SlackAPIOperator`
snowflake | pip install 'apache-airflow[snowflake]' | Snowflake hooks and operators
vertica | pip install 'apache-airflow[vertica]' | Vertica hook support as an Airflow backend

Software:

subpackage | install command | enables
async | pip install 'apache-airflow[async]' | Async worker classes for Gunicorn
celery | pip install 'apache-airflow[celery]' | CeleryExecutor
dask | pip install 'apache-airflow[dask]' | DaskExecutor
docker | pip install 'apache-airflow[docker]' | Docker hooks and operators
elasticsearch | pip install 'apache-airflow[elasticsearch]' | Elasticsearch hooks and Log Handler
kubernetes | pip install 'apache-airflow[cncf.kubernetes]' | Kubernetes Executor and operator
mongo | pip install 'apache-airflow[mongo]' | Mongo hooks and operators
mssql (deprecated) | pip install 'apache-airflow[microsoft.mssql]' | Microsoft SQL Server operators and hook, support as an Airflow backend. Uses pymssql. Will be replaced by subpackage odbc.
mysql | pip install 'apache-airflow[mysql]' | MySQL operators and hook, support as an Airflow backend. The MySQL server version has to be 5.6.4+. The exact upper bound depends on the version of the mysqlclient package. For example, mysqlclient 1.3.12 can only be used with MySQL server 5.6.4 through 5.7.
oracle | pip install 'apache-airflow[oracle]' | Oracle hooks and operators
pinot | pip install 'apache-airflow[pinot]' | Pinot DB hook
postgres | pip install 'apache-airflow[postgres]' | PostgreSQL operators and hook, support as an Airflow backend
rabbitmq | pip install 'apache-airflow[rabbitmq]' | RabbitMQ support as a Celery backend
redis | pip install 'apache-airflow[redis]' | Redis hooks and sensors
samba | pip install 'apache-airflow[samba]' | :class:`airflow.providers.apache.hive.transfers.hive_to_samba.HiveToSambaOperator`
statsd | pip install 'apache-airflow[statsd]' | Needed by StatsD metrics
virtualenv | pip install 'apache-airflow[virtualenv]' | Running Python tasks in a local virtualenv

Other:

subpackage | install command | enables
cgroups | pip install 'apache-airflow[cgroups]' | Needed to use CgroupTaskRunner
crypto | pip install 'apache-airflow[crypto]' | Cryptography libraries
grpc | pip install 'apache-airflow[grpc]' | gRPC hooks and operators
jdbc | pip install 'apache-airflow[jdbc]' | JDBC hooks and operators
kerberos | pip install 'apache-airflow[kerberos]' | Kerberos integration for Kerberized Hadoop
ldap | pip install 'apache-airflow[ldap]' | LDAP authentication for users
papermill | pip install 'apache-airflow[papermill]' | Papermill hooks and operators
ssh | pip install 'apache-airflow[ssh]' | SSH hooks and operators
winrm | pip install 'apache-airflow[microsoft.winrm]' | WinRM hooks and operators
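Extras from different groups can be combined in a single install; as before, it is safest to keep using the constraint file. For example (the particular combination of extras below is only an illustration):

pip install "apache-airflow[postgres,google,celery,redis]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"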

Initializing Airflow Database

Airflow requires a database to be initialized before you can run tasks. If you're just experimenting and learning Airflow, you can stick with the default SQLite option. If you don't want to use SQLite, then take a look at :doc:`howto/initialize-database` to set up a different database.
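As a sketch of pointing Airflow at a different database, the metadata connection can be set either as sql_alchemy_conn under [core] in airflow.cfg or via the corresponding environment variable before initializing. For example (the connection string below is only an illustration of a hypothetical local PostgreSQL database):

export AIRFLOW__CORE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow:airflow@localhost:5432/airflow"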

After configuration, you'll need to initialize the database before you can run tasks (note that airflow db init is the Airflow 2.0 form of this command; on Airflow 1.10.x the equivalent is airflow initdb):

airflow db init