Setting up Apache Airflow on AWS EC2 instance
Date: 2017-01-20 07:53:45 -0800
Categories and tags: help, aws, Airflow

Setting up Airflow on Amazon Linux was not straightforward because its default packages are outdated. For example, I had trouble using setuid in the Upstart config because the Amazon Linux AMI ships with Upstart 0.6.5.

AMI Version: amzn-ami-hvm-2016.09.1.20161221-x86_64-gp2 (ami-c51e3eb6)

Install gcc, python-devel, and python-setuptools

sudo yum install gcc-c++ python-devel python-setuptools

Upgrade pip

sudo pip install --upgrade pip

Install airflow using pip

sudo /usr/local/bin/pip install "airflow[s3,hive,python]"
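
A note on the extras list: the shell splits arguments on spaces, so a requirement with spaces after the commas reaches pip as three separate strings rather than one. The sketch below only counts arguments to show the difference; `count_args` is a throwaway helper made up for this illustration, and the behaviour shown assumes sh or bash.

```shell
# Print how many arguments the shell actually passes to a command.
count_args() {
    echo "$#"
}

# Unquoted with spaces: the shell splits this into three words.
count_args airflow[s3, hive, python]

# Quoted without spaces: a single requirement string, as pip expects.
count_args "airflow[s3,hive,python]"
```

Quoting also keeps the square brackets from being treated as a glob pattern by the shell.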

Create User and Group

sudo groupadd airflow
sudo useradd airflow -g airflow
sudo passwd -d airflow

This creates a passwordless user named airflow.
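
To confirm the account exists before moving on, something like the following can be used. It is a minimal sketch; `user_exists` is a hypothetical helper name, and it relies only on the standard `id` command.

```shell
# Return success if the named account exists in the passwd database.
user_exists() {
    id "$1" >/dev/null 2>&1
}

if user_exists airflow; then
    # Show the uid, primary group, and supplementary groups.
    id airflow
else
    echo "user airflow not found" >&2
fi
```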

Initialize Airflow

su airflow
cd ~
airflow initdb

Test run

su airflow
cd ~
airflow webserver

You should be able to view the Airflow UI on port 8080.
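
A quick way to confirm the webserver is answering, without opening a browser, is to request the page with curl. This is a minimal sketch: the helper name `airflow_ui_status` is made up for illustration, and it assumes curl is installed and that you run it on the instance itself. To reach the UI from another machine, the instance's security group also has to allow inbound traffic on port 8080.

```shell
# Print the HTTP status code returned by the Airflow webserver.
# Host and port default to localhost:8080; curl prints 000 if it cannot connect.
airflow_ui_status() {
    host="${1:-localhost}"
    port="${2:-8080}"
    curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 "http://${host}:${port}/"
}

airflow_ui_status || echo "webserver not reachable" >&2
```

Any 2xx or 3xx status code suggests the webserver is up and serving requests.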

Upstart Config for Airflow Webserver

Now let's use Upstart to manage the Airflow process and respawn it on failure.

This Amazon Linux AMI comes with Upstart 0.6.5, which is very sad: the setuid and setgid stanzas were only added in Upstart 1.4, so they do not work here.

{{< include_code file="/etc/init/airflow-webserver.conf" lang="SYSTEMD" >}}
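
The included file is not reproduced here, so the following is only a hypothetical sketch of what a job for Upstart 0.6.5 might look like, not the author's actual config. It assumes the common workaround of dropping privileges with `su` in place of the missing setuid/setgid stanzas, and it assumes the `airflow` executable lives at /usr/local/bin/airflow. The script writes to a local staging file (the `STAGING_FILE` variable is made up for this example) so it can be reviewed before being copied into /etc/init.

```shell
# Write an illustrative Upstart job to a local staging file.
# This is NOT the author's airflow-webserver.conf; every stanza below is an assumption.
staging_file="${STAGING_FILE:-./airflow-webserver.conf}"

cat > "$staging_file" <<'EOF'
description "Airflow webserver (illustrative sketch)"

start on runlevel [2345]
stop on runlevel [016]

# Restart the process if it exits unexpectedly.
respawn

# Upstart 0.6.5 lacks setuid/setgid, so switch to the airflow user with su.
# The /usr/local/bin/airflow path is assumed; check it with `which airflow`.
exec su -s /bin/sh -c 'exec /usr/local/bin/airflow webserver' airflow
EOF

echo "wrote $staging_file"
# After reviewing it, install with:
#   sudo cp "$staging_file" /etc/init/airflow-webserver.conf
```

Because `su` sets HOME to the target user's home directory, Airflow should pick up its default home of /home/airflow/airflow under this arrangement.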

The airflow-webserver job should now appear in the output of initctl list:

sudo initctl list

Start Airflow Webserver with Upstart

sudo initctl start airflow-webserver

You can find the process ID in /home/airflow/airflow/airflow-webserver.pid.
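
The pid file can also be used to check whether the webserver is still alive. The sketch below uses a hypothetical helper, `pid_file_running`, built on `kill -0`, which tests for a process without sending it a signal. Run it as the airflow user or with sudo, since other users typically cannot read the file or signal the process.

```shell
# Report whether the process recorded in a pid file is still alive.
pid_file_running() {
    pid_file="$1"
    [ -r "$pid_file" ] || return 1
    pid="$(cat "$pid_file")"
    # kill -0 sends no signal; it only checks that the process can be signalled.
    kill -0 "$pid" 2>/dev/null
}

if pid_file_running /home/airflow/airflow/airflow-webserver.pid; then
    echo "airflow webserver is running"
else
    echo "airflow webserver is not running"
fi
```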

Upstart Config for Airflow Scheduler

{{< include_code file="/etc/init/airflow-scheduler.conf" lang="SYSTEMD" >}}

Start Airflow Scheduler with Upstart

sudo initctl start airflow-scheduler

This should keep Airflow Scheduler running in the background and respawn it in case of failures.

References