title | date | categories | tags
---|---|---|---
Setting up Apache Airflow on AWS EC2 instance | 2017-01-20 07:53:45 -0800 | |
Setting up Airflow on Amazon Linux was not straightforward because of outdated default packages. For example, I had trouble using `setuid` in the Upstart config because the Amazon Linux AMI ships with Upstart 0.6.5.
AMI Version: amzn-ami-hvm-2016.09.1.20161221-x86_64-gp2 (ami-c51e3eb6)
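To check whether a given instance has the same limitation, the Upstart version can be read from `initctl --version` and compared against 1.4, the release in which the `setuid` and `setgid` stanzas were introduced. The following is a minimal sketch: the helper names `version_lt` and `supports_setuid` are made up for illustration, and it assumes GNU `sort -V` is available.

```shell
# Hypothetical helpers, not part of Upstart or Airflow.

# Succeeds if version $1 is strictly older than version $2.
# Relies on GNU sort's -V (version sort) option.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Succeeds if the given Upstart version understands setuid/setgid (1.4 or newer).
supports_setuid() {
    ! version_lt "$1" "1.4"
}

# Only query initctl when it is installed. Its first output line is
# assumed to look like "initctl (upstart 0.6.5)".
if command -v initctl >/dev/null 2>&1; then
    upstart_version="$(initctl --version 2>/dev/null | sed -n '1s/.*upstart \([0-9.]*\).*/\1/p')"
    if [ -n "$upstart_version" ] && supports_setuid "$upstart_version"; then
        echo "Upstart $upstart_version: setuid/setgid available"
    else
        echo "Upstart ${upstart_version:-unknown}: fall back to su in the job file"
    fi
fi
```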
sudo yum install gcc-c++ python-devel python-setuptools
sudo pip install --upgrade pip
sudo /usr/local/bin/pip install "airflow[s3,hive,python]"
sudo groupadd airflow
sudo useradd airflow -g airflow
sudo passwd -d airflow
This creates a passwordless user named `airflow`.
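If the setup is likely to be re-run on the same instance, the user creation can be made safe to repeat. This is only a sketch under that assumption: `user_exists`, `group_exists`, and `ensure_airflow_user` are hypothetical helper names, and the block only defines them without running anything.

```shell
# Hypothetical helpers for a repeatable version of the user setup.

# Succeeds if the named user account exists.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Succeeds if the named group exists (getent ships with glibc on Linux).
group_exists() {
    getent group "$1" >/dev/null 2>&1
}

# Creates the airflow group and user only when they are missing,
# then clears the password as in the commands above.
ensure_airflow_user() {
    group_exists airflow || sudo groupadd airflow
    user_exists airflow || sudo useradd airflow -g airflow
    sudo passwd -d airflow
}
```

Calling `ensure_airflow_user` then has the same effect as the three commands, minus the errors from `groupadd` and `useradd` when the group or user already exists.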
su airflow
cd ~
airflow initdb
su airflow
cd ~
airflow webserver
You should now be able to view the Airflow UI on port 8080.
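The webserver can take a few seconds to come up, so a quick check from the instance itself can help before opening a browser. A rough sketch, assuming `curl` is installed; `wait_for_http` is a hypothetical helper name and the defaults for attempts and delay are arbitrary.

```shell
# Poll a URL until it answers or the attempts run out.
# $1 = URL, $2 = number of attempts (default 10), $3 = seconds between attempts (default 3)
wait_for_http() {
    url="$1"
    attempts="${2:-10}"
    delay="${3:-3}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: silent, -f: treat HTTP errors (4xx/5xx) as failure, -o: discard the body
        if curl -sf -o /dev/null --max-time 5 "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, `wait_for_http http://localhost:8080/ && echo "webserver is up"` run on the instance prints the message once the UI responds.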
Now let's use Upstart to manage the Airflow processes and respawn them when they fail.
This Amazon Linux AMI comes with Upstart 0.6.5, which predates the `setuid` and `setgid` stanzas (added in Upstart 1.4), so they do not work here.
{{< include_code file="/etc/init/airflow-webserver.conf" lang="SYSTEMD" >}}
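Independently of the file included above, here is a rough sketch of how a job can drop privileges on Upstart 0.6.5 without `setuid`: the command is run through `su` as the `airflow` user. Everything in it is an assumption made for illustration, including the helper name `write_airflow_job`, the install path `/usr/local/bin/airflow`, the `AIRFLOW_HOME` value, and the runlevel-based start and stop conditions.

```shell
# Hypothetical helper: writes an Upstart job for one Airflow component.
# $1 = component ("webserver" or "scheduler"), $2 = destination file.
write_airflow_job() {
    component="$1"
    dest="$2"
    cat > "$dest" <<EOF
description "Airflow $component"

start on runlevel [2345]
stop on runlevel [016]

# Restart the process if it exits unexpectedly
respawn

# Assumed location of the Airflow home directory
env AIRFLOW_HOME=/home/airflow/airflow

# No setuid/setgid before Upstart 1.4, so drop privileges with su instead.
# /usr/local/bin/airflow is an assumed install path.
exec su -s /bin/sh -c 'exec /usr/local/bin/airflow $component' airflow
EOF
}
```

A cautious way to use it is to write to a scratch location first, for example `write_airflow_job webserver /tmp/airflow-webserver.conf`, review the result, and only then copy it into `/etc/init/` with `sudo`.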
You should now see `airflow-webserver` in the output of `initctl list`:
sudo initctl list
sudo initctl start airflow-webserver
The webserver's process ID is written to `/home/airflow/airflow/airflow-webserver.pid`.
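The PID file also gives a simple way to confirm that the process it names is still alive. A minimal sketch, with `pid_file_alive` as a hypothetical helper name; note that `kill -0` sends no signal, it only tests whether the process can be signalled, so it should be run as `airflow` or root.

```shell
# Succeeds if the file in $1 holds the PID of a live process.
pid_file_alive() {
    pid_file="$1"
    [ -r "$pid_file" ] || return 1
    pid="$(cat "$pid_file")"
    # Reject empty or non-numeric content
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 checks for existence and permission without affecting the process
    kill -0 "$pid" 2>/dev/null
}
```

For example: `pid_file_alive /home/airflow/airflow/airflow-webserver.pid && echo "webserver process is alive"`.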
{{< include_code file="/etc/init/airflow-scheduler.conf" lang="SYSTEMD" >}}
sudo initctl start airflow-scheduler
This should keep Airflow Scheduler running in the background and respawn it in case of failures.
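To confirm that both jobs are actually up, the output of `initctl status` can be inspected; Upstart reports a running job as `start/running` followed by the process ID. The sketch below is illustrative only, and `status_is_running` and `check_airflow_jobs` are hypothetical helper names.

```shell
# Succeeds if an `initctl status` line reports the job as running,
# e.g. "airflow-scheduler start/running, process 1234".
status_is_running() {
    case "$1" in
        *start/running*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: reports the state of both jobs from this post.
check_airflow_jobs() {
    for job in airflow-webserver airflow-scheduler; do
        if status_is_running "$(sudo initctl status "$job" 2>/dev/null)"; then
            echo "$job is running"
        else
            echo "$job is NOT running"
        fi
    done
}
```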
- https://github.com/apache/incubator-airflow/tree/master/scripts/upstart
- https://upstart.ubuntu.com/cookbook/
- https://airflow.incubator.apache.org/
- https://superuser.com/questions/213416/running-upstart-jobs-as-unprivileged-users
- https://serverfault.com/questions/357060/how-should-i-use-sudo-from-an-upstart-script
- https://unix.stackexchange.com/questions/192945/user-without-a-password-how-can-one-login-into-that-account-from-a-non-root-ac