# Preparation for DAG development

## 1. Connect Airflow and Postgres

Before developing DAGs that use Airflow and Postgres, we need to add a connection so that Airflow can reach the Postgres database.
As usual, we log in to the cluster:

In [None]:
# Replace the string inside the single quotes with your own login command and run the cell
# Example OC_LOGIN_COMMAND='oc login --token=sha256~3bR5KXgwiUoaQiph2_kIXCDQnVfm_HQy3YwU2m-UOrs --server=https://c109-e.us-east.containers.cloud.ibm.com:31656'
OC_LOGIN_COMMAND='_replace_this_string_by_pasting_the_clipboard_'
$OC_LOGIN_COMMAND

Next, we retrieve two values, the internal hostname and the port of the Postgres service. Keep them ready: we will paste them into Airflow's new connection form in a moment.

In [None]:
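# Assumes the current project is the one where Postgres is deployed and that it exposes a single ClusterIP service
internalservice=$(oc get svc | grep ClusterIP | awk '{print $1}')
# In-cluster DNS name: <service>.<namespace>.svc.cluster.local
internalhostname=$(oc get svc "$internalservice" -o go-template --template='{{.metadata.name}}.{{.metadata.namespace}}.svc.cluster.local')
# Column 5 of `oc get svc` is PORT(S), e.g. 5432/TCP; keep only the number
internalport=$(oc get svc | grep ClusterIP | awk '{print $5}' | cut -f1 -d'/')
echo "Internal hostname of Postgres: $internalhostname"
echo "Internal port of Postgres: $internalport"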
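The cell above relies on the column layout of `oc get svc` (NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, PORT(S), AGE). The sketch below runs the same `grep`/`awk`/`cut` pipeline on a hypothetical sample of that output, so you can see what each step keeps without touching the cluster. The service name `postgresql`, its IP and the namespace `postgres` are made up for the example; in the real cell the namespace comes from the go-template.

```shell
# Hypothetical sample of `oc get svc` output (name, IP, age and namespace are made up)
sample_svc_output='NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
postgresql   ClusterIP   172.21.45.10   <none>        5432/TCP   3d'

# grep keeps only ClusterIP rows (the header says CLUSTER-IP, so it is skipped); column 1 is NAME
svc_name=$(printf '%s\n' "$sample_svc_output" | grep ClusterIP | awk '{print $1}')

# Column 5 is PORT(S), e.g. "5432/TCP"; cut drops the "/TCP" protocol suffix
svc_port=$(printf '%s\n' "$sample_svc_output" | grep ClusterIP | awk '{print $5}' | cut -f1 -d'/')

# In-cluster DNS name; the namespace "postgres" is an assumption for this example
svc_host="${svc_name}.postgres.svc.cluster.local"

echo "Internal hostname of Postgres: $svc_host"
echo "Internal port of Postgres: $svc_port"
```

If the project contains more than one ClusterIP service (for example a headless service created next to Postgres), `grep ClusterIP` matches several rows and the pipeline returns several names, so you would need to pick the service explicitly.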

To create the connection, we open the Airflow admin interface as we did in the **Airflow Deployment** section:

![](../pictures/airflowroute.png)



Paste the values obtained above into the new connection form (**Admin > Connections** in the Airflow UI):

![](../pictures/airflow_postgres_conn.png)
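If you prefer the command line to the web form, Airflow 2 can also create the connection with `airflow connections add` and a connection URI. This is a minimal sketch under several assumptions: the Airflow pods run in the `airflow` project and the worker pod is `airflow-worker-0` (the names used later in this notebook), while the connection ID `postgres_default`, the user, the password and the database name are hypothetical placeholders that must match your Postgres setup and the connection ID your DAGs reference. By default it only prints the command (dry run); set `DRY_RUN=0` to execute it.

```shell
# Hypothetical defaults: reuse the values printed above when they are available
PG_HOST="${internalhostname:-postgresql.postgres.svc.cluster.local}"
PG_PORT="${internalport:-5432}"
PG_USER="${PG_USER:-postgres}"            # hypothetical user
PG_PASSWORD="${PG_PASSWORD:-changeme}"    # hypothetical password (URL-encode special characters)
PG_DB="${PG_DB:-postgres}"                # hypothetical database name
CONN_ID="${CONN_ID:-postgres_default}"    # hypothetical: must match the conn_id used by your DAGs

# Airflow parses connection URIs; the "postgresql" scheme is mapped to conn type "postgres"
CONN_URI="postgresql://${PG_USER}:${PG_PASSWORD}@${PG_HOST}:${PG_PORT}/${PG_DB}"

# Print the command in dry-run mode (default), otherwise execute it
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Connections are stored in the Airflow metadata database, which the worker pod can reach
run oc exec -n airflow airflow-worker-0 -- airflow connections add "$CONN_ID" --conn-uri "$CONN_URI"
```

The URI form keeps everything in one argument; if your password contains characters such as `@` or `/`, URL-encode them or use the separate `--conn-host`, `--conn-port`, `--conn-login`, `--conn-password` and `--conn-schema` options instead.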

## 2. Install the Databand monitoring packages

Airflow reports pipeline information to Databand through Python packages that we install now. We already installed Databand packages in the [Airflow integration](./4_airflow_int.ipynb) chapter; the following commands are an alternative that uses pip's extras syntax to request the `airflow` and `postgres` integrations explicitly:

In [None]:
# Install the Python packages that report Postgres and Airflow information to Databand
oc project airflow

# Worker pod (stable name: airflow-worker-0)
oc rsh --shell=/bin/bash airflow-worker-0 /home/airflow/.local/bin/pip install 'databand[airflow,postgres]'

# Scheduler pod: look the name up, since it may include a generated suffix
POD_SCHEDULER=$(oc get pods | grep airflow-scheduler | awk '{print $1}')
oc rsh --shell=/bin/bash "$POD_SCHEDULER" /home/airflow/.local/bin/pip install 'databand[airflow,postgres]'

echo "'databand[airflow,postgres]' installed in airflow-worker-0 and $POD_SCHEDULER"

Note that in a real production environment you should never install Python packages or other software into a running container like this: the changes live only in the container's filesystem and are lost whenever the pod is recreated. The right way is to customize or extend the Docker image, as documented [here](https://airflow.apache.org/docs/docker-stack/build.html#extending-vs-customizing-the-image).
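The install cell targets the hard-coded pod `airflow-worker-0` plus whatever `grep airflow-scheduler` returns, so additional worker replicas would be skipped, and several scheduler pods would end up in a single `$POD_SCHEDULER` value. The following sketch applies the same install command to every worker and scheduler pod instead. It assumes the `airflow` project and the pod naming of the official Helm chart (`airflow-worker-N`, `airflow-scheduler-<suffix>`); the helper `select_airflow_pods` is hypothetical, and in dry-run mode (the default) it uses a made-up pod list and only prints the commands.

```shell
# Hypothetical helper: keep worker and scheduler pods from "oc get pods -o name" output (pod/<name>)
select_airflow_pods() {
  grep -E '^pod/airflow-(worker|scheduler)' | sed 's|^pod/||'
}

# Print the command in dry-run mode (default), otherwise execute it
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Hypothetical pod list, used only for the dry run
  pod_list='pod/airflow-postgresql-0
pod/airflow-scheduler-5d8f7c9b6-abcde
pod/airflow-webserver-7c9d8b6f5-xyz12
pod/airflow-worker-0
pod/airflow-worker-1'
else
  pod_list=$(oc get pods -n airflow -o name)
fi

targets=$(printf '%s\n' "$pod_list" | select_airflow_pods)

# Same install command as the cell above, applied to each selected pod
for pod in $targets; do
  run oc exec -n airflow "$pod" -- /home/airflow/.local/bin/pip install 'databand[airflow,postgres]'
done
```

As with the original cell, `oc exec` without `-c` runs in the pod's default container; the production caveat above applies equally here.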

## 3. Transfer of DAGs to Airflow

Now we transfer files from our local machine to the Airflow containers. Make sure you are in the local directory that contains our sample DAGs. If you cloned this Git repository, that is the `dags` directory at the repository root (go up one level if you are in the jupyter directory).

In [None]:
# you may need to modify the cd command to place yourself in the DAGs directory
pwd
cd ../dags
ls -l

If you are in the right directory, you will see several Python files and the `sql` subdirectory.
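Before copying, a quick check that the shell really sits in the DAGs directory avoids copying nothing, or the wrong files. The helper `check_dags_dir` below is hypothetical: it succeeds only when the directory contains at least one `.py` file and a `sql` subdirectory, which is the layout described above.

```shell
# Hypothetical helper: succeeds only if the directory holds *.py files and a sql/ subdirectory
check_dags_dir() {
  dags_dir="${1:-.}"
  # With default shell settings an unmatched glob stays literal, so ls fails when no .py file exists
  if ! ls "$dags_dir"/*.py >/dev/null 2>&1; then
    echo "No .py files found in $dags_dir" >&2
    return 1
  fi
  if [ ! -d "$dags_dir/sql" ]; then
    echo "No sql/ subdirectory found in $dags_dir" >&2
    return 1
  fi
  py_count=$(ls "$dags_dir"/*.py | wc -l | tr -d ' ')
  echo "OK: $py_count Python file(s) and a sql/ subdirectory in $dags_dir"
}

# Check the current directory before running the copy loops
check_dags_dir . || echo "Move into the dags directory before copying the files"
```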

Now we copy the DAG files and their SQL scripts to the Airflow worker:

In [None]:
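# Create the target directories inside the worker's DAG folder
oc rsh airflow-worker-0 mkdir -p /opt/airflow/dags/sql

# Copy the DAG definitions
for file in *.py
do
  oc cp "$file" airflow-worker-0:/opt/airflow/dags/
done

# Copy the SQL scripts used by the DAGs
for file in sql/*
do
  oc cp "$file" airflow-worker-0:/opt/airflow/dags/sql/
done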

This is just one way to add custom DAGs to Airflow. Other options, some of them more elegant (such as baking the DAGs into the image or syncing them from Git), are documented [here](https://airflow.apache.org/docs/helm-chart/stable/manage-dags-files.html).
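To confirm the transfer, you can list the DAG folder inside the worker and ask Airflow which DAGs it reports. This sketch assumes the `airflow` project and the `airflow-worker-0` pod used in the copy cell; it only prints the commands unless you set `DRY_RUN=0`.

```shell
# Print the command in dry-run mode (default), otherwise execute it
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

WORKER_POD="${WORKER_POD:-airflow-worker-0}"   # pod used in the copy cell

# The DAG files and the SQL scripts should now be in the worker's DAG folder
run oc exec -n airflow "$WORKER_POD" -- ls -l /opt/airflow/dags /opt/airflow/dags/sql

# List the DAGs Airflow reports; a missing DAG may have an import error or may not be parsed yet
run oc exec -n airflow "$WORKER_POD" -- airflow dags list
```

The scheduler rescans the DAG folder periodically (`[scheduler] dag_dir_list_interval`, 300 seconds by default), so new DAGs can take a few minutes to appear in the web UI.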





---

Next Section: [SQL DAG Development](./8_SQL_dag_dev.ipynb)

[Return to main](../README.md)