## Lesson 3: Building a Simple Pipeline

Before you transform your RAG prototype into an automated pipeline, you will learn some basic Airflow syntax.

### 3.1. Airflow UI

You will use the Airflow UI to visualize your dags, track their status, and trigger them manually. Run the cell below to get the link to your Airflow UI. If asked for a username and password, type `airflow` for both.

In [None]:
import os

# DLAI_LOCAL_URL is provided by the lab environment; fill in the port of the Airflow UI.
airflow_ui = os.environ['DLAI_LOCAL_URL'].format(port=8080)
airflow_ui  # username: airflow, password: airflow (if asked)

### 3.2. Airflow Components - Optional Reading

You've already seen one Airflow component: the Airflow UI, which is served by the API server shown in the diagram below. Airflow has other components that work together to process and run the dags you write. In this course, the components are already set up for you (each component runs in a Docker container). If you'd like to know how to install Airflow locally on your machine, see the Resources section at the end of this notebook.

<img src="airflow_architecture_3.png" width="400">

You will write your dags as Python files and save them in a `dags` folder. Once you add a new dag to your Airflow environment:
1. The dag processor parses your dag and stores a serialized version of the dag in the Airflow metadata database.
2. The scheduler checks the serialized dags to determine whether any dag is eligible for execution based on its defined schedule.
3. The tasks are then scheduled and subsequently queued. The workers poll the queue for any queued task instances they can run.
4. The worker that picked up the task instance runs it, and metadata such as the task instance status is sent from the worker via the API server to be stored in the Airflow metadata database.
5. Some of this information, such as the task instance status, is in turn important for the scheduler. It monitors all dags and, as soon as their dependencies are fulfilled, schedules task instances to run.

While this process is going on in the background, the Airflow UI, served by the API server, displays information about the current dag and task statuses that it retrieves from the Airflow metadata database.
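
The steps above can be sketched as a toy simulation in plain Python. This is only an illustrative model, not how Airflow is implemented: the names `metadata_db`, `task_queue`, `dag_processor`, `scheduler` and `worker`, as well as the dag `example_dag` with its tasks `task_a` and `task_b`, are hypothetical stand-ins invented for this sketch.

```python
from collections import deque

# Toy stand-ins for the Airflow components; none of this is Airflow's real API.
metadata_db = {"serialized_dags": {}, "task_status": {}}
task_queue = deque()


def dag_processor(dag_id, tasks):
    """Step 1: parse a dag and store a serialized version in the metadata database."""
    metadata_db["serialized_dags"][dag_id] = {
        name: sorted(upstream) for name, upstream in tasks.items()
    }


def scheduler(dag_id):
    """Steps 2, 3 and 5: queue every task whose upstream tasks have all succeeded."""
    status = metadata_db["task_status"]
    for name, upstream in metadata_db["serialized_dags"][dag_id].items():
        ready = all(status.get((dag_id, up)) == "success" for up in upstream)
        if (dag_id, name) not in status and ready:
            status[(dag_id, name)] = "queued"
            task_queue.append((dag_id, name))


def worker():
    """Step 4: pick up one queued task instance, run it and report its status."""
    dag_id, name = task_queue.popleft()
    metadata_db["task_status"][(dag_id, name)] = "success"
    return name


# Hypothetical dag: task_b depends on task_a.
dag_processor("example_dag", {"task_a": [], "task_b": ["task_a"]})

run_order = []
scheduler("example_dag")
while task_queue:
    run_order.append(worker())
    scheduler("example_dag")  # re-check dependencies after each status update

print(run_order)
```

In this sketch, `task_b` is only queued once the status of `task_a` has been written back as `success`, which mirrors the feedback loop between the workers, the metadata database and the scheduler described above.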

If you'd like to learn more about Airflow components, you can check chapter 5 of [this practical guide](https://www.astronomer.io/ebooks/practical-guide-to-apache-airflow-3/?utm_source=deeplearning-ai&utm_medium=content&utm_campaign=genai-course-6-25).

### 3.3. Your First DAG

You'll now write your first dag. The magic command `%%writefile` copies the content of the cell to the file `my_first_dag.py` stored under the `dags` folder. The `dags` folder, which is provided to you in this lab environment, is automatically checked by the dag processor. Once the dag processor finds `my_first_dag.py`, it parses the file and you can then view the dag in the Airflow UI.
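
Outside a notebook, you can get roughly the same effect as `%%writefile` by writing the file yourself. The sketch below assumes a hypothetical `dags_folder` inside a temporary directory (not the lab's real `dags` folder) and a shortened dag source; it only writes the file and does not need Airflow to be installed.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for the dags folder; a temporary directory keeps the sketch self-contained.
dags_folder = Path(tempfile.mkdtemp()) / "dags"
dags_folder.mkdir()

# The dag source is stored as plain text; it is not executed here.
dag_source = '''\
from airflow.sdk import dag, task


@dag
def my_first_dag():
    @task
    def my_task_1():
        return {"my_word": "Airflow!"}

    my_task_1()


my_first_dag()
'''

dag_file = dags_folder / "my_first_dag.py"
dag_file.write_text(dag_source)  # like %%writefile: creates the file or overwrites it

print(dag_file.name, "written:", dag_file.exists())
```

As with `%%writefile`, running this again overwrites the existing file, which is why rerunning a cell with a changed dag updates the dag rather than creating a second one.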

#### 3.3.1. My first dag with 2 tasks

Run the following cell. After around 30 seconds, you should see the first dag in the UI. 

In [None]:
%%writefile ../../dags/my_first_dag.py

from airflow.sdk import dag, task, chain


@dag
def my_first_dag():

    @task
    def my_task_1():
        return {"my_word" : "Airflow!"}
    
    _my_task_1 = my_task_1()

    @task 
    def my_task_2(my_dict):
        print(my_dict["my_word"])

    _my_task_2 = my_task_2(my_dict=_my_task_1)

    
my_first_dag()

**Note**: Where is the `dags` folder? In this environment, the `dags` folder lives at the path `/home/jovyan/dags` (not in the lesson folders). You don't have direct access to the `dags` folder, but if you want to download all the dag files of this course, you can find them in this [GitHub repo](https://github.com/astronomer/orchestrating-workflows-for-genai-deeplearning-ai). The repo also contains instructions on how to run Airflow locally.

#### 3.3.2. My first dag with 3 tasks

<div style="background-color:#fff6e4; padding:15px; border-width:3px; border-color:#f5ecda; border-style:solid; border-radius:6px"> ⏳ <b>Note <code>Airflow UI</code>:</b> 

<p>Changes to dags may take up to 30 seconds to show up in the Airflow UI in this environment! </p>
<p>If the Airflow UI shows the error "504 Gateway Timeout", this can happen after 2 hours, or after 25 minutes of inactivity: if there is no activity for 20 minutes, the Jupyter kernel stops, and if there is no kernel for another 5 minutes, the Jupyter notebook stops and the resources are released. In this case, refresh the notebook, rerun the cell that outputs the link to the Airflow UI, and then use that link to open the Airflow UI.</p>
</div>

In [None]:
%%writefile ../../dags/my_first_dag.py

from airflow.sdk import dag, task, chain


@dag
def my_first_dag():

    @task
    def my_task_1():
        return {"my_word" : "Airflow!"}
    
    _my_task_1 = my_task_1()

    @task 
    def my_task_2(my_dict):
        print(my_dict["my_word"])

    _my_task_2 = my_task_2(my_dict=_my_task_1)

    @task 
    def my_task_3():
        print("Hi from my_task_3!")

    _my_task_3 = my_task_3()

    chain(_my_task_1, _my_task_3)   
        

my_first_dag()
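
The dag above has two kinds of dependencies: `my_task_2` depends on `my_task_1` because it receives its output, while `my_task_3` depends on `my_task_1` only because of the `chain` call. The sketch below models this with a hypothetical `toy_chain` helper in plain Python. It is a simplified illustration of the idea, not Airflow's `chain` implementation.

```python
from collections import defaultdict

# Toy model: maps each task name to the set of tasks directly downstream of it.
downstream = defaultdict(set)


def toy_chain(*tasks):
    """Link each task to the next one, similar in spirit to chain(a, b, c)."""
    for upstream_task, downstream_task in zip(tasks, tasks[1:]):
        downstream[upstream_task].add(downstream_task)


# my_task_2(my_dict=_my_task_1): passing the output means my_task_1 runs first.
toy_chain("my_task_1", "my_task_2")
# chain(_my_task_1, _my_task_3): explicit dependency without passing any data.
toy_chain("my_task_1", "my_task_3")

print(sorted(downstream["my_task_1"]))  # ['my_task_2', 'my_task_3']
```

Both `my_task_2` and `my_task_3` end up directly downstream of `my_task_1`, so neither of them has to wait for the other.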

### 3.4. Your Second DAG

In [None]:
%%writefile ../../dags/my_second_dag.py  
from airflow.sdk import dag, task, chain


@dag
def my_second_dag():
    @task
    def my_task_1():
        return 23

    _my_task_1 = my_task_1()

    @task
    def my_task_2():
        return 42

    _my_task_2 = my_task_2()

    @task
    def my_task_3(num1, num2):
        return num1 + num2

    _my_task_3 = my_task_3(num1=_my_task_1, num2=_my_task_2)

    @task
    def my_task_4():
        return "Math!"

    _my_task_4 = my_task_4()

    chain([_my_task_2, _my_task_3], _my_task_4)


my_second_dag()
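
In this dag, passing a list to `chain` makes both `my_task_2` and `my_task_3` upstream of `my_task_4`, and `my_task_3` waits for `my_task_1` and `my_task_2` because it adds their return values (23 + 42 = 65). As a rough way to reason about the resulting order, the sketch below feeds the same dependencies to Python's standard-library `graphlib`. The `upstream` dictionary and the `waves` list are illustrative names, and this shows only the ordering logic, not how the Airflow scheduler works internally.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it (its upstream tasks).
upstream = {
    "my_task_1": set(),
    "my_task_2": set(),
    "my_task_3": {"my_task_1", "my_task_2"},  # receives num1 and num2
    "my_task_4": {"my_task_2", "my_task_3"},  # chain([_my_task_2, _my_task_3], _my_task_4)
}

sorter = TopologicalSorter(upstream)
sorter.prepare()

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose upstream tasks are all done
    waves.append(ready)
    sorter.done(*ready)

print(waves)  # [['my_task_1', 'my_task_2'], ['my_task_3'], ['my_task_4']]
```

Each inner list is a group of tasks that could run at the same time: `my_task_1` and `my_task_2` have no upstream tasks, `my_task_3` follows them, and `my_task_4` runs last.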


### 3.5. Resources

How to install Airflow locally:

- If you're familiar with running Docker containers, you can check this guide: [Running Airflow in Docker](https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html)
- If you'd like an easier way to get started with Airflow, you can use the [Astro CLI](https://www.astronomer.io/docs/astro/cli/get-started-cli):
  - Make sure to check the last optional video of this course, "How to Set up a Local Airflow Environment", which shows you how to replicate the same lab environment locally. It has this companion [GitHub repo](https://github.com/astronomer/orchestrating-workflows-for-genai-deeplearning-ai).

Airflow features:  

- [Introduction to the TaskFlow API and Airflow decorators](https://www.astronomer.io/docs/learn/airflow-decorators/): Learn more about decorators generally in Python and specifically in Airflow.
- [Manage task and task group dependencies in Airflow](https://www.astronomer.io/docs/learn/managing-dependencies/): Learn more about setting dependencies between tasks using the `chain` function and other methods.
- [Airflow Operators](https://www.astronomer.io/docs/learn/what-is-an-operator): Learn more about operator classes which can be used alongside `@task` to create Airflow tasks.

<div style="background-color:#fff6ff; padding:13px; border-width:3px; border-color:#efe6ef; border-style:solid; border-radius:6px">

<p> ⬇ &nbsp; <b>Download Notebooks:</b> 1) click on the <em>"File"</em> option on the top menu of the notebook and then 2) click on <em>"Download as"</em> and select <em>"Notebook (.ipynb)"</em>.</p>

<p> 📒 &nbsp; For more help, please see the <em>"Appendix – Tips, Help, and Download"</em> Lesson.</p>

</div>