![](images/arch1.png)

# Apache Airflow One-Node Architecture

Apache Airflow is an open-source platform for authoring, scheduling, and monitoring workflows. In a one-node architecture, all core components of Airflow run on a single machine. This setup is often used for development, testing, or small-scale deployments.

## Architecture Overview

In a one-node setup, the following key components run on the same machine:

1. **Scheduler**: Manages the execution of tasks by scheduling and monitoring dependencies.
2. **Web Server (UI)**: Provides a web-based interface for managing and visualizing workflows.
3. **Metadata Database**: Stores DAG runs, task states, and other metadata such as connections and variables. Task logs are written to the local filesystem by default.
4. **Executor**: Executes the tasks. Common executors for one-node setups include `SequentialExecutor` and `LocalExecutor`.
5. **Worker** (optional): Runs tasks when `CeleryExecutor` is used. Single-node setups rarely need separate workers.

### Key Components

| Component          | Description                                                                                   |
|--------------------|-----------------------------------------------------------------------------------------------|
| **Scheduler**      | Orchestrates the execution of tasks by scheduling and monitoring dependencies.                |
| **Web Server (UI)**| Flask-based web interface for monitoring and managing DAGs and task instances.                |
| **Metadata DB**    | Stores the state of DAGs, DAG runs, and task instances. Uses SQLite by default.               |
| **Executor**       | Executes tasks in the workflow. LocalExecutor is common in single-node setups.                |
| **DAGs Folder**    | Directory where DAG files (Python scripts) are stored.                                        |
| **Workers**        | (Optional) Not run separately in single-node setups; the executor's subprocesses do the work. |
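
As a rough illustration of how the executor and metadata database from the table might be selected on one machine, the sketch below builds environment variables using Airflow's `AIRFLOW__{SECTION}__{KEY}` naming convention. The helper names and the database URL are hypothetical, and the `database` section for `sql_alchemy_conn` is an assumption that holds for recent Airflow 2 releases (older releases used the `core` section).

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build an Airflow config override name: AIRFLOW__{SECTION}__{KEY}."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def one_node_env(
    executor: str = "LocalExecutor",
    # Hypothetical connection string; replace with your own database.
    db_url: str = "postgresql+psycopg2://airflow:airflow@localhost:5432/airflow",
) -> dict:
    """Return environment overrides for a single-machine deployment."""
    return {
        airflow_env_var("core", "executor"): executor,
        # Assumption: the [database] section applies to recent Airflow 2
        # releases; older releases kept this key under [core].
        airflow_env_var("database", "sql_alchemy_conn"): db_url,
    }


if __name__ == "__main__":
    for name, value in one_node_env().items():
        print(f"export {name}={value}")
```

The sketch points `LocalExecutor` at a PostgreSQL URL rather than SQLite because SQLite is generally only suitable for `SequentialExecutor`.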

## One-Node Workflow Diagram

```plaintext
             +-------------------------------------+
             |           Airflow Web UI            |
             |    (Flask-based Visualization)      |
             +-------------------------------------+
                           |
                           |
                 +---------------------+
                 |   Airflow Scheduler |
                 |  (Schedules Tasks)  |
                 +---------------------+
                           |
                           |
                 +---------------------+
                 |   Local Executor    |
                 | (Executes Tasks)    |
                 +---------------------+
                           |
                 +---------------------+
                 | Metadata Database   |
                 |  (SQLite by default)|
                 +---------------------+
```

## Workflow of a One-Node Setup

1. **DAG Creation**: DAGs define workflows and are written as Python scripts.
2. **Scheduler**: Monitors the DAGs directory and schedules tasks based on dependencies and execution time.
3. **Executor**: Executes tasks either sequentially or in parallel, depending on the executor type.
4. **Web UI**: Displays DAGs, task statuses, and logs, while providing control over execution.
5. **Metadata Database**: Tracks task states and DAG runs. Task logs themselves are written to the local filesystem by default.
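
The dependency-based scheduling in step 2 can be pictured with a toy model. This is not Airflow's scheduler, only a sketch that uses the standard library's `graphlib` to find which tasks become ready once their upstream tasks finish. The task names and the `schedule_batches` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def schedule_batches(dependencies: dict) -> list:
    """Group tasks into batches; a task is ready once all upstream tasks are done.

    `dependencies` maps each task id to the set of task ids it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks with no unfinished upstream
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch as finished
    return batches


# Hypothetical workflow: extract -> transform -> (load, report)
deps = {
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}

if __name__ == "__main__":
    for number, batch in enumerate(schedule_batches(deps), start=1):
        print(number, batch)
```

Tasks in the same batch have no dependencies on each other, which is what allows a parallel executor to run them at the same time.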

## Common Executors in One-Node Setup

- **SequentialExecutor**: Executes one task at a time, ideal for debugging and testing.
- **LocalExecutor**: Executes multiple tasks in parallel using subprocesses on the same node.
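
The difference between the two executors can be sketched with plain Python. The code below is only an analogy, not Airflow's implementation: each toy task runs in a child Python process, either one after another or several at a time. The function names and the `parallelism` parameter are assumptions made for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_task(task_id: str) -> str:
    """Run one toy task in a child Python process and return its output."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print('done', sys.argv[1])", task_id],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def run_sequential(task_ids: list) -> list:
    """One task at a time, in the spirit of SequentialExecutor."""
    return [run_task(task_id) for task_id in task_ids]


def run_local(task_ids: list, parallelism: int = 2) -> list:
    """Up to `parallelism` child processes at once, in the spirit of LocalExecutor."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # pool.map returns results in input order, even if tasks finish out of order.
        return list(pool.map(run_task, task_ids))


if __name__ == "__main__":
    print(run_sequential(["extract", "transform"]))
    print(run_local(["load", "report", "notify"], parallelism=3))
```

Threads are used here only to supervise the child processes; the work itself happens in the subprocesses.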

## Use Cases for One-Node Architecture

- **Development and Testing**: Useful for building and debugging DAGs locally.
- **Small-Scale Workflows**: Suitable for simple workflows with minimal resource requirements.

## Summary

A one-node Airflow architecture is a simple setup for local development or lightweight workflows. All components (Scheduler, Web UI, Executor, and Metadata DB) are hosted on the same machine, making it easy to manage and deploy.


# Clarification: Who Executes the Task in Airflow?

In Apache Airflow, the **Executor** is responsible for **managing task execution**, but it does not always directly execute tasks. Instead, the Executor delegates tasks to other underlying mechanisms such as subprocesses or workers, depending on the Executor type.

## How Task Execution Works

1. **Scheduler**: Determines which tasks are ready to run based on dependencies and schedules.
2. **Executor**: Manages and delegates the task execution to appropriate task runners.
3. **Task Runner**: The actual execution is carried out by task runners such as subprocesses or external workers.

## Key Executors and Execution Flow

### 1. **SequentialExecutor**
- Runs tasks one at a time.
- The executor runs inside the Scheduler process and runs each task to completion before starting the next.
  
### 2. **LocalExecutor**
- Executes tasks in parallel using multiple local subprocesses.
- **Subprocess Task Runners** handle the actual execution of tasks.

### 3. **CeleryExecutor**
- Tasks are added to a **Celery queue**.
- **Celery Workers** consume tasks from the queue and execute them.
- The Executor manages task distribution but does not directly execute them.
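
The queue-and-worker pattern described above can be sketched without Celery. The following toy model uses the standard library's `queue` and `threading` modules as stand-ins for the broker and the workers. It does not use Celery's API, and all names in it are hypothetical.

```python
import queue
import threading


def run_with_workers(task_ids: list, num_workers: int = 2) -> dict:
    """Distribute tasks over a queue; return which worker ran each task."""
    task_queue = queue.Queue()  # stand-in for the Celery queue (broker)
    ran_by = {}
    lock = threading.Lock()

    def worker(name: str) -> None:
        while True:
            task_id = task_queue.get()
            if task_id is None:  # sentinel: no more work for this worker
                task_queue.task_done()
                break
            with lock:
                ran_by[task_id] = name  # "executing" the task
            task_queue.task_done()

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(num_workers)
    ]
    for thread in threads:
        thread.start()

    # The "executor" only enqueues tasks; it never runs them itself.
    for task_id in task_ids:
        task_queue.put(task_id)
    for _ in threads:
        task_queue.put(None)  # one sentinel per worker

    task_queue.join()
    for thread in threads:
        thread.join()
    return ran_by


if __name__ == "__main__":
    print(run_with_workers(["extract", "transform", "load"]))
```

Which worker picks up which task is not deterministic, which mirrors how any free worker consumes the next queued task.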

### 4. **KubernetesExecutor**
- Launches each task as a separate **Kubernetes pod**.
- Kubernetes handles the execution of tasks in the pods.

## Summary of Roles

| Component      | Role                                                                                               |
|----------------|----------------------------------------------------------------------------------------------------|
| **Scheduler**  | Determines which tasks are ready to run and passes them to the Executor.                           |
| **Executor**   | Delegates tasks to subprocesses or workers and manages execution flow.                             |
| **Task Runner**| The actual entity that runs the task, e.g., a subprocess (LocalExecutor) or worker (CeleryExecutor).|

## Conclusion

While the **Executor** is critical for managing and delegating tasks, it does not always execute them directly. In distributed setups like **CeleryExecutor** or **KubernetesExecutor**, the actual task execution is performed by workers or pods, not the Executor itself. This distinction is essential for understanding how Airflow scales and manages workloads.


### In production environments, a multi-node architecture is used

___
### Execution flow