#  🚀 Airflow 2.3 Installation Guide (WSL + MySQL + Celery Executor)


This guide explains how to set up **Apache Airflow 2.3** inside **WSL (Ubuntu)** using **MySQL** as the metadata DB and **CeleryExecutor** for distributed task execution.

---

## 1️⃣ Prerequisites

* Windows 10/11 with **WSL2** enabled
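* **Ubuntu 20.04 or 22.04** installed from the Microsoft Store
* Python **3.9** (Airflow 2.3 supports Python 3.7 to 3.10 only, so 3.11 and 3.12 are out). Ubuntu 20.04 provides `python3.9` in its repositories; on 22.04, which ships 3.10, add the deadsnakes PPA first (`sudo add-apt-repository ppa:deadsnakes/ppa`)
* MySQL server and Redis running inside WSL (both are installed below)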

---

## 2️⃣ Update System & Install Dependencies

```bash
sudo apt update && sudo apt upgrade -y
sudo apt install -y python3.9 python3.9-venv python3.9-dev python3.9-distutils \
    mysql-server libmysqlclient-dev redis-server gcc build-essential \
    libssl-dev libffi-dev
```

---

## 3️⃣ Create Airflow Project Directory

```bash
mkdir ~/airflow_course && cd ~/airflow_course
python3.9 -m venv venv
source venv/bin/activate
```

---

## 4️⃣ Install Apache Airflow 2.3

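Point `AIRFLOW_HOME` at the project so Airflow keeps its config, logs, and DAGs there. An `export` only lasts for the current shell, so also append it to the venv's activate script; every terminal that later runs `source venv/bin/activate` then gets it too:

```bash
export AIRFLOW_HOME=~/airflow_course/airflow
echo 'export AIRFLOW_HOME=~/airflow_course/airflow' >> venv/bin/activate
```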

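Install Airflow with the Celery, MySQL, and Redis extras (`redis` pulls in the Python client the Celery worker needs to reach the Redis broker configured in step 7):

```bash
pip install "apache-airflow[celery,mysql,redis]==2.3.0" --constraint \
"https://raw.githubusercontent.com/apache/airflow/constraints-2.3.0/constraints-3.9.txt"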
```
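The constraints URL has to match both the Airflow release and the major.minor version of the Python in the venv. As a minimal sketch under that assumption, a small helper (the name `constraint_url` is hypothetical, not an Airflow command) can build the URL from the active interpreter, using the same URL pattern as above:

```bash
#!/usr/bin/env bash
# constraint_url AIRFLOW_VERSION [PYTHON_VERSION]
# Hypothetical helper: prints the constraints-file URL for an Airflow release.
# If PYTHON_VERSION is omitted, the major.minor of the `python` on PATH
# (i.e. the active venv) is used, e.g. "Python 3.9.18" -> "3.9".
constraint_url() {
  local airflow_version=$1
  local py_version=${2:-}
  if [ -z "$py_version" ]; then
    py_version=$(python --version 2>&1 | cut -d ' ' -f 2 | cut -d '.' -f 1-2)
  fi
  printf 'https://raw.githubusercontent.com/apache/airflow/constraints-%s/constraints-%s.txt\n' \
    "$airflow_version" "$py_version"
}
```

With the venv active, pass it to the pip command above as `--constraint "$(constraint_url 2.3.0)"`; on Python 3.9 it yields exactly the URL shown.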

---

## 5️⃣ Configure MySQL for Airflow

Start MySQL service:

```bash
sudo service mysql start
```

Login to MySQL:

```bash
sudo mysql -u root
```

Inside MySQL shell:

```sql
CREATE DATABASE airflow_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'airflow_user'@'localhost' IDENTIFIED BY 'airflow_pass';
GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user'@'localhost';
FLUSH PRIVILEGES;
EXIT;
```

---

## 6️⃣ Initialize Airflow Config

Generate default config:

```bash
airflow db init
```

This creates `~/airflow_course/airflow/airflow.cfg`, along with a SQLite database (`airflow.db`) that is no longer used once MySQL is configured below.
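Airflow silently falls back to `~/airflow` when `AIRFLOW_HOME` is unset, so it is worth confirming that the config landed in the project directory. A minimal sketch of such a check (the function `check_airflow_home` is hypothetical):

```bash
#!/usr/bin/env bash
# check_airflow_home: hypothetical preflight check that AIRFLOW_HOME is set and
# contains airflow.cfg; also warns about a second config in the default ~/airflow.
check_airflow_home() {
  if [ -z "${AIRFLOW_HOME:-}" ]; then
    echo "AIRFLOW_HOME is not set; Airflow would fall back to $HOME/airflow" >&2
    return 1
  fi
  if [ ! -f "$AIRFLOW_HOME/airflow.cfg" ]; then
    echo "No airflow.cfg in $AIRFLOW_HOME (has 'airflow db init' run yet?)" >&2
    return 1
  fi
  # A stray config in the default location usually means some command
  # was once run without AIRFLOW_HOME set.
  if [ "$AIRFLOW_HOME" != "$HOME/airflow" ] && [ -f "$HOME/airflow/airflow.cfg" ]; then
    echo "Warning: a second config exists at $HOME/airflow/airflow.cfg" >&2
  fi
  echo "Using $AIRFLOW_HOME/airflow.cfg"
}
```

Running it in each new terminal after `source venv/bin/activate` catches a missing `AIRFLOW_HOME` before any component starts against the wrong config.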

---

## 7️⃣ Update airflow.cfg

Edit:

```bash
nano ~/airflow_course/airflow/airflow.cfg
```

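Change the following keys, each in the section shown:

### Executor:

```ini
[core]
executor = CeleryExecutor
```

### Database:

Airflow 2.3 moved this key from `[core]` to `[database]` (the old location still works but is deprecated):

```ini
[database]
sql_alchemy_conn = mysql+mysqldb://airflow_user:airflow_pass@localhost/airflow_db
```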

### Celery:

```ini
[celery]
broker_url = redis://localhost:6379/0
result_backend = db+mysql://airflow_user:airflow_pass@localhost/airflow_db
```

*(Here we use Redis as the Celery broker. Install it with `sudo apt install redis-server` if it is not already installed, and start it with `sudo service redis-server start`.)*
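If you prefer scripting these edits to editing in nano, here is a minimal sketch of a helper that rewrites a key inside a given section. `set_cfg` is hypothetical (plain `awk`, not an Airflow command), and it assumes each key already exists in its section, which holds for a freshly generated `airflow.cfg`; values containing backslashes would be altered by `awk -v`.

```bash
#!/usr/bin/env bash
# set_cfg FILE SECTION KEY VALUE
# Hypothetical helper: rewrites the "KEY = ..." line inside [SECTION] of an
# INI-style file such as airflow.cfg. Fails (leaving the file untouched) if
# the key is not present in that section.
set_cfg() {
  local file=$1 section=$2 key=$3 value=$4 tmp
  tmp=$(mktemp) || return 1
  if awk -v sec="[$section]" -v key="$key" -v val="$value" '
      # A section header switches the "inside target section" flag.
      /^\[/ { in_sec = ($1 == sec) }
      # Replace the first uncommented KEY = ... line within the target section.
      in_sec && !found && $0 ~ ("^[ \t]*" key "[ \t]*=") {
        print key " = " val
        found = 1
        next
      }
      { print }
      # Non-zero exit if nothing was replaced.
      END { exit(found ? 0 : 1) }
    ' "$file" > "$tmp"; then
    # Copy back with cat so the original file keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    echo "set_cfg: '$key' not found in [$section] of $file" >&2
    return 1
  fi
}
```

For example, `set_cfg "$AIRFLOW_HOME/airflow.cfg" core executor CeleryExecutor`, then the same pattern for `database sql_alchemy_conn` and the two `celery` keys, quoting the URLs.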

---

## 8️⃣ Initialize DB with MySQL

```bash
airflow db reset -y   # only when re-running: drops all Airflow tables
airflow db init
```
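If `airflow db init` (or later the scheduler or worker) cannot connect, first confirm that MySQL and Redis are actually listening. Below is a minimal sketch using bash's `/dev/tcp` redirection; the function names are hypothetical. MySQLdb reaches `localhost` through the Unix socket rather than TCP, so the 3306 check is only a proxy, although a stock Ubuntu MySQL also listens on 127.0.0.1:3306.

```bash
#!/usr/bin/env bash
# port_open HOST PORT
# Hypothetical helper: succeeds if a TCP connection to HOST:PORT opens within
# 2 seconds. Uses bash's /dev/tcp redirection, so netcat/telnet are not needed.
port_open() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# check_services: report on MySQL (3306) and Redis (6379) on 127.0.0.1.
check_services() {
  local rc=0 pair name port
  for pair in mysql:3306 redis:6379; do
    name=${pair%%:*}
    port=${pair##*:}
    if port_open 127.0.0.1 "$port"; then
      echo "$name: listening on port $port"
    else
      echo "$name: nothing listening on port $port" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Run `check_services`; a non-zero exit means at least one service still needs `sudo service mysql start` or `sudo service redis-server start`.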

---

## 9️⃣ Create Admin User

```bash
airflow users create \
    --username admin \
    --firstname First \
    --lastname Last \
    --role Admin \
    --email admin@example.com \
    --password admin
```

---

## 🔟 Start Airflow Components

Open 3 terminals. In each one, `cd ~/airflow_course`, run `source venv/bin/activate`, and make sure `AIRFLOW_HOME` is set (otherwise Airflow falls back to `~/airflow`):

**Terminal 1 – Scheduler**

```bash
airflow scheduler
```

**Terminal 2 – Webserver**

```bash
airflow webserver -p 8080 --hostname 0.0.0.0
```

**Terminal 3 – Celery Worker**

```bash
airflow celery worker
```

(Optional) **Celery Flower (monitoring UI):**

```bash
airflow celery flower
```

---

## 1️⃣1️⃣ Access Airflow UI

Open in a Windows browser: 👉 [http://localhost:8080](http://localhost:8080)

Log in with:

* **Username**: admin
* **Password**: admin

---

## 1️⃣2️⃣ Verify Installation

1. Make sure example DAGs are enabled in the `[core]` section of `airflow.cfg` (`True` is already the default in a freshly generated file):

   ```ini
   load_examples = True
   ```
2. Restart scheduler & webserver.
3. Trigger a sample DAG in the UI and check if a worker executes it.
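Besides the UI, the webserver exposes a `/health` endpoint that reports the metadata database and scheduler status as JSON. Here is a minimal sketch of a check. `parse_health` is hypothetical, and it assumes one `"status"` field per component, which is how the Airflow 2.x response is shaped.

```bash
#!/usr/bin/env bash
# parse_health: hypothetical helper that reads the /health JSON on stdin and
# succeeds only if every reported component has "status": "healthy".
parse_health() {
  local json total ok
  json=$(cat)
  # Count all status fields, then only the healthy ones.
  total=$(printf '%s' "$json" | grep -o '"status": *"[a-z]*"' | grep -c . || true)
  ok=$(printf '%s' "$json" | grep -o '"status": *"healthy"' | grep -c . || true)
  # Expect at least the metadatabase and scheduler entries, all healthy.
  if [ "$total" -ge 2 ] && [ "$ok" -eq "$total" ]; then
    echo "OK: $ok/$total components healthy"
  else
    echo "NOT HEALTHY ($ok/$total): $json" >&2
    return 1
  fi
}
```

Usage: `curl -s http://localhost:8080/health | parse_health`. This does not cover Celery workers; Flower (default port 5555) shows those.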

---

✅ Done! You now have **Airflow 2.3 running with MySQL + CeleryExecutor** inside WSL.

---



# Errors Faced

This section documents errors encountered during the installation above. Each entry describes the symptom, its cause, and the fix.

## Airflow Installation & Configuration Errors

-----

### 1\. `ImportError: Module "airflow.utils.net" does not define a "getfqdn" attribute/class`

**Description:** This error occurs when the Airflow scheduler or other components try to start but fail because a required function, `airflow.utils.net.getfqdn`, is missing.

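**Cause:** The installed Airflow version and `airflow.cfg` do not match. Newer Airflow releases (2.4 and later) write `hostname_callable = airflow.utils.net.getfqdn` as the default, but that function does not exist in Airflow 2.3, so a config file generated by a newer version fails to load. A common way to hit this is a leftover `~/airflow/airflow.cfg` from an earlier install that gets picked up because `AIRFLOW_HOME` is not set in the current shell.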

**Solution:**

1.  Open your `airflow.cfg` file.
2.  In the `[core]` section, find the `hostname_callable` key.
3.  Change its value from `airflow.utils.net.getfqdn` to `socket.getfqdn`.
4.  Save the file and restart your Airflow components.
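The same fix can be applied non-interactively. Below is a minimal sketch with GNU `sed` (as shipped with Ubuntu). `fix_hostname_callable` is a hypothetical name, and a `.bak` copy of the original file is kept.

```bash
#!/usr/bin/env bash
# fix_hostname_callable FILE
# Hypothetical helper: replaces the 2.4+ default "airflow.utils.net.getfqdn"
# with "socket.getfqdn", which Airflow 2.3 can import. Keeps FILE.bak.
fix_hostname_callable() {
  local cfg=$1
  sed -i.bak \
    's/^\(hostname_callable[[:space:]]*=[[:space:]]*\)airflow\.utils\.net\.getfqdn[[:space:]]*$/\1socket.getfqdn/' \
    "$cfg"
  # Show the resulting line so the change can be verified at a glance.
  grep -n '^hostname_callable' "$cfg"
}
```

Usage: `fix_hostname_callable "$AIRFLOW_HOME/airflow.cfg"`.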

-----

### 2\. `sqlalchemy.exc.OperationalError: (MySQLdb.OperationalError) (1054, "Unknown column 'task_instance.run_id'")`

**Description:** When you try to run the Airflow scheduler or other components, they fail with an error stating that the `run_id` column is not found in the `task_instance` table.

**Cause:** This error indicates a schema mismatch. The installed Airflow code expects the database schema to be at a newer revision than the one it is on: the `run_id` column was added to `task_instance` by a migration introduced in Airflow 2.2, and that migration has not yet been applied to this database.

**Solution:**

1.  Run the database migration command to update the schema to the correct version.
    ```bash
    airflow db upgrade
    ```
2.  Once the command finishes successfully, you can start the Airflow scheduler and webserver.

-----

### 3\. `sqlalchemy.exc.OperationalError: (MySQLdb.OperationalError) (1061, "Duplicate key name 'idx_job_state_heartbeat'")`

**Description:** The `airflow db upgrade` command fails with a "Duplicate key name" error when attempting to add an index.

**Cause:** This indicates a corrupted or incomplete database migration. MySQL cannot roll back schema changes (DDL statements commit implicitly), so when a previous `airflow db upgrade` run was interrupted after creating the index but before recording the migration as complete, the index stayed behind. Airflow's migration history does not know about it, so the next upgrade tries to create it again.

**Solution:**

1.  Connect to your MySQL database.
    ```bash
    mysql -u airflow_user -p airflow_db
    ```
2.  Manually drop the duplicate index.
    ```sql
    ALTER TABLE job DROP INDEX idx_job_state_heartbeat;
    ```
3.  Exit MySQL and re-run the `airflow db upgrade` command.

-----

### 4\. `sqlalchemy.exc.OperationalError: (MySQLdb.OperationalError) (1060, "Duplicate column name 'extra'")`

**Description:** The `airflow db upgrade` command fails with a "Duplicate column name" error when trying to add a new column to a table.

**Cause:** Similar to the previous error, this is a sign of a partial database migration. The `extra` column was added to the `log` table, but the migration script was interrupted before completing.

**Solution:**

1.  Connect to your MySQL database.
    ```bash
    mysql -u airflow_user -p airflow_db
    ```
2.  Manually drop the duplicate column.
    ```sql
    ALTER TABLE log DROP COLUMN extra;
    ```
3.  Exit MySQL and re-run the `airflow db upgrade` command.

-----

### General Recommendation: Re-initialization

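When faced with persistent or complex schema issues, the safest and most effective solution is to **start with a clean database**. This avoids any lingering inconsistencies and lets `airflow db init` run from scratch. Be aware that it deletes all Airflow metadata (DAG run history, connections, variables, and users), so recreate the admin user afterwards.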

**How to re-initialize your Airflow database:**

1.  **Drop and re-create the database** (as the MySQL root user):
    ```bash
    sudo mysql -u root <<'SQL'
    DROP DATABASE IF EXISTS airflow_db;
    CREATE DATABASE airflow_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user'@'localhost';
    FLUSH PRIVILEGES;
    SQL
    ```
2.  **Initialize Airflow database**:
    ```bash
    airflow db init
    ```