Here’s a breakdown and **comparison** of **templated vs. non-templated tasks** in Airflow, with a focus on the `BashOperator`:

---

### 🔧 What Are Airflow Templates?
- Templates in Airflow let you **dynamically substitute values at runtime** using [Jinja templating](https://jinja.palletsprojects.com/).
- You use double curly braces `{{ }}` for substitution.
- This keeps your DAGs **cleaner**, **more flexible**, and **scalable**.
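
To see what the substitution looks like outside Airflow, here is a minimal sketch that renders a template string directly with the `jinja2` library, which Airflow builds on. The `params` dictionary and the file name are illustrative assumptions; inside a DAG, Airflow performs this rendering for you just before the task runs.

```python
from jinja2 import Template

# A command containing a Jinja placeholder, as you would pass to bash_command
command = Template("echo Reading {{ params.filename }}")

# Render it by hand with an illustrative params dict
rendered = command.render(params={"filename": "file1.txt"})
print(rendered)
```

The rendered string is what the shell would actually receive.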

---

### 🛑 Non-Templated Example

If you need to log:
```
Reading file1.txt
Reading file2.txt
```
You’d need **2 separate tasks** manually:

```python
task1 = BashOperator(
    task_id='read_file1',
    bash_command='echo Reading file1.txt',
    dag=dag,
)

task2 = BashOperator(
    task_id='read_file2',
    bash_command='echo Reading file2.txt',
    dag=dag,
)
```

> ❌ Hard to maintain if you have 100+ files  
> ❌ Repetitive code  
> ❌ Not flexible

---

### ✅ Templated Example

You define a **template string** with a placeholder using Jinja:

```python
templated_command = 'echo Reading {{ params.filename }}'
```

Now reuse it:

```python
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG(
    'templated_bash_operator',
    start_date=datetime(2025, 4, 1),
    schedule_interval=None,
    catchup=False,
) as dag:

    task1 = BashOperator(
        task_id='read_file1',
        bash_command=templated_command,
        params={'filename': 'file1.txt'}
    )

    task2 = BashOperator(
        task_id='read_file2',
        bash_command=templated_command,
        params={'filename': 'file2.txt'}
    )

    task1 >> task2
```

> ✅ Cleaner, reusable logic  
> ✅ Only the `params` need to change  
> ✅ Easy to loop/generate dynamically if needed

---

### 🔄 Bonus: Looping Over Files (Dynamic Task Mapping in Airflow 2.3+)

```python
from datetime import datetime

from airflow.decorators import dag, task
from airflow.operators.bash import BashOperator

files = ['file1.txt', 'file2.txt', 'file3.txt']

@dag(start_date=datetime(2025, 4, 1), schedule_interval=None, catchup=False)
def dynamic_bash_template():

    @task
    def build_commands():
        # One shell command per file; each becomes its own mapped task instance.
        # Mapped values are not rendered as Jinja templates, so build plain strings.
        return [f'echo Reading {name}' for name in files]

    # partial() holds the fixed arguments; expand() creates one mapped
    # task instance per element of the list at run time
    BashOperator.partial(task_id='read_file').expand(
        bash_command=build_commands()
    )

dag = dynamic_bash_template()
```

> ✅ Fully dynamic with Airflow 2.3+ task mapping!

---


### **Creating a templated BashOperator**
You've successfully created a BashOperator that cleans a given data file by executing a script called cleandata.sh. This works, but unfortunately requires the script to be run only for the current day. Some of your data sources are occasionally behind by a couple of days and need to be run manually.

You successfully modify the cleandata.sh script to take one argument - the date in YYYYMMDD format. Your testing works at the command-line, but you now need to implement this into your Airflow DAG. For now, use the term {{ ds_nodash }} in your template - you'll see exactly what this means later on.

```python
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

default_args = {
  'start_date': datetime(2023, 4, 15),
}

cleandata_dag = DAG('cleandata',
                    default_args=default_args,
                    schedule_interval='@daily')

# Create a templated command to execute
# 'bash cleandata.sh datestring'
templated_command = """
bash cleandata.sh {{ ds_nodash }}
"""

# Modify clean_task to use the templated command
clean_task = BashOperator(task_id='cleandata_task',
                          bash_command=templated_command,
                          dag=cleandata_dag)
```
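
As a rough illustration of what `{{ ds_nodash }}` expands to, the sketch below formats a date both ways with the standard library. The date used here is a hypothetical logical date for one DAG run; Airflow supplies the real value at run time.

```python
from datetime import date

# Hypothetical logical date of a single DAG run
logical_date = date(2023, 4, 15)

ds = logical_date.strftime("%Y-%m-%d")       # format used by {{ ds }}
ds_nodash = logical_date.strftime("%Y%m%d")  # format used by {{ ds_nodash }}

# The command string as it would look after the template is rendered
rendered_command = f"bash cleandata.sh {ds_nodash}"
print(ds, ds_nodash, rendered_command)
```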


### **Templates with multiple arguments**
You wish to build upon your previous DAG and modify the code to support two arguments - the date in YYYYMMDD format, and a file name passed to the cleandata.sh script.

```python
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

default_args = {
  'start_date': datetime(2023, 4, 15),
}

cleandata_dag = DAG('cleandata',
                    default_args=default_args,
                    schedule_interval='@daily')

# Modify the templated command to handle a
# second argument called filename.
templated_command = """
  bash cleandata.sh {{ ds_nodash }} {{ params.filename }}
"""

# Modify clean_task to pass the new argument
clean_task = BashOperator(task_id='cleandata_task',
                          bash_command=templated_command,
                          params={'filename': 'salesdata.txt'},
                          dag=cleandata_dag)

# Create a new BashOperator clean_task2
clean_task2 = BashOperator(task_id='cleandata_task2',
                           bash_command=templated_command,
                           params={'filename': 'supportdata.txt'},
                           dag=cleandata_dag)

# Set the operator dependencies
clean_task >> clean_task2
```



---

### 🐍 1. **Templated `PythonOperator` Example**

The `PythonOperator` runs Python functions and can access context variables like `ds`, `ts`, etc., which can be templated.

#### ✅ Example: Use `ds` (execution date) in a Python function

```python
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def print_execution_date(run_date):
    # run_date receives the rendered value of '{{ ds }}' from op_args.
    # It is not named execution_date because that is a reserved context key.
    print(f"Running DAG on: {run_date}")

with DAG(
    dag_id='templated_python_operator',
    start_date=datetime(2025, 4, 1),
    schedule_interval='@daily',
    catchup=False,
) as dag:

    task = PythonOperator(
        task_id='print_execution_date',
        python_callable=print_execution_date,
        op_args=['{{ ds }}'],  # Jinja templating in op_args
        # provide_context is unnecessary in Airflow 2; context is passed automatically
    )
```

> 🧠 `{{ ds }}` is replaced by the current DAG execution date (e.g. `2025-04-11`)

---

### 📧 2. **Templated `EmailOperator` Example**

The `EmailOperator` sends emails using Airflow’s configured email backend. You can use templating to customize the subject or body.

#### ✅ Example: Use templated subject and HTML body

```python
from airflow import DAG
from airflow.operators.email import EmailOperator
from datetime import datetime

with DAG(
    dag_id='templated_email_operator',
    start_date=datetime(2025, 4, 1),
    schedule_interval=None,
    catchup=False,
) as dag:

    send_email = EmailOperator(
        task_id='send_email_task',
        to='team@example.com',
        subject='Report for {{ ds }}',
        html_content="""
            <h3>Daily Report</h3>
            <p>The DAG ran on {{ ds }} and completed successfully.</p>
        """,
    )
```

> 📨 Output will be:
> - **Subject:** Report for 2025-04-11
> - **Body:** HTML showing that the DAG ran successfully on that day

---

### 🧩 Summary

| Operator        | What’s Templated?                     | Useful Context Vars     | Notes |
|----------------|----------------------------------------|--------------------------|-------|
| `BashOperator`  | Shell command (`bash_command`)         | `{{ ds }}`, `{{ params }}` | Clean logging, flexible tasks |
| `PythonOperator`| Function args (`op_args`, `op_kwargs`) | `{{ ds }}`, `{{ ts }}`, `{{ dag }}` | Airflow 2 passes context automatically; `provide_context` is no longer needed |
| `EmailOperator` | `to`, `subject`, `html_content`, `files` | `{{ ds }}`, `{{ task_instance }}` | Useful for alerts and reports |

---



---

### 🔁 1. **Advanced Templating with `for` Loops in Jinja**

#### ✅ Use Case:
Instead of creating multiple tasks for multiple filenames, iterate through a list in a single task using Jinja's `for` loop.

#### 🧪 Example:

```python
templated_command = """
{% for filename in params.filenames %}
    echo Reading {{ filename }}
{% endfor %}
"""

read_files = BashOperator(
    task_id='read_multiple_files',
    bash_command=templated_command,
    params={'filenames': ['file1.txt', 'file2.txt']},
    dag=dag
)
```

> 🧠 This creates a single task that logs:
> ```
> Reading file1.txt
> Reading file2.txt
> ```

> ⚡ **Scales well** for 100+ files without needing 100 tasks!
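
To preview the script that the loop produces, you can render the same template by hand. This is a sketch using the `jinja2` library directly rather than Airflow's own rendering path, and the `params` value is supplied manually as an assumption.

```python
from jinja2 import Template

templated_command = """
{% for filename in params.filenames %}
    echo Reading {{ filename }}
{% endfor %}
"""

rendered = Template(templated_command).render(
    params={"filenames": ["file1.txt", "file2.txt"]}
)

# The loop tags leave blank lines behind; bash ignores them, so drop them for display
commands = [line.strip() for line in rendered.splitlines() if line.strip()]
print(commands)
```

Each entry in `commands` is one line of the shell script that the single task executes.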

---

### 📅 2. **Built-in Template Variables**

Airflow injects special runtime context variables into templates. These include:

| Variable          | Description                                           | Example Output     |
|-------------------|-------------------------------------------------------|---------------------|
| `{{ ds }}`         | DAG run date (YYYY-MM-DD)                            | `2025-04-11`        |
| `{{ ds_nodash }}`  | Same as `ds` but without dashes                      | `20250411`          |
| `{{ prev_ds }}`    | Previous DAG run date                                | `2025-04-10`        |
| `{{ prev_ds_nodash }}` | Previous DAG run date without dashes            | `20250410`          |
| `{{ dag }}`        | Current DAG object                                   | `<DAG object>`      |
| `{{ conf }}`       | Airflow config values (if needed)                   | `<config>`          |

> 🔍 The date variables (`ds`, `ds_nodash`, `prev_ds`, `prev_ds_nodash`) are **strings**, not datetime objects; `dag` and `conf` are objects.

---

### 🧮 3. **Airflow Macros**

The `macros` variable allows access to helper functions and Python standard objects inside templates.

#### 📦 Key Macros:

| Macro                     | Description                                                | Example                             |
|--------------------------|------------------------------------------------------------|-------------------------------------|
| `macros.datetime`         | Python's `datetime.datetime` class                         | Use for date formatting             |
| `macros.timedelta`        | Python's `datetime.timedelta` class                        | Useful for date math                |
| `macros.uuid`             | Python's `uuid` module, used to generate UUIDs             | `1b4e28ba-2fa1-11d2-883f-0016d3cca427` |
| `macros.ds_add(ds, n)`    | Adds or subtracts days from a `YYYY-MM-DD` date string     | `macros.ds_add('2020-04-18', 2)` → `2020-04-20` |
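
To make the date math concrete, the sketch below is a standard-library stand-in that mirrors what `macros.ds_add` does with a `YYYY-MM-DD` string. The function defined here is an illustration written for these notes, not Airflow's own source.

```python
from datetime import datetime, timedelta

def ds_add(ds: str, days: int) -> str:
    """Stand-in for macros.ds_add: shift a YYYY-MM-DD string by `days` days."""
    shifted = datetime.strptime(ds, "%Y-%m-%d") + timedelta(days=days)
    return shifted.strftime("%Y-%m-%d")

yesterday = ds_add("2025-04-11", -1)     # one day before the run date
next_month = ds_add("2025-04-30", 2)     # crosses a month boundary
print(yesterday, next_month)
```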

---

### 📘 Reference
🔗 Official Airflow Macros Documentation:  
👉 [https://airflow.apache.org/docs/stable/macros-ref.html](https://airflow.apache.org/docs/stable/macros-ref.html)

---

### 🧩 Summary

| Feature            | Benefit                                      | Use Case                              |
|--------------------|-----------------------------------------------|----------------------------------------|
| Jinja `for` loop   | Iterates over lists                          | Single task to handle multiple items   |
| Template variables | Access to DAG run context                    | Dynamic file naming, logs, etc.        |
| Macros             | Extend templating with Python-like logic     | Date math, UUID generation, etc.       |

---



---

## 🌍 Real-World Scenario:
> You’re building a DAG that processes daily log files for multiple departments. Each log file is named like `sales_{{ ds }}.log`, `marketing_{{ ds }}.log`, etc. You want to:
- Dynamically generate file paths using `ds`
- Iterate through departments
- Log the processing
- Use macros to calculate yesterday’s date and include it in the log

---

## 🛠️ DAG with Full Templating Power

### 📦 Requirements:
- Process files for multiple departments in a single BashOperator
- Use `{{ ds }}` for the current run date
- Use `macros.ds_add(ds, -1)` for yesterday’s date
- Iterate using a Jinja `for` loop

---

### 🧪 Code Example

```python
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

default_args = {
    'start_date': datetime(2025, 4, 10),
}

with DAG(
    'templated_log_processor',
    default_args=default_args,
    schedule_interval='@daily',
    catchup=False,
) as dag:

    departments = ['sales', 'marketing', 'finance']

    templated_command = """
    echo "Run date: {{ ds }}"
    echo "Yesterday was: {{ macros.ds_add(ds, -1) }}"
    
    {% for dept in params.departments %}
        echo "Processing {{ dept }} logs for date {{ ds }}..."
        echo "File: /logs/{{ dept }}/{{ ds }}.log"
        echo "Archived version: /archive/{{ dept }}/{{ macros.ds_add(ds, -1) }}.log"
    {% endfor %}
    """

    process_logs = BashOperator(
        task_id='process_daily_logs',
        bash_command=templated_command,
        params={'departments': departments},
    )
```

---

### 🧾 What It Does

✅ Logs the current DAG run date  
✅ Logs yesterday's date using a macro  
✅ Iterates through each department  
✅ Prints both current and archived log file paths

---

### 📤 Output (for 2025-04-11 DAG run):

```
Run date: 2025-04-11
Yesterday was: 2025-04-10
Processing sales logs for date 2025-04-11...
File: /logs/sales/2025-04-11.log
Archived version: /archive/sales/2025-04-10.log
Processing marketing logs for date 2025-04-11...
File: /logs/marketing/2025-04-11.log
Archived version: /archive/marketing/2025-04-10.log
Processing finance logs for date 2025-04-11...
File: /logs/finance/2025-04-11.log
Archived version: /archive/finance/2025-04-10.log
```

---

### 🧠 Learnings Recap:
| Feature Used       | Why It’s Useful                                      |
|--------------------|------------------------------------------------------|
| `{{ ds }}`         | Gets the DAG run date dynamically                    |
| `macros.ds_add()`  | Allows date math inside templates                    |
| Jinja `for` loop   | Avoids writing one task per department               |
| `params` dict      | Passes custom variables into templated commands      |

---



### **Using lists with templates**
Once again, you decide to make some modifications to the design of your cleandata workflow. This time, you realize that you need to run the command `cleandata.sh` with the date argument and the file argument as before, except now you have a list of 30 files. You do not want to create 30 tasks, so your job is to modify the code to support running the command for 30 or more files.

The Python list of files is already created for you, simply called filelist.

```python
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

filelist = [f'file{x}.txt' for x in range(30)]

default_args = {
  'start_date': datetime(2020, 4, 15),
}

cleandata_dag = DAG('cleandata',
                    default_args=default_args,
                    schedule_interval='@daily')

# Modify the template to handle multiple files in a
# single run. Jinja statements use {% ... %} delimiters.
templated_command = """
  {% for filename in params.filenames %}
  bash cleandata.sh {{ ds_nodash }} {{ filename }};
  {% endfor %}
"""

# Modify clean_task to use the templated command
clean_task = BashOperator(task_id='cleandata_task',
                          bash_command=templated_command,
                          params={'filenames': filelist},
                          dag=cleandata_dag)
```


### **Sending templated emails**
While reading through the Airflow documentation, you realize that various operations can use templated fields to provide added flexibility. You come across the docs for the EmailOperator and see that the content can be set to a template. You want to make use of this functionality to provide more detailed information regarding the output of a DAG run.

```python
from airflow import DAG
from airflow.operators.email import EmailOperator
from datetime import datetime

# Create the string representing the html email content
html_email_str = """
Date: {{ ds }}
Username: {{ params.username }}
"""

email_dag = DAG('template_email_test',
                default_args={'start_date': datetime(2023, 4, 15)},
                schedule_interval='@weekly')

email_task = EmailOperator(task_id='email_task',
                           to='testuser@datacamp.com',
                           subject="{{ macros.uuid.uuid4() }}",
                           html_content=html_email_str,
                           params={'username': 'testemailuser'},
                           dag=email_dag)
```
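
The subject template `{{ macros.uuid.uuid4() }}` calls into Python's standard `uuid` module, so the value it produces can be previewed with a plain Python call. This is a small sketch of the equivalent call, not the rendering Airflow itself performs.

```python
import uuid

# macros.uuid exposes the standard-library uuid module, so this is the same call
subject = str(uuid.uuid4())
print(subject)
```

Each render yields a different random identifier, which makes every email subject unique.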


### **Real-world DAG using branching**

---

## 🌍 Real-World Scenario:  
> You manage data quality checks. You want to run **different validation tasks on odd and even days**.  
For example:
- **Even days**: Run full data integrity check.
- **Odd days**: Run only basic null check.

You’ll use `BranchPythonOperator` to decide which path to follow.

---

## 🧠 Key Concepts Used:
| Feature              | Role                                                                 |
|----------------------|----------------------------------------------------------------------|
| `BranchPythonOperator` | Controls the flow based on logic (even/odd days)                     |
| `kwargs['ds_nodash']`  | Provides runtime date in `YYYYMMDD` format                          |
| `**kwargs` in the callable | Receives the runtime context; Airflow 2 passes it automatically, so `provide_context=True` is not needed |
| `>>` & `<<` (bitshift) | Sets task dependencies in a readable way                            |

---

## 🧪 Code Example:

```python
from airflow import DAG
from airflow.operators.python import PythonOperator, BranchPythonOperator
from airflow.operators.empty import EmptyOperator
from datetime import datetime

default_args = {
    'start_date': datetime(2025, 4, 10),
}

def choose_path(**kwargs):
    run_date = int(kwargs['ds_nodash'])
    if run_date % 2 == 0:
        return 'full_integrity_check'
    else:
        return 'null_check'

with DAG(
    'branching_data_quality_dag',
    default_args=default_args,
    schedule_interval='@daily',
    catchup=False,
) as dag:

    start = EmptyOperator(task_id='start')

    # Airflow 2 passes the context to the callable automatically,
    # so provide_context is not needed
    branch = BranchPythonOperator(
        task_id='branch_task',
        python_callable=choose_path,
    )

    full_integrity_check = EmptyOperator(task_id='full_integrity_check')
    full_check_followup = EmptyOperator(task_id='full_check_followup')

    null_check = EmptyOperator(task_id='null_check')
    null_check_followup = EmptyOperator(task_id='null_check_followup')

    # 'none_failed_min_one_success' is the current name for 'none_failed_or_skipped'
    join = EmptyOperator(task_id='join', trigger_rule='none_failed_min_one_success')

    # Set dependencies
    start >> branch
    branch >> full_integrity_check >> full_check_followup >> join
    branch >> null_check >> null_check_followup >> join
```

---

## 📤 What It Does:

- Runs `branch_task` after the `start` task
- If the day is **even** (e.g., 20250412), it:
  - Runs `full_integrity_check` → `full_check_followup`
- If the day is **odd** (e.g., 20250411), it:
  - Runs `null_check` → `null_check_followup`
- Both paths merge into the `join` task
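
Because the branch callable is an ordinary Python function, its decision can be checked without a scheduler. The sketch below repeats the callable's logic and calls it with hand-built context values; the two dates are illustrative.

```python
def choose_path(**kwargs):
    # ds_nodash looks like "20250412"; its last digit is the day's last digit,
    # so the parity of the whole number matches the parity of the day
    run_date = int(kwargs["ds_nodash"])
    return "full_integrity_check" if run_date % 2 == 0 else "null_check"

# Call it with hand-built context values instead of a real DAG run
even_day = choose_path(ds_nodash="20250412")
odd_day = choose_path(ds_nodash="20250411")
print(even_day, odd_day)
```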

---

## 📊 Airflow Graph View:

```plaintext
start
  |
branch_task
 /         \
full       null
 |           |
followup   followup
   \       /
     join
```

---

## 🔍 Optional Enhancements:

- Replace `EmptyOperator` with real logic (e.g., `PythonOperator`, `BashOperator`)
- Add logging inside each task to indicate which path is being followed
- Keep a trigger rule such as `none_failed_min_one_success` (the current name for `none_failed_or_skipped`) on `join` so it still runs when one path is skipped

---


### **Define a BranchPythonOperator**
After learning about the power of conditional logic within Airflow, you wish to test out the `BranchPythonOperator`. You'd like to run a different code path if the current execution date represents a new year (i.e., 2020 vs. 2019).

The DAG is defined for you, along with the tasks in question. Your current task is to implement the BranchPythonOperator.

```python
from airflow.operators.python import BranchPythonOperator

# branch_dag, current_year_task and new_year_task are defined elsewhere

# Create a function to determine if years are different
def year_check(**kwargs):
    current_year = int(kwargs['ds_nodash'][0:4])
    previous_year = int(kwargs['prev_ds_nodash'][0:4])
    if current_year == previous_year:
        return 'current_year_task'
    else:
        return 'new_year_task'

# Define the BranchPythonOperator
# (Airflow 2 passes the context automatically; provide_context is not needed)
branch_task = BranchPythonOperator(task_id='branch_task', dag=branch_dag,
                                   python_callable=year_check)
# Define the dependencies
branch_task >> current_year_task
branch_task >> new_year_task
```

### **Branch troubleshooting**
While working with a workflow defined by a colleague, you notice that a branching operator executes, but there's never any change in the DAG results. You realize that regardless of the state defined by the branching operator, all other tasks complete, even though some should be skipped.

Use what you've learned to determine the most likely reason that the branching operator is ineffective.

This kind of issue is common when using the `BranchPythonOperator` for the first time. Here's a **quick diagnosis and the most likely cause**:

---

## 🧠 **Root Cause: Downstream Tasks Aren’t Properly Connected to the Branch**
> **All tasks run** despite the branch decision because **they are not directly downstream of the BranchPythonOperator.**

---

## ✅ Branching Works **Only If:**
1. All downstream tasks from the branch are **direct children** of the `BranchPythonOperator`.
2. The non-selected tasks will only be skipped **if they are part of the branch’s downstream**.
3. Any task that is not explicitly part of the branching path **will still execute** as usual.

---

## 🔍 Visualizing the Wrong Setup:

```plaintext
start
  |
branch_task
  |
task_a     ← not directly downstream
task_b     ← not directly downstream
```

> Even if `branch_task` returns only `task_a`, `task_b` still runs because it's not controlled by the branch.

---

## ✅ Correct Setup (All Paths Flow Through Branch):

```plaintext
start
  |
branch_task
 /       \
task_a   task_b
```

> Now, if `branch_task` returns `'task_a'`, only `task_a` runs. `task_b` will be **skipped automatically.**
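
The rule can be illustrated with a toy model in plain Python. This is a deliberately simplified sketch, not Airflow's scheduler logic: the graph dictionaries and the `branch_states` helper are hypothetical, and the model only looks at direct children of the branch.

```python
def branch_states(downstream, branch_id, chosen):
    """Toy model: direct children of the branch that were not chosen are skipped."""
    return {
        task_id: ("run" if task_id == chosen else "skipped")
        for task_id in downstream.get(branch_id, [])
    }

# Correct setup: both tasks hang off the branch
correct = {"start": ["branch_task"], "branch_task": ["task_a", "task_b"]}
# Wrong setup: task_b is attached to start, so the branch never controls it
wrong = {"start": ["branch_task", "task_b"], "branch_task": ["task_a"]}

correct_states = branch_states(correct, "branch_task", chosen="task_a")
wrong_states = branch_states(wrong, "branch_task", chosen="task_a")
print(correct_states, wrong_states)
```

In the wrong setup `task_b` never appears in the branch's result, which corresponds to it running regardless of the branch decision.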

---

## ✔ Fix the Issue by:
- Making sure **every downstream path from the branch is explicitly connected**
- Using Airflow’s `.set_downstream()` or `>>` operators correctly

---

## 🧪 Example Fix in Code:

```python
branch >> task_a
branch >> task_b
```

---



### **Production pipeline in Airflow**:

| 🔢 | Topic                    | Command / Example                             | 🔍Explanation                                                                 |
|-----|---------------------------|------------------------------------------------|--------------------------------------------------------------------------------------|
| 1️⃣ | **Run a Task**           | `airflow tasks test <dag_id> <task_id> <date>` | Runs **one task** from a DAG for a specific date (for testing).                     |
| 2️⃣ | **Run a Full DAG**       | `airflow dags trigger -e <date> <dag_id>`     | Runs the **entire DAG** as if it were executed on that date.                        |
| 3️⃣ | **BashOperator**         | `bash_command="echo Hello"`                   | Runs Bash commands inside your DAG.                                                 |
| 4️⃣ | **PythonOperator**       | `python_callable=my_function`                | Runs a Python function.                                                             |
| 5️⃣ | **BranchPythonOperator**| Like PythonOperator; callable returns a `task_id` | Chooses **which task to run next** based on logic (e.g., even vs odd day).          |
| 6️⃣ | **FileSensor**           | `filepath='/data/file.csv'`                  | Waits until a file exists at the given location before continuing.                  |
| 7️⃣ | **Template Fields**      | Example: `bash_command="{{ ds }}"`            | Use **Jinja templates** to insert dynamic values like execution date.               |
| 8️⃣ | **Check Template Support**| `help(BashOperator)` in Python                | Shows which fields (like `bash_command`) can use templates (look for `template_fields`). |


### **Creating a production pipeline #1**
You've learned a lot about how Airflow works - now it's time to implement your workflow into a production pipeline consisting of many objects including sensors and operators. Your boss is interested in seeing this workflow become automated and able to provide SLA reporting as it provides some extra leverage for closing a deal the sales staff is working on. The sales prospect has indicated that once they see updates in an automated fashion, they're willing to sign-up for the indicated data service.

From what you've learned about the process, you know that there is sales data that will be uploaded to the system. Once the data is uploaded, a new file should be created to kick off the full processing, but something isn't working correctly.

Refer to the source code of the DAG to determine if anything extra needs to be added.

```python
from airflow import DAG
from airflow.sensors.filesystem import FileSensor

# Import the needed operators
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from datetime import date, datetime

def process_data(**context):
  # Write a marker file recording when processing ran
  with open('/home/repl/workspace/processed_data.tmp', 'w') as file:
    file.write(f'Data processed on {date.today()}')


dag = DAG(dag_id='etl_update', default_args={'start_date': datetime(2023,4,1)})

sensor = FileSensor(task_id='sense_file', 
                    filepath='/home/repl/workspace/startprocess.txt',
                    poke_interval=5,
                    timeout=15,
                    dag=dag)

bash_task = BashOperator(task_id='cleanup_tempfiles', 
                         bash_command='rm -f /home/repl/*.tmp',
                         dag=dag)

python_task = PythonOperator(task_id='run_processing', 
                             python_callable=process_data,
                             dag=dag)

sensor >> bash_task >> python_task
```
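
Conceptually, the `FileSensor` with `poke_interval` and `timeout` behaves like a polling loop. The sketch below is a plain-Python stand-in for that idea, not Airflow's implementation: `wait_for_file` is a hypothetical helper, the intervals are shortened so it runs quickly, and a real sensor fails the task on timeout rather than returning `False`.

```python
import os
import tempfile
import time

def wait_for_file(filepath, poke_interval, timeout):
    """Poll until filepath exists; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(filepath):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)

with tempfile.TemporaryDirectory() as workdir:
    trigger = os.path.join(workdir, "startprocess.txt")
    missing = wait_for_file(trigger, poke_interval=0.01, timeout=0.05)
    open(trigger, "w").close()  # simulate the upload step creating the file
    found = wait_for_file(trigger, poke_interval=0.01, timeout=0.05)

print(missing, found)
```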


### **Creating a production pipeline #2**
Continuing on your last workflow, you'd like to add some additional functionality, specifically adding some SLAs to the code and modifying the sensor components.

Refer to the source code of the DAG to determine if anything extra needs to be added. The default_args dictionary has been defined for you, though it may require further modification.

```python
from airflow import DAG
from airflow.sensors.filesystem import FileSensor
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from dags.process import process_data
from datetime import timedelta, datetime

# Update the default arguments and apply them to the DAG
default_args = {
  'start_date': datetime(2023,1,1),
  'sla': timedelta(minutes=90)
}

dag = DAG(dag_id='etl_update', default_args=default_args)

sensor = FileSensor(task_id='sense_file', 
                    filepath='/home/repl/workspace/startprocess.txt',
                    poke_interval=45,
                    dag=dag)

bash_task = BashOperator(task_id='cleanup_tempfiles', 
                         bash_command='rm -f /home/repl/*.tmp',
                         dag=dag)

# Airflow 2 passes the context automatically; provide_context is not needed
python_task = PythonOperator(task_id='run_processing', 
                             python_callable=process_data,
                             dag=dag)

sensor >> bash_task >> python_task
```
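
The `'sla': timedelta(minutes=90)` entry sets a time budget for each task. As a rough sketch of the comparison involved, the helper below checks a finish time against a reference start time plus the SLA. The `sla_missed` function and both timestamps are hypothetical, and exactly which reference time Airflow uses is an assumption you should confirm against the Airflow documentation.

```python
from datetime import datetime, timedelta

def sla_missed(run_start, finished_at, sla):
    """Return True if the task finished later than run_start + sla."""
    return finished_at > run_start + sla

sla = timedelta(minutes=90)
run_start = datetime(2023, 1, 1, 0, 0)  # hypothetical reference time

missed_when_early = sla_missed(run_start, datetime(2023, 1, 1, 1, 15), sla)
missed_when_late = sla_missed(run_start, datetime(2023, 1, 1, 2, 0), sla)
print(missed_when_early, missed_when_late)
```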


### **Adding the final changes to your pipeline**
To finish up your workflow, your manager asks that you add a conditional logic check to send a sales report via email, only if the day is a weekday. Otherwise, no email should be sent. In addition, the email task should be templated to include the date and a project name in the content.

The branch callable is already defined for you.

```python
from airflow import DAG
from airflow.sensors.filesystem import FileSensor
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.operators.python import BranchPythonOperator
from airflow.operators.empty import EmptyOperator
from airflow.operators.email import EmailOperator
from dags.process import process_data
from datetime import datetime, timedelta

# Update the default arguments and apply them to the DAG.

default_args = {
  'start_date': datetime(2023,1,1),
  'sla': timedelta(minutes=90)
}

dag = DAG(dag_id='etl_update', default_args=default_args)

sensor = FileSensor(task_id='sense_file', 
                    filepath='/home/repl/workspace/startprocess.txt',
                    poke_interval=45,
                    dag=dag)

bash_task = BashOperator(task_id='cleanup_tempfiles', 
                         bash_command='rm -f /home/repl/*.tmp',
                         dag=dag)

# Airflow 2 passes the context automatically; provide_context is not needed
python_task = PythonOperator(task_id='run_processing', 
                             python_callable=process_data,
                             dag=dag)

email_subject="""
  Email report for {{ params.department }} on {{ ds_nodash }}
"""

email_report_task = EmailOperator(task_id='email_report_task',
                                  to='sales@mycompany.com',
                                  subject=email_subject,
                                  html_content='',
                                  params={'department': 'Data subscription services'},
                                  dag=dag)

no_email_task = EmptyOperator(task_id='no_email_task', dag=dag)

def check_weekend(**kwargs):
    # 'ds' is the YYYY-MM-DD string; 'execution_date' is a datetime object
    # and cannot be passed to strptime
    dt = datetime.strptime(kwargs['ds'], "%Y-%m-%d")
    # If dt.weekday() is 0-4, it's Monday - Friday. If 5 or 6, it's Sat / Sun.
    if (dt.weekday() < 5):
        return 'email_report_task'
    else:
        return 'no_email_task'

branch_task = BranchPythonOperator(task_id='check_if_weekend',
                                   python_callable=check_weekend,
                                   dag=dag)

sensor >> bash_task >> python_task

python_task >> branch_task >> [email_report_task, no_email_task]
```
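
The weekday mapping is easy to sanity-check on its own. The sketch below is a variant of the branch callable that reads the `ds` string (format `YYYY-MM-DD`) from the context and is called with two hand-picked dates; the dates are illustrative and no Airflow installation is required.

```python
from datetime import datetime

def check_weekend(**kwargs):
    # ds is a YYYY-MM-DD string, so strptime can parse it directly
    dt = datetime.strptime(kwargs["ds"], "%Y-%m-%d")
    # weekday() is 0-4 for Monday-Friday and 5-6 for Saturday-Sunday
    return "email_report_task" if dt.weekday() < 5 else "no_email_task"

monday = check_weekend(ds="2023-01-02")    # a Monday
saturday = check_weekend(ds="2023-01-07")  # a Saturday
print(monday, saturday)
```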