## Airflow sensors

#### Sensor details 
Derived from airflow.sensors.base.BaseSensorOperator
- Sensor arguments:
  - mode - How to check for the condition
    - mode='poke' - The default; holds the task slot and checks repeatedly
    - mode='reschedule' - Gives up the task slot between checks and tries again later
  - poke_interval - How long to wait between checks, in seconds
  - timeout - How long to wait, in seconds, before failing the task
- Also includes normal operator attributes
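
To make the roles of poke_interval and timeout concrete, here is a minimal plain-Python sketch of a poke-style wait loop. It is not Airflow's implementation; the function name wait_for and its return value are hypothetical, chosen only for illustration.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool],
             poke_interval: float = 60.0,
             timeout: float = 3600.0) -> int:
    """Check `condition` repeatedly until it returns True.

    Returns the number of checks made. Raises TimeoutError once
    `timeout` seconds have passed without the condition being met.
    """
    started = time.monotonic()
    pokes = 0
    while True:
        pokes += 1
        if condition():
            return pokes
        if time.monotonic() - started >= timeout:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Wait before the next check, as poke_interval does for a sensor
        time.sleep(poke_interval)
```

A loop like this occupies its caller for the whole wait, which mirrors why mode='poke' holds a task slot and why mode='reschedule' exists for long waits.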

#### File sensor 
- Is part of the airflow.sensors.filesystem module
- Checks for the existence of a file at a certain location
- Can also check if any files exist within a directory

In [None]:
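from airflow.sensors.filesystem import FileSensor

# Check for salesdata.csv every 300 seconds
file_sensor_task = FileSensor(task_id='file_sense',
                              filepath='salesdata.csv',
                              poke_interval=300,
                              dag=sales_report_dag)

# sales_report_dag, init_sales_cleanup and generate_report are assumed to be defined elsewhere
init_sales_cleanup >> file_sensor_task >> generate_report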
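
The condition a file sensor evaluates on each check can be approximated in plain Python. This is a rough sketch under the assumption that a path counts as present if it is a file, or a directory holding at least one file; the helper name file_condition_met is hypothetical and this is not the library's own code.

```python
import os
from glob import glob


def file_condition_met(filepath: str) -> bool:
    """Return True if `filepath` matches a file, or a directory containing a file."""
    for path in glob(filepath):
        if os.path.isfile(path):
            return True
        if os.path.isdir(path):
            # Walk the directory tree and stop at the first file found
            for _root, _dirs, files in os.walk(path):
                if files:
                    return True
    return False
```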

## Airflow executors 

#### What is an executor?
- Executors run tasks 
- Different executors handle running the tasks differently
- Example executors:
  - SequentialExecutor
  - LocalExecutor
  - KubernetesExecutor
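
As a toy illustration of how executors can differ, the sketch below runs the same set of tasks one at a time and then through a worker pool. It is a loose analogy in plain Python, not how Airflow's executors are implemented; run_sequentially and run_in_pool are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_sequentially(tasks: Dict[str, Callable[[], object]]) -> Dict[str, object]:
    """Run each task in turn, one at a time."""
    return {task_id: task() for task_id, task in tasks.items()}


def run_in_pool(tasks: Dict[str, Callable[[], object]],
                max_workers: int = 4) -> Dict[str, object]:
    """Submit all tasks to a pool of workers and collect their results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {task_id: pool.submit(task) for task_id, task in tasks.items()}
        return {task_id: future.result() for task_id, future in futures.items()}
```

Both functions produce the same results for independent tasks; what changes is how many tasks can be in progress at once.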


In [None]:
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.sensors.filesystem import FileSensor
from datetime import datetime

report_dag = DAG(
    dag_id = 'execute_report',
    schedule_interval = "0 0 * * *"
)

precheck = FileSensor(
    task_id='check_for_datafile',
    filepath='salesdata_ready.csv',
    start_date=datetime(2024,1,20),
    mode='reschedule',
    dag=report_dag
)

generate_report_task = BashOperator(
    task_id='generate_report',
    bash_command='generate_report.sh',
    start_date=datetime(2024,1,20),
    dag=report_dag
)

precheck >> generate_report_task

## Debugging and troubleshooting in Airflow

#### Two quick methods: 
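- Run airflow dags list-import-errors to list DAG files that failed to import
- Run python3 <dagfile.py> to surface syntax and import errors directly in the terminal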
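
A similar check can be scripted. The sketch below compiles DAG source text without executing it, so it only catches syntax errors; running the file with python3 also exercises its imports. The function name find_syntax_error is hypothetical.

```python
from typing import Optional


def find_syntax_error(source: str, filename: str = "<dagfile.py>") -> Optional[str]:
    """Compile `source` without running it; return an error description or None."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return f"{filename}:{err.lineno}: {err.msg}"
    return None
```

To check a real file, read its contents and pass them in along with the file's path as filename.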

## SLAs and reporting in Airflow

#### Defining SLAs 
- Using the 'sla' argument on the task

In [None]:
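from datetime import datetime, timedelta

# On an individual task (assumes a DAG object named dag already exists)
task1 = BashOperator(task_id='sla_task',
                     bash_command='runcode.sh',
                     sla=timedelta(seconds=30),
                     dag=dag)

# On the default_args dictionary, applied to every task in the DAG
default_args={'sla': timedelta(minutes=20),
              'start_date': datetime(2023,2,20)
             }

dag = DAG('sla_dag', default_args=default_args)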
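
What it means to miss an SLA can be sketched as simple date arithmetic. This assumes the SLA is measured from the start of the DAG run, not from the start of the individual task; the function name sla_missed is hypothetical and Airflow's own bookkeeping is more involved.

```python
from datetime import datetime, timedelta


def sla_missed(dag_run_start: datetime, finished_at: datetime, sla: timedelta) -> bool:
    """Return True if the task finished later than dag_run_start + sla."""
    return finished_at > dag_run_start + sla
```

With sla=timedelta(seconds=30), a task finishing 45 seconds after the run started would count as a miss, while one finishing after 20 seconds would not.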

### Defining an SLA
You've successfully deployed several Airflow workflows to production, but you don't currently have any way to determine whether a workflow takes too long to run. After consulting with your manager and your team, you decide to implement an SLA at the DAG level on a test workflow.

All appropriate Airflow libraries have been imported for you.

In [None]:
# Import the timedelta object
from datetime import timedelta

# Create the dictionary entry
default_args = {
  'start_date': datetime(2024, 1, 20),
  'sla': timedelta(minutes=30)
}

# Add to the DAG
test_dag = DAG('test_workflow', default_args=default_args, schedule_interval=None)

### Defining a task SLA
After completing the SLA on the entire workflow, you realize you really only need the SLA timing on a specific task instead of the full workflow.

The appropriate Airflow libraries are imported for you.

In [None]:
# Import the timedelta object
from datetime import timedelta

test_dag = DAG('test_workflow', start_date=datetime(2024,1,20), schedule_interval=None)

# Create the task with the SLA
task1 = BashOperator(task_id='first_task',
                     sla=timedelta(hours=3),
                     bash_command='initialize_data.sh',
                     dag=test_dag)

### Generate and email a report
Airflow provides the ability to automate almost any style of workflow. You would like to receive a report from Airflow when tasks complete without requiring constant monitoring of the UI or log files. You decide to use the email functionality within Airflow to provide this message.

All the typical Airflow components have been imported for you, and a DAG is already defined as report_dag.

In [None]:
# Define the email task
email_report = EmailOperator(
        task_id='email_report',
        to='airflow@datacamp.com',
        subject='Airflow Monthly Report',
        html_content="""Attached is your monthly workflow report - please refer to it for more detail""",
        files=['monthly_report.pdf'],
        dag=report_dag
)

# Set the email task to run after the report is generated
email_report << generate_report

### Adding status emails
You've worked through most of the Airflow configuration for setting up your workflows, but you realize you're not getting any notifications when DAG runs complete or fail. You'd like to set up email alerting for the success and failure cases, and you want the alerts sent to two addresses.

In [None]:
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.sensors.filesystem import FileSensor
from datetime import datetime

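default_args={
    'email': ['airflowalerts@datacamp.com','airflowadmin@datacamp.com'],
    'email_on_failure': True,
    # Kept from the exercise: stock Airflow operators define email_on_failure and
    # email_on_retry, so email_on_success may be ignored outside the course environment
    'email_on_success': True
}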

report_dag = DAG(
    dag_id = 'execute_report',
    schedule_interval = "0 0 * * *",
    default_args=default_args
)

precheck = FileSensor(
    task_id='check_for_datafile',
    filepath='salesdata_ready.csv',
    start_date=datetime(2023,2,20),
    mode='reschedule',
    dag=report_dag)

generate_report_task = BashOperator(
    task_id='generate_report',
    bash_command='generate_report.sh',
    start_date=datetime(2023,2,20),
    dag=report_dag
)

precheck >> generate_report_task
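
For the success case, one option Airflow offers is a function passed to a task as on_success_callback, which is called with a context dictionary. The sketch below covers only the message-building part, assuming the context holds a 'task_instance' entry with task_id and dag_id attributes; build_success_message and the stand-in context are hypothetical.

```python
from types import SimpleNamespace


def build_success_message(context: dict) -> str:
    """Build a short notification line from a task's callback context."""
    ti = context["task_instance"]
    return f"Task {ti.task_id} in DAG {ti.dag_id} succeeded"


# Stand-in for the context Airflow would pass; a real context holds many more keys
fake_context = {
    "task_instance": SimpleNamespace(task_id="generate_report",
                                     dag_id="execute_report")
}
print(build_success_message(fake_context))
# prints "Task generate_report in DAG execute_report succeeded"
```

A callback built this way could hand the message to whatever delivery mechanism is available, such as email.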
