@task.sensor does not receive injected values #33121

Closed
scr-oath opened this issue Aug 4, 2023 · 2 comments
Labels
area:core, kind:bug, needs-triage

Comments

scr-oath commented Aug 4, 2023

Apache Airflow version

Other Airflow 2 version (please specify below)

What happened

Airflow version: MWAA (2.5.1)
Context/use-case: migrating from Oozie, where several coordinated workflows operate for a given day and look for their "inputs" before starting work.

I tried scheduling both the producer and the consumer @daily, with the consumer starting from a @task.sensor that looks for a _SUCCESS file (in S3) for its ds/datestamp.

I'm told that the v1 style of sensor-as-object supports this. I haven't tried it yet; I went with the v2 style from the get-go as the more natural, modern approach, and was very surprised to find that context variables are not injected into @task.sensor functions the way they are for @task functions.

import pendulum
from airflow.decorators import dag, task
from airflow.sensors.base import PokeReturnValue


@dag(
    schedule='@daily',
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    tags=['triggering', 'datasets']
)
def ds_consumer():
    @task.sensor(poke_interval=60, timeout=86400, mode='reschedule')
    def wait_for_upstream(ds=None) -> PokeReturnValue:
        assert ds is not None
        # ... always hits ^ assertion
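
For reference, the v1-style sensor-as-object approach would presumably look roughly like the sketch below (untested here; the bucket and key values are illustrative, and it assumes S3KeySensor templating bucket_key so that {{ ds }} is rendered at runtime):

# Classic (v1-style) sensor object -- a sketch, not verified in this setup.
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

wait_for_upstream = S3KeySensor(
    task_id='wait_for_upstream',
    bucket_name='mybucket',                              # illustrative
    bucket_key='path/to/my/dataset/{{ ds }}/_SUCCESS',   # templated at runtime
    poke_interval=60,
    timeout=86400,
    mode='reschedule',
)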

What you think should happen instead

@task.sensor-decorated functions should support injected kwargs in the same way as @task-decorated functions.
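
For contrast, a plain @task function already receives these kwargs; a minimal sketch:

from airflow.decorators import task

@task
def plain_task(ds=None):
    assert ds is not None  # passes: `ds` is injected for @task-decorated functions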

How to reproduce

Write a simple DAG with:

import pendulum
from airflow.datasets import Dataset
from airflow.decorators import task
from airflow.models.dag import dag
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.sensors.base import PokeReturnValue
from urllib3.util import parse_url, Url

my_dataset = Dataset('s3://mybucket/path/to/my/dataset/')


@dag(
    schedule='@daily',
    start_date=pendulum.datetime(2023, 8, 1, tz="UTC"),
    tags=['triggering', 'datasets']
)
def task_sensor_reduction():
    @task.sensor(poke_interval=60, timeout=86400, mode='reschedule')
    def wait_for_upstream(ds=None) -> PokeReturnValue:
        assert ds is not None
        my_dataset_url: Url = parse_url(my_dataset.uri)
        my_key: str = my_dataset_url.path
        my_key = my_key.removeprefix('/').removesuffix('/') + '/' + ds
        my_bucket = my_dataset_url.hostname
        s3 = S3Hook()
        done = s3.check_for_key(key=my_key, bucket_name=my_bucket)
        return PokeReturnValue(is_done=done, xcom_value=ds)

    @task
    def do_work(ds):
        pass  # ...

    upstream_task = wait_for_upstream()
    work_task = do_work(upstream_task)


dag = task_sensor_reduction()

if __name__ == '__main__':
    dag.test()

and see:

[2023-08-04, 16:54:26 UTC] {taskinstance.py:1768} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/sensors/base.py", line 199, in execute
    poke_return = self.poke(context)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/decorators/sensor.py", line 60, in poke
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/usr/local/airflow/dags/task_sensor_reduction.py", line 24, in wait_for_upstream
    assert ds is not None
AssertionError
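
The traceback shows where this goes wrong: DecoratedSensorOperator.poke (decorators/sensor.py, line 60 above) calls the wrapped function with only the user-supplied op_args/op_kwargs and never merges the runtime context in. Below is a sketch of the missing merge step, modeled on how PythonSensor.poke handles it -- an assumption about the shape of a fix, not an actual patch:

from airflow.utils.context import context_merge
from airflow.utils.operator_helpers import determine_kwargs

def poke(self, context):
    # Fold op_kwargs into the runtime context, then keep only the kwargs
    # the callable's signature actually accepts, so `ds` etc. get injected.
    context_merge(context, self.op_kwargs)
    kwargs = determine_kwargs(self.python_callable, self.op_args, context)
    return self.python_callable(*self.op_args, **kwargs)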

Operating System

MWAA 2.5.1

Versions of Apache Airflow Providers

MWAA 2.5.1

Deployment

Amazon (AWS) MWAA

Deployment details

latest version 2.5.1

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

scr-oath added the area:core, kind:bug, and needs-triage labels on Aug 4, 2023

josh-fell (Contributor) commented Aug 4, 2023

This is fixed as of Airflow 2.5.2 via #29146. I confirmed the repro on 2.5.1 and the fix on 2.5.2+ (including 2.6.0 and 2.6.3).

@scr-oath If you are unable to upgrade to a more recent version of Airflow, you can use the get_current_context() function to access the execution context in your sensor task:

...
from airflow.operators.python import get_current_context

@dag(
    schedule="@daily",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    tags=["triggering", "datasets"],
    catchup=False,
)
def ds_consumer():
    @task.sensor(poke_interval=60, timeout=86400, mode="reschedule")
    def wait_for_upstream() -> PokeReturnValue:
        ds = get_current_context()["ds"]
        assert ds is not None

...
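
Applied to the repro sensor above, the workaround would look something like this (a sketch; note that get_current_context() only works while the task is actually running):

@task.sensor(poke_interval=60, timeout=86400, mode="reschedule")
def wait_for_upstream() -> PokeReturnValue:
    ds = get_current_context()["ds"]  # pull `ds` from the runtime context
    my_dataset_url: Url = parse_url(my_dataset.uri)
    my_key = my_dataset_url.path.removeprefix("/").removesuffix("/") + "/" + ds
    done = S3Hook().check_for_key(key=my_key, bucket_name=my_dataset_url.hostname)
    return PokeReturnValue(is_done=done, xcom_value=ds)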

scr-oath (Author) commented Aug 4, 2023

Awesome - thanks!
