@nailo2c
Contributor

@nailo2c nailo2c commented Nov 4, 2025

Closes: #50327

Why

    def remove_task_decorator(python_source: str, task_decorator_name: str) -> str:
        """
        Remove @task or similar decorators as well as @setup and @teardown.
        :param python_source: The python source code
        :param task_decorator_name: the decorator name
        """
        source_tree = cst.parse_module(python_source)
        modified_tree = source_tree.visit(_TaskDecoratorRemover(task_decorator_name))
        return modified_tree.code

The function remove_task_decorator uses cst.parse_module to parse the python source.

However, when the function source contains unusual indentation (e.g., a comment at column-0 inside an indented function), cst.parse_module fails and raises a cst.ParserSyntaxError.

    def b_task():
        print("hello")
##################
        print("more hello")

Because this exception is not caught, remove_task_decorator fails before it can strip anything. The @task.kubernetes decorator is therefore never removed and ends up in the generated /tmp/script.py.

This leads to the final error reported in the issue: NameError: name 'task' is not defined when the pod tries to execute the script.

How

Add a regex fallback for @task.kubernetes and handle the indentation issue.

What

[Image attachment]

@uranusjr
Member

uranusjr commented Nov 4, 2025

I feel this should be fixed in cst instead. Airflow shouldn’t try to work around this on its own.

@amoghrajesh
Contributor

I think I agree with @uranusjr here; this is more of a CST problem than an Airflow one? Working around it would be less than ideal here.

@potiuk
Member

potiuk commented Nov 4, 2025

I think it should be solved by us but elsewhere. The root cause is this function:

    def get_python_source(self):
        raw_source = inspect.getsource(self.python_callable)
        res = textwrap.dedent(raw_source)
        res = remove_task_decorator(res, self.custom_operator_name)
        return res

task-sdk/src/airflow/sdk/bases/decorator.py

We are dedenting the function source ourselves, and this is where the error is introduced.
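The effect described above can be reproduced with the standard library alone. The following is a minimal sketch, not the Airflow code path: it uses `ast.parse` as a stand-in for `cst.parse_module` (an assumption made so the example has no third-party dependency), and `raw_source` is a hand-written string imitating what `inspect.getsource` might return for a nested function.

```python
import ast
import textwrap

# Hand-written stand-in for inspect.getsource() output of a nested function
# whose body contains a comment at column 0.
raw_source = (
    "    def b_task():\n"
    '        print("hello")\n'
    "##################\n"
    '        print("more hello")\n'
)

# textwrap.dedent removes only the whitespace common to ALL non-blank lines.
# The column-0 comment has no leading whitespace, so the common margin is
# empty and nothing is removed.
dedented = textwrap.dedent(raw_source)
dedent_was_noop = dedented == raw_source

# The source still starts with an indented "def", which is not valid
# at module level, so parsing fails.
try:
    ast.parse(dedented)
    parse_error = None
except SyntaxError as exc:
    parse_error = exc
    print(f"{type(exc).__name__}: {exc.msg}")
```

The key point is that the dedent step silently does nothing here, so the parser is handed source that is still indented.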

@potiuk
Member

potiuk commented Nov 4, 2025

And here:

    def get_python_source(self):
        """Return the source of self.python_callable."""
        return textwrap.dedent(inspect.getsource(self.python_callable))

providers/standard/src/airflow/providers/standard/operators/python.py

@potiuk
Member

potiuk commented Nov 4, 2025

We should likely remove the lines containing only a comment before dedenting; that should fix the problem.
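A minimal sketch of that idea, assuming a hypothetical helper named `strip_comment_only_lines` (this is not the code that was merged) and again using `ast.parse` as a dependency-free stand-in for the CST parser. It is deliberately naive: any line whose first non-whitespace character is `#` is dropped, including such a line inside a multi-line string.

```python
import ast
import textwrap


def strip_comment_only_lines(source: str) -> str:
    """Hypothetical helper: drop lines whose first non-whitespace char is '#'."""
    return "".join(
        line
        for line in source.splitlines(keepends=True)
        if not line.lstrip().startswith("#")
    )


# Hand-written stand-in for inspect.getsource() output.
raw_source = (
    "    def b_task():\n"
    '        print("hello")\n'
    "##################\n"
    '        print("more hello")\n'
)

# With the column-0 comment gone, every remaining line shares the
# four-space margin, so dedent can remove it.
cleaned = textwrap.dedent(strip_comment_only_lines(raw_source))
module = ast.parse(cleaned)
```

Stripping before dedenting matters: the comment line has to be gone before `textwrap.dedent` computes the common margin.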

@uranusjr
Member

uranusjr commented Nov 5, 2025

Ah that makes sense, nice analysis @potiuk

@amoghrajesh
Contributor

Nice investigation @potiuk!

@ashb
Member

ashb commented Nov 5, 2025

Here's an idea out of left field: don't do any pre-processing or removing at all, but instead put this in the generated script:

from types import SimpleNamespace
task = SimpleNamespace(kubernetes=lambda f: f)

Then we could leave it in the script as:

@task.kubernetes
def my_fn(...): ...

# ...

try:
  my_fn(...)
  ...

That way we might not need to a) use cst to strip the decorator, or b) worry about dedenting anything?
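As a runnable sketch of this stub idea: the one-line lambda covers the bare `@task.kubernetes` form, while a decorator called with arguments would need a stub that also accepts them. The `_passthrough` helper below is a hypothetical extension for illustration, not something proposed in this thread, and the keyword arguments passed to it are placeholders that are simply ignored.

```python
from types import SimpleNamespace


def _passthrough(*args, **kwargs):
    """Hypothetical no-op decorator usable as @deco and as @deco(...)."""
    # Bare usage: the decorated function is the only positional argument.
    if len(args) == 1 and callable(args[0]) and not kwargs:
        return args[0]
    # Called usage: swallow the arguments and return an identity decorator.
    return lambda func: func


# Stand-in for the real `task` object inside the generated script.
task = SimpleNamespace(kubernetes=_passthrough)


@task.kubernetes
def plain(x):
    return x + 1


@task.kubernetes(image="python:3.12", namespace="default")
def configured(x):
    return x * 2
```

Both functions stay ordinary callables, so the generated script could invoke them directly without any source rewriting.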

@nailo2c
Contributor Author

nailo2c commented Nov 6, 2025

Hi folks, thanks for all the reviews. I believe we now have a better approach :)

The current version follows @potiuk’s suggestion: it removes lines that start with # in get_python_source.

Please let me know if there's anything else I can improve 💪

[Screenshot 2025-11-05, 1:12:03 PM]

@nailo2c nailo2c changed the title from "sdk: add regex fallback when removing task decorators (#50327)" to "sdk: Refactor get_python_source to strip comments (#50327)" on Nov 6, 2025

@potiuk
Member

potiuk commented Nov 6, 2025

@ashb

from types import SimpleNamespace
task = SimpleNamespace(kubernetes=lambda f: f)

Yeah, my thought exactly: we could keep the decorators but replace them with something empty. But I think indentation matters in this case too, because we would have to keep the functions at their nested indentation level in the generated script (those callbacks are currently converted to top-level functions). I am not sure that can be handled easily. I have no easy idea for that one, and I am not sure it's worth the effort.

Simply removing the comments has drawbacks, of course. It does not handle all cases: for example, it will not work with multi-line strings, and likely some other constructs.

    def b_task():
        print("""
hello
""")
        print("more hello")

In order to properly handle those constructs, we would really have to parse the whole code with AST and know which lines are supposed to be indented and which not.

For example, this case is not easy to handle without knowing that "hello 2" is part of a multi-line string; you need to parse the whole Python file to know.

    def b_task():
        print("""
hello

        hello 2
""")
        print("more hello")

A better way would simply be to forbid those cases and turn the cst.ParserSyntaxError into a more meaningful error (only use plain functions, do not break indentation). That would be simple: we extract the whitespace from the first line and reject any function that has lines that do not begin with the same whitespace prefix.

That would be a simple and effective solution (and backwards-compatible, since those functions currently fail with a parser exception anyway). We can easily add this limitation and even document it.
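A minimal sketch of such a check, under stated assumptions: the names `InconsistentIndentationError` and `check_indentation_prefix` are hypothetical, blank lines are skipped (a choice made here, not something specified in the thread), and the error message wording is illustrative only.

```python
import re


class InconsistentIndentationError(ValueError):
    """Hypothetical error type, used here only for illustration."""


def check_indentation_prefix(source: str) -> None:
    """Reject source with a line that lacks the first line's whitespace prefix."""
    lines = source.splitlines()
    # The first non-blank line (decorator or "def") defines the prefix.
    first = next((line for line in lines if line.strip()), "")
    prefix = re.match(r"[ \t]*", first).group(0)
    for lineno, line in enumerate(lines, start=1):
        # Assumption: blank or whitespace-only lines are always allowed.
        if line.strip() and not line.startswith(prefix):
            raise InconsistentIndentationError(
                f"line {lineno} does not start with the indentation of the "
                f"function definition: {line!r}"
            )


good = "    def b_task():\n        print('hello')\n"
bad = "    def b_task():\n        print('hello')\n####\n        print('more')\n"

check_indentation_prefix(good)  # passes silently
try:
    check_indentation_prefix(bad)
    rejected = False
except InconsistentIndentationError as exc:
    rejected = True
    message = str(exc)
```

A check like this also rejects the multi-line string examples above, since their continuation lines start at column 0; that matches the "forbid those cases" intent rather than trying to parse them.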

@potiuk
Member

potiuk commented Nov 16, 2025

Any other comments @ashb @uranusjr ?

@potiuk potiuk merged commit bca0183 into apache:main Dec 1, 2025
83 checks passed
RoyLee1224 pushed a commit to RoyLee1224/airflow that referenced this pull request Dec 3, 2025
…pache#57782)

* Add regex fallback for task decorator removal in case of parsing errors

* Refactor task decorator removal to eliminate regex fallback and improve parsing reliability

* Add unit test for stripping decorators and comments from task source

* Ensure newline at the end of Python source in DecoratedOperator and add assertions for module loading in tests
Copilot AI pushed a commit to jason810496/airflow that referenced this pull request Dec 5, 2025
itayweb pushed a commit to itayweb/airflow that referenced this pull request Dec 6, 2025

Successfully merging this pull request may close these issues.

Kubernetes task decorator can be broken by comments

5 participants