-
Notifications
You must be signed in to change notification settings - Fork 16.3k
sdk: Refactor get_python_source to strip comments (#50327)
#57782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sdk: Refactor get_python_source to strip comments (#50327)
#57782
Conversation
|
I feel this should be fixed in cst instead. Airflow shouldn’t try to work around this on its own. |
|
I think I agree with @uranusjr here, more of a CST problem that Airflow? Working around would be less than ideal here. |
|
I think it should be solved by us but elsewhere. The root cause is this function: def get_python_source(self):
raw_source = inspect.getsource(self.python_callable)
res = textwrap.dedent(raw_source)
res = remove_task_decorator(res, self.custom_operator_name)
return res
We are dedenting the function source ourselves - ans this is where error is introduced. |
|
And here: def get_python_source(self):
"""Return the source of self.python_callable."""
return textwrap.dedent(inspect.getsource(self.python_callable))
|
|
We should likely remove the lines containing only comment before dedenting - that should fix the problem |
…ve parsing reliability
|
Ah that makes sense, nice analysis @potiuk |
|
Nice investigation @potiuk! |
|
Here's an idea out of left field: don't do any pre-processing or removing at all, but instead put this in the generated script: Then we could leave it in the script as: @task.kubernetes
def my_fn(...): ...
# ...
try:
my_fn(...)
...That way we might not need to a) use cst to strip the decorator, b) nor worry about dedenting anything? |
…dd assertions for module loading in tests
|
Hi folks, thanks for all the reviews. I believe we now have a better approach version :) The current version follows @potiuk’s suggestion, it gets rid of lines that start with Please let me know if there's anything else I can improve 💪
|
get_python_source to strip comments (#50327)
from types import SimpleNamespace
task = SimpleNamespace(kubernetes=lambda f: f)Yeah, my thought exactly that we could not remove the decorators but replace them with something empty. But I think in this case also indentation matter - because we would have to keep the functions at the nested indentation level in the generated script (those callbacks are converted to top-level functions now). I am not sure if that can be handled easily. No esy idea for that one or that it's worth the effort. I think for now simple removeal of the comments has drawbacks of course. It does not handle all cases - for example it will not work with multi-line strings and likely some other constructs. def b_task():
print("""
hello
"""
print("more hello")In order to properly handle those constructs, we would really have to parse the whole code with AST and know which lines are supposed to be indented and which not. For example this case is not easy to handle without knowing that hello2 is part of a multi-line string -- you need to parse the whole python file to know. def b_task():
print("""
hello
hello 2
"""
print("more hello")Or a better would would be simply to forbid those cases and turn ParserError into a more meaningful one (only use plain functions, do not break indentation. That wouls be simple- we extract the whispace from the first line and reject any function that has line that do not begin with the same whitespace prefix. That would be simple and effective solution (and back-compatible - those functions now cause ParsingException). We can easily add this limitation and even document it. |
…pache#57782) * Add regex fallback for task decorator removal in case of parsing errors * Refactor task decorator removal to eliminate regex fallback and improve parsing reliability * Add unit test for stripping decorators and comments from task source * Ensure newline at the end of Python source in DecoratedOperator and add assertions for module loading in tests
…pache#57782) * Add regex fallback for task decorator removal in case of parsing errors * Refactor task decorator removal to eliminate regex fallback and improve parsing reliability * Add unit test for stripping decorators and comments from task source * Ensure newline at the end of Python source in DecoratedOperator and add assertions for module loading in tests
…pache#57782) * Add regex fallback for task decorator removal in case of parsing errors * Refactor task decorator removal to eliminate regex fallback and improve parsing reliability * Add unit test for stripping decorators and comments from task source * Ensure newline at the end of Python source in DecoratedOperator and add assertions for module loading in tests

Closes: #50327
Why
airflow/task-sdk/src/airflow/sdk/definitions/_internal/decorators.py
Lines 76 to 85 in 734960d
The function
remove_task_decoratorusescst.parse_moduleto parse the python source.However, when the function source contains unusual indentation (e.g., a comment at column-0 inside an indented function),
cst.parse_modulefails and raises acst.ParserSyntaxError.Because this exception was not caught, the function fails early. This prevents the
@task.kubernetesdecorator from being removed, causing it to be incorrectly included in the generated/tmp/script.py.This leads to the final error reported in the issue:
NameError: name 'task' is not definedwhen the pod tries to execute the script.How
Add a regex fallback for
@task.kubernetesand handle the indentation issue.What