Add a check for not templateable fields #29821
Nice one! |
(cherry picked from commit 7963360)
@hussein-awala what if my custom operator needs to use templating for |
@Inetov This is not possible for now, I will check if templating |
@Inetov @hussein-awala see feature request for this #31801 |
@hussein-awala Unfortunately, this change is going to break our Airflow environment. Is there any reason why Actually, it's working just fine! But you're going to prohibit that in future versions, and I'd like to know the rationale behind that decision. Thank you! |
I think it's a good idea to open a PR that allows some of the fields to be excluded from that check. A PR for that would be welcome, @gdavoian |
@potiuk do you have any ideas on how it would fit into And that actually leads to my initial question: why did we ever need that backward-incompatible check in the first place? |
@gdavoian I agree with Jarek. This PR was created to avoid the rendering issues with BaseOperator attributes like IMHO, you can create a test that loops over the excluded attributes, try to use a templated value, and check the value in runtime after rendering it. To know what attributes could be excluded, you can start by excluding all the params and keeping only the params that pass the test. |
But how does this PR help solve the rendering issue? I understand that it tries to warn the users against their potentially mistaken intentions, though in fact it simply replaces one type of exception raised with another, reduces flexibility, and introduces a backward-incompatible change as a side effect. So the person who tried to add
This sounds like a workaround for a workaround :) For some reason, you've prohibited the rendering of all the |
@gdavoian - what I proposed, and I believe @hussein-awala agrees with, is that you define which fields of BaseOperator are excluded from the check. This is what my proposal is about. Find out which of those fields CAN be templated and remove them from the check. Basically, an allow-list of the fields you want to allow templating for, rather than allowing all by default. There is no need to "specify" them as an extra field - just hard-code them. What @hussein-awala proposes is that you test it and figure out which fields can be templated by templating them one by one and trying to serialize the DAG, because not all of them CAN be serialized and you need to serialize the DAG to the DB to execute it. And not all of them will take effect when templated. See the description of the original error #29819, which explains the reason for doing it. Try to do what the author of the original issue did and see it for yourself when you remove the protection. You don't need to believe anyone's word, just try it and read the issue description.
No. You simply didn't dig deep enough. Look at the issue and try to understand what happened there. This is a solution to a real problem, one that over-cautiously excluded all the fields. What we are kindly asking you to do is to find out which of those fields can be templated. Just looking at your examples neither The Similary Similarly
That simply would not work. You need So the assumption when that PR got merged was that most (if not all) of the fields of BaseOperator are like that and all of them should be excluded by default. That was a good assumption, but over-cautious. Some parameters can be templated, it seems. You seem to have found one that might be allow-listed. Maybe more can be as well. What we are kindly asking you to do is review all of those, possibly test them (as @hussein-awala proposed), and come up with a PR where you specifically allow-list those fields that CAN be allow-listed and that you have tested to work. Is it too much of an ask? |
Most of these attributes should not be used as templated fields, so the PR makes DAG parsing fail fast instead of failing at runtime.
Most of the BaseOperator fields are used at parse time and not at runtime, for example, execution_date, priority_weight, the different callbacks, the different dependency params, the retry params, and the pool (I don't know why you think it could be templated; it's used to decide whether we can queue the task or not, so before executing the job), and even doc_md, which is a task param more than a task instance param. As for the email, we use it in the TaskInstance class when the TI fails, so after executing it, and it's OK to exclude it. I'm not sure if we can exclude the other params; as I mentioned, most of them are used by the scheduler, executor, or webserver before executing the task. |
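For illustration, the allow-list idea discussed above might be sketched roughly as follows. This is not the code from the PR or from Airflow itself: `BaseOperatorStub`, `ALLOWED_TEMPLATED_BASE_FIELDS`, and `forbidden_template_fields` are hypothetical names, the stub's parameter list is a small made-up subset, and `email` appears in the allow-list only because the comment above suggests it is read after the task has run.

```python
import inspect

# Hypothetical allow-list of base-class parameters considered safe to template.
# "email" is used purely as an example, per the discussion above.
ALLOWED_TEMPLATED_BASE_FIELDS = frozenset({"email"})


class BaseOperatorStub:
    """Stand-in for a base operator; NOT Airflow's real BaseOperator."""

    template_fields = ()

    def __init__(self, task_id, pool="default_pool", retries=0, email=None):
        self.task_id = task_id
        self.pool = pool
        self.retries = retries
        self.email = email


def forbidden_template_fields(operator_cls, base_cls=BaseOperatorStub):
    """Return the template fields that clash with non-allow-listed base parameters."""
    base_params = set(inspect.signature(base_cls.__init__).parameters) - {"self"}
    blocked = base_params - ALLOWED_TEMPLATED_BASE_FIELDS
    return sorted(blocked & set(operator_cls.template_fields))


class EmailOperatorStub(BaseOperatorStub):
    # "email" is allow-listed, "body" is not a base parameter at all.
    template_fields = ("email", "body")


class PoolOperatorStub(BaseOperatorStub):
    # "pool" is a base parameter that is not allow-listed.
    template_fields = ("pool", "body")
```

With this sketch, an operator that templates only allow-listed base parameters passes, while one that templates `pool` is reported. Which real fields belong on such a list would still have to be established by testing each one, as proposed earlier in the thread.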
@potiuk @hussein-awala Thank you for the explanations! I'm starting to understand the problem and what I'm supposed to do. I'll check and get back to you soon. My apologies for any earlier misunderstandings! |
No worries :). Airflow is pretty ... complicated. ... and even after years of working on it every day I still find something new or make a wrong assumption. See that one for example #35539 (comment) just about 2 minutes ago. ... |
Hm, you were right, looks like only |
I would say - just I believe it is better to not allow to modify Of course that would require several layers of other security issues, but generally speaking (similarly to SQL injection vulnerabilities) - if you have content generated from (potentially) user input, you should always sanitize it and not use it in any context that is anywhere near security-bound decisions. |
@potiuk @hussein-awala You're welcome to review #35546 :) |
closes: #29819
There are some fields processed by the scheduler that are not templateable, but currently we don't have any check for these fields. In this PR, I extract the parameter list of the __init__ method of BaseOperator and check whether any field in template_fields is in this list; if so, an exception is raised, because all the BaseOperator parameters are processed and used by the scheduler during processing time.
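A minimal sketch of the kind of check described here, assuming nothing about the actual implementation in the PR: `BaseOperatorStub` is a hypothetical stand-in with a made-up subset of parameters, `check_template_fields` is an illustrative helper name, and `ValueError` is a placeholder for whatever exception type the real check raises.

```python
import inspect


class BaseOperatorStub:
    """Stand-in for a base operator; NOT Airflow's real BaseOperator."""

    template_fields = ()

    def __init__(self, task_id, retries=0, pool="default_pool"):
        self.task_id = task_id
        self.retries = retries
        self.pool = pool


def base_init_params(base_cls):
    """Collect the parameter names of the base class __init__, excluding self."""
    params = inspect.signature(base_cls.__init__).parameters
    return {name for name in params if name != "self"}


def check_template_fields(operator_cls, base_cls=BaseOperatorStub):
    """Raise if any template field is also a base-class __init__ parameter."""
    clashes = base_init_params(base_cls) & set(operator_cls.template_fields)
    if clashes:
        # ValueError is a placeholder exception type for this sketch.
        raise ValueError(
            f"{operator_cls.__name__} declares non-templateable fields: {sorted(clashes)}"
        )


class SqlOperatorStub(BaseOperatorStub):
    template_fields = ("sql",)  # not a base parameter, so the check passes


class BadOperatorStub(BaseOperatorStub):
    template_fields = ("sql", "pool")  # "pool" is a base parameter, so the check fails
```

Calling `check_template_fields(SqlOperatorStub)` returns silently, while `check_template_fields(BadOperatorStub)` raises, which mirrors the fail-at-parse-time behaviour the description aims for.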