Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create Undefined Jinja Variables Rule #11241

Merged

Conversation

ashmeet13
Copy link
Contributor

@ashmeet13 ashmeet13 commented Oct 2, 2020

Hi,

Adding a rule to check for Undefined Jinja Variables (Issue: #11144) when upgrading to Airflow2.0

Logic - Use a DagBag to pull all dags and iterate over every dag. For every dag the task will be rendered using
an updated Jinja Environment using - jinja2.DebugUndefined. This will render the template leaving undefined variables
as they were. Using regex we can extract the variables and present possible error cases when upgrading.

Since I am fairly new to Airflow - please do guide if there is a better approach to implement this rule.
Thanks!

Related: #8765
Fixes: #11144

@boring-cyborg
Copy link

boring-cyborg bot commented Oct 2, 2020

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, pylint and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@turbaszek turbaszek added the area:upgrade Facilitating migration to a newer version of Airflow label Oct 3, 2020
@potiuk potiuk linked an issue Oct 3, 2020 that may be closed by this pull request
@potiuk
Copy link
Member

potiuk commented Oct 3, 2020

Can you please rebase to latest v1-10-test ? It should work now.

@potiuk
Copy link
Member

potiuk commented Oct 3, 2020

@ashmeet13 ^^

@ashmeet13
Copy link
Contributor Author

Hey @potiuk rebased the branch.
Two of the checks seemed to have failed -

  1. CI Build / Quarantined tests
  2. CI Build / K8s: 3.6 v1.17.5 image

Could you please guide towards how should I be fixing them?

@potiuk
Copy link
Member

potiuk commented Oct 4, 2020

The quarantined tests are kind and f ok to fail. Those are flaky tests that we know are not stable. The k8s ones are likely a transient error . So do not worry about those

@ashmeet13
Copy link
Contributor Author

Got it @potiuk. Thanks for letting me now.

@ashmeet13 ashmeet13 force-pushed the undefined-jinja-variable-rule branch from f08f4c7 to 60d021a Compare October 6, 2020 10:26
@ashmeet13
Copy link
Contributor Author

Hey @potiuk, made a few changes in my branch today and rebased to latest v1-10-test. Seeing a lot more failed/cancelled tests.
What should I be doing?

Also - who do I request a review from?
Thank you!

@potiuk
Copy link
Member

potiuk commented Oct 6, 2020

Not sure what's going on here but there is at least one commit that should not be there from what I see (the first one). So something's going on with this (and I See some strange/intermittent errors). can you please try to rebase again on top of v1-10-test and push ? v1-10-test seems to build all right.

@ashmeet13
Copy link
Contributor Author

ashmeet13 commented Oct 7, 2020

I think I have messed up the branch.
I see that this commit - Conditional MySQL Client installation was included but should not have been here.

When I try to rebase again as you suggested it continues to Applying: Conditional MySQL Client installation which is a commit I do not know how I have managed to make here.

Not really sure what to do. Sorry for the trouble. What would your suggestion be to do?

@ashmeet13 ashmeet13 marked this pull request as draft October 7, 2020 15:18
@ashmeet13 ashmeet13 force-pushed the undefined-jinja-variable-rule branch from 60d021a to fcb3b8d Compare October 7, 2020 15:24
@ashmeet13 ashmeet13 closed this Oct 8, 2020
@ashmeet13 ashmeet13 force-pushed the undefined-jinja-variable-rule branch from fcb3b8d to 25b6cb0 Compare October 8, 2020 06:00
@ashmeet13 ashmeet13 reopened this Oct 8, 2020
@github-actions
Copy link

github-actions bot commented Oct 8, 2020

The Build Workflow run is cancelling this PR. It in earlier duplicate of 1029499 run.

@ashmeet13
Copy link
Contributor Author

Hey @potiuk - took a while but I believe the branch is fixed now.
Also the PR did get closed in the middle when trying to fix the branch. My bad on screwing up the rebase and fixing the branch.

Marking the branch ready for review

@ashmeet13 ashmeet13 marked this pull request as ready for review October 8, 2020 07:31

return messages

def _dag_level_(self, dag):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the name of this method. Could you use a more descriptive name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing this with a better name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed names in 08b0f75

dags = dagbag.dags
messages = []
for dag_id, dag in dags.items():
dag_messages = self._dag_level_(dag)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it is worth omitting checking some DAGs in some cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't exactly get this?
Which DAGs could be omitted from checking?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DAGs in which the user has decided about a specific behavior should not be checked.

with DAG(dag_id="case_a"):
with DAG(dag_id="case_b", template_undefined=jinja2.StrictUndefined):
with DAG(dag_id="case_c", template_undefined=jinja2.DebugUndefined):
with DAG(dag_id="case_d", template_undefined=jinja2.Undefined):

Only dag_a should be checked. The others should be omitted as they will work properly in Airlfow 2.0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any progress?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this comment - my fault.
How exactly would we check this?

My thought process -
This would require the check to read the Python scripts in which the DAGs have been defined. Then read line by line where the DAG is defined and then check whether template_undefined is defined or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that it is only possible if we make changes to the DAG constructor and get_template_env method.

template_undefined: Type[jinja2.StrictUndefined] = jinja2.StrictUndefined,

-        template_undefined: Type[jinja2.StrictUndefined] = jinja2.StrictUndefined,
+        template_undefined: Optional[Type[jinja2.StrictUndefined]] = None,

'undefined': self.template_undefined,

-            'undefined': self.template_undefined,
+            'undefined': self.template_undefined or jinja2.StrictUndefined,,

After making the above changes, we can check the class attribute to test our condition without any major problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Makes sense.
I believe this would require a new PR for the change?

Should I raise one for the same?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this require new PR? I think you can update this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Working on the changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the changes w.r.t to v1-10-stable as the target branch.
The code changes you had suggested were w.r.t to master branch.

Also added the logic to skip check for DAG when template_undefined is explicitly mentioned

for key, value in rendered_content.items():
debug_error_messages.union(self._check_rendered_content(str(value)))
return debug_error_messages

Copy link
Member

@mik-laj mik-laj Oct 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the expected behavior in the "else" case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. Got neglected from my end. Working on adding this logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the logic in 08b0f75

@kaxil
Copy link
Member

kaxil commented Oct 29, 2020

@turbaszek I have changed the base to v1-10-stable and rebased it accordingly.
There are two checks that seem to be failing -

  1. CI Build / Core:Sqlite Py3.8 (pull_request)
  2. CI Build / Quarantined:Pg9.6,Py3.6 (pull_request)

What should be done to fix them?

Ignore the Quarantined one, core:Sqlite one seems to be just flaky so that can be ignored too

title = "Jinja Template Variables cannot be undefined"

description = """\
Jinja Templates have been updated to the following rule - jinja2.StrictUndefined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it should be written that the user should correct the templates or pass template_undefined=jinja2.Undefined to the DAG.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Added these instructions in description

title = "Jinja Template Variables cannot be undefined"

description = """\
Jinja Templates have been updated to the following rule - jinja2.StrictUndefined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Jinja Templates have been updated to the following rule - jinja2.StrictUndefined
The default behavior for DAG's Jinja templates has changed. Now, more restrictive validation of non-existent variables is applied - `jinja2.StrictUndefined`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this suggestion

@github-actions
Copy link

The Workflow run is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.

@mik-laj
Copy link
Member

mik-laj commented Nov 4, 2020

@turbaszek Can you look at it?

@ashmeet13
Copy link
Contributor Author

@turbaszek requesting for a review whenever you get the chance.
Thanks!

@turbaszek
Copy link
Member

@ashmeet13 this change looks good to me, can you please squash the commits and the rebase onto current v1-10-test?

@ashmeet13
Copy link
Contributor Author

Yup. Just to confirm what exactly needs to be done?
Do I change the target branch back to v1-10-test and then rebase my commits with v1-10-test?

Could you clarify @turbaszek

@turbaszek
Copy link
Member

Yup. Just to confirm what exactly needs to be done?
Do I change the target branch back to v1-10-test and then rebase my commits with v1-10-test?

Could you clarify @turbaszek

Sorry my bad! I meant v1-10-stable as it is now. The was fixed so it would be nice to:

  • squash all your commits form this branch (squashing will help with rebasing)
  • rebase it onto v1-10-stable branch

In this way we will be able to run whole CI tests hopefully

@ashmeet13 ashmeet13 force-pushed the undefined-jinja-variable-rule branch 2 times, most recently from 5017fdc to 5ef4259 Compare November 17, 2020 13:46
@potiuk
Copy link
Member

potiuk commented Nov 17, 2020

I guess that was a bad rebase :) @ashmeet13 ? Seems that it is going to be better now.

@ashmeet13
Copy link
Contributor Author

Yup.
Figured it out. Just saw #development - A PR from v1-10-test was merged.
Rebased. Hopefully this works. My bad for the confusion!

@ashmeet13
Copy link
Contributor Author

The checks have passed. Anything else needed to be done @turbaszek ?

Adding a rule to check for undefined jinja variables when upgrading to Airflow2.0
@ashmeet13
Copy link
Contributor Author

Can this PR be merged @turbaszek @potiuk if no other changes required?

@turbaszek turbaszek merged commit 18100a0 into apache:v1-10-stable Nov 19, 2020
@boring-cyborg
Copy link

boring-cyborg bot commented Nov 19, 2020

Awesome work, congrats on your first merged pull request!

kaxil pushed a commit that referenced this pull request Nov 20, 2020
Adding a rule to check for undefined jinja variables when upgrading to Airflow2.0

(cherry picked from commit 18100a0)
kaxil pushed a commit that referenced this pull request Nov 21, 2020
Adding a rule to check for undefined jinja variables when upgrading to Airflow2.0

(cherry picked from commit 18100a0)
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Mar 5, 2021
)

Adding a rule to check for undefined jinja variables when upgrading to Airflow2.0

(cherry picked from commit 18100a0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:upgrade Facilitating migration to a newer version of Airflow
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create UndefinedJinjaVariablesRule to ease upgrade to 2.0
5 participants