Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create an airflow upgrade-check command in 1.10 to ease upgrade path to 2.0. #8765

Closed
10 of 19 tasks
ashb opened this issue May 7, 2020 · 17 comments
Closed
10 of 19 tasks
Assignees
Labels
area:upgrade Facilitating migration to a newer version of Airflow kind:meta High-level information important to the community

Comments

@ashb
Copy link
Member

ashb commented May 7, 2020

To make it easier for users to upgrade from 1.10 to 2.0 (when it eventually comes out) we should create a single upgrade-check command in 1.10 that checks the following things. We could also have a mode that makes some of these changes in place (with confirmation from user) to automate it.

Rules

Major breaking changes:

Config related changes:

  • HostnameCallableRule - Unify hostname_callable option in core section Create HostnameCallableRule to ease upgrade to Airflow 2.0 #11044
  • StatNameHandlerNotSupportedRule - Drop plugin support for stat_name_handler
  • LoggingConfigurationRule - Logging configuration has been moved to new section
  • NoGCPServiceAccountKeyInConfigRule - Remove gcp_service_account_keys option in airflow.cfg file
  • FernetEnabledRule - Fernet is enabled by default
  • KubernetesWorkerAnnotationsRule - Changes to propagating Kubernetes worker annotations
  • LegacyUIDeprecatedRule - Deprecate legacy UI in favor of FAB RBAC UI
  • TaskHandlersMovedRule - GCSTaskHandler has been moved, WasbTaskHandler has been moved, StackdriverTaskHandler has been moved , S3TaskHandler has been moved, ElasticsearchTaskHandler has been moved, CloudwatchTaskHandler has been moved
  • SendGridMovedRule - SendGrid emailer has been moved
  • CustomExecutorsRequireFullPathRule - Custom executors is loaded using full import path

Import changes:

  • ImportChangesRule - uses a map old_operator_name -> list of possible problems so we can create a single DagBag and scan all used operators and raise information about changes. It should also suggest what providers packages users should use.

How to guide

To implement a new rule we had to create a class that inherits from airflow.upgrade.rules.base_rule.BaseRule. It will be auto-registered and used by airflow upgrade-check command. The custom rule class has to have title, description properties and should implement check method which returns a list of error messages in case of incompatibility.

For example:

class ConnTypeIsNotNullableRule(BaseRule):
title = "Connection.conn_type is not nullable"
description = """\
The `conn_type` column in the `connection` table must contain content. Previously, this rule was \
enforced by application logic, but was not enforced by the database schema.
If you made any modifications to the table directly, make sure you don't have null in the conn_type column.\
"""
@provide_session
def check(self, session=None):
invalid_connections = session.query(Connection).filter(Connection.conn_type.is_(None))
return (
'Connection<id={}", conn_id={}> have empty conn_type field.'.format(conn.id, conn.conn_id)
for conn in invalid_connections
)

Remeber to open the PR against v1-10-test branch.

@ashb ashb added the kind:feature Feature Requests label May 7, 2020
@ashb ashb added this to the Airflow 1.10.12 milestone Jul 8, 2020
@ashb ashb changed the title Create an airflow upgrade-check command in 1.10 to eas upgrade path to 2.0. Create an airflow upgrade-check command in 1.10 to ease upgrade path to 2.0. Jul 8, 2020
@turbaszek

This comment has been minimized.

@turbaszek

This comment has been minimized.

@mik-laj
Copy link
Member

mik-laj commented Jul 10, 2020

@turbaszek Can you create Google Docs about it? Without a broader context, the titles of changes are of little help.

I think we should now prepare a plan that will describe how we can detect breaking change and determine what to do when we detect them.

# Move operators to providers package (Rule 1)

We should load all DAG using DAGBag and check whether any deprecated operators are used. When it is detected, we can make changes automatically using Bowler.

This rule resolves the following entries from UPDATING.md
-....

A lot of changes are backward compatible, and if not, we also need to detect and notify them.

# Detectt provide_context in Python Operaator

We should check ....when this happens, we should inform the user that in Airflow this code may cause problems.

This rule resolves the following entries from UPDATING.md
- Remove provide_context

@potiuk
Copy link
Member

potiuk commented Jul 10, 2020

I think some of the changes we can apply easily and safely. I think for example all the deprecation notices can be fixed rather easily - we already have information about it (we test all the depracations - I already used it in backport packages) - so we can super-easily and safely apply those in automated way. Similarly some of the changes above (rename parameter names etc.). I would really like to provide such tool to the users where it is no-brainer and can be automated.

For most other changes I think we should simply detect potential problems and flag them - providing helpful hint how the problem could be fixed.

And I agree GDOc about it might be better than issues/splitting them now. We can even split the work among ourselves and make recommendations individually for each of those - happy to share this with the rest. Then we can review them and turn into issues.

@kaxil
Copy link
Member

kaxil commented Jul 11, 2020

We should definitely create the rules similar to what Kamil did in his PR and additionally have a flag that automatically fixes (where possible).

Google Doc, Confluence (https://cwiki.apache.org/confluence/display/AIRFLOW/) is fine if we want to review and add comments.

@ashb
Copy link
Member Author

ashb commented Jul 17, 2020

Gdoc fine for building up, but anything we point end users at should be in our official docs.

A mode to doc things automatically is desirable, but probably not on by default - as much as I wish everyone used git, I know some people out there don't, so we shouldn't change their files without confirmation (perhaps show a diff then ask y/n?)

@turbaszek
Copy link
Member

so we shouldn't change their files without confirmation (perhaps show a diff then ask y/n?)

Bowler has this option

@turbaszek
Copy link
Member

I've drafted this doc:
https://docs.google.com/document/d/17tB9KZrH871q3AEafqR_i2I7Nrn-OT7le_P49G65VzM/edit?usp=sharing

The notes from UPDATING.md are split into few general groups (some are present in few places). I suppose that not all entries are important to users / will impact their Airflows.

@mik-laj mik-laj added the area:upgrade Facilitating migration to a newer version of Airflow label Jul 23, 2020
@mik-laj
Copy link
Member

mik-laj commented Jul 23, 2020

@turbaszek I looked at this document. Great job!

I have looked at what changes we have and I think one tool may not solve all our problems. I think that apart from this command, we should take additional action.

  • Complete the documentation to make the information for the user clearer, e.g. prepare a migration guide for CLI that will compare new and old commands
  • Many problems will only happen during the execution of the task. We can catch these exceptions and then save them to the database to make them more accessible to the user. The infinite log file is not the best way to communicate with the end-user. Especially when we do not have one fie, but an infinite number of them.

I also added other ideas to new section - "Other ideas" in your docs.

PS. I created "area:upgrade" label on Github to track the related issues.

@turbaszek
Copy link
Member

  • Complete the documentation to make the information for the user clearer, e.g. prepare a migration guide for CLI that will compare new and old commands

Yes I think the upgrade guide for CLI is a much better option than scrpits checking.

  • Many problems will only happen during the execution of the task. We can catch these exceptions and then save them to the database to make them more accessible to the user. The infinite log file is not the best way to communicate with the end-user. Especially when we do not have one fie, but an infinite number of them.

Interesting idea, but do you have any idea how to catch "these exceptions" (related to Airflow 2.0) not all?

@mik-laj
Copy link
Member

mik-laj commented Jul 23, 2020

Interesting idea, but do you have any idea how to catch "these exceptions" (related to Airflow 2.0) not all?

I think we can detect all problems and display them to the user. Why do you want to hide some problems?

@turbaszek
Copy link
Member

So your idea is to collect all exceptions to db, not only those related to migration / Airflow2?

@mik-laj
Copy link
Member

mik-laj commented Jul 23, 2020

We can catch all the warnings, but those related to migration fall into one of two categories: DeprecationWarning, FutureWarning.

@turbaszek
Copy link
Member

I got it, I was misled by the "exceptions" but we are talking about warnings

@turbaszek
Copy link
Member

Base for airflow upgrade-check command is merged (#9276 ) so we can decide what rules we would like to implement. Once we have an initial list I will create issues so multiple people can work on this.

Based on this gdoc:

https://docs.google.com/document/d/17tB9KZrH871q3AEafqR_i2I7Nrn-OT7le_P49G65VzM/edit?usp=sharing

We may consider the following rules as a good start (those are major breaking changes):

  • ConnTypeIsNotNullableRule - Not-nullable conn_type column in connection table (done)
  • UniqueConnIdRule - Unique conn_id in connection table
  • CutomOperatorUsesMetaclassRule - BaseOperator uses metaclass
  • UsingSQLFromBaseHookRule - Remove SQL support in base_hook
  • ChainBetwenDAGAndOperatorNotAllowedRule - Assigning task to a DAG using bitwise shift (bit-shift) operators are no longer supported
  • AirflowMacroPluginRemovedRule - Removal of airflow.AirflowMacroPlugin class
  • NoAdditionalArgsInOperatorsRule - Additional arguments passed to BaseOperator cause an exception
  • MesosExecutorRemovedRule - Removal of Mesos Executor

and config related changes:

  • HostnameCallableRule - Unify hostname_callable option in core section
  • StatNameHandlerNotSupportedRule - Drop plugin support for stat_name_handler
  • LoggingConfigurationRule - Logging configuration has been moved to new section
  • NoGCPServiceAccountKeyInConfigRule - Remove gcp_service_account_keys option in airflow.cfg file
  • FernetEnabledRule - Fernet is enabled by default
  • KubernetesWorkerAnnotationsRule - Changes to propagating Kubernetes worker annotations
  • LegacyUIDeprecatedRule - Deprecate legacy UI in favor of FAB RBAC UI
  • TaskHandlersMovedRule - GCSTaskHandler has been moved, WasbTaskHandler has been moved, StackdriverTaskHandler has been moved , S3TaskHandler has been moved, ElasticsearchTaskHandler has been moved, CloudwatchTaskHandler has been moved
  • SendGridMovedRule - SendGrid emailer has been moved
  • CustomExecutorsRequireFullPathRule - Custom executors is loaded using full import path

In case of operator related changes I would like to suggest a single rule OperatorChangesRule. It will use a map old_operator_name -> list of possible problems so we can create a single DagBag and scan all used operators and raise information about changes. It should also suggest what providers packages users should use. I think this is much better approach than having a rule for each operator...

I think we should first focus on the "check and warn" approach so we have any tool. Once we have it we may consider "check and apply changes" - even, in this case, we should probably first focus on major problems (config, new imports, etc). The second step can use "report" generated by airflow upgrade-check to apply changes (still can be single command but run wit additional flag).

Summoning all elders: @mik-laj @vikramkoka @ashb @potiuk @kaxil @feluelle @dimberman @houqp @ryw @Fokko

@potiuk
Copy link
Member

potiuk commented Sep 14, 2020

Sounds great to start with. I am sure we will have more rules added by the users, but those seem like great to create issues for.
And yeah. One rule to rule them all ("naming changes") sound great. But I'd name it "ImportChangesRule". We renamed Hooks, and Secret Backends and Sensors - not only operators :).

Elder Jarek.

@turbaszek turbaszek added kind:meta High-level information important to the community and removed kind:feature Feature Requests labels Sep 21, 2020
ashmeet13 added a commit to ashmeet13/airflow that referenced this issue Oct 6, 2020
Part of Issue apache#8765 - adding a rule to check for
undefined jinja variables when upgrading to Airflow2.0
Logic - Use a DagBag to pull all dags and iterate over
every dag. For every dag the task will be rendered using
an updated Jinja Environment using - jinja2.DebugUndefined
This will render the template leaving undefined variables
as they were. Using regex we can extract the variables
and present possible error cases when upgrading.
@ashb ashb closed this as completed Dec 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:upgrade Facilitating migration to a newer version of Airflow kind:meta High-level information important to the community
Projects
None yet
Development

No branches or pull requests

5 participants