-
Notifications
You must be signed in to change notification settings - Fork 13.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add airflow upgrade_check command (MVP) #9276
Conversation
e3f9d5f
to
97654ae
Compare
97654ae
to
37888e6
Compare
I like the idea, but we have to make sure that this tool is usable without installing airflow 2.0 first. The usage pattern will be A) run the upgrade check on airflow 1.10 B) fix all the problems C) re-run it d) only afterwards install airflow 2.0 over the 1.10 So I do not think it's a good idea to have it as airflow 2.0 CLI. I t should likelly be a separate package or even a standalone script that you can run without upgrading airflow to 2.0 |
@potiuk You're right. This complicates the situation a bit, but it's worth doing. I thought that we could in the future backport the airflow.upgrade package to the latest Airflow 1.10 release, but that would not allow us to test these changes on a regular basis with the latest Airflow 1.10. The new package is a better idea. I was too much inspired by Ash's original proposal - #8765 |
A nice framework |
We should possibly just so this directly on 1.10 branches though. The way I said we would do upgrades was "install latest 1.10 version, fix warnings, then install 2.0”. Doing a separate package seems overkill, we can just add more rules as we work them out. Many we can check by capturing the existing DeprecationWarnings we already emit. |
from airflow.upgrade.rules.base_rule import BaseRule | ||
|
||
|
||
class BaseFormatter(metaclass=ABCMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please can we not have one class per file - this isn't Java.
All of these formatter classes would live in the same module - airflow/upgrade/formatters.py
# invalid_connections = session.query(Connection).filter(Connection.conn_type.is_(None)) | ||
invalid_connections = [Connection(conn_id="AA")] | ||
return ( | ||
f'Connection<id={quote(conn.id)}", conn_id={conn.conn_id}> have empty conn_type field.' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{conn.id!r}
seems more natural than shell quoting function.
Though conn.id is an integer, so quoting it is not needed. Did you mean to do this on conn_id instead?
This make a silent assumption that you always want to migrate to latest 1.10 and then to 2.0 which we might, or might not require - depending on how (and if) we are going to migrate the database. I do not think this is a hard prerequisite (is it @ashb ?) to migrate to latest 1.10 first? I think it will needlessly complicate the migration path - especially if (which I presume is the case) w are not going to migrate the database. |
"""A long description explaining the problem in detail. This can be an entry from UPDATING.md file.""" | ||
pass | ||
|
||
def check(self) -> List[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return a list of Problems instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem stores references to the rule, so the problem must be initiated as follows: Problem(rule=rule, message=message)
. It seemed too talkative to me right now. This code will be repeated in every rule. Now the rules are a little easier to write. We can add support for returning it in the future when we need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it longer and will come back with another way to store the reference to the rule. However, this is a detail. I will wait until we set other high-level requirements.
What do you mean about "not migrate the database"? As in you expect people to start 2.0 with a clean/empty metadata DB? |
I wasn't planning on making it a silent assumption but the official upgrade path:
This approach (latest 1.10.x without warnings, then major upgrade) is based loosely off how Django handles upgrades and deprecation, https://docs.djangoproject.com/en/3.0/howto/upgrade-version/ |
I do not know what are the plans for now I do not know if it's been discussed - the migrations are long time separated between the databases in 1.10 and 2.0 so I am not sure if there is a clear path (yet) on how to migrate from a version of 1.10 to 2.0. There are few paths that might be taken for that. I see at least three:
I think all of those scenarios are possible and valid, and only 1) requires migrating to the latest Airflow before switching to 2.0. There are pros and cons for all those scenarios and I am not sure we are ready to make a decision on how to do it without checking with our users and possibly listening to how they would run the migration in their environments. I think we should discuss which of the migration paths we are going to support from the community point of view. |
If we don't give users the ability to test changes without jumping straight to 2.0 then a lot of bigger companies are going to be very reluctant to upgrade. (It's hard enough to get them to upgrade minor Airflow versions.)
It is a hard requirement that it is possible to update DB from 1.10 to 2.0. Resetting the DB means a very likely chance that Airflow would then go and re-run months or years worth of tasks. |
I am willing to do the work necessary to make it possible to upgrade a DB in place. I feel very strongly that not allowing it would be a disaster for uptake of 2.0. |
Absolutely all those scenarios should include testing. But I think there will be many companies that will not like to make two migrations - 1.10.x whatever they have now - 1.10.12 -> 2.0.0., they would rather migrate to 2.0 straight. Of course, we might choose to not support this path - as community. But I do not think it should be taken as granted without discussion and hearing the user's opinion on that.
I agree "clean-cut' might not be the best way of doing it, yet I imagine scenarios where this might be the path taken by our users. I think it's worth starting the discussion on the devlist a least, as this might be a viable option for some of the customers (no capacity for that from my side though currently). I do not think we had an opportunity to migrate the "major" largely backwards in-compatible version yet, so hearing what the users think about this might be a good way to make good decisions on it. |
Absolutely. This is a must - we have to definitely support that. I have no doubt whatsoever that we should do it. I't just the question if a) we also support scenario with clean cut b) how we migrate the DB for users which are not on the latest (1.10.12?) version. I do not have ready answers yet. I am just asking questions for now as I see different options are possible. |
https://docs.google.com/presentation/d/1lye_qklHkrucVrMyCzApdBSBSsnjkg3oGGlBsjlQT2I/edit?ts=5eb32894#slide=id.g84d35adb5c_0_14 (slide 60 if that link doesn't take you there) was where I talked about my proposed upgrade path. (None of these discussions affect the code in the PR, just where we put it in the end.) |
Yeah. I had some thoughts and discussed with some of the stakeholders as well (from the Composer team where lots of the Airflow users are), and I think we all agree that going through latest 1.10.* release makes most sense. |
If no one is against it, I will set the target of this PR to v1-10-test when I return to work on this change. I will try to come back quickly so that other people can add their rule based on it. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This is a DRAFT and I still have to complete the missing documentation and unit tests.
I wanted to prepare a minimal valuable product for the upgrade_check command. Only one rule is implemented, but I think that's enough to show the feature. In the next steps, we can think about what rules we need to create, create tickets, and then work on the next rules by the whole community.
Recently, we have a lot of new beginner contributors and I think we can try to engage them for this task. Most rules will be small and independent, so they will be great candidates for good-first-issue.
An example of the result of this command is presented in the termianlshot below
![Screenshot 2020-06-13 at 12 58 38](https://user-images.githubusercontent.com/12058428/84566968-a63cc200-ad75-11ea-870b-60d07d0c44fa.png)
Example JSON:
Part of: #8765
Make sure to mark the boxes below before creating PR: [x]
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.