Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor: remove unused state - SHUTDOWN #33746

Merged
merged 13 commits into from
Aug 28, 2023
Merged

Conversation

Bisk1
Copy link
Contributor

@Bisk1 Bisk1 commented Aug 25, 2023

This state is never set in production code, so it may as well be deprecated. Also it is not part of API spec.

Updated test cases to use FAILED or RESTARTING as relevant.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

This state is never set in production code, so it may as well be deprecated.
@boring-cyborg boring-cyborg bot added the area:webserver Webserver related Issues label Aug 25, 2023
@Bisk1 Bisk1 changed the title Remove unused state: SHUTDOWN Cleanup: remove unused state - SHUTDOWN Aug 25, 2023
@Bisk1 Bisk1 changed the title Cleanup: remove unused state - SHUTDOWN Refactor: remove unused state - SHUTDOWN Aug 25, 2023
@Bisk1
Copy link
Contributor Author

Bisk1 commented Aug 25, 2023

I found out about it when I was troubleshooting another issue. A task state is never set to SHUTDOWN except for tests. I think originally this state was supposed to handle tasks that were running but were externally cleared (what is currently handled by RESTARTING).

Apparently the author of that change (#16681) mistakenly assumed that this state is used for tasks externally marked as failed. In reality, when marking a task externally as failed, webserver sets it directly to FAILED state, as can be verified by code inspection.

@Bisk1
Copy link
Contributor Author

Bisk1 commented Aug 25, 2023

I would also like to update the diagram that was updated last time here:
https://github.com/apache/airflow/pull/16681/files#diff-e2ff0dfeb5664ddfe3f94a6d997d043e7280fbbf564ac248d26ce9c97f5b0a2e
but I can' t find it the .drawio file in your Confluence (there is only .png in the repository).

Copy link
Member

@hussein-awala hussein-awala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately removing a state from the classes State, JobState or TaskInstanceState is a breaking change.

@Bisk1
Copy link
Contributor Author

Bisk1 commented Aug 26, 2023

Unfortunately removing a state from the classes State, JobState or TaskInstanceState is a breaking change.

@hussein-awala
What exactly gets broken by this change? Since this state unused, everything I remove is dead code (except for testing).

@potiuk
Copy link
Member

potiuk commented Aug 26, 2023

Are you sure there are no users that have a "shutdown" state in their DB and would like to see that task displayed in the UI ? Airflow users have also historical database with historical tasks status.

My question is here - did you consider the case that somene has such task in the database? what will happen there?

I am not saying it is wrong to remove it, I am just asking if you considered it and and thought about the historical data affecting the current code.

  • What will happen if someone has historical entries with this state?
  • Is it likely to happen? (maybe not - in which case above question does not matter, but it depends on the answer to that question.
  • Wil there be a graceful fallback in case it happens?

I think answer to those questions is important when you decide to remove something. It is easy to say "I remove something because it is not used". But in case "it was used" - you need to have analysed what impact it has.

Can you please answer the above questions @Bisk1 ?

@Bisk1
Copy link
Contributor Author

Bisk1 commented Aug 27, 2023

@potiuk I understand the concerns about historical state, I'll try to answer your questions and make a case why is it safe to ignore it.

  • I tested it and if a task is in SHUTDOWN state and this state was removed from the State enum, the task instance will still be visible in UI - the only difference is the box color in grid (it'll be white because it's the default) and 500 error in taskInstances/{task_id} REST endpoint for this task because schema validation failure, but it doesn't prevent all UI features from loading and browsing this task's details, logs, etc., so I'd say nothing really breaks from the UI user perspective
    remvoed_shutdown

  • SHUTDOWN was not a terminal state but a temporary one that was supposed to quickly transition to something else - only an Airflow bug could have caused a situation where a task got stuck in this state in DB for longer and is still there after Airflow restart, so even if historical state like that exists in some Airflow instances, it is hardly a useful information for Airflow users - it merely provides evidence of some bug in an older Airflow version

  • the last Airflow version that could set TaskInstance to SHUTDOWN was 2.1.4 (released 2021-09-18) so having recent instances of tasks in this state could only happen if an Airflow instance wasn't updated for at least 2 years - it certainly limits the number of users affected by a lot

  • there is a precedent for removing states here: https://github.com/apache/airflow/pull/25507/files#diff-960201decc4b79dda0f0850a378ad5cc1caf83dd154593d8a4025af7913e82dfL52 someone could also argue that that change could affect historical state of tasks but apparently there were no complaints

I would also like to mention that I made this change because the current description of states makes it difficult to understand the code. The description doesn't reflect reality, so I've spend a couple of days debugging Airflow trying to pinpoint my issue because of tech debt like this. Cleaning it up would be beneficial for future maintainers who would have easier time contributing.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Thanks for so thorough analysis @Bisk1 . Really helpful.

I think the API call given 500 is quite worrying though. The task is there and someone could have written an automation that analyses past tasks - and for example builds some kinds of dashboard out ot that.

The REST API of ours is explicitly part of our public API interface (https://airflow.apache.org/docs/apache-airflow/stable/public-airflow-interface.html) so it's not only UI users that are of our concern. Airflow is a platform that our users can extend, and REST API is a crucial component for that.

I believe we should keep the API working and returning information about the task in such state if someone has it in the DB.

A lot of our users have old versions of airlfow (for various reasons), so I expect that we might want to get someone upgrading from 2.1.4 to 2.7.0 directly and those (potential) "shutdown" tasks will be quite recent ones. This might break the automation users might have. We would like to avoid that.

It should be possible by slightly modifying API to include the "shutdown" state in it "extra" and add comment about why we are doing it in the code even it the enum is removed.

I think that would be a better solution.

@potiuk
Copy link
Member

potiuk commented Aug 27, 2023

Also - there was a bit differnt story with Smart Sensors - they were an "experimental" feature. Anyone using them SHOULD expect that it might change and break things - it's been announced as having "experimental" status when it was added so users using it really signed up for changes and removal of it from the API is not covered by our SemVer promises.

@Bisk1
Copy link
Contributor Author

Bisk1 commented Aug 27, 2023

@potiuk I was trying to follow your suggestion and wanted to modify API to have special handling for SHUTDOWN state until I found another issue: this state was never added to API spec - the enum for TaskState in v1.yaml doesn't have it (which is the reason for 500 error).

- null
- success
- running
- failed
- upstream_failed
- skipped
- up_for_retry
- up_for_reschedule
- queued
- none
- scheduled
- deferred
- removed
- restarting

I did my test with querying a SHUTDOWN task as above on main branch and on the latest release and 500 error is also reproduced there.

Do you think I should fix it and add SHUTDOWN to spec for sake of handling historical tasks? Because it seems that we could simply take advantage of the fact that it was never handled properly in API (and nobody reported a bug?) and remove it from the code silently. You can't break something that didn't exist 😄 so API's backward compatibility should be fine

I used good ol' Paint because couldn't find the original .drawio file
@potiuk
Copy link
Member

potiuk commented Aug 27, 2023

@potiuk I was trying to follow your suggestion and wanted to modify API to have special handling for SHUTDOWN state until I found another issue: this state was never added to API spec - the enum for TaskState in v1.yaml doesn't have it (which is the reason for 500 error).

Ah.. cool then. No. Of course there i sno need to ADD it back. It would only make sense if the API had ever had it. I looked back to the very beginning and it had never been there.

In this case - I am pretty good.

airflow/utils/state.py Outdated Show resolved Hide resolved
Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending CI

@potiuk
Copy link
Member

potiuk commented Aug 28, 2023

Looks like we have confirmation it looks safe to remove @hussein-awala . Still "requesting changes? ?

@potiuk potiuk merged commit 7faa727 into apache:main Aug 28, 2023
42 checks passed
@Bisk1 Bisk1 deleted the remove-unused-state branch August 28, 2023 11:23
@eladkal
Copy link
Contributor

eladkal commented Aug 28, 2023

This requires also documentation update
https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/tasks.html#task-instances

@Bisk1
Copy link
Contributor Author

Bisk1 commented Aug 29, 2023

@ephraimbuddy
Copy link
Contributor

Looks like we should add a significant news fragment for this @Bisk1

@ephraimbuddy ephraimbuddy added this to the Airflow 2.7.2 milestone Aug 30, 2023
@potiuk
Copy link
Member

potiuk commented Aug 30, 2023

Looks like we should add a significant news fragment for this @Bisk1

@ephraimbuddy -> I am not so sure. Look at the discussion . It's really a cleanup. SHUTDOWN had not been used for years (effectively removed in Airflow 2.2 released 2 years ago) . It has not been even properly (ever) handled in the API (500 error).
There is no visble UI effect even if you have it in the database.

It's really a cleanup and if anything it COULD be added to 2.2.0 release note's "shoutouts" as afterthought.

@ashb
Copy link
Member

ashb commented Aug 30, 2023

We need to add TaskInstanceState.SHUTDOWN back even if nothing uses it -- this counts as a breaking change since it has been documented and exists in the code.

By removing this as we have done this caused import errors. One such use I know of is https://github.com/astronomer/astro-sdk/blob/75838143c1b3181b636259e5c39fa27a70d89c3c/python-sdk/src/astro/sql/operators/cleanup.py#L167

(This showed up as a DAG parse error on our tests against Airflow main)

@uranusjr
Copy link
Member

Is TaskInstanceState.SHUTDOWN documented, or just State.SHUTDOWN?

@potiuk
Copy link
Member

potiuk commented Aug 30, 2023

We need to add TaskInstanceState.SHUTDOWN back even if nothing uses it -- this counts as a breaking change since it has been documented and exists in the code.

Good point. We should also add "airflow.utils.state" to the https://airflow.apache.org/docs/apache-airflow/stable/public-airflow-interface.html as this is something that other users might rely on.

It's been enum and it's been released in the past and we need to keep it backwards-compatible.

@potiuk
Copy link
Member

potiuk commented Aug 30, 2023

Can you please add back the states (just enums) @Bisk1 with comment that they are unused?

@uranusjr * Is TaskInstanceState.SHUTDOWN documented, or just State.SHUTDOWN?

I will add whole state.py module to public interface.

@potiuk
Copy link
Member

potiuk commented Sep 3, 2023

Added PR #34059

potiuk added a commit to potiuk/airflow that referenced this pull request Sep 3, 2023
The state enums are likely to be used externally and they should
not be changed in backwards-incompatible way.

Discussed in apache#33746
@Bisk1
Copy link
Contributor Author

Bisk1 commented Sep 4, 2023

@potiuk Sorry, I missed that message somehow @uranusjr thanks for fixing it

@ephraimbuddy ephraimbuddy added the type:misc/internal Changelog: Misc changes that should appear in change log label Oct 3, 2023
ephraimbuddy pushed a commit that referenced this pull request Oct 5, 2023
Co-authored-by: daniel.dylag <danieldylag1990@gmail.com>
(cherry picked from commit 7faa727)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:webserver Webserver related Issues type:misc/internal Changelog: Misc changes that should appear in change log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants