-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Public Interface description to Airflow documentation #28300
Conversation
About 80% of it was written by the GPT Chat :D |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides several structural changes, my main concern is the naming. To me "public APIs" 99% of the time relates to a REST API. I think "public interfaces" is better suited here. WDYT?
Yes. That might avoid a lot of confusion. |
6ec44ba
to
065138c
Compare
065138c
to
dc71c20
Compare
First round review passed - all comments applied @BasPH @o-nikolas |
(and docs building is fixed here). |
dc71c20
to
ad15bc9
Compare
ad15bc9
to
9d56cd0
Compare
Next round of reviews completed. I've also added Trigger as it was missing. I think it's coming up nicely with everyone's contributions. Please take a look and see if there are other things I missing, maybe it is close to mergability BTW. The more review. the less GPT Chatbot's text remains:) . |
9d56cd0
to
f7499c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nitpicky consistency things, but overall looking good!
There are a number of different executor implementations built-in Airflow, each with its own unique | ||
characteristics and capabilities. Executor interface was available in earlier version of Airflow but | ||
only as of version 2.6 executors are decoupled and Airflow does not rely on built-in set of executors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the key message is here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message is that only as 2.6 executors are truly extendable. Because they weren't before. There are certain modifications and changes in the interface (as explained and being implemented in https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-51+Removing+Executor+Coupling+from+Core+Airlfow ) that allow you to implement custom executor without hittting some roadblocks.
We just failed in delivering it before even if it was somewhat promised. And this is to warn users they should only target 2.6+. I want to avoid the situation that people mistakenly think they can do it in earlier versions. Of course there is the "version" selector at the top of the documentation, but many people do not realise that the feature they see there in the "stable" version are not available in their version and loose time and effort to try to make them work. And then they come up with bad feelings about Airflow, because we failed to warn them clearly enough.
Just one example from last week:
User thinking that breadth-first mapping is implemented in 2.3
User is at 2.3.4 in Composer and cannot upgrade to 2.5.0:
User: any possibility to implement an inheritance on TaskGroup and add the "jinja interpretation mechanism" somehow?? even simplistically, just to get a {{ds_nodash}} interpretation (edited)
Jarek: No
User: RIP 10 days of work lmao
I want to malke sure our users don't even start working on 2.3, 2.4, 2.5 with custom executors basing on the new documentation that's coming on 2.6 where we explicitly admit Executors are extendable. Before we hinted at it but never admitted to it. Now when we explicitly state it that users "can" do it, we should tell them they should only do it in 2.5 and explain why. Otherwise they will start working on it with earlier versions and when they fail, they will complain. We can prevent it by being explicit about it and if they complain we can link to that doc and tell them "you had a chance to read it, it's your fault". Otherwise they can tell "you have not explained this explicitly that it's only 2.6" and they will be right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is simple not true though -- I know of multiple people (Astro and others) who have successfully written custom executors against 2.1 and above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is simple not true though -- I know of multiple people (Astro and others) who have successfully written custom executors against 2.1 and above.
It's very strong statement to claim something is not true. But I think we are mixing two things here. I know there are people who wrote some implementations of custom executors that worked in some cases. This is true. I agree with that statement. And I am not arguing with that statement at all. In fact I wholeheartedly agree with your statement @ashb. And I believe it is not invalidating what I wrote (or at least what my intention was). What I wrote that the earlier attempts to write executor could have failed in a number of non-obvious cases because there were a number of hard-coded behaviours that made executior not a "well defined API".
I do not want to make an attempt to describe what things you should avoid and what problems you would hit with tryibng to implement Executors before. I think it's a topic that requires some intrinsic analysis of where the "hard-coded" behaviours could cause the problems. And if someone would like to attempt it - I am all ears.
But - let me rephrase it in the way that will make it clear that there were successful attempts before, but you could have hit difficult to diagnose and solve problems because built-in executors were hard-coded (and I will mention that a there were cases successes in writing limited implementations of executors). I hope this will be acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the note now:
There are a number of different executor implementations built-in Airflow, each with its own unique
characteristics and capabilities. Executor interface was available in earlier version of Airflow but
only as of version 2.6 executors are fully decoupled and Airflow does not rely on built-in set of executors.
You could have implemented (and succeeded) with implementing Executors before Airflow 2.6 and a number
of people succeeded in doing so, but there were some hard-coded behaviours that preferred in-built
executors, and custom executors could not provide full functionality that built-in executors had.
I hope it is much closer to "truth" now - it mentions that it was possible to implement executors before and people succeed with it (truth 1) but that you could have hit the limits because of some hard-coded behaviours (truth 2) and that in 2.6 you can rely on public api of executor to be on-par with the built-in executors (truth 3)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope there are no more non-truths in this statement @ashb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW. I can also link to AIP-51 for details of what "could not be done" before. I think it describes it pretty well (and completely). But I think we rarely link to AIPs in our docs (and it was criricised before when I added it), so I think just leaving a generic note is better.
Hey @BasPH - explained reasoning why I addded those a bit longer explanation and context. I hope this is good for you as well. One of the cases which I want to address is to be able to explain people that if they do rely on some of those things being explained as "public interface" , even if they do, whenever they complain or raise questions about that I want to simply send them there - "Please read the docs, it's all explained there and you likely missed it". That saves enormous mental effort and time in disputes when you engage in such discussions (which are often) "but you did not explain it, I had the right to think it works this way". Also just to stress a bit that my "thinking" is not theorethical. It is very much based on practice of interacting with our users A LOT. I would encourage everyone to engage more in discussions we have in our community - or at least review them and follow. So far not many of us, maintainers are engaged, but I am. And I think I understand quite well - and likely bettter than others - what our users are struggling with as well what is helpful for people. I am not sure if you realize, but over the last 6 months or so the status of Github Discussions of ours looks like that (I have - very consistently 11/15 "most helpful" answers last 30 days while the next person never had more than 1-2: And the raise in a number of discussions looks like that - so this is an extremely important medium where our users look for help: |
BTW. That was a good exercise (and learning) for GPT Chat after the reviews. I don't think it is good in writing docs like that - it's far to chatty (it's a chat bot eventually). Almost none of the original text from the GPT chat is left, and the number of omissions (which were added during the review) and mistakes it made was quite big. |
a4fe7c4
to
19ef1e7
Compare
fc5bddc
to
ab55b7b
Compare
Hey everyone - I spoke to @TohnJhomas and after we merged the big restructuring of his in #27235 we decided together to move all the changes for "customizing airflow" and "public-airlfow-interface" to "administration-and-deployment". This is mostly what it is all about - people writing DAGs are rarely concerned with changes that would involve breaking the API - those are more people who are installing and administering airlfow that are concerned about it, but I think we can still discuss what is the best place if others think otherwise. Looking forward to merging that one afte the big restructure of docs is merged. |
Ok. I brought it to the state that - I hope we can start discussing what is missing. I re-wrote the headers to reflect the current thinking of mine how our users could find this page useful, but also to make this page a really comprehensive and complete description of what the users might expect from Airflow as a "library". This is the current proposal for the "structure" of the page: And I like that we can get rid of other pages that were "almost, but not quite entirely unlike what i wanted to achieve" (quoting my favourite writer). |
5c14858
to
c78e64b
Compare
@BasPH ? |
c78e64b
to
90e231d
Compare
90e231d
to
5ad52da
Compare
Hey @BasPH - just to repeat it - is your "change requested" still valid? Is the current approach good enough for you or do you think there is something stil not OK with teh approach I finally came up with after listening and getting feedback and applying multiple ideas on how to address the issue that has been already discussed in the devlist as needed ? I think we need to start openly communicate with our users on what they should expect as something they can rely on, And I think this is the first step tp do it. Anything else that needs to be done here do you thinkl? I would really love to merge it to move forward and seems it's been stalled on your review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did another review. Starting to look nice and clear!
Airflow is a complex system and since it is a platform, it is supposed to be extended by the users by writing custom code wherever they miss functionality in Airflow's core or its providers. This page is an attempt to have a single place where we can express our intention of what is the Public Interface of Airflow that the user can depend on when implementing such customizations. This is never 100% possible, we know that some users workflows might depend on stuff that is internal implementation details or behaviours, however the user doing so should be aware of the risk they are taking by relying on something that was not intentionally exposed. This page is intended to serve as a guideline for the users who would like to make a decision to rely on some of the Airflow behaviours, so that they know whether the API they want to rely on were explicitly intended by Airflow community to expose as Public, or not. Co-authored-by: Bas Harenslak <basph@users.noreply.github.com>
340673d
to
858e6f9
Compare
Indeed. I like how this change now consolidates few places where we tried to communicate what should be considered as "public" in one dedicated place and I like the way how concise it is after all those corrections and feedback. I updated it now:
I hope we will be able to get it in soon :) |
Airflow is a complex system and since it is a platform, it is supposed to be extended by the users by writing custom code wherever they miss functionality in Airflow's core or its providers. This page is an attempt to have a single place where we can express our intention of what is the Public Interface of Airflow that the user can depend on when implementing such customizations. This is never 100% possible, we know that some users workflows might depend on stuff that is internal implementation details or behaviours, however the user doing so should be aware of the risk they are taking by relying on something that was not intentionally exposed. This page is intended to serve as a guideline for the users who would like to make a decision to rely on some of the Airflow behaviours, so that they know whether the API they want to rely on were explicitly intended by Airflow community to expose as Public, or not. Co-authored-by: Bas Harenslak <basph@users.noreply.github.com> Co-authored-by: Bas Harenslak <basph@users.noreply.github.com> (cherry picked from commit 352d492)
The Airflow models have been accidentally removed from the docs when apache#28300 was implemented. The whole models documentation have been removed accidentally, even if there were references to the actual classes and packages used as the models package was entirely excluded. This PR fixes it by selectively including the models that should be included and by linking the package indexes directly in the public interface documentation.
The Airflow models have been accidentally removed from the docs when #28300 was implemented. The whole models documentation have been removed accidentally, even if there were references to the actual classes and packages used as the models package was entirely excluded. This PR fixes it by selectively including the models that should be included and by linking the package indexes directly in the public interface documentation.
The Airflow models have been accidentally removed from the docs when #28300 was implemented. The whole models documentation have been removed accidentally, even if there were references to the actual classes and packages used as the models package was entirely excluded. This PR fixes it by selectively including the models that should be included and by linking the package indexes directly in the public interface documentation. (cherry picked from commit 6720a90)
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.