Detect automatically the lack of reference to the guide in the operator descriptions #9290
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
This is a -1 from me -- instead of having a separate guide page for each operator we should instead have better content on the API doc page for the operator itself.
Guides should be limited to cases where we need to document/illustrate workflows that use more than a single operator.
And at any rate, this PR cannot be merged as is, because it would require 31 new guides to be written.
Hello @ashb, I believe there is a misunderstanding on your side. What this PR does exactly is: IF there is a separate guide, THEN require a link to it from the operator's docstring. IF there is no separate guide, THEN no link is needed. Does that clarify the situation? You are absolutely right, though, that separate guides are not needed all the time, but when they exist, it's better to promote them by putting a link in the docstring.
Gotcha, yes totally misread what it was doing. Sorry! (That also explains why there were only 31 errors!) |
`airflow.providers.amazon.aws.operators.google_api_to_s3_transfer.GoogleApiToS3TransferOperator`.""" | ||
`airflow.providers.amazon.aws.operators.google_api_to_s3_transfer.GoogleApiToS3TransferOperator`. | ||
|
||
.. seealso:: |
This should not be added to this class. You can skip all files from the airflow.contrib directory.
But this is not under airflow.contrib. So why should it be skipped in this case?
It's a deprecated class. We will transfer many classes to the airflow.providers package. We only have references here to maintain backward compatibility.
Formerly, the core code was maintained by the original creators, Airbnb, while the code in the contrib package was supported by the community. The project was then passed to the Apache community, and currently the entire code base is maintained by the community, so the division no longer has any justification and exists only for historical reasons.
https://airflow.readthedocs.io/en/stable/_api/index.html
We wanted to fix this, so we created a new airflow.providers package to organize these operators.
In the airflow.{operators,sensors} package, we only have a small set of core classes.
@@ -101,6 +101,10 @@ class ECSOperator(BaseOperator): # pylint: disable=too-many-instance-attributes | |||
Only required if you want logs to be shown in the Airflow UI after your job has | |||
finished. | |||
:type awslogs_stream_prefix: str | |||
|
|||
.. seealso:: |
It would be fantastic if the parameter description was at the end of the class description. The link to the guide should be above the description of the parameters. Look how it is done in airflow.providers.google
Here is an example: https://airflow.readthedocs.io/en/latest/_api/airflow/providers/google/cloud/operators/cloud_storage_transfer_service/index.html#airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceCreateJobOperator
When the link is placed under the parameter descriptions, it will not fulfill its role, because it will be barely visible. This will not promote guides.
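As a rough illustration of the layout suggested above, a docstring could place the `.. seealso::` block right after the summary and before the parameter descriptions. The class, parameter, and guide label below are hypothetical and are not taken from the Google provider package:

```python
class ExampleCreateJobOperator:
    """
    Creates a job in a hypothetical service.

    .. seealso::
        For more information on how to use this operator, take a look at the guide:
        :ref:`howto/operator:ExampleCreateJobOperator`

    :param job_name: Name of the job to create.
    :type job_name: str
    """

    def __init__(self, job_name: str) -> None:
        self.job_name = job_name


doc = ExampleCreateJobOperator.__doc__
# The guide link should appear before the first parameter description.
guide_before_params = doc.index(".. seealso::") < doc.index(":param")
```

With this ordering, the guide link is shown near the top of the rendered API page rather than after a long parameter list.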
for py_module_path in python_module_paths: | ||
with open(py_module_path) as f: | ||
py_content = f.read() | ||
for existing_operator in operator_names: |
for existing_operator in operator_names: | |
if "This module is deprecated" in py_content: | |
continue | |
for existing_operator in operator_names: |
docs/build
Outdated
# Real class definition is found and docstring does not contain reference to the existing guide | ||
if class_def is not None and f":ref:`howto/operator:{existing_operator}`" not in ast.get_docstring(class_def): |
# Real class definition is found and docstring does not contain reference to the existing guide | |
if class_def is not None and f":ref:`howto/operator:{existing_operator}`" not in ast.get_docstring(class_def): | |
# Real class definition is not found | |
if class_def is None: | |
continue | |
doc = ast.get_docstring(class_def) | |
if "This class is deprecated." in doc: | |
continue | |
if f":ref:`howto/operator:{existing_operator}`" in doc: | |
continue |
The documentation must build properly for this change to be accepted. In some places we don't want links, so we have to handle it in the code.
Could you wait with your change until the other change is merged? Otherwise, we will have a lot of conflicts, which can be a bit problematic. This change is needed to release backport packages, so many users hope that we will finish it ASAP. |
The transfer package change is merged now, so please rebase. Those "to" operators moved to the "transfers" package.
Thank you everyone for the input, especially @mik-laj! 👍 Will do all these fixes later today. |
@ivan-afonichkin Ohh. CI is still sad. Can you fix it? |
@mik-laj Yes, I will do it, just didn't have time to fix everything. I will be back home in 3-4h and fix it :) |
glob(f"{ROOT_PACKAGE_DIR}/operators/*.py"), | ||
glob(f"{ROOT_PACKAGE_DIR}/sensors/*.py"), | ||
glob(f"{ROOT_PACKAGE_DIR}/providers/**/operators/*.py", recursive=True), | ||
glob(f"{ROOT_PACKAGE_DIR}/providers/**/sensors/*.py", recursive=True), |
Can you add the transfers package here also?
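A minimal sketch of what the extended file discovery might look like with the `transfers` package included. The helper name `find_module_paths` and the temporary directory layout are assumptions made for illustration; only the glob patterns for operators and sensors come from the diff above:

```python
import os
import tempfile
from glob import glob
from itertools import chain


def find_module_paths(root_package_dir: str) -> list:
    """Collect operator, sensor, and transfer modules under the package root."""
    patterns = [
        f"{root_package_dir}/operators/*.py",
        f"{root_package_dir}/sensors/*.py",
        f"{root_package_dir}/providers/**/operators/*.py",
        f"{root_package_dir}/providers/**/sensors/*.py",
        # Assumed addition: include the "transfers" packages as well.
        f"{root_package_dir}/providers/**/transfers/*.py",
    ]
    return sorted(chain.from_iterable(glob(p, recursive=True) for p in patterns))


# Build a throwaway tree with one hypothetical transfer module and scan it.
with tempfile.TemporaryDirectory() as root:
    transfer_dir = os.path.join(root, "providers", "amazon", "aws", "transfers")
    os.makedirs(transfer_dir)
    open(os.path.join(transfer_dir, "example_to_s3.py"), "w").close()
    found = [os.path.basename(path) for path in find_module_paths(root)]
```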
Hey @mik-laj, |
@ivan-afonichkin It looks like one of the changes that I merged had a defect. I'm already looking at it. |
@kaxil you self-requested a review 3 days ago. Would you like to add something here? |
@ivan-afonichkin Fix has been merged to master. Can you do a rebase? |
Check specific folders for operators/sensors Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
…iptions function Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
Hey @mik-laj it seems all checks have passed, thanks for your change! :) |
Awesome work, congrats on your first merged pull request! |
@ivan-afonichkin Thank you for a great contribution. This will make our work much easier, because documentation reviews will be easier. What are your plans for the next change? |
@mik-laj Thanks a lot, very glad it was merged, and thanks a lot for all the help and support! |
@mik-laj Your experience with AST would probably be helpful with this ticket, but we still need to do a few things before we can implement it. |
…or descriptions (apache#9290) Co-authored-by: ivan.afonichkin <ivan.afonichkin@transferwise.com> Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
The proposed change automatically detects a missing reference to the guide in an operator's description.
This is done by first checking whether "class {operator_name}" appears in the file. We then build an AST to check whether such a class is really defined there. When the class is found, we check whether its docstring contains ":ref:`howto/operator:{operator_name}`".
This implementation doesn't import the operator's module, so there is no need to install its dependencies.
Issue: #8894
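
The check described above can be sketched roughly as follows. This is not the merged implementation: the function names and the sample source are hypothetical, and the deprecation skip follows the review suggestion rather than confirmed final code. Only the standard `ast` module is used, so the operator module is never imported:

```python
import ast
from typing import Optional


def find_class_def(source: str, class_name: str) -> Optional[ast.ClassDef]:
    """Return the class definition node, or None if the class is not defined."""
    # Cheap textual pre-check before paying for a full parse.
    if f"class {class_name}" not in source:
        return None
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return node
    return None


def is_missing_guide_ref(source: str, operator_name: str) -> bool:
    """True if the operator is defined but its docstring lacks the guide link."""
    class_def = find_class_def(source, operator_name)
    if class_def is None:
        return False
    # get_docstring returns None for classes without a docstring.
    doc = ast.get_docstring(class_def) or ""
    if "This class is deprecated." in doc:
        return False
    return f":ref:`howto/operator:{operator_name}`" not in doc


SAMPLE = '''
class WithRef:
    """Does something.

    .. seealso::
        :ref:`howto/operator:WithRef`
    """


class WithoutRef:
    """Does something else."""
'''

missing = [name for name in ("WithRef", "WithoutRef") if is_missing_guide_ref(SAMPLE, name)]
```

Here `missing` collects the operators that have a guide but no link to it, which is the list a documentation build could report as errors.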