Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect automatically the lack of reference to the guide in the operator descriptions #9290

Merged

Conversation

ivan-afonichkin
Copy link
Contributor

@ivan-afonichkin ivan-afonichkin commented Jun 14, 2020

Proposed change automatically detects lack of reference to the guide in the operator's description.

This is done by first checking if "class {operator_name}" is in the file. After that we build AST tree to check if there is really such a class defined. When the class is found, we just check if docstring contains ":ref:`howto/operator:{operator_name}`".

This implementation doesn't import operator's module, so no need to install dependencies.

Issue: #8894

@boring-cyborg
Copy link

boring-cyborg bot commented Jun 14, 2020

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, pylint and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://apache-airflow-slack.herokuapp.com/

ashb
ashb previously requested changes Jun 15, 2020
Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a -1 from me -- instead of having a separate guide page for each operator we should instead have better content on the API doc page for the operator itself.

Guides should be limited to case where we need to document/illustrate workflows that use more than a single operator.

And at any rate, this PR cannot be merged as you have just needed 31 new guides to be written.

@ivan-transferwise
Copy link
Contributor

Hello @ashb, I believe there is misunderstanding on your side. What this PR does exactly is IF there is a separate guide, THEN require link from operator's docstring. IF there is no separate guide, THEN no link needed. Does it somehow clarify the situation?

You are absolutely right though that separate guides are not needed all the time, but when they exist, it's better to promote them by putting link in the docstring.

@ashb
Copy link
Member

ashb commented Jun 15, 2020

Gotcha, yes totally misread what it was doing. Sorry!

(That also explains why there were only 31 errors!)

@ashb ashb dismissed their stale review June 15, 2020 09:58

I can't read

docs/build Outdated Show resolved Hide resolved
docs/build Outdated Show resolved Hide resolved
docs/build Outdated Show resolved Hide resolved
`airflow.providers.amazon.aws.operators.google_api_to_s3_transfer.GoogleApiToS3TransferOperator`."""
`airflow.providers.amazon.aws.operators.google_api_to_s3_transfer.GoogleApiToS3TransferOperator`.

.. seealso::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be added to this class. You can skip all files from the airflow.contrib directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is not under airflow.contrib. So why should it be skipped in this case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a deprecated class. We will transfer many classes to the airflow.providers package. We only have references here to maintain backward compatibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formerly the core code was maintained by the original creators - Airbnb. The code that was in the contrib package was supported by the community. The project was passed to the Apache community and currently the entire code is maintained by the community, so now the division has no justification, and it is only due to historical reasons.
https://airflow.readthedocs.io/en/stable/_api/index.html
We wanted to fix this, so we created a new airflow.providers package to organize these operators.
In the airflow.{operators,sensors} package, we only have a small set of core classes.

@@ -101,6 +101,10 @@ class ECSOperator(BaseOperator): # pylint: disable=too-many-instance-attributes
Only required if you want logs to be shown in the Airflow UI after your job has
finished.
:type awslogs_stream_prefix: str

.. seealso::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be fantastic if the parameter description was at the end of the class description. The link to the guide should be above the description of the parameters. Look how it is done in airflow.providers.google

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivan-afonichkin ivan-afonichkin force-pushed the reference-lack-operator-description branch from c744406 to 7ecbab1 Compare June 16, 2020 14:27
for py_module_path in python_module_paths:
with open(py_module_path) as f:
py_content = f.read()
for existing_operator in operator_names:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for existing_operator in operator_names:
if "This module is deprecated" in py_content:
continue
for existing_operator in operator_names:

docs/build Outdated
Comment on lines 195 to 196
# Real class definition is found and docstring does not contain reference to the existing guide
if class_def is not None and f":ref:`howto/operator:{existing_operator}`" not in ast.get_docstring(class_def):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Real class definition is found and docstring does not contain reference to the existing guide
if class_def is not None and f":ref:`howto/operator:{existing_operator}`" not in ast.get_docstring(class_def):
# Real class definition is not found
if class_def is None:
continue
doc = ast.get_docstring(class_def)
if "This class is deprecated." in docstring:
continue
if f":ref:`howto/operator:{existing_operator}`" in doc:
continue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation must build properly for this change to be accepted. In some places we don't want links, so we have to handle it in the code.

@mik-laj
Copy link
Member

mik-laj commented Jun 16, 2020

Could you wait with your change until the other change is merged? Otherwise, we will have a lot of conflicts, which can be a bit problematic. This change is needed to release backport packages, so many users hope that we will finish it ASAP.
#9320

@potiuk
Copy link
Member

potiuk commented Jun 17, 2020

The transfer package change is merged now - so please rebase. those "to" operators moved to "transfers" package.

@ivan-afonichkin
Copy link
Contributor Author

Thank you everyone for the input, especially @mik-laj! 👍

Will do all these fixes later today.

@ivan-afonichkin ivan-afonichkin force-pushed the reference-lack-operator-description branch from 1e894a0 to 1023771 Compare June 17, 2020 15:53
@mik-laj
Copy link
Member

mik-laj commented Jun 17, 2020

@ivan-afonichkin Ohh. CI is still sad. Can you fix it?

@ivan-afonichkin
Copy link
Contributor Author

@mik-laj Yes, I will do it, just didn't have time to fix everything. I will be back home in 3-4h and fix it :)

Comment on lines +178 to +181
glob(f"{ROOT_PACKAGE_DIR}/operators/*.py"),
glob(f"{ROOT_PACKAGE_DIR}/sensors/*.py"),
glob(f"{ROOT_PACKAGE_DIR}/providers/**/operators/*.py", recursive=True),
glob(f"{ROOT_PACKAGE_DIR}/providers/**/sensors/*.py", recursive=True),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add transfers package here also?

@ivan-afonichkin
Copy link
Contributor Author

Hey @mik-laj, CI Build / Build docs has finally passed, but some databases checks failed with no apparent reason to me, any idea what could be wrong there?

@mik-laj
Copy link
Member

mik-laj commented Jun 17, 2020

@ivan-afonichkin It looks like one of the changes that I merged had a defect. I'm already looking at it.

@mik-laj
Copy link
Member

mik-laj commented Jun 17, 2020

@kaxil you self-requested a review 3 days ago. Would you like to add something here?

@mik-laj
Copy link
Member

mik-laj commented Jun 17, 2020

@ivan-afonichkin Fix has been merged to master. Can you do a rebase?

ivan-transferwise and others added 14 commits June 18, 2020 08:03
Check specific folders for operators/sensors

Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
…iptions function

Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
Check specific folders for operators/sensors

Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
…iptions function

Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
@ivan-afonichkin ivan-afonichkin force-pushed the reference-lack-operator-description branch from 8a27fc6 to f498c04 Compare June 18, 2020 07:04
@ivan-afonichkin
Copy link
Contributor Author

Hey @mik-laj it seems all checks have passed, thanks for your change! :)
Can we merge this PR?

@mik-laj mik-laj merged commit 40bf8f2 into apache:master Jun 18, 2020
@boring-cyborg
Copy link

boring-cyborg bot commented Jun 18, 2020

Awesome work, congrats on your first merged pull request!

@mik-laj
Copy link
Member

mik-laj commented Jun 18, 2020

@ivan-afonichkin Thank you for a great contribution. This will make our work much easier, because documentation reviews will be easier. What are your plans for the next change?

@ivan-afonichkin
Copy link
Contributor Author

@mik-laj Thanks a lot, very glad it was merged, and thanks a lot for all the help and support!
Regarding the next change, I didn't choose an issue yet, but maybe you have something on your mind that will be very beneficial and needs to be done? If not, I will just go through the backlog :)

@mik-laj
Copy link
Member

mik-laj commented Jun 18, 2020

@mik-laj Your experience with AST would probably be helpful with this ticket, but we still need to do a few things before we can implement it.
#8765
If we can set the details and unblock the work, I will call you If you or your company uses Airflow in production, then maybe you would like to share your opinion in this thread
#9276
After that, we don't have an idea for a ticket that might be of interest to you. However, I create all new tickets all the time, so you will definitely find something for you.

kaxil pushed a commit to kaxil/airflow that referenced this pull request Jun 27, 2020
…or descriptions (apache#9290)

Co-authored-by: ivan.afonichkin <ivan.afonichkin@transferwise.com>
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants