Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move providers docs to separate package + Spell-check in a common job with docs-build #12527

Merged
merged 12 commits into from
Nov 22, 2020

Conversation

mik-laj
Copy link
Member

@mik-laj mik-laj commented Nov 21, 2020

I am extracting the provider's docs to a separate documentation package, so that we can update it independently of other documentation.
Unfortunately, it has now turned out that building documentation from multiple packages and checking spelling in a separate job does not work well with each other. The spell checker checks that all links are valid, but this is not possible if the documentation has not been built at least once.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@mik-laj mik-laj changed the title Move providers references to separate packages Move providers references to separate packages + Disabling spell-check in separate job Nov 21, 2020
@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I get correctly thaat the spell check will happen inside the docs build utomatically? If so - we should also remove the flags in ci_docs.sh, I think and we should fix teh documentation in CONTRIBUTING and Breeze about it (just look for --docs-only or --spellcheck-only).

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@mik-laj
Copy link
Member Author

mik-laj commented Nov 21, 2020

As far as I know, this documentation building doesn't mean the spelling will be checked. These are two separate steps.

This change causes spelling and building to be done in one job on the CI, but if the user wants to iterate locally over the documentation in small steps, they can still do so. If they don't build all the documentation beforehand (a situation similar to spell check in a separate CI Job), the process may end with a message that not all was successful, but the spelling will be checked and the user can start working on other changes. If they want a full check, they can send the change to the CI.

All this will allow us to shorten the iteration time, because we will only perform selected steps and iterate over them.

$ time ./build_docs.py --spellcheck-only --package apache-airflow
real	1m10.113s
user	0m54.709s
sys	0m12.332s
$ time ./build_docs.py --docs-only --package apache-airflow
real	1m32.180s
user	1m24.968s
sys	0m5.662s

The documentation for building the documentation has been. moved to /docs/README.rst to make it easier to find. For now this file may not be very visible, but I will clean up this directory a bit soon and it will be the only .rst file in this directory. All other files will be in subdirectories following to the structure of the output directory.

@mik-laj mik-laj changed the title Move providers references to separate packages + Disabling spell-check in separate job Move providers references to separate package + Disabling spell-check in separate job Nov 21, 2020
@mik-laj mik-laj changed the title Move providers references to separate package + Disabling spell-check in separate job Move providers references to separate package + Spell-check in a common job with docs-build Nov 21, 2020
@potiuk
Copy link
Member

potiuk commented Nov 21, 2020

The documentation for building the documentation has been. moved to /docs/README.rst to make it easier to find. For now this file may not be very visible, but I will clean up this directory a bit soon and it will be the only .rst file in this directory. All other files will be in subdirectories following to the structure of the output directory.

So why do we neet do join it in one job then? What are the benefits?

If they are split into two separate jobs, they run in parallel and you can get the feedback much faster if you run a small doc-only PR and there are not many PRs in the queue.

In case we run those two jobs in parallel, rather than waiting 2:40 seconds user will have to wait 1:32. I think this is a good enough reason to keep those jobs as two separate jobs. Unless I am missing something.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@mik-laj
Copy link
Member Author

mik-laj commented Nov 21, 2020

@potiuk The problem is that the spell check will fail because sphinx will not have the inventory file it needs. This file is only created when building documentation in HTML format.

To understand the problem that I am trying to solve I need to tell you a little bit about how airflow_internsphinx works. It is an extension that works with sphinx.ext.intersphinx.. sphinx.ext.intersphinx allows us to create links between different documentation. For this purpose, the HTML documentation creates an objects.inv file in the output directory during build.
For example:

https://airflow.readthedocs.io/en/latest/objects.inv
http://apache-airflow-docs.s3-website.eu-central-1.amazonaws.com/docs/apache-airflow-providers-apache-kylin/latest/objects.inv

This file lists all the objects that are in the document along with the relative path to this document.
For example:

:class:`airflow.providers.apache.kylin.hooks.kylin` => _api/airflow/providers/apache/kylin/hooks/kylin/index.html#airflow.providers.apache.kylin.hooks.kylin.KylinHook

If you want to see the full list of contents then you can run the script /docs/list-roles.py.

airflow_intersphiinx is an extension that sets up intersphinx for each documentation package we build. For each, two potential locations of the objects.inv file are checked:

  • Local objects.inv - this file may not exist if documentation for the package has not been built.
  • Remote cache on S3

If both sources are not found, we have a warning and the documentation building/spell checking fails. This can happen when we add a new package that has not yet been built, but another package that is being built earlier references it. And it surely will, as airflow_intersphinx creates a reference between all packages.

My proposed solution is to detect this situation and build problematic packages again. Then the local objects.inv files will be used and the whole operation will be successful.

This workaround cannot be used for spell checking. If we have a problem while checking the spelling, we have no other choice - we have to fail the build. Therefore, I would like to combine these steps into one job CI so that spell check has access to the local cache.

I realize that this is not a perfect solution, but I think it will not be too burdensome for us as we can build the documentation quickly locally. It was a big problem before, if building documentation always took 10+ minutes. Then running a task on CI allowed us to unblock our computer.

@potiuk
Copy link
Member

potiuk commented Nov 21, 2020

I see - ok @mik-laj - many thanks for the detailed explanation. It's now perfectly clear. Great stuff!

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@mik-laj mik-laj changed the title Move providers references to separate package + Spell-check in a common job with docs-build Move providers docs to separate package + Spell-check in a common job with docs-build Nov 21, 2020
@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

… Move providers references to separate packages
@mik-laj mik-laj requested a review from potiuk November 21, 2020 22:28
… fixup! Move providers references to separate packages
@mik-laj mik-laj marked this pull request as ready for review November 21, 2020 22:37
@@ -946,7 +946,7 @@ Doc-only changes
- [AIRFLOW-XXX] Fix typo - AWS DynamoDB Hook (#6319)
- [AIRFLOW-XXX] Fix Documentation for adding extra Operator Links (#6301)
- [AIRFLOW-XXX] Add section on task lifecycle & correct casing in docs (#4681)
- [AIRFLOW-XXX] Make it clear that 1.10.5 wasn't accidentally omitted from UPDATING.md (#6240)
- [AIRFLOW-XXX] Make it clear that 1.10.5 was not accidentally omitted from UPDATING.md (#6240)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the Sphinx has started detecting some extra typos.

@@ -29,7 +29,7 @@ integrations:
external-doc-url: https://snowflake.com/
how-to-guide:
- /docs/howto/operator/snowflake.rst
tags: [software]
tags: [service]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to avoid creating a new page with only one operator. Earlier I was hoping that Yandex would add other integrations as well and then a separate section would make sense, but they didn't.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Nov 22, 2020
@github-actions
Copy link

The PR needs to run all tests because it modifies core of Airflow! Please rebase it to latest master or ask committer to re-run it!

@mik-laj mik-laj merged commit ef4af21 into apache:master Nov 22, 2020
@mik-laj mik-laj deleted the provider-ref branch November 22, 2020 08:29
@mik-laj mik-laj mentioned this pull request Nov 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full tests needed We need to run full set of tests for this PR to merge kind:documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants