Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs: Make docsite rebuilds smarter/faster #45062

Merged
merged 9 commits into from
Sep 12, 2018
Merged

Conversation

dagwieers
Copy link
Contributor

@dagwieers dagwieers commented Sep 2, 2018

SUMMARY

The purpose of this PR is to make rebuilding the docsite as fast as possible by ensuring that we only need to rebuild what has changed, rather than rebuild everything all the time.

This PR includes:

  • Disable automatic cleanup of generated files (make clean becomes explicit)
    • Processed files go from 2502 to 2092 (rebuilding the user documentation)
  • Make plugin_formatter idempotent (rewrite file only if it has changed)
    • Processed files go from 2092 to 12 (rebuilding modules + plugins docs)
  • Make generate_man.py idempotent
    • Processed files go from 12 to 3
  • Make dump_keywords.py idempotent
    • Processed files go from 3 to 2
  • Make dump_config.py idempotent
    • Processed files go from 2 to 1
  • Make testing_formatter.sh idempotent
    • Processed files go from 1 to 0

PS This means that for e.g. rebuilding the docs for the website, you probably want to run make clean explicitly first.

ISSUE TYPE
  • Docs Pull Request
COMPONENT NAME

docsite

ANSIBLE VERSION

v2.8

@ansibot
Copy link
Contributor

ansibot commented Sep 2, 2018

@ansibot ansibot added affects_2.7 This issue/PR affects Ansible v2.7 docs This issue/PR relates to or includes documentation. needs_triage Needs a first human triage before being processed. small_patch support:core This issue/PR relates to code supported by the Ansible Engineering Team. labels Sep 2, 2018
@dagwieers dagwieers changed the title Docs: Don't clean up Docs: Make docsite rebuilds smarter Sep 2, 2018
@dagwieers
Copy link
Contributor Author

cc @felixfontein

@dagwieers dagwieers changed the title Docs: Make docsite rebuilds smarter Docs: Make docsite rebuilds smarter/faster Sep 2, 2018
@ansibot ansibot added needs_ci This PR requires CI testing to be performed. Please close and re-open this PR to trigger CI. stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. and removed needs_ci This PR requires CI testing to be performed. Please close and re-open this PR to trigger CI. stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. labels Sep 2, 2018
if os.path.exists(output_name):
sha1_new = sha1(data).hexdigest()
sha1_old = sha1(open(output_name).read()).hexdigest()
if sha1_new == sha1_old:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply compare contents, instead of computing sha1 of both first? Since you read both into memory, that's more efficient :)

Also: it currently fails under Python 3:

Traceback (most recent call last):
  File "../bin/dump_config.py", line 84, in <module>
    sys.exit(main(sys.argv[:]))
  File "../bin/dump_config.py", line 72, in main
    sha1_old = sha1(open(output_name).read()).hexdigest()
TypeError: Unicode-objects must be encoded before hashing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: since you're not closing the file, this might break under strange file systems (such under Windows -- no idea if anyone ever tries to do that, tough :) ).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixfontein When the script finishes, the file is closed. I don't think it is a problem as every call generates a single file anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably makes sense to create a new utils.py module which has a function write_to_file_if_different(filename, data) which does the heavy lifting (and where this is done cleanly), so that all the details are only repeated once.

@felixfontein
Copy link
Contributor

For me, with this change (adjusted to work with Python 3 and doing direct comparsions instead of computing sha1), I got some timing improvements when running the update docs process for six modules. First run:

real	2m18.633s
user	6m0.063s
sys	0m4.443s

Running it again (with no change):

real	1m10.700s
user	2m40.249s
sys	0m2.763s

So yeah, my laptop still gets warm (and I can hear when it's done when the fan stops making noise), but it got much better :)

@dagwieers
Copy link
Contributor Author

dagwieers commented Sep 2, 2018

What command did you use ? I can run it under 12 seconds when there are no changes.

real	0m11.330s
user	0m10.443s
sys	0m0.889s

(Doing sha1 checksums or comparing file content doesn't make a real impact)

@dagwieers dagwieers force-pushed the docs-make branch 3 times, most recently from 8e729d4 to e75b81b Compare September 2, 2018 04:43
@ansibot

This comment has been minimized.

@abadger
Copy link
Contributor

abadger commented Sep 12, 2018

Regarding the swapping branches case... there will be a cornercase problem where generated rst files that are not present in the other branch are now not cleaned before the branch is built (This also affects devel when a module or other documentation is removed). Using Makefiles usually works around this by having a master list of all source files somewhere in the Makefile but we don't have that in the docs build.

Note that this is not a new problem. When module docs moved to a subdirectory I found that I had a lot of crufty files left over in the directory. I would have had to make clean prior to checking out the module docs move in order to clean those up.

@dagwieers
Copy link
Contributor Author

@abadger Sure, as I stated, doing a make clean becomes a conscious decision rather a mandatory action. Not making this possible is simply not acceptable if you want people to enjoy improving documentation (whether it means a module doc or a small change in a guide).

I don't mind creating a separate target in the Makefile that doesn't do the clean, but IMO that muddies everything even more. The only situation where this is a real problem is the docs generated for the website (which is a contained process we manage), for most contributors having stale, unused, unreferenced leftovers is not a real concern IMO. And if so, we simply have to document make clean in the process.

PS I don't think we want to backport this. Documentation work is mostly done (and has the most impact) in the devel branch.

@dagwieers dagwieers force-pushed the docs-make branch 3 times, most recently from 90fb2d4 to 54742b5 Compare September 12, 2018 14:28
@ansibot ansibot removed the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Sep 12, 2018
@@ -0,0 +1,23 @@
# Copyright: (c) 2018, Dag Wieers <dag@wieers.com>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this private (prepend the filename with a leading underscore)? The reason is that it is only used at buildtime by our build scripts. It isn't used at runtime at all. Marking it private clues other people in to the fact that they shouldn't use it in runtime code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I would prefer that if this functionality is useful, it does get used in runtime code, rather than being reinvented. That's why I would prefer it in a normal library, helper.py was a perfect fit IMO. But announcing it as "not to be used" seems to be the wrong reflex ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's more code that is being used at build-time coming from the normal libraries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a use case in the runtime code in mind that you can point me at?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and yes, build time using run time code is fine. What's misorganized is when build time uses runtime code which nothing in runtime uses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Private normally means 'not exposed for external use', because you want to avoid people to use it. I don't see why this function should not be used. This is another mental exercise and very ambiguous. Should we avoid people using this function, is there a greater harm that I am not seeing or should we encourage to reuse it if it is useful ? And why make an exceptional case for one simple function.

Currently we have utils/plugins_docs.py which is not private either, contains 2 functions that are only used by the same scripts. So I think you all have too much time on your hands to split hairs for things that don't matter as much as you think they do. Maybe, you can do those cleanups in a separate PR and merge this. Because I have had it with those small pet-peeves. I'd like to spend my time on more important things, if you don't mind :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plugin_docs is an example of something that should not have been committed into utils without more thought.... however, if you ore closely, get_docstring() is in multiple files in runtime. it uses add_fragments. add_fragments use merge_fragment. So everything in utils/plugin_docs is used by runtime.

If we have a runtime use case for this, then I see value in making sure it's a good api and looking at making it good for general usage. If we don't have a runtime usage then we end up having to maintain an api which may not be well suited to the task people put it too. that's why things not used at runtime should remain private until a use case emerges. Then we can make sure the API is sufficient for the use cases and make it public.

@ansibot ansibot added needs_ci This PR requires CI testing to be performed. Please close and re-open this PR to trigger CI. stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. labels Sep 12, 2018
@ansibot ansibot removed needs_ci This PR requires CI testing to be performed. Please close and re-open this PR to trigger CI. stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. labels Sep 12, 2018
@ansibot

This comment has been minimized.

@ansibot ansibot added the needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. label Sep 12, 2018
@abadger
Copy link
Contributor

abadger commented Sep 12, 2018

@abadger Sure, as I stated, doing a make clean becomes a conscious decision rather a mandatory action. Not making this possible is simply not acceptable if you want people to enjoy improving documentation (whether it means a module doc or a small change in a guide).

I don't mind creating a separate target in the Makefile that doesn't do the clean, but IMO that muddies everything even more.

Yeah, I agree... don't create a separate Makefile target.

My note isn't really about running make clean... it's more that make clean is insufficient and we're slightly increasing the boundaries where that is the case. We can fix make clean by specifying all of the source files in the Makefile or we can continue to ignore those cornercases. Since it's an existing problem, docsite rebuild should be designed to use a clean checkout rather than relying on make clean.

@mattclay mattclay added the ci_verified Changes made in this PR are causing tests to fail. label Sep 12, 2018
* Make helper function private as it's only used by the build tools at the moment
* Use to_bytes instead of encode() for encoding safety.
* Make the race window in update_file_if_different() smaller.
@ansibot ansibot removed the ci_verified Changes made in this PR are causing tests to fail. label Sep 12, 2018
@abadger abadger merged commit 310b0a2 into ansible:devel Sep 12, 2018
Contributor Experience automation moved this from Needs review to Done Sep 12, 2018
@abadger
Copy link
Contributor

abadger commented Sep 12, 2018

Updated the PR to address those concerns and merged to devel.

@dagwieers
Copy link
Contributor Author

Thanks :-)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
affects_2.7 This issue/PR affects Ansible v2.7 docs This issue/PR relates to or includes documentation. needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. support:core This issue/PR relates to code supported by the Ansible Engineering Team.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

8 participants