-
Notifications
You must be signed in to change notification settings - Fork 23.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Docs: Make docsite rebuilds smarter/faster #45062
Conversation
docs/bin/dump_config.py
Outdated
if os.path.exists(output_name): | ||
sha1_new = sha1(data).hexdigest() | ||
sha1_old = sha1(open(output_name).read()).hexdigest() | ||
if sha1_new == sha1_old: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not simply compare contents, instead of computing sha1 of both first? Since you read both into memory, that's more efficient :)
Also: it currently fails under Python 3:
Traceback (most recent call last):
File "../bin/dump_config.py", line 84, in <module>
sys.exit(main(sys.argv[:]))
File "../bin/dump_config.py", line 72, in main
sha1_old = sha1(open(output_name).read()).hexdigest()
TypeError: Unicode-objects must be encoded before hashing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: since you're not closing the file, this might break under strange file systems (such under Windows -- no idea if anyone ever tries to do that, tough :) ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felixfontein When the script finishes, the file is closed. I don't think it is a problem as every call generates a single file anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably makes sense to create a new utils.py
module which has a function write_to_file_if_different(filename, data)
which does the heavy lifting (and where this is done cleanly), so that all the details are only repeated once.
For me, with this change (adjusted to work with Python 3 and doing direct comparsions instead of computing
Running it again (with no change):
So yeah, my laptop still gets warm (and I can hear when it's done when the fan stops making noise), but it got much better :) |
What command did you use ? I can run it under 12 seconds when there are no changes.
(Doing sha1 checksums or comparing file content doesn't make a real impact) |
8e729d4
to
e75b81b
Compare
This comment has been minimized.
This comment has been minimized.
Regarding the swapping branches case... there will be a cornercase problem where generated rst files that are not present in the other branch are now not cleaned before the branch is built (This also affects devel when a module or other documentation is removed). Using Makefiles usually works around this by having a master list of all source files somewhere in the Makefile but we don't have that in the docs build. Note that this is not a new problem. When module docs moved to a subdirectory I found that I had a lot of crufty files left over in the directory. I would have had to make clean prior to checking out the module docs move in order to clean those up. |
@abadger Sure, as I stated, doing a I don't mind creating a separate target in the Makefile that doesn't do the clean, but IMO that muddies everything even more. The only situation where this is a real problem is the docs generated for the website (which is a contained process we manage), for most contributors having stale, unused, unreferenced leftovers is not a real concern IMO. And if so, we simply have to document PS I don't think we want to backport this. Documentation work is mostly done (and has the most impact) in the devel branch. |
90fb2d4
to
54742b5
Compare
lib/ansible/utils/doc_helpers.py
Outdated
@@ -0,0 +1,23 @@ | |||
# Copyright: (c) 2018, Dag Wieers <dag@wieers.com> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this private (prepend the filename with a leading underscore)? The reason is that it is only used at buildtime by our build scripts. It isn't used at runtime at all. Marking it private clues other people in to the fact that they shouldn't use it in runtime code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I would prefer that if this functionality is useful, it does get used in runtime code, rather than being reinvented. That's why I would prefer it in a normal library, helper.py was a perfect fit IMO. But announcing it as "not to be used" seems to be the wrong reflex ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's more code that is being used at build-time coming from the normal libraries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a use case in the runtime code in mind that you can point me at?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and yes, build time using run time code is fine. What's misorganized is when build time uses runtime code which nothing in runtime uses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Private normally means 'not exposed for external use', because you want to avoid people to use it. I don't see why this function should not be used. This is another mental exercise and very ambiguous. Should we avoid people using this function, is there a greater harm that I am not seeing or should we encourage to reuse it if it is useful ? And why make an exceptional case for one simple function.
Currently we have utils/plugins_docs.py which is not private either, contains 2 functions that are only used by the same scripts. So I think you all have too much time on your hands to split hairs for things that don't matter as much as you think they do. Maybe, you can do those cleanups in a separate PR and merge this. Because I have had it with those small pet-peeves. I'd like to spend my time on more important things, if you don't mind :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plugin_docs is an example of something that should not have been committed into utils without more thought.... however, if you ore closely, get_docstring() is in multiple files in runtime. it uses add_fragments. add_fragments use merge_fragment. So everything in utils/plugin_docs is used by runtime.
If we have a runtime use case for this, then I see value in making sure it's a good api and looking at making it good for general usage. If we don't have a runtime usage then we end up having to maintain an api which may not be well suited to the task people put it too. that's why things not used at runtime should remain private until a use case emerges. Then we can make sure the API is sufficient for the use cases and make it public.
54742b5
to
501f4d4
Compare
501f4d4
to
7bf9f68
Compare
This comment has been minimized.
This comment has been minimized.
Yeah, I agree... don't create a separate Makefile target. My note isn't really about running make clean... it's more that make clean is insufficient and we're slightly increasing the boundaries where that is the case. We can fix make clean by specifying all of the source files in the Makefile or we can continue to ignore those cornercases. Since it's an existing problem, docsite rebuild should be designed to use a clean checkout rather than relying on make clean. |
7bf9f68
to
1924dc1
Compare
1924dc1
to
9f12801
Compare
* Make helper function private as it's only used by the build tools at the moment * Use to_bytes instead of encode() for encoding safety. * Make the race window in update_file_if_different() smaller.
Updated the PR to address those concerns and merged to devel. |
Thanks :-) |
SUMMARY
The purpose of this PR is to make rebuilding the docsite as fast as possible by ensuring that we only need to rebuild what has changed, rather than rebuild everything all the time.
This PR includes:
make clean
becomes explicit)plugin_formatter
idempotent (rewrite file only if it has changed)generate_man.py
idempotentdump_keywords.py
idempotentdump_config.py
idempotenttesting_formatter.sh
idempotentPS This means that for e.g. rebuilding the docs for the website, you probably want to run
make clean
explicitly first.ISSUE TYPE
COMPONENT NAME
docsite
ANSIBLE VERSION
v2.8