Add checksum parameter to Document #793
Also, the code that generates a checksum for a file largely duplicates the checksummer's code for pathnames. I did not want to create a dependency between a data source and the checksummer; while this might be okay for the default filesystem data source, external data sources would not want to depend on the internal checksummer. (This would also be an argument for having pathnames calculate their own checksum, but we've discussed that already 😉)
This looks good! Some remarks:
Not entirely correct: the checksum being different is a sufficient but not a necessary condition for content to be loaded. For instance, an item (whose checksum is identical) that includes content from another item (whose checksum is different) will need to be recompiled even though its own checksum hasn’t changed. The idea of only loading data when necessary is nonetheless quite interesting and is something I plan to tackle in Nanoc at some point in the future.
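To illustrate the point about included items, here is a minimal Ruby sketch. It is not Nanoc's actual outdatedness logic; `needs_recompile?`, the checksum hashes, and the `deps` map are all hypothetical names introduced for this example. An item is treated as outdated when its own checksum changed or when anything it depends on is outdated.

```ruby
# Hypothetical helper, not part of Nanoc. `old_sums` and `new_sums` map item
# identifiers to checksums; `deps` maps an identifier to the identifiers of
# the items whose content it includes.
def needs_recompile?(id, old_sums, new_sums, deps, seen = {})
  return false if seen[id] # already visited; also guards against cycles
  seen[id] = true

  # A changed checksum is enough on its own to require recompilation...
  return true if old_sums[id] != new_sums[id]

  # ...but an unchanged item is still outdated if a dependency is outdated.
  deps.fetch(id, []).any? do |dep|
    needs_recompile?(dep, old_sums, new_sums, deps, seen)
  end
end

old_sums = { 'page' => 'aaa', 'partial' => 'bbb', 'other' => 'ccc' }
new_sums = { 'page' => 'aaa', 'partial' => 'zzz', 'other' => 'ccc' }
deps     = { 'page' => ['partial'] }

needs_recompile?('page', old_sums, new_sums, deps)  # true: 'partial' changed
needs_recompile?('other', old_sums, new_sums, deps) # false: nothing changed
```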
@@ -249,7 +251,7 @@ def parse(content_filename, meta_filename, _kind)
       content = pieces[4]

       # Done
-      [meta, content]
+      [meta, content, meta_raw]
I don’t think `meta_raw` is used.
nm—it is!
Can you update the `#parse` documentation to reflect the changed return value?
👍 apart from comments
Yeah, I cut a corner in writing that down. What I meant is: content only needs to be loaded when a reason is found to recompile, since determining that reason no longer requires the item contents.
I think it could be as simple as allowing
Yup.
I have a few ideas regarding optimisations like these, and I’d rather write down these thoughts in a proper RFC first. (Can’t seem to find the time, though.)
Okay, keep me updated when that RFC goes out 😄 Finishing this PR now.
All done.
Looks good, thanks!
Some follow-up discussion:
Lazy loading still makes sense, not for memory reasons, but to avoid the overhead of parsing the file and its metadata. The checksum can be computed without parsing; the file only needs to be parsed when the checksum differs.
I’d argue that in most cases, parsing the metadata doesn’t impose enough overhead to be worth avoiding, but I haven’t profiled this extensively enough to be certain of that.
This is the “nothing changed” case, correct?
Yes.
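The lazy-loading idea discussed in this thread could be sketched roughly as follows. This is only an illustration under assumptions: `LazyFile`, `parsed?`, and `changed?` are hypothetical names, the front-matter format is simplified to a single `---` delimited YAML block, and none of it reflects how Nanoc's filesystem data source is actually implemented. The checksum is taken over the raw bytes, so the file is parsed only when something asks for its metadata or content.

```ruby
require 'digest/sha1'
require 'yaml'

# Hypothetical lazily parsed file; not a Nanoc class.
class LazyFile
  def initialize(path)
    @path = path
    @parsed = nil
  end

  # Checksum of the raw file bytes; no front matter is parsed here.
  def checksum
    @checksum ||= Digest::SHA1.file(@path).hexdigest
  end

  def parsed?
    !@parsed.nil?
  end

  # Returns [meta, content], parsing the file on first use only.
  def parse
    @parsed ||= begin
      raw = File.read(@path)
      if raw =~ /\A---[ \t]*\n(.*?)\n---[ \t]*\n(.*)\z/m
        [YAML.safe_load(Regexp.last_match(1)) || {}, Regexp.last_match(2)]
      else
        [{}, raw]
      end
    end
  end
end

# Compares against a checksum stored from a previous run, without parsing.
def changed?(file, stored_checksum)
  file.checksum != stored_checksum
end
```

As noted earlier in the conversation, a differing checksum is not the only reason to recompile, so a real implementation would still need to parse an unchanged file whenever another outdatedness reason is found.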
As suggested in #790, this pull request allows a `Document` to specify a pre-calculated checksum. This gives data sources the ability to determine a checksum for the items and layouts they create.

This pull request:
- Adds a `checksum` parameter to the initializer of `Document` (as well as `new_item` and `new_layout`)
- Updates `Checksummer` to use the `checksum` on `Item` and `Layout` if present

This realizes a major compilation speed gain for my website (from 11s to 3s if nothing is changed), and paves the way for lazy loading of items, which could bring down compilation time even further.
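The shape of the change can be pictured with a small standalone Ruby sketch. The classes below are hypothetical stand-ins (`SketchDocument`, `SketchChecksummer`), not Nanoc's real `Document` or `Checksummer`, and the fallback hashing scheme is an assumption made only for illustration. The point is just that a checksum supplied by the data source is used as-is, and content is hashed only when none was supplied.

```ruby
require 'digest/sha1'

# Hypothetical stand-in for a document with an optional precomputed checksum.
class SketchDocument
  attr_reader :content, :attributes, :identifier, :checksum

  def initialize(content, attributes, identifier, checksum: nil)
    @content = content
    @attributes = attributes
    @identifier = identifier
    @checksum = checksum # supplied by the data source, or nil
  end
end

# Hypothetical stand-in for the checksummer.
module SketchChecksummer
  def self.calc(document)
    # Prefer the checksum the data source already computed, if present.
    return document.checksum if document.checksum

    # Otherwise fall back to hashing content and attributes (assumed scheme).
    attrs = document.attributes.sort_by { |key, _| key.to_s }.inspect
    Digest::SHA1.hexdigest(document.content + attrs)
  end
end

with_sum = SketchDocument.new('Hello', { title: 'Hi' }, '/hello/', checksum: 'abc123')
SketchChecksummer.calc(with_sum) # returns "abc123" without touching the content
```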