-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Report broken links only once #1819
Conversation
22b7640
to
94b9e78
Compare
84cd6b7
to
354f269
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work! If I understand the implementation correctly:
SiteLinkManager
stores intralinks to be validated after page generation is finished (rather than real-time) in order to achieve the consolidated logging.SiteLinkManager
takes some functionalities fromlinkProcessor
that are required to determine whether a particular node should be collected for its intralink.- Therefore,
linkProcessor
validate intralinks "contract" is changed to working with the resource paths instead of the actual nodes.
My thought was to keep the intralink validation in real-time but add the (cwf, link)
pairs that errored out to the SiteLinkManager
's collection to be output after page generation finished. That way, we can keep linkProcessor
's way of doing things, just added more on top of it.
I think your implementation is fine too (maybe we could check first if there is any difference, functional or otherwise, on validating in real-time vs at the end?)
The last point is a regression on a vital function of intralink validation, so we need to be extra careful here that we don't create unintended outcomes. It is generally better to add new functions rather than changing existing ones, but if it is unavoidable, we have to make sure it's right.
Got it, thank you for the explanation! 🙇 I took a look through and here are some notes on it:
Behaviour wise, it looks like there shouldn't be a difference (besides collecting and then validating all at once, later) |
@@ -1103,6 +1111,8 @@ class Site { | |||
} else { | |||
isCompleted = await this.generatePagesAsyncThrottled(task.pages, progressBar); | |||
} | |||
|
|||
this.siteLinkManager.validateAllIntralinks(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm 👍, I'll leave @ryoarmanda to the final look and merge
Just adding a note that while this preserves the previous behaviour (across rebuilds, invalid links are only logged for rebuilt files and their dependencies), perhaps we we could improve on this in a future issue / pr to persist the invalid links across rebuilds. (one more to the list for #1418)
E.g.
markbind s
- all invalid links are reported- change some file -> only invalid links belonging to the file (or its dependencies) are reported
- author has to scroll up to the "first batch" of messages to see the invalid links
Or
markbind s -o -b
- all invalid links are reported for the opened page (and its dependencies), then for the background pages- change opened file -> only invalid links belonging to the file (or its dependencies) are reported
- background pages are reported "disjointed" from the earlier batch, same issue with the author having to piece together the information again
I think your implementation is fine too (maybe we could check first if there is any difference, functional or otherwise, on validating in real-time vs at the end?)
No preference on this for now, but doing it at the end should facilitate the 2nd item (hash checking) in #1418, where it'll likely be necessary to do it at the end (only once all hashes have been collected from all pages).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! The suggestion @ang-zeyu mentioned to persist the invalid links across rebuilds sounds like a nice improvement too.
No preference on this for now, but doing it at the end should facilitate the 2nd item (hash checking) in #1418, where it'll likely be necessary to do it at the end (only once all hashes have been collected from all pages).
Ah in that case, doing it at the end would prepare to go forth with the hash checking as well.
While experimenting for a fix to #1723, I realize that since now the links are collected and then validated, the output might end up like this if the mentioned issue is fixed: Wonder if there are any opinions/thoughts if I revert the behavior to real-time while keeping the rest of code structure from this PR the same? |
Quoting Ze Yu's reply from here:
|
I think we should be fixing fundamental issue here instead: #1633 #1723 should be naturally fixed as a result too. (but we can still update E.g. maybe use this https://www.npmjs.com/package/progress#interrupt Or (hopefully) there's even a progress bar library that works out of the box with winston 👀 |
Thanks, @jovyntls, in this case, I guess real-time will have to mean that we can check on a per-page basis (depending on how hash checking can be done, especially when dealing with includes). I think the user experience is slightly better to have the warnings pop up as we go, which gives users the freedom to stop the build/serve whenever they notice such logs in the console.
Indeed 1723 is somewhat a duplicate and adjusting the order is not the fix that I am going for (In my screenshot the link reporting does have the linebreak). I am asking more for user experience wise (should they see the broken link reported as they go, which was the previous experience, or should they see it at the end of site generation).
Indeed my working solution uses the idea of interrupt, will hopefully push up a PR to continue the discussion from there.
At first, I thought the problem is with the progress bar library and even tried another popular one (cli-progress), but it ended up having the same issue as well and no real good solutions for that (basically people have been filing issues about how difficult it is to work with loggers but not much was done to help). My sketch of the solution is to wrap the winston loggers and keep a reference to the progress bar so that the bar can be interrupted whenever there's a call to the logging methods. Again, will push up for discussion when it's done. Thanks @ang-zeyu for looking into this. |
^ just my personal preference, as an author, I would prefer to deal with link validation issues "in one block" of messages, rather than finding them disjointed all over the place. Having the messages come out real time may not be too helpful during |
Hmm that's quite true for 'markbind serve'. I'm more thinking of 'markbind build', in a larger site which the build time can be more than a few seconds, I would imagine it to be useful to know the error immediately so that I can go and fix it before I am ready to deploy this build. Certainly some personal preference is involved... |
Agreed, I think the clincher might be this ⬇️
|
What is the purpose of this pull request?
Resolves the first checkbox of #1418 (Report each broken link only once)
Fixes #1723 by only logging the warnings after the progress bar has completed
Overview of changes:
linkProcessor.validateIntralink
was modified such that it is responsible only for validating a resourcePath and a cwf; handling the node and checking whether or not to validate was moved toSiteLinkManager.collectIntralink
.Anything you'd like to highlight / discuss:
Testing instructions:
Add a broken link to a page, e.g.
Proposed commit message: (wrap lines at 72 characters)
Checklist: ☑️