Efficient validation for intra-link with hash #2465
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #2465 +/- ##
==========================================
+ Coverage 50.98% 51.11% +0.12%
==========================================
Files 124 124
Lines 5305 5355 +50
Branches 1137 1152 +15
==========================================
+ Hits 2705 2737 +32
- Misses 2311 2328 +17
- Partials 289 290 +1
Could you test with the cs2103T website to check the before & after timing of running
Co-authored-by: Liu Yongliang <41845017+tlylt@users.noreply.github.com>
…into validate_hash
@yiwen101
Thank you for implementing this!
The implementation looks good to me!
I tested it on my end and it seems to be working well!
I did not notice any slowdown at all!
One thing I realised is that we could add function documentation (and improve the existing documentation where needed). What do you think? For instance, functions like setHeadingId. I think we should maintain a good amount of function documentation!
👍 For your test run on the 2103T website, how many valid intra-link hash errors were detected? This would be useful info for @damithc for follow-up broken link fixes.
Thanks for your hard work! I didn't spot anything else while reading through the code. I agree with Elton that including a bit of comment documentation would be good.
One thing I wonder is if it would be possible and desired to add something that tests this to the test site.
Hi @yiwen101 thanks for the work! I think this is a bit tricky so please try to explain the code a little. I added some questions and also some nits.
Co-authored-by: Chan Yu Cheng <77204346+yucheng11122017@users.noreply.github.com>
Sorry for getting back to the review late; I was overwhelmed by a hackathon due Friday this week. @kaixin-hc @EltonGohJH @yucheng11122017 The only exception is the "print" method in SitelinkManager. Although it is mostly for testing purposes, I believe it is a necessary evil and the best option among all the choices, so it should be left as it is.
Hi @yiwen101 thanks for the work and sorry for the late review! I think the implementation and so on is good: just some comments regarding the naming.
* Add sections that could be reached by intra-link with hash to this node to FilePathToHashesMap,
* The reachable sections include nodes with ids and headings.
*
* ForceWrite should only be called when processing heading node with the maintainHashesForInclude method.
* ForceWrite should only be called when processing heading node with the maintainHashesForInclude method.
* forceWrite should only be used when processing heading node.
Could you explain why?
I also think the name forceWrite is not accurate. It also reads like a boolean rather than an override.
Hi, I have tried improving the code quality in a later commit; this parameter no longer exists.
@yucheng11122017 In a later commit, I made the following changes to existing methods to improve code quality. They are: I hope that the code becomes clearer after the changes.
LGTM. Other than the naming of the class for testing
What is the purpose of this pull request?
working on #1418
Overview of changes:
Store all elements with ids (and therefore reachable by a hash) in the siteLinkManager;
Validate intra-links with hashes in the link processor;
Fix the 15 invalid intra-links with hashes detected in the current documentation.
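To illustrate the idea in the overview, here is a minimal sketch of validating a hash link with a constant-time lookup against ids collected earlier. The names `HashRegistry`, `registerHash`, `splitLink`, and `isValidIntraLink` are hypothetical and are not taken from this PR; the sketch also assumes link paths and registry keys use the same form, which the real link processor may need to normalise.

```typescript
// Hypothetical registry: maps a page path to the set of hashes (element ids) it exposes.
type HashRegistry = Map<string, Set<string>>;

// Record that `hash` can be reached on the page at `filePath`.
function registerHash(registry: HashRegistry, filePath: string, hash: string): void {
  const hashes = registry.get(filePath) ?? new Set<string>();
  hashes.add(hash);
  registry.set(filePath, hashes);
}

// Split "/guide/index.html#setup" into its path and hash parts.
function splitLink(link: string): { path: string; hash: string | null } {
  const idx = link.indexOf('#');
  if (idx === -1) return { path: link, hash: null };
  return { path: link.slice(0, idx), hash: link.slice(idx + 1) };
}

// A link without a hash is not checked here; a link with a hash is valid
// only if the target page registered that hash.
function isValidIntraLink(registry: HashRegistry, link: string, currentPath: string): boolean {
  const { path, hash } = splitLink(link);
  if (hash === null || hash === '') return true;
  // Assumption: a bare "#hash" link points at the current page.
  const target = path === '' ? currentPath : path;
  return registry.get(target)?.has(hash) ?? false;
}
```

Because the ids are gathered once while pages are processed, each link check is a map lookup followed by a set lookup, rather than a re-parse of the target page.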
This is a draft PR. I ran into problems and got stuck while working on this issue, so I am posting my work so far to seek help.
Current implementation:
1. For nodes with node.attribs.ids, simply add them to the collection in the siteLinkManager.
2. For header tags, add them to the collection in the siteLinkManager after they have been granted ids.
3. For include nodes, after they have been processed, recursively add their ids and their children's ids to the collection in the siteLinkManager; if they or their children are header tags, grant them ids with the same util method as in 2.
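The three steps above can be sketched roughly as a single recursive walk. This is an illustration only: `SketchNode`, `slugify`, and `collectIds` are hypothetical stand-ins, the node shape is simplified, and the `slugify` here is a naive placeholder for whatever util method the project actually uses to grant heading ids.

```typescript
// Simplified, hypothetical node shape; the real parser's nodes carry more fields.
interface SketchNode {
  name: string;
  attribs?: { id?: string };
  text?: string;
  children?: SketchNode[];
}

const HEADING = /^h[1-6]$/;

// Naive stand-in for the project's heading-id utility (an assumption).
function slugify(text: string): string {
  return text.trim().toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');
}

// Walk a processed subtree (for example the result of an include), granting ids
// to headings that lack one and recording every id that a hash could target.
function collectIds(node: SketchNode, collected: Set<string>): void {
  if (HEADING.test(node.name) && !node.attribs?.id) {
    node.attribs = { ...node.attribs, id: slugify(node.text ?? '') };
  }
  const id = node.attribs?.id;
  if (id) collected.add(id);
  for (const child of node.children ?? []) {
    collectIds(child, collected);
  }
}
```

One thing this sketch does not handle is duplicate heading text: a real heading-id utility typically disambiguates repeated slugs, so the ids granted during this walk must come from the same utility instance as the rest of the page for the collected hashes to match the rendered ones.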
Current issue:
1. Some headers added in step 3 seem to be off:
2. Some hashes are still missing and have not been collected:
Anything you'd like to highlight/discuss:
Testing instructions:
Proposed commit message: (wrap lines at 72 characters)
Implement efficient validation for hash intra-link
Checklist: ☑️
Reviewer checklist:
Indicate the SEMVER impact of the PR:
At the end of the review, please label the PR with the appropriate label: r.Major, r.Minor, r.Patch.
Breaking change release note preparation (if applicable):