Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Efficient validation for intra-link with hash #2465

Merged
merged 30 commits into from Mar 29, 2024

Conversation

yiwen101
Copy link
Contributor

@yiwen101 yiwen101 commented Mar 16, 2024

What is the purpose of this pull request?

  • Documentation update
  • Bug fix
  • Feature addition or enhancement
  • Code maintenance
  • DevOps
  • Improve developer experience
  • Others, please explain:

working on #1418

Overview of changes:
Store all the elements with ids (accessible) in the siteLinkManager
Validate intra-link with hash at the link processor;
Fixes 15 detected invalid intra-link with hash in the current documentation.


This is a draft PR; has encountered issue and stuck when working on this issue, so I post my work so far to seek help.

Current implementation:
1 for nodes with node.attribs.ids, simply add to the collection in the siteLinkManager

2 for header tags, add to the collection in the siteLinkManager after they have been granted ids
3 for include nodes, after they have been processed, recursively add their and their children ids to the
collection in the siteLinkManager; if they/their children are header tags, grant them ids with the same util method as in 2
Screenshot 2024-03-16 at 10 36 58

Current issue:
1 some header added in step 3 seems to be off:
Screenshot 2024-03-16 at 10 22 30
Screenshot 2024-03-16 at 10 39 07

2 there are still some hashes missing, not collected:
Screenshot 2024-03-16 at 10 29 50

Anything you'd like to highlight/discuss:

Testing instructions:

Proposed commit message: (wrap lines at 72 characters)
Implement efficient validation for hash intra-link


Checklist: ☑️

  • Updated the documentation for feature additions and enhancements
  • Added tests for bug fixes or features
  • Linked all related issues
  • No unrelated changes

Reviewer checklist:

Indicate the SEMVER impact of the PR:

  • Major (when you make incompatible API changes)
  • Minor (when you add functionality in a backward compatible manner)
  • Patch (when you make backward compatible bug fixes)

At the end of the review, please label the PR with the appropriate label: r.Major, r.Minor, r.Patch.

Breaking change release note preparation (if applicable):

  • To be included in the release note for any feature that is made obsolete/breaking

Give a brief explanation note about:

  • what was the old feature that was made obsolete
  • any replacement feature (if any), and
  • how the author should modify his website to migrate from the old feature to the replacement feature (if possible).

@yiwen101 yiwen101 marked this pull request as draft March 16, 2024 02:17
@yiwen101
Copy link
Contributor Author

It turns out that my code is good. All the 15 mismatches are actually legitimate broken links :(

I will fix all the broken links with hash in this PR as well. For ease of verification of reviewer, I will also post pictures of the positions of these broken hash links here:

Screenshot 2024-03-19 at 15 40 52 Screenshot 2024-03-19 at 15 44 06 Screenshot 2024-03-19 at 15 37 08

Screenshot 2024-03-19 at 15 35 39

arkBind/markbind/assets/121547057/c14aeb4a-e60e-42ed-98f5-92b96d25ebab"> Screenshot 2024-03-19 at 15 29 48 Screenshot 2024-03-19 at 15 28 08 Screenshot 2024-03-19 at 15 25 46 Screenshot 2024-03-19 at 15 22 36 Screenshot 2024-03-19 at 15 13 51

@yiwen101 yiwen101 marked this pull request as ready for review March 19, 2024 17:55
Copy link

codecov bot commented Mar 19, 2024

Codecov Report

Attention: Patch coverage is 63.63636% with 20 lines in your changes are missing coverage. Please review.

Project coverage is 51.11%. Comparing base (9da549c) to head (49236f8).

Files Patch % Lines
packages/core/src/html/SiteLinkManager.ts 54.16% 10 Missing and 1 partial ⚠️
packages/core/test/unit/utils/utils.ts 60.00% 4 Missing ⚠️
packages/core/src/html/linkProcessor.ts 76.92% 3 Missing ⚠️
packages/core/src/html/headerProcessor.ts 60.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2465      +/-   ##
==========================================
+ Coverage   50.98%   51.11%   +0.12%     
==========================================
  Files         124      124              
  Lines        5305     5355      +50     
  Branches     1137     1152      +15     
==========================================
+ Hits         2705     2737      +32     
- Misses       2311     2328      +17     
- Partials      289      290       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@yiwen101 yiwen101 changed the title Validate hash (intermediate work) Validate intra-link with hash Mar 19, 2024
@yiwen101 yiwen101 changed the title Validate intra-link with hash Efficient validation for intra-link with hash Mar 19, 2024
@tlylt
Copy link
Contributor

tlylt commented Mar 19, 2024

Efficient

Could you test with the cs2103T website to check the before & after timing of running markbind-serve/build over a large website, to validate whether this solution is in practice efficient?

@yiwen101
Copy link
Contributor Author

yiwen101 commented Mar 20, 2024

Efficient

Could you test with the cs2103T website to check the before & after timing of running markbind-serve/build over a large website, to validate whether this solution is in practice efficient?

Thanks for the comment, the following if the result on my end:
valid-hash branch, intrasiteLinkValidation {enabled: true}
Screenshot 2024-03-20 at 10 49 51

valid-hash branch, intrasiteLinkValidation {enabled: false}
Screenshot 2024-03-20 at 10 52 29

master-branch, intrasiteLinkValidation {enabled: false}
Screenshot 2024-03-20 at 10 55 10

The additional cost should be negligible, and literally invisible (as master branch in theory should run faster when all other factor hold constant, but it turns out run the slowest, suggesting that even the fluctuation in runtime caused by other factors has significantly higher impact than the runtime contributed by the changes)

yiwen101 and others added 3 commits March 20, 2024 11:07
Copy link
Contributor

@EltonGohJH EltonGohJH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yiwen101
Thank you for implementing this!

The implementation looks good to me!

I tested it on my end and it seems to be working well!
I did not realise any noticable slow down at all!

One thing I realised is that we can add the function documentation (and also improve the old ones if needed) what do you think? For instance functions like setHeadingId. I think we should maintain a good amount of function documentation!

packages/core/src/html/SiteLinkManager.ts Outdated Show resolved Hide resolved
@tlylt
Copy link
Contributor

tlylt commented Mar 20, 2024

The additional cost should be negligible, and literally invisible (as master branch in theory should run faster when all other factor hold constant, but it turns out run the slowest, suggesting that even the fluctuation in runtime caused by other factors has significantly higher impact than the runtime contributed by the changes)

👍

For your test run on 2103T website, how many valid intra-link hash errors were detected? Would be useful info for @damithc for follow-up broken link fixes.

Copy link
Contributor

@kaixin-hc kaixin-hc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your hard work! I didn't spot anything else while reading through the code. I agree with Elton that including a bit of comment documentation would be good.

One thing I wonder is if it would be possible and desired to add something that tests this to the test site.

Copy link
Contributor

@yucheng11122017 yucheng11122017 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @yiwen101 thanks for the work! I think this is a bit tricky so please try to explain the code a little. I added some questions and also some nits.

packages/core/src/html/headerProcessor.ts Outdated Show resolved Hide resolved
packages/core/src/html/headerProcessor.ts Show resolved Hide resolved
packages/core/src/html/headerProcessor.ts Show resolved Hide resolved
packages/core/src/html/SiteLinkManager.ts Outdated Show resolved Hide resolved
packages/core/src/html/SiteLinkManager.ts Show resolved Hide resolved
packages/core/src/html/SiteLinkManager.ts Outdated Show resolved Hide resolved
yiwen101 and others added 2 commits March 24, 2024 23:24
Co-authored-by: Chan Yu Cheng <77204346+yucheng11122017@users.noreply.github.com>
@yiwen101
Copy link
Contributor Author

Sorry for getting back to the review late; was overwhelmed by a hackathon due Friday this week

@kaixin-hc
Thank you for the careful review and suggestions. Could you help elaborate further on how could I add this to test site?
Current implementation of the functional test seems only adequate of recursively comparing the files in the expected folder and the actual files during the build process. I do not know how to "expect" error logs. The PR that brings in the intra-link validation feature also did not add functional test.

@EltonGohJH @yucheng11122017
Thank you for the careful review; I have made the requested changes and added method document.

The only exception is the "print" method in SitelinkManager. Although it is most for test purpose, I believe that it is a necessary evils and the best resort among all choices, so should leave it as it is.

Copy link
Contributor

@yucheng11122017 yucheng11122017 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @yiwen101 thanks for the work and sorry for the late review! I think the implementation and so on is good: just some comments regarding the naming.

* Add sections that could be reached by intra-link with hash to this node to FilePathToHashesMap,
* The reachable sections include nodes with ids and headings.
*
* ForceWrite should only be called when processing heading node with the maintainHashesForInclude method.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* ForceWrite should only be called when processing heading node with the maintainHashesForInclude method.
* forceWrite should only be used when processing heading node.

Could you explain why?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think also the name forceWrite is not accurate. This also seems like a boolean instead of an override

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I have tried improving the code quality in a later commit; this parameter no longer exists.

packages/core/src/html/SiteLinkManager.ts Outdated Show resolved Hide resolved
packages/core/src/html/SiteLinkManager.ts Outdated Show resolved Hide resolved
packages/core/src/html/SiteLinkManager.ts Outdated Show resolved Hide resolved
packages/core/src/html/headerProcessor.ts Outdated Show resolved Hide resolved
packages/core/src/html/headerProcessor.ts Outdated Show resolved Hide resolved
yiwen101 and others added 7 commits March 28, 2024 20:33
Co-authored-by: Chan Yu Cheng <77204346+yucheng11122017@users.noreply.github.com>
Co-authored-by: Chan Yu Cheng <77204346+yucheng11122017@users.noreply.github.com>
Co-authored-by: Chan Yu Cheng <77204346+yucheng11122017@users.noreply.github.com>
@yiwen101
Copy link
Contributor Author

@yucheng11122017
Thank you for your careful review and various suggestions on improving code quality.

In a later commit, I made following changes to existing methods to improve code quality. They are:
1 rename "processAndReturnHeadingId" back to "setHeadingId", and modify its behaviour accordingly.
2 remove the "forceWrite" from maintainFilePathToHashesMap
3 modify the maintainHashesForInclude accordingly.

I hope that the code becomes clearer after the changes.

Copy link
Contributor

@EltonGohJH EltonGohJH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Other than the naming of the class for testing

packages/core/test/unit/utils/utils.ts Outdated Show resolved Hide resolved
@kaixin-hc kaixin-hc merged commit 66591c5 into MarkBind:master Mar 29, 2024
9 checks passed
@kaixin-hc kaixin-hc added this to the v5.4.1 milestone Apr 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants