Skip to content
This repository was archived by the owner on Dec 8, 2017. It is now read-only.

XPath rewrite#20

Open
divergentdave wants to merge 9 commits into
18F:masterfrom
divergentdave:xpath-rewrite
Open

XPath rewrite#20
divergentdave wants to merge 9 commits into
18F:masterfrom
divergentdave:xpath-rewrite

Conversation

@divergentdave
Copy link
Copy Markdown

I noticed this library works by modifying document.innerHTML. This approach can have a lot of downsides, especially when used on sites with dynamic content.

  • Any references to DOM objects held by other code won't be valid after replacing document.innerHTML
  • If citations appear in a tag's attributes, the replacement links will break the document
  • If there are script tags, they might get run twice when the DOM is rebuilt from setting innerHTML (don't quote me on this)
  • Performance hit from operating on markup and content in one big string
  • Further performance hit due to modifying innerHTML inside of a loop

This PR is a rewrite of the script, so that it modifies the document by only touching the text nodes that contain citations. First, using XPath, it takes a snapshot of all text nodes that aren't direct children of <script>, <style>, or <a> tags. Each text node is scanned individually for citations, and there are early outs for all-whitespace text nodes or for text nodes with no citations. Then, using the matches found by the citation library, the original text is sliced up, new text nodes and link elements are inserted in its place, and the old text node is removed. As before, this gets run on DOMContentLoaded in browser contexts, and the functions are exported for use elsewhere in non-browser contexts. I also upgraded the version of the citation library, and used the built in links feature to get GPO links for each citation.

I did a quick, unscientific speed comparison of the code before and after on a CRS report with plenty of citations, and I measured the runtime for the relevant JS method to be ~6s before and ~70ms after.

Let me know if you have any questions. Of course, this is a breaking change since the function now operates in-place on a DOM, rather than returning an HTML string. I have also changed the main function name, along with its signature.

jsdom doesn't play well with browserify/uglifyjs
Note that some links went away because GPO FDsys doesn't have documents
beyond certain date ranges. These limitations are reflected in the link
handling logic of the citation library, but were not reflected
previously in this repository.
@divergentdave
Copy link
Copy Markdown
Author

Speaking of breaking, I have pushed more commits to fix test breakage, etc.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant