
Improve parser performance by 50% #79

Open · thecodrr wants to merge 2 commits into master
Conversation

thecodrr commented Apr 5, 2024

This PR significantly improves parse & parseSlice performance by optimizing how ProseMirror does tag matching:

  1. Running querySelectorAll on the main DOM node once, instead of calling matches on each node individually.
  2. Using a simple string comparison for tag-name selectors like li, p, etc. This avoids calling any browser API and is much faster (a sketch of both changes follows below).

These changes shouldn't break anything.
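
To make this concrete, here is a minimal sketch of the two optimizations (the buildMatchers helper and the tag-name test are illustrative assumptions, not the actual patch):

// Hypothetical helper illustrating both ideas; not the code in this PR.
function buildMatchers(root, selectors) {
  const matchers = Object.create(null);
  for (const selector of selectors) {
    if (/^[a-zA-Z][a-zA-Z0-9-]*$/.test(selector)) {
      // 2. Plain tag-name selector: a string comparison, no browser API.
      const tag = selector.toUpperCase();
      matchers[selector] = (node) => node.nodeName === tag;
    } else {
      // 1. Complex selector: query the whole tree once per parse,
      // then answer every subsequent match with a Set lookup.
      const matched = new Set(root.querySelectorAll(selector));
      matchers[selector] = (node) => matched.has(node);
    }
  }
  return matchers;
}

The matcher table would be built once at the start of a parse and consulted in place of a per-node matches call.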

Benchmarks

Before:

[screenshot: benchmark results before the change]

After:

[screenshot: benchmark results after the change]

That's a solid ~40% improvement.

thecodrr changed the title from “Cache node matchers to improve tag matching performance” to “Improve tag matching performance by 50%” on Apr 5, 2024
thecodrr changed the title from “Improve tag matching performance by 50%” to “Improve parser performance by 50%” on Apr 5, 2024

thecodrr commented Apr 5, 2024

I tried running the tests but I keep getting this error:

     RangeError: Can not convert <paragraph("foo")> to a Fragment (looks like multiple versions of prosemirror-model were loaded)      


marijnh commented Apr 5, 2024

What kind of things are you doing where parser performance is a big issue? I'm not sure adding this complexity for a 50% increase is worth it. Adding a simple branch for plain tag matching to matches() might be a reasonable idea (though I'm not sure why browser implementations of DOMElement.matches wouldn't already be optimized that way).
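
Concretely, such a branch might look something like this (a sketch of the suggestion only; the plainTag test and helper name are assumptions, not prosemirror-model code):

// Sketch: fall back to the browser API only for selectors that need it.
const plainTag = /^[a-zA-Z][a-zA-Z0-9-]*$/;

function matchesSelector(node, selector) {
  if (plainTag.test(selector))
    return node.nodeName.toLowerCase() === selector.toLowerCase();
  return node.matches(selector);
}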

You're probably using yarn or a very old npm to install your dependencies if you're getting duplicated modules like that.


thecodrr commented Apr 5, 2024

What kind of things are you doing where parser performance is a big issue?

We are currently using ProseMirror in a note-taking app which stores notes as HTML that the user can later view and edit. For that to work, the parser is obviously involved. It isn't an issue for smaller notes, but for large notes (> 300K words) optimizing the parser brings down the waiting time between the user clicking a note and it appearing on screen ready to edit. Even a small improvement helps.

I'm not sure adding this complexity for a 50% increase is worth it.

I mean, it is a total of 30 lines and 2 methods, and even that could be reduced further by refactoring. A 50% increase is not trivial for the amount of change this PR makes.

Adding a simple branch for plain tag matching to matches() might be a reasonable idea (though I'm not sure why browser implementations of DOMElement.matches wouldn't already be optimized that way).

Unfortunately, matches is on the hot path, so adding a branch to it will only make things slightly better. Browser implementations are optimized, but when you are matching thousands of nodes against CSS selectors in a loop, even an optimized matches call adds up.

The changes I have made follow a simple rule: do the work once per parse instead of once per match. The code can be refactored to look less complex; it really isn't doing a whole lot.

Of course, the decision rests with you. I'd be happy to make any further changes.

You're probably using yarn or a very old npm to install your dependencies if you're getting duplicated modules like that.

No, actually, I am running npm v10.2.4. I just ran npm i and then npm run test.


marijnh commented Apr 8, 2024

Benchmarking these changes by parsing and re-parsing the example document on the dev demo page, I don't see a noticeable speed improvement. Are you using any particularly complex selectors in your schema?


thecodrr commented Apr 8, 2024

I don't see a noticeable speed improvement.

The speed difference is most noticeable on Chromium-based browsers, unfortunately. Here's a really simple snippet that benchmarks the changes made in this PR:

// Build a document with 100,000 <p class="something"> children.
const doc = document.createElement("div");
for (let i = 0; i < 100000; ++i) {
  const element = document.createElement("p");
  element.classList.add("something");
  doc.append(element);
}

// The PR's approach: query the tree once up front, then answer
// each match with a Set lookup.
console.time("creating set");
const set = new Set(doc.querySelectorAll("p.something"));
const matchers = {
  ["p.something"]: (node) => set.has(node),
};
console.timeEnd("creating set");

const results = [];
for (let i = 0; i < 10; ++i) {
  // Baseline: the cost of just iterating over the children.
  const loopStart = performance.now();
  for (const node of doc.children) {
  }
  const loopEnd = performance.now() - loopStart;

  // The Set-lookup matcher.
  const matcherStart = performance.now();
  for (const node of doc.children) {
    matchers["p.something"](node);
  }
  const matcherEnd = performance.now() - matcherStart;

  // The native Element.matches call.
  const matchesStart = performance.now();
  for (const node of doc.children) {
    node.matches("p.something");
  }
  const matchesEnd = performance.now() - matchesStart;

  // Subtract the bare-loop baseline from both measurements.
  results.push({ loop: loopEnd, matcher: matcherEnd - loopEnd, matches: matchesEnd - loopEnd });
}

console.table(results);

I ran these in Google Chrome and Firefox, and got the following results:

Chrome: [screenshot: console.table results]

Firefox: [screenshot: console.table results]


This tries to exclude the time it takes to loop over the elements (not super accurately, though). The crux is that the speed difference is most significant on Chrome, while on Firefox the Set lookup is actually slower. And if you include the cost of creating the set of nodes, this PR isn't looking all that great (at least that's what this isolated benchmark shows). That is very different from what I see when benchmarking ProseMirror with these changes, which makes me wonder whether the performance difference comes from something else.


marijnh commented Apr 8, 2024

I benchmarked both Firefox and Chrome. Neither showed a significant difference. I'm not interested in micro-benchmarks that just run one leaf function—there'd have to be a noticeable improvement in DOMParser.parse's time for this to be interesting.


thecodrr commented Apr 8, 2024

How many nodes are in the document you benchmarked on? I'll run the benchmarks again. It's possible that I am doing something wrong or different, because I am seeing a 40% improvement in DOMParser.parse.


marijnh commented Apr 9, 2024

I was parsing a 3000-node document in a loop for a few seconds, counting the number of parses per second.
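
That kind of harness might look something like this (a sketch; schema and dom are assumed to hold the test schema and the test document's DOM):

import {DOMParser} from "prosemirror-model";

// Build the parser once, outside the timed loop.
const parser = DOMParser.fromSchema(schema);

// Parse repeatedly for ~5 seconds, then report parses per second.
const start = performance.now();
let parses = 0;
while (performance.now() - start < 5000) {
  parser.parse(dom);
  parses++;
}
const elapsed = (performance.now() - start) / 1000;
console.log((parses / elapsed).toFixed(1) + " parses/sec");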
