Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve encode performance #52

Closed
wants to merge 4 commits into from

Conversation

tmehnert
Copy link
Contributor

Hi. I have looked at issue #1 and find out that the current htmlEncode can be improved in performance. First i have tried to include html-entities, which will work but i was not brave enogh to include it, because it bundles a lot of stuff into the views bundle. Also htmlEncode is a public API of the library and i don't wanted to change it's produced encoding, because it might break user tests. So i ended up by creating a "own" implementation again, to speed things up.

Micro-Benchmarks

The speed improvement highly depends on the input and the underlying javascript engine. I have prepared some micro-benchmarks for you, that shows the new version should be faster. Just click to run it, these benchmarks will only run the htmlEncode function with some input.

Complate-cpp Benchmarks

I have also run complate-cpp rendering benchmarks with the current and the optimized implementation. I'm using a Intel i7-12700KF which runs an Ubuntu 20.04 on WSL2. All results are mean values of all iterations, i have taken best of five runs each, rounded to one decimal.

Test Current Optimized
QuickJS, no todos 92.7 us 60.0 us
QuickJS, 10 todos 1.2 ms 771.1 us
QuickJS, 100 todos 11.5 ms 7.5 ms
V8, no todos 10.7 us 9.1 us
V8, 10 todos 149.4 us 123.0 us
V8, 100 todos 1.44 ms 1.26 ms

To run the complate-cpp benchmarks by yourself on Linux, assuming you have the dependencies installed on your system.

mkdir build
cd build
cmake -DBUILD_TESTS=on ..
make
test/lib/complate-tests QuickJsRendererBenchmark
test/lib/complate-tests V8RendererBenchmark

Copy link
Contributor

@FND FND left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! I appreciate your thoughtful approach and agree that a third-party library would be undesirable.

src/html.js Outdated Show resolved Hide resolved
src/html.js Show resolved Hide resolved
@tmehnert tmehnert marked this pull request as draft November 30, 2021 22:33
Last version had some penalties, on the non-fastpath.
Also this implementation tends to be the stable in performance
across different javascript engines.
@tmehnert
Copy link
Contributor Author

tmehnert commented Dec 2, 2021

I have done a new version based on the algorithm of html-entities. I would really like to present you a easy oneliner, but it seems to that the loop based version deliver more stable performance across various engines.

Complate-Cpp Benchmarks

I ran the Benchmarks again with our new algorithms. Loop-Based is the one i have commited.

Test Without Fastpath Loop-Based
QuickJS, no todos 66.4 us 61.3 us
QuickJS, 10 todos 828.5 us 773.8.1 us
QuickJS, 100 todos 8.0 ms 7.5 ms
V8, no todos 9.6 us 9.0 us
V8, 10 todos 138.1 us 125.2 us
V8, 100 todos 1.37 ms 1.21 ms

I have also updated the online benchmark.

@tmehnert tmehnert marked this pull request as ready for review December 2, 2021 19:44
@tmehnert
Copy link
Contributor Author

tmehnert commented Dec 2, 2021

@FND what is your opinion on that?

Copy link
Contributor

@FND FND left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty solid!

I don't currently have sufficient brains left to fully grok all the details, but hoping to return to that soon. Relying on - and validating - an existing implementation's experience certainly seems like a good choice - especially in the face of an ever-changing engine landscape. Thanks again for your efforts and for sticking with it.

I'm happy to run further performance tests (→ Array#join) myself and also tweak some minor stylistic preferences (→ bouncer pattern, avoiding const because I like to buck the trend, that sorta thing) as soon as I get to it.

src/html.js Outdated
res += str.substring(lastIndex);
}
} else {
res = str;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I prefer the bouncer pattern to avoid excessive nesting, reduce action at a distance etc.

if(!match) {
    return str;
}

let rx = attribute ? /[&<>'"]/g : /[&<>]/g;

return res;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. yeah it would better match the nature of the function. If it would be able to speak it would say:

I have seen so many strings in my life, i expect there are no evill in it and then it's no more todo. Just take it.
But when i found something, then i have to do a bit more to protect you.
😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty solid!

Btw. I have to thank. You have created an amazing project (i really love JSX) and spend your time to let me part of it ❤️

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad you find it useful and are willing to contribute.

I should mention that I'm not currently using complate myself and thus might not always be able to provide a timely response. So just to be safe, you might wanna send a quick note (e.g. by creating an issue) before starting any larger efforts. Having said that, there are other people around here who are perfectly capable of providing feedback.

src/html.js Outdated
let res;
let match = rx.exec(str);
if(match) {
res = "";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It used to be the case that string concatenation was expensive, so people collected strings in an array which was eventually #joind. We might eke out a little more performance that way?

Copy link
Contributor Author

@tmehnert tmehnert Dec 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen so many unexpected results, that it would be always worth a try.
According to The Performance of JavaScript String Concat form 2017, i expect that we are not gaining performance as long as we are not throwing a whole html document in one single #encodeHtml call.

But it would be explainable that #join will outperform at some input, when we make three assumptions about an engine:

  • Array is fast.
  • Strings are stored in an immutable piece of memory.
  • Substrings are stored as a non owning reference to the original string.

Then just put the strings together will result in creating a new string, copy old content and deal with the old string which have to be deleted (or marked for it) on each concatination. Where on the #join strategy the result string will be created only once at the end, where the length of the underlying piece of memory is known.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... when we take the medium post, then might the #generateAttributes will profit from using string concatination instead of #join, becuase it's a small amount of short strings. I make a try on that idea.

Copy link
Contributor Author

@tmehnert tmehnert Dec 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... when we take the medium post, then might the #generateAttributes will profit from using string concatination instead of #join, becuase it's a small amount of short strings. I make a try on that idea.

Bad idea from me to enlarge the context. We should stick on our work at #encodeHtml first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like string concatenation is indeed faster these days: https://jsben.ch/VBdEz - I'm not entirely sure how representative that sample really is, but it seems to be a negligible micro-optimization either way.

naming, bouncer pattern, `let` consistency
@FND
Copy link
Contributor

FND commented Dec 7, 2021

As noted earlier, I've tweaked a few details in a5830ea (see commit message); if you're fine with that, I think this is ready to go.

@tmehnert
Copy link
Contributor Author

tmehnert commented Dec 7, 2021

Your change looks good to me, the names are much more better. I ran all my tests and all have passed. The bechmark shows a little drop on QuickJs (0.4ms @ 100 todos), not noticeable on other engines. But with what we achieved, this seems to me irrelevant.

I agree, let's take this nice profit. 👍

@FND
Copy link
Contributor

FND commented Dec 8, 2021

✅ Squashed as fb7fd6b and released as v0.16.10.

I'm surprised there's a difference to the previous version, but that's just me idly wondering whether that's just chance - but yeah, negligible either way.

Thank you.

@FND FND closed this Dec 8, 2021
@tmehnert tmehnert deleted the improve-encode-performance branch December 8, 2021 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants