
perf: cache the creation of the algorithm #47

Closed
wants to merge 1 commit

Conversation


@H4ad H4ad commented Apr 8, 2023

By submitting a PR to this repository, you agree to the terms within the Auth0 Code of Conduct. Please see the contributing guidelines for how to create and submit a high-quality PR for this repo.

Description

This library is used by node-jws: every time someone wants to validate a token, node-jws calls jwa to create the verify function and then discards the object.

Code usage reference

Because of this, I read the code of this library and found that the object with the sign and verify functions is created on every call, and a regex also runs each time to extract the algorithm type and the bit length.
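To illustrate the pattern described above, here is a simplified sketch of the pre-patch hot path. The names (`createSigner`, `createVerifier`) and the exact regex are illustrative stand-ins, not the library's actual internals; the point is that every call re-runs the regex and allocates a fresh object with two fresh closures:

```javascript
// Illustrative sketch only: stand-ins for the real signer/verifier factories.
function createSigner(type, bits) {
  return (input, key) => `signed:${type}${bits}`;
}
function createVerifier(type, bits) {
  return (input, sig, key) => true;
}

// Matches algorithm strings like "RS256", "HS512", or "none" (assumed shape).
const ALGO_REGEX = /^(RS|PS|ES|HS)(256|384|512)$|^(none)$/;

function jwa(algorithm) {
  // The regex runs on every single call...
  const match = String(algorithm).match(ALGO_REGEX);
  if (!match) throw new TypeError(algorithm + ' is not a valid algorithm');
  const type = (match[1] || match[3]).toLowerCase();
  const bits = match[2];
  // ...and a brand-new object (plus two closures) is allocated each time,
  // only to be discarded by the caller after one use.
  return { sign: createSigner(type, bits), verify: createVerifier(type, bits) };
}
```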

The current performance is:

jwa(RS256) x 5,825,423 ops/sec ±1.12% (92 runs sampled)
jwa(RS384) x 6,216,852 ops/sec ±0.47% (94 runs sampled)
jwa(RS512) x 6,046,150 ops/sec ±1.25% (89 runs sampled)
jwa(PS256) x 4,306,111 ops/sec ±1.14% (93 runs sampled)
jwa(PS384) x 4,260,252 ops/sec ±1.14% (92 runs sampled)
jwa(PS512) x 3,976,296 ops/sec ±4.58% (87 runs sampled)
jwa(HS256) x 4,295,952 ops/sec ±0.87% (93 runs sampled)
jwa(HS384) x 4,225,687 ops/sec ±1.10% (89 runs sampled)
jwa(HS512) x 4,314,741 ops/sec ±1.32% (91 runs sampled)
jwa(ES256) x 4,166,067 ops/sec ±1.03% (89 runs sampled)
jwa(ES384) x 4,157,053 ops/sec ±1.17% (91 runs sampled)
jwa(ES512) x 4,167,795 ops/sec ±0.91% (90 runs sampled)

So, instead of creating it every time, I cache the objects, freezing them with Object.freeze to prevent modification, and store them in an object whose keys are the algorithm names and whose values are the cached objects. Now the performance is:

jwa(RS256) x 1,044,750,439 ops/sec ±1.77% (90 runs sampled)
jwa(RS384) x 46,073,595 ops/sec ±2.94% (87 runs sampled)
jwa(RS512) x 48,740,542 ops/sec ±2.92% (87 runs sampled)
jwa(PS256) x 50,445,379 ops/sec ±2.11% (85 runs sampled)
jwa(PS384) x 50,930,005 ops/sec ±5.51% (85 runs sampled)
jwa(PS512) x 55,984,858 ops/sec ±1.34% (93 runs sampled)
jwa(HS256) x 59,485,338 ops/sec ±0.88% (93 runs sampled)
jwa(HS384) x 61,521,893 ops/sec ±0.90% (88 runs sampled)
jwa(HS512) x 62,314,092 ops/sec ±2.71% (86 runs sampled)
jwa(ES256) x 42,380,646 ops/sec ±3.55% (61 runs sampled)
jwa(ES384) x 40,491,232 ops/sec ±1.25% (90 runs sampled)
jwa(ES512) x 42,010,686 ops/sec ±1.52% (91 runs sampled)

This is an increase of almost 10x in performance for all cases; it also reduces garbage collection to zero by reusing the objects instead of allocating new ones.
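The caching approach described above can be sketched as follows (an assumed shape, not the PR's exact code): build each `{ sign, verify }` object once, freeze it, and serve every later call from a plain lookup object keyed by algorithm name:

```javascript
// Cache with a null prototype so keys like "constructor" can't collide.
const cache = Object.create(null);

// Stand-in for the original per-call construction logic (illustrative only).
function createAlgorithm(algorithm) {
  return { sign: (input, key) => `${algorithm}-sig`, verify: () => true };
}

function jwa(algorithm) {
  // First call for a given algorithm builds and freezes the object...
  if (!cache[algorithm]) {
    cache[algorithm] = Object.freeze(createAlgorithm(algorithm));
  }
  // ...every later call returns the same frozen instance: no regex,
  // no allocation, nothing for the garbage collector to reclaim.
  return cache[algorithm];
}
```

Freezing matters here because the cached objects are shared across all callers: without Object.freeze, one caller mutating `jwa('HS256').sign` would silently affect every other caller.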

More about memory usage

I'm using isitfast; ignore the op/s column, which is not currently stable.

Before:

jwa(RS512) 1,845,018 op/s (542 ns) ±1% x2,500 | 248 kB ±2% x25
jwa(RS384) 6,211,180 op/s (161 ns) ±1% x2,500 | 232 kB ±2% x25
jwa(RS256) 12,345,679 op/s (81 ns) ±1% x2,500 | 232 kB ±2% x25
jwa(PS512) 1,173,709 op/s (852 ns) ±1% x2,500 | 360 kB ±1% x25
jwa(PS384) 4,524,887 op/s (221 ns) ±1% x2,500 | 360 kB ±1% x25
jwa(PS256) 2,314,815 op/s (432 ns) ±1% x2,500 | 360 kB ±1% x25
jwa(HS512) 4,761,905 op/s (210 ns) ±1% x2,500 | 360 kB ±1% x25
jwa(HS384) 2,169,197 op/s (461 ns) ±1% x2,500 | 360 kB ±1% x25
jwa(HS256) 6,211,180 op/s (161 ns) ±1% x2,500 | 320 kB ±2% x25
jwa(ES512) 5,000,000 op/s (200 ns) ±1% x2,500 | 544 kB ±0.9% x25
jwa(ES384) 4,975,124 op/s (201 ns) ±1% x2,500 | 544 kB ±0.9% x25
jwa(ES256) 6,211,180 op/s (161 ns) ±1% x2,500 | 544 kB ±0.9% x25

Now:

jwa(RS512) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(RS384) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(RS256) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(PS512) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(PS384) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(PS256) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(HS512) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(HS384) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(HS256) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(ES512) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(ES384) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(ES256) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25

The 1B op/s result for RS256 is probably caused by a V8 optimization that later bails out after discovering the function can receive values other than RS256.

Testing

I didn't change the behavior; I only introduced a cache and froze the objects.

Object.freeze was introduced in Node.js v0.10.0, so I don't expect any compatibility issues.
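For context on what the freeze guarantees: in strict mode, assigning to a property of a frozen object throws a TypeError (in sloppy mode the write is silently ignored), so callers cannot tamper with the shared cached objects. A minimal demonstration:

```javascript
'use strict';

// A frozen object standing in for a cached { sign, verify } pair.
const algo = Object.freeze({ sign: () => 'sig', verify: () => true });

let threw = false;
try {
  algo.sign = () => 'tampered'; // throws TypeError in strict mode
} catch (e) {
  threw = e instanceof TypeError;
}
// The original function is untouched regardless of the mutation attempt.
console.log(threw, algo.sign());
```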

  • This change adds test coverage for new/changed/fixed functionality

Checklist

  • I have added documentation for new/changed functionality in this PR or in auth0.com/docs
  • All active GitHub checks for tests, formatting, and security are passing
  • The correct base branch is being used, if not the default branch

@H4ad H4ad closed this Jul 20, 2024