[POC] [runtime performance] bottom up lazy issues/paths construction #3487

jussisaurio · 2024-05-11T15:41:23Z

I made a little POC targeting Zod's runtime performance with a similar approach that has been done in other parse-dont-validate style libraries where issue paths are only allocated when they are needed (i.e. when there actually are issues in parsing). This means paths don't need to be allocated in any successful parse cases, which is the common/hot path generally.

I'm not sure if this fits at all with your plans for the v4 version, but I did this mostly as an exercise, and if a polished version of it ends up being something that's feasible from your POV, then great!

Main changes:

_parse() now returns either OK(validResult) or INVALID(issues: IssueData[]); i.e. issues are not added to ctx but are returned along with the unsuccessful status value. Parent schemas will append the child issues to their own issues array recursively
Issue paths are lazily allocated. Parent parse functions receiving errors from children will add path to the child issues, i.e. if an error happens in key foo of an object, the parent will add path: ["foo"] to the child error. This happens recursively, so the parent's parent will then unshift e.g. path: ["bar", "foo"] later. As mentioned, this happens on demand so paths are only constructed when the parser encounters an error.
no ParseContext objects are allocated during parsing inside _parse() since they are not necessary
ParseStatus and the concept of dirty are removed; dirtiness is handled separately in ZodEffects to decide whether to continue with refinements. LMK if I've misunderstood and dirty is needed elsewhere
input is added to IssueData to continue supporting error maps operating on the parse input
Although ctx.addIssue() is not used in the internals, refinements continue to support the ctx.addIssue() API, which just pushes to the issues array under the hood

Some caveats / edge cases / breaking things:

The code is terrible spaghetti in some spots (especially when it comes to async parsing, refinements and handling some edge cases like prototype pollution etc etc) because I mainly wanted to get a POC done where the test set passes and I could concretely demonstrate the performance difference
I removed support for, and the test for "use path in refinement context", since paths are now on-demand constructed, and so, not available to the refinement context. This would be a breaking change ofc, but I couldn't figure out how to make that backwards compatible
The expected return values of a few tests (that assert on which issues are returned) changed; not sure yet if I actually broke something crucial

Performance:

Here's a performance comparison using the typescript-runtime-type-benchmarks library:

colinhacks/v4:

Running "parseSafe" suite...
Progress: 100%

  zod:
    718 254 ops/s, ±0.68%

Finished 1 case!
Running "parseStrict" suite...
Progress: 100%

  zod:
    661 232 ops/s, ±0.38%

Finished 1 case!
Running "assertLoose" suite...
Progress: 100%

  zod:
    653 851 ops/s, ±0.65%

Finished 1 case!
Running "assertStrict" suite...
Progress: 100%

  zod:
    664 157 ops/s, ±0.47%

jussisaurio/v4-lazy-issues:

Running "parseSafe" suite...
Progress: 100%

  zod:
    1 042 488 ops/s, ±7.67%   // <-- 45% faster

Finished 1 case!
Running "parseStrict" suite...
Progress: 100%

  zod:
    975 107 ops/s, ±0.13%   // <-- 46% faster

Finished 1 case!
Running "assertLoose" suite...
Progress: 100%

  zod:
    913 760 ops/s, ±3.97%   // <-- 40% faster

Finished 1 case!
Running "assertStrict" suite...
Progress: 100%

  zod:
    971 467 ops/s, ±0.24%   // <-- 46% faster

colinhacks · 2024-05-14T00:54:15Z

This is really great stuff. I actually have this exact kind of change implemented in a perf branch, and it is indeed a big part of the v4 release. In fact I've been working on it over the past few days and it'll imminently get merged into v4.

The inability to support ctx.path in refinements is the reason I starting thinking about a new major version in the first place - otherwise I would have merged this as an optimization into Zod 3 long ago.

In my branch I've taken care to lazily initialize issues as well, when possible, which ekes out a little more perf. My branch also does a lot of perf work on the primitives.

I plan to clean it up my branch and merge into v4 in the next couple days. Since we've touched all the same code, it'll be tricky to merge our branches I'm afraid, but thanks for the great work on this, and I'll refer to this implementation as I wrap up my branch.

As a aside there's an issue with this exact approach that is quite subtle and took me a while to figure out. This line in ZodEffects#_parse can cause issues:

const value = inner.status === "valid" ? inner.value : input;

You can't assume input can be passed into the refinement here. The inner schema may have both a) transformed the data and b) thrown some non-fatal errors. In this case you will fall back to input, but the refinement being executed may expect a different type.

z
    .string()
    .transform((val) => val.length)
    .refine(() => false, { message: "always fails" })
    .refine(
      (val) => {
        console.log(`val`, val, typeof val);
        return (val ^ 2) > 10;
      } // should be number but it's a string
    )
    .parse("hello")

To get around this you need some way to attach data to an INVALID result, hence "dirty". But I never liked the "dirty" abstraction and I'll probably get rid of it too. This is all hard to conceptualize and of course I only discovered this by making the same mistake. And of course this wasn't caught by the test suite...

jussisaurio · 2024-05-14T07:03:15Z

That's awesome to hear! 🔥 Are you also addressing parseAsync in that branch?

You can't assume input can be passed into the refinement here. The inner schema may have both a) transformed the data and b) thrown some non-fatal errors. In this case you will fall back to input, but the refinement being executed may expect a different type.

I actually thought about this, but forgot about it at some point 😂

In my branch I've taken care to lazily initialize issues as well, when possible, which ekes out a little more perf.

I tried this as well, but it didn't make any meaningful difference in my tests 🤔

colinhacks · 2024-05-14T19:11:34Z

Indeed, I was planning to do promise detection just as you implemented in your branch, but I haven't done it yet. The goal is to detect async refinements/transforms and reflect that in the inferred type. This would then "bubble up" the schema — a ZodObject containing a ZodType<Promise<string>> becomes async itself.

With this + promise detection, Zod could provide a single unified method that obviates the parse/safeParse/parseAsync/safeParseAsync distinction and returns T | ZodError (where T may be a Promise if any async effects exist). This would be a new method because I don't want to break any of the old ones. Sadly it will almost certainly be called validate unless someone convinces me otherwise 😅

I tried this as well, but it didn't make any meaningful difference in my tests 🤔

IIRC I think it only makes a significant difference on primitive parsing. There's already enough overhead in ZodObject that the impact is minimal. But I'll keep experimenting.

colinhacks · 2024-05-14T19:55:24Z

So I'm gonna work off of this branch after all 😅 My perf branch is now over a year old and the rebase onto v4 is driving me insane. I'll cherry pick some of my other optimizations into this branch, and fix the input issue. @jussisaurio you're the best. Feel free to merge your Promise detection PR into this branch too.

Re: putting input into IssueData, Zod has a policy of not doing that so it doesn't get logged in sensitive environments. Though I'm thinking maybe Zod should do this in browsers & if NODE_ENV=development. I'll write up an RFC to see if this would be acceptable to the more security-minded Zod users, because including input would be a big DX win.

jussisaurio · 2024-05-15T17:08:49Z

Awesome!

I won't be able to work on this in the upcoming days so in case you're actively going to be pushing to this branch, I went ahead and fixed the rebase conflicts between this branch and the promise detection branch, so feel free to cherry-pick this commit onto it if you like: jussisaurio@0f20140

Fair warning, there's a few test failures in it (because I don't unfortunately have time to fix them)

colinhacks · 2024-05-15T20:43:07Z

@jussisaurio Awesome, just cherrypicked & fixed the tests. Gonna merge this so I can start putting up more PRs against these changes.

Try not to rewrite Zod again in the next few days — I'm switching Zod to a monorepo so everything will be moving 🙃

wip

144a2dc

jussisaurio force-pushed the v4-lazy-issues branch from 15782aa to 144a2dc Compare May 11, 2024 16:23

jussisaurio mentioned this pull request May 12, 2024

Huge memory and performance gap between parse and parseAsync #3446

Open

colinhacks closed this May 14, 2024

colinhacks reopened this May 14, 2024

colinhacks added 6 commits May 14, 2024 13:43

INVALID -> ZodFailure, fix input issue

53fffc9

Add new benchmarks

76a627a

Switch away from Ok<T> wrapper

0d9bbee

ZodString optimizations

4751159

playground.mts

9d4d252

Remove ZodString opt, update benchmarks

776019a

colinhacks force-pushed the v4-lazy-issues branch from daf4da1 to 776019a Compare May 15, 2024 06:11

jussisaurio and others added 2 commits May 15, 2024 11:56

Detect promises in parsing instead of using ctx.async

2646299

Fix tests

053be93

colinhacks added 3 commits May 15, 2024 16:35

Switch benchmarks to tinybench

1f29160

Update benchmarks

a8b023f

Updates

ec4e64a

colinhacks merged commit 4a4a9ca into colinhacks:v4 May 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[POC] [runtime performance] bottom up lazy issues/paths construction #3487

[POC] [runtime performance] bottom up lazy issues/paths construction #3487

jussisaurio commented May 11, 2024 •

edited

Loading

colinhacks commented May 14, 2024 •

edited

Loading

jussisaurio commented May 14, 2024 •

edited

Loading

colinhacks commented May 14, 2024 •

edited

Loading

colinhacks commented May 14, 2024 •

edited

Loading

jussisaurio commented May 15, 2024 •

edited

Loading

colinhacks commented May 15, 2024

[POC] [runtime performance] bottom up lazy issues/paths construction #3487

[POC] [runtime performance] bottom up lazy issues/paths construction #3487

Conversation

jussisaurio commented May 11, 2024 • edited Loading

colinhacks commented May 14, 2024 • edited Loading

jussisaurio commented May 14, 2024 • edited Loading

colinhacks commented May 14, 2024 • edited Loading

colinhacks commented May 14, 2024 • edited Loading

jussisaurio commented May 15, 2024 • edited Loading

colinhacks commented May 15, 2024

jussisaurio commented May 11, 2024 •

edited

Loading

colinhacks commented May 14, 2024 •

edited

Loading

jussisaurio commented May 14, 2024 •

edited

Loading

colinhacks commented May 14, 2024 •

edited

Loading

colinhacks commented May 14, 2024 •

edited

Loading

jussisaurio commented May 15, 2024 •

edited

Loading