Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[POC] [runtime performance] bottom up lazy issues/paths construction #3487

Merged
merged 12 commits into from
May 16, 2024

Conversation

jussisaurio
Copy link
Contributor

@jussisaurio jussisaurio commented May 11, 2024

Hi @colinhacks ,

I made a little POC targeting Zod's runtime performance with a similar approach that has been done in other parse-dont-validate style libraries where issue paths are only allocated when they are needed (i.e. when there actually are issues in parsing). This means paths don't need to be allocated in any successful parse cases, which is the common/hot path generally.

I'm not sure if this fits at all with your plans for the v4 version, but I did this mostly as an exercise, and if a polished version of it ends up being something that's feasible from your POV, then great!

Main changes:

  • _parse() now returns either OK(validResult) or INVALID(issues: IssueData[]); i.e. issues are not added to ctx but are returned along with the unsuccessful status value. Parent schemas will append the child issues to their own issues array recursively
  • Issue paths are lazily allocated. Parent parse functions receiving errors from children will add path to the child issues, i.e. if an error happens in key foo of an object, the parent will add path: ["foo"] to the child error. This happens recursively, so the parent's parent will then unshift e.g. path: ["bar", "foo"] later. As mentioned, this happens on demand so paths are only constructed when the parser encounters an error.
  • no ParseContext objects are allocated during parsing inside _parse() since they are not necessary
  • ParseStatus and the concept of dirty are removed; dirtiness is handled separately in ZodEffects to decide whether to continue with refinements. LMK if I've misunderstood and dirty is needed elsewhere
  • input is added to IssueData to continue supporting error maps operating on the parse input
  • Although ctx.addIssue() is not used in the internals, refinements continue to support the ctx.addIssue() API, which just pushes to the issues array under the hood

Some caveats / edge cases / breaking things:

  • The code is terrible spaghetti in some spots (especially when it comes to async parsing, refinements and handling some edge cases like prototype pollution etc etc) because I mainly wanted to get a POC done where the test set passes and I could concretely demonstrate the performance difference
  • I removed support for, and the test for "use path in refinement context", since paths are now on-demand constructed, and so, not available to the refinement context. This would be a breaking change ofc, but I couldn't figure out how to make that backwards compatible
  • The expected return values of a few tests (that assert on which issues are returned) changed; not sure yet if I actually broke something crucial

Performance:

Here's a performance comparison using the typescript-runtime-type-benchmarks library:

colinhacks/v4:

Running "parseSafe" suite...
Progress: 100%

  zod:
    718 254 ops/s, ±0.68%

Finished 1 case!
Running "parseStrict" suite...
Progress: 100%

  zod:
    661 232 ops/s, ±0.38%

Finished 1 case!
Running "assertLoose" suite...
Progress: 100%

  zod:
    653 851 ops/s, ±0.65%

Finished 1 case!
Running "assertStrict" suite...
Progress: 100%

  zod:
    664 157 ops/s, ±0.47%

jussisaurio/v4-lazy-issues:

Running "parseSafe" suite...
Progress: 100%

  zod:
    1 042 488 ops/s, ±7.67%   // <-- 45% faster

Finished 1 case!
Running "parseStrict" suite...
Progress: 100%

  zod:
    975 107 ops/s, ±0.13%   // <-- 46% faster

Finished 1 case!
Running "assertLoose" suite...
Progress: 100%

  zod:
    913 760 ops/s, ±3.97%   // <-- 40% faster

Finished 1 case!
Running "assertStrict" suite...
Progress: 100%

  zod:
    971 467 ops/s, ±0.24%   // <-- 46% faster

@colinhacks
Copy link
Owner

colinhacks commented May 14, 2024

This is really great stuff. I actually have this exact kind of change implemented in a perf branch, and it is indeed a big part of the v4 release. In fact I've been working on it over the past few days and it'll imminently get merged into v4.

The inability to support ctx.path in refinements is the reason I starting thinking about a new major version in the first place - otherwise I would have merged this as an optimization into Zod 3 long ago.

In my branch I've taken care to lazily initialize issues as well, when possible, which ekes out a little more perf. My branch also does a lot of perf work on the primitives.

I plan to clean it up my branch and merge into v4 in the next couple days. Since we've touched all the same code, it'll be tricky to merge our branches I'm afraid, but thanks for the great work on this, and I'll refer to this implementation as I wrap up my branch.


As a aside there's an issue with this exact approach that is quite subtle and took me a while to figure out. This line in ZodEffects#_parse can cause issues:

const value = inner.status === "valid" ? inner.value : input;

You can't assume input can be passed into the refinement here. The inner schema may have both a) transformed the data and b) thrown some non-fatal errors. In this case you will fall back to input, but the refinement being executed may expect a different type.

z
    .string()
    .transform((val) => val.length)
    .refine(() => false, { message: "always fails" })
    .refine(
      (val) => {
        console.log(`val`, val, typeof val);
        return (val ^ 2) > 10;
      } // should be number but it's a string
    )
    .parse("hello")

To get around this you need some way to attach data to an INVALID result, hence "dirty". But I never liked the "dirty" abstraction and I'll probably get rid of it too. This is all hard to conceptualize and of course I only discovered this by making the same mistake. And of course this wasn't caught by the test suite...

@colinhacks colinhacks closed this May 14, 2024
@colinhacks colinhacks reopened this May 14, 2024
@jussisaurio
Copy link
Contributor Author

jussisaurio commented May 14, 2024

That's awesome to hear! 🔥 Are you also addressing parseAsync in that branch?

You can't assume input can be passed into the refinement here. The inner schema may have both a) transformed the data and b) thrown some non-fatal errors. In this case you will fall back to input, but the refinement being executed may expect a different type.

I actually thought about this, but forgot about it at some point 😂

In my branch I've taken care to lazily initialize issues as well, when possible, which ekes out a little more perf.

I tried this as well, but it didn't make any meaningful difference in my tests 🤔

@colinhacks
Copy link
Owner

colinhacks commented May 14, 2024

Indeed, I was planning to do promise detection just as you implemented in your branch, but I haven't done it yet. The goal is to detect async refinements/transforms and reflect that in the inferred type. This would then "bubble up" the schema — a ZodObject containing a ZodType<Promise<string>> becomes async itself.

With this + promise detection, Zod could provide a single unified method that obviates the parse/safeParse/parseAsync/safeParseAsync distinction and returns T | ZodError (where T may be a Promise if any async effects exist). This would be a new method because I don't want to break any of the old ones. Sadly it will almost certainly be called validate unless someone convinces me otherwise 😅

I tried this as well, but it didn't make any meaningful difference in my tests 🤔

IIRC I think it only makes a significant difference on primitive parsing. There's already enough overhead in ZodObject that the impact is minimal. But I'll keep experimenting.

@colinhacks
Copy link
Owner

colinhacks commented May 14, 2024

So I'm gonna work off of this branch after all 😅 My perf branch is now over a year old and the rebase onto v4 is driving me insane. I'll cherry pick some of my other optimizations into this branch, and fix the input issue. @jussisaurio you're the best. Feel free to merge your Promise detection PR into this branch too.

Re: putting input into IssueData, Zod has a policy of not doing that so it doesn't get logged in sensitive environments. Though I'm thinking maybe Zod should do this in browsers & if NODE_ENV=development. I'll write up an RFC to see if this would be acceptable to the more security-minded Zod users, because including input would be a big DX win.

@jussisaurio
Copy link
Contributor Author

jussisaurio commented May 15, 2024

Awesome!

I won't be able to work on this in the upcoming days so in case you're actively going to be pushing to this branch, I went ahead and fixed the rebase conflicts between this branch and the promise detection branch, so feel free to cherry-pick this commit onto it if you like: jussisaurio@0f20140

Fair warning, there's a few test failures in it (because I don't unfortunately have time to fix them)

@colinhacks
Copy link
Owner

@jussisaurio Awesome, just cherrypicked & fixed the tests. Gonna merge this so I can start putting up more PRs against these changes.

Try not to rewrite Zod again in the next few days — I'm switching Zod to a monorepo so everything will be moving 🙃

@colinhacks colinhacks merged commit 4a4a9ca into colinhacks:v4 May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants