Optimize object proc-macro codegen #1470

valkum · 2024-02-05T13:31:34Z

While investigating a stack overflow in our debug builds, we noticed that the generated resolve_field async function is rather large with a lot of await points. In our case, we have ~100 mutations in the mutation root container which results in a function with over 7k lines of code.
The current code generates per field (mutation): one if statement, one capturing async closure with at least 1 async call, and three glue calls.
Large functions are bad for LLVM and can slow down compilations substantially. LLVM optimization takes quadratic time in function length (according to @jyn514), see GREsau/schemars#246

This PR changes things a bit up. This is currently limited to the object macro, but we can bring these improvements to other macros as well.

Differences:

Enum field matcher. The idea is taken from serde. Instead of doing string comparisons in each if statement, a new enum is generated with a variant for each field. The field variant is matched at the beginning. The aim of this optimization is improved matching speeds.
Split out functions. The inlined blocks of the current if statements are moved into separate functions. This saves the most lines of code. We hoped this would reduce the size of the generated type for the resolve_field function, but we can't prove that right now.
Added some tests that are not covered currently, and hinder other optimizations I tried, but failed for our production code.

The lines of code of a typical resolve_field are now more than 4 times less than before at the cost of a function call.

The generated type for the resolve_field still has N Suspend variants (one for each field), and while the await context is packed for each variant, we still want to try to reduce the Suspend variants because there is only ever one that is really used. This PR is basically a building block to further experiments with different optimizations. For example, reducing the await points by moving the glue function into the generated enum variants and removing the huge match block in resolve_field.

We can hide this behind a feature flag to make this opt-in before making it stable, if you like.

Also, I took the liberty to split out some codegen functions to have a better split between IR and output TokenStream. This makes it easier to debug and work with the macro.

This allows us to introduce object scoped structs and enums

djc · 2024-02-12T09:53:55Z

@sunli829 friendly ping?

sunli829 · 2024-02-14T02:05:56Z

Thank you very much, and I look forward to your help improving other macros if possible. 🙂

valkum added 8 commits February 3, 2024 00:27

extract some generation parts into freestanding functions

b4f938f

some hygiene in object macro

3b4f66c

store flattened resolvers in different vec

efbe3d9

hide object derive implementations in const eval block

545019b

This allows us to introduce object scoped structs and enums

change resolver_field if stmts to derived Field enum for matching

38e077a

avoid capturing async closures in generated resolve_fields for Object

0ca6db5

add impl dyn trait support to object tests

5a912d3

add test for optional output with try operator

eb57620

djc approved these changes Feb 5, 2024

View reviewed changes

sunli829 merged commit a71ec3a into async-graphql:master Feb 14, 2024
7 checks passed

sunli829 mentioned this pull request Feb 14, 2024

Use impl Future instead of async-trait in most traits. #1468

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize object proc-macro codegen #1470

Optimize object proc-macro codegen #1470

valkum commented Feb 5, 2024

djc commented Feb 12, 2024

sunli829 commented Feb 14, 2024

Optimize object proc-macro codegen #1470

Optimize object proc-macro codegen #1470

Conversation

valkum commented Feb 5, 2024

djc commented Feb 12, 2024

sunli829 commented Feb 14, 2024