Optimize object proc-macro codegen #1470

Merged: 8 commits merged into async-graphql:master on Feb 14, 2024

Conversation

@valkum (Contributor) commented on Feb 5, 2024

While investigating a stack overflow in our debug builds, we noticed that the generated resolve_field async function is rather large and has a lot of await points. In our case, the mutation root container has ~100 mutations, which results in a function of over 7k lines of code.
For each field (mutation), the current code generates one if statement, one capturing async closure containing at least one async call, and three glue calls.
Large functions are bad for LLVM and can slow down compilation substantially; LLVM optimization time can grow quadratically with function length (according to @jyn514), see GREsau/schemars#246.
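For concreteness, here is a rough, hand-written sketch of that per-field shape. Ctx, create_user, and the return types are hypothetical stand-ins, not async-graphql's real types, and this is not the macro's literal output:

```rust
struct Ctx {
    field_name: String,
}

struct MutationRoot;

impl MutationRoot {
    async fn create_user(&self) -> String {
        "created".to_string()
    }

    // Every field adds another string-comparing `if`, another capturing
    // async block with at least one `.await`, and the surrounding glue,
    // all inlined into this one function.
    async fn resolve_field(&self, ctx: &Ctx) -> Option<String> {
        if ctx.field_name == "createUser" {
            let fut = async move {
                // parse arguments, call the resolver, convert the result...
                self.create_user().await
            };
            return Some(fut.await);
        }
        // ...repeated for ~100 more fields...
        None
    }
}
```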

This PR changes things up a bit. It is currently limited to the object macro, but we can bring these improvements to the other macros as well.

Differences:

  • Enum field matcher. The idea is taken from serde: instead of doing a string comparison in each if statement, a new enum is generated with a variant for each field, and the incoming field name is matched to a variant once at the beginning. The aim of this optimization is faster field matching (see the sketch after this list).
  • Split out functions. The inlined blocks of the current if statements are moved into separate functions, which saves the most lines of code. We hoped this would also reduce the size of the generated type for the resolve_field future, but we can't prove that right now.
  • Added some tests for cases that are not currently covered; these cases hindered other optimizations I tried, which failed for our production code.
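Here is a minimal sketch of the first two points combined, assuming two hypothetical fields createUser and deleteUser; this is illustrative only, not the macro's actual output:

```rust
// Hypothetical generated field enum, matched once instead of walking a
// chain of string comparisons.
enum Field {
    CreateUser,
    DeleteUser,
}

impl Field {
    fn from_name(name: &str) -> Option<Self> {
        match name {
            "createUser" => Some(Field::CreateUser),
            "deleteUser" => Some(Field::DeleteUser),
            _ => None,
        }
    }
}

struct MutationRoot;

impl MutationRoot {
    // Each field's formerly inlined block becomes its own small function...
    async fn resolve_create_user(&self) -> String {
        "created".to_string()
    }

    async fn resolve_delete_user(&self) -> String {
        "deleted".to_string()
    }

    // ...so resolve_field itself shrinks to one match with one call per arm.
    async fn resolve_field(&self, name: &str) -> Option<String> {
        match Field::from_name(name)? {
            Field::CreateUser => Some(self.resolve_create_user().await),
            Field::DeleteUser => Some(self.resolve_delete_user().await),
        }
    }
}
```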

A typical resolve_field now has more than four times fewer lines of code than before, at the cost of one extra function call per field.

The generated future type for resolve_field still has N Suspend variants (one for each field), and while the await context is packed per variant, we still want to reduce the number of Suspend variants, because only one of them is ever actually used for a given call. This PR is essentially a building block for further experiments with different optimizations, for example reducing the await points by moving the glue function into the generated enum variants and removing the huge match block in resolve_field.
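To illustrate what that means, here is a hand-drawn sketch, assumed rather than taken from actual compiler output, of the state enum the compiler conceptually lowers resolve_field's future into:

```rust
// Conceptual illustration only: the real lowering is more involved, but
// the point is that each reachable `.await` in the big match contributes
// its own suspend-like variant, even though only one of them is ever
// used for a given call.
enum ResolveFieldFuture {
    Start,
    SuspendCreateUser { /* await context captured for that arm */ },
    SuspendDeleteUser { /* await context captured for that arm */ },
    // ...one variant per remaining field...
    Done,
}
```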

We can hide this behind a feature flag to make it opt-in before stabilizing it, if you like.

Also, I took the liberty of splitting out some codegen functions to get a better separation between the IR and the output TokenStream. This makes the macro easier to debug and work with.
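As a rough illustration of that split (FieldIr and expand_field are hypothetical names, not the helpers actually added in this PR):

```rust
use proc_macro2::TokenStream;
use quote::{format_ident, quote};

// Hypothetical intermediate representation for one field, built while
// parsing the macro input.
struct FieldIr {
    name: String,
}

// The IR-to-tokens step lives in its own function, so the output
// TokenStream can be inspected and debugged in isolation.
fn expand_field(field: &FieldIr) -> TokenStream {
    let fn_name = format_ident!("resolve_{}", field.name);
    quote! {
        async fn #fn_name(&self) { /* resolver body */ }
    }
}
```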

@djc (Contributor) commented on Feb 12, 2024

@sunli829 friendly ping?

@sunli829 (Collaborator) commented:
Thank you very much, and I look forward to your help improving other macros if possible. 🙂

sunli829 merged commit a71ec3a into async-graphql:master on Feb 14, 2024
7 checks passed