Optimize object proc-macro codegen #1470
Merged
While investigating a stack overflow in our debug builds, we noticed that the generated `resolve_field` async function is rather large, with a lot of await points. In our case, we have ~100 mutations in the mutation root container, which results in a function with over 7k lines of code.

The current code generates per field (mutation): one if statement, one capturing async closure with at least 1 async call, and three glue calls.
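To make that shape concrete, here is a simplified, hand-written sketch of a per-field `if` chain. It is not the macro's real output: the field names, the `create_user`/`delete_user` resolvers, the single `glue` function (standing in for the three glue calls), and the `block_on` helper are all hypothetical and exist only so the sketch compiles without an async runtime.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-ins for user-defined async field resolvers.
async fn create_user(name: &str) -> String {
    format!("created {name}")
}

async fn delete_user(id: u32) -> String {
    format!("deleted {id}")
}

// Hypothetical stand-in for the glue code wrapped around each field.
async fn glue<F: Future<Output = String>>(fut: F) -> Option<String> {
    Some(fut.await)
}

// Assumed shape of the "before" code: one `if` per field, each with its own
// capturing async closure, all inlined into one function. With ~100 fields
// this single function (and its generated future) grows very large.
async fn resolve_field_before(field: &str, arg: &str) -> Option<String> {
    if field == "createUser" {
        let f = move || async move { create_user(arg).await };
        return glue(f()).await;
    }
    if field == "deleteUser" {
        let f = move || async move { delete_user(arg.parse::<u32>().unwrap_or(0)).await };
        return glue(f()).await;
    }
    None
}

// Minimal executor so the sketch can be driven without external crates.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        // The futures in this sketch never actually suspend, so polling in a
        // loop is enough here.
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Calling `block_on(resolve_field_before("createUser", "ann"))` walks the chain until a name matches; every additional field adds another branch, closure, and await point to the same function body.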
Large functions are bad for LLVM and can slow down compilations substantially. LLVM optimization takes quadratic time in function length (according to @jyn514), see GREsau/schemars#246
This PR changes things up a bit. It is currently limited to the object macro, but we can bring these improvements to other macros as well.
Differences:
- An `enum` is generated with a variant for each field. The field variant is matched at the beginning. The aim of this optimization is improved matching speeds.
- … `resolve_field` function, but we can't prove that right now.
- A typical `resolve_field` now has less than a quarter of the lines of code it had before, at the cost of a function call.

The generated type for `resolve_field` still has N `Suspend` variants (one for each field), and while the await context is packed for each variant, we still want to try to reduce the `Suspend` variants because only one of them is ever really used. This PR is basically a building block for further experiments with different optimizations, for example reducing the `await` points by moving the glue function into the generated `enum` variants and removing the huge `match` block in `resolve_field`.

We can hide this behind a feature flag to make this opt-in before making it stable, if you like.
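As a rough illustration of the new shape, the sketch below matches the field name once into an enum and then dispatches to one function per field. It is hand-written under assumptions, not the macro's actual output: the `Field` enum, its `from_name` method, the `resolve_*` functions, and the `block_on` helper are hypothetical names chosen for this example.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical enum with one variant per field, standing in for the
// generated one.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Field {
    CreateUser,
    DeleteUser,
}

impl Field {
    // The field name is matched once, up front.
    fn from_name(name: &str) -> Option<Self> {
        match name {
            "createUser" => Some(Field::CreateUser),
            "deleteUser" => Some(Field::DeleteUser),
            _ => None,
        }
    }
}

// Each field's resolution logic lives in its own function (assumed layout).
async fn resolve_create_user(arg: &str) -> String {
    format!("created {arg}")
}

async fn resolve_delete_user(arg: &str) -> String {
    let id: u32 = arg.parse().unwrap_or(0);
    format!("deleted {id}")
}

// `resolve_field` shrinks to a lookup plus a dispatching `match`. It still
// has one await point per field, which is what later experiments aim to cut.
async fn resolve_field(name: &str, arg: &str) -> Option<String> {
    let field = Field::from_name(name)?;
    Some(match field {
        Field::CreateUser => resolve_create_user(arg).await,
        Field::DeleteUser => resolve_delete_user(arg).await,
    })
}

// Minimal executor so the sketch can be driven without external crates.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        // The futures in this sketch never actually suspend, so polling in a
        // loop is enough here.
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

With this layout, `block_on(resolve_field("deleteUser", "7"))` resolves through `Field::DeleteUser`, and an unknown name returns `None` before any resolver runs. The per-field bodies no longer count toward the length of `resolve_field` itself, which is the trade of fewer lines for an extra function call described above.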
Also, I took the liberty of splitting out some codegen functions to get a cleaner separation between the IR and the output `TokenStream`. This makes the macro easier to debug and work with.