Parsing and tokenizing large views thrashes CPU, Memory & GC #635
Comments
/cc @Eilon this is exactly what we were discussing.
The proposed fixes here:

- Rename
- HTML-span based tokens should have a pointer to the source and a range (start, end) of the document. We can't afford to create copies of the HTML regions of the page, as they are numerous, arbitrarily large, and don't have a ton of duplication. If we can't do this without buffering (because of concurrency or multiple passes on the input), then we might want to substring here, but with de-duplication.
- CSharp-span based tokens will be extremely repetitive. We should dedupe these as well.
- We should also consider creating a token-stream data structure, and storing in syntax nodes the range of tokens, rather than an explicit list.
- For building of spans, it's the wrong approach to

There's some further reading here about Roslyn's implementation:

The red-green tree discussion is potentially interesting if we want to do incremental parsing for editor scenarios, but in general we need to solve a different problem. They have a lot of text that's punctuation, or very repetitive.
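To make the first two proposals concrete, here is a minimal, language-neutral sketch in Python. It is not Razor code: `SpanToken`, `TokenInterner`, and `tokenize_words` are hypothetical names invented for illustration, and the whitespace-splitting tokenizer is a stand-in for the real one. The idea is only that a token stores a reference to the source plus a (start, end) range, and that repetitive token text is de-duplicated through a pool when it does have to be materialized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanToken:
    """Hypothetical token that records where its text lives instead of copying it."""
    source: str  # reference to the whole document, shared by every token
    start: int   # inclusive offset into source
    end: int     # exclusive offset into source

    def text(self) -> str:
        # The substring is only created when a caller actually asks for it.
        return self.source[self.start:self.end]


class TokenInterner:
    """Hypothetical pool that hands back one shared string per distinct token text."""

    def __init__(self):
        self._pool = {}

    def intern(self, text: str) -> str:
        # Store the first occurrence; later equal strings resolve to that same object.
        return self._pool.setdefault(text, text)

    def __len__(self) -> int:
        return len(self._pool)


def tokenize_words(source: str) -> list:
    """Toy tokenizer (an assumption): one SpanToken per run of non-space characters."""
    tokens = []
    i, n = 0, len(source)
    while i < n:
        if source[i].isspace():
            i += 1
            continue
        start = i
        while i < n and not source[i].isspace():
            i += 1
        tokens.append(SpanToken(source, start, i))
    return tokens


if __name__ == "__main__":
    doc = "var x = 1 ; var y = 2 ;"
    interner = TokenInterner()
    for token in tokenize_words(doc):
        print(token.start, token.end, interner.intern(token.text()))
```

The range-based form suits the large, mostly unique HTML regions, where copying is the cost. The pool suits the repetitive C# tokens, where many tokens share the same few strings.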
Updated with some more info
Cool. Nicholas will take care of these!
Not sure whether this is part of the same bug but Razor makes extensive use of
@rynowak had a great suggestion to optimize our usage of
The problem with 2 is that, with how the parser is written today, logic code does not always know when/if it's creating a

So I decided to attempt 1. In my branch I was able to get Razor into a runnable state for mainline scenarios and profiled cached

Overall the change is 100% positive and enforces Razor's data structures to be more accurate; however, with RTM around the corner, optimizing

@rynowak and I talked and we suggest pushing this out to post RTM, but it should definitely be done because it is an extremely positive/enforcing change. The 4% that we see now would also become more apparent/larger as other parts of the parser are optimized. /cc @Eilon
To add more to this, one of the best reasons to do this change in the future is that it will fix bugs we have and prevent us from re-introducing those bugs. Agreed that this is probably more of a change than we want to take right now, given the time it would take us to react to any bugs we'd introduce and the difficulty of testing the impact on editor/design-time.
@NTaylorMullen, @rynowak I see that this was merged with another issue. Can you clarify for me: will large views (like those with datatables) still have trouble being parsed as described in the original post, or has that part of the issue been resolved?
We're still working on this. Expect to see some concrete improvement in the next week or so. All of the discussion here has fundamentally been about strategies to address the performance issues of parsing large pages. We investigated one avenue to address the issue and decided not to take that particular fix for this release.
@rynowak from your previous comment it wasn't clear to me if you want this for RC2, RTM, or post-RTM?
@Eilon - we should do the proposed design change for the tokenizer post RTM.
@NTaylorMullen and I are breaking up the items and working through them.

Here's a baseline (15 iterations doing codegen on MSN.cshtml), CoreCLR x64:

Partial fix - 6a4a954 - Removing copies of empty RazorError[] - This is about 50 MB
- The Equals operators were boxing the symbol types like crazy, added an abstract `SymbolTypeEquals` to avoid this. #635
OK, so do whatever we need now via this issue, and log a separate issue for whatever we want to do post-RTM.
We were able to reduce allocations by 42% and have filed additional follow-up issues to reduce them further: #674
I'm running RC1-Final. Pages are served fine, but with a large view, the server gets so bogged down, apparently trying to parse all the tags, that the server complains and I end up with HTTP errors at times.
In this instance, I'm serving a view with no Razor "@" tags and no inline code of any kind. Aside from two "section" tags, it's purely HTML and JavaScript. The view defines several data tables and is 188 KB in size. Here is what the start and end of a request look like in the VS performance tools (red arrows mark the start and end of the request):
Memory snapshot before the page request
Memory snapshot during the page request
Memory snapshot after the page request
Any ideas how to mitigate this beyond "write smaller views with fewer tags to parse"?