Parsing and tokenizing large views thrashes CPU, Memory & GC #635

SimonOrdo · 2015-12-09T19:52:11Z

I'm running RC1-Final. Pages are served fine, but with a large view, the server gets so bogged down, apparently trying to parse all the tags, that the server complains and I end up with HTTP errors at times.

In this instance, I'm serving a view with no razor "@" tags and no kind of inline code. Aside from two "section" tags, it's purely HTML and Javascript. The view defines several data tables and is 188kb in size. Here is what the start and end of requests look like in the VS performance tools (red arrows are start of request and end of request):

Memory snapshot before the page request

Memory snapshot during the page request

Memory snapshot after the page request

Any ideas how to mitigate this beyond "write smaller views with fewer tags to parse" ?

rynowak · 2015-12-09T19:53:55Z

/cc @Eilon this is exactly what we were discussing.

rynowak · 2015-12-09T19:56:40Z

The proposed fixes here:

Rename ISymbol to IToken - because these are tokens not symbols** Consider using an abstract base class here because there's going to plenty of functionality that's language-agnostic.

HTML-span based tokens should have a pointer to the source and a range (start, end) of the document. We can't afford to create copies of the HTML regions of the page as they are numerous, arbitrarily large and don't have a ton of duplication. If we can't do this without buffering (because of concurrency, multiple passes on the input), when we might want to substring here, but with de-duplication.

For CSharp-span based tokens, they will be extremely repetitive. We should dedupe these as well.

We should also consider creating a token-stream data structure, and storing in syntax nodes the range of tokens, rather than an explicit list.

For building of spans, it's the wrong approach to ToString() and then write it out. The span should hold on to lower-level constructs like tokens or nodes, and ask those items to write themselves out to a TextWriter. https://github.com/aspnet/Razor/blob/dev/src/Microsoft.AspNet.Razor/Chunks/ChunkTreeBuilder.cs#L144

There's some further reading here about Roslyn's implementation:
http://ericlippert.com/2012/06/08/red-green-trees/
https://roslyn.codeplex.com/discussions/541953

The red-green tree discussion is potentially interesting if we want to do incremental parsing for editor scenarios, but in general we need to solve a different problem. They have a lot of text that's punctuation, or very repetitive.

rynowak · 2015-12-29T00:00:29Z

Updated with some more info

Eilon · 2015-12-30T19:21:32Z

@rynowak how related is this to #614?

rynowak · 2015-12-30T19:53:01Z

@rynowak how related is this to #614?

They are both razor bugs we need to fix 👍 These are both issues you encounter with trying to use large razor pages that are usually completely blocking.

Eilon · 2015-12-30T22:36:49Z

Cool. Nicholas will take care of these!

dougbu · 2016-01-03T03:58:14Z

For building of spans, it's the wrong approach to ToString() and then write it out

Not sure whether this is part of the same bug but Razor makes extensive use of ToString(). Mostly this extra complication improves the debugging experience, eases testing, or supports tracing. But CSharpCodeWriter writes out literals char-by-char, calling ToString() on each. Would be much more efficient for the underlying CodeWriter to expose a Write(char) method.

NTaylorMullen · 2016-01-14T21:03:59Z

@rynowak had a great suggestion to optimize our usage of HtmlSymbols and take a more Roslyn approach when creating them (caching them). When working on the issue I discovered that Razor's SyntaxTree has gaps in it. Meaning you could have a SyntaxTreeNode spanning 0-10 of the document and then its sibling spanning 12-20 (leaving a gap 10-12). In order to cache HtmlSymbols I had to decouple them from their positions and errors and instead rely on their parents positions. I had to decide between two approaches:

Change the Razor parser to not create gaps in the SyntaxTree. This would also make the SyntaxTree more accurate.
Have the parser knowingly create SyntaxTreeNodes with a specified position (and leave the gaps).

The problem with 2 is with how the parser is written today logic code does not always know when/if it's creating a Span. Sooo to make it create Spans with specific positions it'd be a muchhhh larger re-design.

So I decided to attempt 1. In my branch I was able to get Razor into a runnable state for mainline scenarios and profiled cached HtmlSymbols vs non-cached for MSN's home page (very large). In the end the cached HtmlSymbols resulted in a 4% reduction in allocations.

Overall the change is 100% positive and enforces Razor's data structures to be more accurate; however, with RTM around the corner, optimizing HtmlSymbols for a 4% gain would be a very risky change! I had to touch nearly every part of the parser and would need to touch more if I were to fix all the edge case scenarios in my branch. Assuming I fixed every test in the Razor repo I'd be confident in our runtime / design time tests coverage. However, we lack very thorough integration tests, ones that'd be required for a drastic parser change. Also, tooling has expectations on the format of the SyntaxTree that we'd be invalidating.

@rynowak and I talked and we suggest pushing this out to post RTM, BUT should definitely be done because it is an extremely positive/enforcing change. The 4% that we see now would also become more apparent/larger as other parts of the parser are optimized.

/cc @Eilon

rynowak · 2016-01-14T21:16:04Z

RTM, BUT should definitely be done because it is an extremely positive/enforcing change. The 4% that we see now would also become more apparent/larger as other parts of the parser are optimized.

To add more to this, one of the best reasons to do this change in the future is that it will fix bugs we have and prevent us from re-introducing those bugs.

Agreed that this is probably more of a change that we want to take right now given the time it would take us to react to any bugs we'd introduce and the difficulty of testing the impact on editor/design-time.

SimonOrdo · 2016-01-14T21:16:25Z

@NTaylorMullen , @rynowak I see that this was merged with another issue. Can you clarify for me... will large views (like those with datatables) still have trouble being parsed as described in this original post or has that part of the issue been resolved

rynowak · 2016-01-14T21:21:29Z

will large views (like those with datatables) still have trouble being parsed as described in this original post

We're still working on this. Expect to see some concrete improvement in the next week or so.

All of the discussion here has fundamentally been about strategies to address the performance issues of parsing large pages. We investigated one avenue to address the issue and decided not to take that particular fix for this release.

Eilon · 2016-01-14T21:33:43Z

@rynowak from your previous comment it wasn't clear to me if you want this for RC2, RTM, or post-RTM?

rynowak · 2016-01-14T23:07:29Z

@Eilon - we should do the proposed design change for the tokenizer post RTM.

rynowak · 2016-01-14T23:09:17Z

@NTaylorMullen and I are breaking up the items and working through them.

Here's a baseline (15 iterations doing codegen on MSN.cshtml) CoreCLR x64:

Partial fix - 6a4a954 - Removing copies of empty RazorError[] - This is about 50mb

- The Equals operators were boxing the symbol types like crazy, added an abstract `SymbolTypeEquals` to avoid this. #635

rynowak · 2016-01-15T01:27:50Z

ffdf5d2

rynowak · 2016-01-15T01:28:26Z

95ea4cc

NTaylorMullen · 2016-01-15T01:48:14Z

7a0a9c6

rynowak · 2016-01-15T18:38:19Z

08fef95

Eilon · 2016-01-15T18:46:22Z

Ok so do whatever we need now via this issue, and log a separate issue for whatever we want to do post-RTM.

rynowak · 2016-01-15T19:10:12Z

315d79f

#635

NTaylorMullen · 2016-01-15T23:48:13Z

1ce9180

rynowak · 2016-01-20T17:40:57Z

e68c55a

NTaylorMullen · 2016-01-23T01:12:10Z

We were able to reduce allocations by 42% and have filed the additional follow up issues to reduce it further:

#674
https://github.com/aspnet/Razor/issues/675
https://github.com/aspnet/Razor/issues/676

Eilon added Perf 1 - Ready labels Dec 9, 2015

Eilon added this to the 4.0.0-rc2 milestone Dec 9, 2015

Eilon assigned NTaylorMullen Dec 9, 2015

NTaylorMullen mentioned this issue Jan 5, 2016

Change string rendering to be chunked. #652

Merged

NTaylorMullen added 2 - Working and removed 1 - Ready labels Jan 7, 2016

NTaylorMullen added a commit that referenced this issue Jan 15, 2016

Remove boxing of symbol types.

216ebeb

- The Equals operators were boxing the symbol types like crazy, added an abstract `SymbolTypeEquals` to avoid this. #635

NTaylorMullen mentioned this issue Jan 15, 2016

Remove boxing of symbol types. #664

Merged

NTaylorMullen added a commit that referenced this issue Jan 15, 2016

Remove boxing of symbol types.

7a0a9c6

- The Equals operators were boxing the symbol types like crazy, added an abstract `SymbolTypeEquals` to avoid this. #635

dougbu mentioned this issue Jan 15, 2016

Make the CodeWriter more efficient #667

Closed

NTaylorMullen added a commit that referenced this issue Jan 15, 2016

Prevent unneeded allocations of Block children enumerators.

b7cfc4e

#635

NTaylorMullen mentioned this issue Jan 15, 2016

Prevent unneeded allocations of Block children enumerators. #669

Merged

NTaylorMullen added a commit that referenced this issue Jan 15, 2016

Prevent unneeded allocations of Block children enumerators.

f576b5c

#635

NTaylorMullen added a commit that referenced this issue Jan 15, 2016

Prevent unneeded allocations of Block children enumerators.

1ce9180

#635

NTaylorMullen added 3 - Done and removed 2 - Working labels Jan 23, 2016

NTaylorMullen closed this as completed Jan 23, 2016

rynowak mentioned this issue May 24, 2016

Performance Scenario: Razor Code Generation aspnet/Mvc#4177

Closed

2 tasks

NTaylorMullen mentioned this issue Dec 14, 2018

Cache Razor Tokens. dotnet/aspnetcore#4985

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parsing and tokenizing large views thrashes CPU, Memory & GC #635

Parsing and tokenizing large views thrashes CPU, Memory & GC #635

SimonOrdo commented Dec 9, 2015

rynowak commented Dec 9, 2015

rynowak commented Dec 9, 2015

rynowak commented Dec 29, 2015

Eilon commented Dec 30, 2015

rynowak commented Dec 30, 2015

Eilon commented Dec 30, 2015

dougbu commented Jan 3, 2016

NTaylorMullen commented Jan 14, 2016

rynowak commented Jan 14, 2016

SimonOrdo commented Jan 14, 2016

rynowak commented Jan 14, 2016

Eilon commented Jan 14, 2016

rynowak commented Jan 14, 2016

rynowak commented Jan 14, 2016

rynowak commented Jan 15, 2016

rynowak commented Jan 15, 2016

NTaylorMullen commented Jan 15, 2016

rynowak commented Jan 15, 2016

Eilon commented Jan 15, 2016

rynowak commented Jan 15, 2016

NTaylorMullen commented Jan 15, 2016

rynowak commented Jan 20, 2016

NTaylorMullen commented Jan 23, 2016

Parsing and tokenizing large views thrashes CPU, Memory & GC #635

Parsing and tokenizing large views thrashes CPU, Memory & GC #635

Comments

SimonOrdo commented Dec 9, 2015

rynowak commented Dec 9, 2015

rynowak commented Dec 9, 2015

rynowak commented Dec 29, 2015

Eilon commented Dec 30, 2015

rynowak commented Dec 30, 2015

Eilon commented Dec 30, 2015

dougbu commented Jan 3, 2016

NTaylorMullen commented Jan 14, 2016

rynowak commented Jan 14, 2016

SimonOrdo commented Jan 14, 2016

rynowak commented Jan 14, 2016

Eilon commented Jan 14, 2016

rynowak commented Jan 14, 2016

rynowak commented Jan 14, 2016

rynowak commented Jan 15, 2016

rynowak commented Jan 15, 2016

NTaylorMullen commented Jan 15, 2016

rynowak commented Jan 15, 2016

Eilon commented Jan 15, 2016

rynowak commented Jan 15, 2016

NTaylorMullen commented Jan 15, 2016

rynowak commented Jan 20, 2016

NTaylorMullen commented Jan 23, 2016