The biggest source of ~~all evil~~ parsing allocations is `BaseTokenizer.FlushBuffer`:
It's not easy to eliminate allocations here, especially when the public API is designed in a way that forces the tokenizer to allocate strings. However, there's some low-hanging fruit that can be taken care of. I've temporarily introduced a few intermediate methods to see what exactly this memory is allocated for:
This PR addresses the 18.4% caused by allocating tag names over and over again. Since HTML tag names almost never deviate from a well-known set, we can cache one instance of the string per known tag and reuse it. To make the cache check as fast as possible, the lookup code has been pre-generated to cover all the tags from the `TagNames` fields (the generator can be found here). As a result of this change, those 18.4% of `FlushBuffer` allocations are almost gone (some unique tags do occur after all, but they are quite rare):
Optimising away ~18% of a method that's responsible for ~24% of allocations is probably not that impressive in the grand scheme of things, but that's better than nothing, right? =) And as an additional bonus, the cache lookup is actually faster than creating all those strings:
Before:
After:
~~Technically speaking, adding an optional parameter to a public method, like I did to `FlushBuffer`, is considered a breaking change by some people: it preserves source compatibility, but doesn't preserve binary compatibility (the AngleSharp DLL can't be swapped for a new version without recompiling the calling assembly). I don't know what your policy for this kind of thing is, but if it's a problem, it can be solved by adding an overload instead of an optional parameter.~~
(My additional idea of adding an optional parameter to `FlushBuffer` was stupid, see the comment below.)
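
For illustration, the tag-name caching idea described above might look roughly like the sketch below. This is not the code generated in this PR: `TagNameCache`, `GetOrCreate`, and the four hard-coded tags are hypothetical, and the sketch assumes the tokenizer accumulates characters in a `StringBuilder`. The point is only to show how a lookup can branch on length and then compare characters in place, so that a hit returns a shared instance without allocating.

```csharp
using System;
using System.Text;

// Hypothetical sketch: TagNameCache and GetOrCreate are illustrative names,
// not part of AngleSharp's API, and only four tags are covered here.
public static class TagNameCache
{
    // One shared instance per known tag name.
    private static readonly string A = "a";
    private static readonly string P = "p";
    private static readonly string Div = "div";
    private static readonly string Span = "span";

    // Returns the shared instance when the buffer holds a known tag name;
    // otherwise falls back to allocating a new string from the buffer.
    public static string GetOrCreate(StringBuilder buffer)
    {
        // Dispatch on length first, then compare characters in place,
        // so a cache hit never materializes a temporary string.
        switch (buffer.Length)
        {
            case 1:
                if (buffer[0] == 'a') return A;
                if (buffer[0] == 'p') return P;
                break;
            case 3:
                if (Matches(buffer, Div)) return Div;
                break;
            case 4:
                if (Matches(buffer, Span)) return Span;
                break;
        }

        // Unknown (rare) tag name: allocate as before.
        return buffer.ToString();
    }

    // Character-by-character comparison; the caller has already checked the length.
    private static bool Matches(StringBuilder buffer, string candidate)
    {
        for (var i = 0; i < candidate.Length; i++)
        {
            if (buffer[i] != candidate[i]) return false;
        }
        return true;
    }
}
```

With this shape, two calls for `div` return the same string instance, while an unknown name such as `x-custom` still produces a fresh string on every call. A generated version would simply extend the `switch` to every known tag.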