GetInnerText() performance #55

Open
GeneThomas opened this issue May 9, 2020 · 1 comment

Comments


GeneThomas commented May 9, 2020

Bug Report

I am writing what I would think is a fairly simple use of AngleSharp[.Css]: I am extracting an HTML table of COVID-19 cases etc. by country. The headers (or other cells) can contain an HTML <br>. INode.Text() (an extension method) and INode.TextContent drop the <br>, returning values like “TotalCases”. My implementation parses the roughly 3,000 cells in 4.6 seconds. Using AngleSharp.Css’s ElementExtensions method string GetInnerText(this IElement element) takes over 8 minutes, making it unusable.

I assume you must implement CSS’s display:none and visibility:hidden. I do not require that functionality, as I do not require an implementation of JavaScript. If GetInnerText() cannot be sped up, a reasonable solution would be to use something like my code together with your implementation of HTML entities such as © etc.

The attached project’s interesting code is in AngleSharpCssSpeedFault.cs.
AngleSharpCssSpeedFault.zip

The last method InnerText(IElement) has a #if to switch between the two implementations of InnerText().

Prerequisites

Run the attached solution.

Description

see above

Steps to Reproduce

  1. Run the solution
  2. Change the #if in the last method InnerText()
  3. Run the solution again.

Possible Solution

Use my InnerText(), but add expansion of all HTML &-entities, as that is missing.
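A minimal Python sketch of that idea, assuming only the behaviour described in this issue; it is not the reporter's C# code (which is in the attached project) and not AngleSharp's API. It walks a cell's markup, emits a separator for each <br>, and relies on the parser to decode entities. The names `CellText`, `cell_text` and `naive_text` are hypothetical. Python's `html.parser` with `convert_charrefs=True` decodes references such as `&copy;` before text reaches `handle_data()`.

```python
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Hypothetical helper: collects a cell's text, mapping <br> to br_as.

    With convert_charrefs=True, entities such as &copy; and &amp; are
    already decoded by the time handle_data() receives the text.
    """

    def __init__(self, br_as="\n"):
        super().__init__(convert_charrefs=True)
        self.br_as = br_as
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # "<br/>" also ends up here: the default handle_startendtag()
        # forwards to handle_starttag().
        if tag == "br":
            self.parts.append(self.br_as)

    def handle_data(self, data):
        self.parts.append(data)


def cell_text(fragment, br_as="\n"):
    parser = CellText(br_as)
    parser.feed(fragment)
    parser.close()  # flushes any buffered trailing text
    return "".join(parser.parts)


def naive_text(fragment):
    # Concatenates text only, like TextContent: <br> contributes nothing.
    return cell_text(fragment, br_as="")


print(naive_text("Total<br>Cases"))       # TotalCases
print(repr(cell_text("Total<br>Cases")))  # 'Total\nCases'
print(cell_text("&copy; WHO &amp; CDC"))  # © WHO & CDC
```

Passing `br_as=" "` yields header text such as "Total Cases" directly, which is often what a table scraper wants.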


Seyden commented Jan 9, 2024

I debugged it, and what slows it down is essentially the computation of the style rules. Because I also don't need styles for InnerText, apart from the default rules such as line breaks for paragraphs and divs, I added two null checks.

With those checks, I can use InnerText without specifying .WithCss and without calling WithRenderDevice, and your code then parses in 25 ms instead of 8 minutes.
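The general shape of such a fallback can be sketched in Python. This is an illustration under stated assumptions, not AngleSharp's code and not the actual null checks in the fork. The text walker takes an optional style lookup. When none is configured (the analogue of omitting .WithCss and WithRenderDevice), it uses a fixed, hypothetical set of block-level tags and computes no style rules at all. `Node`, `DEFAULT_BLOCK`, `inner_text` and `display_of` are hypothetical names. The line-break handling is simplified compared with the real HTML innerText algorithm.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)  # items are str or Node


# Hypothetical stand-in for a user-agent stylesheet's block elements.
DEFAULT_BLOCK = {"div", "p", "table", "tr", "li", "h1", "h2", "h3"}

StyleLookup = Callable[[Node], str]  # returns a CSS display value


def inner_text(node: Node, display_of: Optional[StyleLookup] = None) -> str:
    out: list[str] = []

    def display(n: Node) -> str:
        if display_of is None:
            # No style engine configured: fixed per-tag defaults, no cascade.
            return "block" if n.tag in DEFAULT_BLOCK else "inline"
        return display_of(n)  # costly path: computed styles

    def break_line() -> None:
        # Start a new line unless we are already at the start of one.
        if out and not out[-1].endswith("\n"):
            out.append("\n")

    def walk(n: Node) -> None:
        if n.tag == "br":
            out.append("\n")
            return
        d = display(n)
        if d == "none":
            return  # hidden subtree contributes no text
        if d == "block":
            break_line()
        for child in n.children:
            if isinstance(child, str):
                out.append(child)
            else:
                walk(child)
        if d == "block":
            break_line()

    walk(node)
    return "".join(out).strip("\n")


doc = Node("div", [Node("p", ["Total", Node("br"), "Cases"]), Node("p", ["42"])])
print(repr(inner_text(doc)))  # 'Total\nCases\n42'
```

The design choice is that the expensive lookup is only consulted when the caller supplies one, so plain text extraction pays nothing for features such as display:none that it does not use.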

I will use my fork for now, because this is probably not an acceptable solution for Florian.

Projects
None yet
Development

No branches or pull requests

3 participants