
Optimize word count #2974

Merged: yufeih merged 4 commits into dotnet:v3 from yufeih:wordcount on Jul 2, 2018

@yufeih yufeih commented Jun 29, 2018

LoadHtml has a huge performance impact. This PR optimizes word count by:

  • not loading a mutated copy of the HTML
  • not splitting strings
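
As an illustration of the "not splitting strings" point, here is a minimal sketch of counting whitespace-delimited words in a single pass, with no `string.Split` call and no substring allocations. The `WordCounter` class and its `CountWords` method are hypothetical names chosen for this example; this is not the code from the PR.

```csharp
using System;

static class WordCounter
{
    // Hypothetical helper: counts whitespace-delimited words in one pass.
    // It tracks only whether the scanner is currently inside a word,
    // so no intermediate strings or arrays are allocated.
    public static int CountWords(string text)
    {
        int count = 0;
        bool inWord = false;

        foreach (char c in text)
        {
            if (char.IsWhiteSpace(c))
            {
                // Whitespace ends the current word, if any.
                inWord = false;
            }
            else if (!inWord)
            {
                // First non-whitespace character after a separator starts a new word.
                inWord = true;
                count++;
            }
        }

        return count;
    }

    static void Main()
    {
        Console.WriteLine(CountWords("  hello   docfx world ")); // prints 3
    }
}
```

Compared with `text.Split(...)`, which allocates an array plus one string per token, this keeps the work proportional to the input length with constant extra memory.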

Time to build azure-docs-pr (without git contribution info) drops from roughly 6 minutes to 3 minutes.

Note:

This implementation is slightly different from the original one, but is more correct:

  • Do not special-case the <title> tag, because the title is already removed in the markup phase
  • Count both starting tags and ending tags as word separators; the old version only treated ending tags as separators (using a string replace hack)
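
The separator rule above can be sketched as a small character scanner that treats every tag, opening or closing, as a word boundary. This is a simplified illustration under stated assumptions, not the implementation in `HtmlUtility.cs`: the `HtmlWordCounter` class is a hypothetical name, and the scanner ignores HTML entities, comments, `<script>`/`<style>` content, and attribute values that contain `>`.

```csharp
using System;

static class HtmlWordCounter
{
    // Hypothetical helper: counts words in an HTML string without building a DOM.
    // Anything between '<' and '>' is skipped and acts as a word separator,
    // regardless of whether it is a starting tag or an ending tag.
    public static int CountWords(string html)
    {
        int count = 0;
        bool inTag = false;
        bool inWord = false;

        foreach (char c in html)
        {
            if (c == '<')
            {
                // A tag begins: this also terminates any word in progress.
                inTag = true;
                inWord = false;
            }
            else if (inTag)
            {
                // Inside a tag: skip everything up to and including '>'.
                if (c == '>') inTag = false;
            }
            else if (char.IsWhiteSpace(c))
            {
                inWord = false;
            }
            else if (!inWord)
            {
                inWord = true;
                count++;
            }
        }

        return count;
    }

    static void Main()
    {
        // Both <b> and </b> split the text, so this yields three words.
        Console.WriteLine(CountWords("foo<b>bar</b>baz"));                   // prints 3
        Console.WriteLine(CountWords("<p>Hello, <em>docfx</em> world</p>")); // prints 3
    }
}
```

If only ending tags separated words, `foo<b>bar` in the first input would be counted as a single word, which is the kind of difference that can shift existing expected counts.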

This impacts legacy tests; we could consider removing word_count in the impact test post-process.

#2972


@superyyrrzz superyyrrzz left a comment


:shipit:

Comment thread src/docfx/lib/HtmlUtility.cs

@herohua herohua left a comment


:shipit:

@yufeih yufeih merged commit 2e1d628 into dotnet:v3 Jul 2, 2018
@yufeih yufeih deleted the wordcount branch July 2, 2018 02:34

4 participants