port speller perf fix from 4.8 #4975

SamBent · 2021-08-02T21:50:10Z

Addresses Issue #3350. This is a port of a servicing fix in .NET 4.7-4.8.

Description

PR #2871 added logic to the word-breaking phase to extend misspelled tokens ("words") with non-token characters ("punctuation") if such an extension fixes the misspelling. I've found ways to speed it up:

The logic checks each token and potential extension for misspellings, using an existing method ComprehensiveCheck that automatically populates each spelling error with a list of suggestions obtained from the native layer. These suggestions are never used, and the native calls were accounting for 80% of the time (according to Trevor Fellman). The logic only cares whether any errors exist, so it now uses a new method HasErrors that answers that question without asking for suggestions.
The check for potential extensions is skipped if the non-token characters that follow the token are all whitespace. In practice, those characters often include nulls ('\0') at the end. Removing these nulls first allows us to skip the extension test altogether in many cases, about 40% in my experiments.
Don't consider trailing whitespace or nulls in the non-token characters. (This generalizes the original heuristic that discarded the non-token characters if they were all whitespace.)
Don't consider whitespace or nulls interior to the non-token characters, until reaching a character that is not whitespace or null.
Cache the 10 most recent HasErrors results, and answer queries from the cache instead of calling the native spell-checker.
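
The whitespace and null heuristics in the list above can be illustrated with a small Python sketch. It is not the WPF code: the function names are hypothetical, and treating "interior" whitespace as "never end a candidate extension on a whitespace or null character" is one reading of the description, not something confirmed by the change itself.

```python
from typing import List


def is_skippable(ch: str) -> bool:
    """True for whitespace or NUL, the characters the heuristics ignore."""
    return ch.isspace() or ch == "\0"


def trim_trailing(non_token: str) -> str:
    """Drop trailing whitespace and NULs from the non-token characters."""
    end = len(non_token)
    while end > 0 and is_skippable(non_token[end - 1]):
        end -= 1
    return non_token[:end]


def candidate_extensions(token: str, non_token: str) -> List[str]:
    """List the extended tokens worth spell-checking (hypothetical helper).

    Assumption: an extension is only tried when it ends on a character
    that is neither whitespace nor NUL.
    """
    trimmed = trim_trailing(non_token)
    candidates = []
    for length in range(1, len(trimmed) + 1):
        if is_skippable(trimmed[length - 1]):
            continue  # an extension ending in whitespace/NUL is not tried
        candidates.append(token + trimmed[:length])
    return candidates
```

With this sketch, `candidate_extensions("word", "  \0")` yields no candidates at all, which corresponds to the case where the extension test is skipped entirely.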

This speeds up the spell-checker quite a bit, but it is still slower than the original. The word-breaking phase still has to check each token for misspelling (which it didn't do before #2871); there's no getting around this expense.
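
The per-token check that remains is where the small result cache from the list above helps. The following Python sketch shows the idea under stated assumptions: `HasErrorsCache` and the `check` callback are hypothetical names standing in for the native spell-checker call, and least-recently-used eviction is an assumption, since the description only says the 10 most recent results are kept.

```python
from collections import OrderedDict
from typing import Callable


class HasErrorsCache:
    """Remembers recent yes/no spelling results (illustrative only)."""

    def __init__(self, check: Callable[[str], bool], capacity: int = 10):
        self._check = check          # stand-in for the expensive native call
        self._capacity = capacity    # 10 in the description above
        self._results = OrderedDict()

    def has_errors(self, text: str) -> bool:
        if text in self._results:
            # Cache hit: refresh recency and skip the native call.
            self._results.move_to_end(text)
            return self._results[text]
        result = self._check(text)
        self._results[text] = result
        if len(self._results) > self._capacity:
            # Assumed policy: evict the least recently used entry.
            self._results.popitem(last=False)
        return result
```

Because the word-breaking phase tends to ask about the same token and its extensions several times in a row, even a cache this small can avoid many repeated native calls.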

Customer Impact

This bug is blocking migration to .NET Core.

Regression

Regression in .NET 5.0.

Testing

Ad-hoc with customer scenarios.
Standard regression testing.

Risk

Low. Straightforward port of .NETFx fix that was released last year.

port speller perf fix from 4.8

e63647a

SamBent requested a review from a team as a code owner August 2, 2021 21:50

ghost added the "PR metadata" label Aug 2, 2021

ghost requested review from fabiant3 and ryalanms August 2, 2021 21:50

fabiant3 approved these changes Aug 2, 2021

SamBent merged commit 9fb0601 into dotnet:main Aug 17, 2021

ghost locked as resolved and limited conversation to collaborators Apr 7, 2022
