
@SamBent SamBent commented Aug 2, 2021

Addresses Issue #3350. This is a port of a servicing fix in .NET 4.7-4.8.

Description

PR #2871 added logic to the word-breaking phase to extend misspelled tokens ("words") with non-token characters ("punctuation") if such an extension fixes the misspelling. I've found ways to speed it up:

  1. The logic checks each token and potential extension for misspellings, using an existing method ComprehensiveCheck that automatically populates each spelling error with a list of suggestions obtained from the native layer. These suggestions are never used, and the native calls were accounting for 80% of the time (says Trevor Fellman). The logic only cares whether any errors exist, so use a new method HasErrors that answers that question without asking for suggestions.

  2. The check for potential extensions is skipped if the non-token characters that follow the token are all whitespace. In practice, those characters often include nulls ('\0') at the end. Removing these nulls first lets us skip the extension test altogether in many cases, about 40% in my experiments.

  3. Don't consider trailing whitespace or nulls in the non-token characters. (This generalizes the original heuristic that discarded the non-token characters if they were all whitespace.)

  4. Don't consider whitespace or nulls interior to the non-token characters, until reaching a character that is not whitespace or null.

  5. Cache the 10 most recent HasErrors results, and answer queries from the cache instead of calling the native spell-checker.
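
As a rough illustration of items 2 through 5, here is a minimal Python sketch. It is not the WPF implementation. The names `is_skippable`, `candidate_extensions`, and `RecentResultsCache` are hypothetical, and the checker passed to the cache is a stand-in for the native spell-checker call. Two things are assumptions rather than details from this PR: item 4 is read as "only test an extension that ends on a character that is neither whitespace nor null", and the cache evicts the least recently used entry.

```python
from collections import OrderedDict
from typing import Callable, List


def is_skippable(ch: str) -> bool:
    """True for whitespace or a null character (str.isspace() is False for '\\0')."""
    return ch == "\0" or ch.isspace()


def candidate_extensions(token: str, non_token: str) -> List[str]:
    """Return the extended strings worth spell-checking for a misspelled token.

    Items 2 and 3: trailing whitespace and nulls are dropped first, so a run
    made only of such characters yields no candidates at all.
    Item 4 (as interpreted here): an extension is only a candidate when it
    ends on a character that is neither whitespace nor null.
    """
    end = len(non_token)
    while end > 0 and is_skippable(non_token[end - 1]):
        end -= 1

    candidates = []
    for i in range(end):
        if not is_skippable(non_token[i]):
            candidates.append(token + non_token[: i + 1])
    return candidates


class RecentResultsCache:
    """Item 5: remember the most recent has-errors answers (10 by default)."""

    def __init__(self, checker: Callable[[str], bool], capacity: int = 10):
        self._checker = checker      # stand-in for the native call
        self._capacity = capacity
        self._results: "OrderedDict[str, bool]" = OrderedDict()
        self.native_calls = 0        # counts cache misses, for illustration

    def has_errors(self, text: str) -> bool:
        if text in self._results:
            # Cache hit: mark as most recently used, skip the native call.
            self._results.move_to_end(text)
            return self._results[text]
        self.native_calls += 1
        result = self._checker(text)
        self._results[text] = result
        if len(self._results) > self._capacity:
            # Assumed eviction policy: drop the least recently used entry.
            self._results.popitem(last=False)
        return result
```

With this sketch, a token "Mr" followed by a period, two spaces, and two nulls produces the single candidate "Mr.", and a run consisting only of spaces and nulls produces none, so the extension test is skipped entirely.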

This speeds up the spell-checker quite a bit, but it is still slower than the original. The word-breaking phase still has to check each token for misspelling (which it didn't do before #2871); there's no getting around this expense.

Customer Impact

This bug is blocking migration to .NET Core.

Regression

Regression in .NET 5.0.

Testing

Ad-hoc with customer scenarios.
Standard regression testing.

Risk

Low. Straightforward port of .NETFx fix that was released last year.

@SamBent SamBent requested a review from a team as a code owner August 2, 2021 21:50
@ghost ghost added the PR metadata: Label to tag PRs, to facilitate with triage label Aug 2, 2021
@ghost ghost requested review from fabiant3 and ryalanms August 2, 2021 21:50
@SamBent SamBent merged commit 9fb0601 into dotnet:main Aug 17, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Apr 7, 2022