Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix word separation #3667

Merged
merged 9 commits into from
Apr 17, 2019
Merged

Fix word separation #3667

merged 9 commits into from
Apr 17, 2019

Conversation

ajalab
Copy link
Contributor

@ajalab ajalab commented Apr 9, 2019

What this PR does / why we need it:

This PR fixes the degradation issue that words which contain diacritics (e.g., Städte) are separated in word motions.

Which issue(s) this PR fixes

Two issues are reported in #3665:

  1. A string which contains both Latin alphabets and vim.iskeyword characters (e.g., helloWorld(param1) is wrongly regarded as a single word Word seperate doesn't works well #3665 (comment).
  2. A string which contains both Latin ASCII alphabets and diacritics (e.g., adbdémm, Städte) is wrongly regarded as separate words Word seperate doesn't works well #3665 (comment).

This PR fixes the second one. I've not been able to reproduce the original first problem...

Special notes for your reviewer:

  • Auxiliary methods (makeAsciiWordSegments and makeUnicodeWordSegments) are removed.
  • The behavior of the regex can be checked in https://regex101.com/r/X1agK6/1.

@ajalab
Copy link
Contributor Author

ajalab commented Apr 16, 2019

Thanks to the issue report #3680, it was found that Latin-1 symbols and punctuations were wrongly classified as word characters. I overlooked that the original Vim treats characters whose code point is less than 0x0100 (both ASCII chars and Latin-1 chars) as a special case in mbyte.c#L2879. So I added Latin-1 symbols and punctuations (code point range: 0x00a1-0x00bf) to the symbol table in e831849.

This PR fixes #3680.

Copy link
Member

@jpoon jpoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor comment, but generally LGTM.

// Spelling alphabets are not listed here since they are covered as non-white letters.
// TODO(ajalab): add Emoji
const codePointRanges: [[number, number], CharKind][] = [
const table: [[number, number], CharKind][] = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename this a more descriptive name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about symbolTable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me.

@jpoon jpoon merged commit 57cb40a into VSCodeVim:master Apr 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants