feat: lex 0s 1s 0's 1's etc as words #775

Merged
elijah-potter merged 3 commits into Automattic:master from hippietrail:lex-plural-digits on Mar 5, 2025

Conversation

@hippietrail
Collaborator

Issues

This addresses #774

Description

In sentences like "Computers work with 1s and 0s", 1 and 0 were lexed as numbers, cutting off the plural suffixes. This left `s and` to be detected as a potential misspelling of `sand`.
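
To illustrate the problem, here is a toy sketch of a lexer that splits digits from letters. It is not Harper's lexer: the `Token` enum and `naive_lex` function are hypothetical names made up for this example, and they only show how the plural suffix ends up as a separate one-letter word next to whatever follows the space.

```rust
/// Hypothetical token type for this illustration only (not Harper's own).
#[derive(Debug, PartialEq)]
enum Token {
    Number(String),
    Word(String),
    Space,
    Other(char),
}

/// Toy lexer: groups runs of digits, runs of letters, and runs of whitespace.
/// Anything else becomes a single `Other` token.
fn naive_lex(text: &str) -> Vec<Token> {
    let chars: Vec<char> = text.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let start = i;
        if chars[i].is_ascii_digit() {
            // A digit run ends at the first non-digit, so the `s` is cut off here.
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            tokens.push(Token::Number(chars[start..i].iter().collect()));
        } else if chars[i].is_alphabetic() {
            while i < chars.len() && chars[i].is_alphabetic() {
                i += 1;
            }
            tokens.push(Token::Word(chars[start..i].iter().collect()));
        } else if chars[i].is_whitespace() {
            while i < chars.len() && chars[i].is_whitespace() {
                i += 1;
            }
            tokens.push(Token::Space);
        } else {
            tokens.push(Token::Other(chars[i]));
            i += 1;
        }
    }
    tokens
}
```

With this toy lexer, `naive_lex("1s and")` yields a number `1`, a word `s`, a space, and a word `and`, which is the shape that lets `s and` be read as a possible `sand`.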

This change specifically lexes single digits followed by s or 's as words, leaving the rest up to the linters.
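
A minimal sketch of that rule, assuming a lexer that works on a slice of `char`s and asks each special-case matcher how many characters it consumes. The function name `plural_digit_len` and its signature are assumptions for illustration, not the code in this PR, and only the ASCII apostrophe is handled.

```rust
/// Hypothetical helper (not Harper's actual API).
/// If `source` starts with a single digit followed by `s` or `'s`,
/// returns how many chars that word spans; otherwise returns `None`.
fn plural_digit_len(source: &[char]) -> Option<usize> {
    // The candidate must begin with one ASCII digit.
    let first = source.first()?;
    if !first.is_ascii_digit() {
        return None;
    }

    // Directly after the digit, accept either `s` or `'s`.
    let len = match source.get(1..) {
        Some(['s', ..]) => 2,
        Some(['\'', 's', ..]) => 3,
        _ => return None,
    };

    // Reject the match if a letter or digit follows directly after the `s`,
    // so inputs such as "1st" or "1s2" are left to the other lexers.
    match source.get(len) {
        Some(c) if c.is_alphanumeric() => None,
        _ => Some(len),
    }
}
```

Because the second character must be `s` or an apostrophe, a longer number such as `10s` does not match at its first digit in this sketch. Deciding whether `0's` is acceptable is left to the linters and the dictionary, as described above.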

I've also added `0s` and `1s` to the curated dictionary. `0's` and `1's` are omitted, as those should be flagged with a suggestion to change to the non-apostrophe versions.

Other digits and longer numbers are omitted from the dictionary although they are still lexed as words. They can be added in if we find evidence of them causing similar problems.

One unexpected side-effect is that even with 0s in the dictionary and 0's flagged as an error, Harper does not manage to suggest 0s as a correction for 0's.

Demo

Before:

Image Image

After, spelled correctly, without apostrophes:

Screenshot 2025-02-25 at 3 06 08 pm

After, spelled incorrectly, with apostrophes:

Screenshot 2025-02-25 at 3 06 20 pm

How Has This Been Tested?

I added new tests, including for edge cases I identified, such as more letters or digits following directly after the s.

Checklist

  • I have performed a self-review of my own code
  • I have added tests to cover my changes

@elijah-potter
Collaborator

> One unexpected side-effect is that even with 0s in the dictionary and 0's flagged as an error, Harper does not manage to suggest 0s as a correction for 0's.

This is a flaw in our spell-checker, which is my next target for improvement.

On the whole, this is great work. I'm a little hesitant to add such a special case, but I can see no downside. Onward!

@elijah-potter elijah-potter merged commit 8ec6977 into Automattic:master Mar 5, 2025
@hippietrail
Collaborator Author

> > One unexpected side-effect is that even with 0s in the dictionary and 0's flagged as an error, Harper does not manage to suggest 0s as a correction for 0's.
>
> This is a flaw in our spell-checker, which is my next target for improvement.
>
> On the whole, this is great work. I'm a little hesitant to add such a special case, but I can see no downside. Onward!

The more dogfooding I do the more cases of numbers followed by various letter suffixes I find. I also noticed the "wordlikes" concept which I wasn't aware of when I made this or the "decades" stuff.

I think these all need to be treated as words. Otherwise the lexer separates the suffixes, and they then take part in various linters together with whatever comes after the next space, which so far always feels wrong.

I think it's fine to have special-case lexers that catch words or wordlikes that are hard to catch in a monolithic lexer, but I think I made the wrong choice in giving the decade lexer its own new token type instead of making it either a word or a wordlike.

I'm not sure about actual words vs wordlikes yet. I'll dig into that soon I think.

@hippietrail hippietrail deleted the lex-plural-digits branch March 5, 2025 22:03
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Mar 6, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [Automattic/harper/harper-ls](https://github.com/Automattic/harper) | minor | `v0.23.0` -> `v0.24.0` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>Automattic/harper (Automattic/harper/harper-ls)</summary>

### [`v0.24.0`](https://github.com/Automattic/harper/releases/tag/v0.24.0)

[Compare Source](Automattic/harper@v0.23.0...v0.24.0)

#### What's Changed

-   feat(core): add five technical words to the curated dictionary by [@&#8203;86xsk](https://github.com/86xsk) in Automattic/harper#767
-   Rid off phrase corrections by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#717
-   chore: new words and fixes to existing entries by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#783
-   fix(core): don't ignore blocking word-likes by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#788
-   fix(core): remove bad `Forthwith` rule by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#787
-   fix(core): address edge-case from [#&#8203;744](Automattic/harper#744) by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#786
-   feat: implement false positive "the great might of" by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#795
-   chore: add words and improve affix annotations by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#793
-   test: check that dictionary attributes can be spread over two entries by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#779
-   test(core): issue [#&#8203;798](Automattic/harper#798) by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#799
-   chore: add words, fix annotations by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#805
-   feat(web): create page for debugging the title-case algo by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#801
-   chore: curate dictionary by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#803
-   build(deps-dev): bump [@&#8203;types/jasmine](https://github.com/types/jasmine) from 5.1.6 to 5.1.7 in /packages by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#810
-   build(deps-dev): bump rollup from 4.34.6 to 4.34.9 in /packages by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#808
-   build(deps): bump clap from 4.5.29 to 4.5.31 by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#814
-   build(deps-dev): bump prettier from 3.5.2 to 3.5.3 in /packages by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#811
-   build(deps): bump serde_json from 1.0.138 to 1.0.139 by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#813
-   feat(core): add a ton of rules from the backlog by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#797
-   WordPress plugin by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#817
-   Dictionary curation 2025 03 04 by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#819
-   build(deps-dev): bump [@&#8203;types/node](https://github.com/types/node) from 22.13.4 to 22.13.9 in /packages by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#818
-   test(core): added test with capital letters by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#738
-   build(deps): bump pulldown-cmark from 0.12.2 to 0.13.0 by [@&#8203;dependabot](https://github.com/dependabot) in Automattic/harper#693
-   refactor(core): proper noun linters use canonical casing and JSON file by [@&#8203;elijah-potter](https://github.com/elijah-potter) in Automattic/harper#834
-   Dictionary curation 2025 03 05 by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#833
-   feat: lex 0s 1s 0's 1's etc as words by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#775
-   refactor: nominal and determiner word types by [@&#8203;hippietrail](https://github.com/hippietrail) in Automattic/harper#731

#### New Contributors

-   [@&#8203;86xsk](https://github.com/86xsk) made their first contribution in Automattic/harper#767

**Full Changelog**: Automattic/harper@v0.23.0...v0.24.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).