feat: lex 0s 1s 0's 1's etc as words #775
Conversation
This is a flaw in our spell-checker, which is my next target for improvement. On the whole, this is great work. I'm a little hesitant to add such a special case, but I can see no downside. Onward!
The more dogfooding I do, the more cases of numbers followed by various letter suffixes I find. I also noticed the "wordlikes" concept, which I wasn't aware of when I made this or the "decades" stuff. I think these all needed to be treated as words to prevent the suffixes being separated by the lexer and then taking part in various linters with what comes after the next space, which so far always feels wrong. I think it's fine to have special-case lexers that catch either words or wordlikes that are hard to catch in monolithic lexers, but I think I made the wrong choice with the decade lexer being its own new token type instead of either word or wordlike. I'm not sure about actual words vs wordlikes yet. I'll dig into that soon, I think.
This MR contains the following updates: | Package | Update | Change | |---|---|---| | [Automattic/harper/harper-ls](https://github.com/Automattic/harper) | minor | `v0.23.0` -> `v0.24.0` | MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot). **Proposed changes to behavior should be submitted there as MRs.** --- ### Release Notes <details> <summary>Automattic/harper (Automattic/harper/harper-ls)</summary> ### [`v0.24.0`](https://github.com/Automattic/harper/releases/tag/v0.24.0) [Compare Source](Automattic/harper@v0.23.0...v0.24.0) #### What's Changed - feat(core): add five technical words to the curated dictionary by [@​86xsk](https://github.com/86xsk) in Automattic/harper#767 - Rid off phrase corrections by [@​hippietrail](https://github.com/hippietrail) in Automattic/harper#717 - chore: new words and fixes to existing entries by [@​hippietrail](https://github.com/hippietrail) in Automattic/harper#783 - fix(core): don't ignore blocking word-likes by [@​elijah-potter](https://github.com/elijah-potter) in Automattic/harper#788 - fix(core): remove bad `Forthwith` rule by [@​elijah-potter](https://github.com/elijah-potter) in Automattic/harper#787 - fix(core): address edge-case from [#​744](Automattic/harper#744) by [@​elijah-potter](https://github.com/elijah-potter) in Automattic/harper#786 - feat: implement false positive "the great might of" by [@​hippietrail](https://github.com/hippietrail) in Automattic/harper#795 - chore: add words and improve affix annotations by [@​hippietrail](https://github.com/hippietrail) in Automattic/harper#793 - test: check that dictionary attributes can be spread over two entries by [@​hippietrail](https://github.com/hippietrail) in Automattic/harper#779 - test(core): issue [#​798](Automattic/harper#798) by [@​elijah-potter](https://github.com/elijah-potter) in Automattic/harper#799 - chore: add words, fix annotations by [@​hippietrail](https://github.com/hippietrail) in 
Automattic/harper#805 - feat(web): create page for debugging the title-case algo by [@​elijah-potter](https://github.com/elijah-potter) in Automattic/harper#801 - chore: curate dictionary by [@​hippietrail](https://github.com/hippietrail) in Automattic/harper#803 - build(deps-dev): bump [@​types/jasmine](https://github.com/types/jasmine) from 5.1.6 to 5.1.7 in /packages by [@​dependabot](https://github.com/dependabot) in Automattic/harper#810 - build(deps-dev): bump rollup from 4.34.6 to 4.34.9 in /packages by [@​dependabot](https://github.com/dependabot) in Automattic/harper#808 - build(deps): bump clap from 4.5.29 to 4.5.31 by [@​dependabot](https://github.com/dependabot) in Automattic/harper#814 - build(deps-dev): bump prettier from 3.5.2 to 3.5.3 in /packages by [@​dependabot](https://github.com/dependabot) in Automattic/harper#811 - build(deps): bump serde_json from 1.0.138 to 1.0.139 by [@​dependabot](https://github.com/dependabot) in Automattic/harper#813 - feat(core): add a ton of rules from the backlog by [@​elijah-potter](https://github.com/elijah-potter) in Automattic/harper#797 - WordPress plugin by [@​elijah-potter](https://github.com/elijah-potter) in Automattic/harper#817 - Dictionary curation 2025 03 04 by [@​hippietrail](https://github.com/hippietrail) in Automattic/harper#819 - build(deps-dev): bump [@​types/node](https://github.com/types/node) from 22.13.4 to 22.13.9 in /packages by [@​dependabot](https://github.com/dependabot) in Automattic/harper#818 - test(core): added test with capital letters by [@​elijah-potter](https://github.com/elijah-potter) in Automattic/harper#738 - build(deps): bump pulldown-cmark from 0.12.2 to 0.13.0 by [@​dependabot](https://github.com/dependabot) in Automattic/harper#693 - refactor(core): proper noun linters use canonical casing and JSON file by [@​elijah-potter](https://github.com/elijah-potter) in Automattic/harper#834 - Dictionary curation 2025 03 05 by [@​hippietrail](https://github.com/hippietrail) in 
Automattic/harper#833 - feat: lex 0s 1s 0's 1's etc as words by [@​hippietrail](https://github.com/hippietrail) in Automattic/harper#775 - refactor: nominal and determiner word types by [@​hippietrail](https://github.com/hippietrail) in Automattic/harper#731 #### New Contributors - [@​86xsk](https://github.com/86xsk) made their first contribution in Automattic/harper#767 **Full Changelog**: Automattic/harper@v0.23.0...v0.24.0 </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this MR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box --- This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS4xODYuMCIsInVwZGF0ZWRJblZlciI6IjM5LjE4Ni4wIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJSZW5vdmF0ZSBCb3QiXX0=-->
Issues
This addresses #774
Description
In sentences like "Computers work with 1s and 0s", `1` and `0` were lexed as numbers, cutting off the plural suffixes. This left `s and` to be detected as a potential error for `sand`.

This change specifically lexes single digits followed by `s` or `'s` as words, leaving the rest up to the linters.

I've also added `0s` and `1s` to the curated dictionary. `0's` and `1's` are omitted as those should be flagged and suggested to change to the non-apostrophe versions.

Other digits and longer numbers are omitted from the dictionary although they are still lexed as words. They can be added in if we find evidence of them causing similar problems.
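
The rule described above can be sketched roughly as follows. This is only an illustration, assuming a lexer that works on a slice of `char`s; the function name `lex_digit_plural` and its signature are hypothetical and are not taken from Harper's code. It also handles only the ASCII apostrophe.

```rust
/// Hypothetical helper, not Harper's actual lexer API.
/// If `source` starts with a single ASCII digit followed by `s` or `'s`,
/// and nothing alphanumeric follows directly after the `s`,
/// returns the number of characters the word token would span.
fn lex_digit_plural(source: &[char]) -> Option<usize> {
    // The first character must be a digit.
    let first = *source.first()?;
    if !first.is_ascii_digit() {
        return None;
    }

    // Accept either `s` or `'s` directly after the digit.
    let len = match source.get(1..) {
        Some(['s', ..]) => 2,
        Some(['\'', 's', ..]) => 3,
        _ => return None,
    };

    // Reject if more letters or digits follow directly after the `s`,
    // e.g. "1st" or "1s2", so those stay with the other lexers.
    match source.get(len) {
        Some(c) if c.is_alphanumeric() => None,
        _ => Some(len),
    }
}
```

With this sketch, `1s and 0s` would yield a two-character word token at the start, `0's` a three-character one, and inputs such as `1st` or `10s` would fall through to whatever number lexing already exists.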
One unexpected side-effect is that even with `0s` in the dictionary and `0's` flagged as an error, Harper does not manage to suggest `0s` as a correction for `0's`.

Demo
Before:
After, spelled correctly, without apostrophes:
After, spelled incorrectly, with apostrophes:
How Has This Been Tested?
I added new tests, including for edge cases I identified, such as more letters or digits following directly after the `s`.

Checklist