use regex for lexical illusions #1174

Nytelife26 · 2021-06-07T00:36:39Z

This relates to...

The usage of regular expressions rather than simple matches for
lexical_illusions.misc.

Rationale

Prior to this, the lexical illusions check would only catch
3 different lexical illusions, and always flagged them pair by pair rather
than flagging the entire strand. This is inefficient, and it misses most
lexical illusions.

Changes

lexical_illusions.misc now uses regex
lexical_illusions.misc now reports entire strands instead of many word pairs
lexical_illusions.misc reports all lexical illusions except for "that that" and "had had"
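
To make the pairs-versus-strands difference concrete, here is a minimal sketch. Both patterns and both function names are hypothetical illustrations, not the code merged in this PR.

```python
import re

# Hypothetical patterns for illustration only; not the PR's actual regex.
# PAIR reports each adjacent repeat separately (the lookahead leaves the
# second word unconsumed so it can start the next pair).
PAIR = re.compile(r"\b(\w+)\s+(?=\1\b)", re.IGNORECASE)
# STRAND consumes the whole run of repeats in a single match.
STRAND = re.compile(r"\b(\w+)(?:\s+\1)+\b", re.IGNORECASE)


def pair_starts(text):
    """Return the start offset of every adjacent repeated pair."""
    return [m.start() for m in PAIR.finditer(text)]


def strands(text):
    """Return each full run of a repeated word as one string."""
    return [m.group(0) for m in STRAND.finditer(text)]


sample = "I saw the the the cat."
print(pair_starts(sample))  # two overlapping pairs
print(strands(sample))      # one strand
```

The trailing `\b` in the strand pattern is there so that a phrase such as "the then" is not treated as a repeat.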

Features

N/A.

Bug Fixes

N/A.

Breaking Changes and Deprecations

N/A.

codecov · 2021-06-07T00:36:43Z

Codecov Report

Merging #1174 (b82f8bb) into main (8b3498c) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1174   +/-   ##
=======================================
  Coverage   90.14%   90.15%           
=======================================
  Files          83       83           
  Lines        1208     1209    +1     
=======================================
+ Hits         1089     1090    +1     
  Misses        119      119

Flag	Coverage Δ
macos-latest	`90.15% <100.00%> (+<0.01%)`	⬆️
py3.6	`89.19% <100.00%> (+<0.01%)`	⬆️
py3.7	`89.19% <100.00%> (+<0.01%)`	⬆️
py3.8	`90.15% <100.00%> (+<0.01%)`	⬆️
py3.9	`90.15% <100.00%> (+<0.01%)`	⬆️
pypypy3	`89.19% <100.00%> (+<0.01%)`	⬆️
ubuntu-latest	`90.15% <100.00%> (+<0.01%)`	⬆️
windows-latest	`90.15% <100.00%> (+<0.01%)`	⬆️

Impacted Files	Coverage Δ
proselint/checks/lexical_illusions/misc.py	`100.00% <100.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

suchow · 2021-06-07T20:39:23Z

proselint/checks/lexical_illusions/misc.py

@@ -21,11 +21,6 @@ def check(text):
    """Check the text."""
    err = "lexical_illusions.misc"
    msg = u"There's a lexical illusion here: a word is repeated."
+    regex = r"\b(\w+)\b\s\1"


There are a few instances where repeated words are perfectly acceptable. For example, one day I was commenting on student writing and I noticed that the student used the word "that" instead of "which". It's a hard distinction for many, and I don't blame them. So here I am now, telling you that that "that" that that student wrote ought to have been a "which".

See also https://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo

https://en.wikipedia.org/wiki/James_while_John_had_had_had_had_had_had_had_had_had_had_had_a_better_effect_on_the_teacher

Nytelife26 · 2021-06-07T23:53:45Z

> There are a few instances where repeated words are perfectly acceptable.

Grammatically valid, perhaps. Stylistically speaking, it's an absolute atrocity. I had to attempt to read that example you threw in around 3 times before it parsed correctly in my mind. Valid, yes, but it is also beneficial (and always possible) to avoid. An exception has been made for punctuation in between word boundaries, because splitting the lexical structure into phrases makes it readable. I can therefore tell you that the "that" that the student wrote should have been a "which", but there are other ways of writing about the "which" which the student should've written without entering linguistic territory in which a "which" which your student should've written such that the "that" that your student wrote isn't so bad.
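
As a rough illustration of the punctuation exception described above: if the pattern only allows whitespace between the repeated words, any intervening punctuation prevents a match. The pattern and the helper name `has_illusion` below are assumptions made for this sketch, not the check's real code.

```python
import re

# Hypothetical pattern: the repeat must be separated by whitespace only,
# so a quote mark or comma between the words means no match.
REPEAT = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def has_illusion(text):
    """Return True if a word is immediately repeated with only whitespace between."""
    return REPEAT.search(text) is not None


print(has_illusion("Paris in the the spring"))
print(has_illusion('the "that" that the student wrote'))
```

In the second sentence the closing quote mark sits between the two occurrences of "that", so nothing is flagged.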

Nytelife26 · 2021-06-07T23:58:13Z

> https://en.wikipedia.org/wiki/James_while_John_had_had_had_had_had_had_had_had_had_had_had_a_better_effect_on_the_teacher

"is an English sentence used to demonstrate lexical ambiguity and the necessity of punctuation"

I do not think from a stylistic perspective we should be referencing esoteric sentences that serve to demonstrate poor linguistic usage when justifying limitations.

```
The sentence is easier to understand with added punctuation and emphasis:
James, while John had had "had", had had "had had"; "had had" had had a better effect on the teacher.
```

The tool is designed to help people write better. Avoiding structures like the ones you have referenced in favour of their better formatted alternatives is doing exactly that, I feel. Either way, the decision lies with you. I've said my piece.

suchow · 2021-06-08T02:47:10Z

How about we make an exception for that that (two in a row, not more) and for had had (which is the past perfect form of the verb to have and hardly a verbal atrocity, e.g., "He had had one too many drinks."), and you can have the rest? The links were meant to be fun examples, not things you needed to take seriously, which is why I left them as comments and didn't mark the changes as necessary in the review.

Nytelife26 · 2021-06-08T09:14:37Z

> How about we make an exception for that that (two in a row, not more) and for had had (which is the past perfect form of the verb to have and hardly a verbal atrocity, e.g., "He had had one too many drinks."), and you can have the rest?

Works for me. Apologies if that came off as aggressive, by the way. I was just sharing my thoughts and it ended up coming out somewhat passionately.

suchow · 2021-06-08T17:34:36Z

All good, not going to complain about passion!

Nytelife26 · 2021-06-09T23:59:28Z

It seems that to do so we will need to be able to create exceptions using the
existence check. Unfortunately, regex does not present us with a viable way to
do this (no amount of negative lookaround hacks would work), so we may need to
use a separate PR to implement that, which would be quite a breaking change.
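
One possible shape for such exceptions, sketched outside of proselint's own API: match every strand first, then drop the matches that appear in an exception list. The names `STRAND`, `EXCEPTIONS`, and `find_illusions` are hypothetical, and this is not necessarily how the separate PR implements it.

```python
import re

# Hypothetical strand pattern: a word followed by one or more repeats.
STRAND = re.compile(r"\b(\w+)(?:\s+\1)+\b", re.IGNORECASE)
# The exceptions agreed in this thread: exactly two in a row, not more.
EXCEPTIONS = {"that that", "had had"}


def find_illusions(text):
    """Return (start, end) spans of repeated-word strands, minus exceptions."""
    spans = []
    for m in STRAND.finditer(text):
        # Normalize case and whitespace before comparing with the exceptions.
        normalized = " ".join(m.group(0).lower().split())
        if normalized not in EXCEPTIONS:
            spans.append((m.start(), m.end()))
    return spans


print(find_illusions("He had had one too many drinks."))
print(find_illusions("He had had had one too many drinks."))
```

Because the whole strand is compared against the exceptions, "had had" passes while "had had had" is still flagged, which matches the "two in a row, not more" rule without any lookarounds.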

Nytelife26 · 2021-07-04T15:01:34Z

@suchow The exceptions for "had had" and "that that" have been successfully implemented. Furthermore, the lexical illusions check now flags the entire strand of a lexical illusion once, rather than repeatedly flagging it in word pairs, which is much more efficient.

suchow

Wonderful, thank you, an excellent way to resolve this issue.

Nytelife26 added the type: refactor, priority: null, and version: patch labels Jun 7, 2021

Nytelife26 requested a review from suchow as a code owner June 7, 2021 00:36

Nytelife26 added the status: review-ready label Jun 7, 2021

suchow reviewed Jun 7, 2021

Nytelife26 mentioned this pull request Jul 4, 2021

add exceptions for existence_check #1182

Merged

Nytelife26 added the status: wip label and removed the status: review-ready label Jul 4, 2021

Nytelife26 force-pushed the fix/lexical-illusions-regex branch from ad33608 to d4cca14 Compare July 4, 2021 14:58

Nytelife26 requested a review from suchow July 4, 2021 15:04

Nytelife26 added the status: review-ready label and removed the status: wip label Jul 4, 2021

Nytelife26 added 3 commits July 4, 2021 16:07

refactor(checks): use regex for lexical illusions

e7985d0

test: add punctuation check for lexical illusions regex

4172149

test: add that that and had had tests for lexical illusions

b82f8bb

Nytelife26 force-pushed the fix/lexical-illusions-regex branch from d4cca14 to b82f8bb Compare July 4, 2021 15:07

suchow approved these changes Jul 4, 2021

Nytelife26 merged commit c165906 into main Jul 4, 2021

Nytelife26 deleted the fix/lexical-illusions-regex branch July 4, 2021 15:26

Nytelife26 restored the fix/lexical-illusions-regex branch August 22, 2021 20:58

Nytelife26 deleted the fix/lexical-illusions-regex branch August 22, 2021 21:01
