Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use regex for lexical illusions #1174

Merged
merged 3 commits into from
Jul 4, 2021
Merged

Conversation

Nytelife26
Copy link
Member

@Nytelife26 Nytelife26 commented Jun 7, 2021

This relates to...

The usage of regular expressions rather than simple matches for
lexical_illusions.misc.

Rationale

Prior to this, the lexical illusions check would only catch
3 different lexical illusions, and always in pairs of two rather than
flagging the entire strand. This is inefficient, and does not catch
many lexical illusions at all.

Changes

  • lexical_illusions.misc now uses regex
  • lexical_illusions.misc now reports entire strands instead of many word pairs
  • lexical_illusions.misc reports all lexical illusions except for that that
    and had had

Features

N/A.

Bug Fixes

N/A.

Breaking Changes and Deprecations

N/A.

@Nytelife26 Nytelife26 added type: refactor Issues and PRs related to code cleanup. priority: null Issues and PRs that are of negligible importance so may be postponed. version: patch Issues and PRs with bug fixes belonging to the next patch release. labels Jun 7, 2021
@Nytelife26 Nytelife26 requested a review from suchow as a code owner June 7, 2021 00:36
@codecov
Copy link

codecov bot commented Jun 7, 2021

Codecov Report

Merging #1174 (b82f8bb) into main (8b3498c) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1174   +/-   ##
=======================================
  Coverage   90.14%   90.15%           
=======================================
  Files          83       83           
  Lines        1208     1209    +1     
=======================================
+ Hits         1089     1090    +1     
  Misses        119      119           
Flag Coverage Δ
macos-latest 90.15% <100.00%> (+<0.01%) ⬆️
py3.6 89.19% <100.00%> (+<0.01%) ⬆️
py3.7 89.19% <100.00%> (+<0.01%) ⬆️
py3.8 90.15% <100.00%> (+<0.01%) ⬆️
py3.9 90.15% <100.00%> (+<0.01%) ⬆️
pypypy3 89.19% <100.00%> (+<0.01%) ⬆️
ubuntu-latest 90.15% <100.00%> (+<0.01%) ⬆️
windows-latest 90.15% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
proselint/checks/lexical_illusions/misc.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8b3498c...b82f8bb. Read the comment docs.

@Nytelife26 Nytelife26 added the status: review-ready PRs that are ready for author review. label Jun 7, 2021
@@ -21,11 +21,6 @@ def check(text):
"""Check the text."""
err = "lexical_illusions.misc"
msg = u"There's a lexical illusion here: a word is repeated."
regex = r"\b(\w+)\b\s\1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few instances where repeated words are perfectly acceptable. For example, one day I was commenting on student writing and I noticed that the student used the word "that" instead of "which". It's a hard distinction for many, and I don't blame them. So here I am now, telling you that that "that" that that student wrote ought to have been a "which".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Nytelife26
Copy link
Member Author

Nytelife26 commented Jun 7, 2021 via email

@Nytelife26
Copy link
Member Author

Nytelife26 commented Jun 7, 2021 via email

@suchow
Copy link
Member

suchow commented Jun 8, 2021

How about we make an exception for that that (two in a row, not more) and for had had (which is the past perfect form of the verb to have and hardly a verbal atrocity, e.g., "He had had one too many drinks."), and you can have the rest? The links were meant to be fun examples, not things you needed to take seriously, which is why I left them as comments and didn't mark the changes as necessary in the review.

@Nytelife26
Copy link
Member Author

How about we make an exception for that that (two in a row, not more) and for had had (which is the past perfect form of the verb to have and hardly a verbal atrocity, e.g., "He had had one too many drinks."), and you can have the rest?

Works for me. Apologies if that came off aggressive, by the way, I was just sharing my thoughts and it ended up coming out somewhat passionately

@suchow
Copy link
Member

suchow commented Jun 8, 2021

All good, not going to complain about passion!

@Nytelife26
Copy link
Member Author

It seems like to do so we will need to be able to create exceptions using the
existence check. Unfortunately, Regex does not present us with a viable way to
do this (no amount of negative lookaround hacks would work), so we may need to
use a separate PR to implement that - quite a breaking change.

@Nytelife26 Nytelife26 added status: wip Issues and PRs that are still a work in progress. and removed status: review-ready PRs that are ready for author review. labels Jul 4, 2021
@Nytelife26 Nytelife26 force-pushed the fix/lexical-illusions-regex branch from ad33608 to d4cca14 Compare July 4, 2021 14:58
@Nytelife26
Copy link
Member Author

@suchow The exceptions for had had and that that have been successfully implemented. Furthermore, the lexical illusions check will now flag the entire strand of lexical illusions, rather than repeatedly flagging it in word pairs of two - much more efficient.

@Nytelife26 Nytelife26 requested a review from suchow July 4, 2021 15:04
@Nytelife26 Nytelife26 added status: review-ready PRs that are ready for author review. and removed status: wip Issues and PRs that are still a work in progress. labels Jul 4, 2021
@Nytelife26 Nytelife26 force-pushed the fix/lexical-illusions-regex branch from d4cca14 to b82f8bb Compare July 4, 2021 15:07
Copy link
Member

@suchow suchow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful, thank you, an excellent way to resolve this issue.

@Nytelife26 Nytelife26 merged commit c165906 into main Jul 4, 2021
@Nytelife26 Nytelife26 deleted the fix/lexical-illusions-regex branch July 4, 2021 15:26
@Nytelife26 Nytelife26 restored the fix/lexical-illusions-regex branch August 22, 2021 20:58
@Nytelife26 Nytelife26 deleted the fix/lexical-illusions-regex branch August 22, 2021 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: null Issues and PRs that are of negligible importance so may be postponed. status: review-ready PRs that are ready for author review. type: refactor Issues and PRs related to code cleanup. version: patch Issues and PRs with bug fixes belonging to the next patch release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants