Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Codespell doesn't offer to capitalize word result if it doesn't start with the same first letter (?) #1883

Closed
luzpaz opened this issue Feb 28, 2021 · 9 comments
Assignees
Labels

Comments

@luzpaz
Copy link
Collaborator

luzpaz commented Feb 28, 2021

./src/effects/Reverb.cpp:455: Rever ==> Revert, fever
In the analysis example above fever is not capitalized

@peternewman
Copy link
Collaborator

I can't see this in the dictionary @luzpaz . Is it personal custom dictionary?

@luzpaz
Copy link
Collaborator Author

luzpaz commented Mar 1, 2021

No... this is not a custom dictionary. And I'm not referring to how the words are displayed in the custom dictionary. This is specifically related to the logic that is used by codespell to capitalize/capital-case the suggested correctly-spelled words because the typo itself was capitalized/capital-cased. Make sense?

@peternewman
Copy link
Collaborator

Ah, which version are you running? I tried various searches via GitHub and failed to find it, but by inspecting the dictionary:

rever->revert, refer, fever,

Please can you also share the whole line it's matching on and any command line flags/config you're using.

Also hopefully there's a PR to add reverb as a suggestion?

@peternewman
Copy link
Collaborator

Edit, agreed, it's broken:

text.txt:3: rever ==> revert, refer, fever
text.txt:4: Rever ==> Revert, refer, fever
text.txt:5: REVER ==> REVERT, REFER, FEVER

fix_case needs erm fixing. Well some tests adding first. It needs a for each around each suggestion individually, but also a quick check the reason isn't getting passed to it if present:

def fix_case(word, fixword):
if word == word.capitalize():
return fixword.capitalize()
elif word == word.upper():
return fixword.upper()
# they are both lower case
# or we don't have any idea
return fixword

@peternewman peternewman added bug and removed question labels Mar 1, 2021
@bl-ue bl-ue self-assigned this Jun 11, 2021
@bl-ue
Copy link
Contributor

bl-ue commented Jun 11, 2021

Ah, I see @peternewman: the problem is that fix_case is being called essentially like this:

fix_case("Rever", "revert, refer, fever")

So it's capitalizing fixword, which is why only the first word is capitalized.

@peternewman
Copy link
Collaborator

Exactly. But the fix is not to just do a naive split, as the last one shouldn't be capitalised because it's a potential note.

@bl-ue
Copy link
Contributor

bl-ue commented Jun 11, 2021

Oh, that's a good point. So maybe we should change build_dict() and Misspelling to have a list of suggestions and an optional comment? That sounds like the best idea to me.

def build_dict(filename, misspellings, ignore_words):
with codecs.open(filename, mode='r', encoding='utf-8') as f:
for line in f:
[key, data] = line.split('->')
# TODO for now, convert both to lower. Someday we can maybe add
# support for fixing caps.
key = key.lower()
data = data.lower()
if key in ignore_words:
continue
data = data.strip()
fix = data.rfind(',')
if fix < 0:
fix = True
reason = ''
elif fix == (len(data) - 1):
data = data[:fix]
reason = ''
fix = False
else:
reason = data[fix + 1:].strip()
data = data[:fix]
fix = False
misspellings[key] = Misspelling(data, fix, reason)

I could easily undertake it.

@peternewman
Copy link
Collaborator

I think that handles it already, and reason is the reason. Basically just make sure the tests cover both cases, but yeah, feel free to get stuck in.

@peternewman
Copy link
Collaborator

@DimitriPapadopoulos fixed this in #2223

So maybe we should change build_dict() and Misspelling to have a list of suggestions and an optional comment? That sounds like the best idea to me.

def build_dict(filename, misspellings, ignore_words):
with codecs.open(filename, mode='r', encoding='utf-8') as f:
for line in f:
[key, data] = line.split('->')
# TODO for now, convert both to lower. Someday we can maybe add
# support for fixing caps.
key = key.lower()
data = data.lower()
if key in ignore_words:
continue
data = data.strip()
fix = data.rfind(',')
if fix < 0:
fix = True
reason = ''
elif fix == (len(data) - 1):
data = data[:fix]
reason = ''
fix = False
else:
reason = data[fix + 1:].strip()
data = data[:fix]
fix = False
misspellings[key] = Misspelling(data, fix, reason)

I could easily undertake it.

But something like this would be good still @bl-ue !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants