[BUG] Apostrophes #26

Closed
nicno90 opened this issue Sep 29, 2020 · 6 comments · Fixed by #28
Labels
bug Something isn't working

Comments


nicno90 commented Sep 29, 2020

The "s" after an apostrophe is removed
When a sentence contains an apostrophe, the spell checker yields weird results.

To Reproduce

#Steps to reproduce the behavior:
1. Init spacy model '...'
2. Add contextualSpellCheck '....'
3. supply the sentence "Spell Checking based on Peter Norvig’s blog post."
4. doc._.outcome_spellCheck gives result: "Spell Checking based on Peter Norvig, blog post."

Expected behavior
" 's " should not be touched.

Version (please complete the following information):

  • contextualSpellCheck [e.g. 0.3.0]
  • Spacy: [e.g. 2.3.2]
  • transformers [e.g. 3.1.0]
nicno90 added the bug label Sep 29, 2020
Owner

R1j1t commented Sep 30, 2020

I ran the given sentence and it worked as expected on my local machine. If there is no spelling correction, outcome_spellCheck returns '' as mentioned in the README.

>>> import spacy
>>> import contextualSpellCheck
>>> nlp = spacy.load("en") 
>>> contextualSpellCheck.add_to_pipe(nlp)
<spacy.lang.en.English object at 0x7f911cef4a90>
>>> nlp.pipe_names
['tagger', 'parser', 'ner', 'contextual spellchecker']
>>> doc = nlp('Spell Checking based on Peter Norvig’s blog post.')
>>> doc._.outcome_spellCheck
''
>>> 

Please provide the code to replicate and version info of the package if you are not using the latest.

R1j1t added the question label Sep 30, 2020
Author

nicno90 commented Sep 30, 2020

import spacy
from contextualSpellCheck.contextualSpellCheck import ContextualSpellCheck

nlp = spacy.load('en_core_web_lg')
spell_checker = ContextualSpellCheck(max_edit_dist=4)
nlp.add_pipe(spell_checker)


def correct_spelling(sentence):
    # outcome_spellCheck holds the corrected sentence ('' if nothing was corrected)
    doc = nlp(sentence)
    return doc._.outcome_spellCheck


print(correct_spelling("Pure Python Spell Checking based on Peter Norvig’s blog post on setting up a simple spell checking algorithm."))

Version:

  • contextualSpellCheck == 0.3.0
  • Spacy == 2.3.2
  • transformers == 3.3.1

Owner

R1j1t commented Sep 30, 2020

Thanks @nicno90 I will have a look.

R1j1t removed the question label Oct 1, 2020
Owner

R1j1t commented Oct 4, 2020

I checked and have found the issue.

When using en_core_web_sm, the token 's is tagged as PERSON by the spaCy pipeline, but with en_core_web_lg it is not tagged as PERSON. Below is my logic for identifying misspellings.

if (
    (token.text.lower() not in self.vocab)
    and (token.ent_type_ != "PERSON")
    and (not token.like_num)
    and (not token.like_email)
    and (not token.like_url)
    # added after 0.0.4
    and (not token.is_space)
    and (not token.is_punct)
    and (token.ent_type_ != "GPE")
    and (token.ent_type_ != "ORG")
):
    misspell.append(token)

The token.ent_type_ != "PERSON" check is causing the issue: when 's is not tagged as PERSON, it is no longer skipped by this condition. I will try to fix it and release the fix by next weekend.

Thank you for identifying this issue!
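
The effect of the PERSON check described above can be sketched without loading any spaCy model. This is a minimal illustration, not the library's code: `FakeToken`, `is_flagged`, and the toy `vocab` are hypothetical names, and the sketch assumes the possessive token is absent from the vocabulary.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a spaCy token; it carries only the
# attributes that the quoted condition reads.
@dataclass
class FakeToken:
    text: str
    ent_type_: str = ""
    like_num: bool = False
    like_email: bool = False
    like_url: bool = False
    is_space: bool = False
    is_punct: bool = False


SKIPPED_ENTITY_TYPES = {"PERSON", "GPE", "ORG"}


def is_flagged(token, vocab):
    """Same condition as quoted above, written as a standalone function."""
    return (
        token.text.lower() not in vocab
        and token.ent_type_ not in SKIPPED_ENTITY_TYPES
        and not token.like_num
        and not token.like_email
        and not token.like_url
        and not token.is_space
        and not token.is_punct
    )


# Toy vocabulary (an assumption): it does not contain the possessive token.
vocab = {"peter", "blog", "post"}

# Same token text, two NER outcomes as reported for the two models.
tagged_as_person = FakeToken("’s", ent_type_="PERSON")
not_tagged = FakeToken("’s", ent_type_="")

print(is_flagged(tagged_as_person, vocab))  # False: skipped as a PERSON entity
print(is_flagged(not_tagged, vocab))        # True: treated as a misspelling
```

With the entity label present the token is skipped; without it the token falls through to the misspelling list, which matches the difference reported between en_core_web_sm and en_core_web_lg.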

Author

nicno90 commented Oct 5, 2020

@R1j1t Thank you for looking into it!

R1j1t added a commit that referenced this issue Oct 10, 2020
At present spaCy does not separate `'` from a trailing s (not sure of any other cases). So, to generalise, I will separate punctuation from trailing words and check them in the vocab.
bug fix #26
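
One way to read the approach in this commit message is sketched below. It is an assumption about the idea, not the code merged in #28: the helper name `in_vocab_ignoring_leading_punct`, the punctuation set, and the toy vocabulary are all hypothetical.

```python
import string

# ASCII punctuation plus curly quotes; the exact set is an assumption.
PUNCTUATION = string.punctuation + "’‘"


def in_vocab_ignoring_leading_punct(token_text, vocab):
    """Return True if the token, or the token minus leading punctuation, is in vocab."""
    lowered = token_text.lower()
    if lowered in vocab:
        return True
    stripped = lowered.lstrip(PUNCTUATION)
    # Accept the stripped form only if punctuation was actually removed
    # and something is left to look up.
    return stripped != lowered and stripped != "" and stripped in vocab


# Toy vocabulary (an assumption) that contains the bare "s".
vocab = {"s", "blog", "post"}

print(in_vocab_ignoring_leading_punct("’s", vocab))    # True
print(in_vocab_ignoring_leading_punct("blgo", vocab))  # False
```

A check like this would let a token such as ’s pass the vocabulary test regardless of which entity label the loaded model assigns to it.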
R1j1t closed this as completed in #28 Oct 25, 2020
Owner

R1j1t commented Oct 25, 2020

@nicno90 I have added the fix and test cases, and released it to PyPI in v0.3.2.
