English spellchecking #84
Comments
So the issue is in the Note that the contraction isn't marked as a mistake; it is that they are turned into more than one word. So
I am afraid that is not a solution, since there are punctuation marks (see the last word in my example), and "fit." ends up in misspelled. By the way, what is the difference between ver 0.5.4 and ver 0.5.6 that produced different spelling results?
You can see the differences in the Change log. The biggest are new dictionaries that attempt to fix these exact issues, a fix for Python 3.9, and the removal of Python 2.7 support. As for how to parse your string, that isn't really this library's goal. The goal is to be simple to use, pure Python, and free of dependencies. I used the NLTK WhitespaceTokenizer to build the dictionaries (non-Spanish). It is up to you to figure out how you would like to parse your text to make it testable. If there is a good method that can be used to update the simplistic
For your instance, perhaps something like this would work:
from spellchecker import SpellChecker

spell = SpellChecker()
words = "That is how that's and don't do not fit.".split()
misspelled = spell.unknown(words)

# NOTE: this is based on a simple split. Up to the user to figure out what is best!
# This example only deals with trailing punctuation, not leading.
for w in misspelled:
    if w.endswith((".", "?", ",", '"', "'", "!", "]", ")")) and w[:-1] in spell:
        # The word is not misspelled; it just had punctuation attached.
        # You would likely also want to handle several punctuation
        # characters in a row, etc. But this is a possible solution
        # for your exact problem.
        print("({}) is not misspelled!".format(w))
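One way to cover leading punctuation as well, sketched under the assumption that a whitespace split is still acceptable: trim punctuation from both ends of each token before passing the list to `spell.unknown`. The helper name `strip_outer_punctuation` is hypothetical and not part of pyspellchecker; only the standard library is used here.

```python
import string


def strip_outer_punctuation(text):
    """Split on whitespace, then trim punctuation from both ends of each token.

    Hypothetical helper for illustration; not part of pyspellchecker.
    """
    cleaned = []
    for token in text.split():
        # str.strip only touches the ends, so the apostrophe inside
        # "don't" or "that's" is left alone.
        word = token.strip(string.punctuation)
        if word:  # skip tokens that consisted of punctuation only
            cleaned.append(word)
    return cleaned


words = strip_outer_punctuation("That is how that's and don't do not fit.")
print(words)
```

Note that this also removes a trailing apostrophe, so a plural possessive such as "dogs'" would be checked as "dogs". Whether that is acceptable depends on the text being checked.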
Understood. Thank you very much.
Perhaps something like this would work?
import re
rgx = re.compile(r"(\w[\w']*\w|\w)")
s = "John's mom went there, but he wasn't there. So she said: 'Where are you!!' 'A a'"
rgx.findall(s)
If this makes sense, I can update the basic split_words() function to do something like this.
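As a quick check of what that pattern yields, here is a minimal sketch using only the `re` module. The wrapper name `tokenize_words` is made up for illustration; it is not the library's `split_words()`.

```python
import re

# Same pattern as in the comment above, written as a raw string:
# a word character, optionally followed by word characters or apostrophes,
# ending in a word character; or a single word character on its own.
WORD_RGX = re.compile(r"(\w[\w']*\w|\w)")


def tokenize_words(text):
    """Return word tokens, keeping apostrophes that sit inside a word."""
    return WORD_RGX.findall(text)


sample = "John's mom went there, but he wasn't there. So she said: 'Where are you!!' 'A a'"
tokens = tokenize_words(sample)
print(tokens)
```

With this pattern, contractions such as "wasn't" stay in one piece, while quotes and punctuation around a word are dropped because a token must start and end with a word character.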
Hello Team!
I am new to the project and I have a question.
I use Python 3.7 and ran into a problem with this test program:
With pyspellchecker ver 0.5.4 the printout is:
So free-standing 't' and 's' are not marked as errors, and neither are contractions.
If I change the phrase to:
and use pyspellchecker ver 0.5.6 the printout is:
So contractions are marked as mistakes again.
(I read barrust's comment of Oct 22, 2019.)
Please assist.