Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addition of remove_by_threshold #18

Closed
wants to merge 10 commits into from
Closed

Conversation

ksingha5
Copy link

@ksingha5 ksingha5 commented Oct 2, 2018

Provides a way for the user to remove words from the word frequency list under a user provided frequency threshold.

spell = SpellChecker()

"teh" in spell._word_frequency
#True

spell.word_frequency.remove_by_threshold(7)

"teh" in spell._word_frequency
#False

Let me know your thoughts!
Would help resolve #16

1. Included the wordnet and stopwords corpus from nltk and modals. This is later used to check known words. Using the WordFrequncy dictionary to validate words would end up passing words with errors. For example: "teh" passed as a correct word as it existed in the word frequency list but it should have been corrected to "the".
2. Modified def known to check the existence of words in the combined corpus of wordnet, stopwords and modals rather than the word frequency list.
Added 'shall' to the list of modals
Update fork to original
This request adds the remove_by_threshold functionality to help user remove words under a provided frequency threshold
@coveralls
Copy link

Coverage Status

Coverage decreased (-3.0%) to 95.42% when pulling 1fe0988 on ksingha5:master into 2bdb305 on barrust:master.

2 similar comments
@coveralls
Copy link

Coverage Status

Coverage decreased (-3.0%) to 95.42% when pulling 1fe0988 on ksingha5:master into 2bdb305 on barrust:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-3.0%) to 95.42% when pulling 1fe0988 on ksingha5:master into 2bdb305 on barrust:master.

@barrust
Copy link
Owner

barrust commented Oct 3, 2018

@ksingha5 Thank you for the PR! Unfortunately I have already merged a very similar approach to get this accomplished! It will be available in the next release, which hopefully will be by the end of the week. (Give myself a deadline!)

Thank you for your insights and suggestions! P

@barrust barrust closed this Oct 3, 2018
@ksingha5
Copy link
Author

ksingha5 commented Oct 3, 2018

Thank you @barrust !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Word Frequency Threshold
3 participants