Add "gypsy" to bad word list #34

Closed
wants to merge 1 commit into from

asf-stripe

As suggested in dariusk/corpora#255 (comment), this word might be worth including here too.

@hugovk
Contributor

hugovk commented Mar 16, 2017

Oh, I just realised that this runs on substrings, so the existing "gyp" will also exclude "gypsy". Sorry for the confusion, and thanks anyway!

Also note that, due to the complexities of the English language, I treat any string that contains a bad word as a substring as blacklisted. For example, even though "homogenous" is not a bad word, it contains the substring "homo" and gets filtered. The reason is that new slang built from compound words appears all the time, and I can't possibly keep up with it. I'm willing to lose a few words like "homogenous" and "Pakistan" in order to avoid false negatives.
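To illustrate the substring rule described above, here is a minimal Python sketch. It is not the project's actual code: the function names `is_blacklisted` and `is_blacklisted_whole_word` and the three-entry `BAD_WORDS` list are hypothetical, chosen only to reproduce the examples from the comment ("gypsy" via "gyp", "homogenous", "Pakistan"). The whole-word variant is included only for contrast, to show the false negatives that substring matching is meant to avoid.

```python
import re

# Hypothetical, illustrative subset; not the project's real word list.
BAD_WORDS = ["gyp", "homo", "paki"]


def is_blacklisted(text, bad_words=BAD_WORDS):
    """Return True if any bad word appears anywhere in `text`.

    Case-insensitive substring match with no word boundaries, so
    compounds built on a listed word are caught, at the cost of
    innocent words such as "homogenous".
    """
    lowered = text.lower()
    return any(word in lowered for word in bad_words)


def is_blacklisted_whole_word(text, bad_words=BAD_WORDS):
    """Contrast case: only match a bad word as a standalone word.

    This spares "homogenous" but also misses "gypsy", because the
    listed "gyp" is followed by more letters (no word boundary).
    """
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(word) + r"\b", lowered)
        for word in bad_words
    )


if __name__ == "__main__":
    for sample in ["gypsy", "homogenous", "Pakistan", "corpora"]:
        print(sample, is_blacklisted(sample), is_blacklisted_whole_word(sample))
```

With substring matching, "gypsy" is already excluded by the existing "gyp" entry, which is why a separate "gypsy" entry adds nothing; the whole-word variant would need every derived form listed explicitly.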

@asf-stripe
Author

Hah, I'd missed that. Thanks for clarifying - sounds like it's covered. I'll close this out, then (:

@asf-stripe asf-stripe closed this Mar 16, 2017
@asf-stripe asf-stripe deleted the patch-1 branch March 16, 2017 17:28
@dariusk
Owner

dariusk commented Mar 16, 2017

I appreciate the pull request regardless!
