New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new rule - stem_separator_regex #187
Add a new rule - stem_separator_regex #187
Conversation
Thanks for the PR, I will have a look at it in the next few days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR, this is generally looking good to me, I just have some minor comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the corrections, just two more things and a test failure. Then I think we can merge this PR :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Actually, thank YOU! It took too much of your time but resolved a (for me) major issue. |
This will mainly be useful for blacklisted proper names with suffixes. If you blacklist the stem word (e.g. a person's name) it should be enough.
If specified, the code splits words at the given characters to reach the stem words to check them again against the blacklist, e.g. prevents "Rust's" to pass if "Rust" is in the blacklist.
It is a simple regex of separators. For example, for apostrophes, you specify
stem_separator_regex = "[']"
in the rule file.If you do not specify it, or set it to
= ""
or= "[]"
it will not be triggered.It works after the initial blacklist check is done and only checks stem words extracted with stem_separator_regex against the blacklist.