Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new rule - stem_separator_regex #187

Merged

Conversation

HarikalarKutusu
Copy link
Contributor

@HarikalarKutusu HarikalarKutusu commented Jun 24, 2023

This will mainly be useful for blacklisted proper names with suffixes. If you blacklist the stem word (e.g. a person's name) it should be enough.

If specified, the code splits words at the given characters to reach the stem words to check them again against the blacklist, e.g. prevents "Rust's" to pass if "Rust" is in the blacklist.

It is a simple regex of separators. For example, for apostrophes, you specify stem_separator_regex = "[']" in the rule file.

If you do not specify it, or set it to = "" or = "[]" it will not be triggered.

It works after the initial blacklist check is done and only checks stem words extracted with stem_separator_regex against the blacklist.

@MichaelKohler
Copy link
Member

Thanks for the PR, I will have a look at it in the next few days.

Copy link
Member

@MichaelKohler MichaelKohler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR, this is generally looking good to me, I just have some minor comments.

src/checker.rs Show resolved Hide resolved
src/checker.rs Outdated Show resolved Hide resolved
src/checker.rs Outdated Show resolved Hide resolved
src/checker.rs Outdated Show resolved Hide resolved
src/checker.rs Outdated Show resolved Hide resolved
src/checker.rs Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
Copy link
Member

@MichaelKohler MichaelKohler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the corrections, just two more things and a test failure. Then I think we can merge this PR :)

src/checker.rs Outdated Show resolved Hide resolved
src/checker.rs Outdated Show resolved Hide resolved
src/checker.rs Outdated Show resolved Hide resolved
src/checker.rs Outdated Show resolved Hide resolved
Copy link
Member

@MichaelKohler MichaelKohler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@MichaelKohler MichaelKohler merged commit fe47245 into common-voice:main Jun 25, 2023
5 checks passed
@HarikalarKutusu
Copy link
Contributor Author

Actually, thank YOU! It took too much of your time but resolved a (for me) major issue.

@HarikalarKutusu HarikalarKutusu deleted the feature/stem-blacklisting branch June 25, 2023 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants