Add a new rule - stem_separator_regex #187

HarikalarKutusu · 2023-06-24T20:55:47Z

This is mainly useful for blacklisting proper names that take suffixes. Blacklisting the stem word (e.g. a person's name) should then be enough to also catch its suffixed forms.

If specified, the code splits words at the given characters to reach the stem words and checks those against the blacklist again. For example, this prevents "Rust's" from passing if "Rust" is in the blacklist.

It is a simple regex of separator characters. For example, for apostrophes, specify stem_separator_regex = "[']" in the rule file.

If you do not specify it, or set it to "" or "[]", the check is not triggered.
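A minimal sketch of the splitting step, in Rust because the checker lives in src/checker.rs. This is not the PR's code: the helper names parse_separator_class and stem_of are hypothetical, and instead of a real regex engine (such as the regex crate) the sketch only understands a plain bracketed character class like "[']", with no ranges or escapes. It also assumes the stem is the fragment before the first separator.

```rust
use std::collections::HashSet;

/// Hypothetical helper: turn a simple bracketed character class such as "[']"
/// into a set of separator characters. Returns None for "" and "[]", which
/// disables the rule. Anything that is not a plain `[...]` class also returns
/// None, because this sketch does not implement full regex syntax.
fn parse_separator_class(rule: &str) -> Option<HashSet<char>> {
    let inner = rule.strip_prefix('[')?.strip_suffix(']')?;
    let separators: HashSet<char> = inner.chars().collect();
    if separators.is_empty() {
        None
    } else {
        Some(separators)
    }
}

/// Hypothetical helper: split a word at any separator character and return
/// the leading fragment, which this sketch treats as the stem
/// ("Rust's" -> "Rust").
fn stem_of<'a>(word: &'a str, separators: &HashSet<char>) -> &'a str {
    // str::split always yields at least one item; unwrap_or is only a fallback.
    word.split(|c: char| separators.contains(&c))
        .next()
        .unwrap_or(word)
}
```

With "['’]" both the ASCII and the typographic apostrophe act as separators, so "Ahmet’in" reduces to "Ahmet".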

It runs after the initial blacklist check and only checks the stem words extracted with stem_separator_regex against the blacklist.
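The check order can be sketched like this. passes_blacklist is a hypothetical name, not the function in src/checker.rs; the sentence is assumed to be already split into words with punctuation removed, and the separator set is passed in already parsed (None meaning the rule is unset, "" or "[]"). As before, the fragment before the first separator is assumed to be the stem.

```rust
use std::collections::HashSet;

/// Hypothetical sketch: returns true if no word (and, when a separator rule
/// is configured, no extracted stem) is in the blacklist.
fn passes_blacklist(
    words: &[&str],
    blacklist: &HashSet<&str>,
    separators: Option<&HashSet<char>>,
) -> bool {
    // 1. Initial blacklist check on the words exactly as they appear.
    if words.iter().any(|&w| blacklist.contains(&w)) {
        return false;
    }
    // 2. Only when a separator rule is configured: split each word and
    //    check its leading fragment (the assumed stem) again.
    match separators {
        None => true,
        Some(seps) => !words.iter().any(|&w| {
            let stem = w.split(|c: char| seps.contains(&c)).next().unwrap_or(w);
            blacklist.contains(&stem)
        }),
    }
}
```

Because the second pass runs only when a separator rule is present, rule files that leave stem_separator_regex unset keep the previous behaviour.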

MichaelKohler · 2023-06-24T21:22:05Z

Thanks for the PR, I will have a look at it in the next few days.

MichaelKohler

Thanks for this PR, this is generally looking good to me, I just have some minor comments.

src/checker.rs

README.md

MichaelKohler

Thanks for the corrections, just two more things and a test failure. Then I think we can merge this PR :)

src/checker.rs

MichaelKohler

Thank you!

HarikalarKutusu · 2023-06-25T15:27:39Z

Actually, thank YOU! It took too much of your time but resolved a (for me) major issue.

HarikalarKutusu added 3 commits June 24, 2023 23:45

Add stem_separator_regex to readme

5862b09

Add stem_separator_regex check & tests

6c07bea

Add stem_separator_regex to rules

4e66f8c

MichaelKohler requested changes Jun 25, 2023


HarikalarKutusu added 2 commits June 25, 2023 16:49

Resolve comments in common-voice#187

b0dcec1

Refix maybe_stem_word & reference

66ffe0d

MichaelKohler requested changes Jun 25, 2023


src/checker.rs: 4 review threads on outdated code, all resolved

Fix test and final touches.

5813c83

MichaelKohler approved these changes Jun 25, 2023


MichaelKohler merged commit fe47245 into common-voice:main Jun 25, 2023
5 checks passed

HarikalarKutusu deleted the feature/stem-blacklisting branch June 25, 2023 16:07

HarikalarKutusu mentioned this pull request Jun 28, 2023

Add Turkish support (2023-08 finalized) #185

Merged
