-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tags and mute words #2968
Add tags and mute words #2968
Conversation
* Add bare minimum hashtags support As atproto/api already parses hashtags, this is as simple as hooking it up like link segments. This is "bare minimum" because: - Opening hashtag "#foo" is actually just a search for "foo" right now to work around #2491. - There is no integration in the composer. This hasn't stopped people from using hashtags already, and can be added later. - This change itself only had to hook things up - thank you for having already put the hashtag parsing in place. * Remove workaround for hash search not working now that it's fixed
|
||
if (mutedWordNoPunc === wordNoPunc) return true | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to enable support for phrases, I think what we can do is this:
- Instead of splitting the text into words, run your punctuation strip on it
- Convert the muteword into a regex like this (havent tested this regex):
new RegExp('[\s\t\n\r\f\,\.\;\!\?\_\-\"\(\)]' + stripPunctuation(muteWord) + '[\s\t\n\r\f\,\.\;\!\?\_\-\"\(\)]')
- Just run the muteword regexes against the punctuation-stripped post text
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's how Mastodon is more or less doing its matching if you need another perspective:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the link! Will have a look and compare 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work Eric. Really just into the tweaking stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything here looks solid. Test cases look good, giving it a spin for the new few days will probably turn up something but I say let's
Hot diggity dog |
Glad to see this PR merged🎉, but forgive me for commenting when it is already closed. Mute words appear to split the body text at the following locations,
However, this often fails to mute words in languages that are not separated by spaces, such as Japanese. social-app/src/lib/__tests__/moderatePost_wrapped.test.ts Lines 74 to 91 in e2ae534
There are testing mute of a single kanji character here, but we often mute Japanese characters on a word-by-word basis as well. For example, "ハッシュタグ"(means Hashtags) in "次のバージョンでハッシュタグが実装される!" It would be great if you could consider a non word-by-word word mute option! |
@dolciss no problem at all! Appreciate you bringing this to our attention. We've considered adding options for allowing users to select which type of matching they'd like to do, and we may continue that investigation. For now, we pushed a new change just now that checks for the post language, and if it's a language that doesn't use spaces — or uses spaces in ways other than separating words — then we check for the string directly instead of word-by-word. In English, this direct matching would return false-positives e.g. If you don't mind, may I ask: with this new change, do you happen to know if this type of partial-match issue will be a concern in Japanese as well? |
@estrattonbailey Thank you very much for your improve at #3018 !
Yes, partial-match problems are a concern in Japanese as well. (e.g. |
// rest of the checks are for `content` only | ||
if (!mute.targets.includes('content')) continue | ||
// single character, has to use includes | ||
if (mutedWord.length === 1 && postText.includes(mutedWord)) return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This checks the utf16 code-unit length, which may give an unexpected result for certain international scripts and/or emojis (maybe it's fine? I'm not sure I totally understand the purpose of this check)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good shout. Yeah the original intention was to handle international scripts with this. In response to the comments above I added a check a couple weeks ago for a few languages that commonly don't use spaces as well.
But what you've called out here is still an issue: trying to match a single character (potentially with length > 1
) in a language not part of our exceptions may fail unless we can split the text at the word boundary. Interesting...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a practical test-case if someone mutes "💩", then it it'll match when it's on its own, but not as part of a "word"
Muting an emoji is probably quite a common thing to want to do, and I think counting graphemes would give the expected behavior in most cases (which is normally an annoying thing to do, but I know the app already has grapheme counting logic in it elsewhere)
No description provided.