Muted Keywords #1144

parion · 2023-05-18T16:10:08Z


As a Bluesky services developer, I would like users to be able to mute keywords so they can hide posts containing those keywords from their timeline.

Background

This issue is a server-side companion to the following issue on the bluesky repo. While that popular issue could be solved client-side by parsing the text of every feed record for muted words, a server-side implementation would be more efficient and more dynamic, since every client would get the filtering without having to reimplement it.

Proposed Implementation

A new database table, mutedKeywords, would be created along with two new app.bsky.graph lexicons, mutedKeyword and getMutedKeywords, whose structures would closely resemble that of app.bsky.graph.block. Keywords could then be inserted into the table one at a time, together with the user's DID and a datetime, through a simple endpoint (with a matching deletion endpoint). When any feed is retrieved, each post's record would be checked against the keywords stored for the requester's DID, if there are any, and posts containing any of those keywords would be excluded.
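A minimal sketch of the proposed flow, assuming an in-memory array stands in for the mutedKeywords table and a simplified post shape with only a uri and text. The names MutedKeywordRow, FeedPost, MutedKeywordStore and filterFeed are hypothetical illustrations, not existing atproto or app.bsky APIs. Matching here is a case-insensitive substring check, which is only one of the options discussed below.

```typescript
// Hypothetical row shape for the proposed mutedKeywords table.
interface MutedKeywordRow {
  did: string;       // DID of the user who muted the keyword
  keyword: string;   // the muted keyword, stored lowercased
  createdAt: string; // ISO-8601 datetime
}

// Hypothetical, simplified feed item: only the fields needed for filtering.
interface FeedPost {
  uri: string;
  text: string;
}

// In-memory stand-in for the database table plus the insert/delete endpoints.
class MutedKeywordStore {
  private rows: MutedKeywordRow[] = [];

  // Insert one keyword for a user (ignores empty and duplicate keywords).
  add(did: string, keyword: string): void {
    const kw = keyword.trim().toLowerCase();
    if (kw !== "" && !this.rows.some((r) => r.did === did && r.keyword === kw)) {
      this.rows.push({ did, keyword: kw, createdAt: new Date().toISOString() });
    }
  }

  // Delete one keyword for a user.
  remove(did: string, keyword: string): void {
    const kw = keyword.trim().toLowerCase();
    this.rows = this.rows.filter((r) => !(r.did === did && r.keyword === kw));
  }

  keywordsFor(did: string): string[] {
    return this.rows.filter((r) => r.did === did).map((r) => r.keyword);
  }
}

// Drops every post whose text contains one of the requester's muted keywords
// (case-insensitive substring match). With no muted keywords the feed is unchanged.
function filterFeed(store: MutedKeywordStore, requesterDid: string, posts: FeedPost[]): FeedPost[] {
  const keywords = store.keywordsFor(requesterDid);
  if (keywords.length === 0) return posts;
  return posts.filter((post) => {
    const text = post.text.toLowerCase();
    return !keywords.some((kw) => text.includes(kw));
  });
}
```

Because keywords are lowercased once on insert, each feed request only lowercases the post text before comparing.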

Some Tricky Situations

- Hiding replies further down the tree should be easy enough, but hiding parents may be tricky. How should a parent be returned if it contains a muted keyword? That's going to be one wonky-looking tree.
- What algorithm should be used to compare keywords against record text?
  - Substring inclusion ("Melons are disgusting".includes("elon") ✅)?
  - Word splitting ("Delicious apples".split().contains("apple") ❌)?
  - Something else?
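The two candidate strategies can be compared side by side in a small sketch. Both lowercase their inputs first; the delimiter set used for splitting (any run of characters that are not letters, numbers or combining marks) is an assumption, and the function names are hypothetical.

```typescript
// 1. Substring inclusion: matches the keyword anywhere, even inside other words.
function matchesByInclusion(text: string, keyword: string): boolean {
  return text.toLowerCase().includes(keyword.toLowerCase());
}

// 2. Word splitting: matches only if some whole word equals the keyword.
// Whitespace and punctuation both act as delimiters (assumed delimiter set).
function matchesBySplit(text: string, keyword: string): boolean {
  const kw = keyword.toLowerCase();
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}\p{M}]+/u)
    .some((word) => word === kw);
}

console.log(matchesByInclusion("Melons are disgusting", "elon")); // true
console.log(matchesBySplit("Melons are disgusting", "elon"));     // false
console.log(matchesByInclusion("Delicious apples", "apple"));     // true
console.log(matchesBySplit("Delicious apples", "apple"));         // false
```

Inclusion catches plurals and compounds ("apples") but also produces false positives ("Melons" for "elon"); splitting avoids those false positives but misses inflected forms unless they are muted separately.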

Evalprime · 2023-05-18T17:17:14Z


On retrieval of any feed, post's records would be checked against the keywords matched by the requester's DID, should they exist.

How would this hold up with thousands of keywords? I ask because iterating through every single keyword for every post might be very slow.
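One way to keep the per-post cost independent of the number of muted keywords, sketched under the assumption that keywords are single words matched whole (not as substrings): put the keywords in a hash set and look up each word of the post. The function names are hypothetical. Substring or multi-word matching would need a different structure, such as a single combined regular expression or an Aho-Corasick automaton.

```typescript
// Build the set once per request; lookups are then O(1) on average.
function buildMutedSet(keywords: string[]): Set<string> {
  return new Set(keywords.map((k) => k.toLowerCase()));
}

// Cost depends on the number of words in the post, not on the size of the set.
function containsMutedWord(text: string, muted: Set<string>): boolean {
  if (muted.size === 0) return false;
  for (const word of text.toLowerCase().split(/[^\p{L}\p{N}\p{M}]+/u)) {
    if (word !== "" && muted.has(word)) return true;
  }
  return false;
}

// Usage: 5,000 muted keywords still cost one Set lookup per word in the post.
const muted = buildMutedSet(
  Array.from({ length: 5000 }, (_, i) => `keyword${i}`).concat(["Spoiler"]),
);
console.log(containsMutedWord("No spoilers here", muted));            // false
console.log(containsMutedWord("Big spoiler: it was a dream", muted)); // true
```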


parion · 2023-05-18T17:31:23Z · Author

We could benchmark to find the maximum number of keywords at which retrieving a user's complete timeline isn't noticeably slowed down, and then cap the number of keywords each user is allowed to mute at that value.
Other services, like Twitter, also implement a cap, usually around 100.
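A sketch of how the insert endpoint might enforce such a cap. The value 100 is a placeholder taken from the Twitter comparison above, not a measured limit, and addMutedKeyword is a hypothetical helper operating on a plain list rather than the real table.

```typescript
// Placeholder cap; the real value would come from measuring feed latency.
const MAX_MUTED_KEYWORDS = 100;

// Returns the updated keyword list, or throws if the cap would be exceeded.
function addMutedKeyword(existing: readonly string[], keyword: string): string[] {
  const kw = keyword.trim().toLowerCase();
  if (kw === "") throw new Error("keyword must not be empty");
  // Re-muting an existing keyword is a no-op, even when the list is full.
  if (existing.includes(kw)) return [...existing];
  if (existing.length >= MAX_MUTED_KEYWORDS) {
    throw new Error(`cannot mute more than ${MAX_MUTED_KEYWORDS} keywords`);
  }
  return [...existing, kw];
}
```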


Evalprime · 2023-05-18T17:37:08Z


Sounds good, then. I would make a PR, but I want to wait for the Bluesky devs to weigh in first.


Matrix89 · 2023-05-18T18:01:21Z


Some thoughts

Should the lists be public and subscribable, like mute lists?
Imo yes.

Should the lists support any kind of normalization? E.g. "Word" == "word", which also covers special characters and characters that look alike. This isn't required for an MVP, but it would certainly be nice to have.
In my teenage experience running a TeamSpeak server and some Minecraft servers, people can unfortunately be really creative when trying to circumvent word block lists.
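A minimal sketch of the kind of normalization that could run on both keywords and post text before comparing. It uses Unicode compatibility decomposition (NFKD), which folds look-alike forms such as fullwidth letters into plain ones, then strips combining marks (accents) and lowercases. It does not handle leetspeak ("w0rd") or homoglyphs from other scripts (e.g. Cyrillic "о"), and stripping marks can change meaning in scripts where marks are essential. The function name is hypothetical.

```typescript
// NFKD: compatibility decomposition ("Ｗ" -> "W", "ö" -> "o" + combining diaeresis).
// \p{M}: remove the combining marks left over from decomposition.
function normalizeForMatching(text: string): string {
  return text.normalize("NFKD").replace(/\p{M}/gu, "").toLowerCase();
}

console.log(normalizeForMatching("WORD"));     // "word"
console.log(normalizeForMatching("Wörd"));     // "word"
console.log(normalizeForMatching("ＷＯＲＤ")); // "word"
```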

More of an implementation detail, but what if every post returned includes a blocked word?

Regarding performance, I wouldn't worry about it initially. At a high level, this sounds like something that can easily be parallelised to reduce end-user latency.


parion · 2023-05-18T19:25:05Z · Author

Great thoughts! Here are my responses:

Should the lists be public and subscribable, like mute lists?

It certainly wouldn't hurt for users to share keyword lists, but I'm not sure the added complexity is warranted. At the end of the day, these are simple words, and adding or removing words from your list isn't as involved as muting unwanted actors. So I'm thinking this v1 can focus on simple keyword lists. It shouldn't be too difficult to build on this later and add subscribable keyword lists.

Should the lists support any kind of normalization?

I was thinking that for now we'd use a lowercase-to-lowercase comparison ("WORD" == "word"). I agree that bad actors will always try to find a way around normalization filters, but I don't think it's worth the extra time and compute to come up with a list of possible leetspeak alternatives ("w0rd" == "word") unless we actually start to notice abuse. Of course, we could take this as far as we want, like fuzzy word comparison using Levenshtein distance, or even training an AI model for it. But I think you're gonna need another server and team for that 🤣
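For reference, Levenshtein distance itself needs no AI: it is the minimum number of single-character insertions, deletions and substitutions that turn one string into another, computable with a small dynamic program. The sketch below, including the hypothetical fuzzyMatches rule with a threshold of 1, only illustrates the idea; it compares UTF-16 code units, and a threshold of 1 would also produce false positives (e.g. "ward" vs "word"), which is part of why the thread defers it.

```typescript
// Dynamic-programming Levenshtein distance, keeping only one previous row.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical fuzzy rule: a word counts as muted if within maxDistance edits.
function fuzzyMatches(word: string, keyword: string, maxDistance = 1): boolean {
  return levenshtein(word.toLowerCase(), keyword.toLowerCase()) <= maxDistance;
}

console.log(levenshtein("w0rd", "word"));      // 1
console.log(fuzzyMatches("W0RD", "word"));     // true
console.log(levenshtein("kitten", "sitting")); // 3
```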

One thing that comes to mind, though: should we check whether the keyword appears anywhere within the text ("Melons are disgusting" -> "elon" ✅), or split the text into words on a set of delimiters ("Delicious apples" -> "apple" ❌)?

More of an implementation detail, but what if every post returned includes a blocked word?

It would just return an empty feed, like a situation where an actor doesn't have a single record.


intrnl · 2023-09-20T15:58:36Z


As for matching the keyword itself, it might be best to just make it similar to Mastodon's. Here's roughly how Mastodon builds its matcher:

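```typescript
// Characters with special meaning in a regular expression.
const ESCAPE_RE = /[.*+?^${}()|[\]\\]/g;
const escape = (str: string): string => str.replace(ESCAPE_RE, '\\$&');

// Only add a \b anchor on a side where the keyword starts/ends with a "word" character
// (mark, letter, number or connector punctuation); \b next to e.g. "#" would break matching.
const WORD_START_RE = /^[\p{M}\p{L}\p{N}\p{Pc}]/u;
const WORD_END_RE = /[\p{M}\p{L}\p{N}\p{Pc}]$/u;

export interface KeywordFilter {
  keyword: string;
  whole: boolean; // true: whole-word match only; false: match anywhere
}

// Combines all filters into one case-insensitive alternation, so each post is tested
// with a single regex instead of one per keyword. Returns null when there are no
// filters, because new RegExp('') would match every string.
export const createRegexMatcher = (filters: KeywordFilter[]): RegExp | null => {
  if (filters.length === 0) {
    return null;
  }

  const parts = filters.map(({ keyword, whole }) => {
    if (!whole) {
      return escape(keyword);
    }

    // Note: JavaScript's \b is based on \w, which (without the u flag) only covers
    // ASCII letters, digits and underscore, so whole-word matching is less precise
    // for non-Latin scripts.
    const pfx = WORD_START_RE.test(keyword) ? '\\b' : '';
    const sfx = WORD_END_RE.test(keyword) ? '\\b' : '';
    return pfx + escape(keyword) + sfx;
  });

  return new RegExp(parts.join('|'), 'i');
};
```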
