Allow filtering moderation queue by language #2161

foysalit · 2024-02-09T00:42:49Z

This PR adds language tracking to moderation subjects and allows the queryModerationStatuses endpoint to filter the moderation queue by language.

For record subjects, the record is fetched from the appview and its languages are read from the langs property.
For repo subjects, the author's feed is fetched from the appview and the languages of all posts on the first page of that feed are collected.
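A minimal sketch of the repo-subject case: walk one page of the author feed and de-duplicate the `langs` declared on each post record. The `FeedItem` shape and `collectFeedLangs` name are hypothetical simplifications, not the appview's actual response types.

```typescript
// Hypothetical, simplified shape of one item on an author-feed page;
// real appview responses carry many more fields.
interface FeedItem {
  post: { record: { langs?: string[] } }
}

// Collect the distinct languages declared across all posts on one page.
function collectFeedLangs(feed: FeedItem[]): string[] {
  const langs = new Set<string>()
  for (const item of feed) {
    for (const lang of item.post.record.langs ?? []) {
      langs.add(lang)
    }
  }
  return Array.from(langs)
}

const page: FeedItem[] = [
  { post: { record: { langs: ['en'] } } },
  { post: { record: { langs: ['ja', 'en'] } } },
  { post: { record: {} } }, // a post with no langs property
]
console.log(collectFeedLangs(page)) // [ 'en', 'ja' ]
```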

foysalit · 2024-02-09T01:46:13Z

pnpm-lock.yaml

@@ -1,9 +1,5 @@
 lockfileVersion: '6.0'

-settings:


not sure why these changed 🤷🏽‍♂️

foysalit · 2024-02-09T01:46:39Z

packages/ozone/tests/moderation-statuses.test.ts

@@ -39,8 +39,8 @@ describe('moderation-statuses', () => {
    }
    const bobsPost = {
      $type: 'com.atproto.repo.strongRef',
-      uri: sc.posts[sc.dids.bob][1].ref.uriStr,
-      cid: sc.posts[sc.dids.bob][1].ref.cidStr,
+      uri: sc.posts[sc.dids.bob][0].ref.uriStr,


This post has the Klingon lang set; the previous one doesn't.

bnewbold

I think we will want to generalize this to an internal flag/tag system, and have flags like lang-ja or lang:ja to do queries. That will let us do faceting/filtering/queues on arbitrary metadata, and make it way easier to add that kind of metadata/feature without doing a database migration etc.

Timeline wise, it would be great to get flags in before mod service launch.

I know per-language queues are a pretty urgent need, and I'm not against going ahead with this in its current shape to get it out this week, but if it isn't too much work to do flags instead, I'd lean that way.

Curious what @devinivy thinks.
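To illustrate the faceting that namespaced tags like `lang:ja` would enable, here is a sketch that counts how many subjects carry each value within one namespace (e.g. to size per-language queues). The `Subject` shape, the `:` separator convention, and the `facetCounts` name are hypothetical.

```typescript
// Hypothetical subject shape: each subject carries a list of free-form tags.
interface Subject {
  id: string
  tags: string[]
}

// Count subjects per value within one tag namespace, e.g. 'lang'.
function facetCounts(subjects: Subject[], namespace: string): Record<string, number> {
  const prefix = `${namespace}:`
  const counts: Record<string, number> = {}
  for (const subject of subjects) {
    // Count each distinct value at most once per subject.
    const values = new Set(
      subject.tags
        .filter((tag) => tag.startsWith(prefix))
        .map((tag) => tag.slice(prefix.length)),
    )
    values.forEach((value) => {
      counts[value] = (counts[value] ?? 0) + 1
    })
  }
  return counts
}

const queue: Subject[] = [
  { id: 'a', tags: ['lang:ja', 'spam'] },
  { id: 'b', tags: ['lang:ja', 'lang:en', 'lang:ja'] },
  { id: 'c', tags: ['lang:en'] },
]
console.log(facetCounts(queue, 'lang')) // { ja: 2, en: 2 }
```

Because new namespaces are just new string prefixes, adding another facet needs no schema change, which is the benefit described above.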

bnewbold · 2024-02-12T23:41:13Z

packages/ozone/src/db/migrations/20240208T213404429Z-add-lang-column-to-moderation-subject.ts

+export async function up(db: Kysely<unknown>): Promise<void> {
+  await db.schema
+    .alterTable('moderation_subject_status')
+    .addColumn('langs', 'jsonb')


would the array datatype be easier here? I don't have a strong opinion

hmm... do you have anything written down about the flag approach? Just trying to imagine the design for this. Do we just have a flags array column that stores arbitrary string flags, or do we need a more elaborate many-to-many relationship to normalize the flags?
If we go with the former, it's straightforward, of course.

We already use jsonb for structured arrays in a couple of places (e.g. blob cids), so I'd be inclined to stick with that by convention.

bnewbold · 2024-02-13T17:25:53Z

(moving flags conversation out of thread)

I was previously imagining that flags would basically work the same as labels, just be internal only. And we would want the ability to query/filter subjects by flag values.

I think that this would mean a new mod event type for changing flags on a subject. Unlike labels, flags don't need signatures, negations, or as much metadata about createdAt, etc.; that metadata would be captured in the event log. I think there would basically be a one-to-many table of subjects and flag values. I could be wrong, but I assume we don't need to fully normalize flag values out into their own table (many-to-many); we can just use the strings as values (with an index, I guess). Maybe a string array column on the subject table itself would work? I'm not sure; one-to-many feels like the usual way to do it. I wouldn't be against caching createdAt or other metadata about flags as well; I just don't think we have any product/feature need for it right now.

Flags would get hydrated in to subject view responses.

A possible downside is if this creates a lot of event churn. For the current task of adding language tags to newly reported content, I'm not sure whether we'd want the ozone backend to just add the language tag as part of the report event, or to emit a report event followed by a separate "flag" event. If there isn't a flag event, there ends up being no provenance info about when and why the flag was created; on the other hand, maybe we don't really need that for system flags like language. A related precedent is the takedown label applied for a takedown event: this results in a label that doesn't have an associated label mod event (IIRC). Probably fine, but it touches on an abstraction question: does the mod event log capture every fine detail, or is it mostly meant to capture human and external-bot (e.g., automod) interventions?

devinivy · 2024-02-13T17:36:23Z

@bnewbold I'm a bit inclined to capture the events, and then later make a judgement call about how/whether to present them.

bnewbold

quick review of the modEventFlag lexicon: left some notes, but generally this shape/direction looks correct!

bnewbold · 2024-02-15T02:33:57Z

lexicons/com/atproto/admin/defs.json

+    "modEventFlag": {
+      "type": "object",
+      "description": "Add/Remove a flag on a subject",
+      "required": ["add", "remove"],


do the arrays need to be required, or can they be omitted if empty?

I was trying to stay in line with the labelling lexicons, where we require both create and negate properties. I don't think we can define required fields as "one or the other", which is probably why this is kinda needed?

bnewbold · 2024-02-15T02:34:30Z

lexicons/com/atproto/admin/defs.json

+        "remove": {
+          "type": "array",
+          "items": { "type": "string" },
+          "description": "Flags to be removed to the subject. Ignores a flag If doesn't exists, won't be duplicated."


"Ignores a flag if it doesn't exist."

bnewbold · 2024-02-15T02:36:36Z

lexicons/com/atproto/admin/queryModerationStatuses.json

+          "langs": {
+            "type": "array",
+            "items": { "type": "string" }
+          },


I think we could make these flags now, right? (maybe you were just focusing on the other lexicons for now)

I thought the API would be a bit nicer with specifically named params like this that ultimately search against flags, but I'm now realizing it would make adding more filters difficult, so I'll move this to flags instead.

bnewbold · 2024-02-15T02:36:54Z

packages/api/src/client/lexicons.ts

+              type: 'string',
+            },
+            description:
+              "Flags to be removed to the subject. Ignores a flag If doesn't exists, won't be duplicated.",


same as other description

bnewbold · 2024-02-15T02:38:46Z

lexicons/com/atproto/admin/defs.json

@@ -587,6 +587,23 @@
        }
      }
    },
+    "modEventFlag": {


would definitely add an optional comment field; I think every event type should have a comment
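A sketch of the event shape under discussion as a TypeScript type: `add`/`remove` are required (mirroring the label lexicons, and allowed to be empty) plus the suggested optional `comment`. The `ModEventFlag` interface and `isModEventFlag` guard are hand-written, hypothetical illustrations, not the code generated from the lexicon.

```typescript
// Hypothetical TypeScript mirror of the modEventFlag object discussed above.
interface ModEventFlag {
  add: string[] // flags to add; may be empty but must be present
  remove: string[] // flags to remove; may be empty but must be present
  comment?: string // optional free-text note
}

// Runtime check that an unknown value has the required shape.
function isModEventFlag(value: unknown): value is ModEventFlag {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  const isStringArray = (x: unknown): boolean =>
    Array.isArray(x) && x.every((item) => typeof item === 'string')
  return (
    isStringArray(v.add) &&
    isStringArray(v.remove) &&
    (v.comment === undefined || typeof v.comment === 'string')
  )
}

console.log(isModEventFlag({ add: ['lang:ja'], remove: [] })) // true
console.log(isModEventFlag({ add: ['lang:ja'] })) // false: remove is missing
```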

devinivy · 2024-02-15T14:27:00Z

packages/ozone/src/mod-service/index.ts

+    if (isModEventTag(event)) {
+      if (event.add.length) meta.addedTags = event.add.join(' ')
+      if (event.remove.length) meta.removedTags = event.remove.join(' ')
+    }


Is there a reason these are stuffed in meta rather than having their own jsonb columns (i.e. in line with labels)?

Another thing that's a little funky here is that the fields addedTags and removedTags are not reflected in the model (i.e. in the types for db/tables/). Even if they end up in meta, we have access to the full range of jsonb, so I'd be all for modeling them as arrays rather than space-separated strings.

no reason in particular other than trying to keep the column count low. We would need 2 columns if we went with the label approach, and for most rows these columns would just be null.

devinivy · 2024-02-15T14:27:16Z

packages/ozone/src/mod-service/index.ts

@@ -745,6 +771,7 @@ export class ModerationService {
      nullsLast: true,
    })

+    // console.log(paginatedBuilder.compile())


Suggested change (remove the debug line):

-    // console.log(paginatedBuilder.compile())

devinivy · 2024-02-15T14:29:48Z

packages/ozone/src/mod-service/index.ts

+      builder = builder.where(
+        sql<string>`${ref(
+          'moderation_subject_status.tags',
+        )} @> ${JSON.stringify(tags)}`,


If I recall, I added a new jsonb() db type helper that would fit here.
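For reference, Postgres's jsonb `@>` operator applied to two arrays is true when every element of the right-hand array appears in the left-hand array; order is not significant and duplicates count once. So the filter above matches subjects whose `tags` include all requested tags. A sketch emulating that semantics for flat string arrays only (the `jsonbArrayContains` name is hypothetical):

```typescript
// Emulates Postgres `left @> right` for flat jsonb arrays of strings:
// true when every element of `right` appears somewhere in `left`.
function jsonbArrayContains(left: string[], right: string[]): boolean {
  const have = new Set(left)
  return right.every((item) => have.has(item))
}

// The query binds JSON.stringify(tags) as the right-hand operand.
const requested = ['lang:ja', 'lang:en']
console.log(JSON.stringify(requested)) // ["lang:ja","lang:en"]
console.log(jsonbArrayContains(['lang:en', 'spam', 'lang:ja'], requested)) // true
console.log(jsonbArrayContains(['lang:ja'], requested)) // false
```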

devinivy · 2024-02-15T14:30:45Z

packages/ozone/src/mod-service/index.ts

-    const { ref } = this.db.db.dynamic
+    if (tags.length) {
+      builder = builder.where(
+        sql<string>`${ref(


This won't evaluate to a string, but the type doesn't need to be preserved here anyway, so just nixing it.

Suggested change:

-        sql<string>`${ref(
+        sql`${ref(

devinivy · 2024-02-15T14:33:02Z

packages/ozone/src/mod-service/lang.ts

+          createdBy,
+        })
+      } catch (err) {
+        console.error('Error getting record langs', err)


We should use the logger here rather than console.error() so that we can consume the log in a standard format.

devinivy · 2024-02-15T14:42:03Z

packages/ozone/src/mod-service/views.ts

@@ -163,6 +163,12 @@ export class ModerationViews {
      eventView.event.sticky = true
    }

+    if (event.action === 'com.atproto.admin.defs#modEventTag') {
+      eventView.event.add = event.meta?.addedTags?.toString().split(' ') || []


A common issue with space-separated strings is that we need a little bit more logic than a split(' '). If the string is empty '' then the split gives you [''], when you probably want [].

moved to dedicated jsonb columns as you recommended above so we won't have to worry about the string related edge cases.
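The pitfall is easy to reproduce: `''.split(' ')` yields `['']`, not `[]`. A defensive parser for space-separated strings looks like this; `parseSpaceSeparated` is a hypothetical helper, shown only to illustrate the edge cases that dedicated array columns avoid.

```typescript
console.log(''.split(' ')) // [ '' ]

// Hypothetical helper: parse a space-separated tag string into an array,
// treating undefined/null/empty input as no tags and dropping the empty
// fragments produced by leading, trailing, or repeated spaces.
function parseSpaceSeparated(value: string | undefined | null): string[] {
  if (!value) return []
  return value.split(' ').filter((part) => part.length > 0)
}

console.log(parseSpaceSeparated('')) // []
console.log(parseSpaceSeparated('lang:ja  spam ')) // [ 'lang:ja', 'spam' ]
```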

devinivy · 2024-02-15T14:44:02Z

packages/ozone/tests/__snapshots__/get-record.test.ts.snap

@@ -39,6 +39,9 @@ Object {
      },
      "subjectBlobCids": Array [],
      "subjectRepoHandle": "alice.test",
+      "tags": Array [
+        "lang:unknown",


I assume we like the lang:unknown as it's explicit, versus not having a lang tag?

yeah, and this also helps avoid unnecessary calls to figure out the lang tag on every event.
Since this tagger runs on every event for a subject unless the subject already has a lang tag, failing to figure out the right tag once probably means future attempts will fail too, so we just avoid running it again.

the 3-char code und ("Undetermined") is a pseudo-specified way to represent this.

thinking ahead to UI stuff... we may want to have a special/specific drop-down/conversion for language tags, different from other tags? But I think we can get to that whenever, doesn't need to be first iteration.
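A sketch of the tag-once behaviour described above: skip detection when the subject already has any `lang:` tag, otherwise tag with the detected languages, falling back to `lang:und` when none are found. `langTagsToAdd` and the `detect` callback are hypothetical names, and treating a failed lookup like "no languages found" is an assumption.

```typescript
// Decide which lang tags to add to a subject, or [] if it is already tagged.
// `detect` is a hypothetical stand-in for the appview lookup of post languages.
function langTagsToAdd(currentTags: string[], detect: () => string[]): string[] {
  // Any existing lang: tag, including lang:und, means detection already ran.
  if (currentTags.some((tag) => tag.startsWith('lang:'))) return []
  let langs: string[] = []
  try {
    langs = detect()
  } catch {
    // Assumption: a failed lookup is treated the same as "no languages found".
    langs = []
  }
  return langs.length > 0 ? langs.map((lang) => `lang:${lang}`) : ['lang:und']
}

console.log(langTagsToAdd([], () => [])) // [ 'lang:und' ]
console.log(langTagsToAdd(['lang:und'], () => ['ja'])) // []
```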

devinivy · 2024-02-15T14:47:22Z

packages/ozone/src/mod-service/status.ts

+    if (addedTags.length) {
+      const tags = currentStatus?.tags || []
+      newStatus.tags = jsonb([
+        ...new Set([...tags, ...addedTags]),
+      ]) as unknown as string[]
+      subjectStatus.tags = newStatus.tags
+    }
+    if (removedTags.length) {
+      const tags = currentStatus?.tags || []
+      newStatus.tags = jsonb(
+        tags.filter((tag) => !removedTags.includes(tag)),
+      ) as unknown as string[]
+      subjectStatus.tags = newStatus.tags
+    }


It seems that adding and removing a tag at the same time would respect the removal but not the addition.

fixed and cleaned up the logic here, and there's a test for this now. However, one edge case remains: if a request comes in with add: ['x'], remove: ['x'], the current implementation removes x and does not add it.
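One way to get order-independent add/remove handling, sketched as a pure function (the `applyTagChanges` name is hypothetical, not the code in `status.ts`): union the additions into the current tags, then subtract the removals. Applying removals last is what makes `add: ['x'], remove: ['x']` end with `x` absent, matching the edge case described above.

```typescript
// Apply one tag event to a subject's current tags: union the additions first,
// then subtract the removals, so a tag listed in both ends up removed.
function applyTagChanges(current: string[], add: string[], remove: string[]): string[] {
  const next = new Set([...current, ...add]) // the Set also drops duplicates
  for (const tag of remove) next.delete(tag)
  return Array.from(next)
}

console.log(applyTagChanges(['lang:en'], ['lang:ja'], ['lang:en'])) // [ 'lang:ja' ]
console.log(applyTagChanges([], ['x'], ['x'])) // []
```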

devinivy · 2024-02-15T14:52:50Z

packages/ozone/src/mod-service/lang.ts

+          event: {
+            $type: 'com.atproto.admin.defs#modEventTag',
+            add: recordLangs
+              ? recordLangs.map((lang) => `lang:${lang}`)


Small thing, just an idea. But the language on a post could be highly specific, anything within BCP 47. I wonder if for the purposes here it would make sense to use only the primary language tag. E.g. en-US would be boiled down to en, not distinguishing between US english and british english en-GB.

yes! I think this would be good to do

It looks like the first fragment of a hyphenated code is the generalized code? That's what I'm storing now, but I'm not sure whether we'd want the full code stored too.
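Reducing a BCP 47 tag to its primary language subtag, as suggested above, can be sketched as taking the first hyphen-separated subtag and lowercasing it (BCP 47 tags are case-insensitive). Caveat: this does not hold for every tag; irregular grandfathered tags such as `i-klingon` and private-use tags starting with `x-` would collapse to `i` or `x`, so a full BCP 47 parser may be worth it. `primaryLangTags` is a hypothetical helper.

```typescript
// Reduce BCP 47 language tags to de-duplicated `lang:<primary>` tags.
// Assumption: the first hyphen-separated subtag is the primary language,
// which holds for ordinary tags like en-US or zh-Hant-TW but not for
// grandfathered (i-klingon) or private-use (x-...) tags.
function primaryLangTags(langs: string[]): string[] {
  const primaries = langs
    .map((lang) => lang.split('-')[0].toLowerCase()) // tags are case-insensitive
    .filter((primary) => primary.length > 0)
  return Array.from(new Set(primaries)).map((primary) => `lang:${primary}`)
}

console.log(primaryLangTags(['en-US', 'en-GB', 'zh-Hant-TW', 'PT-br']))
// [ 'lang:en', 'lang:zh', 'lang:pt' ]
```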

bnewbold · 2024-02-17T00:13:03Z

seems like this backend part has moved in a good direction. LGTM; I'll leave it to dan/divy to review the actual implementation.

bnewbold · 2024-02-17T00:21:34Z

Cross-posting from another issue: I think we need the ability to filter on "not-tag" in addition to tag. (not sure what the query param name should be for that)

foysalit · 2024-02-17T15:38:15Z

Fixed up some minor issues and added a way to retrieve subjects that do not have certain tags, using an excludeTags filter param. @devinivy / @bnewbold waiting on your final ✅ on this.
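A sketch of how `tags` and `excludeTags` could combine when filtering the queue, assuming `tags` requires all listed tags (like the `@>` containment query in the diff above) and `excludeTags` rejects a subject carrying any listed tag. Both semantics, the `SubjectStatus` shape, and the `filterByTags` name are assumptions; the merged endpoint may behave differently.

```typescript
// Hypothetical, simplified subject status row.
interface SubjectStatus {
  subject: string
  tags: string[]
}

// Keep statuses that carry every tag in `tags` and none of `excludeTags`.
function filterByTags(
  statuses: SubjectStatus[],
  tags: string[] = [],
  excludeTags: string[] = [],
): SubjectStatus[] {
  return statuses.filter((status) => {
    const have = new Set(status.tags)
    const hasAll = tags.every((tag) => have.has(tag))
    const hasExcluded = excludeTags.some((tag) => have.has(tag))
    return hasAll && !hasExcluded
  })
}

const statuses: SubjectStatus[] = [
  { subject: 'a', tags: ['lang:ja'] },
  { subject: 'b', tags: ['lang:en'] },
  { subject: 'c', tags: ['lang:ja', 'spam'] },
  { subject: 'd', tags: [] },
]
console.log(filterByTags(statuses, ['lang:ja'], ['spam']).map((s) => s.subject)) // [ 'a' ]
```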

devinivy

Nice!

foysalit added 4 commits February 9, 2024 01:00

✨ Store languages on subjects from record data

ffdd0ff

✨ Fetch lang from author feed for repo subject

21f7a05

✨ Regenerate lex

df4d758

✨ Add lang to profile subject

ffbeac8

foysalit mentioned this pull request Feb 9, 2024

❇️ Language filter on moderation queue bluesky-social/ozone#25

Merged

foysalit requested review from dholms and devinivy February 9, 2024 01:45

foysalit requested a review from bnewbold February 12, 2024 23:34

bnewbold reviewed Feb 12, 2024

✨ Move lang to flag

2efb5a9

bnewbold reviewed Feb 15, 2024

foysalit added 6 commits February 15, 2024 11:03

✨ Add comment to flag event

66f0205

✅ Update pds test snapshot

a27b5c1

🐛 Fix broken import

69b0886

✨ Rename flag to tag

f1ca136

✅ Update snapshot for event by id

a563dd6

✅ Bring back skipped test

12d0ca5

devinivy reviewed Feb 15, 2024

foysalit added 5 commits February 16, 2024 11:08

✨ Move tags to dedicated columns

2e72590

✅ Update test snapshots, use logger

9bb23c8

✨ Change lang:unknown to lang:und

df65f8e

✅ Update test snapshots

59bb6b9

🧹 Cleanup

ed1d2a5

foysalit added 4 commits February 17, 2024 15:21

✨ Add filter params for tags on queryModerationEvents endpoint

9fac99f

✨ Add tags property to subject status model

c40706b

✅ Update test code

b6fdb60

✨ Add filter to get subjects that do not have certain tags

8b83e82

devinivy approved these changes Feb 19, 2024

devinivy merged commit d815732 into main Feb 19, 2024
10 checks passed

devinivy deleted the mod-subject-language-filter branch February 19, 2024 15:35
