-
Notifications
You must be signed in to change notification settings - Fork 554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow filtering moderation queue by language #2161
Conversation
@@ -1,9 +1,5 @@ | |||
lockfileVersion: '6.0' | |||
|
|||
settings: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why these changed 🤷🏽♂️
@@ -39,8 +39,8 @@ describe('moderation-statuses', () => { | |||
} | |||
const bobsPost = { | |||
$type: 'com.atproto.repo.strongRef', | |||
uri: sc.posts[sc.dids.bob][1].ref.uriStr, | |||
cid: sc.posts[sc.dids.bob][1].ref.cidStr, | |||
uri: sc.posts[sc.dids.bob][0].ref.uriStr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This post has klingon lang set, the previous one doesn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we will want to generalize this to an internal flag/tag system, and have flags like lang-ja
or lang:ja
to do queries. That will let us do faceting/filtering/queues on arbitrary metadata, and make it way easier to add that kind of metadata/feature without doing a database migration etc.
Timeline wise, it would be great to get flags in before mod service launch.
I know the per-language queues is a pretty urgent need, and i'm not against going ahead with this in the shape it is now to get it out this week, but if it isn't too much work to do flags instead, i'd lean that way.
Curious what @devinivy thinks.
export async function up(db: Kysely<unknown>): Promise<void> { | ||
await db.schema | ||
.alterTable('moderation_subject_status') | ||
.addColumn('langs', 'jsonb') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would the array datatype be easier here? I don't have a strong opinion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm... do you have anything written down about the flag approach? just trying to imagine the design for this. do we just have a flags
array column which stores arbitrary string flags? or do we need a more elaborate, many to many relationship set up to normalize the flags?
If we go with the former, it's straight forward of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already use jsonb
for structured arrays in a couple places (i.e. blob cids), I'd be inclined to stick with that just by convention.
(moving flags conversation out of thread) I was previously imagining that flags would basically work the same as labels, just be internal only. And we would want the ability to query/filter subjects by flag values. I think that this would mean a new mod event type for changing flags on a subject. Unlike labels, flags don't need signatures, negations, or as much metadata about createdAt, etc; that metadata would be captured in the event log. I think there would basically be a one-to-many table of subjects and flag values. I could be wrong but I assume we don't need to fully normalize flag values out as a table (many-to-many), just use the strings as values (with an index I guess). Maybe a string array column on the subject table itself would work? I'm not sure, one-to-many feels like the usual way to do it. I wouldn't be against caching createAt or other metadata about flags as well, just don't have any product/feature need for it right now, I think. Flags would get hydrated in to subject view responses. A possible downside is if this creates a lot of event churn. For the current task of adding language tags to newly-reported content, i'm not sure if we'd want the ozone backend to just add the language tag as part of the report event, or if there would be a report event and then a separate "flag" event. If there isn't a flag event, there doesn't end up being provenance info about when and why the flag was created; on the other hand maybe we don't really need that for system flags like language. A related precedent is doing the takedown labels for a takedown event: this results in a label which doesn't have an associated label-mod-event (IIRC). Probably fine, but gets a bit at an abstraction question of whether the mod event log captures every fine detail or whether it is mostly to capture human and external-bot (eg, automod) interventions. |
@bnewbold I'm a bit inclined to capture the events, and then later make a judgement call about how/whether to present them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick review of modEventFlag
lexicon. left some notes, but generally this shape/direction looks correct!
"modEventFlag": { | ||
"type": "object", | ||
"description": "Add/Remove a flag on a subject", | ||
"required": ["add", "remove"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do the arrays need to be required, or can they be omitted if empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to stay in line with the labelling lexicons where we require both create
and negate
properties. I don't think we allow defining required fields as "one or the other" which is probably why this is kinda needed?
lexicons/com/atproto/admin/defs.json
Outdated
"remove": { | ||
"type": "array", | ||
"items": { "type": "string" }, | ||
"description": "Flags to be removed to the subject. Ignores a flag If doesn't exists, won't be duplicated." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Ignores a flag if it doesn't exists."
"langs": { | ||
"type": "array", | ||
"items": { "type": "string" } | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could make these flags
now, right? (maybe you were just focusing on the other lexicons for now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the api seems a bit nicer if we used specifically named params like this that ultimately searches against flags. but now realizing that it would make things difficult to add more filters so will move this to flags instead.
packages/api/src/client/lexicons.ts
Outdated
type: 'string', | ||
}, | ||
description: | ||
"Flags to be removed to the subject. Ignores a flag If doesn't exists, won't be duplicated.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as other description
lexicons/com/atproto/admin/defs.json
Outdated
@@ -587,6 +587,23 @@ | |||
} | |||
} | |||
}, | |||
"modEventFlag": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would definitely add optional comment
field; I think every event type should have comment
if (isModEventTag(event)) { | ||
if (event.add.length) meta.addedTags = event.add.join(' ') | ||
if (event.remove.length) meta.removedTags = event.remove.join(' ') | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason these are stuffed in meta
rather than having their own jsonb columns (i.e. in line with labels)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing that's a little funky here is that the fields addedTags
and removedTags
are not reflected in the model (i.e. in the types for db/tables/
). Even if they end-up in meta
, we have access to the full range of jsonb, so I'd be all for modeling them as an array rather than a space-separated strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no reason in particular other than trying to keep the column count low. we would need 2 columns if we want to go with the label approach and for most rows these columns will just be nulls.
@@ -745,6 +771,7 @@ export class ModerationService { | |||
nullsLast: true, | |||
}) | |||
|
|||
// console.log(paginatedBuilder.compile()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// console.log(paginatedBuilder.compile()) |
builder = builder.where( | ||
sql<string>`${ref( | ||
'moderation_subject_status.tags', | ||
)} @> ${JSON.stringify(tags)}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I recall, I added a new jsonb()
db type helper that would fit here.
const { ref } = this.db.db.dynamic | ||
if (tags.length) { | ||
builder = builder.where( | ||
sql<string>`${ref( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wont evaluate to a string, but the type also doesn't need to be preserved here anyway so just nixing it.
sql<string>`${ref( | |
sql`${ref( |
createdBy, | ||
}) | ||
} catch (err) { | ||
console.error('Error getting record langs', err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use the logger here rather than console.error()
so that we can consume the log in a standard format.
@@ -163,6 +163,12 @@ export class ModerationViews { | |||
eventView.event.sticky = true | |||
} | |||
|
|||
if (event.action === 'com.atproto.admin.defs#modEventTag') { | |||
eventView.event.add = event.meta?.addedTags?.toString().split(' ') || [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A common issue with space-separated strings is that we need a little bit more logic than a split(' ')
. If the string is empty ''
then the split gives you ['']
, when you probably want []
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to dedicated jsonb columns as you recommended above so we won't have to worry about the string related edge cases.
@@ -39,6 +39,9 @@ Object { | |||
}, | |||
"subjectBlobCids": Array [], | |||
"subjectRepoHandle": "alice.test", | |||
"tags": Array [ | |||
"lang:unknown", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume we like the lang:unknown
as it's explicit, versus not having a lang tag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah and also, this helps reduce unnecessary calls to figure out lang
tag on every event.
since this tagger runs on every event for a subject unless the subject already has a lang
tag, failing to figure out the right tag once probably means future attempts will fail too so we will just avoid running it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the 3-char code und
("Undetermined") is a pseudo-specified way to represent this.
thinking ahead to UI stuff... we may want to have a special/specific drop-down/conversion for language tags, different from other tags? But I think we can get to that whenever, doesn't need to be first iteration.
if (addedTags.length) { | ||
const tags = currentStatus?.tags || [] | ||
newStatus.tags = jsonb([ | ||
...new Set([...tags, ...addedTags]), | ||
]) as unknown as string[] | ||
subjectStatus.tags = newStatus.tags | ||
} | ||
if (removedTags.length) { | ||
const tags = currentStatus?.tags || [] | ||
newStatus.tags = jsonb( | ||
tags.filter((tag) => !removedTags.includes(tag)), | ||
) as unknown as string[] | ||
subjectStatus.tags = newStatus.tags | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that adding and removing a tag at the same time would respect the removal but not the addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed and cleaned up the logic here. there's also a test for this now. however, one edge case here is if a request comes in with add: ['x'], remove: ['x']
. with the current implementation, we will remove x
and not add.
event: { | ||
$type: 'com.atproto.admin.defs#modEventTag', | ||
add: recordLangs | ||
? recordLangs.map((lang) => `lang:${lang}`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small thing, just an idea. But the language on a post could be highly specific, anything within BCP 47. I wonder if for the purposes here it would make sense to use only the primary language tag. E.g. en-US
would be boiled down to en
, not distinguishing between US english and british english en-GB
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes! I think this would be good to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the the first fragment of the hyphenated codes is the generalized code? That's what I'm storing now but not sure if we would want the full code to be there too?
seems like this backend part has moved in a good direction. LGTM; i'll leave to dan/divy to review the actual implementation. |
Cross-posting from another issue: I think we need the ability to filter on "not-tag" in addition to tag. (not sure what the query param name should be for that) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
This PR adds language tracking to moderation subjects and allows
queryModerationStatuses
endpoint to filter the moderation queue by languages.For record subjects, the record is fetched from the appview and the language is retrieved from
langs
property.For repo subjects, the author's feed is fetched from the appview and all languages are collected from all posts in the first page of their timeline.